OmniCustom: Synchronized Audio-Video Customization via a Joint Audio-Video Generation Model
This new model can clone your face and voice into any video scenario.
Researchers have unveiled OmniCustom, a new AI model that customizes video and audio in sync from minimal references. Given a single reference image and a short audio clip, it generates a video in which the subject keeps the reference's visual identity and vocal timbre while speaking any text prompt. Built on a Diffusion Transformer framework with specialized LoRA modules and contrastive learning, it was trained on a large-scale human dataset and outperforms existing methods in fidelity.
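The article names LoRA modules as the customization mechanism. As a generic illustration of the LoRA idea (a frozen weight matrix plus a trainable low-rank update scaled by alpha/r), not OmniCustom's actual implementation, here is a minimal pure-Python sketch; all dimensions, helper names, and values are hypothetical.

```python
# Hypothetical LoRA sketch: the effective weight is
#   W_eff = W + (alpha / r) * B @ A
# where only A (r x d_in) and B (d_out x r) are trained.
# Dimensions and values are illustrative, not from OmniCustom.

def matmul(M, N):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)]
            for row in M]

def lora_effective_weight(W, A, B, alpha, r):
    """Frozen base weight W plus the scaled low-rank update (alpha/r) * B @ A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(rw, rd)]
            for rw, rd in zip(W, delta)]

def apply_weight(W, x):
    """Apply a (d_out x d_in) weight matrix to an input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy example: d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]          # frozen base weight
A = [[0.1, 0.2, 0.3]]          # trainable, r x d_in
B_init = [[0.0], [0.0]]        # trainable, d_out x r; zero-initialized
x = [1.0, 2.0, 3.0]

# With B at its usual zero initialization, the adapter is a no-op:
y0 = apply_weight(lora_effective_weight(W, A, B_init, alpha=2.0, r=1), x)
# y0 == [1.0, 2.0], identical to the base model's output.

# Once training moves B away from zero, the low-rank update shifts the output:
B_trained = [[1.0], [1.0]]
y1 = apply_weight(lora_effective_weight(W, A, B_trained, alpha=2.0, r=1), x)
```

The appeal of this scheme for customization is that the base model stays frozen and only the small A and B matrices are stored per identity or voice.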
Why It Matters
It enables hyper-realistic, personalized video content creation for marketing, entertainment, and synthetic media, all from minimal input.