Audio & Speech

OmniCustom AI Generates Custom Videos With Matching Audio in One Click

This new model can clone your face and voice into any video scenario.

Deep Dive

Researchers have unveiled OmniCustom, an AI model that customizes video and audio together, each from a single reference. Given one reference image and one audio clip, it generates a video in which the subject keeps the reference's visual identity and vocal timbre while speaking the text supplied in a prompt. The model builds on a Diffusion Transformer framework with specialized LoRA modules and contrastive learning. It was trained on a large-scale human dataset and is reported to outperform existing methods in fidelity.
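For readers unfamiliar with LoRA, the sketch below illustrates the general idea of a low-rank adapter: a frozen weight matrix W is adjusted by a small trainable product B·A, scaled by alpha / r. This is a generic, pure-Python illustration and not OmniCustom's actual code; the function names (`matmul`, `lora_forward`), the shapes, and the values are all assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_forward(x, W, A, B, alpha):
    """Apply a linear layer with a LoRA update (hypothetical helper).

    x: n x d_in batch of row vectors
    W: d_out x d_in frozen base weights
    A: r x d_in and B: d_out x r, the trainable low-rank factors
    Returns x @ (W + (alpha / r) * B @ A)^T.
    """
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in low-rank update
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    W_eff_T = [list(col) for col in zip(*W_eff)]  # d_in x d_out
    return matmul(x, W_eff_T)


# Illustrative values only: d_in = 3, d_out = 2, rank r = 1.
W = [[1, 0, 0], [0, 1, 0]]
A = [[1, 1, 1]]
x = [[1, 2, 3]]

# With B set to zero, the adapter leaves the base layer unchanged.
base_out = lora_forward(x, W, A, [[0], [0]], alpha=2)

# A non-zero B shifts the output without touching W itself.
adapted_out = lora_forward(x, W, A, [[1], [0]], alpha=2)
```

In this toy setup the adapter trains r × (d_in + d_out) = 5 values instead of the 6 in W; the saving grows quickly for the much larger matrices found in a transformer, which is why adapters of this kind are a common way to specialize a big pretrained model.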

Why It Matters

OmniCustom enables hyper-realistic, personalized video creation for marketing, entertainment, and synthetic media from minimal input: one image and one audio clip.
