You can try Qwen3.5-Omni on Hugging Face now
The new 72B-parameter model processes images, audio, and text in a single architecture and is available now on Hugging Face.
Alibaba's Qwen team has officially launched Qwen3.5-Omni for public testing, making the powerful multimodal model instantly accessible via a free demo on Hugging Face. The release marks a significant step for open-source AI, offering a unified 72-billion-parameter architecture that natively processes text, images, audio, and documents without relying on a cascade of separate models. The online demo lets anyone experiment with complex, cross-format interactions, from analyzing uploaded documents and images to holding conversations that incorporate audio cues.
The model's "Omni" designation highlights its all-in-one design, aiming to provide more coherent and contextually aware responses across modalities compared to stitched-together systems. By hosting it on Hugging Face, the team encourages immediate community feedback and benchmarking, positioning it as a strong, accessible contender against other leading multimodal models. This move accelerates real-world testing and could quickly influence how developers and researchers approach building integrated AI applications.
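For developers who want to go beyond the browser demo, models released this way can typically also be queried programmatically through Hugging Face's standard tooling. The sketch below uses the real `InferenceClient` API from `huggingface_hub`, but the repo id `Qwen/Qwen3.5-Omni` is an assumption based on the team's naming convention, and whether serverless inference is enabled for the checkpoint depends on how it is deployed; check the model card for the actual id.

```python
# Minimal sketch: querying a multimodal model via Hugging Face's
# OpenAI-style chat completion API. The repo id below is an ASSUMPTION;
# confirm it on the model card before running.
from huggingface_hub import InferenceClient

client = InferenceClient(model="Qwen/Qwen3.5-Omni")  # hypothetical repo id

# A multimodal user message: an image URL plus a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize the trends shown in this chart."},
        ],
    }
]

response = client.chat_completion(messages=messages, max_tokens=256)
print(response.choices[0].message.content)
```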
- A unified 72B-parameter model that natively processes text, images, audio, and documents in one architecture.
- Freely available for public testing right now via an online demo on Hugging Face Spaces (see the sketch after this list for scripting the demo).
- Represents a significant open-source challenger in the multimodal AI space, encouraging direct community experimentation.
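Since the demo is hosted on Hugging Face Spaces, it can in principle be driven from a script with `gradio_client` rather than the browser UI. The `Client` and `predict` calls below are the library's real API, but the Space id and the endpoint name are assumptions for illustration; the actual values appear on the Space's "Use via API" page, and the argument order is specific to each Space.

```python
# Minimal sketch: calling a Hugging Face Space demo programmatically.
# The Space id and api_name below are ASSUMPTIONS; look them up on the
# Space's "Use via API" page.
from gradio_client import Client

client = Client("Qwen/Qwen3.5-Omni-Demo")  # hypothetical Space id
result = client.predict(
    "Describe the attached audio clip in one sentence.",  # prompt; arguments are Space-specific
    api_name="/chat",                                     # hypothetical endpoint name
)
print(result)
```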
Why It Matters
The release gives developers free access to a state-of-the-art multimodal model to test and build upon, lowering the barrier to advanced AI applications.