Startups & Funding

DeepL, known for text translation, now wants to translate your voice

The text translation leader enters the voice AI race with a full-stack solution for business communication.

Deep Dive

DeepL, the company renowned for its neural machine translation technology, has officially expanded from text into real-time voice translation. The new suite is designed for business-critical use cases such as multilingual meetings, customer support, and training sessions. A key feature is its integration with collaboration platforms like Zoom and Microsoft Teams, where participants can hear translated audio or read live captions. The company is also releasing a developer API that lets businesses build custom applications, for instance for call centers, on top of DeepL's translation engine. CEO Jarek Kutylowski emphasized that voice was a "natural step" after years of perfecting text translation, with the company aiming to solve the latency-versus-accuracy trade-off inherent in real-time systems.

Currently, the technology works as a three-stage pipeline: it converts speech to text, translates the text, and then synthesizes speech in the target language. DeepL controls this entire stack and plans to develop an end-to-end model that bypasses text entirely for greater speed. The system can learn custom vocabulary, including industry-specific terms and proper names, making it adaptable to specialized professional contexts. The launch places DeepL in direct competition with well-funded startups like Palabra, which focuses on preserving the speaker's voice, and Sanas, which modifies accents for call centers. The service is in an early access phase, with organizations invited to join a waitlist to test it in real-world scenarios.
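For readers who want to picture the three-stage flow described above, here is a minimal Python sketch. It is purely illustrative and does not use or represent DeepL's API: the class name `CascadePipeline`, the stage functions, and the `custom_terms` table are all hypothetical, and the "audio" is just UTF-8 bytes standing in for real speech. The custom vocabulary step is modeled as a simple correction applied to the transcript, which is only one assumed way such a feature could work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stage signatures; none of these correspond to a real DeepL API.
Transcriber = Callable[[bytes], str]   # speech -> source-language text
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> speech


@dataclass
class CascadePipeline:
    """Speech-to-text, then translation, then text-to-speech."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer
    # Assumed custom vocabulary: maps a misheard phrase to the intended term.
    custom_terms: Dict[str, str] = field(default_factory=dict)

    def apply_custom_terms(self, text: str) -> str:
        # Crude stand-in for learned vocabulary: literal phrase replacement.
        for heard, intended in self.custom_terms.items():
            text = text.replace(heard, intended)
        return text

    def run(self, audio: bytes) -> bytes:
        source_text = self.apply_custom_terms(self.transcribe(audio))
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stages so the sketch runs without any speech or translation library.
EN_TO_DE = {"hello": "hallo", "team": "Team"}


def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes are recognized speech


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words (such as names) pass through unchanged.
    return " ".join(EN_TO_DE.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized speech


pipeline = CascadePipeline(
    transcribe=fake_transcribe,
    translate=fake_translate,
    synthesize=fake_synthesize,
    custom_terms={"deep el": "DeepL"},
)
result = pipeline.run(b"hello deep el team")
```

Each stage has to finish some work before the next can begin, which is where the added delay in a cascaded design comes from. An end-to-end model of the kind DeepL says it plans would replace all three stages with a single speech-to-speech step.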

Key Points
  • DeepL's new suite offers real-time voice translation for Zoom, Teams, and custom apps via an API.
  • The system focuses on balancing low latency with high accuracy and can learn industry-specific vocabulary.
  • It enters a competitive field against startups like Palabra and Sanas, with plans for an end-to-end voice model.

Why It Matters

Real-time voice translation could enable seamless multilingual collaboration for global teams and let companies scale customer support in languages where qualified staff are hard to find.