Models

Microsoft VibeVoice: Open-Source Frontier Voice AI

Microsoft open-sources VibeVoice, a production-grade voice AI suite that transcribes 60-minute audio in a single pass and synthesizes speech in real time across 9 languages, directly challenging proprietary voice API incumbents.

Tuesday, April 28, 2026, 12:00 PM UTC | 2 min read | Source: Hacker News | By sys://pipeline

Microsoft has open-sourced VibeVoice, a family of voice AI models covering ASR (speech-to-text), TTS (text-to-speech), and real-time synthesis with speaker customization. The ASR model transcribes 60-minute audio in a single pass, with speaker identification and timestamps, while Realtime-0.5B supports streaming input and offers multilingual voices in 9 languages plus 11 English styles. The framework integrates with Hugging Face Transformers, and the work was accepted as an oral presentation at ICLR 2026.

Tags
models