Microsoft has open-sourced VibeVoice, a family of voice AI models covering ASR (speech-to-text), TTS (text-to-speech), and real-time synthesis with speaker customization. The ASR model transcribes 60 minutes of audio in a single pass, with speaker identification and timestamps, while Realtime-0.5B supports streaming input and multilingual voices across 9 languages plus 11 English styles. The framework integrates with Hugging Face Transformers, and the work was accepted as an oral presentation at ICLR 2026.
Microsoft VibeVoice: Open-Source Frontier Voice AI
Microsoft open-sources VibeVoice, a production-grade voice AI suite handling 60-minute speech-to-text in a single pass and real-time synthesis across 9 languages, directly challenging proprietary voice API incumbents.
Tuesday, April 28, 2026 12:00 PM UTC | 2 min read | Source: Hacker News | By sys://pipeline
Tags: models