Models

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Zero-shot Pashto evaluation of ten multilingual ASR models, including Whisper variants and SeamlessM4T, finds that Whisper almost never writes Pashto script (<0.8% script fidelity) and produces Arabic script instead, a failure that WER alone does not reveal.

Tuesday, April 7, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: arXiv CS.CL (Computation & Language) | BY sys://pipeline

The paper reports what it describes as the first reproducible benchmarks for multilingual ASR on Pashto, evaluating ten models including Whisper variants, MMS-1B, SeamlessM4T, and OmniASR. In the zero-shot setting, Whisper WER ranges from 90% to 297%, with severe degradation on Common Voice, while SeamlessM4T achieves the best zero-shot result at 39.7% WER. (WER can exceed 100% because inserted words count as errors against the reference length.) Critical finding: Whisper models almost never produce Pashto-script output (<0.8% script fidelity) and generate Arabic script instead, a failure that WER alone cannot reveal.
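
To make the gap between WER and script fidelity concrete, here is a minimal Python sketch. It is not the paper's evaluation code, and the summary does not say how the authors define script fidelity. The check below is an assumed heuristic: an output counts as Pashto-script if it contains at least one letter that Pashto uses and standard Arabic does not. The `PASHTO_ONLY` set, the function names, and the toy strings are all hypothetical.

```python
# Assumption: letters used in Pashto orthography but not in standard Arabic
# (e.g. U+067C teh with ring, U+069A seen with dots below and above).
# This list is illustrative, not taken from the paper.
PASHTO_ONLY = set(
    "\u067c\u0681\u0685\u0689\u0693\u0696\u069a\u06ab\u06bc\u06cd\u06d0"
)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1] / len(ref)


def looks_pashto(text: str) -> bool:
    """Heuristic: True if the text contains any Pashto-specific letter."""
    return any(ch in PASHTO_ONLY for ch in text)


def script_fidelity(hypotheses: list[str]) -> float:
    """Fraction of hypotheses that pass the heuristic script check."""
    if not hypotheses:
        return 0.0
    return sum(looks_pashto(h) for h in hypotheses) / len(hypotheses)


# Toy strings (hypothetical, not from the benchmark).
reference = "\u069a\u0647 \u06cc\u0645"    # two words, first contains U+069A
wrong_word = "\u069a\u0647 \u062f\u06cc"   # second word wrong, Pashto letter kept
wrong_script = "\u0634\u0647 \u06cc\u0645" # U+069A replaced by Arabic sheen U+0634

print(wer(reference, wrong_word), looks_pashto(wrong_word))      # 0.5 True
print(wer(reference, wrong_script), looks_pashto(wrong_script))  # 0.5 False
print(wer("a b", "v w x y z"))                                   # 2.5
```

Both toy hypotheses score the same WER, yet only one keeps a Pashto-specific letter, which is why a separate script check is needed alongside WER. The last line shows how extra inserted words push WER above 100%. A real check would need to handle short utterances that happen to contain no Pashto-specific letters, which this heuristic would misclassify.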

Tags
models