
BRIDGE Report
AI Listens to Everyone. Except 5.5 Billion People.
That’s the gap we want to bridge.
BRIDGE is the first independent Global South ASR benchmark, evaluating 14 global models across 22 languages on a first-of-its-kind 7-metric stack.
How the Benchmark Was Built
From speaker recruitment to evaluation pipeline — the decisions that make BRIDGE reproducible, auditable, and resistant to benchmark gaming.
Dual-speaker audio collected from real conversations: 22+ Indian states across the Indic corpus; three Latin-American Spanish dialects (Argentinian, Peruvian, Venezuelan), Brazilian Portuguese, and Vietnamese on the international side. Contributors sourced to reflect diverse demographics — age, gender, region — with no scripting or prompting. Every file is a genuine naturalistic conversation.
Before any metric runs, both reference and hypothesis pass through three normalisation layers: base cleaning (lowercasing, punctuation stripping, Unicode marks preserved), loanword normalisation (script-variant English words unified), and OIWER normalisation (British/American spelling + mixed-script token expansion).
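A minimal sketch of how these three layers might compose, assuming simple token-level lookup tables; the maps, function names, and logic below are illustrative, not the actual BRIDGE normalisation code.

```python
# Illustrative three-layer normalisation pipeline (not the BRIDGE implementation).
import unicodedata

LOANWORD_MAP = {"कंप्यूटर": "computer"}                      # assumed example: script-variant English loanword
SPELLING_MAP = {"colour": "color", "organise": "organize"}   # assumed British -> American pairs

def base_clean(text: str) -> str:
    """Lowercase and strip punctuation while preserving Unicode combining marks."""
    text = unicodedata.normalize("NFC", text.lower())
    return "".join(
        ch for ch in text
        if ch.isalnum() or ch.isspace() or unicodedata.category(ch).startswith("M")
    )

def normalise_loanwords(text: str) -> str:
    """Unify script-variant English words to a single spelling."""
    return " ".join(LOANWORD_MAP.get(tok, tok) for tok in text.split())

def normalise_oiwer(text: str) -> str:
    """British/American spelling unification plus mixed-script token expansion."""
    return " ".join(SPELLING_MAP.get(tok, tok) for tok in text.split())

def normalise(text: str) -> str:
    """Apply the three layers in the order described above."""
    return normalise_oiwer(normalise_loanwords(base_clean(text)))
```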
Each (audio, model) pair is scored on: WER & CER (word/character accuracy), Semantic Similarity (meaning preservation via multilingual embeddings), CS F1 (code-switching quality), PIER (English token recall), toWER (phonetic WER via ITRANS), and OIWER (orthography-informed WER). WER alone is insufficient: a model can score 0.97 WER and 0.99 semantic similarity on the same file.
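To make that divergence concrete, the sketch below scores one reference/hypothesis pair on two of the seven metrics with off-the-shelf libraries; jiwer, sentence-transformers, and the embedding model name are assumed choices for illustration, not the benchmark's documented tooling.

```python
# Sketch of scoring one reference/hypothesis pair on WER, CER, and semantic similarity.
# In BRIDGE the three-layer normalisation would run on both strings first; here the
# pair is scored as-is.
from jiwer import wer, cer
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed embedding model

def score_pair(reference: str, hypothesis: str) -> dict:
    emb = embedder.encode([reference, hypothesis], convert_to_tensor=True)
    return {
        "wer": wer(reference, hypothesis),                     # word error rate
        "cer": cer(reference, hypothesis),                     # character error rate
        "semantic_sim": util.cos_sim(emb[0], emb[1]).item(),   # meaning preservation
    }

# A transliterated or mixed-script hypothesis can push WER very high while a
# multilingual embedding may still score the meaning as close.
print(score_pair("मैं कल office जाऊँगा", "main kal office jaunga"))
```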
Every conversation is tagged across 7 dimensions: language, gender mix, region (same/cross-state on Indic; dialect on Spanish), age group, speaker overlap, conversational density, and gap pattern. The same scheme runs over both corpora so cross-language comparisons stay apples-to-apples.
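As an illustration, the cohort tag for one conversation could be represented roughly as below; the field names and example values are assumptions inferred from the description, not the released label schema.

```python
# Illustrative seven-dimension cohort tag for a single conversation.
from dataclasses import dataclass

@dataclass
class CohortTags:
    language: str         # e.g. "hindi", "es-AR", "pt-BR", "vi"
    gender_mix: str       # e.g. "male-male", "male-female", "female-female"
    region: str           # same-state / cross-state on Indic; dialect on Spanish
    age_group: str        # e.g. "18-30", "31-50", "50+"
    speaker_overlap: str  # e.g. "low", "medium", "high"
    density: str          # conversational density bucket
    gap_pattern: str      # turn-gap / pause pattern bucket
```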
Model Leaderboard
All 14 models on WER (and more). Filter by language to see how each model performs per language. Filter by metric to change what the bars represent.
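For readers working from released per-file scores rather than the interactive view, a hedged pandas sketch of the same filtering follows; the file name and column names are assumptions about the export format, not a documented schema.

```python
# Rebuilding the leaderboard views from a per-file scores table (assumed schema).
import pandas as pd

scores = pd.read_csv("bridge_scores.csv")   # hypothetical export of per-file metrics

# Overall leaderboard: mean WER per model, best first.
leaderboard = scores.groupby("model")["wer"].mean().sort_values()

# "Filter by language": restrict to one language before aggregating.
hindi_board = (
    scores[scores["language"] == "hindi"]
    .groupby("model")["wer"].mean().sort_values()
)

# "Filter by metric": aggregate a different column, e.g. code-switching F1 (higher is better).
cs_board = scores.groupby("model")["cs_f1"].mean().sort_values(ascending=False)
```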
ElevenLabs Scribe v2 leads overall at 8.53% WER
The only model simultaneously accurate and code-switch aware. Deepgram nova-3 has the best CS F1 but limited language coverage. Several models sit above 30% WER — not production-ready for Indic conversational audio.
Key Findings
Six evidence-based conclusions drawn from the merged BRIDGE corpus — Indic conversational audio plus Spanish (3 dialects), Brazilian Portuguese, and Vietnamese — relevant for enterprise AI teams deploying voice products across the Global South.
ElevenLabs Scribe v2 posts 8.53% mean WER across the merged corpus, with 0.65–1.45% on the three Latin-American Spanish dialects and 1.45% on Brazilian Portuguese. The 7.67pp gap to the second-ranked model on the international slice is wider than the entire spread from rank 2 to rank 11 — Scribe v2 is the unambiguous default wherever it has language coverage.
ElevenLabs Scribe v2 leads broadly — except where it doesn’t
Scribe v2 dominates Indic, Spanish, and Portuguese; AssemblyAI Universal owns Vietnamese. Overlap, cross-state pairs, and Caribbean Spanish are the three biggest performance killers across the merged corpus. CS F1 exposes Indic code-switch failures invisible to WER. Don’t pick an ASR provider on aggregate WER alone — evaluate on the cohort that matches your deployment.
Cohort Performance Analysis
Choose a cohort dimension, a metric, and a model to see how performance shifts across conditions. All three filters work together — any combination is valid.
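Continuing the assumed per-file schema from the leaderboard sketch above, the cohort view amounts to fixing a model and a metric and slicing by one cohort dimension; the model identifier and column names below are placeholders.

```python
# Cohort view sketch: one model, one metric, sliced by speaker overlap.
cohort_view = (
    scores[scores["model"] == "elevenlabs-scribe-v2"]   # hypothetical model identifier
    .groupby("speaker_overlap")["wer"]
    .agg(["mean", "count"])
    .sort_values("mean")
)
print(cohort_view)
```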
Speaker overlap is the biggest acoustic stressor
Cross-state pairs are harder than same-state. Gender and age have modest effects. Duration and gap patterns show minimal impact — language and accent dominate.
Dataset access & citation
The full BRIDGE corpus — Indic + Latin American Spanish + Brazilian Portuguese + Vietnamese — is available on Hugging Face. If you use this benchmark in your research, please cite the following.
Audio files, golden transcripts, speaker metadata, cohort labels, and evaluation scripts are available under the BRIDGE dataset card on Hugging Face — covering Indic (17 languages), Latin American Spanish (3 dialects), Brazilian Portuguese, and Vietnamese. Additional languages and an overlap-focused corpus are in preparation.
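A hedged sketch of loading the corpus with the Hugging Face `datasets` library follows; the repository id, split name, and field names are placeholders, so check the actual BRIDGE dataset card for the real identifiers.

```python
# Loading the corpus from Hugging Face (placeholder repo id and field names).
from datasets import load_dataset

bridge = load_dataset("humynlabs/bridge", split="test")   # hypothetical repository id
sample = bridge[0]
print(sample["audio"]["path"], sample["language"], sample["transcript"])  # assumed field names
```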
@misc{humynlabs_bridge_2026,
  title = {BRIDGE: State of Conversational ASR Across the Global South},
  author = {HumynLabs Research Team},
  year = {2026},
  month = {April},
  note = {Independent benchmark evaluating 15 commercial ASR APIs on dual-speaker conversational audio across 20 languages — Indic (17 languages, 22+ Indian states), Latin American Spanish (Argentinian, Peruvian, Venezuelan), Brazilian Portuguese, and Vietnamese — scored on a 7-metric stack across 7 cohort dimensions},
  url = {https://bridge-report-hazel.vercel.app},
  howpublished = {HumynLabs}
}

If you work on conversational ASR and want to submit your model for evaluation, or partner on expanding the corpus into new languages, contact the BRIDGE team at humynlabs.ai.