SYSTEM Cited by 1 source

Waxal dataset

Waxal is an open speech dataset for African languages, introduced by Google and developed "with the community." It is cited in the 2026-05-28 I/O 2026 roundup post as part of Google's multilinguality and localization research arc, which supports Gemini's deployment across 70+ languages and 230+ countries (Source: sources/2026-05-28-google-a-new-era-of-innovation-google-research-at-io-2026).

This is a minimum-viable wiki page anchored to the I/O 2026 post's pointer to the Waxal announcement blog. The dataset specification (languages covered, hours, speaker count, license, methodology) is not in this raw source; it lives in the linked blog post and any accompanying release notes.

Role

Waxal is named alongside the ECLeKTic benchmark as evidence of Google's investment in low-resource-language coverage. Open-sourcing the data (rather than only the benchmark) is aimed at letting the broader research community train African-language ASR, TTS, and understanding models, going beyond what evaluation benchmarks alone can drive.

Seen in
