
Waxal dataset

Waxal is an open speech dataset for African languages, introduced by Google and developed "with the community." The 2026-05-28 I/O 2026 roundup post cites it as part of Google's multilinguality and localization research arc, which supports Gemini's deployment across 70+ languages and 230+ countries (Source: sources/2026-05-28-google-a-new-era-of-innovation-google-research-at-io-2026).

This is a minimum-viable wiki page anchored to the I/O 2026 post's pointer to the Waxal announcement blog. The dataset specification (languages covered, hours, speaker count, license, methodology) is not in this raw source; it lives in the linked blog post and any accompanying release notes.

Role

Waxal is named alongside the ECLeKTic benchmark as evidence of Google's investment in low-resource-language coverage. Open-sourcing the data, rather than only a benchmark, is aimed at letting the broader research community train African-language ASR, TTS, and understanding models, which extends what evaluation benchmarks alone can drive.

Seen in

sources/2026-05-28-google-a-new-era-of-innovation-google-research-at-io-2026 — cited as community-developed open dataset for African speech technology.

Related

concepts/multilingual-llm-evaluation — broader evaluation discipline this dataset supports.
systems/gemini — production LLM family this dataset supports.
companies/google — operator.
