siSwati is a tonal language: the same written word can mean completely different things depending on pitch. Standard spelling does not mark tone, so AI systems working from text have no way to know the difference. We're fixing that.
Press play on each card to hear how a generic AI pronounces these siSwati words — then see what the correct pronunciation should convey. The same spelling, two different meanings.
* Examples below are illustrative. Exact diacritic notation is pending linguistic expert validation.
What you just heard is the problem
The AI above pronounced siSwati using an English phonetic model. It has no concept of Bantu tonal structure. The resulting speech is not only mispronounced; in real communication, it would convey the wrong meaning entirely. This affects any siSwati AI assistant, voice tool, or educational application built on such models today.
Research on comparable orthographic reforms in tonal languages projects significant gains across literacy, comprehension, and digital inclusion.
Projected over a 5-year implementation period. Based on comparable orthographic reforms (UNESCO, 2016).
Tone marking improved reading speed by 27% in comparable Bantu languages (Bird, 1999).
Orthographic clarity → 30–40% comprehension improvement (Roberts et al., 2022).
Standardised orthography improved NLP performance by 35–50% for African languages (Adebara, 2022).
Aligned with UN Sustainable Development Goals
A phased initiative to standardise siSwati diacritics and integrate them into AI speech systems — starting with the words where getting it wrong matters most.
Document ambiguous words, establish diacritic notation standards with linguists, begin native speaker recordings.
Align with the Ministry of Education, the University of Eswatini, LiSwati Lekubhala, UNDP, and UNESCO.
Fine-tune TTS/STT models on diacritic-annotated siSwati datasets. Demonstrate measurable accuracy improvements.
Deploy literacy tools, reading applications, and teacher resources using the new diacritic-enabled AI models.
Formalise diacritics in national orthography standards. Open-source the dataset and models for other Bantu languages.
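As a rough illustration of the first phase above (documenting ambiguous words), the sketch below groups tone-marked spellings that collapse to the same plain spelling once the marks are removed. It assumes, purely for illustration, that high and low tone are written with the combining acute and grave accents; the actual notation is still pending expert validation. The function names and the sample strings are hypothetical placeholders, not validated siSwati entries.

```python
import unicodedata

# Assumption for this sketch only: acute = high tone, grave = low tone.
# The project's real diacritic notation has not been finalised.
ACUTE = "\u0301"  # combining acute accent
GRAVE = "\u0300"  # combining grave accent


def strip_tone_marks(word: str) -> str:
    """Return the plain spelling of a word with the assumed tone marks removed."""
    # NFD splits precomposed letters (e.g. "é") into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if ch not in (ACUTE, GRAVE))
    return unicodedata.normalize("NFC", bare)


def group_by_plain_spelling(entries):
    """Map each plain spelling to the distinct tone-marked forms that share it.

    Only spellings with more than one marked form are kept, since those are
    the ones a reader or a speech model cannot disambiguate without tone marks.
    """
    groups = {}
    for marked in entries:
        key = strip_tone_marks(marked)
        # Store NFC forms so visually identical entries are not counted twice.
        groups.setdefault(key, set()).add(unicodedata.normalize("NFC", marked))
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Placeholder strings only; these are not real dictionary entries.
sample = ["t\u00e9t\u00e9", "te\u0300te\u0300", "nono"]
print(group_by_plain_spelling(sample))
```

Normalising to a single Unicode form before comparing matters because the same marked word can be typed either as precomposed characters or as base letters plus combining marks; without that step, a dataset could treat two encodings of one word as different entries.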
Every voice recording from a native siSwati speaker gets us closer to AI that pronounces words correctly, preserves meaning, and gives every Swati speaker access to the digital world on their own terms.
Record yourself reading siSwati words to help build our training dataset.
Help validate diacritic notation and curate the ambiguous word list.
Schools, government, and NGOs can integrate early tools and provide feedback.