The standard approach to supporting African languages in AI products goes roughly like this: build an English-first system, add a machine translation layer, and call it multilingual. The product literature says "supports Yoruba." What it actually means is "converts Yoruba speech to English text, processes in English, translates the response back to Yoruba, and speaks it." At each translation step, something is lost.
Building genuine Yoruba AI — a system that processes Yoruba as a native language rather than a detour through English — is substantially harder. It is also the only approach that works for the actual way Yoruba is spoken by Nigerian callers in everyday commercial contexts.
What makes Yoruba linguistically distinct
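Yoruba is a tonal language with three level tones: high, mid, and low. Words such as "ọkọ" (husband), "ọkọ́" (hoe), and "ọkọ̀" (vehicle) share the same consonants and vowels and are distinguished only by their pitch pattern. Written Yoruba uses two kinds of diacritics. Tone marks sit above the vowel: an acute accent for high tone, a grave accent for low tone, and no mark for mid tone. The dot below ẹ and ọ marks vowel quality rather than tone: it indicates an open-mid vowel distinct from the close-mid e and o. In spoken Yoruba, pitch carries both lexical and grammatical meaning.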
This creates an immediate problem for AI systems built on English-dominant training data. English is not tonal. A speech recognition model trained primarily on English will systematically misidentify Yoruba tones because its acoustic model does not treat pitch as something that distinguishes one word from another. The model may transcribe the correct phonemes but with incorrect tone markings, which changes the meaning of words, sometimes substantially. In commercial calls, the confusions that matter include:
- ọjọ́ (day) vs. òjò (rain): easily confused in "what day is your shop open?" vs. a weather comment
- owó (money/price) vs. ọwọ́ (hand) or ọ̀wọ̀ (respect): a critical distinction in pricing conversations
- tà (to sell) vs. ta (to shoot): in "do you sell X?" the wrong tone produces a meaningless or incorrect verb
- Names: Adéola, Adèolá, and Àdéolá are distinct written forms that may belong to different people, so misidentification in booking contexts creates confusion
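To make the role of the marks concrete, here is a minimal Python sketch that reads the tone pattern off a written word and shows what is left once the marks are removed. The helper names `tone_pattern` and `skeleton` are illustrative only, and the sketch ignores tone-bearing syllabic nasals; it is not a description of any production model.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"   # high tone, low tone
VOWELS = set("aeiouAEIOU")

def tone_pattern(word):
    """Return one letter per vowel: H (acute), L (grave), M (unmarked)."""
    tones = []
    on_vowel = False
    # NFD splits each accented letter into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if on_vowel and ch == ACUTE:
                tones[-1] = "H"
            elif on_vowel and ch == GRAVE:
                tones[-1] = "L"
            # Other marks (such as the underdot) do not change the tone.
        else:
            on_vowel = ch in VOWELS
            if on_vowel:
                tones.append("M")
    return "".join(tones)

def skeleton(word):
    """Drop every combining mark, leaving only the bare letters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(tone_pattern("tà"), tone_pattern("ta"))   # L M
print(skeleton("tà") == skeleton("ta"))         # True
```

Once the marks are gone, "tà" and "ta" are the same string, which is exactly the information a transcript without reliable tone marking cannot recover.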
The code-switching layer
Most Yoruba speakers in commercial settings do not speak pure Yoruba. They code-switch, alternating between Yoruba and English within a sentence or across turns in a conversation. A caller might open with "Ẹ káàárọ̀" (good morning), state their query in English, insert Yoruba discourse markers ("sha", "o", "abi"), and close with "Ẹ ṣé" (thank you).
This is not broken English or incomplete Yoruba. It is standard Southwest Nigerian communication. A system that can only process one language at a time will mishandle the switches — either ignoring the Yoruba segments, or treating the entire utterance as unrecognised. Either way, the caller's intent is lost.
Maraba's language detection operates at the segment level, not the utterance level. Within a single sentence, the system identifies which language each phrase belongs to and processes each in its native model. The outputs are combined before response generation, preserving the full meaning of what the caller said across both languages.
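The idea of tagging language per segment rather than per utterance can be sketched in a few lines of Python. Everything here is an assumption for illustration: the marker list is taken from the examples in this article, the `segment` and `tag` functions are hypothetical, and a real detector would rely on trained acoustic and language models rather than a word list.

```python
import unicodedata
from itertools import groupby

# Hypothetical marker list; unaccented forms as an ASR transcript might emit them.
YORUBA_MARKERS = {"E", "kaaro", "se", "sha", "o", "abi"}

def has_yoruba_marks(word):
    """Crude cue: the word carries a tone mark or an underdot."""
    return any(unicodedata.combining(ch)
               for ch in unicodedata.normalize("NFD", word))

def tag(word):
    """Label a single word 'yo' or 'en' (toy heuristic only)."""
    bare = word.strip(".,?!")
    return "yo" if bare in YORUBA_MARKERS or has_yoruba_marks(bare) else "en"

def segment(utterance):
    """Group consecutive words that share a language tag into segments."""
    return [(lang, " ".join(words))
            for lang, words in groupby(utterance.split(), key=tag)]

for lang, text in segment("E kaaro, I want to book for tomorrow sha. E se"):
    print(lang, "|", text)
```

Each segment could then be routed to the model for its own language, so that neither the greeting nor the discourse marker is discarded on the way to intent extraction.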
Why Yoruba speakers notice bad AI immediately
English speakers in the US and UK have been using AI voice assistants since Siri launched in 2011. Most English speakers have calibrated their expectations down — they know to speak slowly, clearly, use complete sentences, avoid idiom. They have adapted to the AI's limitations.
Yoruba speakers in Nigeria have not had a decade of habituation. When they encounter an AI that mishandles Yoruba, they do not adapt their speech — they hang up. The tolerance for friction is lower precisely because there is less accumulated experience of AI voice products. This means that a Yoruba-capable system that works well earns significant loyalty; a system that handles Yoruba badly loses the caller immediately and permanently.
Nigerian businesses serving Yoruba-speaking markets cannot afford a system that treats their callers' language as a secondary consideration. When a caller from Ibadan rings a Lagos pharmacy and opens in Yoruba, getting a response that reflects genuine comprehension — not just approximately correct English — is a meaningful signal of respect.
The diacritics problem in written Yoruba output
When Maraba sends a WhatsApp summary of a Yoruba-language call, the summary may include Yoruba words — a caller's name, a place name, a medication name in Yoruba. Producing these correctly requires UTF-8 encoding throughout the pipeline. A system that strips diacritics or substitutes plain ASCII will produce summaries that are ambiguous or incorrect.
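One practical detail behind "UTF-8 throughout" is that the same visible name can arrive as different code-point sequences. The sketch below, which assumes nothing about Maraba's actual pipeline, uses Python's standard `unicodedata` module to normalise text to NFC before storage or comparison, and shows what a lossy ASCII step does instead. The `canonical` helper is a hypothetical name.

```python
import unicodedata

def canonical(text):
    """Normalise to NFC so strings that look identical also compare equal."""
    return unicodedata.normalize("NFC", text)

composed = "Ad\u00e9ola"       # é stored as a single code point
decomposed = "Ade\u0301ola"    # e followed by a combining acute accent

print(composed == decomposed)                        # False
print(canonical(composed) == canonical(decomposed))  # True

# A UTF-8 round trip preserves the name exactly.
wire = canonical(decomposed).encode("utf-8")
print(wire.decode("utf-8") == composed)              # True

# Forcing ASCII silently drops the accented letter altogether.
lossy = composed.encode("ascii", errors="ignore").decode("ascii")
print(lossy)                                         # Adola
```

Normalising once at the pipeline boundary means later stages can compare and deduplicate names without each one having to handle both encodings.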
This is why Maraba's rule against lowercasing non-English text exists at the engineering level, not just as a policy. Calling .lower() by itself keeps the accents ("Adéola" becomes "adéola"), but it erases the capital that marks a proper name, and lowercasing often sits in English-oriented cleanup routines next to ASCII folding or accent stripping. Those steps reduce "Adéola" to "adeola", destroying both the tone marks and the underdots that distinguish vowel quality, so the name can no longer be told apart from similar names. For a business recording customer interactions, this creates a data quality problem that compounds over time.
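A rule like this can be enforced in code rather than left to convention. The following Python sketch is one possible shape, not Maraba's implementation: `clean_field` and `ascii_fold` are hypothetical helpers, and the language tag is assumed to come from an upstream detector.

```python
import unicodedata

def clean_field(text, lang):
    """Tidy a text field without damaging non-English content.

    Only English text is lowercased; anything else is trimmed,
    NFC-normalised, and otherwise left exactly as received.
    """
    text = unicodedata.normalize("NFC", text.strip())
    return text.lower() if lang == "en" else text

def ascii_fold(text):
    """The lossy cleanup the rule guards against: strip marks, then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.lower()

names = ["Adéola", "Adèolá", "Àdéolá"]

print({clean_field(n, "yo") for n in names} == set(names))  # True
print({ascii_fold(n) for n in names})                       # {'adeola'}
```

Three distinct written names survive the conservative cleaner and collapse into a single key under folding, which is how separate customer records end up merged.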
What native Yoruba AI looks like in practice
A caller dials a Lagos clinic. They open in Yoruba: "Ẹ káàárọ̀. Mo fẹ́ bẹ̀rẹ̀ àpèjọ pẹ̀lú dókítà." (Good morning. I would like to book an appointment with the doctor.)
A translation-layer system hears this, converts it to English ("Good morning. I would like to book an appointment with the doctor"), processes the intent in English, generates a response in English, and translates that response back to Yoruba. The tones in the spoken response are likely to be wrong, because machine-translated Yoruba text can omit or misplace tone marks, which leaves the speech synthesiser to guess the pitch. The result sounds like Yoruba text read aloud without understanding: grammatically plausible but tonally flat.
Maraba's system processes the Yoruba utterance directly. The intent is extracted from the Yoruba model — "appointment booking" — without translation. The response is generated in Yoruba natively, with correct tonal patterns in the TTS output. The caller hears Yoruba that sounds like Yoruba, not English translated by someone who learned the language from a textbook.
The size of the Yoruba market
Southwest Nigeria — Lagos, Ogun, Oyo, Ondo, Ekiti, Osun — has a combined economy that represents a substantial fraction of Nigeria's GDP. Lagos alone is Africa's largest city by most estimates. These are not edge-case customers for Nigerian business AI. They are the core market.
A business AI that cannot handle Yoruba fluently is not serving the Lagos market — it is serving only the fraction of the Lagos market that is comfortable in English. For most consumer-facing businesses — clinics, pharmacies, restaurants, salons, logistics companies — the English-only fraction is a minority of their callers. Native Yoruba support is not a nice-to-have. It is table stakes for competing in the southwest Nigerian market.
Built for Nigerian voices. No translation layer. Diacritics intact. Start free with 50 calls, or unlock Yoruba on the Starter plan for ₦20,000/month.