Pidgin English and AI: The Next Frontier for Nigerian Tech

Nigerian Pidgin English, or Naijá, is not a mistake. It is not the product of Nigerians failing to learn English properly. It is a fully developed creole language with consistent grammatical structures, a stable phonological system, and a rich expressive vocabulary that English alone cannot replicate. When a Lagos market woman asks "How much e be?" or a Port Harcourt driver says "E don reach?", they are speaking Naijá, not broken English, and any AI that transcribes these as malformed English sentences has already failed.

The linguistic status of Nigerian Pidgin is now well established: the BBC World Service launched a dedicated BBC Pidgin service in 2017, and the language has its own ISO 639-3 code (pcm, registered as Nigerian Pidgin). What has not kept pace is AI capability in Pidgin, a gap that leaves one of the largest language populations in any AI market underserved.

What makes Pidgin linguistically distinct from English

Pidgin is not English with Nigerian pronunciation. It has its own grammar, and several of its structural differences matter for AI:

Nigerian Pidgin grammatical patterns that diverge from English

Aspect marking instead of tense: "I don chop" (I have eaten, completed) vs. "I dey chop" (I am eating, ongoing). Dey marks progressive; don marks completive. Time is expressed by context, not verb form.
Serial verb constructions: "I go come back" (I will return) uses two verbs where English uses one. An English ASR model may treat this as a grammatical error; a Pidgin model recognises it as a standard serial verb chain.
Na as copula and focus marker: "Na doctor him be" (He is the doctor / It is the doctor who...) — na serves multiple grammatical functions that an English model has no representation for.
Reduplication for emphasis: "E small small" (It is very small / It is getting smaller gradually) — reduplication is a productive morphological process in Pidgin that doesn't exist in standard English.
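
As a rough illustration of the patterns listed above, the following Python sketch tags the two aspect markers and spots reduplication in a whitespace-tokenised sentence. The marker table and the function names are hypothetical and cover only the examples in this article. A real tagger would also have to separate aspectual "dey" from copular "dey", which this sketch does not attempt.

```python
# Illustrative only: a hypothetical gloss table limited to the markers named above.
ASPECT_MARKERS = {"dey": "progressive", "don": "completive"}


def tag_aspect(sentence: str) -> list[tuple[str, str]]:
    """Return (marker, aspect) pairs for markers that are followed by another word."""
    tokens = sentence.lower().split()
    tags = []
    # Pairing each token with its successor skips a marker in final position.
    for marker, _next_word in zip(tokens, tokens[1:]):
        if marker in ASPECT_MARKERS:
            tags.append((marker, ASPECT_MARKERS[marker]))
    return tags


def find_reduplication(sentence: str) -> list[str]:
    """Return words that appear twice in a row, such as 'small small'."""
    tokens = sentence.lower().split()
    return [first for first, second in zip(tokens, tokens[1:]) if first == second]


print(tag_aspect("I don chop"))             # [('don', 'completive')]
print(tag_aspect("I dey chop"))             # [('dey', 'progressive')]
print(find_reduplication("E small small"))  # ['small']
```

A lookup like this is far too naive for production use, but it shows why the markers cannot simply be discarded as noise: each one changes the meaning of the verb that follows it.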

An ASR system trained on English will transcribe "E dey come" as something like "A day come" or "He day come" — phonetically close but linguistically wrong, and useless for intent extraction. The word "dey" in Pidgin is a progressive aspect marker and copula — it carries grammatical information that English "day" does not. Getting the transcription wrong means the entire downstream processing fails.

The challenge of Pidgin variation

Nigerian Pidgin varies across regions in ways that are significant for NLP. Lagos Pidgin, Port Harcourt Pidgin, Warri Pidgin, and Benin City Pidgin share a common core but have distinct vocabulary items, phonological patterns, and code-switching habits. Warri Pidgin, spoken in Delta State and considered by many linguists the most developed variety, has additional features that differ from Lagos Pidgin. Sapele Pidgin, also spoken in Delta State, has influences from Urhobo and Itsekiri that make it distinct.

For an AI handling business calls, this means that a single Pidgin model trained only on Lagos data will have elevated error rates for callers from Warri or Calabar. The dialect variation challenge in Pidgin is analogous to the Hausa dialect problem — it requires multi-regional training data to achieve robustness across the full Pidgin-speaking population.

Where Pidgin AI stands in 2026

The honest answer: Pidgin AI is two to three years behind Hausa AI, which is itself five years behind English AI. This is a data problem above all else. Labelled conversational Pidgin audio data is scarce. Academic corpora of Pidgin text exist, but they are dominated by formal written sources — BBC Pidgin articles, literary texts — rather than the conversational register that business AI needs to handle.

Progress is happening. The Masakhane research community has produced NLP resources for Nigerian Pidgin, including text corpora and basic NER taggers. Some Whisper fine-tuning experiments on Pidgin have shown promising results for read speech. But conversational Pidgin ASR, which must handle fast-paced informal speech in noisy conditions across the full range of regional variation, remains an open research problem.

State of the art — Nigerian Pidgin AI in 2026

Text classification: Sentiment analysis and intent classification in written Pidgin — working, though with limitations on regional variants
Named entity recognition: Adequate for person names and locations; struggles with Pidgin-specific terminology
Machine translation (Pidgin ↔ English): Usable for simple sentences; breaks down on complex constructions and regional variants
ASR (conversational speech): Experimental; word error rates above 30% in realistic conditions — not yet ready for commercial deployment as a primary language
TTS (text-to-speech): Early stage; most TTS systems produce English with approximated Pidgin pronunciation rather than genuine Naijá voice
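
For readers unfamiliar with the word error rate figure quoted in the ASR entry, the sketch below computes it in plain Python using the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function name is ours, and the test sentence reuses the "E dey come" example from earlier in this article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Two of the three reference words are wrong, so the rate is 2/3.
print(round(word_error_rate("E dey come", "A day come"), 3))  # 0.667
```

Note that the metric treats every word equally. Mis-hearing "dey" costs the same as mis-hearing a filler word, even though the first error removes the aspect information the sentence depends on.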

How Maraba handles Pidgin callers today

Maraba's current approach: Pidgin callers are handled via the English language model with a Pidgin vocabulary extension. The system recognises common Pidgin words and constructions — dey, don, na, wetin, how far, abeg — and maps them correctly to intent. This is not full Pidgin NLP; it is a pragmatic accommodation that handles the most frequent commercial Pidgin constructions accurately.

In practice, most Pidgin speakers in commercial contexts mix a significant amount of standard English into their speech. A caller asking about delivery status might say: "My package — e don reach? I dey wait for am since morning." The English portions (my package, I wait, since morning) are handled natively. The Pidgin constructions (e don reach, dey, for am) are handled via the vocabulary extension. Intent extraction — delivery status query — succeeds.
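
The general idea of a vocabulary extension can be sketched as follows. This is not Maraba's implementation: the phrase table, the intent keywords, and both function names are hypothetical, chosen only to show how Pidgin constructions might be rewritten into English glosses before an ordinary keyword-based intent match.

```python
import re
from typing import Optional

# Hypothetical phrase table: Pidgin construction -> rough English gloss.
PIDGIN_PHRASES = {
    "i dey wait for am": "i am waiting for it",
    "e don reach": "has it arrived",
    "wetin": "what",
    "abeg": "please",
}

# Hypothetical intent keywords, matched against the normalised text.
INTENT_KEYWORDS = {
    "delivery_status": {"package", "arrived", "delivery"},
}


def normalise(utterance: str) -> str:
    """Lowercase, strip punctuation, and replace known Pidgin phrases."""
    text = re.sub(r"[^\w\s]", " ", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()
    # Longest phrases first so multi-word entries win over shorter ones.
    for phrase in sorted(PIDGIN_PHRASES, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", PIDGIN_PHRASES[phrase], text)
    return text


def extract_intent(utterance: str) -> Optional[str]:
    """Return the intent with the most keyword hits, or None if there are none."""
    words = set(normalise(utterance).split())
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


print(extract_intent("My package, e don reach? I dey wait for am since morning."))
# delivery_status
```

The weakness of this style of accommodation is visible in the table itself: any construction that is missing from it passes through unchanged, and the intent match then has nothing to work with.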

The cases where the current system fails: heavy Warri or Calabar regional Pidgin with dense non-English constructions, very fast conversational Pidgin with elision, and regional vocabulary that is not in the extension vocabulary. These callers are routed to escalation more frequently than the system's average — a known gap that will narrow as Pidgin model work matures.

What is coming

Two developments are likely to improve Pidgin AI most over the next two years:

Data collection at scale. Several Nigerian-focused AI labs are now running structured Pidgin data collection programmes — paying speakers from across the Pidgin-speaking belt to record conversational speech in varied conditions. Each hour of labelled Pidgin audio collected today is training data for the models that will be deployed in 2027.

Transfer learning from related creoles. Nigerian Pidgin shares structural features with other West African English creoles — Cameroonian Pidgin English (Kamtok), Ghanaian Pidgin, Krio (Sierra Leone). Multilingual training across these related creoles can bootstrap Pidgin ASR before sufficient Nigerian-specific data exists.
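
One practical question in multilingual training is how often to sample each language when one corpus is much smaller than the others. A common approach in multilingual modelling, shown here as an assumption rather than something the labs mentioned above are confirmed to use, is temperature-scaled sampling: each language's share of the data is raised to the power 1/T and the results are renormalised. The function name and the hour counts below are made-up placeholders, not measurements.

```python
def sampling_weights(hours: dict[str, float], temperature: float = 2.0) -> dict[str, float]:
    """Temperature-scaled sampling probabilities for a multilingual training mix.

    temperature = 1 samples in proportion to corpus size; higher values
    flatten the distribution and upweight the smaller corpora.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(hours.values())
    scaled = {
        lang: (amount / total) ** (1.0 / temperature)
        for lang, amount in hours.items()
    }
    norm = sum(scaled.values())
    return {lang: value / norm for lang, value in scaled.items()}


# Placeholder hour counts for illustration only.
corpus_hours = {"nigerian_pidgin": 10.0, "related_creoles": 90.0}
print(sampling_weights(corpus_hours, temperature=1.0))
print(sampling_weights(corpus_hours, temperature=2.0))
```

With these placeholder figures, raising the temperature from 1 to 2 lifts the Nigerian Pidgin share of sampled batches from 10% to about 25%, so the small corpus is seen more often without being duplicated outright.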

The Pidgin-speaking Nigerian market is enormous. The commercial case for investing in Pidgin AI is strong. The gap between the size of the market and the state of the AI technology is one of the largest mismatches in language AI globally — which means the opportunity for whoever closes it first is correspondingly large.

Serving Nigerian callers — in all the languages they use

Maraba handles Yoruba, Hausa, Igbo, English, and common Pidgin constructions today. As Pidgin AI matures, we will be the first to deploy it at scale. Beta spots are limited.

Request beta →