Igbo Speech Recognition API: Build Igbo Voice Apps

The state of Igbo speech technology

Igbo (also written Ibo) is a major West African language: one of Nigeria's three major indigenous languages alongside Hausa and Yoruba (English is the country's official language), and the native language of the Igbo people of south-eastern Nigeria. The Igbo diaspora is large and commercially significant: communities in the UK, the US, and across West Africa maintain strong Igbo-language commerce and culture.

Despite this, the state of Igbo speech technology as of 2026 is almost non-existent in practical terms:

Mozilla Common Voice has a small Igbo dataset (under 5 hours), predominantly recorded in controlled conditions that do not reflect natural conversational speech
The Masakhane community works with strong Igbo text resources (JW300, for example, includes a large Igbo-English parallel corpus), but text is not speech
No major commercial speech API (Google, AWS, Azure, OpenAI Whisper base) offers production-quality Igbo recognition
OpenAI Whisper base achieves approximately 45–55% word error rate on standard Igbo speech — roughly every other word wrong, which is not usable

Maraba fine-tuned a Whisper model specifically on Nigerian Igbo speech, combining available public corpora with proprietary recordings from Enugu, Onitsha, Owerri, and Port Harcourt speakers. This covers the main dialect regions and brings WER down to approximately 22% on in-domain conversational Igbo — still room to improve, but functional for business use cases where context narrows the vocabulary range.
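For readers new to the metric, word error rate is the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the evaluation script behind the figures above; the function name and the sample strings are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: two of five words lose their dotted vowels.
reference = "ọ dị mma ụlọ ahụ"
hypothesis = "o dị mma ulo ahụ"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

Because the comparison is exact, a word that loses a dotted vowel counts as a full substitution, which is one reason orthography matters when scoring Igbo transcripts.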

The Igbo diacritic system

Before calling the API, understand Igbo's orthographic system. The standard Igbo orthography uses:

ị / Ị: i with a dot below, a near-close front unrounded vowel [ɪ], distinct from plain i [i]. Example: ịnị (now), ịgba (drum)
ụ / Ụ: u with a dot below, a near-close back rounded vowel [ʊ], distinct from plain u [u]. Example: ụlọ (house), ụmụ (children)
ọ / Ọ: o with a dot below, an open-mid back rounded vowel [ɔ], distinct from plain o [o]. Example: ọ dị mma (it is good), ọchịchọ (desire)
ṅ / Ṅ: n with a dot above, the velar nasal [ŋ]. Less common but phonemically important: ịṅụ (to drink), aṅụ (bee)

Igbo also uses tone marks (acute accent for high, grave for low, unmarked for level), though tone is less consistently marked in modern standard Igbo writing than in Yoruba. The STT API outputs standard Igbo orthography including dotted vowels — never strip these.
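To see why stripping is lossy, the sketch below applies a common "ASCII-folding" recipe (decompose, then drop combining marks) to a few words from this section. It uses only Python's standard unicodedata module; the helper names are illustrative and not part of the Maraba API.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """What NOT to do: decompose, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def code_points(text: str) -> list:
    """List the code points of the NFC (precomposed) form of the text."""
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed]


# After folding, ụlọ is indistinguishable from a word spelled with plain u and o.
for word in ["ụlọ", "ọchịchọ", "gị"]:
    print(f"{word} -> {strip_marks(word)}")

print(code_points("ọ"))  # the dotted vowel has its own code point
```

Search indexes, slug generators, and SMS gateways sometimes apply this kind of folding silently, so it is worth checking each hop between the API response and your storage layer.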

Prerequisites

Maraba developer account — sign up at maraba.ai
API key from Developer → API Keys
Python 3.9+ with requests installed
Audio file in WAV, MP3, OGG, or FLAC format; 16kHz mono recommended
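If you want to confirm that a WAV file matches the recommended 16kHz mono format before uploading, Python's standard wave module can read the header. This is a minimal sketch for uncompressed PCM WAV files only; check_wav and write_silence are hypothetical helper names, and the demo writes two short silent files to a temporary directory so that it runs without any real audio.

```python
import os
import tempfile
import wave


def check_wav(path: str) -> list:
    """Return a list of warnings; an empty list means 16 kHz mono."""
    warnings = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            warnings.append(f"sample rate is {wav.getframerate()} Hz, not 16000 Hz")
        if wav.getnchannels() != 1:
            warnings.append(f"{wav.getnchannels()} channels, not mono")
    return warnings


def write_silence(path: str, rate: int, channels: int, seconds: float = 0.1) -> None:
    """Write a short silent 16-bit PCM file (used only for the demo)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (int(rate * seconds) * channels))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, 16000, 1)
    write_silence(bad, 44100, 2)
    print(check_wav(good))
    print(check_wav(bad))
```

Files that fail the check can still be submitted, since 16kHz mono is a recommendation rather than a requirement; the check is mainly useful for spotting stereo call recordings before they reach production.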

Basic Igbo transcription: Python

The endpoint and request structure are identical to the Hausa STT API — only the language code changes to ig.

Python

import requests

API_KEY = "your-api-key-here"
AUDIO_FILE = "igbo_sample.wav"

with open(AUDIO_FILE, "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ig"},
        files={"audio": (AUDIO_FILE, f, "audio/wav")},
    )

response.raise_for_status()
result = response.json()

print(result["transcript"])
print(f"Confidence: {result['confidence']:.2f}")
print(f"Duration: {result['duration_seconds']:.1f}s")

For a recording of the sentence "Ọ dị mma, aga m-enye gị oge." (That is fine, I will give you time / I will attend to you now), the response looks like:

JSON Response

{
  "transcript": "Ọ dị mma, aga m-enye gị oge.",
  "language_detected": "ig",
  "confidence": 0.87,
  "duration_seconds": 2.8,
  "words": [
    {"word": "Ọ", "start": 0.0, "end": 0.2, "confidence": 0.91},
    {"word": "dị", "start": 0.2, "end": 0.45, "confidence": 0.93},
    {"word": "mma,", "start": 0.45, "end": 0.8, "confidence": 0.95},
    {"word": "aga", "start": 0.85, "end": 1.1, "confidence": 0.88},
    {"word": "m-enye", "start": 1.1, "end": 1.5, "confidence": 0.84},
    {"word": "gị", "start": 1.5, "end": 1.75, "confidence": 0.89},
    {"word": "oge.", "start": 1.75, "end": 2.2, "confidence": 0.92}
  ],
  "cost_ngn": 0.23
}

Note that the transcript preserves the dotted ọ in "Ọ" and the dotted ị in "gị" — these are the correct Igbo characters, not plain ASCII o and i.
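The cost_ngn value in the sample response is consistent with the ₦5 per minute price quoted at the end of this guide. The sketch below reproduces that arithmetic for budgeting purposes. It assumes billing is linear in audio duration and rounded to two decimal places, which is inferred from the sample and not a documented billing rule; estimate_cost_ngn is a hypothetical helper.

```python
RATE_NGN_PER_MINUTE = 5.0  # price quoted in this guide


def estimate_cost_ngn(duration_seconds: float) -> float:
    """Estimate the charge for one request.

    Assumption: cost scales linearly with duration and is rounded
    to two decimal places. Treat the result as an estimate only.
    """
    return round(duration_seconds / 60 * RATE_NGN_PER_MINUTE, 2)


print(estimate_cost_ngn(2.8))   # the 2.8 s sample above
print(estimate_cost_ngn(600))   # ten minutes of audio
```

For reconciliation, rely on the cost_ngn field returned with each response, not on a local estimate.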

JavaScript example

JavaScript (Node.js)

const fs = require("fs");
const FormData = require("form-data");
// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require().
const fetch = require("node-fetch");

const API_KEY = "your-api-key-here";

async function transcribeIgbo(filePath) {
  const form = new FormData();
  form.append("language", "ig");
  form.append("audio", fs.createReadStream(filePath), {
    filename: filePath,
    contentType: "audio/wav",
  });

  const response = await fetch("https://maraba.ai/api/v1/transcribe/", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      ...form.getHeaders(),
    },
    body: form,
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(`STT error ${response.status}: ${err.error}`);
  }

  return response.json();
}

transcribeIgbo("igbo_sample.wav")
  .then((r) => console.log("Transcript:", r.transcript))
  .catch(console.error);

Bilingual Igbo-English mode

Igbo business speech in major cities — Port Harcourt, Onitsha, Enugu — commonly mixes Igbo with English. Use language=ig-en for this:

Python — bilingual mode

import requests

API_KEY = "your-api-key-here"

with open("mixed_igbo_english.wav", "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ig-en"},  # bilingual Igbo-English mode
        files={"audio": ("audio.wav", f, "audio/wav")},
    )

response.raise_for_status()  # fail loudly on HTTP errors
result = response.json()
print(result["transcript"])
# Example output:
# "A fọrọ m ịbịa, I want to check if my order is ready."

Handling Igbo dialect variation

This is the most important practical consideration for Igbo STT in production. Unlike Hausa and Yoruba, which have reasonably standardised orthography and a dominant dialect, Igbo has significant dialect variation across its regions — Owerri Igbo, Onitsha Igbo, and Enugu Igbo differ not just in vocabulary but in phonology. Words that sound similar across dialects may be transcribed with the dominant-dialect spelling.

Practical guidance:

Igbo dialect handling recommendations

If your application serves callers predominantly from one region, test specifically with speakers from that region. Owerri Igbo and Onitsha Igbo, for example, are different accents; test data should match your deployment context.
Use a post-processing vocabulary list of domain-specific Igbo terms your callers are likely to use (product names, location names, business-specific vocabulary) to catch common substitutions.
For applications where exact transcript accuracy is critical (legal, medical), build in a human review step for low-confidence transcripts — flag any response where confidence < 0.75.
Contact Maraba support if you have a significant volume of audio from a specific Igbo dialect region. Custom fine-tuning for dialect-specific use cases is available on enterprise plans.
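Two of these recommendations, the vocabulary list and the review threshold, can be sketched in a few lines. The code below is a starting point under stated assumptions: the entry in CORRECTIONS is a made-up example of a substitution you might find in your own test audio, and apply_vocabulary and needs_review are hypothetical helper names, not Maraba API features.

```python
import re

REVIEW_THRESHOLD = 0.75  # from the recommendation above

# Hypothetical example: map a spelling the recogniser produces to the
# spelling your downstream systems expect. Build this from your own data.
CORRECTIONS = {"Owere": "Owerri"}


def apply_vocabulary(transcript: str, corrections: dict) -> str:
    """Replace whole-word matches only, leaving longer words untouched."""
    for wrong, right in corrections.items():
        pattern = rf"(?<!\w){re.escape(wrong)}(?!\w)"
        transcript = re.sub(pattern, right, transcript)
    return transcript


def needs_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag a transcript for human review when overall confidence is low."""
    return result["confidence"] < threshold


result = {"transcript": "My shop is in Owere, ọ dị mma.", "confidence": 0.71}
cleaned = apply_vocabulary(result["transcript"], CORRECTIONS)
print(needs_review(result))
```

The lookaround pattern is used in place of a plain str.replace so that a correction for a short word does not rewrite part of a longer one.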

Building an Igbo voice application: end-to-end example

Here is a complete example of an Igbo voice menu system — a caller speaks a choice in Igbo, the system transcribes it, classifies the intent, and routes accordingly:

Python — intent routing

import requests

API_KEY = "your-api-key-here"

# Igbo intents we expect callers to express
INTENTS = {
    "hours": ["oge", "oge mmepụta", "mepee", "oge ọrụ"],
    "order": ["ọchịchọ", "iwu ihe", "order", "azụtara"],
    "location": ["ebe", "adrese", "ụlọ", "ebe a nọ"],
    "human": ["mmadụ", "onye ọrụ", "ọ dị mma ikọ n'ụzọ ọzọ"],
}

def classify_intent(transcript: str) -> str:
    """Very simple keyword-based intent matching for Igbo."""
    # Case-sensitive substring match on the transcript exactly as returned.
    # Short keywords such as "oge" or "ebe" also match inside longer words,
    # so treat this as a first pass, not a production classifier.
    for intent, keywords in INTENTS.items():
        for keyword in keywords:
            if keyword in transcript:
                return intent
    return "unknown"

def handle_igbo_voice_input(audio_file: str) -> dict:
    with open(audio_file, "rb") as f:
        response = requests.post(
            "https://maraba.ai/api/v1/transcribe/",
            headers={"X-API-Key": API_KEY},
            data={"language": "ig-en"},
            files={"audio": (audio_file, f, "audio/wav")},
        )
    response.raise_for_status()
    result = response.json()

    transcript = result["transcript"]
    intent = classify_intent(transcript)

    return {
        "transcript": transcript,
        "intent": intent,
        "confidence": result["confidence"],
        "route": {
            "hours": "play_hours_message",
            "order": "open_order_flow",
            "location": "play_location_message",
            "human": "transfer_to_agent",
            "unknown": "play_fallback_message",
        }.get(intent, "play_fallback_message"),
    }

result = handle_igbo_voice_input("caller_input.wav")
print(f"Heard: {result['transcript']}")
print(f"Intent: {result['intent']}")
print(f"Route: {result['route']}")

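Important: in the classify_intent function, we do not call .lower() on the Igbo transcript, so matching is case-sensitive. Lowercasing itself is safe: Python's str.lower() is Unicode-aware and maps Ị, Ụ, and Ọ to ị, ụ, and ọ. The real hazard is Unicode normalisation. A dotted vowel such as ọ can be encoded either as a single code point (U+1ECD) or as a plain o followed by a combining dot below (U+0323), and the two forms do not compare equal. If you add case-insensitive matching, normalise both the transcript and your keywords to the same form (NFC, for example) before comparing.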
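A matcher that is robust to both letter case and Unicode encoding can be sketched with the standard unicodedata module. This is one possible approach, not the method used inside the API; normalise and contains_keyword are hypothetical helper names.

```python
import unicodedata


def normalise(text: str) -> str:
    """Compose to NFC, then casefold; both steps are Unicode-aware."""
    return unicodedata.normalize("NFC", text).casefold()


def contains_keyword(transcript: str, keyword: str) -> bool:
    """Substring match that ignores case and composed/decomposed encoding."""
    return normalise(keyword) in normalise(transcript)


composed = "\u1ecd"      # dotted o as a single code point
decomposed = "o\u0323"   # plain o followed by a combining dot below

print(composed == decomposed)                        # raw strings differ
print(normalise(composed) == normalise(decomposed))  # equal once normalised
print("\u1ecc".lower() == composed)                  # lowercasing keeps the dot
print(contains_keyword("Oge o\u0323ru\u0323?", "oge ọrụ"))
```

Normalising once when you load the keyword list, and once per transcript, keeps the cost negligible even for long calls.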

Common failure modes

Failure: Dotted vowels appearing as plain ASCII in output

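This points to a character encoding or text-folding issue somewhere in your stack, not in the API output. The API always returns UTF-8. Check that your database, ORM, and API client are all configured for UTF-8, and that no search, slug, or SMS layer is folding text to ASCII. In Python 3, str is Unicode, but open() uses a platform-dependent default encoding and json.dumps escapes non-ASCII characters unless you pass ensure_ascii=False, so set both explicitly when you write transcripts out. In Node.js, ensure you are reading the response as UTF-8 text, not as a Buffer that you then convert improperly.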
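A round trip through a file is a quick way to confirm that one hop in your stack preserves the dotted vowels. The sketch below uses only the standard library and the sample transcript from earlier in this guide; the file name is arbitrary.

```python
import json
import os
import tempfile

result = {"transcript": "Ọ dị mma, aga m-enye gị oge."}

# Default json.dumps escapes non-ASCII characters as \uXXXX sequences.
# That is still valid JSON, but it is hard to read in logs.
escaped = json.dumps(result)

# ensure_ascii=False keeps the Igbo characters as they are.
readable = json.dumps(result, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "igbo_transcript.json")
    # Always name the encoding; the platform default is not guaranteed to be UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == result)
```

If the comparison fails at any hop (file, database column, message queue), that hop is where the encoding is being lost.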

Failure: Low confidence on Igbo numerals

Igbo number words such as atọ (three), asatọ (eight), and iri (ten) are often partially swallowed in conversational speech. For order-taking or booking applications where callers give numbers, consider accepting English numerals ("one two three") as an alternative input mode, or use DTMF for critical numeric input.
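If you accept spoken English digits as a fallback, the transcript still needs converting to a digit string. The sketch below is a minimal illustration that handles single digits only (not "twenty" or "double five"); parse_spoken_digits is a hypothetical helper, and it assumes the bilingual mode returns English digit words spelled out.

```python
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def parse_spoken_digits(transcript: str):
    """Return the digit string, or None if any token is not a digit word."""
    cleaned = transcript.casefold().replace(",", " ").replace(".", " ")
    digits = []
    for token in cleaned.split():
        if token not in DIGIT_WORDS:
            return None  # mixed input: fall back to DTMF or re-prompt
        digits.append(DIGIT_WORDS[token])
    return "".join(digits) or None


print(parse_spoken_digits("One two three."))
print(parse_spoken_digits("oge"))
```

Returning None for anything that is not purely digits keeps the failure explicit, so the call flow can re-prompt or switch to keypad input without guessing.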

Failure: Mixing up ọ and o in output

Callers speaking at high speed or with heavy code-switching may produce audio where the ọ/o distinction is subtle. If your downstream application needs to distinguish these, include a confidence threshold check on the word level — if a key word like a name or location has confidence below 0.80, flag for review.
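The word-level check can be written against the words array shown in the sample response earlier. This is a sketch assuming every response carries that array with word, start, end, and confidence keys; low_confidence_words is a hypothetical helper name.

```python
WORD_THRESHOLD = 0.80  # word-level threshold suggested above


def low_confidence_words(result: dict, threshold: float = WORD_THRESHOLD) -> list:
    """Return (word, start, end) for each word below the threshold."""
    return [
        (w["word"], w["start"], w["end"])
        for w in result.get("words", [])
        if w["confidence"] < threshold
    ]


# Word entries copied from the sample response in this guide.
sample = {
    "words": [
        {"word": "Ọ", "start": 0.0, "end": 0.2, "confidence": 0.91},
        {"word": "dị", "start": 0.2, "end": 0.45, "confidence": 0.93},
        {"word": "mma,", "start": 0.45, "end": 0.8, "confidence": 0.95},
        {"word": "aga", "start": 0.85, "end": 1.1, "confidence": 0.88},
        {"word": "m-enye", "start": 1.1, "end": 1.5, "confidence": 0.84},
        {"word": "gị", "start": 1.5, "end": 1.75, "confidence": 0.89},
        {"word": "oge.", "start": 1.75, "end": 2.2, "confidence": 0.92},
    ]
}

flagged_default = low_confidence_words(sample)
flagged_strict = low_confidence_words(sample, threshold=0.85)
print(len(flagged_default), len(flagged_strict))
```

The start and end times let a reviewer jump straight to the uncertain stretch of audio without replaying the whole call.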

What you can build with Igbo STT

Igbo STT opens up voice application development for south-eastern Nigeria — a market that has been essentially inaccessible to voice technology until now:

Onitsha market voice ordering. Onitsha has one of the largest markets in West Africa. Traders could take voice orders from customers in Igbo.
Port Harcourt logistics. Oil and gas supply chain companies in Port Harcourt have Igbo-speaking staff and clients. Voice-to-CRM in Igbo reduces data entry errors.
Healthcare in Igbo communities. Community clinics in Anambra, Imo, and Enugu serve patients who are more comfortable speaking Igbo than English.
Igbo language learning tools. STT paired with TTS enables pronunciation feedback for Igbo learners — a niche but growing market for diaspora reconnection.

For a complete multilingual Nigerian voice pipeline, pair Igbo STT with Hausa STT and Yoruba TTS. For detecting which language a caller is using before routing, see the language detection API.

Build the first Igbo voice app

The Maraba Igbo STT API is available now. Sign up free, grab your API key, and start transcribing. ₦5 per minute, no minimum spend.
