Developer Tutorial

Igbo speech recognition API: build Igbo voice apps

Igbo is spoken by over 45 million people, mostly in south-eastern Nigeria (Abia, Anambra, Ebonyi, Enugu, and Imo states, plus parts of Rivers and Delta), yet practical Igbo speech recognition has been essentially unavailable to developers. This guide covers the Maraba Igbo STT API: how it works, how to call it, how to handle Igbo's distinctive diacritics (ị, ụ, ọ, ṅ), and how to deal with dialect variation in the real world.

The state of Igbo speech technology

Igbo (also written Ibo) is one of Nigeria's three major indigenous languages alongside Hausa and Yoruba (English is the country's official language), and the native language of the Igbo people of south-eastern Nigeria. The Igbo diaspora is large and commercially significant: communities in the UK, US, and across West Africa maintain strong Igbo-language commerce and culture.

Despite this, practical Igbo speech technology is almost non-existent as of 2026.

Maraba fine-tuned a Whisper model specifically on Nigerian Igbo speech, combining available public corpora with proprietary recordings from Enugu, Onitsha, Owerri, and Port Harcourt speakers. This covers the main dialect regions and brings word error rate (WER) down to approximately 22% on in-domain conversational Igbo. There is still room to improve, but this is functional for business use cases where context narrows the vocabulary range.
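
For readers new to the metric: word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below shows the standard calculation so you can measure it on your own test recordings. The function name word_error_rate is illustrative, and this is not Maraba's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Note that this comparison is exact: a transcript that writes o where the reference has ọ counts as a substitution, which is the behaviour you want when diacritics carry meaning.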

The Igbo diacritic system

Before calling the API, understand Igbo's orthographic system. Alongside the basic Latin letters, the standard Igbo orthography uses the dotted vowels ị, ụ, and ọ and the dotted consonant ṅ.

Igbo also uses tone marks (acute accent for high, grave for low, unmarked for level), though tone is less consistently marked in modern standard Igbo writing than in Yoruba. The STT API outputs standard Igbo orthography including dotted vowels — never strip these.
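
One practical consequence of dotted vowels: Unicode can encode ị either as a single precomposed code point or as a plain i followed by a combining dot below, and the two forms look identical but do not compare equal. This post does not say which form the API returns, so the sketch below assumes you normalise every string to NFC before storing or comparing it. The helper name to_nfc is illustrative.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in NFC, where each dotted vowel is one precomposed code point."""
    return unicodedata.normalize("NFC", text)


precomposed = "g\u1ecb"   # "gị" using U+1ECB (i with dot below)
decomposed = "gi\u0323"   # "gị" using i + U+0323 (combining dot below)

print(precomposed == decomposed)                  # False
print(to_nfc(precomposed) == to_nfc(decomposed))  # True
```

Normalising changes only the encoding of a character, never the character itself, so it is safe to apply to every transcript.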

Prerequisites

Basic Igbo transcription: Python

The endpoint and request structure are identical to the Hausa STT API — only the language code changes to ig.

Python
import requests

API_KEY = "your-api-key-here"
AUDIO_FILE = "igbo_sample.wav"

with open(AUDIO_FILE, "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ig"},
        files={"audio": (AUDIO_FILE, f, "audio/wav")},
    )

response.raise_for_status()
result = response.json()

print(result["transcript"])
print(f"Confidence: {result['confidence']:.2f}")
print(f"Duration: {result['duration_seconds']:.1f}s")

For a recording of the sentence "Ọ dị mma, aga m-enye gị oge." (That is fine, I will give you time / I will attend to you now), the response looks like:

JSON Response
{
  "transcript": "Ọ dị mma, aga m-enye gị oge.",
  "language_detected": "ig",
  "confidence": 0.87,
  "duration_seconds": 2.8,
  "words": [
    {"word": "Ọ", "start": 0.0, "end": 0.2, "confidence": 0.91},
    {"word": "dị", "start": 0.2, "end": 0.45, "confidence": 0.93},
    {"word": "mma,", "start": 0.45, "end": 0.8, "confidence": 0.95},
    {"word": "aga", "start": 0.85, "end": 1.1, "confidence": 0.88},
    {"word": "m-enye", "start": 1.1, "end": 1.5, "confidence": 0.84},
    {"word": "gị", "start": 1.5, "end": 1.75, "confidence": 0.89},
    {"word": "oge.", "start": 1.75, "end": 2.2, "confidence": 0.92}
  ],
  "cost_ngn": 0.23
}

Note that the transcript preserves the dotted ọ in "Ọ" and the dotted ị in "gị" — these are the correct Igbo characters, not plain ASCII o and i.
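
Notice also that the entries in words carry their punctuation ("mma," and "oge."). If you match individual words downstream, you will probably want to trim that punctuation without touching the diacritics or the hyphen inside "m-enye". A minimal sketch, assuming the response shape shown above; the helper names are illustrative:

```python
import unicodedata


def strip_edge_punctuation(token: str) -> str:
    """Remove punctuation from both ends of a token, leaving letters intact."""
    start, end = 0, len(token)
    # Unicode general categories starting with "P" are punctuation.
    # Dotted vowels are letters ("L*") and combining dots are marks ("M*"),
    # so neither is ever trimmed.
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def clean_words(words: list[dict]) -> list[str]:
    """Return the word strings from a response's "words" list, punctuation trimmed."""
    cleaned = (strip_edge_punctuation(entry["word"]) for entry in words)
    return [token for token in cleaned if token]


sample = [{"word": "mma,"}, {"word": "m-enye"}, {"word": "oge."}]
print(clean_words(sample))  # ['mma', 'm-enye', 'oge']
```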

JavaScript example

JavaScript (Node.js)
const fs = require("fs");
const FormData = require("form-data");
// Requires node-fetch v2: v3 is ESM-only and cannot be loaded with require().
const fetch = require("node-fetch");

const API_KEY = "your-api-key-here";

async function transcribeIgbo(filePath) {
  const form = new FormData();
  form.append("language", "ig");
  form.append("audio", fs.createReadStream(filePath), {
    filename: filePath,
    contentType: "audio/wav",
  });

  const response = await fetch("https://maraba.ai/api/v1/transcribe/", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      ...form.getHeaders(),
    },
    body: form,
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(`STT error ${response.status}: ${err.error}`);
  }

  return response.json();
}

transcribeIgbo("igbo_sample.wav")
  .then((r) => console.log("Transcript:", r.transcript))
  .catch(console.error);

Bilingual Igbo-English mode

Igbo business speech in major cities — Port Harcourt, Onitsha, Enugu — commonly mixes Igbo with English. Use language=ig-en for this:

Python — bilingual mode
with open("mixed_igbo_english.wav", "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ig-en"},
        files={"audio": ("audio.wav", f, "audio/wav")},
    )

response.raise_for_status()  # fail loudly on HTTP errors before parsing
result = response.json()
print(result["transcript"])
# Example output:
# "A fọrọ m ịbịa, I want to check if my order is ready."

Handling Igbo dialect variation

This is the most important practical consideration for Igbo STT in production. Unlike Hausa and Yoruba, which have reasonably standardised orthography and a dominant dialect, Igbo has significant dialect variation across its regions — Owerri Igbo, Onitsha Igbo, and Enugu Igbo differ not just in vocabulary but in phonology. Words that sound similar across dialects may be transcribed with the dominant-dialect spelling.

Practical guidance:

Igbo dialect handling recommendations
  • If your application serves callers predominantly from one region, test specifically with speakers from that region. Owerri Igbo and Onitsha Igbo are different accents; test data should match your deployment context.
  • Use a post-processing vocabulary list of domain-specific Igbo terms your callers are likely to use (product names, location names, business-specific vocabulary) to catch common substitutions.
  • For applications where exact transcript accuracy is critical (legal, medical), build in a human review step for low-confidence transcripts — flag any response where confidence < 0.75.
  • Contact Maraba support if you have a significant volume of audio from a specific Igbo dialect region. Custom fine-tuning for dialect-specific use cases is available on enterprise plans.
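
Two of these recommendations, the vocabulary list and the review flag, can be sketched in a few lines. This is one possible approach and not part of the Maraba API: it uses fuzzy matching from Python's standard difflib module, and DOMAIN_TERMS, the 0.8 similarity cutoff, and both function names are assumptions to tune for your own data.

```python
import difflib

REVIEW_THRESHOLD = 0.75  # from the recommendation above

# Hypothetical domain vocabulary: replace with your own place and product names.
DOMAIN_TERMS = ["Onitsha", "Owerri", "Enugu", "Nnewi"]


def correct_with_vocabulary(transcript: str, terms: list[str],
                            cutoff: float = 0.8) -> str:
    """Replace any word that closely matches a known domain term with that term."""
    corrected = []
    for word in transcript.split():
        # get_close_matches returns at most n candidates with similarity >= cutoff.
        match = difflib.get_close_matches(word, terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


def needs_human_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """True when the overall transcript confidence is below the review threshold."""
    return result["confidence"] < threshold


print(correct_with_vocabulary("Enugwu", DOMAIN_TERMS))  # Enugu
```

Keep the cutoff high: a low value will start rewriting ordinary words that happen to resemble a domain term. Punctuation attached to a replaced word is dropped, so trim or reattach it separately if you need it.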

Building an Igbo voice application: end-to-end example

Here is a complete example of an Igbo voice menu system — a caller speaks a choice in Igbo, the system transcribes it, classifies the intent, and routes accordingly:

Python — intent routing
import unicodedata

import requests

API_KEY = "your-api-key-here"

# Igbo intents we expect callers to express
INTENTS = {
    "hours": ["oge", "oge mmepụta", "mepee", "oge ọrụ"],
    "order": ["ọchịchọ", "iwu ihe", "order", "azụtara"],
    "location": ["ebe", "adrese", "ụlọ", "ebe a nọ"],
    "human": ["mmadụ", "onye ọrụ", "ọ dị mma ikọ n'ụzọ ọzọ"],
}

def normalise(text: str) -> str:
    """NFC-normalise and casefold so dotted vowels compare consistently."""
    # casefold() is Unicode-aware in Python 3: Ọ becomes ọ, never plain o.
    return unicodedata.normalize("NFC", text).casefold()

def classify_intent(transcript: str) -> str:
    """Very simple keyword-based intent matching for Igbo."""
    text = normalise(transcript)
    for intent, keywords in INTENTS.items():
        for keyword in keywords:
            if normalise(keyword) in text:
                return intent
    return "unknown"

def handle_igbo_voice_input(audio_file: str) -> dict:
    with open(audio_file, "rb") as f:
        response = requests.post(
            "https://maraba.ai/api/v1/transcribe/",
            headers={"X-API-Key": API_KEY},
            data={"language": "ig-en"},
            files={"audio": (audio_file, f, "audio/wav")},
        )
    response.raise_for_status()
    result = response.json()

    transcript = result["transcript"]
    intent = classify_intent(transcript)

    return {
        "transcript": transcript,
        "intent": intent,
        "confidence": result["confidence"],
        "route": {
            "hours": "play_hours_message",
            "order": "open_order_flow",
            "location": "play_location_message",
            "human": "transfer_to_agent",
            "unknown": "play_fallback_message",
        }.get(intent, "play_fallback_message"),
    }

result = handle_igbo_voice_input("caller_input.wav")
print(f"Heard: {result['transcript']}")
print(f"Intent: {result['intent']}")
print(f"Route: {result['route']}")

Important: classify_intent normalises both the transcript and the keywords to Unicode NFC and then casefolds them, so a sentence-initial "Oge" still matches the keyword "oge". In Python 3, str.lower() and str.casefold() are Unicode-aware: Ọ becomes ọ and the dot below is preserved, so case folding does not corrupt Igbo characters. What does break matching is stripping diacritics (ASCII folding) or comparing strings in different normalisation forms. The letter ọ can be a single precomposed code point (U+1ECD) or o followed by a combining dot below (U+0323), and the two do not compare equal until both are normalised.

Common failure modes

Failure: Dotted vowels appearing as plain ASCII in output

This points to a problem somewhere in your stack, not in the API output. The API always returns UTF-8. Check that your database, ORM, and API client are all configured for UTF-8. Also check for any text-cleaning step that strips diacritics (ASCII folding, slug generation), since that is what typically turns ọ into o. In Python 3, str is Unicode by default, so no action is needed for in-memory strings. In Node.js, ensure you are reading the response as UTF-8 text, not as a Buffer that you then convert improperly.
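
On the Python side, the places where dotted vowels usually get mangled are file writes that rely on the platform's default encoding and JSON dumps that escape non-ASCII characters. A minimal sketch of storing and reloading a response safely; the function names and file layout are illustrative, not part of the API:

```python
import json
from pathlib import Path


def save_transcript(result: dict, path: str) -> None:
    """Write an API response as UTF-8 JSON, keeping dotted vowels readable."""
    # ensure_ascii=False writes ọ as ọ instead of the escape sequence \u1ecd.
    text = json.dumps(result, ensure_ascii=False, indent=2)
    # Always pass encoding explicitly: the default depends on the platform.
    Path(path).write_text(text, encoding="utf-8")


def load_transcript(path: str) -> dict:
    """Read a response saved by save_transcript."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

The escaped form produced by the default ensure_ascii=True is still lossless, but it is hard to read in logs and easy to mistake for corruption.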

Failure: Low confidence on Igbo numerals

Igbo number words are distinctive: atọ (three), asatọ (eight), iri (ten). In conversational speech, however, they are often partially swallowed. For order-taking or booking applications where callers give numbers, consider accepting English numerals ("one two three") as an alternative input mode, or use DTMF for critical numeric input.
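
A sketch of the dual-input idea: accept either an Igbo or an English number word and map both to the same integer. The Igbo spellings for one to ten below follow common standard usage but are an assumption here. Verify them against how your callers actually speak, and extend the table for compound numbers. The function name parse_number_word is illustrative.

```python
import unicodedata
from typing import Optional


def _key(word: str) -> str:
    """NFC-normalise, casefold, and trim common edge punctuation."""
    return unicodedata.normalize("NFC", word).casefold().strip(".,!?")


# Igbo number words 1-10 (assumed spellings; check against your dialect).
IGBO_NUMBERS = {_key(word): value for word, value in [
    ("otu", 1), ("abụọ", 2), ("atọ", 3), ("anọ", 4), ("ise", 5),
    ("isii", 6), ("asaa", 7), ("asatọ", 8), ("itoolu", 9), ("iri", 10),
]}

# English fallback, zero through ten.
ENGLISH_NUMBERS = {word: value for value, word in enumerate(
    ["zero", "one", "two", "three", "four", "five",
     "six", "seven", "eight", "nine", "ten"])}


def parse_number_word(word: str) -> Optional[int]:
    """Return the integer for an Igbo or English number word, else None."""
    key = _key(word)
    if key in IGBO_NUMBERS:
        return IGBO_NUMBERS[key]
    return ENGLISH_NUMBERS.get(key)
```

Anything that returns None is a good trigger for a re-prompt or a switch to DTMF.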

Failure: Mixing up ọ and o in output

Callers speaking at high speed or with heavy code-switching may produce audio where the ọ/o distinction is subtle. If your downstream application needs to distinguish these, include a confidence threshold check on the word level — if a key word like a name or location has confidence below 0.80, flag for review.
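
A minimal sketch of that word-level check, assuming the words list shown in the JSON response earlier. The key_words set is something you supply (names, locations, product terms), and the function name is illustrative:

```python
import unicodedata


def _norm(token: str) -> str:
    """NFC-normalise, casefold, and trim edge punctuation for comparison."""
    return unicodedata.normalize("NFC", token).casefold().strip(".,!?;:")


def low_confidence_key_words(result: dict, key_words: set[str],
                             threshold: float = 0.80) -> list[dict]:
    """Return word entries that are key words and fall below the threshold."""
    wanted = {_norm(word) for word in key_words}
    return [
        entry for entry in result.get("words", [])
        if _norm(entry["word"]) in wanted and entry["confidence"] < threshold
    ]


example = {"words": [
    {"word": "Enugu,", "confidence": 0.72},
    {"word": "oge", "confidence": 0.95},
    {"word": "Owerri", "confidence": 0.91},
]}
flagged = low_confidence_key_words(example, {"Enugu", "Owerri"})
print([entry["word"] for entry in flagged])  # ['Enugu,']
```

Returning the full entries keeps the start and end timestamps, so a reviewer can jump straight to the doubtful segment of audio.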

What you can build with Igbo STT

Igbo STT opens up voice application development for south-eastern Nigeria, a market that has been essentially inaccessible to voice technology until now.

For a complete multilingual Nigerian voice pipeline, pair Igbo STT with Hausa STT and Yoruba TTS. For detecting which language a caller is using before routing, see the language detection API.

Build the first Igbo voice app

The Maraba Igbo STT API is available now. Sign up free, grab your API key, and start transcribing. ₦5 per minute, no minimum spend.
