Developer Tutorial

Hausa speech to text API: transcribe Hausa audio in Python

There is almost no production-ready Hausa speech recognition available to developers today. This post outlines how the Orinode STT API, currently in private beta and trained on Nigerian Hausa audio, is expected to work once it ships. It shows you how to transcribe Hausa speech to text in Python and JavaScript, preserve diacritics like ƙ and ɗ, and handle the common failure modes.

Why Hausa speech recognition is a hard problem

Hausa is spoken by approximately 70 million people as a first language, with another 20–30 million using it as a lingua franca across West Africa. It is the dominant language of northern Nigeria — Kano, Kaduna, Sokoto, Katsina, Bauchi — and the commercial language of huge swaths of the Nigerian economy. Yet if you search for "Hausa speech to text API" today, you will find almost nothing useful. A few academic papers. Some references to Mozilla Common Voice's small Hausa dataset. Nothing a developer can call and ship against.

The technical reasons for this gap are real:

The Orinode STT model (used by Maraba) addresses all of these. We fine-tuned OpenAI Whisper (small) on 6.5 hours of Nigerian Hausa audio from Kano, Kaduna, and Sokoto speakers, combined with the Mozilla Common Voice Hausa set and proprietary call recordings. The result is a word error rate (WER) of approximately 18% on in-domain Nigerian Hausa, compared with roughly 40% for the base Whisper model on the same test set.
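Word error rate is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that metric, assuming plain whitespace tokenisation. The function name word_error_rate is ours, and this is not necessarily how the figures above were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words (starts with zero reference words).
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "Ina son in yi alƙawari da likita a ranar Talata."
hypothesis = "Ina son in yi alkawari da likita a ranar Talata."
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}")
```

Here the only difference is ƙ written as k in one of ten words, so a single lost hook already counts as one full word error.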

Prerequisites

Before you start, you need:

The STT API is billed at ₦5 per minute of audio. A 30-second Hausa clip costs ₦2.50. There is no minimum charge on the STT endpoint.

Your first Hausa transcription: Python

The endpoint is POST /api/v1/transcribe/. You send a multipart form with the audio file and the language code ha. The response returns the transcript text, detected language, confidence score, duration, word-level timestamps, and cost.

Python
import requests

API_KEY = "your-api-key-here"
AUDIO_FILE = "hausa_sample.wav"

with open(AUDIO_FILE, "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ha"},
        files={"audio": (AUDIO_FILE, f, "audio/wav")},
    )

response.raise_for_status()
result = response.json()

print(result["transcript"])
print(f"Confidence: {result['confidence']:.2f}")
print(f"Duration: {result['duration_seconds']:.1f}s")

For a recording of the sentence "Ina son in yi alƙawari da likita a ranar Talata." (I would like to make an appointment with the doctor on Tuesday), the API returns:

JSON Response
{
  "transcript": "Ina son in yi alƙawari da likita a ranar Talata.",
  "language_detected": "ha",
  "confidence": 0.91,
  "duration_seconds": 3.4,
  "words": [
    {"word": "Ina", "start": 0.0, "end": 0.3, "confidence": 0.97},
    {"word": "son", "start": 0.3, "end": 0.55, "confidence": 0.95},
    {"word": "in", "start": 0.55, "end": 0.7, "confidence": 0.93},
    {"word": "yi", "start": 0.7, "end": 0.85, "confidence": 0.98},
    {"word": "alƙawari", "start": 0.85, "end": 1.4, "confidence": 0.88},
    {"word": "da", "start": 1.4, "end": 1.55, "confidence": 0.99},
    {"word": "likita", "start": 1.55, "end": 1.95, "confidence": 0.92},
    {"word": "a", "start": 1.95, "end": 2.1, "confidence": 0.97},
    {"word": "ranar", "start": 2.1, "end": 2.5, "confidence": 0.94},
    {"word": "Talata.", "start": 2.5, "end": 2.9, "confidence": 0.89}
  ],
  "cost_ngn": 0.28
}
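The per-word confidence values can be used to flag words for human review. The following is a minimal sketch that assumes the response shape shown above. The 0.90 threshold and the helper name low_confidence_words are illustrative choices, not API features.

```python
def low_confidence_words(result: dict, threshold: float = 0.90) -> list[tuple[str, float, float]]:
    """Return (word, start, end) for every word scored below the threshold."""
    return [
        (w["word"], w["start"], w["end"])
        for w in result.get("words", [])
        if w["confidence"] < threshold
    ]


# Trimmed copy of the sample response above.
sample = {
    "transcript": "Ina son in yi alƙawari da likita a ranar Talata.",
    "words": [
        {"word": "yi", "start": 0.7, "end": 0.85, "confidence": 0.98},
        {"word": "alƙawari", "start": 0.85, "end": 1.4, "confidence": 0.88},
        {"word": "da", "start": 1.4, "end": 1.55, "confidence": 0.99},
        {"word": "Talata.", "start": 2.5, "end": 2.9, "confidence": 0.89},
    ],
}

for word, start, end in low_confidence_words(sample):
    print(f"{start:.2f}-{end:.2f}s  {word}")
```

With the sample data this flags alƙawari and Talata., the two words scored below 0.90.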

Notice that the transcript preserves alƙawari with the hooked ƙ, not the plain ASCII "k". This is critical: in Hausa, ƙ and k are different phonemes. The spelling alƙawari is correct; "alkawari" is phonemically wrong and will cause downstream errors in any NLP pipeline that works with Hausa text.

JavaScript / Node.js example

JavaScript (Node.js)
const fs = require("fs");
const FormData = require("form-data");
// require() works with node-fetch v2; v3 is ESM-only and must be loaded with import.
const fetch = require("node-fetch");

const API_KEY = "your-api-key-here";
const AUDIO_FILE = "hausa_sample.wav";

async function transcribeHausa(filePath) {
  const form = new FormData();
  form.append("language", "ha");
  form.append("audio", fs.createReadStream(filePath), {
    filename: filePath,
    contentType: "audio/wav",
  });

  const response = await fetch("https://maraba.ai/api/v1/transcribe/", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      ...form.getHeaders(),
    },
    body: form,
  });

  if (!response.ok) {
    const error = await response.json();
    throw new Error(`API error ${response.status}: ${error.error}`);
  }

  return response.json();
}

transcribeHausa(AUDIO_FILE)
  .then((result) => {
    console.log("Transcript:", result.transcript);
    console.log("Confidence:", result.confidence);
  })
  .catch(console.error);

The diacritic rule: never call .lower() on Hausa text

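This is important enough to state explicitly. When you receive a Hausa transcript from the API, do not apply Python's .lower() or JavaScript's .toLowerCase() to it, and do not pass it through any other text-normalisation step. Unicode-aware lowercasing on its own maps Ƙ to ƙ and keeps the hook, but it is often bundled with ASCII folding, accent stripping, or legacy encodings in text clean-up helpers, and those steps replace or delete ƙ, ɗ, and ɓ. Lowercasing also throws away the capitalisation the model produced: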

Python — what NOT to do
# WRONG: case-normalising the transcript
transcript = result["transcript"]
lowered = transcript.lower()  # drops capitalisation; any ASCII folding that follows turns "ƙ" into "k"

# CORRECT — preserve the transcript exactly as returned
transcript = result["transcript"]
# Use it as-is. Do not normalise case for Hausa.

The specific characters to preserve in Hausa text:

These characters sit in the Unicode Latin Extended-B and IPA Extensions blocks and are handled correctly by any system that uses UTF-8 end to end. Ensure your database columns, API endpoints, and storage layers are configured for UTF-8 (utf8mb4 in MySQL). Most modern stacks already are.
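One way to catch a pipeline step that strips these characters is to count the hooked letters before and after it. The sketch below is illustrative only: the character set (ɓ, ɗ, ƙ, ƴ and their capitals) and the helper names are our assumptions, not part of the API.

```python
# Hooked letters of the Hausa alphabet, lower and upper case.
# This set is an assumption for illustration, not a list taken from the API.
HAUSA_HOOKED = set("ɓɗƙƴƁƊƘƳ")


def hooked_count(text: str) -> int:
    """Count hooked Hausa letters in the text."""
    return sum(ch in HAUSA_HOOKED for ch in text)


def loses_hooked_letters(before: str, after: str) -> bool:
    """True if a processing step dropped or replaced any hooked letter."""
    return hooked_count(after) < hooked_count(before)


transcript = "Ina son in yi alƙawari da likita a ranar Talata."

# A UTF-8 round trip, which is what a correctly configured store does, is lossless.
round_tripped = transcript.encode("utf-8").decode("utf-8")
print(loses_hooked_letters(transcript, round_tripped))  # False

# Forcing the text into ASCII silently deletes the ƙ.
ascii_only = transcript.encode("ascii", errors="ignore").decode("ascii")
print(loses_hooked_letters(transcript, ascii_only))  # True
```

A check like this fits naturally in a test that writes a transcript to your database and reads it back.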

Handling code-switching audio

Much real-world Nigerian Hausa speech switches between Hausa and English mid-sentence. A caller might say: "Ina son order, but I want to confirm the price first." The Hausa opening shifts to English in the middle. To handle this, use the language=ha-en bilingual mode:

Python — bilingual Hausa/English
with open("codeswitched_audio.wav", "rb") as f:
    response = requests.post(
        "https://maraba.ai/api/v1/transcribe/",
        headers={"X-API-Key": API_KEY},
        data={"language": "ha-en"},  # bilingual mode
        files={"audio": ("audio.wav", f, "audio/wav")},
    )

response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()
print(result["transcript"])
# Output: "Ina son order, but I want to confirm the price first."
# The transcript preserves the language switch naturally

In bilingual mode the model detects the dominant language of each segment and switches accordingly. Word-level timestamps are still returned for every word, whether Hausa or English.

Streaming transcription for real-time use cases

The standard endpoint processes a complete audio file. For real-time applications, such as transcribing a live phone call, use the WebSocket streaming endpoint at wss://maraba.ai/api/v1/transcribe/stream/. This returns partial transcripts as the audio arrives, with a latency of approximately 600–900 ms under typical Nigerian network conditions.

Streaming is out of scope for this tutorial but is documented in the full STT API reference.

Common failure modes and fixes

Here are the transcription problems you are most likely to encounter with Hausa audio and how to fix them:

Failure: Low confidence on long vowels

Hausa vowel length is phonemically contrastive: two words can differ only in vowel duration. In low-quality audio, the model may mis-transcribe these. Fix: record at 16 kHz mono or better, and avoid heavy audio compression before submission.
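The sample rate and channel count of a WAV file can be checked before upload with Python's standard wave module. This is a minimal sketch: the helper names check_wav and make_silent_wav are ours, and the in-memory silent file exists only to make the example runnable.

```python
import io
import wave


def check_wav(source, min_rate_hz: int = 16000) -> list[str]:
    """Return a list of problems found in a WAV file (empty if none).

    `source` is a file path string or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() < min_rate_hz:
            problems.append(f"sample rate {wav.getframerate()} Hz is below {min_rate_hz} Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems


def make_silent_wav(rate_hz: int, channels: int) -> io.BytesIO:
    """Build a 0.1 s silent 16-bit WAV in memory, used only for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * (rate_hz // 10))
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16000, 1)))
print(check_wav(make_silent_wav(8000, 2)))
```

In practice you would call check_wav("hausa_sample.wav") and skip or re-encode any file that returns a non-empty list.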

Failure: ƙ transcribed as k

This happens when audio is heavily compressed or the speaker's ejective consonant is under-articulated. The model makes its best guess. Fix: use higher bitrate audio. If you are generating synthetic test audio, use the Orinode TTS API with voice=ha-NG to create Hausa audio with correct phoneme rendering.

Failure: English words in a Hausa sentence are transcribed incorrectly

If you submit a bilingual recording with language=ha instead of language=ha-en, the model will try to force English words into Hausa phonology. Fix: use language=ha-en for any audio that may contain English.

Failure: 400 error on file upload

The API returns a 400 with "code": "unsupported_format" if the audio codec is not recognised. Accepted MIME types: audio/wav, audio/mpeg (MP3), audio/ogg, audio/flac. AAC files will be rejected. Convert with ffmpeg: ffmpeg -i input.aac -ar 16000 -ac 1 output.wav.
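A client-side check can reject unsupported files before they cost a round trip. The sketch below maps file extensions to the accepted MIME types listed above. The extension mapping and the helper name upload_mime_type are our assumptions, and the check looks only at the file name, not at the codec inside the file.

```python
from pathlib import Path

# Accepted MIME types from the list above, keyed by the usual file extension.
# The extension mapping itself is an assumption for illustration.
ACCEPTED_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}


def upload_mime_type(path: str) -> str:
    """Return the MIME type to send for this file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return ACCEPTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(
            f"{suffix or 'missing extension'} is not accepted; convert to WAV first"
        ) from None


print(upload_mime_type("hausa_sample.wav"))
print(upload_mime_type("Call_0042.MP3"))
```

The returned string can be passed as the third element of the files tuple in the requests call.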

Rate limits and error codes

Developer accounts have a default rate limit of 60 requests per minute. For batch transcription of large audio archives, use the X-Rate-Limit-Remaining response header to pace your requests, or contact support to request a higher limit.
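For batch jobs, the pacing decision can be isolated in a small function. This sketch assumes that X-Rate-Limit-Remaining holds the number of requests left in the current window and that the window lasts 60 seconds. Both are assumptions to check against the API reference, and seconds_to_wait is a hypothetical helper name.

```python
def seconds_to_wait(headers: dict, window_seconds: float = 60.0) -> float:
    """Decide how long to pause before the next request.

    Assumes X-Rate-Limit-Remaining is the number of requests left in the
    current window and that the window lasts `window_seconds`.
    """
    raw = headers.get("X-Rate-Limit-Remaining")
    if raw is None:
        return 0.0  # header absent: nothing to go on, do not delay
    try:
        remaining = int(raw)
    except ValueError:
        return 0.0  # unparseable header: treat as absent
    # Out of budget: wait out a full window before sending again.
    return window_seconds if remaining <= 0 else 0.0


print(seconds_to_wait({"X-Rate-Limit-Remaining": "12"}))
print(seconds_to_wait({"X-Rate-Limit-Remaining": "0"}))
```

In a batch loop you would call time.sleep(seconds_to_wait(response.headers)) after each request.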

Key error codes returned in {"error": "...", "code": "...", "detail": {}} format:

What you can build with Hausa STT

The practical applications for Hausa speech recognition in the Nigerian market are substantial. Here are the use cases developers are building today:

If you are building a Hausa-language voice application end-to-end, pair the STT API with the Hausa TTS output for a full speech-in, speech-out pipeline. For detecting which Nigerian language a speaker is using before you route to the correct STT model, see the Nigerian Language Detection API guide.

Pricing

The STT API charges ₦5 per minute of audio, billed in 10-second increments. There is no setup fee and no monthly minimum. You pay only for what you transcribe. A 1,000-minute batch transcription of Hausa call recordings costs ₦5,000.
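The pricing arithmetic can be sketched as a small estimator. It assumes that a partial 10-second increment is rounded up, which the pricing text does not state, so treat the cost_ngn field returned by the API as the authoritative figure. The function name estimate_cost_ngn is ours.

```python
import math
from fractions import Fraction

RATE_NGN_PER_MINUTE = 5   # from the pricing above
INCREMENT_SECONDS = 10    # billing granularity from the pricing above


def estimate_cost_ngn(duration_seconds: float) -> float:
    """Estimate the charge for one clip, rounding up to the next increment.

    Rounding up is an assumption; the API may round differently.
    """
    if duration_seconds <= 0:
        return 0.0
    increments = math.ceil(duration_seconds / INCREMENT_SECONDS)
    # Price of one increment as an exact fraction (5/6 naira), to avoid float drift.
    cost = Fraction(RATE_NGN_PER_MINUTE * INCREMENT_SECONDS, 60) * increments
    return round(float(cost), 2)


print(estimate_cost_ngn(30))         # 2.5
print(estimate_cost_ngn(1000 * 60))  # 5000.0
```

Both printed values match the examples in the text: ₦2.50 for a 30-second clip and ₦5,000 for 1,000 minutes.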

API access is available on all Maraba plans including the free tier. Free plan accounts receive 50 API minutes per month. Starter (₦15,000/month) and Pro (₦45,000/month) plans include more API minutes; additional usage is charged at the pay-as-you-go rate of ₦5 per minute.

Start transcribing Hausa audio today

Sign up free, get your API key, and make your first Hausa transcription in under five minutes. No credit card required for the free tier.
