# Model Card for RareElf/kalenjin-asr

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition (ASR) in Kalenjin. It represents the Stage 2 milestone of research into low-resource Nilotic languages at iLabAfrica, Strathmore University.

## Model Details

### Model Description

This model leverages the Wav2Vec2-XLS-R-300M architecture, fine-tuned for the phonetic and morphological complexities of Kalenjin. It employs a cascaded decoding strategy with an external KenLM language model to improve orthographic consistency and reduce the phonetic hallucinations common in low-resource settings.
- Developed by: Kevin Obote / RareElf
- Shared by: RareElf
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Kalenjin (`kln`)
- License: Apache-2.0
- Finetuned from model: `facebook/wav2vec2-xls-r-300m`
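The cascaded strategy above rescores CTC hypotheses with KenLM; before any rescoring, CTC output must be collapsed from frame-level predictions to a token sequence. A pure-Python sketch of that collapse rule (token ids are illustrative, not this model's vocabulary):

```python
def ctc_collapse(ids, blank=0):
    """Collapse a frame-level CTC prediction: merge consecutive repeats,
    then drop blank tokens. Repeats separated by a blank are kept."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# frames: blank, 5, 5, blank, 2, 2, 2, blank
print(ctc_collapse([0, 5, 5, 0, 2, 2, 2, 0]))  # [5, 2]
```

Beam-search decoding with KenLM (the 61.75% WER row below) keeps multiple such hypotheses and weights them by the language model's score instead of taking the single best frame at each step.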
## Uses

### Direct Use
- Transcribing Kalenjin audio into text for research, documentation, and accessibility.
- Integration into multi-stage translation pipelines (e.g., Kalenjin → English).
### Out-of-Scope Use
- Not suitable for high-stakes medical or legal transcription without human verification.
- May struggle with rapid code-switching or extreme background noise.
## Bias, Risks, and Limitations
- Performance reflects the distribution of Common Voice Scripted Speech 24.0 - Kalenjin.
- Tonal variations in Kalenjin remain challenging for CTC-based architectures.
- Performance may vary across dialects not represented in training data.
## Training Details

### Training Data

- Dataset: Common Voice Scripted Speech 24.0 - Kalenjin
- Augmentation: SpecAugment (`mask_time_prob=0.05`)
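SpecAugment's time masking zeroes out random spans of the input along the time axis. A minimal NumPy sketch of the idea, with `mask_prob` mirroring the `mask_time_prob=0.05` setting above (the mask length of 10 is wav2vec2's default, assumed here):

```python
import numpy as np

def time_mask(features, mask_prob=0.05, mask_length=10, rng=None):
    """Zero out random time spans, as SpecAugment's time masking does.
    `mask_prob` is the fraction of frames selected as span starts."""
    rng = rng or np.random.default_rng(0)
    t = features.shape[0]
    masked = features.copy()
    n_starts = int(mask_prob * t)
    starts = rng.choice(t, size=n_starts, replace=False)
    for s in starts:
        masked[s:s + mask_length] = 0.0
    return masked

feats = np.ones((100, 80))   # 100 time frames, 80 feature dims
out = time_mask(feats)
print(out.shape)             # (100, 80)
```

During wav2vec2 fine-tuning this masking is applied to the latent features rather than to a spectrogram, but the span-masking mechanism is the same.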
### Training Procedure

- Architecture: Wav2Vec2-XLS-R-300M
- Optimizer: AdamW
- Callbacks: EarlyStopping (`patience=5`)
#### Preprocessing
- Resampled audio to 16kHz
- Orthographic normalization
- Removal of corrupted or misaligned samples
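The 16 kHz resampling step can be illustrated with linear interpolation; this is a minimal stand-in for the torchaudio/librosa resamplers a real pipeline would use, not the card's actual preprocessing code:

```python
import numpy as np

def resample_to_16k(audio, orig_sr):
    """Resample a 1-D waveform to 16 kHz by linear interpolation."""
    target_sr = 16_000
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

wave_48k = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)  # 1 s at 48 kHz
wave_16k = resample_to_16k(wave_48k, 48_000)
print(len(wave_16k))  # 16000
```

Production resamplers use band-limited (sinc) interpolation to avoid aliasing; linear interpolation is shown only to make the sample-rate arithmetic concrete.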
#### Training Hyperparameters

- Epochs: 30
- Effective Batch Size: 32 (16 per device × 2 gradient accumulation steps)
- Learning Rate: 5e-5
- Mixed Precision: fp16
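Collected as a plain dict, the hyperparameters above look like this; the training script presumably passes equivalent values to `transformers.TrainingArguments` (key names follow that API, but the mapping is an assumption):

```python
# Hyperparameters from this card; the effective batch size is the
# per-device batch multiplied by the gradient accumulation steps.
config = {
    "num_train_epochs": 30,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-5,
    "fp16": True,
}

effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # 32
```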
## Evaluation

### Testing Data
- Held-out test set from Common Voice Kalenjin
### Metrics
- WER: Word Error Rate
- CER: Character Error Rate
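Both metrics are edit distance normalized by reference length: WER over words, CER over characters. A self-contained sketch (real evaluations typically use a library such as `jiwer`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edits / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("a b c d", "a b d"))  # 0.25
```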
### Results
| Decoding Strategy | WER (%) | CER (%) |
|---|---|---|
| KenLM Beam Search | 61.75 | 20.11 |
| Greedy Decoding | 69.03 | 24.15 |
This is an absolute WER improvement of roughly 10.36 percentage points over the Stage 1 baseline.
## Environmental Impact
- Hardware: NVIDIA A100 GPU
- Cloud Provider: Modal
- Compute Region: us-east-1
## Technical Specifications

### Architecture & Objective

- Architecture: Wav2Vec2 + CTC Head
- Objective: Map 16 kHz audio to Kalenjin character-level transcriptions
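The CTC head emits one prediction per latent frame, and the frame rate is fixed by the convolutional feature extractor. A sketch of that mapping using the standard wav2vec2/XLS-R kernel and stride values (assumed from the base architecture, not read from this repo):

```python
# (kernel, stride) of each conv layer in the wav2vec2 feature extractor;
# the total stride is 320 samples, i.e. ~20 ms per frame at 16 kHz.
CONV = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_ctc_frames(n_samples):
    """Number of CTC output frames for a raw 16 kHz clip."""
    n = n_samples
    for kernel, stride in CONV:
        n = (n - kernel) // stride + 1
    return n

print(num_ctc_frames(16_000))  # 49 frames per second of audio
```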
### Compute Infrastructure

#### Hardware
- Training: NVIDIA A100 GPU (Modal)
- Development: Lenovo ThinkPad T14 Gen 1 (32GB RAM, 1TB SSD)
#### Software
- Python 3.10
- PyTorch 2.1.0
- Transformers 4.42.3
## Citation

```bibtex
@phdthesis{obote2026asr,
  author = {Obote, Kevin},
  title  = {Automatic Speech Recognition for Low-Resource Nilotic Languages: A Stage-2 Acoustic Adaptation Approach},
  school = {iLabAfrica, Strathmore University},
  year   = {2026}
}
```
## Glossary
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate
- KenLM: Efficient n-gram language modeling library
## Model Card Authors
Kevin Obote / RareElf / Guild Code Team
## Contact
## Script: Uploading KenLM Binary

To replicate the 61.75% WER result, upload your KenLM binary (`.bin` or `.arpa`) file to the repository:

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "RareElf/kalenjin-asr"

api.upload_file(
    path_or_fileobj="path/to/your/kalenjin_lm.bin",
    path_in_repo="kalenjin_lm.bin",
    repo_id=repo_id,
    repo_type="model",
)
print(f"KenLM binary uploaded to {repo_id}")
```