# Model Card for RareElf/kalenjin-asr

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition (ASR) in Kalenjin. It represents the Stage 2 milestone of research into low-resource Nilotic languages at iLabAfrica, Strathmore University.

## Model Details

### Model Description

This model leverages the Wav2Vec2-XLS-R-300M architecture, fine-tuned for the phonetic and morphological complexities of Kalenjin. It employs a cascaded decoding strategy with an external KenLM language model to improve orthographic consistency and reduce the phonetic hallucinations common in low-resource settings.
- Developed by: Kevin Obote / RareElf
- Shared by: RareElf
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Kalenjin (`kln`)
- License: Apache-2.0
- Finetuned from model: `facebook/wav2vec2-xls-r-300m`
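The cascaded strategy above rescores CTC hypotheses with KenLM; before any rescoring, CTC output must be collapsed from frame-level predictions to a token sequence. A pure-Python sketch of that collapse rule (token ids are illustrative, not this model's vocabulary):

```python
def ctc_collapse(ids, blank=0):
    """Collapse a frame-level CTC prediction: merge consecutive repeats,
    then drop blank tokens. Repeats separated by a blank are kept."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# frames: blank, 5, 5, blank, 2, 2, 2, blank
print(ctc_collapse([0, 5, 5, 0, 2, 2, 2, 0]))  # [5, 2]
```

Beam-search decoding with KenLM (the 61.75% WER row below) keeps multiple such hypotheses and weights them by the language model's score instead of taking the single best frame at each step.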
## Uses

### Direct Use
- Transcribing Kalenjin audio into text for research, documentation, and accessibility.
- Integration into multi-stage translation pipelines (e.g., Kalenjin → English).
### Out-of-Scope Use
- Not suitable for high-stakes medical or legal transcription without human verification.
- May struggle with rapid code-switching or extreme background noise.
## Bias, Risks, and Limitations
- Performance reflects the distribution of Common Voice Scripted Speech 24.0 - Kalenjin.
- Tonal variations in Kalenjin remain challenging for CTC-based architectures.
- Performance may vary across dialects not represented in training data.
## Training Details

### Training Data

- Dataset: Common Voice Scripted Speech 24.0 - Kalenjin
- Augmentation: SpecAugment (`mask_time_prob=0.05`)
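SpecAugment's time masking zeroes out random spans of the input along the time axis. A minimal NumPy sketch of the idea, with `mask_prob` mirroring the `mask_time_prob=0.05` setting above (the mask length of 10 is wav2vec2's default, assumed here):

```python
import numpy as np

def time_mask(features, mask_prob=0.05, mask_length=10, rng=None):
    """Zero out random time spans, as SpecAugment's time masking does.
    `mask_prob` is the fraction of frames selected as span starts."""
    rng = rng or np.random.default_rng(0)
    t = features.shape[0]
    masked = features.copy()
    n_starts = int(mask_prob * t)
    starts = rng.choice(t, size=n_starts, replace=False)
    for s in starts:
        masked[s:s + mask_length] = 0.0
    return masked

feats = np.ones((100, 80))   # 100 time frames, 80 feature dims
out = time_mask(feats)
print(out.shape)             # (100, 80)
```

During wav2vec2 fine-tuning this masking is applied to the latent features rather than to a spectrogram, but the span-masking mechanism is the same.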
### Training Procedure

- Architecture: Wav2Vec2-XLS-R-300M
- Optimizer: AdamW
- Callbacks: EarlyStopping (`patience=5`)
#### Preprocessing
- Resampled audio to 16kHz
- Orthographic normalization
- Removal of corrupted or misaligned samples
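The 16 kHz resampling step can be illustrated with linear interpolation; this is a minimal stand-in for the torchaudio/librosa resamplers a real pipeline would use, not the card's actual preprocessing code:

```python
import numpy as np

def resample_to_16k(audio, orig_sr):
    """Resample a 1-D waveform to 16 kHz by linear interpolation."""
    target_sr = 16_000
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

wave_48k = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)  # 1 s at 48 kHz
wave_16k = resample_to_16k(wave_48k, 48_000)
print(len(wave_16k))  # 16000
```

Production resamplers use band-limited (sinc) interpolation to avoid aliasing; linear interpolation is shown only to make the sample-rate arithmetic concrete.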
#### Training Hyperparameters

- Epochs: 30
- Effective Batch Size: 32 (16 per device × 2 gradient accumulation steps)
- Learning Rate: 5e-5
- Mixed Precision: fp16
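Collected as a plain dict, the hyperparameters above look like this; the training script presumably passes equivalent values to `transformers.TrainingArguments` (key names follow that API, but the mapping is an assumption):

```python
# Hyperparameters from this card; the effective batch size is the
# per-device batch multiplied by the gradient accumulation steps.
config = {
    "num_train_epochs": 30,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-5,
    "fp16": True,
}

effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # 32
```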
## Evaluation

### Testing Data
- Held-out test set from Common Voice Kalenjin
### Metrics
- WER: Word Error Rate
- CER: Character Error Rate
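Both metrics are edit distance normalized by reference length: WER over words, CER over characters. A self-contained sketch (real evaluations typically use a library such as `jiwer`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edits / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("a b c d", "a b d"))  # 0.25
```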
### Results
| Decoding Strategy | WER (%) | CER (%) |
|---|---|---|
| KenLM Beam Search | 61.75 | 20.11 |
| Greedy Decoding | 69.03 | 24.15 |
This is an absolute WER improvement of roughly 10.36 percentage points over the Stage 1 baseline.
## Environmental Impact
- Hardware: NVIDIA A100 GPU
- Cloud Provider: Modal
- Compute Region: us-east-1
## Technical Specifications

### Architecture & Objective

- Architecture: Wav2Vec2 + CTC Head
- Objective: Map 16 kHz audio to Kalenjin character-level transcriptions
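The CTC head emits one prediction per latent frame, and the frame rate is fixed by the convolutional feature extractor. A sketch of that mapping using the standard wav2vec2/XLS-R kernel and stride values (assumed from the base architecture, not read from this repo):

```python
# (kernel, stride) of each conv layer in the wav2vec2 feature extractor;
# the total stride is 320 samples, i.e. ~20 ms per frame at 16 kHz.
CONV = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_ctc_frames(n_samples):
    """Number of CTC output frames for a raw 16 kHz clip."""
    n = n_samples
    for kernel, stride in CONV:
        n = (n - kernel) // stride + 1
    return n

print(num_ctc_frames(16_000))  # 49 frames per second of audio
```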
### Compute Infrastructure

#### Hardware
- Training: NVIDIA A100 GPU (Modal)
- Development: Lenovo ThinkPad T14 Gen 1 (32GB RAM, 1TB SSD)
#### Software
- Python 3.10
- PyTorch 2.1.0
- Transformers 4.42.3
## Citation

```bibtex
@phdthesis{obote2026asr,
  author = {Obote, Kevin},
  title  = {Automatic Speech Recognition for Low-Resource Nilotic Languages: A Stage-2 Acoustic Adaptation Approach},
  school = {iLabAfrica, Strathmore University},
  year   = {2026}
}
```
## Glossary
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate
- KenLM: Efficient n-gram language modeling library
## Model Card Authors
Kevin Obote / RareElf / Guild Code Team
## Contact
## Script: Uploading KenLM Binary

To replicate the 61.75% WER result, upload your KenLM binary (`.bin` or `.arpa`) file to the repository:

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "RareElf/kalenjin-asr"

api.upload_file(
    path_or_fileobj="path/to/your/kalenjin_lm.bin",
    path_in_repo="kalenjin_lm.bin",
    repo_id=repo_id,
    repo_type="model",
)
print(f"KenLM binary uploaded to {repo_id}")
```