CDLI Parakeet TDT 0.6B English Fine-Tune
This repository contains a NeMo ASR model fine-tuned from nvidia/parakeet-tdt-0.6b-v2 on the gated cdli/ugandan_english_nonstandard_speech_v1.0 dataset.
The task is English automatic speech recognition for atypical or non-standard speech from Ugandan speakers, including dysarthric speech. The dataset is part of the CDLI research collection and requires access approval.
Model Details
- Base model: nvidia/parakeet-tdt-0.6b-v2
- Fine-tuning framework: NVIDIA NeMo
- Language: English
- Acoustic model family: FastConformer-TDT / RNNT-BPE
- Output text: lower-case English transcription with standard ASR normalization
Dataset
- Dataset: cdli/ugandan_english_nonstandard_speech_v1.0
- License: cc-by-sa-4.0
- Split sizes used:
  - train: 5176
  - validation: 638
  - test: 1017
- Audio sampling rate: 16 kHz
The model was fine-tuned and evaluated on the CDLI Ugandan English non-standard speech corpus, which includes speaker metadata such as severity of speech impairment, age, gender, type of non-standard speech, and etiology.
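The training configuration on this card lists manifest-level duration limits (0.2 s minimum, 30.0 s maximum). A minimal sketch of writing corpus records into a NeMo-style JSON-lines manifest (one object per line with `audio_filepath`, `duration`, and `text`) while applying those limits follows. It assumes each input record already carries those three fields; `write_manifest` and the record layout are hypothetical and not the preparation scripts used for this model.

```python
import json
from typing import Iterable, Mapping

# Duration limits taken from the training configuration on this card.
MIN_AUDIO_S = 0.2
MAX_AUDIO_S = 30.0


def write_manifest(records: Iterable[Mapping], path: str,
                   min_s: float = MIN_AUDIO_S, max_s: float = MAX_AUDIO_S) -> int:
    """Write a NeMo-style JSON-lines manifest, skipping out-of-range clips.

    Assumption: every record has 'audio_filepath', 'duration' (seconds), 'text'.
    Returns the number of manifest lines written.
    """
    kept = 0
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            duration = float(rec["duration"])
            # Keep clips whose duration lies within [min_s, max_s] (inclusive).
            if not (min_s <= duration <= max_s):
                continue
            entry = {
                "audio_filepath": rec["audio_filepath"],
                "duration": duration,
                "text": rec["text"],
            }
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

Only the three fields shown are written; a real manifest for this corpus could also carry the speaker metadata mentioned above as extra keys.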
Training Configuration
- Work root: /jupyter_kernel/parakeet_cdli_en
- Max manifest audio length: 30.0 s
- Max training audio length: 30.0 s
- Min audio length: 0.2 s
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 2
- Effective train batch size: 16
- Learning rate: 1e-4
- Weight decay: 1e-3
- Warmup steps: 100
- Scheduler: CosineAnnealing
- Training steps: 2000
- Precision: bf16-mixed when supported, otherwise a mixed-precision fallback
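The batch and schedule settings above can be sanity-checked with a little arithmetic. The sketch below computes the effective batch size (8 × 2 = 16, assuming a single device) and a generic linear-warmup plus cosine-decay learning-rate curve from the listed values. It is not NeMo's `CosineAnnealing` implementation, whose warmup formula and `min_lr` setting may differ; `lr_at`, `effective_batch_size`, and `NUM_DEVICES = 1` are illustrative assumptions.

```python
import math

# Values taken from the training configuration on this card.
PEAK_LR = 1e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 2000
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
NUM_DEVICES = 1  # assumption: single-GPU run


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    # Each optimizer step accumulates per_device * accum samples on every device.
    return per_device * accum * devices


def lr_at(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS, min_lr: float = 0.0) -> float:
    """Generic linear warmup followed by cosine decay (sketch, not NeMo's code)."""
    if step < warmup:
        # Ramp linearly so that the last warmup step reaches the peak LR.
        return peak * (step + 1) / warmup
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under this sketch the learning rate reaches 1e-4 at the end of warmup, falls to half that (5e-5) midway through the decay phase at step 1050, and reaches `min_lr` at step 2000.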
Evaluation
Evaluation was run on the held-out test split, comparing hypotheses with references both as raw transcripts and after transcript normalization.
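The card does not spell out the exact normalization rules. As a hypothetical example of the kind of normalization described in the notes (reducing punctuation, casing, and formatting differences), the sketch below lower-cases text, replaces punctuation with spaces while keeping word-internal apostrophes, and collapses whitespace. `normalize_transcript` is illustrative only and may differ from the normalizer behind the reported scores.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Hypothetical normalizer: lowercase, strip punctuation, collapse spaces."""
    # Unify Unicode forms (e.g. full-width characters) and lower-case.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace punctuation with spaces, but keep apostrophes for now.
    text = re.sub(r"[^\w\s']", " ", text)
    # Drop apostrophes that are not inside a word (e.g. quote marks).
    text = re.sub(r"(?<!\w)'|'(?!\w)", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())
```

Applied to both references and hypotheses before scoring, a normalizer like this keeps "Hello, World!" and "hello world" from counting as different transcripts.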
Corpus Metrics
- Raw WER: 27.67%
- Raw CER: 14.77%
- Normalized WER: 21.72%
- Normalized CER: 13.46%
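Corpus-level WER is conventionally the total number of word edits (substitutions, deletions, insertions) summed over the test set, divided by the total number of reference words; CER is the same ratio over characters. A minimal pure-Python sketch of that definition follows, assuming whitespace tokenization and counting spaces as characters for CER. The card does not say which tool produced its numbers, so treat `corpus_wer` and `corpus_cer` as illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Total word errors over the whole set divided by total reference words.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / max(words, 1)


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Same ratio at character level; spaces are counted as characters here.
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / max(chars, 1)
```

Because errors and lengths are pooled before dividing, long utterances carry more weight than short ones in these corpus figures.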
Average Utterance Metrics
- Average normalized utterance WER (capped at 1.0): 20.98%
- Average normalized utterance CER (capped at 1.0): 13.36%
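Averaging per-utterance scores weights every utterance equally, whereas corpus metrics weight by reference length, and the 1.0 cap stops a short utterance with many insertions from dominating the mean. A sketch of a capped average follows, assuming whitespace tokenization and that utterances with empty references are skipped (both assumptions, not stated on the card); `mean_capped_utterance_wer` is a hypothetical name.

```python
from typing import List, Sequence


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def mean_capped_utterance_wer(refs: Sequence[str], hyps: Sequence[str],
                              cap: float = 1.0) -> float:
    """Average of per-utterance WERs, each clipped to `cap`."""
    scores = []
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        if not ref_words:
            continue  # assumption: utterances with empty references are skipped
        wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
        scores.append(min(wer, cap))
    return sum(scores) / len(scores) if scores else 0.0
```

With `refs = ["a b", "c"]` and `hyps = ["a b", "x y z"]`, the second utterance has 3 errors against 1 reference word (WER 3.0, capped to 1.0), so the capped mean is 0.5, while corpus WER over the same pair would be 3 / 3 = 1.0.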
Usage
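from nemo.collections.asr.models import ASRModel
# Download the fine-tuned checkpoint from the Hugging Face Hub
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-finetune")
# Transcribe 16 kHz mono audio; pass several paths to transcribe them in one call
predictions = model.transcribe(["path/to/audio.wav"])
# Newer NeMo releases return Hypothesis objects with .text; others may return plain strings
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])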
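The `hasattr` check in the snippet exists because the return type of `transcribe` varies across NeMo releases. A small helper that flattens the common shapes into plain strings is sketched below. `to_texts` is a hypothetical name, and the tuple case reflects how some older RNNT/TDT releases returned `(best_hypotheses, all_hypotheses)`; verify this against your installed NeMo version.

```python
from typing import Any, List


def to_texts(predictions: Any) -> List[str]:
    """Convert the value returned by ASRModel.transcribe into plain strings.

    Handles three shapes (version-dependent, treated as assumptions here):
    a list of strings, a list of objects exposing a .text attribute
    (e.g. NeMo Hypothesis), or a tuple whose first element is such a list.
    """
    if isinstance(predictions, tuple):
        predictions = predictions[0]  # keep only the best hypotheses
    return [p.text if hasattr(p, "text") else str(p) for p in predictions]
```

Usage: `texts = to_texts(model.transcribe(["path/to/audio.wav"]))`.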
Files
- EN-PARAKEET-TDT-F1.nemo: exported NeMo checkpoint
- checkpoints/: intermediate training checkpoints
Notes
- This model card reports the best-performing English Parakeet result obtained in this project across the 0.6B and 1.1B runs.
- Normalized metrics apply transcript normalization before scoring to reduce the impact of punctuation, casing, and formatting differences.
- Access to the source dataset is gated. Users should review the dataset terms before requesting access.