Dataset: legacy-datasets/common_voice
How to use PuristanLabs1/urdu-turn-v2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("audio-classification", model="PuristanLabs1/urdu-turn-v2")
```

```python
# Load model directly
from transformers import AutoProcessor, WhisperTurnDetector

processor = AutoProcessor.from_pretrained("PuristanLabs1/urdu-turn-v2")
model = WhisperTurnDetector.from_pretrained("PuristanLabs1/urdu-turn-v2")
```

This model is a high-speed, audio-native turn detection system designed for real-time Urdu voice applications. It uses a Whisper-Tiny encoder combined with an attention-pooling mechanism to decide whether a speaker has finished their turn or is merely pausing.
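The attention-pooling idea can be sketched as follows: each encoder frame receives a scalar score, the scores are softmax-normalized over time, and the pooled vector is the weighted sum of frames. The function and the learned score vector `w` below are illustrative names under assumed shapes, not the model's actual implementation.

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Collapse a (T, D) sequence of encoder frames into one (D,) vector.

    Each frame t gets a scalar score w . h_t; a softmax over time turns
    the scores into weights alpha_t; the pooled vector is sum_t alpha_t h_t.
    """
    scores = hidden_states @ w                       # (T,)
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax over time
    return alpha @ hidden_states                     # (D,)

# Toy example: 5 frames of 4-dim encoder features
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
w = rng.normal(size=4)
pooled = attention_pool(h, w)
print(pooled.shape)  # (4,)
```

A classifier head on top of `pooled` would then emit the turn/pause decision.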
Base model: openai/whisper-tiny (encoder only).

| Metric | Value |
|---|---|
| Inference Latency (CPU) | ~120ms |
| Inference Latency (CUDA) | ~45ms |
| F1 Score (Turn Detection) | 95%+ (estimated) |
The model can be used directly via the urdu-turn-detector library:

```bash
pip install urdu-turn-detector
```

```python
from urdu_turn_detection import UrduTurnDetector

# Auto-downloads from the Hub
detector = UrduTurnDetector.from_pretrained("PuristanLabs1/urdu-turn-v2")

# Predict on a file or an audio buffer
result = detector.predict("audio.wav")
print(f"Turn is {result.label} (Conf: {result.confidence})")
```
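In a live voice pipeline, this prediction typically gates whether the assistant starts speaking. A minimal sketch of that gating logic, assuming a hypothetical label string `"end_of_turn"` and a tunable confidence threshold (both illustrative, not part of the library's documented API):

```python
def should_respond(label, confidence, threshold=0.8):
    """Decide whether the assistant may take the floor.

    Only respond when the detector is confident the user has finished;
    a low-confidence result is treated as a mid-sentence pause.
    The label value "end_of_turn" is an assumed name for illustration.
    """
    return label == "end_of_turn" and confidence >= threshold

# e.g. with a detector result:
#   if should_respond(result.label, result.confidence):
#       start_reply()
print(should_respond("end_of_turn", 0.93))  # True
print(should_respond("end_of_turn", 0.55))  # False
```

Tuning the threshold trades responsiveness against the risk of interrupting the speaker.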
This model is also compatible with HF Inference Endpoints via the included `handler.py`.
Trained on a combination of Common Voice 13 (Urdu) and synthetically augmented samples simulating natural turn transitions and interruptions.
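One plausible way to synthesize an "interrupted / incomplete" class from complete utterances is to truncate a recording partway through. The sketch below is purely illustrative of that idea under assumed parameters, not the actual augmentation pipeline used for training:

```python
import numpy as np

def truncate_midway(audio, min_keep=0.4, max_keep=0.9, seed=None):
    """Cut an utterance off before it ends so it sounds like an
    unfinished turn (illustrative augmentation, assumed approach)."""
    rng = np.random.default_rng(seed)
    keep = rng.uniform(min_keep, max_keep)  # fraction of samples to keep
    return audio[: int(len(audio) * keep)]

# Toy example: a 1-second "utterance" at 16 kHz
utterance = np.zeros(16000, dtype=np.float32)
clipped = truncate_midway(utterance, seed=0)
print(len(clipped) < len(utterance))  # True
```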