Dataset: legacy-datasets/common_voice
How to use PuristanLabs1/urdu-turn-v2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("audio-classification", model="PuristanLabs1/urdu-turn-v2")
```

```python
# Load model directly
from transformers import AutoProcessor, WhisperTurnDetector

processor = AutoProcessor.from_pretrained("PuristanLabs1/urdu-turn-v2")
model = WhisperTurnDetector.from_pretrained("PuristanLabs1/urdu-turn-v2")
```

This model is a high-speed, audio-native turn detection system designed for real-time Urdu voice applications. It uses a Whisper-Tiny encoder combined with an attention-pooling mechanism to decide whether a speaker has finished their turn or is merely pausing.
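The attention-pooling idea can be sketched as follows: each encoder frame receives a scalar score, the scores are softmax-normalized over time, and the pooled vector is the weighted sum of frames. The function and the learned score vector `w` below are illustrative names under assumed shapes, not the model's actual implementation.

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Collapse a (T, D) sequence of encoder frames into one (D,) vector.

    Each frame t gets a scalar score w . h_t; a softmax over time turns
    the scores into weights alpha_t; the pooled vector is sum_t alpha_t h_t.
    """
    scores = hidden_states @ w                       # (T,)
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax over time
    return alpha @ hidden_states                     # (D,)

# Toy example: 5 frames of 4-dim encoder features
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
w = rng.normal(size=4)
pooled = attention_pool(h, w)
print(pooled.shape)  # (4,)
```

A classifier head on top of `pooled` would then emit the turn/pause decision.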
Base model: openai/whisper-tiny (encoder only).

| Metric | Value |
|---|---|
| Inference Latency (CPU) | ~120ms |
| Inference Latency (CUDA) | ~45ms |
| F1 Score (Turn Detection) | 95%+ (estimated) |
The model can be used directly via the urdu-turn-detector library:

```bash
pip install urdu-turn-detector
```

```python
from urdu_turn_detection import UrduTurnDetector

# Auto-downloads from the Hub
detector = UrduTurnDetector.from_pretrained("PuristanLabs1/urdu-turn-v2")

# Predict on a file or an audio buffer
result = detector.predict("audio.wav")
print(f"Turn is {result.label} (Conf: {result.confidence})")
```
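In a live voice pipeline, this prediction typically gates whether the assistant starts speaking. A minimal sketch of that gating logic, assuming a hypothetical label string `"end_of_turn"` and a tunable confidence threshold (both illustrative, not part of the library's documented API):

```python
def should_respond(label, confidence, threshold=0.8):
    """Decide whether the assistant may take the floor.

    Only respond when the detector is confident the user has finished;
    a low-confidence result is treated as a mid-sentence pause.
    The label value "end_of_turn" is an assumed name for illustration.
    """
    return label == "end_of_turn" and confidence >= threshold

# e.g. with a detector result:
#   if should_respond(result.label, result.confidence):
#       start_reply()
print(should_respond("end_of_turn", 0.93))  # True
print(should_respond("end_of_turn", 0.55))  # False
```

Tuning the threshold trades responsiveness against the risk of interrupting the speaker.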
This model is also compatible with HF Inference Endpoints via the included `handler.py`.
Trained on a combination of Common Voice 13 (Urdu) and synthetically augmented samples simulating natural turn transitions and interruptions.
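One plausible way to synthesize an "interrupted / incomplete" class from complete utterances is to truncate a recording partway through. The sketch below is purely illustrative of that idea under assumed parameters, not the actual augmentation pipeline used for training:

```python
import numpy as np

def truncate_midway(audio, min_keep=0.4, max_keep=0.9, seed=None):
    """Cut an utterance off before it ends so it sounds like an
    unfinished turn (illustrative augmentation, assumed approach)."""
    rng = np.random.default_rng(seed)
    keep = rng.uniform(min_keep, max_keep)  # fraction of samples to keep
    return audio[: int(len(audio) * keep)]

# Toy example: a 1-second "utterance" at 16 kHz
utterance = np.zeros(16000, dtype=np.float32)
clipped = truncate_midway(utterance, seed=0)
print(len(clipped) < len(utterance))  # True
```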