metadata
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
Priority Classification Model (Nepali + English Hybrid)
Model Overview
This model classifies citizen complaints and service requests into three priority levels (HIGH, MEDIUM, or LOW) based on the urgency and nature of the text.
It accepts both Nepali and English input and combines an ML classifier with a rule-based keyword layer, so that explicit urgency terms can still be caught when the classifier, which was trained on a small dataset, is not confident.
Model Architecture
| Component | Description |
|---|---|
| Embedder | sentence-transformers/all-MiniLM-L6-v2 |
| Classifier | Logistic Regression (multiclass, balanced weights) |
| Rule-based Layer | Keyword-based fallback for urgency terms in Nepali and English |
| Features | SBERT embeddings + priority keyword preservation |
| Hybrid Inference | Combines ML prediction confidence with rules for safer decisions |
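The table does not spell out how the rule layer and the classifier are combined. A minimal sketch of one plausible scheme follows. The keyword list, the confidence threshold, and the function names are all hypothetical and are not taken from the released code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical urgency keywords; the model's actual keyword lists are not shown in this card.
HIGH_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल")

# Hypothetical threshold below which the ML prediction is not trusted on its own.
CONFIDENCE_THRESHOLD = 0.6


def rule_priority(text: str) -> Optional[str]:
    """Return "HIGH" if any urgency keyword occurs in the text, else None."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in HIGH_KEYWORDS):
        return "HIGH"
    return None


def hybrid_priority(text: str, probs: Dict[str, float]) -> Tuple[str, str]:
    """Combine classifier probabilities with the keyword rule.

    Returns (label, source), where source is "ml" or "rule".
    """
    ml_label = max(probs, key=probs.get)
    rule_label = rule_priority(text)
    # Fall back to the rule only when the classifier is unsure and a rule fired.
    if probs[ml_label] < CONFIDENCE_THRESHOLD and rule_label is not None:
        return rule_label, "rule"
    return ml_label, "ml"


if __name__ == "__main__":
    example_probs = {"HIGH": 0.35, "MEDIUM": 0.45, "LOW": 0.20}
    print(hybrid_priority("पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।", example_probs))
```

In this sketch a confident ML prediction always wins. A more conservative variant could let the rule escalate to HIGH regardless of classifier confidence, trading precision for recall on emergencies.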
Training Summary
| Metric | Value |
|---|---|
| Total raw samples | 266 |
| After preprocessing & augmentation | 594 |
| Train/Test Split | 445 / 149 |
| Embedding Dimension | 384 |
| Classes | HIGH, MEDIUM, LOW |
| Test Accuracy | 72.5% |
| Macro F1-score | 0.72 |
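The 445 / 149 split is consistent with a 25% hold-out of the 594 augmented samples. The 25% ratio is inferred from the counts and is not stated in this card. The check below assumes scikit-learn's rounding convention for `train_test_split`, where the test size is rounded up and the training set takes the remainder.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_fraction: float) -> Tuple[int, int]:
    """Return (n_train, n_test) with the test count rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(594, 0.25))  # (445, 149)
```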
Label Distribution (After Normalization)
| Label | Count |
|---|---|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |
Label Distribution (After Augmentation)
| Label | Count |
|---|---|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
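These counts also show how much work the balanced class weights listed under Model Architecture have left to do after augmentation. The sketch below assumes the classifier uses scikit-learn's `class_weight="balanced"` heuristic, `n_samples / (n_classes * count)`, and applies it to the full distributions. The actual weights depend on the per-class counts in the 445-sample training split, which this card does not list.

```python
from typing import Dict


def balanced_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Per-class weight n_samples / (n_classes * count), rounded to 3 decimals."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: round(n_samples / (n_classes * count), 3)
            for label, count in counts.items()}


before_augmentation = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}
after_augmentation = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}

print(balanced_weights(before_augmentation))
print(balanced_weights(after_augmentation))
```

Before augmentation the MEDIUM and LOW weights would be roughly six to seven times the HIGH weight. After augmentation all three are close to 1, so the weighting has little remaining effect.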
Classification Report
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| Overall Accuracy | | | 0.725 | 149 |
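Accuracy is above the 70% mark, which is acceptable for a dataset of this size. Per-class F1 ranges from 0.69 (HIGH) to 0.76 (MEDIUM), and HIGH has the lowest recall (0.66), so about one in three HIGH test samples is predicted as MEDIUM or LOW. In practice the model is most reliable on text with explicit “urgent” or “emergency” wording and less reliable on borderline cases around MEDIUM.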
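The macro averages and the overall accuracy follow from the per-class rows. As a quick consistency check in plain Python, using only the figures from the classification report:

```python
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro(metric: str) -> float:
    """Unweighted mean of a per-class metric, rounded to 2 decimals."""
    return round(sum(row[metric] for row in report.values()) / len(report), 2)


# Correct predictions per class are recall * support, rounded to whole samples.
correct = sum(round(row["recall"] * row["support"]) for row in report.values())
total = sum(row["support"] for row in report.values())
accuracy = round(correct / total, 3)

print(macro("precision"), macro("recall"), macro("f1"))  # 0.73 0.73 0.72
print(correct, total, accuracy)  # 108 149 0.725
```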
Inference (Usage)
Using the model directly (ML only or Hybrid)
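from huggingface_hub import hf_hub_download
import joblib

# Embedder and predict_priority come from the priority_det module
# (assumed to ship with this model's code; it must be importable).
from priority_det import Embedder, predict_priority

# Download the model (replace the placeholder repo_id with the actual repository)
model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")

# Load the classifier bundle: the fitted classifier and its label mapping
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

# Predict; use_hybrid=True enables the rule-based layer, False uses the ML classifier only
# English: "The water supply is cut off. An immediate solution is needed."
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)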