---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---

# Priority Classification Model (Nepali + English Hybrid)

## Model Overview
This model automatically classifies citizen complaints or service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
It supports **both Nepali and English** inputs and uses a **hybrid ML + rule-based approach** to ensure robustness, especially on small datasets.

---

## Model Architecture

| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
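
The hybrid inference row can be illustrated with a minimal sketch. It assumes the rule layer acts as a fallback only when the classifier's top probability is below a threshold; the function name `hybrid_priority`, the `min_confidence` value, and the keyword list are hypothetical and are not taken from the project's code.

```python
from typing import Dict, Tuple

# Hypothetical urgency keywords (assumption): the model's real rule list
# is not shown in this card.
HIGH_KEYWORDS = ("urgent", "emergency", "तत्काल")


def hybrid_priority(
    text: str,
    probs: Dict[str, float],
    min_confidence: float = 0.6,  # assumed threshold, not from the source
) -> Tuple[str, str]:
    """Return (label, source), where source records which layer decided."""
    ml_label = max(probs, key=probs.get)

    # Trust the classifier when it is confident enough.
    if probs[ml_label] >= min_confidence:
        return ml_label, "ml"

    # Otherwise fall back to keyword rules for urgency terms.
    lowered = text.lower()
    if any(keyword in lowered for keyword in HIGH_KEYWORDS):
        return "HIGH", "rule"

    # No rule fired: keep the low-confidence ML prediction.
    return ml_label, "ml-low-confidence"


# Example class probabilities, as a classifier's predict_proba might yield.
example_probs = {"HIGH": 0.35, "MEDIUM": 0.45, "LOW": 0.20}
print(hybrid_priority("Urgent: water pipe burst near the school", example_probs))
```

With this design a confident ML prediction is never overridden, and the rules only escalate uncertain cases to `HIGH`, which errs on the side of caution for emergencies.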

---

## Training Summary

| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |

### Label Distribution (After Normalization)
| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |
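
To make the imbalance in this table concrete, the sketch below computes per-class weights with the heuristic that scikit-learn documents for `class_weight="balanced"`, namely `n_samples / (n_classes * class_count)`. Whether the classifier was fitted with exactly this setting is an assumption; the card only says "balanced weights". The helper name `balanced_class_weights` is hypothetical.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Label counts after normalization, taken from the table above.
raw_counts = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}

for label, weight in balanced_class_weights(raw_counts).items():
    print(f"{label}: {weight:.2f}")
```

On the raw counts, a `MEDIUM` sample would be weighted roughly seven times more heavily than a `HIGH` sample, which shows how skewed the data is before augmentation.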

### Label Distribution (After Augmentation)
| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
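
The 445 / 149 split and the per-class support values in the classification report are consistent with a stratified 25% hold-out of the augmented data. The card does not state how the split was made, so the sketch below is only one way to reproduce those numbers; `stratified_split`, the 25% fraction, the round-up rule, and the seed are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    test_fraction: float = 0.25,  # assumed, not stated in the card
    seed: int = 42,               # arbitrary seed for reproducibility
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), holding out a share of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Round up per class (assumption) so small classes keep test samples.
        n_test = math.ceil(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Label counts after augmentation, taken from the table above.
labels = ["HIGH"] * 200 + ["MEDIUM"] * 194 + ["LOW"] * 200
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

Under these assumptions the test set holds 50 `HIGH`, 49 `MEDIUM`, and 50 `LOW` samples, matching the support column reported for the test set.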

---

## Classification Report

| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |

**Performance is acceptable (≥70% accuracy)** given the small dataset (266 raw samples).
The model performs best on clearly marked "urgent/emergency" cases and slightly worse on borderline `MEDIUM` cases.
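
As a consistency check, the macro averages and overall accuracy can be recomputed from the per-class rows of the report. This is a small sketch using only the rounded values in the table, so the reconstruction is approximate; the per-class correct counts are estimated as `recall * support`, and the helper name `macro_average` is hypothetical.

```python
# Per-class values copied from the classification report above.
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(rows: dict, metric: str) -> float:
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in rows.values()) / len(rows)


macro = {
    metric: round(macro_average(report, metric), 2)
    for metric in ("precision", "recall", "f1")
}

# Estimate correctly classified samples per class from recall and support.
correct = sum(round(row["recall"] * row["support"]) for row in report.values())
total = sum(row["support"] for row in report.values())
accuracy = correct / total

print(macro)
print(f"accuracy ~ {accuracy:.3f} ({correct}/{total})")
```

The recomputed values (macro precision 0.73, macro recall 0.73, macro F1 0.72, accuracy about 0.725) agree with the figures reported in this card.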

---

## Inference (Usage)

### Using the model directly (ML only or Hybrid)
```python
from huggingface_hub import hf_hub_download
import joblib
from priority_det import Embedder, predict_priority

# Download the model
model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")

# Load the classifier
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

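# Predict. The Nepali input means:
# "The water supply is cut off. An immediate fix is needed."
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
# use_hybrid=True selects hybrid (ML + rules) inference rather than ML only
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)
```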