---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---
# Priority Classification Model (Nepali + English Hybrid)
## Model Overview
This model classifies citizen complaints and service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
It accepts **both Nepali and English** input and uses a **hybrid ML + rule-based approach** to stay robust when training data is scarce.

---
## Model Architecture
| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
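
The hybrid step in the table can be pictured roughly as follows. This is a minimal sketch, not the model's actual logic: the function name `hybrid_priority`, the keyword tuple, and the `min_confidence` threshold of 0.6 are all assumptions made for illustration, since the card does not publish the real rules or threshold.

```python
from typing import Dict, Tuple

# Hypothetical urgency keywords; the model's real keyword list is not shown in this card.
URGENT_KEYWORDS: Tuple[str, ...] = ("urgent", "emergency", "immediately", "तत्काल")


def hybrid_priority(
    text: str,
    probabilities: Dict[str, float],
    min_confidence: float = 0.6,  # assumed threshold, chosen only for this sketch
) -> str:
    """Return the ML label unless it is low-confidence and an urgency keyword is present."""
    # The ML prediction is the class with the highest probability.
    ml_label = max(probabilities, key=probabilities.get)
    ml_confidence = probabilities[ml_label]

    # Rule-based check: does the text contain any urgency keyword?
    lowered = text.lower()
    has_urgent_term = any(keyword in lowered for keyword in URGENT_KEYWORDS)

    # Escalate to HIGH only when the classifier is unsure and a rule fires.
    if ml_confidence < min_confidence and has_urgent_term:
        return "HIGH"
    return ml_label


print(hybrid_priority(
    "Road is flooded, urgent help needed",
    {"HIGH": 0.40, "MEDIUM": 0.45, "LOW": 0.15},
))
```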
---
## Training Summary
| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |
### Label Distribution (After Normalization)
| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |
### Label Distribution (After Augmentation)
| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
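
To see why augmentation matters for a classifier with balanced weights, the sketch below applies the formula scikit-learn documents for `class_weight="balanced"`, `n_samples / (n_classes * class_count)`, to the two label distributions above. The helper name `balanced_class_weights` is hypothetical, and the weights are computed on the full label counts for illustration only; the per-class counts of the actual training split are not given in this card.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Label counts copied from the two tables above.
before = balanced_class_weights({"HIGH": 203, "MEDIUM": 29, "LOW": 34})
after = balanced_class_weights({"HIGH": 200, "MEDIUM": 194, "LOW": 200})

for name, weights in (("before augmentation", before), ("after augmentation", after)):
    formatted = ", ".join(f"{label}={weight:.2f}" for label, weight in weights.items())
    print(f"{name}: {formatted}")
```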
---
## Classification Report
| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |
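
The macro-averaged figures reported for this model (precision 0.73, recall 0.73, F1 0.72) are the unweighted means of the per-class rows above. A small check, with the values copied from the table and a hypothetical helper `macro_average`:

```python
from typing import Dict

# Per-class scores copied from the classification report above.
report: Dict[str, Dict[str, float]] = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71},
}


def macro_average(scores: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of one metric across all classes."""
    return sum(row[metric] for row in scores.values()) / len(scores)


for metric in ("precision", "recall", "f1"):
    print(f"macro {metric}: {macro_average(report, metric):.2f}")
```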

**Performance is acceptable (≥70% accuracy)** given the small dataset (266 raw samples).
The model performs best on clearly marked “urgent/emergency” cases and slightly worse on borderline MEDIUM cases.

---
## Inference (Usage)
### Using the model directly (ML only or Hybrid)
```python
from huggingface_hub import hf_hub_download
import joblib

# priority_det is assumed to be the helper module shipped with this repository
from priority_det import Embedder, predict_priority

# Download the model
model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")

# Load the classifier
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

# Predict (the Nepali text reads: "Water supply is cut off. An immediate fix is needed.")
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
# use_hybrid=True selects the hybrid (ML + rules) mode
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)
```