---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---

# Priority Classification Model (Nepali + English Hybrid)

## Model Overview
This model classifies citizen complaints and service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
It accepts **both Nepali and English** input and uses a **hybrid ML + rule-based approach**, so that keyword rules can back up the classifier when training data is scarce.

---

## Model Architecture

| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |

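The hybrid step in the table above can be sketched as follows. This is a minimal illustration, not the project's actual `predict_priority` logic: the keyword list, the `0.6` confidence threshold, and the names `hybrid_predict` and `Prediction` are all assumptions made for the example. It takes class probabilities (as a classifier's `predict_proba` would produce) and escalates to `HIGH` only when the classifier is unsure and an urgency term is present.

```python
from dataclasses import dataclass

# Hypothetical urgency keywords; the model's real keyword list is not given in this README.
HIGH_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल", "तुरुन्त")

# Hypothetical confidence threshold below which the rule layer may override the classifier.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Prediction:
    label: str
    confidence: float
    source: str  # "ml" or "rule"


def hybrid_predict(text: str, probabilities: dict[str, float]) -> Prediction:
    """Combine classifier probabilities with a keyword fallback (illustrative only)."""
    ml_label = max(probabilities, key=probabilities.get)
    ml_confidence = probabilities[ml_label]

    lowered = text.lower()
    has_urgent_keyword = any(keyword in lowered for keyword in HIGH_KEYWORDS)

    # Escalate to HIGH only when the classifier is unsure and an urgency term is present.
    if has_urgent_keyword and ml_label != "HIGH" and ml_confidence < CONFIDENCE_THRESHOLD:
        return Prediction("HIGH", ml_confidence, "rule")
    return Prediction(ml_label, ml_confidence, "ml")


# Made-up probabilities for demonstration; a real run would use the classifier's output.
example = hybrid_predict(
    "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।",
    {"HIGH": 0.40, "MEDIUM": 0.45, "LOW": 0.15},
)
print(example)
```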
---

## Training Summary

| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |

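The 445 / 149 split is consistent with a stratified 75/25 split of the 594 augmented samples, although this README does not state how the split was made. The sketch below uses only the standard library and a hypothetical `stratified_split` helper (the 25% test fraction and the seed are assumptions) to reproduce those sizes from the per-class counts reported after augmentation (200 / 194 / 200).

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_indices, test_indices) with a similar class mix in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round up so that every class contributes at least its share to the test set.
        n_test = math.ceil(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Per-class counts after augmentation, as reported in this README.
labels = ["HIGH"] * 200 + ["MEDIUM"] * 194 + ["LOW"] * 200
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 445 149
```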
### Label Distribution (After Normalization)
| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |

### Label Distribution (After Augmentation)
| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |

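The two tables show why class balance matters here. Assuming the classifier's "balanced weights" follow scikit-learn's `class_weight="balanced"` heuristic, `n_samples / (n_classes * class_count)`, the sketch below computes the weights for both distributions. The helper name `balanced_class_weights` is illustrative, and in practice the weights would be derived from the training split rather than from the full set.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Label counts copied from the two distribution tables above.
raw_counts = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}
augmented_counts = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}

for name, counts in (("raw", raw_counts), ("augmented", augmented_counts)):
    weights = balanced_class_weights(counts)
    print(name, {label: round(weight, 2) for label, weight in weights.items()})

# raw {'HIGH': 0.44, 'MEDIUM': 3.06, 'LOW': 2.61}
# augmented {'HIGH': 0.99, 'MEDIUM': 1.02, 'LOW': 0.99}
```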
---

## Classification Report

| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |

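**Test accuracy is 72.5%, above the 70% level treated as acceptable here** given the small dataset (266 raw samples).
The model performs best on complaints with explicit “urgent/emergency” wording and is less reliable on borderline cases. Per class, MEDIUM has the highest F1 (0.76) and HIGH the lowest recall (0.66), meaning roughly one in three HIGH test samples was assigned a lower priority.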
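
As a consistency check, the macro averages and the overall accuracy can be recomputed from the per-class rows of the report. The sketch below does that in plain Python. The per-class correct counts are estimated as recall × support, which is an approximation because the reported recalls are rounded to two decimals.

```python
# Per-class scores and supports copied from the classification report above.
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(metric):
    """Unweighted mean of a metric across the three classes."""
    return sum(scores[metric] for scores in report.values()) / len(report)


for metric in ("precision", "recall", "f1"):
    print(f"macro {metric}: {macro_average(metric):.2f}")

# Recall times support approximates the number of correctly classified samples per class.
correct = sum(round(s["recall"] * s["support"]) for s in report.values())
total = sum(s["support"] for s in report.values())
print(f"accuracy: {correct}/{total} = {correct / total:.3f}")
```

The recomputed values (macro precision 0.73, recall 0.73, F1 0.72, accuracy 108/149 ≈ 0.725) agree with the figures in the metadata and the training summary.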

---

## Inference (Usage)

### Using the model directly (ML only or Hybrid)
```python
from huggingface_hub import hf_hub_download
import joblib
from priority_det import Embedder, predict_priority

# Download the model (replace the placeholder repo_id with the actual repository ID)
model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")

# Load the classifier
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

# Predict (Nepali input: "Water supply is cut off. An immediate solution is needed.")
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)