---
license: apache-2.0
language:
  - ne
  - en
metrics:
  - accuracy
  - f1
  - precision
  - recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
  - hybrid-model
  - logistic-regression
  - sentence-transformers
  - sbert
  - ne-en
  - rule-based
  - text-priority
  - low-resource-nlp
  - multilingual
  - civictech
  - complaint-triage
  - emergency-detection
eval_results:
  - task:
      type: text-classification
      name: Priority Detection (Nepali + English)
    dataset:
      name: priority_clean.csv (custom)
      type: csv
      size: 266 samples
    metrics:
      accuracy: 0.725
      f1_macro: 0.72
      precision_macro: 0.73
      recall_macro: 0.73
      per_class:
        HIGH:
          precision: 0.73
          recall: 0.66
          f1: 0.69
        MEDIUM:
          precision: 0.74
          recall: 0.8
          f1: 0.76
        LOW:
          precision: 0.71
          recall: 0.72
          f1: 0.71
---

Priority Classification Model (Nepali + English Hybrid)

Model Overview

This model classifies citizen complaints and service requests into three priority levels (HIGH, MEDIUM, or LOW) based on the urgency and nature of the text. It accepts both Nepali and English input and combines a machine-learning classifier with a rule-based keyword layer to improve robustness, which matters most when training data is scarce.

Model Architecture

| Component | Description |
| --- | --- |
| Embedder | `sentence-transformers/all-MiniLM-L6-v2` |
| Classifier | Logistic Regression (multiclass, balanced class weights) |
| Rule-based Layer | Keyword-based fallback for urgency terms in Nepali and English |
| Features | SBERT embeddings + priority keyword preservation |
| Hybrid Inference | Combines ML prediction confidence with rules for safer decisions |
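The card does not spell out exactly how the rule layer and the classifier interact. One plausible reading of "keyword-based fallback", sketched below in plain Python, is to trust the classifier when it is confident and to escalate to HIGH only when it is unsure and the text contains an urgency term. The keyword list, the 0.6 threshold, and the names `hybrid_priority` and `has_urgent_keyword` are hypothetical; this is not the code in `priority_det`.

```python
# Minimal sketch of a confidence-gated keyword fallback (hypothetical, not the
# repository's implementation). Assumes the classifier yields {label: probability}.

URGENT_KEYWORDS = (
    "urgent", "emergency",   # English
    "तत्काल", "आपतकालीन",     # Nepali: "immediate", "emergency"
)
CONFIDENCE_THRESHOLD = 0.6   # hypothetical cut-off for trusting the ML output


def has_urgent_keyword(text: str) -> bool:
    """Return True if any urgency keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in URGENT_KEYWORDS)


def hybrid_priority(text: str, probs: dict[str, float]) -> tuple[str, str]:
    """Combine ML probabilities with a keyword fallback.

    Returns (label, source), where source is "ml" or "rule".
    """
    ml_label = max(probs, key=probs.get)
    if probs[ml_label] >= CONFIDENCE_THRESHOLD:
        return ml_label, "ml"      # confident ML prediction is kept as-is
    if has_urgent_keyword(text):
        return "HIGH", "rule"      # unsure ML + urgency term -> escalate to HIGH
    return ml_label, "ml"          # unsure ML, no rule fired


if __name__ == "__main__":
    example_probs = {"HIGH": 0.45, "MEDIUM": 0.40, "LOW": 0.15}
    print(hybrid_priority("पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।", example_probs))
```

Because the rule only fires below the threshold, a confident LOW or MEDIUM prediction is never overridden; a stricter "safety-first" variant would escalate on any keyword hit regardless of confidence.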

Training Summary

| Metric | Value |
| --- | --- |
| Total raw samples | 266 |
| After preprocessing & augmentation | 594 |
| Train/Test split | 445 / 149 |
| Embedding dimension | 384 |
| Classes | `HIGH`, `MEDIUM`, `LOW` |
| Test accuracy | 72.5% |
| Macro F1-score | 0.72 |

Label Distribution (After Normalization)

| Label | Count |
| --- | --- |
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |
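The raw labels are heavily skewed toward HIGH. Assuming "balanced weights" means scikit-learn's `class_weight="balanced"` (the metadata lists scikit-learn as the library), each class is weighted by `n_samples / (n_classes * class_count)`. The stdlib sketch below applies that formula to these raw counts and, for comparison, to the augmented counts reported in the next table; the helper name `balanced_weights` is hypothetical.

```python
# Sketch: scikit-learn's "balanced" class-weight formula, reimplemented in plain
# Python for illustration (assumes the classifier used class_weight="balanced").

RAW_COUNTS = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}          # after normalization
AUGMENTED_COUNTS = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}  # after augmentation


def balanced_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


for name, counts in (("raw", RAW_COUNTS), ("augmented", AUGMENTED_COUNTS)):
    weights = balanced_weights(counts)
    print(name, {label: round(w, 2) for label, w in weights.items()})
# raw {'HIGH': 0.44, 'MEDIUM': 3.06, 'LOW': 2.61}
# augmented {'HIGH': 0.99, 'MEDIUM': 1.02, 'LOW': 0.99}
```

On the raw data a MEDIUM example would count seven times as much as a HIGH one (203 / 29); on the augmented data the weights are all close to 1, so balancing has little left to correct.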

Label Distribution (After Augmentation)

| Label | Count |
| --- | --- |
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
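The 445 / 149 split and the per-class test supports in the classification report (50 / 49 / 50) are consistent with a stratified split of these augmented counts using `test_size=0.25`. The card does not state the parameter, so 0.25 is inferred. scikit-learn rounds a float `test_size` up, so `ceil(0.25 * 594) = 149`. The sketch below reproduces the per-class numbers with a largest-remainder allocation, which stands in for, and is not identical to, scikit-learn's internal stratification; `split_sizes` and `stratified_test_counts` are hypothetical helpers.

```python
import math

# Augmented label counts, copied from the table above.
CLASS_COUNTS = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}
TEST_SIZE = 0.25  # assumption: inferred from the reported split, not stated in the card


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test); like scikit-learn, round a float test_size up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def stratified_test_counts(counts: dict[str, int], n_test: int) -> dict[str, int]:
    """Spread n_test over classes in proportion to their counts (largest remainder)."""
    total = sum(counts.values())
    exact = {lab: n_test * c / total for lab, c in counts.items()}
    alloc = {lab: math.floor(x) for lab, x in exact.items()}
    leftover = n_test - sum(alloc.values())
    # Hand the leftover slots to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda lab: exact[lab] - alloc[lab], reverse=True)
    for lab in by_remainder[:leftover]:
        alloc[lab] += 1
    return alloc


n_train, n_test = split_sizes(sum(CLASS_COUNTS.values()), TEST_SIZE)
print(n_train, n_test)                               # 445 149
print(stratified_test_counts(CLASS_COUNTS, n_test))  # {'HIGH': 50, 'MEDIUM': 49, 'LOW': 50}
```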

Classification Report

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| Overall accuracy | | | 0.725 | 149 |

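Performance is acceptable (≥70% accuracy) given the dataset size. The model performs best on complaints that are clearly marked “urgent” or “emergency”. On the test split, MEDIUM has the highest F1 (0.76) and HIGH the lowest (0.69), driven by a HIGH recall of 0.66: roughly one in three HIGH-priority test complaints is assigned a lower priority.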
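Because recall is correct predictions divided by support, the headline numbers can be cross-checked from the per-class report: the recovered correct counts give 108 / 149 ≈ 0.725 accuracy, and the unweighted means of the per-class scores reproduce the macro averages in the metadata (F1 0.72, precision 0.73, recall 0.73). The per-class values are rounded to two decimals, so the recovered counts rely on rounding back to whole numbers; `REPORT` and `macro` are illustrative names.

```python
# Per-class numbers copied from the classification report above.
REPORT = {
    "HIGH":   {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW":    {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro(metric: str) -> float:
    """Unweighted mean of a per-class metric (macro average)."""
    return sum(row[metric] for row in REPORT.values()) / len(REPORT)


# recall = correct / support, so the number of correct predictions per class is recoverable.
correct = {label: round(row["recall"] * row["support"]) for label, row in REPORT.items()}
total = sum(row["support"] for row in REPORT.values())
accuracy = sum(correct.values()) / total

print(correct)             # {'HIGH': 33, 'MEDIUM': 39, 'LOW': 36}
print(round(accuracy, 3))  # 0.725
print(round(macro("f1"), 2), round(macro("precision"), 2), round(macro("recall"), 2))  # 0.72 0.73 0.73
```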

Inference (Usage)

Using the model directly (ML only or Hybrid)

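```python
from huggingface_hub import hf_hub_download
import joblib

# priority_det is assumed to be the helper module distributed with this model;
# it must be importable (e.g. placed in your working directory).
from priority_det import Embedder, predict_priority

# Download the trained classifier bundle
# (replace the placeholder repo_id with this model's actual Hub repo id).
model_path = hf_hub_download(
    repo_id="your-username/priority-classifier",
    filename="classifier.joblib",
)

# Load the bundle: the fitted classifier and its label mapping
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder (all-MiniLM-L6-v2, per the architecture table)
embedder = Embedder()

# Predict. Nepali input: "The water supply is cut off. An immediate solution is needed."
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
hybrid_result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)    # ML + rules
ml_only_result = predict_priority(text, embedder, clf, label_map, use_hybrid=False)  # ML only
print(hybrid_result)
print(ml_only_result)
```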