Sambodhan MuRIL Nepali Grievance Classification Model (Fold 1)
Model Description
This is a fine-tuned MuRIL (Multilingual Representations for Indian Languages) model for Nepali grievance classification. This model classifies citizen grievances into 5 department categories.
This checkpoint is Fold 1 of a 5-fold cross-validation run and was selected as the best-performing fold. Folds 1 through 4 tie on every reported metric.
Model Details
- Base Model: google/muril-base-cased
- Language: Nepali (ne)
- Task: Multi-class Text Classification
- Classes: 5 departments
- Training Strategy: 5-Fold Cross-Validation with Data Augmentation
Performance Metrics (Fold 1)
| Metric | Score |
|---|---|
| Accuracy | 1.0000 (100.00%) |
| F1 Score | 1.0000 |
| Precision | 1.0000 |
| Recall | 1.0000 |
Cross-Validation Summary
| Fold | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|
| Fold 0 | 0.9949 | 0.9949 | 0.9950 | 0.9949 |
| Fold 1 ✅ | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9990 | 0.9990 | 0.9990 | 0.9990 |
| Std Dev | ±0.0023 | ±0.0023 | ±0.0022 | ±0.0023 |
✅ = Best performing fold (this model)
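The Mean and Std Dev rows can be reproduced from the per-fold scores. The snippet below is a minimal check using the accuracy and precision columns from the table. It assumes the reported spread is the sample standard deviation (n - 1 in the denominator), which is the variant that matches the published figures.

```python
from statistics import mean, stdev

# Per-fold scores copied from the cross-validation table (Fold 0 to Fold 4).
accuracy = [0.9949, 1.0000, 1.0000, 1.0000, 1.0000]
precision = [0.9950, 1.0000, 1.0000, 1.0000, 1.0000]

# statistics.stdev is the sample standard deviation (divides by n - 1).
accuracy_mean = round(mean(accuracy), 4)
accuracy_std = round(stdev(accuracy), 4)
precision_mean = round(mean(precision), 4)
precision_std = round(stdev(precision), 4)

print(f"Accuracy:  {accuracy_mean:.4f} ± {accuracy_std:.4f}")
print(f"Precision: {precision_mean:.4f} ± {precision_std:.4f}")
```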
Training Configuration
- Max Length: 128 tokens
- Batch Size: 16
- Learning Rate: 2e-05
- Epochs: 3
- Warmup Steps: 500
- Training Samples per Fold: ~2125 (augmented)
- Validation Samples per Fold: ~198 (original)
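As a rough sanity check on these settings, the number of optimizer steps can be estimated from the sample count, batch size, and epoch count. The sketch below assumes a single device, no gradient accumulation, and a linear warmup schedule. None of these are stated in this card, so treat the result as an estimate rather than a description of the actual run.

```python
import math

# Values taken from the training configuration above.
train_samples = 2125   # approximate, augmented
batch_size = 16
epochs = 3
warmup_steps = 500
peak_lr = 2e-5

# Assumption: one device and no gradient accumulation,
# so one batch corresponds to one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Assumption: linear warmup, so the learning rate after `total_steps`
# is peak_lr * total_steps / warmup_steps, capped at peak_lr.
warmup_fraction = min(1.0, total_steps / warmup_steps)
max_lr_reached = peak_lr * warmup_fraction

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps:     {total_steps}")
print(f"Warmup covered:  {warmup_fraction:.1%}")
print(f"Max LR reached:  {max_lr_reached:.3e}")
```

Under these assumptions the run has fewer total steps than warmup steps, so the learning rate would still be ramping up when training ends. If the real setup used a different effective batch size, the numbers change accordingly.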
Department Classes
The model classifies grievances into these 5 departments:
- Infrastructure (पूर्वाधार) - Roads, electricity, water supply
- Education (शिक्षा) - Schools, educational services
- Municipal (नगरपालिका) - Municipality services, land records
- Security (सुरक्षा) - Police, safety issues
- Other (अन्य) - Other government services
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "sandip404/sambodhan-folded-muril-grievance"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Example inference
text = "बिजुली मर्मतका लागि पटक पटक सम्पर्क गर्दा पनि आएको छैन।"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class_id = predictions.argmax().item()
confidence = predictions.max().item()
print(f"Predicted class: {predicted_class_id}")
print(f"Confidence: {confidence:.3f}")
Training Data
- Balanced dataset with data augmentation
- Augmentation techniques: paraphrasing, word reordering, word insertion, noise reduction
- Stratified 5-fold cross-validation
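To illustrate what a stratified split does, here is a minimal pure-Python sketch. The helper `stratified_folds` and the toy labels are hypothetical and are not the project's training code. In practice scikit-learn's `StratifiedKFold` does the same job. Because validation uses original samples only, any augmentation would be applied to the training indices of each split after the split is made.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Return k (train_indices, val_indices) pairs with per-class balance."""
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives a similar share of each class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as validation; the rest form the training set.
    splits = []
    for fold_id in range(k):
        val_idx = sorted(folds[fold_id])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != fold_id for i in fold
        )
        splits.append((train_idx, val_idx))
    return splits


# Toy data: 5 hypothetical classes with 10 examples each.
labels = [class_id for class_id in range(5) for _ in range(10)]
splits = stratified_folds(labels, k=5)

for fold_id, (train_idx, val_idx) in enumerate(splits):
    val_counts = Counter(labels[i] for i in val_idx)
    print(f"Fold {fold_id}: train={len(train_idx)} val={len(val_idx)} "
          f"val class counts={dict(val_counts)}")
```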
Intended Use
This model is designed for:
- Automatic grievance classification in Nepali government systems
- Routing citizen complaints to appropriate departments
- Analysis of grievance patterns
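For routing, one possible pattern is to forward a complaint automatically only when the model is confident and to send everything else to a human. The sketch below works on raw logits such as `outputs.logits[0].tolist()` from the usage example. The `route` helper, the 0.8 threshold, the `manual_review` fallback, and the label order in `ID2LABEL` are all illustrative assumptions. The real id-to-label mapping should be read from `model.config.id2label`.

```python
import math

# Assumption: this label order is hypothetical. Read the actual mapping
# from model.config.id2label before using anything like this.
ID2LABEL = {
    0: "Infrastructure",
    1: "Education",
    2: "Municipal",
    3: "Security",
    4: "Other",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def route(logits, id2label=ID2LABEL, threshold=0.8, fallback="manual_review"):
    """Return (destination, confidence) for one grievance.

    Low-confidence predictions go to `fallback` instead of a department.
    """
    probs = softmax(logits)
    best_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best_id]
    if confidence < threshold:
        return fallback, confidence
    return id2label[best_id], confidence


# Made-up logits: one confident prediction and one ambiguous prediction.
confident = route([5.0, 0.0, 0.0, 0.0, 0.0])
ambiguous = route([0.2, 0.1, 0.0, 0.0, 0.0])
print(confident)
print(ambiguous)
```

A threshold like this also gives a simple way to catch grievances that fall outside the five training categories, although softmax confidence is only a rough signal and the cutoff would need tuning on real traffic.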
Limitations
- Trained specifically for Nepali language
- Performance may vary on grievances outside the 5 training categories
- Best results with clean, well-formatted Nepali text
Citation
If you use this model, please cite:
@misc{sambodhan-muril-nepali-grievance,
author = {Sambodhan Team},
title = {MuRIL Nepali Grievance Classification Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/sandip404/sambodhan-folded-muril-grievance}}
}
Contact
For questions or feedback, please open an issue in the Sambodhan repository.