Sambodhan MuRIL Nepali Grievance Classification Model (Fold 1)

Model Description

This is a fine-tuned MuRIL (Multilingual Representations for Indian Languages) model that classifies Nepali-language citizen grievances into 5 department categories.

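This checkpoint is Fold 1 from 5-fold cross-validation. It is marked as the best-performing fold, although Folds 1 through 4 report identical scores in the Cross-Validation Summary.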

Model Details

Base Model: google/muril-base-cased
Language: Nepali (ne)
Task: Multi-class Text Classification
Classes: 5 departments
Training Strategy: 5-Fold Cross-Validation with Data Augmentation

Performance Metrics (Fold 1)

Metric	Score
Accuracy	1.0000 (100.00%)
F1 Score	1.0000
Precision	1.0000
Recall	1.0000

Cross-Validation Summary

Fold	Accuracy	F1 Score	Precision	Recall
Fold 0	0.9949	0.9949	0.9950	0.9949
Fold 1 ✅	1.0000	1.0000	1.0000	1.0000
Fold 2	1.0000	1.0000	1.0000	1.0000
Fold 3	1.0000	1.0000	1.0000	1.0000
Fold 4	1.0000	1.0000	1.0000	1.0000

Mean	0.9990	0.9990	0.9990	0.9990
Std Dev	±0.0023	±0.0023	±0.0022	±0.0023

✅ = Best performing fold (this model)
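
The Mean and Std Dev rows can be reproduced from the per-fold accuracy values. The sketch below uses the sample standard deviation (n - 1), which matches the reported ±0.0023. It also assumes a validation fold of exactly 198 samples (the card only says "~198") to show how much a single misclassification moves the accuracy.

```python
from statistics import mean, stdev

# Per-fold accuracy copied from the cross-validation table.
fold_accuracy = [0.9949, 1.0000, 1.0000, 1.0000, 1.0000]

cv_mean = mean(fold_accuracy)
cv_std = stdev(fold_accuracy)  # sample standard deviation (divides by n - 1)
print(f"mean = {cv_mean:.4f}, std = {cv_std:.4f}")

# Assumption: exactly 198 validation samples per fold.
val_size = 198
one_error_accuracy = (val_size - 1) / val_size
print(f"accuracy with one misclassified sample = {one_error_accuracy:.4f}")
```

Under that assumption, Fold 0's accuracy of 0.9949 is consistent with one misclassified sample, and each fold's accuracy can only change in steps of about 0.5 percentage points.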

Training Configuration

Max Length: 128 tokens
Batch Size: 16
Learning Rate: 2e-05
Epochs: 3
Warmup Steps: 500
Training Samples per Fold: ~2125 (augmented)
Validation Samples per Fold: ~198 (original)
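
As a rough check on how these settings interact, the number of optimizer steps can be estimated from the values above. This is a sketch under assumptions the card does not state: a single device, no gradient accumulation, the last partial batch kept, and a linear warmup schedule.

```python
import math

# Values from the training configuration; the sample count is approximate.
train_samples = 2125
batch_size = 16
epochs = 3
warmup_steps = 500
peak_lr = 2e-5

# Assumptions: single device, no gradient accumulation, partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# With linear warmup, the learning rate at step s is peak_lr * s / warmup_steps
# until s reaches warmup_steps.
highest_lr_reached = peak_lr * min(total_steps, warmup_steps) / warmup_steps

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"highest learning rate reached: {highest_lr_reached:.3e}")
```

If those assumptions hold, training runs for fewer steps than the 500 warmup steps, so the learning rate would still be ramping up when training ends. Whether this matches the actual run depends on settings not listed here.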

Department Classes

The model classifies grievances into these 5 departments:

Infrastructure (पूर्वाधार) - Roads, electricity, water supply
Education (शिक्षा) - Schools, educational services
Municipal (नगरपालिका) - Municipality services, land records
Security (सुरक्षा) - Police, safety issues
Other (अन्य) - Other government services
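
For display purposes, the department names and descriptions above can be kept in a small lookup table. This is an illustrative sketch: the `Department` class and `display_name` helper are hypothetical, and the table is keyed by English name because the card does not state which class ID maps to which department (check `model.config.id2label` for that).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Department:
    english: str
    nepali: str
    covers: str


# Names and descriptions copied from the list above; no class IDs are assumed.
DEPARTMENTS = {
    d.english: d
    for d in [
        Department("Infrastructure", "पूर्वाधार", "Roads, electricity, water supply"),
        Department("Education", "शिक्षा", "Schools, educational services"),
        Department("Municipal", "नगरपालिका", "Municipality services, land records"),
        Department("Security", "सुरक्षा", "Police, safety issues"),
        Department("Other", "अन्य", "Other government services"),
    ]
}


def display_name(label: str) -> str:
    """Return 'English (Nepali)' for a known label, or the label unchanged."""
    dept = DEPARTMENTS.get(label)
    return f"{dept.english} ({dept.nepali})" if dept else label


print(display_name("Education"))
```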

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "sandip404/sambodhan-folded-muril-grievance"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
text = "बिजुली मर्मतका लागि पटक पटक सम्पर्क गर्दा पनि आएको छैन।"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax().item()
    confidence = predictions.max().item()

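# id2label holds generic names such as LABEL_0 unless the checkpoint config defines department names
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted class: {predicted_class_id} ({predicted_label})")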
print(f"Confidence: {confidence:.3f}")
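
The confidence value printed above is the largest softmax probability over the five class logits. A dependency-free sketch of that step, using made-up logits rather than real model output:

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, -1.1, 3.4, 0.0, -0.5]  # hypothetical values for five classes
probs = softmax(logits)

predicted_class_id = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_class_id]
print(f"Predicted class: {predicted_class_id}, confidence: {confidence:.3f}")
```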

Training Data

Balanced dataset with data augmentation
Augmentation techniques: paraphrasing, word reordering, word insertion, noise reduction
Stratified 5-fold cross-validation
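
One way to arrange stratified folds so that validation data stays unaugmented is to split the original samples first and augment only the training indices of each fold. The sketch below illustrates that ordering on toy labels; it is not the project's actual pipeline, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class counts per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_val_split(folds, val_fold):
    """Use one fold for validation and the remaining folds for training."""
    val = list(folds[val_fold])
    train = [i for f, fold in enumerate(folds) if f != val_fold for i in fold]
    return train, val


# Toy data: 5 classes with 10 samples each (not the real dataset).
labels = [c for c in range(5) for _ in range(10)]
folds = stratified_folds(labels)
train_idx, val_idx = train_val_split(folds, val_fold=1)

# Augmentation would be applied to train_idx only, after the split,
# so that no augmented copy of a validation sample reaches training.
print(len(train_idx), len(val_idx))
```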

Intended Use

This model is designed for:

Automatic grievance classification in Nepali government systems
Routing citizen complaints to appropriate departments
Analysis of grievance patterns
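
For routing, a common pattern is to forward a complaint automatically only when the model is confident and to send everything else to a human. The sketch below assumes a hypothetical `route` helper, a hypothetical `manual_review` queue name, and an arbitrary threshold of 0.8; none of these come from the Sambodhan project.

```python
def route(probabilities, labels, threshold=0.8):
    """Return the predicted department, or 'manual_review' if confidence is low."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < threshold:
        return "manual_review"
    return labels[best]


# Label order here is illustrative; read the real order from model.config.id2label.
labels = ["Infrastructure", "Education", "Municipal", "Security", "Other"]

print(route([0.93, 0.02, 0.02, 0.02, 0.01], labels))
print(route([0.40, 0.35, 0.10, 0.10, 0.05], labels))
```

The first call returns the top department because its probability clears the threshold; the second falls back to manual review.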

Limitations

Trained specifically for Nepali language
Performance may vary on grievances outside the 5 training categories
Best results with clean, well-formatted Nepali text
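
Since the model works best on clean Nepali text, a light preprocessing step before tokenization may help. The card does not describe any preprocessing, so the following is only a sketch: `clean_text` applies Unicode NFC normalization and collapses whitespace, and `devanagari_ratio` estimates how much of the input is in the Devanagari block (U+0900 to U+097F) so that non-Nepali input can be flagged.

```python
import unicodedata


def clean_text(text):
    """Normalize to NFC and collapse runs of whitespace into single spaces."""
    normalized = unicodedata.normalize("NFC", text)
    return " ".join(normalized.split())


def devanagari_ratio(text):
    """Fraction of non-whitespace characters in the Devanagari Unicode block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for ch in chars if "\u0900" <= ch <= "\u097f")
    return in_block / len(chars)


raw = "  बिजुली   मर्मत \n"
cleaned = clean_text(raw)
print(repr(cleaned), devanagari_ratio(cleaned))
```

A low ratio does not prove the text is not Nepali (it may be romanized), but it is a cheap signal that the input falls outside what the model was trained on.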

Citation

If you use this model, please cite:

@misc{sambodhan-muril-nepali-grievance,
  author = {Sambodhan Team},
  title = {MuRIL Nepali Grievance Classification Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sandip404/sambodhan-folded-muril-grievance}}
}

Contact

For questions or feedback, please open an issue in the Sambodhan repository.

Safetensors

Model size: 0.2B params
Tensor type: F32