Sambodhan MuRIL Nepali Grievance Classification Model (Fold 1)

Model Description

This is a fine-tuned MuRIL (Multilingual Representations for Indian Languages) model that classifies Nepali-language citizen grievances into 5 department categories.

This checkpoint is Fold 1 of a 5-fold cross-validation run. It is published as the best performing fold, although Folds 1 through 4 report identical validation scores.

Model Details

  • Base Model: google/muril-base-cased
  • Language: Nepali (ne)
  • Task: Multi-class Text Classification
  • Classes: 5 departments
  • Training Strategy: 5-Fold Cross-Validation with Data Augmentation

Performance Metrics (Fold 1)

| Metric | Score |
|---|---|
| Accuracy | 1.0000 (100.00%) |
| F1 Score | 1.0000 |
| Precision | 1.0000 |
| Recall | 1.0000 |

Cross-Validation Summary

| Fold | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|
| Fold 0 | 0.9949 | 0.9949 | 0.9950 | 0.9949 |
| Fold 1 ✅ | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Fold 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9990 | 0.9990 | 0.9990 | 0.9990 |
| Std Dev | ±0.0023 | ±0.0023 | ±0.0022 | ±0.0023 |

✅ = Best performing fold (this model)
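
The Mean and Std Dev rows can be reproduced from the per-fold scores. The short check below suggests that the reported spread is the sample standard deviation (n - 1 in the denominator); that reading is inferred from the numbers and is not stated in the card.

```python
from statistics import mean, stdev

# Per-fold validation scores copied from the cross-validation table.
fold_scores = {
    "accuracy":  [0.9949, 1.0, 1.0, 1.0, 1.0],
    "f1":        [0.9949, 1.0, 1.0, 1.0, 1.0],
    "precision": [0.9950, 1.0, 1.0, 1.0, 1.0],
    "recall":    [0.9949, 1.0, 1.0, 1.0, 1.0],
}


def summarize(scores):
    """Return (mean, sample standard deviation), each rounded to 4 decimals."""
    return round(mean(scores), 4), round(stdev(scores), 4)


summary = {name: summarize(values) for name, values in fold_scores.items()}

for name, (avg, spread) in summary.items():
    print(f"{name:<9} mean={avg:.4f} std=±{spread:.4f}")
```

With four of five folds at 1.0000, the spread is driven entirely by Fold 0.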

Training Configuration

  • Max Length: 128 tokens
  • Batch Size: 16
  • Learning Rate: 2e-05
  • Epochs: 3
  • Warmup Steps: 500
  • Training Samples per Fold: ~2125 (augmented)
  • Validation Samples per Fold: ~198 (original)
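
These settings imply a short training schedule. As a rough worked example, the arithmetic below estimates the number of optimizer steps and compares it with the warmup length. It assumes one optimizer step per batch (single device, no gradient accumulation) and a linear warmup; neither is stated in the card.

```python
import math

# Values from the training configuration; the sample count is approximate.
train_samples = 2125
batch_size = 16
epochs = 3
warmup_steps = 500
peak_lr = 2e-5

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def linear_warmup_lr(step):
    """Learning rate at a given step under an assumed linear warmup."""
    return peak_lr * min(1.0, step / warmup_steps)


final_lr = linear_warmup_lr(total_steps)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"LR at last step: {final_lr:.3e}")
```

Under these assumptions the run would finish before the 500-step warmup completes, so the learning rate would stay below the configured 2e-05. A different effective batch size or scheduler would change this estimate.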

Department Classes

The model classifies grievances into these 5 departments:

  1. Infrastructure (पूर्वाधार) - Roads, electricity, water supply
  2. Education (शिक्षा) - Schools, educational services
  3. Municipal (नगरपालिका) - Municipality services, land records
  4. Security (सुरक्षा) - Police, safety issues
  5. Other (अन्य) - Other government services
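
For downstream code it can help to keep the department names in a lookup table. The sketch below assumes that class indices 0 to 4 follow the order of the list above; the card does not confirm this ordering, so check it against `model.config.id2label` before relying on it. `ID2LABEL` and `describe` are illustrative names.

```python
# Assumption: indices 0-4 follow the order of the department list.
# Verify against model.config.id2label before using this mapping.
ID2LABEL = {
    0: ("Infrastructure", "पूर्वाधार"),
    1: ("Education", "शिक्षा"),
    2: ("Municipal", "नगरपालिका"),
    3: ("Security", "सुरक्षा"),
    4: ("Other", "अन्य"),
}


def describe(class_id):
    """Return a display string such as 'Education (शिक्षा)' for a class index."""
    english, nepali = ID2LABEL[class_id]
    return f"{english} ({nepali})"


for class_id in ID2LABEL:
    print(class_id, describe(class_id))
```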

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "sandip404/sambodhan-folded-muril-grievance"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
text = "बिजुली मर्मतका लागि पटक पटक सम्पर्क गर्दा पनि आएको छैन।"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax().item()
    confidence = predictions.max().item()

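# Label names come from the checkpoint config and may be generic (e.g. "LABEL_0").
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted class: {predicted_class_id} ({predicted_label})")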
print(f"Confidence: {confidence:.3f}")

Training Data

  • Balanced dataset with data augmentation
  • Augmentation techniques: paraphrasing, word reordering, word insertion, noise reduction
  • Stratified 5-fold cross-validation
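
As an illustration of the splitting strategy, here is a minimal pure-Python sketch of stratified fold assignment. It is not the authors' pipeline: the function name, seed, and toy labels are placeholders. Because the card reports original validation samples and augmented training samples, the sketch assumes augmentation is applied only to the training portion of each fold.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy data: 5 classes with 20 samples each (placeholder, not the real dataset).
labels = [class_id for class_id in range(5) for _ in range(20)]
folds = stratified_folds(labels)

for k, val_idx in enumerate(folds):
    held_out = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Augmentation would be applied to train_idx only, so that
    # validation text stays in its original form.
    print(f"fold {k}: train={len(train_idx)} val={len(val_idx)}")
```

Splitting before augmenting keeps augmented variants of a validation sample out of the training set for that fold.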

Intended Use

This model is designed for:

  • Automatic grievance classification in Nepali government systems
  • Routing citizen complaints to appropriate departments
  • Analysis of grievance patterns
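
For the routing use case, a deployment would likely want a fallback for uncertain predictions. The sketch below takes raw logits (such as `outputs.logits[0].tolist()` from the usage example), converts them to probabilities, and sends low-confidence items to manual review. The threshold value, the `manual_review` destination, and the label order are hypothetical choices for illustration, not part of the released model.

```python
import math

# Hypothetical threshold; tune it on held-out data for a real deployment.
REVIEW_THRESHOLD = 0.80


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, labels, threshold=REVIEW_THRESHOLD):
    """Return (destination, confidence); low-confidence items go to manual review."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    destination = labels[best] if confidence >= threshold else "manual_review"
    return destination, confidence


# Assumed label order; confirm it against model.config.id2label.
labels = ["Infrastructure", "Education", "Municipal", "Security", "Other"]

print(route([4.0, 0.5, 0.2, 0.1, 0.3], labels))  # one clearly dominant class
print(route([1.0, 0.9, 0.8, 0.7, 0.6], labels))  # near-uniform scores
```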

Limitations

  • Trained specifically for Nepali language
  • Performance may vary on grievances outside the 5 training categories
  • Best results with clean, well-formatted Nepali text

Citation

If you use this model, please cite:

@misc{sambodhan-muril-nepali-grievance,
  author = {Sambodhan Team},
  title = {MuRIL Nepali Grievance Classification Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sandip404/sambodhan-folded-muril-grievance}}
}

Contact

For questions or feedback, please open an issue in the Sambodhan repository.
