---
license: apache-2.0
base_model: google/gemma-3-1b-it
tags:
- gemma
- northeast-india
- cultural
- fine-tuned
- assam
- manipur
- nagaland
- mizoram
- tripura
- meghalaya
- arunachal-pradesh
- sikkim
- neodac-mini
language:
- en
pipeline_tag: text-generation
library_name: transformers
widget:
- example_title: Bihu Festival
  text: |
    <start_of_turn>user
    What is Bihu festival?<end_of_turn>
    <start_of_turn>model
- example_title: Hornbill Festival
  text: |
    <start_of_turn>user
    Tell me about Hornbill Festival.<end_of_turn>
    <start_of_turn>model
- example_title: Assamese Cuisine
  text: |
    <start_of_turn>user
    What is traditional Assamese cuisine?<end_of_turn>
    <start_of_turn>model
---
# Neodac-mini: Northeast India Cultural AI Model
**Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.
## 🎯 Model Overview
- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
- **Specialization**: Northeast India Cultural Knowledge
- **Training Data**: 6,205 culturally authentic Q&A pairs
- **Coverage**: All 8 Northeast Indian states
- **Languages**: English (with cultural context)
## 🌟 Key Features
### Cultural Domains Covered
- **Festivals & Celebrations**: Bihu, Hornbill, Losar, Chapchar Kut, etc.
- **Traditional Arts**: Dance forms, music, crafts, weaving
- **Cuisine**: Regional foods, cooking methods, traditional recipes
- **Tribal Heritage**: Community practices, languages, customs
- **Geography**: Cultural significance of places and landmarks
- **Literature**: Folk tales, oral traditions, regional literature
### Model Capabilities
- ✅ Accurate cultural information without hallucinations
- ✅ Detailed responses about regional traditions
- ✅ Authentic representation of tribal communities
- ✅ Contextual understanding of cultural nuances
- ✅ Preservation of cultural knowledge through AI
## 🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/neodac-mini",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
def ask_neodac_mini(question):
    # Gemma chat format: one user turn followed by an open model turn
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    # Move the inputs to the device chosen by device_map="auto"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=300,  # caps generated tokens only, not the prompt
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Ask about Northeast India culture
response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
print(response)
```
## 📊 Training Details
### Dataset
- **Size**: 6,205 cultural Q&A pairs
- **Sources**: Regional cultural databases, wiki content, expert curation
- **Quality**: Manually verified for cultural authenticity
- **Split**: 90% training, 10% validation
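The exact splitting procedure is not documented here. The following sketch shows one common way a 90/10 split of 6,205 items could be produced: a seeded shuffle followed by a cut. The function name, the seed, the placeholder records, and the round-down rule are all assumptions, so the resulting counts may differ slightly from the ones actually used.

```python
import random


def split_dataset(examples, train_percent=90, seed=42):
    """Shuffle a copy of the examples and cut it into train/validation parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * train_percent // 100  # integer math, rounds down
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for the 6,205 Q&A pairs
pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(6205)]
train, val = split_dataset(pairs)
print(len(train), len(val))  # 5584 621
```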
### Training Configuration
- **Hardware**: NVIDIA A40 40GB
- **Epochs**: 5 (increased from an initial 3)
- **Learning Rate**: 2e-5 (optimized for detailed responses)
- **Batch Size**: 8 per device
- **Precision**: bfloat16
- **Max Sequence Length**: 512 tokens
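To put these settings in perspective, the sketch below collects them in a plain dataclass and estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and 5,584 training examples (90% of 6,205, rounded down); none of these are stated in this README, and `TrainConfig` and `total_optimizer_steps` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed above; the class itself is illustrative."""
    epochs: int = 5
    learning_rate: float = 2e-5
    per_device_batch_size: int = 8
    precision: str = "bfloat16"
    max_seq_length: int = 512


def total_optimizer_steps(num_train_examples, cfg, num_devices=1, grad_accum_steps=1):
    """Estimate optimizer steps; device count and accumulation are assumptions."""
    effective_batch = cfg.per_device_batch_size * num_devices * grad_accum_steps
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
num_train = 6205 * 9 // 10  # assumed 90% split, rounded down to 5584
print(total_optimizer_steps(num_train, cfg))  # 698 steps per epoch * 5 epochs = 3490
```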
### Improvements Over Base Model
| Aspect | Base Gemma 3 1B-IT | Neodac-mini |
|--------|-------------------|---------|
| Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
| Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
| Regional Context | ❌ Limited knowledge | ✅ Deep cultural understanding |
| Tribal Information | ❌ Inaccurate/missing | ✅ Authentic representation |
## 🎪 Example Comparisons
### Question: "What is Bihu festival?"
**Base Model Response:**
> Claims Bihu is about Lord Shiva (incorrect)
**Neodac-mini Response:**
> Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
## 🎯 Use Cases
### Cultural Education
- Educational institutions teaching Northeast India studies
- Cultural preservation initiatives
- Tourism and travel information
### Research & Documentation
- Academic research on regional culture
- Cultural anthropology studies
- Digital heritage preservation
### Community Applications
- Cultural chatbots for tourism
- Educational tools for diaspora communities
- Content creation for cultural media
## ⚠️ Limitations
- **Geographic Scope**: Specialized for Northeast India only
- **Language**: Responses in English (cultural terms may be in local languages)
- **Temporal Knowledge**: The training data has a knowledge cutoff, so recent events may be missing
- **Bias Inheritance**: May inherit biases from base model and training data
## 🔬 Evaluation & Performance
The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains.
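The evaluation procedure itself is not specified here. As a hypothetical illustration of a simple factual-coverage spot check, and not the authors' method, the sketch below scores a response by the fraction of expected key terms it mentions, using the Bihu answer from the example comparison. The function name and the chosen term list are assumptions.

```python
def keyword_coverage(response, expected_terms):
    """Return the fraction of expected terms found in the response (case-insensitive)."""
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = response.lower()
    hits = [term for term in expected_terms if term.lower() in text]
    return len(hits) / len(expected_terms)


# Response text taken from the Bihu example in this README
bihu_response = (
    "Bihu is the most important festival of Assam, celebrated by all Assamese "
    "people. There are three Bihus that mark different stages of the "
    "agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or "
    "Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter."
)
# Hypothetical checklist of terms a correct answer should mention
expected = ["Assam", "Rongali", "Kati", "Magh"]
print(keyword_coverage(bihu_response, expected))  # 1.0
```

A check like this only catches missing terms; it cannot detect incorrect statements, so it complements rather than replaces review by people familiar with the culture.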
## 📜 Citation
If you use Neodac-mini in your research or applications, please cite:
```bibtex
@misc{neodac2025,
title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
author={MWire Labs},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/MWirelabs/neodac-mini},
note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
}
```
## 🤝 Contributing
Interested in improving Neodac-mini? We welcome:
- Additional cultural data from Northeast India
- Feedback on cultural accuracy
- Suggestions for new cultural domains
- Community validation of responses
## 📄 License
This model is released under the Apache 2.0 license. Note that the base model, google/gemma-3-1b-it, is distributed under Google's Gemma Terms of Use.
## πŸ™ Acknowledgments
- Google for the Gemma 3 1B-IT base model
- Cultural experts and communities of Northeast India
- Contributors to the cultural dataset
- Hugging Face for the platform and tools
---
*Neodac-mini represents a step forward in culturally aware AI, preserving the rich heritage of Northeast India and making it accessible through technology.*