Instructions to use Sakonii/distilgpt2-nepali with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Sakonii/distilgpt2-nepali with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Sakonii/distilgpt2-nepali")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Sakonii/distilgpt2-nepali")
model = AutoModelForCausalLM.from_pretrained("Sakonii/distilgpt2-nepali")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Sakonii/distilgpt2-nepali with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sakonii/distilgpt2-nepali"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sakonii/distilgpt2-nepali",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Sakonii/distilgpt2-nepali

SGLang

How to use Sakonii/distilgpt2-nepali with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sakonii/distilgpt2-nepali" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sakonii/distilgpt2-nepali",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sakonii/distilgpt2-nepali" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sakonii/distilgpt2-nepali",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Sakonii/distilgpt2-nepali with Docker Model Runner:
```
docker model run hf.co/Sakonii/distilgpt2-nepali
```

distilgpt2-nepali

This model is pre-trained on nepalitext dataset consisting of over 13 million Nepali text sequences using a Causal language modeling (CLM) objective. Our approach trains a Sentence Piece Model (SPM) for text tokenization similar to XLM-ROBERTa and trains distilgpt2 for language modeling.

It achieves the following results on the evaluation set:

Training Loss	Validation Loss	Perplexity
3.3968	3.2705	26.3245

Model description

Refer to original distilgpt2

Intended uses & limitations

This raw model can be used for Nepali text generation and intends to be fine-tuned on Nepali language focused downstream task. The language model being trained on a data with texts grouped to a block size of 512, it handles text sequence up to 512 tokens and may not perform satisfactorily on shorter sequences.

Usage

This model can be used directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> set_seed(42)
>>> generator = pipeline('text-generation', model='Sakonii/distilgpt2-nepali')
>>> generator("नेपालका धेरैजसो चाडपर्वहरूमध्ये,", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'नेपालका धेरैजसो चाडपर्वहरूमध्ये, तिहार र छठपर्व विशेष रूपमा मनाइने भएकाले नेपाली मौलिक पर्व पनि हो । हिन्दू धर्म र संस्कृतिक... काठमाडौं ।'},
 {'generated_text': 'नेपालका धेरैजसो चाडपर्वहरूमध्ये, तिहारको मुख्य दिन आज साँझ अस्ताउँदो सूर्यलाई अर्घ्य दिइएको छ । वैदिक विधि...विस्तृतमा पढ्नुस् काठमाडौं । नेपाल चिकित्सक संघका'},
 {'generated_text': 'नेपालका धेरैजसो चाडपर्वहरूमध्ये, चाडपर्व, विवाह,... नेपाली काँग्रेसका प्रवक्ता विश्वप्रकाश शर्माले पार्टीभित्र आन्तरिक झगडा हुने निश्चित भएको र गुटबन्दीका कारण चुनावमा हार बेहोर्नु'},
 {'generated_text': 'नेपालका धेरैजसो चाडपर्वहरूमध्ये, दशैं नेपालीहरूको मौलिक पर्वका रूपमा मनाउँछन् । नेपालीहरूको दोस्रो महान् पर्व तिहार हो । तिहारले दाजुभाइ तथा दिदीबहिनीहरूको बीचमा प्रगाढ सम्बन्ध स्थापित'},
 {'generated_text': 'नेपालका धेरैजसो चाडपर्वहरूमध्ये, माघे संक्रान्ति र माघे संक्रान्तिमा माघे संक्रान्तिमा मात्र नभएर फागुन महिनाभर नै विशेष महत्व रहने गरेको छ । काठमाडौं ।'}]

Here is how we can use the model to get the features of a given text in PyTorch:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Sakonii/distilgpt2-nepali')
model = AutoModelForCausalLM.from_pretrained('Sakonii/distilgpt2-nepali')

# prepare input
text = "चाहिएको text यता राख्नु होला।"
encoded_input = tokenizer(text, return_tensors='pt')

# forward pass
output = model(**encoded_input)

Training data

This model is trained on nepalitext language modeling dataset which combines the datasets: OSCAR , cc100 and a set of scraped Nepali articles on Wikipedia. As for training the language model, the texts are tokenized using Sentence Piece Model (SPM), a vocabulary size of 24,576 and texts are are grouped to a block of 512 tokens.

Training procedure

The model is trained with the same configuration as the original distilgpt2; but with 512 tokens per instance, 12 instances per batch, and around 188.8K training steps.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 12
eval_batch_size: 12
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss	Perplexity
3.7645	1.0	94395	3.6291	37.6789
3.5857	2.0	188790	3.4442	31.3182
3.505	3.0	283185	3.3749	29.2214
3.4688	4.0	377580	3.3439	28.3294
3.3968	5.0	471975	3.2705	26.3245

Framework versions

Transformers 4.17.0
Pytorch 1.9.1
Datasets 2.0.0
Tokenizers 0.11.6

Downloads last month: 1,417

Safetensors

Model size

88.2M params

Tensor type

F32

Model tree for Sakonii/distilgpt2-nepali

Adapters

2 models

Finetunes

9 models

Dataset used to train Sakonii/distilgpt2-nepali

Paper for Sakonii/distilgpt2-nepali

Unsupervised Cross-lingual Representation Learning at Scale

Paper • 1911.02116 • Published Nov 5, 2019 • 4