Bitig-Nano
Bitig-Nano is a small language model that generates text in the Göktürk (Old Turkic) script. It was trained on a Turkish Wikipedia dataset that was converted into Göktürk letters.
Disclaimer: This project is for fun and hobby purposes only. It is not a professional tool. The model may make mistakes or write things that are not historically accurate. It is a "Nano"-sized model created for educational experiments.
How to Use
You can use this model with the Python transformers library.
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

model_name = "eokayakca/Bitig-Nano"

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Encode a prompt written in Göktürk letters
prompt = "𐱅𐰇𐰼"  # Start with "Tür"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate a continuation of the prompt
output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# The output is in Logical Order (Left-to-Right).
# For correct display, you might need to reverse it to Right-to-Left.
print(f"Logical (LTR): {generated_text}")
print(f"Visual (RTL): {generated_text[::-1]}")
About the Data
The model learned from Turkish Wikipedia articles. The Latin letters were converted to Göktürk letters using a custom converter script.
Technical Note: The text is stored in Logical Order (Left-to-Right) for Unicode compatibility. However, Göktürk script is historically written and read from Right-to-Left. When you view the output, you may need to reverse it visually.
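The converter script itself is not included here. As a rough illustration of the approach, the sketch below substitutes Latin letters with Göktürk codepoints using a hypothetical, deliberately tiny mapping table; a real converter has to cover the full alphabet plus the vowel-harmony rules that choose between consonant variants.

# Sketch of a Latin-to-Göktürk substitution step.
# LATIN_TO_GOKTURK is a hypothetical, incomplete table, NOT the mapping
# used to build the dataset; the real converter must cover the whole
# alphabet and Old Turkic vowel-harmony rules.
LATIN_TO_GOKTURK = {
    "t": "𐱅",
    "ü": "𐰇",
    "r": "𐰼",
}

def to_gokturk(text: str) -> str:
    # Keep the result in Logical Order (Left-to-Right), as in the dataset.
    return "".join(LATIN_TO_GOKTURK.get(ch, ch) for ch in text.lower())

print(to_gokturk("tür"))  # 𐱅𐰇𐰼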
Training Details
- Hardware: Apple M1 Mac (16 GB RAM)
- Training Time: ~20 hours
- Epochs: 3
- Dataset: eokayakca/turkish-wikipedia-gokturk (see the training sketch below)
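For readers who want to reproduce something similar, here is a rough training-loop sketch built on the standard transformers Trainer. Apart from the dataset name and the 3 epochs listed above, everything in it is an assumption: the "text" column name, the block size, and the model and optimizer hyperparameters are placeholders, not the actual Bitig-Nano configuration.

# Illustrative training sketch -- NOT the exact recipe used for Bitig-Nano.
# Assumed: the dataset exposes a "text" column; model size and
# hyperparameters are placeholder values.
from datasets import load_dataset
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

raw = load_dataset("eokayakca/turkish-wikipedia-gokturk", split="train")
tokenizer = PreTrainedTokenizerFast.from_pretrained("eokayakca/Bitig-Nano")

BLOCK_SIZE = 256  # assumed context length

def tokenize(batch):
    return tokenizer(batch["text"])  # "text" column name is assumed

def group_texts(batch):
    # Concatenate all token ids and cut them into fixed-size blocks,
    # so no padding (and no pad token) is needed.
    ids = sum(batch["input_ids"], [])
    total = (len(ids) // BLOCK_SIZE) * BLOCK_SIZE
    chunks = [ids[i:i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]
    return {"input_ids": chunks, "labels": [list(c) for c in chunks]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)

# A "nano"-sized GPT-2 configuration (placeholder values).
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=BLOCK_SIZE,
    n_embd=256,
    n_layer=6,
    n_head=4,
)
model = GPT2LMHeadModel(config)

args = TrainingArguments(
    output_dir="bitig-nano",
    num_train_epochs=3,  # matches the 3 epochs listed above
    per_device_train_batch_size=16,
    learning_rate=5e-4,
)

Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,
    data_collator=default_data_collator,
).train()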
Limitations
- The model is very small (Nano size).
- It may generate nonsense words or grammatically incorrect sentences.
- It is designed for testing and learning, not for serious translation or historical research.