1. Introduction
HYZ-01-Instruct is the instruction-tuned version of the HYZ-01 series developed by NeuroTürk. Building on the base model's strong Turkish language understanding, supervised fine-tuning (SFT) on high-quality instruction-response pairs has improved instruction-following performance across tasks such as conversation, question answering, summarization, and code generation.
The model is built on a multilingual foundation covering 119 languages, followed by Turkish-focused continual pre-training (CPT) and fine-tuning on 372,697 instruction-response pairs. The tokenizer has been extended specifically for Turkish morphological structure and advanced use cases. HYZ-01-0.6B is the lightweight, open-source version of HYZ-01, developed by NeuroTürk for Turkish.
Note: This is the instruction fine-tuned version. For the base model, see: HYZ-01-0.6B-Base
2. Model Summary
Continual Pre-Training and Fine-Tuning
- Base model: 4-stage Turkish continual pre-training (CPT) applied on top of a multilingual foundation.
- Fine-tuning (SFT): 372,697 carefully curated Turkish instruction-response pairs.
- Optimization: LoRA (r=64) + DoRA, bfloat16, flash-attention-2, AdamW.
- Final training loss: 0.6707
Tokenizer Extension
New special tokens were added to the tokenizer for two purposes:
- Language-structure tokens: To represent Turkish morphological features more efficiently.
- Task and structure tokens: To support structural use cases such as chain-of-thought, code blocks, section markers, and language labels.
The following 20 tokens have been added to the vocabulary. With the exception of the role tokens noted below the table, they were not used during training; they are defined as infrastructure for future advanced capabilities:
| Group | Tokens | Future Use |
|---|---|---|
| Brand | `<\|neuroturk\|>` `<\|hyz01\|>` `<\|tr\|>` `<\|en\|>` | Model identity and multilingual control |
| Chain-of-Thought | `<\|think\|>` `<\|/think\|>` `<\|step\|>` `<\|answer\|>` | Step-by-step reasoning (CoT) |
| Dialogue | `<\|system\|>` `<\|user\|>` `<\|assistant\|>` `<\|end\|>` | Multi-turn dialogue and role management |
| Code | `<\|code\|>` `<\|/code\|>` `<\|output\|>` `<\|error\|>` | Structured code generation and debugging |
| Structure | `<\|title\|>` `<\|section\|>` `<\|list\|>` `<\|note\|>` | Long-form and structured text generation (reports, articles, etc.) |
Note: The `<|system|>`, `<|user|>`, and `<|assistant|>` tokens are actively used in the chat template.
3. Model Details
| Feature | Value |
|---|---|
| Total parameters | 595,798,016 (~0.6B) |
| Non-embedding parameters | 440,467,456 (~0.44B) |
| Hidden dimension | 1,024 |
| Number of layers | 28 |
| Attention heads (Q) | 16 |
| Attention heads (KV) | 8 (GQA) |
| Head dimension | 128 |
| Activation | SiLU |
| Normalization | RMSNorm (ε = 1 × 10⁻⁶) |
| Positional encoding | RoPE (θ = 1,000,000) |
| Vocabulary size | 151,690 |
| Training context length | 4,096 tokens |
| Theoretical max context | 32,768 tokens |
| Precision | BFloat16 |
| VRAM usage (fp16) | ~1.11 GB |
| Disk size | ~1.11 GB |
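The figures in the table can be cross-checked with a little arithmetic. The sketch below uses only values from the table and shows that the gap between total and non-embedding parameters equals one vocabulary-by-hidden matrix, which is consistent with a single (shared) embedding matrix, although the card does not state this explicitly. The KV-cache estimate assumes 16-bit cache entries, batch size 1, and no allocator overhead; these are assumptions, not figures from the card.

```python
# Values copied from the Model Details table.
TOTAL_PARAMS = 595_798_016
NON_EMBEDDING_PARAMS = 440_467_456
VOCAB_SIZE = 151_690
HIDDEN_DIM = 1_024
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # 16-bit storage (bfloat16 or fp16)

# One embedding matrix of shape (vocabulary, hidden).
embedding_params = VOCAB_SIZE * HIDDEN_DIM

# Weight memory for 16-bit storage, expressed in GiB (2**30 bytes).
weights_gib = TOTAL_PARAMS * BYTES_PER_VALUE / 2**30


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens (assumed batch size 1)."""
    # Factor 2 covers keys and values; GQA means only the 8 KV heads are cached.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


if __name__ == "__main__":
    print("embedding parameters:", embedding_params)
    print("total - embedding   :", TOTAL_PARAMS - embedding_params)
    print("weights (GiB)       :", round(weights_gib, 2))
    print("KV cache @ 4,096    :", kv_cache_bytes(4_096) / 2**20, "MiB")
    print("KV cache @ 32,768   :", kv_cache_bytes(32_768) / 2**30, "GiB")
```

Under these assumptions the 16-bit weights come to about 1.11 GiB, which suggests the "~1.11 GB" entries in the table are binary gigabytes.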
4. Training Details
| Setting | Value |
|---|---|
| Base model training | Multi-stage Turkish CPT |
| Fine-tuning type | Supervised Fine-Tuning (SFT) |
| Fine-tuning data size | 372,697 instruction-response pairs |
| Optimization | LoRA (r=64) + DoRA, AdamW |
| Precision | BFloat16 |
| Final loss | 0.6707 |
| LR schedule | Cosine with warmup |
| Context length | 4,096 tokens |
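The table names a cosine learning-rate schedule with warmup but does not give the warmup length, peak learning rate, or step count. The following is a minimal sketch of one common form of that schedule (linear warmup, then cosine decay); the function name and all numeric values in the usage lines are hypothetical and are not the settings used to train HYZ-01.

```python
import math


def cosine_with_warmup(step: int, total_steps: int, warmup_steps: int,
                       peak_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the shape of the curve.
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_with_warmup(s, total_steps=1000, warmup_steps=100, peak_lr=2e-4))
```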
5. Usage
Installation
pip install transformers torch accelerate
Quick Start (Chat Format)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuroturk/HYZ-01-0.6B"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    fix_mistral_regex=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # System: "Your name is HYZ-01; you are a Turkish assistant developed by NeuroTürk."
    {"role": "system", "content": "Senin adın HYZ-01, NeuroTürk tarafından geliştirilmiş bir Türkçe asistansın."},
    # User: "What is artificial intelligence?"
    {"role": "user", "content": "Yapay zeka nedir?"},
]

# return_dict=True returns input_ids and attention_mask as a mapping,
# which is what the **inputs unpacking in generate() below requires.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.1,
)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Low VRAM (4-bit Quantization)
# Requires the bitsandbytes package in addition to the packages above.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 4-bit weights with double quantization; matrix multiplications run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(
    "neuroturk/HYZ-01-0.6B",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "neuroturk/HYZ-01-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)
Additional Fine-Tuning with Unsloth
from unsloth import FastLanguageModel

# Load the model in 4-bit with the same context length used for training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="neuroturk/HYZ-01-0.6B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)
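
To get a feel for how small the adapter configured above is, the sketch below counts LoRA parameters for the four attention projections only, using the dimensions from the Model Details table. It assumes standard LoRA (two matrices of rank r per target module, no bias) and projection shapes derived as heads × head dimension. The MLP projections are left out because the card does not state the intermediate size.

```python
# Dimensions from the Model Details table; LoRA settings from the Unsloth snippet.
HIDDEN_DIM = 1_024
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 128
LORA_R = 32
LORA_ALPHA = 64

q_width = NUM_Q_HEADS * HEAD_DIM    # 2,048 (wider than the hidden dimension)
kv_width = NUM_KV_HEADS * HEAD_DIM  # 1,024 (GQA: half as many KV heads)

# (in_features, out_features) per attention projection, assumed from the table.
attention_shapes = {
    "q_proj": (HIDDEN_DIM, q_width),
    "k_proj": (HIDDEN_DIM, kv_width),
    "v_proj": (HIDDEN_DIM, kv_width),
    "o_proj": (q_width, HIDDEN_DIM),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in attention_shapes.values())
attention_lora_params = per_layer * NUM_LAYERS

# Standard LoRA scales the low-rank update by alpha / r.
scaling = LORA_ALPHA / LORA_R

if __name__ == "__main__":
    print("LoRA parameters per layer (attention):", per_layer)
    print("LoRA parameters, all layers (attention):", attention_lora_params)
    print("update scaling factor:", scaling)
```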
6. Chat Template
{% for message in messages %}
{% if message['role'] == 'system' %}
<|system|>
{{ message['content'] }}<|endoftext|>
{% elif message['role'] == 'user' %}
<|user|>
{{ message['content'] }}<|endoftext|>
{% elif message['role'] == 'assistant' %}
<|assistant|>
{{ message['content'] }}<|endoftext|>
{% endif %}
{% endfor %}
{% if add_generation_prompt %}<|assistant|>
{% endif %}
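
For readers who want to see the resulting prompt string without loading the tokenizer, the template above can be approximated in plain Python. This is a sketch, not the shipped implementation: the helper name `render_chat` is hypothetical, and the exact newline placement is an assumption, since it depends on how whitespace is handled in the template stored with the tokenizer.

```python
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}
END_OF_TURN = "<|endoftext|>"


def render_chat(messages, add_generation_prompt=True):
    """Approximate the chat template: role tag, newline, content, end-of-turn token."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is None:
            # The template has no branch for other roles, so they produce no output.
            continue
        parts.append(f"{tag}\n{message['content']}{END_OF_TURN}\n")
    if add_generation_prompt:
        # An open assistant turn tells the model to start its reply.
        parts.append(f"{ROLE_TAGS['assistant']}\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "Yapay zeka nedir?"}]))
```

In practice `tokenizer.apply_chat_template` should be preferred, since it uses the template exactly as distributed with the model.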
7. Evaluation Results
All evaluations were conducted using lm-evaluation-harness.
| Task | Category | Setting | Score |
|---|---|---|---|
| TurBLiMP (ditransitive) | Grammar | 0-shot | 89.10% |
| TurBLiMP (transitive) | Grammar | 0-shot | 86.40% |
| XCOPA TR | Causality | 0-shot | 56.80% |
| XNLI TR | Natural language inference | 0-shot | 36.59% |
| Belebele TR | Reading comprehension | 0-shot | 40.33% |
| Global MMLU TR | General knowledge | 5-shot | 33.08% |
| TurkishMMLU | Turkish MMLU (9 subjects) | 5-shot | 27.44% |
| XQuAD TR | Question answering (EM / F1) | 1-shot | 16.00% / 29.16% |
| TokSuite TR | Morphology | 0-shot | — |
| MGSM TR | Mathematics | 8-shot | — |
Note: XQuAD TR was evaluated in generative question-answering format. The Exact Match (EM) score appears low due to strict string matching requirements; the F1 score better reflects the model's actual performance.
Note: TokSuite TR and MGSM TR evaluations are ongoing; results will be added upon completion.
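The gap between EM and F1 on XQuAD TR is easier to see with a toy example. The sketch below implements simplified, SQuAD-style versions of both metrics; it omits the text normalization that evaluation harnesses usually apply, so it is an illustration of the idea rather than the code used to produce the scores above. The example strings are made up.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the two strings are identical after trimming whitespace."""
    return float(prediction.strip() == reference.strip())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (whitespace tokens)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up example: the answer is right but carries one extra word.
    prediction, reference = "Ankara şehri", "Ankara"
    print("EM:", exact_match(prediction, reference))
    print("F1:", round(token_f1(prediction, reference), 4))
```

A generative model that answers correctly but adds a word or a suffix scores zero on EM while still earning partial credit on F1, which is the effect the note describes.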
These benchmarks do not directly measure everyday conversation, text summarization, code generation, or open-ended question answering; on such tasks the model may perform somewhat better than the scores above suggest.
8. Limitations
- Although the model is successful at instruction following, it may occasionally produce incorrect or inconsistent outputs.
- Complex multi-step reasoning may be limited with 0.6B parameters.
- Biases present in the training data may be reflected in outputs.
- Performance drops significantly in languages other than Turkish.
- Human verification of outputs is recommended for critical applications.
9. Citation
@misc{neuroturk2026hyz01,
author = {NeuroTürk},
title = {HYZ-01-0.6B: A Lightweight Turkish Instruction Model},
year = 2026,
}
