Instructions for using HuiyuWang/dpo-qwen-cot-merged with libraries, inference providers, notebooks, and local apps.

Libraries

How to use HuiyuWang/dpo-qwen-cot-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuiyuWang/dpo-qwen-cot-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated reply is visible outside a REPL or notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuiyuWang/dpo-qwen-cot-merged")
model = AutoModelForCausalLM.from_pretrained("HuiyuWang/dpo-qwen-cot-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use HuiyuWang/dpo-qwen-cot-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HuiyuWang/dpo-qwen-cot-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuiyuWang/dpo-qwen-cot-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
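
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000; the helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server from the `vllm serve` command above runs on localhost:8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "HuiyuWang/dpo-qwen-cot-merged"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply as a string. For the SGLang server, the same sketch applies with port 30000 in `base_url`.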

Use Docker

docker model run hf.co/HuiyuWang/dpo-qwen-cot-merged

SGLang

How to use HuiyuWang/dpo-qwen-cot-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HuiyuWang/dpo-qwen-cot-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuiyuWang/dpo-qwen-cot-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "HuiyuWang/dpo-qwen-cot-merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuiyuWang/dpo-qwen-cot-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use HuiyuWang/dpo-qwen-cot-merged with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HuiyuWang/dpo-qwen-cot-merged to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HuiyuWang/dpo-qwen-cot-merged to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for HuiyuWang/dpo-qwen-cot-merged to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell, not in Python):
#   pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="HuiyuWang/dpo-qwen-cot-merged",
    max_seq_length=2048,
)

Docker Model Runner

How to use HuiyuWang/dpo-qwen-cot-merged with Docker Model Runner:
```
docker model run hf.co/HuiyuWang/dpo-qwen-cot-merged
```

dpo-qwen-cot-merged

This repository provides a multi-stage fine-tuned version of Qwen3-4B-Instruct-2507.

The training pipeline consists of:

Supervised Fine-Tuning (SFT)
Stage-2 Hard SFT refinement
Direct Preference Optimization (DPO)

The LoRA adapters have been merged into the base model. This repository contains the final merged, unquantized weights (BF16), not separate adapter files.

Training Pipeline

Stage 1 — Supervised Fine-Tuning (SFT)

Base model: Qwen/Qwen3-4B-Instruct-2507
Dataset: u-10bei/structured_data_with_cot_dataset_512_v5

Configuration:

Method: QLoRA (4-bit, Unsloth)
LoRA: r=64, alpha=128
Max sequence length: 512
Epochs: 2
Learning rate: 1e-4
Batch size: 2
Gradient accumulation: 8
Warmup ratio: 0.05
Weight decay: 0.0
Seed: 3407
CoT masking: Enabled (loss applied only to final outputs)
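
CoT masking, as described in the last setting, can be illustrated with a small sketch. It assumes the position where the final output starts is already known (`answer_start` is a hypothetical parameter) and uses the label value -100, which PyTorch's cross-entropy loss ignores by default. The token ids are made up, and the actual training code may locate the boundary differently.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_cot_labels(input_ids, answer_start):
    """Return labels where every token before `answer_start` is excluded from the loss."""
    if not 0 <= answer_start <= len(input_ids):
        raise ValueError("answer_start must lie within the sequence")
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])


# Hypothetical token ids: 3 prompt tokens, 4 reasoning tokens, 2 final-output tokens.
input_ids = [11, 12, 13, 21, 22, 23, 24, 31, 32]
labels = mask_cot_labels(input_ids, answer_start=7)
print(labels)  # [-100, -100, -100, -100, -100, -100, -100, 31, 32]
```

Only the last two positions contribute to the loss, so the model is trained on the final output while the prompt and reasoning tokens serve as context.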

Stage 2 — Hard Data Refinement

Dataset: daichira/structured-hard-sft-4k

Configuration:

Epochs: 1
Learning rate: 3e-5
Same LoRA configuration as Stage 1

This stage improves robustness on difficult structured transformation tasks.

Stage 3 — Direct Preference Optimization (DPO)

Dataset: u-10bei/dpo-dataset-qwen-cot

Configuration:

Method: DPO via TRL + Unsloth
LoRA: r=8, alpha=16
Learning rate: 1e-7
Beta: 0.1
Max sequence length: 1024
Max prompt length: 512
Epochs: 1
Optimizer: adamw_8bit
Batch size: 2
Gradient accumulation: 4
Warmup ratio: 0.1
Weight decay: 0.01
Seed: 42

The objective is to align the model toward preferred Chain-of-Thought reasoning patterns using (prompt, chosen, rejected) preference data.
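
For reference, the per-pair DPO objective can be sketched in a few lines. This is the standard DPO loss written in plain Python, not the TRL implementation used for training; the log-probability values are invented for illustration, and `beta=0.1` matches the configuration above.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is 0 and the loss is ln 2.
baseline = dpo_loss(-12.0, -13.0, -12.0, -13.0)

# Policy that raises the chosen response and lowers the rejected one.
improved = dpo_loss(-10.0, -15.0, -12.0, -13.0)

print(baseline, improved)
```

The loss drops below ln 2 as soon as the policy favors the chosen response more strongly than the reference does, which is the behavior the preference data is meant to encourage.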

Merge Status

All LoRA adapters have been merged into the base model.

No PEFT loading is required.
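
Why no adapter loading is needed can be seen from how a LoRA update folds into a weight matrix. The sketch below assumes the standard LoRA scaling of alpha / r (which equals 2 for both the r=64, alpha=128 and the r=8, alpha=16 settings above) and uses a made-up 2x2 layer; it is an illustration of the arithmetic, not the merge script used for this repository.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy layer with a rank-1 adapter; all values are invented for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]    # shape (r, in_features) = (1, 2)
B = [[1.0], [2.0]]   # shape (out_features, r) = (2, 1)
x = [[3.0], [1.0]]   # input column vector

merged = merge_lora(W, A, B, alpha=2, r=1)

# Output of the merged layer versus base layer plus scaled adapter path.
merged_out = matmul(merged, x)
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged_out = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(merged_out, unmerged_out)  # both are [[5.0], [5.0]]
```

Because the merged matrix produces the same output as the base layer plus the adapter path, the merged checkpoint can be loaded with plain `AutoModelForCausalLM` and no PEFT wrapper.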

Intended Use

This model is designed for:

Structured transformation tasks
Chain-of-Thought reasoning
Preference-aligned generation
Academic research experiments
Competition submission

Research Notes

This work explores multi-stage fine-tuning combining:

Structured SFT with CoT masking
Hard data refinement
Preference-based alignment via DPO

The training was performed using the Unsloth library for memory-efficient 4-bit fine-tuning.

License

This model follows the license of the base model:

Qwen/Qwen3-4B-Instruct-2507

Users must comply with the original base model license.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HuiyuWang/dpo-qwen-cot-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

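prompt = "Solve the following problem step by step: ..."
# Move the inputs to the model's device (device_map="auto" may place it on a GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7,
)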

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for HuiyuWang/dpo-qwen-cot-merged

Base model: Qwen/Qwen3-4B-Instruct-2507
Finetuned: this model
