Instructions for using DJLougen/Ornstein-27B-GGUF with libraries and local apps.

Libraries

How to use DJLougen/Ornstein-27B-GGUF with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="DJLougen/Ornstein-27B-GGUF",
    filename="Ornstein-27B-F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ]
)
```

Local Apps

llama.cpp

How to use DJLougen/Ornstein-27B-GGUF with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```
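
Once `llama-server` is running through any of the routes above, its OpenAI-compatible API can also be called from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080` (the base URL that also appears in the Pi configuration in this document); the helper names `build_chat_request` and `ask` are hypothetical and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable on its default local port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "DJLougen/Ornstein-27B-GGUF:Q4_K_M"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(ask("What is 17 * 23? Think step by step."))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything goes over the network.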

Use Docker

```
docker model run hf.co/DJLougen/Ornstein-27B-GGUF:Q4_K_M
```


vLLM

How to use DJLougen/Ornstein-27B-GGUF with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DJLougen/Ornstein-27B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DJLougen/Ornstein-27B-GGUF",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```


Ollama
How to use DJLougen/Ornstein-27B-GGUF with Ollama:
```
ollama run hf.co/DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Unsloth Studio

How to use DJLougen/Ornstein-27B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for DJLougen/Ornstein-27B-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for DJLougen/Ornstein-27B-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for DJLougen/Ornstein-27B-GGUF to start chatting.

Pi

How to use DJLougen/Ornstein-27B-GGUF with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "DJLougen/Ornstein-27B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
```

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use DJLougen/Ornstein-27B-GGUF with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use DJLougen/Ornstein-27B-GGUF with Docker Model Runner:
```
docker model run hf.co/DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Lemonade

How to use DJLougen/Ornstein-27B-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull DJLougen/Ornstein-27B-GGUF:Q4_K_M
```

Run and chat with the model

```
lemonade run user.Ornstein-27B-GGUF-Q4_K_M
```

List all available models

```
lemonade list
```

Ornstein-27B-GGUF

GGUF quantizations of DJLougen/Ornstein-27B — a reasoning-focused fine-tune of Qwen 3.5 27B trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi

What Makes Ornstein Different

Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein prioritizes quality over quantity:

Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
DDM AUC of 0.9705 separating premium from degenerate reasoning with 99.49% sensitivity

The model uses <think>...</think> blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
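
Downstream code usually wants the final answer without the reasoning trace. A minimal sketch of separating the two is shown below. It assumes the `<think>` tags appear literally in the returned text; some runtimes may instead return the reasoning in a separate field. The function name `split_reasoning` is hypothetical.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion string."""
    # Join the contents of every think block, in order of appearance.
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.\nCheck: 4 - 2 = 2.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

If the output was cut off before the closing tag, the pattern does not match and the whole text is returned as the answer, so callers may want to check for a dangling `<think>`.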

Available Quantizations

| Quantization | Size | Use Case |
|---|---|---|
| F16 | 53.8 GB | Unquantized 16-bit weights, no quantization loss |
| Q8_0 | 28.6 GB | Near-lossless, good for high-end consumer GPUs |
| Q6_K | 22.1 GB | High quality |
| Q5_K_M | 19.2 GB | Good balance |
| Q5_K_S | 18.7 GB | Lighter variant |
| Q4_K_M | 16.5 GB | Recommended: strong quality/size tradeoff |
| IQ4_XS | 11.6 GB | Efficient 4-bit |
| Q3_K_L | 14.3 GB | Lighter 3-bit |
| Q3_K_M | 13.3 GB | Mid 3-bit |
| Q3_K_S | 12.1 GB | Light 3-bit |
| Q2_K | 10.7 GB | Minimal footprint |
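
To choose a file for a given machine, one rough heuristic is to take the largest quantization that fits in available memory after reserving some headroom for the KV cache and runtime overhead. The sketch below uses the sizes from the table above. The 2 GB default headroom and the function name `pick_quantization` are assumptions for illustration, and file size is only a proxy for output quality.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "F16": 53.8, "Q8_0": 28.6, "Q6_K": 22.1, "Q5_K_M": 19.2,
    "Q5_K_S": 18.7, "Q4_K_M": 16.5, "IQ4_XS": 11.6, "Q3_K_L": 14.3,
    "Q3_K_M": 13.3, "Q3_K_S": 12.1, "Q2_K": 10.7,
}


def pick_quantization(memory_gb, headroom_gb=2.0):
    """Return the largest quantization that fits, or None if nothing fits.

    headroom_gb is a hypothetical reserve for KV cache and runtime overhead;
    the real requirement depends on context length and backend.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(pick_quantization(24.0))  # Q5_K_M
```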

Quick Start

llama.cpp

```
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-27B-GGUF ornstein-27b-q4_k_m.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m ornstein-27b-q4_k_m.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```

Ollama

```
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./ornstein-27b-q4_k_m.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein -f Modelfile
ollama run ornstein
```

LM Studio

Download the desired quantization from the Files tab
Load it in LM Studio
Set the context length to at least 8192 so long reasoning traces are not truncated

Recommended Settings

| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |
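
As an illustration, these values map directly onto keyword arguments of `create_chat_completion` in llama-cpp-python. The sketch below is not the author's script. It assumes the Q4_K_M filename used in the Quick Start download command, and `n_ctx=16384` is an assumed context size chosen to leave room for the prompt plus 8192 generated tokens.

```python
# Sampling values copied from the Recommended Settings table.
RECOMMENDED_SETTINGS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,
}


def chat(prompt: str) -> str:
    """Load the Q4_K_M file and answer one prompt with the recommended settings."""
    # Imported lazily so the settings dict is usable without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="DJLougen/Ornstein-27B-GGUF",
        # Assumption: filename as written in the Quick Start download command.
        filename="ornstein-27b-q4_k_m.gguf",
        # Assumption: context sized for the prompt plus 8192 generated tokens.
        n_ctx=16384,
    )
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        **RECOMMENDED_SETTINGS,
    )
    return result["choices"][0]["message"]["content"]


# Example call (downloads the model on first use):
# print(chat("Prove that the sum of two even integers is even."))
```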

Training Details

| Parameter | Value |
|---|---|
| Base Model | `unsloth/Qwen3.5-27B` |
| Parameters | 27B |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Framework | Unsloth |
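
To make these numbers concrete, the sketch below works out the effective batch size, the number of optimizer steps in the single epoch, and the shape of the learning-rate curve. It is not the training script. It assumes one device, no sequence packing, a final partial batch that still counts as a step, linear warmup, and cosine decay to zero; the source states only "cosine schedule, 10% warmup".

```python
import math

NUM_EXAMPLES = 1229       # total training traces
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
PEAK_LR = 1e-4
WARMUP_RATIO = 0.10

# Assumption: a single device, so the effective batch is micro batch x accumulation.
EFFECTIVE_BATCH = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS        # 4
TOTAL_STEPS = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)      # 308
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)         # 31


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(EFFECTIVE_BATCH, TOTAL_STEPS, WARMUP_STEPS)  # 4 308 31
```

With only about 308 optimizer updates in total, the run is short, which fits the statement under Limitations that the fine-tune mainly shapes reasoning style.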

Data Quality Metrics

| Metric | Value |
|---|---|
| Total Examples | 1,229 |
| Mean Thinking Depth | ~1,667 words |
| Self-correction Present | 100% of traces |
| Verification Present | 100% of traces |
| Exploration Present | 100% of traces |
| Quality Gate Pass Rate | 100% |

Training Data Profile

Category Mix: Math (1,016), Code (124), Science (45), Logic (44)
Reasoning Depth: Premium traces average ~1,263 words of thinking vs ~281 for degenerate traces
Drift Score Threshold: 1.463 cleanly separates premium from degenerate traces
DDM AUC: 0.9705 | Sensitivity: 99.49% | False Positive Rate: ~5%
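
The counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in this card and derives the share of each category and of premium traces; the variable names are illustrative.

```python
# Counts as stated in this model card.
CATEGORY_COUNTS = {"Math": 1016, "Code": 124, "Science": 45, "Logic": 44}
PREMIUM, DEGENERATE = 799, 430
TOTAL = 1229

# Both breakdowns should account for every training example.
assert sum(CATEGORY_COUNTS.values()) == TOTAL
assert PREMIUM + DEGENERATE == TOTAL

# Percentage share of each category, rounded to one decimal place.
category_share = {
    name: round(100 * count / TOTAL, 1) for name, count in CATEGORY_COUNTS.items()
}
premium_share = round(100 * PREMIUM / TOTAL, 1)

print(category_share)  # {'Math': 82.7, 'Code': 10.1, 'Science': 3.7, 'Logic': 3.6}
print(premium_share)   # 65.0
```

Math makes up roughly four fifths of the training set, which is worth keeping in mind when evaluating the model on code or science tasks.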

Intended Use

Designed for tasks requiring structured, multi-step reasoning:

Mathematics
Logic problems
Code analysis
Scientific problems
Complex question answering

Limitations

Limited training: A single epoch on 1,229 examples means the model retains most base Qwen 3.5 27B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
Language scope: DDM pipeline optimized for English; other languages reflect base model performance
Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts

Citation

```
@misc{ornstein27b,
  author = {DJLougen},
  title = {Ornstein-27B: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 27B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-27B}
}
```

Model tree for DJLougen/Ornstein-27B-GGUF

Base model: Qwen/Qwen3.5-27B
Finetuned: unsloth/Qwen3.5-27B
Adapter: DJLougen/Ornstein-27B
Quantized (3): this model
