Instructions to use leonvanbokhorst/deepseek-r1-overthinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use leonvanbokhorst/deepseek-r1-overthinking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="leonvanbokhorst/deepseek-r1-overthinking")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

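# Load model directly
# AutoModelForCausalLM keeps the language-modeling head that text generation needs
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("leonvanbokhorst/deepseek-r1-overthinking", dtype="auto")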

llama-cpp-python

How to use leonvanbokhorst/deepseek-r1-overthinking with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="leonvanbokhorst/deepseek-r1-overthinking",
	filename="deepseek-r1-overthinking-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
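
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. A minimal sketch of pulling the reply text out of it: `reply_text` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not real model output.

```python
def reply_text(response: dict) -> str:
    """Return the assistant message from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in for the value returned by llm.create_chat_completion(...)
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
}

print(reply_text(sample_response))
```

With a loaded model, the same helper would be applied to the return value of `llm.create_chat_completion(...)`.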

Local Apps

llama.cpp

How to use leonvanbokhorst/deepseek-r1-overthinking with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

Use Docker

docker model run hf.co/leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

vLLM

How to use leonvanbokhorst/deepseek-r1-overthinking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "leonvanbokhorst/deepseek-r1-overthinking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leonvanbokhorst/deepseek-r1-overthinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
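
The same request can be made from Python. This is a sketch using only the standard library, assuming the server started above is listening on `localhost:8000`. `build_chat_request` and `ask` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, question: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, question: str) -> str:
    """Send the request and return the reply text (needs a running server)."""
    request = build_chat_request(base_url, model, question)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "leonvanbokhorst/deepseek-r1-overthinking",
    "What is the capital of France?",
)
print(request.full_url)
```

Only `build_chat_request` runs here, so the block works without a server. Call `ask(...)` with the same three arguments once the server is up.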

Use Docker

docker model run hf.co/leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

SGLang

How to use leonvanbokhorst/deepseek-r1-overthinking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "leonvanbokhorst/deepseek-r1-overthinking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leonvanbokhorst/deepseek-r1-overthinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "leonvanbokhorst/deepseek-r1-overthinking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leonvanbokhorst/deepseek-r1-overthinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use leonvanbokhorst/deepseek-r1-overthinking with Ollama:
```
ollama run hf.co/leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
```

Unsloth Studio

How to use leonvanbokhorst/deepseek-r1-overthinking with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for leonvanbokhorst/deepseek-r1-overthinking to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for leonvanbokhorst/deepseek-r1-overthinking to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for leonvanbokhorst/deepseek-r1-overthinking to start chatting

Docker Model Runner
How to use leonvanbokhorst/deepseek-r1-overthinking with Docker Model Runner:
```
docker model run hf.co/leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M
```

Lemonade

How to use leonvanbokhorst/deepseek-r1-overthinking with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull leonvanbokhorst/deepseek-r1-overthinking:Q4_K_M

Run and chat with the model

lemonade run user.deepseek-r1-overthinking-Q4_K_M

List all available models

lemonade list

Deepseek-R1-Overthinking 🤔

Model Description

This model embodies the principles of "Designing Friction", a manifesto that challenges the prevailing pursuit of frictionless digital experiences. Where most AI systems strive for seamless, immediate responses, this model intentionally introduces resistance into human-AI interactions, creating space for deeper engagement and authentic human connection.

Key Features

Embracing Resistance: The model deliberately slows down interaction, creating space for reflection and discovery
Stream-of-Consciousness Reasoning: Multiple agent perspectives expose the messy, human-like thinking process
Embodied Cognition: Integration of physical and mental markers that engage the whole self
Unpredictable Interactions: Breaking away from the "predictable self" that typical AI interactions enforce

Philosophy & Purpose

This model challenges the conventional wisdom of AI design by:

Resisting Immediacy: Instead of instant gratification, it creates meaningful delays that fuel deeper understanding
Embracing Discomfort: Uncomfortable situations become opportunities for learning and discovery
Creating Human Space: Making room for doubt, vulnerability, and the "non-positive" aspects that make us human
Breaking Predictability: Moving beyond data-driven patterns to embrace the unexpected
Fostering Connection: Using friction as a bridge for authentic human-AI engagement

Intended Use

This model is particularly valuable for:

Educational contexts where deep understanding trumps quick answers
Research scenarios requiring thorough exploration of ideas
Creative problem-solving benefiting from multiple perspectives
Any situation where "slowing down" leads to better outcomes
Contexts where human connection matters more than efficiency

Technical Details

Model Architecture

Base Model: DeepSeek-R1-Distill-Qwen-14B
Quantization: 4-bit (using bitsandbytes)
Context Length: 4096 tokens
Flash Attention 2: Enabled
Precision: bfloat16
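
To give a rough sense of what 4-bit quantization saves, the sketch below estimates weight storage alone. It assumes a nominal 14 billion parameters (read off the base model's name) and an idealized 4 bits per weight. Real quantized files also store scaling factors and may keep some tensors at higher precision, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 14e9  # assumption: nominal size taken from the base model's name

print(f"bfloat16: {weight_memory_gb(NUM_PARAMS, 16):.1f} GB")
print(f"4-bit:    {weight_memory_gb(NUM_PARAMS, 4):.1f} GB")
```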

Training Configuration

Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Rank (r): 16
- Alpha: 32
- Target Modules: Query, Key, Value projections, Output, Gate, Up/Down projections
- Dropout: 0.0
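
To illustrate what these settings mean, the sketch below counts the trainable parameters that standard LoRA adds to a single weight matrix and computes the conventional `alpha / r` scaling. The 4096 x 4096 layer shape is hypothetical and chosen only to make the arithmetic concrete. It is not taken from the model.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the frozen weight is untouched.
    """
    return rank * (d_in + d_out)


RANK = 16   # from the card
ALPHA = 32  # from the card

# Hypothetical square projection, chosen only to make the arithmetic concrete
d_in = d_out = 4096

added = lora_added_params(d_in, d_out, RANK)
full = d_in * d_out
print(f"LoRA parameters: {added:,} ({added / full:.2%} of the full matrix)")
print(f"Update scaling alpha / r = {ALPHA / RANK}")
```
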
Training Process:
- Epochs: 5
- Learning Rate: 2e-4
- Batch Size: 2 (with 4 gradient accumulation steps, i.e. an effective batch size of 8 per device)
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Clipping: 0.5
- Early Stopping: Patience of 3 epochs with 0.005 threshold
- Optimizer: AdamW (8-bit)
- Mixed Precision: bfloat16
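
The batch-size and early-stopping settings can be sketched in plain Python. This is a generic illustration, not the author's training code: it assumes a single device, assumes that "threshold" means the minimum improvement in evaluation loss that counts as progress, and uses made-up loss values.

```python
import math


def effective_batch_size(per_step: int, accumulation_steps: int) -> int:
    """Examples contributing to each optimizer update (single device)."""
    return per_step * accumulation_steps


def should_stop(eval_losses: list[float], patience: int, threshold: float) -> bool:
    """True once `patience` consecutive evaluations fail to beat the best loss by `threshold`."""
    best = math.inf
    stale = 0
    for loss in eval_losses:
        if best - loss > threshold:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


print(effective_batch_size(2, 4))  # 8

# Made-up per-epoch validation losses, for illustration only
losses = [1.20, 1.05, 1.049, 1.048, 1.047]
print(should_stop(losses, patience=3, threshold=0.005))
```

The last three losses each improve on the best value by less than 0.005, so the patience of 3 runs out and the check returns `True`.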

Dataset

The model is fine-tuned on carefully curated examples from the friction-overthinking-v2 dataset that emphasize:

Natural thought progression with intentional friction points
Multi-perspective analysis through different agent roles
Integration of physical and mental markers
Unpredictable and non-linear reasoning patterns
Embrace of uncertainty and exploration

Input Format

<|im_start|>system
You are a human-like AI assistant.
<|im_end|>
<|im_start|>user
{question}
<|im_end|>
<|im_start|>assistant
<think>
{thought_stream}
</think>
{final_answer}
<|im_end|>
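
The template above shows a full exchange, including the model's own output. A minimal sketch of rendering just the prompt, assuming generation should begin right after the opening `<think>` tag. `build_prompt` is a hypothetical helper. Chat-style APIs typically apply the model's chat template themselves, so manual formatting like this is mainly relevant for raw text completion.

```python
SYSTEM_PROMPT = "You are a human-like AI assistant."


def build_prompt(question: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Render the template up to the opening <think> tag, where generation starts."""
    return (
        f"<|im_start|>system\n{system_prompt}\n<|im_end|>\n"
        f"<|im_start|>user\n{question}\n<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )


print(build_prompt("Why do babies cry in different languages?"))
```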

Limitations & Biases

Intentional Slowness: The model deliberately takes longer to respond
Complexity in Simplicity: Even simple queries receive detailed exploration
Productive Discomfort: Users seeking quick answers may feel initial friction
Base Model Inheritance: Carries forward inherent biases from the base model
Digital Constraints: While we aim for embodied interaction, we're still limited by the digital medium
Resource Requirements: The model's size and attention mechanism demand significant computational resources

Example Usage

Input:

Why do babies cry in different languages?

The response will demonstrate:

Thoughtful pauses and self-questioning
Multiple perspective exploration
Physical and mental engagement markers
Embrace of uncertainty
Deep, interconnected reasoning
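
To inspect the thought stream separately from the final answer, the response can be split at the closing `</think>` tag. A sketch, assuming the output follows the input format described in this card. `split_response` is a hypothetical helper and the sample text is hand-written, not real model output.

```python
def split_response(text: str) -> tuple[str, str]:
    """Return (thought_stream, final_answer).

    The opening <think> tag may be absent when it was already part of the prompt,
    so only the closing tag is required. Without it, everything is the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    return head.replace("<think>", "", 1).strip(), tail.strip()


# Hand-written stand-in for a model response, not real output
sample = "<think>\nHmm, do they really?\n</think>\nThat depends on what we mean by language."
thoughts, answer = split_response(sample)
print("Thoughts:", thoughts)
print("Answer:", answer)
```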

About Friction-Based Reasoning

This model represents a fundamental shift in AI interaction design. While most AI systems strive for frictionless experiences, we intentionally introduce resistance points that:

Challenge the "death by convenience" of modern digital interactions
Create space for human messiness and unpredictability
Engage both mind and body in the reasoning process
Value the journey of understanding over quick answers
Foster genuine connection through shared exploration

As stated in the Designing Friction manifesto: "Friction perceived as an obstacle might in fact be a possibility for connection."

Citation

If you use this model in your research, please cite:

@misc{deepseek-r1-overthinking,
  author = {Leon van Bokhorst},
  title = {Deepseek-R1-Overthinking: A Friction-Based Reasoning Model},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/leonvanbokhorst/deepseek-r1-overthinking}}
}

Acknowledgments

This model's design philosophy is deeply inspired by the "Designing Friction" manifesto by Luna Maurer and Roel Wouters, which calls for reintroducing meaningful resistance into our digital interactions.

Downloads last month: 12

Format: GGUF
Model size: 15B params
Architecture: qwen2
Hardware compatibility: 4-bit

Model tree for leonvanbokhorst/deepseek-r1-overthinking

Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
Quantized: unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit