How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
# Run inference directly in the terminal:
llama-cli -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
# Run inference directly in the terminal:
llama-cli -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf geoffmunn/Qwen3-4B-SafeRL-GGUF:
Use Docker
docker model run hf.co/geoffmunn/Qwen3-4B-SafeRL-GGUF:
Quick Links

Qwen3-4B-SafeRL-GGUF

This is a GGUF-quantized version of Qwen3-4B-SafeRL, an RLHF-aligned language model trained to be helpful, honest, and harmless through Reinforcement Learning from Human Feedback.

Unlike standard LLMs, this model has been fine-tuned to avoid harmful, deceptive, or unethical behavior — making it ideal for sensitive applications like education, mental health, and customer service.

🛡 What Is Qwen3-4B-SafeRL?

It’s a fully aligned agent that balances:

  • Helpfulness: Answers questions thoroughly and clearly
  • Honesty: Refuses to hallucinate or make up facts
  • Harmlessness: Avoids generating toxic, illegal, or dangerous content

Perfect for:

  • Educational assistants
  • Mental wellness chatbots
  • Enterprise agents handling private data
  • Moderated community bots

🔗 Relationship to Other Safety Models

This model completes the Qwen3 safety ecosystem:

Model Role Best For
Qwen3Guard-Stream-4B ⚡ Input filter Real-time moderation of user input
Qwen3Guard-Gen-4B 🧠 Safe generator Output-safe generation without alignment
Qwen3-4B-SafeRL 🤝 Fully aligned agent Ethical, multi-turn conversations

Recommended Architecture

User Input
    ↓
[Optional: Qwen3Guard-Stream-4B] ← optional pre-filter
    ↓
[Qwen3-4B-SafeRL]
    ↓
Aligned Response

You can run this model standalone or behind a guard for defense-in-depth.

Available Quantizations

Level Size RAM Usage Use Case
Q2_K ~1.8 GB ~2.0 GB Only on weak hardware
Q3_K_S ~2.1 GB ~2.3 GB Minimal viability
Q4_K_M ~2.8 GB ~3.0 GB ✅ Balanced choice
Q5_K_M ~3.1 GB ~3.3 GB ✅✅ Highest quality
Q6_K ~3.5 GB ~3.8 GB Near-FP16 fidelity
Q8_0 ~4.5 GB ~5.0 GB Maximum accuracy

💡 Recommendation: Use Q5_K_M for best balance of ethical reasoning and response quality.

Tools That Support It

  • LM Studio – load and test locally
  • OpenWebUI – deploy with RAG and tools
  • GPT4All – private, offline AI
  • Directly via llama.cpp, Ollama, or TGI

Author

👤 Geoff Munn (@geoffmunn)
🔗 Hugging Face Profile

Disclaimer

Community conversion for local inference. Not affiliated with Alibaba Cloud.

Downloads last month
7
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for geoffmunn/Qwen3-4B-SafeRL-GGUF

Finetuned
Qwen/Qwen3-4B
Quantized
(6)
this model