Text Generation
GGUF
English
Russian
multilingual
glm4
Mixture of Experts
heretic
uncensored
decensored
abliterated
imatrix
conversational
Instructions to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with llama-cpp-python:
    # !pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF",
        filename="GLM-4.7-Flash-Uncensored-BPW10.0.gguf",
    )

    llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ]
    )

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with llama.cpp:
Install from brew
    brew install llama.cpp
    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
    # Run inference directly in the terminal:
    llama-cli -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Install from WinGet (Windows)
    winget install llama.cpp
    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
    # Run inference directly in the terminal:
    llama-cli -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases
    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
    # Run inference directly in the terminal:
    ./llama-cli -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli
    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Use Docker
docker model run hf.co/ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
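
Once `llama-server` is running, its OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical, the default port 8080 is taken from the Pi and Hermes configuration on this page, and the response shape is assumed to follow the OpenAI chat completions schema.

```python
import json
import urllib.request

MODEL_ID = "ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion body.

    Assumption: the server answers with the usual choices[0].message.content layout.
    """
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return extract_reply(json.load(resp))
```

With the server started as shown above, `ask("What is the capital of France?")` should return the reply text. For the vLLM example on this page, pass `base_url="http://localhost:8000/v1"` instead.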
- LM Studio
- Jan
- vLLM
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm
    # Start the vLLM server:
    vllm serve "ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF"
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF",
            "messages": [
                {"role": "user", "content": "What is the capital of France?"}
            ]
        }'

Use Docker
docker model run hf.co/ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
- Ollama
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Ollama:
ollama run hf.co/ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
- Unsloth Studio
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF to start chatting
- Pi
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Pi:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp
    # Start a local OpenAI-compatible server:
    llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent
    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "llama-cpp": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF" }
          ]
        }
      }
    }

Run Pi

    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Hermes Agent:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp
    # Start a local OpenAI-compatible server:
    llama-server -hf ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup
    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Run Hermes
hermes
- Docker Model Runner
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Docker Model Runner:
docker model run hf.co/ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
- Lemonade
How to use ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF
Run and chat with the model
    lemonade run user.GLM-4.7-Flash-Uncensored-750-v1-GGUF-{{QUANT_TAG}}

List all available models
lemonade list
Experimental global target bits‑per‑weight quantization of HauhauCS/GLM-4.7-Flash-Uncensored-HauhauCS-Aggressive
- Using a non-standard (forked) llama.cpp branch for quantization.
- Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Using dataset sources: tools, math, code, text_en, text_ru.
- Using dataset chunks: 750.
- Tensor quantization uses F16 instead of BF16, which is friendly to NVIDIA Pascal-architecture GPUs such as the P100.
- A small set of patches was added.
Many thanks to Ed Addario for an impressive job.
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Median KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|---|
| 3.50 | 93.18% | 1.205301 ± 0.003693 | 3.066378 ± 0.059410 | 0.346790 ± 0.003130 | 0.113532 | 36.079124 | 17.036463 | -1.548 ± 0.027 % | 12.313 ± 0.063 % |
| 4.00 | 92.50% | 1.081636 ± 0.003399 | 1.219317 ± 0.050468 | 0.350924 ± 0.003492 | 0.094924 | 35.051384 | 19.111687 | -0.644 ± 0.027 % | 11.874 ± 0.066 % |
| 4.50 | 93.92% | 1.159525 ± 0.003402 | 2.382673 ± 0.054462 | 0.212138 ± 0.003037 | 0.035486 | 37.016945 | 19.404484 | -0.722 ± 0.019 % | 8.384 ± 0.070 % |
| 5.00 | 94.32% | 1.030399 ± 0.002807 | 0.454046 ± 0.041667 | 0.223456 ± 0.003202 | 0.029363 | 33.802094 | 20.041710 | -0.307 ± 0.018 % | 7.986 ± 0.072 % |
| 5.50 | 93.59% | 0.970038 ± 0.002771 | -0.447509 ± 0.042194 | 0.234948 ± 0.003451 | 0.024535 | 32.256840 | 19.587420 | 0.123 ± 0.020 % | 8.691 ± 0.081 % |
| 6.00 | 96.85% | 1.028155 ± 0.002101 | 0.420519 ± 0.031506 | 0.107335 ± 0.002290 | 0.008626 | 36.048412 | 17.149174 | -0.060 ± 0.012 % | 5.211 ± 0.072 % |
| 6.50 | 97.55% | 1.037597 ± 0.001880 | 0.561555 ± 0.028552 | 0.080116 ± 0.001919 | 0.007975 | 32.534607 | 14.545952 | -0.128 ± 0.011 % | 4.691 ± 0.069 % |
| 7.00 | 96.92% | 1.015746 ± 0.002049 | 0.235178 ± 0.030606 | 0.099637 ± 0.002328 | 0.003383 | 35.733624 | 17.560083 | 0.007 ± 0.010 % | 4.330 ± 0.078 % |
| 7.50 | 97.57% | 1.030245 ± 0.001857 | 0.451735 ± 0.028043 | 0.067106 ± 0.001900 | 0.002916 | 37.250160 | 14.828805 | -0.069 ± 0.009 % | 3.827 ± 0.077 % |
| 8.00 | 97.42% | 1.022089 ± 0.001892 | 0.329923 ± 0.028377 | 0.077545 ± 0.002062 | 0.002760 | 33.393574 | 16.344349 | -0.035 ± 0.009 % | 3.835 ± 0.077 % |
| 8.50 | 97.26% | 1.026630 ± 0.001957 | 0.397746 ± 0.029396 | 0.082303 ± 0.002123 | 0.002307 | 31.664230 | 17.153591 | -0.049 ± 0.009 % | 3.815 ± 0.079 % |
| 9.00 | 98.34% | 1.019983 ± 0.001520 | 0.298461 ± 0.022937 | 0.044058 ± 0.001500 | 0.001003 | 35.295940 | 10.766310 | -0.013 ± 0.007 % | 3.132 ± 0.075 % |
| 9.50 | 98.27% | 1.010330 ± 0.001530 | 0.154286 ± 0.022915 | 0.051995 ± 0.001669 | 0.000858 | 32.079849 | 13.546452 | 0.032 ± 0.007 % | 3.195 ± 0.078 % |
| 10.00 | 98.40% | 1.013286 ± 0.001478 | 0.198433 ± 0.022200 | 0.044456 ± 0.001528 | 0.000833 | 31.551548 | 12.255160 | 0.002 ± 0.007 % | 3.022 ± 0.076 % |
| 10.50 | 98.30% | 1.012990 ± 0.001525 | 0.194020 ± 0.022882 | 0.047429 ± 0.001597 | 0.000826 | 33.701038 | 13.457508 | 0.019 ± 0.007 % | 3.073 ± 0.078 % |
| 11.00 | 98.35% | 1.019113 ± 0.001514 | 0.285470 ± 0.022865 | 0.042238 ± 0.001490 | 0.000819 | 31.194330 | 11.399836 | 0.001 ± 0.006 % | 2.878 ± 0.075 % |
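
The table columns can be read with a little arithmetic. The sketch below is illustrative only: it assumes KLD is the Kullback-Leibler divergence of the quantized model's token distribution from the base model's, and that the PPL mean ratio and ΔPPL columns are PPL(quant) / PPL(base) and PPL(quant) - PPL(base) respectively. The function names and the toy distributions are hypothetical and are not taken from the evaluation tool used for this card.

```python
import math


def kl_divergence(p_base, p_quant):
    """KL(base || quant) in nats for two probability vectors over the same vocabulary.

    Assumes every q is strictly positive wherever p is positive.
    """
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_quant) if p > 0.0)


def baseline_ppl(ppl_ratio, delta_ppl):
    """Solve ratio = PPL_q / PPL_base and delta = PPL_q - PPL_base for PPL_base."""
    return delta_ppl / (ppl_ratio - 1.0)


# Toy next-token distributions over a three-token vocabulary (made-up numbers).
base = [0.7, 0.2, 0.1]
quant = [0.6, 0.3, 0.1]
print(f"toy KLD: {kl_divergence(base, quant):.6f}")

# Ratio and delta taken from the 3.50 and 4.00 rows of the table above.
print(f"baseline PPL from 3.50 row: {baseline_ppl(1.205301, 3.066378):.3f}")
print(f"baseline PPL from 4.00 row: {baseline_ppl(1.081636, 1.219317):.3f}")
```

Under these assumptions, both rows imply a baseline perplexity of about 14.9, which is a quick consistency check on the two PPL columns. Identical distributions give a KLD of zero, so lower Mean KLD values indicate output closer to the unquantized model.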