Instructions to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF")
sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
- Transformers
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Transformers:
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF", dtype="auto")
- llama-cpp-python
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

# embedding=True switches the model to embedding output; a chat completion
# call is not meaningful for an embedding model.
llm = Llama.from_pretrained(
    repo_id="ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF",
    filename="Octen-Embedding-0.6B-BPW10.0.gguf",
    embedding=True,
)
sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
# One embedding per sentence, assuming the GGUF metadata defines a pooling type
embeddings = llm.embed(sentences)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
# Run inference directly in the terminal:
./llama-cli -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Use Docker
docker model run hf.co/ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
- LM Studio
- Jan
- Ollama
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Ollama:
ollama run hf.co/ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
- Unsloth Studio
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF to start chatting
- Pi
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Run Hermes
hermes
- Docker Model Runner
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Docker Model Runner:
docker model run hf.co/ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
- Lemonade
How to use ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Run and chat with the model
lemonade run user.Octen-Embedding-0.6B-750-v1-GGUF-{{QUANT_TAG}}
List all available models
lemonade list
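Because this is an embedding model, the local servers above are most useful for producing vectors rather than chat replies. The following is a minimal sketch of querying an OpenAI-compatible `/v1/embeddings` endpoint and ranking sentences by cosine similarity. It assumes `llama-server` was started with embeddings enabled (for example with its `--embedding` flag) and listens on `http://localhost:8080/v1`, the address used in the Pi configuration above. The helper names (`build_request`, `parse_embeddings`, `embed`, `rank_by_similarity`) are hypothetical and not part of any library.

```python
import json
import math
import urllib.request

MODEL_ID = "ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF"


def build_request(texts, base_url="http://localhost:8080/v1"):
    """Build a POST request for an OpenAI-style embeddings endpoint (assumed URL)."""
    payload = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    data = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def embed(texts, base_url="http://localhost:8080/v1"):
    """Send the texts to the server; requires a running server with embeddings on."""
    with urllib.request.urlopen(build_request(texts, base_url)) as resp:
        return parse_embeddings(resp.read())


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(source_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(source_vec, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a server running, `embed(["That is a happy person", "That is a happy dog"])` would return one vector per sentence, and `rank_by_similarity` can then order candidate sentences against a source sentence. Nothing in the block contacts the network until `embed` is called.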
Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-0.6B
- Uses a non-standard (forked) llama.cpp branch for quantization.
- Uses a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Dataset sources: text_en, text_ru.
- Dataset chunks: 750.
- A small set of patches was added.
- Tensors use F16 instead of BF16, which is friendlier to Nvidia Pascal GPUs such as the P100.
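As background for the F16 choice: F16 and BF16 are both 16-bit formats, but F16 spends more bits on precision (10 mantissa bits) while BF16 spends more on range (8 exponent bits), and Pascal-generation GPUs predate hardware BF16 support. The sketch below illustrates that trade-off in plain Python. It is not taken from the quantization tooling; the helper names are hypothetical, and the BF16 conversion truncates rather than rounds to nearest, which is a simplification.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16_truncated(x: float) -> float:
    """Approximate bfloat16 (1 sign, 8 exponent, 7 mantissa bits).

    Keeps the top 16 bits of the float32 encoding. This truncates instead of
    rounding to nearest, so it is only an illustration.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


weight = 1.001
print(to_f16(weight))             # F16 keeps more of the fractional part
print(to_bf16_truncated(weight))  # BF16 drops it at this magnitude

# F16 tops out at 65504, so larger values cannot be packed at all,
# while BF16 keeps the float32 exponent range.
try:
    to_f16(70000.0)
except OverflowError:
    print("70000.0 does not fit in F16")
print(to_bf16_truncated(70000.0))
```

For weights with small magnitudes, which is the typical case, F16 keeps more fractional detail than BF16; the cost is the much smaller representable range.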
Many thanks to Ed Addario for an impressive job.
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 85.56% | 1.328184 ± 0.006198 | 113.055840 ± 2.030409 | 1.738910 ± 0.002968 | 19.720037 | 10.812436 | -3.331 ± 0.037 % | 17.941 ± 0.063 % |
| 4.00 | 93.58% | 1.360601 ± 0.004364 | 124.223093 ± 1.754530 | 0.787587 ± 0.001566 | 19.145452 | 6.742295 | -1.688 ± 0.027 % | 12.813 ± 0.052 % |
| 4.50 | 95.88% | 1.235695 ± 0.003178 | 81.194364 ± 1.267961 | 0.492418 ± 0.001110 | 16.026636 | 5.010898 | -1.135 ± 0.022 % | 10.330 ± 0.047 % |
| 5.00 | 97.12% | 1.225914 ± 0.002645 | 77.825006 ± 1.132301 | 0.323054 ± 0.000773 | 14.785572 | 3.622450 | -1.015 ± 0.018 % | 8.707 ± 0.042 % |
| 5.50 | 98.61% | 1.140659 ± 0.001722 | 48.455459 ± 0.754229 | 0.142510 ± 0.000361 | 10.251470 | 1.794577 | -0.500 ± 0.012 % | 5.834 ± 0.031 % |
| 6.00 | 99.03% | 1.095986 ± 0.001385 | 33.066115 ± 0.583789 | 0.089186 ± 0.000256 | 9.821702 | 1.338731 | -0.228 ± 0.010 % | 4.639 ± 0.028 % |
| 6.50 | 99.28% | 1.086861 ± 0.001185 | 29.922486 ± 0.515390 | 0.056230 ± 0.000156 | 9.794716 | 0.742507 | -0.200 ± 0.008 % | 3.695 ± 0.022 % |
| 7.00 | 99.47% | 1.032955 ± 0.000969 | 11.352594 ± 0.361050 | 0.033634 ± 0.000093 | 3.941931 | 0.450176 | -0.041 ± 0.006 % | 2.933 ± 0.020 % |
| 7.50 | 99.53% | 1.015463 ± 0.000891 | 5.326939 ± 0.316756 | 0.024763 ± 0.000071 | 4.088861 | 0.356147 | 0.027 ± 0.005 % | 2.558 ± 0.019 % |
| 8.00 | 99.56% | 1.012318 ± 0.000866 | 4.243431 ± 0.305956 | 0.021680 ± 0.000059 | 1.800071 | 0.296283 | 0.037 ± 0.005 % | 2.380 ± 0.015 % |
| 8.50 | 99.63% | 1.020174 ± 0.000803 | 6.949556 ± 0.292500 | 0.013173 ± 0.000038 | 3.675829 | 0.164360 | -0.008 ± 0.004 % | 1.903 ± 0.015 % |
| 9.00 | 99.64% | 1.017785 ± 0.000789 | 6.126855 ± 0.285191 | 0.011793 ± 0.000035 | 3.644340 | 0.155200 | 0.001 ± 0.004 % | 1.822 ± 0.015 % |
| 9.50 | 99.64% | 1.023754 ± 0.000790 | 8.182888 ± 0.292987 | 0.011307 ± 0.000036 | 3.703608 | 0.149881 | -0.017 ± 0.004 % | 1.799 ± 0.017 % |
| 10.00 | 99.64% | 1.026870 ± 0.000790 | 9.256477 ± 0.297126 | 0.010960 ± 0.000039 | 4.368244 | 0.146243 | -0.023 ± 0.004 % | 1.781 ± 0.017 % |
| 10.50 | 99.65% | 1.031793 ± 0.000791 | 10.952329 ± 0.305078 | 0.010631 ± 0.000033 | 2.633135 | 0.139757 | -0.033 ± 0.004 % | 1.756 ± 0.016 % |
| 11.00 | 99.65% | 1.032131 ± 0.000785 | 11.068814 ± 0.303826 | 0.010088 ± 0.000026 | 1.066854 | 0.125272 | -0.039 ± 0.004 % | 1.698 ± 0.010 % |
| 11.50 | 99.66% | 1.034359 ± 0.000785 | 11.836127 ± 0.307385 | 0.009816 ± 0.000026 | 1.311119 | 0.122771 | -0.039 ± 0.004 % | 1.685 ± 0.010 % |
| 12.00 | 99.66% | 1.033216 ± 0.000782 | 11.442555 ± 0.304820 | 0.009535 ± 0.000028 | 2.683337 | 0.118959 | -0.036 ± 0.004 % | 1.669 ± 0.015 % |
| 12.50 | 99.66% | 1.035454 ± 0.000782 | 12.213620 ± 0.308823 | 0.009296 ± 0.000026 | 1.888097 | 0.117823 | -0.036 ± 0.003 % | 1.653 ± 0.011 % |
| 13.00 | 99.66% | 1.035315 ± 0.000779 | 12.165472 ± 0.307315 | 0.009015 ± 0.000024 | 1.563708 | 0.112765 | -0.037 ± 0.003 % | 1.641 ± 0.012 % |
| 13.50 | 99.66% | 1.035511 ± 0.000777 | 12.233185 ± 0.307197 | 0.008828 ± 0.000027 | 2.020073 | 0.110392 | -0.042 ± 0.003 % | 1.634 ± 0.016 % |
| 14.00 | 99.67% | 1.034812 ± 0.000775 | 11.992309 ± 0.305653 | 0.008529 ± 0.000025 | 1.908343 | 0.105182 | -0.034 ± 0.003 % | 1.617 ± 0.017 % |
| 14.50 | 99.66% | 1.035023 ± 0.000780 | 12.064932 ± 0.307359 | 0.008970 ± 0.000023 | 0.820967 | 0.111618 | -0.034 ± 0.003 % | 1.618 ± 0.010 % |
| 15.00 | 99.66% | 1.035123 ± 0.000777 | 12.099435 ± 0.306609 | 0.008687 ± 0.000022 | 1.018949 | 0.102148 | -0.033 ± 0.003 % | 1.612 ± 0.010 % |
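The PPL columns can be cross-checked against each other. Assuming they follow the usual convention of llama.cpp's perplexity tool, where the mean ratio is PPL(quantized) / PPL(base) and ΔPPL is PPL(quantized) − PPL(base), the baseline perplexity implied by any row is ΔPPL / (ratio − 1). The sketch below applies that to three rows copied from the table; the function name is hypothetical and the column interpretation is an assumption, not something the table states.

```python
# (PPL mean ratio, ΔPPL) taken from the 3.50, 4.00 and 8.00 BPW rows above
rows = {
    3.50: (1.328184, 113.055840),
    4.00: (1.360601, 124.223093),
    8.00: (1.012318, 4.243431),
}


def implied_baseline_ppl(ratio: float, delta: float) -> float:
    """Solve ratio = q / base and delta = q - base for base."""
    return delta / (ratio - 1.0)


for bpw, (ratio, delta) in rows.items():
    base = implied_baseline_ppl(ratio, delta)
    print(f"BPW {bpw:.2f}: implied baseline PPL = {base:.2f}")
```

All three rows imply nearly the same baseline, roughly 344.5, which suggests the two columns are consistent with this interpretation.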
Model tree for ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF
Base model
Qwen/Qwen3-0.6B-Base