Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-8B
- Using a non-standard (forked) LLaMA C++ branch for quantization.
- Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Using dataset sources: text_en, text_ru.
- Using dataset chunks: 750.
- Small set of patches added.
- Tensor quantization uses F16 instead of BF16, which is friendlier to Nvidia Pascal-architecture GPUs such as the P100.
Many thanks to Ed Addario for an impressive job.
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 31.88% | 163.312009 ± 2.603029 | 1316751250.467953 ± 22175721.097179 | 0.865687 ± 0.002557 | 40.508530 | 13.199468 | -0.000 ± 0.001 % | 0.240 ± 0.035 % |
| 4.00 | 27.97% | 894.496097 ± 17.658576 | 7248459977.767529 ± 148545165.662543 | 0.462401 ± 0.002284 | 56.080093 | 14.465672 | -0.001 ± 0.000 % | 0.147 ± 0.022 % |
| 4.50 | 29.87% | 536.818391 ± 10.018571 | 4346810434.048196 ± 84695221.429227 | 0.247896 ± 0.001484 | 39.661839 | 8.264471 | 0.000 ± 0.000 % | 0.130 ± 0.018 % |
| 5.00 | 30.26% | 448.634201 ± 8.243512 | 3631418870.079112 ± 69768738.026683 | 0.189924 ± 0.001342 | 36.415863 | 7.533142 | -0.000 ± 0.000 % | 0.097 ± 0.016 % |
| 5.50 | 29.94% | 475.154499 ± 8.855907 | 3846563982.921083 ± 74878878.690613 | 0.175310 ± 0.001361 | 33.947906 | 7.652236 | -0.000 ± 0.000 % | 0.132 ± 0.037 % |
| 6.00 | 30.18% | 535.916733 ± 10.050226 | 4339495766.813949 ± 85010916.130295 | 0.093320 ± 0.000936 | 31.576149 | 5.749511 | -0.000 ± 0.000 % | 0.085 ± 0.019 % |
| 6.50 | 30.33% | 513.696057 ± 9.587249 | 4159231201.265534 ± 81127272.345352 | 0.076551 ± 0.000824 | 33.049152 | 5.351891 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 7.00 | 30.48% | 487.499691 ± 9.042077 | 3946713977.342775 ± 76548947.044238 | 0.069732 ± 0.000789 | 31.959265 | 5.224213 | -0.000 ± 0.000 % | 0.060 ± 0.014 % |
| 7.50 | 30.45% | 485.864997 ± 9.013780 | 3933452574.640460 ± 76305069.637214 | 0.066390 ± 0.000758 | 27.289934 | 5.135751 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 8.00 | 30.56% | 480.323684 ± 8.884633 | 3888498839.452749 ± 75233350.710666 | 0.064214 ± 0.000758 | 22.896826 | 4.984374 | 0.000 ± 0.000 % | 0.045 ± 0.006 % |
| 8.50 | 30.59% | 468.816726 ± 8.658897 | 3795148991.398126 ± 73329353.181311 | 0.063394 ± 0.000767 | 27.460506 | 4.974599 | -0.000 ± 0.000 % | 0.039 ± 0.005 % |
| 9.00 | 30.59% | 472.128288 ± 8.725107 | 3822013941.457990 ± 73888001.523125 | 0.061325 ± 0.000754 | 26.071749 | 4.991961 | 0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 9.50 | 30.57% | 477.493384 ± 8.834961 | 3865538118.737411 ± 74813478.721664 | 0.061779 ± 0.000778 | 28.092499 | 5.049781 | -0.000 ± 0.000 % | 0.038 ± 0.006 % |
| 10.00 | 30.58% | 473.251580 ± 8.749327 | 3831126611.272995 ± 74092083.168597 | 0.060787 ± 0.000750 | 27.365194 | 5.046810 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 10.50 | 30.58% | 473.369865 ± 8.754410 | 3832086195.704186 ± 74134265.313673 | 0.061487 ± 0.000778 | 29.115179 | 5.049273 | -0.000 ± 0.000 % | 0.031 ± 0.005 % |
| 11.00 | 30.58% | 469.947653 ± 8.686996 | 3804323606.512961 ± 73563714.202142 | 0.060947 ± 0.000761 | 26.897139 | 4.949221 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 11.50 | 30.59% | 469.702016 ± 8.680818 | 3802330885.149252 ± 73513517.264363 | 0.060967 ± 0.000756 | 24.905037 | 4.991287 | -0.000 ± 0.000 % | 0.042 ± 0.006 % |
| 12.00 | 30.59% | 469.007636 ± 8.666011 | 3796697743.108781 ± 73388654.821674 | 0.060841 ± 0.000757 | 29.013231 | 4.902389 | -0.000 ± 0.000 % | 0.034 ± 0.004 % |
| 12.50 | 30.60% | 468.247009 ± 8.650971 | 3790527181.157271 ± 73262486.731016 | 0.061428 ± 0.000774 | 26.518728 | 5.096298 | -0.000 ± 0.000 % | 0.039 ± 0.007 % |
| 13.00 | 30.59% | 468.485073 ± 8.656076 | 3792458472.236744 ± 73305035.806184 | 0.060770 ± 0.000756 | 27.815191 | 4.977703 | -0.000 ± 0.000 % | 0.040 ± 0.006 % |
| 13.50 | 30.60% | 468.608802 ± 8.658301 | 3793462215.247329 ± 73324801.786474 | 0.060845 ± 0.000748 | 25.343117 | 5.012136 | -0.000 ± 0.000 % | 0.034 ± 0.006 % |
| 14.00 | 30.59% | 470.353813 ± 8.694041 | 3807618563.064641 ± 73625192.396193 | 0.060969 ± 0.000763 | 27.384163 | 5.017433 | 0.000 ± 0.000 % | 0.033 ± 0.004 % |
| 14.50 | 30.59% | 469.238406 ± 8.669486 | 3798569859.379515 ± 73417558.393644 | 0.060245 ± 0.000763 | 25.959768 | 4.983978 | 0.000 ± 0.000 % | 0.030 ± 0.004 % |
| 15.00 | 30.59% | 470.262969 ± 8.688094 | 3806881593.724875 ± 73576296.537943 | 0.060078 ± 0.000773 | 29.312548 | 5.103179 | 0.000 ± 0.000 % | 0.029 ± 0.004 % |
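
For readers unfamiliar with the KLD and Δp columns, the following is a minimal sketch of how statistics of this kind can be computed from per-token probability distributions of a base model and a quantized model. It assumes the columns follow the usual definitions (KL divergence of the quantized distribution from the base one, and Δp as the change in probability assigned to the correct token); the card does not state this explicitly. The function names and the toy distributions are hypothetical and are not taken from the evaluation tooling used here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def summarize(base_dists, quant_dists, correct_ids):
    """Aggregate per-position statistics (hypothetical helper, for illustration).

    base_dists / quant_dists: one probability distribution per token position.
    correct_ids: index of the reference ("correct") token at each position.
    """
    klds = [kl_divergence(p, q) for p, q in zip(base_dists, quant_dists)]
    # Change in probability of the correct token: quantized minus base.
    deltas = [q[t] - p[t] for p, q, t in zip(base_dists, quant_dists, correct_ids)]
    n = len(klds)
    return {
        "mean_kld": sum(klds) / n,
        "max_kld": max(klds),
        "mean_dp": sum(deltas) / n,
        "rms_dp": math.sqrt(sum(d * d for d in deltas) / n),
    }


# Toy data: two token positions over a three-token vocabulary (made up).
base = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
quant = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2]]
correct = [0, 1]

stats = summarize(base, quant, correct)
print(stats)
```

Under these definitions, a lower mean KLD means the quantized model's output distribution stays closer to the base model's, and a Δp near zero means the probability assigned to the correct token is largely unchanged.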