TL;DR
Model Details
Model Description
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only / Base version
- Architecture: Pure-transformer - 1.58bit version
- Language(s) (NLP): English
- License: Falcon-LLM License
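The "1.58bit" label in the architecture line is usually explained by ternary weights: a weight that takes one of three values (-1, 0, +1) carries log2(3) bits of information. The short check below only illustrates that arithmetic. It assumes the ternary-weight reading of "1.58bit" and says nothing about how the released files are packed on disk; the helper name `bits_per_weight` is made up for this example.

```python
import math

# Assumption: "1.58bit" refers to ternary weights, i.e. three possible values.
TERNARY_VALUES = (-1, 0, 1)


def bits_per_weight(num_levels: int) -> float:
    """Information content, in bits, of one weight with `num_levels` equally likely values."""
    return math.log2(num_levels)


bits = bits_per_weight(len(TERNARY_VALUES))
print(f"{bits:.2f} bits per ternary weight")  # 1.58 bits per ternary weight
```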
Training details
For more details about the training protocol of this model, please refer to the Falcon-E technical blogpost.
Usage
You can currently run this model with either the Hugging Face transformers library or the BitNet library, depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the bfloat16 version of the BitNet model.
Inference
BitNet
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
huggingface-cli download tiiuae/Falcon-E-3B-Instruct-GGUF ggml-model-i2_s.gguf --local-dir models/Falcon-E-3B-Instruct/
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
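The download and inference steps above can also be scripted. The sketch below is one way to do that from Python, under stated assumptions: it uses `hf_hub_download` from the `huggingface_hub` package in place of `huggingface-cli download`, it is meant to be run from inside the cloned BitNet checkout after installing its requirements, and the helper names `download_model`, `build_inference_command`, and `run_chat` are made up for this example. The flags `-m`, `-p`, and `-cnv` are the ones shown in the command above.

```python
import subprocess
import sys
from pathlib import Path

REPO_ID = "tiiuae/Falcon-E-3B-Instruct-GGUF"
FILENAME = "ggml-model-i2_s.gguf"
LOCAL_DIR = Path("models/Falcon-E-3B-Instruct")


def download_model() -> Path:
    """Fetch the GGUF file into LOCAL_DIR (needs `pip install huggingface_hub`)."""
    from huggingface_hub import hf_hub_download  # imported lazily: only needed here

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=str(LOCAL_DIR))
    return Path(path)


def build_inference_command(model_path: Path, system_prompt: str) -> list[str]:
    """Build the argv for BitNet's run_inference.py with the -cnv flag set."""
    return [
        sys.executable, "run_inference.py",
        "-m", str(model_path),
        "-p", system_prompt,
        "-cnv",
    ]


def run_chat(system_prompt: str = "You are a helpful assistant") -> None:
    """Download the model, then launch run_inference.py in the terminal."""
    model_path = download_model()
    subprocess.run(build_inference_command(model_path, system_prompt), check=True)
```

Calling `run_chat()` from the BitNet directory would then stand in for the last two shell commands. Nothing runs at import time, so the helpers can be reused in a larger script.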
Fine-tuning
For fine-tuning the model, you should load the prequantized revision of the model and use the onebitllms Python package:
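import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
# onebitllms-specific: helpers to swap in BitNet linear layers and export the result
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit
model_id = "tiiuae/Falcon-E-1B-Base"
# Load the tokenizer and the weights from the "prequantized" revision
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized",  # onebitllms-specific: fine-tuning starts from this revision
)
# onebitllms-specific: replace the linear layers with BitNet linear layers before training
model = replace_linear_with_bitnet_linear(model)
trainer = SFTTrainer(
    model,
    ...
)
trainer.train()
# onebitllms-specific: convert the trained checkpoint to its 1-bit form
# (output_directory is a path you must define yourself)
quantize_to_1bit(output_directory)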
Evaluation
The following tables report results from our internal benchmark pipeline.
Note: evaluation results are normalized scores from the former Hugging Face Leaderboard v2 tasks.
For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Below are the results for instruction fine-tuned models:
For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
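The Avg. column appears to be the unweighted mean of the six task scores. That is an inference from the numbers, not something the card states. As a check, the sketch below recomputes it for the Falcon-E-3B-Instruct row, and the result matches the reported value up to rounding.

```python
from statistics import mean

# Scores copied from the Falcon-E-3B-Instruct row of the table above.
scores = {
    "IFEVAL": 60.97,
    "Math-Hard": 15.3,
    "GPQA": 23.59,
    "MuSR": 2.12,
    "BBH": 26.45,
    "MMLU-Pro": 7.45,
}

# Assumption: Avg. is the plain arithmetic mean over the six tasks.
average = mean(scores.values())
print(round(average, 2))  # 22.65
```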
Useful links
- View our release blogpost.
- Learn more about the onebitllms library.
- Feel free to join our Discord server if you have any questions or want to interact with our researchers and developers.
Citation
If the Falcon-E family of models was helpful to your work, feel free to cite us.
@misc{tiionebitllms,
title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
author = {Falcon-LLM Team},
month = {April},
url = {https://falcon-lm.github.io/blog/falcon-edge},
year = {2025}
}
