Instructions for using ericrisco/salamandra-2b-r1 with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use ericrisco/salamandra-2b-r1 with Transformers:
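```python
# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ericrisco/salamandra-2b-r1")
# AutoModelForCausalLM includes the language-modeling head needed for generate()
model = AutoModelForCausalLM.from_pretrained("ericrisco/salamandra-2b-r1", dtype="auto")
```
- llama-cpp-python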
How to use ericrisco/salamandra-2b-r1 with llama-cpp-python:
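```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ericrisco/salamandra-2b-r1",
    filename="unsloth.F16.gguf",
)

# messages must be a list of {"role", "content"} dicts, not a plain string
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "At what temperature does water boil?"}]
)
print(response["choices"][0]["message"]["content"])
```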
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ericrisco/salamandra-2b-r1 with llama.cpp:
Install with Homebrew
```shell
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ericrisco/salamandra-2b-r1:F16
# Run inference directly in the terminal:
llama-cli -hf ericrisco/salamandra-2b-r1:F16
```
Install from WinGet (Windows)
```shell
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ericrisco/salamandra-2b-r1:F16
# Run inference directly in the terminal:
llama-cli -hf ericrisco/salamandra-2b-r1:F16
```
Use a pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ericrisco/salamandra-2b-r1:F16
# Run inference directly in the terminal:
./llama-cli -hf ericrisco/salamandra-2b-r1:F16
```
Build from source
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ericrisco/salamandra-2b-r1:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ericrisco/salamandra-2b-r1:F16
```
Use Docker
```shell
docker model run hf.co/ericrisco/salamandra-2b-r1:F16
```
- LM Studio
- Jan
- Ollama
How to use ericrisco/salamandra-2b-r1 with Ollama:
```shell
ollama run hf.co/ericrisco/salamandra-2b-r1:F16
```
- Unsloth Studio
How to use ericrisco/salamandra-2b-r1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ericrisco/salamandra-2b-r1 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ericrisco/salamandra-2b-r1 to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for ericrisco/salamandra-2b-r1 to start chatting.
- Docker Model Runner
How to use ericrisco/salamandra-2b-r1 with Docker Model Runner:
```shell
docker model run hf.co/ericrisco/salamandra-2b-r1:F16
```
- Lemonade
How to use ericrisco/salamandra-2b-r1 with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ericrisco/salamandra-2b-r1:F16
```
Run and chat with the model
```shell
lemonade run user.salamandra-2b-r1-F16
```
List all available models
```shell
lemonade list
```
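The llama.cpp `llama-server` command above starts a local OpenAI-compatible server. As a minimal sketch, assuming the server is running locally on llama-server's default port 8080, the Python standard library alone is enough to send a chat request to its `/v1/chat/completions` endpoint; the server is responsible for applying a chat template to the messages. `BASE_URL`, `build_chat_request` and `chat` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-style chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send one user turn to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("At what temperature does water boil?"))` sends a single-turn conversation and prints the reply. Adjust `BASE_URL` if you started the server with a different `--port`.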
Salamandra 2B Reasoning R1 Model Card
Salamandra is a highly multilingual language model family, pre-trained from scratch and released in several sizes. This model card corresponds to the 2B instructed version, fine-tuned with GRPO (Group Relative Policy Optimization) using Unsloth.
For the model cards of other Salamandra versions, refer to the Model Index.
The entire Salamandra family is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are made publicly available in this GitHub repository.
Model Details
Description
Salamandra-2B is a reasoning-focused, transformer-based language model fine-tuned with GRPO. Its fine-tuning data includes:
- GSM8K (English)
- GSM8K Translated (Spanish)
- GSM8K Translated (Catalan)
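GSM8K is a collection of grade-school math word problems with step-by-step solutions; training on English, Spanish and Catalan versions is intended to let the model reason through multi-step problems in all three languages. Instead of traditional supervised fine-tuning on reference solutions, GRPO optimizes the model with reinforcement learning: for each prompt it samples a group of completions, scores them with reward functions, and pushes the policy toward completions that score above the group average, without training a separate value (critic) model. The goal is a model that is better suited to structured, step-by-step reasoning tasks.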
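To make the training signal concrete, the sketch below shows the reward-and-advantage step of GRPO on GSM8K-style data. It assumes a hypothetical correctness reward (the last number in a completion must match the value after GSM8K's `####` answer marker); the reward functions actually used to train this model are not documented here, and the full GRPO objective also includes a clipped policy-ratio term and a KL penalty, both omitted. All function names are illustrative.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_gsm8k_answer(reference: str) -> str:
    """GSM8K reference solutions end with a line of the form '#### <answer>'."""
    return reference.split("####")[-1].strip().replace(",", "")


def extract_last_number(completion: str) -> Optional[str]:
    """Hypothetical parser: treat the last number in the output as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None


def correctness_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the predicted number equals the GSM8K answer, else 0.0."""
    predicted = extract_last_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if float(predicted) == float(extract_gsm8k_answer(reference)) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std.

    eps avoids division by zero when every completion in the group scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative GSM8K-style reference and a group of four sampled completions.
reference = "She sold 48 / 2 = 24 clips in May.\nIn total she sold 48 + 24 = 72 clips.\n#### 72"
completions = ["... so the answer is 72", "The total is 96.", "72 clips", "I am not sure."]

rewards = [correctness_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage of equal size, so the update favors the answers that beat the group average.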
Intended Use
Direct Use
The model is designed as a reasoning assistant capable of structured problem-solving across different domains. It can be used for:
- Logical and mathematical reasoning tasks
- Multi-step question answering
- Instruction following in multilingual contexts
Out-of-scope Use
The model is not intended for malicious applications or any activity that violates legal or ethical standards.
How to Use
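This instruction-following model uses the ChatML template for dialogue formatting. The following example loads the model with Transformers, builds the prompt with the tokenizer's chat template, and generates a response: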
```python
from datetime import datetime

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ericrisco/salamandra-2b-r1"
text = "At what temperature does water boil?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place weights on available GPU(s)/CPU (requires accelerate)
    torch_dtype=torch.bfloat16,
)

message = [{"role": "user", "content": text}]
# Extra keyword arguments such as date_string are passed through to the chat template.
date_string = datetime.today().strftime("%Y-%m-%d")

# Render the conversation as a prompt string and open an assistant turn.
prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    date_string=date_string,
)

# The template already inserts the special tokens, so don't add them again.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
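For reference, here is a minimal sketch of the generic ChatML layout that the chat template renders: each turn is wrapped in `<|im_start|>{role}` and `<|im_end|>`, and the generation prompt opens an assistant turn. This is an assumption-based illustration, and `to_chatml` is a hypothetical helper; the template shipped with this model's tokenizer may add further content (for example a default system message), so use `tokenizer.apply_chat_template` in practice and compare its output with this sketch when debugging prompts.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} dicts in the generic ChatML layout."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        text += "<|im_start|>assistant\n"
    return text
```

For example, `to_chatml([{"role": "user", "content": "Hi"}])` yields the user turn followed by an open `<|im_start|>assistant` line.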