Instructions to use froggeric/Cerebrum-1.0-7b-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use froggeric/Cerebrum-1.0-7b-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="froggeric/Cerebrum-1.0-7b-GGUF",
	filename="Cerebrum-1.0-7b-Q4_KS.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use froggeric/Cerebrum-1.0-7b-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

Use Docker

docker model run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

LM Studio
Jan
Ollama
How to use froggeric/Cerebrum-1.0-7b-GGUF with Ollama:
```
ollama run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
```

Unsloth Studio new

How to use froggeric/Cerebrum-1.0-7b-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting

Docker Model Runner
How to use froggeric/Cerebrum-1.0-7b-GGUF with Docker Model Runner:
```
docker model run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
```

Lemonade

How to use froggeric/Cerebrum-1.0-7b-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull froggeric/Cerebrum-1.0-7b-GGUF:Q6_K

Run and chat with the model

lemonade run user.Cerebrum-1.0-7b-GGUF-Q6_K

List all available models

lemonade list

GGUF quantisations of AetherResearch/Cerebrum-1.0-7b

Introduction

Cerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is based on the Mistral 7b model, fine-tuned on a small custom dataset of native chain of thought data and further improved with targeted RLHF (tRLHF), a novel technique for sample-efficient LLM alignment. Unlike numerous other recent fine-tuning approaches, our training pipeline includes under 5000 training prompts and even fewer labeled datapoints for tRLHF.

Native chain of thought approach means that Cerebrum is trained to devise a tactical plan before tackling problems that require thinking. For brainstorming, knowledge intensive, and creative tasks Cerebrum will typically omit unnecessarily verbose considerations.

Zero-shot prompted Cerebrum significantly outperforms few-shot prompted Mistral 7b as well as much larger models (such as Llama 2 70b) on a range of tasks that require reasoning, including ARC Challenge, GSM8k, and Math.

This LLM model works a lot better than any other mistral mixtral models for agent data, tested on 14th March 2024.

Benchmarking

An overview of Cerebrum 7b performance compared to reported performance Mistral 7b and LLama 2 70b on selected benchmarks that require reasoning:

Notes: 1) Cerebrum evaluated zero-shot, Mistral 8-shot with maj@8, Llama 8-shot; 2) Cerebrum evaluated zero-shot, Mistral 4-shot with maj@4, Llama 4-shot

Usage

For optimal performance, Cerebrum should be prompted with an Alpaca-style template that requests the description of the "thought process". Here is what a conversation should look like from the model's point of view:

<s>A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.
User: Are you conscious?
AI:

This prompt is also available as a chat template. Here is how you could use it:

messages = [
    {'role': 'user', 'content': 'What is chain of thought prompting?'},
    {'role': 'assistant', 'content': 'Chain of thought prompting is a technique used in large language models to encourage the model to think more deeply about the problem it is trying to solve. It involves prompting the model to generate a series of intermediate steps or "thoughts" that lead to the final answer. This can help the model to better understand the problem and to generate more accurate and relevant responses.'},
    {'role': 'user', 'content': 'Why does chain of thought prompting work?'}
]

input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt')

with torch.no_grad():
    out = model.generate(input_ids=input, max_new_tokens=100, do_sample=False)
    # will generate "Chain of thought prompting works because it helps the model to break down complex problems into smaller, more manageable steps. This allows the model to focus on each step individually and to generate more accurate and relevant responses. Additionally, the intermediate steps can help the model to understand the problem better and to find patterns or connections that it may not have seen before.</s>"

The model ends its turn by generating the EOS token. Importantly, this token should be removed from the model answer in a multi-turn dialogue.

Cerebrum can be operated at very low temperatures (and specifically temperature 0), which improves performance on tasks that require precise answers. The alignment should be sufficient to avoid repetitions in most cases without a repetition penalty.

Downloads last month: 47

GGUF

Model size

7B params

Architecture

llama

Hardware compatibility

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for froggeric/Cerebrum-1.0-7b-GGUF

Base model

mistralai/Mistral-7B-v0.1

Quantized

(187)

this model