Instructions to use froggeric/Cerebrum-1.0-7b-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use froggeric/Cerebrum-1.0-7b-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="froggeric/Cerebrum-1.0-7b-GGUF", filename="Cerebrum-1.0-7b-Q4_KS.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use froggeric/Cerebrum-1.0-7b-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K # Run inference directly in the terminal: llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K # Run inference directly in the terminal: llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K # Run inference directly in the terminal: ./llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K # Run inference directly in the terminal: ./build/bin/llama-cli -hf froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
Use Docker
docker model run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
- LM Studio
- Jan
- Ollama
How to use froggeric/Cerebrum-1.0-7b-GGUF with Ollama:
ollama run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
- Unsloth Studio new
How to use froggeric/Cerebrum-1.0-7b-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for froggeric/Cerebrum-1.0-7b-GGUF to start chatting
- Docker Model Runner
How to use froggeric/Cerebrum-1.0-7b-GGUF with Docker Model Runner:
docker model run hf.co/froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
- Lemonade
How to use froggeric/Cerebrum-1.0-7b-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull froggeric/Cerebrum-1.0-7b-GGUF:Q6_K
Run and chat with the model
lemonade run user.Cerebrum-1.0-7b-GGUF-Q6_K
List all available models
lemonade list
GGUF quantisations of AetherResearch/Cerebrum-1.0-7b
Introduction
Cerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is based on the Mistral 7b model, fine-tuned on a small custom dataset of native chain of thought data and further improved with targeted RLHF (tRLHF), a novel technique for sample-efficient LLM alignment. Unlike numerous other recent fine-tuning approaches, our training pipeline includes under 5000 training prompts and even fewer labeled datapoints for tRLHF.
Native chain of thought approach means that Cerebrum is trained to devise a tactical plan before tackling problems that require thinking. For brainstorming, knowledge intensive, and creative tasks Cerebrum will typically omit unnecessarily verbose considerations.
Zero-shot prompted Cerebrum significantly outperforms few-shot prompted Mistral 7b as well as much larger models (such as Llama 2 70b) on a range of tasks that require reasoning, including ARC Challenge, GSM8k, and Math.
This LLM model works a lot better than any other mistral mixtral models for agent data, tested on 14th March 2024.
Benchmarking
An overview of Cerebrum 7b performance compared to reported performance Mistral 7b and LLama 2 70b on selected benchmarks that require reasoning:
Notes: 1) Cerebrum evaluated zero-shot, Mistral 8-shot with maj@8, Llama 8-shot; 2) Cerebrum evaluated zero-shot, Mistral 4-shot with maj@4, Llama 4-shot
Usage
For optimal performance, Cerebrum should be prompted with an Alpaca-style template that requests the description of the "thought process". Here is what a conversation should look like from the model's point of view:
<s>A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.
User: Are you conscious?
AI:
This prompt is also available as a chat template. Here is how you could use it:
messages = [
{'role': 'user', 'content': 'What is chain of thought prompting?'},
{'role': 'assistant', 'content': 'Chain of thought prompting is a technique used in large language models to encourage the model to think more deeply about the problem it is trying to solve. It involves prompting the model to generate a series of intermediate steps or "thoughts" that lead to the final answer. This can help the model to better understand the problem and to generate more accurate and relevant responses.'},
{'role': 'user', 'content': 'Why does chain of thought prompting work?'}
]
input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt')
with torch.no_grad():
out = model.generate(input_ids=input, max_new_tokens=100, do_sample=False)
# will generate "Chain of thought prompting works because it helps the model to break down complex problems into smaller, more manageable steps. This allows the model to focus on each step individually and to generate more accurate and relevant responses. Additionally, the intermediate steps can help the model to understand the problem better and to find patterns or connections that it may not have seen before.</s>"
The model ends its turn by generating the EOS token. Importantly, this token should be removed from the model answer in a multi-turn dialogue.
Cerebrum can be operated at very low temperatures (and specifically temperature 0), which improves performance on tasks that require precise answers. The alignment should be sufficient to avoid repetitions in most cases without a repetition penalty.
- Downloads last month
- 47
6-bit
8-bit
Model tree for froggeric/Cerebrum-1.0-7b-GGUF
Base model
mistralai/Mistral-7B-v0.1