Chess-Bot-3000 250M

A 250M parameter language model trained from scratch on chess games in UCI notation. The model learns to predict chess moves from game context and can adapt its play style based on player Elo ratings.

Disclaimer: This model card is mostly LLM generated and may contain mistakes or missing information.

Model Details

Model Description

This model is a transformer-based language model trained on millions of chess games from the Lichess database. It uses UCI (Universal Chess Interface) notation and includes special tokens for player Elo ratings and game outcomes, allowing it to generate moves appropriate for different skill levels.

  • Developed by: David Hauser (https://github.com/kinggongzilla)
  • Model type: Causal language model (decoder-only transformer)
  • Language(s): Chess UCI notation
  • License: Apache 2.0
  • Architecture: Qwen2-style (SmolLM3 base)
  • Parameters: ~250M
  • Tensor type: BF16

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("daavidhauser/chess-bot-3000-250m")
tokenizer = AutoTokenizer.from_pretrained("daavidhauser/chess-bot-3000-250m")

prompt = "<BOG> <WHITE:1500> <BLACK:1600> <BLACK_WIN> e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=1)  # one token = one half-move
print(tokenizer.decode(outputs[0]))

Note:

  1. Given the first move "e2e4", the model generates Black's reply.
  2. Setting <BLACK_WIN> conditions the model to predict moves from games that Black won.
  3. The Elo tokens for White and Black condition the model to play at the corresponding Elo levels.
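Prompt assembly can be wrapped in a small helper. The function below is a hypothetical convenience, not part of the released code; the Elo clamping range (0-3500) and rounding to the nearest 100 follow the Training Data section of this card.

```python
def build_prompt(white_elo: int, black_elo: int, outcome: str, moves=()):
    """Assemble a conditioning prompt in this card's token format.

    outcome must be one of <WHITE_WIN>, <BLACK_WIN>, <DRAW>.
    Elo ratings are clamped to 0-3500 and rounded to the nearest 100,
    matching the rating tokens used during training.
    """
    if outcome not in ("<WHITE_WIN>", "<BLACK_WIN>", "<DRAW>"):
        raise ValueError(f"unknown outcome token: {outcome}")

    def bucket(elo: int) -> int:
        # Clamp to the trained rating range, then round to the nearest 100.
        return round(min(max(elo, 0), 3500) / 100) * 100

    parts = ["<BOG>", f"<WHITE:{bucket(white_elo)}>",
             f"<BLACK:{bucket(black_elo)}>", outcome]
    parts.extend(moves)  # UCI half-moves played so far
    return " ".join(parts)
```

For example, `build_prompt(1500, 1600, "<BLACK_WIN>", ["e2e4"])` reproduces the prompt string used in the snippet above.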

Evaluation

[Figure: 250M model performance against Stockfish at various Elo levels]

Uses

Direct Use

The model can be used for:

  • Chess move prediction and game continuation
  • Generating chess games at specific skill levels (by conditioning on Elo tokens)
  • Chess position evaluation through next-move probabilities
  • Chess education and analysis tools
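Because each half-move is a single token, "position evaluation through next-move probabilities" reduces to a softmax over the logits of the candidate move tokens at the current position. A minimal sketch with made-up logits (the moves and values are illustrative, not actual model output):

```python
import math

def move_probabilities(move_logits: dict) -> dict:
    """Softmax a {uci_move: logit} map into {uci_move: probability}."""
    m = max(move_logits.values())  # subtract the max for numerical stability
    exps = {mv: math.exp(v - m) for mv, v in move_logits.items()}
    total = sum(exps.values())
    return {mv: e / total for mv, e in exps.items()}

# Illustrative logits for three candidate replies to 1. e4:
probs = move_probabilities({"e7e5": 2.1, "c7c5": 1.8, "e7e6": 0.4})
```

In practice the logits would come from a forward pass over the prompt, restricted to the token ids of the legal moves in the position.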

Training Details

Training Data

The model was trained on approximately 100 million chess games from the Lichess open database (January 2024). Games were converted to UCI notation and augmented with:

  • Player Elo ratings (rounded to nearest 100, range 0-3500)
  • Game outcomes (<WHITE_WIN>, <BLACK_WIN>, <DRAW>)
  • Special tokens for game boundaries

Each training example follows the format <BOG> <WHITE:1500> <BLACK:1500> <DRAW> ... <EOG>, where the ellipsis stands for the game's UCI moves. In this representation, each half-move corresponds to one token.
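A hypothetical serializer for this training format (the function name and result strings are illustrative; the actual preprocessing code is not reproduced in this card):

```python
def serialize_game(white_elo, black_elo, result, uci_moves):
    """Render one training example: header tokens, UCI half-moves, <EOG>.

    result uses PGN-style strings ("1-0", "0-1", "1/2-1/2");
    Elo ratings are rounded to the nearest 100, as in the training data.
    """
    outcomes = {"1-0": "<WHITE_WIN>", "0-1": "<BLACK_WIN>", "1/2-1/2": "<DRAW>"}
    header = [
        "<BOG>",
        f"<WHITE:{round(white_elo / 100) * 100}>",
        f"<BLACK:{round(black_elo / 100) * 100}>",
        outcomes[result],
    ]
    return " ".join(header + list(uci_moves) + ["<EOG>"])
```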

Training Infrastructure:

  • Framework: Nanotron (PyTorch)
  • Hardware: 1x NVIDIA A100 GPU
  • Total training time: ~5 hours

[Figure: Loss curves for the 100M and 250M models]
