Esper 3.1 is a coding, architecture, and DevOps reasoning specialist built on Ministral 3.

Your dedicated DevOps expert: Esper 3.1 maximizes DevOps and architecture helpfulness, powered by high-difficulty DevOps and architecture data generated with DeepSeek-V3.1-Terminus!
Improved coding performance: challenging code-reasoning datasets stretch DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp to the limits, allowing Esper 3.1 to tackle harder coding tasks!
AI to build AI: our high-difficulty AI expertise data boosts Esper 3.1's MLOps, AI architecture, AI research, and general reasoning skills.
Small model sizes let you run Esper 3.1 locally on desktop and mobile, plus super-fast server inference!

Prompting Guide

Esper 3.1 uses the Ministral-3-8B-Reasoning-2512 prompt format.

Example inference script to get started:

import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "ValiantLabs/Ministral-3-8B-Reasoning-2512-Esper3.1"

tokenizer = MistralCommonBackend.from_pretrained(model_id)

model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "In a master-detail interface, you have a list of customer names. To improve perceived performance, you want to prefetch a customer's detailed data when a user hovers their mouse over the customer's name in the list. Implement this behavior using React Query's queryClient.prefetchQuery method within an onMouseEnter event handler."
#user_prompt = "The core learning mechanism in Soar, chunking, creates new production rules by compiling the results of successful subgoal resolution. Explain the precise mechanism by which the dependency graph of working memory elements that contributed to the subgoal's result determines the conditions of the new chunk. What are the implications of this mechanism for creating overly specific or overly general rules, and how can an architect guide the chunking process?"
#user_prompt = "Write a Haskell program that models a bank with multiple accounts. Use Haskell's Software Transactional Memory (STM) library to implement a thread-safe transfer function that moves funds from one account to another. The function must execute atomically, ensuring that the total amount of money in the system remains constant even when multiple transfers are attempted concurrently from different threads."

system_prompt = (
    "# HOW YOU SHOULD THINK AND ANSWER\n\n"
    "First draft your thinking process (inner monologue) until you arrive at a response. "
    "Format your response using Markdown, and use LaTeX for any mathematical equations. "
    "Write both your thoughts and the response in the same language as the input.\n\n"
    "Your thinking process must follow the template below:"
    "[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. "
    "Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]"
    "Here, provide a self-contained response."
)

messages = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_prompt,
            },
        ],
    },
]

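# Apply the chat template and return PyTorch tensors as a dict.
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
# Move tensors to the device the model was placed on by device_map="auto".
tokenized = {k: v.to(model.device) for k, v in tokenized.items() if hasattr(v, "to")}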

output = model.generate(
    **tokenized,
    max_new_tokens=20000,
)[0]

# Decode only the newly generated tokens, skipping the prompt tokens.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
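
The system prompt above asks the model to put its reasoning inside [THINK]...[/THINK] and then give a self-contained response. If you only want to show the final answer, a small post-processing step can separate the two. The following is a minimal sketch, not part of the Esper 3.1 release: `split_reasoning` is a hypothetical helper, and it assumes that the `[THINK]` / `[/THINK]` markers and a trailing `</s>` end-of-sequence token appear as literal text in `decoded_output`, which may depend on your tokenizer's decode settings.

```python
from typing import Tuple


def split_reasoning(
    decoded: str,
    open_tag: str = "[THINK]",
    close_tag: str = "[/THINK]",
    eos_token: str = "</s>",  # assumption: EOS is decoded as this literal string
) -> Tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    Hypothetical helper: returns an empty reasoning string when no
    closing tag is found, and treats the whole text as the answer.
    """
    text = decoded.strip()
    # Drop a trailing end-of-sequence marker if one was decoded as text.
    if eos_token and text.endswith(eos_token):
        text = text[: -len(eos_token)].rstrip()

    end = text.find(close_tag)
    if end == -1:
        # No reasoning block detected.
        return "", text

    start = text.find(open_tag)
    # If the opening tag is missing, treat everything before the close tag as reasoning.
    begin = start + len(open_tag) if 0 <= start < end else 0
    reasoning = text[begin:end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer
```

With the script above, `reasoning, answer = split_reasoning(decoded_output)` would give you the draft thinking and the final response as separate strings, provided the markers are present in the decoded text.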