Esper 3.1
Esper 3.1 is a DevOps, architecture, code, and general reasoning finetune for Qwen, Ministral and gpt-oss!
Support our open-source dataset and model releases!
Esper 3.1: Ministral-3-3B-Reasoning-2512, Qwen3-4B-Thinking-2507, Ministral-3-8B-Reasoning-2512, Ministral-3-14B-Reasoning-2512, gpt-oss-20b
Esper 3.1 is a coding, architecture, and DevOps reasoning specialist built on Ministral 3.
Esper 3.1 uses the Ministral-3-8B-Reasoning-2512 prompt format.
Example inference script to get started:
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "ValiantLabs/Ministral-3-8B-Reasoning-2512-Esper3.1"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "In a master-detail interface, you have a list of customer names. To improve perceived performance, you want to prefetch a customer's detailed data when a user hovers their mouse over the customer's name in the list. Implement this behavior using React Query's queryClient.prefetchQuery method within an onMouseEnter event handler."
#user_prompt = "The core learning mechanism in Soar, chunking, creates new production rules by compiling the results of successful subgoal resolution. Explain the precise mechanism by which the dependency graph of working memory elements that contributed to the subgoal's result determines the conditions of the new chunk. What are the implications of this mechanism for creating overly specific or overly general rules, and how can an architect guide the chunking process?"
#user_prompt = "Write a Haskell program that models a bank with multiple accounts. Use Haskell's Software Transactional Memory (STM) library to implement a thread-safe transfer function that moves funds from one account to another. The function must execute atomically, ensuring that the total amount of money in the system remains constant even when multiple transfers are attempted concurrently from different threads."

system_prompt = (
    "# HOW YOU SHOULD THINK AND ANSWER\n\n"
    "First draft your thinking process (inner monologue) until you arrive at a response. "
    "Format your response using Markdown, and use LaTeX for any mathematical equations. "
    "Write both your thoughts and the response in the same language as the input.\n\n"
    "Your thinking process must follow the template below:"
    "[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. "
    "Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]"
    "Here, provide a self-contained response."
)

messages = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_prompt,
            },
        ],
    },
]

# Tokenize the chat and move the tensors to the device the model was loaded on.
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
tokenized = {k: v.to(model.device) for k, v in tokenized.items() if hasattr(v, "to")}

output = model.generate(
    **tokenized,
    max_new_tokens=20000,
)[0]

# Decode only the newly generated tokens, skipping the prompt tokens.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
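
The system prompt above asks the model to wrap its reasoning in [THINK]...[/THINK] before the final response, so it can be handy to separate the two after decoding. The following is a minimal sketch, not part of the original script: `split_thinking` and `THINK_PATTERN` are hypothetical names, and it assumes the decoded text contains the literal [THINK] and [/THINK] markers. If the closing marker is missing (for example, when generation stops at the token limit), it returns the whole text as the answer.

```python
import re

# Hypothetical helper, not part of the original script: separates the
# reasoning draft from the final response in a decoded generation.
# Assumes the literal [THINK] / [/THINK] markers appear in the text.
THINK_PATTERN = re.compile(r"\[THINK\](.*?)\[/THINK\]", re.DOTALL)


def split_thinking(decoded: str) -> tuple[str, str]:
    """Return (thoughts, answer) extracted from a decoded model output."""
    match = THINK_PATTERN.search(decoded)
    if match is None:
        # No complete thinking block found; treat everything as the answer.
        return "", decoded.strip()
    thoughts = match.group(1).strip()
    # Everything after the closing marker is the self-contained response.
    answer = decoded[match.end():].strip()
    return thoughts, answer


if __name__ == "__main__":
    sample = "[THINK]Hovering should warm the cache.[/THINK]Call prefetchQuery in onMouseEnter."
    thoughts, answer = split_thinking(sample)
    print("THOUGHTS:", thoughts)
    print("ANSWER:", answer)
```

With the script above, this could be called as `thoughts, answer = split_thinking(decoded_output)`. Any end-of-sequence token left in the decoded text would remain in the answer string.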
Esper 3.1 is created by Valiant Labs.
Check out our HuggingFace page to see all of our models!
We care about open source. For everyone to use.