saurabh5 committed on
Commit 9090ad1 · verified · 1 Parent(s): d7dde56

Update README.md

Files changed (1)
  1. README.md +48 -0
README.md CHANGED
@@ -150,6 +150,54 @@ Moo Moo the cow would certainly win.
 
  - reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.
  - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)
 
+ ## Inference & Recommended Settings
+ We evaluated our models with the following settings, and we recommend using them for generation:
+ - **temperature:** `0.6`
+ - **top_p:** `0.95`
+ - **max_tokens:** `32768`
+
+ ### transformers Example
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "allenai/Olmo-3-7B-Instruct-DPO"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+ )
+
+ prompt = "Who would win in a fight - a dinosaur or a cow named MooMoo?"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(
+     **inputs,
+     do_sample=True,  # enable sampling so temperature/top_p take effect
+     temperature=0.6,
+     top_p=0.95,
+     max_new_tokens=32768,
+ )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
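+ Since this is the chat-tuned DPO/Instruct checkpoint, you may get better results by formatting the prompt with the tokenizer's chat template, assuming the repository ships one. A minimal sketch reusing `tokenizer`, `model`, and `prompt` from the example above:
+ ```python
+ # Hypothetical variant of the example above: wrap the prompt as a chat message
+ # and let the tokenizer's chat template (assumed to be present) format it.
+ messages = [{"role": "user", "content": prompt}]
+ chat_inputs = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ outputs = model.generate(
+     chat_inputs,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.95,
+     max_new_tokens=32768,
+ )
+
+ # Decode only the newly generated tokens.
+ print(tokenizer.decode(outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
+ ```
+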
+ ### vllm Example
+ ```python
+ from vllm import LLM, SamplingParams
+
+ model_id = "allenai/Olmo-3-7B-Instruct-DPO"
+ llm = LLM(model=model_id)
+
+ sampling_params = SamplingParams(
+     temperature=0.6,
+     top_p=0.95,
+     max_tokens=32768,
+ )
+
+ prompt = "Who would win in a fight - a dinosaur or a cow named MooMoo?"
+
+ outputs = llm.generate(prompt, sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```
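+
+ The same settings can also be passed through vLLM's OpenAI-compatible server (started, for example, with `vllm serve allenai/Olmo-3-7B-Instruct-DPO`). A minimal sketch, assuming a server is already listening on `localhost:8000` and the `openai` client is installed:
+ ```python
+ from openai import OpenAI
+
+ # Assumes a vLLM OpenAI-compatible server is running locally; the API key is unused.
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ response = client.chat.completions.create(
+     model="allenai/Olmo-3-7B-Instruct-DPO",
+     messages=[{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named MooMoo?"}],
+     temperature=0.6,
+     top_p=0.95,
+     max_tokens=32768,
+ )
+ print(response.choices[0].message.content)
+ ```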
 
  ## Bias, Risks, and Limitations
  Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from OLMo, as from any LLM, are often inaccurate, so facts should be verified.