🐶Doge-CheckPoints
Collection
A series of checkpoint weights from which training can be continued on new datasets without spikes in the training loss.
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-320M-checkpoint", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-320M-checkpoint", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Doge uses wsd_scheduler as its training scheduler, which divides the learning-rate schedule into three stages: warmup, stable, and decay. This makes it possible to continue training on any new dataset from any checkpoint in the stable stage without spikes in the training loss.
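For reference, the three stages can be written as a simple piecewise function of the training step. The sketch below only illustrates the shape of the schedule; the step counts and learning-rate values are placeholder assumptions, not the settings used to train Doge.

# Minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# All step counts and learning-rate values below are illustrative placeholders,
# not the settings actually used to train Doge.
def wsd_lr(step, warmup_steps=2_000, stable_steps=100_000, decay_steps=10_000,
           peak_lr=8e-4, min_lr=0.0):
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant. Checkpoints saved in this
        # stage can resume training on new data without a spike.
        return peak_lr
    # Decay: anneal linearly from the peak learning rate down to min_lr.
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress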
Here are the initial learning rates required to continue training at each checkpoint:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SmallDoge/Doge-320M-checkpoint", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
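To continue pretraining from one of these checkpoints, you would resume in the stable stage at that checkpoint's initial learning rate. The sketch below is a minimal, hypothetical example: the learning rate, training text, and optimizer settings are placeholders; substitute the learning rate listed for the checkpoint you load and your own dataset.

# Hypothetical sketch of continuing training from a checkpoint on new data.
# The learning rate and training text are placeholders, not recommended values.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-320M-checkpoint", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-320M-checkpoint", trust_remote_code=True)

# Resume in the stable stage with the initial learning rate listed for this checkpoint.
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4)  # placeholder value

model.train()
text = "An example document from the new training corpus."  # placeholder data
batch = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss on the new data
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()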