Based on the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?" (arXiv:2305.07759).
A GPT-style language model with 51 million parameters, trained from scratch on the TinyStories dataset. This model generates creative short stories using vocabulary and concepts that young children can understand.
This is a decoder-only transformer built in PyTorch, implementing the GPT architecture with the following specifications:

- 8 transformer layers
- 8 attention heads
- 512-dimensional embeddings
- 256-token context window
- GPT-2 BPE vocabulary (50,257 tokens)
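The actual `GPT` class lives in the training notebook and is not reproduced here. As a reference for what the architecture above looks like, here is a minimal sketch of a decoder-only transformer matching that configuration; it uses `nn.MultiheadAttention` for brevity, so the module names almost certainly differ from the released checkpoint and it will not load the published weights as-is:

```python
import torch
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50257
    block_size: int = 256
    n_layer: int = 8
    n_head: int = 8
    n_embd: int = 512
    dropout: float = 0.0
    bias: bool = True

class Block(nn.Module):
    """Pre-norm transformer block: causal self-attention + MLP, each with a residual."""
    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = nn.MultiheadAttention(config.n_embd, config.n_head,
                                          dropout=config.dropout,
                                          bias=config.bias, batch_first=True)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias),
            nn.Dropout(config.dropout),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True above the diagonal blocks attention to future tokens.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.drop = nn.Dropout(config.dropout)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits, shape (batch, seq, vocab)
```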
The model was trained on TinyStories, a dataset of short stories generated by GPT-3.5 and GPT-4 using simple vocabulary.
| Iteration | Train Loss | Validation Loss |
|---|---|---|
| 1,000 | 5.46 | 5.46 |
| 5,000 | 3.07 | 3.07 |
| 10,000 | 2.38 | 2.39 |
| 20,000 | 1.89 | 1.92 |
| 40,000 | 1.51 | 1.57 |
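As a quick sanity check on the table above, cross-entropy loss converts to perplexity via exp(loss):

```python
import math

# Perplexity is exp(cross-entropy loss); lower means the model is less
# "surprised" by the validation text.
val_losses = {1_000: 5.46, 5_000: 3.07, 10_000: 2.39, 20_000: 1.92, 40_000: 1.57}
for it, loss in val_losses.items():
    print(f"iter {it:>6}: val loss {loss:.2f} -> perplexity {math.exp(loss):.1f}")
```

The final validation loss of 1.57 corresponds to a perplexity of about 4.8.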
```python
import torch
from dataclasses import dataclass

# Define the model configuration and architecture
# (copy the GPTConfig and GPT classes from the training notebook)

# Load the trained weights
config = GPTConfig(
    vocab_size=50257,
    block_size=256,
    n_layer=8,
    n_head=8,
    n_embd=512,
    dropout=0.0,
    bias=True,
)
model = GPT(config)
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))
model.eval()
```
```python
import tiktoken
import torch

# Initialize the GPT-2 BPE tokenizer (the same one used for training)
enc = tiktoken.get_encoding("gpt2")

# Prepare input: encode the prompt and add a batch dimension
prompt = "Once upon a time there was a"
context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

# Generate
with torch.no_grad():
    output = model.generate(context, max_new_tokens=200, temperature=0.8, top_k=40)

# Decode
generated_text = enc.decode(output.squeeze().tolist())
print(generated_text)
```
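The `generate` method itself is defined in the training notebook, not shown here. For reference, a minimal sketch of the standard temperature/top-k sampling loop such a method typically implements (the function name, defaults, and standalone form are assumptions, not the released code):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, top_k=None, block_size=256):
    # Autoregressive sampling: append one sampled token at a time,
    # feeding the growing sequence back into the model.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]         # crop to the context window
        logits = model(idx_cond)[:, -1, :]      # logits at the last position
        logits = logits / temperature           # <1.0 sharpens, >1.0 flattens
        if top_k is not None:
            # Zero out everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```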
If you use this model, please cite:
```bibtex
@misc{tinystories-gpt-512,
  author    = {Usama Asif},
  title     = {GPT-TinyStories-512: A Small Language Model for Story Generation},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/usamaasif-ua/GPT-TinyStories-512}
}
```