GPT-TinyStories-512

A GPT-style language model with 51 million parameters, trained from scratch on the TinyStories dataset. This model generates creative short stories using vocabulary and concepts that young children can understand.

Model Description

This is a decoder-only transformer model built using PyTorch, implementing the GPT architecture with the following specifications:

  • Parameters: ~51 million
  • Architecture: 8-layer transformer with 8 attention heads
  • Embedding Dimension: 512
  • Context Window: 256 tokens
  • Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  • Training Dataset: TinyStories (synthetic stories for 3-4 year olds)
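The ~51M figure can be checked directly from the configuration. The sketch below assumes nanoGPT-style GPT-2 blocks with biases, a 4x MLP expansion, and a weight-tied `lm_head` (assumptions, not confirmed by this card):

```python
# Rough parameter count for the listed configuration (assumes GPT-2-style
# blocks with biases and a weight-tied lm_head).
vocab, ctx, layers, d = 50257, 256, 8, 512

tok_emb = vocab * d                        # token embedding (shared with lm_head)
pos_emb = ctx * d                          # learned positional embedding
attn = d * 3 * d + 3 * d + d * d + d      # fused QKV + output projection, with biases
mlp = d * 4 * d + 4 * d + 4 * d * d + d   # two linear layers, 4x expansion
norms = 2 * 2 * d                          # two LayerNorms per block (weight + bias)
per_layer = attn + mlp + norms
total = tok_emb + pos_emb + layers * per_layer + 2 * d  # + final LayerNorm

print(f"{total / 1e6:.1f}M parameters")    # ≈ 51.1M
```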

Training Details

Training Data

The model was trained on TinyStories, a dataset of short stories generated by GPT-3.5 and GPT-4 using simple vocabulary.

Training Procedure

  • Optimizer: AdamW (lr=5e-4, betas=(0.9, 0.95), weight_decay=0.1)
  • Learning Rate Schedule: Linear warmup (2,000 steps) + Cosine annealing
  • Batch Size: 32 (with 32 gradient accumulation steps, effective batch size: 1,024)
  • Training Steps: 40,000 iterations
  • Mixed Precision: FP16 (with gradient scaling) or BF16 where supported
  • Hardware: Single GPU training
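The warmup-plus-cosine schedule above can be sketched as a single function. The decay floor (here one tenth of the peak rate) and decay horizon are common defaults and are assumptions, not values stated in this card:

```python
import math

MAX_LR = 5e-4        # peak learning rate from the training procedure
WARMUP_STEPS = 2_000
MAX_STEPS = 40_000
MIN_LR = MAX_LR / 10  # assumed decay floor

def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay toward MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```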

Training Results

| Iteration | Train Loss | Validation Loss |
|----------:|-----------:|----------------:|
| 1,000     | 5.46       | 5.46            |
| 5,000     | 3.07       | 3.07            |
| 10,000    | 2.38       | 2.39            |
| 20,000    | 1.89       | 1.92            |
| 40,000    | 1.51       | 1.57            |
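Since these are cross-entropy losses, perplexity is just their exponential: the final validation loss of 1.57 corresponds to a perplexity of roughly exp(1.57) ≈ 4.8.

```python
import math

# Convert the reported validation losses to perplexity: ppl = exp(loss)
for step, val_loss in [(1_000, 5.46), (10_000, 2.39), (40_000, 1.57)]:
    print(f"step {step:>6}: ppl {math.exp(val_loss):.1f}")
```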

Usage

Loading the Model

import torch
import torch.nn as nn
from dataclasses import dataclass

# Define the model configuration and architecture
# (Copy the GPT class from the training notebook)

# Load the trained weights
config = GPTConfig(
    vocab_size=50257,
    block_size=256,
    n_layer=8,
    n_head=8,
    n_embd=512,
    dropout=0.0,
    bias=True
)

model = GPT(config)
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))  # map_location avoids GPU requirement
model.eval()
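The snippet above assumes a `GPTConfig` class from the training notebook. A minimal definition matching the fields used here would look like the following (a sketch assuming a nanoGPT-style dataclass; the authoritative version lives in the notebook):

```python
from dataclasses import dataclass

# Minimal config matching the fields referenced in the loading example
# (field names and defaults are assumptions based on nanoGPT conventions).
@dataclass
class GPTConfig:
    vocab_size: int = 50257  # GPT-2 tokenizer vocabulary
    block_size: int = 256    # maximum context length
    n_layer: int = 8         # transformer blocks
    n_head: int = 8          # attention heads per block
    n_embd: int = 512        # embedding dimension
    dropout: float = 0.0
    bias: bool = True        # use biases in Linear and LayerNorm
```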

Generating Text

import tiktoken

# Initialize tokenizer
enc = tiktoken.get_encoding("gpt2")

# Prepare input
prompt = "Once upon a time there was a"
context = torch.tensor(enc.encode_ordinary(prompt), dtype=torch.long).unsqueeze(0)  # shape (1, T)

# Generate
with torch.no_grad():
    output = model.generate(context, max_new_tokens=200, temperature=0.8, top_k=40)

# Decode
generated_text = enc.decode(output.squeeze().tolist())
print(generated_text)
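The `generate` method with `temperature` and `top_k` comes from the training notebook's model class. For illustration, one sampling step of a typical top-k implementation looks like the pure-Python sketch below (an assumed reconstruction of the technique, not the model's actual code):

```python
import math
import random

def sample_top_k(logits, k=40, temperature=0.8, rng=random):
    """One decoding step: keep the k largest logits, apply temperature
    scaling, softmax over the survivors, and draw a token id."""
    scaled = [l / temperature for l in logits]
    # Indices of the k largest scaled logits
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Numerically stable softmax over the top-k candidates
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

A full `generate` loop repeats this step, appending each sampled token to the context (cropped to the 256-token window) until `max_new_tokens` have been produced.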

Limitations and Bias

  • The model is trained on synthetic data and may not reflect real-world language patterns
  • Limited to simple vocabulary suitable for young children
  • May generate repetitive or nonsensical text for longer sequences
  • No safety filtering or alignment training has been applied

Citation

If you use this model, please cite:

@misc{tinystories-gpt-512,
  author = {Usama Asif},
  title = {GPT-TinyStories-512: A Small Language Model for Story Generation},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/usamaasif-ua/GPT-TinyStories-512}
}
