TinyWay-1.1.0
TinyWay-1.1.0 is a lightweight decoder-only Transformer language model trained from scratch on limited compute. The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.
Core idea: Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.
Model Details
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: ~83M
- Layers: 10 Transformer blocks
- Hidden size: 512
- Attention heads: 8
- Context length: 256 tokens
- Activation: GELU
- Normalization: Pre-LayerNorm
- Weight tying: Token embedding ↔ LM head
- Precision during training: FP16 (AMP)
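For reference, these hyperparameters roughly map onto a stock GPT-2 configuration. The sketch below is an illustration only, not the custom config class shipped with the repo (that one is loaded via trust_remote_code in the usage example further down); fields not listed above fall back to library defaults.

from transformers import GPT2Config

# Illustrative mapping of the specs above onto transformers' GPT2Config.
# Not the shipped TinyWay configuration; unspecified fields use defaults.
ref_config = GPT2Config(
    vocab_size=50257,          # GPT-2 BPE vocabulary
    n_positions=256,           # context length
    n_embd=512,                # hidden size
    n_layer=10,                # Transformer blocks
    n_head=8,                  # attention heads
    activation_function="gelu_new",
    tie_word_embeddings=True,  # token embedding <-> LM head
)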
Training
Dataset
- TinyStoriesV2 (cleaned)
- Natural language short stories designed for training small language models
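A minimal sketch of loading a TinyStories-style corpus with the datasets library; the dataset identifier below is an assumption for illustration, so substitute whichever cleaned copy of TinyStoriesV2 was actually used.

from datasets import load_dataset

# Assumed dataset id, for illustration only; the exact source and cleaning
# pipeline behind the TinyWay training corpus may differ.
stories = load_dataset("roneneldan/TinyStories", split="train")
print(stories[0]["text"][:200])  # peek at one short story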
Tokenization
- GPT-2 BPE tokenizer
- Vocabulary size: 50,257
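Since the model reuses the stock GPT-2 BPE vocabulary, the tokenizer can be sanity-checked on its own; a minimal sketch:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # standard GPT-2 BPE tokenizer
print(tok.vocab_size)                         # 50257
print(tok("Once upon a time")["input_ids"])   # BPE token ids for the prompt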
Training Setup
- Optimizer: AdamW
- Learning rate: tuned for stable convergence
- Gradient accumulation: enabled
- Gradient clipping: enabled
- Mixed precision training (AMP)
- Training performed entirely in the Kaggle GPU environment (12-hour sessions)
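Together, these bullets describe a fairly standard FP16 training loop. Below is a minimal sketch under assumed values: the learning rate, weight decay, accumulation steps, and clip norm are placeholders (the exact settings are not listed above), and model / loader stand in for the TinyWay model and a DataLoader of tokenized batches containing input_ids and labels.

import torch
from torch.cuda.amp import GradScaler, autocast

ACCUM_STEPS = 8   # assumed gradient-accumulation factor
CLIP_NORM = 1.0   # assumed max gradient norm

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)  # placeholder values
scaler = GradScaler()

model.train()
for step, batch in enumerate(loader):
    with autocast():                            # FP16 autocast (AMP)
        loss = model(**batch).loss / ACCUM_STEPS
    scaler.scale(loss).backward()
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.unscale_(optimizer)              # unscale so clipping sees true gradient norms
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)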
Checkpoints
Checkpoints were saved at multiple training steps (5k → 30k). TinyWay-1.1.0 corresponds to the ~25k-step checkpoint, which showed the best balance of fluency and stability.
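Continuing the training-loop sketch above, periodic checkpointing of this kind could look like the following; the interval and directory layout are hypothetical, not the exact script used.

SAVE_EVERY = 5000  # assumed checkpoint interval, in optimizer steps

if step > 0 and step % SAVE_EVERY == 0:
    ckpt_dir = f"checkpoints/step-{step}"   # hypothetical path layout
    model.save_pretrained(ckpt_dir)         # weights + config
    tok.save_pretrained(ckpt_dir)           # tokenizer files

Under this layout, TinyWay-1.1.0 would correspond to something like checkpoints/step-25000.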
Example Usage
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

inputs = tok("Once upon a time", return_tensors="pt").to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,         # force a longer continuation
    do_sample=True,             # sampling, not greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,          # disable EOS stopping so generation runs to max_new_tokens
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
Sample Output
Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…
(Outputs vary due to sampling.)
Intended Use
- Educational purposes
- Research on small-scale language models
- Understanding Transformer internals
- Studying training dynamics under compute constraints
Limitations
- Not instruction-tuned
- Not aligned for factual accuracy or safety
- May produce repetitive or incoherent text at times
- Trained on a limited dataset
This model is not intended for production use or sensitive applications.
Ethical Considerations
- The model may generate fictional or incorrect information
- No explicit safety or content filtering was applied
- Users should apply downstream safeguards if deploying
Citation
If you use this model in academic or technical work, please cite:
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Shivam Sharma},
  year={2025},
}
Author
Shivam Sharma, B.Tech in Computer Science and Engineering (AIML), ITM Gwalior, India
Acknowledgements
- Hugging Face Transformers
- Kaggle GPU resources
- The open-source research community for inspiration