TinyWay-1.1.0
TinyWay-1.1.0 is a lightweight decoder-only Transformer language model trained from scratch on limited compute. The project demonstrates that meaningful language modeling behavior can emerge from modest-scale models trained in constrained environments such as Kaggle.
Core idea: Understanding LLM training mechanics end-to-end by building, training, debugging, and deploying a Transformer LM without relying on pretrained weights.
Model Details
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: ~83M
- Layers: 10 Transformer blocks
- Hidden size: 512
- Attention heads: 8
- Context length: 256 tokens
- Activation: GELU
- Normalization: Pre-LayerNorm
- Weight tying: Token embedding ↔ LM head
- Precision during training: FP16 (AMP)
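For reference, these hyperparameters roughly map onto a stock GPT-2 configuration. The sketch below is an illustration only, not the custom config class shipped with the repo (that one is loaded via trust_remote_code in the usage example further down); fields not listed above fall back to library defaults.

from transformers import GPT2Config

# Illustrative mapping of the specs above onto transformers' GPT2Config.
# Not the shipped TinyWay configuration; unspecified fields use defaults.
ref_config = GPT2Config(
    vocab_size=50257,          # GPT-2 BPE vocabulary
    n_positions=256,           # context length
    n_embd=512,                # hidden size
    n_layer=10,                # Transformer blocks
    n_head=8,                  # attention heads
    activation_function="gelu_new",
    tie_word_embeddings=True,  # token embedding <-> LM head
)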
Training
Dataset
- TinyStoriesV2 (cleaned)
- Natural language short stories designed for training small language models
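A minimal sketch of loading a TinyStories-style corpus with the datasets library; the dataset identifier below is an assumption for illustration, so substitute whichever cleaned copy of TinyStoriesV2 was actually used.

from datasets import load_dataset

# Assumed dataset id, for illustration only; the exact source and cleaning
# pipeline behind the TinyWay training corpus may differ.
stories = load_dataset("roneneldan/TinyStories", split="train")
print(stories[0]["text"][:200])  # peek at one short story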
Tokenization
- GPT-2 BPE tokenizer
- Vocabulary size: 50,257
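Since the model reuses the stock GPT-2 BPE vocabulary, the tokenizer can be sanity-checked on its own; a minimal sketch:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # standard GPT-2 BPE tokenizer
print(tok.vocab_size)                         # 50257
print(tok("Once upon a time")["input_ids"])   # BPE token ids for the prompt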
Training Setup
- Optimizer: AdamW
- Learning rate: tuned for stable convergence
- Gradient accumulation: enabled
- Gradient clipping: enabled
- Mixed precision training (AMP)
- Training performed entirely in the Kaggle GPU environment (12-hour sessions)
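Together, these bullets describe a fairly standard FP16 training loop. Below is a minimal sketch under assumed values: the learning rate, weight decay, accumulation steps, and clip norm are placeholders (the exact settings are not listed above), and model / loader stand in for the TinyWay model and a DataLoader of tokenized batches containing input_ids and labels.

import torch
from torch.cuda.amp import GradScaler, autocast

ACCUM_STEPS = 8   # assumed gradient-accumulation factor
CLIP_NORM = 1.0   # assumed max gradient norm

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)  # placeholder values
scaler = GradScaler()

model.train()
for step, batch in enumerate(loader):
    with autocast():                            # FP16 autocast (AMP)
        loss = model(**batch).loss / ACCUM_STEPS
    scaler.scale(loss).backward()
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.unscale_(optimizer)              # unscale so clipping sees true gradient norms
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)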
Checkpoints
Checkpoints were saved at multiple training steps (5k → 30k). TinyWay-1.1.0 corresponds to the ~25k-step checkpoint, which showed the best balance of fluency and stability.
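Continuing the training-loop sketch above, periodic checkpointing of this kind could look like the following; the interval and directory layout are hypothetical, not the exact script used.

SAVE_EVERY = 5000  # assumed checkpoint interval, in optimizer steps

if step > 0 and step % SAVE_EVERY == 0:
    ckpt_dir = f"checkpoints/step-{step}"   # hypothetical path layout
    model.save_pretrained(ckpt_dir)         # weights + config
    tok.save_pretrained(ckpt_dir)           # tokenizer files

Under this layout, TinyWay-1.1.0 would correspond to something like checkpoints/step-25000.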
Example Usage
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "NNEngine/TinyWay-1.1.0"

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)

inputs = tok("Once upon a time", return_tensors="pt").to(mdl.device)

out = mdl.generate(
    **inputs,
    max_new_tokens=200,         # force a longer continuation
    do_sample=True,             # sampling, not greedy decoding
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.2,
    eos_token_id=None,          # disable EOS stopping so generation runs to max_new_tokens
    pad_token_id=tok.eos_token_id,
)

print(tok.decode(out[0], skip_special_tokens=True))
Sample Output
Once upon a time, there was a little girl named Lily. She loved to play with her toys and explore the park near her home. One day, she found a shiny red ball hidden behind a tree…
(Outputs vary due to sampling.)
Intended Use
- Educational purposes
- Research on small-scale language models
- Understanding Transformer internals
- Studying training dynamics under compute constraints
Limitations
- Not instruction-tuned
- Not aligned for factual accuracy or safety
- May produce repetitive or incoherent text at times
- Trained on a limited dataset
This model is not intended for production use or sensitive applications.
Ethical Considerations
- The model may generate fictional or incorrect information
- No explicit safety or content filtering was applied
- Users should apply downstream safeguards if deploying
Citation
If you use this model in academic or technical work, please cite:
@misc{sharma2025tinyway,
  title={TinyWay: Training Decoder-Only Language Models from Scratch on Limited Compute},
  author={Shivam Sharma},
  year={2025},
}
Author
Shivam Sharma, B.Tech in Computer Science and Engineering (AIML), ITM Gwalior, India
Acknowledgements
- Hugging Face Transformers
- Kaggle GPU resources
- The open-source research community for inspiration