# GPT4-dev
This directory contains a development checkpoint of my small GPT-4-style language model saved at 78,000 training steps. The model is trained using PyTorch and the Hugging Face Transformers library.
These weights are intended for research and experimentation. You can either:
- Load them from the Hugging Face Hub (recommended), or
- Use the raw checkpoint folder (for example, `GPT4-dev-177M-1511` in this repo) with the training/inference scripts.
All core training and inference code lives in this repository (see `train.py`, `modeling_gpt4dev.py`, and `inference2.py`).
## Model description
- Architecture: GPT-style decoder-only transformer (small GPT-4-inspired configuration).
- Objective: Next-token prediction on web text (causal language modeling).
- Use cases: General text generation, experimentation, and as a base for future instruction-tuned models.
- Status: Undertrained research checkpoint – expect rough edges and occasional incoherence. Training is still ongoing, so more checkpoints will be published in the future.
- Evals: 29.03% on MMLU (more details in `EVALS.md`).
I plan to continue training and to release instruction-tuned variants based on this model in the future.
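To see the exact configuration of this checkpoint, you can inspect it directly. A minimal sketch (the field names depend on the custom config class, so printing the whole object is safer than guessing attribute names):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"

# trust_remote_code is needed because the architecture is custom
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config)  # layer count, hidden size, vocab size, etc.

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # should be roughly 177M
```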
## Training details
- Dataset: `HuggingFaceFW/fineweb`
- Steps: 78,000 of 800,000
- Frameworks: PyTorch + Hugging Face Transformers
This is not a production-ready model. It is a snapshot from an ongoing run, shared so others can tinker with it.
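The real training loop lives in `train.py`. For orientation only, here is a rough sketch of a single causal-LM training step on streamed FineWeb, assuming the custom model follows the standard Transformers interface (labels are shifted internally); the learning rate, context length, and batch handling are illustrative, not the values used for this run:

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.train()

# FineWeb is huge, so stream it instead of downloading everything up front
dataset = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # illustrative LR

for step, example in enumerate(dataset):
    batch = tokenizer(
        example["text"],
        return_tensors="pt",
        truncation=True,
        max_length=512,  # illustrative context length
    )
    # For causal LM, labels are just the input ids; the model shifts them internally
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step >= 10:  # smoke test only, not a real training run
        break
```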
## Inference
Example with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"

# trust_remote_code is required because the model has a custom architecture
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id

prompt = "He is a doctor. His main goal is"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=128,           # total length cap (prompt + generated tokens)
    temperature=0.7,          # softens the token distribution
    top_p=0.9,                # nucleus sampling
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    num_return_sequences=1,
    do_sample=True,           # sampling must be on for temperature/top_p to apply
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
You can also download the full folder and point local scripts like `inference2.py` to the checkpoint path if you prefer offline usage.
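One way to fetch the full folder is `huggingface_hub`'s `snapshot_download` (a sketch; cloning the repo with git or downloading the files manually works just as well):

```python
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the checkpoint folder once; afterwards everything loads from disk
local_dir = snapshot_download("k050506koch/GPT4-dev-177M-1511")

model = AutoModelForCausalLM.from_pretrained(local_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```

You can then pass `local_dir` (or any downloaded folder path) to the local scripts.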
## Limitations and bias
- This is an early checkpoint and can produce factual errors, nonsense, and unsafe or biased content.
## Contributing
Contributions, issues, and any kind of feedback are very welcome! I'm a student working on these models as a learning project, so there may be mistakes or "suboptimal" design choices. If you have suggestions or find bugs, please open an issue or a pull request; I'd be happy to hear from you.
## License
This model and associated code are released under the MIT License. You are free to use, modify, and build upon this work. Have fun!
## Acknowledgements
Thanks to the teams and communities that made this work possible.