# GPT4-dev
This directory contains a development checkpoint of my small GPT-4-style language model saved at 78,000 training steps. The model is trained using PyTorch and the Hugging Face Transformers library.
These weights are intended for research and experimentation. You can either:
- Load them from the Hugging Face Hub (recommended), or
- Use the raw checkpoint folder (for example, `GPT4-dev-177M-1511` in this repo) with the training/inference scripts.
All core training and inference code lives in this repository (see `train.py`, `modeling_gpt4dev.py`, and `inference2.py`).
## Model description
- Architecture: GPT-style decoder-only transformer (small GPT-4-inspired configuration).
- Objective: Next-token prediction on web text (causal language modeling).
- Use cases: General text generation, experimentation, and as a base for future instruction-tuned models.
- Status: Undertrained research checkpoint – expect rough edges and occasional incoherence. Training is still ongoing, so more checkpoints will be published in the future.
- Evals: 29.03% on MMLU (more details in `EVALS.md`).
I plan to continue training and to release instruction-tuned variants based on this model in the future.
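To see the exact configuration of this checkpoint, you can inspect it directly. A minimal sketch (the field names depend on the custom config class, so printing the whole object is safer than guessing attribute names):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"

# trust_remote_code is needed because the architecture is custom
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config)  # layer count, hidden size, vocab size, etc.

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # should be roughly 177M
```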
## Training details
- Dataset: `HuggingFaceFW/fineweb`
- Steps: 78,000 of 800,000
- Frameworks: PyTorch + Hugging Face Transformers
This is not a production-ready model. It is a snapshot from an ongoing run, shared so others can tinker with it.
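The real training loop lives in `train.py`. For orientation only, here is a rough sketch of a single causal-LM training step on streamed FineWeb, assuming the custom model follows the standard Transformers interface (labels are shifted internally); the learning rate, context length, and batch handling are illustrative, not the values used for this run:

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.train()

# FineWeb is huge, so stream it instead of downloading everything up front
dataset = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # illustrative LR

for step, example in enumerate(dataset):
    batch = tokenizer(
        example["text"],
        return_tensors="pt",
        truncation=True,
        max_length=512,  # illustrative context length
    )
    # For causal LM, labels are just the input ids; the model shifts them internally
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step >= 10:  # smoke test only, not a real training run
        break
```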
## Inference
Example with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k050506koch/GPT4-dev-177M-1511"

# trust_remote_code is required because the model has a custom architecture
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id

prompt = "He is a doctor. His main goal is"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=128,           # total length cap (prompt + generated tokens)
    temperature=0.7,          # softens the token distribution
    top_p=0.9,                # nucleus sampling
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    num_return_sequences=1,
    do_sample=True,           # sampling must be on for temperature/top_p to apply
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
You can also download the full folder and point local scripts like `inference2.py` to the checkpoint path if you prefer offline usage.
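One way to fetch the full folder is `huggingface_hub`'s `snapshot_download` (a sketch; cloning the repo with git or downloading the files manually works just as well):

```python
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the checkpoint folder once; afterwards everything loads from disk
local_dir = snapshot_download("k050506koch/GPT4-dev-177M-1511")

model = AutoModelForCausalLM.from_pretrained(local_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```

You can then pass `local_dir` (or any downloaded folder path) to the local scripts.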
## Limitations and bias
- This is an early checkpoint and can produce factual errors, nonsense, and unsafe or biased content.
## Contributing
Contributions, issues, and any kind of feedback are very welcome! I'm a student working on these models as a learning project, so there may be mistakes or "suboptimal" design choices. If you have suggestions or find bugs, please open an issue or a pull request; I'd be happy to hear from you.
## License
This model and associated code are released under the MIT License. You are free to use, modify, and build upon this work. Have fun!
## Acknowledgements
Thanks to the teams and communities that made this work possible.