Bochkov posted an update 5 days ago
Curious reproducible fact: I trained a GPT-like decoder-only Transformer where the entire input embedding table is frozen and reduced to a 16‑D binary token-ID code (0/1) — this is NOT 16-bit quantization.

Key details:
- vocab_size = 65536, n_embed = 16 (2^16 = 65536 unique IDs)
- deterministic expansion 16 → d_model=1024 via repeat_interleave (scale=64); see the sketch below
- full embedding table is published (embeddings.txt) for auditability
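
A minimal PyTorch sketch of the idea (illustrative only, not the published training code; the exact bit order and table layout are whatever is in embeddings.txt):

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 65536          # 2**16 unique token IDs
N_EMBED = 16                # one bit per embedding dimension
D_MODEL = 1024              # transformer hidden size
SCALE = D_MODEL // N_EMBED  # 64: each bit is repeated 64 times

# Build a 16-bit binary code for every token ID: row i holds the bits of i
# (bit order here is an assumption; the authoritative table is embeddings.txt).
token_ids = torch.arange(VOCAB_SIZE).unsqueeze(1)     # (65536, 1)
bit_positions = torch.arange(N_EMBED)                 # (16,)
codes = ((token_ids >> bit_positions) & 1).float()    # (65536, 16), values in {0, 1}

# Frozen embedding table: no gradients ever flow into it.
embedding = nn.Embedding.from_pretrained(codes, freeze=True)

def embed(tokens: torch.Tensor) -> torch.Tensor:
    """Map token IDs to d_model-dim inputs by deterministic expansion."""
    bits = embedding(tokens)                          # (..., 16)
    return bits.repeat_interleave(SCALE, dim=-1)      # (..., 1024)

# Example: a batch of 2 sequences of length 5.
x = torch.randint(0, VOCAB_SIZE, (2, 5))
print(embed(x).shape)  # torch.Size([2, 5, 1024])
```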

Repro note + verification script:
https://huggingface.co/blog/Bochkov/emergent-semantics-beyond-token-embeddings

Model repo:
Bochkov/emergent-semantics-model-16-bit-269m

License: Apache-2.0