Bochkov posted an update 5 days ago
Curious reproducible fact: I trained a GPT-like decoder-only Transformer where the entire input embedding table is frozen and reduced to a 16‑D binary token-ID code (0/1) — this is NOT 16-bit quantization.

Key details:
- vocab_size = 65536, n_embed = 16 (2^16 = 65536 unique IDs)
- deterministic expansion 16 → d_model=1024 via repeat_interleave (scale=64); see the sketch below
- full embedding table is published (embeddings.txt) for auditability
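
A minimal PyTorch sketch of the idea (illustrative only, not the published training code; the exact bit order and table layout are whatever is in embeddings.txt):

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 65536          # 2**16 unique token IDs
N_EMBED = 16                # one bit per embedding dimension
D_MODEL = 1024              # transformer hidden size
SCALE = D_MODEL // N_EMBED  # 64: each bit is repeated 64 times

# Build a 16-bit binary code for every token ID: row i holds the bits of i
# (bit order here is an assumption; the authoritative table is embeddings.txt).
token_ids = torch.arange(VOCAB_SIZE).unsqueeze(1)     # (65536, 1)
bit_positions = torch.arange(N_EMBED)                 # (16,)
codes = ((token_ids >> bit_positions) & 1).float()    # (65536, 16), values in {0, 1}

# Frozen embedding table: no gradients ever flow into it.
embedding = nn.Embedding.from_pretrained(codes, freeze=True)

def embed(tokens: torch.Tensor) -> torch.Tensor:
    """Map token IDs to d_model-dim inputs by deterministic expansion."""
    bits = embedding(tokens)                          # (..., 16)
    return bits.repeat_interleave(SCALE, dim=-1)      # (..., 1024)

# Example: a batch of 2 sequences of length 5.
x = torch.randint(0, VOCAB_SIZE, (2, 5))
print(embed(x).shape)  # torch.Size([2, 5, 1024])
```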

Repro note + verification script:
https://huggingface.co/blog/Bochkov/emergent-semantics-beyond-token-embeddings

Model repo:
Bochkov/emergent-semantics-model-16-bit-269m

License: Apache-2.0