Note: This model has not been instruction-tuned (no SFT); it fails to respond to questions 9 times out of 10.
Glint-1.3
⚠️ IMPORTANT NOTICE
- This model is experimental. Glint-1.3 is a 982K parameter research model.
- Performance characteristics: This model may occasionally output incoherent text. If it does, try again. It is shy.
- Not production-ready: This is a tiny neural network running on a prayer and a GPU.
Quick Stats
| Stat | Value |
|---|---|
| Parameters | 982,656 (under 1M) |
| Training Tokens | 100 Billion (FineWeb-Edu) |
| Hardware | RTX 5090 |
| Context Window | 256 tokens |
| Inference Speed | 138,562 tok/s |
| Vibe | Doing its best |
What Is This?
Glint-1.3 is the first model in the CompactAI scaling-down plan.
We spent months adding features: SPIN, DPO, sleep gates, retention, recurrent loops, LoRA, engrams. More parameters, more tricks, more complexity. And you know what? The features were hurting the models. The tiny models couldn't breathe. So we're doing the opposite now: scaling down. Strip everything. Pure Llama. See how far simplicity goes.
This is that experiment. ~1M params. No gimmicks. Just a transformer doing its best.
It runs at 138,000 tokens per second on an RTX 5090. Fun, but useless. lmao.
The Journey
The model improves monotonically over 95K training steps on 100B tokens, with Wikitext-2 cross-entropy loss dropping from 4.29 to 3.08. For a 1M parameter model, this is actually respectable.
Model Specifications
| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Llama-style) |
| Parameters | 982,656 |
| Hidden Dim | 128 |
| Layers | 4 |
| Attention Heads | 4 |
| KV Heads | 4 (GQA config, one KV head per query head, i.e. standard MHA) |
| MLP Intermediate | 384 (SwiGLU) |
| Context Length | 256 tokens |
| Vocab Size | 500 (ByteLevel BPE) |
| Normalization | RMSNorm |
| Position Encoding | RoPE |
| Embeddings | Tied input/output |
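The spec table maps fairly directly onto a standard Llama-style configuration. Below is a minimal sketch using transformers' `LlamaConfig`, purely for illustration: it is not the exact training config, and the resulting parameter count may differ slightly from the table depending on implementation details.

```python
# Illustrative sketch only: an approximate Llama-style config matching the spec table.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=500,
    hidden_size=128,
    intermediate_size=384,        # SwiGLU MLP
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=4,        # one KV head per query head
    max_position_embeddings=256,  # 256-token context window
    tie_word_embeddings=True,     # tied input/output embeddings
)
model = LlamaForCausalLM(config)

# Roughly 1M parameters; the exact count may not match the table precisely.
print(sum(p.numel() for p in model.parameters()))
```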
Benchmarks
All checkpoints evaluated on Wikitext-2, BLiMP (grammaticality), and ARC-Easy (science QA). Sliding-window log-prob scoring methodology from the CompactAI benchmark suite.
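As a rough illustration of that scoring approach, here is a minimal sliding-window log-prob sketch for multiple-choice items, assuming a Hugging Face-style causal LM. The actual CompactAI harness may differ in windowing, stride, and normalization details; all function names here are illustrative.

```python
# Minimal sketch: sliding-window log-prob scoring for multiple-choice evaluation.
import torch
import torch.nn.functional as F

def sequence_logprob(model, input_ids, window=256, stride=128):
    """Sum token log-probs of a sequence, sliding a fixed window over longer inputs."""
    total, scored = 0.0, 0  # `scored` = index of the last target token already counted
    for start in range(0, max(1, input_ids.size(1) - 1), stride):
        chunk = input_ids[:, start:start + window]
        if chunk.size(1) < 2:
            break
        with torch.no_grad():
            logits = model(chunk).logits                      # (1, T, vocab)
        logp = F.log_softmax(logits[:, :-1].float(), dim=-1)  # predict token t+1 from token t
        token_lp = logp.gather(-1, chunk[:, 1:].unsqueeze(-1)).squeeze(-1)
        new_from = max(0, scored - start)                     # skip tokens scored by earlier windows
        total += token_lp[:, new_from:].sum().item()
        scored = start + chunk.size(1) - 1
        if scored >= input_ids.size(1) - 1:
            break
    return total

def pick_answer(model, tokenizer, question, options):
    """Return the index of the option whose full sequence gets the highest log-prob."""
    ids = [tokenizer(question + " " + o, return_tensors="pt").input_ids for o in options]
    scores = [sequence_logprob(model, x) for x in ids]
    return max(range(len(options)), key=lambda i: scores[i])
```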
Per-Metric Standouts
| Metric | Best Checkpoint | Score |
|---|---|---|
| Wikitext-2 CE Loss | Step 95,000 | 3.06 |
| BLiMP Accuracy | Step 11,500 | 64.2% |
| ARC-Easy Accuracy | Step 55,500 | 32.5% |
Merged Model (Model Soup)
Weight averaging the best checkpoints per benchmark via per-parameter-group SLERP produces a model that exceeds individual bests on certain metrics:
| Model | WT Loss | BLiMP | ARC | Composite |
|---|---|---|---|---|
| Best Merged | 3.148 | 68.7% | 29.0% | 1.391 |
| Best WT (step 95367) | 3.080 | 53.7% | 25.0% | 1.431 |
| Best BLiMP (step 11500) | 3.307 | 64.2% | 22.5% | 1.480 |
| Best ARC (step 55500) | 3.128 | 50.7% | 32.5% | 1.432 |
The merged model achieves superadditive BLiMP gains (+4.5% over the individual best checkpoint) through spherical interpolation of attention and MLP weights at different blend factors.
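For intuition, here is a minimal sketch of what per-parameter-group SLERP merging can look like, assuming two checkpoints with identical state-dict keys. The actual merge script, grouping rules, and blend factors behind the table above are not reproduced here; everything below is illustrative.

```python
# Minimal sketch: per-parameter-group SLERP merge of two state dicts.
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two weight tensors, flattened to vectors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_n = a_flat / (a_flat.norm() + eps)
    b_n = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(a_n @ b_n, -1.0, 1.0)
    omega = torch.acos(dot)
    if omega < eps:  # nearly parallel directions: fall back to linear interpolation
        out = (1 - t) * a_flat + t * b_flat
    else:
        out = (torch.sin((1 - t) * omega) * a_flat + torch.sin(t * omega) * b_flat) / torch.sin(omega)
    return out.reshape(a.shape).to(a.dtype)

def merge_checkpoints(sd_a: dict, sd_b: dict, t_attn: float = 0.5, t_mlp: float = 0.5) -> dict:
    """Blend attention and MLP groups at separate factors; copy remaining weights from sd_a."""
    merged = {}
    for name, wa in sd_a.items():
        wb = sd_b[name]
        if "attn" in name or "attention" in name:
            merged[name] = slerp(wa, wb, t_attn)
        elif "mlp" in name:
            merged[name] = slerp(wa, wb, t_mlp)
        else:
            merged[name] = wa.clone()
    return merged
```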
Training Details
| Parameter | Value |
|---|---|
| Dataset | FineWeb-Edu (sample-10BT) |
| Batch Size | 4,096 (gradient accumulation 1) |
| Sequence Length | 256 |
| Learning Rate | 8e-4 (cosine decay, 200 step warmup) |
| Weight Decay | 0.05 |
| Max Grad Norm | 0.5 |
| Optimizer | AdamW (fused, β₁=0.9, β₂=0.95) |
| Precision | bfloat16 |
| Hardware | NVIDIA RTX 5090 (throughout) |
| Training Time | ~30 hours for 95K steps |
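As a rough sketch, the optimizer and schedule rows above correspond to something like the following PyTorch setup. The actual training script is not shown here, so names and defaults are illustrative.

```python
# Illustrative sketch of the optimizer and LR schedule described in the table.
import math
import torch

def build_optimizer_and_scheduler(model, total_steps=95_000, warmup_steps=200,
                                  peak_lr=8e-4, weight_decay=0.05):
    # fused=True expects model parameters to live on a CUDA device.
    optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr,
                                  betas=(0.9, 0.95), weight_decay=weight_decay,
                                  fused=True)

    def lr_lambda(step):
        # Linear warmup for the first 200 steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In the training loop itself, gradients would also be clipped to the max norm of 0.5 from the table (e.g. `torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)`) before each optimizer step.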
Usage
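A minimal usage sketch, assuming the checkpoint is published as a Hugging Face Llama-style causal LM with its tokenizer. The repo id below is a placeholder, not a confirmed path.

```python
# Minimal usage sketch; the repo id is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CompactAI-O/Glint-1.3"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The sun is", return_tensors="pt")
# Keep generations short: the context window is only 256 tokens.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```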
Limitations
- Context window: 256 tokens severely limits long-range dependencies
- Knowledge: Extremely limited world knowledge due to parameter constraints
- Coherence: May lose track of topic after a few sentences
- Repetition: Tends toward repetitive patterns at higher temperatures
- Reliability: Not suitable for any production application
- Purpose: Research, education, and architectural experimentation
Related Models
- Glint-1.3 – 1M params, instruction-tuned, our other scaling-down experiment
- Shard-1 – 54.5M params, Gemma-4 attention
- TMLM-Haiku-2.3 – 1M params, 10B tokens, SPIN-optimized (pre-scaling-down era)
Citation
@misc{tinylm1m,
author = {CompactAI},
title = {Glint-1.3: a 982K parameter Llama-style transformer},
year = {2026},
publisher = {GitHub},
url = {https://github.com/CompactAI-O/TinyLM}
}
Built by CompactAI. Small models trying their best since 2026.
