Haipai-micro
Haipai-micro is a "Micro-LLM" designed to test the limits of parameter efficiency. Despite having only 55 million parameters (roughly half the size of GPT-2 Small), it achieves surprising performance on common-sense and reasoning benchmarks thanks to a high-density dataset mix.
This is a Base Model. It is not instruction-tuned.
Model Details
- Architecture: Custom (haipai)
- Parameters: ~55M
- Context Length: 1024
- Hidden Size: 512
- Layers: 12
- Attention Heads: 8
- KV Heads: 4 (GQA - Grouped Query Attention)
- Embedding Size: 256 (Factorized Embeddings)
- Training Tokens: ~10 Billion
Architecture Features
Haipai uses a modern architecture inspired by Llama 3 and PaLM, scaled down for efficiency:
- RoPE (Rotary Positional Embeddings): For better long-context handling.
- SwiGLU Activation: For higher capacity per parameter than GeLU.
- RMSNorm: For training stability.
- GQA (Grouped Query Attention): 8 Query heads sharing 4 Key/Value heads for faster inference.
- Factorized Embeddings: Decouples the embedding size (256) from the hidden size (512), spending more of the parameter budget on the transformer layers while keeping the model small (see the sketch below).
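The two less common choices, factorized embeddings and GQA, are sketched below in PyTorch. This is an illustrative sketch only: the class and argument names (FactorizedEmbedding, n_heads, n_kv_heads, and so on) are placeholders and are not taken from the haipai source code.

import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    # Token embedding at the small width (256), projected up to the hidden size (512).
    # Parameter cost: vocab_size * 256 + 256 * 512 instead of vocab_size * 512.
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, input_ids):
        return self.proj(self.embed(input_ids))

def expand_kv_for_gqa(k, v, n_heads=8, n_kv_heads=4):
    # k, v: (batch, n_kv_heads, seq_len, head_dim).
    # Repeat each K/V head so that 4 KV heads serve 8 query heads (2 queries per KV head),
    # which shrinks the KV cache while keeping all 8 query heads.
    repeats = n_heads // n_kv_heads
    return k.repeat_interleave(repeats, dim=1), v.repeat_interleave(repeats, dim=1)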
Evaluation (Zero-Shot)
Despite its tiny size, Haipai-55M significantly outperforms random baselines on knowledge and common-sense tasks.
| Benchmark | Haipai-55M Score | Random Chance |
|---|---|---|
| ARC-Easy | 45.88% | 25.0% |
| PIQA | 61.04% | 50.0% |
| COPA | 62.00% | 50.0% |
| Winogrande | 53.35% | 50.0% |
| Lambada (OpenAI) | 27.25% | ~0.0% |
Note: Benchmarks were run with lm-evaluation-harness in a zero-shot setting.
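The exact harness invocation is not included in this card; the snippet below is only a sketch of how a comparable zero-shot run could be reproduced. It assumes a recent lm-evaluation-harness (v0.4 or later, installed via pip install lm-eval); the task names and the simple_evaluate call are that library's conventions, not something shipped with this model.

import lm_eval

# Zero-shot evaluation on the five benchmarks reported above (assumed task names).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rocky1410/haipai-micro,trust_remote_code=True",
    tasks=["arc_easy", "piqa", "copa", "winogrande", "lambada_openai"],
    num_fewshot=0,
)
print(results["results"])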
How to Use
Since this model uses a custom architecture, you must enable trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rocky1410/haipai-micro"

# 1. Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="auto",
)

# 2. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 3. Generate
prompt = "Once upon a time in a digital world,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_k=50,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
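As a quick sanity check after loading, the ~55M parameter figure quoted above can be confirmed directly from the loaded model:

# 4. (Optional) Verify the parameter count (~55M)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")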
Limitations
- Size: At 55M parameters, this model functions more like a "smart autocomplete" than a reasoning engine. It struggles with complex, multi-step instructions.
- Knowledge: While it knows basic facts (from Cosmopedia), it does not have the encyclopedic knowledge of a 7B model.
- Hallucinations: As a base model, it may hallucinate facts or continue generating text indefinitely if not stopped.
Citation
If you use this model or architecture, please attribute it to rocky1410/haipai-micro.