Haipai-micro

Haipai-micro is a "Micro-LLM" designed to test the limits of parameter efficiency. Despite having only 55 million parameters (roughly half the size of GPT-2 Small), it achieves surprisingly strong performance on common-sense and reasoning benchmarks by training on a high-density dataset mix.

This is a Base Model. It is not instruction-tuned.

Model Details

  • Architecture: Custom haipai
  • Parameters: ~55M
  • Context Length: 1024
  • Hidden Size: 512
  • Layers: 12
  • Attention Heads: 8
  • KV Heads: 4 (GQA - Grouped Query Attention)
  • Embedding Size: 256 (Factorized Embeddings)
  • Training Tokens: ~10 Billion
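
Collected as a config sketch for quick reference (field names are illustrative, not necessarily those used by the actual haipai code; the vocabulary size is not listed on this card and is marked as an assumption):

from dataclasses import dataclass

@dataclass
class HaipaiMicroConfig:
    context_length: int = 1024
    embed_dim: int = 256      # factorized embedding size
    hidden_dim: int = 512
    num_layers: int = 12
    num_heads: int = 8        # query heads
    num_kv_heads: int = 4     # GQA key/value heads
    vocab_size: int = 32_000  # assumption: not stated on this model card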

Architecture Features

Haipai uses a modern architecture inspired by Llama 3 and PaLM, scaled down for efficiency:

  • RoPE (Rotary Positional Embeddings): For better long-context handling.
  • SwiGLU Activation: For higher capacity per parameter than GeLU.
  • RMSNorm: For training stability.
  • GQA (Grouped Query Attention): 8 query heads share 4 key/value heads, shrinking the KV cache and speeding up inference.
  • Factorized Embeddings: The embedding size (256) is decoupled from the hidden size (512), spending more of the parameter budget on the transformer layers while keeping the model small.
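
As a rough illustration of how three of these pieces fit together, here is a minimal PyTorch sketch. Module names and the FFN width are illustrative assumptions, not taken from the actual haipai code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedEmbedding(nn.Module):
    """Embed tokens in a small space (256), then project up to the hidden size (512)."""
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(input_ids))

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x)). The width 1408 is a guess."""
    def __init__(self, hidden_dim: int = 512, ffn_dim: int = 1408):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.up = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.down = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

def repeat_kv(x: torch.Tensor, n_rep: int = 2) -> torch.Tensor:
    """GQA: expand 4 K/V heads to match 8 query heads by repeating each twice.
    x has shape (batch, num_kv_heads, seq_len, head_dim)."""
    b, kv, t, d = x.shape
    return x[:, :, None, :, :].expand(b, kv, n_rep, t, d).reshape(b, kv * n_rep, t, d)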

Evaluation (Zero-Shot)

Despite its tiny size, Haipai-55M significantly outperforms random baselines on knowledge and common-sense tasks.

Benchmark          Haipai-55M   Random Chance
ARC-Easy           45.88%       25.0%
PIQA               61.04%       50.0%
COPA               62.00%       50.0%
Winogrande         53.35%       50.0%
LAMBADA (OpenAI)   27.25%       ~0.0%

Note: Benchmarks were run with lm-evaluation-harness in a zero-shot setting.
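
The runs above should be reproducible with the harness's Python entry point, roughly as follows (a sketch: task names assume the harness's standard task registry, and trust_remote_code is needed for the custom architecture):

import lm_eval

# Zero-shot evaluation matching the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rocky1410/haipai-micro,trust_remote_code=True",
    tasks=["arc_easy", "piqa", "copa", "winogrande", "lambada_openai"],
    num_fewshot=0,
)
print(results["results"])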

How to Use

Since this model uses a custom architecture, you must pass trust_remote_code=True when loading it.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rocky1410/haipai-micro"

# 1. Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="auto"
)

# 2. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 3. Generate
prompt = "Once upon a time in a digital world,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.7, 
    top_k=50
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Limitations

  • Size: At 55M parameters, this model functions more like a "smart autocomplete" than a reasoning engine. It struggles with complex, multi-step instructions.
  • Knowledge: While it knows basic facts (from Cosmopedia), it does not have the encyclopedic knowledge of a 7B model.
  • Hallucinations: As a base model, it may hallucinate facts or continue generating text indefinitely if not stopped (see the snippet below for one way to bound generation).
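
For instance, generation can be bounded by max_new_tokens and, if the tokenizer defines one, an explicit EOS token. This is a usage sketch reusing model, inputs, and tokenizer from the example above; it assumes, but does not guarantee, that this tokenizer sets eos_token_id:

# Cap output length; stop early at EOS if the tokenizer defines one.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,  # may be None; then only the length cap applies
    do_sample=True,
    temperature=0.7,
)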

Citation

If you use this model or architecture, please attribute it to rocky1410/haipai-micro.
