Haipai-micro

Haipai-micro is a "Micro-LLM" designed to test the limits of parameter efficiency. Despite having only 55 million parameters (roughly half the size of GPT-2 Small), it achieves surprisingly strong performance on common-sense and reasoning benchmarks by training on a high-density dataset mix.

This is a Base Model. It is not instruction-tuned.

Model Details

  • Architecture: Custom haipai
  • Parameters: ~55M
  • Context Length: 1024
  • Hidden Size: 512
  • Layers: 12
  • Attention Heads: 8
  • KV Heads: 4 (GQA - Grouped Query Attention)
  • Embedding Size: 256 (Factorized Embeddings)
  • Training Tokens: ~10 Billion
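
Collected as a config sketch for quick reference (field names are illustrative, not necessarily those used by the actual haipai code; the vocabulary size is not listed on this card and is marked as an assumption):

from dataclasses import dataclass

@dataclass
class HaipaiMicroConfig:
    context_length: int = 1024
    embed_dim: int = 256      # factorized embedding size
    hidden_dim: int = 512
    num_layers: int = 12
    num_heads: int = 8        # query heads
    num_kv_heads: int = 4     # GQA key/value heads
    vocab_size: int = 32_000  # assumption: not stated on this model card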

Architecture Features

Haipai uses a modern architecture inspired by Llama 3 and PaLM, scaled down for efficiency:

  • RoPE (Rotary Positional Embeddings): For better long-context handling.
  • SwiGLU Activation: For higher capacity per parameter than GeLU.
  • RMSNorm: For training stability.
  • GQA (Grouped Query Attention): 8 query heads share 4 key/value heads, shrinking the KV cache and speeding up inference.
  • Factorized Embeddings: The embedding size (256) is decoupled from the hidden size (512), spending more of the parameter budget on the transformer layers while keeping the model small.
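
As a rough illustration of how three of these pieces fit together, here is a minimal PyTorch sketch. Module names and the FFN width are illustrative assumptions, not taken from the actual haipai code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedEmbedding(nn.Module):
    """Embed tokens in a small space (256), then project up to the hidden size (512)."""
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(input_ids))

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x)). The width 1408 is a guess."""
    def __init__(self, hidden_dim: int = 512, ffn_dim: int = 1408):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.up = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.down = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

def repeat_kv(x: torch.Tensor, n_rep: int = 2) -> torch.Tensor:
    """GQA: expand 4 K/V heads to match 8 query heads by repeating each twice.
    x has shape (batch, num_kv_heads, seq_len, head_dim)."""
    b, kv, t, d = x.shape
    return x[:, :, None, :, :].expand(b, kv, n_rep, t, d).reshape(b, kv * n_rep, t, d)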

Evaluation (Zero-Shot)

Despite its tiny size, Haipai-55M significantly outperforms random baselines on knowledge and common-sense tasks.

Benchmark          Haipai-55M   Random Chance
ARC-Easy           45.88%       25.0%
PIQA               61.04%       50.0%
COPA               62.00%       50.0%
Winogrande         53.35%       50.0%
LAMBADA (OpenAI)   27.25%       ~0.0%

Note: Benchmarks were run with lm-evaluation-harness in a zero-shot setting.
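
The runs above should be reproducible with the harness's Python entry point, roughly as follows (a sketch: task names assume the harness's standard task registry, and trust_remote_code is needed for the custom architecture):

import lm_eval

# Zero-shot evaluation matching the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rocky1410/haipai-micro,trust_remote_code=True",
    tasks=["arc_easy", "piqa", "copa", "winogrande", "lambada_openai"],
    num_fewshot=0,
)
print(results["results"])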

How to Use

Since this model uses a custom architecture, you must pass trust_remote_code=True when loading it.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rocky1410/haipai-micro"

# 1. Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="auto"
)

# 2. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 3. Generate
prompt = "Once upon a time in a digital world,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.7, 
    top_k=50
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Limitations

  • Size: At 55M parameters, this model functions more like a "smart autocomplete" than a reasoning engine. It struggles with complex, multi-step instructions.
  • Knowledge: While it knows basic facts (from Cosmopedia), it does not have the encyclopedic knowledge of a 7B model.
  • Hallucinations: As a base model, it may hallucinate facts or continue generating text indefinitely if not stopped (see the snippet below for one way to bound generation).
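
For instance, generation can be bounded by max_new_tokens and, if the tokenizer defines one, an explicit EOS token. This is a usage sketch reusing model, inputs, and tokenizer from the example above; it assumes, but does not guarantee, that this tokenizer sets eos_token_id:

# Cap output length; stop early at EOS if the tokenizer defines one.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,  # may be None; then only the length cap applies
    do_sample=True,
    temperature=0.7,
)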

Citation

If you use this model or architecture, please attribute it to rocky1410/haipai-micro.
