# LaViDA Variant A Seed-0 A_2000
Strong GRPO-only baseline for the Oracle Phase-3 seed-0 matrix.
This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.
## Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Math-7B |
| Adaptation | LoRA adapters, rank 64 on linear layers |
| Training algorithm | GRPO |
| Variant label | A_2000 |
| Loss mode | none / GRPO-only |
| Auxiliary weight alpha | 0.0 |
| Expert pool / data | No auxiliary expert pool used in the loss. |
| Training steps | 2,000 RL optimizer steps |
| Evaluation prompt path | base-model cot-4shot |
| W&B run id | xqboonlw |
| W&B project | lavida-mvm |
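The adapter configuration implied by the table can be sketched with PEFT as below. Only rank 64 on linear layers is stated above; the `target_modules` list, `lora_alpha`, and dropout are illustrative assumptions, not the recorded training config.

```python
from peft import LoraConfig

# Sketch only: rank-64 LoRA on the linear layers, per the table above.
# lora_alpha, lora_dropout, and the exact target_modules are assumed values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,      # assumed (a common 2*r choice)
    lora_dropout=0.05,   # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```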
## Training Method
GRPO-only control; no latent auxiliary loss.
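For orientation, a minimal sketch of the group-relative advantage at the heart of GRPO is shown below. It is illustrative, not the training code: a real run also applies the clipped policy-ratio objective and a KL penalty on top of these advantages.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each prompt's sampled rewards
    by the within-group mean and std.

    rewards: [num_prompts, group_size] binary exact-match rewards.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four samples each; 1.0 = exact match against the reference answer.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```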
Shared setup:
- Binary exact-match reward using the Qwen2.5-Math evaluation stack.
- Group sampling with GRPO on hard mathematical prompts.
- Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
- Feature vector `psi = [h_start || h_end || h_mean || delta_H]` in `R^14336` for LaViDA variants (see the sketch after this list).
- Frozen VAE latent dimension `256` for auxiliary branches.
- Maximum completion length 3072 tokens.
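A hedged sketch of one way to compute the `psi` feature above: average the hidden states of the last 4 transformer layers, then pool over the sequence. Both the averaging strategy and the definition `delta_H = h_end - h_start` are assumptions for illustration; the LaViDA code may pool differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Math-7B"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
model.eval()

@torch.no_grad()
def extract_psi(text: str) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True)
    # hidden_states: (num_layers + 1) tensors of shape [1, seq_len, 3584].
    # Average the last 4 layers (assumed), then pool over the sequence.
    h = torch.stack(out.hidden_states[-4:]).mean(dim=0)[0]  # [seq_len, 3584]
    h_start, h_end, h_mean = h[0], h[-1], h.mean(dim=0)
    delta_h = h_end - h_start  # assumed definition of delta_H
    return torch.cat([h_start, h_end, h_mean, delta_h])  # [14336] = 4 * 3584

psi = extract_psi("Find the sum of the first 100 positive integers.")
```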
## Seed-0 MATH-500 Results
| Metric | Value |
|---|---|
| Greedy overall (T=0) | 76.2% |
| n=8 mean correctness (T=0.6) | 74.88% |
| pass@8 | 75.8% |
| L4-5 pass@8 | 63.74% |
| Level-5 pass@8 | 52.24% |
Interpretation: This is the matched strong RL baseline that `D_OracleAug` is compared against.
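The two n=8 metrics above follow the usual definitions, illustrated below on placeholder data (not the real evaluation outputs): mean correctness averages per-sample accuracy, while pass@8 counts a problem as solved if any of its 8 samples is correct.

```python
import numpy as np

# Placeholder correctness matrix: correct[i, j] = 1 if sample j for
# problem i exactly matches the reference answer. Real values come from
# the Qwen2.5-Math evaluation stack, not this random stand-in.
rng = np.random.default_rng(0)
correct = rng.random((500, 8)) < 0.75

mean_correctness = correct.mean()        # average per-sample accuracy
pass_at_8 = correct.any(axis=1).mean()   # fraction with >= 1 correct sample
```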
## Related Data
- No auxiliary dataset is used by the GRPO-only loss.
Related model repos:
- `Pritish92/lavida-variant-A-seed0-a-2000` (this repo)
- `Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2`
- `Pritish92/lavida-variant-D-seed0-oracleaug-alpha0p001`
- `Pritish92/lavida-variant-B-seed0-selfdistill-alpha0p02`
## How To Use
This checkpoint is a LoRA adapter and is meant to be loaded on top of the base model with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-A-seed0-a-2000"

# Load the tokenizer and frozen base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```
Use the same base-model cot-4shot evaluation path used in the LaViDA experiments for comparable MATH-500 numbers.
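A minimal greedy-decoding smoke test, assuming the `model` and `tokenizer` from the snippet above and reusing the 3072-token completion cap from training. The table numbers require the full 4-shot CoT MATH-500 evaluation, not this single toy prompt:

```python
import torch

prompt = "Problem: Compute 7 * 8 + 12.\nSolution:"  # toy prompt, not the cot-4shot template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3072, do_sample=False)  # greedy, i.e. T=0
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```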
## Limitations
- This is a seed-0 research checkpoint; the main `A_2000` vs `D_OracleAug` replication target is still seed 1.
- Results are currently for MATH-500 only in the locked public ledger.
- The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
- Oracle-generated traces are machine-generated and filtered, not human process annotations.
- The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (`D_OracleAug`).
## Citation
```bibtex
@misc{saha2026lavidaa2000,
  title  = {LaViDA Variant A Seed-0 A_2000},
  author = {Saha, Pritish},
  year   = {2026},
  url    = {https://huggingface.co/Pritish92/lavida-variant-A-seed0-a-2000}
}
```