LaViDA Variant A Seed-0 A_2000

Strong GRPO-only baseline for the Oracle Phase-3 seed-0 matrix.

This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.

Model Details

Field                    Value
Base model               Qwen/Qwen2.5-Math-7B
Adaptation               LoRA adapters, rank 64 on linear layers
Training algorithm       GRPO
Variant label            A_2000
Loss mode                none / GRPO-only
Auxiliary weight alpha   0.0
Expert pool / data       no auxiliary expert pool used in the loss
Training steps           2,000 RL optimizer steps
Evaluation prompt path   base-model cot-4shot
W&B run id               xqboonlw
W&B project              lavida-mvm

Training Method

GRPO-only control; no latent auxiliary loss.
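As a minimal illustration of GRPO's critic-free baseline, the group-relative advantage can be sketched as follows. This is a hypothetical sketch, not the actual training code; the `eps` stabilizer and the exact normalization details are assumptions.

```python
from statistics import fmean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    # Normalize each sampled completion's reward by its group's mean and
    # (population) std, giving a per-completion advantage with no learned critic.
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary exact-match rewards, correct completions in a mixed group get positive advantage and incorrect ones negative; an all-correct or all-wrong group yields near-zero advantages and contributes no gradient signal.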

Shared setup:

  • Binary exact-match reward using the Qwen2.5-Math evaluation stack.
  • Group sampling with GRPO on hard mathematical prompts.
  • Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
  • Feature vector psi = [h_start || h_end || h_mean || delta_H] in R^14336 for LaViDA variants.
  • Frozen VAE latent dimension 256 for auxiliary branches.
  • Maximum completion length 3072 tokens.
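The psi feature vector above can be sketched as below. This is a hypothetical reconstruction: it assumes delta_H = h_end - h_start, that h_mean is a token-wise mean pool, and that the last-4-layer features have already been combined into one vector per token; none of these details are confirmed by the card.

```python
from statistics import fmean

HIDDEN = 3584  # Qwen2.5-Math-7B hidden size; 4 * 3584 = 14336

def build_psi(hidden_states):
    # hidden_states: list of per-token hidden vectors (each of length HIDDEN)
    # from a frozen base-model forward pass.
    h_start, h_end = hidden_states[0], hidden_states[-1]
    h_mean = [fmean(dim_values) for dim_values in zip(*hidden_states)]
    delta_h = [e - s for s, e in zip(h_start, h_end)]  # assumed definition of delta_H
    return h_start + h_end + h_mean + delta_h  # psi in R^(4 * HIDDEN)
```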

Seed-0 MATH-500 Results

Metric                         Value
Greedy overall (T=0)           76.2%
n=8 mean correctness (T=0.6)   74.88%
pass@8                         75.8%
Levels 4-5 pass@8              63.74%
Level-5 pass@8                 52.24%

Interpretation: This is the matched strong RL baseline that D_OracleAug is compared against.
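For reference, pass@8 follows the standard unbiased pass@k estimator; with n = k = 8 samples it reduces to the fraction of problems where at least one sample is correct. A sketch (the actual numbers come from the Qwen2.5-Math evaluation stack):

```python
from math import comb

def pass_at_k(n, c, k):
    # Unbiased pass@k: probability that at least one of k samples, drawn
    # without replacement from n generations of which c are correct, is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```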

Related Data

  • No auxiliary dataset is used by the GRPO-only loss.

Related model repos:

How To Use

This checkpoint is a LoRA adapter and is meant to be loaded on top of the base model via PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-A-seed0-a-2000"

# Load the tokenizer and frozen base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

Use the same base-model cot-4shot evaluation path used in the LaViDA experiments for comparable MATH-500 numbers.

Limitations

  • This is a seed-0 research checkpoint; the main A_2000 vs D_OracleAug replication target is still seed 1.
  • Results are currently for MATH-500 only in the locked public ledger.
  • The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
  • Oracle-generated traces are machine-generated and filtered, not human process annotations.
  • The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (D_OracleAug).

Citation

@misc{saha2026lavidaa2000,
  title  = {LaViDA Variant A Seed-0 A_2000},
  author = {Saha, Pritish},
  year   = {2026},
  url    = {https://huggingface.co/Pritish92/lavida-variant-A-seed0-a-2000}
}