# LaViDA Variant A Seed-0 A_2000
Strong GRPO-only baseline for the Oracle Phase-3 seed-0 matrix.
This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.
## Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Math-7B |
| Adaptation | LoRA adapters, rank 64 on linear layers |
| Training algorithm | GRPO |
| Variant label | A_2000 |
| Loss mode | none / GRPO-only |
| Auxiliary weight alpha | 0.0 |
| Expert pool / data | No auxiliary expert pool used in the loss. |
| Training steps | 2,000 RL optimizer steps |
| Evaluation prompt path | base-model cot-4shot |
| W&B run id | xqboonlw |
| W&B project | lavida-mvm |
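The adapter configuration implied by the table can be sketched with PEFT as below. Only rank 64 on linear layers is stated above; the `target_modules` list, `lora_alpha`, and dropout are illustrative assumptions, not the recorded training config.

```python
from peft import LoraConfig

# Sketch only: rank-64 LoRA on the linear layers, per the table above.
# lora_alpha, lora_dropout, and the exact target_modules are assumed values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,      # assumed (a common 2*r choice)
    lora_dropout=0.05,   # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```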
## Training Method
GRPO-only control; no latent auxiliary loss.
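For orientation, a minimal sketch of the group-relative advantage at the heart of GRPO is shown below. It is illustrative, not the training code: a real run also applies the clipped policy-ratio objective and a KL penalty on top of these advantages.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each prompt's sampled rewards
    by the within-group mean and std.

    rewards: [num_prompts, group_size] binary exact-match rewards.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four samples each; 1.0 = exact match against the reference answer.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```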
Shared setup:
- Binary exact-match reward using the Qwen2.5-Math evaluation stack.
- Group sampling with GRPO on hard mathematical prompts.
- Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
- Feature vector `psi = [h_start || h_end || h_mean || delta_H]` in `R^14336` for LaViDA variants (see the sketch after this list).
- Frozen VAE latent dimension `256` for auxiliary branches.
- Maximum completion length 3072 tokens.
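A hedged sketch of one way to compute the `psi` feature above: average the hidden states of the last 4 transformer layers, then pool over the sequence. Both the averaging strategy and the definition `delta_H = h_end - h_start` are assumptions for illustration; the LaViDA code may pool differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Math-7B"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
model.eval()

@torch.no_grad()
def extract_psi(text: str) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True)
    # hidden_states: (num_layers + 1) tensors of shape [1, seq_len, 3584].
    # Average the last 4 layers (assumed), then pool over the sequence.
    h = torch.stack(out.hidden_states[-4:]).mean(dim=0)[0]  # [seq_len, 3584]
    h_start, h_end, h_mean = h[0], h[-1], h.mean(dim=0)
    delta_h = h_end - h_start  # assumed definition of delta_H
    return torch.cat([h_start, h_end, h_mean, delta_h])  # [14336] = 4 * 3584

psi = extract_psi("Find the sum of the first 100 positive integers.")
```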
## Seed-0 MATH-500 Results
| Metric | Value |
|---|---|
| Greedy overall (T=0) | 76.2% |
| n=8 mean correctness (T=0.6) | 74.88% |
| pass@8 | 75.8% |
| L4-5 pass@8 | 63.74% |
| Level-5 pass@8 | 52.24% |
Interpretation: This is the matched strong RL baseline that `D_OracleAug` is compared against.
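The two n=8 metrics above follow the usual definitions, illustrated below on placeholder data (not the real evaluation outputs): mean correctness averages per-sample accuracy, while pass@8 counts a problem as solved if any of its 8 samples is correct.

```python
import numpy as np

# Placeholder correctness matrix: correct[i, j] = 1 if sample j for
# problem i exactly matches the reference answer. Real values come from
# the Qwen2.5-Math evaluation stack, not this random stand-in.
rng = np.random.default_rng(0)
correct = rng.random((500, 8)) < 0.75

mean_correctness = correct.mean()        # average per-sample accuracy
pass_at_8 = correct.any(axis=1).mean()   # fraction with >= 1 correct sample
```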
## Related Data
- No auxiliary dataset is used by the GRPO-only loss.
Related model repos:
- `Pritish92/lavida-variant-A-seed0-a-2000` (this repo)
- `Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2`
- `Pritish92/lavida-variant-D-seed0-oracleaug-alpha0p001`
- `Pritish92/lavida-variant-B-seed0-selfdistill-alpha0p02`
## How To Use
This checkpoint is a LoRA adapter and is meant to be loaded on top of the base model with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-A-seed0-a-2000"

# Load the tokenizer and frozen base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```
Use the same base-model cot-4shot evaluation path used in the LaViDA experiments for comparable MATH-500 numbers.
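A minimal greedy-decoding smoke test, assuming the `model` and `tokenizer` from the snippet above and reusing the 3072-token completion cap from training. The table numbers require the full 4-shot CoT MATH-500 evaluation, not this single toy prompt:

```python
import torch

prompt = "Problem: Compute 7 * 8 + 12.\nSolution:"  # toy prompt, not the cot-4shot template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3072, do_sample=False)  # greedy, i.e. T=0
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```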
## Limitations
- This is a seed-0 research checkpoint; the main `A_2000` vs `D_OracleAug` replication target is still seed 1.
- Results are currently for MATH-500 only in the locked public ledger.
- The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
- Oracle-generated traces are machine-generated and filtered, not human process annotations.
- The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (`D_OracleAug`).
## Citation
```bibtex
@misc{saha2026lavidaa2000,
  title  = {LaViDA Variant A Seed-0 A_2000},
  author = {Saha, Pritish},
  year   = {2026},
  url    = {https://huggingface.co/Pritish92/lavida-variant-A-seed0-a-2000}
}
```