SAGE V2 Topology Policy

- Model: nvidia/Nemotron-Orchestrator-8B (Qwen3, 8.19B params)
- Training: DAPO via verl 0.7.1 on 2x H100 NVL 94GB
- Framework: YGN-SAGE (Self-Adaptive Generation Engine)

Key Result: SAGE +27pp on MASBENCH depth

Presented as the first empirical evidence that a multi-agent topology improves performance on a recognized benchmark: accuracy on the MASBENCH depth axis rises from 40% to 67%.

| Axis | Bare Model | SAGE | Delta |
|---|---|---|---|
| depth | 40% | 67% | +27pp |
| horizon | 0% | pending | HIGH potential |
| robustness | 0% | pending | HIGH potential |
| breadth | 60% | 40% | -20pp (overhead) |
| parallel | 60% | 40% | -20pp (overhead) |

Insight: Topology helps when base accuracy < 60% (aligns with AdaptOrch, arXiv 2602.16873).
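The arithmetic behind the Delta column and the 60% heuristic can be sketched as follows. This is a minimal illustration, not code from the SAGE repository: the numbers are copied from the results above, while `delta_pp` and `should_use_topology` are hypothetical helper names introduced here.

```python
# Results copied from the table above as (bare %, SAGE %); None marks a pending run.
RESULTS = {
    "depth": (40, 67),
    "horizon": (0, None),
    "robustness": (0, None),
    "breadth": (60, 40),
    "parallel": (60, 40),
}

THRESHOLD_PCT = 60  # from the stated insight: topology helps below 60% base accuracy


def delta_pp(bare, sage):
    """Return SAGE minus bare in percentage points, or None if the run is pending."""
    return None if sage is None else sage - bare


def should_use_topology(bare_pct, threshold=THRESHOLD_PCT):
    """Hypothetical routing rule: use a topology only when the bare model is below threshold."""
    return bare_pct < threshold


for axis, (bare, sage) in RESULTS.items():
    print(axis, delta_pp(bare, sage), should_use_topology(bare))
```

Under this rule, depth, horizon, and robustness would be routed through a topology, while breadth and parallel (both at exactly 60% bare accuracy) would not, which matches the sign of the measured deltas.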

Training Status (March 30, 2026)

| Phase | Status | Metrics |
|---|---|---|
| SFT warmup | Done | loss 2.87 to 1.30 |
| Phase A (structural) | Done | step 1050, reward 0.225 |
| DAPO targeted | In Progress | step 104/1920, reward 0.184, DAPO token-level loss |
| Phase C (micro-decisions) | Pending | SageTopologyEnv 4-state machine ready |

What makes Phase C unique (no competitor has this)

The model doesn't just generate topologies - it operates them at runtime:

- At each checkpoint node, the model decides whether to continue, upgrade (re-run with a better model), or reroute.
- GiGPO step-level anchor states provide per-decision credit assignment.
- The 5-signal reward combines structural, execution, rewardflow, resilience, and cost terms.
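To make the points above concrete, here is a minimal sketch of the three runtime actions and one way the five reward signals could be combined. Only the action names and signal names come from this card. The equal weights, the treatment of cost as a penalty, and the threshold rule in `choose_action` are assumptions for illustration; in SAGE the decision is made by the trained policy, not by a fixed rule.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    UPGRADE = "upgrade"  # re-run the node with a better model
    REROUTE = "reroute"


@dataclass(frozen=True)
class RewardSignals:
    structural: float
    execution: float
    rewardflow: float
    resilience: float
    cost: float  # ASSUMPTION: a penalty in [0, 1]; higher means more expensive


# ASSUMPTION: equal weights; the actual weighting is not given in this card.
WEIGHTS = {"structural": 0.25, "execution": 0.25, "rewardflow": 0.25, "resilience": 0.25}
COST_WEIGHT = 0.25  # ASSUMPTION


def combined_reward(s: RewardSignals) -> float:
    """Weighted sum of the four positive signals minus a weighted cost penalty."""
    positive = sum(w * getattr(s, name) for name, w in WEIGHTS.items())
    return positive - COST_WEIGHT * s.cost


def choose_action(node_score: float, retries_left: int) -> Action:
    """Hypothetical threshold rule standing in for the learned policy."""
    if node_score >= 0.5:
        return Action.CONTINUE
    return Action.UPGRADE if retries_left > 0 else Action.REROUTE
```

Keeping cost as a separate subtracted term makes the trade-off explicit: an upgrade that raises the execution signal but doubles cost can still lower the total.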

Repository Contents

| Path | Description | Size |
|---|---|---|
| `checkpoints/combined_step_100/` | Full FSDP checkpoint (for resume) | 34 GB |
| `sft_merged/` | SFT base model | 15.6 GB |
| `phase_a_step_100/` | LoRA adapter | 667 MB |
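Because the artifacts differ in size by roughly 50x, it may be convenient to fetch only one folder. The sketch below uses `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. The repository ID is assumed to be `yannabadie/sage-topology-policy-v2`, and `ARTIFACTS`, `allow_patterns`, and `download` are hypothetical names introduced here.

```python
REPO_ID = "yannabadie/sage-topology-policy-v2"  # ASSUMPTION: repository ID of this model card

# Folder names taken from the table above.
ARTIFACTS = {
    "fsdp_checkpoint": "checkpoints/combined_step_100",
    "sft_base": "sft_merged",
    "lora_adapter": "phase_a_step_100",
}


def allow_patterns(artifact: str) -> list[str]:
    """Return a glob list that matches only the files under one artifact folder."""
    return [f"{ARTIFACTS[artifact]}/*"]


def download(artifact: str) -> str:
    """Download one artifact folder and return the local snapshot path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, allow_patterns=allow_patterns(artifact))
```

For example, `download("lora_adapter")` would fetch only the 667 MB adapter folder rather than the 34 GB FSDP checkpoint.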

Research Backing

- DAPO: token-level policy-gradient loss, plus the Clip-Higher rule that targets GRPO entropy collapse
- AdaptOrch: topology matters when base accuracy < 60%
- MAS-Orchestra: function-calling MAS orchestration
- The Conductor: binary topology reward
- MASBENCH: 5-axis MAS evaluation
- Graph-GRPO: edge-level credit assignment
- GoAgent: group-level topology + CIB compression
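The DAPO entry above refers to token-level loss aggregation. As a rough illustration of what that changes relative to GRPO-style sample-level averaging, the sketch below compares the two reductions on made-up per-token loss values. It is not the verl implementation, and the function names are hypothetical.

```python
def sample_level_loss(per_token_losses):
    """Average within each response, then across responses (GRPO-style reduction)."""
    per_sample = [sum(tokens) / len(tokens) for tokens in per_token_losses]
    return sum(per_sample) / len(per_sample)


def token_level_loss(per_token_losses):
    """Average over every token in the batch (DAPO-style reduction)."""
    total = sum(sum(tokens) for tokens in per_token_losses)
    count = sum(len(tokens) for tokens in per_token_losses)
    return total / count


# Illustrative values only: one short response (2 tokens) and one long response (8 tokens).
batch = [[1.0, 1.0], [4.0] * 8]

print(sample_level_loss(batch))  # each response counts equally
print(token_level_loss(batch))   # each token counts equally
```

With sample-level averaging, the 8-token response contributes the same weight as the 2-token one. With token-level averaging, it contributes four times as much, so long responses are no longer down-weighted per token.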


Model tree for yannabadie/sage-topology-policy-v2

Qwen/Qwen3-8B-Base (base model) -> Qwen/Qwen3-8B (finetuned) -> nvidia/Nemotron-Orchestrator-8B (finetuned) -> this model (finetuned)

Papers for yannabadie/sage-topology-policy-v2

- GoAgent: Group-of-Agents Communication Topology Generation for LLM-based Multi-Agent Systems. arXiv 2603.19677, published Mar 20.
- Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization. arXiv 2603.02701, published Mar 3.
- AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence. arXiv 2602.16873, published Feb 18.
- MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks. arXiv 2601.14652, published Jan 21.
- Learning to Orchestrate Agents in Natural Language with the Conductor. arXiv 2512.04388, published Dec 4, 2025.