🧠 Autonomous Recursive Code-Improving Agent (Experimental) with rsi-gpt-oss-120b

Open source

This model was fine-tuned via SFT using an experimental autonomous coding agent that can iteratively improve code, generate new tests (including edge cases), evaluate itself in a sandbox, and self-train via LoRA on its own interaction data.

The system closes a full loop:

write → test → reflect → improve → generate data → fine-tune → repeat

It is designed as a research prototype for studying recursive self-improvement, agentic programming, and self-generated supervision.


Benchmark

Tasks            Version  Filter            n-shot  Metric       Value  Stderr
gsm8k_cot_llama  3        flexible-extract  0       exact_match  1      ± 0
gsm_plus         1        flexible-extract  0       exact_match  0.7    ± 0.1528

🔍 What This Project Does

At a high level, the agent:

  1. Starts with a small codebase and basic tests
  2. Runs tests in an isolated sandbox
  3. Uses an LLM to propose code patches
  4. Generates new edge-case tests for its own changes
  5. Re-runs tests to measure improvement
  6. Logs successful interactions as supervised fine-tuning (SFT) data
  7. Periodically fine-tunes itself using LoRA
  8. Repeats until performance plateaus

The goal is not perfection, but measurable improvement over time.
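The eight steps above can be sketched as a single outer loop. This is a minimal illustration, not the repository's actual code: the per-iteration work is abstracted into a callback that returns the passing-test count, and the plateau check ends the loop after several rounds without improvement.

```python
# Minimal sketch of the outer improvement loop; `run_iteration` stands in
# for one full write → test → reflect → improve cycle and returns the
# number of passing tests for that round.
def improvement_loop(run_iteration, max_iters=20, patience=3):
    best, stale, history = 0, 0, []
    for _ in range(max_iters):
        passed = run_iteration()
        history.append(passed)
        if passed > best:
            best, stale = passed, 0   # improvement: reset the plateau counter
        else:
            stale += 1                # no measurable gain this round
        if stale >= patience:         # plateau detected: stop automatically
            break
    return best, history

# Simulated run: pass counts improve for a while, then plateau.
scores = iter([3, 5, 5, 7, 7, 7, 7, 7])
best, history = improvement_loop(lambda: next(scores))
```

With the simulated scores, the loop stops after three flat rounds at a best of 7 passing tests, never reaching `max_iters`.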


🧩 Core Components

Workspace

A local sandbox directory containing:

  • Initial example code (example.py)
  • Initial tests (test_example.py)
  • All patches and generated tests

This workspace is mutated over time by the agent.


Sandbox

Tests are executed via pytest inside a controlled subprocess with:

  • Timeout protection
  • Captured output
  • Binary success/failure signal

This prevents runaway execution while giving the agent concrete feedback.
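A hedged sketch of such a sandbox runner, using only the standard library: the command runs in a subprocess with a timeout, output is captured, and the result is reduced to a binary pass/fail signal. The real project invokes pytest; any command works here for illustration.

```python
import subprocess
import sys

# Run a test command in a controlled subprocess: timeout protection,
# captured output, binary success/failure signal.
def run_sandboxed(cmd, timeout=60):
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"      # runaway execution is cut off
    return proc.returncode == 0, proc.stdout + proc.stderr

# For the agent's workspace this would look like:
#   run_sandboxed([sys.executable, "-m", "pytest", "-q", "workspace/"])
ok, output = run_sandboxed([sys.executable, "-c", "print('2 passed')"])
```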


Agent State

The agent maintains memory across iterations:

  • Success history
  • Reward signals
  • Files modified
  • Plateau counter (to detect stagnation)

This state governs when the system should stop.
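An illustrative sketch of that persistent state as a dataclass. The field and method names are assumptions for clarity, not the repository's exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    success_history: list = field(default_factory=list)  # pass counts per iteration
    rewards: list = field(default_factory=list)          # scalar reward signals
    files_modified: set = field(default_factory=set)
    plateau_counter: int = 0                             # iterations without gain

    def record(self, passed, reward, files):
        """Log one iteration; reset the plateau counter only on improvement."""
        improved = not self.success_history or passed > max(self.success_history)
        self.success_history.append(passed)
        self.rewards.append(reward)
        self.files_modified.update(files)
        self.plateau_counter = 0 if improved else self.plateau_counter + 1
        return improved

state = AgentState()
state.record(3, 1.0, {"example.py"})        # first result always counts as progress
state.record(3, 0.0, {"test_example.py"})   # no gain: plateau counter ticks up
```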


Autonomous Agent Loop

Each iteration performs:

  • Patch phase
    The model proposes code improvements based on test output.

  • Test-generation phase
    The model generates new tests, focusing on edge cases.

  • Evaluation phase
    Tests are re-run to check whether the changes improved results.

  • Archival phase
    Successful prompt–response pairs are saved as .jsonl samples for training.

The agent communicates strictly via structured JSON to reduce ambiguity.
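A hypothetical example of such a structured reply and a strict parser for it. The exact schema (field names like `action` and `new_content`) is an assumption; the point is that malformed output is rejected rather than guessed at:

```python
import json

# Example of the kind of structured JSON the model might emit during the
# patch phase (schema is illustrative, not the repository's actual one):
raw_reply = """
{
  "action": "patch",
  "file": "example.py",
  "rationale": "handle empty input",
  "new_content": "def mean(xs):\\n    return sum(xs) / len(xs) if xs else 0.0\\n"
}
"""

def parse_reply(text, required=("action", "file", "new_content")):
    """Strictly parse a model reply; raise on anything malformed."""
    reply = json.loads(text)
    missing = [k for k in required if k not in reply]
    if missing:
        raise ValueError(f"malformed reply, missing: {missing}")
    return reply

reply = parse_reply(raw_reply)
```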


Recursive Depth

Each decision (patch or test generation) allows multiple retries if output is malformed or incomplete. This increases robustness without human intervention.
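That retry logic can be sketched as a bounded wrapper around any model call. `generate` and `validate` are illustrative stand-ins, not the repository's actual functions:

```python
import json

def ask_with_retries(generate, validate, max_depth=3):
    """Retry a model query until validation succeeds or depth is exhausted."""
    last_error = None
    for attempt in range(max_depth):
        candidate = generate(attempt)      # query the model
        try:
            return validate(candidate)     # e.g. strict JSON parsing
        except ValueError as exc:
            last_error = exc               # malformed output: try again
    raise RuntimeError(f"gave up after {max_depth} attempts: {last_error}")

# Simulated model that only emits valid JSON on the third attempt.
replies = ["garbage", "{broken", '{"action": "patch"}']
result = ask_with_retries(lambda i: replies[i], json.loads)
```

Because `json.JSONDecodeError` subclasses `ValueError`, the first two malformed replies are caught and retried without human intervention.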


🧪 Self-Training via LoRA

All successful interactions are converted into supervised fine-tuning samples.

Training uses:

  • LoRA adapters
  • 4-bit quantized base model
  • Short training bursts every iteration
  • Conservative hyperparameters to avoid catastrophic drift

This allows the agent to learn from its own successes over time.
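A hedged sketch of the data side of this step: one archived success is converted into a `.jsonl` line in the "messages" format that TRL's `SFTTrainer` accepts. The hyperparameter values shown are illustrative placeholders for "conservative settings", not the repository's published configuration:

```python
import json

# Illustrative conservative settings (placeholders, not actual values):
LORA_CONFIG = {
    "r": 16, "lora_alpha": 16, "lora_dropout": 0.0,  # small LoRA adapter
    "learning_rate": 5e-5, "max_steps": 30,          # short burst per iteration
    "load_in_4bit": True,                            # 4-bit quantized base model
}

def to_sft_sample(prompt, response):
    """Serialize one successful prompt–response pair as a .jsonl line."""
    return json.dumps({"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]})

line = to_sft_sample("Fix the failing test in example.py.", '{"action": "patch"}')
sample = json.loads(line)
```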


📈 Improvement & Plateau Detection

Improvement is estimated by:

  • Increase in passing test count
  • Comparison against prior iterations

If no improvement is detected for several iterations, training stops automatically.

This prevents infinite loops and overfitting to noise.
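The stopping criterion itself can be written as a small pure function over the history of passing-test counts. This is a sketch of one reasonable formulation, not the repository's exact rule:

```python
def should_stop(history, patience=3):
    """Stop once the last `patience` iterations show no gain over the prior best."""
    if len(history) <= patience:
        return False                       # not enough data to judge a plateau
    best_before = max(history[:-patience]) # best result before the window
    return all(h <= best_before for h in history[-patience:])
```

For example, `should_stop([3, 5, 5, 7, 7, 7, 7])` is true (three flat rounds at 7), while `should_stop([3, 5, 5, 7, 7, 8])` is false, since the final 8 beats the earlier best.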


🛠 Model & Stack

  • Base model: unsloth/gpt-oss-120b
  • Training: TRL SFTTrainer
  • Optimization: AdamW 8-bit
  • Frameworks: PyTorch, Hugging Face Datasets, Unsloth
  • Testing: Pytest

⚠️ Important Notes

  • This is a research prototype, not a production system.
  • Improvement signals are approximate, not formal verification.
  • Generated code and tests should not be trusted blindly.
  • Recursive self-training can drift if reward signals are poorly defined.

Treat this as a laboratory, not an autopilot.


🌱 Why This Matters

This project explores a critical question:

Can models improve their practical capabilities by generating, evaluating, and learning from their own work—without external labels?

The answer is still “partially, cautiously, and experimentally,” but this repository provides a concrete, inspectable loop for studying that question.


📜 License & Use

Use responsibly. Expect bugs. Expect surprises.
Recursive systems amplify both intelligence and mistakes.

Curiosity recommended. Blind faith discouraged.

Uploaded fine-tuned model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Fine-tuned from model: unsloth/gpt-oss-120b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Safetensors · Model size: 120B params · Tensor types: BF16, U8

Model tree for EpistemeAI/rsi-gpt-oss-120bv2-8bit: 16 quantized variants of this model.