🧠 Autonomous Recursive Code-Improving Agent (Experimental) with rsi-gpt-oss-120b
Open source
This model was fine-tuned (SFT) using an experimental autonomous coding agent that can iteratively improve code, generate new tests (including edge cases), evaluate itself in a sandbox, and self-train via LoRA on its own interaction data.
The system closes a full loop:
write → test → reflect → improve → generate data → fine-tune → repeat
It is designed as a research prototype for studying recursive self-improvement, agentic programming, and self-generated supervision.
Benchmark
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k_cot_llama | 3 | flexible-extract | 0 | exact_match ↑ | 1 | ± 0 |
| gsm_plus | 1 | flexible-extract | 0 | exact_match ↑ | 0.7 | ± 0.1528 |
🔍 What This Project Does
At a high level, the agent:
- Starts with a small codebase and basic tests
- Runs tests in an isolated sandbox
- Uses an LLM to propose code patches
- Generates new edge-case tests for its own changes
- Re-runs tests to measure improvement
- Logs successful interactions as supervised fine-tuning (SFT) data
- Periodically fine-tunes itself using LoRA
- Repeats until performance plateaus
The goal is not perfection, but measurable improvement over time.
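The steps above can be sketched as a minimal control loop. All of the callables below are hypothetical placeholders; in the real system they would be wired to an LLM, a pytest sandbox, and a training-data archive.

```python
# Hypothetical sketch of the agent's outer loop; function arguments
# stand in for the LLM, the sandboxed test runner, and the archive.

def improvement_loop(run_tests, propose_patch, generate_tests,
                     archive_sample, max_iters=10, patience=3):
    """Run write → test → improve cycles until progress plateaus."""
    best = run_tests()          # baseline passing-test count
    stale = 0                   # iterations without improvement
    for _ in range(max_iters):
        patch = propose_patch()         # model proposes a code change
        generate_tests()                # model adds edge-case tests
        score = run_tests()             # re-evaluate in the sandbox
        if score > best:
            best = score
            archive_sample(patch)       # keep the success as SFT data
            stale = 0
        else:
            stale += 1
        if stale >= patience:           # plateau detected → stop
            break
    return best
```

The loop only archives interactions that measurably improved the passing-test count, which is what makes the self-generated data usable for fine-tuning.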
🧩 Core Components
Workspace
A local sandbox directory containing:
- Initial example code (`example.py`)
- Initial tests (`test_example.py`)
- All patches and generated tests
This workspace is mutated over time by the agent.
Sandbox
Tests are executed via pytest inside a controlled subprocess with:
- Timeout protection
- Captured output
- Binary success/failure signal
This prevents runaway execution while giving the agent concrete feedback.
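A sandboxed run of this kind can be sketched as a subprocess with a timeout, reduced to a binary pass/fail signal. The real agent invokes pytest on the workspace; this self-contained sketch runs an inline Python assertion instead.

```python
import subprocess
import sys

# Hedged sketch of a sandboxed test run: execute a command in a
# subprocess with a timeout, capture its output, and reduce the
# result to a binary success/failure signal.

def run_sandboxed(cmd, timeout=30):
    """Return (success, captured_output) for a sandboxed command."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"   # runaway execution is cut off
    return result.returncode == 0, result.stdout + result.stderr

# Demo: a passing and a failing "test" run as inline Python
ok, _ = run_sandboxed([sys.executable, "-c", "assert 1 + 1 == 2"])
bad, _ = run_sandboxed([sys.executable, "-c", "assert False"])
```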
Agent State
The agent maintains memory across iterations:
- Success history
- Reward signals
- Files modified
- Plateau counter (to detect stagnation)
This state governs when the system should stop.
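One plausible shape for that memory is a small dataclass (field names here are illustrative, not the repository's actual schema):

```python
from dataclasses import dataclass, field

# Hypothetical cross-iteration memory for the agent; the field names
# and plateau logic are illustrative assumptions.

@dataclass
class AgentState:
    success_history: list = field(default_factory=list)  # passes per iteration
    rewards: list = field(default_factory=list)          # reward signals
    files_modified: set = field(default_factory=set)
    plateau_counter: int = 0                             # stagnation detector
    patience: int = 3

    def record(self, passed_tests, reward, files):
        """Log one iteration and update the plateau counter."""
        improved = not self.rewards or reward > max(self.rewards)
        self.success_history.append(passed_tests)
        self.rewards.append(reward)
        self.files_modified.update(files)
        self.plateau_counter = 0 if improved else self.plateau_counter + 1

    def should_stop(self):
        return self.plateau_counter >= self.patience
```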
Autonomous Agent Loop
Each iteration performs:
1. Patch phase: the model proposes code improvements based on test output.
2. Test-generation phase: the model generates new tests, focusing on edge cases.
3. Evaluation phase: tests are re-run to check whether things improved.
4. Archival phase: successful prompt–response pairs are saved as `.jsonl` samples for training.
The agent communicates strictly via structured JSON to reduce ambiguity.
Recursive Depth
Each decision (patch or test generation) allows multiple retries if output is malformed or incomplete. This increases robustness without human intervention.
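A retry wrapper of this kind might look as follows; `ask_model` is a stand-in for the real LLM call, and the required-keys check is an assumed validation scheme.

```python
import json

# Illustrative retry wrapper: re-query the model when its JSON output
# is malformed or missing required keys.

def ask_structured(ask_model, required_keys, max_retries=3):
    """Return parsed JSON from the model, retrying on bad output."""
    for attempt in range(max_retries):
        raw = ask_model(attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue                      # malformed output → retry
        if all(k in data for k in required_keys):
            return data                   # complete output → accept
    return None                           # give up after max_retries
```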
🧪 Self-Training via LoRA
All successful interactions are converted into supervised fine-tuning samples.
Training uses:
- LoRA adapters
- 4-bit quantized base model
- Short training bursts every iteration
- Conservative hyperparameters to avoid catastrophic drift
This allows the agent to learn from its own successes over time.
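The conversion of successful interactions into `.jsonl` SFT samples might be sketched like this; the `{"prompt": ..., "completion": ...}` record schema is an assumption, not the repository's actual format.

```python
import json
import os
import tempfile

# Sketch of archiving a successful prompt–response pair as a .jsonl
# SFT sample (one JSON object per line); the schema is assumed.

def archive_sample(path, prompt, completion):
    """Append one training record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

def load_samples(path):
    """Read the archive back as a list of dicts for the trainer."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Demo in a temporary workspace directory
path = os.path.join(tempfile.mkdtemp(), "sft_data.jsonl")
archive_sample(path, "fix the failing test", "def add(a, b): return a + b")
samples = load_samples(path)
```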
📈 Improvement & Plateau Detection
Improvement is estimated by:
- Increase in passing test count
- Comparison against prior iterations
If no improvement is detected for several iterations, training stops automatically.
This prevents infinite loops and overfitting to noise.
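The improvement estimate itself reduces to a simple comparison against prior iterations, roughly:

```python
# Minimal sketch of the improvement signal: the newest passing-test
# count must beat the best count from all prior iterations.

def improved(history):
    """True if the latest passing-test count is a new best."""
    return len(history) > 1 and history[-1] > max(history[:-1])
```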
🛠 Model & Stack
- Base model: `unsloth/gpt-oss-120b`
- Training: TRL `SFTTrainer`
- Optimization: AdamW 8-bit
- Frameworks: PyTorch, Hugging Face Datasets, Unsloth
- Testing: Pytest
⚠️ Important Notes
- This is a research prototype, not a production system.
- Improvement signals are approximate, not formal verification.
- Generated code and tests should not be trusted blindly.
- Recursive self-training can drift if reward signals are poorly defined.
Treat this as a laboratory, not an autopilot.
🌱 Why This Matters
This project explores a critical question:
Can models improve their practical capabilities by generating, evaluating, and learning from their own work—without external labels?
The answer is still “partially, cautiously, and experimentally,” but this repository provides a concrete, inspectable loop for studying that question.
📜 License & Use
Use responsibly. Expect bugs. Expect surprises.
Recursive systems amplify both intelligence and mistakes.
Curiosity recommended. Blind faith discouraged.
Uploaded finetuned model
- Developed by: EpistemeAI
- License: apache-2.0
- Finetuned from model: unsloth/gpt-oss-120b-unsloth-bnb-4bit
This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for EpistemeAI/rsi-gpt-oss-120bv2-8bit
- Base model: openai/gpt-oss-120b