🧠 Autonomous Recursive Code-Improving Agent (Experimental) with rsi-gpt-oss-120b

Open source

This model was fine-tuned via SFT using an experimental autonomous coding agent that can iteratively improve code, generate new tests (including edge cases), evaluate itself in a sandbox, and self-train via LoRA on its own interaction data.

The system closes a full loop:

write → test → reflect → improve → generate data → fine-tune → repeat

It is designed as a research prototype for studying recursive self-improvement, agentic programming, and self-generated supervision.


Benchmark

Tasks            Version  Filter            n-shot  Metric       Value  Stderr
gsm8k_cot_llama  3        flexible-extract  0       exact_match  1      ± 0
gsm_plus         1        flexible-extract  0       exact_match  0.7    ± 0.1528

🔍 What This Project Does

At a high level, the agent:

  1. Starts with a small codebase and basic tests
  2. Runs tests in an isolated sandbox
  3. Uses an LLM to propose code patches
  4. Generates new edge-case tests for its own changes
  5. Re-runs tests to measure improvement
  6. Logs successful interactions as supervised fine-tuning (SFT) data
  7. Periodically fine-tunes itself using LoRA
  8. Repeats until performance plateaus

The goal is not perfection, but measurable improvement over time.
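The eight steps above can be sketched as a single outer loop. This is a minimal illustration, not the repository's actual code: the per-iteration work is abstracted into a callback that returns the passing-test count, and the plateau check ends the loop after several rounds without improvement.

```python
# Minimal sketch of the outer improvement loop; `run_iteration` stands in
# for one full write → test → reflect → improve cycle and returns the
# number of passing tests for that round.
def improvement_loop(run_iteration, max_iters=20, patience=3):
    best, stale, history = 0, 0, []
    for _ in range(max_iters):
        passed = run_iteration()
        history.append(passed)
        if passed > best:
            best, stale = passed, 0   # improvement: reset the plateau counter
        else:
            stale += 1                # no measurable gain this round
        if stale >= patience:         # plateau detected: stop automatically
            break
    return best, history

# Simulated run: pass counts improve for a while, then plateau.
scores = iter([3, 5, 5, 7, 7, 7, 7, 7])
best, history = improvement_loop(lambda: next(scores))
```

With the simulated scores, the loop stops after three flat rounds at a best of 7 passing tests, never reaching `max_iters`.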


🧩 Core Components

Workspace

A local sandbox directory containing:

  • Initial example code (example.py)
  • Initial tests (test_example.py)
  • All patches and generated tests

This workspace is mutated over time by the agent.


Sandbox

Tests are executed via pytest inside a controlled subprocess with:

  • Timeout protection
  • Captured output
  • Binary success/failure signal

This prevents runaway execution while giving the agent concrete feedback.
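A hedged sketch of such a sandbox runner, using only the standard library: the command runs in a subprocess with a timeout, output is captured, and the result is reduced to a binary pass/fail signal. The real project invokes pytest; any command works here for illustration.

```python
import subprocess
import sys

# Run a test command in a controlled subprocess: timeout protection,
# captured output, binary success/failure signal.
def run_sandboxed(cmd, timeout=60):
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"      # runaway execution is cut off
    return proc.returncode == 0, proc.stdout + proc.stderr

# For the agent's workspace this would look like:
#   run_sandboxed([sys.executable, "-m", "pytest", "-q", "workspace/"])
ok, output = run_sandboxed([sys.executable, "-c", "print('2 passed')"])
```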


Agent State

The agent maintains memory across iterations:

  • Success history
  • Reward signals
  • Files modified
  • Plateau counter (to detect stagnation)

This state governs when the system should stop.
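An illustrative sketch of that persistent state as a dataclass. The field and method names are assumptions for clarity, not the repository's exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    success_history: list = field(default_factory=list)  # pass counts per iteration
    rewards: list = field(default_factory=list)          # scalar reward signals
    files_modified: set = field(default_factory=set)
    plateau_counter: int = 0                             # iterations without gain

    def record(self, passed, reward, files):
        """Log one iteration; reset the plateau counter only on improvement."""
        improved = not self.success_history or passed > max(self.success_history)
        self.success_history.append(passed)
        self.rewards.append(reward)
        self.files_modified.update(files)
        self.plateau_counter = 0 if improved else self.plateau_counter + 1
        return improved

state = AgentState()
state.record(3, 1.0, {"example.py"})        # first result always counts as progress
state.record(3, 0.0, {"test_example.py"})   # no gain: plateau counter ticks up
```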


Autonomous Agent Loop

Each iteration performs:

  • Patch phase
    The model proposes code improvements based on test output.

  • Test-generation phase
    The model generates new tests, focusing on edge cases.

  • Evaluation phase
    Tests are re-run to check whether the changes improved results.

  • Archival phase
    Successful prompt–response pairs are saved as .jsonl samples for training.

The agent communicates strictly via structured JSON to reduce ambiguity.
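A hypothetical example of such a structured reply and a strict parser for it. The exact schema (field names like `action` and `new_content`) is an assumption; the point is that malformed output is rejected rather than guessed at:

```python
import json

# Example of the kind of structured JSON the model might emit during the
# patch phase (schema is illustrative, not the repository's actual one):
raw_reply = """
{
  "action": "patch",
  "file": "example.py",
  "rationale": "handle empty input",
  "new_content": "def mean(xs):\\n    return sum(xs) / len(xs) if xs else 0.0\\n"
}
"""

def parse_reply(text, required=("action", "file", "new_content")):
    """Strictly parse a model reply; raise on anything malformed."""
    reply = json.loads(text)
    missing = [k for k in required if k not in reply]
    if missing:
        raise ValueError(f"malformed reply, missing: {missing}")
    return reply

reply = parse_reply(raw_reply)
```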


Recursive Depth

Each decision (patch or test generation) allows multiple retries if output is malformed or incomplete. This increases robustness without human intervention.
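That retry logic can be sketched as a bounded wrapper around any model call. `generate` and `validate` are illustrative stand-ins, not the repository's actual functions:

```python
import json

def ask_with_retries(generate, validate, max_depth=3):
    """Retry a model query until validation succeeds or depth is exhausted."""
    last_error = None
    for attempt in range(max_depth):
        candidate = generate(attempt)      # query the model
        try:
            return validate(candidate)     # e.g. strict JSON parsing
        except ValueError as exc:
            last_error = exc               # malformed output: try again
    raise RuntimeError(f"gave up after {max_depth} attempts: {last_error}")

# Simulated model that only emits valid JSON on the third attempt.
replies = ["garbage", "{broken", '{"action": "patch"}']
result = ask_with_retries(lambda i: replies[i], json.loads)
```

Because `json.JSONDecodeError` subclasses `ValueError`, the first two malformed replies are caught and retried without human intervention.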


🧪 Self-Training via LoRA

All successful interactions are converted into supervised fine-tuning samples.

Training uses:

  • LoRA adapters
  • 4-bit quantized base model
  • Short training bursts every iteration
  • Conservative hyperparameters to avoid catastrophic drift

This allows the agent to learn from its own successes over time.
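A hedged sketch of the data side of this step: one archived success is converted into a `.jsonl` line in the "messages" format that TRL's `SFTTrainer` accepts. The hyperparameter values shown are illustrative placeholders for "conservative settings", not the repository's published configuration:

```python
import json

# Illustrative conservative settings (placeholders, not actual values):
LORA_CONFIG = {
    "r": 16, "lora_alpha": 16, "lora_dropout": 0.0,  # small LoRA adapter
    "learning_rate": 5e-5, "max_steps": 30,          # short burst per iteration
    "load_in_4bit": True,                            # 4-bit quantized base model
}

def to_sft_sample(prompt, response):
    """Serialize one successful prompt–response pair as a .jsonl line."""
    return json.dumps({"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]})

line = to_sft_sample("Fix the failing test in example.py.", '{"action": "patch"}')
sample = json.loads(line)
```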


📈 Improvement & Plateau Detection

Improvement is estimated by:

  • Increase in passing test count
  • Comparison against prior iterations

If no improvement is detected for several iterations, training stops automatically.

This prevents infinite loops and overfitting to noise.
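The stopping criterion itself can be written as a small pure function over the history of passing-test counts. This is a sketch of one reasonable formulation, not the repository's exact rule:

```python
def should_stop(history, patience=3):
    """Stop once the last `patience` iterations show no gain over the prior best."""
    if len(history) <= patience:
        return False                       # not enough data to judge a plateau
    best_before = max(history[:-patience]) # best result before the window
    return all(h <= best_before for h in history[-patience:])
```

For example, `should_stop([3, 5, 5, 7, 7, 7, 7])` is true (three flat rounds at 7), while `should_stop([3, 5, 5, 7, 7, 8])` is false, since the final 8 beats the earlier best.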


🛠 Model & Stack

  • Base model: unsloth/gpt-oss-120b
  • Training: TRL SFTTrainer
  • Optimization: AdamW 8-bit
  • Frameworks: PyTorch, Hugging Face Datasets, Unsloth
  • Testing: Pytest

⚠️ Important Notes

  • This is a research prototype, not a production system.
  • Improvement signals are approximate, not formal verification.
  • Generated code and tests should not be trusted blindly.
  • Recursive self-training can drift if reward signals are poorly defined.

Treat this as a laboratory, not an autopilot.


🌱 Why This Matters

This project explores a critical question:

Can models improve their practical capabilities by generating, evaluating, and learning from their own work—without external labels?

The answer is still “partially, cautiously, and experimentally,” but this repository provides a concrete, inspectable loop for studying that question.


📜 License & Use

Use responsibly. Expect bugs. Expect surprises.
Recursive systems amplify both intelligence and mistakes.

Curiosity recommended. Blind faith discouraged.

Uploaded fine-tuned model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Fine-tuned from model: unsloth/gpt-oss-120b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Safetensors · Model size: 120B params · Tensor types: BF16, U8

Model tree for EpistemeAI/rsi-gpt-oss-120bv2-8bit: 16 quantized variants of this model.