ClusterMind Chaos Arena: LoRA adapter (GRPO)
Trained on the ClusterMind Chaos Arena environment via an SFT warm start followed by online RL (GRPO). Base weights are frozen; only the LoRA adapter is updated (r=8, target_modules=["q_proj","v_proj"]).
Training stack
- Model load + LoRA: transformers (Unsloth FastLanguageModel when available, else transformers + bitsandbytes 4-bit + peft)
- SFT phase: trl.SFTTrainer
- RL phase: in-tree GRPO/PPO/REINFORCE loop (TRL's GRPOTrainer OOMs on T4 because it holds all K trajectories' computation graphs simultaneously; ours is two-phase: no-grad rollout collection, then per-step backward)
- Hub push: huggingface_hub.push_to_hub + upload_file
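The RL phase scores each of the K trajectories in a group relative to the others. The in-tree loop itself is not shown in this card, so the following is only a minimal sketch of the group-relative advantage that GRPO is generally built on. The function name, the `eps` value, and the use of the population standard deviation are assumptions, not taken from the training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize episode rewards within one group of K rollouts.

    Hypothetical helper: each advantage is the reward's distance from the
    group mean, divided by the group standard deviation (plus eps so a
    group with identical rewards yields zeros instead of dividing by 0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: K = 4 episode-level rewards collected without gradients.
advantages = group_relative_advantages([8.0, 10.0, 12.0, 10.0])
print(advantages)
```

Because the advantages depend only on scalar rewards, they can be computed after a no-grad rollout phase and then applied one step at a time during the backward phase, which is what keeps peak memory low in a two-phase design.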
Training summary
| field | value |
|---|---|
| base model | Qwen/Qwen2.5-0.5B-Instruct |
| engine | transformers |
| SFT trainer | trl.SFTTrainer |
| RL algo | grpo (auto: trl present -> using episode-level GRPO) |
| trainable params | 540,672 (the logged total of 11.973056694274142 and share of 4515739.08% are erroneous; a trainable share cannot exceed 100%) |
| SFT episodes | 16 |
| RL episodes | 24 |
| eval episodes | 8 |
| eval mean reward | 10.46 |
| frozen base | True |
| lora only | True |
| quick mode | True |
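The trainable-parameter count in the table can be cross-checked by hand. The sketch below assumes the layer dimensions published in the Qwen2.5-0.5B-Instruct config (hidden size 896, 24 layers, 2 key/value heads of dimension 64); those numbers are not stated in this card, and the helper name is hypothetical. LoRA adds two matrices per adapted linear layer, A (r x in_features) and B (out_features x r).

```python
def lora_param_count(rank, layers, shapes):
    """Count LoRA parameters: rank * (in_features + out_features) per module."""
    per_layer = sum(rank * (in_f + out_f) for in_f, out_f in shapes.values())
    return layers * per_layer


# Assumed base-model dimensions (from the public model config, not this card).
HIDDEN = 896
NUM_LAYERS = 24
NUM_KV_HEADS = 2
HEAD_DIM = 64

shapes = {
    "q_proj": (HIDDEN, HIDDEN),                   # 896 -> 896
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),  # 896 -> 128
}

total = lora_param_count(rank=8, layers=NUM_LAYERS, shapes=shapes)
print(total)  # 540672
```

Under these assumptions the result matches the 540,672 trainable parameters reported for r=8 on q_proj and v_proj.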
How to load
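from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "Kabs-123/clustermind-lora"
# The tokenizer comes from the base model; the adapter does not change it.
tok = AutoTokenizer.from_pretrained(base)
# Load the frozen base weights, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()  # inference mode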
Files in this repo
- adapter_model.safetensors: LoRA weights
- adapter_config.json: LoRA config (r, alpha, target modules)
- tokenizer.json etc.: tokenizer of the base model
- training_logs.jsonl: per-step reward + loss + metrics
- trained_results.json: full training summary
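Since training_logs.jsonl stores one JSON record per step, it can be summarized with the standard library alone. The sketch below is illustrative only: the field names `reward` and `loss` and the sample records are assumptions, because the exact log schema is not documented here.

```python
import json


def mean_metric(jsonl_lines, key="reward"):
    """Average one numeric field over JSONL records, skipping blank lines
    and records that lack the field. Returns None if nothing matched."""
    values = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if key in record:
            values.append(float(record[key]))
    return sum(values) / len(values) if values else None


# Hypothetical records; real logs may use different keys.
sample = [
    '{"step": 1, "reward": 1.0, "loss": 0.5}',
    '{"step": 2, "reward": 3.0, "loss": 0.25}',
]
print(mean_metric(sample))  # 2.0
```

To run it on the real file, pass an open file handle, for example `mean_metric(open("training_logs.jsonl"))`, after checking which keys the records actually contain.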
Evaluation
The trained agent is benchmarked against five heuristic baselines on
8 chaos scenarios at curriculum levels 3 to 5. See trained_results.json
for the full eval breakdown.