---
base_model: Kwaipilot/KAT-Dev
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
---

# KAT-Dev-CPT-LoRA-HS-32K-maxToken-v1

A specialized **LoRA fine-tuned adapter** built on top of **Kwaipilot/KAT-Dev (32B)**, designed for deep understanding of the **Hyperswitch** (Rust) payment orchestration codebase.

The adapter was trained with a **3-phase curriculum pipeline** that progressively deepens the model's grasp of Rust patterns, PR changes, repository structure, and payment processing logic.

---

# 🚀 Overview

This LoRA adapter was trained with a **phased CPT (continual pre-training) strategy**:

### **Phase 1 — Foundation**
Learns core repository structure, Rust syntax, basic modules, and Hyperswitch architectural patterns.

### **Phase 2 — Evolution**
Exposes the model to progressively complex components, multi-file interactions, workflows, and feature evolution.

### **Phase 3 — PR Mastery**
Specializes in real PR changes, diffs, refactors, and reasoning across multi-module changes.

The final result is a **high-signal, Rust-aware, Hyperswitch-specialized LoRA adapter** ideal for:

- Code generation
- Code explanation
- PR reasoning
- Diff summarization
- Documentation generation
- Rust workflow automation

---

# 🔧 Training Details

## LoRA Configuration

```yaml
r: 128
alpha: 256
dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```

### Hyperparameters

```yaml
learning_rate: 1e-4
micro_batch_size: 1
gradient_accumulation_steps: 6
sequence_length: 32768
train_val_split: 95/5
precision: bf16
```

### Hardware

```yaml
num_gpus: 8
gpu_name: NVIDIA H200
```

---

# 📊 Phased Training Metrics

Eval perplexity below is `exp(eval_loss)`.

## **Phase 1 — Foundation**

**Dataset:** `phase1_foundation.jsonl`
**Epochs:** 3

| Metric | Train | Eval |
|--------|-------|------|
| Loss | 0.2918 | 0.2434 |
| Entropy | 0.2052 | 0.2355 |
| Mean Token Accuracy | 0.9505 | 0.9331 |
| Perplexity | — | **1.2756** |
| Tokens | 8.88M | 8.88M |

---

## **Phase 2 — Evolution**

**Dataset:** `phase2_evolution.jsonl`
**Epochs:** 2

| Metric | Train | Eval |
|--------|-------|------|
| Loss | 0.7255 | 0.7661 |
| Entropy | 0.5080 | 0.7210 |
| Mean Token Accuracy | 0.8641 | 0.8110 |
| Perplexity | — | **2.1514** |
| Tokens | 23.48M | 23.48M |

---

## **Phase 3 — PR Mastery**

**Dataset:** `phase3_pr_mastery.jsonl`
**Epochs:** 2

| Metric | Train | Eval |
|--------|-------|------|
| Loss | 0.5378 | 0.5606 |
| Entropy | 0.4781 | 0.5254 |
| Mean Token Accuracy | 0.8749 | 0.8569 |
| Perplexity | — | **1.7516** |
| Tokens | 15.45M | 15.45M |

---

# 📈 Summary Across All Phases

Loss and perplexity rise after Phase 1 because each phase introduces progressively harder data, so cross-phase values are not directly comparable.

```yaml
total_epochs: 7
total_phases: 3
initial_train_loss: 0.2918
final_train_loss: 0.5378
initial_eval_loss: 0.2434
final_eval_loss: 0.5606
initial_perplexity: 1.2756
final_perplexity: 1.7516
```

---

# 🙏 Acknowledgments

- **Kwaipilot Team** — For the excellent KAT-Dev 32B base model
- **Juspay / Hyperswitch** — For the rich open-source Rust codebase
- **Hugging Face** — For PEFT, TRL, and Transformers

---

# 📚 Citation

```bibtex
@misc{katdev-hyperswitch-phasedlora-2025,
  title={KAT-Dev-CPT-LoRA-HS-32K-maxToken-v1},
  author={Aditya Narayan},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/AdityaNarayan/KAT-Dev-CPT-LoRA-HS-32K-maxToken-v1}
}
```
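
---

# 💻 Usage

Load the base model and apply this adapter with `transformers` and `peft`. A minimal sketch, assuming a standard PEFT adapter layout on the Hub; the prompt and generation settings are illustrative, not prescribed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "Kwaipilot/KAT-Dev"
ADAPTER_ID = "AdityaNarayan/KAT-Dev-CPT-LoRA-HS-32K-maxToken-v1"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    device_map="auto",
)

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

# Illustrative prompt: any Hyperswitch/Rust question or code context works.
prompt = "Explain how connector integrations are organized in the Hyperswitch codebase."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For lower-latency serving, `model.merge_and_unload()` folds the LoRA weights into the base model so no adapter indirection remains at inference time.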
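
## Reproducing the LoRA Configuration

The LoRA configuration listed under Training Details maps onto `peft.LoraConfig` roughly as follows. A sketch only: `bias` and `task_type` are assumptions, since the card does not state them:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=128,                  # rank, as listed in the card
    lora_alpha=256,         # alpha
    lora_dropout=0.05,      # dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: not specified in the card
    task_type="CAUSAL_LM",  # assumption: causal-LM CPT objective
)

base_model = AutoModelForCausalLM.from_pretrained("Kwaipilot/KAT-Dev")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # sanity-check the trainable adapter size
```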