MedGemma-4B Targeted LoRA (Layers 15-19): Multi-task n=12K
LoRA adapter for google/medgemma-4b-it,
released as part of "Mechanistically Guided LoRA Improves Paraphrase
Consistency in Medical Vision-Language Models" (Sadanadan & Behzadan,
CHIL 2026).
This is the targeted arm of the paper: rank-16 adapters restricted to layers 15-19 of the language model. Layer selection is mechanistically motivated: sparse-autoencoder analysis on Gemma Scope 2 identifies Feature 3818 at layer 17 as a "clinical query formality gate" that drives much of the model's paraphrase sensitivity. Targeting the surrounding register-sensitive layers (15-19) reduces the flip rate without disturbing the rest of the network.
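As an illustration of what restricting adapters to layers 15-19 means in practice, the sketch below builds a regular expression that selects only the attention and MLP projections of those decoder layers. The module names (`language_model`, `layers`, `self_attn`, `mlp`, `q_proj` and so on) are assumptions based on common Gemma-style naming and are not taken from the paper's training code, so check them against `model.named_modules()` before relying on them.

```python
import re

# Assumed module naming; verify against model.named_modules() for the real checkpoint.
TARGET_LAYERS = range(15, 20)  # layers 15-19 inclusive
ATTN_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")

layer_alternatives = "|".join(str(i) for i in TARGET_LAYERS)
TARGET_REGEX = (
    r".*language_model.*\.layers\.(" + layer_alternatives + r")\."
    r"(self_attn\.(" + "|".join(ATTN_PROJECTIONS) + r")"
    r"|mlp\.(" + "|".join(MLP_PROJECTIONS) + r"))"
)


def is_targeted(module_name: str) -> bool:
    """Return True if the module would receive a LoRA adapter under this pattern."""
    return re.fullmatch(TARGET_REGEX, module_name) is not None


if __name__ == "__main__":
    # Hypothetical module names used only to exercise the pattern.
    for name in (
        "model.language_model.layers.17.self_attn.q_proj",
        "model.language_model.layers.14.self_attn.q_proj",
        "model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj",
    ):
        print(name, is_targeted(name))
```

PEFT interprets a string passed as `target_modules` in `LoraConfig` as a regular expression, so a pattern like this is one possible way to keep adapters out of the vision tower and the untargeted decoder layers. Whether the released adapter was configured this way is not stated here.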
This release corresponds to the multi-task n=12K scale-up of the n=500 binary checkpoint reported in the submitted CHIL paper. It uses a sequence-level cross-entropy + symmetric KL loss compatible with all MIMIC-CXR question types (presence, abnormality, view, type, level, location).
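The loss described above can be sketched for a single paraphrase pair as follows. This is a minimal pure-Python illustration, not the training code: the equal averaging of the two cross-entropy terms, the 0.5 factor in the symmetric KL, and the `kl_weight` parameter are all assumptions, and the logits stand in for the model's first-answer-token predictions.

```python
import math


def softmax(logits):
    """Convert logits to probabilities with the usual max-shift for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct first answer token."""
    return -math.log(probs[target_index])


def kl(p, q, eps=1e-12):
    """KL(p || q); q is clamped to eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p) (the 0.5 factor is an assumption)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def paraphrase_consistency_loss(logits_a, logits_b, target_index, kl_weight=1.0):
    """Cross-entropy on both paraphrases plus a weighted symmetric KL term.

    kl_weight is a hypothetical hyperparameter; its value is not given in this card.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    ce = 0.5 * (cross_entropy(p_a, target_index) + cross_entropy(p_b, target_index))
    return ce + kl_weight * symmetric_kl(p_a, p_b)
```

When both paraphrases produce identical logits the KL term vanishes and only the cross-entropy remains, which is the behavior a consistency penalty of this kind is meant to reward.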
Training
| Setting | Value |
|---|---|
| Base model | google/medgemma-4b-it |
| Adapter rank (r) | 16 |
| alpha | 32 |
| Dropout | 0.05 |
| Learning rate | 2e-4 |
| Effective batch size | 8 (batch 1, grad-accum 8) |
| Epochs | 3 |
| Target layers | 15-19 of 34 |
| Target modules | Q, K, V, O attention projections + gate, up, down MLP projections |
| Training data | MIMIC-CXR train split, all question types, ~2,865 unique questions × 3 epochs of random paraphrase sampling ≈ 8,600 paraphrase pairs |
| Loss | Sequence-level cross-entropy on first answer token + symmetric KL divergence between paraphrase predictions |
| Trainable parameters | 4.4M (0.10% of base) |
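The trainable-parameter figure can be sanity-checked from the LoRA shapes: each adapted linear layer with input width `d_in` and output width `d_out` adds `r * (d_in + d_out)` parameters. The sketch below assumes the Gemma 3 4B text-decoder dimensions (hidden size 2560, MLP width 10240, 8 query heads and 4 key/value heads of dimension 256). These dimensions are not stated in this card and should be checked against the base model's config.

```python
# Assumed decoder dimensions; verify against the base model's config.json.
HIDDEN = 2560
INTERMEDIATE = 10240
Q_OUT = 8 * 256    # query heads x head dimension
KV_OUT = 4 * 256   # key/value heads x head dimension

RANK = 16          # adapter rank from the table above
NUM_LAYERS = 5     # layers 15-19 inclusive

# (input width, output width) of each adapted projection
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTION_SHAPES.values())
total = per_layer * NUM_LAYERS

print(per_layer)  # 876544
print(total)      # 4382720
```

Under these assumptions the total comes to about 4.38M, in line with the 4.4M reported in the table.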
Final epoch metrics
- Train loss: 1.04
- Question-level flip rate: 7.7%
- First-token accuracy: 72.2%
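A question-level flip rate of this kind can be computed as sketched below. The definition used here is an assumption, since this card does not spell it out: a question counts as flipped when its paraphrases do not all receive the same answer after case and whitespace normalization. The `records` format is hypothetical.

```python
from collections import defaultdict


def question_flip_rate(records):
    """Fraction of questions whose paraphrases received more than one distinct answer.

    records: iterable of (question_id, predicted_answer) pairs, one per paraphrase.
    """
    answers_by_question = defaultdict(set)
    for question_id, answer in records:
        answers_by_question[question_id].add(answer.strip().lower())
    if not answers_by_question:
        return 0.0
    flipped = sum(1 for answers in answers_by_question.values() if len(answers) > 1)
    return flipped / len(answers_by_question)


if __name__ == "__main__":
    # Toy predictions: only q2 changes its answer across paraphrases.
    example = [
        ("q1", "yes"), ("q1", "Yes"),
        ("q2", "yes"), ("q2", "no"),
        ("q3", "PA"), ("q3", "pa"),
        ("q4", "no"), ("q4", "no"),
    ]
    print(question_flip_rate(example))  # 0.25
```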
Usage
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
base = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",
    dtype=torch.bfloat16,
    device_map="cuda",
)
model = PeftModel.from_pretrained(base, "saillab/medgemma-4b-targeted-lora-mimic-mt-12k")
processor = AutoProcessor.from_pretrained("saillab/medgemma-4b-targeted-lora-mimic-mt-12k")
Intended use
Research on medical-VLM paraphrase robustness, mechanistic interpretability, and LoRA-based fine-tuning. Not for clinical use. The model has not been clinically validated.
Citation (primary: CHIL 2026)
@inproceedings{sadanadan2026mechanistic,
title = {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
booktitle = {Conference on Health, Inference, and Learning (CHIL)},
year = {2026}
}
Companion evaluation work
This adapter is also evaluated in a companion heatmap-faithfulness study, which demonstrates that fine-tuned VLMs achieve consistent answers without producing faithful visual explanations:
@misc{sadanadan2026heatmap,
title = {Attention Without Grounding: Causal Evaluation of Visual Explanations in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
year = {2026},
note = {Pre-print, SAIL Lab, University of New Haven}
}
License
Distributed under the Gemma Terms of Use, inheriting the licensing terms of the base model.