DrDiag-QwenVL2
DrDiag-QwenVL2 is a 🩺 vision–language dermatology model trained using our two-stage fine-tuning pipeline on the HAM10000 dataset with bounding box annotations. The model combines image understanding with natural language reasoning to perform skin disease diagnosis with spatial awareness.
Training Pipeline
Supervised Fine-Tuning (SFT):
The base model Qwen2.5-VL was fine-tuned on HAM10000 with diagnosis labels for skin disease classification, establishing a strong diagnostic baseline.
Group Relative Policy Optimization (GRPO):
Reinforcement learning was applied to align outputs with spatial annotations, improving consistency in bounding box predictions and segmentation-related tasks.
This setup enhances the model’s ability to not only classify skin lesions but also localize them through bounding box outputs, supporting explainability and trustworthiness in medical AI.
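The GRPO stage needs a scalar reward for each sampled response. As an illustration only (the card does not specify the reward actually used in training), a plausible reward combines diagnosis correctness with bounding-box overlap (IoU):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grpo_reward(pred_label, pred_box, gold_label, gold_box):
    """Hypothetical reward: 1 point for the correct diagnosis
    plus up to 1 point for bounding-box overlap with the annotation."""
    return float(pred_label == gold_label) + iou(pred_box, gold_box)
```

GRPO then normalizes such rewards within each group of sampled responses to compute advantages, so only the relative ordering of rewards matters.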
Core Results:
- Diagnosis Accuracy: 85.0% (850/1000)
- Bounding Box Prediction: 100.0% (1000/1000)
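For reference, metrics of this form can be reproduced from paired predictions and references with a simple count. A sketch, assuming exact-match scoring for both fields (the card does not state the precise bounding-box criterion):

```python
def evaluate(predictions, references):
    """Compute (diagnosis_accuracy, bbox_accuracy) over paired examples.

    Each item is a dict like {"label": "mel", "bbox": (x1, y1, x2, y2)}.
    """
    n = len(references)
    diag_hits = sum(p["label"] == r["label"] for p, r in zip(predictions, references))
    bbox_hits = sum(tuple(p["bbox"]) == tuple(r["bbox"]) for p, r in zip(predictions, references))
    return diag_hits / n, bbox_hits / n
```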
Model Details
- Base Model: Qwen/Qwen2.5-VL-2B-Instruct
- Finetuning: Two-stage pipeline (SFT + GRPO)
- Dataset: HAM10000 with bounding box annotations (abaryan/ham10000_bbox)
- Developed, Funded & Shared by: Abaryan
- License: MIT
- Type: Vision–Language Causal Model
- Languages: English
Out-of-Scope Use
The model is intended solely for research purposes in dermatology AI. It should not be deployed for real-world clinical decision-making without further validation, regulatory clearance, and human oversight.
Bias, Risks, and Limitations
- Dataset limitations: HAM10000 is relatively small and focused; performance may not generalize to all populations or rare skin conditions.
- Model limitations: While bounding boxes improve explainability, they do not replace full clinical reasoning or histopathology.
- Ethical considerations: Use with caution in contexts involving patient health data.
How to Get Started with the Model
Available Demo: https://huggingface.co/spaces/abaryan/DrDiag_HAM10000
You can load this model using the transformers library in Python:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model_id = "abaryan/DrDiag_qwen2vl_Ham10000"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Example usage: load the image and build multimodal inputs
image = Image.open("lesion.jpg")
inputs = processor(
    images=image,
    text="Diagnose the lesion and provide bounding box.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
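The generated response is free-form text, so downstream use typically needs the box coordinates extracted as numbers. A small parsing sketch, assuming a hypothetical response style such as `"... bounding box: [120, 85, 240, 190]"` (adjust the pattern to the model's actual output format):

```python
import re

def parse_bbox(text):
    """Extract the first group of four integers (x1, y1, x2, y2)
    from the model's response; return None if no box is found."""
    match = re.search(r"\[?\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]?", text)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())
```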