DrDiag-QwenVL2
DrDiag-QwenVL2 is a 🩺 vision–language dermatology model trained using our two-stage fine-tuning pipeline on the HAM10000 dataset with bounding box annotations. The model combines image understanding with natural language reasoning to perform skin disease diagnosis with spatial awareness.
Training Pipeline
Supervised Fine-Tuning (SFT):
The base model Qwen2.5-VL was fine-tuned on HAM10000 with diagnosis labels for skin disease classification, establishing a strong diagnostic baseline.
Group Relative Policy Optimization (GRPO):
Reinforcement learning was applied to align outputs with spatial annotations, improving consistency in bounding box predictions and segmentation-related tasks.
This setup enhances the model’s ability to not only classify skin lesions but also localize them through bounding box outputs, supporting explainability and trustworthiness in medical AI.
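The GRPO stage needs a scalar reward for each sampled response. As an illustration only (the card does not specify the reward actually used in training), a plausible reward combines diagnosis correctness with bounding-box overlap (IoU):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grpo_reward(pred_label, pred_box, gold_label, gold_box):
    """Hypothetical reward: 1 point for the correct diagnosis
    plus up to 1 point for bounding-box overlap with the annotation."""
    return float(pred_label == gold_label) + iou(pred_box, gold_box)
```

GRPO then normalizes such rewards within each group of sampled responses to compute advantages, so only the relative ordering of rewards matters.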
Core Results:
- Diagnosis Accuracy: 85.0% (850/1000)
- Bounding Box Prediction: 100.0% (1000/1000)
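For reference, metrics of this form can be reproduced from paired predictions and references with a simple count. A sketch, assuming exact-match scoring for both fields (the card does not state the precise bounding-box criterion):

```python
def evaluate(predictions, references):
    """Compute (diagnosis_accuracy, bbox_accuracy) over paired examples.

    Each item is a dict like {"label": "mel", "bbox": (x1, y1, x2, y2)}.
    """
    n = len(references)
    diag_hits = sum(p["label"] == r["label"] for p, r in zip(predictions, references))
    bbox_hits = sum(tuple(p["bbox"]) == tuple(r["bbox"]) for p, r in zip(predictions, references))
    return diag_hits / n, bbox_hits / n
```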
Model Details
- Base Model: Qwen/Qwen2.5-VL-2B-Instruct
- Finetuning: Two-stage pipeline (SFT + GRPO)
- Dataset: HAM10000 with bounding box annotations (abaryan/ham10000_bbox)
- Developed, Funded & Shared by: Abaryan
- License: MIT
- Type: Vision–Language Causal Model
- Languages: English
Out-of-Scope Use
The model is intended solely for research purposes in dermatology AI. It should not be deployed for real-world clinical decision-making without further validation, regulatory clearance, and human oversight.
Bias, Risks, and Limitations
- Dataset limitations: HAM10000 is relatively small and focused; performance may not generalize to all populations or rare skin conditions.
- Model limitations: While bounding boxes improve explainability, they do not replace full clinical reasoning or histopathology.
- Ethical considerations: Use with caution in contexts involving patient health data.
How to Get Started with the Model
Available Demo: https://huggingface.co/spaces/abaryan/DrDiag_HAM10000
You can load this model using the transformers library in Python:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model_id = "abaryan/DrDiag_qwen2vl_Ham10000"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Example usage: load the image and build multimodal inputs
image = Image.open("lesion.jpg")
inputs = processor(
    images=image,
    text="Diagnose the lesion and provide bounding box.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
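The generated response is free-form text, so downstream use typically needs the box coordinates extracted as numbers. A small parsing sketch, assuming a hypothetical response style such as `"... bounding box: [120, 85, 240, 190]"` (adjust the pattern to the model's actual output format):

```python
import re

def parse_bbox(text):
    """Extract the first group of four integers (x1, y1, x2, y2)
    from the model's response; return None if no box is found."""
    match = re.search(r"\[?\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]?", text)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())
```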