DrDiag-QwenVL2

DrDiag-QwenVL2 is a 🩺 vision–language dermatology model trained using our two-stage fine-tuning pipeline on the HAM10000 dataset with bounding box annotations. The model combines image understanding with natural language reasoning to perform skin disease diagnosis with spatial awareness.

Training Pipeline

  1. Supervised Fine-Tuning (SFT):
    The base model Qwen2.5-VL was fine-tuned on HAM10000 with labels for skin disease classification, establishing a strong diagnostic baseline.

  2. Group Relative Policy Optimization (GRPO):
    Reinforcement learning was applied to align outputs with spatial annotations, improving consistency in bounding box predictions and segmentation-related tasks.

This setup enhances the model’s ability to not only classify skin lesions but also localize them through bounding box outputs, supporting explainability and trustworthiness in medical AI.
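The GRPO reward used in stage two is not published in this card; as a rough illustration, a spatially grounded reward for this kind of pipeline could combine diagnosis correctness with bounding-box overlap. The function names, weights, and box format (`[x1, y1, x2, y2]`) below are assumptions for the sketch, not the actual training code:

```python
# Hypothetical sketch of a GRPO-style reward for spatially grounded diagnosis.
# Weights and box format are illustrative assumptions, not the published recipe.

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def reward(pred_label, gold_label, pred_box, gold_box, w_diag=0.5, w_box=0.5):
    """Reward = weighted sum of diagnosis correctness and box overlap."""
    return w_diag * float(pred_label == gold_label) + w_box * iou(pred_box, gold_box)
```

In GRPO, several sampled completions per prompt would be scored this way and advantages computed relative to the group mean, which pushes the model toward responses that are both diagnostically correct and spatially consistent.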

Core Results

  • Diagnosis Accuracy: 85.0% (850/1000)
  • Bounding Box Prediction: 100.0% (1000/1000)

Model Details

  • Base Model: Qwen/Qwen2.5-VL-2B-Instruct
  • Fine-tuning: Two-stage pipeline (SFT + GRPO)
  • Dataset: HAM10000 with bounding box annotations (abaryan/ham10000_bbox)
  • Developed, Funded & Shared by: Abaryan
  • License: MIT
  • Type: Vision–Language Causal Model
  • Model Size: 2B parameters (BF16, Safetensors)
  • Languages: English

Out-of-Scope Use

The model is intended solely for research purposes in dermatology AI. It should not be deployed for real-world clinical decision-making without further validation, regulatory clearance, and human oversight.


Bias, Risks, and Limitations

  • Dataset limitations: HAM10000 is relatively small and focused; performance may not generalize to all populations or rare skin conditions.
  • Model limitations: While bounding boxes improve explainability, they do not replace full clinical reasoning or histopathology.
  • Ethical considerations: Use with caution in contexts involving patient health data.

How to Get Started with the Model

Available Demo: https://huggingface.co/spaces/abaryan/DrDiag_HAM10000

You can load this model using the transformers library in Python:

from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "abaryan/DrDiag_qwen2vl_Ham10000"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Example usage: the processor expects a PIL image, not a file path
image = Image.open("lesion.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Diagnose the lesion and provide bounding box."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0], skip_special_tokens=True))
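
Downstream code will usually want the predicted box as numbers rather than text. Assuming the model emits the box as a bracketed integer list such as `[x1, y1, x2, y2]` (the exact output format is an assumption here, not documented in this card), a small parser could look like:

```python
import re

def extract_bbox(text):
    """Pull the first [x1, y1, x2, y2] integer list out of generated text.

    The bracketed-list output format is an assumption for this sketch;
    adjust the pattern to match the model's actual responses.
    """
    m = re.search(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", text)
    return [int(g) for g in m.groups()] if m else None

# Example: extract_bbox("Diagnosis: melanoma. Box: [120, 80, 260, 210].")
# returns [120, 80, 260, 210]
```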
