DeepSeek-R1-Distill-Qwen-1.5B Fine-tuned on MedMCQA

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the MedMCQA dataset using QLoRA (Quantized Low-Rank Adaptation).

Model Details

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Dataset: MedMCQA (Medical Multiple Choice Question Answering)
  • Fine-tuning Method: QLoRA (4-bit quantization + LoRA; see the configuration sketch below)
  • Training Framework: Transformers + PEFT + TRL
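
The exact training script is not included here, so the following is a minimal sketch of a typical QLoRA setup for this base model. The rank and alpha match the Training Details section below; the quantization settings, target_modules, and dropout are common defaults and are assumptions, not confirmed values.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, the standard QLoRA recipe (settings are assumptions)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter: r=16 and lora_alpha=32 come from the Training Details section;
# target_modules is a common choice for Qwen-style attention layers (assumed)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)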

Performance

Test Set Results

  • Accuracy: 0.0980
  • Macro F1-Score: 0.0446
  • Weighted F1-Score: 0.1785

Validation Set Results

  • Accuracy: 0.3400
  • Macro F1-Score: 0.2290
  • Weighted F1-Score: 0.2481

Per-Class Performance (Test Set)

Class  Precision  Recall  F1-Score  Support
A      0.0000     0.0000  0.0000    0
B      0.0000     0.0000  0.0000    0
C      1.0000     0.0980  0.1785    500
D      0.0000     0.0000  0.0000    0

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

# Load fine-tuned model
model = PeftModel.from_pretrained(base_model, "arumpuri/deepseek-r1-medmcqa-qlora")

# Example usage
prompt = '''Question: What is the most common cause of acute pancreatitis?

Options:
A. Alcohol abuse
B. Gallstones
C. Hypertriglyceridemia
D. Medications

Answer:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
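
The model returns free-form text, so downstream evaluation usually requires parsing the chosen option letter out of the generation. A minimal sketch, assuming the completion echoes the "Answer:" prefix (the actual output format may differ):

import re

# Pull the first standalone option letter that follows "Answer:"
# (illustrative heuristic; real outputs may need more robust parsing)
match = re.search(r"Answer:\s*\(?([ABCD])\)?", response)
predicted = match.group(1) if match else None
print("Predicted option:", predicted)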

Training Details

  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Learning Rate: 2e-4
  • Batch Size: 1 (with gradient accumulation)
  • Epochs: 3
  • Optimizer: Paged AdamW 8-bit
  • Training Samples: 5000
  • Validation Samples: 500
  • Test Samples: 1000
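
As a rough illustration, the hyperparameters above translate into a Transformers TrainingArguments configuration like the following. The gradient-accumulation steps, logging interval, and output path are not reported and are assumptions:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deepseek-r1-medmcqa-qlora",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,           # assumption: actual value not reported
    num_train_epochs=3,
    optim="paged_adamw_8bit",                # paged AdamW 8-bit, as listed above
    logging_steps=50,                        # assumption
)

# These arguments would then be passed to a TRL SFTTrainer along with the
# quantized base model, the LoRA config, and the MedMCQA prompts.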

Evaluation Methodology

The model was evaluated with the following classification metrics:

  • Accuracy: Overall correctness across all questions
  • Precision/Recall/F1: Per-class and averaged metrics
  • Confusion Matrix: Detailed error analysis

The evaluation was performed on both validation and test sets to ensure robust performance assessment.
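
Concretely, once the predicted option letters have been parsed from the generations, all of the metrics above can be computed with scikit-learn. A small self-contained sketch with made-up labels (y_true and y_pred are placeholders, not real results):

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder labels for illustration only
y_true = ["B", "C", "A", "D", "C"]
y_pred = ["B", "C", "C", "D", "A"]

print("Accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision/recall/F1 plus macro and weighted averages
print(classification_report(y_true, y_pred, digits=4))
# Row = true class, column = predicted class
print(confusion_matrix(y_true, y_pred, labels=["A", "B", "C", "D"]))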

Intended Use

This model is designed for medical multiple-choice question answering. It is intended as a research tool and should not be used for actual medical diagnosis or treatment recommendations.

Limitations

  • The model may generate incorrect medical information
  • It should not be used for clinical decision-making
  • Performance may vary on questions outside the training distribution
  • Evaluation was performed on a subset of the full dataset due to computational constraints