DeepSeek-R1-Distill-Qwen-1.5B Fine-tuned on MedMCQA
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the MedMCQA dataset using QLoRA (Quantized Low-Rank Adaptation).
Model Details
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Dataset: MedMCQA (Medical Multiple Choice Question Answering)
- Fine-tuning Method: QLoRA (4-bit quantization + LoRA; see the configuration sketch after this list)
- Training Framework: Transformers + PEFT + TRL
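The QLoRA setup can be reproduced roughly as follows. This is a minimal sketch: only the rank (16) and alpha (32) come from this card; the quantization options, target modules, and dropout below are assumptions.

```python
# Minimal QLoRA setup sketch. The quant type, compute dtype, target modules,
# and dropout are assumptions; only r=16 and lora_alpha=32 come from this card.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantization of the base weights
    bnb_4bit_quant_type="nf4",              # assumed NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    bnb_4bit_use_double_quant=True,         # assumed
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,               # LoRA rank (from Training Details)
    lora_alpha=32,      # LoRA alpha (from Training Details)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,  # assumed; not stated on this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # attach LoRA adapters to the quantized model
```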
Performance
Test Set Results
- Accuracy: 0.0980
- Macro F1-Score: 0.0446
- Weighted F1-Score: 0.1785
Validation Set Results
- Accuracy: 0.3400
- Macro F1-Score: 0.2290
- Weighted F1-Score: 0.2481
Per-Class Performance (Test Set)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| A | 0.0000 | 0.0000 | 0.0000 | 0 |
| B | 0.0000 | 0.0000 | 0.0000 | 0 |
| C | 1.0000 | 0.0980 | 0.1785 | 500 |
| D | 0.0000 | 0.0000 | 0.0000 | 0 |

Note: every scored test example has gold label C (zero support for A, B, and D), so those classes necessarily score zero; the model answered C on 9.8% of questions.
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, "arumpuri/deepseek-r1-medmcqa-qlora")
model.eval()

# Example usage
prompt = '''Question: What is the most common cause of acute pancreatitis?
Options:
A. Alcohol abuse
B. Gallstones
C. Hypertriglyceridemia
D. Medications
Answer:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
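When the base model is loaded in full or half precision as above, the adapter can optionally be folded into the base weights with PEFT's merge_and_unload(), which returns a plain transformers model and removes the per-step adapter overhead. The output path below is hypothetical.

```python
# Optional: merge the LoRA weights into the base model for adapter-free inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("deepseek-r1-medmcqa-merged")  # hypothetical output path
```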
Training Details
- LoRA Rank: 16
- LoRA Alpha: 32
- Learning Rate: 2e-4
- Batch Size: 1 (with gradient accumulation)
- Epochs: 3
- Optimizer: Paged AdamW 8-bit (see the training sketch after this list)
- Training Samples: 5000
- Validation Samples: 500
- Test Samples: 1000
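A sketch of how these hyperparameters might be wired into a TRL training run is shown below, assuming a recent TRL version that provides SFTConfig. Here `model` is the LoRA-wrapped model from the Model Details sketch, `train_dataset` and `val_dataset` are placeholder names for prepared MedMCQA splits (preparation not shown), and the gradient-accumulation steps and logging cadence are assumptions.

```python
# Training sketch using TRL's SFTTrainer; dataset preparation is not shown.
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="deepseek-r1-medmcqa-qlora",  # hypothetical output directory
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # assumed; card only says "with gradient accumulation"
    num_train_epochs=3,
    optim="paged_adamw_8bit",       # Paged AdamW 8-bit, as listed above
    logging_steps=50,               # assumed
)

trainer = SFTTrainer(
    model=model,                  # quantized base + LoRA adapters (see Model Details sketch)
    args=training_args,
    train_dataset=train_dataset,  # 5,000 MedMCQA training examples (placeholder)
    eval_dataset=val_dataset,     # 500 validation examples (placeholder)
)
trainer.train()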
Evaluation Methodology
The model was evaluated with standard classification metrics:
- Accuracy: Overall correctness across all questions
- Precision/Recall/F1: Per-class and averaged metrics
- Confusion Matrix: Detailed error analysis
The evaluation was performed on both the validation and test sets; a minimal sketch of the scoring pipeline follows.
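The scoring can be reproduced along these lines, assuming each generation is reduced to a single option letter per question. The extraction regex and the variable names (`generated_texts`, `y_true`) are placeholders, not the card's exact pipeline.

```python
# Sketch of letter-level scoring with scikit-learn; the extraction rule is assumed.
import re
from sklearn.metrics import (
    accuracy_score, f1_score, confusion_matrix, classification_report
)

def extract_letter(generated_text: str) -> str:
    """Pull the first standalone option letter (A-D) out of the model output."""
    match = re.search(r"\b([A-D])\b", generated_text)
    return match.group(1) if match else "?"  # "?" marks unparseable outputs

# generated_texts / y_true are placeholders: one generation and one gold letter per question
y_pred = [extract_letter(text) for text in generated_texts]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print(confusion_matrix(y_true, y_pred, labels=["A", "B", "C", "D"]))
print(classification_report(y_true, y_pred, labels=["A", "B", "C", "D"], zero_division=0))
```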
Intended Use
This model is designed for medical question answering tasks, particularly multiple-choice questions. It should be used as a research tool and not for actual medical diagnosis or treatment recommendations.
Limitations
- The model may generate incorrect medical information
- It should not be used for clinical decision-making
- Performance may vary on questions outside the training distribution
- Evaluation was performed on a subset of the full dataset due to computational constraints