---
library_name: transformers
tags:
- automatic-speech-recognition
- whisper
- fine-tuned
- peft
datasets:
- Vardis/Greek_Mosel
- mozilla-foundation/common_voice_11_0
- google/fleurs
language:
- el
metrics:
- wer
- cer
base_model:
- openai/whisper-large
---
# Fine-Tuned Whisper Large
This is a **large-sized Whisper model** fine-tuned for Greek speech transcription using LoRA adapters. It has 1.5B parameters and transcribes Greek more accurately than the medium-sized model, achieving:
- **WER:** 12.06%
- **CER:** 6.20%
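WER and CER are word- and character-level edit-distance metrics. Each counts the substitutions, deletions and insertions needed to turn the model's hypothesis into the reference, then divides by the number of reference words (or characters). The pure-Python sketch below illustrates the definition. It assumes whitespace tokenization and no text normalization (casing, punctuation, accents), so it is not necessarily the exact scoring used for the figures above. Normalization choices can shift the reported numbers noticeably.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substituted word out of four reference words -> WER of 0.25
print(wer("ο ασθενής παρουσιάζει πυρετό", "ο ασθενής παρουσίασε πυρετό"))  # 0.25
```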
## Training Results
| Step | Training Loss | Validation Loss | WER | CER |
|-------|---------------|----------------|----------|----------|
| 250 | 0.1776 | 0.1904 | 13.52% | 6.74% |
| 500 | 0.1478 | 0.1698 | 12.55% | 6.38% |
| 750 | 0.1229 | 0.1608 | 12.33% | 6.24% |
| 1000 | 0.1057 | 0.1605 | 12.15% | 6.26% |
| 1250 | 0.0864 | 0.1630 | 12.65% | 6.65% |
| 1500 | 0.0677 | 0.1643 | 13.23% | 7.35% |
| 1750 | 0.0618 | 0.1681 | 12.86% | 6.86% |
| 2000 | 0.0533 | 0.1686 | 12.98% | 7.00% |
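In the table, validation loss and WER reach their lowest values at step 1000. After that, training loss keeps falling while validation WER stays above 12.15%, which suggests overfitting in later steps. The snippet below picks the step with the lowest validation WER, using only the values copied from the table. It illustrates the selection criterion and is not necessarily how the released checkpoint was chosen.

```python
# (step -> validation WER in %) copied from the training-results table
validation_wer = {
    250: 13.52, 500: 12.55, 750: 12.33, 1000: 12.15,
    1250: 12.65, 1500: 13.23, 1750: 12.86, 2000: 12.98,
}

# Select the checkpoint whose validation WER is lowest
best_step = min(validation_wer, key=validation_wer.get)
print(best_step, validation_wer[best_step])  # 1000 12.15
```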
## Model Details
- **Model Type:** Whisper (Large)
- **Fine-tuned From:** OpenAI Whisper Large
- **Language(s):** Greek
- **Parameters:** 1.5B
## How to Use
```python
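from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import librosa
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model, then apply the Greek fine-tuned LoRA weights on top of it
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base_model, "Vardis/Whisper-Large-v2-Greek").to(device)
model.eval()
processor = WhisperProcessor.from_pretrained("Vardis/Whisper-Large-v2-Greek")

# Load your audio as a mono waveform resampled to 16 kHz, the rate Whisper expects
# ("audio.wav" is a placeholder; replace it with your own file)
audio_input, sampling_rate = librosa.load("audio.wav", sr=16000)

# Convert the waveform into log-Mel input features
inputs = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt").input_features.to(device)

# Generate the transcription; pass features as a keyword argument (the PEFT
# wrapper's generate() may accept keyword arguments only) and force Greek
with torch.no_grad():
    predicted_ids = model.generate(input_features=inputs, language="el", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)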
```
## Context / Reference
This model was developed as part of the work described in:
**Georgilas, V., Stafylakis, T. (2025). _Automatic Speech Recognition for Greek Medical Dictation_.**
The paper addresses Greek medical ASR more broadly and is **not primarily about this model**, but it provides context for its development. Users are welcome to use the model freely for research and practical applications.
**BibTeX citation:**
```bibtex
@misc{georgilas2025greekasr,
title={Automatic Speech Recognition for Greek Medical Dictation},
author={Vardis Georgilas and Themos Stafylakis},
year={2025},
note={Available at: https://www.arxiv.org/abs/2509.23550}
}
```