---
library_name: transformers
tags:
  - automatic-speech-recognition
  - whisper
  - fine-tuned
  - peft
datasets:
  - Vardis/Greek_Mosel
  - mozilla-foundation/common_voice_11_0
  - google/fleurs
language:
  - el
metrics:
  - wer
  - cer
base_model:
  - openai/whisper-large-v2
---


# Fine-Tuned Whisper Large

This is a **Whisper Large model** fine-tuned for Greek speech transcription. It has 1.5B parameters and transcribes Greek more accurately than the medium-sized model. The fine-tuned weights are distributed as a LoRA adapter (PEFT) that is loaded on top of the base model.

- **WER:** 12.06%  
- **CER:** 6.20%  

## Training Results

| Step  | Training Loss | Validation Loss | WER      | CER      |
|-------|---------------|----------------|----------|----------|
| 250   | 0.1776        | 0.1904         | 13.52%   | 6.74%    |
| 500   | 0.1478        | 0.1698         | 12.55%   | 6.38%    |
| 750   | 0.1229        | 0.1608         | 12.33%   | 6.24%    |
| 1000  | 0.1057        | 0.1605         | 12.15%   | 6.26%    |
| 1250  | 0.0864        | 0.1630         | 12.65%   | 6.65%    |
| 1500  | 0.0677        | 0.1643         | 13.23%   | 7.35%    |
| 1750  | 0.0618        | 0.1681         | 12.86%   | 6.86%    |
| 2000  | 0.0533        | 0.1686         | 12.98%   | 7.00%    |
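
The WER and CER columns above are edit-distance ratios: the number of word-level (WER) or character-level (CER) substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length. The sketch below illustrates that definition in plain Python. It is not the evaluation script used for this model; the function names are hypothetical, and it applies no text normalization (casing, punctuation, accents), which a real evaluation may do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "the patient has a fever"
    hypothesis = "the patient had a fever"
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

Both rates can exceed 100% when the hypothesis contains many insertions, since the denominator is always the reference length.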

## Model Details

- **Model Type:** Whisper (Large)  
- **Fine-tuned From:** OpenAI Whisper Large v2 (`openai/whisper-large-v2`)  
- **Language(s):** Greek  
- **Parameters:** 1.5B  
 

## How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load base model and Greek fine-tuned LoRA weights
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2").to(device)
model = PeftModel.from_pretrained(base_model, "Vardis/Whisper-Large-v2-Greek").to(device)
processor = WhisperProcessor.from_pretrained("Vardis/Whisper-Large-v2-Greek")

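# Load your audio as a 16 kHz mono float waveform (e.g., using librosa or torchaudio)
audio_input = ...  

# Convert the waveform to log-mel input features; Whisper expects 16 kHz audio
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features.to(device)

# Generate token IDs without tracking gradients, then decode them to text
with torch.no_grad():
    predicted_ids = model.generate(input_features=inputs)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)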

print(transcription)
```
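
Whisper operates on 30-second windows, and the processor pads or truncates each input to that length by default, so the snippet above transcribes at most the first 30 seconds of a recording. For longer audio, one option is to split the waveform into fixed-length chunks and transcribe each one in turn. The helper below is a minimal sketch of that idea; `chunk_waveform` is a hypothetical name, and the naive split can cut a word at a chunk boundary.

```python
def chunk_waveform(samples, sample_rate=16000, chunk_seconds=30):
    """Split a 1-D waveform (list or array) into consecutive fixed-length chunks.

    The last chunk may be shorter than chunk_seconds; the Whisper processor
    pads it to the full window when features are extracted.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = int(sample_rate * chunk_seconds)
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    # 70 seconds of silence at 16 kHz, as a stand-in for real audio
    waveform = [0.0] * (16000 * 70)
    chunks = chunk_waveform(waveform)
    print([len(c) / 16000 for c in chunks])  # chunk durations in seconds
```

Each chunk can then be passed through `processor(...)` and `model.generate(...)` as in the snippet above, and the decoded strings joined. The `transformers` ASR pipeline also offers a `chunk_length_s` argument that handles chunking with overlap, which may be preferable when boundary errors matter.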

## Context / Reference

This model was developed as part of the work described in:

**Georgilas, V., Stafylakis, T. (2025). _Automatic Speech Recognition for Greek Medical Dictation_.**  
The paper covers Greek medical ASR research more broadly and is **not primarily about this model**, but it provides context for the model's development. Users are welcome to use the model freely for research and practical applications.

**BibTeX citation:**
```bibtex
@misc{georgilas2025greekasr,
  title={Automatic Speech Recognition for Greek Medical Dictation},
  author={Vardis Georgilas and Themos Stafylakis},
  year={2025},
  note={Available at: https://www.arxiv.org/abs/2509.23550}
}
```