---
library_name: transformers
tags:
- automatic-speech-recognition
- whisper
- fine-tuned
- peft
datasets:
- Vardis/Greek_Mosel
- mozilla-foundation/common_voice_11_0
- google/fleurs
language:
- el
metrics:
- wer
- cer
base_model:
- openai/whisper-large
---

# Fine-Tuned Whisper Large

This is a **Large-sized Whisper model** fine-tuned with PEFT (LoRA adapters) for Greek speech transcription. It has 1.5B parameters and transcribes more accurately than the medium-sized model.

- **WER (word error rate):** 12.06%
- **CER (character error rate):** 6.20%

## Training Results

| Step | Training Loss | Validation Loss | WER    | CER   |
|-----:|--------------:|----------------:|-------:|------:|
| 250  | 0.1776        | 0.1904          | 13.52% | 6.74% |
| 500  | 0.1478        | 0.1698          | 12.55% | 6.38% |
| 750  | 0.1229        | 0.1608          | 12.33% | 6.24% |
| 1000 | 0.1057        | 0.1605          | 12.15% | 6.26% |
| 1250 | 0.0864        | 0.1630          | 12.65% | 6.65% |
| 1500 | 0.0677        | 0.1643          | 13.23% | 7.35% |
| 1750 | 0.0618        | 0.1681          | 12.86% | 6.86% |
| 2000 | 0.0533        | 0.1686          | 12.98% | 7.00% |

## Model Details

- **Model Type:** Whisper (Large)
- **Fine-tuned From:** OpenAI Whisper Large
- **Language(s):** Greek
- **Parameters:** 1.5B

## How to Use

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model and the Greek fine-tuned LoRA weights
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2").to(device)
model = PeftModel.from_pretrained(base_model, "Vardis/Whisper-Large-v2-Greek").to(device)
processor = WhisperProcessor.from_pretrained("Vardis/Whisper-Large-v2-Greek")

# Load your audio as a 16 kHz mono waveform (e.g., using librosa or torchaudio)
audio_input = ...
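
# Optional (a minimal sketch, not part of the original recipe): by default the
# Whisper feature extractor pads or truncates each input to 30 seconds, so
# longer recordings are cut off. One simple workaround, assumed here, is to split
# the waveform into consecutive 30 s chunks. `split_into_chunks` is a
# hypothetical helper; naive cuts can split a word across two chunks.
def split_into_chunks(waveform, sampling_rate=16_000, chunk_seconds=30):
    """Split a 1-D waveform (list or NumPy array) into non-overlapping chunks of at most chunk_seconds."""
    chunk_len = sampling_rate * chunk_seconds
    return [waveform[i:i + chunk_len] for i in range(0, len(waveform), chunk_len)]

# Hypothetical usage for audio longer than 30 s:
# chunks = split_into_chunks(audio_input)
# inputs = processor(chunks, sampling_rate=16000, return_tensors="pt").input_features.to(device)
# transcription = " ".join(processor.batch_decode(model.generate(input_features=inputs), skip_special_tokens=True))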

# Generate the transcription (Whisper expects 16 kHz audio)
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features.to(device)
predicted_ids = model.generate(input_features=inputs)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```

## Context / Reference

This model was developed as part of the work described in:

**Georgilas, V., Stafylakis, T. (2025). _Automatic Speech Recognition for Greek Medical Dictation_.**

The paper covers Greek medical ASR research more broadly and is **not primarily about this model**, but it provides context for the model's development. Users are welcome to use the model freely for research and practical applications.

**BibTeX citation:**

```bibtex
@misc{georgilas2025greekasr,
  title={Automatic Speech Recognition for Greek Medical Dictation},
  author={Vardis Georgilas and Themos Stafylakis},
  year={2025},
  note={Available at: https://www.arxiv.org/abs/2509.23550}
}
```