Vardis/Whisper-Large-v2-Greek


	---
	library_name: transformers
	tags:
	- automatic-speech-recognition
	- whisper
	- fine-tuned
	- peft
	datasets:
	- Vardis/Greek_Mosel
	- mozilla-foundation/common_voice_11_0
	- google/fleurs
	language:
	- el
	metrics:
	- wer
	- cer
	base_model:
	- openai/whisper-large-v2
	---


	# Fine-Tuned Whisper Large

	This is a Whisper Large model (1.5B parameters) fine-tuned for Greek speech transcription. The fine-tuned weights are distributed as a PEFT (LoRA) adapter that is loaded on top of the base model. It transcribes Greek more accurately than the medium model.

	- WER: 12.06%
	- CER: 6.20%
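For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. This is an illustration only. The card does not say which tool or text normalization (casing, punctuation, accent handling) produced the numbers above, so the functions and the example sentences below are assumptions and will not necessarily reproduce them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example sentences, not taken from the evaluation data
reference = "ο ασθενής έχει πυρετό"
hypothesis = "ο ασθενής είχε πυρετό"
print(f"WER: {wer(reference, hypothesis):.2%}")  # one wrong word out of four: 25.00%
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Because a single wrong word usually differs from the reference in only a few characters, CER is typically lower than WER on the same output, which matches the figures reported above.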

	## Training Results

	| Step | Training Loss | Validation Loss | WER    | CER   |
	|------|---------------|-----------------|--------|-------|
	| 250  | 0.1776        | 0.1904          | 13.52% | 6.74% |
	| 500  | 0.1478        | 0.1698          | 12.55% | 6.38% |
	| 750  | 0.1229        | 0.1608          | 12.33% | 6.24% |
	| 1000 | 0.1057        | 0.1605          | 12.15% | 6.26% |
	| 1250 | 0.0864        | 0.1630          | 12.65% | 6.65% |
	| 1500 | 0.0677        | 0.1643          | 13.23% | 7.35% |
	| 1750 | 0.0618        | 0.1681          | 12.86% | 6.86% |
	| 2000 | 0.0533        | 0.1686          | 12.98% | 7.00% |
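The table can be read programmatically to see where training peaks. The sketch below copies the logged values and picks the best step for each validation metric. It only analyzes the numbers shown here. Which checkpoint was actually published is not stated in this card.

```python
# (step, training_loss, validation_loss, wer_percent, cer_percent), copied from the table above
results = [
    (250, 0.1776, 0.1904, 13.52, 6.74),
    (500, 0.1478, 0.1698, 12.55, 6.38),
    (750, 0.1229, 0.1608, 12.33, 6.24),
    (1000, 0.1057, 0.1605, 12.15, 6.26),
    (1250, 0.0864, 0.1630, 12.65, 6.65),
    (1500, 0.0677, 0.1643, 13.23, 7.35),
    (1750, 0.0618, 0.1681, 12.86, 6.86),
    (2000, 0.0533, 0.1686, 12.98, 7.00),
]

best_by_val_loss = min(results, key=lambda row: row[2])
best_by_wer = min(results, key=lambda row: row[3])
best_by_cer = min(results, key=lambda row: row[4])

print("Lowest validation loss at step", best_by_val_loss[0])  # 1000
print("Lowest WER at step", best_by_wer[0])                   # 1000
print("Lowest CER at step", best_by_cer[0])                   # 750

# Steps after the best one where validation loss is higher, while training loss keeps falling
later = [row for row in results if row[0] > best_by_val_loss[0]]
print(all(row[2] > best_by_val_loss[2] for row in later))     # True
```

Validation loss and WER both bottom out at step 1000. After that, training loss keeps decreasing while validation loss rises, the usual sign of overfitting, so an early checkpoint around steps 750 to 1000 looks like the best choice from these logs.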

	## Model Details

	- Model Type: Whisper (Large)
	- Fine-tuned From: OpenAI Whisper Large-v2 (`openai/whisper-large-v2`)
	- Language(s): Greek
	- Parameters: 1.5B
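As a rough guide to hardware requirements, the parameter count translates into weight memory as parameters times bytes per parameter. The sketch below does that arithmetic for a few common precisions. It is a back-of-the-envelope estimate for the weights alone, under the assumption of exactly 1.5B parameters. Activations, the generation cache, and the small LoRA adapter are not included.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 1.5e9  # parameter count stated in this card
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
# float32: 5.59 GiB
# float16: 2.79 GiB
# int8: 1.40 GiB
```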


	## How to Use

	```python
	from transformers import WhisperProcessor, WhisperForConditionalGeneration
	from peft import PeftModel
	import torch

	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Load the base model, then attach the Greek fine-tuned LoRA weights
	base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
	model = PeftModel.from_pretrained(base_model, "Vardis/Whisper-Large-v2-Greek").to(device)
	model.eval()
	processor = WhisperProcessor.from_pretrained("Vardis/Whisper-Large-v2-Greek")

	# Load your audio as a 1-D float waveform sampled at 16 kHz (e.g., using librosa or torchaudio)
	audio_input = ...

	# Convert the waveform to log-mel input features
	inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features.to(device)

	# Generate token IDs and decode them to text
	with torch.no_grad():
	    predicted_ids = model.generate(input_features=inputs)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

	print(transcription)
	```
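Whisper's feature extractor works on 30-second windows of 16 kHz audio, so a longer recording passed through the snippet above would be cut off at 30 seconds. One simple workaround is to split the waveform into chunks first and transcribe each chunk in turn. The helper below is a minimal sketch of that idea. Its name and the silent test signal are illustrative assumptions, not part of this repository.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30     # length of the window Whisper's feature extractor works on


def split_into_chunks(waveform, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D waveform (list or NumPy array) into consecutive chunks of at most chunk_seconds."""
    chunk_len = sample_rate * chunk_seconds
    return [waveform[start:start + chunk_len] for start in range(0, len(waveform), chunk_len)]


# 70 seconds of silence stands in for a real recording
waveform = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(waveform)
print([len(chunk) / SAMPLE_RATE for chunk in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then go through `processor(...)` and `model.generate(...)` as shown above, and the decoded strings can be joined. Fixed boundaries may cut a word in half, so for production use it is worth considering overlapping chunks or splitting on silence instead.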

	## Context / Reference

	This model was developed as part of the work described in:

	Georgilas, V., Stafylakis, T. (2025). _Automatic Speech Recognition for Greek Medical Dictation_.
	The paper focuses on Greek medical ASR research in general and is not primarily about the model itself, but provides context for its development. Users are welcome to use the model freely for research and practical applications.

	BibTeX citation:
	```bibtex
	@misc{georgilas2025greekasr,
	title={Automatic Speech Recognition for Greek Medical Dictation},
	author={Vardis Georgilas and Themos Stafylakis},
	year={2025},
	note={Available at: https://www.arxiv.org/abs/2509.23550}
	}