We evaluate voc2vec on six datasets: ASVP-ESD, ASVP-ESD (babies), CNVVE, NonVerbal Vocalization Dataset, Donate a Cry, VIVAE.

The following table reports the average performance, in terms of Unweighted Average Recall (UAR) and F1 Macro, across the six datasets described above.

| Model | Architecture | Pre-training DS | UAR | F1 Macro |
|-------|--------------|-----------------|-----|----------|
| **voc2vec** | wav2vec 2.0 | Voc125 | .612±.212 | .580±.230 |
| **voc2vec-as-pt** | wav2vec 2.0 | AudioSet + Voc125 | .603±.183 | .574±.194 |
| **voc2vec-ls-pt** | wav2vec 2.0 | LibriSpeech + Voc125 | .661±.206 | .636±.223 |
| **voc2vec-hubert-ls-pt** | HuBERT | LibriSpeech + Voc125 | **.696±.189** | **.678±.200** |

## Available Models

| Model | Description | Link |
|-------|-------------|------|
| **voc2vec** | Pre-trained model on **125 hours of non-verbal audio**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec) |
| **voc2vec-as-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
| **voc2vec-ls-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
| **voc2vec-hubert-ls-pt** | Continues pre-training from a HuBERT-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-hubert-ls-pt) |

## Usage examples
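
A minimal sketch of loading a voc2vec checkpoint with 🤗 Transformers. Assumptions: the input is a 16 kHz mono waveform, and `AutoModelForAudioClassification` attaches a randomly initialized classification head on top of the pre-trained encoder, so the logits are only meaningful after fine-tuning on your downstream task.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Base pre-trained checkpoint; the classification head added here is
# randomly initialized, so fine-tune before trusting the predictions.
model_id = "alkiskoudounas/voc2vec"

extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)
model.eval()

# One second of dummy 16 kHz audio; replace with a real waveform.
waveform = torch.randn(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, num_labels]
```

The same pattern applies to any of the checkpoints in the table above; swap `model_id` for the variant you want to fine-tune.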

```bibtex
@INPROCEEDINGS{koudounas2025icassp,
  author={Koudounas, Alkis and La Quatra, Moreno and Siniscalchi, Sabato Marco and Baralis, Elena},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={voc2vec: A Foundation Model for Non-Verbal Vocalization},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Pediatrics;Accuracy;Foundation models;Benchmark testing;Signal processing;Data models;Acoustics;Speech processing;Nonverbal vocalization;Representation Learning;Self-Supervised Models;Pre-trained Models},
  doi={10.1109/ICASSP49660.2025.10890672}}
```