# Piper TTS: en_US-ryan-medium

A medium-size US English male voice.

## Model Details

| Field | Value |
|---|---|
| Architecture | VITS (end-to-end) |
| Format | ONNX |
| Language | English (US) |
| Gender | Male |
| Model Size | medium (~63 MB ONNX, ~15M params) |
| Sample Rate | 22050 Hz |
| License | CC BY-NC-SA 4.0 |

Note: Piper uses the terms "medium", "high", etc. to refer to model size, not output quality. Medium models (63 MB, ~15M params) and high models (114 MB, ~28M params) both produce 22.05 kHz audio.

## Usage

### With piper-tts (GPL)

```python
from piper import PiperVoice

# Loads model.onnx; the config is expected alongside as model.onnx.json.
voice = PiperVoice.load("model.onnx")
for chunk in voice.synthesize("Hello, this is a test."):
    # chunk.audio_float_array contains float32 audio
    pass
```
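To stream those chunks straight to disk, a small writer built on the standard `wave` module is enough. This is a sketch: the `audio_int16_bytes` attribute in the usage comment comes from recent piper-tts releases and may differ in older versions.

```python
import wave

def write_wav(path, pcm_chunks, sample_rate=22050, channels=1, sample_width=2):
    """Write 16-bit PCM byte chunks to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)   # this voice outputs 22050 Hz
        for chunk in pcm_chunks:
            wav_file.writeframes(chunk)

# Hypothetical usage with the synthesis loop above (attribute name assumed):
# write_wav("output.wav", (c.audio_int16_bytes for c in voice.synthesize(text)))
```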

### Standalone ONNX (MIT, no piper-tts dependency)

Requires espeak-ng to be installed (`brew install espeak-ng` on macOS, `apt install espeak-ng` on Debian/Ubuntu).

```python
import json, subprocess, numpy as np, onnxruntime as ort, soundfile as sf
from huggingface_hub import hf_hub_download

model_id = "Trelis/piper-en-us-ryan-medium"
onnx_path = hf_hub_download(model_id, "model.onnx")
config_path = hf_hub_download(model_id, "model.onnx.json")

with open(config_path) as f:
    config = json.load(f)

session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
phoneme_id_map = config["phoneme_id_map"]
espeak_voice = config["espeak"]["voice"]

def phonemize(text, voice):
    """Convert text to IPA phonemes with espeak-ng, one phoneme list per line."""
    out = subprocess.run(
        ["espeak-ng", "-v", voice, "-q", "--ipa=2", "-x", text],
        capture_output=True, text=True,
    ).stdout.strip()
    return [list(line.replace("_", " ")) for line in out.split("\n") if line.strip()]

def to_ids(phonemes, pmap):
    """Map phonemes to model IDs: BOS (^), pad (_) interleaved, EOS ($)."""
    ids = [pmap["^"][0], pmap["_"][0]]
    for p in phonemes:
        if p in pmap:
            ids.extend(pmap[p])
            ids.append(pmap["_"][0])
    ids.append(pmap["$"][0])
    return ids

text = "Hello, this is a test."
audio_chunks = []
for sentence in phonemize(text, espeak_voice):
    ids = to_ids(sentence, phoneme_id_map)
    if len(ids) < 3:  # nothing but BOS/pad/EOS; skip
        continue
    audio = session.run(None, {
        "input": np.array([ids], dtype=np.int64),
        "input_lengths": np.array([len(ids)], dtype=np.int64),
        "scales": np.array([
            config["inference"]["noise_scale"],   # synthesis variability
            config["inference"]["length_scale"],  # speaking rate (higher = slower)
            config["inference"]["noise_w"],       # phoneme duration variability
        ], dtype=np.float32),
    })[0]
    audio_chunks.append(audio.squeeze())

audio = np.concatenate(audio_chunks).astype(np.float32)
sf.write("output.wav", audio, config["audio"]["sample_rate"])
```
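soundfile writes the float32 array as-is. If you instead need raw 16-bit PCM (for streaming or telephony), a small conversion helper (a sketch, not part of piper) does the clipping and scaling:

```python
import numpy as np

def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Clip float audio to [-1, 1] and scale to the 16-bit PCM range."""
    return (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

samples = float_to_int16(np.array([0.0, 0.5, -1.5], dtype=np.float32))
# -1.5 is clipped to -1.0 before scaling, so no int16 overflow can occur
```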

## Fine-tuning

You can fine-tune this model on your own voice recordings to create a personalized voice, for example using Trelis Studio.

## Attribution

Trained on the RyanSpeech dataset. Fine-tuned from the lessac medium voice.

Re-hosted from rhasspy/piper-voices. Original voice: en_US-ryan-medium
