How to use khazarai/KhanAcademy-TTS with PEFT:
from peft import PeftModel
from transformers import CsmForConditionalGeneration
# Load the CSM base checkpoint, then attach the LoRA adapter on top of it
base_model = CsmForConditionalGeneration.from_pretrained("unsloth/csm-1b")
model = PeftModel.from_pretrained(base_model, "khazarai/KhanAcademy-TTS")
How to use khazarai/KhanAcademy-TTS with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-to-speech", model="khazarai/KhanAcademy-TTS")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("khazarai/KhanAcademy-TTS", dtype="auto")
How to use khazarai/KhanAcademy-TTS with Unsloth Studio:
# macOS / Linux
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/KhanAcademy-TTS to start chatting

# Windows (PowerShell)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/KhanAcademy-TTS to start chatting

# Hosted: no setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for khazarai/KhanAcademy-TTS to start chatting
pip install unsloth
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
model, processor = FastModel.from_pretrained(
    model_name="khazarai/KhanAcademy-TTS",
    max_seq_length=2048,
    auto_model=CsmForConditionalGeneration,  # load with the CSM model class, matching the quick-start below
)
This model is a LoRA fine-tuned version of unsloth/csm-1b trained on the Khan Academy Turkish audio dataset. It performs Turkish text-to-speech (TTS), aimed at producing natural-sounding speech for educational and academic content.
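As background on what "LoRA fine-tuned" means in practice: the csm-1b weights stay frozen and the adapter stores a low-rank update for each adapted layer, which `PeftModel` applies on the fly. Folding that update into the base weights gives the same outputs. The pure-Python sketch below shows only that arithmetic, assuming the standard LoRA formulation `W + (alpha / r) * B @ A`; this card does not state the adapter's actual rank, alpha or target modules, and the toy matrices are made up for illustration.

```python
# Toy illustration of LoRA merging, assuming the standard formulation
# W_merged = W + (alpha / r) * B @ A. All values here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_forward(x, W, A, B, alpha, r):
    """Unmerged path: W x + (alpha / r) * B (A x), as an adapter wrapper computes it."""
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    return add(base, update, alpha / r)

def merge(W, A, B, alpha, r):
    """Merged path: fold the low-rank update into one weight matrix."""
    return add(W, matmul(B, A), alpha / r)

# d_out = 2, d_in = 3, rank r = 1
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, 1.0, 0.0]]      # r x d_in
B = [[2.0], [-1.0]]        # d_out x r
x = [[1.0], [2.0], [3.0]]  # column vector, d_in x 1
alpha, r = 2, 1

unmerged = lora_forward(x, W, A, B, alpha, r)
merged = matmul(merge(W, A, B, alpha, r), x)
print(unmerged, merged)  # both are [[17.0], [-6.0]]
```

In PEFT the corresponding step is `model.merge_and_unload()` on the `PeftModel`, which returns the base model with the adapter folded in; merging avoids the extra adapter matmuls at inference time.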
Use the code below to get started with the model.
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
import soundfile as sf
from peft import PeftModel
model_id = "unsloth/csm-1b"  # base checkpoint the LoRA adapter was trained on
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/KhanAcademy-TTS")
text = "İnsanlarda, prefrontal korteks çok gelişmiştir."
speaker_id = 0  # CSM takes the speaker id as the message "role"
conversation = [
{"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]
audio_values = model.generate(
**processor.apply_chat_template(
conversation,
tokenize=True,
return_dict=True,
).to(device),  # move inputs to the same device as the model (CPU fallback included)
max_new_tokens=700,
# Optional sampling settings: uncomment and adjust to tweak the output
# depth_decoder_top_k=0,
# depth_decoder_top_p=0.9,
# depth_decoder_do_sample=True,
# depth_decoder_temperature=0.9,
# top_k=0,
# top_p=1.0,
# temperature=0.9,
# do_sample=True,
#########################################################
output_audio=True
)
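# Take the first generated waveform as a float32 NumPy array
audio = audio_values[0].to(torch.float32).cpu().numpy()
# Save as a 24 kHz WAV file (the sample rate of CSM's audio codec)
sf.write("example.wav", audio, 24000)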
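Because `max_new_tokens=700` caps how much audio a single `generate` call can produce, longer passages are often split into sentences, synthesized one at a time and joined with short pauses. The stdlib-only sketch below covers the text splitting, the per-chunk `conversation` in the same format as the quick-start, and the stitching. The helper names (`split_sentences`, `build_conversation`, `stitch`), the naive punctuation-based splitter, the 200-character chunk limit and the 0.25 s pause are all hypothetical choices, not part of this model card; dummy clips stand in for real model output.

```python
import re

SAMPLE_RATE = 24000  # same rate passed to sf.write in the quick-start

def split_sentences(text, max_chars=200):
    """Naively split on ., !, ? or an ellipsis followed by whitespace, then
    pack sentences into chunks of at most max_chars characters
    (a single sentence longer than max_chars is kept whole)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?…])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def build_conversation(text, speaker_id=0):
    """Same message structure as the quick-start example."""
    return [{"role": str(speaker_id), "content": [{"type": "text", "text": text}]}]

def stitch(clips, pause_s=0.25, sample_rate=SAMPLE_RATE):
    """Concatenate 1-D clips with pause_s seconds of silence between them."""
    silence = [0.0] * int(pause_s * sample_rate)
    out = []
    for i, clip in enumerate(clips):
        if i:
            out.extend(silence)
        out.extend(clip)
    return out

text = "İnsanlarda, prefrontal korteks çok gelişmiştir. Bu bir örnek cümledir."
chunks = split_sentences(text, max_chars=60)
conversations = [build_conversation(c, speaker_id=0) for c in chunks]
# Each conversation would go through processor.apply_chat_template(...) and
# model.generate(..., output_audio=True) as in the quick-start.
# Dummy clips stand in for the generated waveforms here.
fake_clips = [[0.1] * 10, [0.2] * 10]
audio = stitch(fake_clips)
```

Splitting at sentence boundaries keeps each clip's prosody intact; for real waveforms, `numpy.concatenate` with `numpy.zeros(...)` does the same stitching more efficiently before passing the result to `sf.write`.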
Training data: ~5K samples from the ysdede/khanacademy-turkish dataset.