AudioLDM2 — MAESTRO Mirror
Source: cvssp/audioldm2
License: CC-BY-NC-SA-4.0 (non-commercial)
Used by: MAESTRO Spectral Editor — inpaint operation (Tier-3)
This mirror is the upstream cvssp/audioldm2 AudioLDM2Pipeline with the
duplicate .bin checkpoints removed — only .safetensors are kept. The
diffusers loader picks the safetensors variant automatically, so behaviour
is bit-for-bit identical to the original repo.
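One way to confirm that no .bin duplicates remain is to scan the repo's file listing. The sketch below is illustrative only: `leftover_bin_files` and `list_mirror_files` are hypothetical helper names, and the live listing relies on `list_repo_files` from `huggingface_hub`, which needs network access.

```python
def leftover_bin_files(filenames):
    """Return the paths that still use the .bin checkpoint format."""
    return sorted(name for name in filenames if name.endswith(".bin"))


def list_mirror_files(repo_id="AEmotionStudio/audioldm2-inpaint-models"):
    """Fetch the file listing from the Hub (requires network access)."""
    # Imported lazily so the pure helper above works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id)


# Offline check against a hand-written sample of the expected layout.
sample_listing = [
    "model_index.json",
    "vae/diffusion_pytorch_model.safetensors",
    "unet/diffusion_pytorch_model.safetensors",
    "vocoder/model.safetensors",
]
print(leftover_bin_files(sample_listing))  # prints []
```

Calling `leftover_bin_files(list_mirror_files())` would run the same check against the live repo.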
Why the mirror exists
MAESTRO ships its Tier-3 spectral-editor AI ops as download-on-demand
modules. Keeping weights in our org (a) decouples our app from upstream
re-uploads, (b) lets us strip .bin duplicates to halve the download, and
(c) ensures Tier-3 features keep working if a user is offline after the
first fetch.
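The download-on-demand behaviour described above can be approximated with a cache-first lookup. This is a minimal sketch under assumptions, not MAESTRO's actual loader: `hub_fetch` and `resolve_weights` are hypothetical names, and the only library call used is `snapshot_download` from `huggingface_hub` with its `local_files_only` flag.

```python
REPO_ID = "AEmotionStudio/audioldm2-inpaint-models"


def hub_fetch(offline):
    """Return the local snapshot directory for the mirror.

    With offline=True only the local cache is consulted (no network access).
    """
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_files_only=offline)


def resolve_weights(fetch=hub_fetch):
    """Prefer the cached copy; download only when nothing is cached yet."""
    try:
        return fetch(True)
    except Exception:
        # Deliberately broad: the exact cache-miss exception class depends on
        # the huggingface_hub version. Assumed policy, not MAESTRO's code.
        return fetch(False)
```

Trying the cache first means a user who has already fetched the weights never touches the network, which matches the offline case in point (c).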
License notice
AudioLDM2 is licensed under CC-BY-NC-SA-4.0 — non-commercial use only.
MAESTRO surfaces this restriction in-app via the LicenseWarning
component (same pattern as the Woosh foley model). Commercial users
should not enable this model in production projects.
Folder layout
model_index.json
scheduler/
vae/diffusion_pytorch_model.safetensors (212 MB)
unet/diffusion_pytorch_model.safetensors (1.3 GB)
text_encoder/model.safetensors (741 MB, CLAP-text)
text_encoder_2/model.safetensors (1.3 GB, T5)
language_model/model.safetensors (475 MB, GPT2)
vocoder/model.safetensors (211 MB, HiFiGAN)
projection_model/diffusion_pytorch_model.safetensors (4.6 MB)
tokenizer/, tokenizer_2/, feature_extractor/ (HF config)
Total: ~4.2 GB
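The total can be checked by summing the sizes listed above. This is a quick sanity check that treats each "1.3 GB" entry as 1300 MB and uses decimal units (1 GB = 1000 MB); both are assumptions, since the listing does not state its rounding.

```python
# Component sizes in MB, copied from the folder layout.
COMPONENT_SIZES_MB = {
    "vae": 212,
    "unet": 1300,            # listed as 1.3 GB
    "text_encoder": 741,
    "text_encoder_2": 1300,  # listed as 1.3 GB
    "language_model": 475,
    "vocoder": 211,
    "projection_model": 4.6,
}


def total_size_gb(sizes_mb):
    """Sum the component sizes and convert MB to GB (1 GB = 1000 MB)."""
    return sum(sizes_mb.values()) / 1000


print(f"Total: ~{total_size_gb(COMPONENT_SIZES_MB):.1f} GB")  # Total: ~4.2 GB
```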
Integration
    import torch
    from diffusers import AudioLDM2Pipeline

    # Load every component from the mirror in half precision.
    pipe = AudioLDM2Pipeline.from_pretrained(
        "AEmotionStudio/audioldm2-inpaint-models",
        torch_dtype=torch.float16,
    )
See backend/ai/models/audioldm2_inpaint.py in the MAESTRO source tree
for the full async runner with progress reporting and VRAM lifecycle.
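For a quick smoke test outside MAESTRO, the loaded pipeline can be asked for a short text-to-audio clip. The sketch below is not the inpaint runner referenced above: `generate_clip` and `save_wav` are hypothetical helpers, the prompt and step count are arbitrary, and the 16 kHz sample rate is assumed from AudioLDM2's usual output rate. The WAV writer uses only the standard library.

```python
import struct
import wave


def generate_clip(pipe, prompt, seconds=5.0, steps=50):
    """Run a loaded AudioLDM2Pipeline and return the first waveform."""
    result = pipe(prompt, num_inference_steps=steps, audio_length_in_s=seconds)
    return result.audios[0]


def save_wav(samples, path, sample_rate=16000):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

With `pipe` loaded as shown above and moved to a GPU, `save_wav(generate_clip(pipe, "rain on a tin roof"), "clip.wav")` would write the result to disk.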