AudioLDM2 — MAESTRO Mirror

Source: cvssp/audioldm2
License: CC-BY-NC-SA-4.0 (non-commercial)
Used by: MAESTRO Spectral Editor, inpaint operation (Tier-3)

This mirror is the upstream cvssp/audioldm2 AudioLDM2Pipeline with the duplicate .bin checkpoints removed. Only the .safetensors weights are kept. The diffusers loader uses the .safetensors files automatically when they are present, so the loaded weights are bit-for-bit identical to the original repo's and the pipeline behaves the same.
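The duplicate-stripping step can be sketched as a filter over the repo's file list. This is a minimal sketch, not the script used to build this mirror. `redundant_bin_files` is a hypothetical helper, and it assumes a `.bin` checkpoint is redundant whenever its own folder already contains a `.safetensors` file. It matches by folder rather than by filename because transformers and diffusers components name their weight files differently (e.g. `pytorch_model.bin` vs `model.safetensors`). A list of repo paths can be obtained with `huggingface_hub.list_repo_files`.

```python
from pathlib import PurePosixPath


def redundant_bin_files(repo_files: list[str]) -> list[str]:
    """Return .bin checkpoints that sit in a folder which also holds .safetensors weights.

    Hypothetical helper: any .bin file counts as a duplicate when its folder
    already has a .safetensors file, because transformers and diffusers
    components name their weight files differently.
    """
    # Folders (as POSIX strings, "." for the repo root) that already ship safetensors.
    safetensors_dirs = {
        str(PurePosixPath(path).parent)
        for path in repo_files
        if path.endswith(".safetensors")
    }
    # Only files ending in ".bin" match; index files (*.bin.index.json) are left alone.
    return sorted(
        path
        for path in repo_files
        if path.endswith(".bin") and str(PurePosixPath(path).parent) in safetensors_dirs
    )
```

A `.bin` file in a folder without any `.safetensors` counterpart is not returned, so the filter never flags the only copy of a component's weights.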

Why the mirror exists

MAESTRO ships its Tier-3 spectral-editor AI ops as download-on-demand modules. Keeping weights in our org (a) decouples our app from upstream re-uploads, (b) lets us strip .bin duplicates to halve the download, and (c) ensures Tier-3 features keep working if a user is offline after the first fetch.

License notice

AudioLDM2 is licensed under CC-BY-NC-SA-4.0 — non-commercial use only. MAESTRO surfaces this restriction in-app via the LicenseWarning component (same pattern as the Woosh foley model). Commercial users should not enable this model in production projects.

Folder layout

model_index.json
scheduler/
vae/diffusion_pytorch_model.safetensors                (212 MB)
unet/diffusion_pytorch_model.safetensors               (1.3 GB)
text_encoder/model.safetensors                         (741 MB, CLAP-text)
text_encoder_2/model.safetensors                       (1.3 GB, T5)
language_model/model.safetensors                       (475 MB, GPT2)
vocoder/model.safetensors                              (211 MB, HiFiGAN)
projection_model/diffusion_pytorch_model.safetensors   (4.6 MB)
tokenizer/, tokenizer_2/, feature_extractor/           (HF config)

Total: ~4.2 GB
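Each subfolder above corresponds to a component entry in model_index.json, which is how diffusers knows which class to load from which folder. The following is a minimal sketch of reading that mapping. It assumes the usual diffusers convention: metadata keys start with `_`, and each component maps to a `[library, class_name]` pair. `pipeline_components` is a hypothetical helper, and it is not part of diffusers.

```python
import json


def pipeline_components(model_index_text: str) -> dict[str, tuple[str, str]]:
    """Map each component subfolder name to its (library, class_name) pair.

    Assumes the diffusers model_index.json convention: keys starting with "_"
    are metadata, and each component value is a two-item [library, class_name] list.
    """
    index = json.loads(model_index_text)
    components: dict[str, tuple[str, str]] = {}
    for name, value in index.items():
        if name.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        # Skip optional components stored as [null, null] or any non-pair value.
        if isinstance(value, list) and len(value) == 2 and all(value):
            components[name] = (value[0], value[1])
    return components
```

Comparing `sorted(pipeline_components(...))` with the subfolder names on disk is one quick way to spot a component that is missing after a partial download.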

Integration

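import torch
from diffusers import AudioLDM2Pipeline

# Load the mirrored pipeline; diffusers picks up the .safetensors weights.
pipe = AudioLDM2Pipeline.from_pretrained(
    "AEmotionStudio/audioldm2-inpaint-models",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # float16 weights are intended for GPU inference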

See backend/ai/models/audioldm2_inpaint.py in the MAESTRO source tree for the full async runner with progress reporting and VRAM lifecycle management.
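Plain text-to-audio generation is enough to smoke-test a loaded `pipe`. It is not the inpaint runner mentioned above. The following is a minimal sketch that uses the standard diffusers call (`prompt`, `audio_length_in_s`, and `num_inference_steps`, with the waveforms returned in `.audios`) together with Python's built-in `wave` module. `generate_clip` and `write_wav` are hypothetical helpers. The 16 kHz rate assumes the vocoder's default output rate, and the actual value is available as `pipe.vocoder.config.sampling_rate`.

```python
import array
import sys
import wave

# AudioLDM2's HiFi-GAN vocoder produces 16 kHz mono audio (assumed default;
# pipe.vocoder.config.sampling_rate holds the actual value).
SAMPLE_RATE = 16_000


def generate_clip(pipe, prompt: str, seconds: float = 10.0, steps: int = 200):
    """Run plain text-to-audio on an already loaded AudioLDM2Pipeline; return one waveform."""
    result = pipe(prompt, audio_length_in_s=seconds, num_inference_steps=steps)
    return result.audios[0]  # first (by default, only) waveform for the prompt


def write_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1.0, 1.0] as a 16-bit mono PCM WAV file."""
    # Clip each sample to the valid range, then scale it to a signed 16-bit integer.
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

For example, `write_wav("rain.wav", generate_clip(pipe, "light rain on a tin roof"))` writes a 10-second clip with the default settings.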
