---
license: apache-2.0
tags:
- audio
- music
- audio-processing
- mastering
- enhancement
- flow-matching
- siren
pipeline_tag: audio-to-audio
library_name: pytorch
---

# SIREN-MASTER

**Neural Audio Enhancement and Mastering with Flow Matching**

SIREN-MASTER is part of the **SIREN Audio Suite** - a family of neural audio processing models designed for professional music production workflows.

## Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. The model learns the transformation from raw mixes to professionally mastered audio, capturing the nuanced decisions of human mastering engineers.

Key capabilities:

- **Automatic mastering** - Professional-quality mastering in one pass
- **Audio enhancement** - Improve clarity, punch, and presence
- **Dynamic processing** - Intelligent compression and limiting
- **Tonal balance** - Optimal frequency distribution
- **Stereo imaging** - Enhanced width and depth

## Architecture

| Component | Details |
|-----------|---------|
| Base Architecture | Flow Matching (Continuous Normalizing Flow) |
| Model Size | 40 MB |
| Training Phases | 2 (Foundation + Enhancement) |
| Sample Rate | 44.1 kHz |

Flow Matching provides:

- **Stable training** - More stable than diffusion models
- **Fast inference** - Fewer steps than diffusion
- **High fidelity** - Excellent audio quality preservation

## Training Pipeline

SIREN-MASTER was trained in two phases:

1. **Phase 1: Foundation** (100 epochs)
   - Learn basic audio transformations
   - Build robust feature representations

2. **Phase 2: Enhancement** (100 epochs)
   - Fine-tune on mastering pairs
   - Learn professional mastering aesthetics

## The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

| Model | Purpose |
|-------|---------|
| **SIREN-FX** | Neural audio effects |
| **SIREN-FIX** | Audio restoration and repair |
| **SIREN-MASTER** | Audio enhancement and mastering (this model) |
| **SIREN-STEER** | Steerable audio transformations |
| **SIREN-SEPARATE** | Source separation |
| **SIREN-TRANSCRIBE** | Music analysis (key, tempo, beats) |

## Usage

```python
import torch
import torchaudio

# Load model
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# Model expects stereo audio at 44.1 kHz
# Input: raw mix
# Output: mastered audio
```

## Training Details

- **Training Data**: Large-scale mastering dataset (raw/mastered pairs)
- **Training Duration**: 200 total epochs (100 Phase 1 + 100 Phase 2)
- **Hardware**: NVIDIA B200 GPUs (8-GPU DDP)
- **Batch Size**: 256

## Intended Use

SIREN-MASTER is designed for:

- Automatic audio mastering
- Mix enhancement and polish
- Reference-quality output preparation
- Demo/pre-production mastering
- Research in neural audio enhancement

## What SIREN-MASTER Learns

The model captures mastering techniques including:

- **EQ adjustments** - Tonal balance and clarity
- **Compression** - Dynamic range control
- **Limiting** - Loudness maximization
- **Stereo enhancement** - Width and imaging
- **Harmonic saturation** - Warmth and presence

## Limitations

- Optimized for 44.1 kHz sample rate
- Best results with full mixes (not individual stems)
- Mastering style reflects training data aesthetics
- Not a replacement for genre-specific mastering

## License

Apache 2.0

## Citation

If you use SIREN-MASTER in your research, please cite:

```bibtex
@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}
```

## Contact

For questions and feedback, please open an issue on the model repository.
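
## Appendix: Flow Matching Sampling Sketch

This card does not document the model class or its inference API, so the following is only a generic illustration of how flow matching models are commonly sampled: a learned velocity field is integrated from t=0 to t=1 with a small number of Euler steps. Everything here is an assumption made for illustration. `euler_flow_sample`, `toy_velocity`, the step count, and the choice to start the ODE from the raw mix are hypothetical and are not taken from the SIREN-MASTER code. NumPy is used instead of PyTorch to keep the sketch self-contained, and a constant toy velocity stands in for the trained network.

```python
import numpy as np


def euler_flow_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy data (hypothetical): a stereo "raw mix" shaped (channels, samples)
# and a pretend "mastered" target derived from it.
rng = np.random.default_rng(0)
raw_mix = rng.standard_normal((2, 1024))
mastered_ref = 0.5 * raw_mix


def toy_velocity(x, t):
    # Stand-in for a trained network. On the straight-line path
    # x_t = (1 - t) * x0 + t * x1, the target velocity is x1 - x0.
    return mastered_ref - raw_mix


output = euler_flow_sample(toy_velocity, raw_mix, num_steps=8)
print(np.allclose(output, mastered_ref))
```

With a real checkpoint, `toy_velocity` would be replaced by a forward pass of the trained network, and the number of steps would trade speed against accuracy.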