---
license: apache-2.0
tags:
  - audio
  - music
  - audio-processing
  - mastering
  - enhancement
  - flow-matching
  - siren
pipeline_tag: audio-to-audio
library_name: pytorch
---

# SIREN-MASTER

**Neural Audio Enhancement and Mastering with Flow Matching**

SIREN-MASTER is part of the **SIREN Audio Suite**, a family of neural audio processing models designed for professional music production workflows.

## Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. It is trained on pairs of raw mixes and their professionally mastered counterparts, so it learns the transformation from an unmastered mix to a mastered result as reflected in the decisions of the human mastering engineers behind the training data.

Key capabilities:
- **Automatic mastering** - Professional-quality mastering in one pass
- **Audio enhancement** - Improve clarity, punch, and presence
- **Dynamic processing** - Intelligent compression and limiting
- **Tonal balance** - Optimal frequency distribution
- **Stereo imaging** - Enhanced width and depth

## Architecture

| Component | Details |
|-----------|---------|
| Base Architecture | Flow Matching (Continuous Normalizing Flow) |
| Model Size | 40 MB |
| Training Phases | 2 (Foundation + Enhancement) |
| Sample Rate | 44.1 kHz |

Flow Matching provides:
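- **Stable training** - The objective is a direct regression onto a velocity field, which is typically more stable to optimize than diffusion objectives
- **Fast inference** - Sampling typically needs fewer solver steps than diffusion sampling
- **High fidelity** - Preserves audio quality through the transformation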
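
To make the inference claim concrete, the sketch below integrates a velocity field from t=0 to t=1 with a fixed-step Euler solver. This is a generic Flow Matching sampler, not SIREN-MASTER's actual inference code: `euler_sample`, `velocity_fn`, and the default step count are assumptions for illustration. In practice `velocity_fn` would be the trained network and `x0` the raw-mix tensor.

```python
def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler.

    Hypothetical helper: the solver and step count used by SIREN-MASTER
    are not documented. Works on floats, NumPy arrays, or torch tensors.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = 1.0 / num_steps
    x = x0
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the flow
    return x


# Toy check: a constant velocity (target - start) carries start onto target.
start, target = 0.25, -0.5
result = euler_sample(lambda x, t: target - start, start, num_steps=4)
```

The number of solver steps trades speed against accuracy: a velocity field that is close to constant along each path needs very few steps, which is where the speed advantage usually comes from.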

## Training Pipeline

SIREN-MASTER was trained in two phases:

1. **Phase 1: Foundation** (100 epochs)
   - Learn basic audio transformations
   - Build robust feature representations

2. **Phase 2: Enhancement** (100 epochs)
   - Fine-tune on mastering pairs
   - Learn professional mastering aesthetics
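
For readers unfamiliar with how a raw/mastered pair becomes a Flow Matching training example, one common formulation uses a straight-line path between the two signals. Whether SIREN-MASTER uses this exact path is not stated, so the sketch below is an assumption; `flow_matching_example` and `squared_error` are hypothetical names.

```python
import random


def flow_matching_example(raw, mastered, t=None):
    """Build one training example on the straight-line path from raw to mastered.

    Assumed formulation (not confirmed by this model card):
        x_t    = (1 - t) * raw + t * mastered
        target = mastered - raw   (the constant velocity along that path)
    Works on floats, NumPy arrays, or torch tensors.
    """
    if t is None:
        t = random.random()  # sample a time uniformly in [0, 1)
    x_t = (1.0 - t) * raw + t * mastered
    target_velocity = mastered - raw
    return x_t, t, target_velocity


def squared_error(predicted, target):
    """Element-wise regression loss between predicted and target velocity."""
    return (predicted - target) ** 2


# Halfway along the path between a raw sample (0.2) and a mastered one (0.6).
x_t, t, v = flow_matching_example(raw=0.2, mastered=0.6, t=0.5)
```

During training, the network would receive `x_t` and `t` and be penalized by `squared_error` against `target_velocity`.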

## The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

| Model | Purpose |
|-------|---------|
| **SIREN-FX** | Neural audio effects |
| **SIREN-FIX** | Audio restoration and repair |
| **SIREN-MASTER** | Audio enhancement and mastering (this model) |
| **SIREN-STEER** | Steerable audio transformations |
| **SIREN-SEPARATE** | Source separation |
| **SIREN-TRANSCRIBE** | Music analysis (key, tempo, beats) |

## Usage

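```python
import torch
import torchaudio

# Load the checkpoint on CPU and extract the model weights
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# Load the raw mix as a (channels, samples) tensor.
# 'raw_mix.wav' is a placeholder path.
waveform, sample_rate = torchaudio.load('raw_mix.wav')

# The model expects stereo audio at 44.1 kHz.
# The model class is not part of this snippet; once it is instantiated,
# restore the weights with model.load_state_dict(model_state).
# Input: raw mix
# Output: mastered audio
```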
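
Because the model expects stereo audio at 44.1 kHz, input files in other formats need conversion first. The following is a minimal sketch of that step; `prepare_input` is a hypothetical helper, and duplicating mono into two channels (or keeping only the first two channels of a multichannel file) is an assumption, not documented behavior.

```python
import torch

TARGET_SAMPLE_RATE = 44_100  # the model card states 44.1 kHz stereo input


def prepare_input(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Return a (2, samples) float32 tensor at 44.1 kHz.

    Hypothetical helper; expects `waveform` shaped (channels, samples),
    as returned by torchaudio.load.
    """
    if waveform.dim() != 2:
        raise ValueError("expected a (channels, samples) tensor")
    if waveform.size(0) == 1:
        waveform = waveform.repeat(2, 1)  # duplicate mono into both channels
    elif waveform.size(0) > 2:
        waveform = waveform[:2]  # keep the first two channels only
    if sample_rate != TARGET_SAMPLE_RATE:
        # Imported lazily so the no-resample path only needs torch.
        import torchaudio.functional as AF
        waveform = AF.resample(waveform, sample_rate, TARGET_SAMPLE_RATE)
    return waveform.float()


# Example: a mono clip already at 44.1 kHz becomes a stereo tensor.
mono = torch.zeros(1, 1000)
stereo = prepare_input(mono, 44_100)
```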

## Training Details

- **Training Data**: Large-scale mastering dataset (raw/mastered pairs)
- **Training Duration**: 200 total epochs (100 Phase 1 + 100 Phase 2)
- **Hardware**: NVIDIA B200 GPUs (8-GPU DDP)
- **Batch Size**: 256
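
The card does not say whether 256 is the global or the per-GPU batch size. Assuming it is the global batch size split evenly across the 8 DDP ranks, the per-GPU batch can be worked out as below; `per_gpu_batch_size` is a hypothetical helper.

```python
def per_gpu_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Split a global batch evenly across DDP ranks."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must divide evenly across GPUs")
    return global_batch_size // num_gpus


# Assumption: 256 is the global batch size across all 8 GPUs.
per_gpu = per_gpu_batch_size(256, 8)
```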

## Intended Use

SIREN-MASTER is designed for:
- Automatic audio mastering
- Mix enhancement and polish
- Reference-quality output preparation
- Demo/pre-production mastering
- Research in neural audio enhancement

## What SIREN-MASTER Learns

The model captures mastering techniques including:
- **EQ adjustments** - Tonal balance and clarity
- **Compression** - Dynamic range control
- **Limiting** - Loudness maximization
- **Stereo enhancement** - Width and imaging
- **Harmonic saturation** - Warmth and presence
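
Compression and limiting generally reduce the gap between peak and average level, so one simple way to see their effect is to compare the crest factor (peak-to-RMS ratio) of a clip before and after processing. The sketch below is not part of SIREN-MASTER; `level_stats` is a hypothetical helper, and it reports plain sample peak and RMS rather than a standards-based loudness measure such as ITU-R BS.1770.

```python
import math


def level_stats(samples):
    """Return (peak_dbfs, rms_dbfs, crest_factor_db) for samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("signal is silent; levels in dB are undefined")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak_db = 20.0 * math.log10(peak)
    rms_db = 20.0 * math.log10(rms)
    return peak_db, rms_db, peak_db - rms_db


# A full-scale sine wave has a crest factor of about 3.01 dB.
sine = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
peak_db, rms_db, crest_db = level_stats(sine)
```

A lower crest factor on the output than on the input would be consistent with the dynamic range control described above.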

## Limitations

- Optimized for a 44.1 kHz sample rate
- Best results with full mixes (not individual stems)
- Mastering style reflects training data aesthetics
- Not a replacement for genre-specific mastering

## License

Apache 2.0

## Citation

If you use SIREN-MASTER in your research, please cite:

```bibtex
@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}
```

## Contact

For questions and feedback, please open an issue on the model repository.