---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
---

# DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN

DPDFNet is a family of causal, single-channel speech enhancement models for real-time noise suppression in challenging everyday environments. It extends the DeepFilterNet2 enhancement framework by inserting Dual-Path RNN (DPRNN) blocks into the encoder, strengthening long-range temporal and cross-band modeling while preserving a compact, streaming-friendly design.

This repository provides four TensorFlow Lite (TFLite) models optimized for mobile and edge deployment:

* `baseline.tflite`
* `dpdfnet2.tflite`
* `dpdfnet4.tflite`
* `dpdfnet8.tflite`

---

## Key Features

* Causal and low-latency: Designed for streaming use cases such as telephony, conferencing, and embedded devices.
* Dual-Path RNN integration: Improves temporal context and frequency-domain interactions for more robust enhancement in difficult noise conditions.
* Scalable family: Choose baseline or dpdfnet2/4/8 to balance quality vs. compute.
* Edge deployment focus: Demonstrated on Ceva NeuPro Nano NPUs in the accompanying work.

---

## Model Variants and Footprint

| Model     | Params [M] | MACs [G] | TFLite Size [MB] |
| --------- | ---------: | -------: | ---------------: |
| Baseline  |       2.31 |     0.36 |              8.5 |
| DPDFNet-2 |       2.49 |     1.35 |             10.7 |
| DPDFNet-4 |       2.84 |     2.36 |             12.9 |
| DPDFNet-8 |       3.54 |     4.37 |             17.2 |

---

## Intended Use

Primary task: Real-time, single-channel speech enhancement (noise suppression).

Deployment targets: Mobile devices, embedded NPUs, and edge platforms.

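
When matching a variant to one of these deployment targets, the footprint table above can serve as a simple lookup. The sketch below is illustrative only: `Variant`, `VARIANTS`, and `pick_variant` are hypothetical names that are not part of this repository, and the code assumes that a higher MAC count goes with higher enhancement quality, which the table itself does not state.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str        # value accepted by --model_name
    params_m: float  # parameters, in millions
    macs_g: float    # MACs [G], as listed in the table
    size_mb: float   # TFLite file size, in MB


# Figures copied from the "Model Variants and Footprint" table.
VARIANTS = [
    Variant("baseline", 2.31, 0.36, 8.5),
    Variant("dpdfnet2", 2.49, 1.35, 10.7),
    Variant("dpdfnet4", 2.84, 2.36, 12.9),
    Variant("dpdfnet8", 3.54, 4.37, 17.2),
]


def pick_variant(max_macs_g: float, max_size_mb: float) -> Optional[Variant]:
    """Return the most compute-heavy variant that fits both budgets.

    Assumption (not from the source): MACs are used as a stand-in for
    enhancement quality. Returns None when no variant fits.
    """
    fitting = [
        v for v in VARIANTS
        if v.macs_g <= max_macs_g and v.size_mb <= max_size_mb
    ]
    return max(fitting, key=lambda v: v.macs_g, default=None)


if __name__ == "__main__":
    choice = pick_variant(max_macs_g=2.5, max_size_mb=16.0)
    print(choice.name if choice else "no variant fits")
```

With a budget of 2.5 G MACs and 16 MB, this selects `dpdfnet4`; the returned `name` can then be passed as `--model_name` to `run_tflite.py`.
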

Input and Output:

* Input: 16 kHz mono noisy speech waveform
* Output: 16 kHz mono enhanced speech waveform

Typical applications:

* Voice calls and VoIP
* Video conferencing
* Always-on voice interfaces
* Wearables, earbuds, and embedded audio devices

---

## Inference

This repo includes a reference script for running the TFLite models on WAV files using streaming-style, frame-by-frame inference: `run_tflite.py`.

### Setup

Install dependencies:

```bash
pip install numpy soundfile librosa tqdm
pip install tflite-runtime
```

### Model placement

By default, the script loads models from:

* `./.tflite`

Create the folder and place the `.tflite` files there (or edit `TFLITE_DIR` in the script to match your layout).

### Run enhancement on a folder of WAVs

The script processes `*.wav` files non-recursively and writes enhanced outputs as 16-bit PCM WAVs:

```bash
python run_tflite.py --noisy_dir /path/to/noisy_wavs --enhanced_dir /path/to/out --model_name dpdfnet8
```

Available `--model_name` options: `baseline`, `dpdfnet2`, `dpdfnet4`, `dpdfnet8`.

---

## Training Data

The models were trained using a mixture of public speech and noise datasets, including DNS4 (downsampled), MLS, MUSAN, and FSD50K.

---

## Citation

If you use these models, please cite:

```bibtex
@article{rika2025dpdfnet,
  title  = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  year   = {2025}
}
```

---

## License

Apache-2.0