---
tags:
- singing
- svc
- speech
- synthesis
- aigc
- super-resolution
license: apache-2.0
pipeline_tag: audio-to-audio
---

# HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)

HQ-SVC is an efficient framework for high-quality zero-shot singing voice conversion (SVC) in low-resource scenarios. It disentangles content and speaker features with a unified decoupled codec, and improves synthesis quality through multi-feature fusion and progressive optimization. Unlike existing methods that demand large datasets or heavy computational resources, **HQ-SVC** unifies:

- 🚀 Zero-shot conversion for unseen speakers without fine-tuning
- ⚡ Low-resource training (single consumer-grade GPU, <80 h of data)
- 🎧 Dual capabilities: high-quality singing voice conversion + voice super-resolution
- 🎯 Superior naturalness and speaker similarity compared to SOTA methods

## 🗞 News

- **[2025-11-08]** 🎉 Paper accepted by AAAI 2026
- **[2025-11-12]** 🎉 arXiv paper released
- **[2025-11-12]** 🎉 Demo released
- **[2025-12-24]** 🎉 Inference code and pre-trained models released

## 📅 Release Plan

- [x] arXiv preprint
- [x] Online demo
- [x] Inference code
- [x] Pre-trained models
- [ ] Training code

## ✨ New features

- [ ] Singing style control
- [ ] Improved quality

## 🎸 Try Inference

### 1. Download the Code and Environment

* Tested only on Linux platforms with CUDA >= 11.8
* Windows users can use WSL (Ubuntu) for deployment and execution

```bash
git clone https://github.com/ShawnPi233/HQ-SVC.git
cd HQ-SVC
```

```bash
wget -c https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz
```

```bash
wget -c https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz # Optional mirror
```

### 2. Extract the Environment

```bash
mkdir -p venv
tar -xzf environment.tar.gz -C venv
```

### 3. Activate the Environment

```bash
source venv/bin/activate
```

### 4. Running

```bash
export HF_ENDPOINT=https://hf-mirror.com # Optional mirror
python gradio_app.py
```

* If you encounter the error `Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))`, run the following command first, then launch the app again:

```bash
unset LD_LIBRARY_PATH
```
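
The activation, workaround, and launch steps above can be combined into one script. The following is only a minimal sketch: the function names `check_env` and `launch` are hypothetical and are not part of the HQ-SVC repository. It assumes the environment was extracted into `venv` as in step 2, and it always clears `LD_LIBRARY_PATH`, which may be unnecessary on systems that do not hit the segmentation fault.

```bash
#!/usr/bin/env bash
# Hypothetical launcher sketch; not shipped with HQ-SVC.

# Succeeds if the extracted environment looks usable; otherwise prints why not.
check_env() {
    local env_dir="${1:-venv}"
    if [ ! -f "$env_dir/bin/activate" ]; then
        echo "missing $env_dir/bin/activate: extract environment.tar.gz first"
        return 1
    fi
    return 0
}

# Activates the environment and starts the Gradio app from the repository root.
launch() {
    check_env venv || return 1
    # Workaround for the "Caught signal 11" segmentation fault described above.
    unset LD_LIBRARY_PATH
    source venv/bin/activate
    python gradio_app.py
}
```

After sourcing the script from the repository root, calling `launch` starts the app. Set `HF_ENDPOINT` beforehand if you want to use the optional mirror.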

Two modes are available:

* `sr`: **Zero-shot Super-Resolution (16 kHz to 44.1 kHz)**. Input only the `source` audio.
* `svc`: **Zero-shot Singing Voice Conversion**. Input both the `source` audio and the `target` audio.
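
The rule that separates the two modes (only a source gives `sr`, a source plus a target gives `svc`) can be written as a small helper. This is an illustrative sketch only: `select_mode` is a hypothetical function, not a command provided by HQ-SVC, and the Gradio app takes its inputs through the web UI rather than through this helper.

```bash
# Hypothetical helper, not part of the HQ-SVC repository.
# Prints "sr" when only a source file is given and "svc" when a target is given too.
select_mode() {
    local source_audio="${1:-}"
    local target_audio="${2:-}"
    if [ -z "$source_audio" ]; then
        echo "error: a source audio file is required" >&2
        return 1
    fi
    if [ -z "$target_audio" ]; then
        echo "sr"
    else
        echo "svc"
    fi
}
```

For example, `select_mode song.wav` prints `sr`, while `select_mode song.wav singer.wav` prints `svc`. The file names are placeholders.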

## 📜 Citation

If you use HQ-SVC in your research, please cite our work:

```bibtex
@article{bai2025hq,
  title={HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios},
  author={Bai, Bingsong and Geng, Yizhong and Wang, Fengping and Wang, Cong and Guo, Puyuan and Gao, Yingming and Li, Ya},
  journal={arXiv preprint arXiv:2511.08496},
  year={2025}
}
```

## 🙏 Acknowledgement

We thank the open-source communities behind:

* **[DDSP-SVC](https://github.com/yxlllc/DDSP-SVC)**
* **[Amphion](https://github.com/open-mmlab/Amphion)**
* **[NaturalSpeech 3](https://speechresearch.github.io/naturalspeech3/)**
* **[NSF-HIFIGAN](https://github.com/openvpi/vocoders)**
* **[RMVPE](https://github.com/Dream-High/RMVPE)**

## ⭐️ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ShawnPi233/HQ-SVC&type=date&legend=top-left)](https://www.star-history.com/#ShawnPi233/HQ-SVC&type=date&legend=top-left)