Add pipeline tag and library name
#1
by nielsr (HF Staff) - opened

README.md CHANGED

@@ -1,7 +1,260 @@
 ---
-license: apache-2.0
 base_model:
 - stabilityai/stable-diffusion-2-1-base
+license: apache-2.0
+pipeline_tag: text-to-3d
+library_name: diffusers
 paper:
 - arxiv.org/abs/2503.21694
 ---
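
For context on this change: `pipeline_tag` and `library_name` are the metadata fields the Hub uses for task search filters and code snippets. A minimal sketch, assuming the `huggingface_hub` client, of how to confirm the metadata is picked up once this PR is merged:

```python
from huggingface_hub import model_info

# Read the repo's card metadata from the Hub; after this PR is merged,
# both fields below should come back populated.
info = model_info("ZhiyuanthePony/TriplaneTurbo")
print(info.pipeline_tag)   # expected: "text-to-3d"
print(info.library_name)   # expected: "diffusers"
```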

## File information

The repository contains the following file information:

Note: file information is just provided as context for you; do not add it to the model card.

# Project page

We found the following project page URL: https://theericma.github.io/TriplaneTurbo/

# Github README

The GitHub README we found contains the following content:

<img src="assets/Showcase_v4.drawio.png" width="100%" align="center">
<div align="center">
<h1>Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data</h1>
<div>
<a href='https://scholar.google.com/citations?user=F15mLDYAAAAJ&hl=en' target='_blank'>Zhiyuan Ma</a>
<a href='https://scholar.google.com/citations?user=R9PlnKgAAAAJ&hl=en' target='_blank'>Xinyue Liang</a>
<a href='https://scholar.google.com/citations?user=A-U8zE8AAAAJ&hl=en' target='_blank'>Rongyuan Wu</a>
<a href='https://scholar.google.com/citations?user=1rbNk5oAAAAJ&hl=zh-CN' target='_blank'>Xiangyu Zhu</a>
<a href='https://scholar.google.com/citations?user=cuJ3QG8AAAAJ&hl=en' target='_blank'>Zhen Lei</a>
<a href='https://scholar.google.com/citations?user=tAK5l1IAAAAJ&hl=en' target='_blank'>Lei Zhang</a>
</div>

<div>
<a href="https://arxiv.org/abs/2503.21694"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
<a href='https://theericma.github.io/TriplaneTurbo/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/ZhiyuanthePony/TriplaneTurbo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a>
<a href='https://theericma.github.io/TriplaneTurbo/static/pdf/main.pdf'><img src='https://img.shields.io/badge/Slides-Presentation-orange?logo=microsoftpowerpoint&logoColor=white' alt='Presentation Slides'></a>
</div>

---

</div>

<!-- Updates -->
## Updates

- **2025-04-01**: Presentation slides are now available for download.
- **2025-03-27**: The paper is now available on arXiv.
- **2025-03-03**: Gradio and Hugging Face demos are available.
- **2025-02-27**: TriplaneTurbo is accepted to CVPR 2025.

<!-- Features -->
## Features
- **Fast Inference**: Our code excels in inference efficiency, producing a textured mesh in around 1 second.
- **Text Comprehension**: It demonstrates strong comprehension of complex text prompts, ensuring that generation faithfully follows the input.
- **3D-Data-Free Training**: The entire training process relies on no 3D datasets, making it more resource-friendly and adaptable.

## Start local inference in 3 minutes

If you only wish to set up the demo locally, the following commands are all you need for inference. For training and evaluation, follow the environment setup instructions in the next section instead.

```sh
python -m venv venv
source venv/bin/activate
bash setup.sh
python gradio_app.py
```

## Official Installation

Create a virtual environment:
```sh
conda create -n triplaneturbo python=3.10
conda activate triplaneturbo
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
```
(Optional, recommended) Install xFormers for attention acceleration:
```sh
conda install xformers -c xformers
```
(Optional, recommended) Install ninja to speed up the compilation of CUDA extensions:
```sh
pip install ninja
```
Install the major dependencies:
```sh
pip install -r requirements.txt
```
Install iNGP:
```sh
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```
If you encounter errors while installing iNGP, check your gcc version. Follow these steps to change the gcc version within your conda environment, then return to the project directory and reinstall iNGP and NerfAcc (example commands are included as comments at the end of the block):
```sh
conda install -c conda-forge gxx=9.5.0
cd $CONDA_PREFIX/lib
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your project directory>
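# After switching gcc, reinstall iNGP and NerfAcc from the project directory, e.g.:
#   pip install --force-reinstall git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
#   pip install --force-reinstall nerfacc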
```

## Evaluation

If you only want to run the evaluation without training, follow these steps:

```sh
# Download the model from Hugging Face
huggingface-cli download --resume-download ZhiyuanthePony/TriplaneTurbo \
    --include "triplane_turbo_sd_v1.pth" \
    --local-dir ./pretrained \
    --local-dir-use-symlinks False

# Download evaluation assets
python scripts/prepare/download_eval_only.py

# Run the evaluation script
bash scripts/eval/dreamfusion.sh --gpu 0,1 # You can use more GPUs (e.g. 0,1,2,3,4,5,6,7). For single-GPU usage, check the script for required modifications.
```
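
If you prefer to fetch the checkpoint from Python instead of the CLI, a minimal sketch using `huggingface_hub` (same repo and filename as above):

```python
from huggingface_hub import hf_hub_download

# Downloads triplane_turbo_sd_v1.pth into ./pretrained and returns the local path.
ckpt_path = hf_hub_download(
    repo_id="ZhiyuanthePony/TriplaneTurbo",
    filename="triplane_turbo_sd_v1.pth",
    local_dir="./pretrained",
)
print(ckpt_path)
```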

Our evaluation metrics include the following (a sketch of the similarity computation follows this list):
- CLIP Similarity Score
- CLIP Recall@1
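
To make the first metric concrete, here is a minimal sketch of a CLIP similarity computation, assuming a single rendered view per mesh and the `openai/clip-vit-base-patch32` checkpoint; the repository's `evaluation/clipscore/compute.py` is the authoritative implementation, and Recall@1 additionally checks whether the matching prompt ranks first among all candidate prompts for a given render.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("render.png")                     # a rendered view (hypothetical path)
prompt = "a zoomed out DSLR photo of a corgi puppy"  # the prompt used for generation

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Cosine similarity between the L2-normalized image and text embeddings
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(f"CLIP similarity: {(img * txt).sum().item():.4f}")
```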

For detailed evaluation results, please refer to our paper.

If you want to evaluate your own model, use the following script:
```sh
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config <path_to_your_exp_config> \
    --export \
    system.exporter_type="multiprompt-mesh-exporter" \
    resume=<path_to_your_ckpt> \
    data.prompt_library="dreamfusion_415_prompt_library" \
    system.exporter.fmt=obj
```

After running the script, you will find the generated OBJ files in `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-export>`. Set this path as `<OBJ_DIR>`, and set `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-4views>` as `<VIEW_DIR>`. Then run:

```sh
SAVE_DIR=<VIEW_DIR>
python evaluation/mesh_visualize.py \
    <OBJ_DIR> \
    --save_dir $SAVE_DIR \
    --gpu 0,1,2,3,4,5,6,7

python evaluation/clipscore/compute.py \
    --result_dir $SAVE_DIR
```
The evaluation results will be displayed in your terminal once the computation is complete.

## Training Options

### 1. Download Required Pretrained Models and Datasets
Use the provided download script to get all necessary files:
```sh
python scripts/prepare/download_full.py
```

This will download:
- Stable Diffusion 2.1 Base
- Stable Diffusion 1.5
- MVDream 4-view checkpoint
- RichDreamer checkpoint
- Text prompt datasets (3DTopia and DALLE+Midjourney)

### 2. Training Options

#### Option 1: Train with 3DTopia Text Prompts
```sh
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```

For multi-GPU training:
```sh
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_361k_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```

#### Option 2: Train with DALLE+Midjourney Text Prompts
Choose the appropriate command based on your GPU configuration:

```sh
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```

For multi-GPU training (higher performance):
```sh
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```

### 3. Configuration Notes
- **Memory Requirements**:
  - v1 configuration: requires GPUs with 48GB+ memory
  - v0 configuration: works with GPUs that have less memory (46GB+) but with reduced performance

- **Acceleration Options**:
  - Use the `_acc-2.yaml` configs for gradient accumulation to reduce memory usage (a generic sketch follows this list)

- **Advanced Options**:
  - For the highest quality, use `configs/TriplaneTurbo_v1.yaml` with `system.parallel_guidance=true` (requires GPUs with 98GB+ memory)
  - To disable certain guidance components, add `guidance.rd_weight=0 guidance.sd_weight=0` to the command
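
For context on the `_acc-2` configs: gradient accumulation splits each optimizer step across several micro-batches, trading extra forward/backward passes for lower per-step memory. A generic PyTorch illustration with a toy model (not the repository's trainer):

```python
import torch
from torch import nn

# Toy stand-ins for the real model and data loader (illustration only)
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
data = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

accum_steps = 2  # what an "_acc-2" config implies: 2 micro-batches per optimizer step
optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                                           # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()          # one update per accum_steps micro-batches
        optimizer.zero_grad()
```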

<!-- Citation -->
## Citation

If you find this work helpful, please consider citing our paper:
```
@inproceedings{ma2025progressive,
    title={Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data},
    author={Ma, Zhiyuan and Liang, Xinyue and Wu, Rongyuan and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}
```

<!-- Acknowledgement -->
## Acknowledgement
Our code is heavily based on the following works:
- [ThreeStudio](https://github.com/threestudio-project/threestudio): A clean and extensible codebase for 3D generation via score distillation.
- [MVDream](https://github.com/bytedance/MVDream): Used as one of our multi-view teachers.
- [RichDreamer](https://github.com/modelscope/RichDreamer): Serves as another multi-view teacher, providing normal and depth supervision.
- [3DTopia](https://github.com/3DTopia/3DTopia): Its text caption dataset is applied in our training and comparison.
- [DiffMC](https://github.com/SarahWeiii/diso): Our solution uses its differentiable marching cubes for mesh rasterization.
- [NeuS](https://github.com/Totoro97/NeuS): We implement its SDF-based volume rendering for dual rendering in our solution.