[CVPR 2025] MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
Paper | Website | Video | Data | Checkpoints

Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan
TL;DR
Multi-view, pose-free, RGB-only 3D reconstruction in one step. It also supports novel view synthesis and relative pose estimation.
Please see more visual results and videos on our website!
Update Logs
- 2025-1-30: Data generation code for ScanNet.
- 2025-1-8: Demo view selection improved; better quality for multiple rooms.
- 2025-1-1: Gradio demo, all checkpoints, training/evaluation code, and training/evaluation trajectories for ScanNet.
Installation
We have only tested this on a Linux server with CUDA 12.4.
- Clone MV-DUSt3R+
git clone https://github.com/facebookresearch/mvdust3r.git
cd mvdust3r
- Install the virtual environment under Anaconda:
./install.sh
(The versions of PyTorch and PyTorch3D should be changed if you need a different CUDA version; a quick sanity check is sketched after these steps.)
- (Optional, for faster runtime) Compile the CUDA kernels for RoPE (the same as in DUSt3R and CroCo):
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
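To confirm the environment works before moving on, you can run a quick sanity check (a minimal sketch, not part of the repository's scripts; it only assumes the environment created by install.sh is active):

```python
# Sanity check: verify that PyTorch sees the GPU and that PyTorch3D is importable.
import torch
import pytorch3d

print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
```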
Checkpoints
Please download the checkpoints from here to the folder checkpoints before trying the demo and evaluation.
| Name | Description |
|---|---|
| MVD.pth | MV-DUSt3R |
| MVDp_s1.pth | MV-DUSt3R+ trained on stage 1 (8 views) |
| MVDp_s2.pth | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
| DUSt3R_ViTLarge_BaseDecoder_224_linear.pth | The pretrained DUSt3R model from which our training is fine-tuned |
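If you want to verify that a checkpoint downloaded correctly, you can load it on the CPU and list its top-level contents (an illustrative sketch; it assumes the file is an ordinary PyTorch checkpoint saved with torch.save):

```python
# Quick integrity check for a downloaded checkpoint (illustrative only).
import torch

# On newer PyTorch versions you may need to pass weights_only=False for full checkpoints.
ckpt = torch.load("checkpoints/MVD.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # most checkpoints are dictionaries
else:
    print(type(ckpt))
```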
Gradio Demo
python demo.py --weights ./checkpoints/{CHECKPOINT}
You will see a Gradio UI in your browser.
The input can be multiple images (a single image is not supported) or a video. You will see the point cloud along with the predicted camera poses (3DGS visualization is left as future work).
The confidence threshold controls how many low-confidence points are filtered out.
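Conceptually, the thresholding is just a mask over the predicted per-point confidences, roughly as in the sketch below (illustrative only; pts and conf are hypothetical arrays, not the demo's actual variables):

```python
import numpy as np

def filter_points(pts: np.ndarray, conf: np.ndarray, threshold: float) -> np.ndarray:
    """Keep only the points whose predicted confidence exceeds the threshold.

    pts:  (N, 3) predicted 3D points (hypothetical example input).
    conf: (N,)   per-point confidence values (hypothetical example input).
    """
    return pts[conf > threshold]
```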
The No. of video frames option is only used when the input is a video; it controls how many frames are uniformly sampled from the video for reconstruction.
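Uniform sampling here means picking evenly spaced frame indices over the whole video; a rough OpenCV equivalent looks like this (a sketch of the idea, not the demo's exact implementation):

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int):
    """Uniformly sample `num_frames` frames (as BGR images) from a video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```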
Note that the demo's inference is slower than the runtime claimed in the paper due to the overhead of Gradio and model loading. If you need faster runtime, please use our evaluation code.
Some tips to improve quality, especially for multiple rooms, are also provided with the demo.
Data
We use five datasets for training and testing: ScanNet, ScanNet++, HM3D, Gibson, and MP3D. Please go to their websites to sign the agreements, then download and extract them into the folder data. Here are more instructions.
Currently we have released the ScanNet trajectories for evaluation. Please download them to the folder trajectories. More trajectories for training and more data will be released later.
Evaluation
The folder scripts contains the following scripts for evaluation on ScanNet:
| Name | Description |
|---|---|
| test_mvd.sh | MV-DUSt3R |
| test_mvdp_stage1.sh | MV-DUSt3R+ trained on stage 1 (8 views) |
| test_mvdp_stage2.sh | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
They should reproduce the paper's results on ScanNet (Tab. 2, 3, 4, S2, S3, and S5).
Training
We are still preparing the release of the training-data trajectories and the trajectory generation code. Training scripts are already provided in the folder scripts, which give more details about our training setup.
| Name | Description |
|---|---|
| train_mvd.sh | MV-DUSt3R, initialized from DUSt3R for fine-tuning |
| train_mvdp_stage1.sh | MV-DUSt3R+ stage 1 training (8 views), initialized from DUSt3R for fine-tuning |
| train_mvdp_stage2.sh | MV-DUSt3R+ stage 2 training (mixed 4~12 views), fine-tuned from the stage 1 model |
Citation
@article{tang2024mv,
title={MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds},
author={Tang, Zhenggang and Fan, Yuchen and Wang, Dilin and Xu, Hongyu and Ranjan, Rakesh and Schwing, Alexander and Yan, Zhicheng},
journal={arXiv preprint arXiv:2412.06974},
year={2024}
}
License
We use the CC BY-NC 4.0 license.
Acknowledgement
Many thanks to:
- DUSt3R for the codebase.