[CVPR 2025] MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
Paper | Website | Video | Data | Checkpoints

Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan
TL;DR
Multi-view, pose-free, RGB-only 3D reconstruction in one step. It also supports novel view synthesis and relative pose estimation.
Please see more visual results and videos on our website!
Update Logs
- 2025-1-30: Data generation code for ScanNet.
- 2025-1-8: Demo view selection improved; better quality for multiple rooms.
- 2025-1-1: Gradio demo, all checkpoints, training/evaluation code, and training/evaluation trajectories for ScanNet.
Installation
We have only tested this on a Linux server with CUDA 12.4.
- Clone MV-DUSt3R+
git clone https://github.com/facebookresearch/mvdust3r.git
cd mvdust3r
- Install the virtual environment under Anaconda:
./install.sh
(The versions of PyTorch and PyTorch3D should be changed if you need a different CUDA version; a quick sanity check is sketched after these steps.)
- (Optional, for faster runtime) Compile the CUDA kernels for RoPE (the same as in DUSt3R and CroCo):
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
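To confirm the environment works before moving on, you can run a quick sanity check (a minimal sketch, not part of the repository's scripts; it only assumes the environment created by install.sh is active):

```python
# Sanity check: verify that PyTorch sees the GPU and that PyTorch3D is importable.
import torch
import pytorch3d

print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
```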
Checkpoints
Please download the checkpoints from here to the folder checkpoints before trying the demo and evaluation.
| Name | Description |
|---|---|
| MVD.pth | MV-DUSt3R |
| MVDp_s1.pth | MV-DUSt3R+ trained on stage 1 (8 views) |
| MVDp_s2.pth | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
| DUSt3R_ViTLarge_BaseDecoder_224_linear.pth | The pretrained DUSt3R model from which our training is fine-tuned |
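If you want to verify that a checkpoint downloaded correctly, you can load it on the CPU and list its top-level contents (an illustrative sketch; it assumes the file is an ordinary PyTorch checkpoint saved with torch.save):

```python
# Quick integrity check for a downloaded checkpoint (illustrative only).
import torch

# On newer PyTorch versions you may need to pass weights_only=False for full checkpoints.
ckpt = torch.load("checkpoints/MVD.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # most checkpoints are dictionaries
else:
    print(type(ckpt))
```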
Gradio Demo
python demo.py --weights ./checkpoints/{CHECKPOINT}
You will see a Gradio UI in your browser.
The input can be multiple images (a single image is not supported) or a video. You will see the point cloud along with the predicted camera poses (3DGS visualization is left as future work).
The confidence threshold controls how many low-confidence points are filtered out.
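Conceptually, the thresholding is just a mask over the predicted per-point confidences, roughly as in the sketch below (illustrative only; pts and conf are hypothetical arrays, not the demo's actual variables):

```python
import numpy as np

def filter_points(pts: np.ndarray, conf: np.ndarray, threshold: float) -> np.ndarray:
    """Keep only the points whose predicted confidence exceeds the threshold.

    pts:  (N, 3) predicted 3D points (hypothetical example input).
    conf: (N,)   per-point confidence values (hypothetical example input).
    """
    return pts[conf > threshold]
```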
The No. of video frames option is only used when the input is a video; it controls how many frames are uniformly sampled from the video for reconstruction.
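Uniform sampling here means picking evenly spaced frame indices over the whole video; a rough OpenCV equivalent looks like this (a sketch of the idea, not the demo's exact implementation):

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int):
    """Uniformly sample `num_frames` frames (as BGR images) from a video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```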
Note that the demo's inference is slower than the runtime claimed in the paper due to the overhead of Gradio and model loading. If you need faster runtime, please use our evaluation code.
Some tips to improve quality, especially for multiple rooms, are also provided with the demo.
Data
We use five datasets for training and testing: ScanNet, ScanNet++, HM3D, Gibson, and MP3D. Please go to their websites to sign the agreements, then download and extract them into the folder data. Here are more instructions.
Currently we have released the ScanNet trajectories for evaluation. Please download them to the folder trajectories. More trajectories for training and more data will be released later.
Evaluation
The folder scripts contains the following scripts for evaluation on ScanNet:
| Name | Description |
|---|---|
| test_mvd.sh | MV-DUSt3R |
| test_mvdp_stage1.sh | MV-DUSt3R+ trained on stage 1 (8 views) |
| test_mvdp_stage2.sh | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
They should reproduce the paper's results on ScanNet (Tab. 2, 3, 4, S2, S3, and S5).
Training
We are still preparing the release of the training-data trajectories and the trajectory generation code. Training scripts are already provided in the folder scripts, which give more details about our training setup.
| Name | Description |
|---|---|
| train_mvd.sh | MV-DUSt3R, initialized from DUSt3R for fine-tuning |
| train_mvdp_stage1.sh | MV-DUSt3R+ stage 1 training (8 views), initialized from DUSt3R for fine-tuning |
| train_mvdp_stage2.sh | MV-DUSt3R+ stage 2 training (mixed 4~12 views), fine-tuned from the stage 1 model |
Citation
@article{tang2024mv,
title={MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds},
author={Tang, Zhenggang and Fan, Yuchen and Wang, Dilin and Xu, Hongyu and Ranjan, Rakesh and Schwing, Alexander and Yan, Zhicheng},
journal={arXiv preprint arXiv:2412.06974},
year={2024}
}
License
We use the CC BY-NC 4.0 license.
Acknowledgement
Many thanks to:
- DUSt3R for the codebase.