arxiv:2605.26368

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Published on May 25

Submitted by Vukasin Bozic on May 28

Photogrammetry and Remote Sensing Lab of ETH Zurich


Authors:

Vukasin Bozic ,

Abstract

PaGeR is a framework that adapts 3D foundation models for perspective imagery to reconstruct 360-degree scenes from panoramic images, enabling simultaneous prediction of depth, normals, and sky masks with high performance.

AI-generated summary

Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to reconstruct 3D scene structure not only from multi-view imagery, but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full 360-degree scene from a single panoramic image. In this work, we introduce PaGeR (Panoramic Geometry Reconstruction), a framework to lift powerful 3D foundation models designed for perspective imagery to the panorama domain. Our strategy is to start from a pre-trained transformer for 3D reconstruction and turn it into a unified high-performance model that predicts scale-invariant depth, metric depth, surface normals, and sky masks from both perspective and omnidirectional images, in a single forward pass. By keeping architectural changes to a minimum and mixing perspective and panoramic images during training, PaGeR retains the rich 3D prior of the underlying foundation model while learning to also estimate geometrically consistent 360-degree scenes from single panoramas. We extensively test our method in both indoor and outdoor environments and find that it delivers state-of-the-art performance and excellent zero-shot performance across a wide range of scenes.


Community

vulus98

Paper author Paper submitter about 21 hours ago

TL;DR: PaGeR turns a perspective 3D foundation model into a single-pass 360° geometry estimator — from one equirectangular image it predicts scale-invariant depth, metric depth (in metres), surface normals, and sky segmentation at full panoramic resolution.
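For readers less familiar with the distinction between the two depth outputs: a scale-invariant depth map is defined only up to an unknown global scale, whereas metric depth is expressed in metres. The sketch below shows one common way to relate the two, a closed-form least-squares scale fit. It is a generic illustration, not PaGeR's training or evaluation protocol; the function name `align_scale` and the toy values are hypothetical.

```python
def align_scale(pred, target):
    """Return the scale s that minimises sum((s * pred_i - target_i)^2).

    `pred` holds scale-invariant depths, `target` holds metric depths in metres.
    Both are flat sequences of the same length (valid pixels only).
    """
    num = sum(p * t for p, t in zip(pred, target))
    den = sum(p * p for p in pred)
    if den == 0.0:
        raise ValueError("prediction is all zeros; scale is undefined")
    return num / den


# Toy example (made-up numbers): the prediction is correct up to a factor of 2.5.
pred = [1.0, 2.0, 4.0]
metric = [2.5, 5.0, 10.0]
scale = align_scale(pred, metric)
aligned = [scale * p for p in pred]
```

A model that predicts metric depth directly, as described here, does not need this alignment step at inference time.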

We introduce PaGeR (Panoramic Geometry Reconstruction), which lifts a multi-view perspective foundation model (Depth Anything 3) to the panoramic domain via a fixed 6×504×504 cubemap, so VRAM and runtime stay constant regardless of input resolution. A single forward pass returns scale-invariant and metric depth, world-frame normals, and a sky mask. We also release two new datasets: ZüriPano (real-world evaluation) and PanoInfinigen (synthetic training).
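To make the fixed-cubemap idea concrete, here is a minimal sketch of resampling an equirectangular panorama into six square faces. It is not taken from the PaGeR code base: the axis convention (x right, y up, z forward), the face orientations, the nearest-neighbour sampling, and the names `FACES`, `direction_to_equirect`, and `equirect_to_cubemap` are all assumptions made for illustration. Only the face count and the 504-pixel face size come from the description above.

```python
import math

# Assumed convention (hypothetical, not PaGeR's): x right, y up, z forward.
# Each entry maps face coordinates u, v in [-1, 1] to a viewing direction.
FACES = {
    "front": lambda u, v: (u, v, 1.0),
    "back": lambda u, v: (-u, v, -1.0),
    "right": lambda u, v: (1.0, v, -u),
    "left": lambda u, v: (-1.0, v, u),
    "up": lambda u, v: (u, 1.0, -v),
    "down": lambda u, v: (u, -1.0, v),
}


def direction_to_equirect(d, width, height):
    """Map a 3D viewing direction to integer (col, row) in an equirectangular image."""
    x, y, z = d
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)                             # [-pi, pi], 0 = forward
    lat = math.asin(max(-1.0, min(1.0, y / norm)))     # [-pi/2, pi/2], positive = up
    col = (lon / (2.0 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    # Wrap around horizontally, clamp at the poles.
    return int(col) % width, min(max(int(row), 0), height - 1)


def equirect_to_cubemap(pano, face_size):
    """Nearest-neighbour resampling of a panorama (list of pixel rows) into 6 faces."""
    height, width = len(pano), len(pano[0])
    cube = {}
    for name, to_dir in FACES.items():
        face = []
        for j in range(face_size):
            v = 1.0 - 2.0 * (j + 0.5) / face_size      # top row maps to v near +1
            row = []
            for i in range(face_size):
                u = 2.0 * (i + 0.5) / face_size - 1.0  # left column maps to u near -1
                c, r = direction_to_equirect(to_dir(u, v), width, height)
                row.append(pano[r][c])
            face.append(row)
        cube[name] = face
    return cube
```

The output always holds 6 × `face_size` × `face_size` samples whatever the panorama size, which is the property the constant VRAM and runtime claim rests on. With `face_size=504` this gives the 6×504×504 layout mentioned above; a practical implementation would likely use bilinear interpolation and a vectorised array library rather than Python loops.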

🔗 Project page: https://pager360.github.io · 🤗 Demo: https://huggingface.co/spaces/prs-eth/PaGeR · Collection (models + datasets): https://huggingface.co/collections/prs-eth/pager-697241d06b3733a6f18e4d39 · Code: https://github.com/prs-eth/PaGeR

Happy to answer any questions!