Unified Panoramic Geometry Estimation via Multi-View Foundation Models
Abstract
PaGeR is a framework that adapts 3D foundation models for perspective imagery to reconstruct 360-degree scenes from panoramic images, enabling simultaneous prediction of depth, normals, and sky masks with high performance.
Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to reconstruct 3D scene structure not only from multi-view imagery, but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full 360-degree scene from a single panoramic image. In this work, we introduce PaGeR (Panoramic Geometry Reconstruction), a framework to lift powerful 3D foundation models designed for perspective imagery to the panorama domain. Our strategy is to start from a pre-trained transformer for 3D reconstruction and turn it into a unified high-performance model that predicts scale-invariant depth, metric depth, surface normals, and sky masks from both perspective and omnidirectional images, in a single forward pass. By keeping architectural changes to a minimum and mixing perspective and panoramic images during training, PaGeR retains the rich 3D prior of the underlying foundation model while learning to also estimate geometrically consistent 360-degree scenes from single panoramas. We extensively test our method in both indoor and outdoor environments and find that it delivers state-of-the-art performance and excellent zero-shot performance across a wide range of scenes.
Community
TL;DR: PaGeR turns a perspective 3D foundation model into a single-pass 360° geometry estimator — from one equirectangular image it predicts scale-invariant depth, metric depth (in metres), surface normals, and sky segmentation at full panoramic resolution.
We introduce PaGeR (Panoramic Geometry Reconstruction), which lifts a multi-view perspective foundation model (Depth Anything 3) to the panoramic domain via a fixed 6×504×504 cubemap representation (six 504×504 faces), so VRAM and runtime stay constant regardless of input resolution. A single forward pass returns scale-invariant and metric depth, world-frame normals, and a sky mask. We also release two new datasets: ZüriPano (real-world evaluation) and PanoInfinigen (synthetic training).
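To make the fixed-size cubemap idea concrete, here is a minimal NumPy sketch of resampling an equirectangular panorama into six square faces. It is an illustration only, not the PaGeR implementation. The axis convention (x right, y up, z forward), the face order, nearest-neighbour sampling, and the names face_directions and equirect_to_cubemap are all assumptions made for this example. The only value taken from the post is the face size of 504.

```python
import numpy as np

FACE_SIZE = 504  # cube-face resolution quoted in the post
FACES = ("front", "right", "back", "left", "up", "down")  # assumed order


def face_directions(face: str, size: int) -> np.ndarray:
    """Unit ray directions, shape (size, size, 3), for one cube face."""
    # Pixel-centre coordinates in [-1, 1] across the face.
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    a, b = np.meshgrid(t, -t)  # a grows rightwards, b grows upwards
    one = np.ones_like(a)
    x, y, z = {
        "front": (a, b, one),
        "right": (one, b, -a),
        "back": (-a, b, -one),
        "left": (-one, b, a),
        "up": (a, one, -b),
        "down": (a, -one, b),
    }[face]
    d = np.stack((x, y, z), axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def equirect_to_cubemap(pano: np.ndarray, size: int = FACE_SIZE) -> np.ndarray:
    """Resample an (H, W[, C]) panorama into a (6, size, size[, C]) cubemap."""
    h, w = pano.shape[:2]
    faces = []
    for name in FACES:
        d = face_directions(name, size)
        lon = np.arctan2(d[..., 0], d[..., 2])           # [-pi, pi]
        lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
        u = (lon / (2.0 * np.pi) + 0.5) * w              # column coordinate
        v = (0.5 - lat / np.pi) * h                      # row coordinate
        col = np.floor(u).astype(int) % w                # wrap around the seam
        row = np.clip(np.floor(v).astype(int), 0, h - 1)
        faces.append(pano[row, col])                     # nearest-neighbour lookup
    return np.stack(faces)
```

Whatever the input resolution, the output always has six faces of the requested size, which is why a model that consumes the cubemap sees a constant-size input. A real pipeline would more likely use bilinear or anti-aliased sampling than the nearest-neighbour lookup shown here.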
🔗 Project page: https://pager360.github.io · 🤗 Demo: https://huggingface.co/spaces/prs-eth/PaGeR · Collection (models + datasets): https://huggingface.co/collections/prs-eth/pager-697241d06b3733a6f18e4d39 · Code: https://github.com/prs-eth/PaGeR
Happy to answer any questions!
Get this paper in your agent:
hf papers read 2605.26368
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash