📟 PaGeR

Panoramic Geometry Reconstruction

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Vukasin Bozic¹, Isidora Slavkovic², Dominik Narnhofer¹, Nando Metzger¹,⁴, Denis Rozumny³, Konrad Schindler¹, Nikolai Kalischek²
¹ETH Zurich
²Google
³Meta
⁴Athlence Sports

Abstract

Geometry estimation from perspective images has advanced greatly, maturing to the point where off-the-shelf foundation models can reconstruct 3D scene structure not only from multi-view imagery but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full 360-degree scene from a single panoramic image. In this work, we introduce PaGeR (Panoramic Geometry Reconstruction), a framework that lifts powerful 3D foundation models designed for perspective imagery to the panorama domain. Our strategy is to start from a pre-trained transformer for 3D reconstruction and turn it into a unified, high-performance model that predicts scale-invariant depth, metric depth, surface normals, and sky masks from both perspective and omnidirectional images in a single forward pass. By keeping architectural changes to a minimum and mixing perspective and panoramic images during training, PaGeR retains the rich 3D prior of the underlying foundation model while learning to estimate geometrically consistent 360-degree scenes from single panoramas. We extensively test our method in both indoor and outdoor environments and find that it delivers state-of-the-art accuracy and excellent zero-shot generalization across a wide range of scenes.

Method

PaGeR method overview: an RGB panorama is encoded by a shared backbone whose features drive sky, normals, depth and coarse-metric heads, producing a sky mask, surface normals, scale-invariant depth and metric depth.

Cubemap Projection

The equirectangular panorama is projected into a fixed 6 × 504 × 504 cubemap, avoiding polar distortion and decoupling runtime and memory from the input resolution.
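The projection itself is standard geometry. Below is a minimal NumPy sketch under stated assumptions: the face order and orientation (`FACE_AXES`), the world frame (x right, y down, z forward) and nearest-neighbour sampling are illustrative choices, not PaGeR's published preprocessing, which would more likely interpolate bilinearly. It also makes the decoupling concrete: the output is always 6 × S × S, whatever the input resolution.

```python
import numpy as np

# Hypothetical cube-face layout in a world frame with x right, y down, z forward.
# Each entry holds the face's (forward, right, down) unit vectors.
FACE_AXES = np.array([
    [[0, 0, 1], [1, 0, 0], [0, 1, 0]],    # front
    [[1, 0, 0], [0, 0, -1], [0, 1, 0]],   # right
    [[0, 0, -1], [-1, 0, 0], [0, 1, 0]],  # back
    [[-1, 0, 0], [0, 0, 1], [0, 1, 0]],   # left
    [[0, -1, 0], [1, 0, 0], [0, 0, 1]],   # up
    [[0, 1, 0], [1, 0, 0], [0, 0, -1]],   # down
], dtype=np.float64)


def equirect_to_cubemap(pano: np.ndarray, face_size: int = 504) -> np.ndarray:
    """Resample an H x W (x C) equirectangular image into 6 x S x S (x C) cube faces."""
    h, w = pano.shape[:2]
    # Pixel-centre offsets in [-1, 1]; a 90-degree face spans tan(45 deg) = 1 each way.
    t = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(t, t)  # u grows to the right, v grows downward
    faces = []
    for fwd, right, down in FACE_AXES:
        # Viewing ray through each face pixel, normalised to unit length.
        d = fwd + u[..., None] * right + v[..., None] * down
        d /= np.linalg.norm(d, axis=-1, keepdims=True)
        lon = np.arctan2(d[..., 0], d[..., 2])           # 0 along +z, in [-pi, pi]
        lat = np.arcsin(np.clip(-d[..., 1], -1.0, 1.0))  # +pi/2 points straight up
        col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
        row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
        faces.append(pano[row, col])  # nearest-neighbour lookup
    return np.stack(faces)


pano = np.random.rand(1024, 2048, 3)
cube = equirect_to_cubemap(pano)
print(cube.shape)  # (6, 504, 504, 3)
```

Because every face has a 90° field of view, neighbouring faces meet along their shared edges, so the six faces cover the full sphere without gaps.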

Multi-View Foundation Backbone

Rather than a panorama-specific network, PaGeR lifts a perspective 3D foundation model (Depth Anything 3, ViT-Giant) to 360° by treating the six cube faces as a multi-view set, reusing its strong geometry priors.
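Treating the faces as a multi-view set means each face is an ordinary pinhole view with known geometry. The hypothetical helper below computes the shared intrinsics and per-face camera-to-world rotations for a 504-pixel face; the axis layout is an assumption, and nothing here reproduces the Depth Anything 3 interface. The token count additionally assumes a DINOv2-style patch size of 14 pixels, which divides 504 evenly (504 = 36 × 14).

```python
import numpy as np


def cube_face_cameras(face_size: int = 504):
    """Shared intrinsics and camera-to-world rotations for six 90-degree cube faces."""
    # 90-degree field of view: f = (S / 2) / tan(45 deg) = S / 2 pixels.
    f = c = face_size / 2.0
    K = np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])
    # Hypothetical layout in a world frame with x right, y down, z forward.
    # Each entry lists the face camera's (right, down, forward) axes in world coordinates.
    axes = [
        ([1, 0, 0], [0, 1, 0], [0, 0, 1]),    # front
        ([0, 0, -1], [0, 1, 0], [1, 0, 0]),   # right
        ([-1, 0, 0], [0, 1, 0], [0, 0, -1]),  # back
        ([0, 0, 1], [0, 1, 0], [-1, 0, 0]),   # left
        ([1, 0, 0], [0, 0, 1], [0, -1, 0]),   # up
        ([1, 0, 0], [0, 0, -1], [0, 1, 0]),   # down
    ]
    # Transposing puts the three axes into the columns of a rotation matrix.
    R_c2w = np.stack([np.array(a, dtype=np.float64).T for a in axes])
    return K, R_c2w


PATCH_SIZE = 14  # assumption: DINOv2-style ViT patch size
K, R_c2w = cube_face_cameras()
tokens = 6 * (504 // PATCH_SIZE) ** 2  # image tokens across the six views
print(K[0, 0], tokens)  # 252.0 7776
```

All six rotations are proper (determinant +1), so the faces form a rigid multi-view rig sharing a single optical centre.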

Multi-Task Heads

A single forward pass drives four heads: scale-invariant depth, surface normals, sky, and coarse-metric. The coarse-metric head rescales the scale-invariant depth into metric depth in metres, so the model outputs metric depth alongside the surface normals and the sky mask.
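The page does not spell out how the coarse-metric output is combined with the scale-invariant depth. Purely as an illustration, the sketch below assumes the coarse-metric head produces a coarse depth map in metres and fits one global scale as the median per-pixel ratio over non-sky pixels; `rescale_to_metric` is a hypothetical helper, not PaGeR's code.

```python
import numpy as np


def rescale_to_metric(si_depth, coarse_metric, sky_mask):
    """Hypothetical rescaling of scale-invariant depth to metres.

    Assumes the coarse-metric head yields a coarse depth map in metres; a single
    global scale is fitted as the median per-pixel ratio over non-sky pixels.
    """
    si_depth = np.asarray(si_depth, dtype=np.float64)
    valid = ~sky_mask & (si_depth > 0) & (coarse_metric > 0)
    # The median ratio is robust to outliers in the coarse prediction.
    scale = float(np.median(coarse_metric[valid] / si_depth[valid]))
    metric = scale * si_depth
    metric[sky_mask] = np.inf  # sky has no finite depth
    return metric, scale


si = np.array([[1.0, 2.0], [4.0, 8.0]])
coarse = np.array([[3.0, 6.0], [12.0, 50.0]])  # the last pixel is sky
sky = np.array([[False, False], [False, True]])
metric, scale = rescale_to_metric(si, coarse, sky)
print(scale)  # 3.0
```

Fitting a single global factor keeps the fine relative structure of the scale-invariant map intact while borrowing only absolute scale from the coarser metric signal.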

Datasets

To train and benchmark panoramic geometry, we release two complementary datasets.

PanoInfinigen

A large-scale synthetic dataset for high-resolution, general-purpose panoramic geometry. We extend the procedural Infinigen pipeline to 360° equirectangular rendering, producing 70,000 panoramas from 20,000 distinct scenes, ranging from kitchens and bedrooms to natural landscapes, and add ~7,000 outdoor urban panoramas via the iCity city generator. Every sample is rendered at native 4K with complete, pixel-perfect ground-truth metric depth and surface normals.

ZüriPano

A real-world outdoor benchmark of 100 panoramic scans across 11 urban locations in Zürich, captured with a Leica RTC360 LiDAR scanner at 8K resolution with an effective range of 130 m. Each panorama comes with a dense, high-accuracy depth map and a validity mask, providing challenging large-scale outdoor geometry that complements the synthetic training data.
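Turning such a depth map into a 3D scene needs only the equirectangular ray directions. This minimal sketch assumes the depth stores Euclidean range along each ray in metres and uses an illustrative frame (x right, y down, z forward, longitude 0 along +z); pixels outside the validity mask or beyond the scanner's 130 m effective range are dropped. `equirect_depth_to_points` is a hypothetical name.

```python
import numpy as np


def equirect_depth_to_points(depth, valid, max_range=130.0):
    """Back-project an H x W equirectangular depth map to an N x 3 point cloud.

    Assumptions: depth is the Euclidean range along each viewing ray (metres),
    longitude 0 looks along +z, and the frame is x right, y down, z forward.
    """
    h, w = depth.shape
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi  # in (-pi, pi)
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi         # +pi/2 at the top row
    lon, lat = np.meshgrid(lon, lat)
    rays = np.stack([np.cos(lat) * np.sin(lon),
                     -np.sin(lat),                          # y points down
                     np.cos(lat) * np.cos(lon)], axis=-1)   # unit-length rays
    # Keep pixels flagged valid, finite, positive, and within the scanner range.
    keep = valid & np.isfinite(depth) & (depth > 0) & (depth <= max_range)
    return rays[keep] * depth[keep][:, None]


depth = np.full((64, 128), 5.0)
valid = np.ones_like(depth, dtype=bool)
valid[0] = False  # e.g. a masked-out top row
points = equirect_depth_to_points(depth, valid)
print(points.shape)  # (8064, 3)
```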

Quantitative Results

From a single panorama and one forward pass, PaGeR sets a new state of the art across scale-invariant depth, metric depth, and surface normals. Best per column in bold.

Scale-Invariant Depth

Depth up to a single global scale (least-squares aligned), on two indoor benchmarks and the outdoor ZüriPano. PaGeR leads on every metric, and the margin is largest on real outdoor scenes, where competing foundation models such as DA² and UniK3D break down.

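| Method | Matterport3D360 AbsRel↓ | Matterport3D360 RMSE↓ | Matterport3D360 Γ₁↑ | Stanford2D3DS AbsRel↓ | Stanford2D3DS RMSE↓ | Stanford2D3DS Γ₁↑ | ZüriPano AbsRel↓ | ZüriPano RMSE↓ | ZüriPano Γ₁↑ |
|---|---|---|---|---|---|---|---|---|---|
| DreamCube* | 30.45 | 108.81 | 54.26 | 28.45 | 75.30 | 56.39 | 29.57 | 484.91 | 53.07 |
| DepthAnyCamera | 25.62 | 94.06 | 68.11 | 18.35 | 53.73 | 76.23 | 21.98 | 487.66 | 65.41 |
| MoGe | 18.12 | 81.68 | 77.11 | 15.34 | 49.88 | 82.52 | 19.25 | 484.18 | 77.12 |
| EGformer* | 16.74 | 96.15 | 79.32 | 13.64 | 58.95 | 84.64 | 55.49 | 721.83 | 31.21 |
| DAP | 15.84 | 85.18 | 82.44 | 9.76 | 46.69 | 92.37 | 19.86 | 583.42 | 72.09 |
| RPG360 | 15.40 | 79.40 | 82.40 | 11.91 | 46.60 | 87.73 | 18.27 | 455.28 | 78.41 |
| UniK3D† | 14.82 | 69.27 | 83.67 | 9.93 | 40.69 | 93.34 | 31.00 | 832.13 | 38.78 |
| PanDA* | 13.84 | 82.82 | 84.26 | 10.69 | 55.72 | 90.73 | 24.64 | 515.48 | 58.31 |
| DA² | 11.06 | 67.72 | 89.18 | 7.64 | 37.29 | 95.63 | 61.22 | 869.08 | 2.17 |
| PaGeR (Ours) | **9.67** | **64.69** | **90.87** | **5.93** | **35.34** | **96.10** | **9.36** | **299.61** | **94.75** |

AbsRel / Γ₁ in %, RMSE in cm. * affine-invariant · † optimized with in-domain training.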
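The alignment and metrics named in the table can be sketched in a few lines. This assumes the usual definitions: AbsRel is the mean of |d − d*| / d*, RMSE is computed over valid pixels, and Γ₁ counts pixels whose ratio max(d/d*, d*/d) is below 1.25 (the threshold is an assumption; the page does not state it). Scale-invariant predictions get one least-squares scale; for the affine-invariant entries (*), a scale and shift are fitted instead. Function names are illustrative.

```python
import numpy as np


def align_scale(pred, gt, mask):
    """Least-squares scale s minimising sum((s * pred - gt)^2) over valid pixels."""
    p, g = pred[mask], gt[mask]
    return float(np.dot(p, g) / np.dot(p, p))


def align_scale_shift(pred, gt, mask):
    """Least-squares scale and shift (s, t) for affine-invariant predictions."""
    A = np.stack([pred[mask], np.ones(int(mask.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt[mask], rcond=None)
    return float(s), float(t)


def depth_metrics(pred, gt, mask):
    """AbsRel and Gamma_1 in %, RMSE in the units of gt."""
    p, g = pred[mask], gt[mask]
    abs_rel = 100.0 * np.mean(np.abs(p - g) / g)
    rmse = float(np.sqrt(np.mean((p - g) ** 2)))
    # Threshold accuracy; the 1.25 ratio is the usual convention, assumed here.
    gamma1 = 100.0 * np.mean(np.maximum(p / g, g / p) < 1.25)
    return float(abs_rel), rmse, float(gamma1)


gt = np.array([120.0, 250.0, 480.0, 900.0])  # ground-truth depth in cm
pred = gt / 7.0                              # correct shape, arbitrary scale
mask = gt > 0
s = align_scale(pred, gt, mask)
abs_rel, rmse, gamma1 = depth_metrics(s * pred, gt, mask)
print(round(s, 6), gamma1)  # 7.0 100.0
```

Metric-depth evaluation would call `depth_metrics` directly on the raw prediction, with no alignment step.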

Metric Depth

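Absolute depth in metres, a far harder setting because predictions are evaluated without any scale alignment. PaGeR is best on eight of nine metrics across the three benchmarks; only DAP reaches a higher Γ₁ on Matterport3D360.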

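| Method | Matterport3D360 AbsRel↓ | Matterport3D360 RMSE↓ | Matterport3D360 Γ₁↑ | Stanford2D3DS AbsRel↓ | Stanford2D3DS RMSE↓ | Stanford2D3DS Γ₁↑ | ZüriPano AbsRel↓ | ZüriPano RMSE↓ | ZüriPano Γ₁↑ |
|---|---|---|---|---|---|---|---|---|---|
| UniK3D† | 33.43 | 143.24 | 46.48 | 24.37 | 64.71 | 64.50 | 35.81 | 1345.48 | 34.86 |
| RPG360 | 29.26 | 136.99 | 35.33 | 29.35 | 90.73 | 17.36 | 40.36 | 772.16 | 2.14 |
| DepthAnyCamera | 25.16 | 132.66 | 62.25 | 17.15 | 59.29 | 77.98 | 33.23 | 716.38 | 37.83 |
| DAP‡ | 30.16 | 168.88 | **80.19** | 10.97 | 53.39 | 90.64 | 47.00 | 839.72 | 24.02 |
| PaGeR‡ (Ours) | **21.83** | **123.48** | 69.50 | **10.94** | **45.43** | **90.94** | **31.97** | **530.85** | **39.30** |

AbsRel / Γ₁ in %, RMSE in cm. † optimized with in-domain training · ‡ separate indoor / outdoor prediction heads.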

Surface Normals

Evaluated on Structured3D against specialist normal estimators trained in-domain. Despite being a single unified model, PaGeR is best on every metric.

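| Method | Mean↓ | MSE↓ | Γ<5°↑ | Γ<22.5°↑ |
|---|---|---|---|---|
| UniFuse | 8.25 | 453.1 | 76.24 | 87.55 |
| PanoFormer | 16.92 | 1053.5 | 59.13 | 75.50 |
| OmniFusion | 20.70 | 832.0 | 28.51 | 63.55 |
| MonoViT | 5.92 | 277.8 | 78.93 | 90.58 |
| HyperSphere | 5.79 | 253.4 | 78.38 | 90.73 |
| MTL | 8.98 | 469.0 | 72.51 | 86.02 |
| PanoNormal | 5.56 | 246.6 | 79.18 | 91.01 |
| PaGeR (Ours) | **5.49** | **174.9** | **79.91** | **92.83** |

Mean is the mean angular error in degrees and MSE the mean squared angular error (deg²); Γ<t° is the percentage of pixels within t° of ground truth.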
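The normal metrics follow directly from per-pixel angles between predicted and ground-truth normals. A minimal sketch, assuming inputs are normalised first and that "within t°" means strictly below the threshold; `normal_metrics` is an illustrative name:

```python
import numpy as np


def normal_metrics(pred, gt, mask, thresholds=(5.0, 22.5)):
    """Mean / MSE angular error and threshold accuracies for H x W x 3 normal maps."""
    p = pred[mask] / np.linalg.norm(pred[mask], axis=-1, keepdims=True)
    g = gt[mask] / np.linalg.norm(gt[mask], axis=-1, keepdims=True)
    cos = np.clip(np.sum(p * g, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))  # per-pixel angular error in degrees
    mean = float(err.mean())
    mse = float(np.mean(err ** 2))    # mean squared error, in degrees squared
    within = {t: 100.0 * float(np.mean(err < t)) for t in thresholds}
    return mean, mse, within


gt = np.zeros((2, 2, 3))
gt[..., 2] = 1.0                       # all ground-truth normals face +z
pred = gt.copy()
a = np.radians(10.0)
pred[1] = [np.sin(a), 0.0, np.cos(a)]  # bottom row tilted by 10 degrees
mean, mse, within = normal_metrics(pred, gt, np.ones((2, 2), dtype=bool))
print(round(mean, 6), round(mse, 6), within)  # 5.0 50.0 {5.0: 50.0, 22.5: 100.0}
```

Because MSE squares each error, it penalises large outliers much more than the mean does, which is why the MSE column separates methods with similar mean errors.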

Qualitative Results

Depth Comparison

We compare PaGeR against the strongest scale-invariant competitor (DA²) and the strongest metric-depth competitor (DAP).


Normal Comparison

We compare PaGeR's surface normals against the panoramic specialist MTL and against the input RGB panorama. Normal direction is encoded as RGB color.
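Such an RGB encoding is typically a linear map from each normal component in [−1, 1] to a channel value in [0, 255]. The sketch below assumes that common convention; the page does not say which frame or channel order PaGeR uses, and `normals_to_rgb` is a hypothetical helper.

```python
import numpy as np


def normals_to_rgb(normals):
    """Map normals to 8-bit RGB with the common (n + 1) / 2 convention (assumed)."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    # Each component in [-1, 1] becomes a channel value in [0, 255].
    return np.round((n + 1.0) * 0.5 * 255.0).astype(np.uint8)


print(normals_to_rgb(np.array([0.0, 0.0, 1.0])))  # [128 128 255]
```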


3D Point Cloud Comparison

We compare PaGeR against the strongest scale-invariant competitor (DA²) and the strongest metric-depth competitor (DAP).


What to look for: in the global layout, PaGeR reconstructs a coherent scene with flat, parallel walls, whereas competitors warp and tilt the same surfaces; in local detail, PaGeR preserves the fine geometry of edges and clutter, whereas competitors squash, oversmooth, and flatten it.


Citation

@article{bozic2026pager,
  title={Unified Panoramic Geometry Estimation via Multi-View Foundation Models},
  author={Bozic, Vukasin and Slavkovic, Isidora and Narnhofer, Dominik and Metzger, Nando and Rozumny, Denis and Schindler, Konrad and Kalischek, Nikolai},
  journal={arXiv preprint arXiv:2605.26368},
  year={2026}
}