PaGeR (Panoramic Geometry Reconstruction) lifts a perspective 3D foundation model to the 360° panoramic domain. From a single equirectangular image, a single forward pass returns:
- Scale-invariant depth at full panoramic resolution.
- Metric depth in metres — recovered by multiplying the SI depth by a single per-panorama scale emitted by a separate scale head; PaGeR ships two such heads (indoor / outdoor) and the inference pipeline picks one per panorama via a CLIP router.
- Surface normals as unit vectors in the world frame of the panorama.
- Sky segmentation for masking unbounded depth regions.
This repository contains the inference and evaluation code that produced the numbers in our paper, plus a Gradio demo and helpers to export the predicted geometry as a coloured point cloud.
- 2026-05-27 — arXiv preprint and project website go live.
- 2026-05-26 — code, three pretrained checkpoints (prs-eth/PaGeR, prs-eth/PaGeR-metric-depth, prs-eth/PaGeR-normals) and both datasets (prs-eth/ZuriPano, prs-eth/PanoInfinigen) are live on the Hub.
- Release status
- Installation
- Pretrained models
- Gradio demo
- Batch inference
- Evaluation
- Point-cloud export
- Citation
- License
PaGeR is tested with Python 3.10, PyTorch ≥ 2.0, and CUDA ≥ 12.1
on Linux. Other configurations likely work; these are the ones we run in CI.
PaGeR always projects the input panorama into a fixed 6 × 504 × 504
cubemap before the forward pass, regardless of the input ERP resolution, so
peak VRAM and runtime do not scale with the input size (only the final
cubemap-to-ERP stitch does). At that fixed resolution the unified checkpoint
(backbone + depth/normals/sky/scale heads + CLIP router) needs
≈ 11.5 GiB of VRAM at fp16 on an RTX 4090, so any 3090 / 4090 /
A4000-class GPU with ≥ 12 GB is enough; the depth-only and normals-only
checkpoints fit in ≈ 9.8 GiB.
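To illustrate why the model input has the same shape for any panorama size, here is a minimal NumPy sketch of an ERP-to-cubemap resampling. It is not the repository's projection code: the function name `erp_to_cubemap`, the face order, the axis convention (x right, y up, z forward) and the nearest-neighbour sampling are all assumptions made for the example.

```python
import numpy as np

FACE_RES = 504  # fixed cubemap face size stated above

# Assumed face order and axes (x right, y up, z forward); the repo may differ.
# u grows to the right and v grows downwards on each face, both in [-1, 1].
_FACES = {
    "front": lambda u, v: (u, -v, np.ones_like(u)),
    "right": lambda u, v: (np.ones_like(u), -v, -u),
    "back": lambda u, v: (-u, -v, -np.ones_like(u)),
    "left": lambda u, v: (-np.ones_like(u), -v, u),
    "up": lambda u, v: (u, np.ones_like(u), v),
    "down": lambda u, v: (u, -np.ones_like(u), -v),
}


def erp_to_cubemap(erp: np.ndarray, face_res: int = FACE_RES) -> np.ndarray:
    """Nearest-neighbour resample of an (H, W, C) ERP image to (6, R, R, C)."""
    h, w = erp.shape[:2]
    # Pixel-centre coordinates on the face plane.
    t = (np.arange(face_res) + 0.5) / face_res * 2.0 - 1.0
    u, v = np.meshgrid(t, t)
    faces = []
    for make_dirs in _FACES.values():
        x, y, z = make_dirs(u, v)
        lon = np.arctan2(x, z)               # [-pi, pi], 0 = forward
        lat = np.arctan2(y, np.hypot(x, z))  # [-pi/2, pi/2], positive = up
        col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
        row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
        faces.append(erp[row, col])
    return np.stack(faces)


if __name__ == "__main__":
    for shape in [(512, 1024, 3), (2048, 4096, 3)]:
        cube = erp_to_cubemap(np.zeros(shape, dtype=np.uint8))
        print(shape, "->", cube.shape)
```

Both input sizes produce a `(6, 504, 504, 3)` array, so anything downstream of this step sees a constant-size input.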
git clone https://github.com/prs-eth/PaGeR.git
cd PaGeR
# Editable install — exposes `depth_anything_3` as an importable package so
# the backbone's internal absolute imports resolve. The rest of the code
# (Pager, dataloaders, eval scripts) is run directly from the repo root.
pip install -e .
# Gradio demo (optional)
pip install -e ".[app]"

If you hit xFormers wheel issues on older GPUs, see the upstream FAQ for build instructions.
Three checkpoints on the Hub, same ViT-Giant backbone, different heads:
| Checkpoint | --checkpoint alias | SI depth | Metric depth | Normals | Sky |
|---|---|---|---|---|---|
| PaGeR (recommended) — prs-eth/PaGeR | pager | ✅ | ✅ (SI × CLIP-routed indoor / outdoor scale head) | ✅ | ✅ |
| PaGeR-Metric-Depth — prs-eth/PaGeR-metric-depth | pager-metric-depth | — | ✅ (single direct head) | — | — |
| PaGeR-Normals — prs-eth/PaGeR-normals | pager-normals | — | — | ✅ | — |
inference.py / app.py accept the alias, a Hub repo id, or a local
directory with model.safetensors + config.yaml; Hub weights are streamed
into the HF cache on first use.
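As a rough illustration of the three accepted forms, the sketch below classifies a `--checkpoint` value. The helper `resolve_checkpoint` and its precedence (alias first, then local directory, then Hub repo id) are hypothetical and not taken from inference.py; only the alias names and repo ids come from the table above.

```python
from pathlib import Path

# Aliases and repo ids as listed in the checkpoint table.
ALIASES = {
    "pager": "prs-eth/PaGeR",
    "pager-metric-depth": "prs-eth/PaGeR-metric-depth",
    "pager-normals": "prs-eth/PaGeR-normals",
}
REQUIRED_FILES = ("model.safetensors", "config.yaml")


def resolve_checkpoint(spec: str) -> tuple[str, str]:
    """Return ("hub", repo_id) or ("local", directory) for a --checkpoint value.

    Hypothetical helper: the real scripts may resolve in a different order.
    """
    if spec in ALIASES:
        return "hub", ALIASES[spec]
    path = Path(spec)
    if path.is_dir():
        missing = [name for name in REQUIRED_FILES if not (path / name).is_file()]
        if missing:
            raise FileNotFoundError(f"{spec} is missing: {', '.join(missing)}")
        return "local", str(path)
    # A Hub repo id has the form <user>/<repo>.
    parts = spec.split("/")
    if len(parts) == 2 and all(parts):
        return "hub", spec
    raise ValueError(f"Cannot interpret checkpoint spec: {spec!r}")


if __name__ == "__main__":
    print(resolve_checkpoint("pager"))
    print(resolve_checkpoint("prs-eth/PaGeR-normals"))
```

Checking aliases first keeps a stray local folder named `pager` from shadowing the released checkpoint; that ordering is a design choice of this sketch.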
Heads up: indoor / outdoor scale routing. On the unified checkpoint,
--scene_mode auto (the default) runs a small CLIP ViT-B/32 classifier on the cubemap and routes each panorama through the matching indoor or outdoor scale head. Pass --scene_mode indoor or --scene_mode outdoor to force one head. Automatic routing was added after the paper submission: the paper numbers were produced with the scene mode forced per dataset, so use those flags to reproduce the reported per-domain results exactly.
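The routing can be pictured as choosing one scalar per panorama and multiplying the scale-invariant depth by it. The sketch below is illustrative only: the function `to_metric_depth`, the `p_indoor` probability argument, the 0.5 decision threshold and the example scale values are assumptions, not the repository's API.

```python
import numpy as np


def to_metric_depth(si_depth, scales, scene_mode="auto", p_indoor=None):
    """Multiply SI depth by one per-panorama scale chosen by the scene mode.

    si_depth: (H, W) scale-invariant depth.
    scales:   {"indoor": float, "outdoor": float}, one scalar per scale head.
    p_indoor: classifier probability of "indoor" (hypothetical input, only
              needed when scene_mode == "auto").
    """
    if scene_mode == "auto":
        if p_indoor is None:
            raise ValueError("auto mode needs the classifier output p_indoor")
        domain = "indoor" if p_indoor >= 0.5 else "outdoor"  # assumed threshold
    elif scene_mode in ("indoor", "outdoor"):
        domain = scene_mode  # forced head, as used for the paper numbers
    else:
        raise ValueError(f"unknown scene_mode: {scene_mode!r}")
    return si_depth * scales[domain], domain


if __name__ == "__main__":
    si = np.array([[1.0, 2.0]])
    scales = {"indoor": 2.5, "outdoor": 40.0}  # made-up example values
    print(to_metric_depth(si, scales, "auto", p_indoor=0.9))
    print(to_metric_depth(si, scales, "outdoor"))
```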
python app.py --checkpoint pager  # or prs-eth/PaGeR / a local dir

Open http://127.0.0.1:7860 and drop a panorama into the file picker. The
demo includes example panoramas, switches between map and point-cloud
output, and exposes the same auto / indoor / outdoor scale-head routing as
the CLI. A hosted version is also available on the
PaGeR HuggingFace Space.
inference.py runs the model on every panorama in a chosen evaluation
dataset and writes raw predictions plus side-by-side previews:
python inference.py \
--config configs/inference.yaml \
--checkpoint <model_name> \
--data_path /path/to/datasets \
--dataset <dataset_name> \
--results_path results/ \
--scene_mode auto \
--generate_eval

Key flags:
- --checkpoint: which model to run. Accepts a short alias for one of the released checkpoints, namely pager (default, prs-eth/PaGeR), pager-metric-depth (prs-eth/PaGeR-metric-depth) or pager-normals (prs-eth/PaGeR-normals); a HuggingFace Hub repo id (<user>/<repo>); or a local directory containing model.safetensors + config.yaml.
- --scene_mode {auto,indoor,outdoor}: controls the scale-head routing. On the unified checkpoint auto is the default; on single-domain checkpoints this flag has no effect.
- --generate_eval: also write per-sample .npz arrays under <results>/<modality>/preds/ so the evaluation scripts can pick them up.
- --sky_mask_threshold, --sky_mask_softness, --sky_mask_open_kernel: tune the soft sky-fill applied to the depth / normals outputs.
Expected dataset layout under --data_path:
<data_path>/
├── Matterport3D360/
├── Stanford2D3DS/
├── Structured3D/
└── Replica360_4K/
Download / access pages for each dataset:
| Dataset | Source | Use in PaGeR |
|---|---|---|
| Matterport3D360 | re-projected from Matterport3D | eval only |
| Stanford 2D-3D-S | Armeni et al., Stanford | eval only |
| Replica360_4K | 4K equirectangular renders of Replica (Facebook Research) | eval only |
| Structured3D | Zheng et al. | training + eval (normals) |
| ZüriPano | released with PaGeR | eval |
| PanoInfinigen | released with PaGeR (Infinigen / iCity renders) | training |
The three eval-only datasets (Matterport3D360, Stanford2D3DS, Replica360_4K)
are gated behind the upstream EULAs; please obtain them from the linked
source pages. See NOTICE for the per-dataset licenses and
obligations, and each dataloader in dataloaders/ for the
exact on-disk layout it expects (image / depth / mask / normals naming).
evaluation/depth_evaluation.py scores the cached depth predictions against
ground truth, on Matterport3D360, Stanford2D3DS or ZuriPano. Run
it once per dataset:
# Metric depth (in-metres, no alignment).
python evaluation/depth_evaluation.py \
--data_path /path/to/datasets \
--dataset <dataset-name> \
--pred_path results \
--alignment_type metric
# Scale-invariant depth (least-squares scale alignment).
python evaluation/depth_evaluation.py \
--data_path /path/to/datasets \
--dataset <dataset-name> \
--pred_path results \
--alignment_type scale

Reported metrics: AbsRel, RMSE (linear), δ₁, averaged uniformly
over the valid ERP pixels. Results land in
<pred_path>/depth/evaluation_metrics_<alignment>.txt.
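For reference, the three metrics and the least-squares scale alignment can be sketched in a few lines of NumPy. This uses the standard definitions (δ₁ as the fraction of pixels with max(pred/gt, gt/pred) < 1.25) and a plain mean over valid pixels; the function names are illustrative, and the exact masking and clipping in evaluation/depth_evaluation.py may differ.

```python
import numpy as np


def lstsq_scale(pred: np.ndarray, gt: np.ndarray) -> float:
    """Scalar s that minimises ||s * pred - gt||^2 (closed form)."""
    return float(np.dot(pred, gt) / np.dot(pred, pred))


def depth_metrics(pred, gt, valid, alignment_type="metric"):
    """AbsRel, linear RMSE and delta_1 over the valid pixels of an ERP map."""
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    if alignment_type == "scale":
        p = p * lstsq_scale(p, g)  # scale-invariant evaluation
    elif alignment_type != "metric":
        raise ValueError(f"unknown alignment_type: {alignment_type!r}")
    ratio = np.maximum(p / g, g / p)
    return {
        "AbsRel": float(np.mean(np.abs(p - g) / g)),
        "RMSE": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
    }


if __name__ == "__main__":
    gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # 0 marks a missing GT pixel
    pred = np.array([[0.5, 1.0], [2.0, 1.0]])  # half the true depth
    valid = gt > 0
    print(depth_metrics(pred, gt, valid, "metric"))
    print(depth_metrics(pred, gt, valid, "scale"))
```

In this toy case the prediction is off by a constant factor of two, so the metric evaluation penalises it while the scale-aligned evaluation does not.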
For surface normals on Structured3D:
python evaluation/normals_evaluation.py \
--data_path /path/to/datasets \
--dataset Structured3D \
--pred_path results

To quantify cubemap-stitching artefacts in the depth predictions (Table 4a in the paper) on Replica360_4K:
python evaluation/seams_evaluation.py \
--data_path /path/to/datasets \
--dataset Replica360_4K \
--pred_path results

The script reports three metrics: seam_defect_density, seam_prevalence and
seam_severity. See the appendix of our paper for the exact definitions.
python generate_point_cloud.py \
--data_path /path/to/datasets \
--dataset <dataset-name> \
--depth_path results \
--color_modality rgb \
--max_points 1000000

GLBs land under <depth_path>/depth/point_clouds/ (or
<depth_path>/normals/point_clouds/ when colouring by predicted normals).
Pass --color_modality normals to colour by predicted normals.
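The geometry behind the export can be sketched as a back-projection of each ERP pixel along its viewing ray. The example below assumes the depth map stores distance along the ray (not z-depth), uses an assumed axis convention (x right, y up, z forward), and caps the point count by random subsampling; the function name and all three choices are illustrative, and generate_point_cloud.py may do this differently.

```python
import numpy as np


def erp_depth_to_points(depth, rgb=None, max_points=1_000_000, seed=0):
    """Back-project an (H, W) ERP range map to an (N, 3) point array."""
    h, w = depth.shape
    # Longitude / latitude of every pixel centre.
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit ray directions on the sphere.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    valid = np.isfinite(depth) & (depth > 0)  # drop sky / missing pixels
    points = dirs[valid] * depth[valid][:, None]
    colors = rgb[valid] if rgb is not None else None
    if len(points) > max_points:
        keep = np.random.default_rng(seed).choice(
            len(points), size=max_points, replace=False
        )
        points = points[keep]
        colors = colors[keep] if colors is not None else None
    return points, colors


if __name__ == "__main__":
    depth = np.full((256, 512), 2.0)
    pts, _ = erp_depth_to_points(depth, max_points=10_000)
    print(pts.shape, np.linalg.norm(pts, axis=1).max())
```

A constant range map back-projects to a sphere, which is a quick sanity check for the direction vectors.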
If you use PaGeR, the released checkpoints, or the ZüriPano / PanoInfinigen datasets in your work, please cite:
@article{bozic2026pager,
title = {Unified Panoramic Geometry Estimation via Multi-View Foundation Models},
author = {Bozic, Vukasin and Slavkovic, Isidora and Narnhofer, Dominik and
Metzger, Nando and Rozumny, Denis and Schindler, Konrad and
Kalischek, Nikolai},
journal = {arXiv preprint arXiv:2605.26368},
year = {2026}
}

PaGeR is released under separate licenses per artifact class (source code, model weights, datasets), because the training and modeling stack is not uniform.
| Artifact | License | File |
|---|---|---|
| Source code (this repository) | Apache License 2.0 | LICENSE |
| Pretrained model weights (HuggingFace prs-eth/PaGeR*) | CC BY-NC 4.0 — academic / non-commercial only | LICENSE-MODEL |
| ZüriPano dataset | CC BY 4.0 | (on the dataset HF repo) |
| PanoInfinigen — nature split | BSD 3-Clause (inherited from Infinigen) | (on the dataset HF repo) |
| PanoInfinigen — indoor split | BSD 3-Clause (inherited from Infinigen) | (on the dataset HF repo) |
| PanoInfinigen — urban split | CC BY-NC 4.0 (iCity-encumbered) | (on the dataset HF repo) |
The non-commercial restriction on the weights is inherited from the
Depth Anything 3
ViT-Giant backbone (CC BY-NC 4.0) and from the non-commercial / research-only
terms of the PanoInfinigen urban split. See NOTICE for the full
third-party attribution list and per-dataset breakdown.
PaGeR builds on top of
Depth Anything 3 —
the backbone and the multi-view inference code under
src/depth_anything_3/ are derived from that
project. The training script is adapted from
Marigold-E2E-FT.
The indoor/outdoor classifier uses OpenAI CLIP via
open_clip.
