A tool for downloading and processing wildfire-related geospatial data from multiple sources for machine learning and analysis.
Software DOI: 10.5281/zenodo.20743742.
| Dataset | Description | Native Spatial Resolution | Temporal Resolution | Feature Name(s) |
|---|---|---|---|---|
| FEDS-MTBS | Fire perimeter polygons and active firelines tracked from clusters of satellite active-fire detections | 375 m | 12-hourly | burn_perimeter, fireline |
| VIIRS Active Fire | Fire Radiative Power (MW) reported at detected hotspot pixels, split by day vs. night overpass | 375 m | ~2 overpasses/day | frp_daytime, frp_nighttime |
| FEDS × VIIRS Active Fire (derived) | Maximum FRP from nearby hotspots painted onto each FEDS fireline segment | 375 m | 12-hourly | fireline_max_frp |
| 3DEP | Bare-earth elevation, plus a colored hill-shade RGB visualization derived from it | 1 m | Static | elevation, terrain_rgb |
| LANDFIRE | Canopy fuel layers: canopy bulk density and percent canopy cover | 30 m | Static | canopy_bulk_density, canopy_cover |
| NIFC IFPH (recent burns) | Most-recent burn year per pixel, rasterized from NIFC InteragencyFirePerimeterHistory perimeters that intersect the AOI in the prior N years (default 5; same-year fires only included if contained before the current event's t_start) |
task grid (default 30 m) | Updated ~monthly | recent_burn |
| HRRR | Near-surface weather forecast fields: 2 m relative humidity and 10 m wind components | 3 km | Hourly | r2, u10, v10 |
| Global Building Atlas | Per-building height estimates rasterized to a regular grid | 3 m | Static | building_height |
| WorldCover | Global land cover classification (11 classes) | 10 m | Static | landcover |
| Global LAI | Leaf Area Index retrieved from Sentinel-2 surface reflectance | 10 m | Single global snapshot | lai |
| Sentinel-2 Cloudless Mosaic | Cloud-free RGB composite built from Sentinel-2 surface-reflectance scenes | 10 m | Annual composite | sentinel2_rgb |
| Global WUI | Wildland–Urban Interface classes from a buildings × wildland-vegetation overlay | 10 m | Static | wui |
Requires Python 3.12+ (tested on 3.14) and uv. FireDataForge is run from source (it has no installed console script), so get the code and create the environment with:
git clone https://github.com/xiazeyu/FireDataForge.git
cd FireDataForge
uv syncuv sync creates a local .venv with every dependency. The python main.py …
and python plot.py … commands throughout this README assume that environment
is active — either activate it once per shell:
source .venv/bin/activate # Windows: .venv\Scripts\activate…or prefix each command with uv run (e.g. uv run python main.py <event_id>)
to run it in the project environment without activating.
The first time you run FireDataForge (or anytime via python main.py --setup),
a short interactive wizard configures the optional credentials and saves them to
a local .env file. Everything is optional — skip any step and the
wizard tells you exactly which output features become unavailable.
python main.py --setupWizard walkthrough — what it does and how to use it
Settings are read from .env, but values set in the real environment always take
precedence, so you can override per-run, e.g. FIRMS_MAP_KEY=... python main.py ....
| Credential / dependency | Stored as | Unlocks | If unavailable |
|---|---|---|---|
| Earth Engine auth + project | earthengine creds + project from earthengine set_project (optional EARTHENGINE_PROJECT override in .env) |
elevation, terrain_rgb, canopy_bulk_density, canopy_cover, building_height, landcover, lai, sentinel2_rgb |
those layers are skipped; all others still run |
| NASA FIRMS MAP_KEY | FIRMS_MAP_KEY |
frp_daytime, frp_nighttime for any year (streamed from the FIRMS Area API) |
pre-2025 events fall back to the bundled FEDS firepix archive; otherwise FRP is skipped |
| FEDS-MTBS archive (Zenodo) | optional full archive under datasets/FEDS25MTBS/ |
burn_perimeter, fireline, fireline_max_frp, perimeter-masked FRP, and the tightest fire window |
auto-streamed per fire from Zenodo into cache/ at run time; set FIREDATAFORGE_LAZY_FETCH=0 to disable, then those layers skip and FRP comes from FIRMS unmasked |
| (none) | — | wui, recent_burn, hrrr |
no credential needed; fetched from public services at run time (skipped fail-soft if a service is unreachable; hrrr only covers events on/after 2014-09-30) |
The pipeline is fail-soft: a missing dependency, an unauthenticated service, or
a server under maintenance only disables the layers that need it — every other layer
is still produced, a warning is logged, and the reason is written to the per-event
task_summary.json.
For the FEDS-MTBS and Global WUI datasets the wizard also asks how you want the
data on disk: full download (the whole archive is fetched and unpacked into
datasets/, trading disk for bandwidth on repeated runs) or on-the-fly (only
the bits each fire needs are fetched into cache/ as you go). Either way nothing
needs to be pre-staged by hand.
Two on-disk buckets (both optional):
datasets/— user-managed. Full archives you deliberately download (via the wizard or by hand). FireDataForge only writes here on an explicit user-requested download.cache/— software-managed. Everything fetched on the fly (per-fire GeoPackages, FIRMS slices, HRRR GRIBs, WUI tiles, the MTBS fire list) lands here. Readers checkdatasets/first, thencache/. Deletecache/anytime to reclaim space; it is transparently re-fetched. Deletedatasets/too to reclaim more (then everything streams on demand again).
The wizard's final step lets you choose where each fire's name + acreage come
from — the FEDS-MTBS fire list (Zenodo, FEDS-aligned acreage, 2012–2024, ~280 MB),
a built-once offline MTBS fire list (python main.py --build-firelist,
mtbs.gov, more recent), or live on-the-fly lookups — since the GeoPackage already
supplies everything else. See the
resolution table for the tradeoffs.
Manual setup — configure credentials yourself instead of using the wizard
Google Earth Engine
# Authenticate (one-time, opens browser)
earthengine authenticate
# Set your project ID
earthengine set_project <YOUR-PROJECT-ID>To get a project ID:
- Go to Google Cloud Console
- Create or select a project with Earth Engine enabled
- Copy the project ID
NASA FIRMS API Key (for VIIRS active fire)
Active-fire data is streamed on demand from the NASA
FIRMS Area API — only the
points inside each event's bounding box and date window are downloaded. Get a free MAP_KEY and either let the
wizard save it, export it, or add it to .env:
# Request a key (instant, free): https://firms.modaps.eosdis.nasa.gov/api/map_key/
export FIRMS_MAP_KEY=your-map-key-here
# ...or in .env: FIRMS_MAP_KEY=your-map-key-hereIf no FIRMS key is configured, the pipeline falls back to bundled archive CSVs in
datasets/FIRMS/ (see FIRMS / VIIRS Active Fire Dataset).
Pre-2025 events use FEDS firepix and need no key.
FEDS-MTBS archive (for burn_perimeter, fireline, fireline_max_frp)
No credential needed — each fire is streamed on demand. To pre-stage the examples or download the full archive, see FEDS-MTBS Dataset.
# Download the example fires from Zenodo
python main.py --fetch-examples
# Process a single fire event given its MTBS Event ID
python main.py CA3432611848120191010The argument is an MTBS Event ID — FireDataForge's canonical event key. Any
valid ID works (unrecognized IDs are resolved live from mtbs.gov, no archive
needed); see Looking up an MTBS Event ID and
Event ID Format for the character scheme.
CA3432611848120191010 is the 2019 Saddleridge fire, one of the bundled
example events.
python main.py <event_id> [options]# From comma-separated list
python main.py --batch CA123,CA456,CA789 [options]
# From a file (one event ID per line)
python main.py --batch events.txt [options]Options
| Option | Description | Default |
|---|---|---|
--batch |
Batch mode: file path or comma-separated event IDs | - |
--setup |
Run the interactive credential wizard and exit | - |
--build-firelist |
Download the full MTBS archive to the offline cache and exit | - |
--fetch-examples |
Download the example fires from the Zenodo reproducibility artifact (doi:10.5281/zenodo.20743743; into datasets/FEDS25MTBS/ + events.txt) and exit |
- |
-w, --workers |
Events processed in parallel in batch mode | 1 |
--layer-workers |
Concurrent layer downloads within a single event | 5 |
-r, --resolution |
Spatial resolution (meters) | 30 |
-b, --buffer |
Buffer around fire bounds (meters) | 100 |
-c, --crs |
Target coordinate reference system (must be a projected/metric CRS — see note below) | EPSG:5070 |
-o, --output_dir |
Output directory | output |
-t, --interpolation |
Intermediate frames between timesteps | 0 |
--cache_dir |
Root directory for all on-the-fly downloads (HRRR, FIRMS, FEDS, firepix, WUI, fire list); each caches under its own fixed subfolder | cache |
--only |
Only process specific feature(s), comma-separated | all |
-v, --verbose |
Enable verbose logging | False |
| Feature | Description |
|---|---|
burn_perimeter |
Fire perimeter time series from FEDS |
fireline |
Active fireline derived from consecutive perimeter differences |
fireline_max_frp |
Per-pixel maximum FRP along the fireline |
frp_daytime |
Daytime Fire Radiative Power (NASA FIRMS / FEDS firepix) |
frp_nighttime |
Nighttime Fire Radiative Power (NASA FIRMS / FEDS firepix) |
elevation |
USGS 3DEP elevation |
canopy_bulk_density, canopy_cover |
LANDFIRE canopy fuel layers (alias: landfire) |
recent_burn |
Most-recent burn year per pixel from NIFC InteragencyFirePerimeterHistory (default 5-yr lookback; lookback recorded in the layer's note) |
building_height |
Global Building Atlas heights |
landcover |
ESA WorldCover classification |
lai |
Leaf Area Index |
sentinel2_rgb |
Sentinel-2 cloudless RGB mosaic |
terrain_rgb |
Colored shaded-relief terrain (Google-Maps style, RGB) |
wui |
Wildland-Urban Interface classification |
r2, u10, v10 |
HRRR weather: relative humidity + 10 m wind components (alias: hrrr) |
Each name above is an output file stem. The convenience aliases landfire and
hrrr select all of their respective layers.
Target CRS must be projected/metric.
--resolutionand--bufferare in meters, and the grid snaps the projected bounds to whole multiples of the resolution. A geographic CRS in degrees such asEPSG:4326is therefore not supported and is rejected up front with a clear error, since meter-valued settings applied in degrees would produce a nonsensical grid. Use a projected, meter-based CRS: the defaultEPSG:5070(CONUS Albers), a UTM zone, or similar.
Examples
# High resolution processing
python main.py CA3432611848120191010 -r 10 -v
# With temporal interpolation (3 intermediate frames)
python main.py CA3432611848120191010 -t 3
# Custom output directory
python main.py CA3432611848120191010 -o ./my_output
# Batch process from file with 4 workers
python main.py --batch events.txt -w 4 -o results/
# Batch process specific events
python main.py --batch CA123,CA456,CA789 --workers 3
# Process only a single feature (for quick debugging)
python main.py CA3432611848120191010 --only frp_daytime
# Process multiple specific features
python main.py CA3432611848120191010 --only frp_daytime,frp_nighttime,elevation
# Regenerate only weather data
python main.py CA3432611848120191010 --only hrrrThe batch summary is saved to output/batch_summary.json.
Data is saved as .npy files in output/<event_id>/:
output/CA3432611848120191010/
├── task_summary.json # Per-layer outcome (ok / skipped / failed) + reasons + metadata
├── task_info.npy # Processing configuration
├── coordinates.npy # Pixel-center x/y coordinates + CRS for the grid
├── burn_perimeter.npy # Fire perimeter time series
├── fireline.npy # Active fireline time series (perimeter differences)
├── fireline_max_frp.npy # Max FRP painted onto each fireline segment
├── frp_daytime.npy # Daytime Fire Radiative Power (MW)
├── frp_nighttime.npy # Nighttime Fire Radiative Power (MW)
├── elevation.npy # Terrain elevation
├── canopy_bulk_density.npy # Canopy Bulk Density
├── canopy_cover.npy # Canopy Cover
├── recent_burn.npy # Most-recent burn year per pixel (NIFC IFPH, NaN = unburned)
├── r2.npy # Relative humidity
├── u10.npy # Wind U component
├── v10.npy # Wind V component
├── building_height.npy # Building heights
├── landcover.npy # Land cover classes
├── lai.npy # Leaf Area Index
├── sentinel2_rgb.npy # RGB Sentinel-2 cloudless mosaic
├── terrain_rgb.npy # Colored shaded-relief terrain RGB (H, W, 3)
└── wui.npy # Wildland-Urban Interface classification
Only the layers that were successfully produced are written; any that were skipped
or failed are omitted from the directory but always recorded in
task_summary.json with the reason.
Task summary (task_summary.json) — per-layer outcome schema
Every event writes a task_summary.json describing the run and the outcome of each
layer, so a partial run is self-documenting:
{
"event_id": "CA3432611848120191010",
"name": "SADDLERIDGE", "year": 2019, "status": "partial",
"crs": "EPSG:5070", "resolution_m": 30, "shape": [275, 377],
"t_start": "2019-10-01T12:00:00", "t_end": "2019-10-16T12:00:00",
"t_end_estimated": true,
"has_feds_archive": false, "earth_engine": true, "firms_key": true,
"notes": ["t_end is an estimate (t_start + 15 days): no FEDS perimeter ..."],
"layers": {
"elevation": {"status": "ok", "files": ["elevation.npy"]},
"frp_daytime": {"status": "ok", "files": ["frp_daytime.npy"], "n_frames": 7},
"burn_perimeter": {"status": "skipped", "reason": "no local FEDS archive"},
"wui": {"status": "failed", "reason": "..."}
},
"counts": {"ok": 2, "skipped": 1, "failed": 1}
}status is "ok" when all layers succeed, "partial" when any failed, or
"error" if the event itself could not be resolved. Batch runs additionally write
an aggregated output/batch_summary.json.
Grid coordinates (coordinates.npy) — georeferencing the saved arrays
Every event directory also contains coordinates.npy, which stores the
pixel-center coordinates of the common output grid together with the CRS.
Almost all raster layers (elevation.npy, frp_*.npy, wui.npy, ...) are
sampled on this exact grid, so this file is the single source of truth for
georeferencing the arrays — useful for wrapping outputs into xarray
DataArrays or re-projecting them when preparing publication figures.
Weather layers are the exception.
r2.npy,u10.npy, andv10.npy(HRRR) share the sameboundsandcrsbut sit on a coarser ~500 m grid (HRRR is ~3 km natively; resampling an hourly series to 30 m would bloat the files for no added detail). Each records its grid size incurrent_resolution; reconstruct their grid from the sharedboundsand the array's own shape, not fromcoordinates.npy.
from main import load_numpy
coords = load_numpy('output/CA3432611848120191010/coordinates.npy')
x, y = coords.data # 1-D arrays, shape (width,) and (height,)
geo = coords.georeference # typed GeoReference (see schemas.py)
crs = geo.crs # e.g. 'EPSG:5070'
crs_wkt = geo.crs_wkt # full WKT2 (works without an EPSG db)
crs_proj4 = geo.crs_proj4 # legacy PROJ string
bounds = geo.bounds # (minx, miny, maxx, maxy)
height, width = geo.shape
a, b, c, d, e, f = geo.transform # affine: (col, row) -> (x, y)For standard EPSG codes, geo.crs alone is enough for pyproj /
rasterio / cartopy. The extra crs_wkt and crs_proj4 fields are
included so the file is fully self-describing — useful for archival,
custom CRSes (e.g. EQUI7, HRRR Lambert), or environments without a PROJ
database.
y is ordered top-to-bottom (north → south) to match the row order of the
saved rasters.
Array shapes & dtypes — what each .npy holds
Every .npy (except task_info.npy) loads via load_numpy into a DataLayer
whose .data is always a list of frames (schemas.py). The frame rank and
the number of frames follow the layer's type:
| Layer kind | Layers | len(data) |
Frame shape | dtype |
|---|---|---|---|---|
| Static raster (integer) | elevation, canopy_bulk_density, canopy_cover |
1 (timestamps == [t_start]) |
(H, W) |
int16 |
| Static raster (float) | lai, building_height, recent_burn |
1 | (H, W) |
float (32/64) |
| Static categorical | landcover, wui |
1 | (H, W) |
int (landcover int16, wui uint8) |
| Time-varying mask | burn_perimeter, fireline |
T frames (len(data) == len(timestamps)) |
(H, W) |
bool |
| Time-varying raster | fireline_max_frp, frp_daytime, frp_nighttime |
T frames |
(H, W) |
float |
| Time-varying raster | r2, u10, v10 (HRRR) |
T frames |
(h, w) (coarser grid) |
float |
| RGB visualization | sentinel2_rgb, terrain_rgb |
1 | (H, W, 3) |
uint8 |
| Coordinates | coordinates |
2 | [x: (W,), y: (H,)] |
float |
(H, W) is the common grid in coordinates.npy for every layer except the
HRRR fields (r2/u10/v10), which sit on a coarser (h, w) grid recorded in
their current_resolution — derive it from the shared bounds and the array's
own shape. For time-varying layers data[i] is the frame observed at
timestamps[i] (the pairing the temporal cursor relies on; see
temporal alignment). Per-pixel nodata
sentinels are in the same section's table. Each layer also carries, inside its
DataLayer, native_resolution (the source's true resolution, in meters),
current_resolution (the grid it is sampled on, in meters), unit,
source, and — for categorical layers — categories.
# Plot everything (default): the combined overview grid, one PNG per channel,
# and one time-series figure per multi-frame layer
python plot.py CA3432611848120191010Other plotting options
# Only the combined overview grid
python plot.py CA3432611848120191010 --mode overview
# Only one PNG per channel
python plot.py CA3432611848120191010 --mode channels
# Only the time-series figures (one per multi-frame layer)
python plot.py CA3432611848120191010 --mode timeseries
# Overview grid + per-channel PNGs, but skip time series
python plot.py CA3432611848120191010 --mode both
# Also write the overview as PDF (PNG only by default)
python plot.py CA3432611848120191010 --pdf
# Plot and display interactively
python plot.py CA3432611848120191010 --show
# Plot the time series for ONLY one layer (e.g., burn perimeters)
python plot.py CA3432611848120191010 -t burn_perimeter
# Batch plot multiple events
python plot.py --batch events.txt
# Batch plot from comma-separated list
python plot.py --batch CA123,CA456,CA789from main import load_numpy
# Load a data file
data = load_numpy('output/CA3432611848120191010/elevation.npy')
print(data.name) # 'elevation'
print(data.data[0].shape) # (height, width)
print(data.unit) # 'm'The output format is a small, stable API defined in schemas.py —
the typed dataclasses every layer deserializes into. load_numpy returns a
DataLayer, the universal envelope wrapping a layer's frames, timestamps, unit,
categories, and (for the grid) a GeoReference; FireEvent, ProcessingTask,
and ProcessingArgs describe the event and run configuration, and
SCHEMA_VERSION tags the envelope version.
schemas.py is standard-library only (no rasterio / earthengine / other
geo dependencies), so a third-party application can import — or simply vendor —
this single file to read, type-check, and validate FireDataForge outputs without
installing the rest of the pipeline.
from firedataforge import forge_event, get_fire_info, get_task_info, ProcessingArgs
# One call: resolve, retrieve, harmonize, and write every available layer.
summary = forge_event("CA3432611848120191010", ProcessingArgs(resolution=30))
print(summary["counts"]) # {'ok': ..., 'skipped': ..., 'failed': ...}
# Or step through the resolution stages.
fire = get_fire_info("CA3432611848120191010")
print(f"{fire.name}: {fire.acres_burned} acres")
task = get_task_info(fire, resolution=30)
print(f"Grid: {task.shape}, CRS: {task.crs}")-
examples/ml_dataloader.py— a PyTorchDatasetthat stacks the static layers of each event into a(C, H, W)tensor (python examples/ml_dataloader.py [output]). -
examples/fire_spread_demo.py— a self-contained NumPy fire-spread automaton that consumes the harmonized terrain/fuel/wind layers and shows a 3-frame progression with matplotlib (python examples/fire_spread_demo.py [output/<event_id>]).Both are written as notebook-style
# %%cell scripts — run them top-to-bottom, or open them in a Jupyter/VS Code interactive window. -
validation/— quantitative checks (reprojection round-trip error, FRP conservation, categorical overall accuracy, continuous RMSE, sub-pixel registration); seevalidation/README.md.
Repository layout
The implementation lives in the firedataforge/ package; main.py is a thin CLI
entry point that re-exports the public API.
main.py CLI entry point + backward-compatible import surface
schemas.py public data contract — stdlib-only output dataclasses
plot.py visualization of saved layers
firedataforge/
├── constants.py paths + source-API constants
├── config.py credentials, first-run wizard, dataset discovery
├── events.py fire-list resolution + ProcessingTask (grid + time window)
├── io.py .npy + coordinate persistence
├── pipeline.py fail-soft per-event / batch orchestration + task summaries
├── cli.py argument parsing
├── examples.py --fetch-examples: download the example bundle from Zenodo
├── remote_archive.py on-the-fly range-fetch of FEDS fires/firepix from Zenodo
├── progress.py download + zip-extraction progress helpers
└── sources/ one module per source family
├── mtbs.py MTBS Event-ID → metadata
├── feds.py perimeter / fireline / fireline_max_frp
├── frp.py VIIRS FRP (FIRMS / firepix)
├── gee.py Earth Engine layers (3DEP, LANDFIRE, GBA, WorldCover, LAI, S2, terrain)
├── weather.py HRRR
├── wui.py Global WUI
└── nifc.py recent burns
Every source is reprojected to the target CRS and resampled to the target grid with a single, direction-independent method chosen by the data type — the same method is used whether the native source is finer or coarser than the target grid (there is no conditional on the scale ratio). Categorical layers keep nearest-neighbour deliberately: it introduces no mixed/invented classes and preserves the expected class proportions (majority-vote aggregation would erode thin minority features such as narrow WUI strips).
| Data type | Layers | Method | Why |
|---|---|---|---|
| Continuous raster | elevation, canopy_bulk_density, canopy_cover, lai, r2, u10, v10 |
Bilinear | Smooth fields; bilinear avoids blocky artifacts. When upsampling a coarse source (e.g. 3 km HRRR → 30 m) it interpolates but adds no real detail; when downsampling a fine source (e.g. 1 m DEM → 30 m) it point-samples and discards sub-pixel variance (quantified in validation/). |
| Categorical raster | landcover, wui |
Nearest neighbour | No mixed/invented classes; preserves class proportions. At heavy downsampling, sub-grid features smaller than a pixel may be dropped rather than blended. |
| RGB visualization | sentinel2_rgb, terrain_rgb |
Nearest / bilinear | Display layers; resampling is cosmetic, not analytic. |
| Vector polygon | burn_perimeter, fireline |
Rasterization (all_touched) |
Any pixel the polygon touches is burned in, so thin firelines survive at coarse grids. |
| Vector polygon (weighted) | building_height |
Area-weighted mean | Larger building footprints contribute proportionally more to a pixel's mean height. |
| Point mass | frp_daytime, frp_nighttime |
Mass-preserving Gaussian splat (below) | Conserves total radiative power across the regrid. |
| Point max | fireline_max_frp |
Per-segment nearby max | Keeps the observed peak FRP (MW) intensity rather than a spread share. |
Choosing the target resolution. A fine target grid does not create information that a coarse source lacks — upsampling 3 km HRRR weather to 30 m yields smooth but not genuinely fine fields, so treat such layers as their native resolution despite the grid spacing. Conversely a coarse target grid discards real local variation in fine sources (terrain, fuels, buildings, WUI). Each layer records its true
native_resolution(and, where it differs, itscurrent_resolution) — both in meters — in the saved metadata so the scale mismatch is never hidden.
Each VIIRS active-fire detection reports a Fire Radiative Power (FRP, MW) at a point. Rather than dropping the whole value into one grid cell, the detection's FRP is spread over a Gaussian footprint the size of the VIIRS sensor pixel, so the rasterized field reflects the sensor's true spatial uncertainty while conserving the total radiative power.
For a detection with value F at grid position (pₓ, p_y) (pixel units):
- Footprint width.
σ = (source_resolution / target_resolution) / 2pixels, withsource_resolution = 375 m(VIIRS). At a 30 m grid,σ = 375/30/2 = 6.25 px(= 187.5 m), giving a full-width-at-half-maximumFWHM = 2√(2 ln 2)·σ ≈ 2.355 σ ≈ 441 m. - Kernel extent. Weights are evaluated over a square window of radius
⌈3σ⌉pixels (covering ±3σ, ≈ 99.7 % of the Gaussian mass). - Weights. A pixel whose center is at distance
d(pixels) from the detection getsw = exp(−d² / (2σ²)). - Normalization (conservation). The in-grid weights are normalized to sum to
one and the deposit is
F · w / Σw. Summing the rasterized footprint therefore recoversFexactly for any detection whose footprint lies inside the grid; the only loss is the fraction of a footprint that falls off the grid edge. Each pixel value is thus a share of the detection's FRP (≪ the observed MW), not the observed value itself.
The validation/ suite re-splats each event's detections and reports the
relative conservation error (typically far below 1 %); see
validation/README.md.
FireDataForge degrades gracefully at three levels — a whole layer, an individual timestep, and an individual pixel — and records every gap, so a partial cube is never silently mistaken for a complete one.
Layer level — fail-soft. If a source is unauthenticated, unreachable, or under
maintenance, only the layers that depend on it are dropped; every other layer is
still produced. A dropped layer is omitted from the event directory and recorded
in task_summary.json with status: "skipped"/"failed" and a human-readable
reason (see the Output section's task-summary schema).
Pixel level — per-layer nodata sentinels. There is no universal nodata
value; the sentinel follows the data type, so check this table (and each layer's
unit) before masking:
| Layer(s) | dtype | "No data / absent" value |
|---|---|---|
elevation, canopy_bulk_density, canopy_cover (int16), lai, building_height (float) |
int16 / float | 0 — Earth Engine masks (out-of-coverage, water) are filled with 0 (unmask(0)); these layers carry no NaN, so where the distinction matters treat 0 as "no data" rather than a measured zero |
landcover, wui |
int | 0 — outside the source's coverage; defined classes are ≥ 10 (WorldCover) / 1–8 (WUI) |
recent_burn |
float | NaN = pixel never burned in the lookback window (stated explicitly in the layer unit) |
frp_daytime, frp_nighttime, fireline_max_frp |
float | 0 = no active-fire detection at/near the pixel (FRP is a conserved share of MW, so 0 means "no fire", not "missing") |
burn_perimeter, fireline |
bool | False (0) = outside the perimeter / not on the fireline at that timestep; True (1) = inside |
r2, u10, v10 (HRRR) |
float | dense over CONUS — no per-pixel nodata in practice |
sentinel2_rgb, terrain_rgb |
uint8 | display layers; 0 where the source mosaic had no pixel |
Timestep level — the per-source datetime cursor. Time-varying layers
(burn_perimeter, fireline, fireline_max_frp, frp_*, and the HRRR fields)
store only the frames actually observed, held in DataLayer.data paired
one-to-one with DataLayer.timestamps (schemas.py). Sources observe at different,
irregular cadences — FEDS perimeters/firelines every 12 h, VIIRS at ~2 overpasses/day,
HRRR hourly — and each keeps its own timestamp vector, so cadences are never
resampled onto a shared clock. A consumer reads each layer through a datetime
cursor: at simulation time t, take the frame at the most recent
timestamps[i] <= t for that layer, advancing each source independently — the same
"most-recent observation ≤ t" rule the pipeline itself uses to mask FRP to the live
perimeter (firedataforge/sources/frp.py). Consequences:
- A skipped or missing overpass is a no-op: the previous frame stays current
until the next real observation. The pipeline never forward-fills synthetic values
or fabricates a frame for a cadence it did not observe. (Pass
--interpolation Nto explicitly synthesizeNSDF-interpolated perimeter frames between timesteps.) - The cursor never exposes a not-yet-valid observation: it cannot return a frame
whose timestamp is after
t, so temporal alignment is correct by construction — there is nothing to "synchronize" after the fact. - Out-of-window / empty sources are excluded upstream: every frame is bounded to
the event's active-burning window (
t_start–t_end); a source with zero valid observations in that window degrades to the layer-level fail-soft case above rather than emitting empty frames.
Beyond gaps, every .npy envelope is self-describing: it carries source
(provenance/attribution), unit, native_resolution (the source's resolution in
meters), timestamps, and — for categorical layers — a categories map,
while the grid CRS/transform lives in coordinates.npy. See
Data contract (schemas.py).
Any MTBS Event ID (Monitoring Trends in Burn Severity) is a valid input, and
you do not need the FEDS-MTBS archive to choose or resolve one: unrecognized
IDs are looked up live from mtbs.gov at run time, and
python main.py --build-firelist caches the full MTBS list offline for browsing
and network-free resolution.
If you have the full FEDS-MTBS archive, the summary
fireslist_FEDS25MTBS_2012-2024.geojson (the Event_ID column) is a ready-made
list of all 7,739 FEDS-covered events (2012–2024).
See Event ID Format for the {STATE}{LAT}{LON}{DATE} scheme.
python main.py --fetch-examples stages the eight fires below and writes their IDs
to events.txt at the repo root (ready for --batch events.txt); they are the
events used for the benchmark and the
quantitative validation. The bundle — these eight fires
plus the benchmark and validation reference outputs — is archived on Zenodo as a
citable reproducibility artifact
(doi:10.5281/zenodo.20743743).
They span 2013–2025 and ~9.7k–189k
acres across 5 California (chaparral / coastal, wind-driven) and 3 Colorado
(montane conifer / grassland, terrain-driven) fires, chosen to exercise the pipeline
across diverse fuels, scales, and spread regimes.
| Fire | Year | State | MTBS Event ID | Burned area (ac) | Active-fire window (UTC) | Grid (H×W) |
|---|---|---|---|---|---|---|
| Black Forest | 2013 | CO | CO3901210474920130611 |
11,885 | 2013-06-11 → 06-13 | 369 × 462 |
| Tubbs | 2017 | CA | CA3859812261820171009 |
36,981 | 2017-10-09 → 10-15 | 848 × 681 |
| Spring Creek | 2018 | CO | CO3749610529120180627 |
107,108 | 2018-06-28 → 07-10 | 1194 × 944 |
| Camp | 2018 | CA | CA3982012144020181108 |
153,687 | 2018-11-08 → 11-20 | 1397 × 1448 |
| Saddleridge | 2019 | CA | CA3432611848120191010 |
9,654 | 2019-10-01 → 10-12 | 361 × 388 |
| East Troublesome | 2020 | CO | CO4020310623920201014 |
188,924 | 2020-10-14 → 10-24 | 1016 × 1602 |
| Palisades | 2025 | CA | CA3406811855120250107 |
23,448 † | 2025-01-07 → 01-11 | 572 × 716 |
| Eaton | 2025 | CA | CA3419211810520250108 |
14,021 † | 2025-01-08 → 01-10 | 396 × 519 |
Burned area is the MTBS BurnBndAc. The two 2025 fires (Palisades, Eaton) are not
yet in the MTBS final record, so the value marked † is the MTBS Provisional
Initial Assessment acreage instead (the pre-final figure FireDataForge resolves
these events to; see PROVISIONAL_IA in firedataforge/sources/mtbs.py). It is
not directly comparable to the FEDS VIIRS perimeter area in each fire's
GeoPackage, which runs larger (~31.8k ac Palisades, ~18.6k ac Eaton) because the
375 m active-fire perimeter over-bounds the refined MTBS burn boundary.
The active-fire window is each event's perimeter-growth
period taken from the FEDS progression (observation-derived, not the estimated
fallback); the exact t_start/t_end (including the time of day) are in each
event's task_summary.json. Grid is the output raster size on the default
EPSG:5070 30 m grid (100 m buffer); it scales with -r/-b.
If you know a fire by name, year, or location rather than by ID:
- Open the MTBS Data Explorer (interactive map) or the Direct Download / search page.
- Filter by fire name, year, and state/region (or click the fire on the map).
- Read the Fire ID (a.k.a. Event ID) field of the matching record — e.g.
CA3432611848120191010— and pass it topython main.py <Event_ID>.
Querying by the exact MTBS Event ID (rather than a name or bounding box) is what
makes event matching unambiguous and reproducible: one ID always resolves to the
same fire. To browse offline, run python main.py --build-firelist once and
search the resulting cache/mtbs_firelist.csv.
FEDS-MTBS derives fire perimeters and firelines every 12 hours from VIIRS active-fire hotspots via object-based tracking, constrained to MTBS burn records.
Note: The FEDS-MTBS archive is optional. FireDataForge streams each requested fire on demand, a few hundred KB per fire with no full download — run
python main.py --fetch-examplesto stage the eight example fires, or grab the full archive for heavy offline use. Without it the FEDS layers —burn_perimeter,fireline,fireline_max_frp— are skipped and FRP comes from NASA FIRMS unmasked; every other layer is unaffected.
Dataset details — source, event-ID resolution, setup, and ID format
- Publication: Chen, Y. et al. (2022). California wildfire spread derived using VIIRS satellite observations and an object-based tracking system. Scientific Data. https://doi.org/10.1038/s41597-022-01343-0
- Dataset (used here): FEDS-MTBS, the MTBS-constrained extension from the UCI–UBC–NASA fire-tracking group, published on Zenodo (DOI 10.5281/zenodo.20187962).
- Temporal Coverage: 2012–2024 fire seasons (7,739 fires; FireDataForge also ships two 2025 example fires)
- Resolution: 375 m (VIIRS native resolution)
The overall priority is gpkg › FEDS-MTBS fire list › MTBS fire list › MTBS online. The GeoPackage ranks first but has no fire name, so the name/acreage fall to the first fire-list/online source that has the event, while the gpkg (when present) supplies the bounds and active-burning window. Concretely, fire metadata (name, year, acres, bounds, start/end) is resolved in order, first hit wins:
- FEDS-MTBS fire list (Zenodo) in
datasets/FEDS25MTBS/— the bundledfireslist_examples.csv(FEDS perimeter bbox plus bothtstandtedfor the eight demo fires; preferred when present) or the releasedfireslist_FEDS25MTBS_2012-2024.geojsonGeoPackage (MTBS final-perimeter bboxIg_Datefor all 7,739 fires).
- MTBS fire list (
cache/mtbs_firelist.csv, frommtbs.gov) — a different source: grown from prior live lookups, or pre-built withpython main.py --build-firelist; covers all ~30k MTBS fires; carries the MTBS burn-boundary bbox. - Live MTBS service (
mtbs.gov) + a small Provisional IA supplement for very recent fires — the resolved record is appended to the offline MTBS list. - FEDS GeoPackage only — if none of the above has the event but a local (or
fetchable)
<event_id>.gpkgexists, the run still proceeds with the Event ID as the name and the gpkg's bounds/window.
Whenever a local GeoPackage exists, its perimeter time series and extent take
precedence: t_end comes from the progression (tightest), then the example list's
ted, then an estimate of t_start + 15 days (flagged t_end_estimated: true in
task_summary.json); bounds come from the perimeter extent, then the fire-list
bbox, then the MTBS bbox. So the gpkg alone supplies the window and bounds — the
fire list mainly adds the display name and acreage for offline use.
Because the name/acreage are all the fire list adds, the source is your choice
(the setup wizard's step 5 stages it, or pick by what you place in
datasets/FEDS25MTBS/):
| Source | Acreage | Coverage | Reliability / cost |
|---|---|---|---|
FEDS-MTBS fire list (fireslist_FEDS25MTBS_2012-2024.geojson, Zenodo) |
aligned to FEDS | 2012–2024 only | offline, most reliable; ~280 MB |
MTBS fire list (cache/mtbs_firelist.csv, --build-firelist) |
MTBS (not FEDS-aligned) | all ~30k MTBS fires, more recent | offline once built (~30 s) |
On-the-fly (live mtbs.gov) |
MTBS (not FEDS-aligned) | most up-to-date | per-event network, least reliable |
If none is available, the run still proceeds with the Event ID as the name.
Three ways to get the data, in increasing weight:
- Nothing (default, on-the-fly) — when you forge a fire whose GeoPackage
isn't local, FireDataForge range-fetches just that fire (and, on first use of a
year, that year's firepix CSV) out of the Zenodo archive and caches it under
cache/FEDS25MTBS/<year>/<event_id>.gpkg(the same layout as the archive). Needs network at run time; setFIREDATAFORGE_LAZY_FETCH=0to turn this off for a fully offline/deterministic run. - Examples —
python main.py --fetch-examplesdownloadsexamples.zipand unzips it at the repo root, pulling the eight demo fires up front intodatasets/FEDS25MTBS/(with their firepix, the offlinefireslist_examples.csv, and a top-levelevents.txt). - Full archive — let the wizard download it (or call
download_full_feds_archive()): it pullsFEDS25MTBS.zip(GeoPackages + firepix) intodatasets/FEDS25MTBS/so all fires resolve offline. The fire name/acreage list is a separate, optional choice (it is not in the zip): stagefireslist_FEDS25MTBS_2012-2024.geojsonvia the wizard's step 5 ordownload_feds_firelist(), or skip it and let names come frommtbs.gov/ the Event ID — see How an Event ID is resolved.
The loader checks the user archive (datasets/FEDS25MTBS/) first, then the cache
(cache/FEDS25MTBS/), and searches recursively for <event_id>.gpkg:
datasets/FEDS25MTBS/
├── fireslist_FEDS25MTBS_2012-2024.geojson # Optional: MTBS perimeters + metadata (7,739 fires)
├── fireslist_examples.csv # Example fire metadata (Event_ID, Year, tst, ted, bbox)
├── firepix/ # Per-fire VIIRS active-fire CSVs (pre-2025 FRP source)
├── 2012/
│ └── CA3245811923420120801.gpkg
├── ...
└── 2024/
└── ...
Each .gpkg file contains the perimeter and fireline layers with fire
boundary polygons at each timestep.
FEDS-MTBS reuses the MTBS Event ID as its canonical event key, so the
same identifier addresses both the MTBS burn record and the FEDS-MTBS
perimeter time series. The ID follows the pattern {STATE}{LAT}{LON}{DATE}:
- STATE: 2-letter state code (e.g.,
CA) - LAT: Latitude × 1000 (5 digits, e.g.,
34326for 34.326°) - LON: Longitude × 1000, unsigned (6 digits, e.g.,
118481for 118.481°) - DATE: Fire start date as YYYYMMDD (e.g.,
20191010)
Example: CA3432611848120191010 = California fire at (34.326°N, 118.481°W) starting October 10, 2019
Fire Radiative Power comes from NASA FIRMS VIIRS (Collection 2) active-fire
detections. When the FEDS archive is available, pre-2025 fires use the bundled FEDS
firepix CSVs and FRP is masked to the perimeter at each timestep; otherwise FRP is
streamed from FIRMS for any year and left unmasked (perimeter_masked: false in the
layer metadata). The FIRMS Area API archive (S-NPP) covers the full FEDS period.
Dataset details — source, streaming, and offline fallback
- Provider: NASA FIRMS (Fire Information for Resource Management System)
- Data Access: Area API — streamed per event
- Sources queried: VIIRS S-NPP and NOAA-20, each with
*_SP(standard processing / archive) and*_NRT(near real-time) - Resolution: 375 m
Both VIIRS platforms are queried and merged because a single platform can have gaps — e.g. S-NPP VIIRS had a multi-day outage during the July 2024 Park Fire (zero detections on its peak days) that NOAA-20 captured in full.
Set the FIRMS_MAP_KEY environment variable (see
Prerequisites). For each
event the pipeline requests only the bounding box and date window from the Area
API (in ≤5-day chunks, the API's per-request limit), merges all VIIRS
platform/processing sources, de-duplicates, and caches the small result under
cache/FIRMS/<event_id>.csv (typically a few hundred KB per event).
No multi-GB archive download is needed.
If FIRMS_MAP_KEY is not set, the pipeline reads full-archive VIIRS CSVs placed
directly in datasets/FIRMS/ (e.g. fire_archive_SV-C2_*.csv,
fire_nrt_SV-C2_*.csv), filtering them by the event's bounds and time range.
These large files are only required for this offline fallback and can be deleted
once streaming is configured.
The ESA WorldCover dataset provides global land cover classification at 10m resolution based on Sentinel-1 and Sentinel-2 data.
Dataset details — source and land-cover class table
- Provider: European Space Agency (ESA)
- Data Access: Google Earth Engine (
ESA/WorldCover/v200) - Temporal Coverage: 2021
- Resolution: 10m
- Coverage: Global
| Value | Class | Color |
|---|---|---|
| 10 | Tree Cover | Dark Green |
| 20 | Shrubland | Orange/Yellow |
| 30 | Grassland | Yellow |
| 40 | Cropland | Pink |
| 50 | Built-up | Red |
| 60 | Bare/Sparse Vegetation | Gray |
| 70 | Snow and Ice | White |
| 80 | Permanent Water Bodies | Blue |
| 90 | Herbaceous Wetland | Teal |
| 95 | Mangroves | Green |
| 100 | Moss and Lichen | Beige |
This dataset is automatically downloaded from Google Earth Engine during processing. No manual setup required.
The Global Wildland-Urban Interface (WUI) dataset maps where buildings and wildland vegetation meet or intermingle at 10m resolution globally.
Dataset details — source, WUI class table, and tile setup
- Publication: Schug, F. et al. (2023). The global wildland–urban interface. Nature. https://doi.org/10.1038/s41586-023-06320-0
- Data Repository: Available from SILVIS Lab, University of Wisconsin-Madison
- Temporal Coverage: ca. 2020
- Resolution: 10m
- Projection: EQUI7 Azimuthal Equidistant
| Value | Class | Description |
|---|---|---|
| 1 | Forest/Shrub/Wetland Intermix WUI | Buildings intermixed with forest/shrub/wetland vegetation |
| 2 | Forest/Shrub/Wetland Interface WUI | Buildings adjacent to forest/shrub/wetland vegetation |
| 3 | Grassland Intermix WUI | Buildings intermixed with grassland vegetation |
| 4 | Grassland Interface WUI | Buildings adjacent to grassland vegetation |
| 5 | Non-WUI: Forest/Shrub/Wetland | Forest/shrub/wetland without WUI |
| 6 | Non-WUI: Grassland | Grassland without WUI |
| 7 | Non-WUI: Urban | Urban areas without wildland interface |
| 8 | Non-WUI: Other | Other land cover types |
No manual download required. The Global WUI data uses the EQUI7 tiling grid,
and the pipeline automatically determines which tiles a fire event needs. Any
tile not already present locally is streamed directly out of the remote continent
archive (North America) using HTTP byte-range requests — roughly 32 KB per
tile — instead of downloading the full ~3.8 GB archive. The loader checks the
user archive (datasets/GlobalWUI/) first, then the cache; streamed tiles are
cached under cache/GlobalWUI/ so repeated runs reuse them:
cache/GlobalWUI/
├── X0065_Y0040/
│ └── WUI.tif # streamed + cached on first use
├── X0066_Y0040/
│ └── WUI.tif
└── ...
If you already have the full archive extracted into datasets/GlobalWUI/ (or let
the wizard download it), those local tiles are used as-is and nothing is streamed.
The remote source is the SILVIS Lab geoserver (NA.zip); to pre-seed tiles for
offline use yourself, extract the relevant X..._Y... directories under
datasets/GlobalWUI/.
Every output layer and the underlying source dataset, with primary citation and the exact product version / Earth Engine asset ID or service endpoint used. All Earth Engine layers are accessed through Google Earth Engine (Gorelick et al. 2017).
| Output feature(s) | Source & primary citation | Version / asset ID or endpoint |
|---|---|---|
burn_perimeter, fireline, fireline_max_frp |
FEDS — Chen et al. 2022, Sci. Data, doi:10.1038/s41597-022-01343-0 | FEDS-MTBS extension (2012–2024), Zenodo doi:10.5281/zenodo.20187962; FEDS Sci. Data DOI is the algorithm reference |
frp_daytime, frp_nighttime |
NASA FIRMS — Davies et al. 2009, doi:10.1109/TGRS.2008.2002076; VIIRS 375 m product, Schroeder et al. 2014, doi:10.1016/j.rse.2013.12.008 | VIIRS S-NPP + NOAA-20, Collection 2; FIRMS Area API |
| (event keys / MTBS constraint) | MTBS — Eidenshink et al. 2007, doi:10.4996/fireecology.0301003 | mtbs.gov burn records |
elevation, terrain_rgb |
USGS 3DEP — U.S. Geological Survey 2015 | GEE USGS/3DEP/1m |
canopy_bulk_density, canopy_cover |
LANDFIRE 2.4.0 — USGS/USDA 2025 | GEE projects/sat-io/open-datasets/landfire/FUEL/{CBD,CC} |
recent_burn |
NIFC InteragencyFirePerimeterHistory (no DOI) | ArcGIS FeatureServer services3.arcgis.com/T4QMspbfLg3qTGWY |
r2, u10, v10 |
NOAA HRRR — Dowell et al. 2022, doi:10.1175/WAF-D-21-0151.1; retrieved via Herbie, Blaylock 2026, doi:10.5281/zenodo.18902673 | HRRR v4, AWS NODD archive |
building_height |
GlobalBuildingAtlas — Zhu et al. 2025, ESSD, doi:10.5194/essd-17-6647-2025 | GEE projects/sat-io/open-datasets/GLOBAL_BUILDING_ATLAS; dataset doi:10.14459/2025mp1782307 |
landcover |
ESA WorldCover — Zanaga et al. 2022, doi:10.5281/zenodo.7254221 | GEE ESA/WorldCover/v200 (2021) |
lai |
Sentinel-2 LAI — Mukherjee & Chakraborty 2026, doi:10.21203/rs.3.rs-8970245/v1 | GEE projects/tc-global-urban/assets/LAI_Grid_30deg_* (2020) |
sentinel2_rgb |
Sentinel-2 — Drusch et al. 2012, doi:10.1016/j.rse.2011.11.026 | GEE COPERNICUS/S2_SR_HARMONIZED |
wui |
Global WUI — Schug et al. 2023, Nature, doi:10.1038/s41586-023-06320-0 | SILVIS Lab GlobalWUI (EQUI7, ca. 2020) |
This work is supported by the University of Virginia's Environmental Institute. PNNL is operated by DOE by the Battelle Memorial Institute under contract DE-AC05-76RL01830.
This product contains modified Copernicus Sentinel data (2017–2025), used to derive the sentinel2_rgb surface-reflectance imagery layer.
See LICENSE.