Releases: NVIDIA/aistore

4.7

03 Jun 19:57

AIStore 4.7 is a small patch release for improved long-term stability and observability.

The primary changes fix unbounded growth of xaction notification metadata passed between IC members, extend Prometheus with Go runtime metrics and a per-node primary-proxy gauge, and tighten outbound network policy for the external downloader.


IC Notification Listener Cleanup

Finished xaction notification listeners were not being reclaimed reliably in long-lived clusters, causing IC ownership-table metadata to grow without bound.
Cleanup timing and IC synchronization of listener state are corrected.

Commits

  • 47628799e: core: fix notification listener time-based cleanup
  • d23009750: core: use consistent nl unmarshal, test atomic marshal round trips

Observability

Prometheus can now include a low-cardinality subset of Go runtime metrics (goroutines, GC, heap) alongside the existing AIS counters.
These are off by default and gated by the cluster Enable-Go-Runtime-Metrics feature flag.
Each node also exports ais_node_primary_info, a gauge labeled with primary_id for the primary proxy reflected in its cluster map.

Commits

  • 7e08a8461: stats: register subset of go runtime metrics with prometheus
  • da930ae0c: observability: conditionally publish Go-runtime Prometheus metrics
  • 0c395682c: stats: add primary info metric

Downloader egress controls

The downloader now applies egress policy to its outbound HTTP connections.
By default, destinations in non-public address space are not reachable.
Clusters that need to fetch from private networks can enable the Dload-Allow-Private-Egress feature flag; a subset of sensitive ranges stays blocked regardless of configuration.
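
The policy described above can be sketched roughly as follows. This is an illustration, not the AIStore implementation: the function name `egress_allowed`, the `allow_private` parameter (standing in for the Dload-Allow-Private-Egress feature flag), and the specific always-blocked ranges are assumptions, since the release notes do not enumerate the sensitive ranges.

```python
import ipaddress

# Assumption: illustrative "always blocked" ranges (loopback and link-local,
# the latter covering common cloud metadata endpoints). The real list may differ.
ALWAYS_BLOCKED = [
    ipaddress.ip_network(n)
    for n in ("127.0.0.0/8", "169.254.0.0/16", "::1/128", "fe80::/10")
]


def egress_allowed(addr: str, allow_private: bool = False) -> bool:
    """Decide whether an outbound connection to addr is permitted."""
    ip = ipaddress.ip_address(addr)
    # Sensitive ranges stay blocked regardless of configuration.
    if any(ip in net for net in ALWAYS_BLOCKED):
        return False
    # Public destinations are always reachable.
    if ip.is_global:
        return True
    # Private address space is reachable only when explicitly enabled.
    return allow_private and ip.is_private
```

A check of this kind is typically applied to the resolved IP address at connect time rather than to the hostname in the URL, so that DNS answers pointing at internal addresses are also covered.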

Reported by tonghuaroot <tonghuaroot@gmail.com>. Thanks for the detailed write-up.

Commits

  • 06f8bfc83: ext/dload: restrict downloader egress to mitigate SSRF

Other Changes

  • Sendfile transmit — the GET sendfile path held back in 4.6 is now enabled for eligible plain-HTTP, file-backed objects, including range reads.
  • Shard index summary — new target-side summary-shard xaction aggregates indexed vs unindexed TAR coverage for a bucket.
  • Space cleanup — completed chunk manifests are validated by load, not LOM flag alone.
  • S3 compatibility — response XML root element names match the AWS spec for strict clients.

Commits

  • da541a1be: sendfile transmit path: enable; add range-read (part two)
  • fe855df60: shard index: add bucket shard summary xaction
  • 1739b1c59: space/cleanup: validate completed chunk manifests
  • 9a5d91c0d: ais/s3: emit AWS-spec XML root element names for S3 responses

4.6

28 May 00:20

AIStore 4.6 is a short-cycle reliability and scalability release following 4.5.

The drivers are practical: stabilize get-batch under heavy ML-style stress, replace the list-objects startup grace period with an explicit two-phase commit (2PC) protocol, avoid cgroup-v2 broadcast throttling in large clusters, harden primary-restart bootstrap, guard against prefetch misconfiguration, and make prefetch/blob-downloader behavior easier to observe and control.

Get-Batch is the largest area of work. Under heavy ML-style load - many concurrent batch requests, large objects, slow or disconnecting clients - the prior implementation could leak work items, take too long to abort, or mishandle late arrivals after cleanup. The 4.6 changes restructure the work-item lifecycle around explicit ownership, receive-side pinning, progress-based cleanup timers, bounded abort latency, and stricter late-arrival rejection.

Remote list-objects, also referred to as R-flow, gets a structural fix. The temporary startup-grace workaround from 4.5 is replaced with an explicit begin/commit/abort sequence: targets that are about to receive pages prepare first, the randomly selected designated target starts paging only after the receive side is ready, and abort has a dedicated cleanup path.

Broadcast parallelism in containerized deployments is also improved. In deployments using cgroup-v2 CPU quotas, a proxy with a small visible CPU count could collapse one-to-many broadcasts to single-digit parallelism even when targeting dozens of targets. A dedicated cluster broadcast wait group now covers the mostly I/O-bound intra-cluster fanout paths: keepalive, rebalance, generic broadcasts, and xaction target-to-target control.

The intra-cluster control plane is faster on the hot path. Keepalives and broadcasts - invoked behind essentially every cluster operation - no longer register a timer, allocate a context, and defer a cancel on every call. A separate fix covers a former primary restarting without a local cluster map: rather than assuming the primary role from local configuration alone, the proxy probes alternative URLs and validates returned node-state-info first.

Prefetch and blob-downloader observability is another release driver. A single prefetch job can spawn many blob-download child jobs. The 4.6 release adds clearer parent/child reporting, per-xaction stats, bucket-keyed Prometheus metrics, and rejection counters so operators can understand progress and throttling behavior without correlating logs manually. Get-Batch and list-objects extend their control-message output in the same direction. The cross-cutting picture is summarized in Observability.

No data-format migration is required. Operational and API behavior changes are summarized in Upgrade Notes.


Table of Contents

  1. Get-Batch
  2. Remote List-Objects (R-flow)
  3. Large-Cluster Broadcast Scalability
  4. Intra-Cluster Control Plane
  5. Primary Restart (bootstrap)
  6. Prefetch and Blob Downloader
  7. Space Cleanup
  8. Shard-Index Cleanup
  9. Xaction Go API
  10. ETL
  11. Python SDK and aisloader
  12. Other Fixes
  13. Documentation and Tools
  14. Observability
  15. Upgrade Notes

Get-Batch

Work-item lifecycle

Get-Batch is highly concurrent by design: a single x-moss xaction multiplexes many batch requests, each driven by a remote client. A slow or disconnected client can leave its work item parked on the target side - receivers still pinned, SGLs still held - while making no forward progress. We call these abandoned work items; detecting them reliably without falsely killing legitimately slow batches is one of the problems 4.6 addresses.

Work-item ownership is now an explicit three-state model rather than a single demand flag. Assembly claims a work item before consuming it; abandoned-work GC may clean only unowned work; and receive paths use pin/unpin discipline, so data and control receives cannot accidentally keep touching a work item that is already cleaning up or abandoned.
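
A minimal sketch of such a three-state model, written in Python for brevity. The state names (`UNOWNED`, `ASSEMBLY`, `ABANDONED`), the `WorkItem` class, and its methods are hypothetical; a lock stands in for the atomic compare-and-swap a Go implementation would likely use.

```python
import threading
from enum import Enum


class Owner(Enum):
    UNOWNED = 0    # nobody has claimed the work item yet
    ASSEMBLY = 1   # assembly claimed it and may consume it
    ABANDONED = 2  # GC marked it; cleanup is in progress


class WorkItem:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = Owner.UNOWNED
        self._pins = 0

    def _cas(self, old, new):
        # Compare-and-swap emulated with a lock.
        with self._lock:
            if self._state is old:
                self._state = new
                return True
            return False

    def claim(self):
        """Assembly takes ownership; fails if already claimed or abandoned."""
        return self._cas(Owner.UNOWNED, Owner.ASSEMBLY)

    def abandon(self):
        """GC may only mark unowned work as abandoned."""
        return self._cas(Owner.UNOWNED, Owner.ABANDONED)

    def pin(self):
        """Receive path pins the item; rejected once cleanup has begun."""
        with self._lock:
            if self._state is Owner.ABANDONED:
                return False
            self._pins += 1
            return True

    def unpin(self):
        with self._lock:
            self._pins -= 1
```

Because both `claim` and `abandon` transition only from the unowned state, assembly and GC can never own the same work item at once, and a receive that arrives after abandonment is turned away at `pin`.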

Abandoned work items are marked before cleanup, and new receives are rejected once cleanup begins. Cleanup waits for a stable receive/assembly quiet window, using last-receive and last-assembly progress timestamps rather than only the original start time. This matters for large batches that can legitimately run for a long time while still making forward progress.
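
The difference between the two timing rules can be shown with a small sketch. The function names and the numbers in the usage note are illustrative assumptions, not values taken from AIStore.

```python
def quiet_long_enough(now, last_rx, last_asm, quiet_window):
    """True when neither receive nor assembly has progressed for quiet_window seconds."""
    return now - max(last_rx, last_asm) >= quiet_window


def older_than(now, started, max_age):
    """Start-time-only rule: ignores whether the batch is still making progress."""
    return now - started >= max_age
```

For a batch that started at t=0 and last received data at t=995, a check at t=1000 with a 30-second quiet window leaves it alone, whereas a start-time rule with a 600-second limit would have flagged it even though it is still progressing.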

The work-item memory pool (sync.Pool) is removed in favor of explicit receive-side pin/unpin. The housekeeper gets a dedicated x-moss maintenance callback, so abandoned-work cleanup no longer depends on the generic DemandBase idle/fini path, and each cleanup pass is capped so the housekeeper remains responsive.

The shared data-mover (SDM) receive-side lock ordering is tightened as well: the receiver-demux mutex is no longer held across Close(), and Rx drop now distinguishes SDM close from xaction abort.

Abort and late-arrival handling

Abort latency is now bounded. An unbounded wait-group.Wait() is replaced with a bounded poll, and drainRx is extracted with a brief mode for fast cleanup paths. Stuck receive-side work items are forced through cleanup via the abandoned path. Late assembly is rejected after x-moss is aborted or done; late Assemble calls after abort cleanup are tolerated; and internal stopped sentinels are treated as quiet assembly termination.
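
The bounded-poll idea, in a minimal Python sketch. The helper name `wait_bounded` and its parameters are assumptions made for illustration; the actual code is Go and is not shown in these notes.

```python
import time


def wait_bounded(is_done, timeout, interval=0.01):
    """Poll is_done() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike an unconditional wait, the caller always regains control within roughly `timeout` seconds and can then force the remaining work through a cleanup path.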

Stop/Abort terminal semantics are tightened. DemandBase.Stop is now idempotent and protected by its own atomic, and the common SetStopping returns its CAS result so get-batch can use it to eliminate a race against its own concurrent Abort. DemandBase.Stop is now called before waiting and quiescing.

Get-from-neighbor (GFN) is the internal target-to-target object-fetch path, used both during global rebalance, when objects migrate between targets, and by get-batch when a target needs an object it does not have locally. GFN error handling is corrected as well. When get-batch engages GFN, not-found remains a soft error while other GFN failures are treated as hard errors. Membership changes are detected and enforced during hole-timeout and failed-GFN paths, so a target-set change no longer looks like an ordinary object-level miss.

Observability and docs

Control messages now include target IDs and use clearer formatting. Documentation is updated to clarify supported scope and known limitations.

Commits

  • 127f69be6: get-batch: bound abort latency; amend gcAbandoned for stuck Rx
  • 564efd619: get-batch: exit gracefully via housekeeper (fix)
  • 926bdbb42: get-batch: amend abandoned work-item cleanup (add 2nd housekeep)
  • cd07a4ec4: get-batch: amend abandoned work-item cleanup (part two)
  • 9ad57d35f: get-batch: amend abandoned work-item cleanup (part three)
  • f3756a125: get-batch: abort vs active assembly (part four)
  • 3f247c528: get-batch: remove work-items mem-pool; Rx pin/unpin discipline
  • b87ea0cf0: get-batch: add cleanup stability window; track last assembly progress
  • edc4d7338: get-batch: gate pin-wi before touching last-Rx; recycled() to use last-asm
  • f6fe8d279: get-batch: escalate GFN error (fix)
  • 62937942f: get-batch: update CtlMsg to include target ID (observability; minor)
  • 198732215: docs/get_batch: clarify supported scope and known limitations
  • b55803190: core/xactions: Stop/Abort terminal semantics (fix)
  • 3924f9e81: shared data-mover: demux/transport lock-order

Remote List-Objects (R-flow)

Multi-target remote-bucket list-objects - the R-flow mechanism, one of several distinct list-objects paths in AIS - gains the explicit prepare phase anticipated in the 4.5 release notes.

In a multi-target R-flow run, one target is designated to perform the backend listing and distribute pages to the others. In 4.5, the designated target waited a brief grace period before issuing the backend call and beginning page distribution - a temporary workaround until the flow gained an explicit prepare phase.

The workaround is replaced with an explicit two-phase protocol. The begin phase broadcasts startup, renews the x-lso, opens streams, and registers receive paths. The commit phase performs strict x-lso lookup and starts paging. The abort phase performs best-effort lookup and aborts the xaction if present. The previously reserved feature flag used by the workaround is removed as part of the same cleanup.
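
The begin/commit/abort sequence can be sketched as below. This is a simplified model under stated assumptions: the `Target` class, its state names, and `run_two_phase` are hypothetical, and sequential loops stand in for the real intra-cluster broadcasts.

```python
class Target:
    def __init__(self, name, fail_begin=False):
        self.name = name
        self.fail_begin = fail_begin
        self.state = "idle"

    def begin(self):
        # Prepare the receive side (in AIS terms: renew the xaction,
        # open streams, register receive paths).
        if self.fail_begin:
            raise RuntimeError(f"{self.name}: begin failed")
        self.state = "prepared"

    def commit(self):
        # Strict: commit is valid only after a successful begin.
        if self.state != "prepared":
            raise RuntimeError(f"{self.name}: commit without begin")
        self.state = "running"

    def abort(self):
        # Best-effort: a target that never prepared has nothing to undo.
        if self.state != "idle":
            self.state = "aborted"


def run_two_phase(targets):
    """Return True if all targets committed, False if the run was aborted."""
    try:
        for t in targets:
            t.begin()
    except RuntimeError:
        for t in targets:
            t.abort()
        return False
    for t in targets:
        t.commit()
    return True
```

Paging starts only in the commit phase, by which point every receiver has already prepared, so no startup grace period is needed.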

The same area gains better runtime observability, extending the dedicated ais show job rendering introduced in 4.5. List-objects now reports lightweight per-reque...

4.5

13 May 23:04

AIStore 4.5 is a release focused on three major areas: global rebalance scalability, indexed archive access via the new shard index, and a get-batch flow reordering that substantially reduces memory pressure under load.

The work on global rebalance is one of the headlines. AIS removes the legacy per-object ACK machinery that had become a scalability ceiling for very large rebalances (counting millions of migrated objects), replaces the cleanup behavior that depended on that machinery with a new explicit cleanup mode, and optimizes the lifecycle around data movers, transport endpoints, stage transitions, and peer status queries.

Remote list-objects (R-flow) is another practical driver for this release. Multi-target remote-bucket listing now starts the (N - 1) targets before the designated target (DT) begins backend listing and page distribution. The same area also gets flow-control cleanup, corrected page accounting, and a dedicated CLI job view for remote-list xactions.

The shard index is a new experimental subsystem for indexed extraction from TAR shards. It lets GET and get-batch read files from TAR shards directly using a persisted index instead of scanning the full archive. The 4.5 implementation includes the index format and binary pack/unpack support, persistence in a system bucket, a bucket-scoped indexing xaction, CLI support, read-path integration, tests, and a micro-benchmark.

Streaming get-batch responses now use explicit write deadlines while sending data to the client. This lets AIS detect terminated, stalled, or unreachable clients promptly and abort the request instead of continuing to assemble and transmit a large batch that no client is still reading. The flow was reordered and optimized, work-item cancellation now propagates across senders, and admission control is stricter under load.

Authentication and access control adds support for externally provisioned RSA signing keys, JWKS refresh on cache miss for rotated key pairs, configurable maximum token age, and a more general signing-key configuration. Intra-cluster request validation is also tightened: spoofed caller headers are rejected on the public network, and internal-network checks require a validated Smap entry.

CLI and observability gain several user-visible improvements: dynamic ais show cluster cpu and ais show cluster memory views, a new ais performance intra-data view, shard-index commands, better rebalance rendering including cleanup mode, and improved help for force-join / split-brain recovery workflows.

This release preserves backward compatibility.
The few additive API fields, configurable behavior changes, and operational migration notes are summarized in Upgrade Notes.


Table of Contents

  1. Global Rebalance
  2. Shard Index
  3. Get-Batch
  4. Intra-Cluster Control Plane
  5. AuthN
  6. Stats and Observability
  7. Blob Downloader and Prefetch
  8. CLI
  9. Core, Config, and xactions
  10. Python SDK, ETL, and aisloader
  11. Documentation and Website
  12. Build, CI, and Tools
  13. Upgrade Notes

Global Rebalance

AIStore 4.5 delivers the largest rebalance update in several release cycles. The main theme is replacing per-object state with stage-level coordination and explicit cleanup semantics.

Per-object ACKs removed

Rebalance no longer tracks individual object acknowledgments. The previous mechanism - ACK messages back to the sender, sender-side maps of unacknowledged objects, retransmit and wait loops, and an ACK-driven lazy-delete path - has been removed in favor of stage-level coordination.

In the new model, traversal sends objects without keeping per-object state on the sender, and a post-traverse barrier takes the place of the old wait-for-ACK drain. Intra-cluster transport headers carry a compact opaque payload with the rebalance generation, so receivers can recognize and reject objects that arrive late from a previous run. Object and byte totals reported by ais show rebalance and ais show job now come directly from the transmitted counters rather than from ACK accounting.
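
As a rough illustration of generation-tagged headers, consider the sketch below. The 8-byte little-endian encoding and the function names are assumptions; the notes only say that the payload is compact, opaque, and carries the rebalance generation.

```python
import struct


def pack_generation(gen: int) -> bytes:
    """Encode the rebalance generation as an 8-byte opaque payload (assumed layout)."""
    return struct.pack("<Q", gen)


def unpack_generation(opaque: bytes) -> int:
    return struct.unpack("<Q", opaque)[0]


def accept(opaque: bytes, current_gen: int) -> bool:
    """Receiver-side check: keep only objects sent by the current run."""
    return unpack_generation(opaque) == current_gen
```

With a check like this, an object that was sent during generation 6 and arrives after generation 7 has started is rejected without any per-object state on either side.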

The practical effect is lower memory pressure and simpler lifecycle behavior during large rebalances, where per-object ACK state had become the limiting factor.

Cleanup mode

Removing per-object ACKs also removed the old incidental mechanism that trimmed misplaced source copies after migration. AIStore 4.5 adds an explicit replacement: rebalance cleanup mode.

Cleanup mode walks local mountpaths, identifies misplaced object copies using Smap HRW ownership, and removes a misplaced copy only after verifying that the canonical copy exists in the expected location with matching object identity. Identity checks include size and checksum, and use version / ETag when available.
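
The decision logic can be sketched as follows. This is not the AIStore code: `hrw_owner` uses BLAKE2 from the Python standard library as a stand-in hash, metadata is reduced to a `(size, checksum)` tuple, and all names are hypothetical.

```python
import hashlib


def hrw_owner(obj_name, targets):
    """Highest-random-weight (rendezvous) hashing: the target with the top score owns the object."""
    def score(target):
        return hashlib.blake2b(f"{target}:{obj_name}".encode(), digest_size=8).digest()
    return max(targets, key=score)


def may_remove(local_target, obj_name, targets, local_meta, canonical_meta, force=False):
    """Decide whether the copy on local_target can be deleted.

    local_meta / canonical_meta are (size, checksum) tuples;
    canonical_meta is None when the canonical copy was not found.
    """
    if hrw_owner(obj_name, targets) == local_target:
        return False  # this copy is the canonical one
    if canonical_meta is None:
        return False  # never remove what may be the only copy
    # Matching identity is required unless the operator forces removal.
    return force or local_meta == canonical_meta
```

Because HRW depends only on the object name and the set of targets, every node computes the same owner without coordination, which is what lets cleanup run without a data mover or streams.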

CLI:

$ ais start rebalance --cleanup
$ ais start rebalance --cleanup --force

By default, cleanup mode keeps diverged copies. With --force, it can also remove copies that differ from the canonical peer version. This is an advanced operator option.

Cleanup mode is intentionally distinct from regular rebalance:

  • it has its own preflight checks;
  • it refuses to start while rebalance or resilver is active;
  • it requires at least two active targets;
  • it bypasses config.Rebalance.Enabled;
  • it uses no data mover, no streams, and no GFN;
  • it skips EC-enabled buckets and busy objects.

ais show job and ais show rebalance render cleanup-mode runs with a dedicated view that reports removed objects and bytes rather than migration TX/RX counters.

Lifecycle and transport

A series of lifecycle changes makes rebalance more robust through abort, preempt, renew, and finalization paths: fresh data mover construction per run, safer handling of duplicate transport endpoints after abort, narrower mutex scope in the finalization path, a consistent same-targets predicate across preempt and renew, and corrected stage-reached detection.

One change is operator-visible: rebalance CtlMsg now carries per-stage timing and final status, and ais show job --all continues to surface this information after the run completes.

CtlMsg is the control-message interface supported by AIS xactions (jobs). Each xaction uses it to report job-specific runtime state - counters, stage information, start parameters, and error summaries - so that ais show job and related commands can render a current view without requiring separate CLI plumbing for every xaction.

See also

  • docs/rebalance.md - user-facing guide: concepts, configuration, and CLI workflows including cleanup mode.
  • reb/README.md - internal design notes: execution flow, stages, and lifecycle.

Commits

  • 05eddc1e1: global rebalance: remove per-object ACKs (major upd)
  • d8d4f9931: global rebalance: introduce safe cleanup mode
  • 28d55f6c9: global rebalance: cleanup-mode observability
  • 0cab3b877: global rebalance: consolidate primary's logic to run cleanup mode
  • b6037a3da: cli: ais show job to correctly show rebalance in cleanup mode
  • 785a1b769: cli: 'start rebalance [--cleanup]' to show ID; add formatted action helpers
  • 0df4c7549: global-rebalance/cleanup-mode: add CLI-based scripted test
  • 3c5ead654: space cleanup: detect cluster-wide misplaced objects via Smap HRW
  • ad5256b96: docs: update rebalance.md (add cleanup mode; examples)
  • a10928632: apply rebalance conflict/abort policy consistently across jobs
  • 913f4bd89: lazy-delete: reduce noisy logs and bump work channel cap

Shard Index

Experimental in 4.5. The on-disk format, xaction semantics, and CLI surface may change in subsequent releases.

The shard index is a new subsystem for indexed archive extraction, primarily intended for ML and data-lake workloads that repeatedly fetch named samples from large TAR shards. When a client requests a file inside a TAR shard via archpath, AIS can now resolve the lookup through a persisted index rather than scanning the archive. The fast path is integrated into both GET and get-batch.

A shard index is a compact binary structure built in a single pass over a source TAR. It supports USTAR, GNU, and PAX variants, encodes entries with Pack/Unpack, and embeds the source object's checksum and size so the index can detect re-uploads and treat itself as stale when the underlying shard has changed. Indexes are persisted as objects in a new system bucket (ais://.sys-shardidx); shards that have an index carry a dedicated flag in their on-disk metadata.
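
To make the idea concrete, here is a toy index built with Python's standard `tarfile` module. It is only a sketch: the dictionary layout, SHA-256 as the checksum, and all function names are assumptions, and the real index is a packed binary format stored in a system bucket.

```python
import hashlib
import io
import tarfile


def make_tar(files):
    """Helper for the example: build an uncompressed TAR from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_index(shard: bytes):
    """Single pass over the TAR: record each file's data offset and size."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r:") as tf:
        for member in tf:
            if member.isfile():
                entries[member.name] = (member.offset_data, member.size)
    return {
        "src_size": len(shard),
        "src_cksum": hashlib.sha256(shard).hexdigest(),
        "entries": entries,
    }


def is_stale(index, shard: bytes) -> bool:
    """The index is stale if the shard's size or checksum no longer matches."""
    if index["src_size"] != len(shard):
        return True
    return index["src_cksum"] != hashlib.sha256(shard).hexdigest()


def read_member(index, shard: bytes, name: str) -> bytes:
    """Indexed read: jump straight to the file's bytes, no archive scan."""
    offset, size = index["entries"][name]
    return shard[offset:offset + size]
```

The lookup cost becomes one dictionary access plus one ranged read, independent of where the file sits inside the archive.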

Indexing is performed by a bucket-scoped xaction (job). It walks the bucket, skips non-TAR objects, rebuilds stale or corrupted indexes, and treats per-object failures as non-fatal - a single bad shard does not abort the run.

Index presence, lookup, va...

4.4

08 Apr 17:14

AIStore 4.4 is a short release cycle focused on runtime correctness, container awareness, and consolidation.

This release is the first to deliver full cgroup v2 support, improving AIS behavior in constrained and containerized deployments. CPU and memory accounting are now cgroup-aware, container detection and startup initialization are more robust, and CPU utilization is reported as a smoothed moving average rather than an instantaneous sample.

4.4 also promotes native bucket inventory (NBI) into the default AIS path for S3-compatible inventory workflows. The release adds non-recursive inventory listing, removes the legacy S3-specific inventory implementation, and updates the S3 compatibility layer to route inventory-backed requests through NBI. NBI should now be regarded as stable.

Additional changes improve S3 bucket region discovery and error reporting, add per-bucket OCI regions and instance principal authentication, introduce per-bucket user-defined custom metadata, and update the CLI, documentation, tracing, and build tooling.

This release includes roughly 50 commits since 4.3 and is fully backward compatible.


Table of Contents

  1. Runtime: cgroup v2, CPU, and Memory
  2. Native Bucket Inventory (NBI)
  3. S3 Compatibility and Remote Bucket Discovery
  4. OCI Backend
  5. Bucket Metadata and BMD Changes
  6. CLI
  7. Python SDK and ETL
  8. Tracing
  9. Website
  10. Documentation
  11. Build, CI, and Tools
  12. Upgrade Notes

Runtime: cgroup v2, CPU, and Memory

The main feature of 4.4 is full cgroup v2 support, with a broader runtime refactor to make AIS behave correctly in standard containers and restricted Kubernetes-style environments.

The work starts with startup and environment detection. AIS now initializes CPU and memory accounting earlier and more explicitly via sys.Init(), revises container auto-detection with additional heuristics, and adds the ForceContainerCPUMem feature flag to override failed detection when needed. The initialization path also applies container-aware CPU counting and GOMAXPROCS, and avoids retrying failed cgroup parsing later at runtime.

On the memory side, AIS now supports cgroup-v2 memory reporting based on memory.max, memory.current, and memory.stat, while keeping memory.stat best-effort so that missing or unreadable auxiliary details do not turn into fatal runtime failures. Host and container memory readers have been refactored into stateless helpers, and related error reporting has been cleaned up and prefixed consistently.
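
A minimal sketch of best-effort parsing for these three files. The function takes file contents as strings so it can be exercised without a real cgroup mount; its name and return shape are assumptions.

```python
def parse_cgroup_v2_memory(max_text, current_text, stat_text=None):
    """Parse memory.max, memory.current and (optionally) memory.stat contents."""
    raw_max = max_text.strip()
    # memory.max holds the literal string "max" when no limit is set.
    limit = None if raw_max == "max" else int(raw_max)
    current = int(current_text.strip())

    stat = {}
    if stat_text is not None:
        try:
            for line in stat_text.splitlines():
                key, value = line.split()
                stat[key] = int(value)
        except ValueError:
            # Best-effort: malformed auxiliary details are dropped, not fatal.
            stat = {}
    return {"limit": limit, "current": current, "stat": stat}
```

Only `memory.max` and `memory.current` are treated as required here; a missing or unparsable `memory.stat` simply yields an empty dictionary.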

On the CPU side, bare-metal /proc/stat parsing is corrected to use specific enumerated fields and avoid double-counting. The runtime now maintains global cached CPU state and reports CPU utilization as a smoothed, time-aware moving average rather than an instantaneous sample. The new model is better suited for operator-facing reporting and long-running process decisions, and it also supports CPU throttling (cgroup v2).
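
Both ideas can be illustrated in a short sketch. It assumes the standard Linux /proc/stat layout, in which guest time is already included in user time, so summing every column would count it twice. The time constant `tau` and the function names are illustrative assumptions, not AIStore's actual parameters.

```python
import math


def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    # Use the first eight counters only; guest and guest_nice (fields 9, 10)
    # are already accounted for in user and nice.
    user, nice, system, idle, iowait, irq, softirq, steal = (int(x) for x in fields[1:9])
    total = user + nice + system + idle + iowait + irq + softirq + steal
    busy = total - idle - iowait
    return busy, total


def utilization(prev, cur):
    """Percent busy between two (busy, total) samples."""
    d_busy, d_total = cur[0] - prev[0], cur[1] - prev[1]
    return 100.0 * d_busy / d_total if d_total > 0 else 0.0


def ema_update(prev_avg, sample, dt, tau=10.0):
    """Time-aware exponential moving average: the weight of a sample grows with dt."""
    alpha = 1.0 - math.exp(-dt / tau)
    return prev_avg + alpha * (sample - prev_avg)
```

Making `alpha` a function of the elapsed time keeps the average well-behaved when samples arrive at irregular intervals, which a fixed smoothing factor does not.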

The CLI reflects these changes directly. ais show cluster now shows SYS CPU(%) instead of load average by default, moves load averages behind --verbose, and adds THROTTLED(%) when the displayed proxy or target section contains at least one non-zero cgroup-v2 throttling value.

The last part of the work addresses deployment realism. On Kubernetes, cgroup-v2 files may live under per-pod subtrees rather than fixed global paths, so AIS now resolves /sys/fs/cgroup paths at init time instead of hardcoding them. Startup logging also returns a more descriptive container tag, and the implementation includes explicit constrained-container testing and documentation.

Commits

  • dd03cae2c: sys: bare-metal CPU usage parsing (fix)
  • 1b1397ee3: sys: cpu and memory (part two; major)
  • 3fb3aeb7e: sys: cpu and memory (support cgroup-v2 memory stats)
  • ac6c9461a: sys: cpu and memory (cgroup v2 in a constrained container)
  • 074554a72: sys: CPU utilization is now a moving average; [API change]
  • 30e7f430a: sys: cpu and memory (resolve /sys/fs/cgroup/ paths at init time)
  • 366a8906e: cli: show CPU utilization (EMA); load-averages only when verbose

Native Bucket Inventory (NBI)

Native bucket inventory was introduced in 4.3. In 4.4, it becomes the clear and supported path for AIS-managed inventory-backed listing.

The most visible functional addition is support for non-recursive inventory listing via --nr. That closes an important gap for users who want inventory-backed listing while preserving delimiter-style behavior rather than flattening the namespace unconditionally.

Just as importantly, 4.4 removes the older S3-specific inventory implementation and routes S3-compatible inventory-backed requests to NBI instead. This includes removal of deprecated S3-only backend hooks, old R-flow plumbing, and deprecated CLI/API pieces such as --s3-inventory. At the S3 compatibility layer, AIS keeps the existing Ais-Bucket-Inventory and Ais-Inv-Name headers, but now interprets them through NBI with validation.

With 4.4, NBI is no longer experimental. The docs are updated accordingly; 4.4 adds benchmark tooling and documentation around large-bucket listing to support that shift in status.

Commits

  • ddbca5e78: native bucket inventory: support non-recursive mode (--nr)
  • 95b023365: remove legacy S3 inventory; route S3 compat to NBI; update docs
  • 8cab32fcf: perf: add NBI benchmark python script; update docs
  • b06d42815: blog: native bucket inventory blog

S3 Compatibility and Remote Bucket Discovery

4.4 also improves general S3 usability outside NBI.

First, AIS now uses HeadBucket for region discovery. When the bucket region is initially unknown, the discovered region is propagated back into the session and used to construct a new client with the now-complete connection tuple. This also removes an older special workaround and includes a ListBuckets() special case when credentials do not provide a region. The overall effect is simpler and more reliable region handling for S3 buckets.

Second, 4.4 improves the behavior and messaging when users try to add an inaccessible S3 bucket, including clearer guidance for cases such as explicitly specifying extra.aws.cloud_region.

Third, AIS now returns more standard S3 error codes. The S3 layer introduces structured error information via s3.ErrInfo and consistently reports codes such as NoSuchKey, NoSuchBucket, and NoSuchUpload, including NoSuchKey for GET, HEAD, DELETE, and COPY of non-existent objects.

Commits

  • 829559e73: s3: use HeadBucket for region discovery
  • db3a5a4a5: s3 compat: return standard S3 error codes (NoSuchKey, et al.)

OCI Backend

The OCI backend gains two notable capabilities in 4.4.

First, AIS now supports per-bucket OCI regions via extra.oci.region. Backend operations are routed through region-keyed clients, making it possible to use different OCI regions on a bucket-by-bucket basis rather than relying on a single process-wide assumption.

Second, AIS adds native instance principal authentication via OCI_INSTANCE_PRINCIPAL_AUTH. The implementation also serializes OCI client creation per region, adds regression tests, and extends CLI autocompletion for extra.oci.*. Because this is marked as a BMD change, it belongs in the upgrade notes as well.

Commits

  • ac35234f9: [BMD change] oci: support per-bucket regions and instance principal auth

Bucket Metadata and BMD Changes

4.4 introduces extra.custom, a new per-bucket user-defined metadata field in ExtraProps.

The new field is an opaque string, up to 128 characters, settable via bucket properties APIs. Validation rejects control characters and enforces a maximum length consistently across providers. CLI help and usage examples are updated accordingly, and bucket-property error messages now include unmarshal details to make bad inputs easier to diagnose. This change is also marked as a BMD change.
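
The validation rules described above might look like this in a client-side pre-check. The function name is hypothetical, and measuring length in characters rather than bytes is an assumption.

```python
import unicodedata

MAX_CUSTOM_LEN = 128  # limit stated in the release notes


def is_valid_custom(value: str) -> bool:
    """Accept an opaque string of at most 128 characters with no control characters."""
    if len(value) > MAX_CUSTOM_LEN:
        return False
    # Unicode category "Cc" covers control characters such as NUL, tab and newline.
    return not any(unicodedata.category(ch) == "Cc" for ch in value)
```

For example, `is_valid_custom("team=vision;owner=alice")` passes, while a value containing a newline or exceeding 128 characters is rejected.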

Commits

  • fed41c850: [BMD change] add extra.custom: per-bucket user-defined metadata; cli help

CLI

Most visible CLI work in 4.4 is tied to the runtime and NBI changes already de...

4.3

25 Mar 22:39

AIStore 4.3 introduces native bucket inventory (NBI) for remote buckets, backed by a new system-bucket namespace. Buckets with millions or tens of millions of objects can now be listed from AIS-managed inventory snapshots instead of traversing the backend on every call.

This release also adds full IPv6 networking support across all three logical networks and the transport streaming layer, with cluster-wide address-family configuration and automatic IPv4 fallback.

4.3 expands authentication and key-management capabilities with JWKS persistence, manual key rotation, standard JWT claims, and a new key-provider abstraction. Client-side multipart download is significantly improved across the Python SDK, Go APIs, and aisloader, with a new multiprocessing-based parallel downloader backed by a shared-memory ring buffer. The Python SDK and ETL toolchain add streaming transform support across multiple server styles and improve retry handling.

Improvements include media-aware worker parallelism, global rebalance hardening, per-bucket GCP credentials, remote-bucket support for rechunk, multipart whole-object checksum, and various backend and reliability fixes.

Object HEAD v2, introduced as experimental in 4.2, is now stable.

This release includes over 300 commits since 4.2 and is fully backward compatible.


Table of Contents

  1. Native Bucket Inventory (NBI)
  2. System Buckets
  3. IPv6 Networking
  4. Multi-Part Download (MPD)
  5. Multi-Part Upload: Whole-Object Checksum
  6. Global Rebalance
  7. Multi-Credential GCP Backend
  8. Authentication and Key Management
  9. Media-Aware Worker Parallelism
  10. Object HEAD v2
  11. ETL
  12. Rechunk: Remote Bucket Support and --sync-remote
  13. Python SDK
  14. CLI
  15. Runtime, Backend, and Reliability Fixes
  16. Documentation
  17. Build, CI, and Tools
  18. Upgrade Notes

Native Bucket Inventory (NBI)

Native bucket inventory is experimental in 4.3. It gives AIS a built-in way to create and reuse inventory snapshots of remote buckets.

NBI currently applies to remote buckets only: cloud buckets (including S3, GCS, Azure, and OCI) and remote AIS buckets.

The goal is straightforward: make listing large remote buckets - those with millions or tens of millions of objects - substantially faster. Direct backend listings at that scale can take minutes or longer and repeat the same remote roundtrips on every next-page call. With NBI, AIS builds a cluster-managed inventory snapshot once, then serves subsequent listings from local storage, supporting pagination and prefix filtering without touching the backend.

The implementation includes:

  • new CLI commands to create, inspect, and remove inventories
  • inventory-backed list-objects via the same distributed flow used for in-cluster ais:// buckets
  • paginated reads and continuation tokens
  • prefix filtering over inventory snapshots
  • inventory metadata including object count, chunk count, participating targets, and cluster-map version
  • support for empty inventory names (lookup by bucket name)
  • handling of small buckets where some targets may have nothing to list
  • improved cache behavior and page-size handling during listing
  • deprecation of the older S3-specific inventory path in favor of a native AIS implementation (see Upgrade Notes)
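
Serving a listing from a snapshot, with prefix filtering and continuation tokens, can be sketched as below. The sketch assumes the snapshot is a sorted list of object names held in memory and uses the last returned name as the token; the real inventory is chunked and distributed across targets.

```python
import bisect


def list_page(snapshot, prefix="", token="", page_size=1000):
    """Return (names, next_token) from a sorted snapshot; next_token is "" on the last page."""
    # Names sharing a prefix are contiguous in sorted order.
    start = bisect.bisect_left(snapshot, prefix)
    if token:
        # Resume strictly after the last name returned by the previous page.
        start = max(start, bisect.bisect_right(snapshot, token))

    page = []
    i = start
    while i < len(snapshot) and len(page) < page_size and snapshot[i].startswith(prefix):
        page.append(snapshot[i])
        i += 1

    more = i < len(snapshot) and snapshot[i].startswith(prefix)
    return page, (page[-1] if more else "")
```

Each page costs a binary search plus a short scan of local data, and no call reaches the remote backend.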

NBI is primarily a latency-reduction tool. For ais:// buckets, data and metadata are already in the cluster, so there is no backend latency to justify a separate inventory layer.
Inventories of ais:// buckets could still be useful for point-in-time consistency (snapshotting), which may be considered in the future.

Commits

  • c4ee603ed: Initial NBI support; create ais://.sys-inventory on first use
  • 8daff7a67: Create-inventory 2PC; target validation and execution
  • 416fc041b: Add top-level ais nbi {create|rm|show}
  • 4be7dcd90: Initial list-objects integration for NBI
  • f3472105d: Switch NBI listing from R-flow to A-flow
  • ffefd96db: Replace page-based knobs with NamesPerChunk
  • ee7514cde: Best-effort page sizing for distributed NBI listing
  • 846ee81af: Add chunk cache; cover tiny-page and empty-name cases
  • 51a1b3889: Extend metadata; improve default and verbose show nbi
  • 364e67361: Handle small buckets and empty inventories

System Buckets

System buckets establish a reserved namespace for AIS-managed internal data.

The first system bucket is:

  • ais://.sys-inventory - stores native bucket inventory snapshots as AIS-managed internal objects

System buckets are created on demand and can be listed and inspected with regular CLI commands. At the same time, they are infrastructure: users are not expected to write into them directly or depend on their internal object layout.

This namespace lays the groundwork for more AIS-managed content in future releases.

Commits

  • 347e27e12: Introduce system buckets; reserve leading-dot names
  • c4ee603ed: First system bucket: ais://.sys-inventory
  • 2813f2f1d: Add system-bucket docs; clarify NBI and bucket naming

IPv6 Networking

IPv6 support is now available across all three logical HTTP networks and the transport streaming layer.

It is enabled cluster-wide via the net.use_ipv6 configuration knob. In production, nodes are typically configured with explicit addresses. When addresses are not configured, each node attempts to discover usable IPv6 addresses and, if none are available, falls back to IPv4. Either way, the effective IP family is applied consistently for the lifetime of the process.

Setting net.use_ipv6 in the cluster configuration is a cluster-wide address-family preference, not a dual-stack mode. Once the effective family is determined at startup, AIS avoids dual-stack dialing entirely, with no fallback probing on each connection. This keeps connection setup predictable and eliminates the latency overhead that dual-stack resolution can introduce at scale.
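
The "decide once, then never mix families" behavior can be sketched as follows. This is an illustration only: `effective_family` is a hypothetical helper, and the filtering of loopback and link-local addresses is an assumption of the example, not a statement about which addresses AIS considers usable.

```python
import ipaddress

def effective_family(use_ipv6, local_addrs):
    """Pick one address family for the whole process lifetime.

    use_ipv6: the cluster-wide preference (models net.use_ipv6).
    local_addrs: candidate local addresses as strings.
    Returns (family, usable_addresses).
    """
    parsed = [ipaddress.ip_address(a) for a in local_addrs]
    # Assumption: loopback and link-local addresses are not usable.
    v6 = [str(a) for a in parsed
          if a.version == 6 and not a.is_loopback and not a.is_link_local]
    v4 = [str(a) for a in parsed if a.version == 4 and not a.is_loopback]
    if use_ipv6 and v6:
        return "ipv6", v6
    # Preference is IPv4, or no usable IPv6 address was found: fall back.
    return "ipv4", v4
```

The result would be computed once at startup and handed to every listener and client, so that no code path makes its own per-connection choice.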

Under the hood, this required moving address-family resolution earlier in startup, before listeners and clients are fully brought up. That keeps listeners, intra-cluster HTTP clients, and related transport paths aligned with the same runtime networking choice, and avoids mixed-family failures such as binding on IPv4 while still attempting IPv6 connections elsewhere. This work also extends local playground and CI coverage, and is documented in docs/networking.md.

A reminder that AIS separates traffic across three logical networks: a user-facing public network, and two intra-cluster networks for control and data. Each can be placed on a dedicated NIC or VLAN. The most impactful use is isolating the intra-cluster data network - which carries global rebalance, get-batch, and other bulk data movement - from the public network serving foreground I/O. In fact, on loaded clusters, there is also strong motivation to isolate the latency-sensitive (small-bandwidth) control plane from both of the above. Logical network separation is a designed-in capability that pays off most at high drive counts and high utilization, and is worth configuring explicitly rather than relying on shared bandwidth of 100GbE links.

Commits

  • 3a4e3ae49: Initial IPv6 support; config and primary networking path
  • 1b77779aa: IPv6 fallback logic; Host2IP dialability filtering
  • f3630bc68: Local playground IPv6 support; add AIS_USE_IPv6
  • fa5cb4ca3: Apply effective IP family consistently across listeners and clients
  • dd8f6c5e7: Avoid dual-stack dialing when effective family is IPv4
  • 3f7f1418f: IPv4 fallback for configured IPv6 preference

Multi-Part Download (MPD)

Multi-part download (MPD), also referred to as parallel download, sees substantial improvements in 4.3 across the Python SDK, Go client APIs, and aisloader.

For a detailed walkthrough of the architecture, benchmarks, and PyTorch integration,
see the companion blog post: [Parallel Download: 9x Lower Latency for Large-Object Reads](https://aistore.nvidia.com/blog/2026/03/25...

4.2

15 Jan 17:01

AIStore 4.2 focuses on correctness and reliability, authentication and observability, API modernization, and operational fixes across backends and tooling.

The resilver subsystem was substantially rewritten, introducing explicit preemption on mountpath events and full support for chunked object relocation.

Security enhancements include Prometheus metrics for AuthN, persistent RSA key pairs, OIDC-compatible discovery and JWKS endpoints.

New APIs replace legacy polling with explicit condition-based waiting for batch jobs and introduce a chunk-aware HEAD(object), enabling more scalable job monitoring and efficient large-object access.

List-objects implementation was corrected for non-recursive walks on AIS buckets, while the Python SDK and CLI add faster, chunk-aware download paths with parallelism and progress reporting.

Additional improvements and fixes span: cloud bucket namespaces, multipart retry behavior, backend interoperability, and premature global-rebalance completion reporting.

In particular, 4.2 adds support for namespace-scoped cloud buckets (e.g., s3://#prod/data and s3://#dev/data). This enables multi-tenant scenarios where different users or accounts access same-named buckets in different cloud accounts via their respective profiles and/or endpoints.
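
As a rough illustration of the naming scheme, the sketch below splits such a URI into provider, namespace, and bucket. The helper `parse_bucket_uri` is hypothetical and handles only the two forms shown here (with and without a `#namespace` component); it is not the AIS parser.

```python
def parse_bucket_uri(uri):
    """Split 'provider://[#namespace/]bucket' into its components."""
    provider, sep, rest = uri.partition("://")
    if not sep or not provider:
        raise ValueError(f"missing provider in {uri!r}")
    namespace = ""
    if rest.startswith("#"):
        # '#prod/data' -> namespace 'prod', remainder 'data'
        namespace, _, rest = rest[1:].partition("/")
    bucket = rest.partition("/")[0]
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return {"provider": provider, "namespace": namespace, "bucket": bucket}
```

With this split, s3://#prod/data and s3://#dev/data resolve to the same bucket name under two different namespaces.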

AIStore 4.2 maintains full backward compatibility with v4.1 and earlier releases. Overall, this release improves the system's availability in the presence of disk faults, improves observability and correctness under load, and modernizes long-standing APIs.


Table of Contents

  1. Resilver
  2. Authentication and Observability
  3. New APIs
  4. List Objects: Non-Recursive Walks
  5. Multipart transfers (downloads, uploads, and backend interoperability)
  6. Global Rebalance
  7. Filesystem Health Checker (FSHC)
  8. ETag and Last-Modified Normalization
  9. Python SDK
  10. CLI
  11. Documentation
  12. Build and CI
  13. Miscellaneous fixes across subsystems

Resilver

Resilver is AIStore’s node-local counterpart to global rebalance: it redistributes objects across a target’s mountpaths to restore correct placement and redundancy after volume changes (attach, detach, enable, disable).

Version 4.2 introduces a major rewrite of the resilver xaction for correctness and reliability. The previous implementation relied on retry-based copying loops; the new implementation uses deterministic copy (object replica) selection and explicit lifecycle management.

Resilver is a single-target operation: cluster-wide execution is now disallowed to prevent cross-target interference; attempts to start resilver without a target ID are rejected by both the CLI and by AIStore itself (i.e., calling the API directly without a target ID will also fail).

Mountpath event-triggered resilvers remain internal and continue to register with IC automatically (the events are: enable, disable, attach, and detach).

Improvements include:

  • Preemption on mountpath events - disable/detach/attach/enable operations now abort any running resilver and restart appropriately; handles back-to-back mountpath events without data loss
  • Chunked object relocation - step-wise relocation with rollback and cleanup on error; validates chunk placement
  • Mountpath jogger lifecycle - filesystem walks terminate when the parent xaction aborts
  • Concurrent access fixes across shared resilver state
  • Runtime progress visibility - live counters (visited objects, active workers) via 'ais show job' CLI
  • Deterministic primary copy selection - to eliminate contention between concurrent goroutines

New documentation: docs/resilver.md

Commit Highlights

  • 805f9ab93: Rewrite copy-recovering path; deterministic primary copy selection
  • 48153ca6d: Preempt running xaction upon mountpath events; add tests
  • 55c890fdd: Relocate chunked objects with rollback
  • 140c6e432: Stop joggers; revise concurrent access to shared state
  • d2a56a220: Add stress tests; consolidate resilver tests
  • 1a53f10e9: Revise mountpath jogger; add walk-stopped sentinel
  • 96213840f: Wire walks to parent xaction abort

Authentication and Observability

Building on v4.1 authentication improvements, this release adds OIDC-compatible discovery and JWKS endpoints, persistent RSA key pairs, and production-grade observability for AuthN.

AuthN now supports both HMAC-SHA256 and RSA (RS256) token signing. When no HMAC secret is provided via config or AIS_AUTH_SECRET_KEY, AuthN will initialize and persist an RSA keypair on disk and use it to issue RS256-signed JWTs.

From the operational perspective, the important changes include:

  • OIDC discovery + JWKS endpoints: AuthN now serves /.well-known/openid-configuration and a public JWKS endpoint for RS256 verification; JWKS responses include cache control based on token expiry.
  • Cluster validation handshake: based on either the HMAC secret checksum or the RSA public key, depending on the configured signing method.
  • Persistent RSA key pairs for AuthN - RSA private key is loaded from disk if present; otherwise generated once and persisted (key rotation not yet implemented)
  • Prometheus metrics for OIDC/JWKS: counters for invalid iss/kid, plus latency histograms for issuer discovery and key fetches.
  • Improved logging and validation - clearer token and ACL failure diagnostics; stricter permission checks (including admin and bucket-admin)

And separately, CLI:

  • ais authn command, to inspect OIDC configuration and display RSA public keys
  • ais authn show oidc and ais authn show jwks - to display discovery and JWKS output (JSON or table)

Commit Highlights

  • b930247cc: Add metrics for total counts and JWKS caching
  • 49733a2d6: Refactor access check; add initial Prometheus metrics
  • 4253d7bde: Persistent RSA key pair for AuthN
  • 2fb333675: Add RSA signing and validation to AuthN service
  • 26e4d5098: Improve logging on token/ACL failures
  • 48b830470: CLI: view OIDC config and show RSA public key
  • df8f9e764: Fix CheckPermissions to validate all permission types

New APIs

3.1 Xaction v2

Xaction (eXtended action) is AIStore’s abstraction for asynchronous batch jobs. All xactions expose a uniform API and CLI for starting, stopping, waiting, and reporting both generic and job-specific statistics.

Version 4.2 introduces explicit separation between IC-notifying and non-IC xactions, replacing legacy polling with explicit condition-based waiting and formalizing two observation modes:

  • IC-based status observation for xactions that actively report progress via IC
  • Snapshot-based observation for xactions that do not

For IC-notifying xactions, this avoids polling every target and makes waiting scale predictably even in large clusters. Snapshot-based xactions continue to use explicit snapshot inspection.

Background: IC (Information Center) runs on three AIS gateways (one primary and
two random). Targets asynchronously notify IC of xaction progress, eliminating
per-target polling.

Observation APIs

| Observation type | API | Semantics |
|---|---|---|
| Status-based (IC) | GetStatus | Return current status as reported by IC |
| Status-based (IC) | WaitForStatus | Block until a condition is satisfied using IC-reported status |
| Snapshot-based | GetSnaps | Fetch current xaction snapshots |
| Snapshot-based | WaitForSnaps | Wait until a snapshot-based condition is satisfied |
| Snapshot-based | WaitForSnapsStarted | Wait until at least one matching xaction is observed |
| Snapshot-based | WaitForSnapsIdle | Wait until snapshots become empty (xaction quiescence) |

Built-in Conditions

Conditions are expressed as xact.ArgsMsg methods and apply to waiting APIs:

| Condition | Meaning |
|---|---|
| Finished() | Xaction reached a terminal state (finished or aborted) |
| NotRunning() | No matching xaction is currently running |
| Started() | At least one matching xaction has been observed |
| Idle() | Consecutive empty snapshots observed |
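
The general shape of condition-based waiting can be sketched in a few lines. This is a language-neutral illustration written in Python, not the Go client API: `wait_for`, `finished`, and the `{"state": ...}` status dictionary are all hypothetical names, and the status source is any callable supplied by the caller.

```python
import time

def wait_for(get_status, condition, timeout=5.0, interval=0.01):
    """Block until condition(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if condition(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not satisfied in time")
        time.sleep(interval)

def finished(status):
    # Terminal state: finished or aborted.
    return status["state"] in ("finished", "aborted")
```

The point of the design is that the caller names the condition explicitly instead of hand-rolling a loop around a status call; where the status comes from (IC or snapshots) is a separate choice.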

Conditions that req...

4.1

05 Dec 23:10

AIStore 4.1 delivers upgrades across retrieval, security, and cluster operation. The GetBatch API is significantly expanded for ML training workloads, with client-side streaming, improved resilience and error handling under resource shortages. Authentication is redesigned with OIDC support, structured JWT validation, and cluster-key HMAC signing for HTTP redirects. This release also adds the rechunk job for converting datasets between monolithic and chunked layouts, unifies multipart-upload behavior across all cloud backends, and enhances the blob downloader with load-aware throttling. The Python SDK advances to v1.18 with a redesigned Batch API and revised timeout configuration. Configuration validation is strengthened throughout, including automatic migration from v4.0 auth settings.

This release arrives with over 200 commits since v4.0 and maintains backward compatibility, supporting rolling upgrades.

Table of Contents

  1. GetBatch: Distributed Multi-Object Retrieval
  2. Authentication and Security
  3. Chunked Objects
  4. Blob Downloader
  5. Rechunk Job
  6. Unified Load and Throttling
  7. Transport Layer
  8. Multipart Upload
  9. Python SDK
  10. S3 Compatibility
  11. Build System and Tooling
  12. Xaction Lifecycle
  13. ETL and Transform Pipeline
  14. Observability
  15. Configuration Changes
  16. Tools: aisloader

GetBatch: Distributed Multi-Object Retrieval

The GetBatch workflow now has a robust implementation across the cluster. Retrieval is streaming-oriented, supports multi-bucket batches, and includes tunable soft-error handling. The request path incorporates load-based throttling and may return HTTP 429 ("too many requests") when the system is under severe pressure. Memory and disk pressure are taken into account, and connection resets are handled transparently.

Configuration is available via a new "get_batch" section:

{
  "max_wait": "30s",           // Wait time for remote targets (range: 1s-1m)
  "warmup_workers": 2,         // Pagecache read-ahead workers (-1=disabled, 0-10)
  "max_soft_errs": 6           // Recoverable error limit per request
}
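
For readers who generate this section programmatically, the ranges quoted in the comments above can be checked with a small validator. This is a sketch under stated assumptions: `parse_seconds` and `validate_get_batch` are hypothetical helpers, the duration parser only understands a single s/m/h suffix, and the non-negative check on max_soft_errs is an assumption because no range is given for it.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_seconds(text):
    """Parse a simple duration such as "30s" or "1m" into seconds."""
    return int(text[:-1]) * UNITS[text[-1]]

def validate_get_batch(conf):
    """Return a list of problems found in a get_batch section."""
    errors = []
    if not 1 <= parse_seconds(conf["max_wait"]) <= 60:
        errors.append("max_wait must be within 1s-1m")
    workers = conf["warmup_workers"]
    if workers != -1 and not 0 <= workers <= 10:
        errors.append("warmup_workers must be -1 or within 0-10")
    if conf["max_soft_errs"] < 0:  # assumption: no range is documented
        errors.append("max_soft_errs must be non-negative")
    return errors
```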

Observability has improved through consolidated counters, Prometheus metrics, and clearer status reporting.
Client and tooling updates include a new Batch API in the Python SDK, extended aisloader support, and ansible composer playbooks for distributed benchmarks.

Reference: https://github.com/NVIDIA/aistore/blob/main/docs/get_batch.md

Commit Highlights

  • 5e3382dfa: Refine throttling under memory/disk pressure
  • d093b5c0f: Add Prometheus metrics for GetBatch
  • e71ad3e3c: Consolidate statistics and add counters
  • 2ab4c289a: Handle stream-breakages (ErrSBR); per-request polling and cleanup
  • 1c8dc4b29: Recover from connection drops/resets via SharedDM
  • 9316804d5: Add client-side streaming GetBatch API (breaking change)
  • 174931299: Avoid aborting x-moss; rename internal intra-cluster headers

Authentication and Security

AIS v4.1 introduces a standardized JWT validation model, reorganized configuration for external authentication, and an expanded cluster-key mechanism for securing intra-cluster redirects. Together, these changes provide clearer semantics, better interoperability with third-party identity providers, and more uniform behavior across proxies and targets.

JWT Validation Model and Token Requirements

AIS uses JWTs to both authenticate and authorize requests. Version 4.1 formalizes this process and documents the complete validation flow in auth_validation.md.

Tokens may be issued by the first-party AuthN service or by compatible third-party identity providers (Keycloak, Auth0, custom OAuth services). AIS makes authorization decisions directly from JWT claims; no external role lookups are performed during request execution.

When auth.enabled=true is set in the cluster configuration, proxies validate tokens before routing requests to targets; targets verify redirect signatures when cluster-key signing is enabled.

Token Requirements

AIS accepts tokens signed with supported HMAC or RSA algorithms (HS256/384/512, RS256/384/512).
All tokens must include the standard sub and exp claims; aud and iss are validated when required by configuration.

AIS also recognizes several AIS-specific claims:

  • admin: Full administrative access; overrides all other claims
  • clusters: Cluster-scoped permissions (specific cluster UUID or wildcard)
  • buckets: Bucket-scoped permissions tied to individual buckets within a cluster

Cluster and bucket permissions use the access-flag bitmask defined in api/apc/access.go.

Signature Verification: Static or OIDC

AIS supports two mutually exclusive approaches for signature verification:

  1. Static verification (auth.signature)

    • HMAC (shared secret) or RSA public-key–based
    • Suitable for the AIS AuthN service or controlled token issuers
    • Verifies tokens using the configured secret or public key
  2. OIDC verification (auth.oidc)

    • Automatic discovery using /.well-known/openid-configuration
    • JWKS retrieval and caching with periodic refresh
    • Validates issuer (iss) against allowed_iss
    • Supports custom CA bundles for TLS verification

Both modes accept standard Authorization: Bearer <token> headers and AWS-compatible X-Amz-Security-Token.

Authentication Flow

  1. Extract token from Authorization or X-Amz-Security-Token.
  2. Validate signature via static credentials or OIDC discovery.
  3. Check standard claims (sub, exp, and optionally aud, iss).
  4. Evaluate AIS-specific claims (admin, clusters, buckets) to authorize the operation.
  5. If cluster-key signing is enabled, sign redirect URLs before forwarding to a target; targets verify signatures prior to execution.

This flow applies to all AIS APIs, including S3-compatible requests.
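
Steps 1 through 3 of the flow can be sketched for the HS256 case using only the standard library. This is an illustration of generic JWT handling, not AIS code: the function names are hypothetical, only HS256 is handled, aud/iss checks are omitted, and authorization from the admin, clusters, and buckets claims (step 4) is left to the caller.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(claims, secret):
    """Issue a token (used here only to exercise validate())."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def extract_token(headers):
    # Step 1: Authorization: Bearer <token>, else X-Amz-Security-Token.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return headers.get("X-Amz-Security-Token", "")

def validate(headers, secret, now=None):
    parts = extract_token(headers).split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unsupported algorithm")
    # Step 2: verify the signature with the static shared secret.
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    # Step 3: check the standard claims.
    claims = json.loads(b64url_decode(payload_b64))
    if "sub" not in claims or "exp" not in claims:
        raise PermissionError("missing sub or exp")
    if claims["exp"] <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    return claims
```

Pinning the accepted algorithm before verifying the signature keeps a token from selecting its own verification method.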

Configuration Changes and Compatibility (v4.0 => v4.1)

The authentication configuration has been reorganized for clarity in v4.1, but the previous format remains fully supported:

{
  "auth": {
    "enabled": true,
    "secret": "your-hmac-secret"
  }
}

Version 4.1 introduces explicit sections for signature verification, required claims, and OIDC issuer configuration:

{
  "auth": {
    "enabled": true,
    "signature": {
      "key": "your-key",
      "method": "HS256"  // or RS256, RS384, RS512
    },
    "required_claims": {
      "aud": ["your-audience"]
    },
    "oidc": {
      "allowed_iss": ["https://your-issuer.com"],
      "issuer_ca_bundle": "/path/to/ca.pem"
    }
  }
}

OIDC handling includes JWKS caching, issuer validation, and optional CA bundles.
Token-cache sharding reduces lock contention under heavy concurrency.

See the JWT Validation Model and Token Requirements subsection above for validation flow and claim semantics.

Cluster-Key Authentication

Cluster-key authentication provides HMAC-based signing for internal redirect URLs.
It is independent from user JWT authentication and ensures that targets accept only authenticated, correctly routed internal redirects.

{
  "auth": {
    "cluster_key": {
      "enabled": true,
      "ttl": "24h",              // 0 = never expire, min: 1h
      "nonce_window": "1m",      // Clock-skew tolerance (max: 10m)
      "rotation_grace": "1m"     // Accept old+new key during rotation (max: 1h)
    }
  }
}

When enabled, the primary proxy generates a versioned secret and distributes it via metasync.
Proxies sign redirect URLs after validating the caller’s token; targets verify the signature before performing redirected operations.
The mechanism enforces correct routing, provides defense-in-depth against forged redirect traffic, and integrates with timestamp and nonce validation.
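
A minimal sketch of HMAC-signed redirect URLs with a timestamp window and versioned keys follows. It is illustrative only: the query parameter names (kv, ts, nonce, sig), the hex encoding, and both function names are assumptions of the example, and replay tracking of nonces is omitted for brevity.

```python
import hashlib
import hmac
import secrets
import time
from urllib.parse import parse_qsl, urlencode, urlsplit

def sign_redirect(url, key, key_version, now=None):
    """Append key version, timestamp, nonce, and an HMAC-SHA256 signature."""
    ts = int(time.time() if now is None else now)
    sep = "&" if "?" in url else "?"
    unsigned = url + sep + urlencode(
        {"kv": key_version, "ts": ts, "nonce": secrets.token_hex(8)})
    sig = hmac.new(key, unsigned.encode(), hashlib.sha256).hexdigest()
    return f"{unsigned}&sig={sig}"

def verify_redirect(url, keys, window=60, now=None):
    """keys maps version -> secret, so old and new keys can coexist."""
    unsigned, _, sig = url.rpartition("&sig=")
    params = dict(parse_qsl(urlsplit(unsigned).query))
    if not params.get("kv", "").isdigit() or not params.get("ts", "").isdigit():
        return False
    key = keys.get(int(params["kv"]))
    if key is None:
        return False  # unknown or retired key version
    expected = hmac.new(key, unsigned.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False  # URL was altered or signed with a different key
    current = time.time() if now is None else now
    return abs(current - int(params["ts"])) <= window
```

Keeping several key versions in `keys` is how this sketch models the rotation grace period: a target can accept both the old and the new key until the old one is dropped from the map.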

Commit Highlights

  • 8da7e0ef3: Sign and verify redirect URLs (part six); DPQ
  • 5b61f2571: Shared-secret handling; config sanitization and public clone
  • a2f1b5fbd: Follow-up tests for auth config validation
  • d304febdf: Refactor JWKS cache to support dynamic issuer registration
  • 3d11b1ee1: Enable S3 JWT auth as automatic fallback
  • c0b5aee40: Optimize locking for token validation and revocation

Chunked Objects

The chunked-object subsystem adds a new hard limit on maximum monolithic object size. This prevents ingestion of extremely large single-object payloads that exceed the cluster’s cap...

4.0

07 Oct 20:20

AIStore 4.0 is a major release that introduces a v2 object-metadata format and a chunked object representation (objects stored and managed as multiple chunks). This release also adds native multipart upload and a new GetBatch API for high-throughput retrieval of batches (objects and/or archived files).

In-cluster ETL has been extended with ETL pipelines, allowing users to chain transformations without intermediate buckets. Observability in 4.0 consolidates on Prometheus (as the sole monitoring backend) and adds disk-level capacity alerts.

All subsystems, extensions, and modules were updated to support the new functionality. The CLI adds a cluster dashboard, improved feature-flag management, and numerous usability improvements. Configuration updates include a new chunks section and additional knobs to support clusters with hundreds of millions of objects and to improve runtime throttling.

This release arrives with nearly 300 commits since the previous 3.31 and maintains compatibility with the previous version, supporting rolling upgrades.


Object Metadata Version 2

For the first time since AIStore's inception in 2018, the on-disk object metadata format has been upgraded.

Metadata v2 is now the default for all new writes, while v1 remains fully supported for backward compatibility. The system reads both versions seamlessly.

What's New

The upgrade introduces persistent bucket identity and durable flags. Each object now stores its bucket ID (BID) alongside metadata. On load, AIStore checks the stored BID against the current bucket's BID from BMD (bucket metadata). A mismatch marks the object as defunct and evicts it from the cache, enforcing strong referential integrity between every object and the exact bucket generation it was written to.

The previous format had limited room for new features. Version 2 adds a dedicated 8-byte field for future capabilities (e.g., storage-class, compression, encryption, write-back). The system now returns explicit, typed errors (for example, ErrLmetaCorrupted) when metadata is damaged or unparseable, improving troubleshooting and debuggability.

Legacy flags (filename-too-long, lom-is-chunked) keep their original on-disk placement for backward compatibility, but we no longer carve bits out of the BID for new features.

The integrity model is now stronger: BID and flags are verified on load (before any object access). If the bucket is missing or has a different BID, loads fail with explicit errors (ErrBckNotFound, ErrObjDefunct). From the user's perspective, a defunct object effectively disappears, though it remains on disk until the next space-cleanup.
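
The load-time check described above amounts to a comparison against the current bucket metadata. The sketch below models it with plain dictionaries; the exception classes borrow the error names from the text, while `load_object` and the dictionary layout are hypothetical and do not reflect the on-disk format.

```python
class ErrBckNotFound(Exception):
    pass

class ErrObjDefunct(Exception):
    pass

def load_object(meta, bmd):
    """Verify the stored bucket ID before any access to the object.

    meta: {'bucket': name, 'bid': bucket ID stored with the object}
    bmd:  {bucket name: current bucket ID} (stand-in for bucket metadata)
    """
    current = bmd.get(meta["bucket"])
    if current is None:
        raise ErrBckNotFound(meta["bucket"])
    if current != meta["bid"]:
        # Same name, different bucket generation: the object is defunct.
        raise ErrObjDefunct(f"stored BID {meta['bid']} != current {current}")
    return meta
```

Destroying and re-creating a bucket under the same name yields a new BID, so objects written to the earlier generation fail this check even though their names still match.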

The serialized layout now packs BID and flags alongside checksum and other fields. The (future-proof) format remains compact.

No manual migration is required. Clusters continue to read v1 metadata while all new writes automatically persist v2.

Commit Highlights

  • ae9ed1d2: Introduce LOM (Local Object Metadata) v2 with BID persistence and durable flags
  • cdd3beef: Enforce defunct object detection when BID mismatches current bucket
  • b543dab7: Extend serialized layout with new packed fields (LID, flags)
  • 12ac88fe: Update error handling and unpack() logic for v2 metadata

Another closely related and notable on-disk metadata change is chunked objects and chunk manifests - see Chunked Objects.


Chunked Objects

Version 4.0 introduces chunked objects as a new persistent storage format. The previous limitation that required storing all objects (small or large) as single monoliths has been removed. For chunked objects, AIStore maintains a chunk manifest that describes their content: chunk sizes, checksums, and ordering. The manifest itself is compressed and protected by a checksum.

Each in-progress multipart upload creates a uniquely identified partial manifest, tied to the bucket, object name, and upload ID.

Multiple uploads can proceed in parallel, with each upload identified by its own manifest ID. Partial manifests are persistent, and the cluster automatically checkpoints them after every N chunks (configured via the "checkpoint_every" setting below):

{
  "chunks": {
    "objsize_limit": 0,
    "chunk_size": "1GiB",
    "checkpoint_every": 4
  }
}

The full set of chunk-related options is described in the Configuration Changes section.

At any given time, however, only one completed manifest exists for an object, serving as the current object version.

There is no limit — hard or soft — on the number of chunks. Chunks are sequentially numbered and distributed across mountpaths, while the manifest includes fields such as MD5 and ETag for S3 compatibility and reserved flags for future extensions like compression or encryption.

Interrupted or abandoned uploads are also cleaned up automatically. Any orphaned chunks or manifests are discovered and removed by the space-cleanup job, which runs on its own when a mountpath runs out of space, but can just as easily be invoked by an admin at any time:

$ ais space-cleanup --help

For users and applications, chunked objects are fully transparent. Chunked and monolithic formats coexist side by side, and the same is true across both the S3-compatible and native APIs. Standard GETs, range reads, and archive reads all continue to work out of the box.

Note: Throughout this document, we use the terms "archive" and "shard" interchangeably to refer to packaged collections of files (TAR, TGZ, ZIP, etc.).

Commit Highlights

  • c4133361: Introduce object chunks and chunk manifest (part one); add initial unit test
  • 61a0ba6b: LZ4-compress manifest with XXHash64 trailer; include per-chunk path; set max manifest size
  • 05174b66: On-disk structure: manifest flags, per-chunk flags; move to cos.Cksum; unit test pass
  • 2c2afc1a: Deterministic serialization order; enforce uint16 limits; reset completed on error
  • 54a03394: Refactor "none" checksum handling; cap per-entry metadata size
  • 9eaaba8d: Add StoreCompleted/StorePartial; atomic finalize via CompleteUfest; reader/reader-at; checksums/ETag helpers
  • a2866fc6: Manifest LoadPartial; HRW/ordering fixes; expand unit tests and scripted tests
  • 445f6e35: Revamp content resolution; register ut (manifest) CT; rename chunk CT to ch; remove hardcoded limits
  • 9bad95b4: Transition manifest from xattr to a content type
  • efbacca6: Refine manifest loading paths (completed vs partial)
  • 984b7f25: S3 multipart integration with chunk manifest

Native API: Multipart Upload (unified implementation)

AIStore 4.0 extends its native API to support multipart uploads (MPU) — a capability previously available only through the S3-compatible interface. The native flow now mirrors the (de-facto) S3 standard: initiate => upload-part => complete (or abort).

The implementation unifies MPU handling across both native and S3 APIs, making it a core storage feature rather than a protocol-specific one.

Each upload gets a unique ID; multiple uploads can run in parallel against the same object. As parts arrive, AIStore records them as chunks and keeps a partial manifest on disk (checkpointed), so long uploads survive restarts without losing state.

Completion doesn't stitch bytes into one monolith anymore — it finalizes the manifest and promotes the object to the new chunked format (see Chunked Objects).

Completion rules follow S3 semantics with one clarification: AIStore requires all parts from 1 to N (no gaps) to finalize.

One practical note from testing: some providers enforce a minimum part size (5MiB for S3 with the last part excluded).

Chunks (a.k.a., "parts") may arrive unordered, duplicates are tolerated, and the most recent copy of a given partNumber (chunk number) wins. Partial completion is rejected.
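
These rules (any arrival order, duplicates tolerated, latest copy wins, no gaps at completion) can be modeled with a small in-memory tracker. This is a sketch only: the `Upload` class and its methods are hypothetical, and passing the expected part count to `complete` is an assumption of the example.

```python
class Upload:
    """Tracks the parts of one in-progress multipart upload."""

    def __init__(self):
        self.parts = {}  # part number -> bytes

    def put_part(self, number, data):
        if number < 1:
            raise ValueError("part numbers start at 1")
        # Duplicates are tolerated: the most recent copy replaces the old one.
        self.parts[number] = data

    def complete(self, count):
        """Finalize only if every part from 1 to count is present."""
        missing = [n for n in range(1, count + 1) if n not in self.parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        return [self.parts[n] for n in range(1, count + 1)]
```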

For S3 and compatible Cloud backends, the whole-object ETag is derived from the per-chunk MD5s, and full-object checksums are computed at finalize.
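
For reference, the widely used S3 multipart convention computes the whole-object ETag as the MD5 of the concatenated binary per-part MD5 digests, followed by a dash and the part count. The sketch below shows that convention; whether AIS matches it byte for byte is an assumption here, and `multipart_etag` is a hypothetical helper.

```python
import hashlib

def multipart_etag(chunks):
    """MD5 over the concatenated per-chunk MD5 digests, plus '-<count>'."""
    digests = b"".join(hashlib.md5(chunk).digest() for chunk in chunks)
    return f"{hashlib.md5(digests).hexdigest()}-{len(chunks)}"
```

Note that the result differs from the MD5 of the full object content, which is why a separate full-object checksum is still useful.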

Range and archival reads behave as expected after completion (the reader streams across chunks).
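
Serving a byte range from a chunked object comes down to translating the range into per-chunk reads. The following sketch shows one way to do that from a list of chunk sizes; `plan_range` is a hypothetical helper, not the AIS reader.

```python
import bisect
from itertools import accumulate

def plan_range(chunk_sizes, offset, length):
    """Map a byte range onto (chunk_index, offset_in_chunk, nbytes) reads."""
    ends = list(accumulate(chunk_sizes))  # exclusive end offset of each chunk
    total = ends[-1] if ends else 0
    end = min(offset + length, total)     # clamp the range to the object size
    reads = []
    i = bisect.bisect_right(ends, offset)  # first chunk containing `offset`
    while offset < end:
        chunk_start = ends[i] - chunk_sizes[i]
        nbytes = min(ends[i], end) - offset
        reads.append((i, offset - chunk_start, nbytes))
        offset += nbytes
        i += 1
    return reads
```

For example, with chunks of 10, 10, and 5 bytes, a 7-byte read at offset 8 touches the tail of the first chunk and the head of the second.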

Backends are handled unifor...

3.31

25 Jul 16:39

Changelog

Core

  • 63367e4: Do not reverse-proxy to self
  • aeac54b: Reverse traffic now uses intra-cluster control net (excludes S3)
  • 803fc4d: Remove legacy CONNECT tunnel

Global Rebalance

  • 1b310f1: Add operation-scope --latest and --sync
  • 62793b1: Limited-scope fix for empty buckets
  • cf31b87: Introduce three-way version tie-breakers: local/sender/cloud

S3 multipart upload

  • d830bbf: Add (and check for) NoSuchUpload error
  • 7e13f86: Amend multipart upload error handling

CLI

  • 45d625c, 59c9bd8: New ais show dashboard command: at-a-glance cluster dashboard — node counts, capacity, performance, health, and version info
  • 19a83de: Fix ais show remote-cluster when remote is a different HTTP(s)

See also: ais cluster command

ETL

  • b98e109: Add support for ETLArgs in single-object transform flow

See also: Single-Object Copy/Transform Capability

Deployment & Monitoring

  • 85ad0a4: Reduce recommended file descriptor limits; update docs
  • 59ec6b9: Check and periodically log FD table usage

See also: Maximum number of open files

Refactoring & Lint

  • bea853c: Upgrade golangci-lint version
  • 1b67381: Fix noctx linter errors
  • 3373ae3: refactor ErrBucketNotFound

Documentation

  • 49cac6a: CLI: add inline help; update supported subcommands

Full changelog: git log --oneline v1.3.30...v1.3.31 (≈ 20 commits).

3.30

21 Jul 21:43

This AIStore release, version 3.30, arrives two months after the previous release with a cycle spanning over 300 commits. As always, 3.30 maintains compatibility with the previous version and supports rolling upgrades.

This release adds the capability to handle batch workloads. The idea is to serve hundreds or thousands of objects (or archived files) in a single serialized streaming (or multipart) response.

AIStore 3.30 delivers performance improvements across multiple subsystems, with particular focus on I/O efficiency, connection management, and ETL operations. The updated and restructured ETL subsystem now features direct filesystem access (by ETL containers), eliminates the WebSocket communicator’s io.Pipe bottlenecks, and enables the container to perform direct PUT operations. It also simplifies configuration using minimal runtime specs in place of full Kubernetes Pod YAML.

Python SDK 1.15 introduces high-performance batch processing with streaming decode for large archives and powerful new ETL capabilities. This breaking release removes the deprecated init_code ETL API while adding improved resilience with better retry logic.

For observability, Prometheus now exports disk write latency and pending I/O depth metrics, with automatic capacity refresh triggered by disk alerts. StatsD exporters, while still available, are now disabled by default as we transition to Prometheus and OpenTelemetry as first-class monitoring solutions.

For tooling, the CLI gains a new ml namespace with Lhotse CutSet helpers for ML pipelines. This CLI upgrade also delivers Hugging Face repository integration (including batched downloads) and multiple usability improvements.

Cloud backend enhancements include Oracle Cloud Infrastructure multipart upload support enabling S3 clients (boto3, s3cmd, AWS CLI, etc.) to perform multipart uploads against OCI backends without code changes, plus AWS configuration management improvements and related bug fixes.

New ais object cp and ais object etl commands (and the respective APIs) provide synchronous copy and transform operations without engaging asynchronous multi-object xactions (batch jobs).

Documentation updates include a complete ETL CLI (docs) rewrite, new operational guides for connection management and ML workflows, enhanced Python SDK documentation, and improved AWS backend configuration guidance.

Infrastructure improvements include automatic macOS/arm64 CLI binary builds for GitHub releases and upgrades to all open-source dependencies (except Kubernetes client libraries), bringing security patches and performance improvements across the codebase.

Table of Contents

  1. Batch Workflows
  2. ETL
  3. Performance & Scalability
  4. Observability
  5. CLI
  6. Python SDK 1.15
  7. Single Object Copy/Transform
  8. Cloud Backend Enhancements
  9. Bug and Security Fixes
  10. Build & CI
  11. Documentation
  12. Deprecations & Compatibility

Detailed changelog is available at this link.

1. Batch Workflows

AIStore 3.30 introduces the new GetBatch API. Instead of reading objects and files one at a time, you bundle any number of items (plain objects and/or archived files) into a single request. The cluster then streams back the entire batch as an ordered archive, eliminating multiple network round-trips. The specified data items can reside in-cluster or in remote (cloud) buckets.

The response may be streaming or multipart. In both cases the supported output formats include .tar, .tar.gz, .tar.lz4, and .zip.

The response always preserves the specified order, and in streaming mode it begins flowing immediately so you don’t have to wait for the entire archive to assemble. If you enable ‘continue on error,’ missing files won’t halt the request — instead, those items appear as zero-length files with a special prefix, and the transfer proceeds with the remaining data.
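The "continue on error" behavior can be illustrated from the client side. The sketch below is not AIStore code: it builds a small in-memory .tar that stands in for a batch response and then separates delivered items from missing ones. The marker string `MISSING_PREFIX` is an assumption (the notes above say only that missing items carry "a special prefix"), and the function names are hypothetical.

```python
import io
import tarfile

# Hypothetical marker: the release notes say only that missing items carry
# "a special prefix"; the exact string below is an assumption.
MISSING_PREFIX = "__404__/"


def build_demo_archive() -> bytes:
    """Create a small in-memory .tar standing in for a GetBatch response."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [
            ("bucket/a.wav", b"aaaa"),
            (MISSING_PREFIX + "bucket/b.wav", b""),  # missing item, zero length
            ("bucket/c.wav", b"cc"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def split_batch(raw: bytes):
    """Return (found, missing), each in the order the archive lists them."""
    found, missing = [], []
    # Mode "r|" reads the archive as a forward-only stream, similar to how a
    # streaming response would be consumed.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r|") as tar:
        for member in tar:
            if member.name.startswith(MISSING_PREFIX) and member.size == 0:
                missing.append(member.name[len(MISSING_PREFIX):])
            else:
                data = tar.extractfile(member).read()
                found.append((member.name, len(data)))
    return found, missing


found, missing = split_batch(build_demo_archive())
print(found)
print(missing)
```

Because the archive is read front to back, the order of `found` matches the order in which the items were written, which mirrors the ordering guarantee described above.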

Lhotse Integration

AIStore 3.30's first vertical GetBatch integration supports the Lhotse speech-processing toolkit.
You can now provide a Lhotse CutSet (cuts.jsonl, optionally .gz- or .lz4-compressed) to the CLI, and AIStore will assemble each cut's audio frames into training-ready serialized archives (.tar, .tar.gz, .tar.lz4, or .zip).

In your batch manifest, each entry can reference one of the following:

  • A complete object (bucket/audio.wav)
  • A file within an archive (shard.tar/images/003.jpg)
  • A time range in seconds (start_time,duration) from Lhotse cuts¹

This integration is intended for speech ML pipelines where audio files are often stored as compressed archives, training requires precise range extraction, and batch sizes can reach thousands of cuts.

AIStore's batch processing groups cuts by source file, minimizing redundant reads when multiple cuts reference the same audio file. Rather than reading byte ranges (which would require multiple I/O operations per file), the system downloads complete files once and performs cut extraction in-memory, delivering superior performance for typical speech training workloads.
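The grouping idea in the paragraph above can be sketched in a few lines. This is an illustration under stated assumptions, not AIStore's implementation: the cut records, the `read_whole_file` stand-in, and the tiny sample rate are all hypothetical, chosen so the arithmetic is easy to follow.

```python
from collections import defaultdict

# Hypothetical cut records: (source file, start_time in s, duration in s).
cuts = [
    ("audio/a.wav", 0.0, 1.0),
    ("audio/b.wav", 2.0, 0.5),
    ("audio/a.wav", 1.5, 0.5),
]

SAMPLE_RATE = 4  # samples per second; tiny so the arithmetic stays readable
reads = []       # records every simulated full-file read


def read_whole_file(name: str) -> list:
    """Stand-in for one full read: returns 3 seconds of fake samples."""
    reads.append(name)
    return list(range(3 * SAMPLE_RATE))


def extract_cuts(cuts):
    # Group cut indices by source so each file is read only once.
    by_source = defaultdict(list)
    for idx, (source, start, duration) in enumerate(cuts):
        by_source[source].append((idx, start, duration))

    results = [None] * len(cuts)
    for source, items in by_source.items():
        samples = read_whole_file(source)  # one read per distinct source
        for idx, start, duration in items:
            lo = int(start * SAMPLE_RATE)
            hi = lo + int(duration * SAMPLE_RATE)
            results[idx] = samples[lo:hi]  # in-memory slice, original order kept
    return results


segments = extract_cuts(cuts)
print(reads)
print([len(s) for s in segments])
```

Three cuts reference two distinct files, so only two reads are recorded, while the results stay in the order the cuts were listed.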

Further, large manifests can be automatically split using --batch-size and --output-template parameters, producing multiple equal-sized archives instead of one massive output.
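As a rough picture of how a batch size and an output template interact, the sketch below splits a manifest into fixed-size chunks and pairs each chunk with the next name expanded from a "{start..end}" range. It is not the CLI's implementation; `expand_template` and `plan_shards` are hypothetical helpers, and in this sketch the final shard simply holds whatever entries remain.

```python
import re


def expand_template(template: str):
    """Yield names from a single "{start..end}" range, keeping zero padding."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", template)
    if match is None:
        raise ValueError("template needs a {start..end} range")
    start, end = match.group(1), match.group(2)
    width = len(start)
    for i in range(int(start), int(end) + 1):
        yield template[:match.start()] + str(i).zfill(width) + template[match.end():]


def plan_shards(entries, batch_size, template):
    """Pair each run of batch_size entries with the next name from template."""
    names = expand_template(template)
    plan = []
    for offset in range(0, len(entries), batch_size):
        try:
            name = next(names)
        except StopIteration:
            raise ValueError("template range too small for this manifest") from None
        plan.append((name, entries[offset:offset + batch_size]))
    return plan


cuts = [f"cut-{i}" for i in range(7)]
plan = plan_shards(cuts, batch_size=3, template="shard-{001..999}.tar")
for name, chunk in plan:
    print(name, len(chunk))
```

With seven entries and a batch size of three, the plan contains three shards named shard-001.tar through shard-003.tar.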


¹ Note: Current implementation processes complete audio files and extracts cuts in-memory for optimal I/O efficiency. Byte-range reading support can be added upon request, though this would impact performance for workloads with multiple cuts per file.

CLI Examples

# Stream a cross-bucket batch directly to disk
ais ml get-batch output.zip --spec manifest.yaml --streaming

# Process Lhotse cuts into 1000-sample shards
ais ml lhotse-get-batch --cuts training.cuts.jsonl.lz4 \
                        --batch-size 1000 \
                        --output-template "shard-{001..999}.tar"

Python SDK Integration

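from aistore.sdk import Client
from aistore.sdk.batch import BatchRequest, BatchLoader

# Placeholder endpoint, bucket, and archive names; substitute your own
cluster_url = "http://localhost:8080"
obj = Client(cluster_url).bucket("my-bucket").object("shard-0001.tar")

# Build streaming batch request
req = BatchRequest(streaming=True, continue_on_err=True)
# Request one archived file from within the object
req.add_object_request(obj, archpath="img/0001.jpg", opaque=b"metadata")

# Execute with streaming decode
stream = BatchLoader(cluster_url).get_batch(
    req, return_raw=True, decode_as_stream=True
)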

Commit Highlights

  • 2f18344e: Complete CLI refactor for batch operations
  • 404d0011: Implement streaming path with memory optimization
  • 726da0d: Multi-batch generator with automatic chunking
  • 0affbd75: Ordered multi-node assembly protocol
  • f8ee6c2d: Shared stream pool implementation

2. ETL

AIStore 3.30 represents a major restructure of the ETL component that consumed the majority of this development cycle. The overhaul focused primarily on performance, introducing direct PUT operations and eliminating the io.Pipe bottlenecks (in the previous WebSocket-based implementation) that were limiting throughput at scale. This restructure required breaking changes to the ETL metadata format and removal of the deprecated init-code API, while also adding automatic filesystem access and a two-phase commit protocol for deployment reliability.

Performance-Focused Restructure

The core motivation for this restructure was addressing performance bottlenecks that became apparent under heavy production workloads. The previous ETL architecture suffered from sub-optimal data flows that created significant overhead for large-scale transformations.

Direct PUT Operations: ETL containers can now write transformed objects directly back to AIStore targets without intermediate hops or staging. This eliminates a full network round-trip and the associated serialization overhead, dramatically improving throughput for write-heavy transformations. Previously, transformed data had to flow back through the proxy layer, creating both latency and bandwidth bottlenecks.

WebSocket io.Pipe Elimination: The WebSocket communicator has been completely rewritten to remove the io.Pipe bottleneck that was causing blocking I/O operations. The new implementation writes directly to file handles instead of using goroutine-coordinated pipes, eliminating unnecessary buffering, memory allocation pressure, and synchronization overhead. This change alone reduces goroutine count by thousands for large ETL jobs.

Streamlined Transport Layer: The per-job transport mechanism now uses a single read-write loop rather than complex goroutine orchestration, reducing resource consumption and improving predictability under load. A one-million-object ETL run on a 20-node cluster now operates with a significantly lower memory footprint and less CPU overhead.

Breaking Changes

This restructure required two significant breaking changes that affect existing ETL workflows.

ETL Metadata Format: The metadata format has been updated to support the new performance architecture and deployment protocols. Clusters must achieve uniform version status before starting new ETL operations to ensure consistent behavior across all nodes during the transition period.

init_code API Removal: The deprecated init_code method for ETL initialization...
