
Antalya 26.3 Backport of #100645 - Parse record_count and size_bytes fields from iceberg manifest file #1776

Merged
zvonand merged 2 commits into antalya-26.3 from backports/antalya-26.3/100645 on May 18, 2026

@mkmkme (Collaborator) commented May 9, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Object information used for parsing data files in Iceberg now contains the number of rows in the file and the file size in bytes, both parsed from the manifest file (ClickHouse#100645 by @divanik).
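A minimal C++ sketch of the data flow this entry describes follows: the record_count and file_size_in_bytes values of a manifest data_file entry are copied onto the object information handed to the data-file reader. The names ManifestDataFileEntry, IcebergObjectInfo and makeObjectInfo are hypothetical, not ClickHouse's actual classes, and leaving a field empty when the manifest value is negative is an assumption of this sketch only.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for one data_file entry read from an Iceberg manifest.
// The Iceberg spec stores record_count and file_size_in_bytes as 64-bit longs.
struct ManifestDataFileEntry
{
    std::string file_path;
    std::optional<std::int64_t> record_count;
    std::optional<std::int64_t> file_size_in_bytes;
};

// Hypothetical object information passed on to the data-file reader.
struct IcebergObjectInfo
{
    std::string path;
    std::optional<std::uint64_t> record_count;       // number of rows in the data file
    std::optional<std::uint64_t> file_size_in_bytes; // size of the data file in bytes
};

// Copies the manifest statistics onto the object info.
// Assumption of this sketch: a negative value is treated as unknown (left empty).
IcebergObjectInfo makeObjectInfo(const ManifestDataFileEntry & entry)
{
    IcebergObjectInfo info;
    info.path = entry.file_path;
    if (entry.record_count && *entry.record_count >= 0)
        info.record_count = static_cast<std::uint64_t>(*entry.record_count);
    if (entry.file_size_in_bytes && *entry.file_size_in_bytes >= 0)
        info.file_size_in_bytes = static_cast<std::uint64_t>(*entry.file_size_in_bytes);
    return info;
}
```

Keeping both fields as std::optional lets a consumer tell "not provided" apart from a genuine zero.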

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

divanik and others added 2 commits May 9, 2026 13:53
…s_and_rows_count_to_iceberg_data_object

Parse record_count and size_bytes fields from iceberg manifest file
github-actions (Bot) commented May 9, 2026

Workflow [PR], commit [3cb64e5]

if (info.record_count.has_value())
    LOG_TEST(log, "Iceberg record_count for '{}': {}", object_info->getPath(), *info.record_count);
if (info.file_size_in_bytes.has_value())
    LOG_TEST(log, "Iceberg file_size_in_bytes for '{}': {}", object_info->getPath(), *info.file_size_in_bytes);
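The snippet above logs each statistic only when the corresponding std::optional holds a value. A self-contained sketch of that guard pattern follows; FileStats and statsLogLines are hypothetical names, and the messages are collected into a vector instead of going through ClickHouse's LOG_TEST macro, so the behavior can be checked without a logger.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirror of the two optional statistics used in the snippet.
struct FileStats
{
    std::optional<std::uint64_t> record_count;
    std::optional<std::uint64_t> file_size_in_bytes;
};

// Builds the messages the guarded LOG_TEST calls would emit.
// A field holding std::nullopt produces no line at all.
std::vector<std::string> statsLogLines(const std::string & path, const FileStats & info)
{
    std::vector<std::string> lines;
    if (info.record_count.has_value())
    {
        std::ostringstream out;
        out << "Iceberg record_count for '" << path << "': " << *info.record_count;
        lines.push_back(out.str());
    }
    if (info.file_size_in_bytes.has_value())
    {
        std::ostringstream out;
        out << "Iceberg file_size_in_bytes for '" << path << "': " << *info.file_size_in_bytes;
        lines.push_back(out.str());
    }
    return lines;
}
```

Checking has_value() before dereferencing matters: dereferencing an empty std::optional with operator* is undefined behavior.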


Am I right that writing to the log at 'test' level is the only place where the new data is used?

@alsugiliazova (Member)

Audit: PR #1776 — Antalya 26.3 Backport of #100645 — Parse record_count and size_bytes fields from iceberg manifest file

AI audit note: generated by AI (Cursor agent, audit-review skill).

Confirmed defects

No confirmed defects in reviewed scope.

@ianton-ru left a comment

LGTM, but I don't understand the value of this PR. It does a lot of work only to write two lines to the log at 'test' level. Or did I miss something?

@alsugiliazova (Member)

Verification: PR #1776

test_iceberg_file_stats_logging (storage s3 × format-version 1/2 × use_view False/True = 4 cases).

PR-added tests — all GREEN

4 parametrized cases × 3 integration jobs = 12 OK runs, 0 failures.

| Job | Cases run | Status |
|---|---|---|
| Integration tests (amd_asan, db disk, old analyzer, 4/6) | 4 | OK |
| Integration tests (amd_binary, 5/5) | 4 | OK |
| Integration tests (arm_binary, distributed plan, 2/4) | 4 | OK |

All four parametrizations pass on every job:

  • [s3-1-False], [s3-1-True]: format-version 1, without/with view
  • [s3-2-False], [s3-2-True]: format-version 2, without/with view

The new manifest-file stats path has clean positive coverage on both Iceberg spec versions and on plain-table / view-wrapped reads.
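The run counts above follow from a cross product: 1 storage × 2 format versions × 2 use_view values = 4 cases, and 4 cases × 3 integration jobs = 12 runs. The real test is a pytest parametrization; this C++ sketch only reproduces that arithmetic and the [storage-version-view] IDs listed above, and parametrizationIds and totalRuns are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rebuilds the case IDs of the test matrix: storage x format version x use_view.
std::vector<std::string> parametrizationIds()
{
    const std::vector<std::string> storages = {"s3"};
    const std::vector<int> format_versions = {1, 2};
    const std::vector<std::string> use_view_values = {"False", "True"};

    std::vector<std::string> ids;
    for (const auto & storage : storages)
        for (int version : format_versions)
            for (const auto & use_view : use_view_values)
                ids.push_back(storage + "-" + std::to_string(version) + "-" + use_view);
    return ids;
}

// Every integration job runs each case once, so total runs = cases x jobs.
std::size_t totalRuns(std::size_t jobs)
{
    return parametrizationIds().size() * jobs;
}
```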

Note: the new gtest gtest_datalake_table_state_serde is built into the unit-tests binary; the unit-tests job ran green in this CI rollup.

CI overview (head commit)

  • PR test workflow: 46 success / 50 skipped / 0 failure — fully GREEN at the PR test workflow level.
  • Regression workflow: 29 success / 67 skipped / 4 failure (chronic baseline).
  • One pending action_required job (queue/auth).

Test-level failures in DB

Zero. No test_status='FAIL' rows on this commit.

Regression-workflow failures (chronic baseline on antalya-26.3)

| Suite | Fails |
|---|---|
| Swarms (Aarch64 + Release) | 227 |
| Parquet (Aarch64 + Release) | 34 |
| S3Export partition (Aarch64 + Release) | 20 |
| S3Export part (Aarch64 + Release) | 16 |

Same fingerprint as sibling antalya-26.3 PRs (1783, 1775, 1773, 1772, 1771, 1770, 1769, …). No new failure modes.

Caveat — partial frontport

This PR lands on antalya-26.3 while companion features from antalya-26.1 are still being frontported in parallel. A final re-verification is recommended once the rest of the bundle lands.

Verdict

Safe to merge.

  • New integration test test_iceberg_file_stats_logging passes 100% (12/12 integration runs) across all 4 parametrizations and 3 integration jobs.
  • New gtest for datalake_table_state serde compiles and runs green.
  • Zero test-level FAIL rows on this head.
  • All remaining red checks are the recurring antalya-26.3 chronic regression baseline (Swarms / Parquet / S3Export), shared with sibling PRs.

alsugiliazova added the "verified" label (Approved for release) on May 15, 2026
zvonand merged commit 43315aa into antalya-26.3 on May 18, 2026
296 of 312 checks passed
subkanthi pushed a commit that referenced this pull request May 26, 2026
Antalya 26.3 Backport of ClickHouse#100645 - Parse record_count and size_bytes fields from iceberg manifest file

6 participants