[lake/iceberg] only enable column stats when scan filter exists#2842

Merged
luoyuxia merged 1 commit into
apache:mainfrom
zuston:iceberg
Mar 12, 2026
@zuston zuston commented Mar 11, 2026

Purpose

Linked issue: close #xxx

Brief change log

Tests

API and Format

Documentation

Copilot AI left a comment

Pull request overview

Optimizes Iceberg split planning in the lake source by avoiding column-stat collection unless it is needed for predicate pushdown, reducing planning overhead for unfiltered scans.

Changes:

  • Remove unconditional includeColumnStats() from TableScan creation.
  • Enable includeColumnStats() only when a scan filter is present.

Comment on lines +71 to 74
+ TableScan tableScan = table.newScan().useSnapshot(snapshotId);
  if (filter != null) {
-     tableScan = tableScan.filter(filter);
+     tableScan = tableScan.includeColumnStats().filter(filter);
  }

Copilot AI Mar 11, 2026

This change alters planning behavior by conditionally enabling includeColumnStats() only when filter != null, but there is no test asserting the new contract (stats absent when no filter; stats present when filter exists). Please add/extend an Iceberg source test to cover both paths so the intended performance optimization doesn’t regress across Iceberg version changes (e.g., assert presence/absence of file metrics like lowerBounds() on planned tasks for the same table with/without a filter).
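
The two paths named in this comment could be asserted along the following lines. This is only a minimal, self-contained sketch of the contract: `PlannedFile` and `plan` are hypothetical stand-ins (not Iceberg or Fluss classes), and bounds are simplified to `Long` values. A real test would plan splits against an actual Iceberg table with and without a filter and inspect the file metrics on the planned tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT Iceberg or Fluss classes.
class ColumnStatsContractSketch {

    /** A planned data file; lowerBounds is null when column stats were not requested. */
    static final class PlannedFile {
        final String path;
        final Map<Integer, Long> lowerBounds;

        PlannedFile(String path, Map<Integer, Long> lowerBounds) {
            this.path = path;
            this.lowerBounds = lowerBounds;
        }
    }

    /** Mirrors the conditional in the diff: stats are retained only when a filter is supplied. */
    static List<PlannedFile> plan(List<PlannedFile> files, Object filter) {
        boolean includeColumnStats = filter != null;
        List<PlannedFile> planned = new ArrayList<>();
        for (PlannedFile file : files) {
            planned.add(includeColumnStats ? file : new PlannedFile(file.path, null));
        }
        return planned;
    }

    public static void main(String[] args) {
        List<PlannedFile> files = List.of(new PlannedFile("a.parquet", Map.of(1, 10L)));

        // Unfiltered scan: no column stats on the planned files.
        if (plan(files, null).get(0).lowerBounds != null) {
            throw new AssertionError("expected no stats without a filter");
        }
        // Filtered scan: column stats are present.
        if (plan(files, "id > 5").get(0).lowerBounds == null) {
            throw new AssertionError("expected stats with a filter");
        }
        System.out.println("contract holds");
    }
}
```

Running the same input through both paths keeps the assertion focused on the one variable that changed, namely whether a filter was supplied.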

Comment on lines +71 to 72
+ TableScan tableScan = table.newScan().useSnapshot(snapshotId);
  if (filter != null) {

Copilot AI Mar 11, 2026

The PR description still contains the default template and does not state the purpose, linked issue, or tests run. Please update the PR description (e.g., link the issue being fixed and list the specific UT/IT that validate this change) so reviewers can verify intent and coverage.

beryllw commented Mar 11, 2026

Should we also check if this needs to be fixed?

@luoyuxia

> Should we also check if this needs to be fixed?

The code must keep includeColumnStats() because sortFileScanTask() at lines 122-138 directly reads f1.file().lowerBounds().get(sortFiledId) to sort files by the __offset column. Removing includeColumnStats() there would cause lowerBounds() to return null, resulting in an NPE.
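
The failure mode described here can be illustrated with a small, self-contained sketch. `FileStub` and `sortByLowerBound` are hypothetical stand-ins, not the actual Fluss or Iceberg code, and bounds are simplified to `Long` values keyed by field id. The sketch assumes, as the comment states, that the bounds map is null when column stats were not collected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT Iceberg or Fluss classes.
class LowerBoundSortSketch {

    /** A data file stub; lowerBounds is null when column stats were not collected. */
    static final class FileStub {
        final String path;
        final Map<Integer, Long> lowerBounds;

        FileStub(String path, Map<Integer, Long> lowerBounds) {
            this.path = path;
            this.lowerBounds = lowerBounds;
        }
    }

    /** Sorts files by the lower bound of one field; dereferences lowerBounds without a null check. */
    static List<FileStub> sortByLowerBound(List<FileStub> files, int fieldId) {
        List<FileStub> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((FileStub f) -> f.lowerBounds.get(fieldId)));
        return sorted;
    }

    public static void main(String[] args) {
        int offsetFieldId = 7; // arbitrary field id chosen for this example

        // With stats: files come back ordered by their lower bound.
        List<FileStub> withStats = List.of(
                new FileStub("b", Map.of(offsetFieldId, 20L)),
                new FileStub("a", Map.of(offsetFieldId, 10L)));
        System.out.println(sortByLowerBound(withStats, offsetFieldId).get(0).path);

        // Without stats: the comparator hits a null map and throws.
        List<FileStub> withoutStats = List.of(new FileStub("b", null), new FileStub("a", null));
        try {
            sortByLowerBound(withoutStats, offsetFieldId);
        } catch (NullPointerException e) {
            System.out.println("NPE: bounds missing");
        }
    }
}
```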

@luoyuxia luoyuxia left a comment

@zuston Thanks. LGTM!

@luoyuxia luoyuxia merged commit a89311f into apache:main Mar 12, 2026
10 checks passed
hemanthsavasere pushed a commit to hemanthsavasere/fluss that referenced this pull request Mar 14, 2026
wxplovecc pushed a commit to tongcheng-elong/fluss that referenced this pull request Apr 17, 2026
wxplovecc pushed a commit to tongcheng-elong/fluss that referenced this pull request Apr 20, 2026
Ugbot pushed a commit to Ugbot/fluss that referenced this pull request Apr 26, 2026
…he#2842)

Co-authored-by: Junfan Zhang <zhangjunfan@qiyi.com>
4 participants