[lake/iceberg] only enable column stats when scan filter exists#2842
Conversation
There was a problem hiding this comment.
Pull request overview
Optimizes Iceberg split planning in the lake source by avoiding column-stat collection unless it is needed for predicate pushdown, reducing planning overhead for unfiltered scans.
Changes:
- Remove unconditional
includeColumnStats()fromTableScancreation. - Enable
includeColumnStats()only when a scanfilteris present.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| TableScan tableScan = table.newScan().useSnapshot(snapshotId); | ||
| if (filter != null) { | ||
| tableScan = tableScan.filter(filter); | ||
| tableScan = tableScan.includeColumnStats().filter(filter); | ||
| } |
There was a problem hiding this comment.
This change alters planning behavior by conditionally enabling includeColumnStats() only when filter != null, but there is no test asserting the new contract (stats absent when no filter; stats present when filter exists). Please add/extend an Iceberg source test to cover both paths so the intended performance optimization doesn’t regress across Iceberg version changes (e.g., assert presence/absence of file metrics like lowerBounds() on planned tasks for the same table with/without a filter).
| TableScan tableScan = table.newScan().useSnapshot(snapshotId); | ||
| if (filter != null) { |
There was a problem hiding this comment.
The PR description still contains the default template and does not state the purpose, linked issue, or tests run. Please update the PR description (e.g., link the issue being fixed and list the specific UT/IT that validate this change) so reviewers can verify intent and coverage.
|
Should we also check if this needs to be fixed? |
The code must keep includeColumnStats() because sortFileScanTask() at line 122-138 directly reads f1.file().lowerBounds().get(sortFiledId) to sort files by the __offset column. Removing includeColumnStats() there would cause lowerBounds() to return null, resulting in an NPE |
…he#2842) Co-authored-by: Junfan Zhang <zhangjunfan@qiyi.com>
…he#2842) Co-authored-by: Junfan Zhang <zhangjunfan@qiyi.com>
…he#2842) Co-authored-by: Junfan Zhang <zhangjunfan@qiyi.com>
…he#2842) Co-authored-by: Junfan Zhang <zhangjunfan@qiyi.com>
Purpose
Linked issue: close #xxx
Brief change log
Tests
API and Format
Documentation