[fix](nereids) clamp limit + offset overflow when pushing down TopN/Limit#64633
[fix](nereids) clamp limit + offset overflow when pushing down TopN/Limit#64633924060929 wants to merge 1 commit into
Conversation
…imit
When pushing a TopN/Limit down through Union/Join/Window, the child operator's
limit is computed as limit + offset. Both are non-negative longs, so when they
are close to BIGINT_MAX (e.g. LIMIT 9223372036854775807 OFFSET 9223372036854775807)
the addition overflows long and wraps to a negative value. A negative limit is an
illegal plan: the BE sorter reinterprets it as a huge unsigned value, so a trivial
query that should return an empty set instead hangs until the query times out.
Add Utils.saturatedAdd, which clamps to Long.MAX_VALUE on overflow, and use it
everywhere a child limit is derived from limit + offset: PushDownTopNThroughUnion,
PushDownTopNDistinctThroughUnion, PushDownTopNThroughJoin,
PushDownTopNDistinctThroughJoin, PushDownTopNThroughWindow and SplitLimit.
Long.MAX_VALUE ("all rows") is the correct upper bound: no relation can hold more
than Long.MAX_VALUE rows, so the pushed-down limit never drops rows the parent may
need, and the parent operator still applies the real limit/offset.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Requesting changes because the overflow fix is incomplete.
Critical checkpoint conclusions:
- Goal/test proof: The PR adds a saturating helper and applies it to several rewrite rules, but it does not accomplish the stated goal of clamping
limit + offseteverywhere a pushed child/scan/agg limit is derived. A plain TopN can still become a negative local physical TopN inLogicalTopNToPhysicalTopN, and several Limit/TopN parallel paths still use raw addition. The tests cover the helper and one Union reproducer only, so they do not prove the end-to-end fix. - Scope/focus: The helper is small and clear, but the applied scope is inconsistent with the PR description and leaves equivalent code paths unfixed.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, config, persistence, FE-BE protocol, or rolling-upgrade compatibility concerns found.
- Parallel code paths: Not fully handled. Remaining raw arithmetic exists in TopN implementation/post-processing, Limit distinct pushdown, Limit window pushdown, TopN-to-agg, scan sort/vector/score pushdowns, and translator sort-limit pushdown.
- Special checks: Threshold checks that compare against raw
limit + offsetcan still flip from huge positive to negative and enable optimizations that should be disabled. - Test coverage/results:
UtilsTestis useful, but the regression case uses direct Groovy assertions for a deterministic result and does not update the.outfile, which violates the local regression-test standard. Additional regression coverage is needed for a forced two-phase TopN and at least one Limit path. - Observability/performance/data correctness: No extra observability appears necessary. The intended clamp is semantically safe for non-negative limits and avoids the BE timeout behavior, but only once every equivalent derived-limit path is fixed.
- User focus: No additional user-provided focus points were present.
| * no relation can hold more than {@code Long.MAX_VALUE} rows, clamping to {@code Long.MAX_VALUE} | ||
| * ("all rows") is the semantically correct upper bound and never drops rows the parent may need. | ||
| */ | ||
| public static long saturatedAdd(long a, long b) { |
There was a problem hiding this comment.
This helper is the right primitive, but the PR does not route all equivalent limit + offset row-count derivations through it. A plain TopN still follows this path:
LogicalTopN(limit=Long.MAX_VALUE, offset=Long.MAX_VALUE, order by id)
Scan/Union
LogicalTopNToPhysicalTopN.twoPhaseSort then builds a local child with logicalTopN.getLimit() + logicalTopN.getOffset() in fe/fe-core/src/main/java/org/apache/doris/nereids/rules/implementation/LogicalTopNToPhysicalTopN.java:50, so the physical tree can still contain:
PhysicalTopN(MERGE_SORT limit=MAX offset=MAX)
PhysicalTopN(LOCAL_SORT limit=-2 offset=0)
With SET sort_phase_num = 2 this is not just an unused alternative. The same raw computation also remains in PushDownLimit.java, PushDownLimitDistinctThroughJoin.java, PushDownLimitDistinctThroughUnion.java, LimitAggToTopNAgg.java, PushTopnToAgg.java, the score/vector TopN scan pushdowns, and PhysicalPlanTranslator#setSortLimit. Please replace these parallel derived child/scan/agg limits, or centralize the operation on Limit/TopN, otherwise the same negative limit can still reach BE outside the specific Union rewrite covered here.
| ) s; | ||
| """ | ||
| assertEquals(1, topnUnionOverflowRes.size()) | ||
| assertEquals(0L, topnUnionOverflowRes[0][0]) |
There was a problem hiding this comment.
This deterministic result should be recorded with a qt_... query and the generated .out file, not Groovy assertions. The local regression-test rules require determined expected results to be generated through qt_sql/order_qt style output files; here count(*) always returns one row with 0, but that expected result is invisible in regression-test/data/nereids_rules_p0/push_down_top_n/push_down_top_n_through_union.out. Please convert this to a qt_topn_union_overflow block and regenerate the .out file.
FE Regression Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 29205 ms |
TPC-DS: Total hot run time: 175320 ms |
ClickBench: Total hot run time: 25.76 s |
Proposed changes
When pushing a
TopN/Limitdown throughUnion/Join/Window, the child operator's limit iscomputed as
limit + offset. Both are non-negativelongs, so when they are close toBIGINT_MAX(e.g.
LIMIT 9223372036854775807 OFFSET 9223372036854775807) the addition overflows thelongrangeand wraps to a negative value.
A negative limit is an illegal plan. On the BE side it is reinterpreted as a huge unsigned value
(
uint64_t limit = _offset + _limitin the sorter), so a trivial query that should immediately returnan empty set instead runs until it hits the query timeout.
Minimal reproducer (no table required)
PUSH_DOWN_TOP_N_THROUGH_UNIONdisabled: returns0immediately(correct — the offset is far beyond the 3 input rows).
Fix
Add
Utils.saturatedAdd(long, long), which clamps toLong.MAX_VALUEon positive overflow instead ofwrapping, and use it everywhere a child limit is derived from
limit + offset:PushDownTopNThroughUnion/PushDownTopNDistinctThroughUnionPushDownTopNThroughJoin/PushDownTopNDistinctThroughJoinPushDownTopNThroughWindowSplitLimitLong.MAX_VALUE("all rows") is the semantically correct upper bound: no relation can hold more thanLong.MAX_VALUErows, so the pushed-down limit never drops rows the parent may need, and the parentoperator still applies the real
limit/offset. For non-overflowing inputs the behavior is unchanged.Tests
UtilsTest#testSaturatedAddcovers normal, positive-overflow and negative-overflow cases.push_down_top_n_through_unionasserts the reproducer returns an empty result(count
0) without timing out.