[fix](nereids) clamp limit + offset overflow when pushing down TopN/Limit#64633

Draft
924060929 wants to merge 1 commit into apache:master from 924060929:fix-push-down-topn-union-overflow

Conversation

@924060929

Contributor

Proposed changes

When pushing a TopN/Limit down through Union/Join/Window, the child operator's limit is
computed as limit + offset. Both are non-negative longs, but when they are close to BIGINT_MAX
(e.g. LIMIT 9223372036854775807 OFFSET 9223372036854775807) the addition overflows the long range
and wraps to a negative value (-2 in this example).

A negative limit is an illegal plan. On the BE side it is reinterpreted as a huge unsigned value
(uint64_t limit = _offset + _limit in the sorter), so a trivial query that should immediately return
an empty set instead runs until it hits the query timeout.
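
The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: the class and method names are hypothetical, and the unsigned print only illustrates how the same 64 bits read when treated as a uint64_t.

```java
class LimitOverflowDemo {
    // Raw addition, as a pushdown rule would compute the child limit without clamping.
    static long rawChildLimit(long limit, long offset) {
        return limit + offset; // wraps silently on overflow
    }

    public static void main(String[] args) {
        long limit = Long.MAX_VALUE;  // 9223372036854775807
        long offset = Long.MAX_VALUE; // 9223372036854775807
        long child = rawChildLimit(limit, offset);

        // Signed view: the value a planner would see.
        System.out.println(child); // -2

        // Unsigned view of the same bits, analogous to a uint64_t on the BE side.
        System.out.println(Long.toUnsignedString(child)); // 18446744073709551614
    }
}
```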

Minimal reproducer (no table required)

select count(*) as c from (
    select id from (
        select 1 as id union all select 2 as id union all select 3 as id
    ) t
    order by id limit 9223372036854775807 offset 9223372036854775807
) s;
  • Original planner, or Nereids with PUSH_DOWN_TOP_N_THROUGH_UNION disabled: returns 0 immediately
    (correct — the offset is far beyond the 3 input rows).
  • Nereids with the rule enabled: times out.

Fix

Add Utils.saturatedAdd(long, long), which clamps to Long.MAX_VALUE on positive overflow instead of
wrapping, and use it everywhere a child limit is derived from limit + offset:

  • PushDownTopNThroughUnion / PushDownTopNDistinctThroughUnion
  • PushDownTopNThroughJoin / PushDownTopNDistinctThroughJoin
  • PushDownTopNThroughWindow
  • SplitLimit

Long.MAX_VALUE ("all rows") is the semantically correct upper bound: no relation can hold more than
Long.MAX_VALUE rows, so the pushed-down limit never drops rows the parent may need, and the parent
operator still applies the real limit/offset. For non-overflowing inputs the behavior is unchanged.
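
For readers who want to see the clamping behavior concretely, here is a hedged sketch of what a saturating add could look like. It is not the PR's actual Utils.saturatedAdd body, which is not shown in this conversation. Clamping negative overflow to Long.MIN_VALUE is an assumption made here for symmetry; the description only specifies the positive-overflow case.

```java
class SaturatedAddSketch {
    /**
     * Adds two longs without wrapping.
     * Positive overflow clamps to Long.MAX_VALUE (as the PR describes);
     * negative overflow clamps to Long.MIN_VALUE (an assumption of this sketch).
     */
    static long saturatedAdd(long a, long b) {
        long sum = a + b;
        // Overflow occurred iff both operands have a sign that differs from the sum's sign.
        if (((a ^ sum) & (b ^ sum)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(saturatedAdd(10, 5));                          // ordinary case
        System.out.println(saturatedAdd(Long.MAX_VALUE, Long.MAX_VALUE)); // clamped high
        System.out.println(saturatedAdd(Long.MIN_VALUE, -1));             // clamped low
    }
}
```

Since limit and offset are both non-negative in a valid plan, only the upper clamp matters for the pushdown rules; the lower clamp is there so the helper behaves sensibly for arbitrary inputs.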

Tests

  • UtilsTest#testSaturatedAdd covers normal, positive-overflow and negative-overflow cases.
  • A regression case in push_down_top_n_through_union asserts the reproducer returns an empty result
    (count 0) without timing out.

…imit

When pushing a TopN/Limit down through Union/Join/Window, the child operator's
limit is computed as limit + offset. Both are non-negative longs, so when they
are close to BIGINT_MAX (e.g. LIMIT 9223372036854775807 OFFSET 9223372036854775807)
the addition overflows long and wraps to a negative value. A negative limit is an
illegal plan: the BE sorter reinterprets it as a huge unsigned value, so a trivial
query that should return an empty set instead hangs until the query times out.

Add Utils.saturatedAdd, which clamps to Long.MAX_VALUE on overflow, and use it
everywhere a child limit is derived from limit + offset: PushDownTopNThroughUnion,
PushDownTopNDistinctThroughUnion, PushDownTopNThroughJoin,
PushDownTopNDistinctThroughJoin, PushDownTopNThroughWindow and SplitLimit.

Long.MAX_VALUE ("all rows") is the correct upper bound: no relation can hold more
than Long.MAX_VALUE rows, so the pushed-down limit never drops rows the parent may
need, and the parent operator still applies the real limit/offset.
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929

Contributor Author

run buildall

@924060929

Contributor Author

/review

@github-actions (bot) left a comment

Contributor

Requesting changes because the overflow fix is incomplete.

Critical checkpoint conclusions:

  • Goal/test proof: The PR adds a saturating helper and applies it to several rewrite rules, but it does not accomplish the stated goal of clamping limit + offset everywhere a pushed child/scan/agg limit is derived. A plain TopN can still become a negative local physical TopN in LogicalTopNToPhysicalTopN, and several Limit/TopN parallel paths still use raw addition. The tests cover the helper and one Union reproducer only, so they do not prove the end-to-end fix.
  • Scope/focus: The helper is small and clear, but the applied scope is inconsistent with the PR description and leaves equivalent code paths unfixed.
  • Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, config, persistence, FE-BE protocol, or rolling-upgrade compatibility concerns found.
  • Parallel code paths: Not fully handled. Remaining raw arithmetic exists in TopN implementation/post-processing, Limit distinct pushdown, Limit window pushdown, TopN-to-agg, scan sort/vector/score pushdowns, and translator sort-limit pushdown.
  • Special checks: Threshold checks that compare against raw limit + offset can still flip from huge positive to negative and enable optimizations that should be disabled.
  • Test coverage/results: UtilsTest is useful, but the regression case uses direct Groovy assertions for a deterministic result and does not update the .out file, which violates the local regression-test standard. Additional regression coverage is needed for a forced two-phase TopN and at least one Limit path.
  • Observability/performance/data correctness: No extra observability appears necessary. The intended clamp is semantically safe for non-negative limits and avoids the BE timeout behavior, but only once every equivalent derived-limit path is fixed.
  • User focus: No additional user-provided focus points were present.
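
The "Special checks" point above can be illustrated with a small sketch. Everything here is hypothetical: the method names and the threshold value are stand-ins and do not correspond to a specific Doris rule or session variable. The sketch only shows how a guard of the form limit + offset <= threshold flips when the sum wraps.

```java
class ThresholdFlipDemo {
    // Guard written with raw addition: a wrapped (negative) sum passes the check.
    static boolean enabledRaw(long limit, long offset, long threshold) {
        return limit + offset <= threshold;
    }

    // Guard written with a clamped sum. For non-negative inputs any overflow is
    // upward, so Long.MAX_VALUE is the right substitute when addExact throws.
    static boolean enabledSaturated(long limit, long offset, long threshold) {
        long sum;
        try {
            sum = Math.addExact(limit, offset);
        } catch (ArithmeticException overflow) {
            sum = Long.MAX_VALUE;
        }
        return sum <= threshold;
    }

    public static void main(String[] args) {
        long threshold = 1024; // hypothetical optimization threshold
        System.out.println(enabledRaw(Long.MAX_VALUE, Long.MAX_VALUE, threshold));       // true (wrong)
        System.out.println(enabledSaturated(Long.MAX_VALUE, Long.MAX_VALUE, threshold)); // false
    }
}
```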

* no relation can hold more than {@code Long.MAX_VALUE} rows, clamping to {@code Long.MAX_VALUE}
* ("all rows") is the semantically correct upper bound and never drops rows the parent may need.
*/
public static long saturatedAdd(long a, long b) {

Contributor

This helper is the right primitive, but the PR does not route all equivalent limit + offset row-count derivations through it. A plain TopN still follows this path:

LogicalTopN(limit=Long.MAX_VALUE, offset=Long.MAX_VALUE, order by id)
  Scan/Union

LogicalTopNToPhysicalTopN.twoPhaseSort then builds a local child with logicalTopN.getLimit() + logicalTopN.getOffset() in fe/fe-core/src/main/java/org/apache/doris/nereids/rules/implementation/LogicalTopNToPhysicalTopN.java:50, so the physical tree can still contain:

PhysicalTopN(MERGE_SORT limit=MAX offset=MAX)
  PhysicalTopN(LOCAL_SORT limit=-2 offset=0)

With SET sort_phase_num = 2 this is not just an unused alternative. The same raw computation also remains in PushDownLimit.java, PushDownLimitDistinctThroughJoin.java, PushDownLimitDistinctThroughUnion.java, LimitAggToTopNAgg.java, PushTopnToAgg.java, the score/vector TopN scan pushdowns, and PhysicalPlanTranslator#setSortLimit. Please replace these parallel derived child/scan/agg limits, or centralize the operation on Limit/TopN, otherwise the same negative limit can still reach BE outside the specific Union rewrite covered here.
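
One way to read the "centralize the operation on Limit/TopN" suggestion is sketched below. RowBounds is a hypothetical stand-in, not a Doris class, and the two-phase split is reduced to its limit/offset arithmetic. The idea is that every rule asks the node for its child limit instead of adding the two fields itself.

```java
class TwoPhaseTopNSketch {
    /** Hypothetical holder for a TopN/Limit node's row bounds. */
    static final class RowBounds {
        final long limit;
        final long offset;

        RowBounds(long limit, long offset) {
            if (limit < 0 || offset < 0) {
                throw new IllegalArgumentException("limit and offset must be non-negative");
            }
            this.limit = limit;
            this.offset = offset;
        }

        /** Rows a child must produce so the parent can skip offset rows and keep limit rows. */
        long childLimit() {
            long sum = limit + offset;
            // With non-negative inputs, overflow always shows up as a negative sum.
            return sum < 0 ? Long.MAX_VALUE : sum;
        }
    }

    public static void main(String[] args) {
        RowBounds merge = new RowBounds(Long.MAX_VALUE, Long.MAX_VALUE);
        RowBounds local = new RowBounds(merge.childLimit(), 0);
        System.out.println("MERGE_SORT limit=" + merge.limit + " offset=" + merge.offset);
        System.out.println("LOCAL_SORT limit=" + local.limit + " offset=" + local.offset);
    }
}
```

With a single accessor like this, a newly added rule cannot reintroduce the raw addition by accident, which is the concern behind the list of unfixed paths above.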

) s;
"""
assertEquals(1, topnUnionOverflowRes.size())
assertEquals(0L, topnUnionOverflowRes[0][0])

Contributor

This deterministic result should be recorded with a qt_... query and the generated .out file, not Groovy assertions. The local regression-test rules require determined expected results to be generated through qt_sql/order_qt style output files; here count(*) always returns one row with 0, but that expected result is invisible in regression-test/data/nereids_rules_p0/push_down_top_n/push_down_top_n_through_union.out. Please convert this to a qt_topn_union_overflow block and regenerate the .out file.

@morrySnow marked this pull request as draft June 18, 2026 08:36
@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 70.00% (14/20) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29205 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e9fd2a7185cb6a9d22434138504227ba8b25fd6d, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17618	3987	3991	3987
q2	2013	356	186	186
q3	10304	1492	835	835
q4	4686	476	343	343
q5	7515	859	571	571
q6	182	166	137	137
q7	778	833	627	627
q8	9336	1719	1626	1626
q9	5870	4508	4490	4490
q10	6804	1782	1539	1539
q11	438	277	243	243
q12	641	420	293	293
q13	18140	3425	2795	2795
q14	267	268	247	247
q15	q16	790	795	709	709
q17	997	1008	1052	1008
q18	7353	5867	5501	5501
q19	1307	1286	1057	1057
q20	488	407	254	254
q21	5899	2578	2455	2455
q22	435	361	302	302
Total cold run time: 101861 ms
Total hot run time: 29205 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4382	4440	4264	4264
q2	331	362	226	226
q3	4604	4922	4441	4441
q4	2081	2173	1371	1371
q5	4447	4286	4273	4273
q6	238	177	128	128
q7	1723	1909	1891	1891
q8	2579	2200	2215	2200
q9	8076	8376	7995	7995
q10	4806	4778	4353	4353
q11	571	444	393	393
q12	772	768	553	553
q13	3256	3564	2995	2995
q14	308	315	278	278
q15	q16	708	730	627	627
q17	1367	1293	1323	1293
q18	7922	7600	7143	7143
q19	1130	1118	1064	1064
q20	2215	2244	1946	1946
q21	5282	4563	4449	4449
q22	518	455	389	389
Total cold run time: 57316 ms
Total hot run time: 52272 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 175320 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e9fd2a7185cb6a9d22434138504227ba8b25fd6d, data reload: false

query5	4306	628	488	488
query6	441	199	184	184
query7	4850	576	312	312
query8	364	215	205	205
query9	8717	4113	4107	4107
query10	445	303	255	255
query11	5907	2342	2130	2130
query12	160	106	98	98
query13	1345	613	431	431
query14	6390	5393	5163	5163
query14_1	4437	4405	4389	4389
query15	205	198	179	179
query16	1013	458	472	458
query17	1129	712	584	584
query18	2701	483	352	352
query19	206	186	154	154
query20	118	109	105	105
query21	219	143	119	119
query22	13727	13695	13359	13359
query23	17543	16523	16197	16197
query23_1	16291	16194	16301	16194
query24	7562	1768	1317	1317
query24_1	1357	1314	1344	1314
query25	565	467	360	360
query26	1313	302	169	169
query27	2621	545	340	340
query28	4389	2031	2011	2011
query29	1043	593	469	469
query30	314	233	196	196
query31	1121	1081	981	981
query32	113	61	56	56
query33	505	309	247	247
query34	1143	1169	666	666
query35	759	761	675	675
query36	1407	1408	1206	1206
query37	171	105	87	87
query38	3189	3131	3095	3095
query39	935	913	907	907
query39_1	884	872	881	872
query40	217	122	98	98
query41	64	63	61	61
query42	95	98	95	95
query43	324	323	283	283
query44	1448	778	784	778
query45	211	183	181	181
query46	1115	1192	757	757
query47	2368	2362	2224	2224
query48	406	420	298	298
query49	613	447	345	345
query50	984	352	258	258
query51	4433	4239	4233	4233
query52	91	88	77	77
query53	250	271	184	184
query54	267	218	190	190
query55	78	74	70	70
query56	248	224	212	212
query57	1446	1420	1312	1312
query58	236	212	211	211
query59	1608	1661	1417	1417
query60	296	249	220	220
query61	152	148	151	148
query62	696	643	587	587
query63	227	188	191	188
query64	2475	750	584	584
query65	4863	4755	4783	4755
query66	1711	448	327	327
query67	29848	29708	29521	29521
query68	3179	1604	990	990
query69	412	305	263	263
query70	1045	1003	1005	1003
query71	287	238	213	213
query72	3147	2594	2289	2289
query73	812	838	423	423
query74	5122	4977	4752	4752
query75	2664	2607	2251	2251
query76	2297	1181	758	758
query77	371	374	294	294
query78	12571	12653	11816	11816
query79	1425	1239	804	804
query80	649	554	434	434
query81	451	283	242	242
query82	892	156	122	122
query83	356	282	247	247
query84	264	149	117	117
query85	857	508	408	408
query86	375	302	286	286
query87	3374	3332	3247	3247
query88	3680	2800	2770	2770
query89	451	372	337	337
query90	1880	183	179	179
query91	173	162	133	133
query92	61	57	56	56
query93	1452	1490	913	913
query94	546	351	265	265
query95	692	388	350	350
query96	1040	799	356	356
query97	2710	2691	2567	2567
query98	213	204	196	196
query99	1181	1155	1011	1011
Total cold run time: 261706 ms
Total hot run time: 175320 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e9fd2a7185cb6a9d22434138504227ba8b25fd6d, data reload: false

query1	0.01	0.01	0.01
query2	0.11	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.14
query5	0.25	0.23	0.23
query6	1.24	1.11	1.11
query7	0.04	0.01	0.00
query8	0.06	0.04	0.03
query9	0.41	0.32	0.34
query10	0.59	0.61	0.57
query11	0.19	0.15	0.15
query12	0.19	0.15	0.14
query13	0.51	0.51	0.50
query14	1.06	1.03	1.05
query15	0.65	0.63	0.64
query16	0.33	0.31	0.33
query17	1.17	1.13	1.11
query18	0.23	0.22	0.21
query19	2.05	1.98	1.97
query20	0.01	0.02	0.01
query21	15.44	0.25	0.13
query22	4.79	0.05	0.06
query23	16.13	0.32	0.12
query24	3.01	0.44	0.32
query25	0.12	0.06	0.04
query26	0.73	0.21	0.14
query27	0.04	0.04	0.04
query28	3.54	0.96	0.53
query29	12.54	4.44	3.50
query30	0.27	0.16	0.16
query31	2.77	0.66	0.32
query32	3.22	0.63	0.50
query33	3.19	3.25	3.20
query34	15.67	4.69	3.63
query35	3.67	3.63	3.72
query36	0.56	0.45	0.43
query37	0.10	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 97.21 s
Total hot run time: 25.76 s
