Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter by slow-J · Pull Request #16268 · apache/lucene

slow-J · 2026-06-17T16:03:12Z

Resolves #16249

Implementation heavily inspired by HistogramCollector.java.

Range faceting in the sandbox module (LongRangeFacetCutter) currently reads the doc-values value for every matching document and binary-searches it into an elementary interval. When the faceted field is single-valued, we can use its doc-values skip index instead. For a dense skip block whose min and max values fall into the same elementary interval, every document in that block maps to that interval, so we can skip both the per-doc value lookup and the binary search.

Limitation: applies to single-valued long fields only.
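
To make the dense-block idea above concrete, here is a minimal, self-contained sketch. It does not use the Lucene API: `UPPER_BOUNDS`, `intervalOf`, and `resolveBlock` are hypothetical names, and the block statistics (min/max value, doc count, doc ID range) are passed in as plain arguments rather than read from a real skipper.

```java
import java.util.Arrays;

final class SkipBlockSketch {
    // Hypothetical elementary intervals, given as sorted inclusive upper bounds:
    // interval 0 = (-inf, 9], 1 = [10, 19], 2 = [20, 29], 3 = [30, +inf).
    static final long[] UPPER_BOUNDS = {9, 19, 29, Long.MAX_VALUE};

    /** Per-document slow path: binary-search a single value into its interval. */
    static int intervalOf(long value) {
        int idx = Arrays.binarySearch(UPPER_BOUNDS, value);
        // A miss returns -(insertionPoint) - 1; the insertion point is the interval.
        return idx >= 0 ? idx : -idx - 1;
    }

    /**
     * Block fast path: returns the interval shared by every doc in the block,
     * or -1 when the block must fall back to per-document lookups.
     */
    static int resolveBlock(long minValue, long maxValue, int docCount, int minDocID, int maxDocID) {
        // Dense means every doc ID in [minDocID, maxDocID] has a value.
        boolean dense = docCount == maxDocID - minDocID + 1;
        if (!dense) {
            return -1;
        }
        int lo = intervalOf(minValue);
        return lo == intervalOf(maxValue) ? lo : -1;
    }

    public static void main(String[] args) {
        System.out.println(resolveBlock(10, 19, 128, 0, 127)); // whole block in one interval
        System.out.println(resolveBlock(5, 12, 128, 0, 127));  // block straddles a boundary
    }
}
```

Two binary searches per block replace one value read plus one binary search per document, which is why the gain depends on how many blocks sit entirely inside one interval.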

Benchmark (luceneutil)

I used my branch https://github.com/slow-J/luceneutil/tree/github-16249-range-facet-bench, which cherry-picks two of @epotyom's commits (mainly mikemccand/luceneutil#582, which adds range-facet support).

Setup:
localrun.py, wikimediumall (33.3M docs), index-sorted by lastMod_skipper with
addDVSkippers=true. Baseline = main, candidate = this change, both DURING_COLLECTION, so
the only difference is this optimization. 30 JVM iterations.

Command: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run-timing7.txt"

Edit: new benchmark results after the changes for Egor's first two comments.
Edit 2: new benchmark results after removing the unwrapping.

QPS

Task	QPS baseline	StdDev	QPS modified	StdDev	Pct diff	p-value
BrowseLastModOvlpRangeFacets	1.26	(7.7%)	2.72	(10.6%)	115.5% (90% - 145%)	0.000
BrowseLastModRangeFacets	2.21	(6.0%)	3.31	(8.8%)	50.0% (33% - 68%)	0.000
MedTermLastModOvlpRangeFacets	3.82	(13.5%)	5.48	(5.7%)	43.5% (21% - 72%)	0.000
MedTermLastModRangeFacets	4.15	(13.6%)	5.26	(7.9%)	26.5% (4% - 55%)	0.000
BrowseIDOvlpRangeFacets	1.21	(6.6%)	1.10	(6.7%)	-9.6% (-21% - 4%)	0.000
BrowseIDRangeFacets	2.33	(8.6%)	2.57	(5.1%)	10.1% (-3% - 26%)	0.000
MedTermIDOvlpRangeFacets	3.79	(13.5%)	4.61	(11.1%)	21.6% (-2% - 53%)	0.000
MedTermIDRangeFacets	5.98	(4.6%)	5.92	(2.7%)	-0.9% (-7% - 6%)	0.340

Latency (ms) — aggregated across all iterations

Task	P50 B	P50 C	Diff	P90 B	P90 C	Diff	P99 B	P99 C	Diff	P999 B	P999 C	Diff	P100 B	P100 C	Diff
BrowseLastModOvlpRangeFacets	844.184	386.006	-54.3%	1437.289	581.094	-59.6%	7523.983	828.460	-89.0%	9510.480	868.764	-90.9%	9555.393	888.500	-90.7%
BrowseLastModRangeFacets	474.762	319.836	-32.6%	854.574	546.789	-36.0%	4412.829	781.421	-82.3%	7775.105	862.760	-88.9%	7910.258	893.449	-88.7%
MedTermLastModOvlpRangeFacets	286.226	187.654	-34.4%	552.668	436.448	-21.0%	771.820	599.881	-22.3%	1327.279	705.213	-46.9%	1445.804	707.766	-51.0%
MedTermLastModRangeFacets	260.932	200.115	-23.3%	652.004	510.872	-21.6%	847.848	635.331	-25.1%	2966.134	743.950	-74.9%	3060.317	745.647	-75.6%
BrowseIDOvlpRangeFacets	860.895	976.209	+13.4%	1419.693	1279.444	-9.9%	8271.185	1476.704	-82.1%	9919.502	1531.237	-84.6%	9928.280	1536.195	-84.5%
BrowseIDRangeFacets	461.967	404.593	-12.4%	799.144	625.845	-21.7%	5972.427	860.420	-85.6%	8963.973	930.259	-89.6%	9483.903	942.619	-90.1%
MedTermIDOvlpRangeFacets	294.831	235.198	-20.2%	676.861	539.088	-20.4%	897.009	671.736	-25.1%	1835.175	742.857	-59.5%	2055.182	744.089	-63.8%
MedTermIDRangeFacets	169.089	170.565	+0.9%	495.786	401.676	-19.0%	697.206	591.299	-15.2%	1026.169	690.797	-32.7%	1647.263	695.272	-57.8%
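
For reading the Diff columns, the sketch below assumes they are the relative change of the candidate (C) against the baseline (B), i.e. (C - B) / B. That formula is an assumption, not taken from luceneutil, but it reproduces the latency values shown; `PctDiffSketch` and its methods are hypothetical names.

```java
import java.util.Locale;

final class PctDiffSketch {
    /** Relative change of candidate vs. baseline, in percent (negative = lower latency). */
    static double pctDiff(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    /** Formats with an explicit sign and one decimal, matching the table style. */
    static String format(double pct) {
        return String.format(Locale.ROOT, "%+.1f%%", pct);
    }

    public static void main(String[] args) {
        // P50 of BrowseLastModOvlpRangeFacets: baseline 844.184 ms, candidate 386.006 ms.
        System.out.println(format(pctDiff(844.184, 386.006)));
        // P50 of BrowseIDOvlpRangeFacets: baseline 860.895 ms, candidate 976.209 ms.
        System.out.println(format(pctDiff(860.895, 976.209)));
    }
}
```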

slow-J · 2026-06-18T15:17:21Z

I reran benchmarks, this time correctly using localrun, and updated the results in #16268 (comment)

epotyom

Nice change! One suggestion below

epotyom · 2026-06-19T22:33:10Z

+      for (int level = 0; level < skipper.numLevels(); ++level) {
+        int totalDocsAtLevel = skipper.maxDocID(level) - skipper.minDocID(level) + 1;
+        if (skipper.docCount(level) != totalDocsAtLevel) {
+          // Some docs at this level have no value, so we can't resolve the whole block at once.


I think the skipper can still improve performance in this case: we can still cache the ordinal, but we always have to call longValues.advanceExact(doc). If it returns true, we return the cached ordinal (avoiding both the long-value read and the elementary-interval binary search); otherwise we return false.
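
A rough sketch of this suggestion, under stated assumptions: `SingleValuedLongs` is a hypothetical stand-in for the doc-values source (only `advanceExact` and `longValue` are modeled), `MapBackedLongs` is a toy implementation for illustration, and `-1` stands for "no value" instead of the boolean return used in the PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class NonDenseBlockSketch {
    /** Hypothetical stand-in for a single-valued doc-values source. */
    interface SingleValuedLongs {
        boolean advanceExact(int doc);
        long longValue();
    }

    /** Toy implementation that counts how often a value is actually read. */
    static final class MapBackedLongs implements SingleValuedLongs {
        private final Map<Integer, Long> valuesByDoc = new HashMap<>();
        private int current = -1;
        int valueReads = 0;

        MapBackedLongs put(int doc, long value) {
            valuesByDoc.put(doc, value);
            return this;
        }

        @Override
        public boolean advanceExact(int doc) {
            current = doc;
            return valuesByDoc.containsKey(doc);
        }

        @Override
        public long longValue() {
            valueReads++;
            return valuesByDoc.get(current);
        }
    }

    // Hypothetical elementary intervals as sorted inclusive upper bounds.
    static final long[] UPPER_BOUNDS = {9, 19, 29, Long.MAX_VALUE};

    static int intervalOf(long value) {
        int idx = Arrays.binarySearch(UPPER_BOUNDS, value);
        return idx >= 0 ? idx : -idx - 1;
    }

    /**
     * cachedOrd is the interval shared by all values of the current skip block,
     * or -1 if the block spans several intervals. Returns -1 when doc has no value.
     */
    static int ordinalFor(int doc, SingleValuedLongs values, int cachedOrd) {
        if (!values.advanceExact(doc)) {
            return -1; // non-dense block: this doc has no value
        }
        if (cachedOrd >= 0) {
            return cachedOrd; // no value read, no binary search
        }
        return intervalOf(values.longValue()); // slow path
    }
}
```

The existence check is still paid per document, but the value read and the binary search are skipped whenever the block's min and max share an interval.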

slow-J · 2026-06-22

Thanks for the review, Egor! Good point!
I'll try to address these 2 comments and run new benchmarks.

slow-J · 2026-06-23

It took me some time to publish the changes because I initially introduced a regression into range facets without a skipper; I did a small refactor while fixing that.

I am getting much better performance after implementing your suggestions; I will update the benchmark results in the top-level comment.
Edit: updated benchmark results.

slow-J · 2026-06-23

Hmm, looking at the new benchmark results, there is an improvement in the ID tasks, which do not have a doc-values skip index. This is due to a change in the latest commit.

id is a single-valued field, but in main, fromLongField never unwraps to a single-valued source, so it always picks the multi-valued leaf cutter even when the field is single-valued.

We now route single-valued segments to the single-valued cutter instead. The new create(String field, …) keeps the field name, which lets createLeafCutter inspect each segment during search and pick the right cutter.

I have kept this in this PR, but it slightly increases the scope.

@epotyom let me know what you think about this.

epotyom · 2026-06-25

Interesting, nice catch!

> This is due to a change in the latest commit.

Does the latest commit also include the interval-tracking rewind change? If so, could you please run benchmarks for the unwrapping change only?

Unwrapping adds a little bit of complexity, but if it improves performance, I think we should keep it.

slow-J · 2026-06-26

> Does the latest commit also include the interval-tracking rewind change?

Yes, it has all three changes: non-dense fast path, rewind reuse, and unwrapping.

> Unwrapping adds a little bit of complexity, but if it improves performance, I think we should keep it.

I'll set up and run a benchmark now just to see the perf difference due to the unwrapping.

slow-J · 2026-06-26

Ran: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run9-onlyunwrapping-timing.txt"

Here's the result.

Latency (ms) — aggregated across all iterations

Task	P50 B	P50 C	Diff	P90 B	P90 C	Diff	P99 B	P99 C	Diff	P999 B	P999 C	Diff	P100 B	P100 C	Diff
BrowseIDOvlpRangeFacets	946.928	820.170	-13.4%	1279.722	1178.839	-7.9%	1509.599	3489.425	+131.1%	1579.256	10588.392	+570.5%	1647.232	10662.196	+547.3%
BrowseIDRangeFacets	410.635	296.512	-27.8%	630.174	670.131	+6.3%	869.641	866.369	-0.4%	931.130	8681.360	+832.3%	1008.106	9240.540	+816.6%
BrowseLastModOvlpRangeFacets	381.116	365.262	-4.2%	577.681	639.843	+10.8%	866.191	2046.696	+136.3%	1028.218	10214.331	+893.4%	1029.600	10234.017	+894.0%
BrowseLastModRangeFacets	324.172	317.631	-2.0%	526.130	634.818	+20.7%	808.836	884.852	+9.4%	851.734	9413.995	+1005.3%	887.375	9912.493	+1017.1%
MedTermIDOvlpRangeFacets	213.936	181.524	-15.1%	433.905	437.942	+0.9%	603.100	702.917	+16.6%	679.574	838.447	+23.4%	761.983	848.642	+11.4%
MedTermIDRangeFacets	211.334	171.214	-19.0%	519.706	555.351	+6.9%	735.139	713.797	-2.9%	834.724	836.369	+0.2%	840.486	843.231	+0.3%
MedTermLastModOvlpRangeFacets	176.353	173.086	-1.9%	486.851	537.847	+10.5%	712.630	719.706	+1.0%	830.068	832.742	+0.3%	842.260	840.165	-0.2%
MedTermLastModRangeFacets	188.926	182.942	-3.2%	487.783	553.020	+13.4%	731.373	718.764	-1.7%	830.363	832.185	+0.2%	841.503	834.076	-0.9%

QPS

Task	QPS baseline	StdDev	QPS modified	StdDev	Pct diff	p-value
BrowseLastModRangeFacets	3.26	(5.8%)	3.34	(8.1%)	2.6% (-10% - 17%)	0.155
BrowseLastModOvlpRangeFacets	2.75	(5.1%)	2.85	(6.2%)	3.6% (-7% - 15%)	0.015
MedTermLastModRangeFacets	5.58	(7.9%)	5.80	(8.6%)	3.8% (-11% - 22%)	0.072
MedTermLastModOvlpRangeFacets	5.89	(6.9%)	6.13	(8.2%)	4.1% (-10% - 20%)	0.035
MedTermIDOvlpRangeFacets	4.94	(6.3%)	5.68	(6.6%)	15.0% (1% - 29%)	0.000
MedTermIDRangeFacets	5.19	(9.9%)	6.17	(7.1%)	18.7% (1% - 39%)	0.000
BrowseIDOvlpRangeFacets	1.12	(7.4%)	1.33	(11.9%)	19.1% (0% - 41%)	0.000
BrowseIDRangeFacets	2.52	(4.5%)	3.60	(12.0%)	42.7% (25% - 61%)	0.000

This mainly impacts the ID tasks, which do not have a skipper.
But it does seem to cause a worrying latency regression at high percentiles (P90 and above).

Hmm, @epotyom, what do you think? Since it is not related to the skipper change, I am leaning towards removing the unwrapping and retesting performance.

epotyom · 2026-06-28

Sounds good to me, let's separate these two changes.

slow-J · 2026-06-29

Removed the unwrapping in the latest commit; I will update the main benchmark in the PR description with the new numbers.

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 03d7d2a to 066c419 Compare June 17, 2026 16:03

github-actions Bot added the module:sandbox label Jun 17, 2026

github-actions Bot added this to the 10.5.0 milestone Jun 17, 2026

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 2e7144b to 0c72d5f Compare June 19, 2026 14:45

epotyom reviewed Jun 19, 2026

Comment thread ...ne/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java Outdated

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 1065433 to 7db2833 Compare June 23, 2026 10:39

github-actions Bot modified the milestones: 10.5.0, 10.6.0 Jun 23, 2026

slow-J requested a review from epotyom June 23, 2026 11:29

slow-J added 4 commits June 29, 2026 11:01

1bf8688 Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter
2a8d01f Remove redundant assertion
dc38fbc Extend the skip-index fast path to non-dense blocks and reuse the interval tracker
88fe293 Remove single-valued unwrapping routing

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 7db2833 to 88fe293 Compare June 29, 2026 11:40

slow-J wants to merge 4 commits into apache:main from slow-J:lucene-16249-skipper-range-facets

slow-J commented Jun 17, 2026 (edited)
slow-J commented Jun 18, 2026
epotyom left a comment
epotyom Jun 19, 2026 (edited)
slow-J Jun 22, 2026
slow-J Jun 23, 2026 (edited)
slow-J Jun 23, 2026
epotyom Jun 25, 2026
slow-J Jun 26, 2026 (edited)
slow-J Jun 26, 2026
epotyom Jun 28, 2026
slow-J Jun 29, 2026

2 participants
