perf(util): add `FixedBitSet.copyOf()` fast paths for `SparseLiveDocs` and `DenseLiveDocs` by salvatorecampagna · Pull Request #16282 · apache/lucene

salvatorecampagna · 2026-06-22T10:38:01Z

TL;DR

Make FixedBitSet.copyOf() 135x to 193x faster for DenseLiveDocs and 11x to 48x faster for SparseLiveDocs.

Summary

FixedBitSet.copyOf(Bits) has fast paths for FixedBitSet and FixedBits but not for SparseLiveDocs and DenseLiveDocs, the two LiveDocs types introduced in #15413. Both fall through to the generic O(maxDoc) per-bit loop. The hot caller is PendingDeletes.getMutableBits(), which invokes FixedBitSet.copyOf(liveDocs) on the first delete after a reader snapshot. Under write-heavy workloads this cost accumulates across open reader generations.

Each type now exposes a package-private toFixedBitSet() method that FixedBitSet.copyOf() delegates to, keeping the copy logic next to the data it knows about. DenseLiveDocs stores live docs in a FixedBitSet with identical semantics, so toFixedBitSet() clones it at O(maxDoc/64). SparseLiveDocs stores deleted positions in a SparseFixedBitSet, so toFixedBitSet() allocates a FixedBitSet, calls set(0, maxDoc) to mark all docs live in O(maxDoc/64), then iterates only the deleted positions via nextSetBit clearing each one, for a total of O(maxDoc/64 + deletedDocs).
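The two copy strategies described above can be sketched as follows. This is a minimal illustration, not the PR's code: `java.util.BitSet` stands in for both `FixedBitSet` and `SparseFixedBitSet`, and the class and method names (`LiveDocsCopySketch`, `copyDense`, `copySparse`) are hypothetical.

```java
import java.util.BitSet;

/** Sketch only: java.util.BitSet stands in for Lucene's FixedBitSet and SparseFixedBitSet. */
final class LiveDocsCopySketch {

  /** Dense case: live bits are already stored in the target layout, so the copy is a clone. */
  static BitSet copyDense(BitSet liveDocs) {
    return (BitSet) liveDocs.clone();
  }

  /** Sparse case: only deleted positions are stored, so start from "all live" and clear them. */
  static BitSet copySparse(BitSet deletedDocs, int maxDoc) {
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc); // mark every doc in [0, maxDoc) as live with one range call
    // Visit only the deleted positions instead of testing every doc.
    for (int doc = deletedDocs.nextSetBit(0); doc >= 0; doc = deletedDocs.nextSetBit(doc + 1)) {
      live.clear(doc);
    }
    return live;
  }

  public static void main(String[] args) {
    BitSet deleted = new BitSet();
    deleted.set(3);
    deleted.set(64);
    BitSet live = copySparse(deleted, 100);
    System.out.println(live.cardinality()); // prints 98
  }
}
```

The sparse loop does work proportional to the number of deleted docs, on top of the range fill, which is where the O(maxDoc/64 + deletedDocs) bound comes from.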

Benchmarks

LiveDocsCopyOfBenchmark, FixedBitSet.copyOf() average time (us/op), -wi 5 -i 7 -f 3. Baseline = main HEAD; contender = this PR.

DenseLiveDocs

maxDoc	del rate	baseline (us)	err%	contender (us)	err%	speedup
1M	0.1%	317.838 +/- 5.686	1.8%	2.362 +/- 0.039	1.7%	135x
10M	0.1%	3166.801 +/- 91.964	2.9%	20.797 +/- 1.160	5.6%	152x
100M	0.1%	31434.474 +/- 234.936	0.7%	198.186 +/- 11.443	5.8%	159x
1M	1%	371.669 +/- 6.345	1.7%	2.302 +/- 0.054	2.3%	162x
10M	1%	3766.590 +/- 35.200	0.9%	19.476 +/- 0.684	3.5%	193x
100M	1%	37788.094 +/- 191.936	0.5%	203.131 +/- 10.855	5.3%	186x

SparseLiveDocs

maxDoc	del rate	baseline (us)	err%	contender (us)	err%	speedup
1M	0.1%	518.084 +/- 6.739	1.3%	10.900 +/- 0.096	0.9%	48x
10M	0.1%	4976.429 +/- 55.761	1.1%	111.454 +/- 1.495	1.3%	45x
100M	0.1%	49412.733 +/- 424.527	0.9%	1152.935 +/- 12.952	1.1%	43x
1M	1%	1201.822 +/- 43.661	3.6%	97.187 +/- 1.349	1.4%	12x
10M	1%	12062.162 +/- 146.840	1.2%	986.402 +/- 14.975	1.5%	12x
100M	1%	119511.553 +/- 688.099	0.6%	10884.173 +/- 159.084	1.5%	11x

The SparseLiveDocs speedup shrinks as the deletion rate grows: the contender always pays O(maxDoc/64) to fill the backing array via set(0, maxDoc), and on top of that clears one position per deleted document. At low deletion rates the fill dominates and the gap with the O(maxDoc) baseline is large; at higher rates the clearing loop contributes more and the advantage narrows.
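As a rough illustration of that cost model, the sketch below counts operations for the 1M-doc rows. The class and method names are hypothetical, and the counts are not timings: a per-document clear and a bulk word fill do not cost the same, so the numbers only show the trend, not the measured speedups.

```java
/** Sketch only: operation counts for the generic loop versus the sparse fast path. */
final class CopyCostSketch {

  /** Generic loop: one get(i) call per document. */
  static long genericOps(long maxDoc) {
    return maxDoc;
  }

  /** Fast path: one word write per 64 docs, plus one clear per deleted doc. */
  static long sparseFastPathOps(long maxDoc, long deletedDocs) {
    return (maxDoc + 63) / 64 + deletedDocs;
  }

  public static void main(String[] args) {
    long maxDoc = 1_000_000L;
    System.out.println(genericOps(maxDoc));                 // prints 1000000
    System.out.println(sparseFastPathOps(maxDoc, 1_000L));  // 0.1% deleted: prints 16625
    System.out.println(sparseFastPathOps(maxDoc, 10_000L)); // 1% deleted: prints 25625
  }
}
```

At 0.1% the fill term (15625 word writes) dominates; at 1% the clearing term (10000 clears) is already comparable to it.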

FixedBitSet.copyOf(Bits) already has fast paths for FixedBitSet and FixedBits, but SparseLiveDocs and DenseLiveDocs (introduced in apache#15413) fell through to the O(maxDoc) generic loop. Each type now exposes a package-private toFixedBitSet() method that FixedBitSet.copyOf() delegates to:

- DenseLiveDocs stores live docs in a FixedBitSet: clone it directly, O(maxDoc/64).
- SparseLiveDocs stores deleted docs in a SparseFixedBitSet: pre-fill the backing long[] with -1L and clear only deleted positions using nextSetBit, O(deletedDocs + maxDoc/64).

The hot caller is PendingDeletes.getMutableBits(), which invokes copyOf(liveDocs) on the first delete after a snapshot.

shubhamsrkdev · 2026-06-23T09:55:10Z

+    } else if (bits instanceof DenseLiveDocs denseLiveDocs) {
+      return denseLiveDocs.toFixedBitSet();
+    } else if (bits instanceof SparseLiveDocs sparseLiveDocs) {
+      return sparseLiveDocs.toFixedBitSet();


Can we have an interface that FixedBitSet, DenseLiveDocs, and SparseLiveDocs all implement, which could be used here instead of multiple if/else if branches?

Thanks for the suggestion. The interface would make copyOf() cleaner, but the tricky part is that FixedBitSet itself would also need to implement it (to handle the case after the FixedBits unwrap at the top of the method). That means adding a toFixedBitSet() method to FixedBitSet whose only implementation is return clone(), which feels redundant and a bit odd semantically. Happy to go that route if the consensus is that the cleaner dispatch is worth it, but leaning toward keeping the instanceof chain since it mirrors the existing pattern already in the method for FixedBits/FixedBitSet.

That said, if the interface only covers DenseLiveDocs and SparseLiveDocs (not FixedBitSet), the semantic oddity goes away. Is that what you had in mind?
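For reference, the narrower variant (an interface covering only the two live-docs types) might look roughly like the sketch below. Everything here is hypothetical: the interface name `FixedBitSetConvertible`, the stand-in classes, and the use of `java.util.BitSet` in place of Lucene's types. It is not what the PR implements.

```java
import java.util.BitSet;

/** Minimal stand-in for Lucene's Bits. */
interface BitsSketch {
  boolean get(int index);
  int length();
}

/** Hypothetical interface, implemented only by the live-docs types. */
interface FixedBitSetConvertible {
  BitSet toFixedBitSet();
}

final class DenseSketch implements BitsSketch, FixedBitSetConvertible {
  private final BitSet live;
  private final int maxDoc;
  DenseSketch(BitSet live, int maxDoc) { this.live = live; this.maxDoc = maxDoc; }
  public boolean get(int index) { return live.get(index); }
  public int length() { return maxDoc; }
  public BitSet toFixedBitSet() { return (BitSet) live.clone(); }
}

final class SparseSketch implements BitsSketch, FixedBitSetConvertible {
  private final BitSet deleted;
  private final int maxDoc;
  SparseSketch(BitSet deleted, int maxDoc) { this.deleted = deleted; this.maxDoc = maxDoc; }
  public boolean get(int index) { return !deleted.get(index); }
  public int length() { return maxDoc; }
  public BitSet toFixedBitSet() {
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);
    for (int doc = deleted.nextSetBit(0); doc >= 0; doc = deleted.nextSetBit(doc + 1)) {
      live.clear(doc);
    }
    return live;
  }
}

final class CopyOfDispatchSketch {
  static BitSet copyOf(BitsSketch bits) {
    // One type check replaces the two instanceof branches.
    if (bits instanceof FixedBitSetConvertible convertible) {
      return convertible.toFixedBitSet();
    }
    // Generic per-bit fallback for every other implementation.
    BitSet copy = new BitSet(bits.length());
    for (int i = 0; i < bits.length(); i++) {
      if (bits.get(i)) {
        copy.set(i);
      }
    }
    return copy;
  }
}
```

The trade-off is one extra public or package-private type in exchange for a single branch in copyOf().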

I think either way is fine, as long as it reduces the number of instanceof checks. I'm not a huge fan of instanceof if it keeps on branching.

Replace the raw long[] pre-fill approach with FixedBitSet.set(0, maxDoc) followed by result.clear(doc) in the deletion loop. The two approaches are semantically identical: set(0, maxDoc) fills the backing array with -1L and masks off the ghost bits in the last word in one call.
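To make the "ghost bits" point concrete, here is a small sketch of what filling a word array and masking the last word amounts to. It is a hand-written illustration on a plain long[], with hypothetical names (`GhostBitsSketch`, `allSet`), not a copy of FixedBitSet's internals.

```java
import java.util.Arrays;

/** Sketch only: set bits [0, numBits) in a long[] and keep the unused tail bits at zero. */
final class GhostBitsSketch {

  static long[] allSet(int numBits) {
    long[] words = new long[(numBits + 63) >>> 6]; // one long per 64 bits, rounded up
    Arrays.fill(words, -1L);
    if (words.length > 0) {
      // Java uses only the low six bits of a long shift count, so for numBits > 0 the
      // expression -1L >>> -numBits keeps exactly the bits of the last word that are in range.
      words[words.length - 1] &= -1L >>> -numBits;
    }
    return words;
  }

  public static void main(String[] args) {
    long[] words = allSet(70);
    System.out.println(Long.toHexString(words[1])); // prints 3f
  }
}
```

Without the mask, the last word would carry set bits beyond numBits, which would corrupt cardinality counts and scans that assume those bits are zero.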

rmuir · 2026-06-24T02:37:39Z

This PR will prevent the function from being inlined anymore (I do not know if it is important). Previously it would work with bimorphic inlining.

salvatorecampagna · 2026-06-24T06:22:43Z

> This PR will prevent the function from being inlined anymore (I do not know if it is important). Previously it would work with bimorphic inlining.

I think you're referring to bits.get(i) in the generic fallback loop (that's the only virtual dispatch I see, am I missing anything else?). Before this PR, SparseLiveDocs and DenseLiveDocs both fell through to that loop, so the JIT saw exactly 2 receiver types at that call site and could apply bimorphic inlining for both get() implementations. After this PR those two types take the fast paths and never reach the loop, so that call site loses its 2-type profile, right?

The instanceof chain itself does not introduce megamorphic dispatch since after each type guard the type is statically known at that point.

As a result, the trade-off I see is: an O(maxDoc) loop with bimorphic virtual get() calls is replaced by O(maxDoc/64) direct word operations with no virtual dispatch. Is that the concern you had in mind?

rmuir · 2026-06-24T08:35:01Z

Please do not answer me with an LLM. Your AI is wrong about this.

rmuir

looks like a de-optimization with a benchmark that hides it

salvatorecampagna · 2026-06-24T08:52:29Z

> looks like a de-optimization with a benchmark that hides it

I wrote the benchmark to compare against the generic loop, which seems to me like a fair baseline. The fast path didn't exist before, and that is what the new benchmark measures: main without the fast path versus this PR with the fast path.

What scenario do you think it is hiding?

Also, the previous comment was mine.

salvatore-campagna added 2 commits June 22, 2026 10:24

CHANGES: update issue number to GITHUB#16282

6402096

github-actions Bot added the module:core/other label Jun 22, 2026

github-actions Bot added this to the 10.6.0 milestone Jun 22, 2026

tidy: fix google-java-format violation in LiveDocsCopyOfBenchmark

4913df0

salvatorecampagna marked this pull request as ready for review June 22, 2026 18:08

shubhamsrkdev reviewed Jun 23, 2026

Comment thread lucene/core/src/java/org/apache/lucene/util/SparseLiveDocs.java Outdated

Merge branch 'main' into perf/fixedbitset-copyof-livedocs-fast-paths

7f1bf5e

rmuir requested changes Jun 24, 2026

salvatorecampagna wants to merge 5 commits into apache:main from salvatorecampagna:perf/fixedbitset-copyof-livedocs-fast-paths

4 participants
