Use Panama Vector API to SIMD-evaluate fixed-cardinality sorted numeric range queries in rangeIntoBitSet() by costin · Pull Request #16283 · apache/lucene

costin · 2026-06-22T14:22:23Z

When the stored values have fixed cardinality and no encoding transforms (no gcd, delta, table, or block compression), the vectorization provider loads N values into a SIMD vector, performs a broadcast range check (>= min AND <= max), collapses per-lane results into a per-doc mask, and OR-writes matching docs into the bitset in one operation.
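The PR's Panama code isn't shown in this thread, so the sketch below is a scalar emulation of the per-lane to per-doc collapse, not the actual implementation. It makes two assumptions. First, the range check yields a lane bitmask with lane i in bit i, which is the shape jdk.incubator.vector's VectorMask.toLong() returns. Second, a doc matches when any of its `cardinality` values falls in [min, max]. The class and method names are hypothetical, and java.util.BitSet stands in for Lucene's FixedBitSet.

```java
import java.util.BitSet;

/** Hypothetical scalar emulation of the per-lane -> per-doc mask collapse. */
final class LaneMaskCollapse {

  /** Emulates a broadcast range check: bit i is set when lanes[i] is in [min, max]. */
  static long laneMask(long[] lanes, long min, long max) {
    long mask = 0L;
    for (int i = 0; i < lanes.length; i++) {
      if (lanes[i] >= min && lanes[i] <= max) {
        mask |= 1L << i;
      }
    }
    return mask;
  }

  /**
   * Collapses each group of `cardinality` consecutive lane bits into one doc bit:
   * doc d matches when any lane in [d * cardinality, (d + 1) * cardinality) matched.
   */
  static long docMask(long laneMask, int lanes, int cardinality) {
    long groupBits = cardinality == 64 ? -1L : (1L << cardinality) - 1;
    long docs = 0L;
    for (int d = 0; d < lanes / cardinality; d++) {
      if (((laneMask >>> (d * cardinality)) & groupBits) != 0) {
        docs |= 1L << d;
      }
    }
    return docs;
  }

  /** Sets one bit per matching doc, starting at firstDoc. */
  static void orInto(BitSet bits, int firstDoc, long docMask) {
    for (long m = docMask; m != 0; m &= m - 1) { // clear the lowest set bit each step
      bits.set(firstDoc + Long.numberOfTrailingZeros(m));
    }
  }
}
```

Per the description, the real change ORs the whole doc mask into the bitset in one operation. The sketch sets bits one at a time for readability.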

Falls back to scalar when vectorLen % cardinality != 0 (e.g. vpd=8 on AVX2 with 4-lane vectors).
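The fallback rule and the docs/vec figures quoted for each machine both follow from a single divisibility check. The sketch below is minimal and assumes `vectorLanes` is the number of 64-bit lanes: 4 on AVX2 and 8 on AVX-512, per the benchmark headings. A Panama implementation would presumably read this from something like LongVector.SPECIES_PREFERRED.length(). The sketch models only the lane condition, not the encoding checks, and the names are hypothetical.

```java
/** Hypothetical sketch of the lane-divisibility gate. */
final class SimdGate {

  /** True when whole docs tile one vector, i.e. vectorLanes % cardinality == 0. */
  static boolean canVectorize(int vectorLanes, int cardinality) {
    return cardinality > 0 && vectorLanes % cardinality == 0;
  }

  /** Docs evaluated per vector load; 0 means the scalar fallback is taken. */
  static int docsPerVector(int vectorLanes, int cardinality) {
    return canVectorize(vectorLanes, cardinality) ? vectorLanes / cardinality : 0;
  }
}
```

For example, docsPerVector(8, 2) is 4 and docsPerVector(4, 8) is 0. These match the AVX-512 vpd=2 and AVX2 vpd=8 cases described in the benchmark section.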

Benchmark

SortedNumericDocValuesRangeQueryBenchmark, 1M docs, cardinality=fixed, density=dense, queryShape=plain. Branch (candidate) vs main (baseline), JDK 25.0.3. vpd is the number of values per doc, i.e. the fixed cardinality.

AMD EPYC 7R32 (c5a.2xlarge) — AVX2, 256-bit (4 longs)

SIMD: vpd=2 (2 docs/vec), vpd=4 (1 doc/vec). vpd=8 falls back to scalar.

vpd	pattern	selectivity	baseline (ops/s)	candidate (ops/s)	ratio (candidate/baseline)
2	clustered	0.01	15426.5	15565.0	1.01x
2	clustered	0.1	15331.1	15431.7	1.01x
2	clustered	0.5	19083.7	19168.8	1.00x
2	random	0.01	65.2	134.1	2.06x
2	random	0.1	58.7	95.8	1.63x
2	random	0.5	65.1	86.7	1.33x
4	clustered	0.01	9830.6	9843.3	1.00x
4	clustered	0.1	9676.3	9852.7	1.02x
4	clustered	0.5	18409.4	18256.0	0.99x
4	random	0.01	52.3	70.2	1.34x
4	random	0.1	46.3	49.7	1.07x
4	random	0.5	53.2	64.3	1.21x
8	clustered	0.01	5925.1	5891.1	0.99x
8	clustered	0.1	5830.6	5845.9	1.00x
8	clustered	0.5	17669.9	17599.9	1.00x
8	random	0.01	42.1	43.7	1.04x
8	random	0.1	38.6	40.9	1.06x
8	random	0.5	44.2	50.2	1.14x

Intel Xeon 8375C (c6i.2xlarge) — AVX-512, 512-bit (8 longs)

SIMD: vpd=2 (4 docs/vec), vpd=4 (2 docs/vec), vpd=8 (1 doc/vec).

vpd	pattern	selectivity	baseline (ops/s)	candidate (ops/s)	ratio (candidate/baseline)
2	clustered	0.01	19255.8	19439.4	1.01x
2	clustered	0.1	18689.0	19133.6	1.02x
2	clustered	0.5	22700.0	22745.1	1.00x
2	random	0.01	83.3	208.4	2.50x
2	random	0.1	67.6	133.4	1.97x
2	random	0.5	65.7	127.8	1.94x
4	clustered	0.01	11715.4	11658.7	1.00x
4	clustered	0.1	11791.0	11727.0	0.99x
4	clustered	0.5	20998.0	21080.8	1.00x
4	random	0.01	63.5	104.1	1.64x
4	random	0.1	50.9	65.9	1.29x
4	random	0.5	61.5	94.7	1.54x
8	clustered	0.01	7133.1	7202.5	1.01x
8	clustered	0.1	6956.8	6995.9	1.01x
8	clustered	0.5	20338.1	20369.2	1.00x
8	random	0.01	47.2	53.5	1.13x
8	random	0.1	43.2	38.1	0.88x
8	random	0.5	51.6	54.1	1.05x

Clustered data shows no change since sequential access is already at L1/L2 cache speed; comparison cost is negligible. Wins appear on random data where per-doc cache misses dominate and SIMD batching amortizes comparison overhead.

Gains scale with docsPerVector: vpd=2 on AVX-512 processes 4 docs per vector (best), vpd=8 on AVX2 falls back to scalar (no gain).

For dense, fixed-cardinality sorted numeric values, range blocks can be evaluated with the vectorization provider when the flattened value layout is raw and contiguous. The optimization stays gated to layouts that benchmark well, and other encodings keep the scalar fallback.

sgup432 · 2026-06-22T21:59:39Z

+ *
+ * @lucene.internal
+ */
+public interface SortedNumericDocValuesRangeSupport {


I don't think creating a separate interface for this specific use case looks right.
It differs from the existing DocValuesRangeSupport by only a single parameter, i.e. cardinality, which doesn't justify a new abstraction layer. Could we instead add another method to the existing DocValuesRangeSupport? Something like:

default void rangeIntoBitSet(
    LongValues values, int fromDoc, int toDoc, int cardinality,
    long minValue, long maxValue, FixedBitSet bitSet, int offset) {
  // default to scalar approach
}

Let me know what you think.

costin · 2026-06-23

Makes sense. I've removed the interface in favor of a new method, which has the nice benefit of reducing the PR size.
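A scalar default for the proposed overload might look like the following. This is a hypothetical sketch. LongUnaryOperator stands in for Lucene's LongValues, and java.util.BitSet stands in for FixedBitSet. It also assumes that doc d's j-th value sits at index d * cardinality + j (the raw, contiguous layout), that toDoc is exclusive, and that bits are written at doc - offset. The thread confirms none of these details.

```java
import java.util.BitSet;
import java.util.function.LongUnaryOperator;

/** Hypothetical scalar default; parameter semantics are assumptions, not Lucene's contract. */
interface RangeSupportSketch {

  default void rangeIntoBitSet(
      LongUnaryOperator values, int fromDoc, int toDoc, int cardinality,
      long minValue, long maxValue, BitSet bitSet, int offset) {
    for (int doc = fromDoc; doc < toDoc; doc++) { // toDoc assumed exclusive
      long base = (long) doc * cardinality; // assumed flattened, contiguous layout
      for (int j = 0; j < cardinality; j++) {
        long v = values.applyAsLong(base + j);
        if (v > maxValue) {
          break; // values are ascending within a doc, so none of the rest can match
        }
        if (v >= minValue) {
          bitSet.set(doc - offset); // one in-range value is enough for the doc
          break;
        }
      }
    }
  }
}
```

The early `break` on v > maxValue relies on sorted numeric doc values being in ascending order within each doc.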

sgup432

I have a few minor comments. Also, I think we should add unit tests covering different cardinality values (> 1), the vectorLen % cardinality != 0 case, and other edge cases, assuming these aren't already covered by existing tests.

costin · 2026-06-25T11:10:48Z

See my other comments. I've parameterized testSortedNumericRangeIntoBitSetVaryingCardinality to exercise cardinalities {2, 3, 4, 5, 7, 8}, which covers both the SIMD path and the scalar fallback.
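The parameterized test itself isn't reproduced here. The hypothetical sketch below shows why those cardinalities reach both paths. It compares an emulated lane-chunked evaluation against a plain per-doc scan, using 4- and 8-lane widths for each cardinality. Cardinalities 2 and 4 divide both widths, 8 divides only the 8-lane width, and 3, 5 and 7 always fall back to scalar. All names are illustrative.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

/** Hypothetical differential check: emulated SIMD path vs plain per-doc scan. */
final class CardinalityDifferentialCheck {

  /** Reference semantics: a doc matches if any of its values lies in [min, max]. */
  static boolean docMatches(long[] flat, int doc, int card, long min, long max) {
    for (int j = 0; j < card; j++) {
      long v = flat[doc * card + j];
      if (v >= min && v <= max) {
        return true;
      }
    }
    return false;
  }

  /** Plain per-doc scan. */
  static BitSet scalar(long[] flat, int docs, int card, long min, long max) {
    BitSet out = new BitSet(docs);
    for (int d = 0; d < docs; d++) {
      if (docMatches(flat, d, card, min, max)) {
        out.set(d);
      }
    }
    return out;
  }

  /** Emulated SIMD path: whole "vectors" of `lanes` values, then a scalar tail. */
  static BitSet chunked(long[] flat, int docs, int card, long min, long max, int lanes) {
    if (lanes % card != 0) {
      return scalar(flat, docs, card, min, max); // docs don't tile the vector: fall back
    }
    BitSet out = new BitSet(docs);
    int docsPerVec = lanes / card;
    long group = (1L << card) - 1;
    int d = 0;
    for (; d + docsPerVec <= docs; d += docsPerVec) {
      long laneMask = 0L;
      for (int i = 0; i < lanes; i++) {
        long v = flat[d * card + i];
        if (v >= min && v <= max) {
          laneMask |= 1L << i;
        }
      }
      for (int k = 0; k < docsPerVec; k++) {
        if (((laneMask >>> (k * card)) & group) != 0) {
          out.set(d + k);
        }
      }
    }
    for (; d < docs; d++) { // tail docs that don't fill a whole vector
      if (docMatches(flat, d, card, min, max)) {
        out.set(d);
      }
    }
    return out;
  }

  /** True when both paths agree for every cardinality and lane width tried. */
  static boolean run(long seed) {
    Random rnd = new Random(seed);
    for (int card : new int[] {2, 3, 4, 5, 7, 8}) {
      for (int lanes : new int[] {4, 8}) {
        int docs = 1 + rnd.nextInt(100);
        long[] flat = new long[docs * card];
        for (int d = 0; d < docs; d++) {
          for (int j = 0; j < card; j++) {
            flat[d * card + j] = rnd.nextInt(1000);
          }
          Arrays.sort(flat, d * card, (d + 1) * card); // values are sorted within each doc
        }
        long min = rnd.nextInt(1000);
        long max = min + rnd.nextInt(200);
        if (!scalar(flat, docs, card, min, max).equals(chunked(flat, docs, card, min, max, lanes))) {
          return false;
        }
      }
    }
    return true;
  }
}
```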

sgup432

LGTM!

github-actions Bot added the module:core/codecs label Jun 22, 2026

sgup432 reviewed Jun 22, 2026

costin force-pushed the lucene/sorted-numeric-fixed-simd branch from 27af116 to 015f370 on June 23, 2026 12:59

github-actions Bot added this to the 10.6.0 milestone Jun 23, 2026

sgup432 reviewed Jun 24, 2026

Comment thread ...ne/core/src/java25/org/apache/lucene/internal/vectorization/PanamaDocValuesRangeSupport.java

Comment thread ...ne/core/src/java25/org/apache/lucene/internal/vectorization/PanamaDocValuesRangeSupport.java

Address feedback

b8f15a9

github-actions Bot added the module:core/search label Jun 25, 2026

sgup432 approved these changes Jun 25, 2026

costin wants to merge 2 commits into apache:main from costin:lucene/sorted-numeric-fixed-simd

2 participants

Uh oh!

Conversation

costin commented Jun 22, 2026

Benchmark

AMD EPYC 7R32 (c5a.2xlarge) — AVX2, 256-bit (4 longs)

Intel Xeon 8375C (c6i.2xlarge) — AVX-512, 512-bit (8 longs)

Uh oh!

sgup432 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

costin Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

sgup432 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

costin commented Jun 25, 2026

Uh oh!

sgup432 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants