Single-pass writeString fast path for short strings in ByteBuffersDataOutput #16280

Open
neoremind wants to merge 2 commits into apache:main from neoremind:bbo_writestring_fast_path_pr
@neoremind

Background

In #13863, ByteBuffersDataOutput.writeString() was optimized to encode directly into the destination buffer instead of allocating a BytesRef and copying its bytes. However, it still makes two passes over the input string's chars: first calcUTF16toUTF8Length to compute the length for the VInt prefix, then UTF16toUTF8 for the actual UTF-8 encoding. The opportunity: for short strings, we can skip that first pass.

What this PR does

This PR adds a single-pass fast path for short strings (charCount <= 42). Each UTF-16 char encodes to at most 3 UTF-8 bytes, so the maximum encoded length is 42 * 3 = 126 bytes, which always fits in a 1-byte VInt (up to 127). The size of the VInt prefix is therefore known without scanning the chars up front. The fast path reserves 1 byte, encodes directly into the destination buffer, then backfills the length. Strings that don't qualify fall through to the existing logic.
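The reserve, encode, and backfill steps can be sketched on a plain byte array. This is a minimal stand-alone illustration, not the code in this PR: the class `ShortStringWriterSketch` and its helper methods are hypothetical, the caller is assumed to provide a large enough array, and unpaired surrogates are assumed to be written as U+FFFD.

```java
/** Hypothetical stand-alone sketch; not the Lucene implementation. */
final class ShortStringWriterSketch {
  // 42 chars * 3 bytes per char = 126 bytes, which fits in one VInt byte (max 127).
  static final int MAX_CHARS_FOR_1BYTE_VINT = 42;

  /** Writes a VInt length prefix plus UTF-8 bytes at offset; returns the new offset. */
  static int writeString(CharSequence s, byte[] dest, int offset) {
    if (s.length() <= MAX_CHARS_FOR_1BYTE_VINT) {
      // Fast path: reserve one byte, encode, then backfill the length.
      int end = encodeUtf8(s, dest, offset + 1);
      dest[offset] = (byte) (end - offset - 1);
      return end;
    }
    // Slow path: one pass to compute the length, a second pass to encode.
    int pos = writeVInt(utf8Length(s), dest, offset);
    return encodeUtf8(s, dest, pos);
  }

  /** Standard 7-bits-per-byte variable-length int, low-order group first. */
  static int writeVInt(int v, byte[] out, int pos) {
    while ((v & ~0x7F) != 0) {
      out[pos++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out[pos++] = (byte) v;
    return pos;
  }

  static boolean isPair(CharSequence s, int i) {
    return Character.isHighSurrogate(s.charAt(i))
        && i + 1 < s.length()
        && Character.isLowSurrogate(s.charAt(i + 1));
  }

  /** First pass of the slow path: counts UTF-8 bytes without writing anything. */
  static int utf8Length(CharSequence s) {
    int n = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) n += 1;
      else if (c < 0x800) n += 2;
      else if (isPair(s, i)) { n += 4; i++; }
      else n += 3; // BMP char, or unpaired surrogate replaced by U+FFFD
    }
    return n;
  }

  /** Encodes s as UTF-8 into out starting at pos; returns the end position. */
  static int encodeUtf8(CharSequence s, byte[] out, int pos) {
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i);
      if (c < 0x80) {
        out[pos++] = (byte) c;
      } else if (c < 0x800) {
        out[pos++] = (byte) (0xC0 | (c >> 6));
        out[pos++] = (byte) (0x80 | (c & 0x3F));
      } else if (isPair(s, i)) {
        int cp = Character.toCodePoint((char) c, s.charAt(++i));
        out[pos++] = (byte) (0xF0 | (cp >> 18));
        out[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        out[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        out[pos++] = (byte) (0x80 | (cp & 0x3F));
      } else {
        if (Character.isSurrogate((char) c)) c = 0xFFFD; // unpaired surrogate
        out[pos++] = (byte) (0xE0 | (c >> 12));
        out[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        out[pos++] = (byte) (0x80 | (c & 0x3F));
      }
    }
    return pos;
  }
}
```

Both paths produce the same bytes for a given string; the fast path only avoids the counting pass, which is why the threshold must guarantee a 1-byte prefix in the worst case.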

To my understanding, this could benefit stored-fields writes of short strings such as business keywords, IDs, and titles, as well as short strings like field infos, codec metadata, and segment names.

Benchmarks

I added a JMH benchmark comparing the new impl against the current one across ASCII, CJK, and Latin-extended strings at various lengths. The benchmark keeps a copy of the current impl for an apples-to-apples comparison. The target written byte size matches stored-fields chunk sizes: 80KB (BEST_SPEED default), 480KB (BEST_COMPRESSION default), and 2MB (imagine a customized larger chunk in the stored fields .fdt). The benchmark uses a resettable ByteBuffersDataOutput starting with 1KB blocks to mimic a real-world workload.

Results show notable gains on short strings and no regressions on medium, long, or very large strings, which fall through to the unchanged logic (differences there are within the jitter I observed).

Throughput is in ops/s. Each operation writes strings into the buffer until the target written byte size is reached. Measured on EC2 m5.2xlarge.

See detailed results

Benchmark                                               (stringType)  (targetBytes)   Mode  Cnt      Score     Error  Units
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1          81920  thrpt   15   1924.154 ±   3.998  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1         491520  thrpt   15    325.054 ±   0.712  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1        2097152  thrpt   15     77.335 ±   0.249  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10          81920  thrpt   15   5127.397 ± 124.657  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10         491520  thrpt   15    894.737 ±   4.701  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10        2097152  thrpt   15    206.414 ±   2.523  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20          81920  thrpt   15   7907.056 ±  28.022  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20         491520  thrpt   15   1374.817 ±   4.420  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20        2097152  thrpt   15    325.101 ±   0.932  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30          81920  thrpt   15   9654.601 ±  40.498  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30         491520  thrpt   15   1764.192 ±   6.306  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30        2097152  thrpt   15    416.434 ±   1.790  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40          81920  thrpt   15  10563.802 ±  30.043  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40         491520  thrpt   15   1891.552 ±   4.140  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40        2097152  thrpt   15    449.588 ±   4.443  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium          81920  thrpt   15   9263.776 ±  98.204  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium         491520  thrpt   15   1514.433 ±   0.863  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium        2097152  thrpt   15    356.831 ±   0.588  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long          81920  thrpt   15  12117.442 ± 424.084  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long         491520  thrpt   15   2114.019 ±   2.865  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long        2097152  thrpt   15    503.861 ±   5.616  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge          81920  thrpt   15  11603.539 ±  28.604  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge         491520  thrpt   15   2050.525 ±   1.159  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge        2097152  thrpt   15    519.435 ±   5.892  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1          81920  thrpt   15   3598.613 ±  27.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1         491520  thrpt   15    589.760 ±   2.930  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1        2097152  thrpt   15    142.267 ±   1.822  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10          81920  thrpt   15   6516.930 ± 155.093  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10         491520  thrpt   15   1124.501 ±  51.999  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10        2097152  thrpt   15    268.392 ±  10.699  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20          81920  thrpt   15   7444.068 ±  28.467  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20         491520  thrpt   15   1251.821 ±  63.880  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20        2097152  thrpt   15    316.346 ±   4.879  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30          81920  thrpt   15   7735.062 ±  33.040  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30         491520  thrpt   15   1369.589 ±  23.248  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30        2097152  thrpt   15    310.114 ±  12.392  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40          81920  thrpt   15   7861.299 ±  44.006  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40         491520  thrpt   15   1426.798 ±   1.373  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40        2097152  thrpt   15    328.560 ±   8.392  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium          81920  thrpt   15   5302.579 ±  67.898  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium         491520  thrpt   15    829.204 ±   5.262  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium        2097152  thrpt   15    210.442 ±   0.308  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long          81920  thrpt   15   5704.934 ± 119.140  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long         491520  thrpt   15    934.739 ±  31.456  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long        2097152  thrpt   15    211.968 ±   3.531  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge          81920  thrpt   15   6736.329 ± 244.534  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge         491520  thrpt   15    927.611 ±  12.725  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge        2097152  thrpt   15    231.230 ±   4.009  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1          81920  thrpt   15   2330.881 ±  32.202  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1         491520  thrpt   15    398.409 ±   5.090  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1        2097152  thrpt   15     93.175 ±   1.428  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10          81920  thrpt   15   4296.039 ±  48.292  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10         491520  thrpt   15    748.831 ±   5.288  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10        2097152  thrpt   15    178.731 ±   2.817  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20          81920  thrpt   15   4953.465 ±  80.963  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20         491520  thrpt   15    859.932 ±  27.221  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20        2097152  thrpt   15    206.179 ±   6.109  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30          81920  thrpt   15   5053.684 ± 232.941  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30         491520  thrpt   15    878.187 ±  10.097  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30        2097152  thrpt   15    208.340 ±   1.234  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40          81920  thrpt   15   4932.669 ±   9.067  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40         491520  thrpt   15    962.194 ±  57.633  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40        2097152  thrpt   15    216.052 ±   2.011  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium          81920  thrpt   15   3523.366 ±  14.522  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium         491520  thrpt   15    593.160 ±   3.174  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium        2097152  thrpt   15    138.684 ±   0.154  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long          81920  thrpt   15   3652.496 ±  86.858  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long         491520  thrpt   15    630.856 ±  23.506  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long        2097152  thrpt   15    152.758 ±   5.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge          81920  thrpt   15   4227.879 ±   7.569  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge         491520  thrpt   15    633.812 ±   1.601  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge        2097152  thrpt   15    148.096 ±   0.526  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed          81920  thrpt   15   2610.423 ±   8.035  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed         491520  thrpt   15    526.189 ±  11.442  ops/s
ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed        2097152  thrpt   15    117.501 ±   5.147  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1          81920  thrpt   15   1449.904 ±   0.730  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1         491520  thrpt   15    237.547 ±   0.981  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1        2097152  thrpt   15     55.849 ±   0.035  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10          81920  thrpt   15   3632.715 ±   7.330  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10         491520  thrpt   15    608.009 ±   1.032  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10        2097152  thrpt   15    143.089 ±   0.086  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20          81920  thrpt   15   5513.255 ±  16.047  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20         491520  thrpt   15    939.471 ±   0.893  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20        2097152  thrpt   15    221.746 ±   0.437  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30          81920  thrpt   15   6810.637 ±  33.651  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30         491520  thrpt   15   1180.119 ±   2.552  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30        2097152  thrpt   15    276.847 ±   0.688  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40          81920  thrpt   15   7800.776 ±  14.315  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40         491520  thrpt   15   1310.465 ±   2.490  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40        2097152  thrpt   15    311.610 ±   0.348  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium          81920  thrpt   15   9042.239 ±  37.124  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium         491520  thrpt   15   1470.004 ±   5.105  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium        2097152  thrpt   15    346.409 ±   0.763  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long          81920  thrpt   15  10884.157 ±  32.714  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long         491520  thrpt   15   2047.124 ±   3.786  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long        2097152  thrpt   15    485.906 ±   0.356  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge          81920  thrpt   15  11570.370 ±  10.070  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge         491520  thrpt   15   2070.484 ±   1.673  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge        2097152  thrpt   15    506.705 ±  11.358  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1          81920  thrpt   15   2732.453 ±  18.110  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1         491520  thrpt   15    473.930 ±  11.438  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1        2097152  thrpt   15    109.360 ±   2.644  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10          81920  thrpt   15   4078.860 ± 229.551  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10         491520  thrpt   15    729.199 ±  42.046  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10        2097152  thrpt   15    163.849 ±   0.211  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20          81920  thrpt   15   4728.439 ± 108.248  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20         491520  thrpt   15    756.027 ±  28.522  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20        2097152  thrpt   15    180.958 ±  11.565  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30          81920  thrpt   15   4945.852 ± 123.435  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30         491520  thrpt   15    853.268 ±   4.967  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30        2097152  thrpt   15    199.801 ±   0.083  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40          81920  thrpt   15   5080.684 ± 114.575  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40         491520  thrpt   15    872.155 ±   0.935  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40        2097152  thrpt   15    198.099 ±   5.012  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium          81920  thrpt   15   5114.304 ±  16.729  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium         491520  thrpt   15    836.790 ±   3.880  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium        2097152  thrpt   15    193.791 ±  14.359  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long          81920  thrpt   15   5636.091 ±  96.048  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long         491520  thrpt   15    899.898 ±   4.430  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long        2097152  thrpt   15    211.120 ±   0.845  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge          81920  thrpt   15   6610.988 ± 368.882  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge         491520  thrpt   15    897.061 ±  15.893  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge        2097152  thrpt   15    226.848 ±   9.797  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1          81920  thrpt   15   1707.395 ±  20.488  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1         491520  thrpt   15    290.791 ±   0.661  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1        2097152  thrpt   15     68.084 ±   0.438  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10          81920  thrpt   15   2562.599 ±  27.365  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10         491520  thrpt   15    437.844 ±   3.480  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10        2097152  thrpt   15    103.573 ±   0.355  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20          81920  thrpt   15   2849.567 ±   5.463  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20         491520  thrpt   15    488.922 ±   4.148  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20        2097152  thrpt   15    114.500 ±   0.159  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30          81920  thrpt   15   3112.005 ± 104.903  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30         491520  thrpt   15    519.170 ±   1.386  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30        2097152  thrpt   15    125.173 ±   4.172  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40          81920  thrpt   15   3159.485 ±  13.467  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40         491520  thrpt   15    545.461 ±  10.699  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40        2097152  thrpt   15    129.708 ±   4.595  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium          81920  thrpt   15   3521.568 ±   4.052  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium         491520  thrpt   15    604.327 ±  17.521  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium        2097152  thrpt   15    138.913 ±   0.268  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long          81920  thrpt   15   3583.787 ±  28.151  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long         491520  thrpt   15    619.880 ±   9.109  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long        2097152  thrpt   15    156.162 ±   0.251  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge          81920  thrpt   15   4230.539 ±  11.689  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge         491520  thrpt   15    636.914 ±   1.179  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge        2097152  thrpt   15    147.291 ±   0.189  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed          81920  thrpt   15   2569.503 ±  34.528  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed         491520  thrpt   15    471.877 ±  13.853  ops/s
ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed        2097152  thrpt   15    111.679 ±   0.714  ops/s

80KB target (BEST_SPEED chunk size)

| String Type | New (ops/s) | Prev (ops/s) | Delta |
|---|---|---|---|
| ascii_1 | 1924 | 1450 | +33% |
| ascii_10 | 5127 | 3633 | +41% |
| ascii_20 | 7907 | 5513 | +43% |
| ascii_30 | 9655 | 6811 | +42% |
| ascii_40 | 10564 | 7801 | +35% |
| ascii_medium | 9264 | 9042 | +2% |
| ascii_long | 12117 | 10884 | +11% |
| ascii_vlarge | 11604 | 11570 | 0% |
| cjk_1 | 3599 | 2732 | +32% |
| cjk_10 | 6517 | 4079 | +60% |
| cjk_20 | 7444 | 4728 | +57% |
| cjk_30 | 7735 | 4946 | +56% |
| cjk_40 | 7861 | 5081 | +55% |
| cjk_medium | 5303 | 5114 | +4% |
| cjk_long | 5705 | 5636 | +1% |
| cjk_vlarge | 6736 | 6611 | +2% |
| latin_ext_1 | 2331 | 1707 | +37% |
| latin_ext_10 | 4296 | 2563 | +68% |
| latin_ext_20 | 4953 | 2850 | +74% |
| latin_ext_30 | 5054 | 3112 | +62% |
| latin_ext_40 | 4933 | 3159 | +56% |
| latin_ext_medium | 3523 | 3522 | 0% |
| latin_ext_long | 3652 | 3584 | +2% |
| latin_ext_vlarge | 4228 | 4231 | 0% |
| mixed | 2610 | 2570 | +2% |

480KB target (BEST_COMPRESSION chunk size)

| String Type | New (ops/s) | Prev (ops/s) | Delta |
|---|---|---|---|
| ascii_1 | 325 | 238 | +37% |
| ascii_10 | 895 | 608 | +47% |
| ascii_20 | 1375 | 939 | +46% |
| ascii_30 | 1764 | 1180 | +49% |
| ascii_40 | 1892 | 1310 | +44% |
| ascii_medium | 1514 | 1470 | +3% |
| ascii_long | 2114 | 2047 | +3% |
| ascii_vlarge | 2051 | 2070 | −1% |
| cjk_1 | 590 | 474 | +24% |
| cjk_10 | 1125 | 729 | +54% |
| cjk_20 | 1252 | 756 | +66% |
| cjk_30 | 1370 | 853 | +61% |
| cjk_40 | 1427 | 872 | +64% |
| cjk_medium | 829 | 837 | −1% |
| cjk_long | 935 | 900 | +4% |
| cjk_vlarge | 928 | 897 | +3% |
| latin_ext_1 | 398 | 291 | +37% |
| latin_ext_10 | 749 | 438 | +71% |
| latin_ext_20 | 860 | 489 | +76% |
| latin_ext_30 | 878 | 519 | +69% |
| latin_ext_40 | 962 | 545 | +76% |
| latin_ext_medium | 593 | 604 | −2% |
| latin_ext_long | 631 | 620 | +2% |
| latin_ext_vlarge | 634 | 637 | 0% |
| mixed | 526 | 472 | +12% |

2MB target (larger workload)

| String Type | New (ops/s) | Prev (ops/s) | Delta |
|---|---|---|---|
| ascii_1 | 77 | 56 | +38% |
| ascii_10 | 206 | 143 | +44% |
| ascii_20 | 325 | 222 | +47% |
| ascii_30 | 416 | 277 | +50% |
| ascii_40 | 450 | 312 | +44% |
| ascii_medium | 357 | 346 | +3% |
| ascii_long | 504 | 486 | +4% |
| ascii_vlarge | 519 | 507 | +3% |
| cjk_1 | 142 | 109 | +30% |
| cjk_10 | 268 | 164 | +64% |
| cjk_20 | 316 | 181 | +75% |
| cjk_30 | 310 | 200 | +55% |
| cjk_40 | 329 | 198 | +66% |
| cjk_medium | 210 | 194 | +9% |
| cjk_long | 212 | 211 | 0% |
| cjk_vlarge | 231 | 227 | +2% |
| latin_ext_1 | 93 | 68 | +37% |
| latin_ext_10 | 179 | 104 | +73% |
| latin_ext_20 | 206 | 115 | +80% |
| latin_ext_30 | 208 | 125 | +66% |
| latin_ext_40 | 216 | 130 | +67% |
| latin_ext_medium | 139 | 139 | 0% |
| latin_ext_long | 153 | 156 | −2% |
| latin_ext_vlarge | 148 | 147 | +1% |
| mixed | 118 | 112 | +5% |

More thoughts

I initially attempted a more aggressive approach: adding a second fast path for 2-byte VInt (charCount 128–5461) plus a calcVIntSizeForUTF8Length utility method with early-exit scanning for the ambiguous ranges. This showed strong wins in almost all setups, particularly in configurations with larger block sizes or larger target written sizes (more docs per chunk or a larger chunk size). With the default settings (80KB chunk / 1024 docs), however, there was one ~5% regression on ascii_medium, and the change introduced extra branches and more complex logic. So I kept it simple: only the 1-byte VInt fast path. The code is straightforward and easy to read, and I saw no regressions in any case.
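The charCount ranges quoted for the 1-byte and 2-byte VInt cases follow from simple bounds: a string of n chars encodes to at least n and at most 3n UTF-8 bytes, and a k-byte VInt holds 7k bits. A small sketch of that arithmetic, where the class and method names are hypothetical and not part of the PR:

```java
/** Hypothetical helper illustrating the range arithmetic; not part of the PR. */
final class VIntRangeSketch {
  /**
   * Returns {minChars, maxChars} such that any string with a char count in this
   * range is guaranteed to need exactly vIntBytes bytes for its length prefix.
   * Valid for vIntBytes in 1..4.
   */
  static int[] charCountRange(int vIntBytes) {
    // Smallest and largest byte lengths representable in exactly vIntBytes bytes.
    int minBytes = vIntBytes == 1 ? 0 : 1 << (7 * (vIntBytes - 1));
    int maxBytes = (1 << (7 * vIntBytes)) - 1;
    // Best case is 1 byte per char (ASCII), worst case is 3 bytes per char.
    return new int[] {minBytes, maxBytes / 3};
  }
}
```

Char counts that fall between these ranges (43 to 127, for instance) are the ambiguous ones: the prefix size there depends on the actual characters, which is where a scan would still be needed.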

@github-actions github-actions Bot added this to the 10.6.0 milestone Jun 21, 2026
@neoremind

@dweiss would you mind taking a look? It's a small change that improves performance. Many thanks!
