perf: don't apply fancy indexing if split can be a slice. by selmanozleyen · Pull Request #235 · scverse/annbatch

selmanozleyen · 2026-06-15T14:01:31Z

In my use case, when batch_size=1, splits ends up looking like [[0], [1], [2], ...]. I think each in_memory_data[split] then becomes a fancy-indexing operation, which can be costly: in my case every row is 40 MB. I want to have some rows preloaded, but it turns out that each in_memory_data[split] is costly even though it fetches only one row.
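A toy reproduction of the pattern described above. It assumes the splits come from cutting the in-memory row positions into groups of batch_size (an assumption about how annbatch builds them) and uses a tiny float32 array in place of the 40 MB rows:

```python
import numpy as np

batch_size = 1
in_memory_data = np.ones((4, 3), dtype=np.float32)  # toy stand-in for the preloaded rows

# Row positions chunked into batches; np.split cuts at every multiple of batch_size.
positions = np.arange(in_memory_data.shape[0])
splits = np.split(positions, np.arange(batch_size, len(positions), batch_size))
print([s.tolist() for s in splits])  # [[0], [1], [2], [3]]

# Indexing with an integer array is advanced ("fancy") indexing, which always
# returns a newly allocated copy: here one full row per batch.
batches = [in_memory_data[split] for split in splits]
assert all(b.shape == (1, 3) for b in batches)
assert not any(np.shares_memory(b, in_memory_data) for b in batches)
```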

This covers a slightly more general case. I added a split[-1] - split[0] == len(split) - 1 check so it returns early, but if you think we should avoid this check, we could also just special-case batch_size=1.
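A minimal sketch of the early-return idea as a standalone helper; the name as_slice_if_contiguous is hypothetical and this is not the PR's code. One caveat: the endpoint check on its own is exact only when split is strictly increasing (it also accepts e.g. [2, 4, 3, 5], for which the slice would return the same rows in a different order), so this version additionally confirms that every step is +1. With batch_size=1, every split passes trivially.

```python
from __future__ import annotations

import numpy as np


def as_slice_if_contiguous(split: np.ndarray) -> np.ndarray | slice:
    """Hypothetical helper: return an equivalent slice when `split` is a run of
    consecutive ascending indices, otherwise return `split` unchanged."""
    if len(split) == 0:
        return split
    first, last = int(split[0]), int(split[-1])
    # Cheap O(1) endpoint test (the check described above): rejects most non-runs early.
    if last - first != len(split) - 1:
        return split
    # The endpoint test alone also accepts e.g. [2, 4, 3, 5]; confirm every step is +1.
    if not np.all(np.diff(split) == 1):
        return split
    return slice(first, last + 1)


data = np.arange(20).reshape(10, 2)
for split in (np.array([3]), np.array([2, 3, 4]), np.array([2, 4, 3, 5])):
    sel = as_slice_if_contiguous(split)
    # Same rows in the same order either way; only the slice path avoids a copy.
    assert np.array_equal(data[sel], data[split])
```

Keeping the O(1) endpoint test first means non-contiguous splits are rejected without scanning them; the np.diff scan only runs for candidates that already look like a run.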


codecov · 2026-06-15T14:13:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.92%. Comparing base (796e789) to head (d520726).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #235      +/-   ##
==========================================
- Coverage   93.48%   91.92%   -1.57%     
==========================================
  Files          15       15              
  Lines        1397     1399       +2     
==========================================
- Hits         1306     1286      -20     
- Misses         91      113      +22

Files with missing lines	Coverage Δ
src/annbatch/loader.py	`87.79% <100.00%> (-3.01%)`	⬇️

... and 2 files with indirect coverage changes


ilan-gold · 2026-06-15T16:28:07Z

> In my use case when batch_size=1

Just to be clear, then

sel = slice(sel[0], sel[-1] + 1)

becomes a length-1 slice? And this is faster than fancy indexing? Feels almost like a bug (not with us).
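For a one-element split, slice(sel[0], sel[-1] + 1) is indeed the length-1 slice slice(i, i + 1), and it keeps the leading batch axis, so callers get the same shape and values as with the fancy index. A quick check on a toy NumPy array (that annbatch's in-memory data is always a plain ndarray is an assumption here):

```python
import numpy as np

rows = np.arange(20, dtype=np.float32).reshape(5, 4)  # 5 rows, 4 columns
sel = np.array([2])                                    # a batch_size=1 split

fancy = rows[sel]                           # advanced (fancy) indexing
sliced = rows[slice(sel[0], sel[-1] + 1)]   # the length-1 slice from the rewrite

# Both keep the batch axis, shape (1, 4) rather than (4,), with identical values.
assert fancy.shape == sliced.shape == (1, 4)
assert np.array_equal(fancy, sliced)
```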

selmanozleyen · 2026-06-15T17:39:46Z

> In my use case when batch_size=1
>
> Just to be clear, then
> sel = slice(sel[0], sel[-1] + 1)
> becomes a length-1 slice? And this is faster than fancy-indexing? Feels like a bug almost (not with us)

Yes. The setup is preload_nchunks=x, batch_size=1, chunk_size=1.
It's because my batch sizes are irregular, but I still want annbatch's preload_nchunks behaviour. For example, the unique donors corresponding to a batch can be arbitrary. I asked Lukas whether it would be fine to sample a fixed number of donors and then sample the corresponding cells from those donors, so that batches have a fixed size; that should be roughly equivalent mathematically. But I wanted to replicate Lukas's existing implementation one-to-one.
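To make the setup concrete, here is a hypothetical, heavily simplified loader loop using the three parameters named above (no shuffling and no disk reads; the function name and body are illustrative, not annbatch's implementation). With chunk_size=1 and batch_size=1, each preload of preload_nchunks rows yields splits [0], [1], [2], ... over the in-memory block, and slicing makes every batch a view of that block:

```python
import numpy as np


def iter_batches(data: np.ndarray, chunk_size: int, preload_nchunks: int, batch_size: int):
    """Hypothetical, stripped-down loader loop (no shuffling, no disk I/O).

    Parameter names follow the setup above; the body is an illustration,
    not annbatch's implementation.
    """
    chunk_starts = np.arange(0, data.shape[0], chunk_size)
    for g in range(0, len(chunk_starts), preload_nchunks):
        # "Preload": gather `preload_nchunks` chunks into one in-memory block.
        starts = chunk_starts[g : g + preload_nchunks]
        in_memory_data = np.concatenate([data[s : s + chunk_size] for s in starts])
        # Split the block's row positions into batches of `batch_size` rows.
        positions = np.arange(in_memory_data.shape[0])
        splits = np.split(positions, np.arange(batch_size, len(positions), batch_size))
        for split in splits:
            # Unshuffled splits are ascending runs, so this slice selects the same
            # rows as in_memory_data[split] but returns a view instead of a copy.
            yield in_memory_data[split[0] : split[-1] + 1]


data = np.arange(12, dtype=np.float32).reshape(6, 2)
batches = list(iter_batches(data, chunk_size=1, preload_nchunks=3, batch_size=1))
```

With batch_size=1 every split has exactly one element, so the slice form would stay valid even if the in-memory block were shuffled.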

selmanozleyen · 2026-06-16T08:32:24Z

I checked: in NumPy, fancy indexing creates a copy whose size matches the index, whereas slicing only returns a view. Since each row is 40 MB, even allocating a single row can be costly. In short: a slice gives a view, a list gives a copy. It makes sense that NumPy doesn't special-case this, since you'd want consistent behaviour.
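A small sketch of what the view-vs-copy difference means for consumers of a batch, on a toy array (an assumption; annbatch may hold other array types in memory). Besides skipping the allocation, a sliced batch shares memory with the preloaded block, so in-place writes to the batch show up in the block and the batch keeps the whole block alive; a caller that needs an independent array can still call .copy():

```python
import numpy as np

block = np.zeros((3, 4), dtype=np.float32)  # stand-in for the preloaded in_memory_data

view = block[1:2]    # basic slicing: new shape/strides over the same buffer
copy = block[[1]]    # advanced indexing: new buffer holding len(index) rows

assert np.shares_memory(view, block)
assert not np.shares_memory(copy, block)

# Writes through the view land in the preloaded block...
view[0, 0] = 7.0
assert block[1, 0] == 7.0
# ...while writes to the copy do not.
copy[0, 1] = 9.0
assert block[1, 1] == 0.0

# The view references the whole block as its base, keeping it alive.
assert view.base is block
```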

selmanozleyen added 2 commits June 15, 2026 15:51

init

4414378

add one more condition to shortcircuit

b32c2c9

selmanozleyen changed the title ~~perf: Don't apply fancy indexing if split can be a slice.~~ perf: don't apply fancy indexing if split can be a slice. Jun 15, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

ed13ead

for more information, see https://pre-commit.ci

selmanozleyen added the skip-gpu-ci label (Whether gpu ci should be skipped) Jun 15, 2026

selmanozleyen requested a review from ilan-gold June 15, 2026 14:44

Merge branch 'main' into perf/dont-apply-fancy-indexing-if-not-needed

d520726

selmanozleyen mentioned this pull request Jun 16, 2026

Perf: buffer to avoid expensive fancy indexing for dense data #239

Open

selmanozleyen wants to merge 4 commits into scverse:main from selmanozleyen:perf/dont-apply-fancy-indexing-if-not-needed

2 participants

Uh oh!

Conversation

selmanozleyen commented Jun 15, 2026

Uh oh!

codecov Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ilan-gold commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

selmanozleyen commented Jun 15, 2026

Uh oh!

selmanozleyen commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 15, 2026 •

edited

Loading

ilan-gold commented Jun 15, 2026 •

edited

Loading