feat: contrib Delta executor-side Rust (kernel read + deletion vectors) [Delta contrib split, part 3b] #6
Draft
schenksj wants to merge 2 commits into
…s) [Delta contrib split, part 3b]

Part 3b of the Delta Lake contrib PR breakup (tracking: apache#4366). Completes the contrib native crate: the executor-side read path replaces the build-gate stub planner, so a `-Pcontrib-delta` build now does end-to-end native Delta reads (given a scan task, read through delta-kernel-rs, apply the transform + deletion vectors).

- `planner.rs` - replaces the stub: assembles the per-task `DataSourceExec` (parquet scan + partition values + DV filter), wired to the core dispatch shim's `plan_delta_scan` call.
- `kernel_scan.rs` - the kernel read bridge (`planner` <-> `kernel_scan` are mutually dependent and ship together): schema resolution, column-mapping, row-tracking, transform.
- `dv_reader.rs` - Delta deletion-vector decode (inline + on-disk roaring bitmaps), surfaced as a DataFusion filter; missing-DV-file maps to `SparkError::FileNotFound` for parity.
- `lib.rs` - re-adds the `dv_reader`/`kernel_scan` module decls and the `DeltaScan`/`DeltaScanCommon` proto re-exports trimmed in 3a.
- `Cargo.toml` - re-adds the executor deps (parquet, roaring, datafusion-datasource, futures, chrono*, comet-common, tokio dev-dep) deferred from 3a.

Core is untouched -- the dispatch shim is unchanged; it now reaches the real planner instead of the stub. The native crate is now equivalent to the integration branch (modulo the crate version, kept at 0.18.0, and a clarified planner doc-link). Default builds still carry zero Delta surface.

Verification: gated native build, 89 in-crate unit tests (54 driver + 35 executor), default native build unchanged, clippy (both feature states), gate-verify script (contrib libcomet +13 MB), cargo fmt -- all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
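For readers unfamiliar with deletion vectors, the idea behind "decode the DV, then surface it as a filter" can be sketched roughly as below. This is an illustration only, not the code in `dv_reader.rs`: the `DeletionVector` type, `keep_mask`, and `apply_mask` are hypothetical names, a plain `BTreeSet` stands in for the roaring bitmap, and slices stand in for Arrow record batches.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a decoded deletion vector: the set of row
/// indices (within one data file) that are marked as deleted.
/// A BTreeSet is used here in place of a roaring bitmap.
pub struct DeletionVector {
    deleted: BTreeSet<u64>,
}

impl DeletionVector {
    /// Build the vector from an iterator of deleted row indices.
    pub fn from_deleted_rows<I: IntoIterator<Item = u64>>(rows: I) -> Self {
        Self {
            deleted: rows.into_iter().collect(),
        }
    }

    /// Produce a keep-mask for a batch that covers file rows
    /// `[batch_start, batch_start + batch_len)`: `true` means "keep".
    pub fn keep_mask(&self, batch_start: u64, batch_len: usize) -> Vec<bool> {
        (0..batch_len as u64)
            .map(|i| !self.deleted.contains(&(batch_start + i)))
            .collect()
    }
}

/// Apply a keep-mask to a batch of values, dropping the deleted rows.
pub fn apply_mask<T: Clone>(batch: &[T], mask: &[bool]) -> Vec<T> {
    batch
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| value.clone())
        .collect()
}
```

The mask is computed per batch from the batch's starting row offset, because deletion-vector indices refer to positions in the whole data file rather than positions within a single batch.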
…n guard [apache#30 + themeA, folded into A.3b]
What this part is

The Rust executor side that completes the contrib Delta native crate. It replaces the build-gate stub `planner` with the real one, so a `-Pcontrib-delta` build now does end-to-end native Delta reads: given a scan task, read through delta-kernel-rs, apply the transform + deletion vectors.

- `planner.rs`: replaces the stub; assembles the per-task `DataSourceExec` (kernel scan + partition values + DV filter), reached by the unchanged core dispatch shim.
- `kernel_scan.rs`: the kernel read bridge (schema resolution, column-mapping, row-tracking, transform). `planner` ↔ `kernel_scan` are mutually dependent and ship together.
- `dv_reader.rs`: Delta deletion-vector decode (inline + on-disk), surfaced as a DataFusion filter; a missing DV file maps to `SparkError::FileNotFound` for JVM error parity.
- `lib.rs` / `Cargo.toml`: re-add the executor module decls, proto re-exports, and deps deferred in 3a.

Why it's contained

Core is untouched: the dispatch shim is byte-identical; it now reaches the real planner instead of the stub. A.3b touches only `contrib/delta/native` + the native lockfile. Default (non-`contrib-delta`) builds still carry zero Delta surface (the gate-verify script confirms 0 Delta symbols in the default `libcomet`; the contrib build is ~13 MB larger).

Verification

Gated native build, 89 in-crate unit tests (54 driver + 35 executor), default native build unchanged, clippy (both feature states), the gate-verify script, and `cargo fmt`: all green. The review pass also removed four dead dependencies the integration branch carried (`roaring`, `datafusion-datasource`, a direct `parquet` dep, and datafusion's `parquet` feature; none referenced by the code, since all parquet I/O flows through `delta_kernel`).

🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.
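As a rough illustration of the "missing DV file maps to a file-not-found error" behaviour described above, here is a minimal sketch using only the standard library. It is not the crate's implementation: `ReadError`, `map_io_error`, and `read_dv_bytes` are hypothetical names, and `ReadError` only mirrors the shape of a file-not-found variant rather than reproducing the real `SparkError`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type for this sketch. It is NOT the crate's
/// `SparkError`; it only models a dedicated "file not found" variant.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    FileNotFound { path: String },
    Other(String),
}

/// Translate an I/O error into the sketch's error type, giving a missing
/// file its own variant so callers can report it distinctly.
pub fn map_io_error(path: &Path, err: io::Error) -> ReadError {
    match err.kind() {
        io::ErrorKind::NotFound => ReadError::FileNotFound {
            path: path.display().to_string(),
        },
        _ => ReadError::Other(format!("{}: {}", path.display(), err)),
    }
}

/// Read the raw bytes of an on-disk deletion-vector file (hypothetical helper).
pub fn read_dv_bytes(path: &Path) -> Result<Vec<u8>, ReadError> {
    fs::read(path).map_err(|e| map_io_error(path, e))
}
```

Keeping the mapping in one small function means every read path reports a missing file the same way, which is the property that error parity with the JVM side depends on.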