
feat: build gate + inert wiring for contrib Delta scans [Delta contrib split, part 2] #4

Draft: schenksj wants to merge 1 commit into pr/delta-A1-spi from pr/delta-A2-buildgate



@schenksj

Owner

Fork-local review draft (Delta-contrib PR split, part 2 / unit A.2). Base is pr/delta-A1-spi so the diff shows only A.2. Stacks on part 1 (apache#4700). Opens upstream once part 1 is approved/merged. Tracking umbrella: apache#4366.

What this part is

Build gate + inert wiring. Establishes the contrib-delta Maven profile / Cargo feature and the inert wiring so a gated build compiles end to end, while the DEFAULT build stays byte-for-byte behavior-unchanged and carries zero Delta surface. No real Delta read logic yet — a Delta read that reaches native returns a clean NotImplemented and falls back to vanilla Spark.

Build gate

  • Maven contrib-delta profile with per-Spark delta.version (3.5→3.3.2, 4.0→4.0.0, 4.1→4.1.0) + add-source of contrib/delta/src. Default spark.version stays 4.1.2 (the delta-spark 4.1.1 pin is a separate, deferred decision).
  • Cargo contrib-delta feature (optional path dep on the contrib crate); native/Cargo.toml excludes ../contrib from the workspace.
  • dev/verify-contrib-delta-gate.sh proves default cargo/mvn/dylib carry zero Delta surface and the gated build pulls the right deps; wired into a minimal delta_build_gate.yml job.

Inert wiring

  • Proto Delta* messages + delta_scan = 118 (117 is BroadcastNestedLoopJoin).
  • Native OpStruct::DeltaScan dispatch arm (not-compiled-in error on default builds; feature-gated shim into the contrib), exhaustive-match arms, convert_spark_types_to_arrow_schema promoted to pub(crate).
  • Stub contrib crate (contrib/delta/native): plan_delta_scan returns DataFusionError::NotImplemented — just enough to satisfy the core shim so --features contrib-delta links.
  • JVM bridge DeltaIntegration (reflective, returns None until the contrib classes exist), the CometExecRule Delta-marker hook (CDF hook deferred), the CometScanRule Delta delegation + metadata-col reorder, and the leaf DeltaConf.
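
As an illustration of the native side of this wiring, here is a minimal, self-contained sketch of a feature-gated dispatch arm. It is not the code in this PR: `PlanError`, `plan`, and the `Projection` variant are hypothetical stand-ins. The real arm reports a `DataFusionError`, and the gated shim calls into the contrib crate's `plan_delta_scan` rather than returning inline.

```rust
use std::fmt;

/// Hypothetical stand-in for the planner's error type
/// (the real code reports a DataFusionError).
#[derive(Debug, PartialEq)]
enum PlanError {
    /// The operator exists in the proto, but this binary was built without it.
    NotCompiledIn(&'static str),
    /// The operator is compiled in, but only as a stub.
    NotImplemented(&'static str),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::NotCompiledIn(feature) => {
                write!(f, "not compiled in (rebuild with --features {feature})")
            }
            PlanError::NotImplemented(what) => write!(f, "{what} is not implemented yet"),
        }
    }
}

/// Hypothetical subset of the operator enum decoded from the plan proto.
#[derive(Debug)]
enum OpStruct {
    Projection,
    DeltaScan,
}

/// Gated shim: compiled only when the feature is enabled. In this sketch it
/// plays the role of the stub contrib crate and reports "not implemented".
#[cfg(feature = "contrib-delta")]
fn delta_scan() -> Result<&'static str, PlanError> {
    Err(PlanError::NotImplemented("delta scan"))
}

/// Default build: no contrib code is referenced, so nothing Delta-related
/// is linked; the arm only produces an error.
#[cfg(not(feature = "contrib-delta"))]
fn delta_scan() -> Result<&'static str, PlanError> {
    Err(PlanError::NotCompiledIn("contrib-delta"))
}

/// The match stays exhaustive in both feature states because the enum
/// variant itself is never gated, only the function behind it.
fn plan(op: &OpStruct) -> Result<&'static str, PlanError> {
    match op {
        OpStruct::Projection => Ok("projection"),
        OpStruct::DeltaScan => delta_scan(),
    }
}

fn main() {
    for op in [OpStruct::Projection, OpStruct::DeltaScan] {
        match plan(&op) {
            Ok(name) => println!("{op:?}: planned as {name}"),
            // An error here is the caller's cue to fall back to vanilla Spark.
            Err(e) => println!("{op:?}: {e}"),
        }
    }
}
```

Keeping the enum variant ungated and gating only the function behind it is what allows the proto and the exhaustive matches to be identical in both builds, while the default binary still links no contrib code.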

Why it's safe on default builds

No -Pcontrib-delta / no --features contrib-delta: the dispatch arm is a not-compiled-in error, the contrib crate isn't linked, and DeltaIntegration's reflective lookups resolve to nothing. The gate-verify script asserts this (0 Delta symbols in the default libcomet, zero io.delta in the effective pom, only the DeltaIntegration bridge class compiled).

Verification

Default and gated native builds, clippy (both feature states), the gate-verify script, gated and default JVM compile, spotless/scalastyle, and cargo fmt: all green.


🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.

…b split, part 2]

Part 2 of the Delta Lake contrib PR breakup (tracking: apache#4366). Establishes the
`contrib-delta` build gate and the inert wiring that lets a gated build compile
end to end, while the DEFAULT build stays byte-for-byte unchanged (zero Delta
surface). No real Delta read logic yet -- that lands in later parts; here a Delta
read that reaches native returns a clean "not implemented" error and falls back
to vanilla Spark.

Build gate:
- Maven `contrib-delta` profile (spark/pom.xml) with per-Spark `delta.version`
  (3.5->3.3.2, 4.0->4.0.0, 4.1->4.1.0) and an add-source of contrib/delta/src.
  Default `delta.version` floor in pom.xml. The default spark.version stays 4.1.2
  (the delta-spark 4.1.1 pin is a separate, deferred decision).
- Cargo `contrib-delta` feature on core (optional path dep on comet-contrib-delta);
  `native/Cargo.toml` excludes ../contrib from the workspace.
- `dev/verify-contrib-delta-gate.sh` proves default cargo/mvn/dylib carry zero
  Delta surface and the gated build pulls the right deps; wired into a minimal
  `delta_build_gate.yml` CI job (the full suite/regression workflows land later).
  Hardened the script against a `set -o pipefail` + `grep -q` SIGPIPE misfire
  (early grep exit -> echo SIGPIPE -> false guard failure) via here-strings.

Inert wiring:
- Proto: `Delta*` messages + `delta_scan = 118` (117 is BroadcastNestedLoopJoin).
- Native dispatch: `OpStruct::DeltaScan` arm with a not-compiled-in error on
  default builds and a feature-gated `delta_scan` shim that calls the contrib;
  exhaustive-match arms in operator_registry/jni_api;
  `convert_spark_types_to_arrow_schema` promoted to pub(crate).
- Stub contrib crate (contrib/delta/native): `plan_delta_scan` returns
  `DataFusionError::NotImplemented` -- just enough to satisfy the core shim's
  contract so `--features contrib-delta` links.
- JVM bridge `DeltaIntegration` (reflective, all lookups return None until the
  contrib classes exist), the CometExecRule Delta-marker hook (CDF hook deferred
  to a later part), the CometScanRule Delta delegation + metadata-col reorder,
  and the leaf `DeltaConf`.

Verification: default + gated native build, clippy both feature states, gate
script, gated + default JVM compile, spotless/scalastyle, cargo fmt -- all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>