ci: add Scalpel shadow comparison and upgrade to 0.3.7#22524
Conversation
|
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
|
|
ℹ️ CI did not run targeted module tests. |
77a5579 to
1834e34
Compare
|
Claude Code on behalf of Guillaume Nodet @apupier Thanks for the review!
Done — just merged latest
No, the Scalpel extension can be updated independently — that PR (#22572) was already merged and is included in the merge we just did. The CI script is version-agnostic: it uses |
apupier
left a comment
There was a problem hiding this comment.
Please launch it on a branch so that we ensure that it doesn't break the curren tflow and have a better idea of what will be the output
|
|
||
| The script overrides `fullBuildTriggers` to empty (`-Dscalpel.fullBuildTriggers=`) because Scalpel's default (`.mvn/**`) would trigger a full build whenever `.mvn/extensions.xml` itself changes (e.g., Dependabot bumping Scalpel). | ||
|
|
||
| The base branch is pre-fetched by the CI workflow (`git fetch --deepen=200` + fetch of `origin/main`) rather than by Scalpel's built-in JGit fetch (`-Dscalpel.fetchBaseBranch=false`). This avoids JGit issues in shallow CI clones. |
There was a problem hiding this comment.
why do we need 200 hundreds commits from git history to detect dependencies?
What is the performance impact?
There was a problem hiding this comment.
The 200 commits are not for detecting dependencies — they're for finding the git merge-base between the PR branch and main.
GitHub Actions checks out with depth=1 (a single commit). Scalpel needs the merge-base to compute which files changed, then maps those changed files to Maven modules and walks Maven's dependency graph to find affected downstream modules.
Why local git instead of the GitHub API? The existing grep-based script fetches the diff via the GitHub REST API (application/vnd.github.v3.diff), so it doesn't need local history at all. Scalpel uses local git instead because:
- It's a Maven extension (JVM-based), so using local git via JGit is the natural approach
- CI-provider-agnostic — works on GitLab, Jenkins, etc., not just GitHub
- No diff truncation risk (GitHub API truncates large diffs)
Performance impact: git fetch --deepen=200 fetches only commit metadata (not file blobs). It adds ~2-3 seconds to the job — negligible compared to the Maven build.
I've updated CI-ARCHITECTURE.md to clarify this.
Claude Code on behalf of Guillaume Nodet
There was a problem hiding this comment.
Also note that the end goal is to switch to Scalpel completely and remove the grep-based custom mechanism, which will then remove the dependency on the GitHub API for diff fetching entirely.
Claude Code on behalf of Guillaume Nodet
There was a problem hiding this comment.
I've pushed a follow-up commit that switches the grep-based script itself to use local git (git merge-base + git diff) instead of the GitHub API for diff fetching. Both mechanisms now share the same --deepen=200 fetch step, which makes the git fetch justified by both and removes the GitHub API dependency for diffs entirely.
Claude Code on behalf of Guillaume Nodet
There was a problem hiding this comment.
Updated test PRs — now with POM changes to trigger Scalpel's shadow comparison (the previous ones only had Java file changes, which don't invoke Scalpel's POM analysis):
- PR #23062 —
kafka-version(narrow managed dep). Grep finds 0 modules, Scalpel should find camel-kafka. - PR #23063 —
junit-jupiter-version(test-scoped managed dep). Grep finds 0, Scalpel should detect test scope. - PR #23064 —
jackson2-version(widely-used managed dep). Grep finds 0, Scalpel should find many modules.
All three demonstrate grep's blind spot for <dependencyManagement>-inherited versions. Throwaway PRs — will be closed after validation.
Claude Code on behalf of Guillaume Nodet
New validation PRs created with Scalpel 0.3.7 (clean CI history):
Previous test PRs (#23070–#23072, #24250) used Scalpel 0.3.0–0.3.6 which had inflated module counts (scalpel#39). That's fixed in 0.3.7. Preliminary result (from #24250 with 0.3.7):
CI runs for the new PRs are in progress — results will appear in the Scalpel shadow comparison section of each PR's CI comment. Claude Code on behalf of @gnodet |
73a8203 to
faaf5da
Compare
Scalpel 0.3.7 validation resultsCreated 3 clean test PRs to validate the shadow comparison with different types of managed dependency changes:
Scalpel distinguishes between modules that need recompilation and those that need retesting:
Key observations
All 3 test PRs passed CI ✅ with no regression to the existing grep-based test execution. Claude Code on behalf of @gnodet |
Validation results (Scalpel 0.3.7)Three test PRs were created to validate the shadow comparison across different dependency change scenarios:
Key observations
Results for #24272 and #24274 will appear in their CI comments once builds complete. See also: the last comment above with links to all validation PRs. |
Add a shadow comparison section to CI PR comments showing what Maveniverse Scalpel's skip-tests mode would have tested — without affecting actual test execution. Changes: - incremental-build.sh: configure Scalpel with skipTestsForDownstreamModules and fetchBaseBranch=false, add writeScalpelComparison() for collapsible PR comment section with failure reporting - pr-build-main.yml / sonar-build.yml: add base branch fetch step for Scalpel's merge-base detection in shallow CI clones, restore checkout v7 - CI-ARCHITECTURE.md: document shadow comparison approach and configuration - Scalpel upgraded to 0.3.7: fixes inflated affectedModules count for parent POM property changes (scalpel#39) and skipTestsForDownstreamModules
d106d61 to
4585eeb
Compare
|
Claude Code on behalf of Alireza Esmaeili @apupier Good catch! I investigated why PR #24274's comment wasn't updated. Root cause: artifact overwrite race condition with cancelled builds The CI has two JDK matrix entries (17, 25) that both upload the
PR #24272 (kafka) worked fine because both JDK 17 and 25 succeeded — both generated the comment file, so the overwrite was harmless. Fix: I've updated the upload step to only upload (and overwrite) when |
When a matrix build has one JDK failing and another cancelled, the cancelled JDK's cleanup steps upload a ci-comment artifact without the comment file, overwriting the failed JDK's artifact that had it. Only upload the ci-comment artifact when incremental-test-comment.md actually exists, preventing a cancelled build from clobbering a completed build's comment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scalpel's auto-detection of GITHUB_BASE_REF via Maven system properties
(env.GITHUB_BASE_REF) is fragile — it can fail in CI rerun contexts or
with certain Maven wrapper configurations, causing the report to silently
not be generated.
Fix: always pass -Dscalpel.baseBranch=origin/${GITHUB_BASE_REF:-main}
explicitly. Also add a git merge-base pre-check and improved diagnostics
(tail of Scalpel log, broader grep) when the report is not found.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oscerd
left a comment
There was a problem hiding this comment.
Thanks @gnodet — the shadow-mode Scalpel comparison is a sound approach, and the .github/CI-ARCHITECTURE.md update is a welcome bit of documentation. The artifact-race fix (gating the Upload step on hashFiles('ci-comment-artifact/incremental-test-comment.md') != '') looks correct. A couple of things, one of them new:
1. (new) Unguarded base-branch git fetch can fail the build despite the "shadow-only" framing. In .github/workflows/pr-build-main.yml, the new "Fetch base branch for Scalpel change detection" step guards the first git fetch --deepen=$depth with || true, but the follow-up git fetch --no-tags --depth=$depth origin "${BASE_REF}:refs/remotes/origin/${BASE_REF}" and the unshallow-fallback's final git fetch are not guarded. Since Actions runs steps with bash --noprofile --norc -eo pipefail, a failed base-branch fetch would abort the step and fail the PR build — even though Scalpel is observational. The same pattern is in .github/workflows/sonar-build.yml. Adding || true (or continue-on-error) there would keep the "no behavioral change" guarantee fail-open.
2. The [DO NOT MERGE] jackson-downgrade commit is still on the branch. Commit 8feb3d6 downgrades <jackson2-version> 2.22.0→2.21.2 in parent/pom.xml — that's the direct cause of the two red build jobs, and it's out-of-scope application config in an otherwise CI-only PR. Presumably intentional for validating the shadow run; it just needs dropping before merge.
For the record, apupier's earlier conflict / #22572 questions look addressed. One thing I couldn't verify from the repo: Scalpel 0.3.7's report JSON schema that the new jq filters depend on (.affectedModules[].testsSkipped, .category, .artifactId). jq is fail-safe if those fields are absent, so the worst case there is degraded comparison accuracy rather than a broken build.
Not stacking a formal change request on top of apupier's — this is just the one new fail-open nit plus the merge-blocker reminder.
Reviewed with Claude Code on behalf of Andrea Cosentino (@oscerd). This review was generated by an AI agent and may contain inaccuracies; please verify all suggestions before applying.
Instead of a fixed --depth=200 fetch that can miss the merge base for long-lived or stale branches, try 200 → 1000 → unshallow until git merge-base succeeds. Most PRs resolve at depth 200 (no extra cost); only old branches need the deeper fetches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e47d327 to
27d1872
Compare
gnodet
left a comment
There was a problem hiding this comment.
Thanks for the thorough review @oscerd!
Both points addressed:
-
Unguarded fetch: All
git fetchcommands in the progressive deepening loop are now guarded with|| true— in bothpr-build-main.ymlandsonar-build.yml. The Scalpel fetch step can never fail the build. The same progressive deepening pattern (200 → 1000 → unshallow) is applied consistently to both workflows. -
[DO NOT MERGE]jackson commit: Dropped from the branch. The PR now has 4 clean CI-only commits with no application config changes.
Re: Scalpel's report JSON schema — correct, the jq filters are fail-safe (defaulting to empty strings/missing fields). Worst case is a degraded shadow comparison section, never a broken build.
Claude Code on behalf of @gnodet
Move all Scalpel-related output below a separator line at the end of the CI comment, with a one-line diff summary showing what Scalpel would add/remove vs the current grep-based detection (e.g. "compile: +51, test: +22"). The top section now shows only what the existing grep mechanism found, making it easy to see what each approach contributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Add a Scalpel shadow comparison section to PR CI comments, showing what Maveniverse Scalpel's skip-tests mode would have tested — without affecting actual test execution.
This lets the team validate Scalpel's module detection across many PRs before switching from grep-based to Scalpel-driven test selection.
Changes
---separator, a collapsible section shows a one-line diff summary (e.g.compile: +51, test: +22) followed by Scalpel's full skip-tests module breakdown. The top section shows only what the existing grep mechanism detected — making it clear what each approach contributes.skipTestsForDownstreamModules: derived fromEXCLUSION_LISTvia sed (no duplication) — tells Scalpel which downstream modules to skip in shadow modeaffectedModulescount for parent POM property changes (scalpel#39) andskipTestsForDownstreamModulesnot taking effectbaseBranchexplicitly to Scalpel: relying on Scalpel's auto-detection ofenv.GITHUB_BASE_REFvia Maven system properties is fragile in CI rerun contexts. Now always passes-Dscalpel.baseBranch=origin/mainexplicitly, with agit merge-basepre-check and improved diagnostics.--depth=200that can miss the merge base for long-lived branches, the fetch step now tries 200 → 1000 → unshallow untilgit merge-basesucceeds. All fetch commands are guarded with|| trueso failures never break the build. Applied consistently to bothpr-build-main.ymlandsonar-build.yml.How Scalpel skip-tests mode works
Scalpel distinguishes between modules that need recompilation and modules that need retesting:
EXCLUSION_LIST(generated code, meta-modules likecamel-allcomponents,camel-catalog,camel-endpointdsl…) are recompiled with-DskipTests— no test executionFor example, a
junit-jupiter-versionchange affects 42 modules: 13 are recompiled and tested, 29 are recompiled only (tests skipped).CI comment layout
The CI comment now has two clearly separated sections:
Regression safety
fetchDiff()still uses the GitHub REST API (no local git dependency for grep)|| truethroughout so fetch failures never abort the buildcheckout@v7.0.0andpersist-credentials: falsekept in bothpr-build-main.ymlandsonar-build.ymlValidation
Tested with test PRs on Scalpel 0.3.7 — each bumps a different managed dependency property:
kafka-versionjunit-jupiter-versionjackson2-version<dependencyManagement>, invisible to grep)CI architecture docs
Updated
CI-ARCHITECTURE.mdto document the dual detection strategy, shadow comparison behavior, and Scalpel configuration.Claude Code on behalf of @gnodet