ci: track flaky and slow tests over time #11360
Open
markijbema wants to merge 6 commits into
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Files Reviewed (2 files)
Reviewed by deepseek-v4-pro-20260423 · 299,112 tokens
Review guidance: REVIEW.md from base branch
Retries currently turn transient CLI test failures into green results, while short-lived JUnit artifacts make recurring instability and slow tests difficult to identify.
This adds an Allure Report 2 test-health job that consumes the JUnit XML already produced by Bun and Gradle. Linux, Windows, and JetBrains remain separate so platform-specific outcomes are not combined. Each workflow run publishes three standalone HTML previews containing retries, recent status history, duration trends, and slow-test views.
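As a rough illustration of the kind of data a slow-test view draws on, the sketch below reads per-test durations from JUnit XML and ranks them. It is not the Allure pipeline from this PR: the sample XML, the test names in it, and the `slowest_tests` helper are all hypothetical, and it assumes the common JUnit layout where each `testcase` element carries `classname`, `name`, and `time` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the common JUnit XML layout; real Bun and Gradle
# output may carry additional attributes and nesting.
SAMPLE_JUNIT = """<testsuites>
  <testsuite name="cli" tests="3">
    <testcase classname="cli.session" name="resumes" time="0.42"/>
    <testcase classname="cli.session" name="streams output" time="3.10"/>
    <testcase classname="cli.config" name="loads defaults" time="0.05">
      <failure message="boom"/>
    </testcase>
  </testsuite>
</testsuites>"""


def slowest_tests(junit_xml: str, limit: int = 5) -> list[tuple[str, float]]:
    """Return up to `limit` (test id, seconds) pairs, slowest first."""
    root = ET.fromstring(junit_xml)
    timings = []
    for case in root.iter("testcase"):
        test_id = f'{case.get("classname", "")}::{case.get("name", "")}'
        # A missing or empty time attribute is treated as zero seconds.
        seconds = float(case.get("time") or 0)
        timings.append((test_id, seconds))
    return sorted(timings, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    for test_id, seconds in slowest_tests(SAMPLE_JUNIT, limit=2):
        print(f"{seconds:6.2f}s  {test_id}")
```

Keeping Linux, Windows, and JetBrains results in separate files means a ranking like this would be computed per platform rather than over a merged set.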
Successful main-branch runs also deploy a stable GitHub Pages dashboard at https://kilo-org.github.io/kilocode/. Main-branch history is carried forward through GitHub artifacts so flaky, regressed, fixed, and duration trends accumulate without an external analytics service or stored credential. Reporting and deployment remain non-blocking observability aids.
The CLI test runner retains every failed file attempt separately from the final retry-aware JUnit result. Allure receives the final outcome first and prior attempts as retries, so a recovered test remains green while still being marked as a status-changing retry. Allure 2.42.1 is downloaded from its official release and verified against the release SHA-256 before processing results.
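The retry handling described above can be sketched as a small classification step. This is only an illustration of the idea that a recovered test stays green but is still flagged: the `Attempt` record and `classify` function are hypothetical, the sketch works per test rather than per file, and it does not reproduce how Allure itself derives its flaky markers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One recorded run of a test (hypothetical shape, not the PR's format)."""
    test: str
    attempt: int  # 1-based attempt number
    passed: bool


def classify(attempts: list[Attempt]) -> dict[str, str]:
    """Label each test 'passed', 'flaky', or 'failed' from its attempts.

    The final attempt decides the reported outcome; earlier failed attempts
    are kept so that a recovery still shows up as instability.
    """
    by_test: dict[str, list[Attempt]] = {}
    for item in attempts:
        by_test.setdefault(item.test, []).append(item)

    labels: dict[str, str] = {}
    for test, runs in by_test.items():
        ordered = sorted(runs, key=lambda run: run.attempt)
        final, earlier = ordered[-1], ordered[:-1]
        if not final.passed:
            labels[test] = "failed"
        elif any(not run.passed for run in earlier):
            labels[test] = "flaky"  # green in the end, but only after a retry
        else:
            labels[test] = "passed"
    return labels


if __name__ == "__main__":
    history = [
        Attempt("session resumes", 1, False),
        Attempt("session resumes", 2, True),
        Attempt("config loads", 1, True),
    ]
    print(classify(history))
```

The design point is that the final result and the retry history are stored separately, so the build outcome is unchanged while the instability remains visible.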