ci: track flaky and slow tests over time #11360

Open
markijbema wants to merge 6 commits into main from ci/track-flaky-slow-tests

@markijbema

Contributor

Retries currently turn transient CLI test failures into green results, while short-lived JUnit artifacts make recurring instability and slow tests difficult to identify.

This adds an Allure Report 2 test-health job that consumes the JUnit XML already produced by Bun and Gradle. Linux, Windows, and JetBrains remain separate so platform-specific outcomes are not combined. Each workflow run publishes three standalone HTML previews containing retries, recent status history, duration trends, and slow-test views.
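
To illustrate the kind of data a slow-test view draws on, here is a minimal sketch that ranks test cases by the `time` attribute (seconds) of JUnit `<testcase>` elements. It is not how Allure computes its views, and `slowestTests` and `attr` are hypothetical names, not part of this PR. The regex scan is a stand-in for a real XML parser and would break on attribute values containing `>`.

```typescript
interface TestDuration {
  name: string;
  seconds: number;
}

// Hypothetical helper: read one attribute from the raw text of a tag.
// The leading \s keeps "name" from matching inside "classname".
function attr(tag: string, key: string): string | undefined {
  const m = new RegExp(`\\s${key}="([^"]*)"`).exec(tag);
  return m ? m[1] : undefined;
}

// Hypothetical helper: return the `limit` slowest test cases in a JUnit XML string.
function slowestTests(junitXml: string, limit: number): TestDuration[] {
  const tags = junitXml.match(/<testcase\b[^>]*>/g) ?? [];
  const durations: TestDuration[] = [];
  for (const tag of tags) {
    const name = attr(tag, "name");
    const seconds = Number(attr(tag, "time")); // NaN when the attribute is missing
    if (name !== undefined && isFinite(seconds)) {
      durations.push({ name, seconds });
    }
  }
  // Slowest first, then keep only the requested number of entries.
  return durations.sort((a, b) => b.seconds - a.seconds).slice(0, limit);
}
```

Running this per platform file, rather than over a merged file, would match the PR's choice to keep Linux, Windows, and JetBrains results separate.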

Successful main-branch runs also deploy a stable GitHub Pages dashboard at https://kilo-org.github.io/kilocode/. Main-branch history is carried forward through GitHub artifacts so flaky, regressed, fixed, and duration trends accumulate without an external analytics service or stored credential. Reporting and deployment remain non-blocking observability aids.

The CLI test runner retains every failed file attempt separately from the final retry-aware JUnit result. Allure receives the final outcome first and prior attempts as retries, so a recovered test remains green while still being marked as a status-changing retry. Allure 2.42.1 is downloaded from its official release and verified against the release SHA-256 before processing results.
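
The ordering described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `test-runner.ts`: the `Attempt` and `OrderedResult` shapes and the `orderAttempts` function are hypothetical. The sketch groups attempts by file, puts the last attempt first as the final outcome, lists earlier attempts as retries, and flags files where a retry ended with a different status.

```typescript
// Hypothetical shapes; the real test runner may model attempts differently.
type Status = "passed" | "failed";

interface Attempt {
  file: string;    // test file that was executed
  attempt: number; // 1-based attempt index
  status: Status;
}

interface OrderedResult {
  file: string;
  final: Attempt;         // outcome reported by the retry-aware result
  retries: Attempt[];     // earlier attempts, newest first
  statusChanged: boolean; // true when any retry differs from the final status
}

function orderAttempts(attempts: Attempt[]): OrderedResult[] {
  // Group attempts by file, preserving first-seen file order.
  const byFile = new Map<string, Attempt[]>();
  for (const a of attempts) {
    const list = byFile.get(a.file) ?? [];
    list.push(a);
    byFile.set(a.file, list);
  }

  const results: OrderedResult[] = [];
  byFile.forEach((list, file) => {
    // Highest attempt number first: the final outcome leads, retries follow.
    const sorted = [...list].sort((x, y) => y.attempt - x.attempt);
    const final = sorted[0];
    if (!final) return;
    const retries = sorted.slice(1);
    const statusChanged = retries.some((r) => r.status !== final.status);
    results.push({ file, final, retries, statusChanged });
  });
  return results;
}
```

With this shape, a file that fails once and then passes keeps a `passed` final status while `statusChanged` is true, which mirrors the "green but marked as a status-changing retry" behaviour the description mentions.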

@kilo-code-bot

kilo-code-bot Bot commented Jun 17, 2026

Contributor

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (2 files)
  • .github/workflows/test.yml - Added test-health and deploy-test-health jobs with Allure Report 2 integration (260 additions). All changes wrapped in kilocode_change markers.
  • packages/opencode/script/test-runner.ts - Refactored to track per-attempt JUnit XML files and produce separate unit and test-health merged outputs.

Reviewed by deepseek-v4-pro-20260423 · 299,112 tokens

Review guidance: REVIEW.md from base branch main
