
feat: 24 - Add comprehensive evals for tdd-workflow skill#25

Open
miroslavpojer wants to merge 7 commits into master from feature/24-implement-skill---tdd-workflow



@miroslavpojer miroslavpojer commented Jun 23, 2026

Contributor

Description

This PR adds an eval suite for the tdd-workflow skill. The suite tests both when the skill triggers and how its body executes, across happy-path, regression, edge-case, and negative scenarios.

What's included:

  • evals/evals.json: 9 eval cases covering happy-path (implement function, fix bug, new class), regressions (no code before confirmation, no private member access), edge cases (existing SPEC.md), output format validation, negative cases, and paraphrased triggers
  • evals/trigger-eval.json: 15 trigger validation cases (11 should-trigger, 4 should-not-trigger) verifying the skill activates on explicit TDD requests, implicit code-writing intent, and programmatic variations, while correctly ignoring conceptual questions and pure refactors
  • evals/files/bank-account-spec.md: Fixture SPEC.md used by the private-member-access regression eval to ensure tests interact only with public interfaces
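
To illustrate what "tests interact only with public interfaces" could mean in practice, here is a minimal sketch of a static check for private member access. It assumes the generated tests are Python and that "private" means an underscore-prefixed attribute; the helper name find_private_access and the BankAccount snippets are hypothetical and are not taken from the eval files in this PR.

```python
import ast


def find_private_access(test_source: str) -> list[str]:
    """Return the sorted, de-duplicated names of underscore-prefixed
    attributes that the test code reads or writes on other objects."""
    hits = set()
    for node in ast.walk(ast.parse(test_source)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        # Dunder attributes such as __dict__ are not treated as private here.
        if name.startswith("__") and name.endswith("__"):
            continue
        if not name.startswith("_"):
            continue
        # A test class touching its own state through self/cls is allowed.
        if isinstance(node.value, ast.Name) and node.value.id in ("self", "cls"):
            continue
        hits.add(name)
    return sorted(hits)


# Hypothetical test snippets: the first uses only the public interface,
# the second reaches into a private attribute.
public_only = "account = BankAccount()\naccount.deposit(50)\nassert account.balance == 50\n"
reaches_inside = "account = BankAccount()\naccount._balance = 50\n"

print(find_private_access(public_only))     # []
print(find_private_access(reaches_inside))  # ['_balance']
```

A check like this only catches direct attribute access; the regression eval itself may judge the output differently.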

Testing approach:

  • Happy-path evals verify core TDD sequence: SPEC creation → test table proposal → user confirmation → failing tests → implementation → refactoring
  • Regression evals enforce TDD constraints: no coding before confirmation, no private member access in tests
  • Trigger evals validate skill activation boundaries on paraphrases and edge cases
  • Edge cases test behavior when SPEC.md already exists
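
The ordering constraints above can be sketched as a small check over a list of observed steps. This is only an illustration: the step names are hypothetical labels for the stages listed in the happy-path bullet, and it assumes that writing failing tests counts as "coding" for the confirmation rule. It covers the happy path only, not the case where SPEC.md already exists.

```python
# Hypothetical labels for the stages of the TDD sequence described above.
TDD_SEQUENCE = [
    "spec_created",
    "test_table_proposed",
    "user_confirmed",
    "failing_tests_written",
    "implementation_written",
    "refactored",
]

# Assumption: both tests and implementation count as code.
CODE_STEPS = {"failing_tests_written", "implementation_written"}


def follows_tdd_sequence(observed: list[str]) -> bool:
    """True if every required step appears and the first occurrence
    of each step is in the expected order."""
    try:
        positions = [observed.index(step) for step in TDD_SEQUENCE]
    except ValueError:
        # At least one required step never happened.
        return False
    return positions == sorted(positions)


def code_before_confirmation(observed: list[str]) -> bool:
    """True if a code-writing step appears before 'user_confirmed'
    (including the case where confirmation never happens)."""
    for step in observed:
        if step == "user_confirmed":
            return False
        if step in CODE_STEPS:
            return True
    return False


happy_path = list(TDD_SEQUENCE)
too_eager = ["spec_created", "test_table_proposed", "implementation_written"]

print(follows_tdd_sequence(happy_path))     # True
print(code_before_confirmation(too_eager))  # True
```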

Next steps:
Run the eval suite via Copilot CLI (gh copilot, with the prompt "Use the skill-creator skill to test my skill at skills/tdd-workflow") to measure trigger accuracy and body output quality against a baseline.
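
As a rough sketch of what "trigger accuracy against a baseline" could look like, the snippet below scores a predictor against a list of trigger cases. The field names query and should_trigger, the sample queries, and both toy predictors are assumptions made for illustration; they are not copied from evals/trigger-eval.json, and the real measurement is done by the skill-creator run, not by this code.

```python
import json

# Hypothetical cases in an assumed {"query", "should_trigger"} shape.
SAMPLE_CASES = json.loads("""
[
  {"query": "Use TDD to add a withdraw method", "should_trigger": true},
  {"query": "Implement a function that parses dates", "should_trigger": true},
  {"query": "Fix the off-by-one bug in pagination", "should_trigger": true},
  {"query": "What is test-driven development?", "should_trigger": false},
  {"query": "Rename this variable without changing behavior", "should_trigger": false}
]
""")


def trigger_accuracy(cases, predict) -> float:
    """Fraction of cases where predict(query) matches should_trigger."""
    correct = sum(1 for case in cases if predict(case["query"]) == case["should_trigger"])
    return correct / len(cases)


def never_triggers(query: str) -> bool:
    """Baseline stand-in: the skill is never activated."""
    return False


def keyword_trigger(query: str) -> bool:
    """Toy stand-in for the skill's trigger decision."""
    text = query.lower()
    return "tdd" in text or "implement" in text


baseline = trigger_accuracy(SAMPLE_CASES, never_triggers)
candidate = trigger_accuracy(SAMPLE_CASES, keyword_trigger)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")  # baseline=0.40 candidate=0.80
```

Here the toy predictor misses the bug-fix request, which is the kind of implicit code-writing intent the should-trigger cases are meant to cover.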

Closes #24

@miroslavpojer miroslavpojer changed the title from "24 - Add comprehensive evals for tdd-workflow skill" to "feat: 24 - Add comprehensive evals for tdd-workflow skill" Jun 23, 2026
@tmikula-dev

Collaborator

I have used a well-rated TDD skill in the past (144k stars): https://github.com/mattpocock/skills/tree/main/skills/engineering/tdd. Its author (@mattpocock) also created, for example, the grill-with-docs skill, and this TDD skill is regularly updated (last update a week ago). Worth considering.

@miroslavpojer

Contributor Author

> I have used a well-rated TDD skill in the past (144k stars): https://github.com/mattpocock/skills/tree/main/skills/engineering/tdd. Its author (@mattpocock) also created, for example, the grill-with-docs skill, and this TDD skill is regularly updated (last update a week ago). Worth considering.

I did a comparison, and the current version is a hybrid solution: it introduces the vertical principle for test creation.



Development

Successfully merging this pull request may close these issues.

feat: implement skill - tdd-workflow
