feat: 24 - Add comprehensive evals for tdd-workflow skill by miroslavpojer · Pull Request #25 · AbsaOSS/agentic-toolkit

miroslavpojer · 2026-06-23T12:32:06Z

Description

This PR establishes a complete eval suite for the tdd-workflow skill, enabling rigorous testing of TDD workflow triggering and body execution across multiple scenarios.

What's included:

evals/evals.json: 9 eval cases covering happy-path (implement function, fix bug, new class), regressions (no code before confirmation, no private member access), edge cases (existing SPEC.md), output format validation, negative cases, and paraphrased triggers
evals/trigger-eval.json: 15 trigger validation cases (11 should-trigger, 4 should-not-trigger) verifying the skill activates on explicit TDD requests, implicit code-writing intent, and programmatic variations, while correctly ignoring conceptual questions and pure refactors
evals/files/bank-account-spec.md: Fixture SPEC.md used by the private-member-access regression eval to ensure tests interact only with public interfaces
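The private-member-access regression can be illustrated with a small static check. This is only a sketch of the idea, not the check the eval suite itself performs: the `BankAccount` class, its `deposit` method, and the `balance` / `_balance` attributes are hypothetical names chosen to match the bank-account fixture, and the PR does not show how the real eval grades this.

```python
import ast


def private_accesses(test_source: str) -> list:
    """Return the sorted, de-duplicated private attribute names accessed in test code."""
    found = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Attribute):
            name = node.attr
            is_dunder = name.startswith("__") and name.endswith("__")
            # Dunder names such as __class__ are protocol hooks, not private state.
            if name.startswith("_") and not is_dunder:
                found.add(name)
    return sorted(found)


# Hypothetical test bodies; the source is only parsed, never executed.
GOOD_TEST = """
def test_deposit_increases_balance():
    account = BankAccount()
    account.deposit(50)
    assert account.balance == 50
"""

BAD_TEST = """
def test_deposit_increases_balance():
    account = BankAccount()
    account.deposit(50)
    assert account._balance == 50
"""

print(private_accesses(GOOD_TEST))
print(private_accesses(BAD_TEST))
```

A test that reads `account._balance` is flagged, while one that goes through the public `balance` attribute passes, which is the behavior the regression eval is meant to enforce.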

Testing approach:

Happy-path evals verify core TDD sequence: SPEC creation → test table proposal → user confirmation → failing tests → implementation → refactoring
Regression evals enforce TDD constraints: no coding before confirmation, no private member access in tests
Trigger evals validate skill activation boundaries on paraphrases and edge cases
Edge cases test behavior when SPEC.md already exists
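The ordering constraint described above (no step may run before the steps preceding it, in particular no implementation before user confirmation) can be sketched as a simple sequence check. The event names below are hypothetical labels for the six steps listed in the testing approach; the actual eval format in `evals/evals.json` is not shown in this PR and may differ.

```python
from typing import List, Optional

# Hypothetical labels for the TDD sequence described in the PR.
TDD_STEPS = [
    "spec_created",
    "test_table_proposed",
    "user_confirmed",
    "failing_tests_written",
    "implementation_written",
    "refactored",
]


def first_violation(events: List[str]) -> Optional[str]:
    """Return the first event that skips a required earlier step, or None."""
    expected = 0  # index of the furthest step that is currently allowed
    for event in events:
        if event not in TDD_STEPS:
            continue  # ignore events unrelated to the workflow
        position = TDD_STEPS.index(event)
        if position > expected:
            return event  # at least one earlier step was skipped
        # Repeating an earlier step is allowed (e.g. a new red/green cycle).
        expected = max(expected, position + 1)
    return None


HAPPY_PATH = list(TDD_STEPS)
CODE_BEFORE_CONFIRMATION = [
    "spec_created",
    "test_table_proposed",
    "implementation_written",
]

print(first_violation(HAPPY_PATH))
print(first_violation(CODE_BEFORE_CONFIRMATION))
```

This sketch treats a pre-existing SPEC.md as out of scope; handling that edge case would need a way to mark the first step as already satisfied.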

Next steps:
Run the eval suite via Copilot CLI (gh copilot with the prompt "Use the skill-creator skill to test my skill at skills/tdd-workflow") to measure trigger accuracy and body output quality against a baseline.
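As a rough illustration of what "trigger accuracy" could mean here, the sketch below scores observed activations against expected ones. The `TriggerCase` structure, its field names, and the placeholder queries are assumptions made for the example; they are not the schema of `evals/trigger-eval.json`, and the skill-creator skill may compute its metrics differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriggerCase:
    query: str            # hypothetical field: the user prompt
    should_trigger: bool  # hypothetical field: expected activation


def trigger_accuracy(cases: List[TriggerCase], triggered: List[bool]) -> float:
    """Fraction of cases where the observed activation matches the expectation."""
    if len(cases) != len(triggered):
        raise ValueError("one observation is required per case")
    correct = sum(1 for case, got in zip(cases, triggered) if case.should_trigger == got)
    return correct / len(cases)


# Placeholder cases mirroring the 11 should-trigger / 4 should-not-trigger split.
CASES = [TriggerCase(f"positive query {i}", True) for i in range(11)] + [
    TriggerCase(f"negative query {i}", False) for i in range(4)
]

perfect = [case.should_trigger for case in CASES]
one_miss = list(perfect)
one_miss[0] = False  # pretend the skill failed to activate on the first query

print(trigger_accuracy(CASES, perfect))
print(trigger_accuracy(CASES, one_miss))
```

With a single missed activation out of 15 cases, the score drops to 14/15, which gives a sense of how coarse the metric is at this sample size.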

Closes #24

tmikula-dev · 2026-06-24T07:48:08Z

I have used a TDD skill in the past that is well rated (144k stars): https://github.com/mattpocock/skills/tree/main/skills/engineering/tdd. The author (@mattpocock) is the same person who created, for example, the grill-with-docs skill, and this TDD skill is regularly updated (last update a week ago). Worth considering.

miroslavpojer · 2026-06-25T09:39:23Z

> I have used a TDD skill in the past, that is well rated (144k of stars): https://github.com/mattpocock/skills/tree/main/skills/engineering/tdd. The author (@mattpocock) is the same, that created for example grill-with-docs skill and this TDD skill is regularly updated (last update a week ago). Worth considering.

I did a comparison, and the current version is a hybrid solution: it introduces the vertical principle for test creation.

miroslavpojer added 3 commits June 23, 2026 13:59

feat: implement TDD workflow skill with SPEC.md and evaluation scenarios

386cc17

Simplify skill strings.

9511f79

Improved triggering

92d1ad1

miroslavpojer requested review from lsulak, oto-macenauer-absa and tmikula-dev as code owners June 23, 2026 12:32

miroslavpojer changed the title ~~24 - Add comprehensive evals for tdd-workflow skill~~ feat: 24 - Add comprehensive evals for tdd-workflow skill Jun 23, 2026

miroslavpojer added 4 commits June 24, 2026 11:57

feat: add TDD workflow skill documentation and update README

8e60d9b

feat: add new evaluation scenarios and trigger queries for TDD workflow

9676fbf

feat: refine TDD workflow description for clarity and scope

a7b6a95

feat: enhance TDD workflow documentation with clearer specifications and design decisions

ccd9d20

miroslavpojer wants to merge 7 commits into master from feature/24-implement-skill---tdd-workflow
