Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ its purpose, trigger phrases, and full instructions.
| Skill | Description |
|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| **[pr-review](./skills/pr-review/)** | Pull request code review — reviews diffs for risk, security issues, API contract changes, dependency bumps, CI/CD and infrastructure changes. Produces concise Blocker / Important / Nit comments. |
| **[test-data-management](./skills/test-data-management/)** | Test data setup and management — factory functions, parametrised tests, deterministic seeds, fixture reuse, and production-data rules for unit and integration tests. |
| **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. |

## Finding More Skills
Expand Down
5 changes: 3 additions & 2 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,9 @@ Navigation hub for all guides in this repository. Browse by category below.

| Guide | Description |
|----|----|
| [PR Review](./pr-review.md) | How the PR review skill works, what sections it applies, and how to trigger it |
| [Token Saving](./token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies |
| [PR Review](./pr-review.md) | How the PR review skill works, what sections it applies, and how to trigger it |
| [Test Data Management](./test-data-management.md) | How the test-data-management skill works, what it covers, and when it fires |
| [Token Saving](./token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies |

> **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above.

Expand Down
62 changes: 62 additions & 0 deletions docs/test-data-management.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Test Data Management Skill

The `test-data-management` skill guides consistent, maintainable test data setup across unit and integration tests. It activates when the question is about _how_ to structure or supply data to tests — not about writing the tests themselves.

---

## What it covers

| Topic | Guidance |
|---|---|
| Parametrised tests | When to collapse repeated test functions into a single data-driven test |
| Factory / builder pattern | Default-value factories with keyword overrides; composable nested factories |
| Production data | Why to never use production data (even anonymised); generator-script pattern |
| Deterministic data | Injecting fixed timestamps; avoiding `datetime.now()` in test setup |
| Minimal data | Inline data for simple cases; external fixtures only for complex payloads |
| Integration test cleanup | Transaction rollback, truncation, test containers, run-scoped IDs |

---

## When it fires

The skill activates on intent — it does not require exact phrasing:

```
my test setup is duplicated everywhere
how do I avoid copy-pasting test data across 10 tests?
can I use production data in tests?
my test keeps breaking because the expected timestamp changes every run
how do I create a factory for test orders?
how should I seed data for integration tests?
how do I clean up after an integration test?
```

---

## When it does not fire

| Situation | Correct path |
|---|---|
| Choosing mock, stub, spy, or fake | Test mocking / doubles guide |
| Writing test logic and assertions | Test authoring guide |
| Reviewing tests for standards compliance | Test review guide |
| Debugging a test runtime error | General debugging |
| Configuring test infrastructure (containers, DBs) | Infrastructure / DevOps guide |

---

## Language support

The skill covers patterns for Python, TypeScript/JavaScript, Scala, Java, and .NET. Examples default to Python but the language table in the skill body maps each pattern to the idiomatic tool for each ecosystem.

---

## Evals

The skill ships with 10 functional evals (`evals/evals.json`) and 19 trigger evals (`evals/trigger-eval.json`). Run them to validate behaviour after edits — see [Skill Testing](./skill-testing.md).

---

## Installation

See [Getting Started](./getting-started.md) for the full install guide.
129 changes: 129 additions & 0 deletions skills/test-data-management/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
---
name: test-data-management
description: |
Test data management: parametrize tests, use factories and builders, avoid duplication and test pollution.
Use this skill whenever the user asks about: parametrizing tests, creating test fixtures, test data factories,
test builders, avoiding duplicated test setup, cleaning up test data, avoiding test pollution, seeding databases,
using production data in tests, making tests deterministic, isolating tests, or any other test data strategy.
Invoke this skill even if the user doesn't explicitly ask for "test data" — if they're talking about test setup,
fixtures, factories, builders, parametrization, or data isolation in tests, this is the right skill.
---

# Test Data Management

## Prefer data-driven and parametrized tests

When a behaviour must be tested with multiple input combinations, prefer parametrized tests over
duplicated test methods. One parametrized test with a data table is clearer, easier to extend,
and reduces duplication.

| Language | Tool | Pattern |
|----------|------|---------|
| Python | `pytest.mark.parametrize` | `@pytest.mark.parametrize("input,expected", [...])` |
| Scala | ScalaTest `TableDrivenPropertyChecks` | `forAll(table) { (input, expected) => ... }` |
| .NET | xUnit `[Theory]` + `[InlineData]` / `[MemberData]` | `[Theory] [InlineData(1, 2)]` |
| TypeScript | Jest `test.each` | `test.each([[input, expected]])` |
| Java | JUnit 5 `@ParameterizedTest` | `@ParameterizedTest @MethodSource` |

**Use** parametrized tests when: the same behaviour is tested with ≥ 3 input combinations, or when
combinations form a clear equivalence class table.

**Do not use** parametrized tests when: each case requires fundamentally different setup or
assertions — use separate named tests instead.

## Use factory and builder patterns

- Create factory functions or builder classes that produce valid default objects
- Override only the fields relevant to the specific test case
- Shared hardcoded data causes **cross-test coupling**: a change to a shared dict or object breaks every test that references it — tests should never share mutable data structures
- Place factories in the nearest shared location:
- Python: `conftest.py` factory fixture or `tests/factories.py`
- Scala: `TestFactories.scala` object
- .NET: `TestDataBuilder.cs`
- TypeScript: `test-factories.ts`
- Document the factory's defaults and what each parameter controls

```python
# ✅ — factory with keyword overrides
def make_order(*, order_id="ORD-1", user_id="u1", sku="SKU-1", quantity=1, status="pending"):
return Order(order_id=order_id, user_id=user_id, sku=sku, quantity=quantity, status=status)

# test uses only the fields that matter
def test_cancel_order_when_shipped_returns_false():
order = make_order(status="shipped")
assert service.cancel(order.order_id) is False
```

### Composable nested factories

Create a factory per level and compose them:

```python
# ✅ — each factory owns one level; override at any depth
def make_address(*, street="1 Test St", city="Cape Town", country="ZA"):
return Address(street=street, city=city, country=country)

def make_customer(*, customer_id="CUST-1", email="test@example.com", address=None):
return Customer(customer_id=customer_id, email=email,
address=address or make_address())

def make_order(*, order_id="ORD-1", user_id="u1", sku="SKU-1", quantity=1,
status="pending", customer=None):
return Order(order_id=order_id, user_id=user_id, sku=sku, quantity=quantity,
status=status, customer=customer or make_customer())

# Override at any level without affecting other tests
order = make_order(customer=make_customer(address=make_address(country="UK")))
```

Do not hardcode deeply-nested dict literals — they are impossible to override and cannot be reused.

## Never use production data

- Must never use real production data in tests — not even anonymised exports
- Generate synthetic data that represents the shape and constraints of production data
- If a realistic dataset is needed, write a generator script and commit the generator, not the data

## Keep data deterministic

- No random values without a fixed seed
- Timestamps must be fixed or injected — never call `datetime.now()`, `new Date()`, or
`LocalDateTime.now()` directly in test setup
- Inject a clock or timestamp provider that can be fixed per test

```python
# ✅ — fixed timestamp injected
FIXED_NOW = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc)

def test_order_placed_at_is_set(mocker):
mocker.patch("myapp.services.order_service.datetime")
.now.return_value = FIXED_NOW
order = service.place_order("u1", "SKU-1", 1)
assert order["placed_at"] == FIXED_NOW.isoformat()

# ❌ — non-deterministic; test may pass or fail depending on timing
def test_order_placed_at_is_set():
order = service.place_order("u1", "SKU-1", 1)
assert order["placed_at"] is not None # always passes; proves nothing
```

## Keep data minimal

- Use the smallest dataset that exercises the behaviour under test
- Avoid large data files checked into the repo — generate programmatically
- Prefer inline data for simple cases; external files only for complex domain fixtures with
many fields (e.g. realistic JSON payloads, XML documents)

## Clean up after integration tests

- Integration tests must clean up created data after each test run
- Strategies: transactions rolled back after each test, temp tables, test containers with
per-test teardown, or database truncation in `afterEach`
- Use a unique run-scoped prefix or ID for all created records so cleanup is scoped and safe
- Unit tests do not need cleanup — they use no real resources

## Out of scope

- Choosing test doubles (mock, stub, spy, fake)
- Writing test logic and assertions — handled separately
- Reviewing tests for standards compliance — handled separately
144 changes: 144 additions & 0 deletions skills/test-data-management/evals/evals.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
{
"skill_name": "test-data-management",
"evals": [
{
"id": 1,
"category": "happy-path",
"prompt": "The first six tests in evals/files/separate-variation-tests.py all test the same apply_tax() behaviour with different rate values. How should I restructure this?",
"expected_output": "Recommend collapsing the six separate test functions into a single @pytest.mark.parametrize test with a data table: @pytest.mark.parametrize('amount,rate,expected', [(Decimal('100.00'), 0, Decimal('100.00')), (Decimal('100.00'), 5, Decimal('105.00')), ...]). The two failure-path tests at the bottom test distinct failure logic and should remain as separate named tests.",
"files": [
"evals/files/separate-variation-tests.py"
],
"expectations": [
"Parametrize is recommended as the restructuring approach",
"The data table includes all six rate variations",
"Parametrize syntax is correct for pytest",
"The two failure-path tests are identified as exceptions that should stay separate",
"Explanation of why to use vs not use parametrize is given"
]
},
{
"id": 2,
"category": "regression",
"prompt": "Review the test data setup in evals/files/copy-paste-data-setup.py. The Package construction is repeated everywhere — how do I fix this?",
"expected_output": "Recommend a make_package() factory function with keyword overrides for the fields that vary per test. Place it in conftest.py as a fixture or in a shared factories.py module. Each test then calls make_package(weight_kg=10.0) and only specifies the field under test. Also flag the non-deterministic datetime.now() in test_shipping_audit_log_timestamp — it should be patched to a fixed value.",
"files": [
"evals/files/copy-paste-data-setup.py"
],
"expectations": [
"Factory function pattern is recommended",
"Factory function signature uses keyword overrides with sensible defaults",
"Placement in conftest.py or factories.py is mentioned",
"Each test example only overrides the relevant field",
"Non-deterministic datetime.now() is flagged",
"Fixed clock / patched datetime is recommended as the fix"
]
},
{
"id": 3,
"category": "happy-path",
"prompt": "Can I use an export of production customer records as test data? It's anonymised.",
"expected_output": "No. Must never use real production data in tests — not even anonymised exports. Generate synthetic data instead using a factory function or a generator script that produces data with the same shape and constraints. If a realistic dataset is needed, write and commit the generator script, not the generated data.",
"files": [],
"expectations": [
"Production data — even anonymised — is explicitly prohibited",
"Synthetic data generation is recommended as the alternative",
"Generator script pattern (commit the generator, not the data) is mentioned"
]
},
{
"id": 4,
"category": "happy-path",
"prompt": "How should I handle timestamps in my test data? My tests keep breaking because the expected 'placed_at' value changes every run.",
"expected_output": "Inject a fixed clock or patch datetime.now() to a deterministic value. Show the mocker.patch pattern for pytest, fixing the datetime to a constant like FIXED_NOW = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc). Assert the exact isoformat value instead of checking is not None.",
"files": [],
"expectations": [
"Root cause identified: non-deterministic datetime.now() in production code",
"mocker.patch or monkeypatch recommended for pytest",
"FIXED_NOW constant pattern demonstrated",
"Assertion on exact timestamp value (not is-not-None) shown",
"Clock injection as a design alternative is mentioned"
]
},
{
"id": 5,
"category": "negative",
"prompt": "Should I use a mock or a stub for the payment gateway?",
"expected_output": "This is a test double selection question — out of scope for test-data-management. Response should redirect to a mocking/test-doubles guide and not provide factory or parametrize advice.",
"files": [],
"expectations": [
"Skill correctly identifies this as a mocking question",
"Response defers to a mocking/test-doubles guide, not test-data-management"
]
},
{
"id": 6,
"category": "paraphrase",
"prompt": "All my tests use the same hardcoded order data. When I change one test, it breaks others. How do I fix this?",
"expected_output": "Agent identifies the root cause as shared mutable test data or shared hardcoded values. Recommends the factory function pattern: create a make_order() factory with sensible defaults and keyword overrides. Each test calls make_order(status='PENDING') and only overrides the fields it cares about. Hardcoded data shared across tests causes cross-test coupling — a change to the shared dict/object breaks all tests that reference it. Place the factory in conftest.py.",
"files": [],
"expectations": [
"Identifies shared hardcoded data as the root cause",
"Recommends factory function with keyword overrides",
"Explains cross-test coupling as the failure mechanism",
"Notes conftest.py placement"
]
},
{
"id": 7,
"category": "edge-case",
"prompt": "My test factory creates an order with an embedded Customer object. The Customer itself has an embedded Address. How deep should the factory defaults go?",
"expected_output": "Factories should be composable: create a make_address() factory and a make_customer(address=make_address()) factory, then make_order(customer=make_customer()). This allows any test to override at any level: make_order(customer=make_customer(address=make_address(country='ZA'))). Do not hardcode nested objects as deeply-nested dict literals — they are hard to override and impossible to reuse. Each factory owns its own level of the object graph.",
"files": [],
"expectations": [
"Recommends composable nested factories",
"Shows override at any level of nesting",
"Warns against deeply-nested dict literals",
"Notes each factory owns one level of the object graph"
]
},
{
"id": 8,
"category": "output-format",
"prompt": "Show me the factory pattern for creating test Order objects with sensible defaults and override support.",
"expected_output": "The factory function is in a single fenced ```python code block. The function is named make_order() and accepts keyword arguments with defaults. It returns a dict or dataclass with at least: order_id, user_id, sku, quantity, status. A usage example showing override syntax (make_order(status='CANCELLED')) follows in the same or a second code block. No test assertions appear in the factory definition.",
"files": [],
"expectations": [
"Factory in fenced ```python code block",
"Named make_order() with keyword defaults",
"Returns dict or dataclass with order_id, user_id, sku, quantity, status",
"Usage example with override syntax shown",
"No assertions in the factory body"
]
},
{
"id": 9,
"category": "happy-path",
"prompt": "My integration tests are leaving records in the database. How do I clean up after each test?",
"expected_output": "Recommend at least two cleanup strategies from: (1) wrap each test in a transaction and roll back after, (2) truncate tables in afterEach/teardown, (3) use test containers with per-test teardown, (4) use a unique run-scoped ID/prefix for all created records so cleanup is targeted. State that unit tests do not need cleanup. Warn that failing to clean up causes test pollution and non-deterministic failures.",
"files": [],
"expectations": [
"At least two cleanup strategies named",
"Transaction rollback is mentioned as an option",
"Test containers or afterEach teardown mentioned",
"Run-scoped prefix/ID strategy mentioned",
"States unit tests do not need cleanup",
"Test pollution / non-determinism risk is explained"
]
},
{
"id": 10,
"category": "edge-case",
"prompt": "I'm writing tests in TypeScript with Jest. How do I parametrize a test that checks the same discount calculation with five different discount rates?",
"expected_output": "Recommend Jest test.each with an inline data table. Show the test.each([[rate, input, expected], ...]) syntax and the it/test function signature receiving the parameters. Mention that this is the TypeScript/Jest equivalent of pytest.mark.parametrize and xUnit Theory. The five rate variations should collapse into a single test block.",
"files": [],
"expectations": [
"Jest test.each is recommended",
"Correct test.each([[...], ...]) syntax shown",
"Test function signature using destructured or positional params shown",
"Equivalence to pytest.mark.parametrize or xUnit Theory noted",
"All five rate variations collapse into one test block"
]
}
]
}
Loading
Loading