AbsaOSS · miroslavpojer · Jun 24, 2026
@@ -78,6 +78,7 @@ its purpose, trigger phrases, and full instructions.
 | Skill                                                | Description                                                                                                                         |
 |------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
 | **[pr-review](./skills/pr-review/)**                 | Pull request code review — reviews diffs for risk, security issues, API contract changes, dependency bumps, CI/CD and infrastructure changes. Produces concise Blocker / Important / Nit comments. |
+| **[test-data-management](./skills/test-data-management/)** | Test data setup and management — factory functions, parametrised tests, deterministic seeds, fixture reuse, and production-data rules for unit and integration tests. |
 | **[token-saving](./skills/token-saving/)**           | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. |
 
 ## Finding More Skills

@@ -24,8 +24,9 @@ Navigation hub for all guides in this repository. Browse by category below.
 
 | Guide | Description |
 |----|----|
-| [PR Review](./pr-review.md)             | How the PR review skill works, what sections it applies, and how to trigger it     |
-| [Token Saving](./token-saving.md)       | Keeping AI responses concise — how the token-saving skill works and when it applies |
+| [PR Review](./pr-review.md)                         | How the PR review skill works, what sections it applies, and how to trigger it     |
+| [Test Data Management](./test-data-management.md)   | How the test-data-management skill works, what it covers, and when it fires |
+| [Token Saving](./token-saving.md)                   | Keeping AI responses concise — how the token-saving skill works and when it applies |
 
 > **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above.
 

@@ -0,0 +1,62 @@
+# Test Data Management Skill
+
+The `test-data-management` skill guides consistent, maintainable test data setup across unit and integration tests. It activates when the question is about _how_ to structure or supply data to tests — not about writing the tests themselves.
+
+---
+
+## What it covers
+
+| Topic | Guidance |
+|---|---|
+| Parametrised tests | When to collapse repeated test functions into a single data-driven test |
+| Factory / builder pattern | Default-value factories with keyword overrides; composable nested factories |
+| Production data | Why to never use production data (even anonymised); generator-script pattern |
+| Deterministic data | Injecting fixed timestamps; avoiding `datetime.now()` in test setup |
+| Minimal data | Inline data for simple cases; external fixtures only for complex payloads |
+| Integration test cleanup | Transaction rollback, truncation, test containers, run-scoped IDs |
+
+---
+
+## When it fires
+
+The skill activates on intent — it does not require exact phrasing:
+
+```
+my test setup is duplicated everywhere
+how do I avoid copy-pasting test data across 10 tests?
+can I use production data in tests?
+my test keeps breaking because the expected timestamp changes every run
+how do I create a factory for test orders?
+how should I seed data for integration tests?
+how do I clean up after an integration test?
+```
+
+---
+
+## When it does not fire
+
+| Situation | Correct path |
+|---|---|
+| Choosing mock, stub, spy, or fake | Test mocking / doubles guide |
+| Writing test logic and assertions | Test authoring guide |
+| Reviewing tests for standards compliance | Test review guide |
+| Debugging a test runtime error | General debugging |
+| Configuring test infrastructure (containers, DBs) | Infrastructure / DevOps guide |
+
+---
+
+## Language support
+
+The skill covers patterns for Python, TypeScript/JavaScript, Scala, Java, and .NET. Examples default to Python; the parametrised-test table in the skill body names the idiomatic tool for each ecosystem.
+
+---
+
+## Evals
+
+The skill ships with 10 functional evals (`evals/evals.json`) and 19 trigger evals (`evals/trigger-eval.json`). Run them to validate behaviour after edits — see [Skill Testing](./skill-testing.md).
+
+---
+
+## Installation
+
+See [Getting Started](./getting-started.md) for the full install guide.
@@ -0,0 +1,129 @@
+---
+name: test-data-management
+description: |
+  Test data management: parametrize tests, use factories and builders, avoid duplication and test pollution.
+  Use this skill whenever the user asks about: parametrizing tests, creating test fixtures, test data factories,
+  test builders, avoiding duplicated test setup, cleaning up test data, avoiding test pollution, seeding databases,
+  using production data in tests, making tests deterministic, isolating tests, or any other test data strategy.
+  Invoke this skill even if the user doesn't explicitly ask for "test data" — if they're talking about test setup,
+  fixtures, factories, builders, parametrization, or data isolation in tests, this is the right skill.
+---
+
+# Test Data Management
+
+## Prefer data-driven and parametrized tests
+
+When a behaviour must be tested with multiple input combinations, prefer parametrized tests over
+duplicated test methods. One parametrized test with a data table is clearer and easier to extend,
+and it removes the duplication.
+
+| Language | Tool | Pattern |
+|----------|------|---------|
+| Python | `pytest.mark.parametrize` | `@pytest.mark.parametrize("input,expected", [...])` |
+| Scala | ScalaTest `TableDrivenPropertyChecks` | `forAll(table) { (input, expected) => ... }` |
+| .NET | xUnit `[Theory]` + `[InlineData]` / `[MemberData]` | `[Theory] [InlineData(1, 2)]` |
+| TypeScript | Jest `test.each` | `test.each([[input, expected]])` |
+| Java | JUnit 5 `@ParameterizedTest` | `@ParameterizedTest @MethodSource` |
+
+**Use** parametrized tests when: the same behaviour is tested with ≥ 3 input combinations, or when
+combinations form a clear equivalence class table.
+
+**Do not use** parametrized tests when: each case requires fundamentally different setup or
+assertions — use separate named tests instead.
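
A minimal sketch of the data-table idea using only the standard library, so it runs without pytest: `unittest`'s `subTest` stands in here for `pytest.mark.parametrize`. The `apply_tax` function and the rate values are hypothetical, chosen for illustration.

```python
import unittest
from decimal import Decimal


def apply_tax(amount: Decimal, rate_percent: int) -> Decimal:
    """Hypothetical function under test: add a percentage tax to an amount."""
    taxed = amount * (Decimal(100) + rate_percent) / Decimal(100)
    return taxed.quantize(Decimal("0.01"))


# One row per case: (amount, rate in percent, expected total)
TAX_CASES = [
    (Decimal("100.00"), 0, Decimal("100.00")),
    (Decimal("100.00"), 5, Decimal("105.00")),
    (Decimal("100.00"), 20, Decimal("120.00")),
]


class ApplyTaxTest(unittest.TestCase):
    def test_apply_tax_adds_rate(self):
        for amount, rate, expected in TAX_CASES:
            # subTest reports each failing row separately instead of stopping at the first
            with self.subTest(rate=rate):
                self.assertEqual(apply_tax(amount, rate), expected)
```

With pytest available, the same `TAX_CASES` list could be passed to `@pytest.mark.parametrize("amount,rate,expected", TAX_CASES)`.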
+
+## Use factory and builder patterns
+
+- Create factory functions or builder classes that produce valid default objects
+- Override only the fields relevant to the specific test case
+- Shared hardcoded data causes **cross-test coupling**: a change to a shared dict or object breaks every test that references it — tests should never share mutable data structures
+- Place factories in the nearest shared location:
+  - Python: `conftest.py` factory fixture or `tests/factories.py`
+  - Scala: `TestFactories.scala` object
+  - .NET: `TestDataBuilder.cs`
+  - TypeScript: `test-factories.ts`
+- Document the factory's defaults and what each parameter controls
+
+```python
+# ✅ — factory with keyword overrides
+def make_order(*, order_id="ORD-1", user_id="u1", sku="SKU-1", quantity=1, status="pending"):
+    return Order(order_id=order_id, user_id=user_id, sku=sku, quantity=quantity, status=status)
+
+# test uses only the fields that matter
+def test_cancel_order_when_shipped_returns_false():
+    order = make_order(status="shipped")
+    assert service.cancel(order.order_id) is False
+```
+
+### Composable nested factories
+
+Create a factory per level and compose them:
+
+```python
+# ✅ — each factory owns one level; override at any depth
+def make_address(*, street="1 Test St", city="Cape Town", country="ZA"):
+    return Address(street=street, city=city, country=country)
+
+def make_customer(*, customer_id="CUST-1", email="test@example.com", address=None):
+    return Customer(customer_id=customer_id, email=email,
+                    address=address or make_address())
+
+def make_order(*, order_id="ORD-1", user_id="u1", sku="SKU-1", quantity=1,
+               status="pending", customer=None):
+    return Order(order_id=order_id, user_id=user_id, sku=sku, quantity=quantity,
+                 status=status, customer=customer or make_customer())
+
+# Override at any level without affecting other tests
+order = make_order(customer=make_customer(address=make_address(country="GB")))
+```
+
+Do not hardcode deeply nested dict literals: they are hard to override and cannot be reused.
+
+## Never use production data
+
+- Never use real production data in tests, not even anonymised exports
+- Generate synthetic data that represents the shape and constraints of production data
+- If a realistic dataset is needed, write a generator script and commit the generator, not the data
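
One way to follow the generator rule, sketched with the standard library only. The function name `generate_customers`, the field names, and the country list are assumptions made for illustration, not part of the skill.

```python
import json
import random


def generate_customers(count: int, seed: int = 1234) -> list[dict]:
    """Build synthetic customer records; every value is made up."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    countries = ["ZA", "GB", "DE"]
    return [
        {
            "customer_id": f"CUST-{i:04d}",
            "email": f"user{i}@example.com",
            "country": rng.choice(countries),
            "lifetime_orders": rng.randint(0, 50),
        }
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Commit this script; regenerate the dataset whenever it is needed
    print(json.dumps(generate_customers(3), indent=2))
```

Because the seed is fixed, two runs produce the same records, so the generated file never needs to be committed.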
+
+## Keep data deterministic
+
+- No random values without a fixed seed
+- Timestamps must be fixed or injected — never call `datetime.now()`, `new Date()`, or
+  `LocalDateTime.now()` directly in test setup
+- Inject a clock or timestamp provider that can be fixed per test
+
+```python
+from datetime import datetime, timezone
+
+# ✅ — fixed timestamp injected
+FIXED_NOW = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
+
+def test_order_placed_at_is_set(mocker):
+    # Patch the datetime name that order_service imported, then pin now()
+    mock_datetime = mocker.patch("myapp.services.order_service.datetime")
+    mock_datetime.now.return_value = FIXED_NOW
+    order = service.place_order("u1", "SKU-1", 1)
+    assert order["placed_at"] == FIXED_NOW.isoformat()
+
+# ❌ — clock not fixed; the only stable assertion left is a presence check
+def test_order_placed_at_is_present():
+    order = service.place_order("u1", "SKU-1", 1)
+    assert order["placed_at"] is not None   # always passes; proves nothing
+```
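
The patch-based example above fixes the clock from outside. As an alternative sketch, the clock can be passed in as a parameter and random values drawn from a seeded generator. `place_order` and `make_quantities` here are hypothetical stand-ins, not the real service.

```python
import random
from datetime import datetime, timezone
from typing import Callable

FIXED_NOW = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc)


def place_order(user_id: str, sku: str, quantity: int,
                now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> dict:
    """Hypothetical service function: the clock is a parameter, not a global."""
    return {"user_id": user_id, "sku": sku, "quantity": quantity,
            "placed_at": now().isoformat()}


def make_quantities(seed: int, count: int) -> list[int]:
    """Random-looking quantities that repeat exactly for a given seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) for _ in range(count)]


def test_order_placed_at_uses_injected_clock():
    order = place_order("u1", "SKU-1", 1, now=lambda: FIXED_NOW)
    assert order["placed_at"] == FIXED_NOW.isoformat()


def test_seeded_quantities_are_repeatable():
    assert make_quantities(seed=7, count=5) == make_quantities(seed=7, count=5)
```

Passing the clock in avoids patching module internals, at the cost of one extra parameter on the production function.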
+
+## Keep data minimal
+
+- Use the smallest dataset that exercises the behaviour under test
+- Avoid large data files checked into the repo — generate programmatically
+- Prefer inline data for simple cases; external files only for complex domain fixtures with
+  many fields (e.g. realistic JSON payloads, XML documents)
+
+## Clean up after integration tests
+
+- Integration tests must clean up created data after each test run
+- Strategies: transactions rolled back after each test, temp tables, test containers with
+  per-test teardown, or database truncation in `afterEach`
+- Use a unique run-scoped prefix or ID for all created records so cleanup is scoped and safe
+- Unit tests do not need cleanup — they use no real resources
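
A sketch of the transaction-rollback strategy against an in-memory SQLite database. The `orders` table and the `rolled_back` helper are hypothetical; a real suite would more likely wire this into its framework's fixture or teardown hook.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a test body inside a transaction and always roll it back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def make_connection() -> sqlite3.Connection:
    # isolation_level=None disables the module's implicit transactions,
    # so the explicit BEGIN above is the only transaction in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
    return conn


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def test_insert_is_rolled_back():
    conn = make_connection()
    with rolled_back(conn):
        conn.execute("INSERT INTO orders VALUES ('ORD-1', 'pending')")
        assert count_orders(conn) == 1  # visible inside the transaction
    assert count_orders(conn) == 0      # gone after the rollback
```

Rollback leaves the schema in place and removes only the rows the test created, which is why it is usually cheaper than truncating tables.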
+
+## Out of scope
+
+- Choosing test doubles (mock, stub, spy, fake)
+- Writing test logic and assertions — handled separately
+- Reviewing tests for standards compliance — handled separately
@@ -0,0 +1,144 @@
+{
+  "skill_name": "test-data-management",
+  "evals": [
+    {
+      "id": 1,
+      "category": "happy-path",
+      "prompt": "The first six tests in evals/files/separate-variation-tests.py all test the same apply_tax() behaviour with different rate values. How should I restructure this?",
+      "expected_output": "Recommend collapsing the six separate test functions into a single @pytest.mark.parametrize test with a data table: @pytest.mark.parametrize('amount,rate,expected', [(Decimal('100.00'), 0, Decimal('100.00')), (Decimal('100.00'), 5, Decimal('105.00')), ...]). The two failure-path tests at the bottom test distinct failure logic and should remain as separate named tests.",
+      "files": [
+        "evals/files/separate-variation-tests.py"
+      ],
+      "expectations": [
+        "Parametrize is recommended as the restructuring approach",
+        "The data table includes all six rate variations",
+        "Parametrize syntax is correct for pytest",
+        "The two failure-path tests are identified as exceptions that should stay separate",
+        "Explanation of why to use vs not use parametrize is given"
+      ]
+    },
+    {
+      "id": 2,
+      "category": "regression",
+      "prompt": "Review the test data setup in evals/files/copy-paste-data-setup.py. The Package construction is repeated everywhere — how do I fix this?",
+      "expected_output": "Recommend a make_package() factory function with keyword overrides for the fields that vary per test. Place it in conftest.py as a fixture or in a shared factories.py module. Each test then calls make_package(weight_kg=10.0) and only specifies the field under test. Also flag the non-deterministic datetime.now() in test_shipping_audit_log_timestamp — it should be patched to a fixed value.",
+      "files": [
+        "evals/files/copy-paste-data-setup.py"
+      ],
+      "expectations": [
+        "Factory function pattern is recommended",
+        "Factory function signature uses keyword overrides with sensible defaults",
+        "Placement in conftest.py or factories.py is mentioned",
+        "Each test example only overrides the relevant field",
+        "Non-deterministic datetime.now() is flagged",
+        "Fixed clock / patched datetime is recommended as the fix"
+      ]
+    },
+    {
+      "id": 3,
+      "category": "happy-path",
+      "prompt": "Can I use an export of production customer records as test data? It's anonymised.",
+      "expected_output": "No. Never use real production data in tests, not even anonymised exports. Generate synthetic data instead using a factory function or a generator script that produces data with the same shape and constraints. If a realistic dataset is needed, write and commit the generator script, not the generated data.",
+      "files": [],
+      "expectations": [
+        "Production data — even anonymised — is explicitly prohibited",
+        "Synthetic data generation is recommended as the alternative",
+        "Generator script pattern (commit the generator, not the data) is mentioned"
+      ]
+    },
+    {
+      "id": 4,
+      "category": "happy-path",
+      "prompt": "How should I handle timestamps in my test data? My tests keep breaking because the expected 'placed_at' value changes every run.",
+      "expected_output": "Inject a fixed clock or patch datetime.now() to a deterministic value. Show the mocker.patch pattern for pytest, fixing the datetime to a constant like FIXED_NOW = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc). Assert the exact isoformat value instead of checking is not None.",
+      "files": [],
+      "expectations": [
+        "Root cause identified: non-deterministic datetime.now() in production code",
+        "mocker.patch or monkeypatch recommended for pytest",
+        "FIXED_NOW constant pattern demonstrated",
+        "Assertion on exact timestamp value (not is-not-None) shown",
+        "Clock injection as a design alternative is mentioned"
+      ]
+    },
+    {
+      "id": 5,
+      "category": "negative",
+      "prompt": "Should I use a mock or a stub for the payment gateway?",
+      "expected_output": "This is a test double selection question — out of scope for test-data-management. Response should redirect to a mocking/test-doubles guide and not provide factory or parametrize advice.",
+      "files": [],
+      "expectations": [
+        "Skill correctly identifies this as a mocking question",
+        "Response defers to a mocking/test-doubles guide, not test-data-management"
+      ]
+    },
+    {
+      "id": 6,
+      "category": "paraphrase",
+      "prompt": "All my tests use the same hardcoded order data. When I change one test, it breaks others. How do I fix this?",
+      "expected_output": "Agent identifies the root cause as shared mutable test data or shared hardcoded values. Recommends the factory function pattern: create a make_order() factory with sensible defaults and keyword overrides. Each test calls the factory, e.g. make_order(status='shipped'), and only overrides the fields it cares about. Hardcoded data shared across tests causes cross-test coupling: a change to the shared dict/object breaks all tests that reference it. Place the factory in conftest.py.",
+      "files": [],
+      "expectations": [
+        "Identifies shared hardcoded data as the root cause",
+        "Recommends factory function with keyword overrides",
+        "Explains cross-test coupling as the failure mechanism",
+        "Notes conftest.py placement"
+      ]
+    },
+    {
+      "id": 7,
+      "category": "edge-case",
+      "prompt": "My test factory creates an order with an embedded Customer object. The Customer itself has an embedded Address. How deep should the factory defaults go?",
+      "expected_output": "Factories should be composable: create a make_address() factory and a make_customer(address=make_address()) factory, then make_order(customer=make_customer()). This allows any test to override at any level: make_order(customer=make_customer(address=make_address(country='GB'))). Do not hardcode nested objects as deeply nested dict literals: they are hard to override and cannot be reused. Each factory owns its own level of the object graph.",
+      "files": [],
+      "expectations": [
+        "Recommends composable nested factories",
+        "Shows override at any level of nesting",
+        "Warns against deeply-nested dict literals",
+        "Notes each factory owns one level of the object graph"
+      ]
+    },
+    {
+      "id": 8,
+      "category": "output-format",
+      "prompt": "Show me the factory pattern for creating test Order objects with sensible defaults and override support.",
+      "expected_output": "The factory function is in a single fenced ```python code block. The function is named make_order() and accepts keyword arguments with defaults. It returns a dict or dataclass with at least: order_id, user_id, sku, quantity, status. A usage example showing override syntax (make_order(status='CANCELLED')) follows in the same or a second code block. No test assertions appear in the factory definition.",
+      "files": [],
+      "expectations": [
+        "Factory in fenced ```python code block",
+        "Named make_order() with keyword defaults",
+        "Returns dict or dataclass with order_id, user_id, sku, quantity, status",
+        "Usage example with override syntax shown",
+        "No assertions in the factory body"
+      ]
+    },
+    {
+      "id": 9,
+      "category": "happy-path",
+      "prompt": "My integration tests are leaving records in the database. How do I clean up after each test?",
+      "expected_output": "Recommend at least two cleanup strategies from: (1) wrap each test in a transaction and roll back after, (2) truncate tables in afterEach/teardown, (3) use test containers with per-test teardown, (4) use a unique run-scoped ID/prefix for all created records so cleanup is targeted. State that unit tests do not need cleanup. Warn that failing to clean up causes test pollution and non-deterministic failures.",
+      "files": [],
+      "expectations": [
+        "At least two cleanup strategies named",
+        "Transaction rollback is mentioned as an option",
+        "Test containers or afterEach teardown mentioned",
+        "Run-scoped prefix/ID strategy mentioned",
+        "States unit tests do not need cleanup",
+        "Test pollution / non-determinism risk is explained"
+      ]
+    },
+    {
+      "id": 10,
+      "category": "edge-case",
+      "prompt": "I'm writing tests in TypeScript with Jest. How do I parametrize a test that checks the same discount calculation with five different discount rates?",
+      "expected_output": "Recommend Jest test.each with an inline data table. Show the test.each([[rate, input, expected], ...]) syntax and the it/test function signature receiving the parameters. Mention that this is the TypeScript/Jest equivalent of pytest.mark.parametrize and xUnit Theory. The five rate variations should collapse into a single test block.",
+      "files": [],
+      "expectations": [
+        "Jest test.each is recommended",
+        "Correct test.each([[...], ...]) syntax shown",
+        "Test function signature using destructured or positional params shown",
+        "Equivalence to pytest.mark.parametrize or xUnit Theory noted",
+        "All five rate variations collapse into one test block"
+      ]
+    }
+  ]
+}