goalkeeper

You say you're done. She checks the list. You're not done.

2.4–15.6× more tasks finished · 5/5 vs 1.9–3.4 of 5 subgoals delivered · 80% fewer defects shipped

From seeded simulation runs (20,000 trials per cell) across mild, typical, and severe early-stop rates. Reproduce it yourself.

You know her. Peaked cap, clipboard, posted at the only door out of the building. Has signed off every release since before CI existed. You tell her you're finished; she runs a finger down the list, says "three of these aren't checked," and points you back to your desk.

goalkeeper puts her on the Stop hook of your AI agent.

The philosophy is one line: the agent doesn't get to decide it's done — the checklist does. Most agents stop the moment they think they're finished — the half-done refactor, the "I'll leave the tests to you", the silently dropped requirement. goalkeeper holds a session to an explicit, verifiable checklist and bounces every premature stop straight back into more work.

The decision it makes

On every attempt to stop, goalkeeper walks a short ladder:

Guard disarmed? (off mode) → let it stop.
No goals on the checklist? → let it stop. (An un-armed session is never trapped.)
Every goal done — and, in strict mode, verified? → release; let it stop.
Stuck (blocked repeatedly with no progress)? → stand down, surface what's unfinished.
Otherwise → block the stop and hand the agent its own open checklist as the next instruction.

The agent keeps going until the job is provably done, and in strict mode until it has double-checked its own work. None of this can trap a session: the loop is bounded, fails open, and never blocks a stop once the guard is off or every goal is closed.
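The decision ladder above can be sketched as one pure function. This is a minimal illustration, not goalkeeper's source. Several details are assumptions made for the example: the state shape (`mode`, `goals[].done`, `goals[].verified`), the hypothetical counter names `blocks` and `noProgress`, and where `lite` mode sits in the ladder.

```javascript
// Hypothetical state shape (not goalkeeper's actual schema):
//   { mode: "off" | "lite" | "standard" | "strict",
//     goals: [{ id: "g1", text: "...", done: boolean, verified: boolean }] }
// counters (hypothetical): { blocks: stop-blocks so far, noProgress: blocks since the open count last fell }

function isClosed(goal, mode) {
  // In strict mode a goal only closes once it is both done and verified.
  return mode === "strict" ? Boolean(goal.done && goal.verified) : Boolean(goal.done);
}

function decideStop(state, counters, maxLoops = 30) {
  const mode = state.mode;
  const goals = state.goals || [];

  if (mode === "off") return { action: "allow", why: "disarmed" };
  if (goals.length === 0) return { action: "allow", why: "no goals: never trap an un-armed session" };

  const open = goals.filter((g) => !isClosed(g, mode));
  if (open.length === 0) return { action: "allow", why: "every goal closed: release" };

  // Assumption: lite mode reminds once, then lets the agent stop.
  if (mode === "lite" && counters.blocks >= 1) return { action: "allow", why: "lite: already reminded" };

  // Stuck: stand down and surface the unfinished goals to the user.
  if (counters.noProgress >= maxLoops) return { action: "stand_down", why: "no progress", open };

  // Otherwise block and hand the agent its own open checklist.
  const list = open.map((g) => `  [ ] ${g.id}: ${g.text}`).join("\n");
  return {
    action: "block",
    why: "open goals",
    reason: `STOP BLOCKED BY GOALKEEPER. You are not done. ${open.length} goal(s) remain open:\n${list}`,
  };
}
```

For example, a goal that is done but unverified lets a `standard` session stop but blocks a `strict` one. That is the whole difference between the two modes in this sketch.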

Results

From the bundled, seeded simulation of the mechanism (node benchmarks/simulate.js — K=5 verifiable subgoals, 20k trials per cell; assumptions stated in benchmarks/README.md):

2.4× – 15.6× more tasks finished end-to-end, without a human nudge, as an agent's early-stop rate rises from mild to severe.
Full delivery every run — 5 of 5 subgoals vs 1.9 – 3.4 of 5 for an unguarded agent.
80% fewer defects shipped in strict mode, because every claimed completion buys an independent verification pass.

Agent early-stop rate	Tasks finished — bare	Tasks finished — goalkeeper	Lift
20% (mild)	41%	100%	2.4×
35% (typical)	18%	100%	5.5×
50% (severe)	6%	100%	15.6×

These quantify the mechanism under a transparent model, not a vendor benchmark — the harness is in the repo, the seed is fixed, and the numbers reproduce byte-for-byte. To measure your own model on your own tasks, run any task twice (guard off vs strict) as described in benchmarks/README.md.
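The bare-agent column is consistent with a very simple model. The agent finishes one subgoal at a time, and after each subgoal except the last it declares itself done with probability p. Under that reading, a bare run finishes all K subgoals with probability (1 − p)^(K−1). It delivers 1 + (1 − p) + … + (1 − p)^(K−1) subgoals on average, which is 3.36 at p = 0.2 and 1.94 at p = 0.5. Both figures match the table to rounding.

The sketch below checks that model with a seeded Monte Carlo run. It illustrates the assumed model only and is not a copy of benchmarks/simulate.js, whose actual assumptions are stated in benchmarks/README.md. The mulberry32 PRNG and all function names are choices made for this example.

```javascript
// Seeded 32-bit PRNG (mulberry32), so every run reproduces exactly.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Assumed model: after each subgoal except the last, the agent tries to stop with probability p.
function runAgent(p, K, rand, guarded) {
  let delivered = 0;
  while (delivered < K) {
    delivered++;
    const triesToStop = delivered < K && rand() < p;
    if (triesToStop && !guarded) break; // bare agent ends the session early
    // guarded: the stop is blocked and the agent moves on to the next open subgoal
  }
  return delivered;
}

function simulate(p, { K = 5, trials = 20000, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const tally = { bare: { finished: 0, delivered: 0 }, guarded: { finished: 0, delivered: 0 } };
  for (let i = 0; i < trials; i++) {
    for (const [name, guarded] of [["bare", false], ["guarded", true]]) {
      const d = runAgent(p, K, rand, guarded);
      tally[name].delivered += d;
      if (d === K) tally[name].finished++;
    }
  }
  const rates = (s) => ({ finished: s.finished / trials, delivered: s.delivered / trials });
  return { bare: rates(tally.bare), guarded: rates(tally.guarded) };
}

// Closed form for the bare agent under the same assumed model.
function closedForm(p, K = 5) {
  let delivered = 0;
  for (let i = 0; i < K; i++) delivered += (1 - p) ** i;
  return { finished: (1 - p) ** (K - 1), delivered };
}
```

At p = 0.35, `closedForm` gives 0.65^4 ≈ 0.1785 for `finished`, and `simulate(0.35).bare.finished` lands near it. The table's 5.5× lift comes from the harness's own sampled rate, so a closed form like this one lands near that figure rather than exactly on it.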

How it works (the whole trick)

Claude Code and Codex both fire a Stop hook when the agent tries to end its turn. goalkeeper's hook looks at your goal checklist and, if anything is open, returns:

{ "decision": "block", "reason": "STOP BLOCKED BY GOALKEEPER. You are not done. 2 goals remain open: …" }

The host feeds that reason back to the model instead of stopping — on Claude Code it continues the turn, on Codex it becomes the next user prompt. Either way the agent reads its own unfinished checklist and gets back to work, no human in the loop. The guard releases the instant the last goal closes.

That's it. No daemon, no network, no magic — one hook and a JSON file. The exact same hooks run on both hosts; Codex even exposes CLAUDE_PLUGIN_ROOT as a compatibility alias, so nothing in the engine changes between them.

Install

Claude Code

/plugin marketplace add publu/goalkeeper
/plugin install goalkeeper@goalkeeper

Codex

codex plugin marketplace add publu/goalkeeper
codex

Then open /plugins, install goalkeeper, open /hooks, review and trust its hooks, and start a new thread. (In Codex, commands are invoked with @, e.g. @goalkeeper:status.)

Requires node on your PATH. If node is missing, the hooks no-op and the host behaves exactly as if goalkeeper weren't installed.

Use it

Point it at an objective and walk away:

/goalkeeper:go get the auth refactor to green — all tests pass and lint is clean

goalkeeper asks the agent to break that into concrete, verifiable goals and arms the guard. The agent then works until every goal is checked off. Check status anytime:

/goalkeeper:status

goalkeeper  mode=strict  (blocks until every goal is done AND independently verified)
3 goal(s), 1 open:
  [x] g1: npm test exits 0
  [x] g2: eslint reports 0 errors
  [~] g3: README documents the new AUTH_SECRET env var   <-- needs verification

Once nothing on the checklist is left open (every goal done, and in strict mode verified), the guard steps aside on its own. You never run a "turn it off" command after success.
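The status view is a direct rendering of the checklist. Here is a hypothetical `renderStatus` sketch, assuming each goal carries `done` and `verified` flags. `[x]` means closed for the current mode, `[~]` means done but still awaiting verification in strict mode, and `[ ]` means open. This is why the sample above reports 1 open goal even though all three are marked done.

```javascript
// Hypothetical state: { mode, goals: [{ id, text, done, verified }] }
function renderStatus(state) {
  const strict = state.mode === "strict";
  const goals = state.goals || [];
  // A goal is closed when done, and in strict mode also verified.
  const closed = (g) => Boolean(g.done && (!strict || g.verified));
  const openCount = goals.filter((g) => !closed(g)).length;

  const lines = [`goalkeeper  mode=${state.mode}`, `${goals.length} goal(s), ${openCount} open:`];
  for (const g of goals) {
    if (closed(g)) lines.push(`  [x] ${g.id}: ${g.text}`);
    else if (g.done) lines.push(`  [~] ${g.id}: ${g.text}   <-- needs verification`);
    else lines.push(`  [ ] ${g.id}: ${g.text}`);
  }
  return lines.join("\n");
}
```

Rendering the same three goals under `standard` mode reports 0 open, because verification is only required in strict mode.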

Modes

Mode	Behavior
`off`	Disarmed. Never blocks.
`lite`	Blocks once with a reminder, then lets the agent stop.
`standard`	Blocks until every goal is marked done. (default)
`strict`	Blocks until every goal is done and independently verified.

/goalkeeper:mode strict

Strict mode is the long-runner. Marking a goal done isn't enough — it stays open until the agent does a separate, evidence-based pass (re-read the code, run the test, prove it) and marks it verified. Every claim of completion buys a double-check. That single rule is the biggest reason an armed session keeps working.

Commands

Command	Does
`/goalkeeper:go <objective>`	Decompose an objective into goals and start.
`/goalkeeper:add <goal>`	Add one verifiable goal.
`/goalkeeper:status`	Show mode + checklist.
`/goalkeeper:mode [off|lite|standard|strict]`	Get/set strictness.
`/goalkeeper:release`	Clear goals, stand the guard down.
`/goalkeeper:help`	What goalkeeper is, in the session.

The agent checks goals off as it works via the bundled CLI (done, verify, reopen, remove) — you rarely touch it directly.
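The CLI's four verbs (done, verify, reopen, remove) can be read as transitions on a single goal. This sketch is illustrative only. The `applyCommand` function, the `dropped` list, and the rule that `verify` requires `done` first are assumptions. The reason requirement on `remove` follows the "dropped explicitly, with a reason" rule described under Honesty is the whole game.

```javascript
// Hypothetical state: { goals: [{ id, text, done, verified }], dropped: [{ id, reason }] }
// Returns a new state object; the input state is left untouched.
function applyCommand(state, cmd, id, reason) {
  const goals = state.goals.map((g) => ({ ...g }));
  const goal = goals.find((g) => g.id === id);
  if (!goal) throw new Error(`unknown goal: ${id}`);

  switch (cmd) {
    case "done":
      goal.done = true;
      break;
    case "verify":
      // Verification is a separate pass over work already marked done.
      if (!goal.done) throw new Error(`${id}: mark it done before verifying`);
      goal.verified = true;
      break;
    case "reopen":
      // New evidence that the goal is not met: clear both flags.
      goal.done = false;
      goal.verified = false;
      break;
    case "remove":
      // Goals are never dropped silently: a reason is mandatory.
      if (!reason || !reason.trim()) throw new Error(`${id}: removing a goal requires a reason`);
      return {
        ...state,
        goals: goals.filter((g) => g.id !== id),
        dropped: [...(state.dropped || []), { id, reason }],
      };
    default:
      throw new Error(`unknown command: ${cmd}`);
  }
  return { ...state, goals };
}
```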

It can't loop forever

A guard that could wedge a session would be worse than no guard. goalkeeper has three independent exits:

off mode disarms it entirely.
An empty checklist releases it — and an un-armed session is never trapped.
A no-progress loop budget (GOALKEEPER_MAX_LOOPS, default 30) stands the guard down if it blocks repeatedly without the open-goal count falling, then surfaces the unfinished goals to you. Progress refills the budget, so a productive agent never trips it — only a genuinely stuck one does.

And every hook fails open: any error, malformed input, or corrupt state file results in a normal stop. goalkeeper can extend a session; it can never freeze one.
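The no-progress budget amounts to a streak counter that resets whenever the open-goal count falls. Below is a minimal sketch with hypothetical names. The exact off-by-one is an assumption, not goalkeeper's documented behavior: here the guard stands down once more than `maxLoops` consecutive stop attempts have passed without progress.

```javascript
// budget (hypothetical): { lastOpen: number | null, streak: number }
// openNow: how many goals are still open at this stop attempt.
function onStopAttempt(budget, openNow, maxLoops = 30) {
  // The first attempt, or any drop in the open count, counts as progress.
  const progressed = budget.lastOpen === null || openNow < budget.lastOpen;
  // Progress refills the budget; otherwise the no-progress streak grows.
  const streak = progressed ? 0 : budget.streak + 1;
  return {
    lastOpen: openNow,
    streak,
    // Stand down (and surface the open goals) only after a long no-progress run.
    standDown: streak > maxLoops,
  };
}
```

An agent that closes a goal every few turns keeps resetting `streak` to 0. Only a session that stalls at the same open count for more than `maxLoops` attempts reaches `standDown`.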

Configuration

Env var	Default	Effect
`GOALKEEPER_DEFAULT_MODE`	`standard`	Starting mode.
`GOALKEEPER_MAX_LOOPS`	`30`	Max stop-blocks without progress before standing down.

State lives in <project>/.goalkeeper/state.json — plain JSON, safe to read, edit, or delete by hand.
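Both settings and the state file are easy to load defensively. This sketch assumes state.json holds an object with a `goals` array; see SPEC.md for the actual design, which may differ. The `loadConfig` and `readState` names are hypothetical. Invalid environment values fall back to the documented defaults, and a missing or corrupt state file reads as "not armed", which is what lets a hook fail open.

```javascript
const fs = require("fs");
const path = require("path");

const MODES = ["off", "lite", "standard", "strict"];

// Env vars from the table above; anything invalid falls back to the documented default.
function loadConfig(env = process.env) {
  const mode = MODES.includes(env.GOALKEEPER_DEFAULT_MODE) ? env.GOALKEEPER_DEFAULT_MODE : "standard";
  const n = Number.parseInt(env.GOALKEEPER_MAX_LOOPS, 10);
  const maxLoops = Number.isInteger(n) && n > 0 ? n : 30;
  return { mode, maxLoops };
}

// Reads <project>/.goalkeeper/state.json; returns null when the session is not armed.
function readState(projectDir) {
  const file = path.join(projectDir, ".goalkeeper", "state.json");
  try {
    const state = JSON.parse(fs.readFileSync(file, "utf8"));
    return state && Array.isArray(state.goals) ? state : null;
  } catch {
    return null; // missing or corrupt file: treat as unarmed so the stop proceeds
  }
}
```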

What it is, and isn't

goalkeeper governs when an agent may stop — nothing else. It is not a planner, a sandbox, a permission system, or a scheduler. It tracks completion; the agent and you decide what the goals are. The full design is in SPEC.md.

Honesty is the whole game

goalkeeper can force the agent to keep going, but only the agent can close a goal, and only honestly. The bundled skill drills one rule into the agent: never mark a goal done to escape the guard. Out-of-scope goals are dropped explicitly, with a reason, never silently. Strict mode's verification pass exists precisely to turn "the model said it's done" into "the model proved it's done."
