Add ClawBench to Evaluation Harnesses & Benchmarks by reacher-z · Pull Request #9 · Picrew/awesome-agent-harness

reacher-z · 2026-05-20T23:44:41Z

Adds ClawBench to Evaluation Harnesses & Benchmarks.

ClawBench evaluates browser agents on live production websites (Uber Eats, Indeed, Craigslist, etc.). The harness has two stages: it intercepts the agent's HTTP requests and matches them against a per-task URL/method schema, then runs an LLM judge on the intercepted payload.

283 tasks (V1 153 + V2 130) across 163 live platforms · 15 life categories
Paper: https://arxiv.org/abs/2604.08523 · Live: https://claw-bench.com
Already sits next to WildClawBench in the table; the two are complementary (WildClawBench evaluates inside the OpenClaw environment; ClawBench evaluates on the open web).
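
For reviewers unfamiliar with the two-stage check described above, here is a minimal sketch of the idea. It is not ClawBench's actual code: the class names, field names, `evaluate` function, and the example.com URL are all hypothetical, and the LLM judge is stood in for by a plain callable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class TaskSchema:
    """Hypothetical per-task schema: which request counts as the task's final action."""
    url_pattern: str  # regular expression the request URL must match
    method: str       # expected HTTP method, e.g. "POST"


@dataclass
class InterceptedRequest:
    """Hypothetical record of one HTTP request captured from the agent's browser."""
    url: str
    method: str
    payload: str


def matches_schema(req: InterceptedRequest, schema: TaskSchema) -> bool:
    # Stage 1: structural check on the method and URL only.
    same_method = req.method.upper() == schema.method.upper()
    return same_method and re.search(schema.url_pattern, req.url) is not None


def evaluate(
    requests: Iterable[InterceptedRequest],
    schema: TaskSchema,
    judge: Callable[[str], bool],
) -> bool:
    # Stage 2: the judge only sees the payload of a request that passed stage 1.
    for req in requests:
        if matches_schema(req, schema):
            return judge(req.payload)
    # No request matched the schema, so the task counts as not completed.
    return False


# Stand-in for an LLM judge (assumption): a simple substring check.
def stub_judge(payload: str) -> bool:
    return "pizza" in payload


schema = TaskSchema(url_pattern=r"^https://example\.com/api/orders$", method="POST")
captured = [
    InterceptedRequest("https://example.com/api/search?q=pizza", "GET", ""),
    InterceptedRequest("https://example.com/api/orders", "POST", '{"item": "pizza"}'),
]
result = evaluate(captured, schema, stub_judge)
```

In this sketch the search request is ignored because it fails the method and URL check, and only the order request's payload reaches the judge.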

Affiliation: I'm one of the maintainers.

Add ClawBench to Evaluation Harnesses & Benchmarks

5d1fa3f

reacher-z wants to merge 1 commit into Picrew:main from reacher-z:add-clawbench
