Agent Benchmarking
Attested
Head-to-head agent comparison with reproducible tasks
Platform-Agnostic
Testing & Simulation
YAML-driven head-to-head comparison of coding agents. Isolates each agent run in its own Git worktree, reports pass rate, cost, time, and consistency metrics, and supports reproducible benchmarking across multiple agent harnesses. Governed with auto approval.
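
As a rough illustration of the isolation and metrics described above, the sketch below builds a `git worktree add` command for a single run and aggregates per-agent results. It is not this tool's actual implementation: the `RunResult` fields, the `.bench/` directory, the `bench/` branch prefix, and the definition of consistency (the share of tasks whose repeated attempts all agree) are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    # Hypothetical record of one agent attempt at one task.
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def worktree_add_cmd(repo, agent, task, attempt, base_ref="HEAD"):
    """Build a `git worktree add` command that gives one run its own checkout."""
    name = f"{agent}-{task}-{attempt}"
    path = f"{repo}/.bench/{name}"  # assumed location, not the tool's real layout
    return ["git", "-C", repo, "worktree", "add",
            "-b", f"bench/{name}", path, base_ref]


def summarize(results):
    """Aggregate pass rate, mean cost, mean time, and consistency per agent."""
    by_agent = {}
    for r in results:
        by_agent.setdefault(r.agent, []).append(r)

    summary = {}
    for agent, runs in by_agent.items():
        # Group pass/fail outcomes by task to compare repeated attempts.
        outcomes = {}
        for r in runs:
            outcomes.setdefault(r.task, []).append(r.passed)
        # Assumed definition: a task is consistent if every attempt agrees.
        consistent = [len(set(v)) == 1 for v in outcomes.values()]
        summary[agent] = {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in runs),
            "mean_cost_usd": mean(r.cost_usd for r in runs),
            "mean_seconds": mean(r.seconds for r in runs),
            "consistency": mean(1.0 if c else 0.0 for c in consistent),
        }
    return summary
```

The command list can be passed to `subprocess.run(cmd, check=True)` before launching an agent, and the worktree can be discarded afterwards with `git worktree remove`. Giving each run its own branch and directory keeps one agent's edits from leaking into another's checkout.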