A 60-line static probe found 3 contamination points in an agent eval harness, exit 1, without running the agent. Passing the eval is not solving the task.
Passing the Eval Isn't Solving the Task: 3 Leaks, 60 Lines
A 60-line static probe found 3 contamination points in an agent eval harness, exit 1, without running the agent. Passing the eval is not solving the task.
My AI conversations were scattered across three apps that couldn't remember each other. So I built a...
DProvenanceKit — regression testing and observability for the reasoning of AI agents (Python, zero...
This week's tooling moves cluster around a common theme: eliminating the overhead tax on developer...