We Built a "Grovel Index" to Measure LLM Sycophancy — Here's What We Found TL;DR: We ran 3...
What crash-resumable budget enforcement looks like when the enforcement state lives in a ledger, not in memory. A walkthrough of the kill -9 -> resume demo.
Our MCP tool passed every server-side test, then Claude refused to call it. A debugging story about outputSchema and what E2E testing MCP actually means.
A couple of weeks ago I published a walkthrough of my agentic coding setup, the plan-first...