Frontier LLMs hallucinate FAR clause numbers somewhere between 0% and 32% of the time. A specialized 150M-parameter model trained in 4 minutes matches Claude Haiku on F1 with less than half the hallucination rate. Open dataset, open model, reproducible.
Related
Interim Log: My First Real Mobile Coding Session – Voice, AI Connectors & The Current State of Developer Tooling
Disclaimer / Introduction This interim log post was drafted in collaboration with Grok 4 (xAI). I...
The Agent Is 20% of the Work. The Platform Is the Other 80%.
A payroll agent hit 94% accuracy in testing and dropped to 70% in production. What closed the gap had nothing to do with the model. Here's what that means for every enterprise team...
I Stayed Up Until 3 AM to Build a Better Claude Code Guide Than the One With 52,000 Stars — Here's What I Found
One night. One obsession. One repo that changed how I think about AI-assisted...