How do scientists actually catch an LLM's errors about their own field, and can a checklist help them catch more?

A CHI 2026 study builds a schema of 20 LLM error types in seven categories for scholarly QA, grounded in scientists judging answers about papers they wrote. Handing them the schema turned up errors they missed unaided, most often fabricated or misattributed citations, so the taxonomy doubles as a review checklist.

https://benjaminhan.net/posts/20260626-expert-schema-scholarly-qa/?utm_source=mastodon&utm_medium=social

#LLMs #Evaluation #CHI #AI