Mastodon discussion 1h ago

by Benjamin Han

How do scientists actually catch an LLM's errors about their own field, and can a checklist help them catch more? A CHI 2026 study builds a schema of 20 LLM error types in seven categories for scholarly QA, grounded in scientists judging answers about papers they wrote. Handing them the schema turned up errors they missed unaided, most often fabricated or misattributed citations, so the taxonomy doubles as a review checklist. https://benjaminhan.net/posts/20260626-expert-schema-scholarly-qa/?utm_source=mastodon&utm_medium=social #LLMs #Evaluation #CHI #AI

LLM

Metadata

Account: BenjaminHan@sigmoid.social

Mastodon discussion 14m ago

An interesting failing I found when testing local #LLM models is that if you try to discuss sexual topics, almost every single one will occasionally tell you it is against its term...

Mastodon discussion 14m ago

🛍️ In the age of AI, the physical store remains central: relationships, trust, and real-world experience continue to make the difference. #Retail #AI 🔗 https://www.tomshw.it/business/lai-nel-r...

Mastodon discussion 15m ago

Climate-crisis hack: Buy a Raspberry Pi, install an #AI model on it, and bury it in the garden. If the neighbors then give you strange looks because you are spraying so much drinking water...
