Cannot-Self-Correct tests the strong claim that LLMs can revise their own answers to reasoning problems without any external signal about correctness. Across three benchmarks (GSM8K, CommonSenseQA, HotPotQA), the answer is no: the model's confidence in its initial answer carries over into the revision, and the self-correction loop tends to degrade performance rather than improve it. The result refutes the class of approaches to which Self-Refine belongs.

https://benjaminhan.net/posts/20260516-cannot-self-correct/

#LLMs #AI #Reasoning #Metacognition
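For concreteness, here is a minimal sketch of the kind of intrinsic self-correction loop at issue. None of this code is from the post: `generate` is a hypothetical stand-in for any LLM completion call, and the prompts are illustrative rather than the paper's actual templates.

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an API client request)."""
    raise NotImplementedError

def self_correct(question: str, rounds: int = 1) -> str:
    """Answer, then critique and revise, with no external correctness signal."""
    answer = generate(f"Q: {question}\nA:")
    for _ in range(rounds):
        # The only feedback is the model's own critique; no ground truth
        # is consulted at any point in the loop.
        critique = generate(
            f"Q: {question}\nYour answer: {answer}\n"
            "Review your answer and identify any problems with it."
        )
        answer = generate(
            f"Q: {question}\nYour answer: {answer}\nCritique: {critique}\n"
            "Based on the critique, give your final answer."
        )
    return answer
```

Because the critique step has no access to ground truth, the loop can talk the model out of a correct answer just as easily as it can repair a wrong one; that is the degradation the post describes.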
