Mastodon discussion 1h ago

by Benjamin Han

Reflexion splits self-correction in two: an Evaluator that detects success or failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal (a heuristic, exact match, or test execution) gates whether diagnosis fires. When that signal misfires, as with the high false-negative rate on MBPP Python, Self-Reflection rewrites correct code into incorrect code, exactly the failure mode Cannot-Self-Correct documented. https://benjaminhan.net/posts/20260516-reflexion/?utm_source=mastodon&utm_medium=social #LLMs #AI #Reasoning #Agents #Metacognition
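
The gating described in the post can be sketched in a few lines. This is a toy illustration only, not the Reflexion authors' implementation: `reflexion_loop`, `toy_actor`, `toy_reflect`, and both evaluators are hypothetical names, and the "actor" is a stand-in function rather than an LLM. It assumes the first draft is correct and that any stored reflection causes a rewrite, so that the effect of a false-negative evaluator is visible.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    actor: Callable[[List[str]], str],
    evaluator: Callable[[str], bool],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry until the evaluator accepts; reflect only on reported failures."""
    memory: List[str] = []          # accumulated self-reflections
    attempt = actor(memory)         # first draft, no reflections yet
    for _ in range(max_trials - 1):
        if evaluator(attempt):      # the external signal gates diagnosis
            break
        memory.append(reflect(attempt))
        attempt = actor(memory)     # retry, conditioned on the reflections
    return attempt, memory


def toy_actor(memory: List[str]) -> str:
    # Assumption for illustration: the first draft is correct,
    # and any reflection in memory triggers a rewrite.
    return "correct" if not memory else f"rewrite-{len(memory)}"


def toy_reflect(attempt: str) -> str:
    return f"diagnosis of {attempt}"


def sound_evaluator(attempt: str) -> bool:
    return attempt == "correct"


def false_negative_evaluator(attempt: str) -> bool:
    # Always reports failure, even for the correct draft.
    return False


good = reflexion_loop(toy_actor, sound_evaluator, toy_reflect)
bad = reflexion_loop(toy_actor, false_negative_evaluator, toy_reflect)
print(good)
print(bad)
```

With the sound evaluator the correct draft is returned untouched and no reflection is stored. With the false-negative evaluator the same correct draft is diagnosed and rewritten on every trial, which is the failure mode the post points to.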

Metadata

Account: BenjaminHan@sigmoid.social

Mastodon discussion 23m ago

🧠 Perhaps we are no longer just at the beginning of #AI's technical capability. 📈 We are at the beginning of its real impact on companies, work, and society. 👉 Some reflections: https://www.linked...

Mastodon discussion 30m ago

How to fight AI if you need to or get the chance. https://siliconreckoner.substack.com/p/questions-to-ask-ai-boosters #AI #slop #environment #economics #StopAI

Mastodon discussion 32m ago

llama.cpp lands Multi-Token Prediction support with up to 1.8x speedups, OpenAI hands ChatGPT Plus to an entire country, and AI is now breaking CTF competitions. https://ai0.news/po...
