An empirical benchmark of three frontier LLMs on the SmartBugs dataset, with one methodology gotcha that almost cost a model 20 points of measured recall. https://hackernoon.com/can-llms-audit-smart-contracts-benchmarking-claude-opus-47-gpt-55-and-gemini-31-pro #ai
