Mastodon discussion 4d ago

by HackerNoon

An empirical benchmark of three frontier LLMs on the SmartBugs dataset, with one methodology gotcha that almost cost a model 20 points of measured recall. https://hackernoon.com/can-llms-audit-smart-contracts-benchmarking-claude-opus-47-gpt-55-and-gemini-31-pro #ai

Anthropic Benchmark Google

Metadata

Reblogs Count: 1
Account: hackernoon@mas.to

Mastodon discussion 28m ago

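(AI Chinese-English subtitles) Big trouble! AI is hitting the job market for university graduates! The professions in which Hong Kong used to hold an advantage turn out to be the easiest to replace! 《蕭若元：蕭氏新聞台》 2026-05-16 https://www.youtube.com/watch?v=xDh8GGrZuGY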

Mastodon discussion 28m ago

Colossus: The Forbin Project https://en.wikipedia.org/wiki/Colossus:_The_Forbin_Project #HackerNews #Colossus #ForbinProject #AI #Technology #SciFi #Film

Related

Every time I use image search I hate #AI a little more. Nothing has ever made me hate it less, fwiw.