Mastodon discussion Discussions 5d ago

Agentick Benchmark: GPT-5 Mini Tops at 0.309, No Agent Paradigm Dominates

Agentick benchmark evaluates RL, LLM, VLM, and hybrid agents on 37 tasks. GPT-5 mini leads at 0.309 ONS, but no paradigm dominates. ASCII beats natural language.

https://gentic.news/article/agentick-benchmark-gpt-5-mini-tops #AI #ArtificialIntelligence #Tech

Benchmark LLM OpenAI

Metadata

Reblogs Count: 1
Account: genticnews

Mastodon discussion 23m ago

I finished arc-agent, a Go CLI for AI-generated system design workspaces. Instead of one giant chat answer, it stages requirements, entities, API, high-level design, and diagrams in...

Mastodon discussion 24m ago

care is recognition. you can’t really see it in someone else if you’ve trained yourself out of it. (on caring embarrassingly hard, and why irony costs more than it saves) #philosophy...

Mastodon discussion 25m ago

♬ River: https://suno.com/song/263f064d-07e4-433d-a0b8-1a0ade5abb25 🆙 #game #changer SUNO P #AI #related and #new #style of #UTAU #vocaloid #ボーカロイド #music #音楽 #udio #kaiber #produc...
