Alibaba's Metis AI agent uses HDPO reinforcement learning to cut redundant tool calls from 98% to 2% while improving accuracy. The 8B model beats larger agents on reasoning benchmarks and is open source. https://venturebeat.com/orchestration/alibabas-metis-agent-cuts-redundant-ai-tool-calls-from-98-to-2-and-gets-more-accurate-doing-it #Tech #Startup #News #AI