Papers

Papers with Code paper Jun 1

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-o...

Papers with Code paper Jun 1

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks...

OpenAI Benchmark

Papers with Code paper Jun 1

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state ...

Fine-Tuning

Papers with Code paper Jun 1

Semantic Motion Anchors: Bridging Motion and Meaning in Co-Speech Gestures

Learning a shared representation between spoken text and gesture is central to co-speech gesture retrieval, synthesis, and understanding, but remains challenging for semantically m...

Papers with Code paper Jun 1

Geometric Latent Reasoning Induces Shorter Generations in LLMs

Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and const...

Papers with Code paper Jun 1

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on ...

Papers with Code paper Jun 1

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which c...

Papers with Code paper Jun 1

Policy and World Modeling Co-Training for Language Agents

Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do...

Papers with Code paper Jun 1

WALL-WM: Carving World Action Modeling at the Event Joints

WALL-WM is a World Action Model that shifts video-action learning from chunk-centric optimization to event-grounded Vision-Language-Action pretraining, using semantically coherent ...

Papers with Code paper Jun 1

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent ...

Papers with Code paper Jun 1

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-t...

Papers with Code paper Jun 1

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Selecting the best response from multiple small-model samples using a stronger scorer is a simple inference-time strategy, but fails when the small model has already committed to i...

Papers with Code paper Jun 1

Joint Agent Memory and Exploration Learning via Novelty Signals

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retai...

Papers with Code paper Jun 1

Multi-Agent Computer Use

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, paral...

Papers with Code paper Jun 1

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. ...

Agents

Papers with Code paper Jun 1

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creat...

Papers with Code paper Jun 1

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments,...

LLM

Papers with Code paper Jun 1

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surf...

Papers with Code paper Jun 1

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasonin...

Fine-Tuning Agents

Papers with Code paper Jun 1

DOT-MoE: Differentiable Optimal Transport for MoEfication

The scaling of Large Language Models (LLMs) has driven significant performance gains but created substantial challenges in inference efficiency. While Mixture of Experts (MoEs) arc...

Papers with Code paper Jun 1

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visua...

LLM Multimodal

AdaCodec: A Predictive Visual Code for Video MLLMs

Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

AFUN: Towards an Affordance Foundation Model for Functionality Understanding