Papers

Papers with Code paper May 29

iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

While visually grounded Chain-of-Thought (CoT) has emerged as a promising paradigm to enhance fine-grained perception in multimodal large language models (MLLMs), its efficacy duri...

Papers with Code paper May 29

Trust-Region Behavior Blending for On-Policy Distillation

On-policy distillation (OPD) trains a student on prefixes sampled from its own policy while matching a stronger teacher. This addresses the prefix mismatch of offline distillation,...

Papers with Code paper May 29

MindZero: Learning Online Mental Reasoning With Zero Annotations

Effective real-world assistance requires AI agents with robust Theory of Mind (ToM): inferring human mental states from their behavior. Despite recent advances, several key challen...

Papers with Code paper May 29

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

True video intelligence demands more than recognizing what is visible: it requires reasoning about why events unfold, predicting what would change under different conditions, and d...

Papers with Code paper May 29

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

Group-advantage-based reinforcement learning methods, such as GRPO and DAPO, have demonstrated strong performance across diverse domains, including mathematical reasoning and text-...

Papers with Code paper May 29

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is ...

Papers with Code paper May 29

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

LLM agents increasingly retrieve externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing ski...

LLM Safety/Alignment

Papers with Code paper May 29

Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Most text-driven 3D indoor scene synthesis methods generate rooms from object-centric prompts, asking what furniture should be placed rather than how the space is used. Yet in real...

Papers with Code paper May 29

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

Vision-Language-Action (VLA) models enable robots to follow natural language instructions and generalize across diverse tasks, but they remain vulnerable to execution failures that...

Papers with Code paper May 29

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (S2ST), offline translation, and streaming generation, producing output...

Papers with Code paper May 29

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or...

Papers with Code paper May 29

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Building strong reward models (RMs) for language model alignment is bottlenecked by the cost and difficulty of acquiring diverse and reliable preference data from human annotation ...

Papers with Code paper May 29

Count Anything

Object counting remains fragmented across domain-specific datasets and task formulations, despite rapid progress in generalist vision models. Existing counting models are often tai...

Papers with Code paper May 29

Functional Attention: From Pairwise Affinities to Functional Correspondences

Learning mappings between infinite-dimensional function spaces, or operator learning, is essential for many machine learning applications. Although transformer-based operators are ...

Papers with Code paper May 29

LVSA: Training-Free Sparse Attention for Long Video Diffusion

Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the...

Papers with Code paper May 29

Human Psychometric Questionnaires Mischaracterize LLM Behavior

We examine whether human psychometric questionnaires can serve as reliable tools for characterizing and predicting LLM behavior in everyday user interactions. We analyze eight open...

LLM

Papers with Code paper May 29

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed sin...

Papers with Code paper May 29

PaintBench: Deterministic Evaluation of Precise Visual Editing

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introd...

Papers with Code paper May 29

Distilling LLM Feedback for Lean Theorem Proving

Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm s...

LLM

Papers with Code paper May 29

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question answering (VQA) tasks. However, they remain brittle on mechanical eng...

Multimodal

Papers with Code paper May 29

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic corr...

Multimodal

Papers with Code paper May 29

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval