Papers

Papers with Code paper May 27

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantl...

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dim...

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded in memory, making it less preferable for ...

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Spatial intelligence requires visual representations that capture both semantic objects and geometric structure in the physical world. To support this, two major pre-training schem...

Multimodal

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

As agent capabilities advance, existing benchmarks, such as τ²-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-...

Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal

Learning visuomotor policies via behavior cloning typically involves mimicking expert demonstrations collected by human operators. However, natural human demonstrations inherently ...

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Recent advances in speech generation have enabled high-fidelity synthesis, yet systematic evaluation of models under long-context conditions remains largely underexplored. A compre...

Models That Know How Evaluations Are Designed Score Safer

The validity of AI safety evaluations depends on models behaving consistently across controlled and deployment settings. Prior work has identified test-time contextual cues, such a...

FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

Media compression standards have reached a plateau in terms of the rate-distortion-complexity trade-off, limiting the ability to offload expensive AI perception to the cloud in app...

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

Existing emotional support conversation (ESC) systems mainly rely on end-to-end response generation or coarse strategy supervision, offering limited interpretability and little sup...

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily ...

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers ...

LLM Safety/Alignment

Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity

Efficient inference is critical for long-context language models, where attention computation and KV-cache access dominate the cost. Recent work, RAT+, introduces a recurrence-augme...

Rethinking Memory as Continuously Evolving Connectivity

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic e...

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dyn...

LLM

GEM: Generative Supervision Helps Embodied Intelligence

Embodied Vision-Language Models (VLMs) have demonstrated impressive performance and generalization in robotics, particularly within Vision-Language-Action frameworks. However, a si...

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divid...

Agents

Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

Recent advances in online reinforcement learning (RL) for large language models (LLMs) have demonstrated promising performance in complex reasoning tasks. However, they often exhib...

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In t...

Multimodal

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models

AlphaTransit: Learning to Design City-scale Transit Routes

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection