Papers

Papers with Code paper Jul 22

Progress Reward Modeling for Robotic Learning: A Comprehensive Survey

Robotic learning takes place in dynamic environments with large behavior spaces. A terminal success signal only tells the robot whether the task is completed. It does not explain w...

Papers with Code paper Jul 22

SLAI T-Rex: Full-Parameter Post-training of the DeepSeek-V4 Family on Ascend SuperPOD

Full-parameter post-training of trillion-parameter-scale MoE models introduces substantial system-level challenges for large-scale distributed training, including severe memory pre...

Papers with Code paper Jul 22

SLPO: Scaling Latent Reasoning via a Surrogate Policy

Reinforcement learning with verifiable rewards has become the predominant recipe for eliciting test-time scaling in explicit Chain-of-Thought reasoners. Yet this scaling path remai...

Papers with Code paper Jul 22

DocOps: A Verifiable Benchmark for Autonomous Agents in Complex Document Operations

As autonomous agents rapidly evolve, their ability to reliably manipulate ubiquitous digital documents has become critical for enabling general-purpose AI assistants and automating...

Benchmark

Papers with Code paper Jul 22

LLMs Get Lost in Evolving User Intent

As LLMs become more capable, they are increasingly deployed as collaborative agents, taking on user-delegated tasks through iterative interaction. Yet genuine interaction is inhere...

Papers with Code paper Jul 22

Train the Model, Not the Reader: Decodability Supervision for Verifiable Activation Explanations

Natural-language autoencoders score explanations of hidden activations by reconstruction: an explanation is deemed faithful if the activation can be regenerated from it. The test i...

Papers with Code paper Jul 22

ENTRAP-VL: A Taxonomic Probe for Dual Contextual Entrainment in Vision-Language Models

Contextual entrainment is the tendency of a model to let auxiliary context in its input pull its output, independently of whether that context is relevant, true, or even meaningful...

Multimodal

Papers with Code paper Jul 22

NVIDIA-labs OO Agents: Native Python Object-Oriented Agents

Traditional agent development is split across prompt templates, tool schemas, callback code, and workflow graphs. We present NVIDIA Object-Oriented Agents (NOOA), a model-agnostic ...

NVIDIA

Papers with Code paper Jul 22

Self Gradient Forcing: Native Long Video Extrapolation

Recent autoregressive video diffusion methods are increasingly built upon Self Forcing, where the student is trained on histories produced by its own rollout rather than ground-tru...

Papers with Code paper Jul 21

ISO: An RLVR-Native Optimization Stack

Reinforcement learning with verifiable rewards (RLVR) is rapidly advancing the reasoning capabilities of language models, yet the optimization layer that converts reward feedback i...

Papers with Code paper Jul 21

ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU

We present ABot-World-0, an action-conditioned video world model for real-time, long-horizon closed-loop interaction, supported by a multi-source data infrastructure spanning AAA g...

AI Hardware

Papers with Code paper Jul 21

Generative World Renderer at the Speed of Play

The generative world renderer AlayaRenderer receives structured world states exported from physics engines and synthesizes RGB frames. Unlike models that generate frames from text/cont...

Papers with Code paper Jul 21

AutoIndex: Learning Representation Programs for Retrieval

We present AutoIndex, a framework for learning representation programs: executable transformations that map raw documents into the representations exposed to a retrieval system. Ra...

Papers with Code paper Jul 21

HPD-Parsing: Hierarchical Parallel Document Parsing

Efficient teamwork typically combines global coordination with parallel execution, a principle not yet fully reflected in unified Vision-Language Model (VLM)-based document parsers...

Papers with Code paper Jul 21

Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness

Evaluating the factuality of long-form generations has focused predominantly on precision, measuring whether the claims a model makes are correct. The dominant decompose-search-ver...

Benchmark

Papers with Code paper Jul 21

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

Asynchronous reinforcement learning improves throughput by decoupling rollout generation from optimization, but staleness is an inevitable byproduct compounded by policy lag, engin...

Papers with Code paper Jul 21

Computational Humor with Multimodal LLMs: Methods, Datasets, Evaluation, and Challenges

Multimodal humor in memes, cartoons, and comics remains difficult for AI systems because intended meaning depends on non-literal mechanisms, shared cultural knowledge, and communic...

Multimodal

Papers with Code paper Jul 21

Where Should Optimizer State Live? Tiered State Allocation for Memory-Efficient Mixture-of-Experts Training

Optimizer state is the largest single line item in the memory budget of mixture-of-experts (MoE) training: on a 6.78B-parameter MoE language model, AdamW keeps 50.6 GB of first and...

Papers with Code paper Jul 21

Masked Visual Actions for Unified World Modeling

Video models absorb rich priors over how the visual world moves, interacts, and responds to contact, making them promising substrates for robotic world modeling. The central challe...

Papers with Code paper Jul 21

Appearance Pointers -- Multimodal Region Control of Diffusion Transformers

Controllable image generation remains challenging for creative professionals, who often require precise regional control over materials, object identities, and spatial arrangements...

Multimodal

Papers with Code paper Jul 21

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic reasoning has become a transformative force in financial analysis due to its ability to integrate large-scale information and generate reliable and accurate content. Howeve...

Agents

H^2SD: Hybrid Hindsight Self-Distillation

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Moving Alphabet: A Controlled Study of Training Data for Text-to-Video Generation