Papers | AI Hub

Papers with Code paper 6d ago

Closing the Loop: Training-Free Revisit Consistency for Autoregressive Generative Rendering

Recent conditional video generation models have shown promising potential for transforming 3D engine renderings, such as depth maps and untextured geometry, into photorealistic videos...

Papers with Code paper 6d ago

SANA-Video 2.0: Hybrid Linear Attention with Attention Residuals for Efficient Video Generation

We introduce SANA-Video 2.0, a hybrid video diffusion transformer instantiated at 5B and 14B scales under a unified architecture. Designed to generate high-quality video up to 720p...

Papers with Code paper 6d ago

Show, Don't Tell: Evaluating Spatial Cognition in Generative Pixels Rather Than LLM Text

Spatial intelligence is essential for agents to move from static semantic understanding toward interacting with the physical world. Many spatial tasks are grounded in continuous vi...

LLM

Papers with Code paper 6d ago

Recurrent Sinusoidal INRs for Efficient High-Fidelity Representation

We study sinusoidal recurrence as an iterative mechanism for harmonic spectral enrichment in implicit neural representations (INRs). Our analysis reveals that sinusoidal activation...

Papers with Code paper 6d ago

TableVerse: A Large-scale Tabletop Dataset with Real-world Grounded Layouts for Generalizable Manipulation

The development of generalizable robotic manipulation policies is inherently bounded by the availability of large-scale, high-fidelity scene data. While recent automated synthesis ...

Robotics

Papers with Code paper 6d ago

Streaming Multi-Agent Autoregressive Diffusion Model with World State Registers

Multi-agent interactive world models should not only generate consistent observations, but also maintain world states that persist across agents and evolve across views. Existing a...

Image Generation

Papers with Code paper 6d ago

ICAE-Bench: Evaluating Coding Agents as Interactive Project Builders

The recent emergence of vibe-coding workflows is changing what coding agents are expected to do. Instead of merely completing code under fully specified instructions, agents are in...

Papers with Code paper 6d ago

Oxygen-TryOn: Fashion-Native Foundation Model for Any-item Virtual Try-On

We present Oxygen-TryOn, a unified foundation model for any-item virtual try-on. Rather than repurposing a general-purpose image editor, Oxygen-TryOn is fashion-native, built for t...

LLM

Papers with Code paper 6d ago

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Deep research requires agents to find answers that jointly satisfy multiple constraints. Discovering such answers is costly, whereas verifying a candidate can often be decomposed i...

Papers with Code paper 6d ago

GraphVid: Interactive Graph-Controllable Video Generation

Controllable video generation remains challenging due to the difficulty of specifying precise multi-object interactions using text prompts or motion-control inputs that primarily c...

Papers with Code paper 6d ago

Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction

We introduce Tencent WorkBuddy Bench, a multi-domain evaluation suite for coding agents; this report documents its construction methodology, scoring protocol, and a cross-model lea...

Benchmark

Papers with Code paper 6d ago

Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems

Production AI agents fail less often because they cannot reason well and more often because they cannot manage what is in their reasoning context: conversation historie...

Agents

Papers with Code paper 6d ago

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Large language models are increasingly used in K-12 education, but existing benchmarks mainly test exam question answering rather than understanding how curriculum knowledge is str...

Papers with Code paper 6d ago

Sample-Efficient Learning from Agent Experience

Real-world agent learning is often constrained by costly environment interactions, such as running time-consuming experiments or obtaining human feedback. In-context learning offer...

Papers with Code paper Jul 22

Trace: A Taxonomy-Guided Environment for Multidomain Visual Reasoning

Reinforcement learning with verifiable rewards (RLVR) has substantially improved language-model reasoning, yet its extension to vision-language models remains constrained by the la...

Papers with Code paper Jul 22

G-MAD: A Game-Based Data Generation Framework for Multi-View RGB-T Aerial Object Detection

This work introduces G-MAD, an open-source framework that uses Arma3 to generate synchronized multi-view RGB-T data for aerial object detection. G-MAD addresses key limitations of ...

Papers with Code paper Jul 22

SeededGrasp: Language-Guided Grasping in Complex Scenes with Multiple Embodiments

Practical robotic grasping in complex scenes requires both 3D spatial reasoning and alignment with task-specific requirements. Vision-language models (VLMs) offer a natural way to ...

Papers with Code paper Jul 22

ATSplat: Compact Feed-forward 3D Gaussian Splatting with Adaptive Token Expansion

3D Gaussian Splatting (3DGS) achieves high-quality novel-view synthesis by optimizing freely placed primitives in 3D and adaptively densifying them in under-reconstructed regions. ...

Papers with Code paper Jul 22

Reading and Steering Representations of Materials-Science Mechanisms in an Open-Weight Language Model

Large language models can answer scientific questions, yet a correct output does not reveal whether the model represents or uses the governing physics. Here we show that materials ...

Google LLM

Papers with Code paper Jul 22

Robostral Navigate

Deploying navigation systems at scale requires a recipe that minimizes sensor assumptions, generalizes across robot embodiments, and trains efficiently. Yet, today's best systems d...

Papers with Code paper Jul 22

ReferTrack: Referring Then Tracking for Embodied Visual Tracking

Embodied visual tracking (EVT) requires a mobile agent to continuously follow a specific target described in natural language using only onboard vision. While recent vision-languag...

Papers with Code paper Jul 22

Molt: A Scalable PyTorch-Native Training Framework for Agentic Reinforcement Learning

Agentic reinforcement learning research is constant algorithm modification (new estimators, new pipeline stages, new rollout schemes), and in mainstream frameworks each change threa...

Agents

Papers with Code paper Jul 22

Multimodal Speaker Verification as a Threat to Speaker Anonymization

Most automatic speaker verification (ASV) systems operate on individual utterances, despite real-world interactions typically consisting of multiple utterances. As speech accumulat...

Multimodal

Papers with Code paper Jul 22

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

As large language models and AI agents become the primary consumers of search results, document set quality determines the upper bound of downstream generation. Yet existing evalua...