Papers

Papers with Code paper Jul 21

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that tie task generation to predefined tools, repositories, or skill graphs:...

Papers with Code paper Jul 21

Moving Alphabet: A Controlled Study of Training Data for Text-to-Video Generation

Text-to-video generation has advanced significantly over the past five years through scaling of model size, data, and compute. Unlike model architecture, training data is often und...

Video Generation

Papers with Code paper Jul 21

FinanceComplexQA: Benchmarking Agentic Reasoning on Industrial-grade Financial Documents

Agentic Reasoning has become a transformative force in financial analysis due to its ability to integrate large-scale information and generate reliable and accurate content. Howeve...

Agents

Papers with Code paper Jul 21

AgentDebugX: An Open-Source Toolkit for Failure Observability, Attribution, and Recovery in LLM Agents

LLM agent failures are difficult to debug because the step where an error surfaces is often not the one that caused it. Existing observability tools replay execution traces but pro...

LLM Open Source

Papers with Code paper Jul 21

Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers

Text-to-image diffusion transformers (DiTs) jointly process text and image tokens, yet their internal computation during denoising remains poorly understood. We introduce a causal ...

Papers with Code paper Jul 21

Scaling Laws for Hypernetwork-Based Knowledge Injection in Large Language Models

Injecting factual knowledge into large language models (LLMs) reliably and at scale remains an open challenge. Hypernetworks provide a promising solution to large-scale knowledge i...

Papers with Code paper Jul 21

Delineate Anything v2: A Global Foundation Model for Field Delineation

Accurate agricultural field boundary delineation at large scale is a foundational task for food security, supply chain transparency, and carbon accounting. While vision foundation ...

LLM

Papers with Code paper Jul 21

Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing

Large-scale visual generators are increasingly capable but costly to train, fine-tune, and deploy. We introduce Mage-Flow, a compact 4B-scale generative stack for efficient text-to...

LLM

Papers with Code paper Jul 21

Transcription Policy as a Latent Variable: Activating Controllable Verbatim ASR with Word-Level Timing

Modern ASR models trained on heterogeneously annotated data treat transcription style (verbatim vs. intended) as an uncontrolled latent variable, causing measurable decoding instab...

Papers with Code paper Jul 20

AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report

Unlike conventional video game development, which relies on labor-intensive pipelines for asset production, animation, physics, and programming, video world models generate interac...

Papers with Code paper Jul 20

SLAM in Low-Light Environments: Project Report

Simultaneous localization and mapping (SLAM) is one of the fundamental problems in robotics, as it enables autonomous operations in real-world scenarios. Under low illumination, re...

Papers with Code paper Jul 20

SWE-Pruner Pro: The Coder LLM Already Knows What to Prune

Pruning long context for coding agents has been a vital technology for efficient context management. While existing context pruning methods such as SWE-Pruner realize this by attac...

LLM

Papers with Code paper Jul 20

RynnBrain 1.1: Towards More Capable and Generalizable Embodied Foundation Model

We present RynnBrain 1.1, a family of embodied foundation models spanning 2B, 9B, and 122B-A10B scales. Trained with a unified spatio-temporal and physically grounded framework, Ry...

LLM

Papers with Code paper Jul 20

LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks

Reinforcement learning (RL) on open-ended tasks compresses an LLM's rubric-based evaluation into a scalar reward, discarding rich textual feedback and conflating responses with dis...

LLM

Papers with Code paper Jul 20

DiFA: Inference-Time Forward-Process Alignment for Diffusion Models

The prevailing inference framework for diffusion models formulates generation fundamentally as a problem of numerical integration. This perspective casts the model as an exact esti...

Safety/Alignment

Papers with Code paper Jul 20

EduPanel: A Three-Agent LLM Judge for Teaching Videos -- Reliability, Complementarity, and Human Trust Calibration

Teaching videos are becoming a major medium for education, creating a growing need for scalable evaluation of their pedagogical quality. Existing automatic judges do not fully addr...

LLM

Papers with Code paper Jul 20

Subliminal Clocks: Latent Time Modelling in Diffusion Language Models

Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive models. Unlike standard diffusion-based approaches, DLMs are not explicitly cond...

Papers with Code paper Jul 20

O-VAD: Industrial Video Anomaly Detection through Object-Centric Tracking and Reasoning

Industrial Video Anomaly Detection (IVAD) aims to identify anomalous objects and events in an industrial process, which is crucial for modern manufacturing and quality control syst...

Papers with Code paper Jul 20

Three-Body Scattering for Generative Modeling

Modern generative models typically rely on an adversarial critic, a prescribed noise-to-data path, or an autoregressive factorization. Instead, we show that a proper distributional...

Papers with Code paper Jul 20

HOMIE: Human-object Centric Video Personalization via Multimodal Intelligent Enchancement

Human-object centric video personalization (HOCVP) is a core task within subject-driven video generation. However, existing methods suffer from two key limitations. First, most app...

Multimodal

Papers with Code paper Jul 20

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogeneous models into pipelines whose efficient deployment requires applicat...

Multimodal

Papers with Code paper Jul 20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

SciForma: Structure-Faithful Generation of Scientific Diagrams

ConsiSpace: Learning Geometric Consistency Matters for Video Spatial Reasoning