Papers

Papers with Code paper Jul 10

OpenLongTail: Generative Scaling of Long-Tail Driving Data

Scaling robust driving policies is fundamentally bottlenecked by the scarcity of edge cases in curated datasets. While the real world continuously captures these critical events, s...

Papers with Code paper Jul 10

Self-Guided Test-Time Training for Long-Context LLMs

Long-context processing has become increasingly important for large language models (LLMs), but simply extending the context window does not guarantee effective utilization of long...

Papers with Code paper Jul 10

Video Generation Models are General-Purpose Vision Learners

Driven by next-token prediction, NLP shifted from task-specific models into powerful generalist foundation models. What, then, is the equivalent catalyst needed to achieve a genera...

Multimodal

Papers with Code paper Jul 10

A Sovereign, Open-Source Foundation Model for German and English

We present Soofi S 30B-A3B, a sovereign, open-source Mixture-of-Experts (MoE) hybrid Mamba Transformer foundation model for German and English. Its hybrid design activates only 3B ...

LLM Open Source

Papers with Code paper Jul 10

Scalable Visual Pretraining for Language Intelligence

The rapid progress of large foundation models has been driven predominantly by pretraining on large-scale text corpora. However, many forms of knowledge are conveyed through visual...

Papers with Code paper Jul 10

On Locality and Length Generalization in Visual Reasoning

A striking feature of the human visual system is that it ingests visual information through a series of local foveated glimpses, rather than a single global computation. This makes...

Papers with Code paper Jul 10

Phone Segmentation and Recognition through Phonological Activation Mapping

Phone segmentation and recognition are inherently related tasks, yet modern approaches typically model them separately. We argue that phonetic structure is already latent in the re...

Papers with Code paper Jul 10

PanoWorld: Real-World Panoramic Generation

In this work, we aim to address the challenge of long-range memory in panoramic world models by exploiting the rotation-equivariant property of omnidirectional representations, whe...

Papers with Code paper Jul 9

Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in Large Language Model Finetuning

Fine-tuning LLMs to inject new knowledge faces a critical challenge: LLMs can quickly memorize new facts, yet fail to use them for downstream reasoning tasks. We formalize this fai...

LLM

Papers with Code paper Jul 9

ARDY: Autoregressive Diffusion with Hybrid Representation for Interactive Human Motion Generation

Generating realistic 3D human motions in real-time within interactive applications is key for animation, simulation, and humanoid robotics. While recent offline motion generation a...

Papers with Code paper Jul 9

LongE2V: Long-Horizon Event-based Video Reconstruction, Prediction, and Frame Interpolation with Video Diffusion Models

Recovering high-quality video from sparse event streams is a challenging task. Regression methods often blur textures, while existing generative models struggle with long-term stab...

Papers with Code paper Jul 9

Enhancing In-context Panoramic Generation via Geometric-aware Pretraining

In this work, we present Canvas360, a two-stage framework for in-context panoramic generation that combines geometry-aware pretraining with downstream task-specific fine-tuning. To...

Papers with Code paper Jul 9

A Quantized Native Runtime for On-Device Semantic Audio Generation

Semantic audio applications increasingly require controllable generation on commodity and embedded hardware rather than through framework-heavy datacenter stacks. We present aria, ...

Papers with Code paper Jul 9

What LLM Forecasters Know but Don't Say: Probing Internal Representations for Calibration and Faithfulness

Large language models fine-tuned for forecasting can be accurate yet poorly calibrated, and their chain-of-thought (CoT) reasoning may not faithfully reflect the evidence behind a ...

LLM

Papers with Code paper Jul 9

UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks

The rapid development of large language models and multimodal large language models has accelerated the emergence of proactive agents capable of operating everyday tools and assist...

Benchmark

Papers with Code paper Jul 9

CausalDS: Benchmarking Causal Reasoning in Data-Science Agents

Large language models (LLMs) increasingly act as integrated data-science agents, combining abstract reasoning with advanced tool use. Yet the relevant benchmark landscape largely d...

Papers with Code paper Jul 9

Blind-Spots-Bench: Evaluating Blind Spots in Multimodal Models

Modern AI models achieve strong performance on many established benchmarks, yet they still fail on tasks that humans find almost trivial, such as manipulating a string or drawing a...

Multimodal

Papers with Code paper Jul 9

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Visual generators excel at rendering, but they confidently fabricate what they do not know. User requests are unbounded, evolving, and deeply long-tailed: new characters, trending ...

Agents

Papers with Code paper Jul 9

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

AI agents have become capable of autonomously completing short, well-specified tasks. However, existing terminal benchmarks largely focus on simple problems that finish within minu...

Ideas Have Genomes: Benchmarking Scientific Lineage Reasoning and Lineage-Grounded Idea Generation

OpenCoF: Learning to Reason Through Video Generation

OPSD-V: On-Policy Self-Distillation for Post-Training Few-Step Autoregressive Video Generators

MuScriptor: An Open Model for Multi-Instrument Music Transcription

DrugGen 2: A disease-aware language model for enhancing drug discovery