Papers

Papers with Code paper May 28

EarlyTom: Early Token Compression Completes Fast Video Understanding

Video large language models (Video-LLMs) have demonstrated strong capabilities in video understanding tasks. However, their practical deployment is still hindered by the inefficien...

Papers with Code paper May 28

One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

Cell instance segmentation models trained on cell-specific datasets suffer severe performance drops on out-of-distribution cell types, while interactive foundation models overcome ...

Papers with Code paper May 28

GrepSeek: Training Search Agents for Direct Corpus Interaction

Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most exist...

Papers with Code paper May 28

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Real-time streaming video-to-video editing (V2V) is critical for interactive applications such as live broadcasting and gaming, yet it remains a formidable challenge due to the str...

Papers with Code paper May 28

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Automatic speech recognition (ASR) is a core component of human-computer interaction and an increasingly important front-end for LLM-based assistants and agents. However, most cur...

Agents

Papers with Code paper May 28

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solv...

OpenAI

Papers with Code paper May 28

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, comprise less than 0.06% of CounterFact, 0.6% of ZSRE, and le...

Fine-Tuning

Papers with Code paper May 28

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

We show that LoRA adapters, the dominant distribution format for fine-tuned LLMs, can be reliably backdoored through training data poisoning while preserving baseline task performa...

Fine-Tuning

Papers with Code paper May 28

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existin...

Papers with Code paper May 28

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation ...

Papers with Code paper May 28

A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems

Finite element analysis (FEA) is the most important numerical approach for solid mechanics. Challenges of FEA include a steep learning curve for entry-level users and potential fal...

Papers with Code paper May 28

VLM3: Vision Language Models Are Native 3D Learners

Vision Language Models (VLMs) enable a unified model to solve various vision tasks through prompting. They have shown promising performance in semantic understanding. However, 3D u...

Multimodal

Papers with Code paper May 28

Exploring Autonomous Agentic Data Engineering for Model Specialization

Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data...

OpenAI Agents

Papers with Code paper May 28

Towards Consistent Video Geometry Estimation

This work presents ViGeo, a feed-forward foundation model for recovering spatially dense and temporally consistent geometry from video sequences. Built upon a plain transformer arc...

Papers with Code paper May 28

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trai...

Papers with Code paper May 28

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large lan...

Multimodal

Papers with Code paper May 28

FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes

We present a novel formulation for mesh-free, reduced-order simulation of deformable hyperelastic objects. Existing work in reduced-order elastodynamic simulation represents the in...

Papers with Code paper May 28

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a criti...

Agents

Papers with Code paper May 28

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used fo...

LLM Fine-Tuning

Papers with Code paper May 28

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little exploration of the problem of data contamin...

Papers with Code paper May 28

Native Audio-Visual Alignment for Generation

Joint audio-video generation aims to synthesize temporally synchronized and semantically coherent visual-acoustic content. However, existing open-source methods mainly rely on eith...

Safety/Alignment

Papers with Code paper May 28

Brain-IT-VQA: From Brain Signals to Answers

Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models