/// AI HUB

#Multimodal

1062 articles tagged with Multimodal

NewsData.io news Mar 31

Robots Crack The 'Invisible Object' Problem With RGB-Only Vision System

(MENAFN - Robotics & Automation News) Scientists in Japan say they have developed a new approach – dubbed 'HEAPGrasp' – that improves robots' grasping success ra...

Multimodal
21
Dev.to tutorial Mar 31

Using GPT-4o-mini for Simple Tasks and GPT-4o for Complex Ones - Automatically

Stop paying gpt-4o prices for tasks gpt-4o-mini handles just as well. Three working approaches to automatic complexity routing: heuristics, classifier calls, and outcome-based Thom...

Multimodal
12
Papers with Code paper Mar 30

MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation

Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as...

Multimodal
21
Mastodon discussion Mar 30

📰 Computer Vision and AI Conference 2026: OpenCV & SID Live in LA - May 4

The annual Computer Vision and AI conference returns this May with a special one-day event in Los Angeles, ...

Multimodal
9
Papers with Code paper Mar 30

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatia...

Multimodal
21
Mastodon discussion Mar 30

🎮 "If I release something official, I want it to match my vision" - Undertale and Deltarune creator Toby Fox explains why no translations other than Japanese have happened

It's not ...

Multimodal
18
Papers with Code paper Mar 30

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed r...

Multimodal
21
YouTube video Mar 30

WEEKLY NEWS #31: AI and the $109 Billion Clash — When Vision Outruns Reality

One number: $109 billion. And a trial set for April 27. That is how a long-simmering conflict between Elon Musk and OpenAI has ...

OpenAI Multimodal
19
Mastodon discussion Mar 30

📰 Why Are Large Models Superior When Humans Hesitate? GPT-4o and AI Decision-Making Analysis (2026)

Human hesitation during decision-making increases the advantage of large language models...

Multimodal
9
Papers with Code paper Mar 30

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downst...

Anthropic Multimodal
21
Mastodon discussion Mar 30

Meituan has released LongCat-Next, a native multimodal AI model that treats text, images, and audio as equivalent tokens within a single architecture. Unlike conventional models th...

Multimodal
18
Dev.to tutorial Mar 30

MOMENTUM - AIccountability using Notion MCP for Vision Board Progress

This is a submission for the Notion MCP Challenge. What I Built: MOMENTUM is an AI-powered...

Multimodal MCP
12
Mastodon discussion Mar 30

Vision-language models are impressive, until you ask them something simple.

A recent study shows that state-of-the-art systems struggle with basic visual tasks like counting shapes o...

Multimodal
18
NewsData.io news Mar 30

AI Governance - India Has the Vision. Now It Needs Clear Execution: Sandip S. Shrotri

MUMBAI - India has reached an important point in its journey with artificial intelligence. Over the past few years, both central and state governments have shown strong intent in ad...

Multimodal
21
Product Hunt tool Mar 30

Halo Vision Headphones

Headphones with a camera to capture moments as you jam

Multimodal
15
Papers with Code paper Mar 30

Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

Vision-Language Models (VLMs) often yield inconsistent descriptions of the same object across viewpoints, hindering the ability of embodied agents to construct consistent semantic ...

Multimodal
21
Papers with Code paper Mar 29

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAg...

Multimodal
21
NewsData.io news Mar 29

Vision Expo 2026 Looks Ahead With Cautious Optimism

The Orlando show hosted around 8,000 professionals from 92 countries, marking the debut of a single yearly show with a strong focus on technology.

Multimodal
21
Mastodon discussion Mar 29

📰 DeepSeek-Prover-V2 Outperforms GPT-4o in Neural Theorem Proving (2026)

DeepSeek-Prover-V2 pushes the boundaries of neural theorem proving with recursive proof search and reinforce...

Multimodal
18
Papers with Code paper Mar 29

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge....

Multimodal
21
Papers with Code paper Mar 29

Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

Recent advances in diffusion-based controllable visual generation have led to remarkable improvements in image quality. However, these powerful models are typically deployed on clo...

Multimodal
21
Mastodon discussion Mar 28

📰 Tian Gong AI Unveils 2026 Multimodal Model to Dominate Global AI Race

Chinese AI startup Tian Gong AI has unveiled a groundbreaking multimodal model, signaling its entry into the ...

Multimodal
9
Papers with Code paper Mar 28

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language...

Multimodal
21
Papers with Code paper Mar 28

Structural Graph Probing of Vision-Language Models

Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we stud...

Multimodal
21