/// AI HUB

#Multimodal

916 articles tagged with Multimodal

Dev.to tutorial May 26

Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes

Look, I’m a backend engineer. I don’t have time to read through 40 pages of model cards before...

Multimodal
20
Papers with Code paper May 26

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are l...

Multimodal
21
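The LocateAnything abstract describes the common formulation in which each 2D box becomes a short run of 1D coordinate tokens. The sketch below shows that baseline formulation only, not the paper's parallel box decoding. It assumes a hypothetical vocabulary of 1,000 quantization bins and made-up `<loc_N>` token names.

```python
import re

NUM_BINS = 1000  # hypothetical number of location bins (assumption, not from the paper)


def box_to_tokens(box, img_w, img_h, num_bins=NUM_BINS):
    """Serialize one (x_min, y_min, x_max, y_max) pixel box into four 1D location tokens."""
    x0, y0, x1, y1 = box

    def quantize(v, size):
        # Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1].
        b = int(v * num_bins / size)
        return min(max(b, 0), num_bins - 1)

    bins = [quantize(x0, img_w), quantize(y0, img_h),
            quantize(x1, img_w), quantize(y1, img_h)]
    # Hypothetical token naming scheme, e.g. "<loc_100>".
    return [f"<loc_{b}>" for b in bins]


def tokens_to_box(tokens, img_w, img_h, num_bins=NUM_BINS):
    """Invert box_to_tokens by mapping each bin back to its center in pixel space."""
    bins = [int(re.fullmatch(r"<loc_(\d+)>", t).group(1)) for t in tokens]
    sizes = [img_w, img_h, img_w, img_h]
    return tuple((b + 0.5) * s / num_bins for b, s in zip(bins, sizes))


def serialize_boxes(boxes, img_w, img_h):
    """Flatten several boxes into one token sequence, in the order a sequential decoder would emit them."""
    seq = []
    for box in boxes:
        seq.extend(box_to_tokens(box, img_w, img_h))
    return seq


if __name__ == "__main__":
    boxes = [(64, 48, 320, 240), (400, 100, 600, 460)]
    seq = serialize_boxes(boxes, 640, 480)
    print(len(seq), seq)  # 2 boxes -> 8 tokens
    print(tokens_to_box(seq[:4], 640, 480))
```

Every box costs four tokens emitted one after another, so a scene with many objects implies a long sequential decode. That is presumably the cost that parallel box decoding is meant to reduce.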
Papers with Code paper May 26

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage t...

Google Multimodal
21
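A unified representation space means an embedding of one modality can be compared directly with an embedding of another. The sketch below assumes the model has already produced fixed-length vectors for each item. It ranks them by cosine similarity. The vectors, file names, and three-dimensional size are made up for illustration. No Gemini API call is shown or implied.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, corpus):
    """Return corpus keys ordered from most to least similar to query_vec."""
    return sorted(corpus, key=lambda k: cosine(query_vec, corpus[k]), reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
query_text = [0.9, 0.1, 0.0]  # e.g. an embedding of the text "a cat on a sofa"
corpus = {
    "image:cat.jpg": [0.8, 0.2, 0.1],
    "audio:rain.wav": [0.0, 0.3, 0.9],
    "video:dog.mp4": [0.5, 0.5, 0.2],
}

if __name__ == "__main__":
    # A text query ranks image, audio, and video items in one pass.
    print(rank(query_text, corpus))
```

Because all modalities share one space, a single similarity function serves text-to-image, text-to-audio, and text-to-video retrieval alike.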
Papers with Code paper May 26

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environ...

Multimodal
21
Papers with Code paper May 26

Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models

Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to correctly answer, but models can often reach solutions through shortcuts or prior fa...

Multimodal
21
Papers with Code paper May 26

How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning

Cross-view spatial reasoning remains a weak spot for vision-language models (VLMs): they often reason in language and lose the fine-grained geometry needed for the task. Thinking w...

Multimodal
21
Papers with Code paper May 26

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist mode...

Multimodal
21
NewsData.io news May 26

Vision for portable diagnostic technology

By winter, a pair of Winnipeg entrepreneurs aim to have portable vision and concussion-screening products circulating Canada.

Multimodal
21
NewsData.io news May 26

Elektros, Inc.: As More Global Investors Discover ELEKTROS Inc.'s Vision for Hard Rock Lithium Mining and Advanced EV Patent Technology, the Company Celebrates Growing Worldwide Interest

Strong Market Momentum, and Friday's 7.96% Gain While Looking Ahead to What Management Believes Is Only the Beginning of a Potentially Transformational Future. WEST PALM BEACH, FL /...

Multimodal
21
YouTube video May 25

FRAME 5: THE MACRO VISION #ai #news #aishorts #finance #economy #crypto

Multimodal
15
Mastodon discussion May 25

"Pope Leo XIV on Monday set out a sweeping vision for corporate executives, politicians, and individuals who will shape and be shaped by the future of artificial intelligence, warn...

Multimodal
43
GitHub Trending repo May 25

infiniteYuanyl/VRCD: Official implementation of paper “Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models”

Official implementation of paper “Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models”

Multimodal
39
NewsData.io news May 25

‘Magnifica Humanitas’: Pope Leo Invokes Justice to Combat ‘Anti-Human Vision’ in AI

Published Monday, the Pope’s new encyclical warns of a ‘culture of power’ fueled by the digital revolution and artificial intelligence.

Multimodal
21
NewsData.io news May 25

Business News | Bihar AI Summit 2026 Highlights Bihar's Vision to Lead India's Intelligence Revolution Through Artificial Intelligence and Emerging Technologies

Get latest articles and stories on Business at LatestLY. Bihar AI Summit 2026 emerged as a significant platform focused on exploring the transformative impact of Artificial Intelli...

Multimodal
21
Mastodon discussion May 25

ByteDance has open-sourced Lance, a native multimodal AI model that runs locally on as little as 40GB VRAM, with quantised versions working on 24GB GPUs. The 3B parameter model reac...

Multimodal
18
Mastodon discussion May 25

🚀 Fastest-growing AI projects today: 1. ByteDance's Lance, a lightweight native unified multimodal model designed for image and... 2. With its unique approach to handling various multi...

Multimodal
18
Papers with Code paper May 25

Toward Native Multimodal Modeling: A Roadmap

Multimodal modeling represents a vital step from modality-agnostic reasoning toward world modeling. While early approaches predominantly rely on late-fusion that assembles encoders...

Multimodal
21
Papers with Code paper May 25

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing approaches often encode...

Multimodal
21
Papers with Code paper May 25

Advancing Creative Physical Intelligence in Large Multimodal Models

Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded...

Multimodal
21
Mastodon discussion May 25

Prescient vision of #TransHumanism https://youtu.be/4x5YDCj-wiE?si=2kqzP9M-VZhFfXnG #music #Mood #AI #TransHuman #MusicVideo #Vision #Dystopia

Multimodal
18
GitHub Trending repo May 24

pardcomper/mllm-jailbreak-bench: Reproducible benchmark for adversarial attacks on multimodal large language models

Reproducible benchmark for adversarial attacks on multimodal large language models

Multimodal Benchmark Safety/Alignment
64
GitHub Trending repo May 24

bandyah/uni-mm-trainer: A small library for training multimodal LLMs combining text, vision, and audio

A small library for training multimodal LLMs combining text, vision, and audio

Multimodal
62
NewsData.io news May 24

Elektros, Inc.: ELEKTROS Inc. Celebrates Booming U.S. Markets and Friday's 7.96% Gain As Growing Numbers of Penny Stock and Microcap Investors Worldwide Discover the Company's Vision for Hard Rock Lithium Mining and Advanced EV Patent Technology

WEST PALM BEACH, FL / ACCESS Newswire / May 24, 2026 / Management Celebrates Friday Trading Momentum of 7.96% and States That Growing Worldwide Awareness of ELEKTROS Represents a P...

Multimodal
21
Mastodon discussion May 24

Apple Immersive video on Real Madrid coming this week to Vision Pro. Apple’s latest immersive video for Vision Pro users is coming this week. It’s called Real Madrid: The Weight of G...

Google Multimodal
9
Page 9 of 39 (916 items)
AI Hub // AI Intelligence Platform // LIVE FEED // Impressum // Datenschutz © 2026