Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs
Multimodal LLMs can accurately perceive numerical content across modalities yet fail to perform exact multi-digit multiplication when the identical underlying arithmetic problem is...
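As a minimal sketch of what "exact multi-digit multiplication" involves, the Python below breaks a product into shifted partial products (the schoolbook method) and checks a predicted answer digit for digit. The function names `partial_products` and `exact_match`, the operands, and the exact-match rule are illustrative assumptions and are not taken from this work.

```python
def partial_products(a: str, b: str) -> list[int]:
    """Return the shifted partial products of a * b, one per digit of b.

    Hypothetical helper for illustration: operands are decimal digit strings.
    """
    x = int(a)
    products = []
    # Walk b from its least significant digit; each step shifts one decimal place.
    for shift, digit in enumerate(reversed(b)):
        products.append(x * int(digit) * 10 ** shift)
    return products


def exact_match(predicted: str, a: str, b: str) -> bool:
    """True only if the prediction equals the exact product digit for digit.

    Assumed scoring rule: whitespace and thousands separators are ignored,
    and any single wrong digit counts as a failure.
    """
    cleaned = predicted.strip().replace(",", "")
    return cleaned == str(int(a) * int(b))


parts = partial_products("4273", "618")
total = sum(parts)
print(parts)   # [34184, 42730, 2563800]
print(total)   # 2640714
```

Under this assumed rule, a response such as `2640704` is scored as wrong even though it differs from the true product in one digit, which is why correct perception of the operands does not by itself imply exact computation.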