WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark
In real-world applications, models are expected to perform reliably across diverse settings. Yet many existing multimodal benchmarks expand task types without capturing the visual...