H-EmbodVis/NUMINA: [CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

[CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Read Original

Related