EarlyTom: Early Token Compression Completes Fast Video Understanding
Video large language models (Video-LLMs) have demonstrated strong capabilities in video understanding tasks. However, their practical deployment is still hindered by the inefficien...