STORM (Spatiotemporal Token Reduction for Multimodal LLMs): An Innovative AI Architecture Featuring a Specialized Temporal Encoder Between the Image Encoder and the LLM
Researchers from NVIDIA, Rutgers University, UC Berkeley, MIT, Nanjing University, and KAIST have unveiled STORM, a new AI architecture that aims to improve video understanding in multimodal LLMs. The model addresses significant challenges faced by existing video-based AI systems, particularly their struggle to process videos as continuous sequences.