recent news
•I will join Northeastern University as Professor and President Joseph E. Aoun Chair in Fall 2025.
✓I have several openings for postdocs, visiting researchers, and PhD students to work on exciting new projects in embodied AI, video understanding, and multimodal learning. Prospective applicants should contact me with a CV and a one-page research statement.
•Our state-space video model BIMBA won first place in the EgoSchema Challenge at CVPR 2025.
•Three of our recently published articles received Distinguished Paper Awards at the CVPR 2025 EgoVis Workshop:
✓Video ReCap: Recursive Captioning of Hour-Long Videos,
with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius. Originally published at CVPR 2024.
✓Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities,
with Yale Song, Eugene Byrne, Tushar Nagarajan, Huiyu Wang, and Miguel Martin. Originally published at NeurIPS 2023.
✓HierVL: Learning Hierarchical Video-Language Embeddings,
with Kumar Ashutosh, Rohit Girdhar, and Kristen Grauman. Originally published at CVPR 2023.
These awards recognize publications that have advanced egocentric vision through original and innovative contributions.
•Two papers presented at CVPR 2025:
✓BIMBA: Selective-Scan Compression for Long-Range Video Question Answering,
with Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, and Gedas Bertasius.
✓VITED: Video Temporal Evidence Distillation,
with Yujie Lu, Yale Song, William Wang, and Tushar Nagarajan.
•One paper presented at ECCV 2024:
✓4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation,
with Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Gedas Bertasius, and Kristen Grauman.
•Four papers presented at CVPR 2024:
✓Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives,
with 100 co-authors! Accepted as oral (<1% accept rate).
✓Learning to Segment Referred Objects from Narrated Egocentric Videos,
with Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, and Effrosyni Mavroudi. Accepted as oral (<1% accept rate).
✓Step Differences in Instructional Video,
with Tushar Nagarajan.
✓Video ReCap: Recursive Captioning of Hour-Long Videos,
with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius.
research
My research is in computer vision and multimodal (video-audio-language) learning. I aim to develop perceptual AI agents that can assist humans in their daily activities by understanding their behavior from video and communicating through language and actions. I am particularly motivated to apply this research to AI/AR coaching, episodic memory retrieval, and human-robot interaction.