recent news


  1. I will join Northeastern University as a Joseph E. Aoun Professor in Fall 2025.

  2. I have several openings for postdocs, visiting researchers, and PhD students to work on exciting new projects in embodied AI, video understanding, and multimodal learning. Prospective applicants should contact me with a CV and a one-page research statement.



Two papers to be presented at CVPR 2025:

  1. BIMBA: Selective-Scan Compression for Long-Range Video Question Answering,
     with Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, and Gedas Bertasius.

  2. VITED: Video Temporal Evidence Distillation,
     with Yujie Lu, Yale Song, William Wang, and Tushar Nagarajan.



One paper presented at ECCV 2024:

  1. 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation,
     with Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Gedas Bertasius, and Kristen Grauman.



Four papers presented at CVPR 2024:

  1. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives,
     with 100 co-authors! Accepted as oral (<1% accept rate).

  2. Learning to Segment Referred Objects from Narrated Egocentric Videos,
     with Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, and Effrosyni Mavroudi. Accepted as oral (<1% accept rate).

  3. Step Differences in Instructional Video,
     with Tushar Nagarajan.

  4. Video ReCap: Recursive Captioning of Hour-Long Videos,
     with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius.




research


My research is in computer vision and multimodal (video, audio, and language) learning. I aim to develop perceptual AI agents that assist humans in their daily activities by understanding their behavior from video and communicating through language and actions. I am particularly motivated to apply this research to AI/AR coaching, episodic memory retrieval, and human-robot interaction.

Lorenzo Torresani


Email / Google Scholar