recent news
• I will join Northeastern University as a Joseph E. Aoun Professor in Fall 2025.
• I have several openings for postdocs, visiting researchers, and PhD students to work on exciting new projects in embodied AI, video understanding, and multimodal learning. Prospective applicants should contact me with a CV and a one-page research statement.
• Two of our recently published articles received Distinguished Paper Awards at the CVPR 2025 EgoVis Workshop:
✓ Video ReCap: Recursive Captioning of Hour-Long Videos,
with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius. Originally published at CVPR 2024.
✓ Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities,
with Yale Song, Eugene Byrne, Tushar Nagarajan, Huiyu Wang, and Miguel Martin. Originally published at NeurIPS 2023.
These awards recognize seminal papers that advance the field of egocentric vision.
• Two papers to be presented at CVPR 2025:
✓ BIMBA: Selective-Scan Compression for Long-Range Video Question Answering,
with Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, and Gedas Bertasius.
✓ VITED: Video Temporal Evidence Distillation,
with Yujie Lu, Yale Song, William Wang, and Tushar Nagarajan.
• One paper presented at ECCV 2024:
✓ 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation,
with Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Gedas Bertasius, and Kristen Grauman.
• Four papers presented at CVPR 2024:
✓ Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives,
with 100 co-authors! Accepted as oral (<1% accept rate).
✓ Learning to Segment Referred Objects from Narrated Egocentric Videos,
with Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, and Effrosyni Mavroudi. Accepted as oral (<1% accept rate).
✓ Step Differences in Instructional Video,
with Tushar Nagarajan.
✓ Video ReCap: Recursive Captioning of Hour-Long Videos,
with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius.
research
My research is in computer vision and multimodal (video-audio-language) learning. I aim to develop perceptual AI agents that can assist humans in their daily activities by understanding their behavior from video and communicating through language and actions. I am particularly motivated to apply this research to AI/AR coaching, episodic memory retrieval, and human-robot interaction.