recent news
-
•I will join Northeastern University as a Joseph E. Aoun Professor in Fall 2025.
-
✓I have several openings for postdocs, visiting researchers, and PhD students to work on exciting new projects in embodied AI, video understanding, and multimodal learning. Prospective applicants should contact me with a CV and a one-page research statement.
-
•Two papers to be presented at CVPR 2025:
-
✓BIMBA: Selective-Scan Compression for Long-Range Video Question Answering,
with Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, and Gedas Bertasius.
-
✓VITED: Video Temporal Evidence Distillation,
with Yujie Lu, Yale Song, William Wang, and Tushar Nagarajan.
-
•One paper presented at ECCV 2024:
-
✓4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation,
with Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Gedas Bertasius, and Kristen Grauman.
-
•Four papers presented at CVPR 2024:
-
✓Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives,
with 100 co-authors! Accepted as oral (<1% acceptance rate).
-
✓Learning to Segment Referred Objects from Narrated Egocentric Videos,
with Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, and Effrosyni Mavroudi. Accepted as oral (<1% acceptance rate).
-
✓Step Differences in Instructional Video,
with Tushar Nagarajan.
-
✓Video ReCap: Recursive Captioning of Hour-Long Videos,
with Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, and Gedas Bertasius.
research
My research is in computer vision and multimodal (video-audio-language) learning. I aim to develop perceptual AI agents that can assist humans in their daily activities by understanding their behavior from video and communicating through language and actions. I am particularly motivated to apply this research to AI/AR coaching, episodic memory retrieval, and human-robot interaction.