Chen Wei
Bridging the Gap from Raw Visual Data to High-Level Cognition
Research Abstract:
Computer vision has made rapid progress, evolving from closed-set recognition on specific tasks to open-set applications in the real world. One of the next frontiers is developing intelligent systems that can perceive, process, and interact with our 3D environment as humans do. My research targets two critical steps toward this goal: first, encoding enormously complex raw signals into structured visual representations that are easier for machines to understand and process; and second, connecting those visual representations to high-level cognition. My work addresses the first step by discovering natural structure in raw data through proxy tasks and generative modeling. With little to no task-specific human annotation, we obtain visual representations that scale to a wide range of domains and apply to a variety of complex tasks. For the second step, we build vision systems that learn from the commonsense world knowledge provided by large foundation models, which in turn can help refine those foundation models to better capture our world. By addressing these challenges, I am working toward the vision of creating compositional intelligent systems that engage with the physical world as humans do.
Bio:
Chen Wei is a fifth-year Ph.D. student at Johns Hopkins University, advised by Professor Alan L. Yuille. Her research lies at the intersection of artificial intelligence, machine learning, and computer vision, with a particular interest in developing reliable, scalable, and general visual representations that bridge the gap from raw visual data to high-level cognition. Prior to her Ph.D., Chen received a B.Sc. in Computer Science from Peking University. She has interned at renowned research institutions, including Fundamental AI Research (FAIR) at Meta and Google DeepMind.