EECS Rising Stars 2023




Paola Cascante-Bonilla

More from Less: Learning with Limited Annotated Data in Vision and Language



Research Abstract:

Despite the impressive results of deep learning models, modern large-scale systems must be trained on massive amounts of data, either manually annotated or freely available on the Internet. But this “data in the wild” is insufficient to learn specific structural patterns of the world, and existing large-scale models still fail on common-sense tasks requiring compositional inference. Today, systems are trained on this online content, which reflects the activity of only the ~64.6% of the world’s population with Internet access. This places a clear limit on the diversity of the available data, affecting specialized information and underrepresented cultures. On top of that, automated systems trained on this data are now being deployed to add textual and visual information to the Internet. This poses a major threat to the reliability of intelligent agents, which may become increasingly biased as arbitrary information is systematically distributed.
My current research focuses on answering three fundamental questions: (a) how can we create systems that learn with limited annotated data? (b) how can we create systems able to encode real-world concepts with fine granularity for common-sense reasoning tasks? (c) is it possible to create such a system with “alternative data” (e.g., synthetic and generated images and text) while complying with privacy protection principles and avoiding cultural bias?
Given that my work lies at the intersection of Computer Vision and Natural Language Processing, my aim is to analyze and apply Machine Learning algorithms to understand how images and text interact to model complex patterns, reinforcing compositional reasoning without forgetting prior knowledge.
I plan to continue exploring hyper-realistic synthetic data generation techniques and the expressiveness of generative models to train multimodal systems that perform well in real-world scenarios, with applications including visual question answering, cross-modal retrieval, zero-shot classification, and task planning.

Bio:

Paola Cascante-Bonilla is a Ph.D. candidate in Computer Science at Rice University, advised by Professor Vicente Ordóñez-Román, working on Computer Vision, Natural Language Processing, and Machine Learning. She focuses on multimodal learning, few-shot learning, semi-supervised learning, representation learning, and synthetic data generation for compositionality and privacy protection. Her work has been published in machine learning, vision, and language conferences (CVPR, ICCV, AAAI, NeurIPS, BMVC, NAACL). She has previously interned at Mitsubishi Electric Research Laboratories (MERL) and twice at the MIT-IBM Watson AI Lab. She is the recipient of the Ken Kennedy Institute SLB Graduate Fellowship (2022/23) and was recently selected as a Future Faculty Fellow by Rice's George R. Brown School of Engineering (2023).