Qiaochu Chen

Position: PhD Candidate
Institution: University of Texas at Austin

Neurosymbolic Programming for Data Analytics Pipeline

Research Abstract:

Data-driven decision-making is pivotal in modern workflows. However, many existing tools offer only basic data functionality, which prevents end users from accomplishing complex tasks. To fully unlock user productivity, there is an urgent need for novel data extraction paradigms and automation techniques. Recent research primarily follows either a symbolic approach, which relies on logical reasoning and predefined rules, or a neural approach, in which predictions come from neural networks. The former struggles to scale because of computational limits and the intensive labor of specifying rules, while the latter offers no guarantees that its results are correct. To address these challenges, my research uses neurosymbolic programming to scale up both the complexity and the efficiency of automated information extraction tasks. The key idea of my work is to bridge the gap between human intent expressed in natural language and automated data processing, making information extraction more intuitive and efficient. On the efficiency side, we introduce novel algorithms for automatically synthesizing regular expressions from multi-modal inputs and for synthesizing visualizations from natural language prompts. On the complexity side, we propose neurosymbolic languages for information extraction from both web pages and unstructured text. These languages combine symbolic operations that parse the structure of the input with neural NLP modules, such as question answering and entity extraction systems, that capture the semantics of the text. In empirical evaluations, our techniques have proven not only efficient but also able to handle a wider variety of information extraction tasks than existing methods. Looking ahead, to further streamline complex information extraction for users by building on advances in machine learning, I aim to integrate different types of multi-modal language models with existing techniques.
Furthermore, I aim to improve the usability and robustness of neurosymbolic program generation by developing human-in-the-loop synthesis techniques. Lastly, I am interested in developing generalizable and scalable neurosymbolic synthesis techniques that make it easy to incorporate domain knowledge.

Bio:

Jocelyn (Qiaochu) Chen is currently a final-year PhD student at the University of Texas at Austin, advised by Isil Dillig and Greg Durrett. Her research interests lie at the intersection of programming languages and machine learning. Her current research focuses on developing new neurosymbolic languages and synthesis techniques for end-user-oriented programming.