EECS Rising Stars 2023




Yangsibo Huang

Advancing Safety, Privacy, and Transparency in Large-Scale Machine Learning Systems



Research Abstract:

Machine learning systems continue to evolve, growing in size and performance year after year. At the same time, their increasing complexity brings new concerns about their safety and data usage that demand more rigorous attention. My long-term research agenda addresses these critical safety and data-related issues in large-scale ML systems. My overarching objective is to develop ML systems that exhibit three fundamental attributes: safety, privacy preservation, and transparency. This commitment is evident in my Ph.D. research, which encompasses two core areas: 1) Safety and Privacy Risk Identification and Mitigation in Machine Learning Systems: I have scrutinized widely deployed ML systems such as federated learning, retrieval-based language models, and open-source large language models. By identifying potential safety and privacy pitfalls and developing strategies to minimize risks, I aim to enhance the dependability of these systems. 2) Regulatory Compliance and Data Governance: In addition to risk mitigation, I have worked extensively on ensuring compliance with critical regulations and policies, such as GDPR and CCPA, by developing methodologies and tools for auditing data usage, covering aspects like personally identifiable information and copyrighted data. This line of work promotes transparency and facilitates adherence to legal and ethical standards in ML deployment.

Bio:

Yangsibo Huang is a final-year Ph.D. student at Princeton University, co-advised by Prof. Kai Li and Prof. Sanjeev Arora. Her research centers on the intersection of Systems and Machine Learning, with a focus on privacy, security, and safety. She is particularly interested in practical applications, including large language models, retrieval-based language models, and embedding models. Yangsibo is the recipient of the Wallace Memorial Fellowship (2023-2024). She has also interned at Meta Research and Google Research, where she contributed to privacy-preserving techniques for large-scale machine learning models.