Adriana Sejfia
PhD Candidate
Research Abstract:
Code can become exploitable in various ways. For instance, software engineers can inadvertently introduce errors that lead to vulnerabilities, which create attack vectors that potential attackers can exploit. Code can also become exploitable through a lack of adherence to privacy requirements: when working with user data, engineers may write code that harms those users by violating their privacy. Lastly, malicious actors can exploit third-party platforms to introduce malicious code that will be executed by the users of those platforms, and the engineers who maintain these platforms may not always be able to detect this malicious code swiftly. Approaches that automatically help engineers locate such exploitable code enable quicker mitigation of the risks it poses. In my work, I focus on advancing the efficient identification of these three types of exploitable code: code vulnerabilities, violations of privacy requirements, and malicious third-party code.

Within the identification of code vulnerabilities, we have recently seen an explosion of Deep Learning (DL) based solutions. Although these solutions have improved over traditional ones (based on static analysis or Machine Learning (ML)), their success remains difficult to reproduce on vulnerabilities drawn from distributions other than those the DL models were explicitly trained on. Through my work, I attempt to remedy this by (1) improving the quality of vulnerability data through automated noise removal, (2) highlighting oversights of current solutions in what they consider to be a vulnerability, and (3) ensuring existing solutions are realistically trained and evaluated.

Privacy requirements for developers who access user data are an important aspect of respecting those users' privacy. However, most developers do not have the expertise required to ensure that the various privacy requirements are satisfied. In my work, I attempt to enshrine the expertise of privacy experts in automated algorithms that analyze developers' code; this way, developers can independently access user data while preserving users' privacy.

Helping maintainers of platforms that host third-party code detect malicious code has been the focus of much software engineering research. Existing work in this area has proposed a range of solutions relying on static analysis or machine learning. The former produce significant numbers of false positives, which renders them unusable in practical settings. The latter have not been tested extensively enough to demonstrate their practicality. My work in this area has yielded an ML-based approach that considers a wider range of features than previous work; it was tested extensively and identified 93 new samples of malicious code in npm, the JavaScript package manager.
Bio:
Adriana Sejfia is a Ph.D. candidate at the University of Southern California (USC). Adriana’s research lies in the field of software engineering. Specifically, she seeks to provide developers with actionable insights on their security- and privacy-related tasks by using program analysis and machine learning. Her work has been accepted and presented at conferences such as ICSE, FSE, and ICSA. Adriana’s research has been supported by a Google Ph.D. Fellowship and an Annenberg Fellowship. In the course of her Ph.D. studies, she has also completed research internships at Google and GitHub. Adriana served as a board member of the Women and Gender Minorities Club (WinCC) at USC and is passionate about bringing more equity into the world of computing.