Lucy Li

Position: PhD Candidate

Institution: University of California, Berkeley

Depictions and Expressions of Social Groups and Identities in Natural Language Processing

Research Abstract:

My doctoral research in natural language processing investigates how language expressed by social groups may vary across audiences, and how portrayals of people can reflect beliefs and perceptions around them. My work engages with multiple ways in which people and language data intersect: language is produced by people for their audience, and sometimes to talk about people. I address two key questions: 1) How can NLP measure context-dependent depictions and expressions of people in text? 2) In turn, how can these measurements inform how we should build better models? Though I primarily publish in NLP venues, my research intersects with data mining, cultural analytics, computational social science, and AI fairness. I have measured social aspects of language in contexts ranging from individual sentences to hundreds of communities. The data I analyze includes language model pretraining data and output behavior, as well as text from school curricula, social media, and science. I question defaults and assumptions: the default representation of social groups in datasets and models, and who is excluded or disadvantaged by these defaults. I emphasize social scientific questions in my research because they can motivate new tasks and approaches, raising the bar for the effective analysis and distillation of data. Interpretability, robustness, and accessibility of methods are challenges that prevent core NLP approaches from being immediately transferable to real-world applications. For example, I am currently experimenting with large language models (LLMs) to measure the representation of characters of color in literature taught in U.S. classrooms. To do so, I must design approaches that adequately consider theory from social psychology and mitigate the degradation of current models’ performance across names of different cultural and ethnic backgrounds. Thus, my research benefits both scientific inquiry and NLP methodology.

Bio:

Lucy Li is a PhD student at the University of California, Berkeley, working on natural language processing (NLP), computational social science, cultural analytics, and AI fairness. She researches how social groups are discussed and represented in language models and textual data, such as textbooks, fiction, and online forums. She is passionate about bridging NLP with the humanities and social sciences, especially education and curriculum studies. She is supported by an NSF Graduate Research Fellowship and has interned at Microsoft Research and the Allen Institute for AI, the latter of which awarded her Outstanding Intern of the Year.