Across campus, Princeton researchers use machine learning to aid discovery

Written by
Allison Gasparini
Feb. 12, 2024

How does a researcher make sense of the data they collect over what might be years of experiments? They may have to comb through masses of data looking for patterns, or they may want to use the data they have to predict information they don’t. Increasingly, researchers across academic disciplines are using machine learning to make sense of their data and ultimately aid their research.

On January 17, the Center for Statistics and Machine Learning hosted a workshop exploring the ways in which machine learning can enhance research efforts across campus. The center organized the workshop as a part of Princeton University’s Wintersession series, which invites students and employees across campus to sign up for courses that cover subjects ranging from STEM to arts and crafts. 

Attendees ranging from undergraduate students to postdoctoral researchers gathered at the CSML building on Prospect Ave., largely with a shared goal in mind. “I’m here because I want to learn more about machine learning,” was the sentiment echoed by most participants, who came from a range of backgrounds. Over the course of the afternoon, four researchers from different departments across the Princeton campus discussed how machine learning has enhanced and informed their own work.

AI as an aid for research across campus

“Machine learning use has been increasing exponentially in biology,” said Brian Arnold, who works as a data scientist in the Department of Ecology and Evolutionary Biology. He explained that when it comes to research, machine learning is useful for pattern recognition and predicting unknown quantities. For example, machine learning could be used to predict biological classifications from traits or information contained in the DNA of organisms. Arnold said that biologists have successfully used machine learning to study extinct species. With the help of AI in predicting the interactions of extinct species, researchers can recreate the evolution of food webs through time. 

In his talk, Arnold also emphasized that different types of machine learning models work best with different datasets. For researchers trying to determine whether machine learning can enhance their work, Arnold offered this guidance: “If statistics can aid your research, then machine learning probably can as well.”

Professor of Psychology and Computer Science Tom Griffiths presents his talk at the 2024 Wintersession workshop hosted by the Center for Statistics and Machine Learning. 

Machine learning can also be used to make predictions about the contents of images and videos. Christine Allen-Blanchette, Assistant Professor of Mechanical and Aerospace Engineering, discussed how researchers can build better machine learning models by including information on the physical laws that govern the dynamics of an object. Like Arnold, Allen-Blanchette said machine learning is a handy tool for predicting information that is unknown to researchers.

For example, Allen-Blanchette co-founded the Machine Learning 4 Political Economy and Race Lab, where researchers create models that explore patterns in socioeconomic inequality. In this research, Allen-Blanchette said, the lab has considered relationships between a population’s political involvement and its demographic data. A key question of the research asks, “Can we train a neural network to predict who’s voting and who’s not?”

Within the field of psychology, researchers are using machine learning to predict human behavior. In his work, Professor Tom Griffiths asked how well people’s similarity judgments could be predicted. He and his colleagues took a dataset of various animal images and asked people to compare the images and rate how similar they were. They found that a machine learning model did well at predicting how similar a person would deem two different images to be.

Psychologists tend to study the mind through questions boiled down to their most fundamental forms, using basic tasks that can be studied in the laboratory. Now, machine learning is allowing researchers to work with more expansive datasets. “We’re used to theories being simple things,” said Griffiths. Machine learning is making the process more complex. Instead of searching for which theory is “right,” said Griffiths, psychologists are now asking “where can different theories help us understand different things.”

In the Department of Philosophy, Professor Sarah-Jane Leslie is working with machine learning in the hope of understanding how large language models (LLMs) operate similarly to humans and how they differ. LLMs – perhaps the best known being ChatGPT – are trained on giant swaths of text taken from all around the internet. Leslie specifically uses types of statements known as generics to study how LLMs understand language versus how humans do.

An example of a generic would be “Ducks lay eggs.” It’s a generalization that has exceptions: in reality, fewer than half of ducks lay eggs, because male ducks don’t lay eggs and some female ducks are infertile. Leslie studied whether LLMs understand the exceptions to generics. She said that about 86 percent of the exceptions to generics generated by GPT-3 were good (e.g., for the generic “birds can fly,” GPT-3 said exceptions included birds that are very heavy, birds that are very old, and penguins). Yet, while humans tended to overgeneralize less when reminded of the exceptions to generics, LLMs generalized more.

By the end of the session, attendees had gotten a glimpse of the wide variety of ways that machine learning is being used in departments across campus as a tool of discovery. Professor Griffiths, who is also the Director of the Center for Statistics and Machine Learning, summed it up: “Machine learning is transforming the way we do research all over the Princeton campus.”