Since the launch of the Gaia satellite in 2013, scientists have been receiving data of unprecedented quantity and quality on a billion stars in the Milky Way. But the data sets are incomplete: Gaia does not currently provide complete position and velocity coordinates for all billion stars.
Mariangela Lisanti, associate professor of physics, wants to harness machine learning techniques to help fill in the missing information. She plans to use the completed data set to figure out which stars were born here and which were dragged into the Milky Way by cosmic collisions. This classification can be leveraged to further understand how dark matter is distributed within the galaxy.
This project was one of the first nine funded last year by Princeton University’s Schmidt DataX Fund, which aims to spread and deepen the use of data science and machine learning across campus to accelerate scientific discovery. In February last year, the University announced the new fund, which was made possible through a major gift from Schmidt Futures.
“The first DataX projects are interdisciplinary in scope and cover a wide range of fields, from chemistry to cosmology,” said Peter Ramadge, director of the Center for Statistics and Machine Learning, which oversees parts of the DataX Fund. “The breadth of the projects shows that modern data science and machine learning are broadly applicable research tools helping expand our knowledge of the world.”
In Lisanti’s project, machine learning can help her team “reconstruct the family tree of the galaxy and the history of collisions that built up the galaxy today,” she said.
“If we have a better understanding of the Milky Way, we can also use that to map the dark matter distribution,” she added.
For this research, she collaborated with Timothy Cohen, associate professor of physics at the University of Oregon, and a team made up of several postdocs, grad students and undergrads on campus.
In the first arm of the project, Lisanti’s team used machine learning to estimate missing data from the Gaia data set. This enables researchers to more easily find interesting patterns among the positions and velocities of the stars. Any unusual clustering of the stars could be evidence of “debris” left behind by another galaxy that merged with our own and got shredded over time, thus showing how our own galaxy has evolved, Lisanti said.
The team has successfully developed a procedure to learn the distributions and correlations of the missing data. They are currently testing the method thoroughly on simulated catalogs of Gaia data before applying it to the real data. Additionally, work has already started on the other two arms of the project. One undergraduate has built expertise in simulating these galactic mergers, which is a crucial ingredient for interpreting the results of the team’s future analyses of Gaia data. Another undergraduate has started developing statistical tools that will be used to better quantify clustering patterns in the stellar data.
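To give a rough sense of what “learning the distributions and correlations of the missing data” can mean in practice, here is a minimal Python sketch. It is not the team’s method, which the article does not describe. It assumes, purely for illustration, that three stellar quantities follow a joint Gaussian distribution, and it fills a missing value with its conditional mean given the observed ones. The function names (`fit_gaussian`, `impute_conditional_mean`, `make_toy_catalog`) and all numbers are hypothetical, and the catalog is synthetic, not Gaia data.

```python
import numpy as np


def fit_gaussian(complete_rows):
    """Estimate the mean vector and covariance matrix from fully observed rows."""
    mean = complete_rows.mean(axis=0)
    cov = np.cov(complete_rows, rowvar=False)
    return mean, cov


def impute_conditional_mean(row, mean, cov):
    """Fill NaN entries of one row with their conditional Gaussian mean.

    Uses E[x_m | x_o] = mu_m + C_mo C_oo^{-1} (x_o - mu_o), where "m" marks
    missing and "o" marks observed components.
    """
    missing = np.isnan(row)
    filled = row.copy()
    if not missing.any():
        return filled
    observed = ~missing
    if not observed.any():
        # Nothing observed: the best Gaussian guess is the overall mean.
        filled[:] = mean
        return filled
    cov_oo = cov[np.ix_(observed, observed)]
    cov_mo = cov[np.ix_(missing, observed)]
    residual = row[observed] - mean[observed]
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, residual)
    return filled


def make_toy_catalog(n_stars=2000, seed=0):
    """Draw a synthetic catalog of three correlated quantities (assumed values)."""
    rng = np.random.default_rng(seed)
    true_mean = np.zeros(3)
    true_cov = np.array([
        [1.0, 0.3, 0.8],
        [0.3, 1.0, 0.5],
        [0.8, 0.5, 1.0],
    ])
    return rng.multivariate_normal(true_mean, true_cov, size=n_stars)


# Hide the third quantity for the first 500 toy stars, then fill it back in.
catalog = make_toy_catalog()
masked = catalog.copy()
masked[:500, 2] = np.nan
mean, cov = fit_gaussian(masked[500:])
filled = np.array([impute_conditional_mean(r, mean, cov) for r in masked])
```

In this toy setting, the filled-in values can be compared against the hidden true values, which mirrors the article’s point about testing on simulated catalogs before turning to real data. A real analysis would likely need a far more flexible model than a single Gaussian.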
“When we perfect all these new methods, we can then apply them to the Gaia data set,” Lisanti said. “They will help us interpret the Gaia information and we can identify patterns and signs that will help us in our understanding of the universe or uncover new, unexpected information.”
Lisanti, who has been buoyed by the project’s results so far, also sees a promising future for machine learning in her research field. Machine learning is such a new tool that many theoretical particle physicists like herself are “throwing a lot of problems” at it to see where it can be of most use, Lisanti said.
“I am a high energy particle physicist so I am interested in the fundamental nature of matter in the Universe,” said Lisanti. “The kind of work I do involves a lot of data mining. Machine learning tools can dive into these massive amounts of data coming from satellites and particle colliders, find interesting patterns and help us elucidate how things work and change in the universe.”