
An image of a grand design spiral galaxy taken by the Hubble Space Telescope stuns with vibrant pinks and blues and stars scattered like glitter. While images like these convey the triumphant advances of astronomy, not all data can be collected with such high magnification and detail.
In truth, highly detailed observations of galaxies typically cover only a small, specific region of the night sky. Meanwhile, large surveys that scan much broader regions take in far more data but sacrifice detail. This leaves researchers wondering how to merge high-quality and high-quantity datasets into a more holistic scientific understanding of galaxies.
With the help of machine learning, Peter Melchior, assistant professor of statistical astronomy at Princeton University’s Department of Astrophysical Sciences and the Center for Statistics and Machine Learning, is working to solve the problem to maximize the value of the information extracted from these complementary datasets. “We’re trying to find the limit of what the data can tell us in order to tease out more information about galaxies,” said Melchior.
Finding the limit
In a seminar given at the Center for Statistics and Machine Learning on Nov. 26, Melchior discussed the ways that he and his research team are using machine learning to reach beyond the limits of observational astronomy – and recover implicit information from collected data.
Methods of observational astronomy tend to fall into a dichotomy. On one hand, researchers may use a telescope that is extremely capable in one specific way and study a small number of objects, such as galaxies, with it. “And then there is this other style where you use an instrument that is very broadly capable, and use that for something like a billion galaxies, but individually, per galaxy, we get much less information,” said Melchior. The result is that datasets are often too limited to find rare events – those “things that are astonishing and would challenge our theories of how certain things work,” said Melchior.
In order to train a machine learning model to draw further insights from a limited dataset, Melchior and colleagues first train the model on a small, high-quality dataset taken with a high-precision instrument. This allows the model to be more predictive with less data, said Melchior. Then, to gain statistical power, they update the model on a much larger dataset – one that is typically less informative per object because of the less precise instrument that collected it.
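The two-stage idea described above can be illustrated with a toy sketch. This is not Melchior's actual method or data: it is a minimal, hypothetical example in which a simple linear model is first fit to a small, low-noise sample (standing in for the high-precision instrument) and then gently updated on a much larger, noisier sample (standing in for the survey), using a smaller learning rate so the noisy data refines rather than overwrites what was learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical underlying relationship both "instruments" measure.
true_w, true_b = 2.5, -1.0

# Small, high-precision dataset: few objects, very low noise.
x_hi = rng.uniform(-1, 1, size=30)
y_hi = true_w * x_hi + true_b + rng.normal(0, 0.01, size=30)

# Large, low-precision survey: many objects, high noise per object.
x_lo = rng.uniform(-1, 1, size=5000)
y_lo = true_w * x_lo + true_b + rng.normal(0, 0.5, size=5000)

def fit(x, y, w=0.0, b=0.0, lr=0.1, steps=500):
    """Gradient descent on mean squared error for y ≈ w*x + b."""
    for _ in range(steps):
        resid = w * x + b - y
        w -= lr * 2 * np.mean(resid * x)
        b -= lr * 2 * np.mean(resid)
    return w, b

# Stage 1: pretrain on the small high-quality sample.
w0, b0 = fit(x_hi, y_hi)

# Stage 2: update on the large noisy survey with a smaller learning
# rate, so its statistical weight refines the pretrained model.
w1, b1 = fit(x_lo, y_lo, w=w0, b=b0, lr=0.01, steps=200)
```

The same pattern, pretraining on precise data and fine-tuning on abundant data, appears throughout machine learning; the sketch above only shows the structure of the workflow, not the models actually used in the research.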
“Machine learning in the physical sciences is different in important ways from machine learning in other fields,” said Melchior. Physical systems must follow physical laws, so the set of behaviors that can appear in a given dataset is generally specified and restricted – meaning researchers can make certain assumptions about the behavior of a dataset they’re working with. “By knowing how a system behaves, you can apply this in cases when you can't really observe it in detail.”
Beyond using machine learning for astrophysics, over the last three years Melchior has also been working with Reed Maxwell, professor of civil and environmental engineering and the High Meadows Environmental Institute, on physical hydrology models. Together, the researchers have built a continental-scale model for the groundwater distribution in the United States which brings together data taken from various locations and times that may otherwise seem disconnected. “We can tie all of that data together to better estimate drought conditions, wildfire conditions, and also the availability of drinking water and water for agricultural purposes,” said Melchior.
Translating techniques used in astrophysics to be used in a geophysical context is an aspect of his work as a researcher that Melchior said he finds particularly enjoyable. “We can use our research to achieve a large and important societal goal,” he said.