DataX workshop held for researchers who want to incorporate data science and machine learning into their work

Written by
Sharon Adarlo
May 25, 2022
Alfredo Canziani in front of a blackboard.

Alfredo Canziani, assistant professor of computer science at New York University, gave an interactive talk for DataX on May 13, 2022. Photo by Sharon Adarlo.

A two-day DataX workshop that covered a wide range of scientific topics, from Bayesian inference techniques to machine learning in the context of the larger world, was held May 13 and 14 at Princeton University’s Friend Center. According to its organizers, the event, “Tutorial Workshop on Machine Learning for Experimental Science,” was meant to disseminate current topics and techniques in the field so that scholars may advance their research.

“When we organized this event, we wanted to give an entry point on deep learning and Bayesian statistics for scientists in the physical sciences. We saw the workshop as a way to help scholars better understand current research on these topics. That way, they may get ideas on how to apply these techniques to their own area of expertise,” said Michael Churchill, staff research physicist at Princeton Plasma Physics Laboratory (PPPL).

Dmitriy Smirnov and Michael Churchill speak in a lecture hall.

Dmitriy Smirnov, a computer science doctoral student at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory, and Michael Churchill, staff research physicist at Princeton Plasma Physics Laboratory, speak at a DataX workshop on May 13, 2022. Photo by Sharon Adarlo.

Besides Churchill, the event’s other organizers were Hantao Ji, professor in the Department of Astrophysical Sciences, and William Tang, principal research physicist at PPPL and lecturer with the rank of professor in astrophysical sciences.

The workshop was held under Princeton’s Schmidt DataX Initiative, whose mission is to spread and deepen the use of data science and machine learning on campus. The Center for Statistics and Machine Learning (CSML), which oversees portions of DataX, also helped organize the event with the workshop leaders. The next DataX workshop, Synthetic Control Methods, runs June 2 to 3. Read more about it here.

The May workshop started with a lecture by Julian Kates-Harbeck, director of physics modeling at Kernel, a neuroscience start-up. Kates-Harbeck introduced attendees to machine learning and deep learning in the natural sciences. He then gave examples of applications in fusion, specifically predicting disruptions in fusion reactors.

Alfredo Canziani, assistant professor of computer science at New York University, gave an interactive talk that touched on the basics of deep learning and computer vision, and he also led a blackboard lesson based on questions from audience members. One of the topics he discussed was the difference between supervised and unsupervised learning.

Dmitriy Smirnov, a computer science doctoral student at the Massachusetts Institute of Technology’s (MIT) Computer Science and Artificial Intelligence Laboratory, gave a lecture on geometric deep learning. This approach to deep learning builds neural networks to model complex, non-Euclidean data such as molecules, graphs and manifolds.

Julius Adebayo, also an MIT doctoral student in electrical engineering and computer science, discussed the issue of interpretability. He gave his lecture remotely to the audience while Churchill facilitated the talk’s concluding Q&A.

“There’s a common downside to many neural networks or deep learning models – in that people see them as black boxes. You can make a prediction, but you have no idea how that happened,” said Churchill. “Explainable or interpretable A.I., the subject of Julius Adebayo’s talk, allows people to see how a neural network makes a prediction. It’s a way to explain what’s happening in the black box.”

Peter Melchior, assistant professor jointly appointed to astrophysics and CSML, gave two talks: one on Bayesian inference and Markov chain Monte Carlo, and the second on simulation-based inference. The notes to his talks are here.

Jaan Altosaar writes on a blackboard.

Jaan Altosaar speaks and writes during a DataX workshop on May 13, 2022. Photo by Sharon Adarlo.

Jaan Altosaar, who earned a physics doctorate from Princeton in 2020, gave an informal talk on machine learning’s place in the world and on how data may be affected by outside factors. He drew examples from microelectronics, medicine, politics, the COVID-19 pandemic and other domains.

Prasanna Balaprakash, a computer scientist at Argonne National Laboratory, opened the second day with the talk, “Democratizing Deep Learning Development with DeepHyper.” The day then concluded with lightning talks and group breakouts.

Joseph Abbate, a doctoral student at PPPL, attended the workshop and found it rewarding.

“I attended because I wanted to learn the latest techniques -- the field is always changing,” said Abbate, who’s interested in controlling plasma fusion in tokamaks by building predictive machine learning models. “It was a great workshop, with many different perspectives on machine learning. There were definitely techniques I could take into my own research.”

Dmitriy Smirnov speaking in a lecture hall.

Dmitriy Smirnov, a computer science doctoral student at the Massachusetts Institute of Technology’s (MIT) Computer Science and Artificial Intelligence Laboratory, gave a lecture on geometric deep learning at a DataX workshop on May 13, 2022. Photo by Sharon Adarlo.