Amy Winecoff, DataX data scientist, teaches a workshop during Princeton University's Wintersession.
With machine learning making headway in a variety of research fields and industries and garnering media headlines, a five-day Wintersession mini-course offering an introduction to machine learning became a popular draw, with more than 200 people signing up for at least one of the five days.
The mini-course, Introduction to Machine Learning, was held from January 17th to the 24th, and was organized by the Princeton Institute for Computational Science and Engineering (PICSciE) and OIT Research Computing, with the Center for Statistics and Machine Learning (CSML) serving as a co-sponsor. Attendees were a mix of undergraduates, graduate students and postdocs, drawn from various departments across campus, according to organizers.
“The idea was to give attendees an overview of machine learning and get them exposed to both simple conventional models and to more advanced models such as neural networks, so they could work on computer vision and natural language processing if they should choose in the future,” said Jonathan Halverson, the research software and computing training lead with PICSciE and OIT Research Computing. “The mini-course is also intended to educate attendees on what’s happening in artificial intelligence today, specifically advanced chatbots and generative image AIs such as DALL-E and Stable Diffusion.”
The mini-course culminated on the last day in a hackathon in which attendees tackled different problems such as analyzing NFL statistics, gauging the sentiment of movie reviews, and classifying images.
Two DataX data scientists participated as session leaders: Brian Arnold and Amy Winecoff. Arnold and Winecoff are two of several data scientists hired as part of Princeton University’s Schmidt DataX Fund, which aims to spread and deepen the use of machine learning on campus. CSML oversees part of this initiative and has been heavily involved in hiring and mentoring data scientists. Arnold works on biomedical data science projects, while Winecoff works in the Center for Information Technology Policy (CITP).
On the first day of the mini-course, Arnold gave a broad introduction to machine learning. He introduced attendees to a variety of simple algorithms such as k-nearest neighbors, a type of data classification method that predicts a data point’s grouping according to the classification of neighboring data points. Arnold also covered more complex algorithms such as neural networks. Arnold said he hoped the session would inform attendees’ future endeavors.
“We went through an example for students on how to use machine learning, end to end. This involved downloading the data, processing it, and applying the model. This provided them a template they can use for their own research. And I hope they got an intuition on what machine learning is, how to use it, when to use it, and when not to use it,” said Arnold.
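To illustrate the k-nearest neighbors idea mentioned in Arnold's session, here is a minimal sketch in Python. It is not the code used in the mini-course; the function name `knn_predict`, the toy data points, and the choice of k = 3 are assumptions made purely for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: two well-separated groups in 2D
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

A new point is assigned to whichever group most of its closest neighbors belong to, so a point near the first cluster is labeled "A" and a point near the second is labeled "B".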
Winecoff taught one session on evaluating and improving models.
“The goal of this workshop was to provide students with an overview of how to conduct evaluations of machine learning models that provide honest estimates of model performance. In addition to covering standard practices in structuring machine learning model evaluations, the workshop also covered common pitfalls that can lead to inaccurate, overly optimistic estimates of model performance,” said Winecoff.
Gage DeZoort, a graduate student in the physics department whose research uses machine learning in experimental particle physics, led a session that went into greater depth on neural networks. He also led a hackathon on the last day in which attendees used a small dataset of 4,000 images to train a convolutional neural network to accurately classify new images.
“It’s not an easy problem because the images are not clean images of simple objects. For example, the images contain clutter and there can be object occlusions,” said DeZoort.
Christina Peters, a postdoctoral researcher at the University of Delaware’s Department of Computer & Information Sciences, taught a session on simple classical machine learning methods such as classification, regression and clustering. On the last day, she also led a hackathon on analyzing data on NFL quarterbacks to detect informative correlations in the dataset.
“I hope the students who are doing active research came away with appropriate tools or methods they can apply to their own work, while students who are curious about machine learning came away with a better understanding of it,” said Peters.
Halverson led a hackathon on natural language processing, specifically on developing a model to gauge the sentiment of movie reviews from the IMDB website.
Claire Willeck, a doctoral student in politics, found the mini-course to be informative, valuable, hands-on, and practical. She is studying ways to teach civics so that young people become engaged in the political system.
“I use machine learning in my research to classify audio data in classroom environments,” she said. “We have statistics training in the politics department, and I wanted to get a refresher on these topics – so that’s why I signed up. But I am also interested in machine learning in general.”