SML 201 reaches peak enrollment with Daisy Yan Huang and Ricardo Masini as co-lecturers

Wednesday, Nov 24, 2021
by Sharon Adarlo

With a return to campus this fall, the Center for Statistics and Machine Learning’s flagship undergraduate course, SML 201: Introduction to Data Science, opened its enrollment to more than 150 students, the highest in its history and a sign of the burgeoning interest in data science and machine learning.

“Data science just became so popular in recent years. Many students recognize that knowing data science and machine learning can get them a leg up in applying for internships and can further their eventual career,” said Daisy Yan Huang, CSML lecturer, who is co-teaching the class with Ricardo Masini.

“Many times, I have students come back to me and say they are applying for an internship, and often these internships have an analysis component,” she said. “In that regard, the course has helped them understand the basic issues of data analysis and the tools that are available to address these issues.”

The class, which was introduced in spring 2016, is geared toward teaching essential tools for conducting research that involves analyzing datasets. Students learn to work with a variety of real datasets, write computer code to extract information and find hidden patterns in data, draw conclusions using sound statistical reasoning, and produce scientific reports presenting conclusions based on real data. Along the way, they learn basic data analysis techniques such as data visualization, computer simulation, creating reproducible research reports, hypothesis testing, and building and evaluating multivariate linear regression models.

Every year since its inception, enrollment and interest in the course and in the CSML undergraduate certificate program have grown. For example, 10 students earned the certificate in 2014, the year the program started. In 2021, 104 students earned certificates.

“The SML 201 course has been essential in introducing data science and machine learning to students on campus. It’s a great entry point into this lively discipline,” said Peter Ramadge, CSML director. “In turn, the course has helped fuel enrollment in our undergraduate certificate program, which has attracted majors from a wide variety of disciplines across campus.”

Current enrollment in SML 201 includes students from computer science, engineering, physics and mathematics, as well as social science, politics, ORFE, psychology, economics, molecular biology, and even a few from the humanities, such as history, art & archaeology, and Slavic literature, said Huang and Masini. The lecturers say the course is accessible to a broad range of students: no prior programming or statistics experience is required.

The fall semester class had a unique setup: because Huang and Masini are co-teaching, handling the large number of students is easier, they said.

Huang led lectures in the first half of the semester, which focused on teaching students to code in R and on statistical visualization. Masini is leading lectures in the second half, which centers on statistical inference from datasets. While one lecturer leads the twice-weekly lectures, the other works in the background leading precepts, they both said.

The two teachers have sought to make the course relevant and exciting to students by using real-world examples in their lessons. One exercise involved data from speed dating events in New York City.

“We try to understand female and male behavior and see how participants behave during the speed dating event. We also try to see what kind of personality qualities (e.g., attractiveness, intelligence, being fun, ambition, etc.) the participants are looking for in the opposite sex. In particular, are their self-reported preferences consistent with their selection criteria?” said Huang. “For example, a person may say they really like strawberry ice cream, but behavioral data shows they actually like caramel because they ordered caramel most of the time when they visited an ice cream parlor, even when the strawberry flavor was available.”

They have also had students look at election data, particularly the 2016 race between Hillary Clinton and Donald Trump, which many predictive polling models got wrong, Huang said.

“We ask the students to look back at the election data and see what possibly went awry,” Huang said. “It’s a really fun, educational exercise.”

With regular classes ending the week of December 6, both Huang and Masini hope that students can look back and find value in what they have studied in the course.

“I expect them to be able to perform insightful analysis with data and have an appreciation for statistical analysis,” said Masini.

“I hope they had fun,” said Huang. “I want them to finish the course and say, ‘Hey, I really like data science. I want to take another data science course.’ ”