From analyzing fairness to detecting bias in videos, CSML poster session shines with student ingenuity

Monday, Jun 1, 2020
by Sharon Adarlo

In recent years, machine learning algorithms have become increasingly common in everyday life as people employ them to automate tasks and support complex decision-making in sectors ranging from banking to criminal justice. But these algorithms can have built-in biases, reflecting the complexity of the tasks and the inherent biases in the data sets used to train these tools.

Raluca Cobzaru '20, a mathematics major, set out to explore biases in classification algorithms by formulating the concept of fairness mathematically and examining procedural solutions that mitigate these biases.

"These are decisions that will impact people's lives," she said. "We want them to be as accurate and fair as possible."

She presented her project as part of the annual poster session held by the Center for Statistics and Machine Learning (CSML) in May. This year, the event was held online due to the social distancing restrictions stemming from the COVID-19 pandemic. Despite the novelty of the format, the session was a success and drew 72 student participants. That number exceeds last year's 62, reflecting the growing interest in data science on campus.

Student projects ranged from fundamental research on machine learning algorithms to an analysis of politically biased videos on YouTube to an exploration of classical music. The diversity of projects mirrored the many disciplines and departments represented.

The participating students, a mix of juniors and seniors, came from a wide range of disciplines: computer science, electrical engineering, mathematics, politics, psychology, astrophysics, economics, sociology, molecular biology, history, linguistics, neuroscience, African American studies, operations research and financial engineering, and public and international affairs.

The students' projects are a component of CSML's Undergraduate Certificate Program in Statistics and Machine Learning. A final requirement of the program is an independent project that incorporates data science in a significant way and a presentation at the poster session. 

Two students received special recognition this year for their research projects: 

John Hallman '20, a mathematics major, was honored for his work titled "Non-Stochastic Control with Bandit Feedback." Hallman developed a control system approach for linear dynamical systems using only bandit feedback. His adviser was Elad Hazan, a professor of computer science.

Florence Wang '21, a WWS major, was honored for her work titled "'Punitive Taxes on the Poor'? A Quantitative Look at Social Stratification in U.S. Lottery Play." Wang examined what statistical factors contribute to lottery play by low-income groups. Her adviser was Timothy Nelson, lecturer in sociology and public policy.

"There was a lot of great work and a wide range of subjects that the students tackled. The students asked interesting questions in their projects and utilized the skills they learned in their CSML classes," said Peter J. Ramadge, the CSML director. "Overall, we are pleased with the students' high-level efforts, especially in light of the pandemic disruption on campus."

Ryan P. Adams, CSML undergraduate program director and professor of computer science, also applauded the projects.

"The students were very creative in their projects and the topics they chose. This event is also a chance to highlight what students take away from the CSML curriculum, which sets them up for success as researchers or industrial practitioners," he said.

Raluca Cobzaru said she found all aspects of her project useful and that it has informed her career path. After graduation, Cobzaru is enrolling at the Massachusetts Institute of Technology for a doctoral degree in operations research. She is interested in the intersection between statistics and policy.

"I felt a sense of accomplishment when I did my project," she said. "It is my first independent project that I nurtured and grew from beginning to completion."

Besides formulating fairness as a mathematical concept, Cobzaru's project examined whether different fairness criteria proposed by scholars are compatible with one another and whether existing algorithmic solutions can mitigate fairness issues in classification procedures.
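For readers curious what a mathematical fairness criterion can look like, here is a minimal sketch. It is not Cobzaru's method, and the article does not say which criteria she studied. The two criteria shown (often called demographic parity and equal opportunity in the fairness literature) and the toy data are assumptions chosen only for illustration. The example shows how two criteria can pull apart: a classifier that is perfectly accurate satisfies one but not the other when the groups have different base rates.

```python
# Illustrative only: toy labels for two hypothetical groups.
# 1 = positive outcome, 0 = negative outcome.

def positive_rate(preds):
    """Share of individuals the classifier labels positive."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Share of truly positive individuals the classifier labels positive."""
    hits = [p for p, y in zip(preds, labels) if y == 1]
    return sum(hits) / len(hits)

# Hypothetical ground truth with different base rates in each group.
labels_a = [1, 1, 1, 0]
labels_b = [1, 0, 0, 0]

# A perfectly accurate classifier predicts exactly the true labels.
preds_a = list(labels_a)
preds_b = list(labels_b)

# Demographic parity compares how often each group is labeled positive.
parity_gap = positive_rate(preds_a) - positive_rate(preds_b)

# Equal opportunity compares true positive rates across groups.
opportunity_gap = (true_positive_rate(preds_a, labels_a)
                   - true_positive_rate(preds_b, labels_b))

print("demographic parity gap:", parity_gap)
print("equal opportunity gap:", opportunity_gap)
```

In this toy case the equal opportunity gap is zero while the demographic parity gap is not, which is one simple way that fairness criteria can fail to hold at the same time.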

"The data sets we have in certain fields may be biased. Criminal sentencing data, for example, does not reflect crime rates, but rather arrest rates for different groups," said Cobzaru, claiming that merely making judgments based on historical data will perpetuate any existing biases. 

In her project, Farrah Lee-Elabd '20, a psychology major, examined whether people of different racial and cultural backgrounds can distinguish fake from genuine smiles, both within their own demographic group and in other groups. Lee-Elabd showed 95 students false and genuine smiles. The participants were white Americans, Chinese raised in America, and Chinese raised in China. She used three data science techniques for her project, including a machine learning tool for facial analysis. This tool validated whether the smiles she collected were fake or real.

White Americans performed best, Chinese raised in America performed second best, and Chinese raised in China performed worst, Lee-Elabd said, cautioning that the results are inconclusive and may require further study. For example, the smiles from Chinese raised in China may be easier to detect than other smile images.

Max Piasevoli, a senior in computer science, wanted to know whether the YouTube recommendation algorithm had a political bias. This question is essential because YouTube has become a breeding ground for conspiracy theories and extremist ideology.

"I spend a lot of time on YouTube and read that it has more viewers between the ages of 18 to 49 than any single news channel, and its recommendation algorithm drives 70 percent of views. I was curious how ideologically diverse are the recommendations of the YouTube recommendation algorithm," he said. "I wanted to put this to the test."

Piasevoli trained a logistic regression classifier to predict the bias of various videos. 

"Ultimately, I found that YouTube pushes users towards partisan videos with a slight bias in favor of Democratic videos. Additionally, the average separation between divergent videos in the network graph suggests that YouTube does not have echo chambers," he said.

Rohan Rao, a senior in mathematics, set out to improve algorithms that reconstruct molecules from electron microscope data. Because electron microscope images are noisy, it is difficult to reconstruct a molecule's structure from them.

"In this study, researchers have a lot of pictures of a molecule from different angles. My project aimed to efficiently cluster images of molecules in groups from similar viewing angles. When you have images from a similar angle, you can average them out to improve the quality of the images," he said.

Rao developed a tool that outperforms current methods, according to preliminary data. He plans on continuing to work on this project this summer because the results are encouraging. 
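The averaging step Rao describes can be illustrated with a short sketch. This is not his algorithm: it skips the clustering entirely and simply assumes a set of noisy images already known to share a viewing angle. The one-dimensional "image," the Gaussian noise level, and the number of copies are all made-up values for illustration.

```python
import math
import random

def rmse(a, b):
    """Root-mean-square error between two equal-length pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def noisy_copy(signal, noise_std, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]

def average_images(images):
    """Pixel-wise mean of a list of equal-length images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical clean "image": a 64-pixel strip.
clean = [math.sin(i / 5.0) for i in range(64)]

# 100 noisy views assumed to come from the same viewing angle.
copies = [noisy_copy(clean, 1.0, rng) for _ in range(100)]

single_error = rmse(copies[0], clean)
averaged_error = rmse(average_images(copies), clean)

print("error of one noisy image:", round(single_error, 3))
print("error after averaging:", round(averaged_error, 3))
```

Because the noise in each copy is independent, averaging shrinks it roughly in proportion to the square root of the number of images, which is why grouping images by viewing angle first is valuable.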

"We need to run more experiments at scale and verify if this algorithm outperforms the current one everybody is using," he said. "We want to improve our confidence in the results and see if this algorithm will be useful in practice."

Nikoo Karbassi, a junior in sociology, investigated whether there is an association between social isolation in early childhood and the frequency of social media use in adolescence.

"Social isolation is a viable factor that could make some children more vulnerable to excessive social media use," said Karbassi, who looked at the data set from Princeton's Fragile Families & Child Wellbeing Study. 

She estimated social isolation from how much time a child spends with family and with people outside the family, such as a non-parental figure.

She looked at social isolation indicators at age 9 and the frequency of social media use at age 15, then ran linear regressions between the two.
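As a rough illustration of what a simple linear regression computes, the sketch below fits a least-squares line to two lists of numbers. The variable names and values are hypothetical and are not taken from the Fragile Families data or from Karbassi's analysis. Judging whether a slope is statistically significant requires an additional test that is not shown here.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy values: an isolation score at age 9 and a
# social media frequency score at age 15 for six children.
isolation_score = [0, 1, 2, 3, 4, 5]
social_media_score = [3, 2, 3, 2, 3, 2]

slope, intercept = ols_fit(isolation_score, social_media_score)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

A slope close to zero, as in this toy data, means the fitted line predicts nearly the same social media score regardless of the isolation score.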

"I didn't find any significant association. My results suggest that individuals socially isolated in early childhood use social media at the same frequency as individuals who were not," said Karbassi, who noted more studies are needed to validate her findings. 

In addition to the results and the skill set she acquired, Karbassi found the whole process of developing her project rewarding.

"This is my first extensive project," she said. "It taught me to take the initiative with research and go after questions I wanted to ask."