Dating websites are not only a great place to meet a potential mate but also a treasure trove of data.
Some Princeton University students recently got the chance to delve into a data set scraped from the dating website OkCupid. They sought connections between people’s backgrounds and proclivities, and created data visualizations illustrating their findings. The students were taking part in a data visualization contest that Michael Guerzhoy, a lecturer at Princeton’s Center for Statistics and Machine Learning (CSML), held in his course, SML 201: Introduction to Data Science. The winners’ submissions were showcased during the last lecture of the semester on May 2. One student project examined the correlation between educational attainment and reported drug use, while another team investigated the relationship between the occupations of online daters and their Myers-Briggs personality types.
Guerzhoy’s course covered fundamental tools and techniques for data-driven research, including computer programming, statistics and machine learning, and the use of real-world data sets. The ultimate goal was for students to “learn to draw conclusions using sound statistical reasoning and to produce scientific reports,” according to Guerzhoy. The entries for the data visualization contest, which students could enter to earn extra credit, were judged on their originality, on the quality of the presentation, and on whether they provided an interesting, comprehensible demonstration of trends in the data.
“A lot of the time, people do exercises with small-scale standard data sets, with a predetermined final conclusion,” said Guerzhoy. “But with this contest, we are trying to get people to creatively explore and present novel results instead of performing a rote exercise. Everybody knows something about dating and that’s the advantage of this data set: anyone can form hypotheses.”
A total of 61 students from the two lecture sections of SML 201, over half of the class, participated in the contest. Guerzhoy presented the work of several students and awarded the winners squishy CSML-branded stress balls in Princeton orange.
Vinicius Wagner and Hari Raval, both from the class of 2021, explored the relationship between extroverted/introverted personality traits and reported occupation. They found that people who go into business tend to be extroverts, while people who go into academia and STEM tend to be introverts. They won a gold medal for their work.
Wagner said they used users’ self-reported Myers-Briggs personality types, as well as a list of keywords associated with introversion and extroversion to determine whether a person was an introvert or an extrovert based on their dating profile.
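The approach Wagner describes can be illustrated with a minimal Python sketch. It assumes the self-reported Myers-Briggs type is stored as a four-letter string whose first letter (I or E) settles the question, and it falls back to keyword counting only when no type is given. The keyword lists and the `classify` function are hypothetical stand-ins; the students' actual lists and code are not described in this article.

```python
import re
from typing import Optional

# Hypothetical keyword lists, for illustration only.
INTROVERT_WORDS = {"quiet", "reading", "homebody", "introvert", "alone"}
EXTROVERT_WORDS = {"party", "outgoing", "social", "extrovert", "friends"}


def classify(essay: str, mbti: Optional[str] = None) -> Optional[str]:
    """Label a profile 'introvert' or 'extrovert', or return None if undecided."""
    # Prefer the self-reported Myers-Briggs type: its first letter is I or E.
    if mbti and mbti[0].upper() in ("I", "E"):
        return "introvert" if mbti[0].upper() == "I" else "extrovert"
    # Otherwise, count keyword matches in the lowercased profile text.
    words = re.findall(r"[a-z]+", essay.lower())
    intro = sum(w in INTROVERT_WORDS for w in words)
    extro = sum(w in EXTROVERT_WORDS for w in words)
    if intro == extro:
        return None  # a tie or no evidence leaves the profile unlabeled
    return "introvert" if intro > extro else "extrovert"
```

For example, `classify("", "INTJ")` returns `"introvert"`, while `classify("Looking for a party with friends")` falls back to the keywords and returns `"extrovert"`.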
Betsy Vasquez ’20 and Kalyn Nix ’21 received a silver medal for illustrating the pitfalls of data visualization in their project. They computed the number of times that users with different astrological signs used the word “love” in their profiles. They produced a plot that exaggerated differences in the usage of “love” by different zodiac signs by cropping the y-axis, and titled it “Studies show that Aquariuses are the most loving.” However, as Vasquez and Nix’s second plot illustrated, for people of all signs, the percentage of profiles that mention “love” ranged only from 17.5 to 19.1 percent. Nix explained that the differences between the signs in mentions of the word “love” are not statistically significant.
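One common way to check whether such differences are statistically significant is a chi-square test of independence on a table of signs versus “mentions love / does not.” The Python sketch below is an illustration of that standard test, not the students’ actual analysis. It assumes profiles arrive as hypothetical `(sign, essay)` pairs and counts a mention when the standalone word “love” appears (so “lovely” or “gloves” do not count, an assumption of this sketch). With all 12 signs the test has 11 degrees of freedom, and a statistic below about 19.675 means the differences are not significant at the 5 percent level.

```python
import re
from collections import Counter

# Matches "love" as a whole word only; this matching rule is an assumption.
LOVE = re.compile(r"\blove\b")


def love_rates(profiles):
    """Map each sign to (profiles mentioning 'love', total profiles)."""
    mentions, totals = Counter(), Counter()
    for sign, essay in profiles:
        totals[sign] += 1
        if LOVE.search(essay.lower()):
            mentions[sign] += 1
    return {s: (mentions[s], totals[s]) for s in totals}


def chi_square_stat(rates):
    """Pearson chi-square statistic for a (signs x {mention, no mention}) table.

    Assumes the pooled mention rate is strictly between 0 and 1,
    otherwise an expected count would be zero.
    """
    grand_total = sum(t for _, t in rates.values())
    grand_mentions = sum(m for m, _ in rates.values())
    p = grand_mentions / grand_total  # pooled share of profiles mentioning "love"
    stat = 0.0
    for m, t in rates.values():
        # Compare observed counts with those expected if sign made no difference.
        for observed, expected in ((m, t * p), (t - m, t * (1 - p))):
            stat += (observed - expected) ** 2 / expected
    return stat
```

Because the test works on the counts themselves, it is unaffected by how a chart’s y-axis is cropped, which is exactly the point the two plots make.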
“We basically wanted to show that depending on how you present the data, it can tell you different things. We have two different graphs showing the same thing. We just changed the axes,” said Vasquez.
“It was a lot of fun. We had a lot of laughs,” said Nix. “Everybody has an opinion about horoscopes.”