Michael Guerzhoy: new lecturer at Center for Statistics and Machine Learning

Tuesday, Sep 4, 2018
by Sharon Adarlo

Data scientist Michael Guerzhoy will join Princeton University’s Center for Statistics and Machine Learning as a lecturer, effective September 1.

In the fall semester, Guerzhoy will be teaching SML 310: Research Projects in Data Science, a course designed to allow students to explore large-scale data science projects with a dataset of their choice. Lectures and workshops during the course will support students in their individual projects. In the spring, Guerzhoy will teach SML 201: Introduction to Data Science. In the course, students will learn the basics of data science and be introduced to advanced topics, from data visualization to predictive modeling.

“I am excited to come to Princeton and teach here,” he said. “It’s fair to say that Princeton has some of the best students in the United States and the world.”

Guerzhoy has recently held appointments as an assistant professor (status only) at the University of Toronto’s Department of Statistical Sciences and as a data scientist at the Li Ka Shing Knowledge Institute at St. Michael’s Hospital in Toronto.

He earned three degrees from the University of Toronto: bachelor of science in computer science, mathematics, and statistics in 2007; master of science in computer science in 2010; and master of science in statistics in 2014.

He was a software developer on Epson’s image algorithms team from 2005 to 2006, and again in 2007. In this role, Guerzhoy worked on computer vision and image processing research and development for Epson’s scanner and printer drivers. He specifically worked on algorithms for photo orientation detection, image segmentation, and object detection. His object detection projects covered both human and cat faces. He received patents on some of the applications he developed while at Epson.

Previously, Guerzhoy was also a visiting engineer and researcher at Inria, the French National Institute for Research in Computer Science and Automation, as well as a lecturer and researcher at the University of Toronto. Since 2013, Guerzhoy has also worked as an independent consultant on data science, machine learning, and statistical methodology for industrial and academic clients.

These projects varied in scope: Guerzhoy consulted for homesters.com, a crowd-sourced review website for contracting services, on its algorithms for ranking contracting companies; developed and performed statistical analysis of data concerning restaurant chains’ compliance with food safety regulations for Marketplace, a consumer research show on CBC Television; and consulted on machine learning and statistical analysis techniques for several research groups at the University of Toronto.

“We are excited to have Michael Guerzhoy joining CSML,” said Peter J. Ramadge, the CSML director, the Gordon Y.S. Wu Professor of Engineering, and Professor of Electrical Engineering. “Michael has an excellent record of teaching and industrial data science experience. He will add to the intellectual richness of our community here at CSML.”

Guerzhoy, who was born in Moscow, Russia, was not sure whether he wanted to go into physics or another career path until his third year as an undergraduate.

“I had become interested in data and figuring out what data sets say. So I combined my interest in computer science and artificial intelligence,” he said about his journey into data science.

In his role as a hospital data scientist, Guerzhoy has been involved in creative applications of data science. One of his projects involved developing an early warning system for General Internal Medicine (GIM) patients. This system analyzed patient data and predicted whether a patient would need to be transferred to the ICU or was at risk of unexpected death. The system alerted doctors to intervene in order to prevent bad outcomes, he said.

“It’s a very exciting project because by analyzing hospital lab data and vital signs, we can predict the risk of adverse outcomes, and in so doing, help doctors and patients in real time,” he said.

Another interesting hospital project involved processing dictated clinical notes. He led a project with the goal of automatically extracting information, such as the diagnosis or the prescribed medication, from the notes.

“The amount of data being collected at hospitals has grown in the last five years,” he said. “Clinicians and hospital administrators have increasingly realized this data can be very useful.”

Why has interest in data science and machine learning exploded in recent years?

Guerzhoy said it’s because organizations increasingly realize the value of collecting and analyzing data to improve their goods, their services, and their bottom line. In the case of hospitals, it can improve patient outcomes and increase the effectiveness of doctors and nurses. The explosion in available data and advances in hardware and algorithms have been important enabling factors, he said.

“The internet has enabled researchers to collect vast amounts of data. For example, Facebook handles and stores data that is truly huge,” he said. “Advanced hardware and better algorithms have enabled people to fit complex models to such large datasets, and to make accurate predictions about the data. This was not possible a few years ago.”