
The Center for Statistics and Machine Learning is a focal point for education and research in data science at Princeton University. By its nature, CSML is an interdisciplinary enterprise. The center’s mission is to foster and support:

  • a community of scholars addressing the manifold challenges of modern data-driven exploratory research
  • the development of innovative methodologies for extracting information from data
  • the education of students in the foundations of modern data science

The center supports and collaborates on research and teaching that combine insights from computation, machine learning, and statistics with specific application domains. To encourage a flow of ideas, CSML welcomes connections with faculty, departments, centers, and institutes across the Princeton campus. In addition to exploring novel applications, the center supports innovations in the theoretical foundations of data science, including advanced algorithms for big-data problems, machine learning, optimization, and statistics.

Established in July 2014, the Center for Statistics and Machine Learning is part of a rich and influential history in data science at Princeton University. Individuals such as Samuel Wilks, John Tukey, William Feller, Alonzo Church, Alan Turing, and John von Neumann played key roles in pioneering the use of statistics, probabilistic models, and computers to solve real-world problems. The Cooley–Tukey FFT algorithm (1965) and the initiation of the ImageNet database (2009) are two prominent examples of Princeton’s prior contributions to data science.

The center is housed at 26 Prospect Avenue (the Bendheim Center for Finance building, formerly Dial Lodge).

Barbara Engelhardt, an assistant professor of computer science at Princeton and a member of the SML Certificate Executive Committee, is a principal investigator with the GTEx Consortium, an international group of researchers studying the diversity of genetic roles in maintaining human tissues.

Olga Troyanskaya and her team have developed techniques to comb large collections of genomic and other data to make fundamental discoveries and identify new therapeutic targets.

Cathy Chen remembers wondering, as a freshman, how her interests in applied math, algorithms, and programming would ever come together. “I’d say the ‘a-ha!’ moments came during my sophomore year,” she recalls.


Upcoming Events