Schmidt DataX Fund supports projects that harness data science to speed discovery

Monday, Nov 18, 2019
by Sharon Adarlo

Pictured, representing the 19 faculty members from the nine awarded proposals:

Prateek Mittal, Electrical Engineering; Karthik Narasimhan, Computer Science; Simon Levin, Ecology and Evolutionary Biology; Dean Knox, Politics; Tom Griffiths, Computer Science and Psychology; Jared Toettcher, Molecular Biology; Ryan Adams, Computer Science; Olga Russakovsky, Computer Science; Barbara Engelhardt, Computer Science; Alin Coman, Psychology and Public Affairs; Mariangela Lisanti, Physics; Herschel Rabitz, Chemistry.

Nine data-driven research projects have won funding from Princeton University’s Schmidt DataX Fund, which aims to spread and deepen the use of artificial intelligence and machine learning across campus to accelerate discovery.

In February, the University announced the new fund, which was made possible through a major gift from Schmidt Futures. 

The newly selected faculty research projects supported by the Schmidt DataX Fund are cross-disciplinary, involving 19 faculty members across several departments and programs.

Among the topics the projects explore are brain activity and language, crowd behavior on social media, and the history of the Milky Way.

“The variety of projects and the mix of researchers involved shows that data science is of great interest to many disciplines on campus to help accelerate discovery, whether by scaling up experiments or modeling complex processes,” said Peter Ramadge, the Gordon Y.S. Wu Professor of Engineering and professor of electrical engineering and the director of the Center for Statistics and Machine Learning (CSML). “The use of advanced data science algorithms and modern computation enables research that was once difficult or impossible to conduct.” CSML is overseeing a range of efforts made possible by the Schmidt DataX Fund to extend the reach of data science across campus. 

The winning projects and research faculty are:

• Seeking to Greatly Accelerate the Achievement of Quantum Many-body Optimal Control Through the Use of Artificial Neural Networks

Herschel Rabitz, the Charles Phelps Smyth ’16 *17 Professor of Chemistry

This project seeks to harness artificial neural networks to design, model, understand and control quantum dynamics phenomena among different particles, such as atoms and molecules.

• Society-Scale Behavioral Simulations Through Crowdsourcing

Tom Griffiths, the Henry R. Luce Professor of Information Technology, Consciousness, and Culture of Psychology and Computer Science; Alin Coman, associate professor of psychology and public affairs; Simon Levin, the James S. McDonnell Distinguished University Professor in Ecology and Evolutionary Biology; Elizabeth Levy Paluck, professor of psychology and public affairs; and Elke Weber, the Gerhard R. Andlinger Professor in Energy and the Environment and professor of psychology and public affairs

This project seeks to confront intertwined behavioral issues in online social networks that are impacting the global warming debate. 

• Secure and Private Federated Learning

Prateek Mittal, associate professor of electrical engineering; H. Vincent Poor, interim dean of the School of Engineering and Applied Science and the Michael Henry Strater University Professor of Electrical Engineering

This project will use data science techniques to examine security, privacy and utility issues in federated learning, a technique that allows computer programs to train on decentralized data, and to design federated learning systems that are more robust.

• Unveiling the History of the Milky Way Galaxy

Mariangela Lisanti, associate professor of physics

This project, performed in collaboration with Timothy Cohen of the University of Oregon, aims to use machine learning techniques to study the data coming from a billion stars via the Gaia satellite, launched in 2013, to shed light on the evolution of the Milky Way. 

• Bayesian Optimization Approach to Experimental Design for Optobinder Synthesis

Barbara Engelhardt, associate professor of computer science; Mengdi Wang, associate professor of operations research and financial engineering; and Jared Toettcher, assistant professor of molecular biology 

Optogenetics, a technique that enables researchers to use light to turn proteins in cells on and off, has the potential to help in understanding diseases such as cancer and cholera. But scientists don’t have specific strategies to cause any protein to respond to light except for certain proteins from plants. This research aims to use machine learning techniques to model strategies that may make more proteins responsive to light.

• Decoding the Language of the Brain

Uri Hasson, professor of psychology and the Princeton Neuroscience Institute, and Karthik Narasimhan, assistant professor of computer science

This project aims to dive into the inner workings of the human brain and study how our thoughts become words, how we communicate with each other, and our use of language. 

• Towards Compositional Grounding of Language in Perception

Olga Russakovsky, assistant professor of computer science, and Karthik Narasimhan, assistant professor of computer science

This project will establish metrics to evaluate the accuracy of various image captioning systems and find methods to improve these systems. 

• Physical Priors for Generative Modeling of Molecular Structures and Interactions

Ryan Adams, professor of computer science, and Abigail Doyle, the A. Barton Hepburn Professor of Chemistry 

Synthesizing new chemical compounds for the pharmaceutical industry and other industrial sectors can be a difficult and expensive process. This project aims to streamline that process by employing machine learning techniques to model molecular structures and their chemical interactions in order to create new, useful compounds or easier pathways to generate desired compounds. 

• A Toolkit for the Quantitative Analysis of Audiovisual Speech Data

Dean Knox, assistant professor of politics

This project aims to develop a machine learning-powered program that collects, cleans and analyzes audiovisual data of human speech, making communication analysis more effective.