Eight new interdisciplinary research projects have won seed funding from Princeton University’s Schmidt DataX Fund, marking the third round of grants undertaken by the fund. The fund, supported through a major gift from the Schmidt Futures Foundation, provides grants to explore using artificial intelligence and machine learning to accelerate discovery.
The eight funded projects involve 13 faculty across seven departments and programs, from computer science to Near Eastern studies.
The projects run the gamut and include proposals such as deciphering a large corpus of documents from 11th- to 13th-century Egypt, advancing the performance of organic semiconductor devices, improving the safety of autonomous driving, and uncovering the dynamics of human thought.
“These projects are exciting because they explore how important and challenging problems can be tackled using modern data analysis and machine learning approaches while speeding scientific discovery. The core idea is to replace processes which are slow and laborious with machine-assisted methods,” said Peter Ramadge, director of the Center for Statistics and Machine Learning (CSML). “These projects are not confined to traditional ‘technical’ fields, but also span large-scale problems in the humanities and social sciences.”
CSML is overseeing a range of efforts made possible by the Schmidt DataX Fund to extend the reach of data science and machine learning across campus. These efforts include hiring data scientists and awarding DataX grants. This is the third round of DataX seed funding; the first was in 2019.
The selected projects and faculty participants are the following:
Handwritten Text Recognition for the Princeton Geniza Project (HTR4PGP)
Marina Rustow, Khedouri A. Zilkha Professor of Jewish Civilization in the Near East and Professor of Near Eastern Studies and History
The Cairo Geniza, a cache of medieval manuscripts discovered in an Egyptian synagogue, has helped historians reconstruct networks of ordinary merchants, craftspeople, women, children and the enslaved, stretching from Spain to Sumatra. But in the century since the cache's discovery, scholars have published fewer than 5,000 of its documentary texts. HTR4PGP seeks to triple that number by using machine learning to produce searchable transcriptions.
Machine Learning Methods for Next Generation Immuno-epidemiology
C. Jessica Metcalf, Associate Professor of Ecology and Evolutionary Biology and Public Affairs, Princeton School of Public and International Affairs
Bryan Grenfell, Kathryn Briger and Sarah Fenton Professor of Ecology and Evolutionary Biology and Public Affairs, Princeton School of Public and International Affairs
High-dimensional immunological data is increasingly available, especially as the COVID-19 pandemic hammers home the importance of record keeping. Yet appropriate methods for analysis that yield insight into population outcomes and disease control remain elusive. To address this issue, this project seeks to develop a process that uses machine learning to analyze immunological data and uncover hidden mechanisms in infectious diseases that impact whole populations.
Enabling Crystalline Organic Semiconductor Devices
Barry Rand, Associate Professor of Electrical and Computer Engineering and the Andlinger Center for Energy and the Environment
Adji Bousso Dieng, Assistant Professor of Computer Science
Organic semiconductor devices are multilayered, sometimes involving seven to eight distinct layers, but these layers are disordered, which restricts device performance. This project addresses these shortcomings by using data science to design crystalline layers that will enable the further advancement of these devices.
Computational Approaches to Uncovering the Dynamics of the Stream of Thought
Yael Niv, Professor of Psychology and Neuroscience
Diana Tamir, Associate Professor of Psychology
In our minds, thoughts unfold continuously and freely. Typical methods used to analyze these unconstrained spontaneous thoughts are labor-intensive, expensive and ineffective. To address these constraints, this project aims to develop machine learning tools to efficiently analyze the content and dynamics of spontaneous thought in order to lend insights into the function of thought and its clinical implications.
Learning How Quickly Antarctic Ice Shelves Melt Using Neural Networks
Ching-Yao Lai, Assistant Professor of Geosciences
As the climate warms, substantial melting of ice shelves can accelerate future sea level rise, but current methods for estimating melt rates are inadequate for noisy, low-resolution data. To more accurately evaluate how much and how quickly ice shelves are melting in Antarctica, this project proposes to use neural networks trained on both observational data and physical laws, in order to evaluate the impact of this climate change phenomenon.
Efficient Parameter Estimation of Sampled Random Fields in Geophysics
Frederik Simons, Professor of Geosciences
The geosciences are awash with data such as measurements of the seismic, gravitational and magnetic properties of Earth, but techniques to analyze them are often inefficient because geoscience data can be incomplete and noisy. To work around these issues, the researchers developed a computational technique for analyzing geoscience data that is efficient, fast and robust. To advance this technique further, this project proposes to test it by analyzing terrestrial and planetary fields, with an emphasis on the topography of Venus and the bathymetry of the Earth's ocean floor.
Provably Robust Perception and Control for Safe Autonomous Driving
Jaime Fernandez Fisac, Assistant Professor of Electrical and Computer Engineering
Prateek Mittal, Associate Professor of Electrical and Computer Engineering
Autonomous vehicles are poised to revolutionize transportation but still face hurdles in navigating unexpected or even hostile situations. This project seeks to overcome these challenges by unifying robust visual perception and safe trajectory planning under a common framework.
Evaluation Frameworks for Privacy, Auditing and Valuation in Federated Learning
Sanjeev Arora, Charles C. Fitzmorris Professor in Computer Science
Kai Li, Paul M. Wythes '55 P86 and Marcia R. Wythes P86 Professor in Computer Science
Federated learning is an emerging machine learning technique that enables training models on a centralized cloud server while using data from different users, who keep their data on their own devices. This framework is intended to address concerns about security and privacy in the online world, but there is little research evaluating the level of security and privacy provided by federated learning. This project seeks to develop and evaluate different federated learning frameworks providing mechanisms for privacy preservation, for identifying whether traces of the user data remain in the trained model, and for evaluating the contribution of each participating user to the final model.
For information on previous DataX grantees, read about the 2019 cohort and the 2021 cohort. More information is available on CSML's DataX website.