The use of machine learning (ML) methods for prediction and forecasting has become widespread across the quantitative sciences. However, a reproducibility crisis is brewing: we have found 20 reviews across 17 scientific fields that collectively identify errors in 329 papers using ML-based science. Hosted by the Center for Statistics and Machine Learning at Princeton University, our online workshop provides an interdisciplinary venue for diagnosing and addressing reproducibility failures in ML-based science.
We especially welcome researchers outside traditional ML fields who are interested in applying ML methods in their own disciplines. Participants will learn to identify reproducibility failures and to ensure that their own research is reproducible. Through our interdisciplinary workshop, we will:
- Highlight the scale and scope of the crisis in ML-based science.
- Identify root causes of the observed reproducibility failures and explain why they have occurred in dozens of fields that adopted ML methods.
- Make progress towards solutions by outlining a concrete research agenda for reproducibility in ML-based science.
Talks
- Background on the workshop and the crisis - Arvind Narayanan, Princeton University
- Leakage and the reproducibility crisis in ML-based science - Sayash Kapoor, Princeton University
- Overly optimistic prediction results on imbalanced data - Gilles Vandewiele, Ghent University
- Is the ML reproducibility crisis a natural consequence? - Michael Roberts, University of Cambridge
- Towards a definition of reproducibility - Odd Erik Gundersen, NTNU
- Panel 1: Diagnose - Moderator: Priyanka Nanayakkara; Panelists: Gilles Vandewiele, Michael Roberts, Odd Erik Gundersen
- How to avoid machine learning pitfalls: a guide for academic researchers - Michael Lones, Heriot-Watt University
- Consequences of reproducibility issues in ML research and practice - Inioluwa Deborah Raji, University of California, Berkeley
- When (and why) we shouldn't expect reproducibility in ML-based science - Momin M. Malik, Mayo Clinic
- The replication crisis in social science: does science self-correct? - Marta Serra-Garcia, University of California San Diego
- Panel 2: Fix - Moderator: Sayash Kapoor; Panelists: Michael Lones, Inioluwa Deborah Raji, Momin M. Malik, Marta Serra-Garcia
- Integrating explanation and prediction in ML-based science - Jake Hofman, Microsoft Research
- The worst of both worlds: a comparative analysis of errors in learning from data in psychology and machine learning - Jessica Hullman, Northwestern University
- What is your estimand? Implications for prediction and machine learning - Brandon Stewart, Princeton University
- Panel 3: Future paths - Moderator: Arvind Narayanan; Speakers: Jake Hofman, Jessica Hullman, Brandon Stewart
Presented by Matias Cattaneo, Professor of Operations Research and Financial Engineering.
Synthetic controls are widely applied to estimate the effects of policy interventions and other treatments of interest. The DataX Workshop on synthetic control methods seeks to provide an introduction to synthetic control methods for non-experts, as well as an opportunity for researchers working on synthetic control methods to communicate new results, reach audiences outside their primary disciplinary fields, and explore potential collaborations.
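For readers new to the topic, the idea behind the basic synthetic control estimator can be summarized in a few lines of code. The sketch below is only an illustration of the general approach covered in the tutorials, not any speaker's specific method; the data and variable names are made up. It fits non-negative weights (summing to one) over donor units so that the weighted donors track the treated unit's pre-treatment outcomes, and reads off the post-treatment gap as the estimated effect.

```python
# Minimal synthetic control sketch with synthetic (made-up) data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T0, T1, J = 20, 10, 8                                     # pre-periods, post-periods, donors
Y_donors = rng.normal(size=(T0 + T1, J)).cumsum(axis=0)   # donor outcome trajectories
w_true = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
Y_treated = Y_donors @ w_true
Y_treated[T0:] += 2.0                                     # constant treatment effect after T0

def pre_treatment_fit(w):
    """Mean squared pre-treatment gap between the treated unit and weighted donors."""
    return np.mean((Y_treated[:T0] - Y_donors[:T0] @ w) ** 2)

# Weights constrained to the simplex: w_j >= 0 and sum_j w_j = 1.
res = minimize(
    pre_treatment_fit,
    x0=np.full(J, 1.0 / J),
    bounds=[(0.0, 1.0)] * J,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
)
w_hat = res.x

synthetic = Y_donors @ w_hat                 # synthetic control trajectory
effect = Y_treated[T0:] - synthetic[T0:]     # estimated post-treatment effects
print("estimated weights:", np.round(w_hat, 2))
print("estimated average effect:", effect.mean())
```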
Thursday, June 2
- 1:30 pm - 2:30 pm - Tutorial on SC Methods: Part I - Alberto Abadie (MIT)
- 2:50 pm - 3:50 pm - Tutorial on SC Methods: Part II - Devavrat Shah (MIT)
SESSION 1: Regina Liu (Rutgers), Session Chair
- 4:10 pm - 4:40 pm - A Design Based Perspective on Synthetic Control Methods - Guido Imbens (Stanford)
- 4:40 pm - 5:10 pm - Synthetic Control Methods: A Generative Machine Learning Perspective - Uros Seljak (Berkeley)
Friday, June 3
SESSION 2: Hongyu Zhao (Yale), Session Chair
- 8:30 am - 9:00 am - Causal Matrix Completion (virtual session) - Anish Agarwal (MIT)
- 9:00 am - 9:30 am - Statistical Inference for the Factor Model Approach to Estimate Causal Effects in Quasi-Experimental Settings (virtual session) - Kathleen Li (UT Austin)
- 9:30 am - 10:00 am - Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption - Yingjie Feng (Tsinghua)
SESSION 3: Rocio Titiunik (Princeton), Session Chair
- 10:30 am - 11:00 am - Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework - Eric Tchetgen Tchetgen (UPenn)
- 11:00 am - 11:30 am - Information Criteria and Degrees of Freedom for the Synthetic Control Method (with Zhen Xie) - Guillaume Pouliot (Chicago)
- 11:30 am - 12:00 pm - Randomization-Based Inference for Synthetic Control Estimators (joint with David Hirshberg) - Dmitry Arkhangelsky (CEMFI)
SESSION 4: Xu Cheng (UPenn), Session Chair
- 1:30 pm - 2:00 pm - Synthetic Learner: Model-free inference on treatments over time - Jelena Bradic (UC San Diego)
- 2:00 pm - 2:30 pm - Synthetic Interventions - Dennis Shen (Berkeley)
- 2:30 pm - 3:00 pm - Synthetic Controls for Experimental Design - Jinglong Zhao (BU)
Many scientific experiments generate large, multi-modal datasets, often in the form of time series of varying dimensionality. A particular challenge scientists face in their workflows is comparing experiments to models and simulations, and determining how closely experiments match expected theory. The analyses that scientists perform on these datasets can be greatly enhanced and accelerated by machine learning techniques, including recent deep learning and Bayesian inference methods. The main objective of the workshop is to distill current machine learning techniques for a broad scientific audience at Princeton and to provide much-needed ML-based research tools to advance their science. This should primarily benefit the Princeton research community, as well as nearby research institutions.
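As a toy illustration of the experiment-versus-theory comparison mentioned above (not material from any of the talks), the sketch below uses a simple grid-based Bayesian update, with a made-up one-parameter model and synthetic "experimental" data, to quantify how well the model matches noisy measurements.

```python
# Toy Bayesian comparison of an experiment to a model: compute the posterior
# over a decay-rate parameter given noisy measurements. All data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 50)                # measurement times
true_rate, noise = 0.8, 0.1
y_obs = np.exp(-true_rate * t) + rng.normal(scale=noise, size=t.size)

rates = np.linspace(0.1, 2.0, 400)           # grid over the model parameter
# Gaussian log-likelihood of the data under each candidate model.
residuals = y_obs[None, :] - np.exp(-rates[:, None] * t[None, :])
log_like = -0.5 * np.sum((residuals / noise) ** 2, axis=1)

# Flat prior on the grid; normalize to obtain the posterior.
posterior = np.exp(log_like - log_like.max())
posterior /= np.trapz(posterior, rates)

mean_rate = np.trapz(rates * posterior, rates)
print(f"posterior mean decay rate: {mean_rate:.3f} (true value {true_rate})")
```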
Friday, May 13
- 8:40 am - 9:30 am - Intro to ML/DL, Use in Natural Sciences - Julian Kates-Harbeck
- 9:30 am - 10:20 am - Deep Learning Tutorial - Alfredo Canziani
- 10:20 am - 11:10 am - ML/DL for Computer Vision - Alfredo Canziani
- 11:30 am - 12:20 pm - Geometric Deep Learning - Dmitriy Smirnov
- 1:30 pm - 2:20 pm - Interpretability - Julius Adebayo
- 2:20 pm - 3:50 pm - Bayesian Inference/Markov Chain Monte Carlo (MCMC) - Peter Melchior
- 4:10 pm - 5:00 pm - Variational Inference - Jaan Altosaar
- 5:00 pm - 5:30 pm - Simulation Based Inference - Peter Melchior
Presented by Jose Garrido Torres, Data Scientist, and Vineet Bansal, Senior Research Scientist at Princeton Research Computing.
Friday, March 4
- Part 1: An introduction to tools that programmers typically use to write and debug code effectively.
Friday, April 1
- Part 2: An introduction to cloud computing (creating and managing cloud computing resources), and to tools that make it possible to write code locally while seamlessly running it on powerful cloud resources.
This 2-day virtual workshop explores social biases in machine learning and in human nature, and what social scientists and computer scientists can learn from each other. We bring together cutting-edge, innovative perspectives from sociology, social psychology, cognitive science, and computer science on the interplay between stereotyping and human and artificial intelligence.
Organizers: Susan T. Fiske & Xuechunzi Bai, Princeton University, Psychology
Sponsors: DataX at Princeton, Department of Psychology, Center for Statistics and Machine Learning
- Introduction | Social Biases in Machine Learning and in Human Nature: What Social Scientists and Computer Scientists Can Learn from Each Other - Susan T. Fiske, Professor of Psychology, and Xuechunzi Bai, Graduate Student in Psychology, Princeton University
- Fairness in Visual Recognition - Olga Russakovsky, Department of Computer Science, Princeton University
- Bias and Norms - Thomas Kelly, Department of Philosophy, Princeton University
- Racial Discrimination in Hiring through the Lens of Field Experiments - Lincoln Quillian, Department of Sociology, Northwestern University
- Human Decisions in Social Environments - Elizabeth Bruch, Department of Sociology, University of Michigan
- Lessons for Artificial Intelligence from the Study of Natural Stupidity - Todd Gureckis, Department of Psychology, New York University
- Computational Justice: Simulating Structural Bias and Interventions - Ida Momennejad, Microsoft Research
- Is There a Filter Bubble on Social Media? A Call for Epistemic Humility - Arvind Narayanan, Department of Computer Science, Princeton University