SML Courses
The Statistics and Machine Learning (SML) courses at Princeton University provide students with a general understanding of the field. Taught by leading experts, the curriculum covers a broad range of topics, including data collection, analysis, and deep learning. The program equips students with the skills and knowledge necessary to pursue opportunities in data science, machine learning, and related fields. The courses are challenging yet rewarding, offering students the chance to work on real-world data science projects.
Overview
(Please note: This class has no prerequisites and is open to students with little or no prior programming or statistics experience. It is offered both semesters.)
Introduction to Data Science provides a practical introduction to the burgeoning field of data science. Upon completion of the course, students will have learned essential tools for conducting research involving dataset analysis.
Among other skills, students will learn how to:
- Work with a variety of real datasets
- Write computer code to extract information and find hidden patterns in data
- Draw conclusions using sound statistical reasoning
- Produce scientific reports based on real data
- Use basic data analysis techniques such as stratification, classification, and regression
The world is increasingly data-driven. Students will find this class to be a challenging intellectual journey and a practical way to gain an edge in their college research; in preparing their Junior Projects and Senior Theses; in competing for top internships; and in preparing for their lives after graduation.
Please visit the archived page for the Spring 2020 course. Note that the course changes and evolves over time.
Overview
This course provides the training for students to be independent in modern data analysis. The course emphasizes the rigorous treatment of data and the programming skills and conceptual understanding required for dealing with modern datasets. The course examines data analysis through the lens of statistics and machine learning methods. Students verify their understanding by working with real datasets. The course also covers supporting topics such as experiment design, ethical data use, best practices for statistical and machine learning methods, reproducible research, writing a quantitative research paper, and presenting research results.
Sample Reading List
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Jon Krohn with Grant Beyleveld and Aglaé Bassens, Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence
Prerequisites and Restrictions
SML 201 or an equivalent course. One semester of calculus, or consult the course instructor. Students are expected to have taken at least one introductory course that explores the fundamental statistical concepts in data science. Familiarity with R or Python programming is assumed.
More information can be found at the course registrar page.
Overview
This course provides a comprehensive and practical background for students interested in continuous mathematics for computer science. The goal is to prepare students for higher-level subjects in artificial intelligence, machine learning, computer vision, natural language processing, graphics, and other topics that require numerical computation. This course is intended for students who wish to pursue these more advanced topics but who have not taken (or do not feel comfortable with) university-level multivariable calculus (e.g., MAT 201/203) and probability (e.g., ORF 245 or ORF 309). See "Other Information."
Sample Reading List
- Marc Deisenroth, Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning
Reading/Writing Assignments
About ten homework assignments, given approximately weekly, each with both a theoretical and a programming component.
Requirements/Grading
- Midterm exam - 20%
- Final exam - 20%
- Problem set(s) - 60%
Prerequisites and Restrictions
The prerequisites are COS 126 and MAT 202.
Other Information
Topics will include vectors, matrices, norms, orthogonality, projection, eigenvalues, singular value decomposition, basic vector calculus, introductory probability, Monte Carlo, information theory, convex optimization, Lagrange multipliers, and gradient descent. Assignments will have both conceptual and coding components. Students will complete the coding portions in Python. Familiarity with programming will be assumed, but expertise in Python is not required.
Overview
This is a class about using the tools of machine learning to study social data. The power of machine learning tools is their applicability across a wide range of tasks. There are huge opportunities for applying these tools to learn and make decisions about real people, but there are also important challenges. This course aims to (1) show social scientists and digital humanities scholars the potential of machine learning to help them learn about humans, make policy, and help people, while also (2) showing computer scientists how a social science research design perspective can improve their work and give them new outlets for their skills.
Sample Reading List
- Salganik, Matthew, Bit by Bit: Social Research in the Digital Age
- Grimmer, Roberts, and Stewart, Text as Data
- Barocas, Hardt, and Narayanan, Fairness and Machine Learning: Limitations and Opportunities
- Benjamin, Ruha, Race After Technology: Abolitionist Tools for the New Jim Code
Reading/Writing Assignments
Major assignments include: collaborative annotation and reading, precept coding assignments, three problem sets and a take-home final exam.
Requirements/Grading
- Take-home final exam - 30%
- Programming assignments - 10%
- Class/precept participation - 15%
- Problem set(s) - 45%
Other Requirements
- Statistical, design or other software use required
Prerequisites and Restrictions
Some foundation in basic R programming. A basic statistics or machine learning course that provides familiarity with linear regression (e.g., POL 345, SML 201, COS 324).
Other Information
If the course is closed, you may fill out this form to be added to the wait list.
Overview
A project-based seminar course in which students work individually or in small teams to tackle data science and machine learning problems, working with real-world datasets. The course emphasizes critical thinking about experiments and large dataset analysis and the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work. Students are not required to bring in their own project proposal and dataset for this course; however, if they do, students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements. (Note: SML 312 is an alternative version of SML 310; the two courses are equivalent.)
Enrollment to the Course
For enrollment, please use the SML 310/312 Enrollment form.
Please submit applications as soon as possible. Applications will be considered on an ongoing basis subject to space availability. Students will be notified by email of acceptance decisions.
FAQs
Can I use my work in SML 310 as part of my undergraduate thesis?
With permission from their thesis advisor and/or their undergraduate Department, students can incorporate work they did in SML 310 into their thesis.
Students should indicate in their thesis which parts of the work were completed as part of SML 310.
Can I use my work in SML 310 to fulfill the CSML Certificate’s Independent Work requirement?
That is in principle possible if the scope of your SML 310 project is sufficiently large and the write-up is sufficiently comprehensive. However, many SML 310 students would need to expand their SML 310 project in order to fulfill the IW requirement.
Can I use work that I did for an earlier project (e.g. for my JP) as my project in SML 310?
You cannot resubmit work that you have already completed elsewhere, but you can build on work that you did previously. Your write-up must clearly indicate which parts of the work were done as part of SML 310 and which parts were done earlier.
Overview
This course provides an introduction to Bayesian analysis, a powerful statistical framework for making inferences and modeling uncertainty in a wide range of applications. Students will explore the fundamental principles of Bayesian statistics, probability theory, Bayesian inference, and practical applications of Bayesian modeling. The course will cover both the theory and hands-on implementation using data science software and the R programming language.
Sample Reading List
- Alicia A Johnson, Miles Q Ott, Mine Dogucu, Bayes Rules!
- Richard McElreath, Statistical Rethinking
- Andrew Gelman, Bayesian Data Analysis
- Will Kurt, Bayesian Statistics the Fun Way
- Gary L Rosner, Bayesian Thinking in Biostatistics
Reading/Writing Assignments
Weekly problem sets (about 10 to 15 mathematical and computer programming tasks from the textbook and/or created by the instructor) will be assigned. Students will be presented with scenarios of model summaries and results and will be asked to complete inference tasks. The culminating semester project will be due on Dean's Date and will be assessed according to a rubric that will be provided in advance.
Requirements/Grading
Term Assessments:
- Project(s) - 10%
- Presentation or performance - 20%
- Papers/writing assignments - 15%
- Quizzes - 10%
- Programming assignments - 25%
Final Assessments:
- Final paper or project - 20%
Other Requirements
- Statistical, design or other software use required
Prerequisites and Restrictions
- Basic knowledge of probability distributions (binomial, normal)
- Understanding of statistical concepts (sampling, confidence intervals)
- Familiarity with R or Python programming is assumed
- One semester of calculus or instructor approval
Enrollment to the Course
For enrollment, please use the SML 320 Enrollment Application form.