STATS357: Reliability and Validity in Artificial Intelligence

Tijana Zrnic, Stanford University, Spring 2026


Announcements


Lectures

Mon/Wed 9:30am-10:50am, McCullough 122


Office hours

TBD


Course description

This course examines the principles and methods required to make artificial intelligence (AI) systems reliable and scientifically sound. Topics include evaluation and benchmarking, notions of validity, distribution shift, predictive inference, AI-assisted statistical inference, and data attribution. Problem sets combine mathematical exercises with coding projects that illustrate the practical effects of the methods we develop.


Syllabus

Lecture  Date    Topics                                              Reading
1        Mar 30  Benchmarks; Holdout method                          TBD
2        Apr 1   Cross-validation; Bootstrap                         TBD
3        Apr 6   Model selection; Overfitting & selection bias       TBD
4        Apr 8   Adaptive overfitting                                TBD
5        Apr 13  Internal, external, & construct validity            TBD
6        Apr 15  Frontier lecture                                    TBD
7        Apr 20  Distribution shift                                  TBD
8        Apr 22  Predictive inference; Conformal prediction          TBD
9        Apr 27  Predictive inference under distribution shift       TBD
10       Apr 29  Calibration                                         TBD
11       May 4   Multicalibration                                    TBD
12       May 6   Frontier lecture                                    TBD
13       May 11  AI for science; Prediction-powered inference (PPI)  TBD
14       May 13  PPI pt. 2                                           TBD
15       May 18  AI-assisted annotation                              TBD
16       May 20  Data attribution                                    TBD
17       May 25  Data attribution pt. 2                              TBD
18       May 27  Frontier lecture                                    TBD

“Frontier lectures” will consist of student presentations of frontier papers related to the class topics.