STATS357: Reliability and Validity in Artificial Intelligence

Tijana Zrnic, Stanford University, Spring 2026

Announcements

Welcome to STATS357!

Lectures

Mon/Wed 9:30am-10:50am, McCullough 122

Office hours

TBD

Course description

This course examines the principles and methods required to make artificial intelligence (AI) systems reliable and scientifically sound. Topics include evaluation and benchmarking, notions of validity, distribution shift, predictive inference, AI-assisted statistical inference, and data attribution. Problem sets combine mathematical exercises with coding projects that show the practical effects of the methods we develop.

Syllabus

Lecture	Date	Topics	Reading
1	Mar 30	Benchmarks; Holdout method	TBD
2	Apr 1	Cross-validation; Bootstrap	TBD
3	Apr 6	Model selection; Overfitting & selection bias	TBD
4	Apr 8	Adaptive overfitting	TBD
5	Apr 13	Internal, external, & construct validity	TBD
6	Apr 15	Frontier lecture	TBD
7	Apr 20	Distribution shift	TBD
8	Apr 22	Predictive inference; Conformal prediction	TBD
9	Apr 27	Predictive inference under distribution shift	TBD
10	Apr 29	Calibration	TBD
11	May 4	Multicalibration	TBD
12	May 6	Frontier lecture	TBD
13	May 11	AI for science; Prediction-powered inference (PPI)	TBD
14	May 13	PPI pt. 2	TBD
15	May 18	AI-assisted annotation	TBD
16	May 20	Data attribution	TBD
17	May 25	Data attribution pt. 2	TBD
18	May 27	Frontier lecture	TBD

“Frontier lectures” will consist of student presentations of frontier papers related to the class topics.