1. Learning with mirror descent, Patrick Rebeschini (Oxford)
- Background: This lecture introduces the learning framework of out-of-sample prediction and excess risk minimization, covering the main probabilistic tools of uniform learning, Rademacher complexity, and concentration inequalities that we will need in later lectures.
- Explicit regularization: This lecture explores the classical setting of explicit regularization, separately investigating the performance of gradient descent methods on the generalization error and the optimization error in fundamental settings involving linear models with different constraint geometries, including applications to ridge regression and Lasso regression.
- Implicit regularization: This lecture investigates recent works on the implicit regularization of first-order methods, showcasing how the Hadamard parametrization of the empirical risk can lead to sparse recovery (see the sketch below), as well as discussing direct links between the optimization path of iterative solvers and excess risk guarantees via localized notions of Rademacher complexity.
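As a concrete illustration of the implicit-regularization theme, here is a minimal, self-contained sketch: gradient descent on a noiseless least-squares empirical risk under the Hadamard parametrization w = u ⊙ v with a small symmetric initialization. The problem sizes, step size, and parametrization variant are illustrative assumptions, not necessarily the ones analysed in the lecture.

```python
import numpy as np

# Toy sparse-recovery experiment: gradient descent on the least-squares empirical
# risk under the Hadamard parametrization w = u * v. With a small symmetric
# initialization the iterates keep w nonnegative, so the ground truth is chosen
# nonnegative as well. All constants below are illustrative choices.
rng = np.random.default_rng(0)
n, d, k = 50, 200, 5                     # samples, dimension, true sparsity
w_star = np.zeros(d); w_star[:k] = 1.0   # nonnegative sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_star                           # noiseless observations

alpha, lr, T = 1e-3, 1e-2, 20_000        # init scale, step size, iterations
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(T):
    w = u * v
    grad_w = X.T @ (X @ w - y) / n       # gradient of the empirical risk in w
    u, v = u - lr * grad_w * v, v - lr * grad_w * u   # chain rule through w = u * v

w = u * v
print("recovery error:", np.linalg.norm(w - w_star))
print("support size  :", int(np.sum(np.abs(w) > 1e-3)))
```

Even though no explicit l1 penalty appears anywhere, the small initialization biases gradient descent toward sparse solutions; this is the kind of phenomenon the lecture analyses.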
2. Predictive uncertainty estimation for Machine Learning, Maxime Panov (Technology Innovation Institute)
- Basics of confidence estimation, asymptotic confidence intervals, and the bootstrap.
- Bayesian uncertainty estimation via MCMC, variational inference and other approximate inference methods.
- Distribution-free uncertainty estimation: conformal prediction and beyond (see the sketch below).
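To make the last bullet concrete, here is a minimal sketch of split conformal prediction for regression, one variant among many; the fitted regressor `model`, the absolute-residual score, and the function name are illustrative assumptions rather than the lecture's exact setup.

```python
import numpy as np

def split_conformal_interval(model, X_calib, y_calib, X_test, alpha=0.1):
    """Prediction intervals with marginal coverage >= 1 - alpha under
    exchangeability, built around any fitted regressor `model` that
    exposes a .predict method."""
    # Conformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_calib - model.predict(X_calib))
    n = len(scores)
    # Finite-sample-corrected quantile level of the scores.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat
```

The resulting guarantee is distribution-free: it relies only on exchangeability of the calibration and test points, not on the underlying model being correct.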
3. Algorithmic fairness in Machine Learning, Evgenii Chzhen (CNRS and Paris Saclay)
After a brief introduction to the problem of algorithmic fairness, the main focus of the lectures will be regression under the demographic parity constraint.
- Lecture 1. Introduction to basic notions of fairness in classification and regression. Some elementary tools from optimal transport theory in dimension one.
- Lecture 2. Optimal fair regression under the demographic parity constraint: an explicit expression and its relation to classification.
- Lecture 3. Relaxation of the demographic parity constraint. A post-processing estimator with statistical guarantees (see the sketch below).
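Purely as an illustration of the post-processing idea, here is a minimal sketch in which group-wise predictions are remapped through a weighted average of the groups' empirical quantile functions (a Wasserstein-barycenter-style construction). The estimator studied in the lectures and its statistical guarantees may differ; the function name is an assumption.

```python
import numpy as np

def fair_post_process(preds, groups):
    """Map group-wise predictions to a common target distribution so that the
    post-processed predictions (approximately) satisfy demographic parity."""
    preds = np.asarray(preds, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    out = np.empty_like(preds)
    for s in labels:
        mask = groups == s
        sorted_s = np.sort(preds[mask])
        # Empirical within-group CDF value (rank in (0, 1]) of each prediction.
        ranks = np.searchsorted(sorted_s, preds[mask], side="right") / mask.sum()
        # Weighted average of the groups' empirical quantile functions at those ranks.
        out[mask] = sum(w * np.quantile(preds[groups == s2], ranks)
                        for s2, w in zip(labels, weights))
    return out
```

After the transform, all groups share (up to estimation error) the same distribution of predictions, which is the demographic parity requirement in regression.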
4. Training ML models without hyper-parameter tuning – adaptive step-size procedures, Martin Takac (MBZUAI)
- Stochastic Gradient Descent (SGD) with feature scaling and momentum (Adagrad/Adam)
- Polyak step-size for SGD; better scaling using Hutchinson's method to learn partial curvature information (a sketch of the Polyak step-size follows this list)
- Variance-reduced SGD methods (SARAH) and the use of implicit step-sizes
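As an illustration of the tuning-free theme, here is a minimal sketch of SGD with the stochastic Polyak step-size on a least-squares objective. Taking f_i^* = 0 (exact for interpolating least squares) and capping the step by gamma_max are simplifying assumptions, not the lecture's exact procedure.

```python
import numpy as np

def sgd_polyak(X, y, T=1000, gamma_max=1.0, seed=0):
    """SGD with the stochastic Polyak step-size on the losses
    f_i(w) = 0.5 * (x_i @ w - y_i)**2, assuming f_i^* = 0."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        i = rng.integers(n)                 # sample one data point
        residual = X[i] @ w - y[i]
        loss_i = 0.5 * residual ** 2        # f_i(w); f_i^* assumed to be 0
        grad_i = residual * X[i]            # gradient of f_i at w
        sq_norm = grad_i @ grad_i
        if sq_norm > 0.0:
            # eta_t = (f_i(w) - f_i^*) / ||grad f_i(w)||^2, capped for stability.
            w -= min(loss_i / sq_norm, gamma_max) * grad_i
    return w
```

The step-size adapts on the fly: it is large when the sampled loss is far from its minimum and shrinks automatically as the iterates approach interpolation, which removes the need to tune a learning-rate schedule by hand.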