Mathematics of Machine Learning M2 MAPI3 (Paul Sabatier, 2024-2025)
Lecturer: Clément Lalanne
Overview
This class aims to introduce the main theoretical foundations of modern machine learning, with a focus on supervised learning. Basic knowledge of linear algebra and probability theory (including measure theory) is required. For the last lectures, basic knowledge of functional analysis is recommended.
Evaluation
IMPORTANT, PLEASE READ: I have been informed that the evaluation methods are set by the university and cannot be altered. I am deeply sorry about the misleading information I gave you earlier. The updated evaluation scheme is described below:
For all students, the final grade will consist of 80% from a 2-hour final exam and 20% from project work. For part-time working students (étudiants en alternance), the project grade will be based on TP2, which I will evaluate (replacing the previous homework assignment). For the other students, the project grade will be split equally: 50% from TP2 and 50% from the projects themselves.
TP2 (which counts towards the evaluation) is out.
Survival Kit for the Exam
For the exam, you should be familiar with the following concepts and techniques:
- Decomposing the risk into an estimation error (bounded by the uniform worst-case error over the predictor set), an approximation error, and possibly other error sources (e.g., optimization errors). You should be able to recognize the different bias and variance terms in this decomposition; a worked version of the decomposition is recalled after this list.
- Basic linear algebra (vectors, linear maps, rank, matrices) and bilinear algebra (eigenvectors, eigenvalues, spectral theorem, basic matrix decompositions).
- Multivariable calculus (gradients, Hessians, Taylor expansions, higher-order derivative tensors), convexity, first- and second-order optimality conditions, KKT conditions.
- Basic probability theory in general probability spaces, conditional laws, conditional expectations, and independence.
- Common inequalities: the triangle inequality, convexity inequalities, AM-GM, Cauchy-Schwarz, Jensen, Hölder, Minkowski.
- Concentration inequalities: Markov's, Bienaymé-Chebyshev's, and McDiarmid's inequalities (McDiarmid's bounded-differences inequality is recalled after this list).
- Rademacher complexity: the definition, upper bounding the sup deviations (each side by two times the Rademacher complexity), the Lipschitz contraction principle, and typical use cases; the definition and the symmetrization bound are recalled after this list.
- Kernel methods: Aronszajn's theorem (equivalence between positive definite kernels and Hilbert space dot products), the representer theorem, the kernel trick, operations on kernels, and Bochner's theorem; a small kernel ridge regression sketch follows this list.
- To be continued...
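
As a reminder for the risk-decomposition item above, here is a sketch written in notation of my own choosing (it may differ from the lecture notes): R is the population risk, \hat R_n the empirical risk on n samples, \mathcal{F} the predictor set, \hat f_n an empirical risk minimizer over \mathcal{F}, and f^* a Bayes predictor.

```latex
% Excess-risk decomposition (notation chosen here, not necessarily the lecture's):
%   R        population risk          \hat R_n  empirical risk on n samples
%   F        predictor set            \hat f_n  empirical risk minimizer over F
%   f^*      Bayes (overall optimal) predictor
\[
  R(\hat f_n) - R(f^*)
  = \underbrace{R(\hat f_n) - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}}
  \; + \;
  \underbrace{\inf_{f \in \mathcal{F}} R(f) - R(f^*)}_{\text{approximation error}},
\]
% and the estimation error is controlled by the uniform worst-case deviation
% between empirical and population risks over the predictor set:
\[
  R(\hat f_n) - \inf_{f \in \mathcal{F}} R(f)
  \;\le\; 2 \sup_{f \in \mathcal{F}} \bigl| R(f) - \hat R_n(f) \bigr|.
\]
```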
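
For the concentration-inequalities item, a statement of McDiarmid's bounded-differences inequality in its standard form (check the lecture notes for the exact constants and conventions used in class):

```latex
% McDiarmid's (bounded differences) inequality.
% If Z_1, ..., Z_n are independent and f has bounded differences, i.e. for all i
%   |f(z_1, ..., z_i, ..., z_n) - f(z_1, ..., z_i', ..., z_n)| <= c_i,
% then for every t > 0:
\[
  \mathbb{P}\bigl( f(Z_1, \dots, Z_n) - \mathbb{E}\, f(Z_1, \dots, Z_n) \ge t \bigr)
  \;\le\; \exp\!\Bigl( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \Bigr).
\]
```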
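
For the Rademacher-complexity item, a reminder of the definition and of the symmetrization bound, again in notation of my own choosing:

```latex
% Notation chosen here: G is a class of functions, Z_1, ..., Z_n an i.i.d. sample,
% and sigma_1, ..., sigma_n i.i.d. Rademacher signs (+1/-1 with probability 1/2 each),
% independent of the sample.
\[
  \mathrm{Rad}_n(\mathcal{G})
  \;=\;
  \mathbb{E}\Bigl[ \sup_{g \in \mathcal{G}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, g(Z_i) \Bigr],
\]
% and symmetrization bounds each side of the uniform deviation by twice this quantity:
\[
  \mathbb{E}\Bigl[ \sup_{g \in \mathcal{G}}
      \Bigl( \mathbb{E}[g(Z)] - \frac{1}{n} \sum_{i=1}^{n} g(Z_i) \Bigr) \Bigr]
  \;\le\; 2 \, \mathrm{Rad}_n(\mathcal{G}).
\]
% For bounded classes, combining this expectation bound with McDiarmid's inequality
% above yields high-probability uniform deviation bounds.
```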
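
For the kernel-methods item, here is a minimal Python sketch of the kernel trick and the representer theorem through kernel ridge regression. The Gaussian kernel, the regularization convention (K + n·lam·I), and all function names are my own choices for illustration, not the course's. The point is that both fitting and prediction touch the data only through kernel evaluations k(x_i, x_j).

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam=1e-1, gamma=1.0):
    """By the representer theorem, the minimizer of the regularized empirical risk
    lies in span{k(x_i, .)}; with the averaged-loss convention used here, its
    coefficients solve (K + n*lam*I) alpha = y."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return alpha

def kernel_ridge_predict(X_train, X_test, alpha, gamma=1.0):
    """Prediction only needs kernel evaluations (the 'kernel trick'):
    f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

if __name__ == "__main__":
    # Toy 1D regression problem: noisy sine observations.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    alpha = kernel_ridge_fit(X, y, lam=1e-2, gamma=2.0)
    y_hat = kernel_ridge_predict(X, X, alpha, gamma=2.0)
    print("training MSE:", np.mean((y - y_hat) ** 2))
```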
Lectures
TDs / TPs
References and External Resources
Machine Learning and Learning Theory
Optimization for Machine Learning
Measure and Probability Theory
- Intégration, Probabilités et Processus Aléatoires by Jean-François Le Gall, 2006: An excellent introduction to measure theory with elements of stochastic processes and conditional expectation.
- Probabilités 2 by Jean-Yves Ouvrard, 2009: A detailed treatment of advanced topics in probability theory.
- Concentration Inequalities: A Nonasymptotic Theory of Independence by Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, 2013.
Analysis