Overview
This course is designed to provide hands-on experience and theoretical understanding in the field of machine learning. Students will work in groups of 3 or 4 on either practical or theoretical projects, which will be presented to the class.
-
Langevin Sampling and Denoising Autoencoders
Overdamped Langevin dynamics targets a density π(x) ∝ e^{-U(x)} via the SDE dX_t = -∇U(X_t)dt + √2 dW_t, with practical discretizations such as ULA/MALA and SGLD. Denoising autoencoders (DAEs) learn to predict clean data from noisy inputs; the denoising vector field approximates the score ∇ log p(x), linking DAEs to score matching and Langevin sampling via Tweedie’s formula.
Starter references: Welling & Teh (2011) Bayesian Learning via Stochastic Gradient Langevin Dynamics; Vincent (2011) A Connection Between Score Matching and Denoising Autoencoders.
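As a concrete illustration of the ULA discretization mentioned above, here is a minimal sketch assuming a toy 1-D double-well potential U(x) = (x^2 - 1)^2 (an illustrative choice, not taken from the references):

    import numpy as np

    # Toy potential U(x) = (x^2 - 1)^2, a 1-D double well (illustrative choice).
    def grad_U(x):
        return 4.0 * x * (x**2 - 1.0)

    def ula_sample(n_steps=10_000, step=1e-2, seed=0):
        """Unadjusted Langevin algorithm: x <- x - step * grad_U(x) + sqrt(2 * step) * noise."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal()
        samples = []
        for _ in range(n_steps):
            x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal()
            samples.append(x)
        return np.array(samples)

    samples = ula_sample()
    print("empirical mean/std:", samples.mean(), samples.std())

Swapping -∇U for a learned score, e.g. (DAE(x̃) - x̃)/σ² via Tweedie’s formula, turns this into the DAE-driven sampler discussed above.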
-
Variational Autoencoders (VAE)
VAEs posit a latent-variable model pθ(x,z)=p(z)pθ(x|z) and optimize the evidence lower bound (ELBO) using the reparameterization trick for low-variance gradients. Core topics include the ELBO–KL decomposition, amortized inference, posterior collapse, and expressivity via richer posteriors (normalizing flows) and tighter bounds (IWAE). Theoretical angles involve identifiability, variational gaps, and bits-back coding interpretations.
Starter references: Kingma & Welling (2014) Auto-Encoding Variational Bayes.
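A minimal sketch of the reparameterization trick and a Monte-Carlo ELBO for a Gaussian encoder with a Bernoulli decoder (PyTorch; the layer sizes and dummy batch are illustrative assumptions, not taken from the paper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyVAE(nn.Module):
        def __init__(self, x_dim=784, z_dim=16, h_dim=256):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
            self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
            logits = self.dec(z)
            # ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); KL is closed-form for diagonal Gaussians
            rec = -F.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(-1)
            kl = 0.5 * (mu**2 + logvar.exp() - 1.0 - logvar).sum(-1)
            return -(rec - kl).mean()   # negative ELBO, to be minimized

    x = torch.rand(32, 784)   # dummy batch with values in [0, 1]
    loss = TinyVAE()(x)
    loss.backward()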
-
Score-Based Diffusion Models
Score-based generative modeling learns the score ∇x log p_t(x) across noise levels and samples via reverse-time SDEs or probability-flow ODEs. The framework unifies denoising score matching, annealed Langevin dynamics, and continuous-time diffusion, with precise links to Fokker–Planck equations and Girsanov’s theorem. Key questions include consistency of score estimators, stability of reverse SDE solvers, and likelihood computation.
Starter references: Hyvärinen (2005/2007) Score Matching; Song & Ermon (2019) Generative Modeling by Estimating Gradients of the Data Distribution; Song et al. (2021) Score-Based Generative Modeling through SDEs; Song et al. (2020) Improved Techniques for Training Score-Based Generative Models.
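A minimal sketch of denoising score matching at a single noise level σ, using the common σ²-weighted form of the objective, ‖σ s_θ(x̃) + ε‖² with x̃ = x + σε (the two-layer score network and 2-D toy data are illustrative assumptions):

    import torch
    import torch.nn as nn

    score_net = nn.Sequential(nn.Linear(2, 128), nn.SiLU(), nn.Linear(128, 2))  # toy 2-D score model

    def dsm_loss(x, sigma=0.5):
        """Denoising score matching at one noise level: the score of the perturbation
        kernel N(x_tilde; x, sigma^2 I) is -(x_tilde - x) / sigma^2 = -noise / sigma."""
        noise = torch.randn_like(x)
        x_tilde = x + sigma * noise
        pred = sigma * score_net(x_tilde)      # sigma^2 weighting keeps the loss scale comparable across levels
        return ((pred + noise) ** 2).sum(-1).mean()

    x = torch.randn(128, 2)                    # dummy data batch
    loss = dsm_loss(x)
    loss.backward()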
-
Flow Matching (CNFs, CFM, Rectified Flows)
Flow Matching trains continuous normalizing flows by regressing a time-dependent velocity field along a chosen probability path between a simple base distribution and the data, avoiding ODE simulation during training. It connects to diffusion via the probability-flow ODE and admits practical variants: conditional/generalized flow matching for conditional tasks and rectified flows that learn near-straight trajectories for fast sampling. Typical mathematical highlights include deriving the FM objective from continuity equations and analyzing path choices (Gaussian vs. OT/displacement interpolation) and their impact on sample complexity and stability.
Starter references: Lipman et al. (2022) Flow Matching for Generative Modeling; Gagneux et al. (2025) A Visual Dive into Conditional Flow Matching.
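A minimal sketch of the conditional flow matching objective with the linear/displacement path x_t = (1 - t)x₀ + t x₁, whose conditional target velocity is x₁ - x₀ (the small velocity network and 2-D toy data are illustrative assumptions):

    import torch
    import torch.nn as nn

    # Toy velocity field v_theta(x, t) on 2-D data; the architecture is illustrative only.
    v_net = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))

    def cfm_loss(x1):
        """Conditional flow matching with the linear path x_t = (1 - t) x0 + t x1,
        whose conditional target velocity is simply x1 - x0."""
        x0 = torch.randn_like(x1)                 # base (noise) samples
        t = torch.rand(x1.shape[0], 1)            # uniform time in [0, 1]
        xt = (1.0 - t) * x0 + t * x1
        target = x1 - x0
        pred = v_net(torch.cat([xt, t], dim=-1))
        return ((pred - target) ** 2).sum(-1).mean()

    x1 = torch.randn(256, 2)                      # dummy data batch
    loss = cfm_loss(x1)
    loss.backward()

Sampling then amounts to integrating dx/dt = v_θ(x, t) from t = 0 to 1 with any ODE solver; rectified flows reuse the same regression on straightened couplings.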
-
Denoising Diffusion Models (Discrete Setting)
For categorical/sequential data, the forward process is a time-inhomogeneous Markov chain (e.g., multinomial corruption) that gradually randomizes symbols; the reverse model learns transition probabilities back to data. Design issues include choosing the corruption family (absorbing vs non-absorbing), parameterizing reverse kernels, and training via variational bounds or score-style objectives on simplices. Applications span text, protein sequences, and graphs.
Starter references: Austin et al. (2021) Structured Denoising Diffusion Models in Discrete State Spaces (D3PM).
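A minimal sketch of one forward step of a uniform (non-absorbing) multinomial corruption chain, corresponding to a D3PM-style transition matrix Q_t = (1 - β_t)I + β_t 𝟙𝟙ᵀ/K (vocabulary size and corruption schedule are toy values):

    import torch

    def uniform_corrupt(tokens, beta_t, vocab_size):
        """One forward step of a uniform multinomial corruption chain:
        with probability beta_t a token is resampled uniformly over the vocabulary."""
        resample = torch.rand(tokens.shape) < beta_t
        random_tokens = torch.randint(vocab_size, tokens.shape)
        return torch.where(resample, random_tokens, tokens)

    vocab_size = 27                                  # toy vocabulary (illustrative)
    x0 = torch.randint(vocab_size, (4, 16))          # batch of 4 sequences of length 16
    xt = x0
    for beta_t in [0.02, 0.05, 0.1]:                 # toy corruption schedule
        xt = uniform_corrupt(xt, beta_t, vocab_size)
    print((xt != x0).float().mean().item(), "fraction of symbols changed")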
-
Autoregressive Data Generation: RNNs and LSTMs
Autoregressive models factorize p(x) into products of conditionals (e.g., language models) and learn to predict the next token. Recurrent networks capture long-range dependencies through hidden states; LSTMs/GRUs mitigate vanishing gradients via gating mechanisms. Theoretical directions include expressivity, mixing properties of the induced Markov chains, and generalization under teacher forcing vs free-running.
Starter references: Hochreiter & Schmidhuber (1997) Long Short-Term Memory; Bengio et al. (2003) A Neural Probabilistic Language Model; Mikolov et al. (2010) RNN-Based Language Models.
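A minimal sketch of teacher-forced next-token training for an LSTM language model (PyTorch; the vocabulary size, layer widths, and random token batch are placeholders):

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 100, 64, 128   # illustrative sizes

    class TinyLSTMLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            h, _ = self.lstm(self.emb(tokens))
            return self.out(h)                       # next-token logits at every position

    model = TinyLSTMLM()
    tokens = torch.randint(vocab_size, (8, 32))      # dummy batch of token sequences
    logits = model(tokens[:, :-1])                   # teacher forcing: condition on the true prefix
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    loss.backward()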
-
Autoregressive Data Generation: Transformer Architecture
Transformers replace recurrence with attention, enabling parallelizable sequence modeling with strong inductive biases for long-range dependencies. In the autoregressive regime, causal masking yields powerful language models whose performance scales predictably with data, model size, and compute. Theory topics include attention expressivity, context length extrapolation, and scaling/compute optimality.
Starter references: Vaswani et al. (2017) Attention Is All You Need.
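A minimal sketch of the causal mask that makes self-attention autoregressive: position i attends only to positions ≤ i (single head, toy weight matrices instead of learned projections; all shapes are illustrative):

    import torch
    import torch.nn.functional as F

    def causal_self_attention(x, w_q, w_k, w_v):
        """Single-head causal attention: a lower-triangular mask blocks attention to future positions."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (seq, seq) attention logits
        seq_len = x.shape[-2]
        mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
        scores = scores.masked_fill(~mask, float("-inf"))         # causal mask: no peeking at the future
        return F.softmax(scores, dim=-1) @ v

    d_model = 32                                                  # illustrative model width
    x = torch.randn(10, d_model)                                  # one sequence of 10 tokens
    w = [torch.randn(d_model, d_model) / d_model**0.5 for _ in range(3)]
    out = causal_self_attention(x, *w)
    print(out.shape)                                              # torch.Size([10, 32])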