LSML 24: Large Scale Machine Learning
!!!! [PSL week|http://pslweek.fr] Spring Course 2024
!!!! C2MINES-07 Large-Scale Machine Learning
!!!! March 4-8, 2024
!!!! Mines Paris, 60 boulevard Saint-Michel, 75006 Paris, Room L.109
This course is co-organized by [Chloé-Agathe Azencott|http://cazencott.info] (MINES ParisTech & Institut Curie) and [Fabien Moutarde|http://perso.mines-paristech.fr/fabien.moutarde/] (MINES ParisTech).
[outline|http://cazencott.info/index.php/pages/LSML-24-Large-Scale-Machine-Learning#outline] |
[schedule|http://cazencott.info/index.php/pages/LSML-24-Large-Scale-Machine-Learning#schedule] |
[grading|http://cazencott.info/index.php/pages/LSML-24-Large-Scale-Machine-Learning#grade] |
[textbook|http://cazencott.info/index.php/pages/LSML-24-Large-Scale-Machine-Learning#textbook] |
[practical sessions|http://cazencott.info/index.php/pages/LSML-24-Large-Scale-Machine-Learning#practicals]
!!!! Outline
~outline~
Machine learning is a fast-growing field at the interface of mathematics, computer science and engineering, which provides computers with the ability to learn without being explicitly programmed, in order to make predictions or take rational actions. From cancer research to finance, natural language processing, marketing or self-driving cars, many fields are nowadays impacted by recent progress in machine learning algorithms that benefit from the ability to collect huge amounts of data and "learn" from them.
The goal of this intensive 5-day __advanced__ course is to present the theoretical foundations and the practical algorithms needed to tackle __large-scale__ machine learning and data mining problems, and to expose students to current applications and challenges of "big data" in science and industry.
__Prerequisites:__
* Numerical Python (i.e. familiarity with programming in Python and with the numpy, scipy and matplotlib libraries).
* Basics of machine learning (such as the content of the Apprentissage Artificiel course for Mines Paris – PSL students).
!!!! Schedule
~schedule~
__Monday, March 4, 2024__
*__09:00 – 12:15__ Lecture: __Introduction to large-scale ML & optimization__ (K. Antonenko, CBIO Mines Paris – PSL) \[[slides (pdf)|http://cazencott.info/dotclear/public/lectures/lsml24/2024-03-04_Antonenko_Lecture1.pdf]\].
*__13:45 – 17:00__ Practical session: __ML on large data with scikit-learn__; this session will also contain an introduction to scikit-learn for those who have not used the library before.
__Tuesday, March 5, 2024__
*__09:00 – 12:15__ Lecture: __Deep unsupervised learning and generative models__ (B. Sauvalle, CAOR Mines Paris – PSL) \[[slides (pdf)|http://cazencott.info/dotclear/public/lectures/lsml24/2024-03-05_Sauvalle_Lecture2.pdf]\].
*__13:45 – 17:00__ Practical session: __Deep learning, autoencoders and GANs with Python__.
__Wednesday, March 6, 2024__
*__09:00 – 12:15__ Lecture: __Natural Language Processing (NLP) with Recurrent Neural Networks and Transformers__ (A. Recanati, Sancare) \[[slides (pdf)|http://cazencott.info/dotclear/public/lectures/lsml24/2024-03-06_Recanati_Lecture3.pdf]\].
*__13:45 – 17:00__ Practical session: __NLP: word embeddings and RNNs__.
__Thursday, March 7, 2024__
*__09:00 – 12:15__ Practical session: __Stochastic Gradient Descent__.
*__13:45 – 17:00__ Lecture: __Systems for large-scale ML: focus on MapReduce__ (C.-A. Azencott, CBIO Mines Paris – PSL) \[[slides (pdf)|http://cazencott.info/dotclear/public/lectures/lsml24/2024-03-07_Azencott_Lecture4.pdf]\].
__Friday, March 8, 2024__
*__09:00 – 12:15__ Lecture: __Deep reinforcement learning__ (F. Moutarde, CAOR Mines Paris – PSL).
*__13:45 – 17:00__ Practical session: __Deep reinforcement learning with Python__.
""All course materials will be in English, but some lectures will be given in French.""
!!!! Grade
~grade~
If you are taking this class for credit (PASS/FAIL), you will be asked to turn in the notebooks of all your practical sessions.%%%
Total credits: 2 ECTS.
!!!! Practical sessions
~practicals~
Practical sessions will take the form of Jupyter notebooks available on the ""[course GitHub repo|https://github.com/chagaz/lsml24]"".
Please follow the instructions there to ""install Python3 and all the relevant packages."" An alternative (sometimes preferable for deep learning notebooks) is to use [Google Colab|https://colab.research.google.com/notebooks/intro.ipynb], for which you will need a Google account.
TAs: Alice Blondel (CBIO Mines Paris – PSL), Amandine Brunetto (CAOR Mines Paris – PSL), Simon de Moreau (CAOR Mines Paris – PSL), Waël Doulzami (CAOR Mines Paris – PSL), Gwenn Guichaoua (CBIO Mines Paris – PSL).
!!!! Textbook
~textbook~
There is no single textbook for this course, but the following resources are relevant:
* [Mining of massive datasets|http://www.mmds.org] by Leskovec, Rajaraman and Ullman;
* [Deep learning|http://www.deeplearningbook.org] by Goodfellow, Bengio and Courville;
* [Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity|https://learn.microsoft.com/en-us/events/neural-information-processing-systems-conference-nips-2016/large-scale-optimization-beyond-stochastic-gradient-descent-convexity] by Sra and Bach.
This course is __not__ an introductory course to machine learning! If you want to learn the basics, or need a refresher, we recommend:
* In French, the lectures of the [Parcours Data Scientist|https://openclassrooms.com/fr/paths/164-data-scientist] on OpenClassrooms (videos and texts freely available);
* In French, [Introduction au Machine Learning|https://cazencott.info/index.php/pages/Introduction-au-Machine-Learning] by Chloé-Agathe Azencott (Collection InfoSup, Dunod, 2022);
* In French, [Apprentissage statistique supervisé|https://www.techniques-ingenieur.fr/base-documentaire/42659210-big-data/download/h5010/apprentissage-statistique-supervise.html] by Fabien Moutarde in Techniques de l'Ingénieur;
* In English, [Machine learning by Andrew Ng|https://fr.coursera.org/learn/machine-learning] on Coursera;
* In English, [The elements of statistical learning|https://statweb.stanford.edu/~tibs/ElemStatLearn/] by Hastie, Tibshirani and Friedman;
* In English, [Pattern recognition and machine learning|http://www.springer.com/us/book/9780387310732] by Bishop.