PSL week Spring Course 2019

Large-Scale Machine Learning

March 25-29, 2019

MINES ParisTech, 60 boulevard Saint-Michel, 75006 Paris

This course is co-organized by Chloé-Agathe Azencott (MINES ParisTech & Institut Curie) and Fabien Moutarde (MINES ParisTech). The RAMP is organized by Akin Kazakci.


Outline

Machine learning is a fast-growing field at the interface of mathematics, computer science and engineering, which provides computers with the ability to learn without being explicitly programmed, in order to make predictions or take rational actions. From cancer research to finance, natural language processing, marketing or self-driving cars, many fields are nowadays impacted by recent progress in machine learning algorithms that benefit from the ability to collect huge amounts of data and "learn" from them.

The goal of this intensive 5-day course is to present the theoretical foundations and practical algorithms to implement and solve large-scale machine learning and data mining problems, and to expose the students to current applications and challenges of "big data" in science and industry.

Schedule

The course is roughly organized into morning lectures (room L118) and afternoon practical sessions (rooms L117-L119-L120), with slight variations from day to day. Practical sessions are only open to PSL students who are officially enrolled in the course and taking it for credit. The RAMP challenge will contribute to your grade.

March 25, 2019

  • 09:30 – 12:30 Lecture (room L118): Introduction to large-scale ML & Optimization [slides (pdf)] by Chloé-Agathe Azencott (MINES ParisTech & Institut Curie).
  • 14:00 – 17:30 Practical session (rooms L117-L119-L120): Machine learning with Python.

March 26, 2019

  • 09:30 – 12:30 Lecture (room L118): Introduction to deep learning with Convolutional Networks [slides (pdf)] and Unsupervised deep learning of generative models [slides (pdf)] by Fabien Moutarde (MINES ParisTech).
  • 14:00 – 15:30 Practical session (room L118): Deep learning with Python/Keras.
  • 15:30 – 17:30 Practical session (rooms L117-L119-L120): Feature engineering with Python.

March 27, 2019

  • 09:30 – 12:30 Lecture (room L118): Large-Scale Natural Language Processing (NLP) [slides (pdf)] by Édouard Grave (Facebook AI Research Paris).
  • 14:00 – 15:30 Practical session (rooms L117-L119-L120): Stochastic gradient descent with Python.
  • 15:30 – 17:30 Practical session (rooms L117-L119-L120): RAMP.

March 28, 2019

March 29, 2019

  • 09:30 – 12:30 Practical session (rooms L117-L119-L120): RAMP.
  • 14:00 – 15:30 Exam.

Registration

PSL students must enroll officially through their institutions. The morning lectures are open to all, depending on space availability, and priority will be given to PSL members. Registration for morning lectures is closed!

Participants are expected to have working knowledge of basic linear algebra, probability, optimization and programming in Python. Ideally, they have had prior exposure to a basic machine learning course, such as the ES2A "Apprentissage artificiel" for MINES ParisTech students.

Grade

60% final written exam; 40% practical session.
Total credits: 2 ECTS.

Textbook

There is no single textbook for this course, but the following resources are relevant:

This course is not an introductory course to machine learning! If you want to learn the basics, or need a refresher, we recommend:

Practical sessions

To get started, clone, fork or download the labs github repository. You do not need to understand what git and github are for this course; nevertheless you'll find more information below.

Personal installation

If you want to do the labs on your own machine, you will need Python and all the relevant packages installed. The easiest way to install all the requirements is to install Anaconda. You can test your installation by downloading one or several of the SciPy 2016 notebooks, starting Anaconda and then Jupyter, and opening and running the notebook(s). If you already have Python and prefer a lighter setup, you can instead install only the required packages (numpy, scipy, matplotlib, seaborn, pandas, scikit-learn, and Jupyter notebooks) with pip.
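
As a quicker sanity check than running a full notebook, the short script below reports which of the packages listed above Python cannot find. It is only a sketch: the helper name missing_packages is made up for this example, scikit-learn is looked up under its import name sklearn, and using notebook as the import name for Jupyter notebooks is an assumption that may not match every installation.

```python
import importlib.util

# Import names for the packages listed above. scikit-learn is imported as
# "sklearn". ASSUMPTION: "notebook" is used as the import name for Jupyter
# notebooks; your installation may expose Jupyter under a different module.
REQUIRED = ["numpy", "scipy", "matplotlib", "seaborn", "pandas", "sklearn", "notebook"]


def missing_packages(names):
    """Return the top-level module names in `names` that Python cannot locate."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module, so the check is fast.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Save it to a file and run it with the same Python interpreter you plan to use for the labs. A package being found only means it is installed, not that it is a version the labs were written for.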

Git-what?

GitHub is a web-based repository hosting service that provides version control and source code management. GitHub is based on the git version control system. A version control system lets you automatically manage different versions and drafts of a document; in essence, it is the grownup version of lab1finalv2.2_chloe-copy-1.ipynb. You can read more about the benefits of version control here. Git (and GitHub) are widely used in tech nowadays.

GitHub offers both private and public repositories, and supports free accounts for academics. Here is a short tutorial on how to use GitHub to version-control your own copy of the labs:

  • Log onto GitHub (start by signing up if you do not have an account)
  • Create a fork of the lsml19 repository. A fork is a copy you own and can experiment with without changing the project. To do so: navigate to https://github.com/chagaz/lsml19 and click "Fork" in the upper right corner.
  • Download and install git if it is not installed on your computer. To do so, follow the official instructions. If you do not know whether Git is installed on your computer, try typing git in a terminal. If it returns a help message, then git is installed. If you'd rather use a graphical interface (a GUI) than the command line, have a look here.
  • Set up git, following these instructions
  • Clone your fork. This means you’ll get a local version on your computer (for now your fork only exists on GitHub’s servers):
    • On the GitHub website, navigate to your fork of the lsml19 repository. Its URL should be something like https://github.com/<yourusername>/lsml19.
    • Click "Clone or download" (on the top right)
    • Copy the URL that was just displayed (should be something like https://github.com/<yourusername>/lsml19.git)
    • In the terminal (I’m assuming Linux/macOS), navigate to where you want your copy to be. For example, if you want it under Desktop > Courses > 2019, type cd Desktop/Courses/2019.
    • Then type git clone <the URL you just copied>. You should see a message telling you the repository is downloading.

For more on creating forks, see here.

  • On your computer, edit the file you want to make changes to (for example, the first notebook).
  • Push your changes (i.e. send them from your computer to your GitHub account). To do this, from your lsml19 repository, do:
git add <name of the file you edited>
git commit -m "<Short message explaining your modification>"
git push origin master