## MA 2823 Foundations of Machine Learning (Fall 2016)

This is a course I am teaching at CentraleSupelec, as part of the Engineering Program as well as (this year with the help of Evangelia Zacharaki) the M.Sc. in Data Sciences & Business Analytics.

**Page content:**

+ course description

+ evaluation

+ resources

+ teaching team

+ schedule, slides, homework assignments & solutions

**Quick links:**
Kaggle project (pdf)

## Course description

Machine learning lies at the heart of data science. It is essentially the intersection of statistics and computation, though the principles of machine learning have been rediscovered in many different traditions, including artificial intelligence, Bayesian statistics, and frequentist statistics. In this course, we view machine learning as the automatic learning of a prediction function given a training sample of data (labeled or not).

Machine learning methods form the foundation of many successful companies and technologies in multiple domains. Their applications include, to name a few, search engines, robotics, bioinformatics analyses of genetic data, algorithmic trading, social network analysis, targeted advertising, computer vision, and machine translation.

This course gives an overview of the most important trends in machine learning, with a particular focus on statistical risk and its minimization with respect to a prediction function. A substantial lab section involves group projects on data science competitions and gives students the ability to apply the course theory to real-world problems.

This course is divided into 13 chapters of 1.5 hours each, as well as 9 labs of 1.5 hours each. The first two labs will be dedicated to a tutorial on the scikit-learn library for machine learning in Python. The other labs will give the students the opportunity to apply the course theory to a data science competition.

## Course evaluation

This course will be evaluated through a project report on the How Many Bikes? data science competition, as well as a written exam (on **December 16**).

Detailed instructions for the Kaggle project are available here (pdf).

Last year's exam (with solutions) is available here (pdf).

Grade breakdown (100 pts total):

- Written exam (pen and paper, closed book): 60 pts
- Project report (2 pages): 30 pts
- Homework assignments (10 total): 1 pt each, awarded for turning the assignment in
- Scribe extra credit: 5 pts

## Resources

Lecture slides, homework assignments and other supplementary materials will be made available on this website throughout the course.

Lecture notes and labs will be made available on the dedicated github repository.

Instructions for scribes are available here. Detailed instructions on how to fork the repository and make a pull request are available as pdf here.

## Teaching team

**Instructor:**

Chloé-Agathe Azencott `chloe-agathe.azencott@mines-paristech.fr`

**TAs:**

Benoît Playe `benoit.playe@mines-paristech.fr`

Mihir Sahasrabudhe `mihir.sahasrabudhe@centralesupelec.fr`

## Schedule

**Chap 1. Introduction (Sep 7)**

We introduce machine learning, its applications, and various classes of problems.

**Chap 2. Supervised learning (Sep 7)**

We introduce and formalize a core problem of machine learning: supervised learning, in which the data is labeled and the goal is to predict the label of new, unseen data points.

Concepts: classification and regression, hypothesis space, Vapnik-Chervonenkis dimension, probably approximately correct (PAC) learning, overfitting.

[slides (pdf)] [lecture notes] [homework (pdf)]

**Chap 3. Model evaluation and selection (Sep 14)**

We discuss the assessment and evaluation of supervised machine learning models.

Concepts: training and test sets, cross-validation, bootstrap, measures of performance for classification and regression, measures of model complexity.

Lab: Introduction to scientific Python.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)]
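
To make the idea of cross-validation concrete, here is a minimal sketch of how the sample indices can be split into k folds, each used once as the test set. The function name `k_fold_indices` is ours and purely illustrative; in the labs, a library such as scikit-learn would normally handle this, and the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Everything outside the current fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging the k test scores uses every data point for evaluation once.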

**Chap 4. Bayesian decision theory (Sep 21)**

We discuss the quantity to be optimized in statistical estimation, and its various finite sample approximations.

Concepts: Bayes rule, losses and risks, Bayes risk, maximum a posteriori.

Lab: Introduction to scikit-learn.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)]

**Chap 5. Linear and logistic regression (Sep 30)**

We introduce parametric approaches to supervised learning as well as the simplest linear models. We formulate linear regression as a maximum likelihood estimation problem and derive its estimator.

Concepts: parametric methods, maximum likelihood estimates, linear regression, logistic regression.

Lab: Introduction to Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)] [Kaggle project (pdf)] [proof of Gauss-Markov (pdf)]
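
As a brief sketch of the maximum likelihood formulation of linear regression, assume i.i.d. Gaussian noise and an invertible $X^\top X$. The notation below is ours and may differ from the slides.

```latex
% Model: linear signal plus i.i.d. Gaussian noise
y_i = \beta^\top x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Log-likelihood of the n observations
\log p(y \mid X, \beta) = -\frac{n}{2} \log(2 \pi \sigma^2)
    - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} \left( y_i - \beta^\top x_i \right)^2

% Maximizing the likelihood in \beta is minimizing the sum of squared errors
\hat{\beta}_{\mathrm{ML}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta^\top x_i \right)^2
    = (X^\top X)^{-1} X^\top y
```

Under these assumptions, the maximum likelihood estimate coincides with the ordinary least squares solution.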

**Chap 6. Regularized linear regression (Oct 7)**

We introduce the concept of regularization as a means of controlling the complexity of the hypothesis space, and apply it to linear models.

Concepts: Lasso, ridge regression, structured regularization.

Lab: Regularized linear regression + Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)]

**Chap 7. Nearest-neighbors methods (Oct 14)**

We introduce non-parametric methods, whose complexity grows with the size of the data sample. We illustrate them with nearest-neighbors approaches.

Concepts: non-parametric learning, nearest neighbor, k-nearest neighbors, instance-based learning, similarities, Voronoi tessellation, curse of dimensionality.

Lab: Nearest-neighbors methods + Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)]
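
The k-nearest neighbors rule can be sketched in a few lines of plain Python. This is an illustrative toy, not the lab code: the function name `knn_predict`, the Euclidean distance, and the data points are all our own assumptions.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

pred_near_origin = knn_predict(points, labels, (0.5, 0.5))
pred_far = knn_predict(points, labels, (5.5, 5.5))
print(pred_near_origin, pred_far)
```

Note that there is no training step: the model is the training set itself, which is why the complexity of the method grows with the size of the data sample.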

**Chap 8. Tree-based methods (Nov 4)**

We introduce decision trees, one of the most intuitive supervised learning algorithms, and show how to combine simple classifiers to yield state-of-the-art predictors.

Concepts: decision trees, ensemble methods, boosting, random forests.

Lab: Tree-based methods + Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf)]

**Chap 9. Support vector machines (Nov 18)**

We introduce a very popular class of machine learning methods that has achieved state-of-the-art performance on a wide range of tasks. We derive the support-vector machine from first principles in the case of linearly separable data, extend it to non-separable data, and show how positive-definite kernels can be used to extend the approach to non-linear separating functions.

Concepts: maximum margin, soft-margin SVM, non-linear data mapping, kernel trick, kernels.

Research talk: Beyrem Khalfaoui.

[handout (pdf)] [slides (pdf)] [lecture notes] [homework (pdf)]
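
As a small illustration of the kernel trick, the sketch below checks on one example that a degree-2 polynomial kernel equals an inner product in an explicit feature space. The function names are ours, and the homogeneous kernel $k(x, z) = (x \cdot z)^2$ is chosen only because its feature map is easy to write down.

```python
def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for the kernel above: all ordered products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]


x, z = (1, 2), (3, 4)

# Kernel evaluated directly in the input space (2 dimensions).
kernel_value = poly2_kernel(x, z)

# Same quantity as an inner product in the feature space (4 dimensions).
explicit_value = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))

print(kernel_value, explicit_value)
```

The kernel computes the feature-space inner product without ever building the feature vectors, which is what makes non-linear separating functions tractable.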

**Chap 10. Neural networks (Nov 25)**

We introduce the perceptron algorithm from Rosenblatt (1957), one of the earliest steps towards learning with computers, and discuss its many extensions.

Concepts: perceptrons, multi-layer networks, backpropagation.

Lab: SVMs + Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf) due 12-02]
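
The perceptron update rule can be sketched as follows. This is a minimal illustration under our own conventions (labels in {-1, +1}, zero initialization, a fixed learning rate) and not necessarily the exact variant presented in the lecture.

```python
def train_perceptron(samples, labels, epochs=100, learning_rate=1.0):
    """Train a perceptron on labels in {-1, +1}; returns (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            # Update only on misclassified points (or points on the boundary).
            if y * activation <= 0:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:  # every training point is correctly classified
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Toy data: logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

On linearly separable data such as this one, the algorithm stops after a finite number of updates; on non-separable data (for example XOR) it would simply run until `epochs` is exhausted.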

**Chap 11. Dimensionality reduction (Dec 2)**

We discuss how to approach high-dimensional learning problems, and present approaches to reduce this dimension.

Concepts: feature selection, wrapper approaches, principal component analysis, autoencoders.

Lab: Dimensionality reduction + Kaggle project.

[handout (pdf)] [slides (pdf)] [lecture notes] [lab] [homework (pdf) due 12-09]

**Chap 12. Clustering (Dec 9)**

We conclude this course by presenting the most common unsupervised learning problem, clustering: how to find groups within data that is given without labels.

Concepts: hierarchical clustering, k-means.

Lab: Kaggle project.
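
For reference, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The initialization from the first k points is a simplification we chose to keep the example deterministic; practical implementations use random or smarter initializations. The function name and the data are illustrative.

```python
import math


def k_means(points, k, n_iterations=100):
    """Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(n_iterations):
        # Assignment step: attach each point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:  # nothing changed: converged
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


# Two well-separated toy groups (made-up data).
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

The two steps alternate until the assignments stop changing; the result depends on the initialization, which is why the algorithm is usually restarted several times in practice.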