## MA 2823 Introduction to Machine Learning (Fall 2017)

This is a course I am teaching at CentraleSupélec, as part of the Engineering Program.

**Page content:**

+ course description

+ evaluation

+ resources

+ teaching team

+ schedule, slides, homework assignments & solutions

**Please download and read the complete syllabus**.

**Quick links:** project submission link · labs github page · project description · solution homework 10

exam June 2016 · exam December 2016

## Course description

Machine learning lies at the heart of data science. It is essentially the intersection between statistics and computation, though the principles of machine learning have been rediscovered from many different traditions, including artificial intelligence, Bayesian statistics, and frequentist statistics. In this course, we view machine learning as the automatic learning of a prediction function given a training sample of data (labeled or not).

Machine learning methods form the foundation of many successful companies and technologies in multiple domains. Their applications include, to name a few, search engines, robotics, bioinformatics analyses of genetic data, algorithmic trading, social network analysis, targeted advertising, computer vision, and machine translation.

This course gives an overview of the most important trends in machine learning, with a particular focus on statistical risk and its minimization with respect to a prediction function. A substantial lab section will let students apply the course to real-world data. Throughout the course, students will participate in a data science competition.

This course is divided into 13 chapters of 1.5 hours each, as well as 9 computer labs of 1.5 hours each.

## Course evaluation

This course will be evaluated through a project report on a data science competition, as well as a written exam (on **December 22**).

Ten percent of the grade will be awarded based on homework assignments. Homework solutions will be provided on the course's website. Please follow the **instructions to turn in your homework:**

- Homeworks must be submitted electronically as PDF files.
- Files should be named according to the following scheme: `HW<2-digits homework number>_<LastName>_<FirstName>.pdf`. Please strip all accents from your name. For instance, my first homework would be called `HW01_Azencott_ChloeAgathe.pdf`.
- Homeworks should be deposited at this URL.
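
If you want to generate the file name programmatically, the naming scheme above can be sketched in Python as follows. This is only an illustration, not an official tool: the function names `strip_accents` and `homework_filename` are hypothetical, and dropping hyphens and spaces is an assumption inferred from the `ChloeAgathe` example.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def homework_filename(number: int, last_name: str, first_name: str) -> str:
    """Build a name of the form HW<NN>_<LastName>_<FirstName>.pdf (hypothetical helper)."""

    def clean(name: str) -> str:
        # Assumption: hyphens, spaces and other non-alphanumeric characters
        # are dropped, as in the "ChloeAgathe" example.
        return re.sub(r"[^A-Za-z0-9]", "", strip_accents(name))

    # The homework number is zero-padded to two digits.
    return f"HW{number:02d}_{clean(last_name)}_{clean(first_name)}.pdf"


print(homework_filename(1, "Azencott", "Chloé-Agathe"))
# prints: HW01_Azencott_ChloeAgathe.pdf
```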

Grade breakdown (100 pts total):

- Written exam (pen and paper, closed book): 60 pts
- Project report (2 pages): 30 pts
- Homeworks (10 total): 1 pt each, based on turning it in.

## Resources

Lecture slides, homework assignments and their solutions, and other supplementary materials will be made available on this website throughout the course. Each week, a printable version of the slides will be made available on the website on the day prior to the lecture.

Labs will be made available on the dedicated GitHub repository.

There is no single textbook corresponding to this course, but the lectures will point to relevant sections of the following books (all available online for free):

- A Course in Machine Learning by Hal Daumé III
- The Elements of Statistical Learning by Hastie, Tibshirani and Friedman
- Learning with Kernels by Schölkopf and Smola
- Convex Optimization by Boyd and Vandenberghe

## Teaching team

**Instructor:**

Chloé-Agathe Azencott `chloe-agathe.azencott@mines-paristech.fr`

**TAs:**

Joseph Boyd `joseph.boyd@mines-paristech.fr`

Benoît Playe `benoit.playe@mines-paristech.fr`

Mihir Sahasrabudhe `mihir.sahasrabudhe@centralesupelec.fr`

## Schedule

**Chap 1. Introduction (Fr, Sep 29)**

We introduce machine learning, its applications, and various classes of problems.

Concepts: classification and regression, supervised and unsupervised learning, generalization.

**Chap 2. Notions of convex optimization (Fr, Sep 29)**

We introduce notions of convex optimization that will be useful throughout the course.

Concepts: quadratic optimization, quadratic optimization with constraints, Lagrange multipliers, gradient descent.

[slides (pdf)] [homework (pdf, with solution)]

**Chap 3. Dimensionality reduction (Mo, Oct 2)**

We discuss how to approach high-dimensional learning problems, and present approaches to reduce this dimension.

Concepts: feature selection, wrapper approaches, principal component analysis.

**Lab:** Dimensionality reduction.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 4. Model evaluation and selection (Fr, Oct 10)**

We discuss the assessment and evaluation of supervised machine learning models.

Concepts: training and test sets, cross-validation, bootstrap, measures of performance for classification and regression, measures of model complexity.

**Lab:** Introduction to scipy.optimize.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 5. Bayesian decision theory (Fr, Oct 13)**

We discuss the quantity to be optimized in statistical estimation, and its various finite sample approximations.

Concepts: Bayes rule, losses and risks, Bayes risk, maximum a posteriori.

**Lab:** Cross-validation, Naive Bayes, and Kaggle project.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

Instructions for the KaggleInClass project [pdf]

**Chap 6. Linear and logistic regression (Fr, Oct 20)**

We introduce parametric approaches to supervised learning as well as the simplest linear models. We formulate linear regression as a maximum likelihood estimation problem and derive its estimator.

Concepts: parametric methods, maximum likelihood estimates, linear regression, logistic regression.

**Lab:** Linear regression.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 7. Regularized linear regression (Fr, Nov 10)**

We introduce the concept of regularization as a means of controlling the complexity of the hypothesis space, and apply it to linear models.

Concepts: Lasso, ridge regression, structured regularization.

**Lab:** Regularized linear regression.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 8. Nearest-neighbors methods (Fr, Nov 17)**

We introduce non-parametric methods, whose complexity grows with the size of the data sample. We illustrate them with nearest-neighbors approaches.

Concepts: non-parametric learning, nearest neighbor, k-nearest neighbors, instance-based learning, similarities, Voronoi tessellation, curse of dimensionality.

**Lab:** Nearest-neighbors methods.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 9. Tree-based methods (Fr, Nov 24)**

We introduce decision trees, one of the most intuitive supervised learning algorithms, and show how to combine simple classifiers to yield state-of-the-art predictors.

Concepts: decision trees, ensemble methods, boosting, random forests.

**Lab:** Tree-based methods.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 10. Support vector machines (Fr, Dec 1)**

We introduce a very popular class of machine learning methods that has achieved state-of-the-art performance on a wide range of tasks. We derive the support-vector machine from first principles in the case of linearly separable data, extend it to non-separable data, and show how positive-definite kernels can be used to extend the approach to non-linear separating functions.

Concepts: maximum margin, soft-margin SVM, non-linear data mapping, kernel trick, kernels.

**Lab:** SVMs.

[handout (pdf)] [slides (pdf)] [lab] [homework (pdf, with solution)]

**Chap 11. Neural networks (Fr, Dec 8)**

We introduce the perceptron algorithm from Rosenblatt (1957), one of the earliest steps towards learning with computers, and discuss its many extensions.

Concepts: perceptrons, multi-layer networks, backpropagation.

**Demo:** Deep learning (by Joseph Boyd).

**Research talk** by Peter Naylor.

[handout (pdf)] [slides (pdf)] [TensorFlow Tutorial (J. Boyd)] [homework (pdf, with solution)]

**Chap 12. Clustering (Fr, Dec 15)**

We conclude this course by presenting the most common unsupervised learning problem, clustering: how to find groups within data that is given without labels.

Concepts: hierarchical clustering, k-means, DBSCAN.