MA2823: Introduction to Machine Learning
CentraleSupélec, Fall 2017
Chloé-Agathe Azencott
Centre for Computational Biology, Mines ParisTech
chloe-agathe.azencott@mines-paristech.fr
● Course material & contact
http://tinyurl.com/ma2823-2017
chloe-agathe.azencott@mines-paristech.fr
Slides thanks to Ethem Alpaydın, Matthew Blaschko, Trevor Hastie, Rob Tibshirani and Jean-Philippe Vert.
What is (Machine) Learning?
Why Learn?
● Learning: modifying a behavior based on experience. [F. Benureau]
● Machine learning: programming computers to
– model phenomena
– by means of optimizing an objective function
– using example data.
Why Learn?
● There is no need to “learn” to calculate payroll.
● Learning is used when
– human expertise does not exist (bioinformatics);
– humans are unable to explain their expertise (speech recognition, computer vision);
– complex solutions change in time (routing computer networks).
[Diagram: a classical program turns Data + Rules into Answers; machine learning turns Data + Answers into Rules.]
What about AI?
Artificial Intelligence
ML is a subfield of Artificial Intelligence.
– A system that lives in a changing environment must have the ability to learn in order to adapt.
– ML algorithms are building blocks that make computers behave more intelligently by generalizing rather than merely storing and retrieving data (like a database system would do).
Learning objectives
● Define machine learning.
● Given a problem:
– decide whether it can be solved with machine learning;
– decide as what type of machine learning problem you can formalize it (unsupervised: clustering, dimensionality reduction; supervised: classification, regression?);
– describe it formally in terms of design matrix, features, samples, and possibly target.
● Define a loss function (supervised setting).
● Define generalization.
What is machine learning?
● Learning general models from particular examples (data).
– Data is (mostly) cheap and abundant;
– knowledge is expensive and scarce.
● Example in retail: from customer transactions to consumer behavior. People who bought “Game of Thrones” also bought “Lord of the Rings”. [amazon.com]
● Goal: build a model that is a good and useful approximation to the data.
What is machine learning?
● Optimizing a performance criterion using example data or past experience.
● Role of statistics: build mathematical models to make inference from a sample.
● Role of computer science: efficient algorithms to
– solve the optimization problem;
– represent and evaluate the model for inference.
Zoo of ML Problems
Unsupervised learning
Learn a new representation of the data.
[Diagram: data X, an n × p matrix (images, text, measurements, omics data...) fed to an ML algorithm that outputs a new representation.]
Dimensionality reduction
Find a lower-dimensional representation.
[Diagram: data X, an n × p matrix (images, text, measurements, omics data...) fed to an ML algorithm that outputs an n × m matrix, with m < p.]
Dimensionality reduction
Find a lower-dimensional representation.
– Reduce storage space & computational time;
– remove redundancies;
– visualization (in 2 or 3 dimensions) and interpretability.
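As an illustration (not part of the original slides), a minimal dimensionality-reduction sketch with scikit-learn's PCA on synthetic data; the sizes n, p and m are arbitrary placeholders:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 10)            # design matrix: n = 100 samples, p = 10 features

pca = PCA(n_components=2)         # keep m = 2 dimensions, e.g. for visualization
X_reduced = pca.fit_transform(X)  # X_reduced has shape (100, 2)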
Clustering
Group similar data points together.
– Understand general characteristics of the data;
– infer some properties of an object based on how it relates to other objects.
Clustering: applications
– Customer segmentation: find groups of customers with similar buying behaviors.
– Topic modeling: group documents based on the words they contain to identify common topics.
– Image compression: find groups of similar pixels that can be easily summarized.
– Disease subtyping (cancer, mental health): find groups of patients with similar pathologies (at the molecular or symptom level).
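A minimal clustering sketch with scikit-learn's k-means, on synthetic unlabeled data (an illustrative assumption; the slides do not prescribe an algorithm):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(300, 2)                          # synthetic, unlabeled data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)                 # one cluster index (0, 1 or 2) per sample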
Supervised learning
Make predictions.
[Diagram: data X (n × p) and labels y (length n) fed to an ML algorithm that outputs a predictor, i.e. a decision function.]
Classification
Make discrete predictions.
[Diagram: data + labels fed to an ML algorithm that outputs a predictor.]
– Binary classification: two possible labels.
– Multi-class classification: more than two possible labels.
Classification: training set D
[Figure: animals plotted along two features, “good eater” and “human contact”; dogs are labeled + and cats are labeled −.]
Classification: applications
– Face recognition: identify faces independently of pose, lighting, occlusion (glasses, beard), make-up, hair style.
– Vehicle identification (self-driving cars).
– Character recognition: read letters or digits independently of different handwriting styles.
– Sound recognition: which language is spoken? Who wrote this music? What type of bird is this?
– Spam detection.
– Precision medicine: does this sample come from a sick or healthy person? Will this drug work on this patient?
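A minimal classification sketch with scikit-learn; the dataset (iris) and the classifier (logistic regression) are illustrative choices, not from the slides:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)              # X: (150, 4) design matrix; y: 3 classes
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))                      # discrete predictions (class indices)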
Regression
Make continuous predictions.
[Diagram: data + labels fed to an ML algorithm that outputs a predictor.]
Regression
[Figure: train occupancy plotted as a function of time of day.]
Regression: applications
– Click prediction: how many people will click on this ad? Comment on this post? Share this article on social media?
– Load prediction: how many users will my service have at a given time?
– Algorithmic trading: what will the price of this share be?
– Drug development: what is the binding affinity between this drug candidate and its target? What is the sensitivity of the tumor to this drug?
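A minimal regression sketch, loosely mimicking the train-occupancy figure with synthetic data (the linear relationship and noise level are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
hour = rng.uniform(0, 24, size=(200, 1))           # feature: time of day
occupancy = 3 * hour[:, 0] + rng.randn(200) * 5    # synthetic continuous target

reg = LinearRegression().fit(hour, occupancy)
print(reg.predict([[8.5]]))                        # continuous prediction at 8:30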
Supervised learning setting
– X: the n × p data matrix (or design matrix); its n rows are the samples (observations, data points) and its p columns are the features (variables, descriptors, attributes).
– y: the outcome (target, label), one value per sample.
– Binary classification: y ∈ {0, 1}^n (or {−1, +1}^n).
– Multi-class classification: y ∈ {1, ..., K}^n.
– Regression: y ∈ ℝ^n.
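To make the notation concrete, a hypothetical NumPy illustration of these conventions (the shapes are the point; the zero values are placeholders):

import numpy as np

n, p = 150, 4
X = np.zeros((n, p))               # design matrix: one row per sample, one column per feature
y_binary = np.zeros(n, dtype=int)  # binary labels, values in {0, 1}
y_multi = np.zeros(n, dtype=int)   # multi-class labels, values in {0, ..., K-1}
y_reg = np.zeros(n)                # regression targets, real values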
Hypothesis class
● Hypothesis class:
– the space of possible decision functions we are considering;
– chosen based on our beliefs about the problem.
Hypothesis class
[Figure: cars plotted by x1: price and x2: engine power; family cars vs. non-family cars.]
What shape do you think the discriminant should take?
Hypothesis class
● Belief: the decision function is a rectangle. A car is a family car if p1 ≤ price ≤ p2 AND e1 ≤ engine power ≤ e2.
[Figure: the same scatter plot with a rectangle bounded by p1, p2 on price and e1, e2 on engine power.]
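A minimal sketch of this rectangle hypothesis as a decision function; the numeric bounds below are hypothetical placeholders, not values from the slides:

def is_family_car(price, engine_power, p1=10_000, p2=25_000, e1=60, e2=120):
    """Rectangle hypothesis: label 1 (family car) inside the box, 0 outside.

    The bounds p1, p2 (price) and e1, e2 (engine power) are placeholder values.
    """
    return int(p1 <= price <= p2 and e1 <= engine_power <= e2)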
Loss function
● Loss function (or cost function, or risk): quantifies how far the decision function is from the truth (= oracle).
● E.g.:
– 0/1 loss (classification): L(y, f(x)) = 1 if f(x) ≠ y, 0 otherwise;
– squared loss (regression): L(y, f(x)) = (y − f(x))².
Loss function
● Empirical risk of f on dataset D = {(x_i, y_i), i = 1..n}:
R_D(f) = (1/n) Σ_{i=1}^n L(y_i, f(x_i)).
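To make these concrete, a hedged sketch of the two example losses and the empirical risk, assuming y_true and y_pred are NumPy arrays:

import numpy as np

def zero_one_loss(y_true, y_pred):
    """0/1 loss for classification: 1 where the prediction is wrong, 0 elsewhere."""
    return (y_true != y_pred).astype(float)

def squared_loss(y_true, y_pred):
    """Squared loss for regression."""
    return (y_true - y_pred) ** 2

def empirical_risk(loss, y_true, y_pred):
    """Average loss over the dataset D."""
    return np.mean(loss(y_true, y_pred))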
Supervised learning: 3 ingredients
A good and useful approximation:
● Choose a hypothesis class
– parametric methods: the decision function has a fixed form with a finite number of parameters (e.g. a linear function);
– non-parametric methods: e.g. nearest neighbor, where f(x) is the label of the training point closest to x (see the sketch after this slide).
● Choose a loss function L
– empirical error: the average of L over the training data (the empirical risk above).
● Choose an optimization procedure.
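A minimal sketch of the non-parametric example, one-nearest-neighbor, assuming NumPy arrays (not part of the original slides):

import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Non-parametric predictor: f(x) is the label of the training point closest to x."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]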
Generalization
A good and useful approximation:
● It’s easy to build a model that performs well on the training data.
● But how well will it perform on new data? “Predictions are hard, especially about the future.” (Niels Bohr)
– Learn models that generalize well.
– Evaluate whether models generalize well.
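One common way to evaluate generalization is to hold out a test set; a minimal sketch with scikit-learn (the dataset and classifier are illustrative choices, not from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = KNeighborsClassifier().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # often optimistic
print("test accuracy:", clf.score(X_test, y_test))     # estimates generalization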
[Word cloud: ML and related fields. Artificial intelligence, electrical engineering, signal processing, pattern recognition, engineering, knowledge discovery in databases, optimization, computer science, data mining, inference, big data, discriminant analysis, business, statistics, data science, induction.]
http://www.kdnuggets.com/2016/11/machine-learning-vs-statistics.html
Learning objectives
After this course, you should be able to:
– identify problems that can be solved by machine learning;
– formulate your problem in machine learning terms;
– given such a problem, identify and apply the most appropriate classical algorithm(s);
– implement some of these algorithms yourself;
– evaluate and compare machine learning algorithms for a particular task.
Course Syllabus
● Sep 29: 1. Introduction; 2. Convex optimization
● Oct 2: 3. Dimensionality reduction. Lab: Principal component analysis + Jupyter, pandas, and scikit-learn
● Oct 6: 4. Model selection. Lab: Convex optimization with scipy.optimize
● Oct 13: 5. Bayesian decision theory. Lab: Intro to Kaggle challenge
● Oct 20: 6. Linear regression. Lab: Linear regression
● Nov 10: 7. Regularized linear regression. Lab: Regularized linear regression
● Nov 17: 8. Nearest-neighbor approaches. Lab: Nearest-neighbor approaches
● Nov 24: 9. Tree-based approaches. Lab: Tree-based approaches
● Dec 01: 10. Support vector machines. Lab: Support vector machines
● Dec 08: 11. Neural networks. Deep learning (Joseph Boyd) + Bioimage informatics applications (Peter Naylor)
● Dec 15: 12. Clustering. Lab: Clustering