Machine Learning: Basic Principles Teaching demonstration Kalle Palomäki Department of Signal Processing and Acoustics Aalto University
Content
1. Goal
2. Machine learning: definition
3. Classification – an important machine learning approach
4. A machine learning problem
  • Hands-on problem solving
  • Demonstration
5. Summary
Goal
• Part of introductory sessions, adjusted to 20 minutes
• 4th-year students with no background in machine learning
• Start building an understanding of machine learning by
  • Concrete examples
  • Solving simple hands-on problems
Machine learning - definition Wikipedia: “Machine learning deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions”
Common-sense definition: machines that learn a little like the brain http://oldentech.files.wordpress.com/2010/07/1028528_29880053.jpg http://www.paranormalpeopleonline.com/boskop-man-big-brains-and-increased-intelligence/
Internet and machine learning: far beyond a single brain's capacity http://www.slate.com/blogs/future_tense/2014/10/24/internet_sleep_new_research_from_usc_shows_internet_activity_changes_in.html
Machine learning categories
• Supervised learning
  • Classification
• Unsupervised learning
  • Clustering
• Reinforcement learning
Classifier
Problem Lisa is a tailor... http://ecx.images-amazon.com/images/I/51f9cnKx90L._SY300_.jpg http://upload.wikimedia.org/wikipedia/commons/3/39/Leonardo_da_Vinci_043-mod.jpg
Lisa makes uniforms. Salvation Army uniforms: men have trousers, women have skirts. http://www.bilerico.com/2009/03/Army%20Uniforms.jpg
Sometimes she makes mistakes. These should be skirts.
Once she made a skirt for Prince Charles! http://i.dailymail.co.uk/i/pix/2009/05/21/article-1186234-050B9CB2000005DC-834_224x423.jpg
[Figure with labels: Waist, Hip]
Here is Lisa’s data

waist (cm)   hip (cm)   gender
29.6         34.4       Female
28.9         34.4       Female
31.3         34.5       ???
30.8         33.7       Male
29.8         34.5       ???
32.5         33.6       Male
30.6         34.4       ???
.....        .....      .......
[Figure: plot of Lisa’s data. Female samples: red. Male samples: blue. Missing gender information: *]
Some help for Lisa?
Discuss in pairs for 2 minutes:
• How would you approach this problem?
• What kind of algorithm would you design?
• Try to come up with some ideas, please!
• Use the picture provided to assist your discussion
K-nearest neighbours algorithm
1. Determine K, the number of nearest neighbours.
2. Calculate the distance between the test sample and all the training samples, using the Euclidean distance measure: $d(\mathbf{x}, \mathbf{y}_i) = \sqrt{\sum_{j=1}^{M} (x_j - y_{ij})^2}$, where $\mathbf{x}$ is the test sample, $\mathbf{y}_i$ is training sample $i$, and $M$ is the data dimension.
3. Sort the distances and determine the K nearest neighbours.
4. Gather the categories of the nearest neighbours.
5. Use majority voting to predict the class of the test sample.
http://people.revoledu.com/kardi/tutorial/KNN/
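The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the slides: the function name `knn_classify` is hypothetical, and the training set is just the four labelled rows visible in Lisa's table, so the printed class illustrates the mechanics rather than a result on her full data.

```python
import math
from collections import Counter


def knn_classify(test_sample, training_samples, training_labels, k=3):
    """Predict a class by majority vote among the k nearest training samples."""
    # Step 2: Euclidean distance from the test sample to every training sample.
    distances = [math.dist(test_sample, sample) for sample in training_samples]
    # Step 3: sort the training indices by distance and keep the k nearest.
    nearest = sorted(range(len(training_samples)), key=lambda i: distances[i])[:k]
    # Step 4: gather the categories of the nearest neighbours.
    neighbour_labels = [training_labels[i] for i in nearest]
    # Step 5: majority vote.
    return Counter(neighbour_labels).most_common(1)[0][0]


# (waist cm, hip cm): only the four labelled rows shown in Lisa's table.
training_samples = [(29.6, 34.4), (28.9, 34.4), (30.8, 33.7), (32.5, 33.6)]
training_labels = ["Female", "Female", "Male", "Male"]

# Step 1: choose K, then classify one of the rows marked "???".
predicted = knn_classify((31.3, 34.5), training_samples, training_labels, k=3)
print(predicted)
```

With two classes, an odd K avoids tied votes. `math.dist` requires Python 3.8 or later.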
K = 3
[Figure: plot of Lisa’s data. Female samples: red. Male samples: blue. Missing gender information: *]
Euclidean distance
$d(\mathbf{x}, \mathbf{y}_i) = \sqrt{\sum_{j=1}^{M} (x_j - y_{ij})^2}$
• $\mathbf{x}$: test sample
• $\mathbf{y}_i$: training samples
• $i$: training sample index
• $j$: dimension index
• $M$: data dimension (here M = 2: waist and hip)
http://people.revoledu.com/kardi/tutorial/KNN/
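As a small sketch of the distance computation for M = 2, the function below sums the squared differences over the dimension index and takes the square root. The name `euclidean_distance` is hypothetical; the two samples are rows taken from Lisa's table (one with missing gender, one male).

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two samples with the same dimension M."""
    if len(x) != len(y):
        raise ValueError("samples must have the same dimension")
    # Sum the squared differences over the dimension index, then take the root.
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))


test_sample = (31.3, 34.5)      # (waist cm, hip cm), gender missing
training_sample = (30.8, 33.7)  # (waist cm, hip cm), Male

d = euclidean_distance(test_sample, training_sample)
print(round(d, 3))
```

The same function works unchanged for any data dimension M, since it simply iterates over the paired coordinates.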
[Figure: female and male samples of training data with the test sample (*). The Euclidean distances d1 to d6 from the test sample to the training samples are shown one at a time.]
[Figure: female and male samples of training data with the test sample (*) and its 3 nearest neighbours.] All 3 nearest neighbours are male, so the test sample is classified as male.
[Figure: female and male samples of training data with another test sample (*) and its 3 nearest neighbours.] 2 neighbours are female and 1 neighbour is male. There are more females than males, so the test sample is classified as female.
Classification problem
Lisa has lost the gender information of one of her customers and does not know whether to make a skirt or trousers. She is planning to toss a coin. Can you help her make a better decision?

The customer with missing gender information: Gender ------, Waist 28 cm, Hip 34 cm

Training data:
gender   waist (cm)   hip (cm)
Male     28           32
Male     33           35
Female   27           33
Female   31           36

http://www.dcs.gla.ac.uk/~srogers/firstcourseml/matlab/chapter5/knnexample.html#1
Molarius A, Seidell JC, Sans S, Tuomilehto J, Kuulasmaa K. (1999) "Waist and hip circumferences, and waist-hip ratio in 19 populations of the WHO MONICA Project", International Journal of Obesity and Related Metabolic Disorders: J. Internat. Association Study Obesity, 23:116-125.
Solution (K = 3)
Test sample: waist 28, hip 34

gender   waist (cm)   hip (cm)   squared distance              rank   in neighbourhood?   gender if in neighbourhood
Male     28           32         (28-28)^2 + (34-32)^2 = 4     2      Yes                 Male
Male     33           35         (28-33)^2 + (34-35)^2 = 26    4      No                  -----
Female   27           33         (28-27)^2 + (34-33)^2 = 2     1      Yes                 Female
Female   31           36         (28-31)^2 + (34-36)^2 = 13    3      Yes                 Female

The values shown are squared distances. Taking the square root does not change their order, so the ranking is the same as for the Euclidean distance.
Votes: Male 1, Female 2
Number of Female > Number of Male
Class: Female
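The worked solution can be checked with a short script. This is a sketch under the assumption that K = 3, as the three "Yes" entries in the neighbourhood column imply; the variable names are illustrative. It uses squared distances, as the slide does, because omitting the square root leaves the ranking unchanged.

```python
from collections import Counter

# (gender, waist cm, hip cm) from the classification problem slide.
training = [
    ("Male", 28, 32),
    ("Male", 33, 35),
    ("Female", 27, 33),
    ("Female", 31, 36),
]
test_waist, test_hip = 28, 34
K = 3  # assumed from the three "Yes" entries in the solution table

# Squared distance from the test sample to each training sample.
squared = [(test_waist - w) ** 2 + (test_hip - h) ** 2 for _, w, h in training]

# Training indices ordered from nearest to farthest, and each sample's rank.
order = sorted(range(len(training)), key=lambda i: squared[i])
ranks = [order.index(i) + 1 for i in range(len(training))]

# Majority vote among the K nearest neighbours.
votes = Counter(training[i][0] for i in order[:K])
predicted = votes.most_common(1)[0][0]

for (gender, w, h), sq, rank in zip(training, squared, ranks):
    print(gender, w, h, sq, rank, "Yes" if rank <= K else "No")
print(dict(votes), predicted)
```

Changing `K` to 1 keeps only the nearest sample (Female, waist 27, hip 33), which gives the same class for this customer.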
Summary
• We briefly addressed the principles of machine learning
1. First we defined machine learning
2. Introduced classification as an important machine learning task
3. Solved a hands-on classification problem using the K-nearest neighbour algorithm
• Check out my website for
  • These slides
  • Exercise
  • The code for the decision border calculations in the previous slides
http://users.spa.aalto.fi/kpalomak/demonstration_session
What next
• Supervised learning
  • Classification
• Unsupervised learning
  • Clustering
• Reinforcement learning
Face recognition http://cs.nyu.edu/~roweis/data.html
Speech recognition Spectrum over time for “cat” k a t
Searches
Recommend
More recommend