

  1. COMP24111: Machine Learning and Optimisation Chapter 1: Machine Learning Basics Dr. Tingting Mu Email: tingting.mu@manchester.ac.uk

  2. Outline
     • We are going to learn the following concepts:
       – Machine learning.
       – Unsupervised, supervised, reinforcement learning.
       – Classification.
       – Regression.

  3. Machine Learning
     • “The goal of machine learning is to make a computer learn just like a baby — it should get better at tasks with experience.”
     • A machine learning system can be used to
       – Automate a process
       – Automate decision making
       – Extract knowledge from data
       – Predict future events
       – Adapt systems dynamically to enable better user experiences
       – …
     • How do we build a machine learning system?

  4. Machine Learning
     • Basic idea:
       – To represent experiences with data.
       – To convert a task to a parametric model.
       – To convert the learning quality to an objective function.
       – To determine the model through optimising the objective function.
     • Machine learning research builds on optimisation theory, linear algebra, probability theory, …
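The four steps above can be sketched in miniature. This is a toy illustration (the data, the one-parameter model, and the step size are invented for the example, not taken from the course):

```python
# The four steps on the slide, in miniature (toy numbers, for illustration):
data = [2.0, 4.0, 6.0]                # 1. represent experiences with data

def model(theta):                     # 2. a one-parameter model: predict a constant
    return theta

def objective(theta):                 # 3. learning quality as an objective: squared error
    return sum((model(theta) - x) ** 2 for x in data)

theta = 0.0                           # 4. determine the model by optimisation
for _ in range(100):                  #    (here, plain gradient descent)
    grad = sum(2.0 * (model(theta) - x) for x in data)
    theta -= 0.05 * grad

# theta converges to the mean of the data (4.0), the minimiser of the objective
```

The minimiser of the squared error for a constant predictor is the data mean, so the optimisation step recovers a quantity we can verify by hand.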

  5. Optimisation
     • Goal: to find the minimum (or maximum) of a real-valued function by
       – systematically choosing the values of the function input from an allowed set.
       – computing the value of the function using the chosen values.
     • We look at the example function f(x, y) = (x + 1)² sin(y), where the input x is allowed to be chosen from the set of real numbers between 0 and 3, and the input y from the set of real numbers between 0 and 5.
       – Its minimum is min f(x, y) subject to 0 ≤ x ≤ 3 and 0 ≤ y ≤ 5, also written min_{x ∈ [0,3], y ∈ [0,5]} f(x, y).
       – The chosen input that gives the minimum is [x*, y*] = argmin_{x ∈ [0,3], y ∈ [0,5]} f(x, y).
       – For the maximum case: max_{x ∈ [0,3], y ∈ [0,5]} f(x, y) and argmax_{x ∈ [0,3], y ∈ [0,5]} f(x, y).
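As an illustration (not from the slides), the example objective can be minimised numerically by systematically trying allowed inputs on a grid; `grid_minimise` and its step count are illustrative choices, not a course-prescribed algorithm:

```python
import math

def f(x, y):
    # The slide's example objective: f(x, y) = (x + 1)^2 * sin(y)
    return (x + 1) ** 2 * math.sin(y)

def grid_minimise(f, x_range, y_range, steps=500):
    """Systematically try allowed inputs on a grid and keep the smallest value."""
    best = None
    for i in range(steps + 1):
        x = x_range[0] + (x_range[1] - x_range[0]) * i / steps
        for j in range(steps + 1):
            y = y_range[0] + (y_range[1] - y_range[0]) * j / steps
            v = f(x, y)
            if best is None or v < best[0]:
                best = (v, x, y)
    return best

f_min, x_star, y_star = grid_minimise(f, (0.0, 3.0), (0.0, 5.0))
# Minimum is near -16, attained at x* = 3 (largest (x+1)^2) and y* ≈ 3π/2,
# where sin(y) reaches -1 inside the allowed set [0, 5].
```

Brute-force search only works for tiny problems; the point is that optimisation means choosing inputs from the allowed set and comparing objective values, exactly as the slide defines it.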

  6. Machine Learning in Data Science
     • Data is recorded on real-world phenomena. The world is driven by data.
       – Germany’s climate research centre generates 10 petabytes per year.
       – Google processes 24 petabytes per day.
       – PC users watched over 300 billion videos in August 2014 alone, with an average of 202 videos and 952 minutes per viewer.
       – There were 223 million credit card purchases in March 2016, with a total value of £12.6 billion, in the UK.
       – Photo uploads on Facebook are around 300 million per day.
       – Approximately 2.5 million new scientific papers are published each year.
       – …
     • What might we want to do with that data?
       – Prediction: what can we predict about this phenomenon?
       – Description: how can we describe/understand this phenomenon in a new way?
     • Humans can no longer handle data at such scale manually. A machine learning system can learn from data and offer insights.

  7. Machine Learning in A.I.
     • All of these are subfields of Artificial Intelligence (A.I.): speech recognition, speech synthesis, robotics, machine learning, natural language processing, data mining/analysis/engineering, computer vision, text mining.
     • Machine learning plays a significant role in A.I.

  8. School Courses
     • COMP24111, Machine Learning and Optimisation
     • COMP61011, Foundations of Machine Learning
     • COMP14112, Fundamentals of A.I.
     • COMP61021, Modelling and Visualization of High-Dimensional Data
     • COMP34120, AI and Games
     • COMP60711, Data Engineering
     • COMP38120, Documents, Services and Data on the Web
     • COMP37212, Computer Vision
     • COMP61332, Text Mining

  9. Example: Wine Classification
     • Wine experts identify the grape type by smelling and tasting the wine.
     • The chemist says that wines derived from different grape types differ in terms of alcohol, malic acid, alcalinity of ash, magnesium, color intensity, etc.
     • We can get the measurements, but there are too many numbers … We can build a machine learning system to automate grape type identification!

  10. Example: Wine Classification
     • Task: to identify the grape type of a wine sample based on the measured chemical quantities!
     • Feature extraction:
       – Collecting wine samples for each grape type.
       – Characterising each wine sample with 13 chemical features.
     • Experiences: 30 bottles in total, 10 bottles for each grape type, each bottle characterised by a 13-dimensional feature vector together with a class label:
       x₁ = [x₁,₁, x₁,₂, x₁,₃, …, x₁,₁₂, x₁,₁₃],  y₁ = grape type 1
       x₂ = [x₂,₁, x₂,₂, x₂,₃, …, x₂,₁₂, x₂,₁₃],  y₂ = grape type 2
       x₃ = [x₃,₁, x₃,₂, x₃,₃, …, x₃,₁₂, x₃,₁₃],  y₃ = grape type 2
       ⋮
       x₃₀ = [x₃₀,₁, x₃₀,₂, x₃₀,₃, …, x₃₀,₁₂, x₃₀,₁₃],  y₃₀ = grape type 1

  11. Example: Wine Classification
     • Design a mathematical model to predict the grape type. The model below is controlled by 14 parameters [w₁, w₂, …, w₁₃, b]. It takes the wine features x as input and outputs the predicted grape type ŷ:

       ŷ = g(x) = type 1, if Σᵢ₌₁¹³ wᵢ xᵢ + b ≥ 0
                  type 2, if Σᵢ₌₁¹³ wᵢ xᵢ + b < 0

       bottle 1:  x₁  ⇒ ŷ₁ = g(x₁)   ✔ (matches real grape type y₁)
       bottle 2:  x₂  ⇒ ŷ₂ = g(x₂)   ✗ (differs from real grape type y₂)
       ⋮
       bottle 30: x₃₀ ⇒ ŷ₃₀ = g(x₃₀)  ✔ (matches real grape type y₃₀)

     • System training is the process of finding the best model parameters by minimising a loss function (here, the predictive error):

       [w₁*, w₂*, …, w₁₃*, b*] = argmin_{w₁, w₂, …, w₁₃, b} O_loss(w₁, w₂, …, w₁₃, b)
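One concrete way to realise this training loop can be sketched as follows. This is an illustration only: the wine data here is synthetic (generated in the script), and the perceptron-style update is just one simple learning algorithm for this linear model, not necessarily the one used in the course:

```python
import random

random.seed(0)

# Hypothetical wine data: 13 chemical features per bottle; label +1 stands in
# for grape type 1 and -1 for grape type 2. Synthetic, for illustration only.
def make_bottle(label):
    centre = 1.0 if label == 1 else -1.0
    return [centre + random.gauss(0.0, 0.5) for _ in range(13)], label

data = [make_bottle(1) for _ in range(15)] + [make_bottle(-1) for _ in range(15)]

# The slide's model: predict type 1 if sum(w_i * x_i) + b >= 0, else type 2.
def predict(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Training: find parameters that reduce the predictive error. The perceptron
# update below nudges the decision boundary on every mistake.
w, b = [0.0] * 13, 0.0
for _ in range(100):
    for x, y in data:
        if predict(w, b, x) != y:
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y

errors = sum(1 for x, y in data if predict(w, b, x) != y)
```

Because the synthetic classes are well separated, the learned parameters classify every training bottle correctly; real chemical measurements would be messier.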

  12. Example: Wine Classification
     • Now, given an unseen bottle of wine with measured features
       x₁ = 12.25, x₂ = 3.88, x₃ = 2.2, x₄ = 18.5, x₅ = 112, x₆ = 1.38, x₇ = 0.78, x₈ = 0.29, x₉ = 1.14, x₁₀ = 8.21, x₁₁ = 0.65, x₁₂ = 2, x₁₃ = 855,
       the trained model predicts its grape type:

       ŷ = g(x) = type 1, if Σᵢ₌₁¹³ wᵢ* xᵢ + b* ≥ 0
                  type 2, if Σᵢ₌₁¹³ wᵢ* xᵢ + b* < 0
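Applying the trained decision rule is then a single weighted sum. The parameter values below are hypothetical placeholders (real values of w* and b* would come out of training); only the 13 feature values are taken from the slide:

```python
# Hypothetical trained parameters; real w* and b* would come from training.
w_star = [0.1] * 13
b_star = -50.0

# The unseen bottle's 13 measured chemical features, from the slide.
x = [12.25, 3.88, 2.2, 18.5, 112, 1.38, 0.78, 0.29, 1.14, 8.21, 0.65, 2, 855]

# Decision rule from the slide: type 1 if the weighted sum plus bias is >= 0.
score = sum(wi * xi for wi, xi in zip(w_star, x)) + b_star
grape_type = 1 if score >= 0 else 2
```

With these placeholder parameters the weighted sum is 0.1 × 1018.28 − 50 = 51.828, so the rule outputs grape type 1; different trained parameters could of course give a different answer.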

  13. Three Ingredients in Machine Learning
     • “Model” (final product): the thing you have to package up and send to a customer; a piece of code with some parameters that need to be optimised.
     • “Error function” (performance criterion): the function you use to judge how well the parameters of the model are set.
     • “Learning algorithm” (training): the algorithm that optimises the model parameters, using the error function to judge how well it is doing.

  14. Learning Type: Supervised
     • In supervised learning, there is a “teacher” who provides a target output for each data pattern. This guides the computer to build a predictive relationship between the data pattern and the target output.
     • The target output can be a real-valued number, an integer, a symbol, a set of real-valued numbers, a set of integers, or a set of symbols.
     • A training example (also called a sample) is a pair consisting of an input data pattern (also called an object) and a target output.
     • A test example is used to assess the strength and utility of a predictive relationship. Its target output is used only for evaluation purposes, and never contributes to the learning process.
     • Typical supervised learning tasks include classification and regression.

  15. Classification Examples: the target output is a category label.
     • Medical diagnosis: x = patient data, y = positive/negative for some pathology.
     • Optical character recognition: x = pixel values and writing curves, y = ‘A’, ‘B’, ‘C’, …
     • Image analysis: x = image pixel features, y = scene/objects contained in the image.
     • Weather: x = current & previous conditions per location, y = tomorrow’s weather.
     • … this list can never end; applications of classification are vast and extremely active!

  16. Regression Examples: the target output is a continuous number (or a set of such numbers).
     • Finance: x = current market conditions and other possible side information, y = tomorrow’s stock market price.
     • Social media: x = videos the viewer is watching on YouTube, y = viewer’s age.
     • Robotics: x = control signals sent to motors, y = the 3D location of a robot arm end effector.
     • Medical health: x = a number of clinical measurements, y = the amount of prostate specific antigen in the body.
     • Environment: x = weather data, time, door sensors, etc., y = the temperature at any location inside a building.
     • … this list can never end; applications of regression are vast and extremely active!
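A minimal regression sketch (toy data invented for the example, not from the slides): ordinary least squares fits the line y ≈ a·x + b that minimises the squared prediction error, using the closed-form solution for slope and intercept:

```python
# Toy data lying exactly on y = 2x + 1; the fitted line should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares solution for slope a and intercept b.
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x
```

The same three ingredients appear again: a parametric model (the line), an error function (squared error), and a learning algorithm (here a closed-form solution rather than an iterative optimiser).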

  17. Successful Applications • Convert speech to text, translate from one language to another.

  18. Successful Applications • Face recognition

  19. Successful Applications • Object recognition, speech synthesis, information retrieval.

  20. Learning Type: Unsupervised
     • In unsupervised learning, there is no explicit “teacher”.
     • The system forms a natural “understanding” of the hidden structure in unlabelled data.
     • Typical unsupervised learning tasks include
       – Clustering: group similar data patterns together.
       – Generative modelling: estimate the distribution of the observed data patterns.
       – Unsupervised representation learning: remove noise, capture data statistics, capture inherent data structure.
     (MATLAB examples shown; clustering image from https://cambridge-intelligence.com/keylines-network-clustering/)
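A minimal clustering sketch (the 1-D data and the choice k = 2 are illustrative, not from the slides): k-means alternates between assigning each point to its nearest centre and recomputing each centre as the mean of its cluster:

```python
import random

random.seed(0)

# Two unlabelled 1-D blobs; with k = 2, k-means should recover them.
points = [random.gauss(0.0, 0.3) for _ in range(20)] + \
         [random.gauss(5.0, 0.3) for _ in range(20)]

centres = [points[0], points[-1]]      # crude initialisation: one point per blob
for _ in range(10):                    # alternate assignment and update steps
    clusters = [[], []]
    for p in points:
        nearest = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
        clusters[nearest].append(p)
    centres = [sum(c) / len(c) for c in clusters]
```

No labels are used anywhere: the structure (two groups around 0 and 5) is discovered from the data alone. In practice one would use a library implementation such as scikit-learn’s KMeans, which also handles empty clusters and multiple restarts.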

  21. Successful Applications • Document clustering and visualisation
