
Introduction to Machine Learning, Lecture 1: Introduction and Linear Regression - PowerPoint PPT Presentation



  1. 1 Introduction to Machine Learning Lecture 1: Introduction and Linear Regression Iasonas Kokkinos Iasonas.kokkinos@gmail.com University College London

  2. 2 Lecture outline Introduction to the course Introduction to Machine Learning Least squares

  3. 3 Machine Learning Principles, methods, and algorithms for learning and prediction based on past evidence Goal: Machines that perform a task based on experience, instead of explicitly coded instructions Why? • Crucial component of every intelligent/autonomous system • Important for a system’s adaptability • Important for a system’s generalization capabilities • Attempt to understand human learning

  4. 4 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised/semi-supervised Some data supervised, some unsupervised • Reinforcement learning Supervision: sparse reward for a sequence of decisions

  5. 5 Classification • Based on our experience, should we give a loan to this customer? – Binary decision: yes/no Decision boundary

  6. 6 Classification examples • Digit Recognition • Spam Detection • Face detection

  7. 7 ‘Faceness function’: classifier [figure: decision boundary separating Face from Background]

  8. 8 Test time: deploy the learned function • Scan window over image – Multiple scales – Multiple orientations • Classify window as either: – Face – Non-face Face Window Classifier Non-face
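  As an illustration of the scan-and-classify procedure on slide 8, here is a minimal sliding-window sketch in Python; classify_window is a hypothetical stand-in for the learned face/non-face function, and the window size and stride are made-up values.

    import numpy as np

    def classify_window(patch):
        # Hypothetical stand-in for the learned face/non-face function f_w.
        return patch.mean() > 0.5

    def detect_faces(image, window=24, stride=8):
        # Scan a fixed-size window over the image and classify each patch;
        # a real detector repeats this over multiple scales and orientations.
        detections = []
        for y in range(0, image.shape[0] - window + 1, stride):
            for x in range(0, image.shape[1] - window + 1, stride):
                patch = image[y:y + window, x:x + window]
                if classify_window(patch):
                    detections.append((x, y))
        return detections

    print(detect_faces(np.random.rand(64, 64)))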

  9. 9 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised Some data supervised, some unsupervised • Reinforcement learning Supervision: reward for a sequence of decisions

  10. 10 Regression • Output: Continuous – E.g. price of a car based on years, mileage, condition, …

  11. 11 Computer vision example • Human pose estimation: from image to vector-valued pose estimate

  12. 12 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised Some data supervised, some unsupervised • Reinforcement learning Supervision: reward for a sequence of decisions

  13. 13 Clustering • Break a set of data into coherent groups – Labels are ‘invented’

  14. 14 Clustering examples • Spotify recommendations

  15. 15 Clustering examples • Image segmentation

  16. 16 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised Some data supervised, some unsupervised • Reinforcement learning Supervision: reward for a sequence of decisions

  17. 17 Dimensionality reduction & manifold learning • Find a low-dimensional representation of high-dimensional data – Continuous outputs are ‘invented’

  18. 18 Example of nonlinear manifold: faces. The average of two faces, $\frac{1}{2}(x_1 + x_2)$, is not a face.

  19. 19 Moving along the learned face manifold Trajectory along the “male” dimension Trajectory along the “young” dimension Lample et al., Fader Networks, NIPS 2017

  20. 20 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised/semi supervised Partially supervised • Reinforcement learning Supervision: reward for a sequence of decisions

  21. 21 Weakly supervised learning: only part of the supervision signal Supervision signal: “motorcycle” Inferred localization information

  22. 22 Weakly supervised learning: only part of the supervision signal Supervision signal: “motorcycle” Inferred localization information

  23. 23 Semi-supervised learning: only part of the data labelled Labelled data Labelled + unlabelled data

  24. 24 Machine Learning variants • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction • Weakly supervised/semi supervised learning Some data supervised, some unsupervised • Reinforcement learning Supervision: reward for a sequence of decisions

  25. 25 Reinforcement learning • Agent interacts with environment repeatedly – Take actions, based on state – (occasionally) receive rewards – Update state – Repeat • Goal: maximize cumulative reward
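  To make the interaction loop on slide 25 concrete, here is a self-contained toy sketch; the chain environment and random agent are invented for illustration, and a real RL agent would improve its policy from the rewards it receives.

    import random

    class ToyEnv:
        # Invented 1-D chain environment: move left/right, sparse reward at the right end.
        def __init__(self, length=5):
            self.length = length
        def reset(self):
            self.pos = 0
            return self.pos
        def step(self, action):                      # action: 0 = left, 1 = right
            self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
            done = self.pos == self.length - 1
            return self.pos, (1.0 if done else 0.0), done

    class RandomAgent:
        # Placeholder policy; a learning agent would update itself in update().
        def act(self, state):
            return random.choice([0, 1])
        def update(self, state, action, reward):
            pass

    def run_episode(env, agent):
        state, total, done = env.reset(), 0.0, False
        while not done:
            action = agent.act(state)                # take action based on state
            state, reward, done = env.step(action)   # update state, (occasionally) receive reward
            agent.update(state, action, reward)
            total += reward
        return total                                 # goal: maximize cumulative reward

    print(run_episode(ToyEnv(), RandomAgent()))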

  26. 26 Reinforcement learning examples • Beat human champions in games: Backgammon (1990s), Go (2015) • Robotics

  27. 27 Focus of first part: supervised learning • Supervised – Classification – Regression • Unsupervised – Clustering – Dimensionality Reduction, Manifold Learning • Weakly supervised Some data supervised, some unsupervised • Reinforcement learning Supervision: reward for a sequence of decisions

  28. 28 Classification: yes/no decision

  29. 29 Regression: continuous output

  30. 30 What we want to learn: a function • Input-output mapping $y = f_w(x)$

  31. 31 What we want to learn: a function • Input-output mapping $y = f_w(x)$: $y$ is the prediction, $f$ the method, $x$ the input, $w$ the parameters

  32. 32 What we want to learn: a function $y = f_w(x)$ • Input $x \in \mathbb{R}$: calculus • Input $x \in \mathbb{R}^D$: vector calculus • Machine learning: can work also for discrete inputs, strings, trees, graphs, …

  33. 33 What we want to learn: a function $y = f_w(x)$ • Classification: $y \in \{0, 1\}$ • Regression: $y \in \mathbb{R}$

  34. 34 What we want to learn: a function $y = f_w(x)$ • Methods: linear classifiers, neural networks, decision trees, ensemble models, probabilistic classifiers, …

  35. 35 Example of method: K-nearest neighbor classifier [figure panels: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor] – Compute distance to other training records – Identify the K nearest neighbors – Take majority vote
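  A minimal Python sketch of these three steps, using Euclidean distance; the toy points and labels below are made up.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        dists = np.linalg.norm(X_train - x, axis=1)             # 1. distances to all training records
        nearest = np.argsort(dists)[:k]                         # 2. indices of the K nearest neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]   # 3. majority vote over their labels

    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])  # toy training points in R^2
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))             # -> 0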

  36. 36 Training data for NN classifier (in $\mathbb{R}^2$)

  37. 37 1-nn classifier prediction (in $\mathbb{R}^2$)

  38. 38 3-nn classifier prediction

  39. 39 Method example: decision tree Machine learning: can work also for discrete inputs, strings, trees, graphs, …

  40. 40 Method example: decision tree

  41. 41 Method example: decision tree What is the depth of the decision tree for this problem?
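  The answer depends on the data, which appears only in the slide's figure. As a hedged illustration, the following scikit-learn snippet fits a tree on a made-up XOR-style dataset (not the slide's data) and reads off the resulting depth.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # made-up XOR-style data
    y = np.array([0, 1, 1, 0])                       # not linearly separable
    tree = DecisionTreeClassifier().fit(X, y)
    print(tree.get_depth())                          # depth of the fitted tree (2 here)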

  42. 42 Method example: linear classifier [figure axes: feature coordinates i and j]

  43. 43 Method example: neural network

  44. 44 Method example: neural network

  45. 45 Method example: neural network

  46. 46 We have two centuries of material to cover! https://en.wikipedia.org/wiki/Least_squares The first clear and concise exposition of the method of least squares was published by Legendre in 1805. The technique is described as an algebraic procedure for fitting linear equations to data and Legendre demonstrates the new method by analyzing the same data as Laplace for the shape of the earth. The value of Legendre's method of least squares was immediately recognized by leading astronomers and geodesists of the time.

  47. 47 What we want to learn: a function • Input-output mapping $y = f_w(x) = f(x; w)$: $y$ is the prediction, $f$ the method, $x$ the input, $w$ the parameters, with $w \in \mathbb{R}$ or $w \in \mathbb{R}^K$

  48. 48 Assumption: linear function $y = f_w(x) = f(x; w) = w^T x$. Inner product: $w^T x = \langle w, x \rangle = \sum_{d=1}^{D} w_d x_d$, with $x \in \mathbb{R}^D$, $w \in \mathbb{R}^D$.
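  A two-line numerical check of the inner-product identity above, with toy values for $w$ and $x$.

    import numpy as np

    w = np.array([0.5, -1.0, 2.0])                        # toy parameter vector
    x = np.array([1.0, 3.0, 0.5])                         # toy input
    print(w @ x)                                          # w^T x
    print(sum(w[d] * x[d] for d in range(len(w))))        # same value: sum_d w_d x_d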

  49. 49 Reminder: linear classifier. $x_i$ positive: $x_i \cdot w + b \geq 0$; $x_i$ negative: $x_i \cdot w + b < 0$. Each data point has a class label $y_t \in \{+1, -1\}$. [figure axes: feature coordinates i and j]

  50. 50 Question: which one? $x_i$ positive: $x_i \cdot w + b \geq 0$; $x_i$ negative: $x_i \cdot w + b < 0$. Each data point has a class label $y_t \in \{+1, -1\}$. [figure axes: feature coordinates i and j]
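  The decision rule from slides 49-50, written as a small sketch; the weight vector and bias below are arbitrary illustrative values.

    import numpy as np

    def classify(x, w, b):
        # x positive if x.w + b >= 0, negative otherwise, matching the slide's rule.
        return +1 if x @ w + b >= 0 else -1

    w, b = np.array([1.0, -1.0]), 0.5                     # arbitrary illustrative boundary
    print(classify(np.array([2.0, 1.0]), w, b))           # +1
    print(classify(np.array([0.0, 2.0]), w, b))           # -1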

  51. 51 Linear regression in 1D

  52. 52 Linear regression in 1D. Training set: input-output pairs $S = \{(x_i, y_i)\},\ i = 1, \dots, N$, with $x_i \in \mathbb{R}$, $y_i \in \mathbb{R}$.

  53. 53 Linear regression in 1D: $y_i = w_0 + w_1 x_{i1} + \epsilon_i = w_0 x_{i0} + w_1 x_{i1} + \epsilon_i = w^T x_i + \epsilon_i$, with $x_{i0} = 1\ \forall i$.
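  Setting $x_{i0} = 1$ just prepends a constant feature so the intercept $w_0$ is absorbed into $w^T x_i$; a small numerical sketch with toy inputs.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])                    # toy 1-D inputs x_i
    X = np.column_stack([np.ones_like(x), x])             # rows (x_i0, x_i1) = (1, x_i)
    w = np.array([0.5, 2.0])                              # (w_0, w_1)
    print(X @ w)                                          # equals w_0 + w_1 * x_i for every i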

  54. 54 Sum of squared errors criterion: $y_i = w^T x_i + \epsilon_i$. Loss function: sum of squared errors $L(w) = \sum_{i=1}^{N} (\epsilon_i)^2$. Expressed as a function of two variables: $L(w_0, w_1) = \sum_{i=1}^{N} \big( y_i - (w_0 x_{i0} + w_1 x_{i1}) \big)^2$. Question: what is the best (or least bad) value of $w$? Answer: least squares.
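  The "least squares" answer can be sketched numerically: the minimizer of $L(w)$ solves the normal equations $X^T X w = X^T y$, a standard result that follows from the zero-gradient condition developed on the next slides; the data below are synthetic and invented here.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    y = 1.0 + 3.0 * x + 0.1 * rng.standard_normal(20)     # synthetic data, invented here
    X = np.column_stack([np.ones_like(x), x])             # x_i0 = 1 absorbs the intercept
    w = np.linalg.solve(X.T @ X, X.T @ y)                 # normal equations: X^T X w = X^T y
    print(w)                                              # close to (1.0, 3.0)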

  55. 55 Calculus 101 [figure: graph of a function $f(x)$ with a point $x^*$ marked on the $x$-axis]

  56. 56 Calculus 101: $x^* = \arg\max_x f(x)$ [figure: graph of $f(x)$ with its maximizer $x^*$]

  57. 57 Condition for maximum: derivative is zero. $x^* = \arg\max_x f(x)$ [figure: graph of $f(x)$ with maximizer $x^*$]

  58. 58 Condition for maximum: derivative is zero. $x^* = \arg\max_x f(x) \;\Rightarrow\; f'(x^*) = 0$

  59. 59 Condition for minimum: derivative is zero. $x^* = \arg\min_x f(x) \;\Rightarrow\; f'(x^*) = 0$
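  A one-line worked example of this rule (an illustration, not from the slides): for $f(x) = (x - 3)^2$,

    f'(x) = 2(x - 3) = 0 \quad\Rightarrow\quad x^* = 3 .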

  60. 60 Vector calculus 101: gradient $\nabla f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix}$; isocontours are the level sets $f(x) = c$ [figure: 2D function graph, isocontours, gradient field]. At the minimum of the function, $\nabla f(x) = 0$.
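  Applying this zero-gradient condition to the sum-of-squares loss from slide 54 gives the least-squares solution; this is a standard derivation sketched here, not text taken from the slides.

    \nabla_w L(w) = \nabla_w \sum_{i=1}^{N} \big(y_i - w^T x_i\big)^2
                  = -2 \sum_{i=1}^{N} \big(y_i - w^T x_i\big)\, x_i = 0
    \;\Rightarrow\; \Big(\sum_{i=1}^{N} x_i x_i^T\Big) w = \sum_{i=1}^{N} x_i y_i ,

  i.e. $X^T X\, w = X^T y$, where the rows of $X$ are the $x_i^T$.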
