
CSCE 478/878 Lecture 2: Supervised Learning - PowerPoint PPT Presentation



  1. CSCE 478/878 Lecture 2: Supervised Learning. Stephen Scott (adapted from Ethem Alpaydin), sscott@cse.unl.edu

  2. Introduction. Supervised learning is the most fundamental, "classic" form of machine learning. The "supervised" part comes from the labels provided for the examples (instances).

  3. Outline. Learning a class from labeled examples (definitions; thinking about C; hypotheses and error; margin). Noise and other problems (noise; model selection; inductive bias). Regression. Multi-class problems. General steps of machine learning.

  4. Learning a Class from Examples. Let C be the target concept to be learned. Think of C as a function that takes an example (or instance) as input and outputs a label. Goal: given a training set X = {(x^t, r^t)}, t = 1, ..., N, where r^t = C(x^t), output a hypothesis h ∈ H that approximates C in its classifications of new instances. Each instance x is represented as a vector of attributes or features. E.g., let each x = (x1, x2) be a vector describing attributes of a car, with x1 = price and x2 = engine power. In this example, the label is binary (positive/negative, yes/no, 1/0, +1/−1), indicating whether instance x is a "family car".
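
A minimal sketch of this notation in Python (all numbers are made-up placeholders, not course data): the training set X is a list of N pairs (x^t, r^t), where each x^t is an attribute vector (price, engine power) and each r^t is the binary label.

```python
# Hypothetical training set: each instance x = (price in $1000s, engine power in hp),
# each label r in {1, 0} meaning "family car" / "not a family car".
training_set = [
    ((16.0, 110.0), 1),
    ((24.0, 150.0), 1),
    (( 9.0,  70.0), 0),
    ((40.0, 300.0), 0),
]
N = len(training_set)   # X = {(x^t, r^t)} for t = 1, ..., N
```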

  5. Learning a Class from Examples (cont'd). [Figure: the training examples plotted in the plane with x1 = price on the horizontal axis and x2 = engine power on the vertical axis.]

  6. Thinking about C. Can think of the target concept C as a function. In the example, C is an axis-parallel box, equivalent to upper and lower bounds on each attribute. Might decide to set H (the set of candidate hypotheses) to the same family that C comes from, though this is not required. Can also think of the target concept C as a set of positive instances; in the example, C is the continuous set of all positive points in the plane. Use whichever view is convenient at the time.
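
As a sketch of the function view, an axis-parallel box simply checks per-attribute bounds; the bounds below are hypothetical placeholders, not values from the lecture. The set view is the same box regarded as the region of the plane it contains.

```python
# Axis-parallel box as a membership function: lower/upper bounds per attribute.
# The specific bounds are illustrative assumptions.
P1, P2 = 10.0, 30.0    # price bounds ($1000s)
E1, E2 = 80.0, 200.0   # engine power bounds (hp)

def C(x):
    """Return 1 iff x = (price, power) lies inside the box [P1, P2] x [E1, E2]."""
    price, power = x
    return int(P1 <= price <= P2 and E1 <= power <= E2)

print(C((16.0, 110.0)), C((40.0, 300.0)))   # 1 0
```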

  7. Thinking about C (cont'd). [Figure: the target concept C drawn as an axis-parallel rectangle in the (x1 = price, x2 = engine power) plane, with price bounds p1, p2 and engine power bounds e1, e2.]

  8. Hypotheses and Error. A learning algorithm uses the training set X and finds a hypothesis h ∈ H that approximates C. In the example, H can be the set of all axis-parallel boxes. If C is guaranteed to come from H, then we know that a perfect hypothesis exists; in this case, we choose h from the version space = the subset of H consistent with X. What learning algorithm can you think of to learn C? Can think of two types of error (or loss) of h: empirical error is the fraction of X that h gets wrong; generalization error is the probability that a new, randomly selected instance is misclassified by h (this depends on the probability distribution over instances). Can further classify errors as false positives and false negatives.
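
A sketch of the empirical error and the false positive/negative split on a toy sample; the box hypothesis h and the data below are illustrative assumptions, not from the lecture.

```python
# Toy box hypothesis and labeled sample (all values are made up for illustration).
def h(x):
    price, power = x
    return int(12.0 <= price <= 30.0 and 90.0 <= power <= 200.0)

X = [(16.0, 110.0), (24.0, 150.0), (9.0, 70.0), (40.0, 300.0), (13.0, 95.0)]
r = [1, 1, 0, 0, 0]   # the last point falls inside the box but is labeled 0

# Empirical error: fraction of the sample that h misclassifies.
empirical_error = sum(h(x) != label for x, label in zip(X, r)) / len(r)

# Split the mistakes by direction.
false_pos = sum(h(x) == 1 and label == 0 for x, label in zip(X, r))
false_neg = sum(h(x) == 0 and label == 1 for x, label in zip(X, r))
print(empirical_error, false_pos, false_neg)   # 0.2 1 0
```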

  9. Hypotheses and Error (cont'd). [Figure: illustration of a hypothesis and its errors in the (x1 = price, x2 = engine power) plane.]

  10. Margin. Since we will have many (infinitely many?) choices of h, we will often choose one with maximum margin (the minimum distance to any point in X). [Figure: the margin of a hypothesis illustrated in the attribute plane.] Why?
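
One way to make the margin concrete for the box example: measure each training point's distance to the box's decision boundary and take the minimum. This is a sketch under my own assumptions (Euclidean distance, a particular box, made-up data), not code from the course.

```python
import math

def distance_to_box_boundary(x, p1, p2, e1, e2):
    """Distance from point x = (price, power) to the boundary of the box [p1,p2] x [e1,e2]."""
    price, power = x
    if p1 <= price <= p2 and e1 <= power <= e2:           # inside: distance to nearest side
        return min(price - p1, p2 - price, power - e1, e2 - power)
    dx = max(p1 - price, 0.0, price - p2)                  # outside: distance to nearest edge/corner
    dy = max(e1 - power, 0.0, power - e2)
    return math.hypot(dx, dy)

def margin(X, p1, p2, e1, e2):
    """Margin of the box hypothesis: minimum distance from any point in X to its boundary."""
    return min(distance_to_box_boundary(x, p1, p2, e1, e2) for x in X)

# Illustrative data and bounds; a larger margin means the boundary stays
# farther from every training point.
print(margin([(16.0, 110.0), (9.0, 70.0), (40.0, 300.0)], 12.0, 30.0, 90.0, 200.0))   # 4.0
```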

  11. Noise and Other Problems. In reality, it's unlikely that there exists an h ∈ H that is perfect on X. There could be noise in the data (attribute errors, labeling errors). There could be attributes that are hidden or latent, which impact the label but are unobserved. We could find a better (or even perfect) fit to X if we choose a more powerful (expressive) hypothesis class H. Is this a good idea?

  12. Noise and Other Problems (cont'd). [Figure: two hypotheses h1 and h2 fit to noisy data in the (x1, x2) plane.] For what reasons might we prefer h1 over h2?

  13. Model Selection. Might prefer a simpler hypothesis because it is: easier/more efficient to evaluate; easier to train (fewer parameters); easier to describe/justify its predictions; and a better fit to Occam's Razor (tend to prefer the simpler explanation among similar ones). Model selection is the act of choosing a hypothesis class H. Need to balance H's complexity with that of the model that labels the data: if H is not sophisticated enough, it might underfit and not generalize well (e.g., fitting a line to data from a cubic model); if H is too sophisticated, it might overfit and not generalize well (e.g., fitting the noise). Can validate the choice of h (and H) if some data is held back from X to serve as a validation set; it is still part of training, but not directly used to select h. An independent test set is often used to do the final evaluation of the chosen h.
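
A small sketch of model selection with a held-out validation set (the cubic data, the noise level, and the candidate degrees are all assumptions for illustration): each candidate hypothesis class is a polynomial degree, each candidate is fit on the training portion, and the degree with the lowest validation error is kept.

```python
import numpy as np

# Synthetic data from a cubic model plus noise (all parameters are illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = x**3 - x + rng.normal(0.0, 0.05, 60)

x_train, y_train = x[:40], y[:40]   # used to fit each candidate model
x_val,   y_val   = x[40:], y[40:]   # held back; used only to choose the degree

best_degree, best_err = None, float("inf")
for degree in range(1, 8):          # each degree is one candidate hypothesis class H
    coeffs = np.polyfit(x_train, y_train, degree)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_degree, best_err = degree, val_err

print(best_degree)   # degree 1 underfits; very high degrees tend to fit the noise
```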

  14. Inductive Bias. Must assume something about the learning task; otherwise, learning becomes rote memorization. Imagine allowing H to be the set of arbitrary functions over the set of all possible instances. Every hypothesis in the version space V ⊆ H is consistent with all instances in X, but for every other instance, exactly half the hypotheses in V will predict positive and the rest negative (see next slide). ⇒ There is no way to generalize to new, unseen instances without a way to favor one hypothesis over another. Inductive bias is the set of assumptions we make to enable generalization rather than rote memorization. It manifests in the choice of H; instead (or in addition), we can have a bias in the form of a preference for some hypotheses over others (e.g., based on specificity or simplicity).

  15. Inductive Bias (cont'd). E.g., if X = {(⟨0,0,0⟩, +), (⟨1,1,0⟩, +), (⟨0,1,0⟩, −), (⟨1,0,1⟩, −)}, then the version space V is the set of truth tables satisfying 000 → +, 010 → −, 110 → +, 101 → −, with 001, 011, 100, and 111 left open. Since there are 4 holes, |V| = 2^4 = 16 = the number of ways to fill the holes, and for any as-yet-unclassified example x, exactly half of the hypotheses in V classify x as + and half as −.
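
A short sketch that checks this counting argument by brute force (representing hypotheses as dictionaries is my own choice for illustration): enumerate every way of filling the four holes and count, for each unseen instance, how many hypotheses in V label it +.

```python
from itertools import product

# The four labeled examples from the slide.
labeled = {(0, 0, 0): '+', (1, 1, 0): '+', (0, 1, 0): '-', (1, 0, 1): '-'}
unseen = [x for x in product([0, 1], repeat=3) if x not in labeled]   # the 4 "holes"

# Each hypothesis in the version space V is one way of filling the holes.
version_space = [{**labeled, **dict(zip(unseen, fill))}
                 for fill in product('+-', repeat=len(unseen))]
print(len(version_space))                        # 16 = 2^4

for x in unseen:
    plus = sum(h[x] == '+' for h in version_space)
    print(x, plus, len(version_space) - plus)    # 8 vs. 8 for every unseen instance
```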
