Introduction to Big Data and Machine Learning: Classification


  1. Introduction to Big Data and Machine Learning: Classification. Dr. Mihail. September 19, 2019

  2. Linear models for classification. Goal of classification: take an input vector x and assign it to one of K discrete classes C_k, where k = 1, ..., K. The input space is therefore divided into decision regions whose boundaries are called "decision boundaries" or "decision surfaces". Here we consider linear models, in which the decision boundaries are linear functions of the input vector x and hence are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space. Data sets that can be separated exactly by linear decision surfaces are said to be "linearly separable".
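A minimal sketch of such a linear decision rule for D = 2, assuming a hypothetical weight vector w and bias b (neither is given on the slide); the decision surface is the (D − 1)-dimensional hyperplane where w·x + b = 0.

```python
import numpy as np

# Hypothetical weight vector w and bias b defining a hyperplane w.x + b = 0
# in a D = 2 dimensional input space (a (D-1)-dimensional decision surface).
w = np.array([1.5, -2.0])
b = 0.5

def classify(x):
    """Assign x to class C1 if it lies on the positive side of the hyperplane,
    otherwise to class C2 (two-class linear discriminant)."""
    return "C1" if w @ x + b > 0 else "C2"

print(classify(np.array([2.0, 0.1])))   # C1
print(classify(np.array([0.0, 1.0])))   # C2
```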

  3. Probabilistic models. For probabilistic models, the most convenient representation in the case of two-class problems is the binary one, in which there is a single target variable t ∈ {0, 1}. For K > 2 classes, it is convenient to use a 1-of-K coding scheme, in which t is a vector of length K such that if the class is C_j, all elements t_k are zero except t_j, which equals 1. For instance, with 5 classes, a pattern from class 2 would be given by the target vector t = (0, 1, 0, 0, 0)^T.
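A small sketch of the 1-of-K coding scheme described above; the helper name one_of_k is ours, and the slide's "class 2" corresponds to zero-based index 1.

```python
import numpy as np

def one_of_k(class_index, K):
    """Return the 1-of-K (one-hot) target vector for a zero-based class index."""
    t = np.zeros(K)
    t[class_index] = 1.0
    return t

# The slide's example: 5 classes, pattern from class 2 -> t = (0, 1, 0, 0, 0)^T
print(one_of_k(1, 5))   # [0. 1. 0. 0. 0.]
```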

  4. Using Bayes' theorem. Model the posterior class probability: p(C_k | x) = p(x | C_k) p(C_k) / p(x). Notice the denominator is not a function of C_k. Prior class distribution: p(C_k). Class-conditional density: p(x | C_k).
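A short sketch of this computation with made-up numbers (none come from the slides), showing that dividing by the class-independent denominator p(x) turns the products p(x | C_k) p(C_k) into posteriors that sum to one.

```python
import numpy as np

# Illustrative values only: priors and class-conditional likelihoods for K = 3.
prior = np.array([0.5, 0.3, 0.2])          # p(C_k)
likelihood = np.array([0.10, 0.40, 0.05])  # p(x | C_k) for one particular x

evidence = np.sum(likelihood * prior)      # p(x), the same for every class
posterior = likelihood * prior / evidence  # p(C_k | x)

print(posterior, posterior.sum())          # posteriors sum to 1.0
```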

  5. Discriminative models. A discriminative model learns P(c | x) directly. To train a discriminative classifier, all training examples of the different classes must be used jointly to build a single classifier. A probabilistic classifier outputs K probabilities for the K class labels, while a non-probabilistic classifier produces a single label.
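As a hedged illustration of a discriminative probabilistic classifier, a sketch using scikit-learn's logistic regression on made-up toy data (the slides do not name a specific model): one classifier is fitted jointly on examples of all classes and returns K posterior probabilities per input.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, made up for illustration: 6 points, K = 3 class labels.
X = np.array([[0.0, 0.2], [0.1, 0.9], [1.0, 1.1], [0.9, 0.1], [2.0, 2.1], [2.1, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])

# One model, trained jointly on all classes.
clf = LogisticRegression().fit(X, y)

print(clf.predict_proba([[1.0, 1.0]]))   # K probabilities P(c | x), one per class
print(clf.predict([[1.0, 1.0]]))         # single most probable label
```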

  6. Discriminative classifier (figure)

  7. Generative classifier. Model P(x | c) for c = c_1, ..., c_K, with x = (x_1, ..., x_n). K probabilistic models have to be trained independently, each trained on only the examples of its own label. For a given input, the K models output K probabilities. "Generative" means the model can produce data by sampling from its distribution.
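A sketch of a generative classifier on made-up data, assuming one Gaussian class-conditional density per class (the slides do not specify the density family): each model is fitted only on its own class's examples, and can also generate data by sampling.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up training examples, grouped by class label.
X = {
    "c1": np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]),
    "c2": np.array([[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]),
}
prior = {c: len(Xc) / 6 for c, Xc in X.items()}   # P(c_k) from class counts

# Learning: K independent models, each sees only the examples of its own class.
models = {c: multivariate_normal(Xc.mean(axis=0), np.cov(Xc.T) + 1e-6 * np.eye(2))
          for c, Xc in X.items()}

x_new = np.array([0.15, 0.2])
scores = {c: models[c].pdf(x_new) * prior[c] for c in models}  # P(x | c) P(c)
print(scores)

# Sampling from a class-conditional model shows why it is called "generative".
print(models["c1"].rvs(size=2))
```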

  8. Generative classifier (figure)

  9. Maximum a posteriori (MAP). For an input x, find the largest of the K probabilities output by a discriminative probabilistic classifier, P(c_1 | x), ..., P(c_K | x), and assign x to label c* if P(c* | x) is the largest. Generative classification with the MAP rule: P(c_i | x) = P(x | c_i) P(c_i) / P(x) ∝ P(x | c_i) P(c_i). (1)
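A minimal sketch of the MAP rule, assuming the unnormalized scores P(x | c_i) P(c_i) have already been computed (the numbers are made up):

```python
# Scores proportional to P(c_i | x); the shared denominator P(x) can be ignored.
scores = {"c1": 0.012, "c2": 0.045, "c3": 0.003}

c_star = max(scores, key=scores.get)   # argmax over classes
print(c_star)                          # "c2"
```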

  10. Naïve Bayes. Bayes classification: P(c | x) ∝ P(x | c) P(c) = P(x_1, ..., x_n | c) P(c), for c = c_1, ..., c_K. (2)

  11. Naïve Bayes. Bayes classification: P(c | x) ∝ P(x | c) P(c) = P(x_1, ..., x_n | c) P(c), for c = c_1, ..., c_K. (2) Problem: the joint probability P(x_1, ..., x_n | c) is not feasible to learn.

  12. Naïve Bayes. Bayes classification: P(c | x) ∝ P(x | c) P(c) = P(x_1, ..., x_n | c) P(c), for c = c_1, ..., c_K. (2) Problem: the joint probability P(x_1, ..., x_n | c) is not feasible to learn. Solution: assume all input features are class-conditionally independent!

  13. Bayes model. Applying the product rule and then the class-conditional independence assumption, recursively: P(x_1, x_2, ..., x_n | c) = P(x_1 | x_2, ..., x_n, c) P(x_2, ..., x_n | c) = P(x_1 | c) P(x_2, ..., x_n | c) = ... = P(x_1 | c) P(x_2 | c) ... P(x_n | c). (3)
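A tiny sketch of the factorization in equation (3), with made-up per-feature probability tables (not taken from the slides):

```python
import numpy as np

# Hypothetical per-feature conditionals P(x_j | c) for one fixed class c.
p_feature_given_c = [
    {"sunny": 0.2, "rain": 0.8},   # P(x1 | c)
    {"hot": 0.5, "cool": 0.5},     # P(x2 | c)
]

def joint_given_c(x):
    """P(x1, ..., xn | c) under the class-conditional independence assumption."""
    return np.prod([p_feature_given_c[j][xj] for j, xj in enumerate(x)])

print(joint_given_c(("rain", "cool")))   # 0.8 * 0.5 = 0.4
```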

  14. Algorithm (discrete-valued features). Learning phase: given a training set S with F features and K classes, for each target value c_i (c_i = c_1, ..., c_K), estimate P̂(c_i) from the examples in S; for every feature value x_jk of each feature x_j (j = 1, ..., F; k = 1, ..., N), estimate P̂(x_j = x_jk | c_i) from the samples in S. Output: F × K conditional probabilistic (generative) models. Test phase: given an unknown instance x' = (a'_1, ..., a'_n), assign label c* to x' if [P̂(a'_1 | c*) ... P̂(a'_n | c*)] P̂(c*) > [P̂(a'_1 | c_i) ... P̂(a'_n | c_i)] P̂(c_i) (4) for all c_i ≠ c*, c_i = c_1, ..., c_K.
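A hedged sketch of both phases on a made-up discrete-feature training set (not the slide's example): priors and per-feature conditionals are estimated by counting, then the decision rule of equation (4) is applied.

```python
from collections import Counter, defaultdict

# Tiny illustrative training set: (feature tuple, class label).
train = [
    (("sunny", "hot"),  "no"),
    (("sunny", "cool"), "yes"),
    (("rain",  "cool"), "yes"),
    (("rain",  "hot"),  "no"),
    (("sunny", "hot"),  "no"),
]

# Learning phase: estimate P(c_i) and P(x_j = x_jk | c_i) by counting.
class_counts = Counter(c for _, c in train)
prior = {c: n / len(train) for c, n in class_counts.items()}

cond_counts = defaultdict(Counter)        # (feature index, class) -> value counts
for x, c in train:
    for j, v in enumerate(x):
        cond_counts[(j, c)][v] += 1

def cond_prob(j, v, c):
    """Estimated P(x_j = v | c) (no smoothing, as on the slide)."""
    return cond_counts[(j, c)][v] / class_counts[c]

# Test phase: MAP decision using the factored probabilities of equation (4).
def classify(x_new):
    scores = {}
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x_new):
            score *= cond_prob(j, v, c)
        scores[c] = score
    return max(scores, key=scores.get), scores

print(classify(("sunny", "cool")))   # ('yes', {...})
```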

  15. Example (figure)

  16. Learning phase (figure)

  17. Test phase. Given a new instance, predict its label: x' = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong). Look up the estimated probability tables and make the decision with the MAP rule.
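A worked sketch of this test-phase lookup. The slide's probability tables appear only as images, so the numbers below are the classic PlayTennis estimates (9 "yes" and 5 "no" days) and are assumed rather than taken from the text.

```python
# Assumed estimates from the standard PlayTennis data, not read from the slides.
prior = {"yes": 9/14, "no": 5/14}
cond = {
    "yes": {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "Strong": 3/9},
    "no":  {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "Strong": 3/5},
}

x_new = ("Sunny", "Cool", "High", "Strong")
scores = {}
for c in prior:
    score = prior[c]
    for v in x_new:
        score *= cond[c][v]          # look up P(feature value | class)
    scores[c] = score

print(scores)                        # {'yes': ~0.0053, 'no': ~0.0206}
print(max(scores, key=scores.get))   # MAP decision: 'no'
```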
