

  1. CSE4334/5334 Data Mining, Fall 2014. Lecture 5: Classification (2). Chengkai Li, Department of Computer Science and Engineering, University of Texas at Arlington. (Slides courtesy of Vipin Kumar)

  2. Bayes Classifier

  3. Bayes Classifier  A probabilistic framework for solving classification problems  Conditional probability: P(C|A) = P(A,C) / P(A) and P(A|C) = P(A,C) / P(C)  Bayes theorem: P(C|A) = P(A|C) P(C) / P(A)

  4. Example of Bayes Theorem  Given:  Team A wins: P(W=A) = 0.65  Team B wins: P(W=B) = 0.35  If team A won, the probability that team B hosted the game: P(H=B|W=A) = 0.30  If team B won, the probability that team B hosted the game: P(H=B|W=B) = 0.75  If team B is the next host, which team has a better chance to win, and how big is that chance?  By Bayes theorem, P(W|H=B) = P(H=B|W) P(W) / P(H=B), where P(H=B) = P(H=B|W=A) P(W=A) + P(H=B|W=B) P(W=B) = 0.30 × 0.65 + 0.75 × 0.35 = 0.4575  P(W=A|H=B) = (0.30 × 0.65) / 0.4575 ≈ 0.426  P(W=B|H=B) = (0.75 × 0.35) / 0.4575 ≈ 0.574  So team B has the better chance, roughly 0.57
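The arithmetic on this slide can be checked with a short script (the variable names are illustrative, not from the slides):

```python
# Priors P(W) and likelihoods P(H=B | W) from the slide.
p_w = {"A": 0.65, "B": 0.35}
p_h_given_w = {"A": 0.30, "B": 0.75}

# Total probability: P(H=B) = sum_w P(H=B|W=w) P(W=w)
p_h = sum(p_h_given_w[w] * p_w[w] for w in p_w)

# Bayes theorem: P(W=w | H=B) = P(H=B|W=w) P(W=w) / P(H=B)
posterior = {w: p_h_given_w[w] * p_w[w] / p_h for w in p_w}

print(round(p_h, 4))             # 0.4575
print(round(posterior["B"], 3))  # 0.574, so team B is favored
```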

  5. Bayesian Classifiers  Consider each attribute and the class label as random variables  Given a record with attributes (A1, A2, …, An), the goal is to predict class C  Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)  Can we estimate P(C | A1, A2, …, An) directly from data?

  6. Bayesian Classifiers  Approach: compute the posterior probability P(C | A1, A2, …, An) for all values of C using the Bayes theorem: P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)  Choose the value of C that maximizes P(C | A1, A2, …, An)  Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator is the same for every class  How to estimate P(A1, A2, …, An | C)?

  7. Naïve Bayes Classifier  Assume independence among the attributes Ai when the class is given:  P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)  Can estimate P(Ai | Cj) for all Ai and Cj  A new point is classified to Cj if P(Cj) ∏i P(Ai | Cj) is maximal
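The decision rule above can be sketched in a few lines; this is a minimal illustration of the product form, not the lecture's own code, and the function and parameter names are invented for the example:

```python
from math import prod

def naive_bayes_predict(priors, conditionals, record):
    """Pick the class maximizing P(C) * product_i P(A_i = a_i | C).

    priors:       {class: P(C)}
    conditionals: {class: {attribute: {value: P(value | class)}}}
    record:       {attribute: value}
    """
    def score(c):
        return priors[c] * prod(conditionals[c][a][v] for a, v in record.items())
    return max(priors, key=score)
```

In practice the product of many small probabilities underflows, so real implementations usually sum log-probabilities instead of multiplying.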

  8. How to Estimate Probabilities from Data?  Training data:

  Tid  Refund  Marital Status  Taxable Income  Evade
   1   Yes     Single          125K            No
   2   No      Married         100K            No
   3   No      Single           70K            No
   4   Yes     Married         120K            No
   5   No      Divorced         95K            Yes
   6   No      Married          60K            No
   7   Yes     Divorced        220K            No
   8   No      Single           85K            Yes
   9   No      Married          75K            No
  10   No      Single           90K            Yes

  Class priors: P(C) = Nc / N, e.g., P(No) = 7/10 and P(Yes) = 3/10  For discrete attributes: P(Ai | Ck) = |Aik| / Nc, where |Aik| is the number of instances that have attribute value Ai and belong to class Ck  Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
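These counting estimates can be reproduced directly from the slide's table; a minimal sketch (field layout is my own choice for the example):

```python
# Each record: (Refund, Marital Status, Taxable Income in K, Evade)
records = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

n = len(records)
n_no = sum(1 for r in records if r[3] == "No")
p_no = n_no / n                            # P(No) = 7/10

# P(Status=Married | No) = |A_ik| / N_c
married_no = sum(1 for r in records if r[1] == "Married" and r[3] == "No")
p_married_given_no = married_no / n_no     # 4/7

print(p_no, round(p_married_given_no, 3))
```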

  9. How to Estimate Probabilities from Data?  For continuous attributes:  Discretize the range into bins: one ordinal attribute per bin; this violates the independence assumption  Two-way split: (A < v) or (A > v)  Probability density estimation:  Assume the attribute follows a normal distribution  Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)  Once the probability distribution is known, use it to estimate the conditional probability P(Ai | c)

  10. How to Estimate Probabilities from Data?  Normal distribution: P(Ai | cj) = 1 / sqrt(2π σij²) · exp( −(Ai − μij)² / (2 σij²) )  One pair of parameters (μij, σij) for each (Ai, cj) pair  For (Income, Class=No): sample mean = 110, sample variance = 2975  P(Income=120 | No) = 1 / sqrt(2π × 2975) · e^( −(120 − 110)² / (2 × 2975) ) ≈ 0.0072
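A quick sketch of this density evaluation, using the slide's sample mean (110) and sample variance (2975); the helper function name is mine:

```python
import math

def gaussian_pdf(x, mean, variance):
    # (1 / sqrt(2*pi*variance)) * exp(-(x - mean)^2 / (2*variance))
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

p = gaussian_pdf(120, 110, 2975)
print(round(p, 4))  # 0.0072, matching the slide
```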

  11. Example of Naïve Bayes Classifier  Given a test record: X = (Refund=No, Marital Status=Married, Income=120K)

  Estimates from the training data:
  P(Refund=Yes | No) = 3/7, P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0, P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7, P(Divorced | No) = 1/7, P(Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/3, P(Divorced | Yes) = 1/3, P(Married | Yes) = 0
  Taxable income: if class=No, sample mean = 110 and sample variance = 2975; if class=Yes, sample mean = 90 and sample variance = 25

  P(X | Class=No) = P(Refund=No | No) × P(Married | No) × P(Income=120K | No) = 4/7 × 4/7 × 0.0072 = 0.0024
  P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes) = 1 × 0 × 1.2×10⁻⁹ = 0

  Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X) ⇒ Class = No
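The whole classification of X can be replayed in code; this is a sketch hard-coding the slide's estimates (the variable names are illustrative):

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

priors       = {"No": 7 / 10, "Yes": 3 / 10}
p_refund_no  = {"No": 4 / 7, "Yes": 1.0}   # P(Refund=No | class)
p_married    = {"No": 4 / 7, "Yes": 0.0}   # P(Married  | class)
income_params = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance)

# Score each class: P(C) * P(Refund=No|C) * P(Married|C) * P(Income=120|C)
scores = {}
for c in priors:
    mean, var = income_params[c]
    scores[c] = priors[c] * p_refund_no[c] * p_married[c] * gaussian_pdf(120, mean, var)

prediction = max(scores, key=scores.get)
print(prediction)  # No
```

Note that the zero estimate P(Married | Yes) = 0 forces the entire Yes score to zero, which motivates the smoothing on the next slide.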

  12. Naïve Bayes Classifier  If one of the conditional probabilities is zero, the entire product becomes zero  Probability estimation, with Nic = number of class-C instances having attribute value Ai, Nc = number of class-C instances, c = number of classes, p = prior probability, m = parameter:
  Original: P(Ai | C) = Nic / Nc
  Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m)
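The three estimates can be compared on the zero-count case P(Refund=Yes | Yes) from the earlier slide, where Nic = 0 and Nc = 3; this sketch uses c = 2 per the slide's definition, and m = 3, p = 1/3 are values chosen purely for illustration:

```python
n_ic, n_c = 0, 3   # counts for P(Refund=Yes | Yes)
c = 2              # number of classes, as defined on the slide
m, p = 3, 1 / 3    # m-estimate parameters (illustrative choice)

original = n_ic / n_c                  # 0.0  -> wipes out the whole product
laplace  = (n_ic + 1) / (n_c + c)      # 0.2
m_est    = (n_ic + m * p) / (n_c + m)  # 1/6 ~ 0.167

print(original, laplace, round(m_est, 3))
```

Both smoothed estimates stay nonzero, so a single unseen attribute value no longer zeroes out the class score.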

  13. Example of Naïve Bayes Classifier  Training data:

  Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
  human          yes         no       no             yes        mammals
  python         no          no       no             no         non-mammals
  salmon         no          no       yes            no         non-mammals
  whale          yes         no       yes            no         mammals
  frog           no          no       sometimes      yes        non-mammals
  komodo         no          no       no             yes        non-mammals
  bat            yes         yes      no             yes        mammals
  pigeon         no          yes      no             yes        non-mammals
  cat            yes         no       no             yes        mammals
  leopard shark  yes         no       yes            no         non-mammals
  turtle         no          no       sometimes      yes        non-mammals
  penguin        no          no       sometimes      yes        non-mammals
  porcupine      yes         no       no             yes        mammals
  eel            no          no       yes            no         non-mammals
  salamander     no          no       sometimes      yes        non-mammals
  gila monster   no          no       no             yes        non-mammals
  platypus       no          no       no             yes        mammals
  owl            no          yes      no             yes        non-mammals
  dolphin        yes         no       yes            no         mammals
  eagle          no          yes      no             yes        non-mammals

  Test record (A: its attributes; M: mammals; N: non-mammals):

  Give Birth  Can Fly  Live in Water  Have Legs  Class
  yes         no       yes            no         ?

  P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
  P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
  P(A | M) P(M) = 0.06 × 7/20 = 0.021
  P(A | N) P(N) = 0.0042 × 13/20 = 0.0027
  P(A | M) P(M) > P(A | N) P(N) ⇒ Mammals
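A compact sketch of this fully discrete case, with the attribute-match counts taken straight from the slide's table (7 mammals, 13 non-mammals among 20 animals); the data layout is my own:

```python
counts = {
    # class: (class size, counts matching the test record's four attribute
    #         values: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no)
    "mammals":     (7,  [6, 6, 2, 2]),
    "non-mammals": (13, [1, 10, 3, 4]),
}
total = 20

scores = {}
for cls, (n_c, matches) in counts.items():
    likelihood = 1.0
    for k in matches:                # product of P(A_i | class)
        likelihood *= k / n_c
    scores[cls] = likelihood * n_c / total   # times the prior P(class)

print(max(scores, key=scores.get))  # mammals
```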

  14. Naïve Bayes (Summary)  Robust to isolated noise points  Handle missing values by ignoring the instance during probability estimate calculations  Robust to irrelevant attributes  Independence assumption may not hold for some attributes  Use other techniques such as Bayesian Belief Networks (BBN)
