Finding Explanations

Instead of finding structure in a data set, we now focus on methods that find explanations for an unknown dependency within the data.

Given: a data set $D = \{(x_i, Y_i) \mid i = 1, \dots, n\}$ with $n$ tuples
- $x$: object description
- $Y$: target attribute
  - nominal: classification problem
  - numerical: regression problem

Data analysis
- Supervised (because we know the desired outcome)
- Descriptive (because we care about the explanation)

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä
Bayes Classifiers

Given: a data set $D = \{(x_i, Y_i) \mid i = 1, \dots, n\}$ with $n$ tuples
- $x$: object description
- $Y$: nominal target attribute ⇒ classification problem

Bayes classifiers express their model in terms of simple probabilities.
They provide a “gold standard” for evaluating other learning algorithms: any other model should perform at least as well as the naïve Bayes classifier.

Suggestion: Before applying more complex models, a quick look at a Bayes classifier can help to get a feeling for realistic accuracy expectations and simple dependencies in the data.
Bayes’ theorem

$$P(h \mid E) = \frac{P(E \mid h) \cdot P(h)}{P(E)}$$

Interpretation: The probability $P(h \mid E)$ that a hypothesis $h$ is true, given that event $E$ has occurred, can be derived from
- $P(h)$, the probability of the hypothesis $h$ itself,
- $P(E)$, the probability of the event $E$, and
- $P(E \mid h)$, the conditional probability of the event $E$ given the hypothesis $h$.
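As a small numeric illustration of the theorem, the following sketch plugs three probabilities into the formula. The numbers and the function name `posterior` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from fractions import Fraction


def posterior(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(h | E) = P(E | h) * P(h) / P(E)."""
    return p_e_given_h * p_h / p_e


# Hypothetical probabilities, assumed only for this illustration.
p_h = Fraction(3, 10)         # P(h)
p_e_given_h = Fraction(4, 5)  # P(E | h)
p_e = Fraction(19, 50)        # P(E)

print(posterior(p_e_given_h, p_h, p_e))  # prints 12/19
```

Exact fractions are used here so that the result can be compared with a calculation by hand.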
Choosing Hypotheses

We want the most probable hypothesis $h \in H$ for a given event $E$.

Maximum a posteriori hypothesis:

$$h_{\mathrm{MAP}} = \arg\max_{h \in H} P(h \mid E) = \arg\max_{h \in H} \frac{P(E \mid h)\, P(h)}{P(E)} = \arg\max_{h \in H} P(E \mid h)\, P(h)$$

The last step holds because $P(E)$ does not depend on $h$.

Maximum likelihood: If we assume that every hypothesis $h \in H$ is equally probable a priori ($P(h_i) = P(h_j)$ for all $h_i, h_j \in H$), we can simplify the equation further and obtain the maximum likelihood hypothesis:

$$h_{\mathrm{ML}} = \arg\max_{h \in H} P(E \mid h)$$
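The difference between the two choices can be sketched in a few lines. The hypotheses `h1` to `h3` and all probabilities below are hypothetical values, assumed only to show a case where the MAP and ML hypotheses differ.

```python
from fractions import Fraction

# Hypothetical priors P(h) and likelihoods P(E | h) for three hypotheses.
prior = {"h1": Fraction(7, 10), "h2": Fraction(2, 10), "h3": Fraction(1, 10)}
likelihood = {"h1": Fraction(2, 10), "h2": Fraction(5, 10), "h3": Fraction(6, 10)}

# MAP: maximise P(E | h) * P(h); P(E) is the same for every h and can be dropped.
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])

# ML: maximise P(E | h) alone, i.e. MAP under equal priors.
h_ml = max(likelihood, key=lambda h: likelihood[h])

print(h_map, h_ml)  # prints: h1 h3
```

With these assumed numbers the strong prior of `h1` outweighs its low likelihood, so the two criteria pick different hypotheses.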
Bayes classifiers

The probability $P(h)$ can be estimated easily from a given data set $D$:

$$P(h) = \frac{\text{no. of data from class } h}{\text{no. of data}}$$

In principle, the probability $P(E \mid h)$ could be determined analogously from the values of the attributes $A_1, \dots, A_m$, i.e. the attribute vector $E = (a_1, \dots, a_m)$:

$$P(E \mid h) = \frac{\text{no. of data from class } h \text{ with values } (a_1, \dots, a_m)}{\text{no. of data from class } h}$$
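Estimating $P(h)$ by relative frequency amounts to counting class labels. The sketch below assumes the labels are the Sex column of the example data set used in these slides (IDs 1 to 10); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Sex column of the example data set, IDs 1..10 (m = male, f = female).
labels = list("mfmfffmffm")

counts = Counter(labels)
# P(h) = (no. of data from class h) / (no. of data)
priors = {h: Fraction(c, len(labels)) for h, c in counts.items()}

print(priors["m"], priors["f"])  # prints: 2/5 3/5
```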
Bayes classifiers

Problem: For $m = 10$ nominal attributes $A_1, \dots, A_{10}$, each having three possible values, we would need $3^{10} = 59049$ data objects to have at least one example per combination.

Therefore, the computation is carried out under the (naïve, unrealistic) assumption that the attributes $A_1, \dots, A_m$ are independent given the class, i.e.

$$P(E = (a_1, \dots, a_m) \mid h) = P(a_1 \mid h) \cdot \ldots \cdot P(a_m \mid h) = \prod_{a_i \in E} P(a_i \mid h)$$

$P(a_i \mid h)$ can be computed easily:

$$P(a_i \mid h) = \frac{\text{no. of data from class } h \text{ with } A_i = a_i}{\text{no. of data from class } h}$$
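The size of the problem can be checked with a short calculation. The sketch compares the number of distinct attribute vectors with the number of single-attribute probabilities $P(a_i \mid h)$ that have to be estimated per class under the independence assumption; the variable names are illustrative.

```python
n_attributes = 10  # number of nominal attributes
n_values = 3       # possible values per attribute

# Distinct attribute vectors (a_1, ..., a_m) that would each need an example.
joint_combinations = n_values ** n_attributes

# Single-attribute probabilities P(a_i | h) to estimate per class
# when the attributes are assumed independent given the class.
naive_estimates = n_attributes * n_values

print(joint_combinations, naive_estimates)  # prints: 59049 30
```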
Naïve Bayes classifier

Given: a data set with only nominal attributes.

Based on the values $a_1, \dots, a_m$ of the attributes $A_1, \dots, A_m$, a prediction for the value of the attribute $H$ should be derived:
- For each class $h \in H$, compute the likelihood $L(h \mid E)$ under the assumption that $A_1, \dots, A_m$ are independent given the class:

$$L(h \mid E) = \prod_{a_i \in E} P(a_i \mid h) \cdot P(h)$$

- Assign $E$ to the class $h \in H$ with the highest likelihood:

$$\mathrm{pred}(E) = \arg\max_{h \in H} L(h \mid E)$$

This Bayes classifier is called naïve because of the (conditional) independence assumption for the attributes $A_1, \dots, A_m$. Although this assumption is unrealistic in most cases, the classifier often yields good results when not too many attributes are correlated.
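The two steps above can be sketched as a minimal implementation that estimates all probabilities by relative frequencies. The function names `train`, `likelihood` and `predict`, as well as the tiny weather-style data set, are hypothetical and serve only as an illustration; this is not the authors' implementation.

```python
from collections import Counter
from fractions import Fraction


def train(rows, labels):
    """Count classes and (attribute index, value, class) combinations."""
    class_counts = Counter(labels)
    value_counts = Counter()
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts


def likelihood(x, h, class_counts, value_counts):
    """L(h | E) = prod_i P(a_i | h) * P(h), with relative frequencies."""
    n = sum(class_counts.values())
    result = Fraction(class_counts[h], n)  # P(h)
    for i, value in enumerate(x):
        # P(a_i | h); unseen combinations count as 0.
        result *= Fraction(value_counts[(i, value, h)], class_counts[h])
    return result


def predict(x, class_counts, value_counts):
    """Return the class with the highest likelihood."""
    return max(class_counts,
               key=lambda h: likelihood(x, h, class_counts, value_counts))


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("sunny", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]

model = train(rows, labels)
print(predict(("rainy", "mild"), *model))  # prints: yes
```

Keeping raw counts rather than precomputed probabilities makes it easy to see that each factor is exactly one of the count ratios defined on the slides.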
Example

Given the data set $D$:

ID  Height  Weight  Long hair  Sex
 1  m       n       n          m
 2  s       l       y          f
 3  t       h       n          m
 4  s       n       y          f
 5  t       n       y          f
 6  s       l       n          f
 7  s       h       n          m
 8  m       n       n          f
 9  m       l       y          f
10  t       n       n          m

We want to predict the sex (male or female) of a person $x$ with the following attribute values:

x = (Height = tall, Weight = low, Long hair = yes)
Example

We need to calculate

$$L(\text{Sex} = m \mid \text{Height} = t, \text{Weight} = l, \text{Long hair} = y) = P(\text{Height} = t \mid \text{Sex} = m) \cdot P(\text{Weight} = l \mid \text{Sex} = m) \cdot P(\text{Long hair} = y \mid \text{Sex} = m) \cdot P(\text{Sex} = m)$$

and

$$L(\text{Sex} = f \mid \text{Height} = t, \text{Weight} = l, \text{Long hair} = y) = P(\text{Height} = t \mid \text{Sex} = f) \cdot P(\text{Weight} = l \mid \text{Sex} = f) \cdot P(\text{Long hair} = y \mid \text{Sex} = f) \cdot P(\text{Sex} = f).$$
Example

$$P(\text{Height} = t \mid \text{Sex} = m) = 2/4 = 1/2$$

Of the 4 male persons in $D$ (IDs 1, 3, 7, 10), 2 are tall (IDs 3, 10).
Example

$$P(\text{Weight} = l \mid \text{Sex} = m) = 0/4 = 0$$

None of the 4 male persons in $D$ has low weight.
Example

$$P(\text{Long hair} = y \mid \text{Sex} = m) = 0/4 = 0$$

None of the 4 male persons in $D$ has long hair.
Example

$$P(\text{Sex} = m) = 4/10 = 2/5$$

4 of the 10 persons in $D$ are male.
Example

$$L(\text{Sex} = m \mid \text{Height} = t, \text{Weight} = l, \text{Long hair} = y) = \frac{2}{4} \cdot \frac{0}{4} \cdot \frac{0}{4} \cdot \frac{4}{10} = \frac{1}{2} \cdot 0 \cdot 0 \cdot \frac{2}{5} = 0$$

⇒ The likelihood of person $x$ being male is 0.
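The calculation for the male class can be reproduced by counting rows of the example data set. The sketch below encodes $D$ as tuples (Height, Weight, Long hair, Sex); the variable names are illustrative. It also shows that a single factor of 0 is enough to make the whole product 0.

```python
from fractions import Fraction

# Data set D from the slides: (Height, Weight, Long hair, Sex), IDs 1..10.
D = [
    ("m", "n", "n", "m"), ("s", "l", "y", "f"), ("t", "h", "n", "m"),
    ("s", "n", "y", "f"), ("t", "n", "y", "f"), ("s", "l", "n", "f"),
    ("s", "h", "n", "m"), ("m", "n", "n", "f"), ("m", "l", "y", "f"),
    ("t", "n", "n", "m"),
]
x = ("t", "l", "y")  # Height = tall, Weight = low, Long hair = yes

males = [row for row in D if row[3] == "m"]

# One factor P(a_i | Sex = m) per attribute, as a relative frequency.
factors = [
    Fraction(sum(row[i] == x[i] for row in males), len(males))
    for i in range(3)
]
prior = Fraction(len(males), len(D))  # P(Sex = m)

likelihood_m = prior
for factor in factors:
    likelihood_m *= factor

print(factors[0], factors[1], factors[2], prior, likelihood_m)
# prints: 1/2 0 0 2/5 0
```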
Example

$$P(\text{Height} = t \mid \text{Sex} = f)$$
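The same counting can be carried through for the female class. The following sketch again encodes the example data set as tuples (Height, Weight, Long hair, Sex) and applies the likelihood formula from the earlier slides; the helper name `likelihood` is hypothetical, and the results are simply what the relative frequencies of this data set give.

```python
from fractions import Fraction

# Data set D from the slides: (Height, Weight, Long hair, Sex), IDs 1..10.
D = [
    ("m", "n", "n", "m"), ("s", "l", "y", "f"), ("t", "h", "n", "m"),
    ("s", "n", "y", "f"), ("t", "n", "y", "f"), ("s", "l", "n", "f"),
    ("s", "h", "n", "m"), ("m", "n", "n", "f"), ("m", "l", "y", "f"),
    ("t", "n", "n", "m"),
]
x = ("t", "l", "y")  # Height = tall, Weight = low, Long hair = yes


def likelihood(sex):
    """L(Sex = sex | x) = prod_i P(a_i | sex) * P(sex), by counting rows."""
    rows = [row for row in D if row[3] == sex]
    result = Fraction(len(rows), len(D))  # P(Sex = sex)
    for i, value in enumerate(x):
        result *= Fraction(sum(row[i] == value for row in rows), len(rows))
    return result


females = [row for row in D if row[3] == "f"]
p_tall_given_f = Fraction(sum(row[0] == "t" for row in females), len(females))

print(p_tall_given_f)   # prints 1/6
print(likelihood("f"))  # prints 1/30
print(max(["m", "f"], key=likelihood))  # prints f
```

Since the likelihood for the male class is 0, any positive likelihood for the female class decides the prediction.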