  1. Ricco RAKOTOMALALA - Tutoriels Tanagra - http://data-mining-tutorials.blogspot.fr/

  2. Maximum a posteriori rule

Calculating the posterior probability P(Y = y_k / X) with Bayes' theorem:

    P(Y = y_k / X) = P(Y = y_k) · P(X / Y = y_k) / Σ_{l=1..K} P(Y = y_l) · P(X / Y = y_l)

MAP (maximum a posteriori) rule:

    y_{k*} = argmax_k P(Y = y_k / X)
    ⇔ y_{k*} = argmax_k P(Y = y_k) · P(X / Y = y_k)

The prior probability of class k, P(Y = y_k), is estimated by the empirical frequency n_k / n. How to estimate the likelihood P(X / Y = y_k)? Assumptions are introduced in order to obtain a convenient calculation of this likelihood.
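The MAP rule above drops the normalizing denominator, since it is the same for every class. A minimal Python sketch (the class names and probability values below are illustrative placeholders, not values from the course):

```python
# A minimal sketch of the MAP rule: pick the class y_k maximizing
# P(Y = y_k) * P(X / Y = y_k). The denominator of Bayes' theorem is
# identical for every class, so it can be dropped for the decision.

def posterior(priors, likelihoods):
    """Posterior P(Y = y_k / X) from priors and likelihoods, via Bayes' theorem."""
    joint = {y: priors[y] * likelihoods[y] for y in priors}
    z = sum(joint.values())  # normalizing denominator, summed over the K classes
    return {y: p / z for y, p in joint.items()}

def map_rule(priors, likelihoods):
    """MAP decision: argmax_k P(Y = y_k) * P(X / Y = y_k)."""
    return max(priors, key=lambda y: priors[y] * likelihoods[y])

# Hypothetical two-class example (illustrative numbers)
priors = {"absent": 0.5, "present": 0.5}
likelihoods = {"absent": 0.2, "present": 0.3}
print(map_rule(priors, likelihoods))  # -> present
```

Note that `map_rule` never normalizes: only the ranking of the classes matters for the decision.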

  3. (figure-only slide, no text content)

  4. Conditional independence assumption

For the calculation of the likelihood, the attributes are assumed to be conditionally independent of one another given the value of Y:

    P(X / Y = y_k) = Π_{j=1..J} P(X_j / Y = y_k)

For a categorical attribute X, the conditional probability for the value x_l is computed as follows:

    P(X = x_l / Y = y_k) = P(X = x_l ∧ Y = y_k) / P(Y = y_k)

The probability is estimated using the conditional relative frequency:

    P̂(X = x_l / Y = y_k) = #(X = x_l ∧ Y = y_k) / #(Y = y_k) = n_kl / n_k

The Laplace rule of succession is often used to estimate the conditional probability:

    P̂(X = x_l / Y = y_k) = p_{l/k} = (n_kl + 1) / (n_k + L)

where L is the number of levels of X. This is a kind of smoothing; it also enables overcoming the (n_kl = 0) problem.
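The two estimators above differ only in the +1 / +L correction. A sketch comparing them (the counts used in the example call are taken from the toy dataset of the next slide; the function names are my own):

```python
# Two estimators for P(X = x_l / Y = y_k):
# - raw conditional relative frequency: n_kl / n_k
# - Laplace rule of succession: (n_kl + 1) / (n_k + L),
#   where L is the number of levels of X. The smoothing avoids
#   a zero probability when n_kl = 0.

def relative_frequency(n_kl, n_k):
    return n_kl / n_k

def laplace(n_kl, n_k, L):
    return (n_kl + 1) / (n_k + L)

# Example: 3 of the 5 "Absent" instances have Marié = Oui; Marié has L = 2 levels
print(relative_frequency(3, 5))  # 0.6
print(laplace(3, 5, 2))          # 4/7, about 0.571
print(laplace(0, 5, 2))          # 1/7: no longer zero
```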

  5. An example using a toy dataset

The dataset (10 instances; Maladie is the class attribute):

    Maladie   Marié   Etud.Sup
    Présent   Non     Oui
    Présent   Non     Oui
    Absent    Non     Non
    Absent    Oui     Oui
    Présent   Non     Oui
    Absent    Non     Non
    Absent    Oui     Non
    Présent   Non     Oui
    Absent    Oui     Non
    Présent   Oui     Non

Direct estimation of the posterior probability:

    P̂(Maladie = Absent / Marié = oui, Etu = oui) = 1/1 = 1
    P̂(Maladie = Présent / Marié = oui, Etu = oui) = 0/1 = 0

    ⇒ If Etu = oui and Marié = oui Then Maladie = Absent

(+) No assumptions; (-) small number of covered examples.

Under the conditional independence assumption, using the contingency tables (with Laplace smoothing):

    Maladie: Absent 5, Présent 5, total 10

    Maladie × Marié       Non   Oui   Total
    Absent                 2     3      5
    Présent                4     1      5
    Total                  6     4     10

    Maladie × Etud.Sup    Non   Oui   Total
    Absent                 4     1      5
    Présent                1     4      5
    Total                  5     5     10

    P̂(Maladie = Absent / Marié = oui, Etu = oui)
      ∝ P̂(Maladie = Absent) · P̂(Marié = oui / M = Abs.) · P̂(Etu = oui / M = Abs.)
      = (5+1)/(10+2) · (3+1)/(5+2) · (1+1)/(5+2) = 0.082

    P̂(Maladie = Présent / Marié = oui, Etu = oui)
      ∝ P̂(Maladie = Présent) · P̂(Marié = oui / M = Prés.) · P̂(Etu = oui / M = Prés.)
      = (5+1)/(10+2) · (1+1)/(5+2) · (4+1)/(5+2) = 0.102

    ⇒ If Etu = oui and Marié = oui Then Maladie = Présent

(-) Questionable assumption; (+) more reliable estimation of probabilities.
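The naive Bayes scores on this toy dataset can be reproduced with a short script; this is a sketch rebuilding the Laplace-smoothed estimates (+1 in the numerator, +2 in the denominator, since every attribute has two levels) directly from the ten records:

```python
# Naive Bayes scores for the instance (Marié = Oui, Etud.Sup = Oui)
# on the toy dataset, with Laplace smoothing.
data = [  # (Maladie, Marié, Etud.Sup)
    ("Présent", "Non", "Oui"), ("Présent", "Non", "Oui"),
    ("Absent", "Non", "Non"), ("Absent", "Oui", "Oui"),
    ("Présent", "Non", "Oui"), ("Absent", "Non", "Non"),
    ("Absent", "Oui", "Non"), ("Présent", "Non", "Oui"),
    ("Absent", "Oui", "Non"), ("Présent", "Oui", "Non"),
]

def score(y):
    """P̂(Maladie = y) * P̂(Marié = Oui / y) * P̂(Etu = Oui / y), Laplace-smoothed."""
    rows = [r for r in data if r[0] == y]
    n_k, n = len(rows), len(data)
    prior = (n_k + 1) / (n + 2)
    p_marie = (sum(r[1] == "Oui" for r in rows) + 1) / (n_k + 2)
    p_etu = (sum(r[2] == "Oui" for r in rows) + 1) / (n_k + 2)
    return prior * p_marie * p_etu

print(round(score("Absent"), 3))   # 0.082
print(round(score("Présent"), 3))  # 0.102
```

Because 0.102 > 0.082, naive Bayes flips the prediction relative to the direct estimation, which relied on a single covered example.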

  6. Advantages and shortcomings (end of the course?)

>> Simplicity, speed, ability to handle very large datasets, no possible crash during the calculations
>> Incrementality (we store only the contingency tables)
>> Statistically robust (even if the assumption is very questionable)
>> This is a linear classifier ⇒ similar classification performance (see the numerous experiments described in scientific papers)
>> No indication about the relevance of the attributes (really?)
>> Very high number of rules (in practice, the logical rules are not computed; the contingency tables for the calculation of the conditional frequencies are deployed, e.g. in PMML format)
>> No explicit model (really?) ⇒ not used in the marketing domain, etc.

We often see these conclusions in the literature… Is it possible to go beyond that?

  7. Logarithmic transformation

    y_{k*} = argmax_k P(Y = y_k) · Π_{j=1..J} P(X_j / Y = y_k)

    ⇔ y_{k*} = argmax_k [ ln P(Y = y_k) + Σ_{j=1..J} ln P(X_j / Y = y_k) ]
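Since ln(·) is increasing, the argmax is unchanged; in practice the sum of logs also avoids the numerical underflow of a long product of small probabilities. A sketch (the probability values in the example call are illustrative):

```python
import math

def log_score(prior, cond_probs):
    """ln P(Y = y_k) + sum_j ln P(X_j / Y = y_k)"""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)

# Hypothetical class with J = 3 conditionally independent attributes
s = log_score(0.5, [4 / 7, 2 / 7, 5 / 7])
print(s)
# exp(s) recovers the product form of the first line above
print(math.exp(s))
```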

  8. Model using one predictive attribute

For a discrete attribute X with L levels, define the classification function:

    d(y_k, X) = ln P(Y = y_k) + ln P(X / Y = y_k)

From X, we can create L dummy variables I_1, …, I_L:

    d(y_k, X) = ln P(Y = y_k) + Σ_{l=1..L} ln P(X = x_l / Y = y_k) · I_l
              = a_{0,k} + Σ_{l=1..L} a_{l,k} · I_l

We obtain a linear combination of the dummy variables, i.e. an explicit model which is easy to deploy ⇒ K linear classification functions (as in linear discriminant analysis).
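The coefficients of this linear form are just log-probabilities: a_0 = ln P(Y = y_k) and a_l = ln P(X = x_l / Y = y_k). A sketch, with illustrative probabilities (not values from the course):

```python
import math

def d(prior, cond_probs, dummies):
    """d(y_k, X) = ln P(Y=y_k) + sum_l ln P(X=x_l / Y=y_k) * I_l"""
    a0 = math.log(prior)
    return a0 + sum(math.log(p) * i for p, i in zip(cond_probs, dummies))

# Hypothetical attribute with L = 3 levels; the instance takes level 2,
# so its dummy vector is [0, 1, 0]
print(round(d(0.5, [0.2, 0.5, 0.3], [0, 1, 0]), 4))
```

Exactly one dummy is 1 for a given instance, so only one conditional log-probability contributes besides the prior.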

  9. An example (Y: Maladie; X: Etud.Sup)

Contingency tables (same toy dataset):

    Maladie: Absent 5, Présent 5, total 10

    Maladie × Etud.Sup    Non   Oui   Total
    Absent                 4     1      5
    Présent                1     4      5
    Total                  5     5     10

With Laplace smoothing:

    d(absent, X) = ln[(5+1)/(10+2)] + ln[(4+1)/(5+2)] · (X = non) + ln[(1+1)/(5+2)] · (X = oui)
                 = −0.6931 − 0.3365 · (X = non) − 1.2528 · (X = oui)

    d(présent, X) = −0.6931 − 1.2528 · (X = non) − 0.3365 · (X = oui)

For an instance with Etu.Sup = Non:

    d(absent, X) = −0.6931 − 0.3365 = −1.0296
    d(présent, X) = −0.6931 − 1.2528 = −1.9459

Prediction: Maladie = Absent.
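This one-attribute example can be checked numerically; a sketch reproducing the Laplace-smoothed coefficients and the decision for Etu.Sup = Non:

```python
import math

# Coefficients of d(., X) for X = Etud.Sup, Laplace-smoothed:
# Absent: 4 "Non" and 1 "Oui" out of 5; Présent: 1 "Non" and 4 "Oui" out of 5.
a0 = math.log(6 / 12)         # ln P̂(Maladie = .) = ln((5+1)/(10+2))
a_non_abs = math.log(5 / 7)   # ln((4+1)/(5+2))
a_non_pres = math.log(2 / 7)  # ln((1+1)/(5+2))

# Instance with Etu.Sup = Non: only the (X = non) dummy is active
d_absent = a0 + a_non_abs
d_present = a0 + a_non_pres
print(round(d_absent, 4))   # -1.0296
print(round(d_present, 4))  # -1.9459
# d_absent > d_present  ->  predict Maladie = Absent
```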

  10. Implemented solution in TANAGRA (using [L−1] dummy variables for an attribute X with L levels)

Since I_1 + I_2 + … + I_L = 1:

    d(y_k, X) = ln P(Y = y_k) + Σ_{l=1..L} ln P(X = x_l / Y = y_k) · I_l
              = ln P(Y = y_k) + ln P(X = x_L / Y = y_k) + Σ_{l=1..L−1} ln[ P(X = x_l / Y = y_k) / P(X = x_L / Y = y_k) ] · I_l
              = b_{0,k} + Σ_{l=1..L−1} b_{l,k} · I_l

One level [x_L] becomes the reference level. This dummy coding is the most commonly used coding scheme.
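A sketch of this reparameterization: the intercept absorbs the log-probability of the reference level, and each remaining coefficient becomes a log-ratio against it (the function name is my own; the example call reuses the Etud.Sup / Maladie = Absent estimates from the previous slide):

```python
import math

def reference_coding(prior, cond_probs):
    """cond_probs = [P(X=x_1/Y=y_k), ..., P(X=x_L/Y=y_k)]; x_L is the reference level.
    Returns (b_0, [b_1, ..., b_{L-1}])."""
    ref = cond_probs[-1]
    b0 = math.log(prior) + math.log(ref)
    b = [math.log(p / ref) for p in cond_probs[:-1]]
    return b0, b

# Maladie = Absent, levels of Etud.Sup ordered [Non, Oui], reference = Oui
b0, b = reference_coding(6 / 12, [5 / 7, 2 / 7])
print(round(b0, 4), [round(x, 4) for x in b])

# Equivalence check for an instance with X = Non (dummy I_non = 1):
full = math.log(6 / 12) + math.log(5 / 7)  # L-dummy form
print(abs((b0 + b[0]) - full) < 1e-12)     # True: same d(y_k, X)
```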

  11. Extension to J predictive attributes

Dummy coding scheme: an attribute X_j with L_j levels yields (L_j − 1) dummy variables. The linear classification functions are then expressed using these indicator variables (same toy dataset as above: Maladie, Marié, Etud.Sup).
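A sketch of the coding step for the toy dataset's two predictive attributes (the helper name and the choice of "Non" as reference level are my own; any level can serve as reference):

```python
def dummies(value, levels):
    """(L-1) indicator variables for one attribute; the last level is the reference."""
    return [1 if value == lev else 0 for lev in levels[:-1]]

# Marié and Etud.Sup each have 2 levels -> 1 dummy each (reference = "Non")
levels = ["Oui", "Non"]
row = ("Oui", "Oui")  # (Marié, Etud.Sup)
coded = dummies(row[0], levels) + dummies(row[1], levels)
print(coded)  # [1, 1]
```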

  12. The particular case of binary classification (K = 2): construction of the SCORE function

The class attribute has 2 levels: Y = {+, −}.

    d(+, X) = a_{0,+} + a_{1,+} · X_1 + a_{2,+} · X_2 + … + a_{J,+} · X_J
    d(−, X) = a_{0,−} + a_{1,−} · X_1 + a_{2,−} · X_2 + … + a_{J,−} · X_J

    D(X) = d(+, X) − d(−, X) = c_0 + c_1 · X_1 + c_2 · X_2 + … + c_J · X_J

Decision rule: D(X) > 0 ⇒ Y = +

Interpretation:
>> D(X) is the SCORE function. It assigns to the instances a score proportional to the positive-class probability estimate.
>> The sign of the coefficients allows interpreting the influence of the descriptors.

Our example, classification functions and SCORE:

    Descriptors      Présent     Absent       D(X)
    Marié = Non      0.916291   −0.287682    1.203973
    Etud.Sup = Oui   0.916291   −0.916291    1.832582
    constant        −3.198673   −1.589235   −1.609438

Not being married makes sick… Studying makes sick…
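The D(X) column is simply the difference of the two classification functions. A sketch checking it against the table and applying the decision rule:

```python
# SCORE function D(X) = d(+, X) - d(-, X), with + = Présent and - = Absent,
# built from the coefficient table above.
present = {"Marié=Non": 0.916291, "Etud.Sup=Oui": 0.916291, "const": -3.198673}
absent = {"Marié=Non": -0.287682, "Etud.Sup=Oui": -0.916291, "const": -1.589235}
D = {k: present[k] - absent[k] for k in present}
print({k: round(v, 6) for k, v in D.items()})

def score(marie_non, etu_oui):
    """D(X) for the indicator values of the two descriptors."""
    return D["const"] + D["Marié=Non"] * marie_non + D["Etud.Sup=Oui"] * etu_oui

# Decision rule D(X) > 0 -> Présent: both positive coefficients
# push the score toward the disease
print(score(1, 1) > 0)  # True: not married and studied -> Présent
```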
