T-61.3050 Machine Learning: Basic Principles
Bayesian Networks

Kai Puolamäki
Laboratory of Computer and Information Science (CIS)
Department of Computer Science and Engineering
Helsinki University of Technology (TKK)
Autumn 2007
Outline

1 Reminders
  Inference
  Finding the Structure of the Network
2 Probabilistic Inference
  Bernoulli Process
  Posterior Probabilities
3 Estimating Parameters
  Estimates from Posterior
  Bias and Variance
  Conclusion
Rules of Probability

P(E, F) = P(F, E): probability of both E and F happening.
P(E) = \sum_F P(E, F) (sum rule, marginalization).
P(E, F) = P(F | E) P(E) (product rule, conditional probability).
Consequence: P(F | E) = P(E | F) P(F) / P(E) (Bayes' formula).
We say E and F are independent if P(E, F) = P(E) P(F) (for all values of E and F).
We say E and F are conditionally independent given G if P(E, F | G) = P(E | G) P(F | G), or equivalently P(E | F, G) = P(E | G).
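As a quick numeric sketch (the joint table below is made up for illustration, not taken from the lecture), the rules can be checked mechanically on a small Boolean example:

```python
# A toy joint distribution over two Boolean variables E and F
# (values chosen only for illustration, not from the lecture).
P_EF = {
    (1, 1): 0.12, (1, 0): 0.28,
    (0, 1): 0.18, (0, 0): 0.42,
}

# Sum rule (marginalization): P(E) = sum_F P(E, F), and similarly for P(F).
P_E = {e: sum(P_EF[(e, f)] for f in (0, 1)) for e in (0, 1)}
P_F = {f: sum(P_EF[(e, f)] for e in (0, 1)) for f in (0, 1)}

# Product rule (conditional probability): P(F | E) = P(E, F) / P(E).
P_F_given_E = {(f, e): P_EF[(e, f)] / P_E[e] for e in (0, 1) for f in (0, 1)}
P_E_given_F = {(e, f): P_EF[(e, f)] / P_F[f] for e in (0, 1) for f in (0, 1)}

# Bayes' formula: P(F | E) = P(E | F) P(F) / P(E).
for e in (0, 1):
    for f in (0, 1):
        bayes = P_E_given_F[(e, f)] * P_F[f] / P_E[e]
        assert abs(bayes - P_F_given_E[(f, e)]) < 1e-12

# This particular toy table also factorizes as P(E, F) = P(E) P(F),
# i.e. E and F happen to be independent (0.12 = 0.4 * 0.3, etc.).
for e in (0, 1):
    for f in (0, 1):
        assert abs(P_EF[(e, f)] - P_E[e] * P_F[f]) < 1e-12
```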
Bayesian Networks

A Bayesian network is a directed acyclic graph (DAG) that describes a joint distribution over the vertices X_1, ..., X_d such that

    P(X_1, ..., X_d) = \prod_{i=1}^{d} P(X_i | parents(X_i)),

where parents(X_i) is the set of vertices from which there is an edge to X_i.

Example: the network C -> A, C -> B gives
P(A, B, C) = P(A | C) P(B | C) P(C).
(A and B are conditionally independent given C.)
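To make the factorization concrete, here is a minimal Python sketch for the three-node network above; the CPT values are invented for illustration. It builds the joint from its factors, checks that it normalizes to one, and verifies the conditional independence of A and B given C:

```python
from itertools import product

# Hypothetical CPTs for the network C -> A, C -> B (illustration only).
P_C = {1: 0.4, 0: 0.6}
P_A_given_C = {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8}  # keys: (a, c)
P_B_given_C = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.5, (0, 0): 0.5}  # keys: (b, c)

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a | C=c) P(B=b | C=c) P(C=c)."""
    return P_A_given_C[(a, c)] * P_B_given_C[(b, c)] * P_C[c]

# The factorized joint is a proper distribution: it sums to 1.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12

# Conditional independence given C: P(A, B | C) = P(A | C) P(B | C),
# which holds by construction of the factorized joint.
for c in (0, 1):
    for a, b in product((0, 1), repeat=2):
        lhs = joint(a, b, c) / P_C[c]
        rhs = P_A_given_C[(a, c)] * P_B_given_C[(b, c)]
        assert abs(lhs - rhs) < 1e-12
```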
Inference in Bayesian Networks

When the structure of the Bayesian network and the probability factors are known, one usually wants to do inference by computing conditional probabilities. This can be done with the help of the sum and product rules.

Example: what is the probability of the cat being on the roof if it is cloudy, P(F | C)?

Figure 3.5 of Alpaydin (2004): nodes Cloudy (C), Sprinkler (S), Rain (R), Wet grass (W), Roof (F), with edges C -> S, C -> R, S -> W, R -> W, R -> F, and conditional probability tables
P(C) = 0.5,
P(S | C) = 0.1, P(S | ~C) = 0.5,
P(R | C) = 0.8, P(R | ~C) = 0.1,
P(W | R, S) = 0.95, P(W | R, ~S) = 0.90, P(W | ~R, S) = 0.90, P(W | ~R, ~S) = 0.10,
P(F | R) = 0.1, P(F | ~R) = 0.7.
Inference in Bayesian Networks

Example: probability of the cat being on the roof if it is cloudy, P(F | C)?

S, R and W are unknown or hidden variables. F and C are observed variables. Conventionally, the observed variables are denoted by gray nodes in the graph.

The joint distribution factorizes as
P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C).

We use the product rule P(F | C) = P(F, C) / P(C), where P(C) = \sum_F P(F, C).

We must sum over, or marginalize over, the hidden variables S, R and W:
P(F, C) = \sum_S \sum_R \sum_W P(C, S, R, W, F).
Inference in Bayesian Networks

Writing out the sum over the hidden variables:
P(F, C) = P(C, S, R, W, F) + P(C, ~S, R, W, F)
        + P(C, S, ~R, W, F) + P(C, ~S, ~R, W, F)
        + P(C, S, R, ~W, F) + P(C, ~S, R, ~W, F)
        + P(C, S, ~R, ~W, F) + P(C, ~S, ~R, ~W, F).

We obtain similar formulas for P(F, ~C), P(~F, C) and P(~F, ~C).

Notice: we have used the shorthand F to denote F = 1 and ~F to denote F = 0.

Each term factorizes as P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C). In principle, we know the numeric value of each joint term, hence we can compute the probabilities.
Inference in Bayesian Networks

There are 2^5 terms in the sums. Generally, marginalization is NP-hard; the most straightforward approach would involve a computation of O(2^d) terms.

We can often do better by smartly re-arranging the sums and products. Behold: do the marginalization over W first:
P(C, S, R, F) = \sum_W P(C, S, R, W, F)
              = P(F | R) [\sum_W P(W | S, R)] P(S | C) P(R | C) P(C)
              = P(F | R) P(S | C) P(R | C) P(C),
since \sum_W P(W | S, R) = 1.
Inference in Bayesian Networks

Now we can marginalize over S easily:
P(C, R, F) = \sum_S P(F | R) P(S | C) P(R | C) P(C)
           = P(F | R) [\sum_S P(S | C)] P(R | C) P(C)
           = P(F | R) P(R | C) P(C).

We must still marginalize over R:
P(C, F) = P(F | R) P(R | C) P(C) + P(F | ~R) P(~R | C) P(C)
        = 0.1 × 0.8 × 0.5 + 0.7 × 0.2 × 0.5 = 0.11.
P(C, ~F) = P(~F | R) P(R | C) P(C) + P(~F | ~R) P(~R | C) P(C)
         = 0.9 × 0.8 × 0.5 + 0.3 × 0.2 × 0.5 = 0.39.

P(C) = P(C, F) + P(C, ~F) = 0.5.
P(F | C) = P(C, F) / P(C) = 0.22.
P(~F | C) = P(C, ~F) / P(C) = 0.78.
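The same answer can be reproduced with a brute-force Python sketch that enumerates all 2^5 joint assignments of the sprinkler network, using the CPT values of Figure 3.5 (the variable names and helper functions below are my own, not from the lecture):

```python
from itertools import product

# CPTs of the sprinkler network (Figure 3.5 of Alpaydin, 2004).
P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {1: 0.1, 0: 0.5}    # P(S=1 | C=c)
P_R_given_C = {1: 0.8, 0: 0.1}    # P(R=1 | C=c)
P_W_given_SR = {(1, 1): 0.95, (0, 1): 0.90, (1, 0): 0.90, (0, 0): 0.10}  # P(W=1 | S=s, R=r)
P_F_given_R = {1: 0.1, 0: 0.7}    # P(F=1 | R=r)

def bern(p_true, value):
    """P(X = value) for a Boolean X with P(X=1) = p_true."""
    return p_true if value == 1 else 1.0 - p_true

def joint(c, s, r, w, f):
    """P(C,S,R,W,F) = P(F|R) P(W|S,R) P(S|C) P(R|C) P(C)."""
    return (bern(P_F_given_R[r], f) * bern(P_W_given_SR[(s, r)], w)
            * bern(P_S_given_C[c], s) * bern(P_R_given_C[c], r) * P_C[c])

# Marginalize the hidden variables S, R, W by brute force.
def p_FC(f, c):
    return sum(joint(c, s, r, w, f) for s, r, w in product((0, 1), repeat=3))

P_C1 = p_FC(1, 1) + p_FC(0, 1)    # P(C) = 0.5
print(p_FC(1, 1), p_FC(0, 1))     # ≈ 0.11, 0.39
print(p_FC(1, 1) / P_C1)          # P(F | C)  ≈ 0.22
print(p_FC(0, 1) / P_C1)          # P(~F | C) ≈ 0.78
```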
Bayesian Networks: Inference

To do inference in Bayesian networks one has to marginalize over variables. For example:
P(X_1) = \sum_{X_2} ... \sum_{X_d} P(X_1, ..., X_d).

If the variables are Boolean, the sum has O(2^d) terms. This is inefficient! Generally, marginalization is an NP-hard problem.

If the Bayesian network is a tree: Sum-Product Algorithm (a special case being Belief Propagation).
If the Bayesian network is "close" to a tree: Junction Tree Algorithm.
Otherwise: approximate methods (variational approximation, MCMC, etc.).
Sum-Product Algorithm

Idea: a sum of products is difficult to compute. A product of sums is easy to compute, if the sums have been re-arranged smartly.

Example: a disconnected Bayesian network with d vertices, computing P(X_1).
Sum of products: P(X_1) = \sum_{X_2} ... \sum_{X_d} P(X_1) ... P(X_d).
Product of sums: P(X_1) = P(X_1) [\sum_{X_2} P(X_2)] ... [\sum_{X_d} P(X_d)] = P(X_1).

The Sum-Product Algorithm works if the Bayesian network is a directed tree. For details, see e.g. Bishop (2006).
Sum-Product Algorithm Example

Network: D -> A, D -> B, D -> C, with joint distribution
P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).

Task: compute \tilde{P}(D) = \sum_A \sum_B \sum_C P(A, B, C, D).
Sum-Product Algorithm Example

P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).

A factor graph is composed of vertices (ellipses) and factors (squares), describing the factors of the joint probability; here the factors are P(A|D), P(B|D), P(C|D) and P(D).

The Sum-Product Algorithm re-arranges the product (check!):

    \tilde{P}(D) = [\sum_A P(A | D)] [\sum_B P(B | D)] [\sum_C P(C | D)] P(D)
                 = \sum_A \sum_B \sum_C P(A, B, C, D).    (1)
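A small Python sketch (with made-up CPT values for the star network D -> A, B, C) verifies the re-arrangement (1) numerically: the product of the three inner sums equals the brute-force triple sum for every value of D.

```python
from itertools import product

# Hypothetical CPTs (illustration only) for the star network D -> A, B, C.
P_D = {1: 0.3, 0: 0.7}
P_A_given_D = {1: 0.6, 0: 0.2}   # P(A=1 | D=d)
P_B_given_D = {1: 0.4, 0: 0.9}   # P(B=1 | D=d)
P_C_given_D = {1: 0.5, 0: 0.1}   # P(C=1 | D=d)

def bern(p_true, value):
    """P(X = value) for a Boolean X with P(X=1) = p_true."""
    return p_true if value == 1 else 1.0 - p_true

def joint(a, b, c, d):
    """P(A,B,C,D) = P(A|D) P(B|D) P(C|D) P(D)."""
    return (bern(P_A_given_D[d], a) * bern(P_B_given_D[d], b)
            * bern(P_C_given_D[d], c) * P_D[d])

for d in (0, 1):
    # Sum of products: 2^3 = 8 joint terms.
    brute = sum(joint(a, b, c, d) for a, b, c in product((0, 1), repeat=3))
    # Product of sums: three independent 2-term sums (each equal to 1 here),
    # as in the re-arrangement (1).
    smart = (sum(bern(P_A_given_D[d], a) for a in (0, 1))
             * sum(bern(P_B_given_D[d], b) for b in (0, 1))
             * sum(bern(P_C_given_D[d], c) for c in (0, 1))
             * P_D[d])
    assert abs(brute - smart) < 1e-12   # both equal P~(D=d) = P(D=d)
```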