The Elimination Algorithm



  1. The Elimination Algorithm
     Probabilistic Graphical Models (10-708)
     Lecture 4, Sep 26, 2007
     Eric Xing, School of Computer Science

     [Figure: the running example, a signal-transduction pathway with nodes Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8)]

     Reading: J-Chap 3, KF-Chap. 8, 9

     Questions?

  2. Probabilistic Inference
     - We now have compact representations of probability distributions: graphical models.
     - A GM G describes a unique probability distribution P.
     - How do we answer queries about P? We use "inference" as a name for the process of computing answers to such queries.

     Query 1: Likelihood
     - Most of the queries one may ask involve evidence. Evidence e is an assignment of values to a set E of variables in the domain.
     - Without loss of generality, $E = \{X_{k+1}, \ldots, X_n\}$.
     - The simplest query is to compute the probability of the evidence:
       $P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e)$
     - This is often referred to as computing the likelihood of e.
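As a concrete illustration (not part of the original slides), here is a minimal Python sketch of the likelihood query by brute-force summation; the joint table, the variable names, and the binary domains are all invented for the example:

```python
# A minimal sketch (invented joint table) of the likelihood query:
# P(e) = sum over the hidden variables of the joint.
import itertools

# Hypothetical joint P(x1, x2, e) over three binary variables.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}

def likelihood(e):
    # P(e) = sum_{x1, x2} P(x1, x2, e)
    return sum(joint[x1, x2, e]
               for x1, x2 in itertools.product((0, 1), repeat=2))

print(likelihood(1))  # 0.70
```

The elimination algorithm developed in the rest of the lecture computes the same quantity without the exponential enumeration.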

  3. Query 2: Conditional Probability
     - Often we are interested in the conditional probability distribution of a variable given the evidence e:
       $P(X \mid e) = \frac{P(X, e)}{P(e)} = \frac{P(X, e)}{\sum_x P(X = x, e)}$
     - This is the a posteriori belief in X, given evidence e.
     - We usually query a subset Y of all domain variables, X = {Y, Z}, and "don't care" about the remaining variables Z:
       $P(Y \mid e) = \sum_z P(Y, Z = z \mid e)$
     - The process of summing out the "don't care" variables z is called marginalization, and the resulting P(y | e) is called a marginal probability. (A numeric sketch follows after this slide.)

     Applications of a posteriori belief
     - Prediction: what is the probability of an outcome given the starting condition? (In a chain A -> B -> C, the query node is a descendant of the evidence.)
     - Diagnosis: what is the probability of a disease/fault given symptoms? (Here the query node is an ancestor of the evidence.)
     - Learning under partial observation: fill in the unobserved values under an "EM" setting (more later).
     - The directionality of information flow between variables is not restricted by the directionality of the edges in a GM: probabilistic inference can combine evidence from all parts of the network.
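A tiny numeric sketch (hypothetical values, not from the slides) of turning P(X, e) into the a posteriori belief by normalization:

```python
# P(X | e) is P(X, e) renormalized by P(e) = sum_x P(X = x, e).
p_x_and_e = {0: 0.15, 1: 0.45}   # assumed values of P(X = x, e)

p_e = sum(p_x_and_e.values())    # P(e) = 0.60
posterior = {x: p / p_e for x, p in p_x_and_e.items()}
print(posterior)                 # {0: 0.25, 1: 0.75}
```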

  4. Query 3: Most Probable Assignment
     - In this query we want to find the most probable joint assignment (MPA) for some variables of interest.
     - Such reasoning is usually performed under some given evidence e, while ignoring (the values of) the other variables z:
       $\mathrm{MPA}(Y \mid e) = \arg\max_{y \in \mathcal{Y}} P(y \mid e) = \arg\max_{y \in \mathcal{Y}} \sum_z P(y, z \mid e)$
     - This is the maximum a posteriori configuration of y.

     Applications of MPA
     - Classification: find the most likely label, given the evidence.
     - Explanation: what is the most likely scenario, given the evidence?
     - Cautionary note: the MPA of a variable depends on its "context", i.e., the set of variables being jointly queried.
     - Example (worked out in the sketch after this slide):

         x   y   P(x,y)
         0   0   0.35
         0   1   0.05
         1   0   0.30
         1   1   0.30

       MPA of X? MPA of (X, Y)?
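The slide's question can be checked directly; the numbers below are exactly the ones in the table above:

```python
# Checking the cautionary example: MPA of X vs. MPA of (X, Y).
p = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# MPA of X alone: marginalize out Y first, then maximize.
p_x = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (0, 1)}
print(max(p_x, key=p_x.get))   # 1  (P(X=1) = 0.60 > P(X=0) = 0.40)

# MPA of (X, Y): maximize over the joint directly.
print(max(p, key=p.get))       # (0, 0), with probability 0.35
```

So the MPA of X alone is x = 1, while the MPA of (X, Y) is (0, 0): maximizing a marginal and maximizing a joint need not agree, which is the point of the cautionary note.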

  5. Complexity of Inference
     - Thm: computing P(X = x | e) in a GM is NP-hard.
     - Hardness does not mean we cannot solve inference:
       - It implies that we cannot find a general procedure that works efficiently for arbitrary GMs.
       - For particular families of GMs, we can have provably efficient procedures.

     Approaches to inference
     - Exact inference algorithms:
       - The elimination algorithm
       - Message-passing algorithms (sum-product, belief propagation)
       - The junction tree algorithms
     - Approximate inference techniques:
       - Stochastic simulation / sampling methods
       - Markov chain Monte Carlo methods
       - Variational algorithms

  6. Marginalization and Elimination
     - A signal transduction pathway: A -> B -> C -> D -> E.
     - What is the likelihood that protein E is active? Query: P(e).
       $P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$
     - A naive summation needs to enumerate over an exponential number of terms (a runnable sketch follows after this slide).
     - By the chain decomposition, we get
       $P(e) = \sum_d \sum_c \sum_b \sum_a P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(e \mid d)$

     Elimination on Chains
     - Rearranging terms:
       $P(e) = \sum_d \sum_c \sum_b \sum_a P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(e \mid d) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a)$
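A sketch of the naive summation for the chain, with random placeholder CPDs (the arrays below are invented; only the structure matters):

```python
# Naive P(e) for the chain A -> B -> C -> D -> E: enumerate all k**4
# joint assignments of (a, b, c, d). Exponential in the chain length.
import itertools
import numpy as np

rng = np.random.default_rng(0)
k = 3

def random_cpd(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)  # normalize the last axis

p_a = random_cpd(k)          # P(a)
p_b_a = random_cpd(k, k)     # P(b | a), first axis indexes a
p_c_b = random_cpd(k, k)     # P(c | b)
p_d_c = random_cpd(k, k)     # P(d | c)
p_e_d = random_cpd(k, k)     # P(e | d)

def naive_p_e(e):
    return sum(p_a[a] * p_b_a[a, b] * p_c_b[b, c] * p_d_c[c, d] * p_e_d[d, e]
               for a, b, c, d in itertools.product(range(k), repeat=4))

print(naive_p_e(0))
```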

  7. Elimination on Chains
     - Now we can perform the innermost summation:
       $P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d)\, p(b)$
     - This summation "eliminates" one variable (A) from our summation argument, at a "local cost" (see the one-step sketch after this slide).

     Elimination in Chains
     - Rearranging and then summing again, we get
       $P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d)\, p(b) = \sum_d \sum_c P(d \mid c)\, P(e \mid d) \sum_b P(c \mid b)\, p(b) = \sum_d \sum_c P(d \mid c)\, P(e \mid d)\, p(c)$
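The "local cost" is easy to see in code: eliminating A is a single vector-matrix product (hypothetical numbers below):

```python
import numpy as np

p_a = np.array([0.6, 0.4])                 # P(a), hypothetical
p_b_a = np.array([[0.9, 0.1],
                  [0.2, 0.8]])             # P(b | a), row = a

p_b = p_a @ p_b_a   # p(b) = sum_a P(a) P(b | a)
print(p_b)          # [0.62 0.38]
```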

  8. Elimination in Chains
     - Eliminating nodes one by one all the way to the end, we get
       $P(e) = \sum_d P(e \mid d)\, p(d)$
     - Complexity: each step costs $O(|\mathrm{Val}(X_i)| \times |\mathrm{Val}(X_{i+1})|)$ operations, so the whole chain costs $O(nk^2)$ for n variables with k values each. (The full loop is sketched after this slide.)
     - Compare to the naive evaluation, which sums over the joint values of n-1 variables: $O(k^n)$.

     Undirected Chains
     - The same rearrangement works for an undirected chain with potentials:
       $P(e) = \frac{1}{Z} \sum_d \sum_c \sum_b \sum_a \phi(b, a)\, \phi(c, b)\, \phi(d, c)\, \phi(e, d) = \frac{1}{Z} \sum_d \sum_c \sum_b \phi(c, b)\, \phi(d, c)\, \phi(e, d) \sum_a \phi(b, a) = \cdots$
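Here is the whole chain done by elimination, again with random placeholder CPDs: one matrix-vector product per step, $O(nk^2)$ total instead of $O(k^n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3

def random_cpd(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

msg = random_cpd(k)                               # start from P(a)
for cpd in [random_cpd(k, k) for _ in range(4)]:  # P(b|a), ..., P(e|d)
    msg = msg @ cpd                               # eliminate one variable: O(k^2)

print(msg, msg.sum())                             # P(e) for each e; sums to 1.0
```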

  9. The Sum-Product Operation
     - In general, we can view the task at hand as that of computing the value of an expression of the form
       $\sum_z \prod_{\phi \in \mathcal{F}} \phi$
       where F is a set of factors.
     - We call this task the sum-product inference task.

     Outcome of elimination
     - Let X be some set of variables, let F be a set of factors such that for each $\phi \in \mathcal{F}$, $\mathrm{Scope}[\phi] \subseteq X$, let $Y \subset X$ be a set of query variables, and let $Z = X - Y$ be the variables to be eliminated.
     - The result of eliminating the variables in Z is a factor
       $\tau(Y) = \sum_z \prod_{\phi \in \mathcal{F}} \phi$
     - This factor does not necessarily correspond to any probability or conditional probability in this network. (Example forthcoming; a generic implementation sketch follows after this slide.)
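A generic sketch of the sum-product elimination operation; the (vars, table) factor representation is invented for this sketch, with one numpy axis per variable:

```python
import numpy as np

def multiply(f, g):
    # Product of two factors, aligning shared variables by name.
    fv, ft = f
    gv, gt = g
    out_vars = list(fv) + [v for v in gv if v not in fv]

    def expand(vars_, t):
        # Permute t's axes into out_vars order; insert size-1 axes
        # for variables the factor does not mention, for broadcasting.
        perm = [vars_.index(v) for v in out_vars if v in vars_]
        idx = tuple(slice(None) if v in vars_ else None for v in out_vars)
        return np.transpose(t, perm)[idx]

    return out_vars, expand(list(fv), ft) * expand(list(gv), gt)

def eliminate(factors, z):
    # tau = sum_z (product of the factors mentioning z); others pass through.
    touching = [f for f in factors if z in f[0]]
    rest = [f for f in factors if z not in f[0]]
    prod = touching[0]
    for f in touching[1:]:
        prod = multiply(prod, f)
    pv, pt = prod
    tau = ([v for v in pv if v != z], pt.sum(axis=pv.index(z)))
    return rest + [tau]

# e.g. eliminating "a" from {P(a), P(b|a)} leaves the single factor p(b):
f1 = (["a"], np.array([0.6, 0.4]))
f2 = (["a", "b"], np.array([[0.9, 0.1], [0.2, 0.8]]))
print(eliminate([f1, f2], "a"))   # [(['b'], array([0.62, 0.38]))]
```

Note that the returned tau is just a nonnegative table over its scope, matching the slide's caveat that it need not be a probability.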

  10. Dealing with Evidence
      - Conditioning can be expressed as a sum-product operation.
      - The evidence potential:
        $\delta(E_i, e_i) = \begin{cases} 1 & \text{if } E_i = e_i \\ 0 & \text{if } E_i \neq e_i \end{cases}$
      - Total evidence potential:
        $\delta(E, e) = \prod_{i \in I_E} \delta(E_i, e_i)$
      - Introducing evidence (restricted factors):
        $\tau(Y, e) = \sum_{z, e} \prod_{\phi \in \mathcal{F}} \phi \times \delta(E, e)$
        where the evidence potential pins the evidence variables to their observed values. (A sketch follows after this slide.)

      Inference on General GMs via Variable Elimination
      - General idea: write the query in the form
        $P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid pa_i)$
      - This suggests an "elimination order" of latent variables to be marginalized.
      - Iteratively:
        - Move all irrelevant terms outside of the innermost sum.
        - Perform the innermost sum, getting a new term.
        - Insert the new term into the product.
      - Wrap-up:
        $P(X_1 \mid e) = \frac{\phi(X_1, e)}{\sum_{x_1} \phi(x_1, e)}$
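The evidence potential is just another factor in the same (vars, table) representation as the sketch above; the function name is invented:

```python
import numpy as np

def evidence_potential(var, observed_value, domain_size):
    table = np.zeros(domain_size)
    table[observed_value] = 1.0      # 1 iff E_i = e_i, 0 otherwise
    return ([var], table)

# e.g. observing E = 1 over a binary domain:
print(evidence_potential("e", 1, 2))  # (['e'], array([0., 1.]))
```

Appending one such factor per observed variable to F and then eliminating as usual implements the restricted-factor sum above.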

  11. The Elimination Algorithm

      Procedure Elimination(
        G,  // the GM
        E,  // evidence
        Z,  // set of variables to be eliminated
        X   // query variable(s)
      )
      1. Initialize(G)
      2. Evidence(E)
      3. Sum-Product-Variable-Elimination(F, Z, ≺)
      4. Normalization(F)

      Procedure Initialize(G, Z)
      1. Let Z_1, ..., Z_k be an ordering of Z such that Z_i ≺ Z_j iff i < j
      2. Initialize F with the full set of factors

      Procedure Evidence(E)
      1. for each i ∈ I_E: F = F ∪ {δ(E_i, e_i)}

      Procedure Sum-Product-Variable-Elimination(F, Z, ≺)
      1. for i = 1, ..., k: F ← Sum-Product-Eliminate-Var(F, Z_i)
      2. φ* ← ∏_{φ ∈ F} φ
      3. return φ*
      4. Normalization(φ*)

  12. The Elimination Algorithm (cont.)

      Procedure Sum-Product-Eliminate-Var(
        F,  // set of factors
        Z   // variable to be eliminated
      )
      1. F′ ← { φ ∈ F : Z ∈ Scope[φ] }
      2. F′′ ← F − F′
      3. ψ ← ∏_{φ ∈ F′} φ
      4. τ ← ∑_Z ψ
      5. return F′′ ∪ { τ }

      Procedure Normalization(φ*)
      1. P(X | E) = φ*(X) / ∑_x φ*(x)

      (These procedures are exercised end to end in the runnable sketch after the next slide.)

      A more complex network: a food web
      [Figure: an eight-node food-web network over A, B, C, D, E, F, G, H]
      What is the probability that hawks are leaving given that the grass condition is poor?
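As a sanity check on the pseudocode, here is a compact, self-contained Python sketch of the whole pipeline (Initialize, Evidence, Sum-Product-Eliminate-Var, Normalization) on a tiny chain; the CPD numbers and variable names are invented for illustration:

```python
# End-to-end elimination on A -> B -> E, evidence E = 1, query P(A | E = 1).
import itertools

DOMAIN = (0, 1)  # every variable is binary in this toy example

def eliminate_var(factors, z):
    # Sum-Product-Eliminate-Var: multiply the factors touching z, sum z out.
    touching = [f for f in factors if z in f[0]]
    rest = [f for f in factors if z not in f[0]]
    out_vars = tuple(dict.fromkeys(
        v for fv, _ in touching for v in fv if v != z))
    tau = {}
    for vals in itertools.product(DOMAIN, repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        total = 0.0
        for zv in DOMAIN:
            assign[z] = zv
            prod = 1.0
            for fv, ft in touching:
                prod *= ft[tuple(assign[v] for v in fv)]
            total += prod
        tau[vals] = total
    return rest + [(out_vars, tau)]

# Initialize: the factors of the chain (hypothetical CPDs).
fA  = (("a",), {(0,): 0.6, (1,): 0.4})                                    # P(a)
fBA = (("a", "b"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})  # P(b|a)
fEB = (("b", "e"), {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.4})  # P(e|b)

# Evidence: append the evidence potential delta(E, 1).
factors = [fA, fBA, fEB, (("e",), {(0,): 0.0, (1,): 1.0})]

# Sum-Product-Variable-Elimination with ordering e ≺ b.
for z in ("e", "b"):
    factors = eliminate_var(factors, z)

# phi*(a): product of the remaining factors (all now have scope {"a"}).
phi = {a: 1.0 for a in DOMAIN}
for a in DOMAIN:
    for fv, ft in factors:
        phi[a] *= ft[(a,)]

# Normalization: P(A | E = 1) = phi*(a) / sum_a phi*(a).
total = sum(phi.values())
print({a: v / total for a, v in phi.items()})   # {0: ~0.686, 1: ~0.314}
```

The food-web query on the slide (hawks leaving given poor grass) runs through exactly the same steps, just with more factors and a longer elimination ordering.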
