10-418 / 10-618 Machine Learning for Structured Data
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Belief Propagation
Matt Gormley, Lecture 9, Sep. 25, 2019
Q&A Q: What if I already answered a homework question using different assumptions than what was clarified in a Piazza note? A: Just write down the assumptions you made. We will usually give credit so long as your assumptions are clear in the writeup and your answer correct under those assumptions. (Obviously, this only applies to underspecified / ambiguous questions. You can’t just add arbitrary assumptions!) 2
Reminders
• Homework 1: DAgger for seq2seq – Out: Thu, Sep. 12 – Due: Thu, Sep. 26 at 11:59pm
• Homework 2: Labeling Syntax Trees – Out: Thu, Sep. 26 – Due: Thu, Oct. 10 at 11:59pm
Variable Elimination Complexity
[Figure: a factor graph over X1–X5 with factors ψ12, ψ23, ψ13, ψ45, ψ234, ψ5. Instead of brute force, capitalize on the factorization of p(x).]
In-Class Exercise: Fill in the blanks.
Brute-force (naïve) inference is O(____). Variable elimination is O(____), where:
  n = # of variables
  k = max # of values a variable can take
  r = # of variables participating in the largest "intermediate" table
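To make the contrast concrete, here is a minimal sketch (my own example, not from the lecture) comparing brute-force summation, which touches all k^n joint assignments, with variable elimination on a chain of pairwise factors, where each elimination step is only a k × k operation (so r = 2).

import numpy as np

# Minimal sketch, assuming a chain of pairwise factors psi_{i,i+1} with random
# values. Brute force sums over all k^n assignments; variable elimination
# eliminates one variable at a time in O(n * k^2).
k, n = 3, 5
rng = np.random.default_rng(0)
pair = [rng.random((k, k)) for _ in range(n - 1)]   # psi_{i,i+1}(x_i, x_{i+1})

def brute_force_Z():
    Z = 0.0
    for assign in np.ndindex(*([k] * n)):           # all k^n joint assignments
        p = 1.0
        for i in range(n - 1):
            p *= pair[i][assign[i], assign[i + 1]]
        Z += p
    return Z

def variable_elimination_Z():
    msg = np.ones(k)                                # "intermediate table" over one variable
    for i in range(n - 1):
        msg = pair[i].T @ msg                       # sum_{x_i} psi(x_i, x_{i+1}) * msg(x_i)
    return msg.sum()

print(brute_force_Z(), variable_elimination_Z())    # same partition function Z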
Exact Inference
Variable Elimination
• Uses
– Computes the partition function of any factor graph
– Computes the marginal probability of a query variable in any factor graph
• Limitations
– Only computes the marginal for one variable at a time (i.e. need to re-run variable elimination for each variable if you need them all)
– Elimination order affects runtime
Belief Propagation
• Uses
– Computes the partition function of any acyclic factor graph
– Computes all marginal probabilities of factors and variables at once, for any acyclic factor graph
• Limitations
– Only exact on acyclic factor graphs (though we'll consider its "loopy" variant later)
– Message passing order affects runtime (but the obvious topological ordering always works best)
MESSAGE PASSING
Great Ideas in ML: Message Passing
Count the soldiers
[Figure: soldiers standing in a line. Each passes a count forward ("there's 1 of me", "1 / 2 / 3 / 4 / 5 before you") and a count backward ("5 / 4 / 3 / 2 / 1 behind you").]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Count the soldiers
[Figure: one soldier receives "2 before you" and "3 behind you". Belief: there's 1 of me, so must be 2 + 1 + 3 = 6 of us. He only sees his incoming messages.]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Count the soldiers
[Figure: the next soldier receives "1 before you" and "4 behind you". Belief: must be 1 + 1 + 4 = 6 of us, the same answer as his neighbor's 2 + 1 + 3 = 6. Each soldier only sees his incoming messages.]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
[Figure: a tree of soldiers. One branch reports "3 here", another "7 here"; with "1 of me", the outgoing report is "11 here (= 7 + 3 + 1)".]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
[Figure: reports of "3 here" and "3 here" from two branches combine with the soldier himself to give an outgoing report of "7 here (= 3 + 3 + 1)".]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
[Figure: the report "11 here (= 7 + 3 + 1)" is passed along, combining the "7 here" and "3 here" branch reports with the sender himself.]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
[Figure: a soldier receiving "3 here", "7 here", and "3 here" from his branches concludes: Belief: must be 14 of us.]
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
[Figure: the same belief computation (Belief: must be 14 of us), with the caveat that this wouldn't work correctly with a "loopy" (cyclic) graph.]
adapted from MacKay (2003) textbook
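The soldier-counting idea can be written down directly. A small sketch, assuming a toy tree of my own choosing: each node tells a neighbor how many soldiers are on its side of that edge, and every node's belief (itself plus all incoming counts) is the same total.

from collections import defaultdict

# Assumed example tree; the recursion below would never terminate on a cyclic
# graph, which is the "loopy" caveat on this slide.
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def message(src, dst):
    # Number of soldiers on src's side of the (src, dst) edge:
    # src itself plus the counts reported by src's other neighbors.
    return 1 + sum(message(nbr, src) for nbr in adj[src] if nbr != dst)

for node in sorted(adj):
    belief = 1 + sum(message(nbr, node) for nbr in adj[node])
    print(f"soldier {node}: must be {belief} of us")   # every soldier says 6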
Exact marginal inference for factor trees:
SUM-PRODUCT BELIEF PROPAGATION
Message Passing in Belief Propagation
[Figure: a variable X connected to a factor Ψ and to other factors. The message from X to Ψ says "My other factors think I'm a noun" (v 1, n 6, a 3); the message from Ψ to X says "But my other variables and I think you're a verb" (v 6, n 1, a 3); their pointwise product is the belief at X (v 6, n 6, a 9).]
Both of these messages judge the possible values of variable X. Their product = belief at X = product of all 3 messages to X.
Sum-Product Belief Propagation
[Figure: a 2×2 summary (Beliefs / Messages × Variables / Factors), each cell illustrated with a small factor graph around X1.]
Variable belief: b(x_i) = ∏_{α ∈ N(i)} μ_{α→i}(x_i)
Factor belief: b_α(x_α) = ψ_α(x_α) · ∏_{i ∈ N(α)} μ_{i→α}(x_i)
Variable-to-factor message: μ_{i→α}(x_i) = ∏_{β ∈ N(i), β ≠ α} μ_{β→i}(x_i)
Factor-to-variable message: μ_{α→i}(x_i) = Σ_{x_α : x_α[i] = x_i} ψ_α(x_α) · ∏_{j ∈ N(α), j ≠ i} μ_{j→α}(x_j)
Sum-Product Belief Propagation
Variable Belief
[Figure: variable X1 with neighboring factors ψ1, ψ2, ψ3.]
Incoming messages to X1:
  from ψ1: v 0.1, n 3, p 1
  from ψ2: v 1, n 2, p 2
  from ψ3: v 4, n 1, p 0
Belief at X1 (pointwise product of the incoming messages): v 0.4, n 6, p 0
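A minimal sketch of this computation using the numbers above (which message belongs to which factor is my own labeling; the belief is just the pointwise product of all incoming messages, so the labeling does not change the result).

import numpy as np

# Messages sent to X1 by its neighboring factors, over values (v, n, p).
msgs_to_X1 = {
    "psi1": np.array([0.1, 3.0, 1.0]),
    "psi2": np.array([1.0, 2.0, 2.0]),
    "psi3": np.array([4.0, 1.0, 0.0]),
}

# Variable belief: pointwise product of every incoming message.
belief_X1 = np.prod(np.stack(list(msgs_to_X1.values())), axis=0)
print(dict(zip(["v", "n", "p"], belief_X1.tolist())))   # {'v': 0.4, 'n': 6.0, 'p': 0.0}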
Sum-Product Belief Propagation
Variable Message
[Figure: variable X1 with neighboring factors ψ1, ψ2, ψ3.]
Incoming messages to X1:
  from ψ1: v 0.1, n 3, p 1
  from ψ2: v 1, n 2, p 2
Message from X1 to ψ3 (product of the messages from all other factors): v 0.1, n 6, p 2
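A sketch of the same computation as code, with the numbers above; that ψ3 is the recipient is my reading of the figure. The message from X1 to psi3 multiplies together the messages X1 received from every factor except psi3.

import numpy as np

msgs_to_X1 = {                              # messages to X1 over values (v, n, p)
    "psi1": np.array([0.1, 3.0, 1.0]),
    "psi2": np.array([1.0, 2.0, 2.0]),
    "psi3": np.array([4.0, 1.0, 0.0]),
}

def variable_to_factor_message(msgs, recipient, num_values=3):
    out = np.ones(num_values)
    for factor, msg in msgs.items():
        if factor != recipient:             # exclude the recipient's own message
            out *= msg
    return out

print(variable_to_factor_message(msgs_to_X1, "psi3"))   # v: 0.1, n: 6, p: 2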
Sum-Product Belief Propagation
Factor Belief
[Figure: factor ψ1 connecting X1 (values v, n) and X3 (values p, d, n).]
Incoming messages to ψ1:
  from X1: v 8, n 0.2
  from X3: p 4, d 1, n 0
Factor table ψ1(X1, X3) (rows X3 = p, d, n; columns X1 = v, n):
  p: 0.1  8
  d: 3    0
  n: 1    1
Factor belief (ψ1 times both incoming messages):
  p: 3.2  6.4
  d: 24   0
  n: 0    0
Sum-Product Belief Propagation
Factor Belief
[Figure: factor ψ1 connecting X1 and X3.]
Factor belief at ψ1 (rows X3 = p, d, n; columns X1 = v, n):
  p: 3.2  6.4
  d: 24   0
  n: 0    0
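A sketch of the factor-belief computation with these numbers: the factor table is multiplied elementwise by each incoming variable-to-factor message, broadcast along the matching axis.

import numpy as np

psi1 = np.array([[0.1, 8.0],      # rows: X3 in (p, d, n); columns: X1 in (v, n)
                 [3.0, 0.0],
                 [1.0, 1.0]])
msg_X1_to_psi1 = np.array([8.0, 0.2])        # over (v, n)
msg_X3_to_psi1 = np.array([4.0, 1.0, 0.0])   # over (p, d, n)

# Factor belief: psi1 times both incoming messages.
belief_psi1 = psi1 * msg_X1_to_psi1[None, :] * msg_X3_to_psi1[:, None]
print(belief_psi1)   # rows p, d, n: [3.2 6.4], [24 0], [0 0]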
Sum-Product Belief Propagation
Factor Message
[Figure: factor ψ1 connecting X1 (values v, n) and X3 (values p, d, n).]
Incoming message from X1: v 8, n 0.2
Factor table ψ1(X1, X3) (rows X3 = p, d, n; columns X1 = v, n):
  p: 0.1  8
  d: 3    0
  n: 1    1
Message from ψ1 to X3 (sum over X1 of ψ1 times the incoming message):
  p: 0.8 + 1.6
  d: 24 + 0
  n: 8 + 0.2
Sum-Product Belief Propagation
Factor Message
[Figure: factor ψ1 connecting X1 and X3.]
For a binary factor, the factor-to-variable message is a matrix-vector product.
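With the same numbers, here is a short sketch of that matrix-vector product: summing X1 out of ψ1 times the incoming message gives the message from ψ1 to X3.

import numpy as np

psi1 = np.array([[0.1, 8.0],      # rows: X3 in (p, d, n); columns: X1 in (v, n)
                 [3.0, 0.0],
                 [1.0, 1.0]])
msg_X1_to_psi1 = np.array([8.0, 0.2])        # over (v, n)

# mu_{psi1 -> X3}(x3) = sum_{x1} psi1(x1, x3) * mu_{X1 -> psi1}(x1)
msg_psi1_to_X3 = psi1 @ msg_X1_to_psi1
print(msg_psi1_to_X3)   # p: 0.8 + 1.6 = 2.4, d: 24 + 0 = 24, n: 8 + 0.2 = 8.2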
Sum-Product Belief Propagation
Input: a factor graph with no cycles
Output: exact marginals for each variable and factor
Algorithm:
1. Initialize the messages to the uniform distribution.
2. Choose a root node.
3. Send messages from the leaves to the root.
4. Send messages from the root to the leaves.
5. Compute the beliefs (unnormalized marginals).
6. Normalize beliefs and return the exact marginals.
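Below is a compact sketch of the whole procedure on a tiny acyclic factor graph. The graph, the factor tables, and the recursive message computation (used here in place of an explicit leaves-to-root / root-to-leaves schedule, to which it is equivalent on a tree) are my own illustrative choices, not the lecture's code.

import numpy as np
from itertools import product

K = 2                                            # every variable takes K values
factors = {                                      # factor name -> (scope, table)
    "psiA": (("X1", "X2"), np.array([[1.0, 2.0], [3.0, 4.0]])),
    "psiB": (("X2", "X3"), np.array([[5.0, 1.0], [1.0, 5.0]])),
}
variables = ["X1", "X2", "X3"]
nbrs = {v: [f for f, (scope, _) in factors.items() if v in scope] for v in variables}

def msg_var_to_fac(v, f):
    # Product of messages into v from all of v's factors other than f.
    out = np.ones(K)
    for g in nbrs[v]:
        if g != f:
            out *= msg_fac_to_var(g, v)
    return out

def msg_fac_to_var(f, v):
    # Sum out every other variable in f's scope, weighting by their messages.
    scope, table = factors[f]
    out = np.zeros(K)
    for assign in product(range(K), repeat=len(scope)):
        val = table[assign]
        for u, x_u in zip(scope, assign):
            if u != v:
                val *= msg_var_to_fac(u, f)[x_u]
        out[assign[scope.index(v)]] += val
    return out

for v in variables:
    belief = np.ones(K)
    for f in nbrs[v]:
        belief *= msg_fac_to_var(f, v)
    print(v, belief / belief.sum())              # exact marginals on this tree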
Sum-Product Belief Propagation
[Figure: the same 2×2 summary of variable/factor beliefs and messages, repeated as a recap.]
FORWARD-BACKWARD AS SUM-PRODUCT BP
CRF Tagging Model
[Figure: a chain CRF over X1, X2, X3 for the sentence "find preferred tags". X1 ("find") could be verb or noun; X2 ("preferred") could be adjective or verb; X3 ("tags") could be noun or verb.]
CRF Tagging by Belief Propagation
[Figure: the chain over "find preferred tags". The forward algorithm = message passing (matrix-vector products), sending α messages left to right; the backward algorithm = message passing (matrix-vector products), sending β messages right to left. At X2 ("preferred"), the belief is the pointwise product of the incoming α message (v 2, n 1, a 7), the local factor (v 0.3, n 0, a 0.1), and the incoming β message (v 3, n 6, a 6), giving (v 1.8, n 0, a 4.2).]
• Forward-backward is a message passing algorithm.
• It's the simplest case of belief propagation.
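To make "message passing = matrix-vector products" concrete, here is a small sketch of the forward and backward passes on a length-3 chain; the transition and unary factor tables are illustrative assumptions, not the exact tables in the figure.

import numpy as np

tags = ["v", "n", "a"]
K, T = len(tags), 3
psi_pair = np.array([[1.0, 6.0, 3.0],      # psi(x_t, x_{t+1}), shared across positions
                     [4.0, 0.0, 1.0],
                     [2.0, 5.0, 1.0]])
psi_unary = np.array([[0.2, 0.5, 0.1],     # psi_t(x_t) for t = 1, 2, 3
                      [0.3, 0.0, 0.1],
                      [0.1, 0.7, 0.2]])

alpha = np.zeros((T, K))
beta = np.zeros((T, K))
alpha[0] = psi_unary[0]
beta[-1] = 1.0
for t in range(1, T):                      # forward: alpha_t = (psi^T alpha_{t-1}) * unary_t
    alpha[t] = (psi_pair.T @ alpha[t - 1]) * psi_unary[t]
for t in range(T - 2, -1, -1):             # backward: beta_t = psi (unary_{t+1} * beta_{t+1})
    beta[t] = psi_pair @ (psi_unary[t + 1] * beta[t + 1])

Z = alpha[-1].sum()                        # partition function
print(alpha * beta / Z)                    # marginals p(X_t = tag) for all t at once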
So Let's Review Forward-Backward …
[Figure: the chain over "find preferred tags" again. X1 ("find") could be verb or noun; X2 ("preferred") could be adjective or verb; X3 ("tags") could be noun or verb.]
So Let's Review Forward-Backward …
[Figure: a trellis with columns X1, X2, X3 over "find preferred tags", each column showing the values v, n, a, plus START and END nodes.]
• Show the possible values for each variable
So Let's Review Forward-Backward …
[Figure: the same trellis, with one path (one value per variable) highlighted.]
• Let's show the possible values for each variable
• One possible assignment
So Let's Review Forward-Backward …
[Figure: the same trellis, with one path highlighted and the 7 factors that score it.]
• Let's show the possible values for each variable
• One possible assignment
• And what the 7 factors think of it …
Viterbi Algorithm: Most Probable Assignment
[Figure: the trellis over "find preferred tags" with the path v a n highlighted and the 7 factors that score it: ψ{0,1}(START, v), ψ{1}(v), ψ{1,2}(v, a), ψ{2}(a), ψ{2,3}(a, n), ψ{3}(n), ψ{3,4}(n, END).]
• So p(v a n) = (1/Z) * product of 7 numbers
• Numbers associated with edges and nodes of path
• Most probable assignment = path with highest product
Viterbi Algorithm: Most Probable Assignment
[Figure: the same trellis, path, and factors.]
• So p(v a n) = (1/Z) * product weight of one path
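A sketch of the corresponding Viterbi recursion (with assumed factor tables): it runs over the same trellis, but replaces the sums of forward-backward with maxes and keeps backpointers, so it recovers the single highest-weight path rather than a total weight.

import numpy as np

tags = ["v", "n", "a"]
K, T = len(tags), 3
psi_pair = np.array([[1.0, 6.0, 3.0],      # psi(x_t, x_{t+1})
                     [4.0, 0.0, 1.0],
                     [2.0, 5.0, 1.0]])
psi_unary = np.array([[0.2, 0.5, 0.1],     # psi_t(x_t) for t = 1, 2, 3
                      [0.3, 0.0, 0.1],
                      [0.1, 0.7, 0.2]])

delta = np.zeros((T, K))                   # best weight of any path ending in each tag
backptr = np.zeros((T, K), dtype=int)
delta[0] = psi_unary[0]
for t in range(1, T):
    scores = delta[t - 1][:, None] * psi_pair       # scores[i, j]: come from tag i, go to tag j
    backptr[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) * psi_unary[t]

best = [int(delta[-1].argmax())]           # trace back the most probable assignment
for t in range(T - 1, 0, -1):
    best.append(int(backptr[t, best[-1]]))
print([tags[i] for i in reversed(best)])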
Forward-Backward Algorithm: Finds Marginals
[Figure: the trellis over "find preferred tags", with all paths through X2 = a highlighted.]
• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = a) = (1/Z) * total weight of all paths through a
Forward-Backward Algorithm: Finds Marginals
[Figure: the trellis, now with all paths through X2 = n highlighted.]
• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = n) = (1/Z) * total weight of all paths through n
Forward-Backward Algorithm: Finds Marginals
[Figure: the trellis, with all paths through X2 = v highlighted.]
• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = v) = (1/Z) * total weight of all paths through v
Forward-Backward Algorithm: Finds Marginals
[Figure: the trellis, with all paths through X2 = n highlighted again.]
• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = n) = (1/Z) * total weight of all paths through n
Forward-Backward Algorithm: Finds Marginals
[Figure: the trellis, with the path prefixes ending at X2 = n highlighted.]
α2(n) = total weight of these path prefixes (found by dynamic programming: matrix-vector products)
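As a sanity check on this picture, the sketch below (with assumed factor tables) compares the brute-force total weight of all paths through X2 = n against α2(n) * β2(n); dividing either by Z gives the marginal p(X2 = n).

import numpy as np
from itertools import product

K, T, n_idx = 3, 3, 1                       # tags indexed (v, n, a) = (0, 1, 2)
psi_pair = np.array([[1.0, 6.0, 3.0],
                     [4.0, 0.0, 1.0],
                     [2.0, 5.0, 1.0]])
psi_unary = np.array([[0.2, 0.5, 0.1],
                      [0.3, 0.0, 0.1],
                      [0.1, 0.7, 0.2]])

def path_weight(path):
    w = np.prod([psi_unary[t, x] for t, x in enumerate(path)])
    return w * np.prod([psi_pair[path[t], path[t + 1]] for t in range(T - 1)])

Z = sum(path_weight(p) for p in product(range(K), repeat=T))
through_n = sum(path_weight(p) for p in product(range(K), repeat=T) if p[1] == n_idx)

alpha2 = (psi_pair.T @ psi_unary[0]) * psi_unary[1]   # total weight of path prefixes
beta2 = psi_pair @ psi_unary[2]                       # total weight of path suffixes
print(through_n / Z, alpha2[n_idx] * beta2[n_idx] / Z)   # both equal p(X2 = n)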