Belief Propagation



  1. 10-418 / 10-618 Machine Learning for Structured Data Machine Learning Department School of Computer Science Carnegie Mellon University Belief Propagation Matt Gormley Lecture 9 Sep. 25, 2019

  2. Q&A Q: What if I already answered a homework question using different assumptions than what was clarified in a Piazza note? A: Just write down the assumptions you made. We will usually give credit so long as your assumptions are clear in the writeup and your answer is correct under those assumptions. (Obviously, this only applies to underspecified / ambiguous questions. You can’t just add arbitrary assumptions!)

  3. Reminders • Homework 1: DAgger for seq2seq – Out: Thu, Sep. 12 – Due: Thu, Sep. 26 at 11:59pm • Homework 2: Labeling Syntax Trees – Out: Thu, Sep. 26 – Due: Thu, Oct. 10 at 11:59pm

  4. Variable Elimination Complexity. Instead of brute force, capitalize on the factorization of p(x). [Figure: a factor graph over X1 … X5 with factors ψ12, ψ23, ψ13, ψ45, ψ234, ψ5.] In-Class Exercise: fill in the blanks. Brute-force (naïve) inference is O(____); variable elimination is O(____), where n = # of variables, k = max # of values a variable can take, and r = # of variables participating in the largest “intermediate” table.

  5. Exact Inference: Variable Elimination vs. Belief Propagation
     Variable Elimination
     • Uses
       – Computes the partition function of any factor graph
       – Computes the marginal probability of a query variable in any factor graph
     • Limitations
       – Only computes the marginal for one variable at a time (i.e. need to re-run variable elimination for each variable if you need them all)
       – Elimination order affects runtime
     Belief Propagation
     • Uses
       – Computes the partition function of any acyclic factor graph
       – Computes all marginal probabilities of variables and factors at once, for any acyclic factor graph
     • Limitations
       – Only exact on acyclic factor graphs (though we’ll consider its “loopy” variant later)
       – Message passing order affects runtime (but the obvious topological ordering always works best)

  6. MESSAGE PASSING

  7. Great Ideas in ML: Message Passing. Count the soldiers: each soldier in a line knows “there’s 1 of me”; counts of “1, 2, 3, 4, 5 before you” are passed forward along the line, and counts of “5, 4, 3, 2, 1 behind you” are passed back. (adapted from MacKay (2003) textbook)

  8. Great Ideas in ML: Message Passing. Count the soldiers: a soldier who hears “2 before you” from one side and “3 behind you” from the other, and knows “there’s 1 of me”, forms the belief: must be 2 + 1 + 3 = 6 of us. Each soldier only sees its incoming messages. (adapted from MacKay (2003) textbook)

  9. Great Ideas in ML: Message Passing. Count the soldiers: the next soldier hears “1 before you” and “4 behind you”, knows “there’s 1 of me”, and forms the belief: must be 1 + 1 + 4 = 6 of us, matching the previous soldier’s belief that there must be 2 + 1 + 3 = 6 of us. Again, each soldier only sees its incoming messages. (adapted from MacKay (2003) textbook)

  10. Great Ideas in ML: Message Passing. Now the soldiers form a tree: each soldier receives reports from all branches of the tree. One who hears “3 here” and “7 here” from two branches, and knows “1 of me”, passes “11 here (= 7 + 3 + 1)” along the remaining branch. (adapted from MacKay (2003) textbook)

  11. Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: one who hears “3 here” from each of two branches reports “7 here (= 3 + 3 + 1)” up the third. (adapted from MacKay (2003) textbook)

  12. Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree. [Figure: the same tree, with the messages “11 here (= 7 + 3 + 1)”, “7 here”, and “3 here” flowing along its edges.] (adapted from MacKay (2003) textbook)

  13. Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: one who hears “3 here”, “7 here”, and “3 here” from its branches combines them with itself into the belief: must be 14 of us (3 + 7 + 3 + 1). (adapted from MacKay (2003) textbook)

  14. Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree, and again the belief is: must be 14 of us. But this counting scheme wouldn’t work correctly with a “loopy” (cyclic) graph. (adapted from MacKay (2003) textbook)
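
The soldier-counting idea can be written directly as message passing on a tree. Below is a minimal sketch (my own illustration, not from the lecture; the tree and node names are made up): a node’s message to a neighbor is the count of everything on its own side of that edge, and a node’s belief is itself plus all incoming messages.

```python
# Minimal sketch of the soldier-counting example as message passing on a tree.
# (Illustrative only: the tree and node names below are made up.)

tree = {            # undirected tree as an adjacency list
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B", "E"],
    "E": ["D"],
}

def message(src, dst):
    """Count of soldiers on src's side of the (src, dst) edge:
    src itself plus the counts reported by src's other neighbors."""
    return 1 + sum(message(nbr, src) for nbr in tree[src] if nbr != dst)

def belief(node):
    """A node's belief = itself + the incoming message from every branch."""
    return 1 + sum(message(nbr, node) for nbr in tree[node])

# Every node arrives at the same total, just as every soldier counts 6 (or 14).
print({n: belief(n) for n in tree})   # all values are 5
```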

  15. Exact marginal inference for factor trees SUM-PRODUCT BELIEF PROPAGATION

  16. Message Passing in Belief Propagation. Variable X receives one message from a neighboring factor Ψ (“… but my other variables and I think you’re a verb”: v 6, n 1, a 3) and one from the rest of the graph (“my other factors think I’m a noun …”: v 1, n 6, a 3). Both of these messages judge the possible values of variable X. Their product = belief at X (v 6, n 6, a 9) = product of all 3 messages to X.

  17. Sum-Product Belief Propagation. Four quantities are defined, for variables and for factors: beliefs (variable beliefs and factor beliefs) and messages (variable-to-factor and factor-to-variable). [Figure: a variable X1 with neighboring factors ψ1, ψ2, ψ3, and a factor ψ1 with neighboring variables X1 and X3, annotated with the corresponding belief and message equations.]
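
The equations on that slide do not survive in this transcript. For reference, the standard sum-product updates (the notation below, with messages μ and neighborhoods N(·), is mine and may differ cosmetically from the slides) are:

```latex
% Variable-to-factor message: product of the variable's other incoming messages
\mu_{i \to \alpha}(x_i) \;=\; \prod_{\beta \in \mathcal{N}(i) \setminus \{\alpha\}} \mu_{\beta \to i}(x_i)

% Factor-to-variable message: multiply in the other variables' messages, then sum them out
\mu_{\alpha \to i}(x_i) \;=\; \sum_{\mathbf{x}_\alpha \,:\, \mathbf{x}_\alpha[i] = x_i} \psi_\alpha(\mathbf{x}_\alpha) \prod_{j \in \mathcal{N}(\alpha) \setminus \{i\}} \mu_{j \to \alpha}(\mathbf{x}_\alpha[j])

% Variable belief (unnormalized marginal)
b_i(x_i) \;\propto\; \prod_{\alpha \in \mathcal{N}(i)} \mu_{\alpha \to i}(x_i)

% Factor belief (unnormalized marginal)
b_\alpha(\mathbf{x}_\alpha) \;\propto\; \psi_\alpha(\mathbf{x}_\alpha) \prod_{i \in \mathcal{N}(\alpha)} \mu_{i \to \alpha}(\mathbf{x}_\alpha[i])
```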

  18. Sum-Product Belief Propagation: Variable Belief. The (unnormalized) belief at X1 is the pointwise product of the messages arriving from its neighboring factors ψ1, ψ2, ψ3: here the incoming messages (v 0.1, n 3, p 1), (v 1, n 2, p 2), and (v 4, n 1, p 0) multiply to give the belief (v 0.4, n 6, p 0).

  19. Sum-Product Belief Propagation: Variable Message. The message from X1 to one of its factors is the pointwise product of the messages from X1’s other factors: here (v 0.1, n 3, p 1) and (v 1, n 2, p 2) multiply to give the outgoing message (v 0.1, n 6, p 2).
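
These two pointwise products are easy to check; a tiny sketch (the numbers follow the running example as reconstructed above, so treat them as illustrative):

```python
import numpy as np

# Messages into X1 from its three neighboring factors, over the tags (v, n, p).
# (Values follow the running example as reconstructed above.)
msg_psi1 = np.array([0.1, 3.0, 1.0])
msg_psi2 = np.array([1.0, 2.0, 2.0])
msg_psi3 = np.array([4.0, 1.0, 0.0])

# Variable belief: pointwise product of all incoming messages.
belief_x1 = msg_psi1 * msg_psi2 * msg_psi3      # -> [0.4, 6.0, 0.0]

# Variable-to-factor message to psi3: product of the *other* incoming messages.
msg_x1_to_psi3 = msg_psi1 * msg_psi2            # -> [0.1, 6.0, 2.0]

print(belief_x1, msg_x1_to_psi3)
```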

  20. Sum-Product Belief Propagation: Factor Belief. The (unnormalized) belief at a factor is its potential table multiplied entrywise by the incoming messages from each of its variables. Here ψ1 connects X1 ∈ {v, n, p} and X3 ∈ {d, n}, so its belief is a table over (X1, X3) pairs. [Figure: ψ1’s potential table, the incoming message vectors from X1 and X3, and the resulting belief table with entries 24, 3.2, 6.4 and 0.]

  21. Sum-Product Belief Propagation: Factor Belief (continued). [Figure: the factor ψ1 with its variables X1 and X3 and the same belief table as on the previous slide.]

  22. Sum-Product Belief Propagation: Factor Message. The message from factor ψ1 to variable X1 multiplies ψ1’s potential table by the incoming message from X3 and then sums X3 out, leaving one entry per value of X1. [Figure: the three sums 24 + 0, 8 + 0.2, and 0.8 + 0.16, one per tag of X1.]

  23. Sum-Product Belief Propagation: Factor Message. For a binary factor such as ψ1 over (X1, X3), this computation is just a matrix-vector product: the factor’s potential table (a matrix) times the incoming message (a vector).
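
A sketch of that matrix-vector view (the table and message values here are hypothetical, chosen only to show the shape of the computation):

```python
import numpy as np

# Hypothetical potential table for a binary factor psi1(X1, X3),
# with X1 over tags (v, n, p) and X3 over tags (d, n).
psi1 = np.array([[8.0, 0.2],
                 [3.0, 1.0],
                 [0.1, 8.0]])

# Hypothetical incoming variable-to-factor message from X3.
msg_x3_to_psi1 = np.array([3.0, 1.0])

# Factor-to-variable message to X1: multiply the table by the incoming
# message and sum X3 out -- exactly a matrix-vector product.
msg_psi1_to_x1 = psi1 @ msg_x3_to_psi1          # one entry per tag of X1
print(msg_psi1_to_x1)
```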

  24. Sum-Product Belief Propagation. Input: a factor graph with no cycles. Output: exact marginals for each variable and factor. Algorithm:
     1. Initialize the messages to the uniform distribution.
     2. Choose a root node.
     3. Send messages from the leaves to the root.
     4. Send messages from the root to the leaves.
     5. Compute the beliefs (unnormalized marginals).
     6. Normalize beliefs and return the exact marginals.
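
A minimal runnable sketch of this procedure (my own illustration, not the course’s reference code): factors here are unary or pairwise, and instead of an explicit leaves-to-root / root-to-leaves schedule, each message is computed recursively on demand, which is equivalent on a tree.

```python
import numpy as np

# Sum-product BP on a small acyclic factor graph: X1 -- psi_a -- X2 -- psi_b -- X3,
# plus a unary factor psi_c on X2. Factors here are unary or pairwise.
np.random.seed(0)
card = {"X1": 2, "X2": 3, "X3": 2}                      # variable cardinalities
factors = {
    "psi_a": (["X1", "X2"], np.random.rand(2, 3)),      # table indexed by (X1, X2)
    "psi_b": (["X2", "X3"], np.random.rand(3, 2)),      # table indexed by (X2, X3)
    "psi_c": (["X2"], np.random.rand(3)),               # unary table on X2
}
var_nbrs = {v: [f for f, (vs, _) in factors.items() if v in vs] for v in card}

def msg_var_to_fac(v, f):
    """Variable-to-factor message: product of v's other incoming messages."""
    m = np.ones(card[v])
    for g in var_nbrs[v]:
        if g != f:
            m = m * msg_fac_to_var(g, v)
    return m

def msg_fac_to_var(f, v):
    """Factor-to-variable message: multiply the factor table by the other
    variable's incoming message, then sum that variable out."""
    vs, table = factors[f]
    if len(vs) == 1:                                    # unary factor
        return table
    other = vs[1] if vs[0] == v else vs[0]
    m_in = msg_var_to_fac(other, f)
    return table @ m_in if vs[0] == v else table.T @ m_in

def marginal(v):
    """Belief at v = product of all incoming messages, then normalize."""
    b = np.ones(card[v])
    for f in var_nbrs[v]:
        b = b * msg_fac_to_var(f, v)
    return b / b.sum()

print({v: marginal(v) for v in card})                   # exact marginals
```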

  25. Sum-Product Belief Propagation. [Figure: the four-panel summary again: variable and factor beliefs on top, variable-to-factor and factor-to-variable messages below.]

  26. Sum-Product Belief Propagation. [Figure: the same four-panel summary of beliefs and messages, revisited.]

  27. FORWARD BACKWARD AS SUM-PRODUCT BP

  28. CRF Tagging Model. Variables X1, X2, X3 tag the sentence “find preferred tags”: “find” could be verb or noun, “preferred” could be adjective or verb, “tags” could be noun or verb.

  29. CRF Tagging by Belief Propagation. Forward algorithm = message passing from left to right (the α messages, computed by matrix-vector products). Backward algorithm = message passing from right to left (the β messages, also matrix-vector products). Multiplying the messages arriving at a position gives its belief. [Figure: α, β, and belief vectors over the tags {v, n, a} at each position of “find preferred tags”.] • Forward-backward is a message passing algorithm. • It’s the simplest case of belief propagation.
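
A small sketch of forward-backward as matrix-vector message passing on a chain (my own illustration with made-up potentials, not the lecture’s numbers):

```python
import numpy as np

# Chain CRF over tags (v, n, a) for a 3-word sentence (made-up potentials).
np.random.seed(0)
K, T = 3, 3
unary = np.random.rand(T, K)          # psi_t(x_t), one row per position
pairwise = np.random.rand(K, K)       # psi(x_t, x_{t+1}), shared across positions

# Forward messages (alpha): left-to-right matrix-vector products.
alpha = np.zeros((T, K))
alpha[0] = unary[0]
for t in range(1, T):
    alpha[t] = unary[t] * (pairwise.T @ alpha[t - 1])

# Backward messages (beta): right-to-left matrix-vector products.
beta = np.ones((T, K))
for t in range(T - 2, -1, -1):
    beta[t] = pairwise @ (unary[t + 1] * beta[t + 1])

# Marginal belief at each position: alpha * beta, normalized.
beliefs = alpha * beta
beliefs /= beliefs.sum(axis=1, keepdims=True)
print(beliefs)
```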

  30. So Let’s Review Forward-Backward … Variables X1, X2, X3 over the sentence “find preferred tags”: “find” could be verb or noun, “preferred” could be adjective or verb, “tags” could be noun or verb.

  31. So Let’s Review Forward-Backward … [Figure: the tagging lattice for “find preferred tags”: columns X1, X2, X3, each with values v, n, a, plus START and END states.] • Show the possible values for each variable

  32. So Let’s Review Forward-Backward … [Figure: the same lattice with one assignment picked out as a path.] • Let’s show the possible values for each variable • One possible assignment

  33. So Let’s Review Forward-Backward … [Figure: the same lattice and assignment.] • Let’s show the possible values for each variable • One possible assignment • And what the 7 factors think of it …

  34. Viterbi Algorithm: Most Probable Assignment. [Figure: the lattice for “find preferred tags” with the path START → v → a → n → END and its factors ψ{0,1}(START, v), ψ{1}(v), ψ{1,2}(v, a), ψ{2}(a), ψ{2,3}(a, n), ψ{3}(n), ψ{3,4}(n, END).] • So p(v a n) = (1/Z) * product of 7 numbers • Numbers associated with edges and nodes of path • Most probable assignment = path with highest product
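
Written out with the factor names from the slide (unary factors ψ{t} for each word and transition factors ψ{t,t+1} for adjacent positions):

```latex
p(\texttt{v a n}) \;=\; \frac{1}{Z}\;
  \psi_{\{0,1\}}(\textsc{start}, \texttt{v})\,
  \psi_{\{1\}}(\texttt{v})\,
  \psi_{\{1,2\}}(\texttt{v}, \texttt{a})\,
  \psi_{\{2\}}(\texttt{a})\,
  \psi_{\{2,3\}}(\texttt{a}, \texttt{n})\,
  \psi_{\{3\}}(\texttt{n})\,
  \psi_{\{3,4\}}(\texttt{n}, \textsc{end})
```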

  35. Viterbi Algorithm: Most Probable Assignment. [Figure: the same lattice, path, and factors as the previous slide.] • So p(v a n) = (1/Z) * product weight of one path

  36. Forward-Backward Algorithm: Finds Marginals. [Figure: the lattice for “find preferred tags”, highlighting all paths through a at position 2.] • So p(v a n) = (1/Z) * product weight of one path • Marginal probability p(X2 = a) = (1/Z) * total weight of all paths through a

  37. Forward-Backward Algorithm: Finds Marginals. [Figure: the same lattice, now highlighting all paths through n at position 2.] • So p(v a n) = (1/Z) * product weight of one path • Marginal probability p(X2 = n) = (1/Z) * total weight of all paths through n

  38. Forward-Backward Algorithm: Finds Marginals. [Figure: the same lattice, now highlighting all paths through v at position 2.] • So p(v a n) = (1/Z) * product weight of one path • Marginal probability p(X2 = v) = (1/Z) * total weight of all paths through v

  39. Forward-Backward Algorithm: Finds Marginals. [Figure: the same lattice, again highlighting the paths through n at position 2.] • So p(v a n) = (1/Z) * product weight of one path • Marginal probability p(X2 = n) = (1/Z) * total weight of all paths through n

  40. Forward-Backward Algorithm: Finds Marginals. [Figure: the lattice with the path prefixes ending in n at position 2 highlighted.] α2(n) = total weight of these path prefixes (found by dynamic programming: matrix-vector products)
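
Pairing these prefix weights with the matching suffix weights gives the marginal from two slides back. Under the standard (but convention-dependent) bookkeeping where α2 accounts for every factor up to and including position 2 and β2 for everything after it:

```latex
p(X_2 = \texttt{n}) \;=\; \frac{1}{Z}\,\alpha_2(\texttt{n})\,\beta_2(\texttt{n}),
\qquad \beta_2(\texttt{n}) = \text{total weight of path suffixes leaving } \texttt{n} \text{ at position 2}.
```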
