  1. Inference: Graph Search (CS 6355: Structured Prediction)

  2. So far in the class
     • Thinking about structures
       – A graph, a collection of parts that are labeled jointly, a collection of decisions
     • Algorithms for learning
       – Local learning
         • Learn parameters for individual components independently
         • Learning algorithm not aware of the full structure
       – Global learning
         • Learn parameters for the full structure
         • Learning algorithm “knows” about the full structure
     • Next: Prediction
       – This is what sets structured prediction apart from binary/multiclass classification

  3. Inference
     • What is inference?
       – An overview of what we have seen before
       – Combinatorial optimization
       – Different views of inference
     • Graph algorithms
       – Dynamic programming, greedy algorithms, search
     • Integer programming
     • Heuristics for inference
       – Sampling
     • Learning to search

  4. Variable elimination: Max-product
     We have a collection of inference variables that need to be assigned: z = (z_1, z_2, …)
     General algorithm:
       – First fix an ordering of the variables, say (z_1, z_2, …)
       – Iteratively: find the best value for z_i given the values of the previously eliminated neighbors
       – Use back pointers to recover the final answer
     Viterbi is an instance of max-product variable elimination
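
To make the general algorithm concrete, here is a minimal Python sketch of max-product variable elimination on a chain, i.e. Viterbi. The emissions/transitions score tables match the decomposition used on the example slides below; the function name and array layout are illustrative assumptions, not from the slides.

    import numpy as np

    def viterbi(emissions, transitions):
        """Max-product variable elimination on a chain.

        emissions:   (n, K) array, emissions[i, z] = score of label z at step i
        transitions: (K, K) array, transitions[a, b] = score of moving a -> b
        Returns the highest-scoring label sequence and its score.
        """
        n, K = emissions.shape
        score = emissions[0].copy()          # score_1(z_1) = emissions(z_1)
        back = np.zeros((n, K), dtype=int)   # back pointers

        for i in range(1, n):
            # Eliminate z_i: for each value of z_{i+1}, keep the best predecessor.
            # cand[a, b] = score_i(a) + transitions(a, b) + emissions at step i for b
            cand = score[:, None] + transitions + emissions[i][None, :]
            back[i] = np.argmax(cand, axis=0)
            score = np.max(cand, axis=0)     # score_{i+1}(z_{i+1})

        # Decode: pick the best final label, then follow back pointers.
        z = [int(np.argmax(score))]
        for i in range(n - 1, 0, -1):
            z.append(int(back[i][z[-1]]))
        return z[::-1], float(np.max(score))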

  5. Variable elimination example
     [Figure: a trellis over z_1, z_2, z_3, …, z_n, where each variable takes one of the labels A, B, C, D; nodes carry emission scores and edges carry transition scores.]
     The total score decomposes into local terms:
       score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
     First eliminate z_1, with score_1(z_1) = emissions(z_1):
       score_2(z_2) = max over z_1 of [ score_1(z_1) + score-local(z_1, z_2) ]

  6. Variable elimination example
     [Figure: the trellis after z_1 has been eliminated; the column for z_2 now carries score_2(z_2).]
       score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
     Next eliminate z_2:
       score_3(z_3) = max over z_2 of [ score_2(z_2) + score-local(z_2, z_3) ]

  7. Variable elimination example
     [Figure: the trellis after z_2 has been eliminated; the column for z_3 now carries score_3(z_3).]
       score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
     Next eliminate z_3:
       score_4(z_4) = max over z_3 of [ score_3(z_3) + score-local(z_3, z_4) ]

  8. Variable elimination example
     [Figure: only z_n remains, carrying score_n(z_n).]
     After n such steps, we have all the information needed to make a decision for z_n: pick the value maximizing score_n(z_n), then follow the back pointers to recover the full sequence.
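
As a quick sanity check, the viterbi() sketch from above can be run on a toy problem with the four labels from the figure; the score tables here are random numbers, purely for illustration.

    import numpy as np

    labels = ["A", "B", "C", "D"]
    rng = np.random.default_rng(0)
    emissions = rng.normal(size=(5, 4))    # 5 time steps, 4 labels (toy scores)
    transitions = rng.normal(size=(4, 4))

    z, best = viterbi(emissions, transitions)  # viterbi() as sketched earlier
    print([labels[i] for i in z], best)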

  9. Variable elimination: Max-product
     We have a collection of inference variables that need to be assigned: z = (z_1, z_2, …)
     General algorithm:
       – First fix an ordering of the variables, say (z_1, z_2, …)
       – Iteratively: find the best value for z_i given the values of the previously eliminated neighbors
       – Use back pointers to recover the final answer
     Viterbi is an instance of max-product variable elimination
     Challenge: what makes a good ordering?

  10. Max-product algorithm
     • Where is the “product” in max-product? The objective is a product of local factors:
         max over z of φ(x, z) = max over z of ∏_i score-local(z_i, z_{i+1})
       Taking logs turns this product into the sum of local scores used on the previous slides.
     • Generalizes beyond sequence models
       – Requires a clever ordering of the output variables
       – Exact inference when the output is a tree; if not, no guarantees
     • Also works for summing over all structures
       – Sum-product message passing
       – Belief propagation
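
The sum-product variant mentioned above replaces max with sum, which computes the total score of all structures (the partition function) rather than the best one. A minimal sketch mirroring the viterbi() code earlier; treating the tables as log-space factors and using log-sum-exp for stability is an implementation choice, not from the slides.

    import numpy as np
    from scipy.special import logsumexp

    def forward_logZ(emissions, transitions):
        """Sum-product variable elimination on a chain (the forward algorithm).

        Same score tables as viterbi(), interpreted as log-space factors.
        Returns the log of the sum, over all label sequences, of exp(total score).
        """
        n, K = emissions.shape
        score = emissions[0].copy()
        for i in range(1, n):
            # Replace Viterbi's max over predecessors with a (log-space) sum.
            cand = score[:, None] + transitions + emissions[i][None, :]
            score = logsumexp(cand, axis=0)
        return float(logsumexp(score))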

  11. Dynamic programming
     • General solution strategy for inference
     • Examples
       – Viterbi, the CKY algorithm, Dijkstra’s algorithm, and many more
     • Key ideas
       – Memoization: don’t re-compute something you already have
       – Requires an ordering of the variables
     • Remember
       – The hypergraph may not allow for the best ordering of the variables
       – The existence of a dynamic programming algorithm does not mean polynomial time/space: the state space may be too big. Use heuristics such as beam search (see the sketch below).
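
Beam search keeps only the top-k partial structures at each step instead of doing exact dynamic programming over the full state space. A minimal sketch under the same chain setup as before; the beam_size default and table shapes are illustrative assumptions.

    import numpy as np

    def beam_search(emissions, transitions, beam_size=4):
        """Approximate max inference on a chain: keep only the best `beam_size`
        partial label sequences at every step."""
        n, K = emissions.shape
        # Each beam entry: (score of partial sequence, labels chosen so far)
        beam = sorted(((float(emissions[0][z]), [z]) for z in range(K)),
                      key=lambda t: t[0], reverse=True)[:beam_size]
        for i in range(1, n):
            expanded = [
                (s + float(transitions[seq[-1], z] + emissions[i][z]), seq + [z])
                for s, seq in beam
                for z in range(K)
            ]
            beam = sorted(expanded, key=lambda t: t[0], reverse=True)[:beam_size]
        return beam[0]  # (approximate best score, label sequence)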

  12. Graph algorithms for inference
     • Many graph algorithms you have seen are applicable for inference
     • Some examples
       – “Best” path, e.g. Viterbi, parsing
       – Min-cut/max-flow, e.g. image segmentation
       – Maximum spanning tree, e.g. dependency parsing
       – Bipartite matching, e.g. aligning sequences
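
As one concrete case from this list, maximum bipartite matching for alignment can be solved off the shelf; a minimal sketch using scipy's Hungarian-algorithm solver, with a made-up score matrix standing in for a learned model.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # scores[i, j] = score for aligning source item i to target item j (toy numbers)
    scores = np.array([[3.0, 1.0, 0.5],
                       [0.2, 2.5, 0.1],
                       [0.3, 0.4, 4.0]])

    # linear_sum_assignment minimizes cost, so negate to get the max-score matching.
    rows, cols = linear_sum_assignment(-scores)
    print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (1, 1), (2, 2)]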

  13. Best path for inference
     • Broad description of the approach:
       – Construct a graph/hypergraph from the input and output
       – Decompose the total score along edges/hyperedges
       – Inference is finding the shortest/longest path in this weighted graph
     • The Viterbi algorithm finds a shortest path in a specific graph!

  14. Viterbi algorithm as best path
     Goal: find the highest scoring path in this trellis.
     [Figure: a trellis whose columns are the time steps and whose rows are the different labels for each step.]
     • No cycles
     • Nodes and edges have a specific meaning
     • The ordering by time step helps
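
Since the trellis is a DAG ordered by time step, "Viterbi as best path" can be made literal: build the trellis as an explicit weighted graph and run a generic longest-path relaxation in topological order. A minimal sketch; the 'src'/'sink' node names and edge layout are illustrative assumptions.

    import numpy as np

    def trellis_edges(emissions, transitions):
        """Build the Viterbi trellis as an explicit weighted DAG.
        Nodes are ('src',), (step, label), and ('sink',); the edge list is
        emitted column by column, so it is already in topological order."""
        n, K = emissions.shape
        edges = [(('src',), (0, z), float(emissions[0, z])) for z in range(K)]
        for i in range(1, n):
            for a in range(K):
                for b in range(K):
                    edges.append(((i - 1, a), (i, b),
                                  float(transitions[a, b] + emissions[i, b])))
        edges += [((n - 1, z), ('sink',), 0.0) for z in range(K)]
        return edges

    def longest_path(edges, source=('src',), sink=('sink',)):
        """Generic DAG longest path: relax edges in topological order."""
        best, pred = {source: 0.0}, {}
        for u, v, w in edges:
            if u in best and best[u] + w > best.get(v, float('-inf')):
                best[v], pred[v] = best[u] + w, u
        node, path = sink, []
        while node != source:
            path.append(node)
            node = pred[node]
        return path[::-1], best[sink]  # path includes the sink node

The label sequence is read off the (step, label) nodes on the returned path; this recovers exactly what viterbi() computed earlier.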

  15. Best path algorithms
     • Dijkstra’s algorithm
       – Cost functions should be non-negative
     • Bellman-Ford algorithm
       – Slower than Dijkstra’s algorithm, but works with negative weights
     • A* search
       – Usable if you have a heuristic that estimates the future path cost from a state without over-estimating it
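
A minimal A* sketch over an abstract search space; the neighbors(state) callable yielding (next_state, edge_cost) pairs and the admissible heuristic(state) are assumed inputs, not from the slides.

    import heapq, itertools

    def astar(start, is_goal, neighbors, heuristic):
        """A* search: expand states in order of g(s) + h(s).
        Finds a cheapest path provided heuristic() never over-estimates
        the true remaining cost (admissibility)."""
        counter = itertools.count()   # tie-breaker so the heap never compares states
        frontier = [(heuristic(start), 0.0, next(counter), start, [start])]
        best_g = {}
        while frontier:
            f, g, _, state, path = heapq.heappop(frontier)
            if is_goal(state):
                return path, g
            if state in best_g and best_g[state] <= g:
                continue              # already reached this state more cheaply
            best_g[state] = g
            for nxt, cost in neighbors(state):
                g2 = g + cost
                heapq.heappush(frontier,
                               (g2 + heuristic(nxt), g2, next(counter), nxt, path + [nxt]))
        return None, float('inf')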

  16. Inference as search: Setting
     • Predicting a graph as a sequence of decisions
     • Data structures:
       – State: encodes a partial structure
       – Transitions: move from one partial structure to another
       – Start state
       – End state: we have a full structure
         • There may be more than one end state
     • Each transition is scored with the learned model
     • Goal: find an end state that has the highest total score

  17. Example
     Suppose each y can be one of A, B, or C.
     [Figure: outputs y_1, y_2, y_3 connected to inputs x_1, x_2, x_3.]
     • State: triples (y_1, y_2, y_3), each entry possibly unknown
       – (A, -, -), (-, A, A), (-, -, -), …
     • Transition: fill in one of the unknowns
       – Fill in a label in a slot; the edge is scored by the factors that can be computed so far
     • Start state: (-, -, -), i.e. no assignments
     • End state: all three y’s are assigned
     Search grows partial structures step by step: (-,-,-) expands to (A,-,-), (B,-,-), (C,-,-); these expand to (A,A,-), …, (C,C,-); and so on, until every slot is filled (a sketch follows below).
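
A minimal best-first search over these partial assignments; the pairwise factors and their scores are made-up assumptions for illustration, not the course's model. Unknown slots are represented as None rather than "-".

    import heapq, itertools

    LABELS = ("A", "B", "C")

    def score_delta(state, slot, label, factors):
        """Score of the factors that become computable once `slot` is filled.
        `factors` maps a pair of slot indices to a scoring function of two labels."""
        new = list(state)
        new[slot] = label
        total = 0.0
        for (i, j), f in factors.items():
            if slot in (i, j) and new[i] is not None and new[j] is not None:
                total += f(new[i], new[j])
        return total

    def best_first(factors, n=3):
        counter = itertools.count()   # tie-breaker for the heap
        frontier = [(-0.0, next(counter), (None,) * n, 0.0)]
        while frontier:
            _, _, state, g = heapq.heappop(frontier)
            if all(v is not None for v in state):
                # First complete state popped. Without an optimistic estimate of
                # the remaining score this is not guaranteed optimal (cf. A* above).
                return state, g
            slot = state.index(None)  # fill slots left to right
            for lab in LABELS:
                g2 = g + score_delta(state, slot, lab, factors)
                nxt = state[:slot] + (lab,) + state[slot + 1:]
                heapq.heappush(frontier, (-g2, next(counter), nxt, g2))

    # Toy factors on (y_1, y_2) and (y_2, y_3): reward matching neighbors.
    factors = {(0, 1): lambda a, b: 1.0 if a == b else 0.0,
               (1, 2): lambda a, b: 2.0 if a == b else 0.0}
    print(best_first(factors))  # e.g. (('A', 'A', 'A'), 3.0)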
