MAP Inference with MILP (Matt Gormley, Lecture 12, Oct. 7, 2019)

  1. 10-418 / 10-618 Machine Learning for Structured Data, Machine Learning Department, School of Computer Science, Carnegie Mellon University. MAP Inference with MILP. Matt Gormley. Lecture 12, Oct. 7, 2019

  2. Reminders • Homework 2: BP for Syntax Trees – Out: Sat, Sep. 28 – Due: Sat, Oct. 12 at 11:59pm • Last chance to switch between 10-418 / 10-618 is October 7th (drop deadline) • Today’s after-class office hours are un-cancelled (i.e. I am having them)

  3. MBR DECODING

  4. Minimum Bayes Risk Decoding • Suppose we are given a loss function $\ell(\hat{y}, y)$ and are asked for a single tagging. • How should we choose just one from our probability distribution $p(y \mid x)$? • A minimum Bayes risk (MBR) decoder $h(x)$ returns the variable assignment with minimum expected loss under the model’s distribution: $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)] = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)\, \ell(\hat{y}, y)$

  5. Minimum Bayes Risk Decoding • $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$ • Consider some example loss functions. The Hamming loss corresponds to accuracy and returns the number of incorrect variable assignments: $\ell(\hat{y}, y) = \sum_{i=1}^{V} \left(1 - \mathbb{I}(\hat{y}_i = y_i)\right)$ • The MBR decoder is $\hat{y}_i = h_\theta(x)_i = \operatorname{argmax}_{\hat{y}_i} p_\theta(\hat{y}_i \mid x)$ • This decomposes across variables and requires the variable marginals.
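
The Hamming-loss decoder above only needs per-variable marginals. Below is a minimal sketch (not from the lecture) assuming those marginals have already been computed, e.g. by sum-product belief propagation; the variable and tag counts are made-up illustration values.

```python
import numpy as np

def mbr_decode_hamming(marginals):
    """MBR decoding under Hamming loss: independently pick the argmax of
    each variable's marginal. `marginals` is a list of 1-D arrays, one per
    variable, each summing to 1."""
    return [int(np.argmax(m)) for m in marginals]

# Toy example: 3 tag variables over 2 tags each (hypothetical numbers).
marginals = [np.array([0.7, 0.3]),
             np.array([0.4, 0.6]),
             np.array([0.55, 0.45])]
print(mbr_decode_hamming(marginals))  # -> [0, 1, 0]
```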

  6. Minimum Bayes Risk Decoding • $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$ • Consider some example loss functions. The 0-1 loss function is 0 if the two assignments are identical and 1 otherwise: $\ell(\hat{y}, y) = 1 - \mathbb{I}(\hat{y} = y)$ • The MBR decoder is $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)\left(1 - \mathbb{I}(\hat{y} = y)\right) = \operatorname{argmax}_{\hat{y}} p_\theta(\hat{y} \mid x)$, which is exactly the MAP inference problem!
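
A small sanity check (not from the lecture, with a made-up joint distribution): under 0-1 loss the MBR decoder returns the single most probable joint assignment (MAP), which can differ from the Hamming-loss decoder that maximizes each marginal separately.

```python
import numpy as np

# Hypothetical joint p(y1, y2 | x) over two binary variables (made-up numbers).
joint = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}

# MBR under 0-1 loss = MAP: the single most probable joint assignment.
print("0-1 loss MBR (MAP):", max(joint, key=joint.get))  # -> (0, 0)

# MBR under Hamming loss: argmax of each variable's marginal.
marg_y1 = [sum(p for (a, _), p in joint.items() if a == v) for v in (0, 1)]
marg_y2 = [sum(p for (_, b), p in joint.items() if b == v) for v in (0, 1)]
print("Hamming loss MBR:  ",
      (int(np.argmax(marg_y1)), int(np.argmax(marg_y2))))  # -> (1, 0)
```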

  7. LINEAR PROGRAMMING & INTEGER LINEAR PROGRAMMING

  8. Linear Programming (Whiteboard) – Example of a Linear Program in 2D – LP Standard Form – Converting an LP to Standard Form – LP and its Polytope – Simplex Algorithm (tableau method) – Interior-Point Algorithm(s)
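
As a concrete 2-D LP illustration (not the whiteboard example; the coefficients are made up), here is a sketch using SciPy's linprog with the HiGHS solver. Note that linprog minimizes, so the maximization objective is negated.

```python
from scipy.optimize import linprog

# A small 2-D linear program (made-up coefficients):
#   maximize    3*x1 + 2*x2
#   subject to  x1 +   x2 <= 4
#               x1 + 3*x2 <= 6
#               x1, x2 >= 0
res = linprog(c=[-3, -2],                     # negate: linprog minimizes
              A_ub=[[1, 1], [1, 3]],
              b_ub=[4, 6],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)  # optimum at a vertex of the polytope: x = (4, 0), value 12
```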

  9. Integer Linear Programming (Whiteboard) – Example of an ILP in 2D – Example of an MILP in 2D
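
The sketch below (again with made-up coefficients, not the whiteboard example) contrasts an LP relaxation, an ILP, and an MILP on the same 2-D feasible region. It assumes SciPy >= 1.9, where linprog's integrality argument routes integer variables to the HiGHS MILP solver.

```python
import numpy as np
from scipy.optimize import linprog

#   maximize    x1 + x2
#   subject to  2*x1 + 2*x2 <= 5,   0 <= x1, x2 <= 2
c = [-1, -1]                          # negate: linprog minimizes
A_ub, b_ub = [[2, 2]], [5]
bounds = [(0, 2), (0, 2)]

lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
ilp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs",
              integrality=np.array([1, 1]))   # both variables integer
milp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs",
               integrality=np.array([1, 0]))  # only x1 integer

print("LP relaxation:", lp.x, -lp.fun)      # fractional optimum, value 2.5
print("ILP:          ", ilp.x, -ilp.fun)    # integral optimum, value 2.0
print("MILP:         ", milp.x, -milp.fun)  # x1 integer, x2 continuous, value 2.5
```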

  10. Background: Nonconvex Global Optimization. Goal: optimize over the blue surface.

  11. Background: Nonconvex Global Optimization. Goal: optimize over the blue surface.

  12. Background: Nonconvex Global Optimization. Relaxation: provides an upper bound on the surface.

  13. Background: Nonconvex Global Optimization. Branching: partitions the search space into subspaces, and enables tighter relaxations. (X1 ≤ 0.0 vs. X1 ≥ 0.0)

  14. Background: Nonconvex Global Optimization. Branching: partitions the search space into subspaces, and enables tighter relaxations. (X1 ≤ 0.0 vs. X1 ≥ 0.0)

  15. Background: Nonconvex Global Optimization. Branching: partitions the search space into subspaces, and enables tighter relaxations. (X1 ≤ 0.0 vs. X1 ≥ 0.0)

  16. Background: Nonconvex Global Optimization. The max of all relaxed solutions for each of the partitions is a global upper bound.

  17. Background: Nonconvex Global Optimization. We can project a relaxed solution onto the feasible region.

  18. Background: Nonconvex Global Optimization. The incumbent is ε-optimal if the relative difference between the global upper bound and the incumbent score is less than ε.
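
A minimal sketch of this stopping test (not from the lecture), assuming a maximization problem; the exact form of the relative gap, e.g. the choice of denominator, varies across solvers.

```python
def is_eps_optimal(global_upper_bound, incumbent_score, eps=1e-4):
    """True if the relative gap between the global upper bound and the
    incumbent's score is below eps (maximization setting)."""
    gap = (global_upper_bound - incumbent_score) / max(abs(incumbent_score), 1e-12)
    return gap < eps
```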

  19. How much should we subdivide?

  20. How much should we subdivide? BRANCH-AND-BOUND • Method for recursively subdividing the search space • Subspace order can be determined heuristically (e.g. best-first search with depth-first plunging) • Prunes subspaces that can’t yield better solutions

  21. Background: Nonconvex Global Optimization. If the subspace upper bound is worse than the current incumbent, we can prune that subspace.

  22. Background: Nonconvex Global Optimization. If the subspace upper bound is worse than the current incumbent, we can prune that subspace.

  23. Limitations: Branch-and-Bound for the Viterbi Objective • The Viterbi objective – Nonconvex – NP-hard to solve (Cohen & Smith, 2010) • Branch-and-bound – Kind of tricky to get it right… – The curse of dimensionality kicks in quickly • Nonconvex quadratic optimization by LP-based branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009) • Our smallest (toy) problems have hundreds of variables • Preview of experiments – We solve 5 sentences, but on 200 sentences we couldn’t run to completion – Our (hybrid) global search framework incorporates local search – This hybrid approach sometimes finds higher likelihood (and higher accuracy) solutions than pure local search

  24. BRANCH-AND-BOUND INGREDIENTS: Mathematical Program, Relaxation, Projection, (Branch-and-Bound Search Heuristics)

  25. Background: Nonconvex Global Optimization. We solve the relaxation using the Simplex algorithm.

  26. Background: Nonconvex Global Optimization. We can project a relaxed solution onto the feasible region.

  27. Integer Linear Programming (Whiteboard) – Branch and bound for an ILP in 2D

  28. Branch and Bound: Algorithm 2.1 (Branch-and-bound)
      Input: Minimization problem instance R.
      Output: Optimal solution x* with value c*, or conclusion that R has no solution, indicated by c* = ∞.
      1. Initialize L := {R}, ĉ := ∞. [init]
      2. If L = ∅, stop and return x* = x̂ and c* = ĉ. [abort]
      3. Choose Q ∈ L, and set L := L \ {Q}. [select]
      4. Solve a relaxation Q_relax of Q. If Q_relax is empty, set č := ∞. Otherwise, let x̌ be an optimal solution of Q_relax and č its objective value. [solve]
      5. If č ≥ ĉ, goto Step 2. [bound]
      6. If x̌ is feasible for R, set x̂ := x̌, ĉ := č, and goto Step 2. [check]
      7. Split Q into subproblems Q = Q1 ∪ … ∪ Qk, set L := L ∪ {Q1, …, Qk}, and goto Step 2. [branch]
      Slide from Achterberg (thesis, 2007)
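
Below is a minimal Python sketch (not the lecture's code) of Algorithm 2.1, specialized to a pure ILP: each subproblem is a set of variable bounds, the relaxation is the LP obtained by dropping integrality (solved with SciPy's linprog/HiGHS), and branching splits on a fractional variable as in Figure 2.2 on the next slide. The depth-first selection rule and first-fractional branching rule are simplifying choices, not the search heuristics discussed in the lecture.

```python
import math
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Sketch of Algorithm 2.1 for a pure ILP: minimize c @ x subject to
    A_ub @ x <= b_ub, integer bounds on x, and all x integral."""
    L = [list(bounds)]                       # 1. init: L := {R}
    best_x, best_c = None, math.inf          #          c_hat := infinity
    while L:                                 # 2. abort when L is empty
        q_bounds = L.pop()                   # 3. select (here: depth-first)
        relax = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=q_bounds,
                        method="highs")      # 4. solve the LP relaxation
        if not relax.success:
            continue                         #    empty relaxation -> discard
        x_check, c_check = relax.x, relax.fun
        if c_check >= best_c:                # 5. bound: cannot beat incumbent
            continue
        frac = [i for i, v in enumerate(x_check)
                if abs(v - round(v)) > 1e-6]
        if not frac:                         # 6. check: relaxed solution is
            best_x, best_c = x_check, c_check    # integral, so feasible for R
            continue
        i = frac[0]                          # 7. branch on a fractional variable:
        lo, hi = q_bounds[i]                 #    Q1: x_i <= floor(v), Q2: x_i >= ceil(v)
        v = x_check[i]
        left, right = list(q_bounds), list(q_bounds)
        left[i] = (lo, math.floor(v))
        right[i] = (math.ceil(v), hi)
        L.extend([left, right])
    return best_x, best_c

# The toy ILP from earlier (made-up coefficients): maximize x1 + x2
# subject to 2*x1 + 2*x2 <= 5 and 0 <= x1, x2 <= 2 (negate c to maximize).
x, val = branch_and_bound(c=[-1, -1], A_ub=[[2, 2]], b_ub=[5],
                          bounds=[(0, 2), (0, 2)])
print(x, -val)   # an integral optimum with objective value 2
```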

  29. Branch and Bound. [Figure: the branch-and-bound search tree, with root node R, solved and unsolved subproblems, pruned subproblems, the current subproblem Q, new subproblems Q1 … Qk, and the current feasible solution.] Slide from Achterberg (thesis, 2007)

  30. Branch and Bound. [Figure 2.2: LP-based branching on a single fractional variable, splitting Q into Q1 and Q2 around the fractional relaxed solution x̌.] Slide from Achterberg (thesis, 2007)
