MAP Inference with MILP
10-418 / 10-618 Machine Learning for Structured Data
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Matt Gormley — Lecture 12, Oct. 7, 2019
Reminders
• Homework 2: BP for Syntax Trees
  – Out: Sat, Sep. 28
  – Due: Sat, Oct. 12 at 11:59pm
• Last chance to switch between 10-418 / 10-618 is October 7th (drop deadline)
• Today's after-class office hours are un-cancelled (i.e. I am having them)
MBR DECODING
Minimum Bayes Risk Decoding
• Suppose we are given a loss function $\ell(\hat{y}, y)$ and are asked for a single tagging
• How should we choose just one from our probability distribution $p(y \mid x)$?
• A minimum Bayes risk (MBR) decoder $h(x)$ returns the variable assignment with minimum expected loss under the model's distribution:

$h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)] = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)\, \ell(\hat{y}, y)$
Minimum Bayes Risk Decoding

$h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$

Consider some example loss functions:

The Hamming loss corresponds to accuracy and returns the number of incorrect variable assignments:

$\ell(\hat{y}, y) = \sum_{i=1}^{V} \left(1 - \mathbb{I}(\hat{y}_i, y_i)\right)$

The MBR decoder is:

$\hat{y}_i = h_\theta(x)_i = \operatorname{argmax}_{\hat{y}_i} p_\theta(\hat{y}_i \mid x)$

This decomposes across variables and requires the variable marginals.
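To make the Hamming-loss decoder concrete, here is a minimal sketch (not from the lecture) that brute-forces the variable marginals of a tiny joint distribution over three binary tags and then takes the per-variable argmax, as the slide prescribes. The names `scores`, `p`, and `marginals` and all the numbers are made-up toy values for illustration only.

```python
import itertools
from collections import defaultdict

# Toy unnormalized joint scores for p(y | x) over 3 binary variables (made-up numbers).
scores = {y: 1.0 for y in itertools.product([0, 1], repeat=3)}
scores[(1, 1, 0)] = 6.0   # a single high-scoring assignment
scores[(0, 1, 0)] = 4.0
scores[(0, 1, 1)] = 4.0

Z = sum(scores.values())
p = {y: s / Z for y, s in scores.items()}   # normalized joint distribution

# Variable marginals p(y_i | x), obtained by summing the joint.
marginals = [defaultdict(float) for _ in range(3)]
for y, prob in p.items():
    for i, y_i in enumerate(y):
        marginals[i][y_i] += prob

# Hamming-loss MBR decode: pick the argmax of each marginal independently.
y_mbr = tuple(max(m, key=m.get) for m in marginals)
print("Hamming-loss MBR decode:", y_mbr)   # (0, 1, 0) on this toy distribution
```

On this toy distribution the marginal-wise decode is (0, 1, 0), even though the single most probable joint assignment is (1, 1, 0), which previews the contrast with the 0-1 loss on the next slide.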
Minimum Bayes Risk Decoding

$h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$

Consider some example loss functions:

The 0-1 loss function returns 0 if the two assignments are identical and 1 otherwise:

$\ell(\hat{y}, y) = 1 - \mathbb{I}(\hat{y}, y)$

The MBR decoder is:

$h_\theta(x) = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)\left(1 - \mathbb{I}(\hat{y}, y)\right) = \operatorname{argmax}_{\hat{y}} p_\theta(\hat{y} \mid x)$

which is exactly the MAP inference problem!
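Continuing the toy sketch above (reusing the assumed `p` dictionary), the 0-1 loss MBR decode is just the argmax of the joint, i.e. the MAP assignment, and on this toy distribution it differs from the Hamming-loss decode — which is exactly why the choice of loss matters.

```python
# 0-1 loss MBR decode: argmax of the joint p(y | x), i.e. the MAP assignment.
y_map = max(p, key=p.get)
print("0-1 loss MBR (MAP) decode:", y_map)   # (1, 1, 0), unlike the Hamming decode (0, 1, 0)
```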
LINEAR PROGRAMMING & INTEGER LINEAR PROGRAMMING
Linear Programming
Whiteboard:
– Example of Linear Program in 2D
– LP Standard Form
– Converting an LP to Standard Form
– LP and its Polytope
– Simplex algorithm (tableau method)
– Interior points algorithm(s)
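The 2D example itself lives on the whiteboard, so here is a hedged stand-in: a small LP in inequality form (made-up coefficients), solved with SciPy's `linprog`. `linprog` minimizes, so the objective is negated to express a maximization.

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2D LP (made-up coefficients):
#   maximize  3*x1 + 2*x2
#   subject to  x1 + x2   <= 4
#               x1 + 3*x2 <= 6
#               x1, x2 >= 0
c = np.array([3.0, 2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])

# linprog minimizes, so negate c to maximize.
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal x:", res.x)          # a vertex of the feasible polytope
print("optimal value:", -res.fun)   # flip the sign back
```

The optimum lands on a vertex of the feasible polytope, which is the geometric fact the simplex algorithm exploits.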
Integer Linear Programming
Whiteboard:
– Example of an ILP in 2D
– Example of an MILP in 2D
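As a stand-in for the whiteboard ILP, here is a sketch that declares a tiny 2D integer program and hands it to `scipy.optimize.milp` (assuming SciPy ≥ 1.9, where that HiGHS-backed solver is available). All coefficients are made up for illustration; the point is that the LP relaxation's optimum is fractional while the best integer point scores strictly worse.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy 2D ILP (made-up coefficients):
#   maximize  x1 + x2
#   subject to  2*x1 + x2 <= 4
#               x1 + 2*x2 <= 4
#               x1, x2 >= 0 and integer
c = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

res = milp(
    c=-c,                                      # milp minimizes, so negate to maximize
    constraints=LinearConstraint(A, -np.inf, [4.0, 4.0]),
    integrality=np.ones(2),                    # both variables must be integer
    bounds=Bounds(0, np.inf),
)
print("ILP optimum:", res.x, "value:", -res.fun)
# The LP relaxation has the fractional optimum (4/3, 4/3) with value 8/3,
# while the best integer point only achieves value 2 -- the gap that the
# branch-and-bound machinery below is designed to close.
```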
Background: Nonconvex Global Optimization
Goal: optimize over the blue surface.
Background: Nonconvex Global Optimization
Relaxation: provides an upper bound on the surface.
Background: Nonconvex Global Optimization
Branching: partitions the search space into subspaces, and enables tighter relaxations.
(Here the figure branches on X1 ≤ 0.0 vs. X1 ≥ 0.0.)
Background: Nonconvex Global Optimization
The max of all relaxed solutions for each of the partitions is a global upper bound.
Background: Nonconvex Global Optimization
We can project a relaxed solution onto the feasible region.
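The slides do not spell out the projection operator. One simple choice for an integer program — offered here purely as an assumption, not as the lecture's actual method — is to round each fractional coordinate and keep the rounded point only if it still satisfies the constraints:

```python
import numpy as np

def project_by_rounding(x_relaxed, A_ub, b_ub):
    """Round a fractional LP solution to the nearest integer point and check
    feasibility against A_ub @ x <= b_ub, x >= 0.  Returns None if rounding
    leaves the feasible region.  (A naive projection; real systems do better.)"""
    x_int = np.round(x_relaxed)
    if np.all(A_ub @ x_int <= b_ub + 1e-9) and np.all(x_int >= 0):
        return x_int
    return None

# Example with the toy ILP from earlier (assumed constraint matrix).
A_ub = np.array([[2.0, 1.0], [1.0, 2.0]])
b_ub = np.array([4.0, 4.0])
print(project_by_rounding(np.array([4/3, 4/3]), A_ub, b_ub))  # -> [1. 1.]
```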
Background: Nonconvex Global Optimization
The incumbent is ε-optimal if the relative difference between the global upper bound and the incumbent score is less than ε.
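A common way to operationalize that stopping rule (the exact formula is an assumption, since the slide does not give one) is to compute the relative gap between the bound and the incumbent:

```python
def is_eps_optimal(upper_bound, incumbent, eps=1e-4):
    """Relative-gap test for a maximization problem: stop when the best
    possible improvement over the incumbent is a tiny fraction of its value."""
    gap = (upper_bound - incumbent) / max(abs(incumbent), 1e-12)
    return gap < eps

print(is_eps_optimal(upper_bound=2.0002, incumbent=2.0, eps=1e-3))  # True
```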
How much should we subdivide?
How much should we subdivide?
BRANCH-AND-BOUND
• Method for recursively subdividing the search space
• Subspace order can be determined heuristically (e.g. best-first search with depth-first plunging; see the sketch below)
• Prunes subspaces that can't yield better solutions
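To illustrate the node-ordering idea (a generic sketch, not the specific strategy used in the lecture's experiments), a best-first frontier keyed on each subproblem's relaxation bound pops the most promising subspace first, and any node whose bound cannot beat the incumbent is discarded without further work. The subproblem names and bound values below are made up.

```python
import heapq

# Each entry: (negated relaxation bound, subproblem).  heapq is a min-heap,
# so negating the bound gives best-first order for a maximization problem.
frontier = []
heapq.heappush(frontier, (-2.8, "Q1"))   # bound 2.8 (toy numbers)
heapq.heappush(frontier, (-2.1, "Q2"))   # bound 2.1
incumbent_value = 2.0

while frontier:
    neg_bound, node = heapq.heappop(frontier)
    if -neg_bound <= incumbent_value:
        continue                          # prune: cannot beat the incumbent
    print("exploring", node, "with bound", -neg_bound)
    # ... solve the node's relaxation, possibly update the incumbent, branch ...
```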
Background: Nonconvex Global Optimization
If the subspace upper bound is worse than the current incumbent, we can prune that subspace.
Limitations: Branch-and-Bound for the Viterbi Objective
• The Viterbi Objective
  – Nonconvex
  – NP-hard to solve (Cohen & Smith, 2010)
• Branch-and-bound
  – Kind of tricky to get it right…
  – Curse of dimensionality kicks in quickly
• Nonconvex quadratic optimization by LP-based branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009)
• Our smallest (toy) problems have hundreds of variables
• Preview of Experiments
  – We solve 5 sentences, but on 200 sentences, we couldn't run to completion
  – Our (hybrid) global search framework incorporates local search
  – This hybrid approach sometimes finds higher likelihood (and higher accuracy) solutions than pure local search
BRANCH-AND-BOUND INGREDIENTS
– Mathematical Program
– Relaxation
– Projection
– (Branch-and-Bound Search Heuristics)
Background: Nonconvex Global Optimization
We solve the relaxation using the Simplex algorithm.
Background: Nonconvex Global Optimization
We can project a relaxed solution onto the feasible region.
Integer Linear Programming
Whiteboard:
– Branch and bound for an ILP in 2D
Branch and Bound
Algorithm 2.1 Branch-and-bound
Input: Minimization problem instance R.
Output: Optimal solution x⋆ with value c⋆, or conclusion that R has no solution, indicated by c⋆ = ∞.
1. Initialize L := {R}, ĉ := ∞. [init]
2. If L = ∅, stop and return x⋆ = x̂ and c⋆ = ĉ. [abort]
3. Choose Q ∈ L, and set L := L \ {Q}. [select]
4. Solve a relaxation Q_relax of Q. If Q_relax is empty, set č := ∞. Otherwise, let x̌ be an optimal solution of Q_relax and č its objective value. [solve]
5. If č ≥ ĉ, goto Step 2. [bound]
6. If x̌ is feasible for R, set x̂ := x̌, ĉ := č, and goto Step 2. [check]
7. Split Q into subproblems Q = Q_1 ∪ … ∪ Q_k, set L := L ∪ {Q_1, …, Q_k}, and goto Step 2. [branch]
Slide from Achterberg (thesis, 2007)
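To connect the pseudocode to the earlier LP material, here is a minimal Python sketch (my own illustration, not code from the lecture or from Achterberg) that follows Algorithm 2.1 for a pure ILP in minimization form: the relaxation is the LP solved with `scipy.optimize.linprog`, branching splits on a fractional variable (as in the Figure 2.2 slide below), and the per-variable bounds play the role of the subproblems Q. The function name and the toy instance are assumptions.

```python
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound_ilp(c, A_ub, b_ub, bounds, tol=1e-6):
    """Algorithm 2.1 specialized to: min c.x  s.t.  A_ub x <= b_ub, x integer,
    with per-variable (lower, upper) bounds defining each subproblem Q."""
    L = [bounds]                       # 1. [init] list of unsolved subproblems
    x_hat, c_hat = None, math.inf      #    incumbent x̂ and its value ĉ

    while L:                           # 2. [abort] when L is empty
        Q = L.pop()                    # 3. [select] (here: depth-first)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=Q)  # 4. [solve] LP relaxation
        if not res.success:            #    infeasible relaxation -> č = ∞
            continue
        x_check, c_check = res.x, res.fun
        if c_check >= c_hat:           # 5. [bound] relaxation can't beat incumbent
            continue
        frac = [i for i, v in enumerate(x_check)
                if abs(v - round(v)) > tol]
        if not frac:                   # 6. [check] integral -> feasible for R
            x_hat, c_hat = np.round(x_check), c_check
            continue
        i = frac[0]                    # 7. [branch] split on a fractional variable
        lo, hi = Q[i]
        left, right = list(Q), list(Q)
        left[i] = (lo, math.floor(x_check[i]))    # Q1: x_i <= floor(value)
        right[i] = (math.ceil(x_check[i]), hi)    # Q2: x_i >= ceil(value)
        L.extend([left, right])

    return x_hat, c_hat

# Toy run: minimize -(x1 + x2) (i.e. maximize x1 + x2) subject to the earlier
# made-up constraints 2*x1 + x2 <= 4 and x1 + 2*x2 <= 4, with 0 <= x1, x2 <= 4.
x_star, c_star = branch_and_bound_ilp(
    c=np.array([-1.0, -1.0]),
    A_ub=np.array([[2.0, 1.0], [1.0, 2.0]]),
    b_ub=np.array([4.0, 4.0]),
    bounds=[(0, 4), (0, 4)],
)
print("optimal integer solution:", x_star, "objective:", c_star)
```

Using a stack (`L.pop()`) gives depth-first selection; swapping it for the best-first heap sketched earlier changes only the [select] step, exactly as the algorithm's modular statement suggests.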
Branch and Bound
[Figure: the branch-and-bound search tree — root node R, solved and unsolved subproblems, the current subproblem Q, new subproblems Q1 … Qk, pruned subproblems, and a feasible solution.]
Slide from Achterberg (thesis, 2007)
Branch and Bound
[Figure 2.2: LP-based branching on a single fractional variable — the relaxed optimum x̌ of Q is split into subproblems Q1 and Q2.]
Slide from Achterberg (thesis, 2007)