1. High-arity Interactions, Polyhedral Relaxations, and Cutting Plane Algorithm for Soft Constraint Optimisation (MAP-MRF)
Tomáš Werner
Center for Machine Perception, Czech Technical University, Prague, Czech Republic

2. Abstract
The LP relaxation approach to finding the most probable configuration of an MRF has mostly been considered only for binary (= pairwise) interactions [e.g. Schlesinger-76, Wainwright-05, Kolmogorov-06]. Based on [Schlesinger-76, Kovalevsky-75, Werner-07], we generalise the approach to n-ary interactions, with the following contributions:
◮ Formulation of the LP relaxation and its dual for n-ary problems.
◮ A simple algorithm to optimise the LP bound, the n-ary max-sum diffusion.
◮ A hierarchy of gradually tighter polyhedral relaxations of MAP-MRF, obtained by adding zero interactions.
◮ A cutting plane algorithm, where the cuts correspond to adding zero interactions and the separation problem to finding an unsatisfiable constraint satisfaction subproblem.
◮ We show that a class of high-arity interactions (e.g. global interactions) can be included in the framework in a principled way.
◮ A simple proof that n-ary max-sum diffusion finds the global optimum for n-ary supermodular problems.
The result is a principled framework for dealing with n-ary problems and designing their tighter relaxations.

3. Problem formulation
V : (finite) set of variables
v ∈ V : a single variable
X_v : (finite) domain of variable v ∈ V
x_v ∈ X_v : state of variable v ∈ V
A ⊆ V : a subset of variables
X_A = ×_{v∈A} X_v : joint domain of variables A ⊆ V
x_A ∈ X_A : joint state of variables A ⊆ V

Problem: Finding the most probable configuration of MRF
Instance:
◮ variables V and their domains { X_v | v ∈ V }
◮ hypergraph E ⊆ 2^V
◮ interaction θ_A : X_A → R for each A ∈ E
Task: Compute max_{x_V} Σ_{A∈E} θ_A(x_A).
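The task max_{x_V} Σ_{A∈E} θ_A(x_A) can be evaluated by brute force on a toy instance, which is a useful sanity check of the problem definition. The sketch below uses the hyperedges from the deck's first example but made-up interaction values; all numbers and names here are illustrative, not from the talk.

```python
from itertools import product

# Toy instance: V = {1,2,3,4}, binary domains, hyperedges as on slide 4.
# The interaction values below are made up for illustration.
domains = {1: [0, 1], 2: [0, 1], 3: [0, 1], 4: [0, 1]}
E = [(2, 3, 4), (1, 2), (3, 4), (3,)]
theta = {
    (2, 3, 4): lambda x: 1.0 if x == (0, 1, 0) else 0.0,
    (1, 2): lambda x: float(x[0] == x[1]),   # agreement bonus
    (3, 4): lambda x: float(x[0] != x[1]),   # disagreement bonus
    (3,): lambda x: 0.5 * x[0],
}

def objective(xV):
    """Sum of interactions theta_A(x_A) over hyperedges A in E."""
    return sum(theta[A](tuple(xV[v] for v in A)) for A in E)

# Exhaustive maximisation over all joint states x_V -- exponential in |V|,
# feasible only for toy problems (the talk is about avoiding exactly this).
variables = sorted(domains)
best = max(product(*(domains[v] for v in variables)),
           key=lambda xs: objective(dict(zip(variables, xs))))
print(best, objective(dict(zip(variables, best))))  # (0, 0, 1, 0) 3.5
```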

4. Examples
◮ V = {1,2,3,4} and E = {{2,3,4}, {1,2}, {3,4}, {3}}:
  max_{x_1,x_2,x_3,x_4} [ θ_{234}(x_2,x_3,x_4) + θ_{12}(x_1,x_2) + θ_{34}(x_3,x_4) + θ_3(x_3) ]
◮ E = { {v} | v ∈ V } ∪ E′ where E′ ⊆ { {v,v′} | v ≠ v′ }: binary problem
  max_{x_V} [ Σ_{v∈V} θ_v(x_v) + Σ_{vv′∈E′} θ_{vv′}(x_v,x_{v′}) ]
◮ E = { {v} | v ∈ V } ∪ E′ ∪ {V} where E′ ⊆ { {v,v′} | v ≠ v′ }: binary problem with a global constraint
  max_{x_V} [ Σ_{v∈V} θ_v(x_v) + Σ_{vv′∈E′} θ_{vv′}(x_v,x_{v′}) + θ_V(x_V) ]

5. Linear programming relaxation
Primal program (matrix form):
  θ⊤μ → max over μ
  Mμ = 0
  Nμ = 1
  μ ≥ 0
Dual program (matrix form):
  ψ⊤1 → min over φ, ψ
  φ ≶ 0, ψ ≶ 0 (unconstrained in sign)
  φ⊤M + ψ⊤N ≥ θ⊤

Primal program (expanded):
  Σ_{A∈E} Σ_{x_A} θ_A(x_A) μ_A(x_A) → max
  Σ_{x_{A\B}} μ_A(x_A) = μ_B(x_B)  for (A,B) ∈ J, x_B ∈ X_B  [dual variable φ_{A,B}(x_B) ≶ 0]
  Σ_{x_A} μ_A(x_A) = 1  for A ∈ E  [dual variable ψ_A ≶ 0]
  μ_A(x_A) ≥ 0  for A ∈ E, x_A ∈ X_A
Dual program (expanded):
  Σ_{A∈E} ψ_A → min
  Σ_{B | (B,A)∈J} φ_{B,A}(x_A) − Σ_{B | (A,B)∈J} φ_{A,B}(x_B) + ψ_A ≥ θ_A(x_A)  for A ∈ E, x_A ∈ X_A

where J ⊆ I(E) = { (A,B) | A ∈ E, B ∈ E, B ⊂ A }

6. Meaning of primal LP: Consistency of distributions on joint states
◮ Each A ∈ E is assigned a probability distribution μ_A : X_A → R on its joint states.
◮ For each (A,B) ∈ J, distribution μ_A marginalises onto μ_B, i.e.,
  μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A)

Example: Let A = {1,2,3,4} and B = {1,3} ⊂ A. Then the equation μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A) reads μ_{13}(x_1,x_3) = Σ_{x_2,x_4} μ_{1234}(x_1,x_2,x_3,x_4).

What happens if the distributions are crisp (i.e., they can attain only 0 or 1)?
◮ Then μ_A represents a single joint state.
◮ The marginalisation constraint μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A) then says that joint state μ_B is the restriction of joint state μ_A onto variables B ⊂ A.
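The marginalisation constraint from the example above can be checked mechanically. Here is a minimal sketch for A = {1,2,3,4}, B = {1,3}, using a uniform joint distribution as a made-up stand-in for μ_A:

```python
from itertools import product

# Marginalisation mu_B(x_B) = sum over x_{A\B} of mu_A(x_A),
# for A = (1,2,3,4), B = (1,3). The distribution mu_A is made up (uniform).
A = (1, 2, 3, 4)
B = (1, 3)
states = list(product([0, 1], repeat=len(A)))
mu_A = {x: 1.0 / len(states) for x in states}  # uniform over 16 joint states

def marginalise(mu_A, A, B):
    """Sum mu_A over the variables in A \\ B, keeping the coordinates of B."""
    idx = [A.index(v) for v in B]
    mu_B = {}
    for x_A, p in mu_A.items():
        x_B = tuple(x_A[i] for i in idx)
        mu_B[x_B] = mu_B.get(x_B, 0.0) + p
    return mu_B

mu_B = marginalise(mu_A, A, B)
# Each of the 4 joint states of B collects 4 of the 16 joint states of A.
print(mu_B[(0, 0)])  # 0.25
```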

7. Reparameterisations
Definition: A reparameterisation (equivalent transformation) is a change of the weight vector θ that preserves the objective function Σ_{A∈E} θ_A(x_A).
◮ Elementary reparameterisation on a triplet (A,B,x_B) with B ⊆ A: add φ_{A,B}(x_B) to the weights { θ_A(x_A) | x_{A\B} ∈ X_{A\B} } and subtract it from θ_B(x_B).
◮ Doing this for all triplets (A,B,x_B) such that (A,B) ∈ J yields
  θ^φ_A(x_A) = θ_A(x_A) + Σ_{B | (A,B)∈J} φ_{A,B}(x_B) − Σ_{B | (B,A)∈J} φ_{B,A}(x_A)

Example: For a binary problem, i.e. E = { {v} | v ∈ V } ∪ E′ with E′ ⊆ { {v,v′} | v ≠ v′ }, and J = I(E), we have
  θ^φ_v(x_v) = θ_v(x_v) − Σ_{v′∈N_v} φ_{vv′,v}(x_v)
  θ^φ_{vv′}(x_v,x_{v′}) = θ_{vv′}(x_v,x_{v′}) + φ_{vv′,v}(x_v) + φ_{vv′,v′}(x_{v′})
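The defining property, that the objective is preserved for every joint state, is easy to verify numerically for one elementary reparameterisation. The weights and the value of φ below are made up for illustration:

```python
from itertools import product

# Elementary reparameterisation on (A, B, x_B) with B ⊆ A: add phi to every
# theta_A(x_A) whose restriction to B equals x_B, subtract phi from theta_B(x_B).
A, B = (1, 2), (2,)
theta = {
    A: {x: 0.0 for x in product([0, 1], repeat=2)},
    B: {(0,): 1.0, (1,): 2.0},
}
phi, x_B = 0.7, (1,)  # made-up reparameterisation step

idx = [A.index(v) for v in B]
for x_A in theta[A]:
    if tuple(x_A[i] for i in idx) == x_B:
        theta[A][x_A] += phi
theta[B][x_B] -= phi

def objective(x1, x2):
    return theta[A][(x1, x2)] + theta[B][(x2,)]

# The added and subtracted phi cancel in theta_A(x_A) + theta_B(x_B),
# so the objective is unchanged for every joint state.
print(objective(0, 1), objective(1, 1))  # 2.0 2.0, as before the change
```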

8. Meaning of dual LP: Minimising the upper bound by reparameterisations
◮ Upper bound on the true optimum: max_{x_V} Σ_{A∈E} θ_A(x_A) ≤ Σ_{A∈E} max_{x_A} θ_A(x_A)
◮ The dual LP can be written as min_φ Σ_{A∈E} max_{x_A} θ^φ_A(x_A)

When is the upper bound exact?
◮ A joint state x_A of variables A ∈ E is called active if θ_A(x_A) = max_{x_A} θ_A(x_A).
◮ The upper bound is exact iff the constraint satisfaction problem (CSP) formed by the active joint states is satisfiable.
Is this CSP satisfiable? Yes!
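Both sides of the bound are cheap to compute on a small instance. The made-up two-variable problem below shows a case where the sum of independent maxima strictly exceeds the true optimum, so reparameterisation has room to tighten the bound:

```python
from itertools import product

# Bound max_{x_V} sum_A theta_A(x_A) <= sum_A max_{x_A} theta_A(x_A)
# on a made-up two-variable problem with unary and pairwise interactions.
theta = {
    (1,): {(0,): 0.0, (1,): 1.0},
    (2,): {(0,): 1.0, (1,): 0.0},
    (1, 2): {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
}

# Right-hand side: sum of independent per-hyperedge maxima.
upper = sum(max(t.values()) for t in theta.values())

# Left-hand side: true maximum, by brute force over joint states.
true_max = max(
    theta[(1,)][(x1,)] + theta[(2,)][(x2,)] + theta[(1, 2)][(x1, x2)]
    for x1, x2 in product([0, 1], repeat=2)
)
print(true_max, upper)  # 2.0 3.0 -- the bound is not tight here
```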

9. N-ary max-sum diffusion
Algorithm (n-ary max-sum diffusion)
1: loop
2:   for (A,B) ∈ J and x_B ∈ X_B do
3:     φ_{A,B}(x_B) += [ θ^φ_B(x_B) − max_{x_{A\B}} θ^φ_A(x_A) ] / 2
       (i.e., do the reparameterisation on (A,B,x_B) that makes θ^φ_B(x_B) = max_{x_{A\B}} θ^φ_A(x_A))
4:   end for
5: end loop
◮ Monotonically decreases the upper bound by reparameterisations.
◮ Converges to a state where θ^φ_B(x_B) = max_{x_{A\B}} θ^φ_A(x_A) for all (A,B) ∈ J and x_B.
◮ For binary problems, equivalent to TRW-S [Kolmogorov-06] with edge updates.
◮ May end up in a local minimum (because coordinate-wise minimisation is applied to a nonsmooth convex function), but this is not a big drawback.
Evaluating max_{x_{A\B}} θ^φ_A(x_A) means solving an auxiliary problem whose structure is the hypergraph E ∩ 2^A rather than E.
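The update rule above can be sketched for the smallest binary case: two variables, one edge, J = I(E) = {({1,2},{1}), ({1,2},{2})}. This is a minimal illustration with made-up weights, not the general n-ary implementation; on a single edge the relaxation is tight, so the bound should converge to the true maximum (3.0 here):

```python
from itertools import product

# Minimal pairwise instance: V = {1,2}, E = {{1},{2},{1,2}}. Weights made up.
theta = {
    (1,): [0.0, 2.0],
    (2,): [1.0, 0.0],
    (1, 2): [[0.0, 0.0], [0.0, 1.0]],  # theta_12[x1][x2]
}
# Diffusion variables phi_{A,B}(x_B) for the two pairs (A,B) in J.
phi = {((1, 2), (1,)): [0.0, 0.0], ((1, 2), (2,)): [0.0, 0.0]}

def th_v(v, x):
    """Reparameterised unary weight theta^phi_v(x)."""
    return theta[(v,)][x] - phi[((1, 2), (v,))][x]

def th_e(x1, x2):
    """Reparameterised pairwise weight theta^phi_12(x1, x2)."""
    return (theta[(1, 2)][x1][x2]
            + phi[((1, 2), (1,))][x1] + phi[((1, 2), (2,))][x2])

def upper_bound():
    return (max(th_v(1, x) for x in (0, 1))
            + max(th_v(2, x) for x in (0, 1))
            + max(th_e(a, b) for a, b in product((0, 1), repeat=2)))

for _ in range(100):  # diffusion sweeps
    for v in (1, 2):
        for x in (0, 1):
            if v == 1:
                m = max(th_e(x, y) for y in (0, 1))
            else:
                m = max(th_e(y, x) for y in (0, 1))
            # phi update: averages theta^phi_v(x) and the edge maximum.
            phi[((1, 2), (v,))][x] += (th_v(v, x) - m) / 2

print(round(upper_bound(), 6))  # 3.0, the true maximum
```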

10. Adding a zero interaction may tighten the relaxation
Idea: Adding a hyperedge A ∉ E to E while setting θ_A ≡ 0 does not change the objective but may improve the relaxation. In fact, we can virtually add all possible zero interactions: then E = 2^V but only a few θ_A are non-zero. Now the relaxation is fully determined by J.

Example for V = {1,2,3,4}: [Figure: the lattice I(2^V) of subsets of V, drawn as the Hasse diagram with levels 1234; 123, 124, 134, 234; 12, 13, 14, 23, 24, 34; 1, 2, 3, 4. The original E is depicted by red nodes and J ⊆ I(2^V) by red edges.]

11. Hierarchy of polyhedral relaxations
J_1 ⊆ J_2 implies that relaxation J_1 is not tighter than relaxation J_2. Therefore:
Result: All possible sets J ⊆ I(2^V) form a hierarchy of relaxations, partially ordered by the inclusion relation on I(2^V). In particular:
◮ J = ∅: the weakest relaxation (the sum of independent maxima over each hyperedge A ∈ E).
◮ J = I(E): the well-known 'tree' relaxation for binary problems [Schlesinger-76, Koster-98, Wainwright-03].
◮ J = I(2^V): the exact solution.
Note: Even if J_1 ⊂ J_2, relaxations J_1 and J_2 may be the same. In particular, J = I(2^V) and J = { (V,A) | A ∈ E } both yield the same relaxation.

Interpretation as lift+constrain+project: Tightening the relaxation can be seen as lifting the original LP polytope, imposing a marginalisation constraint in this lifted space, and projecting back.

12. Example: Adding zero 4-cycle interactions to binary problems
◮ On a number of instances of a binary problem, we computed how many instances were solved by the n-ary max-sum diffusion to optimality.
◮ Two relaxations tested:
  ◮ J_tree: the 'traditional' LP relaxation [Schlesinger-76, Kolmogorov-06, ...]
  ◮ J_4cycle: J_tree augmented with zero interactions on 4-tuples of variables (thus inducing 4-cycle subproblems).

type     image side   |X_v|   r_tree   r_4cycle
random   15           5       0.01     1.00
random   25           3       0.00     0.98
random   100          3       0.00     0.72
Potts    15           5       0.79     0.99
Potts    25           5       0.48     0.98
Potts    100          5       0.00     0.81
lines    10           4       0.72     0.88
lines    25           4       0.00     0.00
curve    10           9       0.17     0.65
curve    15           9       0.00     0.24
curve    25           9       0.00     0.00
Pi       15           5       0.00     0.82

13. Cutting plane algorithm
Let max{ θ⊤μ | μ ∈ P } be the LP relaxation of the ILP max{ θ⊤μ | μ ∈ P ∩ Z^n }.

Cutting plane algorithm for general ILP in primal space:
1: P′ ← P
2: loop
3:   Find a maximiser μ* of max{ θ⊤μ | μ ∈ P′ }.
4:   Find a half-space H such that P ∩ Z^n ⊆ H and μ* ∉ H (the separation problem). If none exists, halt.
5:   P′ ← P′ ∩ H
6: end loop

Cutting plane algorithm for MAP-MRF in dual space:
1: J ← I(E)
2: loop
3:   Minimise the upper bound of relaxation J by max-sum diffusion.
4:   Find A ∉ E such that the CSP formed by the active joint states, restricted to variables A, is unsatisfiable. If none exists, halt.
5:   θ_A ← 0; E ← E ∪ {A}; J ← I(E)
6: end loop
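The separation step in the dual-space algorithm reduces to a CSP satisfiability test on the active joint states restricted to candidate variables A. A brute-force sketch of that test (the instance below is made up; a real implementation would search candidate subsets A more cleverly):

```python
from itertools import product

def csp_satisfiable(variables, domains, active):
    """active: dict mapping hyperedge (tuple of variables) -> set of allowed
    joint states. Returns True iff some assignment satisfies all constraints."""
    for xs in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, xs))
        if all(tuple(x[v] for v in A) in allowed
               for A, allowed in active.items()):
            return True
    return False

# A 3-cycle of "not equal" constraints over a 2-state domain is the classic
# unsatisfiable subproblem (an odd cycle): adding a zero interaction on
# A = {1,2,3} would be a valid cut here.
neq = {(0, 1), (1, 0)}
active = {(1, 2): neq, (2, 3): neq, (1, 3): neq}
print(csp_satisfiable([1, 2, 3], {v: [0, 1] for v in (1, 2, 3)}, active))
# False
```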

14. Example
Instead of adding all 4-cycles initially, we add only some of them, one by one.
◮ Advantage: much less memory is needed for the dual variables ('messages').
◮ Drawback: not very practical (slow) in this simple form.
