Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees


  1. Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees. Stephen H. Bach (Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz). AISTATS 2015.

  2. This Talk
  § Markov random fields capture rich dependencies in structured data, but inference is NP-hard
  § Relaxed inference can help, but existing techniques have tradeoffs
  § Two approaches: Local Consistency Relaxation and MAX SAT Relaxation

  3. Takeaways
  [Diagram: Local Consistency Relaxation ⊃ MAX SAT Relaxation]
  § We can combine their advantages: quality guarantees and highly scalable message-passing algorithms
  § New inference algorithm for a broad class of structured, relational models

  4. Modeling Relational Data with Markov Random Fields

  5. Markov Random Fields
  § Probabilistic model for high-dimensional data: $P(x) \propto \exp\left(w^\top \phi(x)\right)$
  § The random variables x represent the data, such as whether a person has an attribute or whether a link exists
  § The potentials φ score different configurations of the data
  § The weights w scale the influence of different potentials
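To make the scoring function concrete, here is a minimal Python sketch (not from the talk; the names `unnormalized_prob`, `potentials`, and `weights` are illustrative) of the unnormalized density $\exp(w^\top \phi(x))$:

```python
import math

def unnormalized_prob(x, potentials, weights):
    """Unnormalized P(x) proportional to exp(w^T phi(x)) for an assignment x.

    x          -- dict mapping variable name to its value (e.g., 0 or 1)
    potentials -- list of functions phi_j, each scoring the assignment x
    weights    -- list of weights w_j, one per potential
    """
    return math.exp(sum(w * phi(x) for w, phi in zip(weights, potentials)))
```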

  6. Markov Random Fields
  § Variables and potentials form a graphical structure:
  [Figure: factor graph with variable nodes x connected to potential nodes φ]

  7. Modeling Relational Data
  § Many important problems have relational structure
  § Common to use logic to describe probabilistic dependencies
  § Relations in data map to logical predicates
  [Figure: group-activity labels such as crossing, waiting, queueing, walking, talking, dancing, jogging]

  8. Logical Potentials
  § One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks [Richardson and Domingos, 2006]:
  5.0 : Friends(X, Y) ∧ Smokes(X) ⇒ Smokes(Y)
  § Each first-order rule is a template for potentials
  - Ground out the rule over the relational data
  - The truth table of each ground rule is a potential
  - Each potential's weight comes from the rule that templated it

  9. Logical Potentials: Grounding
  5.0 : Friends(X, Y) ∧ Smokes(X) ⇒ Smokes(Y)
  becomes, e.g.,
  5.0 : Friends(Alexis, Bob) ∧ Smokes(Alexis) ⇒ Smokes(Bob)
  [Figure: social network over Alexis, Bob, Claudia, Dave, and Erin]
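A minimal sketch of the grounding step, assuming a small, hypothetical domain of constants; each grounding instantiates the rule's variables with concrete people and inherits the rule's weight (the string representation is only for illustration):

```python
from itertools import product

people = ["Alexis", "Bob", "Claudia", "Dave", "Erin"]

# Ground the rule Friends(X, Y) ∧ Smokes(X) ⇒ Smokes(Y) over all
# ordered pairs of distinct constants; each grounding becomes one
# potential, weighted by the rule's weight (5.0 here).
groundings = [
    (5.0, f"Friends({x}, {y}) ∧ Smokes({x}) ⇒ Smokes({y})")
    for x, y in product(people, repeat=2)
    if x != y
]
```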

  10. Logical Potentials
  § Let $\mathcal{R}$ be a set of rules, where each rule $R_j$ has the general form
  $w_j : \left( \bigvee_{i \in I_j^+} x_i \right) \vee \left( \bigvee_{i \in I_j^-} \neg x_i \right)$
  - Weights $w_j \ge 0$, and the sets $I_j^+$ and $I_j^-$ index the variables
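As a sketch, a ground clause of this form can be represented by its weight and the two index sets, and checked against a 0/1 assignment like so (the representation is illustrative, not the paper's):

```python
def clause_satisfied(x, pos, neg):
    """True iff the clause (OR of x_i for i in pos) OR (OR of NOT x_i
    for i in neg) holds under the 0/1 assignment x (indexable by variable
    index)."""
    return any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)
```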

  11. MAP Inference
  § MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
  § MAP inference is
  $\arg\max_x P(x) \equiv \arg\max_{x \in \{0,1\}^n} \sum_{R_j \in \mathcal{R}} w_j \left( \left( \bigvee_{i \in I_j^+} x_i \right) \vee \left( \bigvee_{i \in I_j^-} \neg x_i \right) \right)$
  § This MAX SAT problem is combinatorial and NP-hard!
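To illustrate why this is combinatorial, here is a brute-force sketch of the weighted MAX SAT objective, using `clause_satisfied` from the previous sketch; it enumerates all $2^n$ assignments and is only meant to make the objective concrete:

```python
from itertools import product

def map_brute_force(n, clauses):
    """Exhaustive MAP inference over n binary variables.

    clauses -- list of (w_j, pos_indices, neg_indices) triples with w_j >= 0.
    Runtime is exponential in n, which is why relaxations are needed.
    """
    def score(x):
        return sum(w for w, pos, neg in clauses if clause_satisfied(x, pos, neg))

    return max(product([0, 1], repeat=n), key=score)
```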

  12. Relaxed MAP Inference

  13. Approaches to Relaxed Inference
  § Local consistency relaxation
  - Developed in the probabilistic graphical models community
  - ADVANTAGE: Many highly scalable algorithms available
  - DISADVANTAGE: No known quality guarantees for logical MRFs
  § MAX SAT relaxation
  - Developed in the randomized algorithms community
  - ADVANTAGE: Provides strong quality guarantees
  - DISADVANTAGE: No algorithms designed for large-scale models
  § How can we combine these advantages?

  14. Local Consistency Relaxation

  15. Local Consistency Relaxation
  § LCR is a popular technique for approximating MAP in MRFs
  - Often simply called linear programming (LP) relaxation
  - Dual decomposition solves the dual of the LCR objective
  § Lots of work in the PGM community, e.g.,
  - Globerson and Jaakkola, 2007
  - Wainwright and Jordan, 2008
  - Sontag et al., 2008, 2012
  § Idea: relax the search over consistent marginals to a simpler set

  16. Local Consistency Relaxation
  [Figure: the factor graph from slide 6, with pseudomarginals µ attached to each variable node]

  17. Local Consistency Relaxation
  [Figure: the same factor graph, with pseudomarginals θ attached to each potential node]

  18. Local Consistency Relaxation
  $\arg\max_{(\theta, \mu) \in \mathcal{L}} \sum_{R_j \in \mathcal{R}} w_j \sum_{x_j} \theta_j(x_j) \, \phi_j(x_j)$
  µ : pseudomarginals over variable states x
  θ : pseudomarginals over joint potential states $\phi_j(x_j)$

  19. MAX SAT Relaxation

  20. Approximate Inference
  § View MAP inference as optimizing rounding probabilities
  § The expected score of a clause is a weighted noisy-or function:
  $w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right)$
  § Then the expected total score is
  $\hat{W} = \sum_{R_j \in \mathcal{R}} w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right)$
  § But $\arg\max_p \hat{W}$ is highly non-convex!
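A minimal sketch of $\hat{W}$, using the same `(w, pos, neg)` clause representation as above; a clause is unsatisfied only if every positive literal rounds to 0 and every negated literal rounds to 1:

```python
def expected_score(p, clauses):
    """Expected MAX SAT score when each x_i is independently set to 1
    with probability p[i]: sum_j w_j * (1 - prod(1 - p_i) * prod(p_i))."""
    total = 0.0
    for w, pos, neg in clauses:
        miss = 1.0
        for i in pos:
            miss *= 1.0 - p[i]  # positive literal fails with probability 1 - p_i
        for i in neg:
            miss *= p[i]        # negated literal fails with probability p_i
        total += w * (1.0 - miss)
    return total
```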

  21. Approximate Inference
  § It is the products in the objective that make it non-convex
  § The expected score can be lower bounded using the relationship between arithmetic and geometric means:
  $\sqrt[k]{p_1 p_2 \cdots p_k} \le \frac{p_1 + p_2 + \cdots + p_k}{k}$
  § This leads to the lower bound
  $\sum_{R_j \in \mathcal{R}} w_j \left( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \right) \ge \left( 1 - \frac{1}{e} \right) \sum_{R_j \in \mathcal{R}} w_j \min \left\{ \sum_{i \in I_j^+} p_i + \sum_{i \in I_j^-} (1 - p_i), \, 1 \right\}$
  [Goemans and Williamson, 1994]
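The right-hand side of this bound, the concave surrogate optimized on the next slide, can be written as a short sketch; `expected_score(p, clauses) >= (1 - 1/e) * lp_objective(p, clauses)` then holds for any p in [0, 1]^n:

```python
def lp_objective(p, clauses):
    """The concave lower-bound objective:
    sum_j w_j * min(sum_{i in pos} p_i + sum_{i in neg} (1 - p_i), 1)."""
    return sum(
        w * min(sum(p[i] for i in pos) + sum(1.0 - p[i] for i in neg), 1.0)
        for w, pos, neg in clauses
    )
```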

  22. Approximate Inference
  § So, we solve the linear program
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in \mathcal{R}} w_j \min \left\{ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i), \, 1 \right\}$
  § If we set $p_i = y_i$, a greedy rounding method will find a $(1 - \frac{1}{e})$-optimal discrete solution
  § If we set $p_i = \frac{1}{2} y_i + \frac{1}{4}$, it improves to ¾-optimal
  [Goemans and Williamson, 1994]
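One standard instantiation of such a greedy rounding is the method of conditional expectations, sketched here under the clause representation above; the paper's exact procedure may differ. Each variable is fixed to whichever value maximizes the expected score with the remaining variables still random, which never decreases the expected score:

```python
def greedy_round(p, clauses):
    """Derandomized rounding by conditional expectations.  Returns a
    discrete assignment whose score is at least expected_score(p, clauses)."""
    p = dict(p)
    for i in list(p):
        p1, p0 = dict(p), dict(p)
        p1[i], p0[i] = 1.0, 0.0
        # Keep whichever fixed value has the higher conditional expectation.
        p = p1 if expected_score(p1, clauses) >= expected_score(p0, clauses) else p0
    return {i: int(v) for i, v in p.items()}
```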

  23. Unifying the Relaxations

  24. Analysis
  $\arg\max_{(\theta, \mu) \in \mathcal{L}} \sum_{R_j \in \mathcal{R}} w_j \sum_{x_j} \theta_j(x_j) \, \phi_j(x_j)$
  [Figure: the factor graph with the LCR objective split into per-potential terms j = 1, 2, 3, and so on]

  25. Analysis
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in \mathcal{R}} \hat{\phi}_j(\mu)$
  where each potential contributes the subproblem
  $\max_{\theta_j : (\theta_j, \mu) \in \mathcal{L}} w_j \sum_{x_j} \theta_j(x_j) \, \phi_j(x_j)$
  [Figure: the factor graph with one such subproblem attached to each potential, coupled only through µ]

  26. Analysis
  § We can now analyze each potential's parameterized subproblem in isolation:
  $\hat{\phi}_j(\mu) = \max_{\theta_j : (\theta_j, \mu) \in \mathcal{L}} w_j \sum_{x_j} \theta_j(x_j) \, \phi_j(x_j)$
  § Using the KKT conditions, we can find a simplified expression for each solution based on the parameters µ:
  $\hat{\phi}_j(\mu) = w_j \min \left\{ \sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i), \, 1 \right\}$
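As a quick worked check (an illustrative example, not from the slides): for a single ground clause $x_1 \vee \neg x_2$ (so $I_j^+ = \{1\}$, $I_j^- = \{2\}$) with $w_j = 1$ and pseudomarginals $\mu_1 = 0.3$, $\mu_2 = 0.4$, the simplified expression gives $\hat{\phi}_j(\mu) = \min\{0.3 + (1 - 0.4), \, 1\} = 0.9$.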

  27. Analysis
  § Substitute back into the outer objective:
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in \mathcal{R}} \hat{\phi}_j(\mu)$
  with each term now in the closed form
  $w_j \min \left\{ \sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i), \, 1 \right\}$
  [Figure: the factor graph with the closed-form term in place of each subproblem]

  28. Analysis
  § This leads to a simplified, projected LCR over µ:
  $\arg\max_{\mu \in [0,1]^n} \sum_{j=1}^{m} w_j \min \left\{ \sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i), \, 1 \right\}$
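For intuition, the projected LCR can be solved at small scale with an off-the-shelf LP solver by introducing one auxiliary variable $z_j$ per clause for the min term. The talk's actual solver is the ADMM method described later; this stand-in is only a sketch:

```python
import numpy as np
from scipy.optimize import linprog

def solve_relaxation(clauses, n):
    """Solve max_mu sum_j w_j * min(sum_{pos} mu_i + sum_{neg} (1 - mu_i), 1)
    as an LP over variables (mu_1..mu_n, z_1..z_m), with constraints
    z_j <= sum_{pos} mu_i - sum_{neg} mu_i + |neg| and z_j <= 1."""
    m = len(clauses)
    c = np.zeros(n + m)
    A_ub, b_ub = [], []
    for j, (w, pos, neg) in enumerate(clauses):
        c[n + j] = -w                  # linprog minimizes, so negate w_j * z_j
        row = np.zeros(n + m)
        row[n + j] = 1.0               # z_j ...
        for i in pos:
            row[i] -= 1.0              # ... - sum_{pos} mu_i
        for i in neg:
            row[i] += 1.0              # ... + sum_{neg} mu_i
        A_ub.append(row)
        b_ub.append(float(len(neg)))   # <= |neg|
    bounds = [(0.0, 1.0)] * (n + m)    # mu and z both live in [0, 1]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return {i: res.x[i] for i in range(n)}
```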

  29. Analysis
  Local Consistency Relaxation:
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in \mathcal{R}} w_j \min \left\{ \sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i), \, 1 \right\}$
  MAX SAT Relaxation:
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in \mathcal{R}} w_j \min \left\{ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i), \, 1 \right\}$
  The two objectives are identical, so the MAX SAT rounding guarantees carry over to LCR solutions.

  30. Evaluation

  31. New Algorithm: Rounded LP
  § Three steps (sketched below):
  - Solve the relaxed MAP inference problem
  - Modify the pseudomarginals
  - Round to a discrete solution
  § We use the alternating direction method of multipliers (ADMM) to implement a message-passing approach [Glowinski and Marrocco, 1975; Gabay and Mercier, 1976]
  § ADMM-based inference for the MAX SAT form of the problem was originally developed for hinge-loss MRFs [Bach et al., 2015]
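Putting the pieces together, here is a high-level sketch of the three-step pipeline using the earlier sketches, with `solve_relaxation` standing in for the ADMM message-passing solver (which is not shown here):

```python
def rounded_lp_inference(clauses, n):
    """Sketch of the Rounded LP pipeline from the talk."""
    y = solve_relaxation(clauses, n)       # 1. solve relaxed MAP inference
    p = {i: 0.5 * y[i] + 0.25 for i in y}  # 2. modify pseudomarginals
                                           #    (p_i = y_i/2 + 1/4 for the
                                           #     3/4-optimality guarantee)
    return greedy_round(p, clauses)        # 3. round to a discrete solution
```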

  32. Evaluation Setup
  § Compared with
  - MPLP [Globerson and Jaakkola, 2007; Sontag et al., 2008, 2012]
  - MPLP with cycle tightening
  § MPLP uses coordinate-descent dual decomposition, so rounding is not applicable
  § Solved MAP in social-network opinion models with supermodular and submodular features
  § Measured primal score, i.e., the weighted sum of satisfied clauses
