Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees
Stephen H. Bach (University of Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
AISTATS 2015
This Talk
§ Markov random fields capture rich dependencies in structured data, but inference is NP-hard
§ Relaxed inference can help, but techniques have tradeoffs
§ Two approaches: Local Consistency Relaxation and MAX SAT Relaxation
Takeaways
[Diagram: Local Consistency Relaxation ⊃ MAX SAT Relaxation]
§ We can combine their advantages: quality guarantees and highly scalable message-passing algorithms
§ New inference algorithm for a broad class of structured, relational models
Modeling Relational Data with Markov Random Fields
Markov Random Fields
§ Probabilistic model for high-dimensional data (see the sketch below):
  $P(x) \propto \exp\left(w^\top \phi(x)\right)$
§ The random variables $x$ represent the data, such as whether a person has an attribute or whether a link exists
§ The potentials $\phi$ score different configurations of the data
§ The weights $w$ scale the influence of different potentials
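To make the definition concrete, here is a minimal Python sketch (not from the talk; the two variables, potentials, and weights are made up) that enumerates a toy MRF and normalizes the exponentiated weighted potentials:

```python
import itertools
import math

# Hypothetical toy MRF with two binary variables and two potentials.
def phi(x):
    x1, x2 = x
    return [
        1.0 if x1 == x2 else 0.0,   # potential rewarding agreement
        float(x1),                  # potential rewarding x1 = 1
    ]

w = [1.5, 0.5]  # weights scaling each potential's influence

# P(x) is proportional to exp(w . phi(x)); normalize by summing over all states.
def unnormalized(x):
    return math.exp(sum(wi * fi for wi, fi in zip(w, phi(x))))

states = list(itertools.product([0, 1], repeat=2))
Z = sum(unnormalized(x) for x in states)
for x in states:
    print(x, unnormalized(x) / Z)
```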
Markov Random Fields
§ Variables and potentials form graphical structure:
[Figure: factor graph with variable nodes x connected to potential nodes φ]
Modeling Relational Data
§ Many important problems have relational structure
§ Common to use logic to describe probabilistic dependencies
§ Relations in data map to logical predicates
[Figure: example scene with activity labels — crossing, waiting, queueing, walking, talking, dancing, jogging]
Logical Potentials
§ One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks [Richardson and Domingos, 2006]
  $5.0 : \text{Friends}(X, Y) \wedge \text{Smokes}(X) \Rightarrow \text{Smokes}(Y)$
§ Each first-order rule is a template for potentials
  - Ground out rule over relational data
  - The truth table of each ground rule is a potential
  - Each potential's weight comes from the rule that templated it
Logical Potentials: Grounding
  $5.0 : \text{Friends}(X, Y) \wedge \text{Smokes}(X) \Rightarrow \text{Smokes}(Y)$
  $5.0 : \text{Friends}(\text{Alexis}, \text{Bob}) \wedge \text{Smokes}(\text{Alexis}) \Rightarrow \text{Smokes}(\text{Bob})$
[Figure: social network over Alexis, Bob, Claudia, Dave, and Erin]
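A small sketch of the grounding step, assuming a hypothetical five-person domain; each instantiation of the rule yields one weighted ground potential, written here in its equivalent clausal form:

```python
import itertools

people = ["Alexis", "Bob", "Claudia", "Dave", "Erin"]
weight = 5.0

# Ground the first-order rule
#   5.0 : Friends(X, Y) AND Smokes(X) => Smokes(Y)
# by substituting every ordered pair of distinct people for (X, Y).
# The implication is equivalent to the clause: NOT Friends(X,Y) OR NOT Smokes(X) OR Smokes(Y).
ground_rules = []
for x, y in itertools.permutations(people, 2):
    clause = (weight, f"¬Friends({x},{y}) ∨ ¬Smokes({x}) ∨ Smokes({y})")
    ground_rules.append(clause)

print(len(ground_rules), "ground potentials, e.g.:", ground_rules[0])
```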
Logical Potentials
§ Let $R$ be a set of rules, where each rule $R_j$ has the general form
  $w_j : \left(\bigvee_{i \in I_j^+} x_i\right) \vee \left(\bigvee_{i \in I_j^-} \neg x_i\right)$
  - Weights $w_j \ge 0$, and sets $I_j^+$ and $I_j^-$ index variables
MAP Inference
§ MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
§ MAP inference is (scored as in the sketch below)
  $\arg\max_{x} P(x) \equiv \arg\max_{x \in \{0,1\}^n} \sum_{R_j \in R} w_j \left(\left(\bigvee_{i \in I_j^+} x_i\right) \vee \left(\bigvee_{i \in I_j^-} \neg x_i\right)\right)$
§ This MAX SAT problem is combinatorial and NP-hard!
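A sketch of the objective being maximized, using the clause representation above (index sets I_j^+ and I_j^-); the clauses, weights, and assignment are made up for illustration:

```python
# Each rule: (weight, positive variable indices I+, negated variable indices I-).
rules = [
    (5.0, [2], [0, 1]),   # x2 OR (NOT x0) OR (NOT x1)
    (1.0, [0, 1], []),    # x0 OR x1
]

def map_score(x, rules):
    """Weighted sum of satisfied clauses for a discrete assignment x in {0,1}^n."""
    total = 0.0
    for w, pos, neg in rules:
        satisfied = any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)
        total += w if satisfied else 0.0
    return total

print(map_score([1, 1, 0], rules))  # 1.0: first clause unsatisfied, second satisfied
```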
Relaxed MAP Inference
Approaches to Relaxed Inference
§ Local consistency relaxation
  - Developed in probabilistic graphical models community
  - ADVANTAGE: Many highly scalable algorithms available
  - DISADVANTAGE: No known quality guarantees for logical MRFs
§ MAX SAT relaxation
  - Developed in randomized algorithms community
  - ADVANTAGE: Provides strong quality guarantees
  - DISADVANTAGE: No algorithms designed for large-scale models
§ How can we combine these advantages?
Local Consistency Relaxation
Local Consistency Relaxation
§ LCR is a popular technique for approximating MAP in MRFs
  - Often simply called linear programming (LP) relaxation
  - Dual decomposition solves the dual to the LCR objective
§ Lots of work in PGM community, e.g.,
  - Globerson and Jaakkola, 2007
  - Wainwright and Jordan, 2008
  - Sontag et al. 2008, 2012
§ Idea: relax search over consistent marginals to simpler set
Local Consistency Relaxation
[Figure: factor graph with pseudomarginals µ attached to the variable nodes x]
Local Consistency Relaxation
[Figure: factor graph with pseudomarginals θ attached to the potential nodes φ]
Local Consistency Relaxation
  $\arg\max_{(\theta, \mu) \in \mathbb{L}} \sum_{R_j \in R} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
  µ : pseudomarginals over variable states $x$
  θ : pseudomarginals over joint potential states $\phi(x_j)$
MAX SAT Relaxation
Approximate Inference
§ View MAP inference as optimizing rounding probabilities
§ Expected score of a clause is a weighted noisy-or function (see the sketch below):
  $w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right)$
§ Then the expected total score is
  $\hat{W} = \sum_{R_j \in R} w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right)$
§ But $\arg\max_{p} \hat{W}$ is highly non-convex!
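A sketch of the expected-score computation under independent rounding probabilities p, reusing the made-up clauses from the earlier sketch:

```python
def expected_score(p, rules):
    """Expected weighted satisfaction when each x_i is set to 1 independently
    with probability p_i (the weighted noisy-or form)."""
    total = 0.0
    for w, pos, neg in rules:
        # Probability the clause is unsatisfied: every positive literal is false
        # and every negated literal is true.
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
print(expected_score([0.5, 0.5, 0.5], rules))  # 5.0*(1 - 0.125) + 1.0*(1 - 0.25) = 5.125
```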
Approximate Inference
§ It is the products in the objective that make it non-convex
§ The expected score can be lower bounded using the relationship between arithmetic and geometric means:
  $\frac{p_1 + p_2 + \cdots + p_k}{k} \ge \sqrt[k]{p_1 p_2 \cdots p_k}$
§ This leads to the lower bound
  $\sum_{R_j \in R} w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right) \ge \left(1 - \frac{1}{e}\right) \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} p_i + \sum_{i \in I_j^-} (1 - p_i),\ 1\right\}$
Goemans and Williamson, 1994
Approximate Inference
§ So, we solve the linear program (a sketch of solving it with an off-the-shelf solver follows)
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\ 1\right\}$
§ If we set $p_i = y_i$, a greedy rounding method will find a $\left(1 - \frac{1}{e}\right)$-optimal discrete solution
§ If we set $p_i = \frac{1}{2} y_i + \frac{1}{4}$, it improves to ¾-optimal
Goemans and Williamson, 1994
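One way to solve this linear program off the shelf (not the message-passing algorithm developed later in the talk) is to introduce an auxiliary variable z_j standing in for each clause's min term; a sketch using scipy, again with made-up clauses:

```python
import numpy as np
from scipy.optimize import linprog

# Rules: (weight, positive indices I+, negated indices I-); made up for illustration.
rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
n = 3                       # number of variables y_0..y_{n-1}
m = len(rules)

# Variables: [y_0..y_{n-1}, z_0..z_{m-1}], where z_j stands in for
# min(sum_{I+} y_i + sum_{I-} (1 - y_i), 1).
c = np.zeros(n + m)
c[n:] = [-w for w, _, _ in rules]          # maximize sum_j w_j z_j (linprog minimizes)

A_ub = np.zeros((m, n + m))
b_ub = np.zeros(m)
for j, (w, pos, neg) in enumerate(rules):
    # Encode z_j <= sum_{I+} y_i + sum_{I-} (1 - y_i) as
    # z_j - sum_{I+} y_i + sum_{I-} y_i <= |I-|.
    A_ub[j, n + j] = 1.0
    for i in pos:
        A_ub[j, i] -= 1.0
    for i in neg:
        A_ub[j, i] += 1.0
    b_ub[j] = len(neg)

bounds = [(0.0, 1.0)] * (n + m)            # y in [0,1]^n, z in [0,1]^m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
y = res.x[:n]

# Rounding probabilities that give the 3/4 guarantee in expectation.
p = 0.5 * y + 0.25
print("LP solution y:", np.round(y, 3), " rounding probs p:", np.round(p, 3))
```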
Unifying the Relaxations
Analysis
  $\arg\max_{(\theta, \mu) \in \mathbb{L}} \sum_{R_j \in R} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
[Figure: factor graph with the potentials indexed j = 1, j = 2, j = 3, and so on]
Analysis
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} \hat{\phi}_j(\mu)$
[Figure: each potential in the factor graph is replaced by its parameterized subproblem $\max_{\theta_j \,|\, (\theta_j, \mu) \in \mathbb{L}} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$]
Analysis
§ We can now analyze each potential's parameterized subproblem in isolation:
  $\hat{\phi}_j(\mu) = \max_{\theta_j \,|\, (\theta_j, \mu) \in \mathbb{L}} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
§ Using the KKT conditions, we can find a simplified expression for each solution based on the parameters µ:
  $\hat{\phi}_j(\mu) = w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
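As an illustrative numerical check (not part of the talk), the subproblem for a single clause can be solved directly as a small LP over the clause's joint states with marginals fixed to µ, and compared against the closed form; the clause and pseudomarginals below are made up:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def subproblem_lp(w, pos, neg, mu):
    """Brute-force the LCR subproblem for one clause: maximize
    w * sum_x theta(x) * phi(x) over distributions theta on the clause's
    joint states whose single-variable marginals equal mu."""
    k = len(mu)
    states = list(itertools.product([0, 1], repeat=k))
    phi = [1.0 if any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)
           else 0.0 for x in states]
    c = [-w * f for f in phi]                      # linprog minimizes
    A_eq = [[1.0] * len(states)]                   # sum_x theta(x) = 1
    b_eq = [1.0]
    for i in range(k):                             # marginal constraints
        A_eq.append([float(x[i]) for x in states])
        b_eq.append(mu[i])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * len(states))
    return -res.fun

def closed_form(w, pos, neg, mu):
    return w * min(sum(mu[i] for i in pos) + sum(1.0 - mu[i] for i in neg), 1.0)

# Hypothetical clause x_0 OR (NOT x_1) with weight 2.0.
w, pos, neg = 2.0, [0], [1]
for mu in ([0.2, 0.9], [0.7, 0.3]):
    print(mu, subproblem_lp(w, pos, neg, mu), closed_form(w, pos, neg, mu))
```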
Analysis
§ Substitute back into the outer objective:
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} \hat{\phi}_j(\mu)$
[Figure: each potential's subproblem is replaced by its closed form $w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$]
Analysis
§ Leads to simplified, projected LCR over µ:
  $\arg\max_{\mu \in [0,1]^n} \sum_{j=1}^{m} w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
Analysis
Local Consistency Relaxation
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
MAX SAT Relaxation
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\ 1\right\}$
Evaluation
New Algorithm: Rounded LP
§ Three steps (see the sketch after this list):
  - Solve the relaxed MAP inference problem
  - Modify the pseudomarginals
  - Round to discrete solutions
§ We use the alternating direction method of multipliers (ADMM) to implement a message-passing approach [Glowinski and Marrocco, 1975; Gabay and Mercier, 1976]
§ ADMM-based inference for the MAX SAT form of the problem was originally developed for hinge-loss MRFs [Bach et al., 2015]
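A heavily simplified sketch of steps 2 and 3 (the pseudomarginal shift and a greedy rounding by the method of conditional expectations); it assumes step 1 has already produced pseudomarginals µ, e.g., from the LP sketch earlier, and is a stand-in for, not a reproduction of, the ADMM-based implementation:

```python
def expected_score(p, rules):
    """Expected weighted clause satisfaction under independent rounding probabilities p."""
    total = 0.0
    for w, pos, neg in rules:
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

def rounded_lp(mu, rules):
    """Steps 2 and 3 of the pipeline: shift pseudomarginals toward 1/2, then
    round greedily by the method of conditional expectations."""
    p = [0.5 * m + 0.25 for m in mu]        # step 2: modify pseudomarginals
    x = list(p)
    for i in range(len(p)):                 # step 3: fix variables one at a time
        x[i] = 1
        score_one = expected_score(x, rules)
        x[i] = 0
        score_zero = expected_score(x, rules)
        x[i] = 1 if score_one >= score_zero else 0
    return x

rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
mu = [0.5, 0.5, 1.0]                        # stand-in for step 1's relaxed solution
print(rounded_lp(mu, rules))
```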
Evaluation Setup
§ Compared with
  - MPLP [Globerson and Jaakkola, 2007; Sontag et al. 2008, 2012]
  - MPLP with cycle tightening
§ MPLP uses coordinate-descent dual decomposition, so rounding not applicable
§ Solved MAP in social-network opinion models with super- and submodular features
§ Measured primal score, i.e., weighted sum of satisfied clauses