Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees
Stephen H. Bach (University of Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
AISTATS 2015
This Talk
§ Markov random fields capture rich dependencies in structured data, but inference is NP-hard
§ Relaxed inference can help, but techniques have tradeoffs
§ Two approaches: Local Consistency Relaxation and MAX SAT Relaxation
Takeaways
[Diagram: Local Consistency Relaxation ⊃ MAX SAT Relaxation]
§ We can combine their advantages: quality guarantees and highly scalable message-passing algorithms
§ New inference algorithm for a broad class of structured, relational models
Modeling Relational Data with Markov Random Fields
Markov Random Fields
§ Probabilistic model for high-dimensional data (see the sketch below):
  $P(x) \propto \exp\left(w^\top \phi(x)\right)$
§ The random variables $x$ represent the data, such as whether a person has an attribute or whether a link exists
§ The potentials $\phi$ score different configurations of the data
§ The weights $w$ scale the influence of different potentials
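To make the definition concrete, here is a minimal Python sketch (not from the talk; the two variables, potentials, and weights are made up) that enumerates a toy MRF and normalizes the exponentiated weighted potentials:

```python
import itertools
import math

# Hypothetical toy MRF with two binary variables and two potentials.
def phi(x):
    x1, x2 = x
    return [
        1.0 if x1 == x2 else 0.0,   # potential rewarding agreement
        float(x1),                  # potential rewarding x1 = 1
    ]

w = [1.5, 0.5]  # weights scaling each potential's influence

# P(x) is proportional to exp(w . phi(x)); normalize by summing over all states.
def unnormalized(x):
    return math.exp(sum(wi * fi for wi, fi in zip(w, phi(x))))

states = list(itertools.product([0, 1], repeat=2))
Z = sum(unnormalized(x) for x in states)
for x in states:
    print(x, unnormalized(x) / Z)
```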
Markov Random Fields
§ Variables and potentials form graphical structure:
[Figure: factor graph with variable nodes x connected to potential nodes φ]
Modeling Relational Data
§ Many important problems have relational structure
§ Common to use logic to describe probabilistic dependencies
§ Relations in data map to logical predicates
[Figure: example scene with activity labels — crossing, waiting, queueing, walking, talking, dancing, jogging]
Logical Potentials
§ One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks [Richardson and Domingos, 2006]
  $5.0 : \text{Friends}(X, Y) \wedge \text{Smokes}(X) \Rightarrow \text{Smokes}(Y)$
§ Each first-order rule is a template for potentials
  - Ground out rule over relational data
  - The truth table of each ground rule is a potential
  - Each potential's weight comes from the rule that templated it
Logical Potentials: Grounding
  $5.0 : \text{Friends}(X, Y) \wedge \text{Smokes}(X) \Rightarrow \text{Smokes}(Y)$
  $5.0 : \text{Friends}(\text{Alexis}, \text{Bob}) \wedge \text{Smokes}(\text{Alexis}) \Rightarrow \text{Smokes}(\text{Bob})$
[Figure: social network over Alexis, Bob, Claudia, Dave, and Erin]
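A small sketch of the grounding step, assuming a hypothetical five-person domain; each instantiation of the rule yields one weighted ground potential, written here in its equivalent clausal form:

```python
import itertools

people = ["Alexis", "Bob", "Claudia", "Dave", "Erin"]
weight = 5.0

# Ground the first-order rule
#   5.0 : Friends(X, Y) AND Smokes(X) => Smokes(Y)
# by substituting every ordered pair of distinct people for (X, Y).
# The implication is equivalent to the clause: NOT Friends(X,Y) OR NOT Smokes(X) OR Smokes(Y).
ground_rules = []
for x, y in itertools.permutations(people, 2):
    clause = (weight, f"¬Friends({x},{y}) ∨ ¬Smokes({x}) ∨ Smokes({y})")
    ground_rules.append(clause)

print(len(ground_rules), "ground potentials, e.g.:", ground_rules[0])
```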
Logical Potentials
§ Let $R$ be a set of rules, where each rule $R_j$ has the general form
  $w_j : \left(\bigvee_{i \in I_j^+} x_i\right) \vee \left(\bigvee_{i \in I_j^-} \neg x_i\right)$
  - Weights $w_j \ge 0$, and sets $I_j^+$ and $I_j^-$ index variables
MAP Inference
§ MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
§ MAP inference is (scored as in the sketch below)
  $\arg\max_{x} P(x) \equiv \arg\max_{x \in \{0,1\}^n} \sum_{R_j \in R} w_j \left(\left(\bigvee_{i \in I_j^+} x_i\right) \vee \left(\bigvee_{i \in I_j^-} \neg x_i\right)\right)$
§ This MAX SAT problem is combinatorial and NP-hard!
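A sketch of the objective being maximized, using the clause representation above (index sets I_j^+ and I_j^-); the clauses, weights, and assignment are made up for illustration:

```python
# Each rule: (weight, positive variable indices I+, negated variable indices I-).
rules = [
    (5.0, [2], [0, 1]),   # x2 OR (NOT x0) OR (NOT x1)
    (1.0, [0, 1], []),    # x0 OR x1
]

def map_score(x, rules):
    """Weighted sum of satisfied clauses for a discrete assignment x in {0,1}^n."""
    total = 0.0
    for w, pos, neg in rules:
        satisfied = any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)
        total += w if satisfied else 0.0
    return total

print(map_score([1, 1, 0], rules))  # 1.0: first clause unsatisfied, second satisfied
```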
Relaxed MAP Inference
Approaches to Relaxed Inference
§ Local consistency relaxation
  - Developed in probabilistic graphical models community
  - ADVANTAGE: Many highly scalable algorithms available
  - DISADVANTAGE: No known quality guarantees for logical MRFs
§ MAX SAT relaxation
  - Developed in randomized algorithms community
  - ADVANTAGE: Provides strong quality guarantees
  - DISADVANTAGE: No algorithms designed for large-scale models
§ How can we combine these advantages?
Local Consistency Relaxation
Local Consistency Relaxation
§ LCR is a popular technique for approximating MAP in MRFs
  - Often simply called linear programming (LP) relaxation
  - Dual decomposition solves the dual to the LCR objective
§ Lots of work in PGM community, e.g.,
  - Globerson and Jaakkola, 2007
  - Wainwright and Jordan, 2008
  - Sontag et al. 2008, 2012
§ Idea: relax search over consistent marginals to simpler set
Local Consistency Relaxation
[Figure: factor graph with pseudomarginals µ attached to the variable nodes x]
Local Consistency Relaxation
[Figure: factor graph with pseudomarginals θ attached to the potential nodes φ]
Local Consistency Relaxation
  $\arg\max_{(\theta, \mu) \in \mathbb{L}} \sum_{R_j \in R} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
  µ : pseudomarginals over variable states $x$
  θ : pseudomarginals over joint potential states $\phi(x_j)$
MAX SAT Relaxation
Approximate Inference
§ View MAP inference as optimizing rounding probabilities
§ Expected score of a clause is a weighted noisy-or function (see the sketch below):
  $w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right)$
§ Then the expected total score is
  $\hat{W} = \sum_{R_j \in R} w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right)$
§ But $\arg\max_{p} \hat{W}$ is highly non-convex!
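A sketch of the expected-score computation under independent rounding probabilities p, reusing the made-up clauses from the earlier sketch:

```python
def expected_score(p, rules):
    """Expected weighted satisfaction when each x_i is set to 1 independently
    with probability p_i (the weighted noisy-or form)."""
    total = 0.0
    for w, pos, neg in rules:
        # Probability the clause is unsatisfied: every positive literal is false
        # and every negated literal is true.
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
print(expected_score([0.5, 0.5, 0.5], rules))  # 5.0*(1 - 0.125) + 1.0*(1 - 0.25) = 5.125
```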
Approximate Inference
§ It is the products in the objective that make it non-convex
§ The expected score can be lower bounded using the relationship between arithmetic and geometric means:
  $\frac{p_1 + p_2 + \cdots + p_k}{k} \ge \sqrt[k]{p_1 p_2 \cdots p_k}$
§ This leads to the lower bound
  $\sum_{R_j \in R} w_j \left(1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i\right) \ge \left(1 - \frac{1}{e}\right) \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} p_i + \sum_{i \in I_j^-} (1 - p_i),\ 1\right\}$
Goemans and Williamson, 1994
Approximate Inference
§ So, we solve the linear program (a sketch of solving it with an off-the-shelf solver follows)
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\ 1\right\}$
§ If we set $p_i = y_i$, a greedy rounding method will find a $\left(1 - \frac{1}{e}\right)$-optimal discrete solution
§ If we set $p_i = \frac{1}{2} y_i + \frac{1}{4}$, it improves to ¾-optimal
Goemans and Williamson, 1994
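One way to solve this linear program off the shelf (not the message-passing algorithm developed later in the talk) is to introduce an auxiliary variable z_j standing in for each clause's min term; a sketch using scipy, again with made-up clauses:

```python
import numpy as np
from scipy.optimize import linprog

# Rules: (weight, positive indices I+, negated indices I-); made up for illustration.
rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
n = 3                       # number of variables y_0..y_{n-1}
m = len(rules)

# Variables: [y_0..y_{n-1}, z_0..z_{m-1}], where z_j stands in for
# min(sum_{I+} y_i + sum_{I-} (1 - y_i), 1).
c = np.zeros(n + m)
c[n:] = [-w for w, _, _ in rules]          # maximize sum_j w_j z_j (linprog minimizes)

A_ub = np.zeros((m, n + m))
b_ub = np.zeros(m)
for j, (w, pos, neg) in enumerate(rules):
    # Encode z_j <= sum_{I+} y_i + sum_{I-} (1 - y_i) as
    # z_j - sum_{I+} y_i + sum_{I-} y_i <= |I-|.
    A_ub[j, n + j] = 1.0
    for i in pos:
        A_ub[j, i] -= 1.0
    for i in neg:
        A_ub[j, i] += 1.0
    b_ub[j] = len(neg)

bounds = [(0.0, 1.0)] * (n + m)            # y in [0,1]^n, z in [0,1]^m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
y = res.x[:n]

# Rounding probabilities that give the 3/4 guarantee in expectation.
p = 0.5 * y + 0.25
print("LP solution y:", np.round(y, 3), " rounding probs p:", np.round(p, 3))
```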
Unifying the Relaxations
Analysis
  $\arg\max_{(\theta, \mu) \in \mathbb{L}} \sum_{R_j \in R} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
[Figure: factor graph with the potentials indexed j = 1, j = 2, j = 3, and so on]
Analysis
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} \hat{\phi}_j(\mu)$
[Figure: each potential in the factor graph is replaced by its parameterized subproblem $\max_{\theta_j \,|\, (\theta_j, \mu) \in \mathbb{L}} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$]
Analysis
§ We can now analyze each potential's parameterized subproblem in isolation:
  $\hat{\phi}_j(\mu) = \max_{\theta_j \,|\, (\theta_j, \mu) \in \mathbb{L}} w_j \sum_{x_j} \theta_j(x_j)\, \phi_j(x_j)$
§ Using the KKT conditions, we can find a simplified expression for each solution based on the parameters µ:
  $\hat{\phi}_j(\mu) = w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
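As an illustrative numerical check (not part of the talk), the subproblem for a single clause can be solved directly as a small LP over the clause's joint states with marginals fixed to µ, and compared against the closed form; the clause and pseudomarginals below are made up:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def subproblem_lp(w, pos, neg, mu):
    """Brute-force the LCR subproblem for one clause: maximize
    w * sum_x theta(x) * phi(x) over distributions theta on the clause's
    joint states whose single-variable marginals equal mu."""
    k = len(mu)
    states = list(itertools.product([0, 1], repeat=k))
    phi = [1.0 if any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)
           else 0.0 for x in states]
    c = [-w * f for f in phi]                      # linprog minimizes
    A_eq = [[1.0] * len(states)]                   # sum_x theta(x) = 1
    b_eq = [1.0]
    for i in range(k):                             # marginal constraints
        A_eq.append([float(x[i]) for x in states])
        b_eq.append(mu[i])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * len(states))
    return -res.fun

def closed_form(w, pos, neg, mu):
    return w * min(sum(mu[i] for i in pos) + sum(1.0 - mu[i] for i in neg), 1.0)

# Hypothetical clause x_0 OR (NOT x_1) with weight 2.0.
w, pos, neg = 2.0, [0], [1]
for mu in ([0.2, 0.9], [0.7, 0.3]):
    print(mu, subproblem_lp(w, pos, neg, mu), closed_form(w, pos, neg, mu))
```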
Analysis
§ Substitute back into the outer objective:
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} \hat{\phi}_j(\mu)$
[Figure: each potential's subproblem is replaced by its closed form $w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$]
Analysis
§ Leads to simplified, projected LCR over µ:
  $\arg\max_{\mu \in [0,1]^n} \sum_{j=1}^{m} w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
Analysis
Local Consistency Relaxation
  $\arg\max_{\mu \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} \mu_i + \sum_{i \in I_j^-} (1 - \mu_i),\ 1\right\}$
MAX SAT Relaxation
  $\arg\max_{y \in [0,1]^n} \sum_{R_j \in R} w_j \min\left\{\sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\ 1\right\}$
Evaluation
New Algorithm: Rounded LP
§ Three steps (see the sketch after this list):
  - Solve the relaxed MAP inference problem
  - Modify the pseudomarginals
  - Round to discrete solutions
§ We use the alternating direction method of multipliers (ADMM) to implement a message-passing approach [Glowinski and Marrocco, 1975; Gabay and Mercier, 1976]
§ ADMM-based inference for the MAX SAT form of the problem was originally developed for hinge-loss MRFs [Bach et al., 2015]
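A heavily simplified sketch of steps 2 and 3 (the pseudomarginal shift and a greedy rounding by the method of conditional expectations); it assumes step 1 has already produced pseudomarginals µ, e.g., from the LP sketch earlier, and is a stand-in for, not a reproduction of, the ADMM-based implementation:

```python
def expected_score(p, rules):
    """Expected weighted clause satisfaction under independent rounding probabilities p."""
    total = 0.0
    for w, pos, neg in rules:
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

def rounded_lp(mu, rules):
    """Steps 2 and 3 of the pipeline: shift pseudomarginals toward 1/2, then
    round greedily by the method of conditional expectations."""
    p = [0.5 * m + 0.25 for m in mu]        # step 2: modify pseudomarginals
    x = list(p)
    for i in range(len(p)):                 # step 3: fix variables one at a time
        x[i] = 1
        score_one = expected_score(x, rules)
        x[i] = 0
        score_zero = expected_score(x, rules)
        x[i] = 1 if score_one >= score_zero else 0
    return x

rules = [(5.0, [2], [0, 1]), (1.0, [0, 1], [])]
mu = [0.5, 0.5, 1.0]                        # stand-in for step 1's relaxed solution
print(rounded_lp(mu, rules))
```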
Evaluation Setup
§ Compared with
  - MPLP [Globerson and Jaakkola, 2007; Sontag et al. 2008, 2012]
  - MPLP with cycle tightening
§ MPLP uses coordinate-descent dual decomposition, so rounding not applicable
§ Solved MAP in social-network opinion models with super- and submodular features
§ Measured primal score, i.e., weighted sum of satisfied clauses