MAP Estimation, Message Passing and Perfect Graphs (Tony Jebara)


  1. MAP Estimation, Message Passing and Perfect Graphs. Tony Jebara, November 25, 2009

  2. Outline
     1. Background: Perfect Graphs; Graphical Models
     2. Matchings: Bipartite Matching; Generalized Matching
     3. Perfect Graphs: nand Markov Random Fields; Packing Linear Programs; Recognizing Perfect Graphs
     4. MAP Estimation: Proving Exact MAP; Pruning NMRFs; MAP Experiments; Conclusions

  3. Background on Perfect Graphs
     In 1960, Berge introduced perfect graphs together with two conjectures:
     Perfect: every induced subgraph of G has clique number = chromatic number
     Weak conjecture: G is perfect iff its complement is perfect
     Strong conjecture: a graph is perfect iff it is Berge
     Weak perfect graph theorem (Lovász 1972)
     Link between perfection and integral LPs (Lovász 1972)
     The strong perfect graph theorem (SPGT) remained open for over four decades

  4. Background on Perfect Graphs
     SPGT proof (Chudnovsky, Robertson, Seymour, Thomas 2003); Berge passed away shortly after hearing of the proof
     Many problems that are NP-hard and hard to approximate in general are in P for perfect graphs:
     graph coloring, maximum clique, maximum independent set
     Recognizing perfect graphs takes O(n^9) time (Chudnovsky et al. 2006)

  5. Graphical Models
     [Figure: factor graph over variables x_1, ..., x_6]
     Perfect graph theory for MAP and graphical models (J 2009)
     Graphical model: a factor graph G = (V, E) representing a distribution p(X), where X = {x_1, ..., x_n} and x_i ∈ Z
     The distribution factorizes as a product of functions (squares) over subsets of variables (adjacent nodes):
     p(x_1, ..., x_n) = (1/Z) ∏_{c ∈ C} ψ_c(X_c)
     E.g. p(x_1, ..., x_6) = ψ(x_1, x_2) ψ(x_2, x_3) ψ(x_3, x_4, x_5) ψ(x_4, x_5, x_6)

  6. MAP Estimation
     A canonical problem: find the most probable configuration X* = argmax p(x_1, ..., x_n)
     Useful for image processing, protein folding, coding, etc.
     Brute force requires ∏_{i=1}^{n} |x_i| evaluations (a sketch follows below)
     Efficient for trees and singly linked graphs (Pearl 1988)
     NP-hard for general graphs (Shimony 1994)
     Approach A: relaxations and variational methods
     First-order LP relaxations (Wainwright et al. 2002)
     TRW max-product (Kolmogorov & Wainwright 2006)
     Higher-order LP relaxations (Sontag et al. 2008)
     Fractional and integral LP rounding (Ravikumar et al. 2008)
     Open problem: when are LPs tight?
     Approach B: max product and message passing
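To make the brute-force cost concrete, here is a minimal sketch that enumerates all ∏_i |x_i| configurations of the six-variable example from the previous slide; the potential tables are hypothetical (random), purely for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical potentials for p ∝ ψ(x1,x2) ψ(x2,x3) ψ(x3,x4,x5) ψ(x4,x5,x6),
# all variables binary.
psi12 = rng.random((2, 2))
psi23 = rng.random((2, 2))
psi345 = rng.random((2, 2, 2))
psi456 = rng.random((2, 2, 2))

def score(x):
    x1, x2, x3, x4, x5, x6 = x
    return psi12[x1, x2] * psi23[x2, x3] * psi345[x3, x4, x5] * psi456[x4, x5, x6]

# Enumerate all 2^6 = 64 configurations; the cost is prod_i |x_i| in general.
x_star = max(itertools.product((0, 1), repeat=6), key=score)
print(x_star, score(x_star))
```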

  7. Max Product Message Passing
     1. For each x_i to each X_c: m_{i→c}^{t+1} = ∏_{d ∈ Ne(i)\c} m_{d→i}^{t}
     2. For each X_c to each x_i: m_{c→i}^{t+1} = max_{X_c \ x_i} ψ_c(X_c) ∏_{j ∈ c\i} m_{j→c}^{t}
     3. Set t = t + 1 and go to 1 until convergence
     4. Output x_i* = argmax_{x_i} ∏_{d ∈ Ne(i)} m_{d→i}^{t}
     A simple and fast algorithm for MAP (see the sketch after this slide)
     Exact for trees (Pearl 1988)
     Converges for single-loop graphs (Weiss & Freeman 2001)
     Local optimality guarantees (Wainwright et al. 2003)
     Performs well in practice for images, turbo codes, etc.
     Similar to the first-order LP relaxation
     Recent progress:
     Exact for matchings (Bayati et al. 2005)
     Exact for generalized b-matchings (Huang and J 2007)
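A compact max-product sketch following steps 1 to 4 above, for a factor graph given as a dict mapping factor names to (variable tuple, potential table). The function name and data layout are illustrative, not Jebara's released code:

```python
import numpy as np

def max_product(factors, card, iters=50):
    """factors: {c: (vars, table)}; card: {i: |x_i|}. Returns a MAP guess."""
    var_to_fac = {(i, c): np.ones(card[i]) for c, (vs, _) in factors.items() for i in vs}
    fac_to_var = {(c, i): np.ones(card[i]) for c, (vs, _) in factors.items() for i in vs}
    nbrs = {i: [c for c, (vs, _) in factors.items() if i in vs] for i in card}
    for _ in range(iters):
        # Step 1: variable-to-factor messages (product of the other incoming messages)
        for (i, c) in var_to_fac:
            m = np.ones(card[i])
            for d in nbrs[i]:
                if d != c:
                    m = m * fac_to_var[(d, i)]
            var_to_fac[(i, c)] = m / m.max()          # normalize for stability
        # Step 2: factor-to-variable messages (max out all other variables)
        for (c, i) in fac_to_var:
            vs, table = factors[c]
            t = np.asarray(table, dtype=float).copy()
            for axis, j in enumerate(vs):
                if j != i:
                    shape = [1] * t.ndim
                    shape[axis] = card[j]
                    t = t * var_to_fac[(j, c)].reshape(shape)
            others = tuple(a for a in range(t.ndim) if a != vs.index(i))
            msg = t.max(axis=others) if others else t
            fac_to_var[(c, i)] = msg / msg.max()
        # Step 3 is this loop; a fixed iteration cap stands in for a convergence test
    # Step 4: decode each variable from its incoming messages
    beliefs = {i: np.prod([fac_to_var[(c, i)] for c in nbrs[i]], axis=0) for i in card}
    return {i: int(np.argmax(b)) for i, b in beliefs.items()}

# e.g. a tree ψ(x1,x2) ψ(x2,x3) with binary variables; exact per Pearl (1988)
factors = {'a': ((1, 2), [[1.0, 3.0], [2.0, 1.0]]),
           'b': ((2, 3), [[1.0, 2.0], [4.0, 1.0]])}
print(max_product(factors, {1: 2, 2: 2, 3: 2}))   # {1: 0, 2: 1, 3: 0}
```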

  8. Bipartite Matching
     Example weights W and the optimal assignment C:
                Motorola  Apple  IBM
     "laptop"      $0      $2    $2           [0 1 0]
     "server"      $0      $2    $3    → C =  [0 0 1]
     "phone"       $2      $3    $0           [1 0 0]
     Given W, max_{C ∈ B^{n×n}} ∑_{ij} W_ij C_ij such that ∑_i C_ij = ∑_j C_ij = 1
     The classical Hungarian marriage problem, O(n^3)
     Creates a very loopy graphical model
     Max product takes O(n^3) for exact MAP (Bayati et al. 2005)
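The slide's 3x3 example can be checked with SciPy's Hungarian-style solver; linear_sum_assignment with maximize=True returns the highest-weight perfect matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

W = np.array([[0, 2, 2],    # "laptop":  Motorola, Apple, IBM
              [0, 2, 3],    # "server"
              [2, 3, 0]])   # "phone"
rows, cols = linear_sum_assignment(W, maximize=True)
C = np.zeros_like(W)
C[rows, cols] = 1
print(C)                     # recovers the C shown above
print(W[rows, cols].sum())   # total weight 7
```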

  9. Bipartite Generalized Matching
     Example weights W and an optimal b = 2 assignment C:
                Motorola  Apple  IBM
     "laptop"      $0      $2    $2           [0 1 1]
     "server"      $0      $2    $3    → C =  [1 0 1]
     "phone"       $2      $3    $0           [1 1 0]
     Given W, max_{C ∈ B^{n×n}} ∑_{ij} W_ij C_ij such that ∑_i C_ij = ∑_j C_ij = b
     The combinatorial b-matching problem, O(bn^3) (Google AdWords)
     Creates a very loopy graphical model
     Max product takes O(bn^3) for exact MAP (Huang & J 2007)
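Because the bipartite degree-constraint matrix is totally unimodular, the LP relaxation of bipartite b-matching is integral, so a plain LP solver already recovers a 0/1 optimum (the same integrality theme returns with Lovász's lemma later in the talk). A sketch with SciPy's linprog on the same W with b = 2; ties between optima may yield a different integral C:

```python
import numpy as np
from scipy.optimize import linprog

W = np.array([[0, 2, 2], [0, 2, 3], [2, 3, 0]])
n, b = 3, 2
# Variables C_ij flattened row-major; maximize sum_ij W_ij C_ij.
A_eq, b_eq = [], []
for i in range(n):                         # row sums: sum_j C_ij = b
    row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1
    A_eq.append(row); b_eq.append(b)
for j in range(n):                         # column sums: sum_i C_ij = b
    col = np.zeros(n * n); col[j::n] = 1
    A_eq.append(col); b_eq.append(b)
res = linprog(-W.ravel(), A_eq=np.vstack(A_eq), b_eq=b_eq, bounds=(0, 1))
print(res.x.reshape(n, n).round(3))        # integral 0/1 matrix, e.g. the C above
print(-res.fun)                            # optimal total weight 12
```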

  10. Bipartite Generalized Matching
      [Figure: bipartite graph with nodes u_1, ..., u_4 and v_1, ..., v_4]
      Graph G = (U, V, E) with U = {u_1, ..., u_n} and V = {v_1, ..., v_n}, and M(·) giving the set of neighbors of a node u_i or v_j
      Define x_i ∈ X and y_j ∈ Y where x_i = M(u_i) and y_j = M(v_j)
      Then p(X, Y) = (1/Z) ∏_i ∏_j ψ(x_i, y_j) ∏_k φ(x_k) φ(y_k) where
      φ(y_j) = exp(∑_{u_i ∈ y_j} W_ij) and ψ(x_i, y_j) = ¬((v_j ∈ x_i) ⊕ (u_i ∈ y_j))
      (a scoring sketch follows below)
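A small scoring sketch for this MRF: x_i and y_j are represented as sets of matched indices; ψ returns 0 (log-score of minus infinity) on any inconsistent pair, and the φ terms count each selected edge's weight once from each side. Names and data layout are illustrative only:

```python
def log_score(X, Y, W):
    """X[i] = indices j with v_j in x_i; Y[j] = indices i with u_i in y_j."""
    n = len(X)
    # ψ(x_i, y_j) = ¬((v_j in x_i) XOR (u_i in y_j)): any mismatch zeroes p(X, Y)
    if any((j in X[i]) != (i in Y[j]) for i in range(n) for j in range(n)):
        return float('-inf')
    # φ(x_i) and φ(y_j) together count each matched edge's weight twice
    return (sum(W[i][j] for i in range(n) for j in X[i]) +
            sum(W[i][j] for j in range(n) for i in Y[j]))

W = [[0, 2, 2], [0, 2, 3], [2, 3, 0]]
X = [{1}, {2}, {0}]          # u_1→v_2, u_2→v_3, u_3→v_1 (0-indexed)
Y = [{2}, {0}, {1}]
print(log_score(X, Y, W))    # 14, i.e. twice the matching weight 7
```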

  11. Bipartite Generalized Matching
      Theorem (Huang & J 2007): Max product on G converges in O(bn^3) time.
      Proof sketch: form the unwrapped tree T of depth Ω(n); maximizing the belief at the root of T is equivalent to maximizing the belief at the corresponding node in G.
      [Figure: unwrapped tree rooted at u_1, with children v_1, ..., v_4, each expanding to u_2, u_3, u_4]
      Theorem (Salez & Shah 2009): Under mild assumptions, max product 1-matching is O(n^2).

  12. Bipartite Generalized Matching
      Code at http://www.cs.columbia.edu/~jebara/code

  13. Generalized Matching
      [Figure: median running times of BP vs. GOBLIN, for b = 5 and b = ⌊n/2⌋, as n and b grow]
      Applications:
      unipartite matching
      clustering (J & S 2006)
      classification (H & J 2007)
      collaborative filtering (H & J 2009)
      semisupervised learning (J et al. 2009)
      visualization (S & J 2009)
      Max product is O(n^2) and beats other solvers (Salez & Shah 2009)

  14. Unipartite Generalized Matching
      [Figure: a k-nearest-neighbors graph with k = 2]

  15. Unipartite Generalized Matching
      [Figure: a unipartite b-matching with b = 2]

  16. Unipartite Generalized Matching
      [Figure: left, k-nearest neighbors; right, unipartite b-matching]

  17. Unipartite Generalized Matching
      Example weights W and a b = 2 matching C over points p_1, ..., p_4:
            p_1  p_2  p_3  p_4
      p_1 [  0    2    1    2 ]           [0 1 0 1]
      p_2 [  2    0    2    1 ]    → C =  [1 0 1 0]
      p_3 [  1    2    0    2 ]           [0 1 0 1]
      p_4 [  2    1    2    0 ]           [1 0 1 0]
      max_{C ∈ B^{n×n}, C_ii = 0} ∑_{ij} W_ij C_ij such that ∑_i C_ij = b and C_ij = C_ji
      Combinatorial unipartite matching is efficient (Edmonds 1965); see the sketch after this slide
      It yields an LP with exponentially many blossom inequalities
      Max product is exact if the LP is integral (Sanghavi et al. 2008):
      p(X) = ∏_{i ∈ V} δ[∑_{j ∈ Ne(i)} x_ij ≤ 1] ∏_{ij ∈ E} exp(W_ij x_ij)
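For the b = 1 unipartite case, Edmonds' blossom algorithm is readily available; this sketch runs NetworkX's max_weight_matching on the 4-node example above (general b > 1 needs a b-matching solver, such as the BP code linked earlier):

```python
import networkx as nx

W = [[0, 2, 1, 2],
     [2, 0, 2, 1],
     [1, 2, 0, 2],
     [2, 1, 2, 0]]
G = nx.Graph()
for i in range(4):
    for j in range(i + 1, 4):
        G.add_edge(i, j, weight=W[i][j])   # symmetric weights, no self-loops
# Edmonds (1965) blossom algorithm under the hood; ties may pick either
# of the two weight-4 perfect matchings.
print(nx.max_weight_matching(G))           # e.g. {(0, 1), (2, 3)}
```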

  18. Back to Perfect Graphs
      Max product and exact MAP depend on the LP's integrality
      Matchings have special integral LPs (Edmonds 1965)
      How can we generalize beyond matchings?
      Perfect graphs imply LP integrality (Lovász 1972)
      Lemma (Lovász 1972): For every non-negative vector f ∈ R^N, the linear program
      β = max_{x ∈ R^N} f⊤x subject to x ≥ 0 and Ax ≤ 1
      recovers a vector x which is integral if and only if the (undominated) rows of A form the vertex versus maximal-clique incidence matrix of some perfect graph.

  19. Back to Perfect Graphs
      Lemma (Lovász 1972): β = max_{x ∈ R^N} f⊤x subject to x ≥ 0 and Ax ≤ 1
      Example over variables x_1, ..., x_6:
          [ 1 1 0 0 0 0 ]
      A = [ 0 1 1 0 0 0 ]
          [ 0 0 1 1 1 0 ]
          [ 0 0 0 1 1 1 ]
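A quick numerical check of the lemma on this A, whose rows are the maximal cliques {x1,x2}, {x2,x3}, {x3,x4,x5}, {x4,x5,x6} of a chordal (hence perfect) graph; the weight vector f below is an arbitrary nonnegative choice:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1]])
f = np.array([2.0, 1.0, 3.0, 1.0, 2.0, 1.0])    # arbitrary nonnegative weights
# maximize f'x  s.t.  x >= 0, Ax <= 1  (linprog minimizes, so negate f)
res = linprog(-f, A_ub=A, b_ub=np.ones(4), bounds=(0, None))
print(res.x)     # integral: [1, 0, 1, 0, 0, 1], a max-weight stable set
print(-res.fun)  # β = 6
```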

  20. nand Markov Random Fields
      Lovász's lemma does not by itself solve max p(X) on G
      We have p(x_1, ..., x_n) = (1/Z) ∏_{c ∈ C} ψ_c(X_c)
      How can we apply the lemma to any model G and space X?
      Without loss of generality, assume ψ_c(X_c) ← ψ_c(X_c) / min_{X_c} ψ_c(X_c) + ε
      Consider a procedure to convert G into a graph G′ in NMRF form
      An NMRF is a nand Markov random field over a space X′ = {x_1, ..., x_N} where
      all variables are binary and
      all potential functions are pairwise nand gates Φ(x_i, x_j) = δ[x_i + x_j ≤ 1]

  21. nand Markov Random Fields
      [Figure: a binary graphical model G (left) and its nand MRF G′ (right)]
      Initialize G′ as the empty graph
      For each clique c in graph G:
        For each configuration k ∈ X_c:
          add a corresponding binary node x_{c,k} to G′
          for each x_{d,l} ∈ G′ which is incompatible with x_{c,k}:
            connect x_{c,k} and x_{d,l} with an edge
      (a sketch of this conversion follows below)
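A sketch of the conversion loop above: one binary node per (clique, configuration) pair, with a nand edge whenever two nodes cannot hold simultaneously, either because they are two settings of the same clique or because they assign a shared variable differently. The dict-based clique format is illustrative only:

```python
import itertools

def to_nmrf(cliques):
    """cliques: {name: (vars tuple, cardinality tuple)} -> (nodes, edges)."""
    nodes, edges = [], []
    for c, (vs, cards) in cliques.items():
        for k in itertools.product(*[range(d) for d in cards]):
            nodes.append((c, vs, k))       # binary node x_{c,k}
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            (c, vs1, k1), (d, vs2, k2) = nodes[a], nodes[b]
            same_clique = (c == d)         # two settings of one clique are exclusive
            conflict = any(k1[vs1.index(v)] != k2[vs2.index(v)]
                           for v in set(vs1) & set(vs2))
            if same_clique or conflict:
                edges.append((a, b))       # pairwise nand: x_a + x_b <= 1
    return nodes, edges

# e.g. two overlapping binary cliques ψ(x1, x2) and ψ(x2, x3)
nodes, edges = to_nmrf({'c1': ((1, 2), (2, 2)), 'c2': ((2, 3), (2, 2))})
print(len(nodes), len(edges))              # 8 nodes, 20 nand edges
```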
