Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay
Frederic Koehler, Massachusetts Institute of Technology
NeurIPS 2019
Graphical models

Ising model: for x ∈ {±1}^n,

    Pr(X = x) = (1/Z) exp( (1/2) x^T J x + h^T x )

A natural model of correlated random variables. Some examples: Hopfield networks; Restricted Boltzmann Machine (RBM) = bipartite Ising model.

[Figure: example graphical model on nodes X_a, ..., X_f and X_IA, X_OH, X_WI, X_MI, X_MN]

Popular model in ML, the natural and social sciences, etc.
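To make the definition concrete, here is a minimal Python/NumPy sketch (my own illustration, not from the talk; the function names are mine) that evaluates the unnormalized log-weight and computes Z by brute force for small n:

```python
import itertools
import numpy as np

def ising_log_weight(x, J, h):
    # Unnormalized log-probability of a configuration: (1/2) x^T J x + h^T x
    return 0.5 * x @ J @ x + h @ x

def partition_function(J, h):
    # Exact Z by summing over all 2^n configurations -- only feasible for small n
    n = len(h)
    return sum(np.exp(ising_log_weight(np.array(x), J, h))
               for x in itertools.product([-1, 1], repeat=n))
```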
Inference

Inference: given J, h, compute properties of the model, e.g. E[X_i] or E[X_i | X_j = x_j].

[Figure: graph on X_IA, X_OH, X_WI, X_MI, X_MN; query: E[X_WI | X_OH = +1] = ?]

Problem: inference in Ising models (e.g. approximating E[X_i]) is NP-hard! Natural Markov chain approaches (e.g. Gibbs sampling) may mix very slowly.
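For intuition on why exact inference scales badly, here is an illustrative brute-force marginal computation (a sketch of mine, reusing ising_log_weight from above); it costs 2^n weight evaluations:

```python
def exact_mean(J, h, i):
    # E[X_i] by exhaustive enumeration -- exponential in n, hence intractable at scale
    num, Z = 0.0, 0.0
    for x in itertools.product([-1, 1], repeat=len(h)):
        x = np.array(x)
        w = np.exp(ising_log_weight(x, J, h))
        Z += w
        num += w * x[i]
    return num / Z
```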
Variational Inference

Variational objectives (Mean-Field/VI, Bethe):

    Φ_MF(x) = (1/2) x^T J x + h^T x + Σ_i H( Ber( (1 + x_i)/2 ) )

    Φ_Bethe(P) = E_P[ (1/2) X^T J X + h^T X ] + Σ_{e ∈ E} H_P(X_e) − Σ_i (deg(i) − 1) H_P(X_i)

Message-passing algorithms (MF/VI, BP):

    x^(t+1) = tanh^{⊗n}( J x^(t) + h )    (tanh applied coordinatewise)

    ν^(t+1)_{i→j} = tanh( h_i + Σ_{k ∈ ∂i \ j} tanh^{-1}( tanh(J_ik) ν^(t)_{k→i} ) )

Non-convex objective: when do these algorithms find global optima?
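A minimal sketch of both updates as written above (my own illustration; storing BP messages in a dict keyed by directed edges and passing an adjacency list `neighbors` are my data-structure choices, not from the talk):

```python
def mean_field_step(x, J, h):
    # Coordinatewise mean-field update: x <- tanh(J x + h)
    return np.tanh(J @ x + h)

def bp_step(nu, J, h, neighbors):
    # One synchronous BP sweep in the tanh parametrization:
    # nu[i->j] <- tanh( h_i + sum_{k in N(i)\{j}} atanh( tanh(J_ik) * nu[k->i] ) )
    new = {}
    for (i, j) in nu:
        s = h[i] + sum(np.arctanh(np.tanh(J[i, k]) * nu[(k, i)])
                       for k in neighbors[i] if k != j)
        new[(i, j)] = np.tanh(s)
    return new
```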
Our Assumption

We suppose, following Dembo-Montanari '10, that the model is ferromagnetic:

    J_ij ≥ 0, h_i ≥ 0 for all i, j

I.e. neighbors want to align. This assumption is necessary: without it, computing the optimal mean-field approximation, even approximately, is NP-hard, and the objective typically has sub-optimal critical points. (cf. correlation decay)
Our Theorems

Fix a ferromagnetic Ising model (J, h) with m edges and n nodes.

Theorem (Mean-Field Convergence). Let x* be a global maximizer of Φ_MF. Initializing with x^(0) = (1, ..., 1) and defining x^(1), x^(2), ... by iterating the mean-field equations, for every t ≥ 1:

    0 ≤ Φ_MF(x*) − Φ_MF(x^(t)) ≤ min( ( (‖J‖_1 + ‖h‖_1) / t )^{4/3}, 2(‖J‖_1 + ‖h‖_1) / t )

Theorem (BP Convergence). Let P* be a global maximizer of Φ_Bethe. Initializing ν^(0)_{i→j} = 1 for all i ∼ j and defining ν^(1), ν^(2), ... by BP iteration,

    0 ≤ Φ_Bethe(P*) − Φ*_Bethe(ν^(t)) ≤ √( 8mn(1 + ‖J‖_∞) / t ).
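As a sanity check of the mean-field statement, one can run the iteration from the all-ones start on a small random ferromagnetic model and evaluate the objective. This is an illustrative sketch under my own choice of parameters (phi_mf, the 0.1 coupling scale, and the model size are mine, not from the talk):

```python
def phi_mf(x, J, h):
    # Mean-field objective: (1/2) x^T J x + h^T x + sum_i H(Ber((1 + x_i)/2))
    p = np.clip((1 + x) / 2, 1e-12, 1 - 1e-12)
    H = -(p * np.log(p) + (1 - p) * np.log(1 - p)).sum()
    return 0.5 * x @ J @ x + h @ x + H

rng = np.random.default_rng(0)
n = 10
J = 0.1 * rng.random((n, n))    # nonnegative couplings => ferromagnetic
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
h = 0.1 * rng.random(n)         # nonnegative external field

x = np.ones(n)                  # x^(0) = (1, ..., 1), as in the theorem
for t in range(1, 101):
    x = np.tanh(J @ x + h)      # mean-field iteration
print(phi_mf(x, J, h))          # gap to the optimum shrinks at the rate in the theorem
```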
For More

The poster: Poster 174, Wednesday 10:45-12:45
The paper: https://arxiv.org/abs/1905.09992