
Bilinear generalized approximate message passing (BiG-AMP) for High Dimensional Inference



  1. Bilinear generalized approximate message passing (BiG-AMP) for High Dimensional Inference. Phil Schniter. Collaborators: Jason Parker @OSU, Jeremy Vila @OSU, and Volkan Cevher @EPFL. With support from NSF CCF-1218754, NSF CCF-1018368, NSF IIP-0968910, and DARPA/ONR N66001-10-1-4090. Oct. 10, 2013.

  2. BiG-AMP Motivation: Four Important High Dimensional Inference Problems
     1. Matrix Completion (MC): Recover a low-rank matrix Z from noise-corrupted, incomplete observations Y = P_Ω(Z + W).
     2. Robust Principal Components Analysis (RPCA): Recover a low-rank matrix Z and a sparse matrix S from noise-corrupted observations Y = Z + S + W.
     3. Dictionary Learning (DL): Recover a (possibly overcomplete) dictionary A and a sparse matrix X from noise-corrupted observations Y = AX + W.
     4. Non-negative Matrix Factorization (NMF): Recover non-negative matrices A and X from noise-corrupted observations Y = AX + W.
     The following generalizations may also be of interest:
     - RPCA, DL, or NMF with incomplete observations.
     - RPCA or DL with structured sparsity.
     - Any of the above with non-additive corruptions (e.g., one-bit or phaseless Y).
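For concreteness, the four observation models above can be simulated as in the minimal NumPy sketch below. The dimensions, noise variance, and sparsity/observation rates are illustrative assumptions, not values from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    M, L, N = 100, 100, 5            # assumed dimensions: Z is M x L with rank N
    v_w = 1e-2                       # assumed AWGN variance

    A = rng.standard_normal((M, N))
    X = rng.standard_normal((N, L))
    Z = A @ X                        # low-rank product
    W = np.sqrt(v_w) * rng.standard_normal((M, L))

    # 1) Matrix completion: noisy entries observed only on a random index set Omega
    Omega = rng.random((M, L)) < 0.2
    Y_mc = np.where(Omega, Z + W, np.nan)

    # 2) Robust PCA: low-rank Z plus sparse outliers S plus noise
    S = (rng.random((M, L)) < 0.05) * 10.0 * rng.standard_normal((M, L))
    Y_rpca = Z + S + W

    # 3) Dictionary learning: dense dictionary A times sparse coefficients X
    X_sparse = X * (rng.random((N, L)) < 0.3)
    Y_dl = A @ X_sparse + W

    # 4) Non-negative matrix factorization: non-negative factors
    Y_nmf = np.abs(A) @ np.abs(X) + W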

  3. BiG-AMP Contributions
     We propose a novel unified approach to these matrix-recovery problems that leverages the recent framework of approximate message passing (AMP).
     While previous AMP algorithms have been proposed for the linear model:
       Infer x ∼ ∏_n p_x(x_n) from y = Φx + w with AWGN w and known Φ. [Donoho/Maleki/Montanari'10]
     or the generalized linear model:
       Infer x ∼ ∏_n p_x(x_n) from y ∼ ∏_m p_{y|z}(y_m | z_m) with hidden z = Φx and known Φ. [Rangan'10]
     our work tackles the generalized bilinear model:
       Infer A ∼ ∏_{m,n} p_a(a_mn) and X ∼ ∏_{n,l} p_x(x_nl) from Y ∼ ∏_{m,l} p_{y|z}(y_ml | z_ml) with hidden Z = AX. [Schniter/Cevher'11]
     In addition, we propose methods to select the rank of Z, to estimate the parameters of p_a, p_x, p_{y|z}, and to handle non-separable priors on A, X, Y|Z.

  4. Outline
     1. Bilinear Generalized AMP (BiG-AMP): background on AMP, BiG-AMP heuristics, example configurations/applications.
     2. Practicalities: adaptive damping, parameter tuning, rank selection, non-separable priors.
     3. Numerical results: matrix completion, robust PCA, dictionary learning, hyperspectral unmixing (via NMF).

  5. Bilinear Generalized AMP (BiG-AMP): Description
     BiG-AMP is a Bayesian approach that uses approximate message passing (AMP) strategies to infer (Z, A, X).
     [Figure: factor graphs of the generalized bilinear model (prior factors p_a, p_x attached to variables a_mn, x_nl, and likelihood factors p_{y|z}(y_ml | ·)), shown alongside the generalized linear model for comparison.]
     In AMP, beliefs are propagated on a loopy factor graph using approximations that exploit certain blessings of dimensionality:
     1. Gaussian message approximation (motivated by the central limit theorem),
     2. Taylor-series approximation of message differences.
     Rigorous analyses of GAMP for CS (with large i.i.d. sub-Gaussian Φ) reveal a state evolution whose fixed points are optimal when unique. [Javanmard/Montanari'12]
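The Gaussian message approximation rests on the central limit theorem: each z_i = Σ_n a_n x_n is a sum of many independent terms, so it is close to a Gaussian with matched mean and variance even when the priors on a_n and x_n are far from Gaussian. Below is a small numerical check of that idea; the uniform and Bernoulli-Gaussian priors are illustrative assumptions, not the slides' configurations.

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials = 200, 10_000

    # assumed, clearly non-Gaussian priors: uniform a_n, Bernoulli-Gaussian x_n
    a = rng.uniform(-1.0, 1.0, size=(trials, N))
    x = (rng.random((trials, N)) < 0.1) * rng.standard_normal((trials, N))
    z = np.sum(a * x, axis=1)                 # z = sum_n a_n x_n

    # variance predicted by matching the mean/variance of each term a_n x_n
    var_term = (1.0 / 3.0) * 0.1              # E[a^2] = 1/3, E[x^2] = 0.1, zero means
    print("empirical var:", z.var(), " predicted:", N * var_term)

    # excess kurtosis shrinks roughly like 1/N relative to a single term's,
    # so a small value here signals that the sum is nearly Gaussian
    kurt = np.mean((z - z.mean())**4) / z.var()**2 - 3.0
    print("excess kurtosis of the sum:", kurt)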

  6. BiG-AMP Sum-Product Heuristics
     [Figure: factor-graph fragment showing the messages exchanged between the likelihood nodes p_{y|z}(y_i | ·) and the variable nodes of X (and similarly A).]
     1. Message from the i-th node of Z to the j-th node of X:
        p_{x,i→j}(x_j) ∝ ∫_{ {a_n}_{n=1}^N, {x_n}_{n≠j} } p_{y|z}( y_i | Σ_n a_n x_n ) ∏_n p_{a,i←n}(a_n) ∏_{n≠j} p_{x,i←n}(x_n)
                       ≈ ∫_{z_i} p_{y|z}(y_i | z_i) N( z_i; ẑ_i(x_j), ν^z_i(x_j) )   since z_i | x_j ≈ Gaussian via the CLT!
                       ≈ Gaussian in x_j (exact for AWGN!)
        (A similar thing then happens with the messages from Z to A.)
        To compute ẑ_i(x_j) and ν^z_i(x_j), the means and variances of p_{x,i←n} and p_{a,i←n} suffice, and thus we have Gaussian message passing!
     2. Although Gaussian, we still have 4MLN messages to compute (too many!). Exploiting similarity among the messages {p_{x,i←j}}_{i=1}^M, we employ a Taylor-series approximation whose error vanishes as M → ∞. (Same for {p_{a,i←j}}_{i=1}^L with L → ∞.) In the end, we only need to compute O(ML) messages!
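The practical payoff of the Gaussian approximation is that only means and variances need to be propagated. The hedged sketch below shows the plug-in mean/variance of each z_ml under independent, Gaussian-approximated a_mn and x_nl; it omits the Onsager-style correction terms used in the full BiG-AMP algorithm, and the function name is an assumption.

    import numpy as np

    def plugin_mean_var(A_hat, A_var, X_hat, X_var):
        """Moment-matched Gaussian for each z_ml = sum_n a_mn x_nl, given
        elementwise means (A_hat, X_hat) and variances (A_var, X_var)."""
        p_mean = A_hat @ X_hat
        # var(a x) = E[a]^2 var(x) + var(a) E[x]^2 + var(a) var(x) for independent a, x
        p_var = (A_hat**2) @ X_var + A_var @ (X_hat**2) + A_var @ X_var
        return p_mean, p_var

Each of these matrix products costs O(MNL) multiplies, so the per-iteration cost scales with a few matrix multiplications rather than with the 4MNL individual messages.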

  7. Example Configurations
     1. Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(Z + W).
        a_mn ∼ N(0, 1), x_nl ∼ N(μ_x, v^x), and y_ml | z_ml ∼ N(z_ml, v^w) for (m,l) ∈ Ω, trivial (uninformative) likelihood for (m,l) ∉ Ω.
     2. Robust PCA (RPCA):
        a) Recover low-rank Z = AX from Y = Z + E.
           a_mn ∼ N(0, 1), x_nl ∼ N(μ_x, v^x), y_ml | z_ml ∼ GM_2(λ; z_ml, v^w + v^s; z_ml, v^w), a two-component Gaussian mixture.
        b) Recover low-rank Z = AX and sparse S from Y = [A I][X^T S^T]^T + W.
           a_mn ∼ N(0, 1), x_nl ∼ N(μ_x, v^x), s_ml ∼ BG(λ, 0, v^s), y_ml | z_ml ∼ N(z_ml, v^w).
     3. Dictionary Learning (DL): Recover dictionary A and sparse X from Y = AX + W.
        a_mn ∼ N(0, 1), x_nl ∼ BG(λ, 0, v^x), and y_ml | z_ml ∼ N(z_ml, v^w).
     4. Non-negative Matrix Factorization (NMF): Recover non-negative A and X (up to permutation/scale) from Y = AX + W.
        a_mn ∼ N_+(0, μ_a), x_nl ∼ N_+(0, μ_x), and y_ml | z_ml ∼ N(z_ml, v^w).
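As an example of how a configuration enters the algorithm, the hedged sketch below gives the matrix-completion likelihood step: the posterior mean and variance of z_ml when a Gaussian pseudo-prior N(p̄_ml, ν^p_ml) is combined with the AWGN likelihood on observed entries, while unobserved entries (uninformative likelihood) pass the pseudo-prior through unchanged. The function name and interface are illustrative, not from the slides.

    import numpy as np

    def mc_output_step(y, omega, p_mean, p_var, v_w):
        """Posterior mean/variance of z under prior N(p_mean, p_var) and, on omega,
        the likelihood N(y; z, v_w); off omega the likelihood is uninformative
        (entries of y outside omega are never used)."""
        gain = p_var / (p_var + v_w)
        z_hat = np.where(omega, p_mean + gain * (y - p_mean), p_mean)
        z_var = np.where(omega, gain * v_w, p_var)
        return z_hat, z_var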

  8. Example Configurations (cont.)
     5. One-bit Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(sgn(Z + W)).
        a_mn ∼ N(0, 1), x_nl ∼ N(μ_x, v^x), and y_ml | z_ml ∼ probit for (m,l) ∈ Ω, trivial (uninformative) likelihood for (m,l) ∉ Ω.
        ... leveraging previous work on one-bit/classification GAMP [Ziniel/Schniter'13]
     6. Phaseless Matrix Completion (MC): Recover low-rank Z = AX from Y = P_Ω(abs(Z + W)).
        a_mn ∼ N(0, 1), x_nl ∼ N(μ_x, v^x), and
        p_{y_ml|z_ml}(y | z) ∝ exp( −(|y|² + |z|²)/v^w ) I_0( 2|y||z|/v^w ) for (m,l) ∈ Ω, trivial (uninformative) likelihood for (m,l) ∉ Ω.
        ... leveraging previous work on phase-retrieval GAMP [Schniter/Rangan'12]
     7. and so on ...
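The corresponding element-wise log-likelihoods can be written compactly. The sketch below assumes y ∈ {−1, +1} for the probit channel and uses one common parameterization of the Rician density (circular complex noise of total variance v_w) for the phaseless channel, so the constants and function names are assumptions rather than a transcription of the slides.

    import numpy as np
    from scipy.stats import norm
    from scipy.special import i0e

    def loglik_one_bit(y, z, v_w):
        # probit channel: p(y | z) = Phi(y * z / sqrt(v_w)) with y in {-1, +1}
        return norm.logcdf(y * z / np.sqrt(v_w))

    def loglik_phaseless(y, z, v_w):
        # Rician form: p(y | z) = (2 y / v_w) exp(-(y^2 + |z|^2)/v_w) I_0(2 y |z| / v_w), y >= 0
        t = 2.0 * y * np.abs(z) / v_w
        # i0e(t) = exp(-t) I_0(t), which avoids overflow for large arguments
        return np.log(2.0 * y / v_w) - (y**2 + np.abs(z)**2) / v_w + np.log(i0e(t)) + t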

  9. Practicalities: Adaptive Damping
     The heuristics used to derive GAMP hold in the large-system limit: M, N, L → ∞ with fixed M/N, M/L. In practice, M, N, L are finite and the rank N is often very small!
     To prevent BiG-AMP from diverging, we damp the updates using an adjustable step-size parameter β ∈ (0, 1].
     Moreover, we adapt β by monitoring (an approximation to) the cost function minimized by BiG-AMP and adjusting β as needed to ensure decreasing cost, leveraging similar methods from GAMP [Rangan/Schniter/Riegler/Fletcher/Cevher'13].
     J(t) = Σ_{n,l} D( p_{x_nl | Y}(· | Y) ‖ p_{x_nl}(·) )      ← KL divergence between posterior & prior
          + Σ_{m,n} D( p_{a_mn | Y}(· | Y) ‖ p_{a_mn}(·) )
          − Σ_{m,l} E_{N(z_ml; p̄_ml(t), ν^p_ml(t))}[ log p_{y_ml|z_ml}(y_ml | z_ml) ]
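A hedged sketch of such an adaptive damping rule follows; the specific shrink/grow factors and the bounds on β are assumed values, not taken from the slides.

    def damp(new, old, beta):
        """Convex combination of the new and previous iterates with step size beta."""
        return beta * new + (1.0 - beta) * old

    def adapt_beta(J_new, J_old, beta, shrink=0.5, grow=1.1, beta_min=0.01, beta_max=1.0):
        """Shrink beta when the monitored cost J increases, grow it slowly otherwise."""
        if J_new > J_old:
            return max(beta * shrink, beta_min)   # cost went up: damp more aggressively
        return min(beta * grow, beta_max)         # cost went down: relax the damping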

  10. Practicalities: Parameter Tuning via EM
     We treat the parameters θ that determine the priors p_x, p_a, p_{y|z} as deterministic unknowns and compute (approximate) ML estimates using expectation-maximization (EM), as done for GAMP in [Vila/Schniter'13].
     Taking X, A, and Z to be the hidden variables, the EM recursion becomes
     θ̂^{k+1} = arg max_θ E[ log p_{X,A,Z,Y}(X, A, Z, Y; θ) | Y; θ̂^k ]
             = arg max_θ  Σ_{n,l} E[ log p_{x_nl}(x_nl; θ) | Y; θ̂^k ]
                        + Σ_{m,n} E[ log p_{a_mn}(a_mn; θ) | Y; θ̂^k ]
                        + Σ_{m,l} E[ log p_{y_ml|z_ml}(y_ml | z_ml; θ) | Y; θ̂^k ]
     For tractability, the θ-maximization is performed one variable at a time.
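As one concrete instance, the M-step for an AWGN noise variance v^w has a closed form in terms of the posterior means and variances of z_ml returned by BiG-AMP. The sketch below mirrors the spirit of the EM-GAMP updates in [Vila/Schniter'13], but the interface and restriction to observed entries are my own simplifying assumptions.

    import numpy as np

    def em_update_noise_var(y, z_hat, z_var, omega):
        """argmax over v_w of E[ sum log N(y_ml; z_ml, v_w) | Y ] on observed entries:
        v_w = mean of (y - z_hat)^2 + z_var, since E[(y - z)^2 | Y] = (y - z_hat)^2 + var(z)."""
        resid = (y - z_hat)**2 + z_var
        return np.mean(resid[omega])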

  11. Practicalities: Rank Selection
     In practice, the rank of Z (i.e., the number of columns in A and rows in X) is unknown. We propose two methods for rank selection:
     1. Penalized log-likelihood maximization:
        N̂ = arg max_{N=1,...,N̄}  2 log p_{Y|Z}( Y | Â_N X̂_N; θ̂_N ) − η(N),
        where η(N) penalizes the effective number of parameters under rank N (e.g., BIC, AIC). Although Â_N, X̂_N, θ̂_N are ideally ML estimates under rank N, we use EM-BiG-AMP estimates.
     2. Rank contraction (adapted from LMaFit [Wen/Yin/Zhang'12]): Run EM-BiG-AMP at the maximum rank N̄ and then set N̂ to the location of the largest gap between singular values, but only if the gap is sufficiently large. If not, run EM-BiG-AMP again and check again.
     For matrix completion we advocate the first strategy (with the AICc rule), while for robust PCA we advocate the second strategy.
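A hedged sketch of the rank-contraction test follows: it inspects the gaps between consecutive singular values of the current estimate Ẑ = ÂX̂ and contracts the rank only when the largest gap is pronounced. The gap threshold is an assumed value, and LMaFit's exact criterion differs in its details.

    import numpy as np

    def contract_rank(Z_hat, gap_ratio_threshold=10.0):
        """Return a contracted rank estimate if consecutive singular values of Z_hat
        show a sufficiently large gap; otherwise keep the current (maximum) rank."""
        s = np.linalg.svd(Z_hat, compute_uv=False)        # singular values, descending
        ratios = s[:-1] / np.maximum(s[1:], 1e-12)        # gaps between neighbors
        i = int(np.argmax(ratios))
        if ratios[i] >= gap_ratio_threshold:
            return i + 1
        return len(s)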
