
Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation



  1. Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation. Qixing Huang, Yuxin Chen, and Leonidas Guibas, Stanford University. ICML 2014, Beijing.

  2. Maximum A Posteriori (MAP) Inference
     • Markov Random Field (MRF)
       - w_i: potential function for vertices
       - W_ij: potential function for edges

  3. Maximum A Posteriori (MAP) Inference
     • Markov Random Field (MRF)
       - w_i: potential function for vertices
       - W_ij: potential function for edges
     • Maximum A Posteriori (MAP) Inference
       - Find the mode with the lowest energy / potential

  4. A Large Number of Applications ...
     • Computer Vision Applications
       - Image Segmentation
       - Geometric Surface Labeling
       - Photo Montage
       - Scene Decomposition
       - Object Detection
       - Color Segmentation
       - ...
     • Protein Folding
     • Metric Labeling
     • Error-Correcting Codes
     • ...
     (OpenGM Benchmark)

  5. Problem Setup
     • Model
       - n vertices (x_1, ..., x_n)
       - m different states: x_i ∈ {1, ..., m}
     • Goal:
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n w_i(x_i) + Σ_{(i,j)∈G} W_ij(x_i, x_j)    (negative energy function)
         s.t.      x_i ∈ {1, ..., m}
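
To make the objective concrete, here is a minimal NumPy sketch that evaluates the negative energy f for a toy instance and finds the MAP labeling by brute force; the instance (n = 3, m = 2, a chain graph) and all variable names are hypothetical, not from the paper:

    import numpy as np
    from itertools import product

    # Hypothetical toy instance: n = 3 vertices, m = 2 states, chain graph G.
    n, m = 3, 2
    w = np.random.rand(n, m)                      # w[i, s] = w_i(s), vertex potentials
    edges = [(0, 1), (1, 2)]                      # edge set G
    W = {e: np.random.rand(m, m) for e in edges}  # W[(i, j)][s, t] = W_ij(s, t)

    def f(x):
        """Negative energy of a labeling x, with x[i] in {0, ..., m-1}."""
        unary = sum(w[i, x[i]] for i in range(n))
        pairwise = sum(W[i, j][x[i], x[j]] for (i, j) in edges)
        return unary + pairwise

    # MAP inference maximizes f; exhaustive search is only feasible for tiny n.
    x_map = max(product(range(m), repeat=n), key=f)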

  6. Matrix Representation
     • Representation of Each x_i
       - m possible states: x_i ∈ {e_1, e_2, ..., e_m}  (indicator vectors)

  7. Matrix Representation
     • Representation of Each x_i
       - m possible states: x_i ∈ {e_1, e_2, ..., e_m}
     • Representation of Potentials
       - potential on vertices: w_i ∈ R^m
       - potential on edges: W_ij ∈ R^{m×m}

  8. Matrix Representation
     • Representation of Each x_i
       - m possible states: x_i ∈ {e_1, e_2, ..., e_m}
     • Representation of Potentials
       - potential on vertices: w_i ∈ R^m
       - potential on edges: W_ij ∈ R^{m×m}
     • Equivalent Integer Program:
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, x_i x_j^T⟩
         s.t.      x_i ∈ {e_1, ..., e_m}
       - Non-Convex!
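
The integer program is the same objective in inner-product form: with one-hot x_i, ⟨w_i, x_i⟩ selects a vertex potential and the Frobenius product ⟨W_ij, x_i x_j^T⟩ selects an edge potential. A quick NumPy check of these two identities (all values hypothetical):

    import numpy as np

    m = 4
    s, t = 1, 3                    # two arbitrary states
    e = np.eye(m)                  # e[k] is the indicator vector e_{k+1}
    w_i = np.random.rand(m)
    W_ij = np.random.rand(m, m)

    # <w_i, e_s> picks out the s-th vertex potential ...
    assert np.isclose(w_i @ e[s], w_i[s])
    # ... and <W_ij, e_s e_t^T> picks out the (s, t) edge potential.
    assert np.isclose(np.sum(W_ij * np.outer(e[s], e[t])), W_ij[s, t])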

  9. Matrix Representation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, x_i x_j^T⟩
         s.t.      x_i ∈ {e_1, ..., e_m}

  10. Matrix Representation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, x_i x_j^T⟩
         s.t.      x_i ∈ {e_1, ..., e_m}
     • Auxiliary Variable
         X = [ X_11    X_12    ...  X_1n
               X_12^T  X_22    ...  X_2n
               ...     ...     ...  ...
               X_1n^T  X_2n^T  ...  X_nn ]
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      X_ij = x_i x_j^T,  X_ii = x_i x_i^T = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}

  11. Matrix Representation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, x_i x_j^T⟩
         s.t.      x_i ∈ {e_1, ..., e_m}
     • Auxiliary Variables
         X = [ X_11    X_12    ...  X_1n
               X_12^T  X_22    ...  X_2n
               ...     ...     ...  ...
               X_1n^T  X_2n^T  ...  X_nn ]   and the stacked vector  x = [ x_1; x_2; ...; x_n ]
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      X = x x^T,  X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}
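
The lifted constraints are easy to sanity-check: for any one-hot labeling, the rank-1 lift X = x x^T has diagonal blocks diag(x_i) and block row sums X_ij 1 = x_i. A small NumPy illustration (labeling and sizes hypothetical):

    import numpy as np

    n, m = 3, 4
    labels = [2, 0, 3]                            # a hypothetical labeling
    x_blocks = [np.eye(m)[s] for s in labels]     # one-hot vectors x_i
    x = np.concatenate(x_blocks)                  # stacked x in R^{nm}
    X = np.outer(x, x)                            # rank-1 lift X = x x^T

    for i in range(n):
        X_ii = X[i*m:(i+1)*m, i*m:(i+1)*m]
        assert np.allclose(X_ii, np.diag(x_blocks[i]))   # X_ii = diag(x_i) for 0/1 vectors

    X_01 = X[0:m, m:2*m]                          # block X_ij for (i, j) = (0, 1)
    assert np.allclose(X_01 @ np.ones(m), x_blocks[0])   # X_ij 1 = x_i, since 1^T x_j = 1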

  12. Convex Relaxation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      X = x x^T,  X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}

  13. Convex Relaxation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      X = x x^T,  X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}
     • Semidefinite Relaxation (the non-convex constraint X = x x^T is relaxed to a PSD constraint):
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      [ 1  x^T
                     x  X   ] ⪰ 0
                   X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}

  14. Convex Relaxation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      [ 1  x^T
                     x  X   ] ⪰ 0,  X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}

  15. Convex Relaxation
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      [ 1  x^T
                     x  X   ] ⪰ 0,  X_ii = diag(x_i)
                   x_i ∈ {e_1, ..., e_m}
     • Relax the Constraints x_i ∈ {e_1, ..., e_m} (to the probability simplex, with entrywise nonnegativity on the edge blocks):
         maximize  f(x_1, ..., x_n) := Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      [ 1  x^T
                     x  X   ] ⪰ 0,  X_ii = diag(x_i)
                   1^T x_i = 1,  x_i ≥ 0,  X_ij ≥ 0  ∀ (i,j) ∈ G

  16. Our Semidefinite Formulation
     • Final Semidefinite Program (SDR):
         maximize  Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.      [ 1  x^T
                     x  X   ] ⪰ 0,  X_ii = diag(x_i)
                   1^T x_i = 1,  x_i ≥ 0,  X_ij ≥ 0  ∀ (i,j) ∈ G
     • Low-Rank and Sparse!  (the intended solution X = x x^T has rank 1)
     • O(nm^2) linear equality constraints
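
As a concrete (if unscalable) reference point, the SDR can be written directly in CVXPY. This is a minimal sketch on hypothetical toy data; CVXPY with an off-the-shelf solver is an assumption of this sketch, not the custom ADMM solver the talk develops:

    import cvxpy as cp
    import numpy as np

    # Hypothetical toy instance.
    n, m = 3, 2
    w = np.random.rand(n, m)
    edges = [(0, 1), (1, 2)]
    W = {e: np.random.rand(m, m) for e in edges}

    # Z plays the role of the (nm+1) x (nm+1) block matrix [[1, x^T], [x, X]].
    Z = cp.Variable((n*m + 1, n*m + 1), PSD=True)
    x = [Z[0, 1 + i*m : 1 + (i+1)*m] for i in range(n)]
    Xb = {(i, j): Z[1 + i*m : 1 + (i+1)*m, 1 + j*m : 1 + (j+1)*m] for (i, j) in edges}

    obj = sum(w[i] @ x[i] for i in range(n)) \
        + sum(cp.sum(cp.multiply(W[e], Xb[e])) for e in edges)
    cons = [Z[0, 0] == 1]
    for i in range(n):
        X_ii = Z[1 + i*m : 1 + (i+1)*m, 1 + i*m : 1 + (i+1)*m]
        cons += [X_ii == cp.diag(x[i]),      # X_ii = diag(x_i)
                 cp.sum(x[i]) == 1,          # 1^T x_i = 1
                 x[i] >= 0]
    cons += [Xb[e] >= 0 for e in edges]      # X_ij >= 0 entrywise on G

    cp.Problem(cp.Maximize(obj), cons).solve()
    # Rounding x[i].value to its largest entry is one simple decoding heuristic.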

  17. Superiority to Linear Programming Relaxation
     • LP Relaxation:
         X_ij 1 = x_i  (1 ≤ i, j ≤ n)
         X_ii = diag(x_i)
         1^T x_i = 1,  x_i ≥ 0
         X_ij ≥ 0  ∀ (i,j) ∈ G
     • Semidefinite Relaxation (SDR):
         [ 1  x^T
           x  X   ] ⪰ 0
         X_ii = diag(x_i)
         1^T x_i = 1,  x_i ≥ 0
         X_ij ≥ 0  ∀ (i,j) ∈ G
     • Shall we enforce the marginalization constraints (X_ij 1 = x_i, 1 ≤ i, j ≤ n)? That is Θ(n^2 m) constraints.

  18. Superiority to Linear Programming Relaxation
     • LP Relaxation:
         X_ij 1 = x_i  (1 ≤ i, j ≤ n)
         X_ii = diag(x_i)
         1^T x_i = 1,  x_i ≥ 0
         X_ij ≥ 0  ∀ (i,j) ∈ G
     • Semidefinite Relaxation (SDR):
         [ 1  x^T
           x  X   ] ⪰ 0
         X_ii = diag(x_i)
         1^T x_i = 1,  x_i ≥ 0
         X_ij ≥ 0  ∀ (i,j) ∈ G
     • Shall we enforce the marginalization constraints (X_ij 1 = x_i, 1 ≤ i, j ≤ n), i.e. Θ(n^2 m) constraints?
     • Answer: No!
       Proposition. Any feasible solution to SDR necessarily satisfies X_ij 1 = x_i.
       O(nm^2) vs. O(n^2 m + nm^2) linear equality constraints!
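
The proposition can be sanity-checked numerically on the CVXPY sketch above (a numerical illustration up to solver tolerance, not a proof); Xb, x, edges, and m refer to that hypothetical sketch:

    # Check X_ij 1 = x_i on the solved relaxation, up to solver accuracy.
    for (i, j) in edges:
        assert np.allclose(Xb[(i, j)].value @ np.ones(m), x[i].value, atol=1e-4)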

  19. ADMM
     • Alternating Direction Method of Multipliers
       - Fast convergence in the first several tens of iterations
     • Semidefinite Relaxation (SDR):
         max   Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.  [ 1  x^T
                 x  X   ] ⪰ 0,  X_ii = diag(x_i)
               1^T x_i = 1,  x_i ≥ 0,  X_ij ≥ 0  ∀ (i,j) ∈ G

  20. ADMM
     • Alternating Direction Method of Multipliers
       - Fast convergence in the first several tens of iterations
     • Semidefinite Relaxation (SDR):
         max   Σ_{i=1}^n ⟨w_i, x_i⟩ + Σ_{(i,j)∈G} ⟨W_ij, X_ij⟩
         s.t.  [ 1  x^T
                 x  X   ] ⪰ 0,  X_ii = diag(x_i)
               1^T x_i = 1,  x_i ≥ 0,  X_ij ≥ 0  ∀ (i,j) ∈ G
     • Generic Formulation:
         max   ⟨C, X⟩
         s.t.  A(X) = b,
               B(X) ≥ 0,
               X ⪰ 0.
     • A, B, C are all highly sparse!

  21. Scalability?
     • Generic Formulation (with dual variables):
         max   ⟨C, X⟩
         s.t.  A(X) = b      (dual variable y)
               B(X) ≥ 0      (dual variable z ≥ 0)
               X ⪰ 0         (dual variable S ⪰ 0)
     • A, B, C are all sparse!
       - All operations are fast except ...

  22. Scalability?
     • Generic Formulation (with dual variables):
         max   ⟨C, X⟩
         s.t.  A(X) = b      (dual variable y)
               B(X) ≥ 0      (dual variable z ≥ 0)
               X ⪰ 0         (dual variable S ⪰ 0)
     • A, B, C are all sparse!
       - All operations are fast except the primal update
           X^(t) = P_⪰0( X^(t-1) - (C + A^*(y^(t)) - B^*(z^(t))) / μ ),
         where P_⪰0 denotes projection onto the PSD cone.
     • Eigen-decomposition of dense matrices is expensive!
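
The PSD projection keeps only the nonnegative part of the spectrum, which for a dense iterate means a full eigendecomposition; a minimal NumPy sketch:

    import numpy as np

    def project_psd(M):
        """Project a symmetric matrix onto the PSD cone."""
        lam, U = np.linalg.eigh(M)      # full eigendecomposition: O(d^3) time
        lam = np.maximum(lam, 0.0)      # clip the negative eigenvalues
        return (U * lam) @ U.T

    # This O(d^3) step on a dense d x d iterate (d = nm + 1 here) is exactly
    # the bottleneck the low-rank acceleration on the next slides avoids.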

  23. Accelerated ADMM (SDPAD-LR)
         X^(t) = P_⪰0( X^(t-1) - (C + A^*(y^(t)) - B^*(z^(t))) / μ )
     • Recall: the ground truth obeys rank(X) = 1
       - Enforce / Exploit Low-Rank Structure!

  24. Accelerated ADMM (SDPAD-LR)
         X^(t) = P_⪰0( X^(t-1) - (C + A^*(y^(t)) - B^*(z^(t))) / μ )
     • Recall: the ground truth obeys rank(X) = 1
       - Enforce / Exploit Low-Rank Structure!
     • Our Strategy:
       - Only keep a rank-r approximation X^(t) ≈ Y^(t) (Y^(t))^T: take the top-r eigenpairs of
           Y^(t-1) (Y^(t-1))^T - (C + A^*(y^(t)) - B^*(z^(t))) / μ,
         i.e. of a low-rank plus sparse matrix.

  25. Accelerated ADMM (SDPAD-LR)
     • Our Strategy:
       - Only keep a rank-r approximation X^(t) ≈ Y^(t) (Y^(t))^T: take the top-r eigenpairs of
           Y^(t-1) (Y^(t-1))^T - (C + A^*(y^(t)) - B^*(z^(t))) / μ,
         i.e. of a low-rank plus sparse matrix.
     • Numerically fast
       - e.g. the Lanczos process: O(nmr^2 + m^2 |G|)
     • Empirically, r ≈ 8
     (Photo: Cornelius Lanczos)
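
Because the iterate is low-rank plus sparse, a Lanczos-type eigensolver only ever needs matrix-vector products v ↦ Y(Y^T v) - (M v)/μ, so the dense d × d matrix is never formed. A sketch via SciPy's eigsh; Y, M, mu, and all sizes are hypothetical stand-ins for the quantities above:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, eigsh

    d, r, mu = 500, 8, 1.0
    Y = np.random.randn(d, r)                        # current low-rank factor Y^(t-1)
    M = sp.random(d, d, density=0.01, format="csr")  # stand-in for C + A^*(y) - B^*(z)
    M = (M + M.T) / 2                                # symmetrize the toy data

    # Matvec with Y Y^T - M/mu, never forming the dense d x d matrix.
    op = LinearOperator((d, d), matvec=lambda v: Y @ (Y.T @ v) - (M @ v) / mu,
                        dtype=np.float64)

    lam, U = eigsh(op, k=r, which="LA")              # r largest eigenpairs (Lanczos)
    lam = np.maximum(lam, 0.0)                       # PSD projection on the kept spectrum
    Y_next = U * np.sqrt(lam)                        # X^(t) ≈ Y_next @ Y_next.T

Each matvec costs O(dr + nnz(M)); with d = nm and the structured sparsity of the SDR, this is roughly the per-iteration O(nmr^2 + m^2 |G|) count quoted on the slide.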

  26. Benchmark Data Sets
     • Benchmark
       - OPENGM2
       - PIC
       - ORIENT

  27. Benchmark Data Sets
     • Benchmark
       - OPENGM2
       - PIC
       - ORIENT

     Category      Graph    n        m      # instances   Avg. time
     PIC-Object    full     60       11-21  37            5m32s
     PIC-Folding   mixed    2K       2-503  21            21m42s
     PIC-Align     dense    30-400   20-93  19            37m63s
     GM-Label      sparse   1K       7      324           6m32s
     GM-Char       sparse   5K-18K   2      100           1h13m
     GM-Montage    grid     100K     5,7    3             9h32m
     GM-Matching   dense    19       19     4             2m21s
     ORIENT        sparse   1K       16     10            10m21s

     All problems can be solved within reasonable time!

  28. Empirical Convergence: Example
     • Benchmark: Geometric Surface Labeling (gm275)
       - matrix size: 5201; # constraints: 218791
       - Stopping criterion: duality gap < 10^-3
