Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation
Qixing Huang, Yuxin Chen, and Leonidas Guibas
Stanford University
ICML 2014, Beijing
Maximum A Posteriori (MAP) Inference
• Markov Random Field (MRF)
  – w_i: potential function for vertices
  – W_ij: potential function for edges
• Maximum A Posteriori (MAP) inference
  – Find the mode, i.e., the configuration with the lowest energy / highest potential
A Large Number of Applications ...
• Computer Vision Applications
  – Image Segmentation
  – Geometric Surface Labeling
  – Photo Montage
  – Scene Decomposition
  – Object Detection
  – Color Segmentation
  – ...
• Protein Folding
• Metric Labeling
• Error-Correcting Codes
• ...
(examples drawn from the OpenGM benchmark)
Problem Setup
• Model
  – n vertices: x_1, ..., x_n
  – m different states: x_i ∈ {1, ..., m}
• Goal:
$$\begin{aligned} \text{maximize} \quad & f(x_1,\dots,x_n) := \underbrace{\sum_{i=1}^{n} w_i(x_i) + \sum_{(i,j)\in\mathcal{G}} W_{ij}(x_i, x_j)}_{\text{negative energy function}} \\ \text{s.t.} \quad & x_i \in \{1,\dots,m\} \end{aligned}$$
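For concreteness, here is a toy evaluation of this objective in Python. The potentials and graph are hypothetical, and the names `n`, `m`, `w`, `W`, `edges` are illustrative, not from the paper:

```python
# A toy evaluation of the negative energy f (hypothetical data).
import numpy as np

n, m = 3, 2                                   # 3 vertices, 2 states each
rng = np.random.default_rng(0)
w = rng.standard_normal((n, m))               # w[i, s] = w_i(s)
edges = [(0, 1), (1, 2)]                      # the graph G
W = {e: rng.standard_normal((m, m)) for e in edges}   # W[(i,j)][s,t] = W_ij(s,t)

def f(x):
    """Negative energy of a labeling x, with x[i] in {0, ..., m-1}."""
    return sum(w[i, x[i]] for i in range(n)) + \
           sum(W[(i, j)][x[i], x[j]] for (i, j) in edges)

# MAP inference = argmax of f over all m**n labelings (intractable in general)
print(f([0, 1, 0]))
```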
Matrix Representation
• Representation of each x_i
  – m possible states: x_i ∈ {e_1, e_2, ..., e_m} (one-hot indicator vectors)
• Representation of potentials
  – potential on vertices: w_i ∈ ℝ^m
  – potential on edges: W_ij ∈ ℝ^{m×m}
• Equivalent integer program (see the sketch below):
$$\begin{aligned} \text{maximize} \quad & f(x_1,\dots,x_n) := \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij},\, x_i x_j^\top\rangle \\ \text{s.t.} \quad & x_i \in \{e_1,\dots,e_m\} \end{aligned}$$
  – Non-convex!
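A quick check that the inner-product form agrees with the table form, under the same hypothetical data (this reuses `n`, `m`, `w`, `W`, `edges`, and `f` from the sketch above):

```python
# The same objective in one-hot / inner-product form (illustrative).
import numpy as np

def one_hot(s, m):
    e = np.zeros(m)
    e[s] = 1.0
    return e

def f_matrix(x):
    X = [one_hot(s, m) for s in x]            # x_i in {e_1, ..., e_m}
    return sum(w[i] @ X[i] for i in range(n)) + \
           sum(np.sum(W[(i, j)] * np.outer(X[i], X[j])) for (i, j) in edges)

# <W_ij, e_s e_t^T> picks out the single entry W_ij[s, t], so both forms agree:
assert np.isclose(f([0, 1, 0]), f_matrix([0, 1, 0]))
```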
Matrix Representation
• Introduce auxiliary variables
$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{12}^\top & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1n}^\top & X_{2n}^\top & \cdots & X_{nn} \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
• Lifted program:
$$\begin{aligned} \text{maximize} \quad & f := \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle \\ \text{s.t.} \quad & X = xx^\top, \quad X_{ii} = \mathrm{diag}(x_i), \quad x_i \in \{e_1,\dots,e_m\} \end{aligned}$$
  – Note X_ii = x_i x_i^⊤ = diag(x_i), since each x_i is a 0/1 indicator vector.
Convex Relaxation
$$\text{maximize} \quad \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle$$
• Semidefinite relaxation: replace the non-convex constraint X = xx^⊤ with the convex constraint
$$\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0, \qquad X_{ii} = \mathrm{diag}(x_i)$$
• Relax the constraints x_i ∈ {e_1, ..., e_m}:
$$\mathbf{1}^\top x_i = 1, \qquad x_i \ge 0, \qquad X_{ij} \ge 0, \;\; \forall (i,j)\in\mathcal{G}$$
Our Semidefinite Formulation
• Final semidefinite program (SDR):
$$\begin{aligned} \text{maximize} \quad & \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle \\ \text{s.t.} \quad & \begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0, \quad X_{ii} = \mathrm{diag}(x_i), \\ & \mathbf{1}^\top x_i = 1, \quad x_i \ge 0, \quad X_{ij} \ge 0, \;\; \forall (i,j)\in\mathcal{G} \end{aligned}$$
• The intended solution is low-rank (X = xx^⊤, rank one) and sparse!
• Only O(nm²) linear equality constraints (see the sketch below)
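As a sanity check on the formulation, here is a minimal sketch of SDR in cvxpy on a toy instance. This is not the paper's solver (the paper uses ADMM precisely because generic interior-point solvers do not scale), and all sizes and names are illustrative:

```python
# A minimal cvxpy sketch of SDR on a toy instance (requires the SCS solver).
import numpy as np
import cvxpy as cp

n, m = 3, 2                                          # toy sizes
rng = np.random.default_rng(0)
w = [rng.standard_normal(m) for _ in range(n)]       # vertex potentials w_i
G = [(0, 1), (1, 2)]                                 # edge set
W = {e: rng.standard_normal((m, m)) for e in G}      # edge potentials W_ij

# M plays the role of [[1, x^T], [x, X]]
M = cp.Variable((n * m + 1, n * m + 1), symmetric=True)
blk = lambda i: slice(1 + i * m, 1 + (i + 1) * m)    # indices of block i

cons = [M >> 0, M[0, 0] == 1]
obj = 0
for i in range(n):
    xi = M[0, blk(i)]                                # x_i sits in row 0
    cons += [cp.sum(xi) == 1, xi >= 0]               # 1^T x_i = 1, x_i >= 0
    cons += [M[blk(i), blk(i)] == cp.diag(xi)]       # X_ii = diag(x_i)
    obj = obj + w[i] @ xi
for (i, j) in G:
    Xij = M[blk(i), blk(j)]
    cons += [Xij >= 0]                               # X_ij >= 0 on edges
    obj = obj + cp.sum(cp.multiply(W[(i, j)], Xij))

cp.Problem(cp.Maximize(obj), cons).solve(solver=cp.SCS)
x_relaxed = [M[0, blk(i)].value for i in range(n)]   # relaxed per-vertex marginals
```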
Superiority to Linear Programming Relaxation
• LP relaxation:
  – X_ij 1 = x_i (1 ≤ i, j ≤ n)
  – X_ii = diag(x_i)
  – 1^⊤ x_i = 1, x_i ≥ 0
  – X_ij ≥ 0, ∀(i, j) ∈ G
• Semidefinite relaxation (SDR):
  – $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$
  – X_ii = diag(x_i)
  – 1^⊤ x_i = 1, x_i ≥ 0
  – X_ij ≥ 0, ∀(i, j) ∈ G
• Shall we enforce the marginalization constraints X_ij 1 = x_i, 1 ≤ i, j ≤ n? (Θ(n²m) constraints)
• Answer: No!
  Proposition. Any feasible solution to SDR necessarily satisfies X_ij 1 = x_i.
• O(nm²) vs. O(n²m + nm²) linear equality constraints!
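A short argument for why the marginalization constraints are implied (a sketch along standard lines; it may differ from the proof in the paper). Let $M = \begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$ and let $u$ have a $1$ in the bordering coordinate and $-\mathbf{1}$ on block $i$ (zeros elsewhere). Then

$$u^\top M u = 1 - 2\,\mathbf{1}^\top x_i + \mathbf{1}^\top X_{ii}\,\mathbf{1} = 1 - 2 + 1 = 0,$$

using $X_{ii} = \mathrm{diag}(x_i)$ and $\mathbf{1}^\top x_i = 1$. A PSD matrix with $u^\top M u = 0$ must satisfy $M u = 0$; reading off block $j$ of $M u$ gives $x_j - X_{ij}^\top \mathbf{1} = 0$, and swapping the roles of $i$ and $j$ yields $X_{ij}\mathbf{1} = x_i$.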
ADMM
• Alternating Direction Method of Multipliers
  – Fast convergence in the first several tens of iterations
• Cast SDR in a generic conic form:
$$\begin{aligned} \max \quad & \langle C, X\rangle \\ \text{s.t.} \quad & \mathcal{A}(X) = b, \\ & \mathcal{B}(X) \ge 0, \\ & X \succeq 0 \end{aligned}$$
• A, B, C are all highly sparse!
Scalability?
• Dual variables for the generic formulation:
$$\mathcal{A}(X) = b \;\leftrightarrow\; y, \qquad \mathcal{B}(X) \ge 0 \;\leftrightarrow\; z \ge 0, \qquad X \succeq 0 \;\leftrightarrow\; S \succeq 0$$
• Since A, B, C are all sparse, all ADMM operations are fast except the X-update:
$$X^{(t)} = \left( X^{(t-1)} - \frac{C + \mathcal{A}^*\!\big(y^{(t)}\big) - \mathcal{B}^*\!\big(z^{(t)}\big)}{\mu} \right)_{\succeq 0} \qquad \text{(projection onto the PSD cone)}$$
• Eigendecomposition of dense matrices is expensive!
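Written naively, the projection requires a full eigendecomposition. A minimal numpy sketch of that step (`mu`, `A_adj`, `B_adj` are placeholders for μ, 𝒜*, ℬ*, not the authors' code):

```python
# The bottleneck written naively: projecting onto the PSD cone via a full
# eigendecomposition costs O(N^3) for an N x N matrix.
import numpy as np

def psd_project_dense(M):
    """Return the nearest PSD matrix: clip negative eigenvalues to zero."""
    vals, vecs = np.linalg.eigh(M)            # dense eigendecomposition, O(N^3)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

# X_new = psd_project_dense(X_old - (C + A_adj(y) - B_adj(z)) / mu)
```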
Accelerated ADMM (SDPAD-LR)
• Recall: the ground truth obeys rank(X) = 1
  – Enforce / exploit low-rank structure!
• Our strategy: keep only a rank-r approximation X^{(t)} ≈ Y^{(t)} Y^{(t)⊤}, computed from the top eigenpairs of
$$\underbrace{Y^{(t-1)} Y^{(t-1)\top}}_{\text{low rank}} \;-\; \frac{1}{\mu}\Big(\underbrace{C + \mathcal{A}^*\!\big(y^{(t)}\big) - \mathcal{B}^*\!\big(z^{(t)}\big)}_{\text{sparse}}\Big)$$
• Numerically fast, e.g., via the Lanczos process: O(nmr² + m²|G|)
• Empirically, r ≈ 8
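A sketch of how this low-rank update might look with scipy's Lanczos-based `eigsh`, assuming the iterate is stored as a factor Y with X ≈ YY^⊤ and the dual term is a sparse matrix (illustrative names, not the authors' implementation):

```python
# Low-rank X-update: top-r PSD approximation of Y Y^T - D without ever
# forming the dense N x N matrix. D stands in for (C + A*(y) - B*(z)) / mu.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

def lowrank_psd_update(Y, D, r):
    N = Y.shape[0]
    # Each matrix-vector product costs O(N r + nnz(D)) instead of O(N^2).
    op = LinearOperator((N, N), matvec=lambda v: Y @ (Y.T @ v) - D @ v)
    vals, vecs = eigsh(op, k=r, which='LA')   # Lanczos: r largest eigenpairs
    vals = np.maximum(vals, 0.0)              # project onto the PSD cone
    return vecs * np.sqrt(vals)               # new factor Y, with X ≈ Y Y^T

# toy usage
N, r = 50, 8
Y0 = np.random.randn(N, r)
D = sp.random(N, N, density=0.05, format='csr')
D = (D + D.T) * 0.5                           # symmetrize the sparse term
Y1 = lowrank_psd_update(Y0, D, r)
```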
Benchmark Data Sets
• Benchmarks: OPENGM2, PIC, ORIENT

categories     graphs    n         m       # instances   avg time
PIC-Object     full      60        11-21   37            5m32s
PIC-Folding    mixed     2K        2-503   21            21m42s
PIC-Align      dense     30-400    20-93   19            37m63s
GM-Label       sparse    1K        7       324           6m32s
GM-Char        sparse    5K-18K    2       100           1h13m
GM-Montage     grid      100K      5,7     3             9h32m
GM-Matching    dense     19        19      4             2m21s
ORIENT         sparse    1K        16      10            10m21s

All problems can be solved within reasonable time!
Empirical Convergence: Example
• Benchmark: Geometric Surface Labeling (gm275)
  – matrix size: 5201; # of constraints: 218791
  – stopping criterion: duality gap < 10⁻³