A Study of Nesterov's Scheme for Lagrangian Decomposition and MAP Labeling


  1. A Study of Nesterov's Scheme for Lagrangian Decomposition and MAP Labeling. Bogdan Savchynskyy, Jörg Kappes, Stefan Schmidt, Christoph Schnörr. Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg

  2. MRF/MAP Inference – Applications
     $y^\ast = \arg\min_{y \in \mathcal{Y}_V} \Big[ \sum_{v \in V} \theta_v(y_v) + \sum_{vv' \in E} \theta_{vv'}(y_v, y_{v'}) \Big]$
     Segmentation [Rother et al. 2004], [Nowozin, Lampert 2010]; multi-camera stereo [Kolmogorov, Zabih 2002]; stereo and motion [Kim et al. 2003]; clustering [Zabih, Kolmogorov 2004]; medical imaging [Raj et al. 2007]; pose estimation [Bergtholdt et al. 2010], [Bray et al. 2006]; ...
     See: A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. R. Szeliski et al. 2008
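As a concrete reading of this objective, the sketch below evaluates $E(\theta, y)$ for a fixed labeling. It is a minimal illustration, not code from the talk; the names (`mrf_energy`, `theta_v`, `theta_vv`, `edges`) are assumptions made for the example.

```python
import numpy as np

def mrf_energy(theta_v, theta_vv, edges, y):
    """Evaluate E(theta, y) = sum_v theta_v(y_v) + sum_{vv'} theta_vv'(y_v, y_v').

    theta_v  : (num_nodes, num_labels) array of unary potentials
    theta_vv : dict mapping an edge (v, w) to a (num_labels, num_labels) array
    edges    : list of node-index pairs (v, w)
    y        : integer labeling, one label per node
    """
    unary = sum(theta_v[v, y[v]] for v in range(len(y)))
    pairwise = sum(theta_vv[v, w][y[v], y[w]] for (v, w) in edges)
    return unary + pairwise

# Tiny usage example: a 2-node chain with 2 labels.
theta_v = np.array([[0.0, 1.0], [0.5, 0.0]])
theta_vv = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]])}  # Potts-like coupling
print(mrf_energy(theta_v, theta_vv, [(0, 1)], [0, 1]))   # 0.0 + 0.0 + 1.0 = 1.0
```

MAP inference is the hard part: it minimizes this quantity over all exponentially many labelings, which is what the relaxations on the following slides address.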

  3. MRF/MAP Inference – Approaches
     Graph cuts [Boykov et al. 2001], [Kolmogorov, Zabih 2002], [Boykov, Kolmogorov 2004]: restricted to special types of potentials (sub-modularity).
     QPBO and roof duality [Hammer et al. 1984], [Boros, Hammer 2002], [Rother et al. 2007], [Kohli et al. 2008]: partial optimality.
     Combinatorial methods [Bergtholdt et al. 2006], [Schlesinger 2009], [Sanchez et al. 2008], [Marinescu, Dechter 2009]: exponential complexity in the worst case.

  4. MRF/MAP Inference – Approaches
     Message passing and belief propagation [Weiss, Freeman 2001], [Wainwright et al. 2002], [Kolmogorov 2005], [Globerson, Jaakkola 2007]: relaxation, dual decomposition; may converge only to a sub-optimal fixed point; stopping criterion?
     Sub-gradient optimization schemes [Komodakis et al. 2007], [Schlesinger, Giginyak 2007], [Kappes et al. 2010]: relaxation, dual decomposition; slow convergence; stopping criterion?
     Focus and contribution: local polytope / LP relaxation based on dual decomposition – similar to message passing and sub-gradient schemes; efficient iterations – outperforms sub-gradient; convergence to the optimum – outperforms message passing; stopping criterion based on the duality gap – novel!

  5. Dual Decomposition Approach
     Split the potentials $\theta = \theta^1 + \theta^2$, so that $E(\theta, y) = E^1(\theta^1, y) + E^2(\theta^2, y)$. Then
     $\min_{y \in \mathcal{Y}_V} E(\theta, y) \;\geq\; \max_{\theta^1 + \theta^2 = \theta} \Big[ \min_{y \in \mathcal{Y}_V} E^1(\theta^1, y) + \min_{y \in \mathcal{Y}_V} E^2(\theta^2, y) \Big]$
     Simple subproblems, solvable in parallel. The dual (right-hand side) is concave, but non-smooth.
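The lower bound on this slide can be checked numerically on tiny models. The following sketch (my construction, with hypothetical callables `E1` and `E2`) brute-forces both sides of the inequality; it does not implement the outer maximization over splits $\theta^1 + \theta^2 = \theta$, which is the actual dual problem.

```python
import itertools

def decomposition_bound(E1, E2, num_nodes, num_labels):
    """Brute-force check of min_y [E1(y) + E2(y)] >= min_y E1(y) + min_y E2(y)."""
    def labelings():
        return itertools.product(range(num_labels), repeat=num_nodes)
    joint = min(E1(y) + E2(y) for y in labelings())                  # true minimum
    bound = min(E1(y) for y in labelings()) + min(E2(y) for y in labelings())
    assert joint >= bound                                            # Lagrangian lower bound
    return joint, bound

# Example: two toy energies on a 3-node, 2-label model.
E1 = lambda y: sum(y)                  # prefers label 0 everywhere
E2 = lambda y: sum(1 - v for v in y)   # prefers label 1 everywhere
print(decomposition_bound(E1, E2, num_nodes=3, num_labels=2))  # (3, 0)
```

The example also shows why the split matters: a poor split leaves a large gap between the bound and the true minimum, and the dual problem tightens it by reallocating the potentials.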

  6. Large Scale Convex Optimization
     Problem: dual decomposition → convex, large-scale, non-smooth.
     - Sub-gradient schemes: [Komodakis et al. 2007], [Schlesinger, Giginyak 2007]
     - Block-coordinate ascent: [Wainwright 2004], [Kolmogorov 2005], [Globerson, Jaakkola 2007]
     - Smoothing + block-coordinate ascent: [Johnson et al. 2007], [Werner 2009]
     - Smoothing technique + accelerated gradient methods: [Nesterov 2004, 2007]
     - Proximal methods: [Combettes, Wajs 2005], [Beck, Teboulle 2009], [Ravikumar et al. 2010]
     - Proximal primal-dual algorithms: [Esser et al. 2010]
     Solution direction: smooth and optimize.

  7. Smoothing Technique by Y. Nesterov
     $f(x) = \min_{y \in D} \big[ \langle Ax, y \rangle + \phi(y) \big] \;\longrightarrow\; \tilde{f}_\rho(x) = \min_{y \in D} \big[ \langle Ax, y \rangle + \phi(y) + \rho\, d(y) \big]$
     $f$: concave, but non-smooth; convergence $t \sim O(1/\varepsilon^2)$.
     $\tilde{f}_\rho$: Lipschitz-continuous gradient; convergence $t \sim O(1/\varepsilon)$.
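The slide leaves $D$, $\phi$, and the prox-function $d$ abstract. A standard instantiation, assumed here purely for illustration, takes $D$ to be the probability simplex, $\phi = 0$, and $d$ the entropic prox-function; the smoothed minimum then has a closed log-sum-exp form:

```python
import numpy as np

def soft_min(a, rho):
    """min_{y in simplex} [<a, y> + rho * sum_i y_i log y_i]
       = -rho * log sum_i exp(-a_i / rho).
    Tends to min(a) as rho -> 0; larger rho is smoother but less accurate."""
    z = -np.asarray(a) / rho
    m = z.max()                              # shift for numerical stability
    return -rho * (m + np.log(np.exp(z - m).sum()))

def soft_argmin(a, rho):
    """The unique minimizer y*: a softmax over -a/rho. It equals the gradient
    of soft_min with respect to a, which is what makes the surrogate smooth."""
    z = -np.asarray(a) / rho
    w = np.exp(z - z.max())
    return w / w.sum()

a = np.array([3.0, 1.0, 4.0])
print(soft_min(a, rho=0.1))     # close to 1.0 = min(a)
print(soft_argmin(a, rho=0.1))  # nearly one-hot on the minimizing entry
```

Because the minimizer is unique and varies smoothly, $\tilde{f}_\rho$ has a Lipschitz-continuous gradient and accelerated gradient methods apply.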

  8. Efficient Implementation of Nesterov's Method
                                                   Basic scheme                  Our approach
     Stopping condition                            worst-case number of steps    duality gap
     Smoothing selection                           worst-case analysis           adaptive
     Lipschitz constant (step-size) estimation     worst-case analysis           adaptive

  9. Duality Gap and Stopping Condition
     Stop as soon as $\min_x \max_y g(x, y) - \max_y \min_x g(x, y) \leq \varepsilon$.
     Dual decomposition approaches optimize the relaxed dual $\max_y \min_x g(x, y)$.
     The standard approach estimates a non-relaxed primal (integer) solution.
     We estimate the relaxed primal $\min_x \max_y g(x, y)$ – difficult!
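In code, the stopping condition is just a loop that tracks both bounds. The callbacks `step`, `primal_bound`, and `dual_bound` below are hypothetical placeholders for one iteration of the smoothed scheme and for the relaxed primal and dual estimates; a minimal sketch:

```python
def optimize_until_gap(step, primal_bound, dual_bound, eps, max_iter=10000):
    """Run step() until the duality gap drops below eps.

    primal_bound(x) estimates min_x max_y g(x, y)  (relaxed primal, the hard part)
    dual_bound(x)   estimates max_y min_x g(x, y)  (what dual decomposition optimizes)
    Their difference bounds the suboptimality of x, so gap <= eps is a sound
    stopping condition, unlike a fixed worst-case iteration count.
    """
    x, gap = None, float("inf")
    for it in range(max_iter):
        x = step()
        gap = primal_bound(x) - dual_bound(x)
        if gap <= eps:
            return x, gap, it + 1
    return x, gap, max_iter
```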

  10. Smoothing Selection
      Strong smoothing: fast optimization, but low precision. Weak smoothing: slow optimization, but high precision.

  11. Smoothing Selection
      Precision budget: $\varepsilon = 2\delta$, where $\delta$ is the error introduced by smoothing.
      Nesterov: worst-case estimate of $\delta$. Ours: adaptive estimate.
      [Plot: Tsukuba dataset, precision about 0.3%]
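One way to read $\varepsilon = 2\delta$ is as an even split of the precision budget between the smoothing error and the optimization error. The worst-case rule this implies is sketched below, assuming `d_max` bounds the prox-function $d(y)$ over the feasible set; the adaptive scheme from the talk would instead tighten this estimate during the run.

```python
def smoothing_for_precision(eps, d_max):
    """Worst-case choice of the smoothing parameter rho.

    The smoothing shifts the objective by at most delta = rho * d_max,
    so spending half the precision budget on it (eps = 2 * delta)
    gives rho = eps / (2 * d_max).
    """
    return eps / (2.0 * d_max)
```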

  12. Lipschitz Constant (Step-Size) Estimation
      Gradient step: $x = y + \frac{1}{L} \nabla f(y)$.
      Nesterov: worst-case estimate of $L$. Ours: adaptive estimate of $L$ – without violating the theory!
      [Plot: Tsukuba dataset, precision about 3%]
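The talk does not spell out its adaptive rule, so the sketch below shows one standard backtracking scheme that keeps the convergence theory intact: try an optimistic (small) $L$, and double it until the ascent step achieves the increase that an $L$-Lipschitz gradient guarantees.

```python
import numpy as np

def adaptive_gradient_step(f, grad_f, y, L, shrink=2.0, grow=2.0):
    """One gradient-ascent step x = y + (1/L) * grad f(y) on a concave f,
    with backtracking on the Lipschitz estimate L."""
    L = L / shrink                 # optimistically try a larger step first
    g = grad_f(y)
    while True:
        x = y + g / L
        # If grad f is L-Lipschitz, the step must gain at least ||g||^2 / (2L):
        if f(x) >= f(y) + g.dot(g) / (2.0 * L):
            return x, L            # accept; reuse this L at the next iteration
        L *= grow                  # estimate was too small: backtrack

# Usage on a toy concave quadratic f(y) = -||y||^2 (true L = 2):
f = lambda v: -v.dot(v)
grad_f = lambda v: -2.0 * v
x, L = adaptive_gradient_step(f, grad_f, np.array([1.0, -2.0]), L=1.0)
print(x, L)  # lands at the maximizer [0, 0] and recovers L = 2.0
```

Since accepted steps still satisfy the sufficient-increase inequality, the $O(1/\varepsilon)$ guarantee is preserved, while the typical step can be much larger than the worst-case $1/L$.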

  13. Comparison to Other Approaches
      [Plots: random synthetic 20x20 grid model with 5 labels, and the Tsukuba dataset]

  14. Summary
      Contributions:
      - Improved convergence estimate: $O(1/\varepsilon)$ vs. $O(1/\varepsilon^2)$.
      - Sound stopping condition: $\min_x \max_y g(x, y) - \max_y \min_x g(x, y) \leq \varepsilon$.
      - Fine-grained parallelization properties.
      - Applicable to arbitrary graphs and arbitrary potentials.
      Future work:
      - Examine the primal-dual viewpoint – EMMCVPR 2011.
      - Application in structured prediction and learning.

  15. Comparison to V. Jojic, S. Gould, and D. Koller. Accelerated dual decomposition ... 2010
      [Plots: primal LP solution and primal integer solution; synthetic 20x20 grid, 5 labels]
