iPiano: Inertial Proximal Algorithm for Non-convex Optimization - PowerPoint PPT Presentation

  1. iPiano: Inertial Proximal Algorithm for Non-convex Optimization. Thomas Pock, Institute for Computer Graphics and Vision, Graz University of Technology. MOBIS Workshop, University of Graz, July 5th, 2014. Joint work with: P. Ochs, T. Brox (University of Freiburg), Y. Chen (Graz University of Technology).

  2. Energy minimization methods
     ◮ Typical variational approaches to solve inverse problems consist of a regularization term and a data term,
       $\min_u \{ E(u \mid f) = R(u) + D(u, f) \}$,
       where $f$ is the input data and $u$ is the unknown solution (a concrete instance is sketched after this slide)
     ◮ Low-energy states reflect the physical properties of the problem
     ◮ The minimizer provides the best (in the sense of the model) solution to the problem
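
For concreteness, one classical instance of this splitting is total-variation (TV) denoising, where $R(u)$ is the discrete total variation and $D(u, f) = \frac{\lambda}{2}\|u - f\|_2^2$. The Python sketch below evaluates such an energy; the helper names and the value of $\lambda$ are illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def tv(u):
    # Anisotropic discrete total variation: sum of absolute forward differences.
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def energy(u, f, lam=10.0):
    # E(u | f) = R(u) + D(u, f) with R = TV and D = (lam / 2) * ||u - f||_2^2.
    return tv(u) + 0.5 * lam * np.sum((u - f) ** 2)

f = np.random.rand(64, 64)          # noisy input data
u = np.full_like(f, f.mean())       # one crude candidate solution (a constant image)
print(energy(f, f), energy(u, f))   # lower energy = better solution in the sense of the model
```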

  3. Optimization problems are unsolvable
     Consider the following general mathematical optimization problem:
       $\min f_0(x)$ s.t. $f_i(x) \le 0, \; i = 1, \dots, m, \quad x \in X$,
     where $f_0(x), \dots, f_m(x)$ are real-valued functions, $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$ is an $n$-dimensional real-valued vector, and $X$ is a subset of $\mathbb{R}^n$. How to solve this problem?
     ◮ Naive: "Download a commercial package ..."
     ◮ Reality: "Finding a solution is far from being trivial!"
     ◮ Efficiently finding solutions to the whole class of Lipschitz continuous problems is a hopeless case [Nesterov '04]
     ◮ It can take several million years even for small problems with only 10 unknowns
     ◮ "Optimization problems are unsolvable" [Nesterov '04]

  4. Convex versus non-convex
     "The great watershed in optimization is not between linearity and non-linearity, but convexity and non-convexity." R. Rockafellar, 1993
     ◮ Convex problems
       ◮ Any local minimizer is a global minimizer
       ◮ Result is independent of the initialization
       ◮ Convex models are often inferior
     ◮ Non-convex problems
       ◮ In general no chance to find the global minimizer
       ◮ Result strongly depends on the initialization
       ◮ Often give more accurate models

  5. Non-convex optimization problems
     ◮ Smooth non-convex problems can be solved via generic nonlinear numerical optimization algorithms (SD, CG, BFGS, ...)
       ◮ Hard to generalize to constraints or non-differentiable functions
       ◮ The line-search procedure can be time intensive
     ◮ A reasonable idea is to develop algorithms for special classes of structured non-convex problems
     ◮ A promising class of problems that has a moderate degree of non-convexity is given by the sum of a smooth non-convex function and a non-smooth convex function [Sra '12], [Chouzenoux, Pesquet, Repetti '13]

  6. Problem definition
     ◮ We consider the problem of minimizing a function $h : X \to \mathbb{R} \cup \{+\infty\}$,
       $\min_{x \in X} h(x) = f(x) + g(x)$,
       where $X$ is a finite dimensional real vector space.
     ◮ We assume that $h$ is coercive, i.e. $\|x\|_2 \to +\infty \Rightarrow h(x) \to +\infty$, and bounded from below by some value $\underline{h} > -\infty$
     ◮ The function $f$ is possibly non-convex but has a Lipschitz continuous gradient, i.e. $\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2$
     ◮ The function $g$ is a proper, lower semi-continuous, convex function with an efficient-to-compute proximal map
       $(I + \alpha \partial g)^{-1}(\hat{x}) := \arg\min_{x \in X} \frac{\|x - \hat{x}\|_2^2}{2} + \alpha g(x)$,
       where $\alpha > 0$ (a concrete proximal map is sketched after this slide).
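
The slide only asks that the proximal map of $g$ be cheap to evaluate. A standard example, assumed here and not fixed by the talk, is $g(x) = \lambda \|x\|_1$, whose proximal map reduces to elementwise soft-thresholding. A minimal sketch:

```python
import numpy as np

def prox_l1(x_hat, alpha, lam=1.0):
    # (I + alpha * dg)^{-1}(x_hat) for g(x) = lam * ||x||_1, i.e. the minimizer of
    # ||x - x_hat||_2^2 / 2 + alpha * lam * ||x||_1: elementwise soft-thresholding.
    t = alpha * lam
    return np.sign(x_hat) * np.maximum(np.abs(x_hat) - t, 0.0)

print(prox_l1(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), alpha=1.0))  # -> [-1. -0.  0.  0.  2.]
```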

  7. Forward-backward splitting
     ◮ We aim at seeking a critical point $x^*$, i.e. a point satisfying $0 \in \partial h(x^*)$, which in our case becomes $-\nabla f(x^*) \in \partial g(x^*)$.
     ◮ A critical point can also be characterized via the proximal residual
       $r(x) := x - (I + \partial g)^{-1}(x - \nabla f(x))$,
       where $I$ is the identity map.
     ◮ Clearly $r(x^*) = 0$ implies that $x^*$ is a critical point.
     ◮ The norm of the proximal residual can be used as a (bad) measure of optimality
     ◮ The proximal residual already suggests an iterative method of the form
       $x^{n+1} = (I + \alpha \partial g)^{-1}(x^n - \alpha \nabla f(x^n))$
       (a code sketch follows after this slide)
     ◮ For $f$ convex, this algorithm is well studied [Lions, Mercier '79], [Tseng '91], [Daubechies et al. '04], [Combettes, Wajs '05], [Raguet, Fadili, Peyré '13]
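
Put together, the iteration above is the classical forward-backward (proximal gradient) loop. Below is a minimal, generic sketch; the test problem (a least-squares $f$ with an $\ell_1$ term $g$), the function names, and the step size $\alpha = 1/L$ are assumptions made only so the snippet runs, not details from the talk. The stopping test uses the step difference $\|x^n - x^{n+1}\|$, which plays the role of a scaled proximal residual.

```python
import numpy as np

def forward_backward(x0, grad_f, prox_g, alpha, max_iter=500, tol=1e-8):
    # Iterate x^{n+1} = prox_{alpha g}(x^n - alpha * grad_f(x^n)); stop when the
    # step residual ||x^n - x^{n+1}|| (a scaled proximal residual) is small.
    x = x0.copy()
    for _ in range(max_iter):
        x_new = prox_g(x - alpha * grad_f(x), alpha)   # forward (gradient) + backward (prox) step
        if np.linalg.norm(x - x_new) < tol:
            return x_new
        x = x_new
    return x

# Assumed test problem: f(x) = 0.5 * ||A x - b||^2 (smooth), g(x) = ||x||_1 (non-smooth, convex).
A = np.random.randn(20, 10)
b = np.random.randn(20)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda x_hat, a: np.sign(x_hat) * np.maximum(np.abs(x_hat) - a, 0.0)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2                # 1/L, with L the Lipschitz constant of grad_f
x_star = forward_backward(np.zeros(10), grad_f, prox_g, alpha)
```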

  8. Inertial/accelerated methods
     ◮ Inertial: introduced by Polyak in [Polyak '64] as a special case of multi-step algorithms for minimizing a $\mu$-strongly convex function:
       $x^{n+1} = x^n - \alpha \nabla f(x^n) + \beta (x^n - x^{n-1})$
       (a code sketch follows after this slide)
     ◮ Can be seen as an explicit finite-difference discretization of the heavy-ball-with-friction dynamical system
       $\ddot{x}(t) + \gamma \dot{x}(t) + \nabla f(x(t)) = 0$.
     [Figure omitted. Source: Stich et al.]
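
A minimal sketch of Polyak's heavy-ball update as written on the slide. The quadratic test function and the classical parameter choice for a $\mu$-strongly convex $f$ with $L$-Lipschitz gradient, $\alpha = 4/(\sqrt{L}+\sqrt{\mu})^2$ and $\beta = ((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}))^2$, are assumptions added here for illustration only.

```python
import numpy as np

def heavy_ball(x0, grad_f, alpha, beta, max_iter=500):
    # Polyak's update: x^{n+1} = x^n - alpha * grad_f(x^n) + beta * (x^n - x^{n-1}).
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(max_iter):
        x_next = x - alpha * grad_f(x) + beta * (x - x_prev)  # gradient step plus inertial term
        x_prev, x = x, x_next
    return x

# Assumed toy problem: a strongly convex quadratic f(x) = 0.5 * x^T Q x with mu = 1, L = 10.
Q = np.diag([1.0, 10.0])
grad_f = lambda x: Q @ x
mu, L = 1.0, 10.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                        # classical heavy-ball tuning
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
print(heavy_ball(np.array([5.0, 5.0]), grad_f, alpha, beta))         # converges to the minimizer 0
```

As the algorithm's name suggests, iPiano combines such an inertial term with the proximal (backward) step of forward-backward splitting, which the talk develops in the following slides.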
