Optimization for data processing at a large scale
Sparsity4PSL Summer School

Emilie Chouzenoux
Center for Visual Computing, CentraleSupélec, INRIA Saclay
24 June 2019
Inverse problems and large scale optimization

[Figure: Microscopy, ISBI Challenge 2013, F. Soulez — original image $x \in \mathbb{R}^N$ and degraded image $z = D(Hx) \in \mathbb{R}^M$]

◮ $H \in \mathbb{R}^{M \times N}$: matrix associated with the degradation operator.
◮ $D : \mathbb{R}^M \to \mathbb{R}^M$: noise degradation.

Inverse problem: Find a good estimate of $x$ from the observations $z$, using some a priori knowledge on $x$ and on the noise characteristics.
Inverse problems and large scale optimization

Inverse problem: Find an estimate $\hat{x}$ close to $x$ from the observations $z = D(Hx)$.

◮ Inverse filtering (if $M = N$ and $H$ is invertible):
$$\hat{x} = H^{-1} z = H^{-1}(Hx + b) = x + H^{-1} b, \quad \text{if } b \in \mathbb{R}^M \text{ is an additive noise.}$$
→ Closed-form expression, but amplification of the noise if $H$ is ill-conditioned (ill-posed problem).
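The noise amplification can be observed numerically. The sketch below (an illustration, with an arbitrarily constructed operator, not taken from the slides) builds an ill-conditioned $H$ with singular values decaying from $1$ to $10^{-6}$, then applies inverse filtering to noisy observations:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50

# Ill-conditioned degradation operator: random orthogonal factors with
# fast-decaying singular values (condition number ~ 1e6).
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = np.logspace(0, -6, N)
H = U @ np.diag(s) @ V.T

x = rng.standard_normal(N)          # original signal
b = 1e-4 * rng.standard_normal(N)   # small additive noise
z = H @ x + b                       # observations

x_hat = np.linalg.solve(H, z)       # inverse filtering: x + H^{-1} b

print(np.linalg.norm(b))            # small noise level
print(np.linalg.norm(x_hat - x))    # reconstruction error, amplified by H^{-1}
```

Even though the noise is tiny, the reconstruction error is several orders of magnitude larger, because the noise components aligned with the small singular values of $H$ are divided by them.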
Inverse problems and large scale optimization

Inverse problem: Find an estimate $\hat{x}$ close to $x$ from the observations $z = D(Hx)$.

◮ Inverse filtering (ruled out)
◮ Variational approach:
$$\hat{x} \in \underset{x \in \mathbb{R}^N}{\operatorname{Argmin}} \; \underbrace{f_1(x)}_{\text{Data fidelity term}} + \underbrace{f_2(x)}_{\text{Regularization term}}$$

Examples of data fidelity term
◮ Gaussian noise: $(\forall x \in \mathbb{R}^N)\quad f_1(x) = \dfrac{1}{\sigma^2}\,\|Hx - z\|^2$
◮ Poisson noise: $(\forall x \in \mathbb{R}^N)\quad f_1(x) = \displaystyle\sum_{m=1}^{M} \left([Hx]^{(m)} - z^{(m)} \log([Hx]^{(m)})\right)$
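Both data fidelity terms are straightforward to evaluate. A minimal sketch (function names are illustrative, not from the slides):

```python
import numpy as np

def gaussian_fidelity(x, H, z, sigma):
    """Least-squares data fidelity (1/sigma^2) * ||Hx - z||^2 for Gaussian noise."""
    r = H @ x - z
    return (r @ r) / sigma**2

def poisson_fidelity(x, H, z):
    """Data fidelity sum_m [Hx]_m - z_m * log([Hx]_m) for Poisson noise.
    Requires Hx > 0 componentwise for the logarithm to be defined."""
    Hx = H @ x
    return np.sum(Hx - z * np.log(Hx))

# Tiny sanity check: with H = I, x = z, the Gaussian term vanishes.
H = np.eye(2)
x = np.ones(2)
z = np.ones(2)
print(gaussian_fidelity(x, H, z, 1.0))   # 0.0
print(poisson_fidelity(x, H, z))         # sum of (1 - 1*log 1) = 2.0
```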
Examples of regularization terms (1)

◮ Admissibility constraints: find $x \in C = \bigcap_{m=1}^{M} C_m$, where $(\forall m \in \{1, \dots, M\})\; C_m \subset \mathbb{R}^N$.
◮ Variational formulation:
$$(\forall x \in \mathbb{R}^N)\quad f_2(x) = \sum_{m=1}^{M} \iota_{C_m}(x)$$
where, for all $m \in \{1, \dots, M\}$, $\iota_{C_m}$ is the indicator function of $C_m$:
$$(\forall x \in \mathbb{R}^N)\quad \iota_{C_m}(x) = \begin{cases} 0 & \text{if } x \in C_m \\ +\infty & \text{otherwise.} \end{cases}$$
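The sum of indicators is exactly the constraint "$x$ belongs to every $C_m$": it is $0$ on the intersection and $+\infty$ as soon as one constraint is violated. A small sketch, with illustrative constraint sets of my own choosing (nonnegativity and a box):

```python
import numpy as np

def indicator(x, in_set):
    """Indicator function of a set C described by a membership test `in_set`:
    0 if x is in C, +inf otherwise."""
    return 0.0 if in_set(x) else np.inf

# Example constraint sets (illustrative choices, not from the slides).
nonneg = lambda x: np.all(x >= 0)
box01 = lambda x: np.all((x >= 0) & (x <= 1))

x = np.array([0.2, 0.7])
f2 = indicator(x, nonneg) + indicator(x, box01)   # sum over the constraint sets
print(f2)   # 0.0: x satisfies both constraints
```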
Examples of regularization terms (2)

◮ $\ell_1$ norm (analysis approach):
$$(\forall x \in \mathbb{R}^N)\quad f_2(x) = \sum_{k=1}^{K} \big|[Fx]^{(k)}\big| = \|Fx\|_1$$
where $F \in \mathbb{R}^{K \times N}$ ($K \ge N$) is a frame decomposition operator mapping the signal $x$ to its frame coefficients.

◮ Total variation:
$$(\forall x = (x^{(i_1, i_2)})_{1 \le i_1 \le N_1,\, 1 \le i_2 \le N_2} \in \mathbb{R}^{N_1 \times N_2})\quad f_2(x) = \operatorname{tv}(x) = \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \|\nabla x^{(i_1, i_2)}\|_2$$
where $\nabla x^{(i_1, i_2)}$ is the discrete gradient at pixel $(i_1, i_2)$.
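The total variation is easy to compute once a discrete gradient is fixed. The sketch below uses forward differences with zero padding at the boundary, one common convention among several (the slides do not specify which one is used):

```python
import numpy as np

def total_variation(x):
    """Isotropic total variation of a 2-D image, with forward differences
    and zero boundary conditions (an assumption, one convention among several)."""
    dx = np.zeros_like(x)
    dy = np.zeros_like(x)
    dx[:-1, :] = x[1:, :] - x[:-1, :]   # vertical differences
    dy[:, :-1] = x[:, 1:] - x[:, :-1]   # horizontal differences
    return np.sum(np.sqrt(dx**2 + dy**2))

print(total_variation(np.ones((4, 4))))                  # 0.0: constant image
print(total_variation(np.array([[0., 1.], [0., 1.]])))   # 2.0: one unit jump per row
```

TV penalizes the total length of intensity jumps, which is why it favors piecewise-constant (cartoon-like) reconstructions.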
Inverse problems and large scale optimization

Inverse problem: Find an estimate $\hat{x}$ close to $x$ from the observations $z = D(Hx)$.

◮ Inverse filtering (ruled out)
◮ Variational approach (more general context):
$$\hat{x} \in \underset{x \in \mathbb{R}^N}{\operatorname{Argmin}} \; \sum_{i=1}^{m} f_i(x)$$
where each $f_i$ may denote a data fidelity term, a (hybrid) regularization term, or a constraint.

→ Often no closed-form expression, or a solution that is expensive to compute (especially in a large scale context).
◮ Need for an efficient iterative minimization strategy!
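As a first, simple instance of an iterative minimization strategy, here is plain gradient descent on a differentiable least-squares objective $f(x) = \|Hx - z\|^2$ (an illustrative example, not a method singled out by the slides; more sophisticated schemes come later in the lecture):

```python
import numpy as np

def gradient_descent(grad_f, x0, step, n_iter=500):
    """Iterative minimization by gradient descent, assuming the objective
    is differentiable with gradient `grad_f` and the step is small enough."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x - step * grad_f(x)
    return x

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 10))
z = rng.standard_normal(30)

grad = lambda x: 2 * H.T @ (H @ x - z)          # gradient of ||Hx - z||^2
step = 1.0 / (2 * np.linalg.norm(H, 2) ** 2)    # 1/L, L = Lipschitz constant of grad
x_hat = gradient_descent(grad, np.zeros(10), step, n_iter=2000)
```

With the step chosen as the inverse of the gradient's Lipschitz constant, the iterates converge to the least-squares solution without ever inverting a matrix, which is precisely what makes such first-order schemes attractive at a large scale.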
Main challenges

◮ How to exploit the mathematical properties of each term involved in $f$? How to handle constraints efficiently? How to deal with nondifferentiable terms in $f$? Which convergence results can be expected if $f$ is nonconvex?
◮ How to reduce the memory requirements of an optimization algorithm? How to avoid large-size matrix inversion?
◮ What are the benefits of block alternating strategies? What are their convergence guarantees?
◮ How to accelerate the convergence speed of a first-order (gradient-like) optimization method?
Outline

1. Introduction to optimization
◮ Notation/definitions
◮ Existence and uniqueness of minimizers
◮ Differential/subdifferential
◮ Optimality conditions

2. Majorization-Minimization approaches
◮ Majorization-Minimization principle
◮ Majorization techniques
◮ MM quadratic methods
◮ Forward-backward algorithm
◮ Block-coordinate MM algorithms
Introduction to optimization
Domain of a function

Let $f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$.
◮ The domain of $f$ is $\operatorname{dom} f = \{x \in \mathbb{R}^N \mid f(x) < +\infty\}$.
◮ The function $f$ is proper if $\operatorname{dom} f \neq \varnothing$.
Indicator function

Let $C \subset \mathbb{R}^N$. The indicator function of $C$ is
$$(\forall x \in \mathbb{R}^N)\quad \iota_C(x) = \begin{cases} 0 & \text{if } x \in C \\ +\infty & \text{otherwise.} \end{cases}$$
Epigraph

Let $f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$. The epigraph of $f$ is
$$\operatorname{epi} f = \big\{(x, \zeta) \in \operatorname{dom} f \times \mathbb{R} \;\big|\; f(x) \le \zeta\big\}.$$
Lower semi-continuous function

Let $f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$. The function $f$ is lower semi-continuous on $\mathbb{R}^N$ if and only if its epigraph $\operatorname{epi} f$ is closed.
Convex set

$C \subset \mathbb{R}^N$ is a convex set if
$$(\forall (x, y) \in C^2)(\forall \alpha \in \,]0, 1[\,)\quad \alpha x + (1 - \alpha) y \in C.$$
Coercive function

Let $f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$. The function $f$ is coercive if $\lim_{\|x\| \to +\infty} f(x) = +\infty$.
Convex function

$f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is a convex function if
$$\big(\forall (x, y) \in (\mathbb{R}^N)^2\big)(\forall \alpha \in \,]0, 1[\,)\quad f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y).$$
◮ $f$ is convex ⇔ its epigraph is convex.
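The defining inequality can be probed numerically along a segment. The sketch below is a sanity check on sample points only (it can refute convexity but never prove it):

```python
import numpy as np

def violates_convexity(f, x, y, alphas=np.linspace(0.01, 0.99, 99)):
    """Test f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) along the segment [x, y].
    Returns True if some sampled a violates the convexity inequality.
    This is a numerical check on samples, not a proof of convexity."""
    fx, fy = f(x), f(y)
    return any(f(a * x + (1 - a) * y) > a * fx + (1 - a) * fy + 1e-12
               for a in alphas)

f_convex = lambda v: np.sum(v**2)        # squared norm: convex
f_nonconvex = lambda v: -np.sum(v**2)    # concave, hence not convex

x = np.array([1.0, 0.0])
y = np.array([-1.0, 2.0])
print(violates_convexity(f_convex, x, y))      # False
print(violates_convexity(f_nonconvex, x, y))   # True
```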
Strictly convex function

$f : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is strictly convex if
$$(\forall x \in \operatorname{dom} f)(\forall y \in \operatorname{dom} f)(\forall \alpha \in \,]0, 1[\,)\quad x \neq y \;\Rightarrow\; f(\alpha x + (1 - \alpha) y) < \alpha f(x) + (1 - \alpha) f(y).$$