STOCHASTIC FISTA ALGORITHMS: SO FAST?

G. Fort (1), L. Risser (1), Y. Atchadé (2), E. Moulines (3)

(1) IMT, Université de Toulouse & CNRS, F-31062 Toulouse, France.
(2) Department of Statistics, Univ. of Michigan, 1085 South University Ave, Ann Arbor 48109, MI, USA.
(3) CMAP, École Polytechnique, Route de Saclay, 91128 Palaiseau Cedex, France.

This work is partially supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02.

ABSTRACT

Motivated by challenges in Computational Statistics, such as Penalized Maximum Likelihood inference in statistical models with intractable likelihoods, we analyze the convergence of a stochastic perturbation of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) when the stochastic approximation relies on a biased Monte Carlo estimation, as happens when the points are drawn from a Markov chain Monte Carlo (MCMC) sampler. We first motivate this general framework and then show a convergence result for the perturbed FISTA algorithm. We discuss the convergence rate of this algorithm and the computational cost of the Monte Carlo approximation needed to reach a given precision. Finally, through a numerical example, we explore new directions for a better understanding of these Proximal-Gradient based stochastic optimization algorithms.

Index Terms — Computational Statistics, Stochastic Approximation, Markov chain Monte Carlo, Proximal-Gradient algorithms, Nesterov acceleration.

1. INTRODUCTION

In various analyses, we are faced with solving

\[
\operatorname{argmin}_{\theta \in \Theta} \, \big( f(\theta) + g(\theta) \big),
\tag{1}
\]

where the set Θ and the functions f, g satisfy:

A1. g : R^d → [0, +∞] is convex, not identically +∞, and lower semi-continuous; f : R^d → R ∪ {+∞} is continuously differentiable on Θ := {θ ∈ R^d : g(θ) + |f(θ)| < ∞} and its gradient is L-Lipschitz on Θ;

A2. for any θ ∈ R^d, ∇f(θ) = ∫_X H(θ, x) π_θ(dx), where X is a topological space endowed with its Borel σ-field, π_θ is a probability measure on X, and H : R^d × X → R^d is measurable; in addition, x ↦ H(θ, x) is π_θ-integrable for any θ ∈ R^d;

and the gradient ∇f is not explicit. Motivated by situations arising in Computational Statistics (see the examples in Section 2), we consider the case when only an approximation of ∇f(θ) is available, possibly a stochastic approximation and, if so, possibly a biased one. In the present paper, our main contribution is a convergence analysis of a numerical tool to solve Eq. (1), namely a stochastic perturbation of FISTA (see [1]), in the challenging situation when the perturbation comes from a stochastic and biased approximation of ∇f.
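To fix ideas, the following sketch shows what such a stochastic perturbation of FISTA looks like: the classical recursion of [1], with the exact gradient of f replaced by a (possibly biased) estimator. This is a minimal illustration under assumptions of our own, not the exact scheme analyzed in the paper: g is taken to be the ℓ1 penalty λ‖·‖_1 (so the proximal map is componentwise soft-thresholding), the step size γ is constant (typically 1/L), and grad_estimate is a hypothetical user-supplied callable standing for the Monte Carlo approximation of ∇f.

```python
import numpy as np

def soft_threshold(v, a):
    # Proximal map of a * ||.||_1: componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - a, 0.0)

def stochastic_fista(grad_estimate, theta0, gamma, lam, n_iter):
    # FISTA recursion where the exact gradient of f is replaced by the
    # (possibly biased, possibly stochastic) estimator `grad_estimate`.
    theta = np.asarray(theta0, dtype=float)
    tau = theta.copy()   # Nesterov extrapolated point
    t = 1.0
    for _ in range(n_iter):
        g_hat = grad_estimate(tau)                                    # approximation of grad f(tau)
        theta_new = soft_threshold(tau - gamma * g_hat, gamma * lam)  # proximal-gradient step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))              # momentum sequence
        tau = theta_new + ((t - 1.0) / t_new) * (theta_new - theta)   # Nesterov acceleration
        theta, t = theta_new, t_new
    return theta
```

When grad_estimate returns the exact gradient, this reduces to the deterministic FISTA of [1]; in the examples of Section 2 below, it would wrap an MCMC average of the form sketched at the end of that section.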
2. PENALIZED MAXIMUM LIKELIHOOD ESTIMATION IN MODELS WITH INTRACTABLE LIKELIHOOD

In this section, two classes of problems arising in Computational Statistics, and illustrating the question (1) in the framework A1-A2, are presented. The first situation corresponds to the computation of the Penalized Maximum Likelihood, or equivalently the Bayesian Maximum a Posteriori estimator, in latent variable models. In that case, g stands for the penalty term on the parameter θ (in the Bayesian context, the prior on the parameter), while f is the normalized negative log-likelihood: for latent variable models, it is of the form (see e.g. [2])

\[
f(\theta) = -\ell_N(\theta) := -\frac{1}{N} \log \int_{\mathsf{X}} p(x, \theta)\, \mathrm{d}\mu(x),
\tag{2}
\]

where, for any θ, p(·, θ) dµ is the complete data likelihood and the latent variables x take values in X (µ is a positive σ-finite measure, such as the Lebesgue measure when X ⊆ R^p or the counting measure when X is countable). In (2), the dependence upon the N observations is omitted. Under regularity conditions on the model,

\[
\nabla f(\theta) = -\frac{1}{N} \int_{\mathsf{X}} \partial_\theta \log p(x, \theta)\, \mathrm{d}\pi_\theta(x),
\tag{3}
\]

where

\[
\mathrm{d}\pi_\theta(x) := \frac{p(x, \theta)\, \mathrm{d}\mu(x)}{\int_{\mathsf{X}} p(u, \theta)\, \mathrm{d}\mu(u)} = \frac{p(x, \theta)\, \mathrm{d}\mu(x)}{\exp(N \ell_N(\theta))}
\tag{4}
\]

is the a posteriori distribution (of the latent variables, given the observations, when the parameter is θ), which is known up to a normalizing constant. In this example, the computation of the gradient ∇f is numerically intractable: the gradient is an expectation with respect to a distribution known up to a normalizing constant. This integral can be approximated by a Monte Carlo sum computed from the output of an MCMC sampler (see e.g. [3, Chapter 6]), thus providing a biased stochastic approximation of the exact gradient. Note indeed that if {X_{j,θ}, j ≥ 0} is a (non-stationary) ergodic Markov chain produced by an MCMC sampler with target dπ_θ, then for any positive measurable function h,

\[
\mathbb{E}\Big[ \frac{1}{m} \sum_{j=1}^{m} h(X_{j,\theta}) \Big] - \int_{\mathsf{X}} h \, \mathrm{d}\pi_\theta \neq 0,
\]

but this bias vanishes when m → ∞ (see e.g. [4, Chapter 13]); a schematic implementation of such an estimator of (3) is sketched below.

The second situation corresponds to the computation of the Penalized Maximum Likelihood estimator in a binary graphical model.
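Returning to the first example, here is a minimal sketch, under assumptions of our own, of the biased Monte Carlo estimator of ∇f(θ) in (3): the output of a random-walk Metropolis-Hastings chain targeting π_θ, which requires p(·, θ) only up to its normalizing constant, is averaged. The names log_p (for x ↦ log p(x, θ)) and H (for the integrand in A2, here (θ, x) ↦ −(1/N) ∂_θ log p(x, θ)) are hypothetical placeholders.

```python
import numpy as np

def mcmc_gradient_estimate(theta, log_p, H, x0, m, step=0.5, rng=None):
    # Estimate grad f(theta) = \int H(theta, x) dpi_theta(x) (cf. A2 and (3))
    # by averaging H along a random-walk Metropolis chain whose invariant
    # law is pi_theta, known only up to a normalizing constant.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    running_sum = np.zeros_like(H(theta, x))
    for _ in range(m):
        proposal = x + step * rng.standard_normal(x.shape)
        # Acceptance ratio: the normalizing constant exp(N * l_N(theta)) cancels.
        if np.log(rng.uniform()) < log_p(proposal, theta) - log_p(x, theta):
            x = proposal
        running_sum = running_sum + H(theta, x)
    # The chain starts from an arbitrary x0 and no burn-in is discarded,
    # so the average is biased for finite m; the bias vanishes as m grows.
    return running_sum / m
```

Plugged into the stochastic_fista sketch of Section 1 as grad_estimate (with θ set to the current extrapolated point), this yields the kind of biased stochastic perturbation analyzed in this paper.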