Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés
(Penalized inference in models with intractable likelihood via perturbed proximal-gradient algorithms)

Gersende Fort
Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier, Toulouse, France
Based on joint works with:
Yves Atchadé (Univ. Michigan, USA) and Eric Moulines (Ecole Polytechnique, France)
  ↪ On Perturbed Proximal-Gradient algorithms (JMLR, 2016)
Edouard Ollier (ENS Lyon, France) and Adeline Samson (Univ. Grenoble Alpes, France)
  ↪ Penalized inference in Mixed Models by Proximal Gradient methods (work in progress)
Jean-François Aujol (IMB, Bordeaux, France) and Charles Dossal (IMB, Bordeaux, France)
  ↪ Acceleration for perturbed Proximal Gradient algorithms (work in progress)
Motivation: Pharmacokinetics (1/2)

N patients. For patient i, the observations $\{Y_{ij},\ 1 \le j \le J\}$ record the evolution of the concentration at times $t_{ij}$, $1 \le j \le J$; initial dose $D$.

Model:
$Y_{ij} = f(t_{ij}, X_i) + \epsilon_{ij}$, with $\epsilon_{ij}$ i.i.d. $\sim \mathcal{N}(0, \sigma^2)$
$X_i = Z_i \beta + d_i \in \mathbb{R}^L$, with $d_i$ i.i.d. $\sim \mathcal{N}_L(0, \Omega)$ and independent of $\epsilon$
$Z_i$ is a known matrix such that each row of $X_i$ has an intercept (fixed effect) and covariates.

Statistical analysis:
estimation of $(\beta, \sigma^2, \Omega)$, under sparsity constraints on $\beta$
selection of the covariates based on $\hat{\beta}$
↪ Penalized Maximum Likelihood
Motivation: Pharmacokinetics (2/2)

Model (as before):
$Y_{ij} = f(t_{ij}, X_i) + \epsilon_{ij}$, with $\epsilon_{ij}$ i.i.d. $\sim \mathcal{N}(0, \sigma^2)$
$X_i = Z_i \beta + d_i \in \mathbb{R}^L$, with $d_i$ i.i.d. $\sim \mathcal{N}_L(0, \Omega)$ and independent of $\epsilon$
$Z_i$ is a known matrix such that each row of $X_i$ has an intercept (fixed effect) and covariates.

Likelihoods:
The distribution of $\{Y_{ij}, X_i;\ 1 \le i \le N,\ 1 \le j \le J\}$ has an explicit expression.
The distribution of $\{Y_{ij};\ 1 \le i \le N,\ 1 \le j \le J\}$ does not have an explicit expression; it is the marginal of the previous one, obtained by integrating out the random effects $d_i$.
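For concreteness, here is a minimal simulation sketch of this nonlinear mixed-effects model. The particular curve f (a one-compartment profile), the construction of $Z_i$ (intercept plus one scalar covariate per patient) and all numerical values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, x, dose=4.0):
    # Hypothetical one-compartment curve; x = (log ka, log V, log Cl) are the
    # individual PK parameters. This specific f is an assumption for illustration.
    ka, V, Cl = np.exp(x)
    ke = Cl / V
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

N, J, L = 20, 8, 3                      # patients, times per patient, dim(X_i)
t = np.linspace(0.5, 12.0, J)           # sampling times, here common to all patients
beta = np.array([0.7, 0.2, 2.0, 0.0, 0.3, 0.0])   # fixed effects (sparse by design)
Omega = 0.05 * np.eye(L)                # covariance of the random effects d_i
sigma2 = 0.01

Y = np.empty((N, J))
for i in range(N):
    cov_i = rng.normal()                                   # one covariate for patient i
    Z_i = np.kron(np.eye(L), np.array([1.0, cov_i]))       # each row: intercept + covariate
    d_i = rng.multivariate_normal(np.zeros(L), Omega)
    X_i = Z_i @ beta + d_i                                  # X_i = Z_i beta + d_i
    Y[i] = f(t, X_i) + np.sqrt(sigma2) * rng.normal(size=J)
```

The goal stated on the slide is then to estimate $(\beta, \sigma^2, \Omega)$ from the observations $Y$ alone, with a sparsity-inducing penalty on $\beta$, even though integrating out the $d_i$ leaves no closed-form likelihood.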
Outline

Penalized Maximum Likelihood inference in models with intractable likelihood
  Example 1: Latent variable models
  Example 2: Discrete graphical model (Markov random field)
Numerical methods for Penalized ML in such models: Perturbed Proximal Gradient algorithms
Convergence analysis
Penalized Maximum Likelihood inference with intractable likelihood

N observations: $Y = (Y_1, \cdots, Y_N)$.
A parametric statistical model, $\theta \in \Theta \subseteq \mathbb{R}^d$: $\theta \mapsto L(\theta)$, the likelihood of the observations.
A penalty on the parameter $\theta$: $\theta \mapsto g(\theta)$, e.g. for sparsity constraints on $\theta$; usually $g$ is non-smooth and convex.

Goal: computation of
\[
  \operatorname{argmin}_{\theta \in \Theta} \Big\{ -\frac{1}{N} \log L(\theta) + g(\theta) \Big\}
\]
when the likelihood $L$ has no closed-form expression and cannot be evaluated.
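If the gradient of the smooth part $-N^{-1}\log L$ were available, this composite objective would be the natural target of the proximal gradient algorithm, $\theta_{k+1} = \operatorname{prox}_{\gamma g}\big(\theta_k + \gamma \nabla (N^{-1}\log L)(\theta_k)\big)$. Below is a minimal sketch of one such iteration, assuming the $\ell_1$ penalty $g(\theta) = \lambda \|\theta\|_1$ (an illustrative choice whose proximal map is soft-thresholding).

```python
import numpy as np

def prox_l1(theta, threshold):
    # Proximal operator of g(theta) = lam * ||theta||_1, i.e. componentwise
    # soft-thresholding with threshold = gamma * lam.
    return np.sign(theta) * np.maximum(np.abs(theta) - threshold, 0.0)

def proximal_gradient_step(theta, grad_loglik, gamma, lam):
    # One iteration theta_{k+1} = prox_{gamma g}( theta_k + gamma * grad_loglik ),
    # where grad_loglik is an evaluation (or an approximation) of
    # (1/N) * grad log L at theta_k, and gamma > 0 is the step size.
    return prox_l1(theta + gamma * grad_loglik, gamma * lam)
```

In the setting of this talk the gradient term cannot be computed exactly; the following slides replace it with a Monte Carlo approximation, which is what makes the proximal gradient algorithm "perturbed".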
Example 1: Latent variable model

The log-likelihood of the observations $Y$ is of the form $\theta \mapsto \log L(\theta)$ with
\[
  L(\theta) = \int_{\mathsf{X}} p_\theta(x)\, \mu(dx),
\]
where $\mu$ is a positive $\sigma$-finite measure on a set $\mathsf{X}$; $x$ are the missing/latent data and $(x, Y)$ are the complete data.

In these models, the complete likelihood $p_\theta(x)$ can be evaluated explicitly, but the likelihood has no closed-form expression.

The exact integral could be replaced by a Monte Carlo approximation; this is known to be inefficient. Numerical methods based on the a posteriori distribution of the missing data are preferred (see e.g. Expectation-Maximization approaches).
↪ What about the gradient of the (log-)likelihood?
Gradient of the likelihood in a latent variable model

\[
  \log L(\theta) = \log \int p_\theta(x)\, \mu(dx)
\]

Under regularity conditions, $\theta \mapsto \log L(\theta)$ is $C^1$ and
\[
  \nabla \log L(\theta)
  = \frac{\int \partial_\theta p_\theta(x)\, \mu(dx)}{\int p_\theta(z)\, \mu(dz)}
  = \int \partial_\theta \log p_\theta(x)\; \underbrace{\frac{p_\theta(x)\, \mu(dx)}{\int p_\theta(z)\, \mu(dz)}}_{\text{the a posteriori distribution}} .
\]

Hence the gradient of the normalized log-likelihood
\[
  \nabla_\theta \Big( \frac{1}{N} \log L(\theta) \Big) = \int H_\theta(x)\, \pi_\theta(dx)
\]
is an intractable expectation w.r.t. $\pi_\theta$, the conditional distribution of the latent variable given the observations $Y$. For all $(x, \theta)$, $H_\theta(x)$ can be evaluated.
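As a sanity check (a toy example not taken from the slides), consider a single observation $Y$ with latent variable $x \sim \mathcal{N}(\theta, 1)$ and $Y \mid x \sim \mathcal{N}(x, \sigma^2)$; the identity above can then be verified in closed form. Writing $\varphi_v$ for the $\mathcal{N}(0, v)$ density,
\[
  \partial_\theta \log p_\theta(x)
  = \partial_\theta \big[ \log \varphi_{\sigma^2}(Y - x) + \log \varphi_{1}(x - \theta) \big]
  = x - \theta ,
\]
so that
\[
  \nabla \log L(\theta) = \mathbb{E}_{\pi_\theta}[\,x - \theta\,]
  = \frac{Y/\sigma^2 + \theta}{1/\sigma^2 + 1} - \theta
  = \frac{Y - \theta}{1 + \sigma^2},
\]
which agrees with differentiating $\log L(\theta) = \log \mathcal{N}(Y;\ \theta,\ 1 + \sigma^2)$ directly.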
Approximation of the gradient

\[
  \nabla_\theta \Big( \frac{1}{N} \log L(\theta) \Big) = \int_{\mathsf{X}} H_\theta(x)\, \pi_\theta(dx)
\]

1. Quadrature techniques: poor behavior w.r.t. the dimension of $\mathsf{X}$.
2. Use i.i.d. samples from $\pi_\theta$ to define a Monte Carlo approximation: not possible, in general.
3. Use $m$ samples from a non-stationary Markov chain $\{X_{j,\theta},\ j \ge 0\}$ with unique stationary distribution $\pi_\theta$, and define a Monte Carlo approximation. MCMC samplers provide such a chain.

Stochastic approximation of the gradient: a biased approximation,
\[
  \mathbb{E}[h(X_{j,\theta})] \neq \int h(x)\, \pi_\theta(dx).
\]
If the Markov chain is ergodic "enough", the bias vanishes when $j \to \infty$.
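A minimal sketch of option 3, assuming a random-walk Metropolis sampler and user-supplied functions `log_post` (log of an unnormalized density of $\pi_\theta$, computable since $p_\theta(x)$ is explicit) and `H` (evaluating $H_\theta(x)$); the function names, the Gaussian proposal and the step size are all illustrative assumptions.

```python
import numpy as np

def mcmc_gradient_estimate(theta, log_post, H, x0, m, step=0.5, rng=None):
    # Biased Monte Carlo estimate of  int H_theta(x) pi_theta(dx)  using m states of a
    # random-walk Metropolis chain whose stationary distribution is pi_theta.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    total = np.zeros_like(np.asarray(H(x, theta), dtype=float))
    for _ in range(m):
        prop = x + step * rng.normal(size=x.shape)
        if np.log(rng.uniform()) < log_post(prop, theta) - log_post(x, theta):
            x = prop                         # accept; otherwise keep the current state
        total += H(x, theta)
    return total / m                         # bias vanishes as the chain forgets x0
```

Plugging such an estimate in place of the exact gradient in the proximal gradient iteration sketched earlier yields a perturbed proximal gradient algorithm, whose perturbation is biased but whose bias shrinks as the chain is run longer.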
Example 2: Discrete graphical model (Markov random field)

N independent observations of an undirected graph with $p$ nodes; each node takes values in a finite alphabet $\mathsf{X}$.
The $N$ i.i.d. observations $Y_i$ take values in $\mathsf{X}^p$, with distribution
\[
  y = (y_1, \cdots, y_p) \mapsto \pi_\theta(y)
  = \frac{1}{Z_\theta} \exp\Big( \sum_{k=1}^{p} \theta_{kk} B(y_k, y_k) + \sum_{1 \le j < k \le p} \theta_{kj} B(y_k, y_j) \Big)
  = \frac{1}{Z_\theta} \exp\big( \langle \theta, \bar{B}(y) \rangle \big)
\]
where
$B$ is a symmetric function,
$\theta$ is a symmetric $p \times p$ matrix,
the normalizing constant (partition function) $Z_\theta$ cannot be computed: it is a sum over $|\mathsf{X}|^p$ terms.
Likelihood and its gradient in a Markov random field

The likelihood is of the form (the scalar product between matrices being the Frobenius inner product)
\[
  \frac{1}{N} \log L(\theta) = \Big\langle \theta,\ \frac{1}{N} \sum_{i=1}^{N} \bar{B}(Y_i) \Big\rangle - \log Z_\theta .
\]
The likelihood is intractable.
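The extracted slide stops at the likelihood. By the standard exponential-family identity (consistent with the gradient-as-intractable-expectation theme above), $\nabla_\theta\big(N^{-1}\log L(\theta)\big) = N^{-1}\sum_{i} \bar{B}(Y_i) - \mathbb{E}_{\pi_\theta}[\bar{B}(Y)]$, where the second term is again an intractable expectation that can be approximated by MCMC. The sketch below does this along a Gibbs chain for the binary case $\mathsf{X} = \{0,1\}$ with $B(u,v) = uv$; this choice of $B$, the sampler and the symmetric-matrix bookkeeping are illustrative assumptions, not the talk's method.

```python
import numpy as np

def bbar(y):
    # Matrix bar{B}(y) for B(u, v) = u*v on {0,1}: off-diagonal entries are halved so
    # that the Frobenius product <theta, bbar(y)> equals
    # sum_k theta_kk y_k + sum_{j<k} theta_kj y_k y_j  for symmetric theta.
    M = 0.5 * np.outer(y, y)
    np.fill_diagonal(M, y * y)
    return M

def gibbs_sweep(y, theta, rng):
    # One systematic-scan Gibbs sweep: the conditional law of y_k given the rest is
    # Bernoulli with logit  theta_kk + sum_{j != k} theta_kj y_j.
    for k in range(len(y)):
        logit = theta[k, k] + theta[k] @ y - theta[k, k] * y[k]
        y[k] = 1.0 if rng.uniform() < 1.0 / (1.0 + np.exp(-logit)) else 0.0
    return y

def grad_loglik_estimate(theta, Y, m, rng=None):
    # (1/N) grad log L(theta) = (1/N) sum_i bbar(Y_i) - E_{pi_theta}[ bbar(Y) ],
    # with the intractable expectation replaced by an average along a Gibbs chain.
    rng = np.random.default_rng() if rng is None else rng
    empirical = np.mean([bbar(y) for y in Y], axis=0)
    x = rng.integers(0, 2, size=theta.shape[0]).astype(float)   # arbitrary initial state
    model_term = np.zeros_like(theta, dtype=float)
    for _ in range(m):
        x = gibbs_sweep(x, theta, rng)
        model_term += bbar(x)
    return empirical - model_term / m        # biased MCMC estimate of the gradient
```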