

1. Segmentation of Counting Processes and Dynamical Models. PhD Thesis Defense, Mokhtar Zahdi Alaya, June 27, 2016.

2. Plan
   1. Motivations
   2. Learning the intensity of time events with change-points: piecewise constant intensity; estimation procedure; change-points detection + numerical experiments
   3. Binarsity: features binarization; binarsity penalization; generalized linear models + binarsity
   4. High-dimensional time-varying Aalen and Cox models: weighted (ℓ1 + ℓ1)-TV penalization; theoretical guarantees; algorithm + numerical experiments
   5. Conclusion + Perspectives

3. Plan (outline repeated from slide 2)

4. Weighted and unweighted TV
   For a chosen positive vector of weights $\hat w$, we define the (discrete) weighted total-variation (TV) by
   $$\|\beta\|_{\mathrm{TV},\hat w} = \sum_{j=2}^{p} \hat w_j\, |\beta_j - \beta_{j-1}|, \quad \text{for all } \beta \in \mathbb{R}^p.$$
   If $\hat w \equiv 1$, then we define the unweighted TV by
   $$\|\beta\|_{\mathrm{TV}} = \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|, \quad \text{for all } \beta \in \mathbb{R}^p.$$
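As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of the weighted and unweighted TV, assuming the weight vector stores one entry per consecutive difference:

```python
import numpy as np

def tv_norm(beta, weights=None):
    """Discrete (weighted) total-variation: sum_j w_j * |beta_j - beta_{j-1}|.

    `weights` has length len(beta) - 1 (one weight per consecutive difference);
    if None, the unweighted TV is returned.
    """
    diffs = np.abs(np.diff(beta))            # |beta_j - beta_{j-1}|, j = 2..p
    if weights is None:
        return diffs.sum()
    return np.dot(weights, diffs)

beta = np.array([1.0, 1.0, 3.0, 0.5])
print(tv_norm(beta))                                       # unweighted: 0 + 2 + 2.5 = 4.5
print(tv_norm(beta, weights=np.array([1.0, 0.5, 2.0])))    # weighted:   0 + 1 + 5   = 6.0
```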

5. Motivations for using TV
   Appropriate for multiple change-points estimation: partitioning a nonstationary signal into several contiguous stationary segments of variable duration [Harchaoui and Lévy-Leduc (2010)].
   Widely used in sparse signal processing and imaging (2D) [Chambolle et al. (2010)].
   Enforces sparsity in the discrete gradient, which is desirable for applications with features ordered in some meaningful way [Tibshirani et al. (2005)].

6. Plan (outline repeated from slide 2)

7. Counting process: stochastic setup
   $N = \{N(t)\}_{0 \le t \le 1}$ is a counting process.
   Doob-Meyer decomposition:
   $$N(t) = \underbrace{\Lambda_0(t)}_{\text{compensator}} + \underbrace{M(t)}_{\text{martingale}}, \quad 0 \le t \le 1.$$
   The intensity of $N$ is defined by
   $$\lambda_0(t)\,dt = d\Lambda_0(t) = \mathbb{P}\big[\,N \text{ has a jump in } [t, t+dt) \mid \mathcal{F}(t)\,\big],$$
   where $\mathcal{F}(t) = \sigma(N(s), s \le t)$.

8. Piecewise constant intensity
   Assume that
   $$\lambda_0(t) = \sum_{\ell=1}^{L_0} \beta_{0,\ell}\, \mathbf{1}_{(\tau_{0,\ell-1},\, \tau_{0,\ell}]}(t), \quad 0 \le t \le 1.$$
   $\{\tau_{0,0} = 0 < \tau_{0,1} < \cdots < \tau_{0,L_0-1} < \tau_{0,L_0} = 1\}$: set of true change-points.
   $\{\beta_{0,\ell} : 1 \le \ell \le L_0\}$: set of jump sizes of $\lambda_0$.
   $L_0$: number of true change-points.
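A small sketch of such a piecewise constant intensity, with hypothetical change-points and jump sizes (the values `tau0` and `beta0` below are made up for illustration):

```python
import numpy as np

# Hypothetical example: change-points (0, 0.3, 0.7, 1) and jump sizes (2, 5, 1),
# i.e. L_0 = 3 segments on [0, 1].
tau0 = np.array([0.0, 0.3, 0.7, 1.0])
beta0 = np.array([2.0, 5.0, 1.0])

def lambda0(t, tau=tau0, beta=beta0):
    """Evaluate lambda_0(t) = sum_l beta_l * 1_{(tau_{l-1}, tau_l]}(t)."""
    t = np.atleast_1d(t)
    # searchsorted with side='left' maps t in (tau_{l-1}, tau_l] to index l
    idx = np.searchsorted(tau, t, side="left")
    idx = np.clip(idx, 1, len(beta))          # t = 0 is assigned to the first segment
    return beta[idx - 1]

print(lambda0([0.1, 0.3, 0.5, 1.0]))          # -> [2. 2. 5. 1.]
```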

9. Assumption on observations
   Data: we observe $n$ i.i.d. copies of $N$ on $[0,1]$, denoted $N_1, \ldots, N_n$.
   We define $\bar N_n(I) = \frac{1}{n} \sum_{i=1}^n N_i(I)$, with $N_i(I) = \int_I dN_i(t)$, for any interval $I \subset [0,1]$.
   This assumption is equivalent to observing a single process $N$ with intensity $n\lambda_0$ (it is only used to have a notion of growing observations with an increasing $n$).
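A short sketch (with a hypothetical data layout, not specified on the slides) of how the averaged counts $\bar N_n(I_{j,m})$ over a regular grid of $m$ intervals can be computed when each copy $N_i$ is stored as an array of its event times in $[0,1]$:

```python
import numpy as np

def averaged_bin_counts(event_times_per_copy, m):
    """Return Nbar[j] = (1/n) * sum_i N_i(I_{j,m}) for I_{j,m} = ((j-1)/m, j/m].

    `event_times_per_copy` is a list of n arrays, each holding the jump times
    of one observed copy N_i on [0, 1].
    """
    n = len(event_times_per_copy)
    counts = np.zeros(m)
    edges = np.linspace(0.0, 1.0, m + 1)
    for times in event_times_per_copy:
        # np.histogram uses half-open bins [a, b); the boundary convention is
        # immaterial here since continuous event times hit a boundary with probability zero
        hist, _ = np.histogram(times, bins=edges)
        counts += hist
    return counts / n

# toy usage with n = 2 copies and m = 4 intervals
copies = [np.array([0.12, 0.45, 0.47]), np.array([0.80])]
print(averaged_bin_counts(copies, m=4))   # -> [0.5 1.  0.  0.5]
```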

10. A procedure based on total-variation penalization
   We introduce the least-squares functional
   $$R_n(\lambda) = \int_0^1 \lambda(t)^2\, dt - \frac{2}{n} \sum_{i=1}^n \int_0^1 \lambda(t)\, dN_i(t),$$
   [Reynaud-Bouret (2003, 2006), Gaïffas and Guilloux (2012)].
   Fix $m = m_n \ge 1$, an integer that shall go to infinity as $n \to \infty$. We approximate $\lambda_0$ in the set of nonnegative piecewise constant functions on $[0,1]$ given by
   $$\Lambda_m = \Big\{ \lambda_\beta = \sum_{j=1}^m \beta_{j,m}\, \lambda_{j,m} \;:\; \beta = [\beta_{j,m}]_{1 \le j \le m} \in \mathbb{R}_+^m \Big\},$$
   where $\lambda_{j,m} = \sqrt{m}\, \mathbf{1}_{I_{j,m}}$ and $I_{j,m} = \big(\tfrac{j-1}{m}, \tfrac{j}{m}\big]$.
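Because the $\lambda_{j,m}$ have disjoint supports of length $1/m$ and height $\sqrt{m}$, on $\Lambda_m$ the functional reduces to $R_n(\lambda_\beta) = \|\beta\|^2 - 2\langle\beta, \mathbf{N}\rangle$ with $\mathbf{N}_j = \sqrt{m}\,\bar N_n(I_{j,m})$, which matches (up to a constant and an overall factor) the least-squares form $\tfrac{1}{2}\|\mathbf{N}-\beta\|^2$ used on slide 20. A minimal sketch of that computation, reusing the hypothetical bin counts from the previous sketch:

```python
import numpy as np

def Rn(beta, nbar_bins):
    """R_n(lambda_beta) for lambda_beta in Lambda_m.

    With lambda_{j,m} = sqrt(m) * 1_{I_{j,m}} on disjoint intervals of length 1/m:
      int_0^1 lambda_beta(t)^2 dt      = ||beta||^2,
      (1/n) sum_i int lambda_beta dN_i = <beta, N>,   N_j = sqrt(m) * Nbar_n(I_{j,m}),
    so R_n(lambda_beta) = ||beta||^2 - 2 * <beta, N>.
    """
    m = len(beta)
    N = np.sqrt(m) * np.asarray(nbar_bins)
    return float(beta @ beta - 2.0 * (beta @ N))

# toy usage with m = 4 and the averaged bin counts from the previous sketch
nbar = np.array([0.5, 1.0, 0.0, 0.5])
print(Rn(np.array([1.0, 2.0, 0.0, 1.0]), nbar))   # -> -6.0
```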

11. A procedure based on total-variation penalization
   The estimator of $\lambda_0$ is defined by
   $$\hat\lambda = \lambda_{\hat\beta} = \sum_{j=1}^m \hat\beta_{j,m}\, \lambda_{j,m},$$
   where $\hat\beta$ is given by
   $$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}_+^m} \Big\{ R_n(\lambda_\beta) + \|\beta\|_{\mathrm{TV},\hat w} \Big\}.$$
   We consider the dominant term
   $$\hat w_j \approx \sqrt{\frac{\log m}{n}\, \bar N_n\Big(\Big[\tfrac{j-1}{m},\, 1\Big]\Big)}.$$
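If the weight formula is as reconstructed above, i.e. $\hat w_j$ proportional to the square root of $(\log m / n)$ times $\bar N_n([(j-1)/m, 1])$ (an assumption about the garbled slide), a sketch of their computation could look like this:

```python
import numpy as np

def tv_weights(event_times_per_copy, m):
    """Data-driven TV weights (one per difference beta_j - beta_{j-1}, j = 2..m):
    w_j ~ sqrt( (log m / n) * Nbar_n([(j-1)/m, 1]) )."""
    n = len(event_times_per_copy)
    # Nbar_n([(j-1)/m, 1]) = average number of events at or after (j-1)/m
    tail_counts = np.array([
        np.mean([np.sum(times >= (j - 1) / m) for times in event_times_per_copy])
        for j in range(2, m + 1)
    ])
    return np.sqrt(np.log(m) / n * tail_counts)

# toy usage with the two hypothetical copies from the earlier sketch
copies = [np.array([0.12, 0.45, 0.47]), np.array([0.80])]
print(tv_weights(copies, m=4))
```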

12. Oracle inequality with fast rate
   The linear space $\Lambda_m$ is endowed with the norm
   $$\|\lambda\| = \Big( \int_0^1 \lambda^2(t)\, dt \Big)^{1/2}.$$
   Let $\hat S$ be the support of the discrete gradient of $\hat\beta$,
   $$\hat S = \big\{ j : \hat\beta_{j,m} \ne \hat\beta_{j-1,m}, \text{ for } j = 2, \ldots, m \big\}.$$
   Let $\hat L$ be the estimated number of change-points, defined by $\hat L = |\hat S|$.

13. Oracle inequality with fast rate
   The estimator $\hat\lambda$ satisfies the following:
   Theorem 1. Fix $x > 0$ and let the data-driven weights $\hat w$ be defined as above. Assume that $\hat L$ satisfies $\hat L \le L_{\max}$. Then, we have
   $$\|\hat\lambda - \lambda_0\|^2 \le \inf_{\beta \in \mathbb{R}_+^m} \Big\{ \|\lambda_\beta - \lambda_0\|^2 + 6\big(L_{\max} + 2(L_0 - 1)\big) \max_{1 \le j \le m} \hat w_j^2 \Big\} + C_1 \|\lambda_0\|_\infty\, \frac{x + L_{\max}(1 + \log m)}{n} + C_2\, m \Big( \frac{x + L_{\max}(1 + \log m)}{n} \Big)^2,$$
   with probability larger than $1 - L_{\max}\, e^{-x}$.

14. Oracle inequality with fast rate
   Let $\Delta_{\beta,\max} = \max_{1 \le \ell, \ell' \le L_0} |\beta_{0,\ell} - \beta_{0,\ell'}|$ be the maximal jump size of $\lambda_0$.
   Corollary. We have
   $$\inf_{\beta \in \mathbb{R}_+^m} \|\lambda_\beta - \lambda_0\|^2 \le \frac{2(L_0 - 1)\, \Delta_{\beta,\max}^2}{m}.$$
   Our procedure has a fast rate of convergence of order $(L_{\max} \vee L_0)\, \frac{m \log m}{n}$.
   An optimal tradeoff between approximation and complexity is given by the following choice:
   If $L_{\max} = O(m)$, then $m \approx n^{1/3}$.
   If $L_{\max} = O(1)$, then $m \approx n^{1/2}$.

15. Consistency of change-points detection
   There is an unavoidable non-parametric bias of approximation.
   The approximate change-points sequence $(j_\ell / m)_{0 \le \ell \le L_0}$ is defined through the right-hand side boundary of the unique interval $I_{j_\ell, m}$ that contains the true change-point $\tau_{0,\ell}$:
   $$\tau_{0,\ell} \in \Big( \tfrac{j_\ell - 1}{m}, \tfrac{j_\ell}{m} \Big], \quad \text{for } \ell = 1, \ldots, L_0 - 1,$$
   where $j_0 = 0$ and $j_{L_0} = m$ by convention.
   [Figure: time axis showing the true change-points $\tau_{0,\ell-1}, \tau_{0,\ell}, \tau_{0,\ell+1}$, the estimate $\hat\tau_\ell$, and the intervals $I_{j_\ell - 1, m}, I_{j_\ell, m}, I_{j_\ell + 1, m}$.]
   Let $\hat S = \{\hat j_1, \ldots, \hat j_{\hat L}\}$ with $\hat j_1 < \cdots < \hat j_{\hat L}$, and set $\hat j_0 = 0$ and $\hat j_{\hat L + 1} = m$. We simply define $\hat\tau_\ell = \hat j_\ell / m$ for $\ell = 0, \ldots, \hat L + 1$.
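A small sketch of how $\hat S$, $\hat L$ and the $\hat\tau_\ell$ can be read off a fitted $\hat\beta$ (the numerical tolerance `tol` below is an implementation detail not discussed on the slides):

```python
import numpy as np

def change_points_from_beta(beta_hat, tol=1e-10):
    """Return (S_hat, L_hat, tau_hat) from a piecewise constant fit beta_hat.

    S_hat: indices j in {2, ..., m} with beta_hat_j != beta_hat_{j-1} (1-based),
    L_hat: |S_hat|, the estimated number of change-points,
    tau_hat: estimated change-point locations j / m for j in S_hat.
    """
    m = len(beta_hat)
    diffs = np.abs(np.diff(beta_hat))
    S_hat = np.where(diffs > tol)[0] + 2     # convert to 1-based indices j in {2, ..., m}
    L_hat = len(S_hat)
    tau_hat = S_hat / m
    return S_hat, L_hat, tau_hat

# toy usage: m = 8 with one jump between the 4th and 5th coefficients
beta_hat = np.array([2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0])
print(change_points_from_beta(beta_hat))     # -> (array([5]), 1, array([0.625]))
```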

16. Consistency of change-points detection
   We cannot recover the exact position of two change-points if they lie in the same interval $I_{j,m}$.
   Minimal distance between true change-points: assume that there is a positive constant $c \ge 8$ such that
   $$\min_{1 \le \ell \le L_0} |\tau_{0,\ell} - \tau_{0,\ell-1}| > \frac{c}{m}.$$
   → The change-points of $\lambda_0$ are sufficiently far apart.
   → There cannot be more than one change-point in the "high-resolution" intervals $I_{j,m}$.
   The procedure will then be able to recover the (unique) intervals $I_{j_\ell, m}$, for $\ell = 0, \ldots, L_0$, to which the change-points belong.

17. Consistency of change-points detection
   $\Delta_{j,\min} = \min_{1 \le \ell \le L_0 - 1} |j_{\ell+1} - j_\ell|$: the minimal distance between two consecutive terms of the approximate change-points sequence of $\lambda_0$.
   $\Delta_{\beta,\min} = \min_{1 \le q \le m-1} |\beta_{0,q+1,m} - \beta_{0,q,m}|$: the smallest jump size of the projection $\lambda_{0,m}$ of $\lambda_0$ onto $\Lambda_m$.
   $(\varepsilon_n)_{n \ge 1}$: a non-increasing, positive sequence that goes to zero as $n \to \infty$.
   Technical assumptions: we assume that $\Delta_{j,\min}$, $\Delta_{\beta,\min}$ and $(\varepsilon_n)_{n \ge 1}$ satisfy
   $$\frac{\sqrt{nm}\, \varepsilon_n\, \Delta_{\beta,\min}}{\sqrt{\log m}} \to \infty \quad \text{and} \quad \frac{\sqrt{n}\, \Delta_{j,\min}\, \Delta_{\beta,\min}}{\sqrt{m \log m}} \to \infty, \quad \text{as } n \to \infty.$$

18. Consistency of change-points detection
   Theorem 2. Under the given assumptions, and if $\hat L = L_0 - 1$, then the change-points estimators $\{\hat\tau_1, \ldots, \hat\tau_{\hat L}\}$ satisfy
   $$\mathbb{P}\Big[ \max_{1 \le \ell \le L_0 - 1} |\hat\tau_\ell - \tau_{0,\ell}| \le \varepsilon_n \Big] \to 1, \quad \text{as } n \to \infty.$$
   If $m \approx n^{1/3}$, Theorem 2 holds with $\varepsilon_n \approx n^{-1/3}$, $\Delta_{\beta,\min} = n^{-1/6}$ and $\Delta_{j,\min} \ge 6$.
   If $m \approx n^{1/2}$, Theorem 2 holds with $\varepsilon_n \approx n^{-1/2}$, $\Delta_{\beta,\min} = n^{-1/6}$ and $\Delta_{j,\min} \ge 6$.

19. Proximal operator + algorithm
   We are interested in computing a solution
   $$x^\star = \operatorname*{argmin}_{x \in \mathbb{R}^p} \{ g(x) + h(x) \},$$
   where $g$ is smooth and $h$ is simple (prox-calculable).
   The proximal operator $\operatorname{prox}_h$ of a proper, lower semi-continuous, convex function $h : \mathbb{R}^m \to (-\infty, \infty]$ is defined as
   $$\operatorname{prox}_h(v) = \operatorname*{argmin}_{x \in \mathbb{R}^m} \Big\{ \tfrac{1}{2} \|v - x\|_2^2 + h(x) \Big\}, \quad \text{for all } v \in \mathbb{R}^m.$$
   The proximal gradient descent (PGD) algorithm is based on the iteration
   $$x^{(k+1)} = \operatorname{prox}_{\varepsilon_k h}\big( x^{(k)} - \varepsilon_k \nabla g(x^{(k)}) \big).$$
   [Daubechies et al. (2004) (ISTA), Beck and Teboulle (2009) (FISTA)]
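A minimal sketch of the PGD/ISTA iteration, with a hypothetical quadratic $g$ and the ℓ1 prox (soft-thresholding) standing in for a generic simple $h$; the weighted-TV prox of the next slide would be plugged in the same way:

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, step, n_iter=200):
    """Generic PGD/ISTA loop: x_{k+1} = prox_{step*h}(x_k - step * grad_g(x_k))."""
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_h(x - step * grad_g(x), step)
    return x

# Hypothetical example: g(x) = 0.5 * ||A x - b||^2 and h(x) = lam * ||x||_1.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.5

grad_g = lambda x: A.T @ (A @ x - b)
# prox of step * lam * ||.||_1 is entrywise soft-thresholding
prox_h = lambda v, step: np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of grad_g
x_star = proximal_gradient(grad_g, prox_h, np.zeros(5), step)
print(x_star)
```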

20. Proximal operator of the weighted TV penalization
   We have
   $$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}_+^m} \Big\{ \tfrac{1}{2} \| \mathbf{N} - \beta \|_2^2 + \|\beta\|_{\mathrm{TV},\hat w} \Big\},$$
   where $\mathbf{N} = [\mathbf{N}_j]_{1 \le j \le m} \in \mathbb{R}_+^m$ is given by
   $$\mathbf{N} = \big( \sqrt{m}\, \bar N_n(I_{1,m}), \ldots, \sqrt{m}\, \bar N_n(I_{m,m}) \big).$$
   Then $\hat\beta = \operatorname{prox}_{\|\cdot\|_{\mathrm{TV},\hat w}}(\mathbf{N})$.
   Modification of Condat's algorithm [Condat (2013)].
   If we have a feasible dual variable $\hat u$, we can compute the primal solution $\hat\beta$ by Fenchel duality.
   The Karush-Kuhn-Tucker (KKT) optimality conditions characterize the unique solutions $\hat\beta$ and $\hat u$.
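For illustration only: the sketch below computes the weighted-TV prox by projected gradient on its Fenchel dual (in the spirit of Beck and Teboulle's dual approach for TV), not by the modified Condat algorithm used in the thesis; the nonnegativity constraint on β is also omitted for simplicity.

```python
import numpy as np

def prox_weighted_tv(y, w, n_iter=2000):
    """argmin_x 0.5*||y - x||^2 + sum_j w_j * |x_j - x_{j-1}| (w: one entry per difference).

    Solved by projected gradient on the Fenchel dual
        min_{|u_j| <= w_j} 0.5 * ||y - D^T u||^2,
    where D is the (m-1) x m forward-difference operator; the primal solution
    is then recovered as x = y - D^T u.
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    u = np.zeros(len(y) - 1)

    def Dt(u):  # D^T u
        return np.concatenate(([-u[0]], u[:-1] - u[1:], [u[-1]]))

    step = 0.25                     # 1 / ||D D^T||_2, since ||D D^T||_2 < 4
    for _ in range(n_iter):
        x = y - Dt(u)                                # current primal point
        u = np.clip(u + step * np.diff(x), -w, w)    # gradient step + box projection
    return y - Dt(u)

# toy usage: N_j = sqrt(m) * Nbar_n(I_{j,m}) and data-driven weights as on slide 11
N = np.array([1.0, 1.1, 0.9, 3.0, 3.2, 2.9])
w = np.full(5, 0.5)
print(prox_weighted_tv(N, w))       # roughly two constant segments
```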
