Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay
Reda Alami
Joint work with Odalric Maillard and Raphaël Féraud
reda.alami@total.com
Presented at ICML 2020
Overview
◮ A pruned version of the Bayesian Online Change-point Detector (BOCPD).
◮ High-probability guarantees in terms of:
  ◮ the false-alarm rate;
  ◮ the detection delay.
◮ The detection delay is asymptotically optimal: it matches the existing lower bound of [Lai and Xing, 2010].
◮ Empirical comparisons with the original BOCPD [Fearnhead and Liu, 2007] and the Improved Generalized Likelihood Ratio test [Maillard, 2019].
Setting & Notations 3/14
Setting & Notations ◮ B ( µ t ) : Bernoulli distribution of mean µ t ∈ [0 , 1] . 3/14
Setting & Notations ◮ B ( µ t ) : Bernoulli distribution of mean µ t ∈ [0 , 1] . ◮ Piece-wise stationary process: ∀ c ∈ [1 , C ] , ∀ t ∈ T c = [ τ c , τ c +1 ) µ t = θ c 3/14
Setting & Notations ◮ B ( µ t ) : Bernoulli distribution of mean µ t ∈ [0 , 1] . ◮ Piece-wise stationary process: ∀ c ∈ [1 , C ] , ∀ t ∈ T c = [ τ c , τ c +1 ) µ t = θ c ◮ Sequence of observations: x s : t = ( x s , ...x t ) . 3/14
Setting & Notations ◮ B ( µ t ) : Bernoulli distribution of mean µ t ∈ [0 , 1] . ◮ Piece-wise stationary process: ∀ c ∈ [1 , C ] , ∀ t ∈ T c = [ τ c , τ c +1 ) µ t = θ c ◮ Sequence of observations: x s : t = ( x s , ...x t ) . ◮ Length: n s : t = t − s + 1 . 3/14
Setting & Notations ◮ B ( µ t ) : Bernoulli distribution of mean µ t ∈ [0 , 1] . ◮ Piece-wise stationary process: ∀ c ∈ [1 , C ] , ∀ t ∈ T c = [ τ c , τ c +1 ) µ t = θ c ◮ Sequence of observations: x s : t = ( x s , ...x t ) . ◮ Length: n s : t = t − s + 1 . 3/14
Bayesian Online Change-point Detector: Runlength inference

Runlength $r_t$: the number of time steps since the last change-point.

$$\forall r_t \in [0, t-1],\quad \underbrace{p(r_t \mid x_{1:t})}_{\text{runlength distribution at } t} \;\propto\; \sum_{r_{t-1} \in [0, t-2]} \underbrace{p(r_t \mid r_{t-1})}_{\text{hazard}} \, \underbrace{p(x_t \mid r_{t-1}, x_{1:t-1})}_{\text{UPM}} \, p(r_{t-1} \mid x_{1:t-1})$$

Under the constant-hazard-rate assumption, $h \in (0,1)$ (geometric inter-arrival times of change-points):

$$\begin{cases} p(r_t = r_{t-1} + 1 \mid x_{1:t}) \;\propto\; (1-h)\, p(x_t \mid r_{t-1}, x_{1:t-1})\, p(r_{t-1} \mid x_{1:t-1}) \\ p(r_t = 0 \mid x_{1:t}) \;\propto\; h \sum_{r_{t-1}} p(x_t \mid r_{t-1}, x_{1:t-1})\, p(r_{t-1} \mid x_{1:t-1}) \end{cases}$$

The UPM (underlying predictive model) $p(x_t \mid r_{t-1}, x_{1:t-1})$ is computed with the Laplace predictor, a smoothed maximum-likelihood estimate:

$$\mathrm{Lp}(x_{t+1} \mid x_{s:t}) := \begin{cases} \dfrac{\sum_{i=s}^{t} x_i + 1}{n_{s:t} + 2} & \text{if } x_{t+1} = 1 \\[4pt] \dfrac{\sum_{i=s}^{t} (1 - x_i) + 1}{n_{s:t} + 2} & \text{if } x_{t+1} = 0 \end{cases}$$
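A short sketch of this runlength update for Bernoulli data, assuming the constant-hazard prior and the Laplace predictor above. This is an illustrative implementation, not the authors' reference code; the function name and the default `h` are ours.

```python
import numpy as np

def bocpd_runlength(x, h=0.01):
    """Posterior over the runlength r_t for a Bernoulli stream, hazard h."""
    log_p = np.array([0.0])          # log p(r_1 = 0 | x_1) = 0: the run starts at s = 1
    ones = np.array([float(x[0])])   # number of ones in each candidate run x_{s:t}
    n = np.array([1.0])              # length n_{s:t} of each candidate run
    map_runlengths = [0]
    for xt in x[1:]:
        # UPM via the Laplace predictor: Lp(x_t = 1 | x_{s:t-1}) = (#ones + 1)/(n + 2)
        p_one = (ones + 1.0) / (n + 2.0)
        log_upm = np.log(p_one) if xt == 1 else np.log(1.0 - p_one)
        # growth step: r_t = r_{t-1} + 1 with probability (1 - h)
        log_growth = np.log(1.0 - h) + log_p + log_upm
        # change step: r_t = 0 with probability h, summed over all r_{t-1}
        log_change = np.log(h) + np.logaddexp.reduce(log_p + log_upm)
        log_p = np.concatenate([[log_change], log_growth])
        log_p -= np.logaddexp.reduce(log_p)              # normalize in log-space
        # the new run starts at s = t and contains x_t; older runs absorb x_t
        ones = np.concatenate([[float(xt)], ones + xt])
        n = np.concatenate([[1.0], n + 1.0])
        map_runlengths.append(int(np.argmax(log_p)))     # MAP runlength at time t
    return map_runlengths
```

All bookkeeping is done in log-space, since the per-hypothesis probabilities shrink geometrically with the runlength. On a stream like the one generated above, the MAP runlength should typically fall back to 0 shortly after each change-point.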
Bayesian Online Change-point Detector: Forecaster learning

Instead of working with the runlength $r_t \in [0, t-1]$, use the notion of forecasters: forecaster $s$ predicts assuming the current segment started at time $s$.

Forecaster weight: $\forall s \in [1, t],\quad v_{s,t} := p(r_t = t - s \mid x_{1:t})$.

$$v_{s,t} = \begin{cases} (1-h) \exp(-\ell_{s,t})\, v_{s,t-1} & \forall s < t \\ h \sum_{i=1}^{t-1} \exp(-\ell_{i,t})\, v_{i,t-1} & s = t \end{cases}$$

Instantaneous loss: $\ell_{s,t} := -\log \mathrm{Lp}(x_t \mid x_{s:t-1})$; cumulative loss: $L_{s:t} := \sum_{s'=s}^{t} \ell_{s,s'}$.

The recursion unfolds into the closed form, with initial weights $V_s$ and $V_t = \sum_{s=1}^{t} v_{s,t}$:

$$v_{s,t} = \begin{cases} (1-h)^{n_{s:t}}\, h^{\mathbb{1}\{s \neq 1\}} \exp\!\left(-L_{s:t}\right) V_s & \forall s < t \\ h\, V_t & s = t \end{cases}$$
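The same inference written directly as forecaster weights; a sketch under the recursion above, not the authors' code. The convention that the newborn forecaster $s = t$ immediately absorbs $x_t$ into its statistics (so that $\ell_{t,t+1}$ conditions on $x_{t:t}$) is our reading of the loss definition.

```python
import numpy as np

def forecaster_weights(x, h=0.01):
    """Normalized weights v_{s,t}, one forecaster per candidate start s."""
    v = {1: 1.0}              # after x_1, only forecaster s = 1 exists
    ones = {1: float(x[0])}   # sufficient statistics of x_{s:t} per forecaster
    n = {1: 1.0}
    for t, xt in enumerate(x[1:], start=2):
        # instantaneous loss l_{s,t} = -log Lp(x_t | x_{s:t-1})
        loss = {}
        for s in v:
            p_one = (ones[s] + 1.0) / (n[s] + 2.0)
            loss[s] = -np.log(p_one if xt == 1 else 1.0 - p_one)
        # surviving forecasters: v_{s,t} = (1-h) exp(-l_{s,t}) v_{s,t-1}
        new_v = {s: (1.0 - h) * np.exp(-loss[s]) * v[s] for s in v}
        # newborn forecaster: v_{t,t} = h * sum_i exp(-l_{i,t}) v_{i,t-1}
        new_v[t] = h * sum(np.exp(-loss[s]) * v[s] for s in v)
        z = sum(new_v.values())
        v = {s: w / z for s, w in new_v.items()}   # renormalize each round
        for s in ones:
            ones[s] += xt
            n[s] += 1.0
        ones[t], n[t] = float(xt), 1.0   # forecaster t starts with x_{t:t} = (x_t)
    return v
```

Since both branches of the recursion are linear in the previous weights, renormalizing at every round changes nothing but keeps the numbers in a stable range. A restart rule on these weights (the paper's pruning/restart step) then turns the weight vector into a detector; we do not reproduce it here.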
Main difficulty in providing the theoretical guarantees

Lemma (Computing the initial weight $V_t$)

$$V_t = (1-h)^{t-2} \sum_{k=1}^{t-1} \left(\frac{h}{1-h}\right)^{k-1} \tilde{V}_{k:t},$$

where

$$\tilde{V}_{k:t} = \sum_{i_1=1}^{t-k} \, \sum_{i_2=i_1+1}^{t-(k-1)} \cdots \sum_{i_{k-1}=i_{k-2}+1}^{t-2} \exp\!\left(-L_{1:i_1}\right) \times \prod_{j=1}^{k-2} \exp\!\left(-L_{i_j+1:\,i_{j+1}}\right) \times \exp\!\left(-L_{i_{k-1}+1:\,t-1}\right),$$

with

$$\sum_{i_1=1}^{t-k} \, \sum_{i_2=i_1+1}^{t-(k-1)} \cdots \sum_{i_{k-1}=i_{k-2}+1}^{t-2} 1 = \binom{t-2}{k-1}.$$
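The closing identity counts the configurations indexed by the nested sums: each term of $\tilde{V}_{k:t}$ corresponds to one choice of $k-1$ increasing change-point positions $1 \le i_1 < \dots < i_{k-1} \le t-2$, splitting $[1, t-1]$ into $k$ segments. A quick illustrative check of that count (the horizon `t` and variable names are ours):

```python
from itertools import combinations
from math import comb

t = 10  # illustrative horizon
for k in range(1, t):
    # enumerate increasing index tuples 1 <= i_1 < ... < i_{k-1} <= t-2
    n_terms = sum(1 for _ in combinations(range(1, t - 1), k - 1))
    assert n_terms == comb(t - 2, k - 1)
print(f"for t = {t}, each V~_{{k:t}} has C(t-2, k-1) terms, as stated")
```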