

  1. On Some Mixture Models for INAR(1) Processes Helton Graziadei, Paulo Marques and Hedibert Lopes IME-USP & Insper November 1st, 2019

  2. Outline 1 Introduction 2 The AdINAR(1) Model 3 Learning the latent pattern of heterogeneity in time series of counts 4 Future work

  3. Introduction

  4. Time series of counts arise in a wide range of applications such as econometrics, public policy and environmental studies. Traditional time series models assume continuous-valued processes and are therefore not suitable for analyzing discrete count data. We will not pursue the well-known class of generalized dynamic linear models here. We assume instead a special autoregressive structure for discrete variables [Alzaid and Al-Osh, 1987, McKenzie, 1985], and consider some mixture models for the innovation process as a means to improve forecasting accuracy.

  5. INAR(1) process. Consider a Markov process $\{Y_t\}_{t \in \mathbb{N}}$ represented by the following functional form [McKenzie, 1985, Alzaid and Al-Osh, 1987]:
$$Y_t = \underbrace{\alpha \circ Y_{t-1}}_{\text{survivors from } t-1} + \underbrace{Z_t}_{\text{innovation at time } t},$$
where $Y_t$ is the count at time $t$ and
$$M_t = \alpha \circ Y_{t-1} = \sum_{i=1}^{Y_{t-1}} B_i(t)$$
is referred to here as the maturation at time $t$, with $\{B_i(t)\}$ a collection of independent Bernoulli($\alpha$) random variables. The original formulation assumes that $Z_t$ follows a parametric model, usually a Poisson or a Geometric distribution.
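To make the thinning recursion concrete, here is a minimal simulation sketch in Python/NumPy (not part of the original slides), assuming Poisson($\lambda$) innovations as in the classical formulation; the function name and default arguments are illustrative.

```python
import numpy as np

def simulate_inar1(T, alpha, lam, y0=0, rng=None):
    """Simulate Y_t = alpha o Y_{t-1} + Z_t with Poisson(lam) innovations."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty(T, dtype=int)
    y[0] = y0
    for t in range(1, T):
        m_t = rng.binomial(y[t - 1], alpha)  # maturation: survivors from t-1
        z_t = rng.poisson(lam)               # innovation at time t
        y[t] = m_t + z_t
    return y

# Example: 100 steps with thinning probability 0.5 and innovation rate 2
path = simulate_inar1(100, alpha=0.5, lam=2.0)
```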

  6. Our contributions
1. Model $Z_t$ via a Poisson-Geometric mixture to account for over-dispersion in time series of counts.
2. Develop a semi-parametric model based on the Dirichlet process in order to learn the patterns of heterogeneity in time series of counts.
3. Investigate the Pitman-Yor process to robustify inference for the number of clusters.

  7. The AdINAR(1) Model

  8. The AdINAR(1) model is defined such that $Z_t$ is a mixture of a Geometric and a Poisson distribution:
$$Z_t \mid \theta, \lambda, w \sim w\,\mathrm{Geometric}(\theta) + (1-w)\,\mathrm{Poisson}(\lambda), \qquad t = 2, \ldots, T, \quad w \in [0, 1].$$
As $w$ becomes large, the innovation is increasingly contaminated by the Geometric component of the mixture, increasing the variability of the process.
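As an illustration (not from the slides), the mixture innovation can be simulated by flipping a Bernoulli($w$) coin at each step; note the Geometric component is taken on $\{0, 1, 2, \ldots\}$ so that its pmf matches the $\theta(1-\theta)^{z}$ term appearing in the likelihood below. Names and defaults are assumptions.

```python
import numpy as np

def simulate_adinar1(T, alpha, theta, lam, w, y0=0, rng=None):
    """Simulate an AdINAR(1) path whose innovations mix Geometric(theta) and Poisson(lam)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty(T, dtype=int)
    y[0] = y0
    for t in range(1, T):
        m_t = rng.binomial(y[t - 1], alpha)      # survivors from t-1
        if rng.random() < w:                     # u_t = 1: Geometric component
            z_t = rng.geometric(theta) - 1       # shift support to {0, 1, 2, ...}
        else:                                    # u_t = 0: Poisson component
            z_t = rng.poisson(lam)
        y[t] = m_t + z_t
    return y
```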

  9. Figure: Typical simulated series (Count vs. Time) for $w = 0.1$ and $w = 0.9$.

  10. The joint distribution of $(Y_1, \ldots, Y_T)$, given $\alpha$, $\theta$, $\lambda$ and $w$, can be written as
$$p(y_1, \ldots, y_T \mid \alpha, \theta, \lambda, w) = \prod_{t=2}^{T} p(y_t \mid y_{t-1}, \alpha, \theta, \lambda, w).$$

  11. The likelihood function of $y = (y_2, \ldots, y_T)$ then follows directly:
$$L_y(\alpha, \theta, \lambda, w) = \prod_{t=2}^{T} \sum_{m_t = 0}^{\min\{y_{t-1}, y_t\}} \binom{y_{t-1}}{m_t} \alpha^{m_t} (1-\alpha)^{y_{t-1} - m_t} \left[ w\, \theta (1-\theta)^{y_t - m_t} + (1-w)\, \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!} \right].$$
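A direct, numerically naive transcription of this likelihood (Python/SciPy; not part of the slides) may help when checking an implementation; for long series one would work on the log scale inside the inner sum as well.

```python
import numpy as np
from scipy.special import comb, factorial

def adinar1_loglik(y, alpha, theta, lam, w):
    """Log-likelihood of y_2,...,y_T given y_1 under the AdINAR(1) model."""
    loglik = 0.0
    for t in range(1, len(y)):
        total = 0.0
        for m in range(min(y[t - 1], y[t]) + 1):
            thinning = comb(y[t - 1], m) * alpha**m * (1 - alpha) ** (y[t - 1] - m)
            z = y[t] - m
            innovation = (w * theta * (1 - theta) ** z
                          + (1 - w) * np.exp(-lam) * lam**z / factorial(z))
            total += thinning * innovation
        loglik += np.log(total)
    return loglik
```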

  12. Reparameterization. Let $M = (M_2, \ldots, M_T)$ be the set of maturations, and augment the model with the latent variables $u = (u_2, \ldots, u_T)$ such that $u_t = 1$ if $z_t \mid \theta \sim \mathrm{Geometric}(\theta)$, and $u_t = 0$ if $z_t \mid \lambda \sim \mathrm{Poisson}(\lambda)$, for $t = 2, \ldots, T$.

  13. Conditionally conjugate priors
Thinning: $\alpha \sim \mathrm{Beta}(a_0^{(\alpha)}, b_0^{(\alpha)})$
Weight: $w \sim \mathrm{Beta}(a_0^{(w)}, b_0^{(w)})$
Geometric: $\theta \sim \mathrm{Beta}(a_0^{(\theta)}, b_0^{(\theta)})$
Poisson: $\lambda \sim \mathrm{Gamma}(a_0^{(\lambda)}, b_0^{(\lambda)})$

  14. Simpler conditional distributions. Postulate that:
$$p(y_t \mid m_t, u_t = 1) = \theta (1-\theta)^{y_t - m_t}\, I_{\{m_t, m_t+1, \ldots\}}(y_t),$$
$$p(y_t \mid m_t, u_t = 0) = \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!}\, I_{\{m_t, m_t+1, \ldots\}}(y_t),$$
$$p(m_t \mid \alpha, y_{t-1}) = \binom{y_{t-1}}{m_t} \alpha^{m_t} (1-\alpha)^{y_{t-1} - m_t},$$
for $t = 2, \ldots, T$. It is possible to show that, using these conditional distributions, we recover the original likelihood.

  15. Full conditionals. The full conditional distributions follow directly:
$$(\alpha \mid \ldots) \sim \mathrm{Beta}\!\left(a_0^{(\alpha)} + \sum_{t=2}^{T} m_t,\; b_0^{(\alpha)} + \sum_{t=2}^{T} (y_{t-1} - m_t)\right)$$
$$(w \mid \ldots) \sim \mathrm{Beta}\!\left(a_0^{(w)} + \sum_{t=2}^{T} u_t,\; b_0^{(w)} + (T-1) - \sum_{t=2}^{T} u_t\right)$$
$$(\theta \mid \ldots) \sim \mathrm{Beta}\!\left(a_0^{(\theta)} + \sum_{t=2}^{T} u_t,\; b_0^{(\theta)} + \sum_{\{t : u_t = 1\}} (y_t - m_t)\right)$$
$$(\lambda \mid \ldots) \sim \mathrm{Gamma}\!\left(a_0^{(\lambda)} + \sum_{\{t : u_t = 0\}} (y_t - m_t),\; b_0^{(\lambda)} + (T-1) - \sum_{t=2}^{T} u_t\right)$$
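A sketch of the corresponding conjugate Gibbs updates (Python/NumPy; not from the slides, and hyperparameter names such as `a0_alpha` are illustrative). It assumes `m` and `u` are arrays of length $T-1$ aligned with $y_2, \ldots, y_T$.

```python
import numpy as np

def update_parameters(y, m, u, hyper, rng):
    """One conjugate Gibbs sweep over (alpha, w, theta, lam) given maturations m
    and mixture indicators u, both arrays of length T-1 aligned with y[1:]."""
    m, u = np.asarray(m), np.asarray(u)
    y_prev, y_curr = np.asarray(y[:-1]), np.asarray(y[1:])
    T1 = len(m)                                   # number of transitions, T - 1
    geo = (u == 1)                                # times assigned to the Geometric component

    alpha = rng.beta(hyper["a0_alpha"] + m.sum(),
                     hyper["b0_alpha"] + (y_prev - m).sum())
    w = rng.beta(hyper["a0_w"] + u.sum(),
                 hyper["b0_w"] + T1 - u.sum())
    theta = rng.beta(hyper["a0_theta"] + u.sum(),
                     hyper["b0_theta"] + (y_curr - m)[geo].sum())
    lam = rng.gamma(hyper["a0_lam"] + (y_curr - m)[~geo].sum(),
                    1.0 / (hyper["b0_lam"] + T1 - u.sum()))   # NumPy gamma uses scale = 1/rate
    return alpha, w, theta, lam
```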

  16. Full conditionals. Additionally,
$$\Pr\{U_t = 1 \mid \ldots\} \propto w\, \theta (1-\theta)^{y_t - m_t}, \qquad \Pr\{U_t = 0 \mid \ldots\} \propto (1-w)\, \frac{e^{-\lambda} \lambda^{y_t - m_t}}{(y_t - m_t)!},$$
and
$$\Pr\{M_t = m_t \mid \ldots\} \propto \begin{cases} \dfrac{1}{(y_{t-1} - m_t)!\, m_t!} \left(\dfrac{\alpha}{(1-\theta)(1-\alpha)}\right)^{m_t} & \text{if } u_t = 1, \\[2ex] \dfrac{1}{(y_t - m_t)!\, (y_{t-1} - m_t)!\, m_t!} \left(\dfrac{\alpha}{\lambda(1-\alpha)}\right)^{m_t} & \text{if } u_t = 0, \end{cases}$$
for $t = 2, \ldots, T$ and $m_t = 0, 1, \ldots, \min\{y_t, y_{t-1}\}$.
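And a matching sketch for the latent indicators and maturations (again an illustration, not the authors' code); the discrete support and the unnormalized weights follow the expressions above, while variable names and the explicit normalization are implementation choices.

```python
import numpy as np
from scipy.special import factorial

def update_latents(y, m, u, alpha, theta, lam, w, rng):
    """Gibbs updates of the mixture indicators u_t and maturations m_t, t = 2,...,T."""
    for i in range(len(m)):                      # position i corresponds to time t = i + 2
        y_prev, y_curr = y[i], y[i + 1]
        z = y_curr - m[i]

        # u_t | m_t, ...: Bernoulli with the two unnormalized weights above
        p_geo = w * theta * (1 - theta) ** z
        p_poi = (1 - w) * np.exp(-lam) * lam ** z / factorial(z)
        u[i] = int(rng.random() < p_geo / (p_geo + p_poi))

        # m_t | u_t, ...: discrete distribution on {0, ..., min(y_t, y_{t-1})}
        support = np.arange(min(y_prev, y_curr) + 1)
        if u[i] == 1:
            weights = ((alpha / ((1 - theta) * (1 - alpha))) ** support
                       / (factorial(y_prev - support) * factorial(support)))
        else:
            weights = ((alpha / (lam * (1 - alpha))) ** support
                       / (factorial(y_curr - support) * factorial(y_prev - support)
                          * factorial(support)))
        m[i] = rng.choice(support, p=weights / weights.sum())
    return m, u
```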

  17. Figure: Directed acyclic graph of the AdINAR(1) model, with parameter nodes $w$, $\theta$, $\lambda$, $\alpha$ and latent nodes $U_{t-1}, U_t$ and $M_{t-1}, M_t$ feeding the observed counts $Y_{t-1}, Y_t$, for $t = 2, \ldots, T$.

  18. This mixture distribution allows the model to account for overdispersion in a time series of counts and to accommodate zero inflation. In what follows, we extend the two-component mixture to a generalized, DP-based version of the INAR(1) model.

  19. Learning the latent pattern of heterogeneity in time series of counts

  20. The Dirichlet Process. Given a measurable space $(\mathcal{X}, \mathcal{B})$ and a probability space $(\Omega, \mathcal{F}, \Pr)$, a random probability measure $G$ is a mapping $G : \mathcal{B} \times \Omega \to [0, 1]$. Definition (Ferguson, 1973): let $\alpha$ be a finite non-null measure on $(\mathcal{X}, \mathcal{B})$. We say $G$ is a Dirichlet process if, for every measurable partition $\{B_1, \ldots, B_k\}$ of $\mathcal{X}$, the random vector $(G(B_1), \ldots, G(B_k))$ follows a Dirichlet distribution with parameter vector $(\alpha(B_1), \ldots, \alpha(B_k))$. Let $\tau = \alpha(\mathcal{X})$ be the concentration parameter and, for every $B \in \mathcal{B}$, let $G_0(B) = \alpha(B)/\alpha(\mathcal{X})$ be the base measure, which leads to a convenient parametrization in terms of a probability measure. Under this formulation, we denote $G \sim \mathrm{DP}(\tau G_0)$.

  21. The Dirichlet Process
1. $\mathrm{E}(G(B)) = G_0(B)$.
2. $\mathrm{Var}(G(B)) = \dfrac{G_0(B)\,(1 - G_0(B))}{\tau + 1}$.
3. Assume that, given a Dirichlet process $G$ with parameter $\alpha$, $X_1, \ldots, X_n$ are conditionally independent and identically distributed with $P(X_i \in B \mid G) = G(B)$, $i = 1, \ldots, n$. Then $G \mid X_1, \ldots, X_n \sim \mathrm{DP}(\beta)$, where $\beta(C) = \alpha(C) + \sum_{i=1}^{n} I_C(X_i)$.
4. As shown by [Blackwell and MacQueen, 1973], the predictive distribution of $X_{n+1}$, $n \geq 1$, given $X_1, \ldots, X_n$, may be obtained by integrating out $G$, which entails
$$X_{n+1} \mid X_1, \ldots, X_n \sim \frac{\tau}{\tau + n} G_0 + \frac{1}{\tau + n} \sum_{i=1}^{n} \delta_{X_i},$$
where $\delta_x$ denotes a point mass at $x$.
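Item 4 is the Blackwell-MacQueen (Pólya urn) scheme, which can be simulated directly; the sketch below (Python/NumPy, not part of the slides) uses an arbitrary Gamma base measure purely for illustration.

```python
import numpy as np

def polya_urn_sample(n, tau, base_sampler, rng=None):
    """Draw X_1,...,X_n from G ~ DP(tau * G0) via the Blackwell-MacQueen predictive rule."""
    rng = np.random.default_rng() if rng is None else rng
    x = []
    for i in range(n):
        if rng.random() < tau / (tau + i):   # prob. tau/(tau + i): fresh draw from G0
            x.append(base_sampler(rng))
        else:                                # otherwise: copy a uniformly chosen past value
            x.append(x[rng.integers(i)])
    return np.array(x)

# Example with a Gamma(2, 1) base measure; ties among the draws reveal the clustering property
draws = polya_urn_sample(50, tau=1.0, base_sampler=lambda rng: rng.gamma(2.0, 1.0))
```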

  22. Dirichlet and Pitman-Yor Processes. The discrete part of the predictive distribution implies the clustering property of the Dirichlet process, which induces a probability distribution on the number of distinct values in $(X_1, \ldots, X_n)$, denoted by $k$. [Pitman and Yor, 1997] generalized the Dirichlet process by introducing a discount parameter $\sigma \in [0, 1]$. The predictive distribution for the Pitman-Yor process is given by
$$X_{n+1} \mid X_1, \ldots, X_n \sim \frac{\tau + k\sigma}{\tau + n} G_0 + \sum_{i=1}^{k} \frac{n_i - \sigma}{\tau + n}\, \delta_{X_i^*},$$
where $X_1^*, \ldots, X_k^*$ are the distinct values in $(X_1, \ldots, X_n)$ and $n_i$ is the number of elements equal to $X_i^*$.
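The same urn logic extends to the Pitman-Yor predictive rule; the sketch below (an illustration, not from the slides) tracks only cluster sizes, which is enough to simulate the induced prior on the number of clusters $k$ shown on the next slide. The function name and parameter defaults are assumptions.

```python
import numpy as np

def pitman_yor_cluster_sizes(n, tau, sigma, rng=None):
    """Sample the cluster sizes (n_1,...,n_k) of n items under a Pitman-Yor(tau, sigma) urn."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = []                                    # n_i for the k clusters created so far
    for i in range(n):
        k = len(sizes)
        probs = np.array([c - sigma for c in sizes] + [tau + k * sigma]) / (tau + i)
        j = rng.choice(k + 1, p=probs)
        if j == k:
            sizes.append(1)                       # open a new cluster
        else:
            sizes[j] += 1                         # join existing cluster j
    return sizes

# Monte Carlo approximation of the prior p(k) on the number of clusters
ks = [len(pitman_yor_cluster_sizes(100, tau=1.0, sigma=0.5)) for _ in range(2000)]
```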

  23. The Pitman-Yor process with high $\sigma$ induces less informative prior distributions for $K$ [Pitman and Yor, 1997, De Blasi et al., 2013]. Figure: prior distributions $p(k)$ for the number of clusters under $\sigma = 0$, $0.25$, $0.5$ and $0.75$; the support of $k$ spreads out as $\sigma$ increases.

  24. In the INAR(1) structure, we now assume the innovation process is time-varying, i.e., $\mathrm{E}(Z_t) = \lambda_t$. From a realization $y_1, \ldots, y_T$ of the process, we want to learn the distribution of each $\lambda_t$ and represent our uncertainty about the future steps $Y_{T+1}, \ldots, Y_{T+h}$ in order to forecast them. We create clusters of innovation rates as a means to learn the latent patterns of heterogeneity in the count time series.

  25. Figure: Directed acyclic graph of the DP-based model, with $G \sim \mathrm{DP}(\tau G_0)$ generating the rates $\lambda_2, \lambda_3, \ldots, \lambda_{T-1}, \lambda_T$, which, together with the maturations $M_2, \ldots, M_T$ and the thinning parameter $\alpha$, drive the counts $Y_1, \ldots, Y_T$.

  26. Let $y = (y_1, \ldots, y_T)$ and $m = (m_2, \ldots, m_T)$. To obtain the posterior of $(\alpha, \lambda, m)$ we integrate out the random distribution $G$. From the parametric part of the graph, we have
$$p(y, m, \alpha, \lambda) = \int p(y, m, \alpha, \lambda \mid G)\, d\mu_G(G) = \left[\prod_{t=2}^{T} p(y_t \mid m_t, \lambda_t)\, p(m_t \mid y_{t-1}, \alpha)\right] \pi(\alpha) \int \left[\prod_{t=2}^{T} p(\lambda_t \mid G)\right] d\mu_G(G).$$
The random vector $(\lambda_2, \ldots, \lambda_T)$ has an exchangeable distribution.
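For intuition about this exchangeable prior (an illustration only, not the authors' posterior sampler), one can draw $(\lambda_2, \ldots, \lambda_T)$ from the Pólya urn induced by $G \sim \mathrm{DP}(\tau G_0)$ and then propagate the INAR(1) recursion; a Gamma base measure is assumed here for $G_0$, and all names are hypothetical.

```python
import numpy as np

def simulate_dp_inar1(T, alpha, tau, base_sampler, rng=None):
    """Draw (lambda_2,...,lambda_T) from the Polya-urn prior implied by G ~ DP(tau * G0),
    then propagate the INAR(1) recursion with these time-varying innovation rates."""
    rng = np.random.default_rng() if rng is None else rng
    lams = []
    for t in range(T - 1):                        # one rate for each t = 2,...,T
        if rng.random() < tau / (tau + t):
            lams.append(base_sampler(rng))        # new innovation-rate cluster
        else:
            lams.append(lams[rng.integers(t)])    # reuse an earlier rate
    y = np.zeros(T, dtype=int)
    for t in range(1, T):
        y[t] = rng.binomial(y[t - 1], alpha) + rng.poisson(lams[t - 1])
    return y, np.array(lams)

y, lams = simulate_dp_inar1(200, alpha=0.6, tau=1.0,
                            base_sampler=lambda rng: rng.gamma(2.0, 2.0))
```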
