Comments on Choice of ARMA model

• Keep it simple! Use small $p$ and $q$.
• Some systems have autoregressive-like structure.
• E.g. first order dynamics:
$$dx(t) = -\alpha x(t)\,dt,$$
or in stochastic form,
$$dx(t) = -\alpha x(t)\,dt + dW(t),$$
where $W(t)$ is a Wiener process, the continuous time limit of the random walk.
• Discrete time approximation:
$$\delta x(t) = x(t + \delta t) - x(t) = -\alpha x(t)\,\delta t + \delta W(t),$$
or
$$x(t + \delta t) = x(t) - \alpha x(t)\,\delta t + \delta W(t) = (1 - \alpha\,\delta t)\,x(t) + \delta W(t),$$
an AR(1), causal if $\alpha > 0$ and $\delta t$ is small (see the simulation sketch below).
• Similarly a second order system leads to AR(2).
• Since many real-world systems can be approximated by first or second order dynamics, this suggests using $p = 1$ or $2$, and $q = 0$.
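A minimal R sketch (my illustration, not from the slides; the values of alpha, dt, and n are arbitrary choices): simulate the discretization and check that a fitted AR(1) coefficient is close to $1 - \alpha\,\delta t$.

    ## Euler discretization of dx = -alpha*x dt + dW is AR(1)
    ## with phi = 1 - alpha*dt.
    set.seed(1)
    alpha <- 2; dt <- 0.01; n <- 10000
    w <- rnorm(n, sd = sqrt(dt))                  # increments: delta W ~ N(0, dt)
    x <- numeric(n)
    for (t in 2:n) x[t] <- (1 - alpha * dt) * x[t - 1] + w[t]
    arima(x, order = c(1, 0, 0), include.mean = FALSE)$coef   # ar1 near 0.98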
• Some systems have more dimensions. E.g. first order vector autoregression, VAR$_p$(1):
$$\underset{p \times 1}{\mathbf{x}_t} = \underset{p \times p}{\boldsymbol{\Phi}}\ \underset{p \times 1}{\mathbf{x}_{t-1}} + \underset{p \times 1}{\mathbf{w}_t}.$$
• Here each component time series is typically ARMA($p$, $p - 1$); the sketch below illustrates this.
• This suggests using $q < p$, especially $q = p - 1$.
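A minimal R sketch (my illustration; the coefficient matrix is an arbitrary stable choice): simulate a bivariate VAR(1) and fit an ARMA(2, 1) to one component.

    ## Each component of a bivariate (p = 2) VAR(1) behaves like ARMA(2,1).
    set.seed(2)
    Phi <- matrix(c(0.7, 0.2, -0.3, 0.5), 2, 2)   # eigenvalues inside unit circle
    x <- matrix(0, 2, 2000)
    for (t in 2:2000) x[, t] <- Phi %*% x[, t - 1] + rnorm(2)
    arima(x[1, ], order = c(2, 0, 1), include.mean = FALSE)$coef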
• Added noise: if $y_t$ is ARMA($p$, $q$) with $q < p$, but we observe $x_t = y_t + w'_t$ where $w'_t$ is white noise, uncorrelated with $y_t$, then $x_t$ is ARMA($p$, $p$) (see the sketch below).
• This suggests using $q = p$.
• Summary: you'll often find that you can use small $p$ and $q \le p$, perhaps $q = 0$ or $q = p - 1$ or $q = p$, depending on the background of the series.
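A minimal R sketch (my illustration; the noise standard deviation is arbitrary): an AR(1) observed with added white noise fits well as ARMA(1, 1).

    ## AR(1) plus independent observation noise looks like ARMA(1,1).
    set.seed(3)
    y <- arima.sim(list(ar = 0.8), n = 5000)
    x <- y + rnorm(5000, sd = 2)                  # added white noise
    arima(x, order = c(1, 0, 1))$coef             # ar1 near 0.8, ma1 nonzero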
Estimation

• Current methods are likelihood-based:
$$f_{1,2,\dots,n}(x_1, x_2, \dots, x_n) = f_1(x_1) \times f_{2|1}(x_2 \mid x_1) \times \dots \times f_{n|n-1,\dots,1}(x_n \mid x_{n-1}, x_{n-2}, \dots, x_1).$$
• If $x_t$ is AR($p$) and $n > p$, then
$$f_{n|n-1,\dots,1}(x_n \mid x_{n-1}, x_{n-2}, \dots, x_1) = f_{n|n-1,\dots,n-p}(x_n \mid x_{n-1}, x_{n-2}, \dots, x_{n-p}).$$
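Anticipating the Gaussian AR(1) densities on the next slide, a minimal R sketch (my illustration, with $\mu = 0$ and assumed parameter values) evaluates the log-likelihood through exactly this factorization:

    ## AR(1) log-likelihood via the conditional factorization:
    ## one stationary marginal for x_1, then one conditional per t > 1.
    set.seed(4)
    phi <- 0.6; s2 <- 1
    x <- as.numeric(arima.sim(list(ar = phi), n = 100, sd = sqrt(s2)))
    ll <- dnorm(x[1], 0, sqrt(s2 / (1 - phi^2)), log = TRUE) +   # f_1(x_1)
          sum(dnorm(x[-1], phi * x[-100], sqrt(s2), log = TRUE)) # f_{t|t-1}
    ll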
• Assume $x_t$ is Gaussian. E.g. AR(1): $f_{t|t-1}(x_t \mid x_{t-1})$ is $N[(1 - \phi)\mu + \phi x_{t-1},\ \sigma^2_w]$ for $t > 1$, and $f_1(x_1)$ is $N[\mu,\ \sigma^2_w/(1 - \phi^2)]$.
• So the likelihood, still for AR(1), is
$$L(\mu, \phi, \sigma^2_w) = (2\pi\sigma^2_w)^{-n/2}\,\sqrt{1 - \phi^2}\,\exp\!\left(\frac{-S(\mu, \phi)}{2\sigma^2_w}\right),$$
where
$$S(\mu, \phi) = (1 - \phi^2)(x_1 - \mu)^2 + \sum_{t=2}^{n}\left[(x_t - \mu) - \phi(x_{t-1} - \mu)\right]^2.$$
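Coding this likelihood directly and minimizing its negative logarithm (my sketch; simulation settings and starting values are arbitrary):

    ## Exact AR(1) negative log-likelihood, minimized numerically.
    ar1.negloglik <- function(par, x) {
      mu <- par[1]; phi <- par[2]; s2 <- par[3]   # s2 = sigma^2_w
      n <- length(x)
      S <- (1 - phi^2) * (x[1] - mu)^2 +
           sum(((x[-1] - mu) - phi * (x[-n] - mu))^2)
      (n / 2) * log(2 * pi * s2) - 0.5 * log(1 - phi^2) + S / (2 * s2)
    }
    set.seed(5)
    x <- as.numeric(arima.sim(list(ar = 0.6), n = 500)) + 10
    optim(c(mean(x), 0, var(x)), ar1.negloglik, x = x,
          method = "L-BFGS-B",
          lower = c(-Inf, -0.99, 1e-6), upper = c(Inf, 0.99, Inf))$par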
Methods in proc arima

• method = ml: maximize the likelihood.
• method = uls: minimize the unconditional sum of squares $S(\mu, \phi)$.
• method = cls: minimize the conditional sum of squares $S_c(\mu, \phi)$:
$$S_c(\mu, \phi) = S(\mu, \phi) - (1 - \phi^2)(x_1 - \mu)^2 = \sum_{t=2}^{n}\left[(x_t - \mu) - \phi(x_{t-1} - \mu)\right]^2.$$
This is essentially least squares regression of $x_t$ on $x_{t-1}$.
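A minimal illustration of that last point in R (my sketch, with simulated data):

    ## Conditional least squares for AR(1) is OLS of x_t on x_{t-1}.
    set.seed(6)
    x <- as.numeric(arima.sim(list(ar = 0.6), n = 500)) + 10
    n <- length(x)
    coef(lm(x[-1] ~ x[-n]))    # slope ~ phi; intercept ~ (1 - phi) * mu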
• AR($p$), $p > 1$, can be handled similarly.
• ARMA($p$, $q$) with $q > 0$ is more complicated; state space methods can be used to calculate the exact likelihood.
• proc arima implements the same three methods in all cases.
• All three methods give estimators with the same large-sample normal distribution; all are asymptotically optimal.
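As an aside (mine, not in the slides): R's arima() offers analogous choices through its method argument, "ML" for exact maximum likelihood (computed via a state space representation, as above) and "CSS" for conditional sum of squares; there is no direct uls analogue.

    ## Exact ML vs conditional sum of squares in R's arima().
    set.seed(7)
    x <- arima.sim(list(ar = 0.6, ma = 0.3), n = 300)
    arima(x, order = c(1, 0, 1), method = "ML")$coef
    arima(x, order = c(1, 0, 1), method = "CSS")$coef   # close for large n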
Brute Force

• Above methods fail (or need serious modification) if any data are missing.
• Can always fall back to brute force:
$$(x_1, x_2, \dots, x_n)' \sim N_n(\mu\mathbf{1}, \boldsymbol{\Gamma}),$$
where
$$\underset{n \times n}{\boldsymbol{\Gamma}} = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-2) \\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(0) \end{pmatrix}.$$
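For concreteness (my sketch, using the standard AR(1) autocovariance $\gamma(h) = \sigma^2_w\,\phi^{|h|}/(1 - \phi^2)$ with assumed parameter values), $\boldsymbol{\Gamma}$ is Toeplitz and easy to build in R:

    ## Gamma for an AR(1) via its first row and toeplitz().
    phi <- 0.6; s2w <- 1; n <- 5
    Gamma <- toeplitz(s2w * phi^(0:(n - 1)) / (1 - phi^2))
    Gamma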
• Write $\gamma(h) = \sigma^2_w\,\gamma^*(h)$, and use e.g. R's ARMAacf(...) to compute $\gamma^*(h)$.
• Likelihood is
$$\left[\det(2\pi\boldsymbol{\Gamma})\right]^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \mu\mathbf{1})'\,\boldsymbol{\Gamma}^{-1}(\mathbf{x} - \mu\mathbf{1})\right) = \left[\det(2\pi\sigma^2_w\boldsymbol{\Gamma}^*)\right]^{-1/2} \exp\!\left(-\frac{1}{2\sigma^2_w}(\mathbf{x} - \mu\mathbf{1})'\,\boldsymbol{\Gamma}^{*\,-1}(\mathbf{x} - \mu\mathbf{1})\right).$$
• Can maximize analytically with respect to $\mu$ and $\sigma^2_w$, then numerically with respect to $\phi$ and $\theta$.
• Missing data? Just leave out corresponding rows and columns of $\boldsymbol{\Gamma}^*$.
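A minimal end-to-end sketch (mine; model orders and parameter values are illustrative). One caveat: ARMAacf() returns autocorrelations $\rho(h) = \gamma^*(h)/\gamma^*(0)$ rather than $\gamma^*(h)$ itself, so the scale profiled out analytically below is $\gamma(0) = \sigma^2_w\,\gamma^*(0)$ instead of $\sigma^2_w$; the two differ by a factor that is known once $\phi$ and $\theta$ are fixed.

    ## Brute-force Gaussian likelihood for an ARMA(1,1), with missing
    ## data handled by dropping the corresponding rows and columns.
    negloglik <- function(par, x) {
      rho <- ARMAacf(ar = par[1], ma = par[2], lag.max = length(x) - 1)
      obs <- !is.na(x)
      R   <- toeplitz(rho)[obs, obs]       # correlation matrix of observed x's
      Ri  <- solve(R)
      xo  <- x[obs]; n <- length(xo)
      mu  <- sum(Ri %*% xo) / sum(Ri)      # analytic profile over mu
      z   <- xo - mu
      g0  <- c(z %*% Ri %*% z) / n         # analytic profile over gamma(0)
      0.5 * (n * log(2 * pi * g0) + as.numeric(determinant(R)$modulus) + n)
    }
    set.seed(8)
    x <- as.numeric(arima.sim(list(ar = 0.7, ma = 0.4), n = 200))
    x[sample(200, 10)] <- NA               # some observations missing
    optim(c(0.5, 0.2), negloglik, x = x, method = "L-BFGS-B",
          lower = c(-0.99, -5), upper = c(0.99, 5))$par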