
Estimation of Autoregressive Processes with Sparse Parameters
Abbas Kazemipour, MAST Group Meeting
University of Maryland, College Park
kaazemi@umd.edu
November 18, 2015


Introduction

1. Yule-Walker equations:
   $R_{p \times p}\,\theta = r^{-1}_{-p}, \qquad r_0 = \theta^T r^{-1}_{-p} + \sigma_w^2.$  (4)
2. $R := R_{p \times p} = E[x_1^p\, x_1^{pT}]$ is the $p \times p$ covariance matrix of the process.
3. $r_k = E[x_i x_{i+k}]$ is the $k$-th autocorrelation.
4. If $n \gg p$, one can estimate $R$ and the $r_k$'s and solve the Yule-Walker equations. Drawbacks:
   - Does not exploit the sparsity of $\theta$.
   - Does not perform well when $n$ is small or comparable with $p$; the poor estimates are mainly due to the required inversion of $\widehat{R}_p$, which might not be numerically stable.
5. Usually biased estimates are used, at the cost of distorting the Yule-Walker equations.
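For concreteness, the classical pipeline criticized above can be sketched in a few lines of Python (a minimal illustration, not the talk's code; the biased sample autocorrelations and the direct solve of the Yule-Walker system are exactly the steps that become unreliable when n is comparable with p):

```python
import numpy as np
from scipy.linalg import toeplitz

def yule_walker(x, p):
    """Classical Yule-Walker estimate of AR(p) parameters.

    Uses biased sample autocorrelations r_k = (1/n) sum_i x_i x_{i+k} and
    solves R theta = (r_1, ..., r_p) directly, which ignores sparsity and
    can be numerically unstable when p is comparable with n.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    R = toeplitz(r[:p])                 # R_ij = r_{|i-j|}
    theta = np.linalg.solve(R, r[1:])   # Yule-Walker system (4)
    sigma2 = r[0] - theta @ r[1:]       # innovation-variance estimate
    return theta, sigma2
```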

Our Formulation

1. LASSO-type estimator given by a conditional log-likelihood penalization:
   $\min_{\theta \in \mathbb{R}^p}\ \frac{1}{n}\|x_1^n - X\theta\|_2^2 + \gamma_n \|\theta\|_1,$  (5)
   where
   $X = \begin{bmatrix} x_{n-1} & x_{n-2} & \cdots & x_{n-p} \\ x_{n-2} & x_{n-3} & \cdots & x_{n-p-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_0 & x_{-1} & \cdots & x_{-p+1} \end{bmatrix}.$  (6)
2. $X$ is a Toeplitz matrix with highly correlated entries.
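A minimal sketch of how (5)-(6) can be set up and solved numerically, assuming the samples x_{-p+1}, ..., x_n are stored in one array of length n + p; the penalized problem is solved with plain ISTA for self-containedness (any LASSO solver would do), and the step size and iteration count are illustrative choices rather than part of the talk:

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def sparse_ar_lasso(x, n, p, gamma, n_iter=2000):
    """Minimize (1/n) ||x_1^n - X theta||_2^2 + gamma ||theta||_1 by ISTA.

    `x` holds x_{-p+1}, ..., x_n, so x[k] corresponds to time k - p + 1.
    """
    x = np.asarray(x, dtype=float)
    y = x[p:][::-1]                                    # (x_n, x_{n-1}, ..., x_1)
    # Row i of the Toeplitz matrix X in (6): (x_{n-1-i}, ..., x_{n-p-i})
    X = np.array([x[n - 1 - i : n + p - 1 - i][::-1] for i in range(n)])
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ theta - y)
        theta = soft_threshold(theta - grad / L, gamma / L)
    return theta
```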

Results

Theorem. If $\sigma_s(\theta) = O(\sqrt{s})$, there exist positive constants $c_1, c_2, c_3$ and $c_\epsilon$ such that for $n > c_\epsilon\, s\, p^{2/3} (\log p)^{2/3}$ and the choice of regularization parameter $\gamma_n = c_1 \sqrt{\log p / n}$, any solution $\widehat{\theta}_{sp}$ to (5) satisfies the bound
   $\|\widehat{\theta}_{sp} - \theta\|_2 \ \le\ c_2 \sqrt{\frac{s \log p}{n}} + c_2\, \sigma_s(\theta)\, \sqrt[4]{\frac{\log p}{n}},$  (7)
with probability greater than $1 - O(n^{-c_3})$.

Remarks:
1. $n$ can be much less than $p$.
2. Better than Yule-Walker.

Simulation: p = 100, n = 500, s = 3, γ_n = 0.1

Figure: Recovery results for n = 500, p = 100, s = 3 (panels: true parameters, regularized ML estimate, Yule-Walker estimate).
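The experiment above can be reproduced in spirit by simulating an AR(p) process with an s-sparse coefficient vector; the support, coefficient values, and seed below are arbitrary illustrative choices, and the coefficients are rescaled so that the ℓ1 norm stays below 1, matching the stability assumption used in the talk:

```python
import numpy as np

def simulate_sparse_ar(n, p, s, burn_in=1000, seed=0):
    """Generate n + p samples x_{-p+1}, ..., x_n of an AR(p) process
    whose coefficient vector theta has only s nonzero entries."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    vals = rng.uniform(-1.0, 1.0, size=s)
    theta[support] = 0.9 * vals / np.abs(vals).sum()   # enforce ||theta||_1 = 0.9 < 1
    x = np.zeros(burn_in + n + p)
    w = rng.standard_normal(burn_in + n + p)
    for t in range(p, len(x)):
        x[t] = theta @ x[t - p : t][::-1] + w[t]       # x_t = sum_i theta_i x_{t-i} + w_t
    return theta, x[burn_in:]                          # discard the burn-in

# e.g. theta_true, x = simulate_sparse_ar(n=500, p=100, s=3)
#      theta_hat = sparse_ar_lasso(x, n=500, p=100, gamma=0.1)   # the setting of this slide
```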

Proof of the Main Theorem

Lemma (Cone Condition). For the choice of regularization parameter $\gamma_n = \frac{2}{n}\|X^T(x_1^n - X\theta)\|_\infty$, the optimal error $h = \widehat{\theta} - \theta$ belongs to the cone
   $C := \{\, h \in \mathbb{R}^p : \|h_{S^c}\|_1 \le 3 \|h_S\|_1 \,\}.$  (8)

Definition (Restricted Eigenvalue Condition). $X$ is said to satisfy the RE condition of order $s$ if
   $\lambda_{\max}(s)\,\|\theta\|_2^2 \ \ge\ \frac{1}{n}\theta^T X^T X \theta = \frac{1}{n}\|X\theta\|_2^2 \ \ge\ \lambda_{\min}(s)\,\|\theta\|_2^2$  (9)
for every $\theta$ which is $s$-sparse.

Remark: this essentially requires the eigenvalues of all $n \times s$ submatrices of $X$ to be bounded and strictly positive.

Definition (Restricted Strong Convexity). $X$ is said to satisfy the RSC condition of order $s$ if
   $\frac{1}{n} h^T X^T X h = \frac{1}{n}\|Xh\|_2^2 \ \ge\ \kappa \|h\|_2^2, \qquad \forall\, h \in C.$  (10)

Lemma (Theorem 1 of Negahban). If $X$ satisfies the RSC condition, then any optimal solution $\widehat{\theta}$ satisfies
   $\|\widehat{\theta} - \theta\|_2 \le \frac{2\sqrt{s}\,\gamma_n}{\kappa}, \qquad \|\widehat{\theta} - \theta\|_1 \le \frac{6 s \gamma_n}{\kappa}.$  (⋆)

Lemma (Lemma 4.1 of Bickel). If $X$ satisfies the RE condition of order $s^\star = (r+1)s$, then the RSC condition is also satisfied with
   $\kappa = \lambda_{\min}\big((r+1)s\big)\left(1 - 3\sqrt{\frac{\lambda_{\max}(rs)}{r\,\lambda_{\min}((r+1)s)}}\right).$  (11)

Proof outline:
1. Step 1: find a lower bound on $\kappa$.
2. Step 2: find an upper bound on $\gamma_n$.

Finding a lower bound on κ

Lemma (Haykin). Let $R \in \mathbb{R}^{k \times k}$ be the $k \times k$ covariance matrix of a stationary process with power spectral density $S(\omega)$, and denote its maximum and minimum eigenvalues by $\phi_{\max}(k)$ and $\phi_{\min}(k)$, respectively. Then
   $\phi_{\min}(k) \downarrow \inf_\omega S(\omega),$  (12)
and
   $\phi_{\max}(k) \uparrow \sup_\omega S(\omega).$  (13)

Convergence of the Eigenvalues of R

Figure: minimum and maximum eigenvalues $\lambda_{\min}(k)$ and $\lambda_{\max}(k)$ of $R$ as a function of $k$ (left panel), converging to the extrema of the power spectral density (right panel).

Corollary (RE of R). Under the assumptions of our problem, for an AR process $R$ satisfies the RE condition (of any order) with $\lambda_{\max} = 1/\epsilon^2$ and $\lambda_{\min} = 1/4$.

Proof. For an AR(p) process
   $S(\omega) = \frac{\sigma_w^2}{\big|1 - \sum_i \theta_i e^{-ji\omega}\big|^2}.$
Using the assumption $\sum_i |\theta_i| \le 1 - \epsilon$ in conjunction with Lemma 6 proves the claim.

Remark: the RE condition also holds for any stationary process satisfying $\inf_\omega S(\omega) > 0$ and $\sup_\omega S(\omega) < \infty$.
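The eigenvalue convergence in the lemma, and the spectral bounds used in this corollary, are easy to check numerically: build the k x k autocovariance matrix of an AR process from its power spectral density and compare the extreme eigenvalues with inf and sup of S(omega). This is a quick numerical illustration only, not part of the proof; the example coefficients are arbitrary choices.

```python
import numpy as np
from scipy.linalg import toeplitz

def ar_psd(theta, omegas, sigma2=1.0):
    """S(w) = sigma2 / |1 - sum_i theta_i exp(-j*i*w)|^2 for an AR(p) process."""
    p = len(theta)
    phases = np.exp(-1j * np.outer(omegas, np.arange(1, p + 1)))
    return sigma2 / np.abs(1.0 - phases @ theta) ** 2

def autocovariances(theta, max_lag, sigma2=1.0, n_grid=4096):
    """r_0, ..., r_max_lag via the inverse DFT of the power spectral density."""
    omegas = 2 * np.pi * np.arange(n_grid) / n_grid
    S = ar_psd(theta, omegas, sigma2)
    return np.real(np.fft.ifft(S))[: max_lag + 1]    # r_k = (1/2pi) int S(w) e^{jkw} dw

theta = np.zeros(100)
theta[[1, 10, 40]] = [0.4, -0.3, 0.2]                # sparse, sum_i |theta_i| = 0.9 < 1
omegas = np.linspace(0, 2 * np.pi, 2048, endpoint=False)
S = ar_psd(theta, omegas)
r = autocovariances(theta, max_lag=300)
for k in (10, 50, 300):
    eig = np.linalg.eigvalsh(toeplitz(r[:k]))        # eigenvalues of the k x k matrix R
    print(k, eig.min(), eig.max(), "vs", S.min(), S.max())
```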

Lemma (RE of R̂). If $R$ satisfies the RE condition with parameters $\lambda_{\max}$ and $\lambda_{\min}$, then $\widehat{R}$ satisfies the RE condition of order $s^\star$ with parameters $\lambda'_{\max} = \lambda_{\max} + t s^\star$ and $\lambda'_{\min} = \lambda_{\min} - t s^\star$, where $t = \max_{i,j} |\widehat{R}_{ij} - R_{ij}|$.

Proof. For every $s^\star$-sparse vector $\theta$ we have
   $\theta^T \widehat{R}\, \theta \ \ge\ \theta^T R\, \theta - t\|\theta\|_1^2 \ \ge\ (\lambda_{\min} - t s^\star)\|\theta\|_2^2,$
   $\theta^T \widehat{R}\, \theta \ \le\ \theta^T R\, \theta + t\|\theta\|_1^2 \ \le\ (\lambda_{\max} + t s^\star)\|\theta\|_2^2,$
which is the claim.

Remarks:
1. This holds for every $t$.
2. We will be interested in $t = \lambda_{\min}/(2 s^\star)$.
3. Noting that $\widehat{R} = X^T X / n$, Corollary 7 and Lemma 5 with $r = 288/(\epsilon^2 + 1/8)$ imply that $X$ satisfies the RSC condition with parameter $\kappa = 1/(4\sqrt{2})$.
4. To complete the bound it only remains to show that $t$ can be chosen suitably small.

Concentration Inequality

Lemma (Concentration Inequality: Theorem 4 of Rudzkis). Let $x_{-p+1}^n$ be samples of a stationary process which satisfies $x_k = \sum_{j=-\infty}^{\infty} b_{j-k} w_j$, where the $w_k$'s are i.i.d. random variables with
   $|E(w_j^K)| \le C^K K!, \qquad K = 2, 3, \cdots,$  (14)
and $\sum_{j=-\infty}^{\infty} |b_j| < \infty$. Then the biased sample autocorrelation
   $\widehat{r}_k^{\,b} = \frac{1}{n+k}\sum_{i,j=1,\ j-i=k}^{n+k} x_i x_j$
satisfies
   $P\big(|\widehat{r}_k^{\,b} - r_k^{\,b}| > t\big) \ \le\ c_1 \exp\big(-c_2\, t^{3/2}\sqrt{n+k}\big),$  (15)
for positive constants $c_1$ and $c_2$ which are independent of the dimensions of the problem.

Corollary (Concentration Inequality for Unbiased Estimates). The unbiased estimate satisfies
   $P\big(|\widehat{r}_k - r_k| > t\big) \ \le\ c_1 \exp\!\left(-\frac{c_2\, n^{3/2} t^{3/2}}{n+k}\right).$  (16)

1. Choose $t^\star = \frac{\lambda_{\min}}{2(r+1)s} = c_\epsilon / s$ and apply the union bound (with $k = p$ in all inequalities):
   $P\big(\max_{i,j} |\widehat{R}_{ij} - R_{ij}| > t^\star\big) \ \le\ c_1 p^2 \exp\!\left(-\frac{c_2\, n^{3/2} t_\star^{3/2}}{n+p}\right) \ \le\ c_1 \exp\!\left(-\frac{c_\epsilon\, n^{3/2}}{s^{3/2}(n+p)} + 2\log p\right).$
2. Since $p \gg n$, choosing $n > c\, s\, p^{2/3}(\log p)^{2/3}$ yields the bound on $\kappa$.

Finding an upper bound on γ_n

1. Gradient of the objective function $\|x_1^n - X\theta\|_2^2$:
   $\nabla L(\theta) := \frac{2}{n} X^T (x_1^n - X\theta).$
2. Lemmas 8 and 4 suggest that a suitable choice of the regularization parameter is $\gamma_n = \|\nabla L(\theta)\|_\infty$.
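In practice the oracle quantity ||∇L(θ)||_∞ is unavailable; the analysis below shows it is of order sqrt(log p / n) with high probability, which suggests the data-independent surrogate sketched here (the constant c1 is a tuning choice, not prescribed by the theorem beyond its existence):

```python
import numpy as np

def grad_L(theta, X, y, n):
    """Gradient used throughout this step, with the slides' sign convention:
    nabla L(theta) = (2/n) X^T (x_1^n - X theta)."""
    return 2.0 / n * X.T @ (y - X @ theta)

def choose_gamma(n, p, c1=1.0):
    """Practical surrogate gamma_n = c1 * sqrt(log(p) / n), as in the main theorem."""
    return c1 * np.sqrt(np.log(p) / n)
```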

1. First, it is easy to check that by the uncorrelatedness of the $w_k$'s we have
   $E[\nabla L(\theta)] = \frac{2}{n} E\big[X^T(x_1^n - X\theta)\big] = \frac{2}{n} E\big[X^T w_1^n\big] = 0.$  (17)
2. In linear regression terminology, (17) is known as the orthogonality principle.
3. We show that $\nabla L(\theta)$ is concentrated around its mean.

1. We have $(\nabla L(\theta))_i = \frac{2}{n}\big(x_{-i+1}^{\,n-i}\big)^T w_1^n$.
2. The $j$-th element in this expansion is of the form $y_j = x_{n-i-j+1}\, w_{n-j+1}$.
3. It is easy to check that the sequence $y_1^n$ is a martingale with respect to the filtration given by
   $\mathcal{F}_j = \sigma\big(x_{-p+1}^{\,n-j+1}\big),$
where $\sigma(\cdot)$ denotes the sigma-field generated by the random variables in its argument.

1. We will now state the following concentration result for sums of dependent random variables [?]:

Proposition. Fix $n \ge 1$. Let the $Z_j$'s be subgaussian $\mathcal{F}_j$-measurable random variables satisfying, for each $j = 1, 2, \cdots, n$, $E[Z_j \mid \mathcal{F}_{j-1}] = 0$ almost surely. Then there exists a constant $c$ such that for all $t > 0$,
   $P\!\left(\Big|\frac{1}{n}\sum_{j=1}^n \big(Z_j - E[Z_j]\big)\Big| \ge t\right) \ \le\ \exp(-c\, n t^2).$

1. Since the $y_j$'s are products of two independent subgaussian random variables, they are subgaussian as well.
2. Proposition 1 implies that
   $P\big(|\nabla L(\theta)_i| \ge t\big) \ \le\ \exp(-c\, n t^2).$  (18)
3. By the union bound,
   $P\big(\|\nabla L(\theta)\|_\infty \ge t\big) \ \le\ \exp(-c\, t^2 n + \log p).$  (19)
4. Choosing $t = \sqrt{\frac{1+\alpha_1}{c}}\sqrt{\frac{\log p}{n}}$ for some $\alpha_1 > 0$ yields
   $P\!\left(\|\nabla L(\theta)\|_\infty \ge \sqrt{\frac{1+\alpha_1}{c}}\sqrt{\frac{\log p}{n}}\right) \ \le\ 2\exp(-\alpha_1 \log p) \ \le\ \frac{2}{n^{\alpha_1}}.$

1. Hence $\gamma_n \le d_2 \sqrt{\frac{\log p}{n}}$ with $d_2 := \sqrt{\frac{1+\alpha_1}{c}}$, with probability at least $1 - \frac{2}{n^{\alpha_1}}$.
2. Combined with the result of Corollary 12, for $n > d_1\, s\, p^{2/3}(\log p)^{2/3}$ we get the claim of Theorem 1.

Future Work

1. Greedy methods
2. Penalized Yule-Walker
3. Dynamic ℓ1 reconstruction
4. Dynamic Durbin-Levinson

Other Methods

1. Penalized Yule-Walker:
   $\min_{\theta \in \mathbb{R}^p}\ \|\widehat{R}\theta - \widehat{r}^{-1}_{-p}\|_2 + \lambda\|\theta\|_1.$
   This requires a fourth-moment analysis instead.
2. Alternatively,
   $\min_{\theta \in \mathbb{R}^p}\ \|\widehat{R}\theta - \widehat{r}^{-1}_{-p}\|_1 + \lambda\|\theta\|_1.$

Summary of Simulation Methods

1. Regularized ML:
   $\min_{\theta \in \mathbb{R}^p}\ \|y - X\theta\|_2 + \lambda\|\theta\|_1.$
2. Yule-Walker $\ell_{2,1}$:
   $\min_{\theta \in \mathbb{R}^p}\ \|\widehat{r}^{-1}_{-p} - \widehat{R}\theta\|_2 + \lambda\|\theta\|_1.$
3. Yule-Walker $\ell_{1,1}$:
   $\min_{\theta \in \mathbb{R}^p}\ \|\widehat{r}^{-1}_{-p} - \widehat{R}\theta\|_1 + \lambda\|\theta\|_1.$
4. Least-squares solutions to Yule-Walker and maximum likelihood (traditional methods).

A convex-programming sketch of methods 1-3 is given below.
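Methods 1-3 are small convex programs and can be written almost verbatim with a generic solver; the sketch below uses cvxpy, which is an implementation choice of ours and not something used in the talk, with R_hat and r_hat denoting the sample covariance matrix and autocorrelation vector:

```python
import cvxpy as cp
import numpy as np

def penalized_yule_walker(R_hat, r_hat, lam, loss="l2"):
    """Yule-Walker l_{2,1} (loss="l2") or l_{1,1} (loss="l1") estimator:
    minimize ||r_hat - R_hat @ theta||_q + lam * ||theta||_1,  q in {2, 1}."""
    theta = cp.Variable(R_hat.shape[0])
    residual = r_hat - R_hat @ theta
    fit = cp.norm(residual, 2) if loss == "l2" else cp.norm(residual, 1)
    cp.Problem(cp.Minimize(fit + lam * cp.norm(theta, 1))).solve()
    return theta.value

def regularized_ml(X, y, lam):
    """Regularized ML variant: minimize ||y - X theta||_2 + lam * ||theta||_1."""
    theta = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.norm(y - X @ theta, 2) + lam * cp.norm(theta, 1))).solve()
    return theta.value
```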

Simulation Results for p = 200, s = 4

Figures: recovery results for n = 60, 120, 180, 280, 700, and 4000, comparing the true parameters with the regularized ML, ML least-squares, Yule-Walker $\ell_{2,1}$, Yule-Walker $\ell_{1,1}$, Yule-Walker least-squares, OMP, and least-squares OMP estimates.

MSE for different values of n, p = 300, s = 3

Figure: MSE as a function of $n$ for the regularized ML, least-squares, Yule-Walker $\ell_{1,1}$, Yule-Walker, regularized ML + OMP, Yule-Walker $\ell_{2,1}$, and OMP + Yule-Walker estimators.

OMP

Table: Autoregressive Orthogonal Matching Pursuit (AROMP)

Input: $L(\theta)$, $s^\star$
Output: $\widehat{\theta}^{(s^\star)}_{\mathrm{AROMP}}$
Initialization: start with the index set $S^{(0)} = \emptyset$ and the initial estimate $\widehat{\theta}^{(0)}_{\mathrm{AROMP}} = 0$.
for $k = 1, 2, \cdots, s^\star$:
   $j = \arg\max_i \big|\big(\nabla L(\widehat{\theta}^{(k-1)}_{\mathrm{AROMP}})\big)_i\big|$
   $S^{(k)} = S^{(k-1)} \cup \{j\}$
   $\widehat{\theta}^{(k)}_{\mathrm{AROMP}} = \arg\min_{\mathrm{supp}(\theta) \subset S^{(k)}} L(\theta)$
end
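A direct transcription of the AROMP table for the quadratic loss L(theta) = (1/n)||x_1^n - X theta||_2^2, with X and the response built as in the formulation slide (a sketch; the inner minimization over the active set is solved exactly by least squares):

```python
import numpy as np

def aromp(X, y, s_star):
    """Autoregressive OMP for L(theta) = (1/n) * ||y - X theta||_2^2."""
    n, p = X.shape
    theta = np.zeros(p)
    support = []
    for _ in range(s_star):
        grad = 2.0 / n * X.T @ (X @ theta - y)   # gradient of L at the current iterate
        grad[support] = 0.0                      # already-selected coords have zero gradient
                                                 # after the exact refit; zero them anyway
        j = int(np.argmax(np.abs(grad)))         # greedy coordinate selection
        support.append(j)
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        theta = np.zeros(p)
        theta[support] = coef                    # refit restricted to supp(theta) in S^(k)
    return theta

# e.g. theta_hat = aromp(X, y, s_star=10)
```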

Main Theoretical Result

Theorem. If $\theta$ is $(s, \xi, 2)$-compressible for some $\xi < 1/2$, there exist constants $d'_{1\epsilon}, d'_{2\epsilon}, d'_{3\epsilon}$ and $d'_4$ such that for $n > d'_{1\epsilon}\, s^{2/3} p^{2/3} (\log s)^{2/3} \log p$, the AROMP estimate satisfies the bound
   $\|\widehat{\theta}_{\mathrm{AROMP}} - \theta\|_2 \ \le\ d'_{2\epsilon}\sqrt{\frac{s \log s \log p}{n}} + d'_{3\epsilon}\, s^{\xi - \frac{1}{2}},$  (20)
after $s^\star = O_\epsilon(s \log s)$ iterations, with probability greater than $1 - O\!\left(\frac{1}{n^{d'_4}}\right)$.

Application to Financial Data

1. Crude oil price: Cushing, OK WTI Spot Price FOB dataset.
2. The dataset consists of 7429 daily values.
3. Outliers removed by visual inspection; n = 4000.
4. Long-memory time series → first-order differencing.
5. Model-order selection is of low importance here.
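The preprocessing described here amounts to a one-line transform; a sketch assuming the daily prices, already cleaned by the visual outlier removal, sit in a 1-D array:

```python
import numpy as np

def preprocess(prices, n=4000):
    """First-order differencing to remove the long-memory trend, keeping the last n values."""
    diffs = np.diff(np.asarray(prices, dtype=float))   # x_t = price_t - price_{t-1}
    return diffs[-n:]
```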


Conclusions

1. First-order differences show Gaussian behavior.
2. Given no outliers, our method predicts a sudden change in prices every 40, 100, 150 days.
3. Yule-Walker is bad!
4. Greedy is good!

Minimax Framework

1. Minimax estimation risk over the class of good stationary processes:
   $\mathcal{R}_e(\widehat{\theta}) = \sup \Big( E\big[\|\widehat{\theta} - \theta\|_2^2\big] \Big)^{1/2}.$  (21)
2. The minimax estimator:
   $\widehat{\theta}_{\mathrm{minimax}} = \arg\min_{\widehat{\theta} \in \Theta} \mathcal{R}_e(\widehat{\theta}).$  (22)
3. It typically cannot be constructed, so we are interested in order-optimal estimators:
   $\mathcal{R}_e(\widehat{\theta}) \le c\, \mathcal{R}_e(\widehat{\theta}_{\mathrm{minimax}}).$  (23)
4. One can also define the minimax prediction risk:
   $\mathcal{R}_p(\widehat{\theta}) = \sup E\Big[\big(x_k - \widehat{\theta}'\, x^{k-1}_{k-p}\big)^2\Big].$  (24)

1. ℓ2-regularized LS problem: [Goldenhauser 2001].
2. Slightly weaker exponential inequality.
3. $p^\star = \lfloor -\tfrac{1}{2}\log(1-\epsilon)\,\log n \rfloor$ is minimax optimal.
4. Requires a sample size exponentially large in $p$.
5. Our result: the ℓ1-regularized LS estimator is minimax optimal.
6. Can afford higher orders.

Minimax Optimality

Theorem. Let $x_1^n$ be samples of an AR process with $s$-sparse parameters satisfying $\|\theta\|_1 \le 1 - \epsilon$. Then with a choice of $p^\star = O_\epsilon\big(\sqrt[3]{n^2}\big)$ we have:
   $c_\epsilon \sqrt{\frac{s}{n}} \ \le\ \mathcal{R}_e(\widehat{\theta}_{\mathrm{minimax}}) \ \le\ \mathcal{R}_e(\widehat{\theta}_{sp}) \ \le\ c'_\epsilon\, \mathcal{R}_e(\widehat{\theta}_{\mathrm{minimax}}),$
that is, the ℓ1-regularized LS estimator is minimax optimal modulo logarithmic factors.

Theorem. Let $x_{-p+1}^n$ be samples of an AR process with Gaussian innovations. There exist positive constants $c_\epsilon$ and $c'_\epsilon$ such that for $n > c_\epsilon\, s\, p^{2/3}(\log p)^{2/3}$ we have:
   $\mathcal{R}_p(\widehat{\theta}_{sp}) \ \le\ c'_\epsilon\left(\frac{s + s^{1/2}}{n} + \frac{(1-\epsilon)^2 p}{n^2}\right) + 1.$  (25)

1. For large $n$, the prediction error variance is very close to the variance of the innovations.

Proofs

1. Define the event
   $A := \big\{ \max_{i,j} |\widehat{R}_{ij} - R_{ij}| \le t^\star \big\}.$
2. Then
   $P(A^c) \ \le\ c_1 \exp\!\left(-\frac{c_\epsilon\, n^{3/2}}{s^{3/2}(n+p)} + 2\log p\right).$

1. We have
   $\mathcal{R}_e(\widehat{\theta}_{\mathrm{minimax}})^2 \ \le\ \mathcal{R}_e(\widehat{\theta}_{sp})^2 \ =\ \sup E\big[\|\widehat{\theta}_{sp} - \theta\|_2^2\big]$
   $\le\ P(A)\left(c_2\sqrt{\frac{s\log p}{n}} + c_2\,\sigma_s(\theta)\sqrt[4]{\frac{\log p}{n}}\right)^2 + P(A^c)\,\|\widehat{\theta}_{sp} - \theta\|_2^2$
   $\le\ \left(c_2\sqrt{\frac{s\log p}{n}} + c_2\,\sigma_s(\theta)\sqrt[4]{\frac{\log p}{n}}\right)^2 + 4(1-\epsilon)^2 c_1 \exp\!\left(-\frac{c_\epsilon\, n^{3/2}}{s^{3/2}(n+p)} + 2\log p\right).$
2. For $n > c_\epsilon\, s\, p^{2/3}(\log p)^{2/3}$, the first term is the dominant factor.

Proofs: Converse

1. Assumption: Gaussian innovations.

Lemma (Fano's Inequality). Let $\mathcal{Z}$ be a class of densities with a subclass $\mathcal{Z}^\star$ of densities $f_{\theta_i}$, $i \in \{0, \cdots, 2^s\}$, such that for any two distinct $\theta_1, \theta_2 \in \mathcal{Z}^\star$:
   $D(f_{\theta_1} \| f_{\theta_2}) \le \beta.$
Let $\widehat{\theta}$ be an estimate of the parameters. Then
   $\sup_j P\big(\widehat{\theta} \ne \theta_j \mid H_j\big) \ \ge\ 1 - \frac{\beta + \log 2}{s},$  (26)
where $H_j$ denotes the hypothesis that $\theta_j$ is the true parameter, and induces the probability measure $P(\cdot \mid H_j)$.

1. Consider the class $\mathcal{Z}$ of AR processes defined over a fixed subset $S \subset \{1, 2, \cdots, p\}$ with $|S| = s$ and the $s$-sparse parameter set given by
   $\theta_j = \pm\, \eta e^{-N}\, \mathbb{1}_S(j),$  (27)
where $\eta$ and $N$ remain to be chosen.
2. Add the all-zero vector $\theta$ to $\mathcal{Z}$.
3. $|\mathcal{Z}| = 2^s + 1$.

Lemma (Gilbert-Varshamov). There exists $\mathcal{Z}^\star \subset \mathcal{Z}$ such that $|\mathcal{Z}^\star| \ge 2^{\lfloor s/8 \rfloor} + 1$ and any two distinct $\theta_1, \theta_2 \in \mathcal{Z}^\star$ differ in at least $s/16$ components.

1. Hence
   $\|\theta_1 - \theta_2\|_2 \ \ge\ \frac{1}{4}\, 2\sqrt{s}\,\eta e^{-N} \ :=\ \alpha.$  (28)
2. For an arbitrary estimate $\widehat{\theta}$: hypothesis testing problem between the $2^{\lfloor s/8 \rfloor} + 1$ hypotheses $H_j: \theta = \theta_j \in \mathcal{Z}^\star$, with the minimum-distance decoding strategy.

1. Markov's inequality:
   $\sup_{\mathcal{Z}} E\big[\|\widehat{\theta} - \theta\|_2\big] \ \ge\ \sup_{\mathcal{Z}^\star} E\big[\|\widehat{\theta} - \theta\|_2\big] \ \ge\ \frac{\alpha}{2}\, \sup_{\mathcal{Z}^\star} P\Big(\|\widehat{\theta} - \theta\|_2 \ge \frac{\alpha}{2}\Big) \ =\ \frac{\alpha}{2} \sup_{j = 0, \cdots, 2^{\lfloor s/8 \rfloor}} P\big(\widehat{\theta} \ne \theta_j \mid H_j\big).$  (29)

1. Let $f_{\theta_i}$ be the joint pdf of $\{x_k\}_{k=1}^n$ conditioned on $\{x_k\}_{k=-p+1}^0$, for $i \in \{0, \cdots, 2^s\}$.
2. With Gaussian innovations, for $i \ne j$:
   $D(f_{\theta_i} \| f_{\theta_j}) \ \le\ \sup_{i \ne j} E\Big[\log \frac{f_{\theta_i}}{f_{\theta_j}} \,\Big|\, H_i\Big]$
   $\le\ \sup_{i \ne j} E\Big[-\tfrac{1}{2}\sum_{k=1}^n \Big(\big(x_k - \theta_i'\, x^{k-1}_{k-p}\big)^2 - \big(x_k - \theta_j'\, x^{k-1}_{k-p}\big)^2\Big) \,\Big|\, H_i\Big]$
   $\le\ \sup_{i \ne j} \frac{n}{2}\, E\Big[\big((\theta_i - \theta_j)'\, x^{k-1}_{k-p}\big)^2 \,\Big|\, H_i\Big]$
   $=\ \frac{n}{2}\, \sup_{i \ne j}\, (\theta_i - \theta_j)'\, R_{p \times p}\, (\theta_i - \theta_j)$
   $\le\ \frac{n\,\lambda_{\max}}{2} \sup_{i \ne j} \|\theta_i - \theta_j\|_2^2 \ \le\ \frac{\eta\, n\, s\, e^{-2N}}{\epsilon^2} \ :=\ \beta.$  (30)

1. Using Fano's inequality:
   $\sup_{\mathcal{Z}} E\big[\|\widehat{\theta} - \theta\|_2\big] \ \ge\ \frac{\sqrt{s}\, e^{-N}}{8}\left(1 - \frac{8\big(\eta n s e^{-2N}/\epsilon^2 + \log 2\big)}{s}\right).$
2. Choose $\eta = \epsilon^2$ and $N = \log n$, for large enough $s$ and $n$.
3. Any $\theta \in \mathcal{Z}$ satisfies $\|\theta\|_1 \le 1 - \epsilon$.

Statistical Tests of Goodness-of-Fit

1. The residues (estimated innovations) of the process:
   $e_k = x_k - \widehat{\theta}'\, x^{k-1}_{k-p}, \qquad k = 1, 2, \cdots, n.$
2. Goal: quantify how close the sequence $\{e_i\}_{i=1}^n$ is to an i.i.d. realization of an unknown (mostly absolutely continuous) distribution $F_0$.

Lemma (Glivenko-Cantelli Theorem). If the samples are generated from $F_0$, then
   $\sup_t |\widehat{F}_n(t) - F_0(t)| \ \xrightarrow{\ a.s.\ }\ 0.$

1. Kolmogorov-Smirnov (KS) test statistic:
   $K_n := \sup_t |\widehat{F}_n(t) - F_0(t)|.$
2. Cramer-von Mises (CvM) statistic:
   $C_n := \int \big(\widehat{F}_n(t) - F_0(t)\big)^2\, dF_0(t).$
3. Anderson-Darling (AD) statistic:
   $A_n := \int \frac{\big(\widehat{F}_n(t) - F_0(t)\big)^2}{F_0(t)\big(1 - F_0(t)\big)}\, dF_0(t).$

In terms of the order statistics $e_{(1)} \le \cdots \le e_{(n)}$:

1. KS:
   $K_n = \max_{1 \le i \le n} \max\left\{ \Big|\frac{i}{n} - F_0(e_{(i)})\Big|,\ \Big|F_0(e_{(i)}) - \frac{i-1}{n}\Big| \right\}.$
2. CvM:
   $n C_n = \frac{1}{12 n} + \sum_{i=1}^n \left( F_0(e_{(i)}) - \frac{2i - 1}{2n} \right)^2.$
3. AD:
   $n A_n = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1)\Big[ \log F_0(e_{(i)}) + \log\big(1 - F_0(e_{(n+1-i)})\big) \Big].$
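Given the residuals e_i and a hypothesized CDF F_0, the three statistics can be computed directly from the formulas above; the sketch below uses the standard normal CDF as F_0, an illustrative choice matching the Gaussian-innovation setting (scipy.stats.kstest could replace the hand-rolled KS computation):

```python
import numpy as np
from scipy.stats import norm

def gof_statistics(e, F0=norm.cdf):
    """KS, CvM and AD statistics of residuals e against a hypothesized CDF F0,
    computed from the order-statistic formulas above."""
    n = len(e)
    u = np.sort(F0(np.asarray(e, dtype=float)))             # F0(e_(i)), sorted
    i = np.arange(1, n + 1)
    K_n = np.max(np.maximum(i / n - u, u - (i - 1) / n))     # Kolmogorov-Smirnov
    nC_n = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)          # n * CvM
    nA_n = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))      # n * AD
    return K_n, nC_n, nA_n
```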

Spectral Forms

1. Based on the similarity between the spectrogram of the data and the estimated power spectral density of the process.

Lemma. Let $S(\omega)$ be the (normalized) power spectral density of a stationary process with bounded condition number, and let $\widehat{S}_n(\omega)$ be the spectrogram of $n$ samples of a realization of such a process. Then for all $\omega$ we have
   $\sqrt{n} \int_0^{\omega} \big( \widehat{S}_n(\lambda) - S(\lambda) \big)\, d\lambda \ \xrightarrow{\ d\ }\ Z(\omega),$  (31)
where $Z(\omega)$ is a mean-zero Gaussian process.

2. Spectral KS, CvM, and AD tests follow.
