
HiGrad: Statistical Inference for Stochastic Approximation and Online Learning
Weijie Su (University of Pennsylvania), joint work with Yuancheng Zhu (UPenn)

Learning by optimization: sample Z_1, ..., Z_N, and f(·, z) is the cost function


Iterate along the HiGrad tree

Recall: the noisy gradient g(θ, Z) is unbiased for ∇f(θ); {Z^s} is a partition of {Z_1, ..., Z_N} across segments; and L_k := n_0 + ⋯ + n_k

◮ Iterate along the level-0 segment:
  θ_j = θ_{j−1} − γ_j g(θ_{j−1}, Z_j)   for j = 1, ..., n_0,
  starting from some θ_0
◮ Iterate along each level-1 segment s = (b_1), 1 ≤ b_1 ≤ B_1:
  θ^s_j = θ^s_{j−1} − γ_{j+L_0} g(θ^s_{j−1}, Z^s_j)   for j = 1, ..., n_1,
  starting from θ_{n_0}
◮ Generally, for the segment s = (b_1 ⋯ b_k), iterate
  θ^s_j = θ^s_{j−1} − γ_{j+L_{k−1}} g(θ^s_{j−1}, Z^s_j)   for j = 1, ..., n_k,
  starting from θ^{(b_1 ⋯ b_{k−1})}_{n_{k−1}}
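
The per-segment recursion is simple to put in code. Below is a minimal sketch (not the authors' implementation) of SGD along a single segment: the only HiGrad-specific detail is that the step-size index is offset by L_{k−1}, the total length of the segments already traversed on the path from the root. The function name, the toy gradient, and the step-size schedule in the usage lines are illustrative assumptions.

```python
import numpy as np

def sgd_segment(theta_init, data, grad, gamma, offset):
    """Run SGD along one HiGrad segment.

    theta_init : starting point (the last iterate of the parent segment)
    data       : samples Z^s_1, ..., Z^s_{n_k} assigned to this segment
    grad       : noisy gradient g(theta, z), unbiased for the true gradient
    gamma      : step-size schedule, gamma(j) = gamma_j
    offset     : L_{k-1} = n_0 + ... + n_{k-1}, so step j uses gamma_{j + L_{k-1}}
    Returns the last iterate (handed to the child segments) and the
    within-segment average (used later for estimation).
    """
    theta = np.asarray(theta_init, dtype=float).copy()
    running_sum = np.zeros_like(theta)
    for j, z in enumerate(data, start=1):
        theta = theta - gamma(j + offset) * grad(theta, z)
        running_sum += theta
    return theta, running_sum / len(data)

# Toy usage: g(theta, z) = theta - z, i.e. f(theta, z) = ||theta - z||^2 / 2
rng = np.random.default_rng(0)
g = lambda theta, z: theta - z
gamma = lambda j: 0.5 * j ** (-0.55)
theta_last, theta_bar = sgd_segment(np.zeros(2), rng.normal(size=(1000, 2)), g, gamma, offset=0)
```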

A second look at the HiGrad tree

An example of a HiGrad tree: B_1 = 2, B_2 = 3, K = 2

Fulfilled
• Online in nature, with the same computational cost as vanilla SGD

Bonus
• Easier to parallelize than vanilla SGD!

The HiGrad algorithm in action

Require: g(·, ·), Z_1, ..., Z_N, (n_0, n_1, ..., n_K), (B_1, ..., B_K), (γ_1, ..., γ_{L_K}), θ_0
θ̄^s = 0 for all segments s
function NodeTreeSGD(θ, s)
    θ^s_0 = θ
    k = #s
    for j = 1 to n_k do
        θ^s_j ← θ^s_{j−1} − γ_{j+L_{k−1}} g(θ^s_{j−1}, Z^s_j)    (with L_{−1} := 0)
        θ̄^s ← θ̄^s + θ^s_j / n_k
    end for
    if k < K then
        for b_{k+1} = 1 to B_{k+1} do
            s⁺ ← (s, b_{k+1})
            execute NodeTreeSGD(θ^s_{n_k}, s⁺)
        end for
    end if
end function
execute NodeTreeSGD(θ_0, ∅)
output: θ̄^s for all segments s
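
A direct Python transcription of NodeTreeSGD could look as follows. This is a sketch, not the authors' code: the dictionary keyed by segment tuples and the assumption that the data have already been partitioned into per-segment arrays are my own bookkeeping choices.

```python
import numpy as np

def higrad(grad, segment_data, n, B, gamma, theta0):
    """Hierarchical incremental gradient descent (sketch).

    grad         : noisy gradient g(theta, z)
    segment_data : dict mapping a segment index s (a tuple; () is the root)
                   to its samples; a level-k segment holds n[k] samples
    n            : (n_0, n_1, ..., n_K) segment lengths
    B            : (B_1, ..., B_K) branching factors
    gamma        : step-size schedule, gamma(j) = gamma_j
    theta0       : starting point
    Returns {s: theta_bar^s}, the within-segment averages.
    """
    K = len(B)
    L = np.cumsum(n)                      # L[k] = n_0 + ... + n_k
    averages = {}

    def node_tree_sgd(theta, s):
        k = len(s)                        # level of segment s
        offset = L[k - 1] if k > 0 else 0
        running_sum = np.zeros_like(theta)
        for j, z in enumerate(segment_data[s], start=1):
            theta = theta - gamma(j + offset) * grad(theta, z)
            running_sum += theta
        averages[s] = running_sum / n[k]
        if k < K:                         # recurse into the B_{k+1} child segments
            for b in range(1, B[k] + 1):
                node_tree_sgd(theta.copy(), s + (b,))

    node_tree_sgd(np.asarray(theta0, dtype=float).copy(), ())
    return averages
```

Each child segment starts from its parent's last iterate, so the whole tree costs exactly one pass over the N samples, and the child calls at any node are independent and could be dispatched in parallel.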

Outline

1. Deriving HiGrad
2. Constructing Confidence Intervals
3. Configuring HiGrad
4. Empirical Performance

Estimate μ*_x through each thread

Average over each segment s = (b_1, ..., b_k):
  θ̄^s = (1/n_k) Σ_{j=1}^{n_k} θ^s_j

Given weights w_0, w_1, ..., w_K that sum to 1, the weighted average along thread t = (b_1, ..., b_K) is
  θ̄^t = Σ_{k=0}^K w_k θ̄^{(b_1, ..., b_k)}

Estimator yielded by thread t:
  μ^t_x := μ_x(θ̄^t)
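
Given the segment averages, each thread's estimate is a weighted sum of the averages along its root-to-leaf path. A small sketch, continuing the hypothetical `averages` dictionary from the sketch above:

```python
from itertools import product

def thread_estimates(averages, B, w, mu_x):
    """Return {t: mu^t_x} for every thread t = (b_1, ..., b_K).

    averages : dict {segment tuple s: theta_bar^s}, e.g. as produced by higrad()
    B        : (B_1, ..., B_K) branching factors
    w        : (w_0, w_1, ..., w_K) weights summing to 1
    mu_x     : functional of interest, e.g. lambda theta: x @ theta
    """
    estimates = {}
    for t in product(*(range(1, b + 1) for b in B)):
        # weighted average of theta_bar over the path (), (b_1), ..., (b_1, ..., b_K)
        theta_t = sum(w[k] * averages[t[:k]] for k in range(len(B) + 1))
        estimates[t] = mu_x(theta_t)
    return estimates
```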

How do we construct a confidence interval from the T := B_1 B_2 ⋯ B_K estimates μ^t_x?

Assume normality

Denote by μ_x the T-dimensional vector consisting of all μ^t_x

Normality of μ_x (to be proved soon): √N (μ_x − μ*_x 1) converges weakly to a normal distribution N(0, Σ) as N → ∞

Convert to simple linear regression

From μ_x approximately ∼ N(μ*_x 1, Σ/N) we get
  Σ^{−1/2} μ_x ≈ (Σ^{−1/2} 1) μ*_x + z̃,   z̃ ∼ N(0, I/N)

Simple linear regression! The least-squares estimator of μ*_x is
  (1′ Σ^{−1/2} Σ^{−1/2} 1)^{−1} 1′ Σ^{−1/2} Σ^{−1/2} μ_x = (1′ Σ^{−1} 1)^{−1} 1′ Σ^{−1} μ_x = (1/T) Σ_t μ^t_x ≡ μ̄_x
(the last equality holds because, by the symmetry of the tree, all row sums of Σ are equal, so Σ^{−1} 1 ∝ 1)

HiGrad estimator: just the sample mean μ̄_x

A t-based confidence interval

A pivot for μ*_x:
  (μ̄_x − μ*_x) / SE_x   approximately ∼ t_{T−1},
where the standard error is given by
  SE_x = (√(1′ Σ 1) / T) · √[ (μ_x − μ̄_x 1)′ Σ^{−1} (μ_x − μ̄_x 1) / (T − 1) ]

HiGrad confidence interval of coverage 1 − α:
  [ μ̄_x − t_{T−1, 1−α/2} SE_x ,  μ̄_x + t_{T−1, 1−α/2} SE_x ]
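
Computing the interval needs only the vector of thread estimates and Σ up to a scalar multiple (that the scalar does not matter is argued a few slides below). A sketch, where `mu` is the length-T vector of μ^t_x values and `Sigma` is any matrix proportional to the asymptotic covariance:

```python
import numpy as np
from scipy import stats

def higrad_ci(mu, Sigma, alpha=0.05):
    """t-based HiGrad confidence interval for mu*_x.

    mu    : length-T vector of per-thread estimates mu^t_x
    Sigma : T x T matrix proportional to the asymptotic covariance of mu
    """
    T = len(mu)
    mu_bar = mu.mean()                        # the HiGrad estimator
    resid = mu - mu_bar
    # SE_x^2 = (1' Sigma 1 / T^2) * resid' Sigma^{-1} resid / (T - 1)
    quad = resid @ np.linalg.solve(Sigma, resid)
    se = np.sqrt(Sigma.sum() / T**2 * quad / (T - 1))
    tq = stats.t.ppf(1 - alpha / 2, df=T - 1)
    return mu_bar, (mu_bar - tq * se, mu_bar + tq * se)
```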

Do we know the covariance Σ?

An extension of Ruppert–Polyak normality

Given a thread t = (b_1, ..., b_K), denote its segments by s_k = (b_1, b_2, ..., b_k)

Fact (informal)
  √n_0 (θ̄^{s_0} − θ*), √n_1 (θ̄^{s_1} − θ*), ..., √n_K (θ̄^{s_K} − θ*) converge to i.i.d. centered normal distributions

• Let the Hessian H = ∇²f(θ*) and V = E[g(θ*, Z) g(θ*, Z)′]. Ruppert (1988), Polyak (1990), and Polyak and Juditsky (1992) prove
  √N (θ̄_N − θ*) ⇒ N(0, H^{−1} V H^{−1})
• The sandwich covariance H^{−1} V H^{−1} is difficult to estimate (Chen et al., 2016)
• To know the covariance of {μ_x(θ̄^t)}, do we really need to know H^{−1} V H^{−1}?

Covariance determined by the number of shared segments

Consider μ_x(θ) = T(x)′θ and observe:
• √n_0 (μ_x(θ̄^{s_0}) − μ*_x), √n_1 (μ_x(θ̄^{s_1}) − μ*_x), ..., √n_K (μ_x(θ̄^{s_K}) − μ*_x) converge to i.i.d. centered univariate normal distributions
• μ^t_x − μ*_x = μ_x(θ̄^t) − μ*_x = Σ_{k=0}^K w_k (μ_x(θ̄^{s_k}) − μ*_x)

Fact (informal)
  For any two threads t and t′ that agree at the first k segments and differ henceforth,
  Cov(μ^t_x, μ^{t′}_x) = (1 + o(1)) σ² Σ_{i=0}^k w_i² / n_i

Specify Σ up to a multiplicative factor

If μ_x(θ) = T(x)′θ, then for any two threads t and t′ that agree only at the first k segments,
  Σ_{t,t′} = (1 + o(1)) C Σ_{i=0}^k w_i² N / n_i

• Do we need to know C as well?
• No! The standard error of μ̄_x is invariant under multiplying Σ by a scalar:
  SE_x = (√(1′ Σ 1) / T) · √[ (μ_x − μ̄_x 1)′ Σ^{−1} (μ_x − μ̄_x 1) / (T − 1) ]
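
Since only proportionality matters, Σ can be built directly from the tree description: the (t, t′) entry is proportional to Σ_{i=0}^k w_i² N / n_i, where k counts the splitting levels at which the two threads still agree. A sketch (the thread enumeration and loop structure are my own):

```python
import numpy as np
from itertools import product

def higrad_sigma(n, B, w):
    """Covariance of the thread estimates, up to a multiplicative constant.

    n : (n_0, ..., n_K) segment lengths
    B : (B_1, ..., B_K) branching factors
    w : (w_0, ..., w_K) weights
    """
    N = sum(n[k] * int(np.prod(B[:k])) for k in range(len(n)))
    threads = list(product(*(range(b) for b in B)))
    # partial[k] = sum_{i <= k} w_i^2 * N / n_i  (level 0 is shared by all threads)
    partial = np.cumsum([w[i] ** 2 * N / n[i] for i in range(len(n))])
    Sigma = np.empty((len(threads), len(threads)))
    for a, t in enumerate(threads):
        for b, t2 in enumerate(threads):
            shared = 0
            while shared < len(B) and t[shared] == t2[shared]:
                shared += 1
            Sigma[a, b] = partial[shared]
    return Sigma
```

Multiplying the result by any constant C leaves SE_x, and hence the interval, unchanged.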

Some remarks

• In generalized linear models, μ_x often takes the form μ_x(θ) = η^{−1}(T(x)′θ) for an increasing link η. Construct a confidence interval for η(μ*_x) = T(x)′θ*, then invert it (a sketch follows below)
• For a general nonlinear but smooth μ_x(θ), use the delta method
• Less than Ruppert–Polyak is needed: the results continue to hold if √N (θ̄_N − θ*) converges to some centered normal distribution
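
For example, in logistic regression one forms the interval on the linear-predictor scale and maps both endpoints through the sigmoid; monotonicity preserves the coverage. A minimal sketch, reusing the hypothetical `higrad_ci` from earlier:

```python
import numpy as np

def logistic_ci(mu_linear, Sigma, alpha=0.05):
    """CI for mu*_x = sigmoid(x' theta*) from per-thread linear predictors x' theta_bar^t."""
    center, (lo, hi) = higrad_ci(mu_linear, Sigma, alpha)    # interval for x' theta*
    sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
    return sigmoid(center), (sigmoid(lo), sigmoid(hi))       # eta is increasing, so map endpoints
```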

Formal statement of theoretical results

Assumptions

1. Local strong convexity. f(θ) ≡ E f(θ, Z) is convex and differentiable, with Lipschitz gradients. The Hessian ∇²f(θ) is locally Lipschitz and positive-definite at θ*
2. Noise regularity. V(θ) = E[g(θ, Z) g(θ, Z)′] is Lipschitz and does not grow too fast. The noisy gradient g(θ, Z) has 2 + o(1) moments locally at θ*

Examples satisfying the assumptions

• Linear regression: f(θ, z) = (y − x′θ)² / 2
• Logistic regression: f(θ, z) = −y x′θ + log(1 + e^{x′θ})
• Penalized regression: add a ridge penalty λ‖θ‖²
• Huber regression: f(θ, z) = ρ_λ(y − x′θ), where ρ_λ(a) = a²/2 for |a| ≤ λ and ρ_λ(a) = λ|a| − λ²/2 otherwise
(the corresponding gradients are sketched below)

Sufficient conditions: X in generic position, and E‖X‖^{4+o(1)} < ∞ and E[|Y|^{2+o(1)} ‖X‖^{2+o(1)}] < ∞
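
Each of these losses has a closed-form per-sample gradient, which is all that g(θ, z) requires. Below is my own transcription of the standard gradients (the ridge weight and Huber threshold defaults are illustrative):

```python
import numpy as np

def grad_linear(theta, z):
    x, y = z
    return -(y - x @ theta) * x                       # f = (y - x'theta)^2 / 2

def grad_logistic(theta, z):
    x, y = z
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return (p - y) * x                                # f = -y x'theta + log(1 + exp(x'theta))

def grad_ridge(theta, z, lam=0.1):
    return grad_linear(theta, z) + 2.0 * lam * theta  # adds the penalty lambda * ||theta||^2

def grad_huber(theta, z, lam=1.0):
    x, y = z
    r = y - x @ theta
    return -np.clip(r, -lam, lam) * x                 # rho_lambda'(r) = clip(r, -lam, lam)
```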

Main theoretical results

Theorem (S. and Zhu)
  Assume K and B_1, ..., B_K are fixed, n_k ∝ N as N → ∞, and μ_x has a nonzero derivative at θ*. Taking γ_j ≍ j^{−α} for α ∈ (0.5, 1) gives
    (μ̄_x − μ*_x) / SE_x ⇒ t_{T−1}

Confidence intervals
  lim_{N→∞} P( μ*_x ∈ [ μ̄_x − t_{T−1, 1−α/2} SE_x , μ̄_x + t_{T−1, 1−α/2} SE_x ] ) = 1 − α

Fulfilled
• Online in nature, with the same computational cost as vanilla SGD
• A confidence interval for μ*_x in addition to an estimator

How accurate is the HiGrad estimator?

Optimal variance with optimal weights

By Cauchy–Schwarz, and using Σ_{k=0}^K n_k B_1 ⋯ B_k = N (with B_1 ⋯ B_0 := 1),
  N Var(μ̄_x) = (1 + o(1)) σ² [ Σ_{k=0}^K w_k² / (n_k B_1 ⋯ B_k) ] [ Σ_{k=0}^K n_k B_1 ⋯ B_k ]
              ≥ (1 + o(1)) σ² ( Σ_{k=0}^K w_k )² = (1 + o(1)) σ²,
with equality if
  w_k = w*_k := n_k B_1 ⋯ B_k / N

• Segments at an early level are weighted less
• The HiGrad estimator has the same asymptotic variance as vanilla SGD
• It achieves the Cramér–Rao lower bound when the model is correctly specified
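
The optimal weights follow directly from the configuration; a tiny sketch (the segment lengths in the example line are made up, chosen only to match the B_1 = 2, B_2 = 3 tree from earlier):

```python
import numpy as np

def optimal_weights(n, B):
    """w*_k = n_k * (B_1 ... B_k) / N, which sums to 1 since N = sum_k n_k * B_1 ... B_k."""
    counts = np.array([n[k] * np.prod(B[:k]) for k in range(len(n))], dtype=float)
    return counts / counts.sum()

print(optimal_weights(n=(100, 100, 100), B=(2, 3)))   # approximately [0.111, 0.222, 0.667]
```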

Prediction intervals for vanilla SGD

Theorem (S. and Zhu)
  Run vanilla SGD on a fresh dataset of the same size, producing μ^SGD_x. Then, with the optimal weights,
  lim_{N→∞} P( μ^SGD_x ∈ [ μ̄_x − √2 t_{T−1, 1−α/2} SE_x , μ̄_x + √2 t_{T−1, 1−α/2} SE_x ] ) = 1 − α

• μ^SGD_x can be replaced by a HiGrad estimator with the same structure
• Interpretable even under model misspecification

HiGrad enjoys three appreciable properties

Under certain assumptions, for example f being locally strongly convex:

Fulfilled
• Online in nature, with the same computational cost as vanilla SGD
• A confidence interval for μ*_x in addition to an estimator
• An estimator (almost) as accurate as vanilla SGD

Outline

1. Deriving HiGrad
2. Constructing Confidence Intervals
3. Configuring HiGrad
4. Empirical Performance

Which one?

Length of confidence intervals

Denote by L_CI = 2 t_{T−1, 1−α/2} SE_x the length of the HiGrad confidence interval

Proposition (S. and Zhu)
  √N E[L_CI] → 2√2 σ t_{T−1, 1−α/2} Γ(T/2) / ( √(T−1) Γ((T−1)/2) )

• The function t_{T−1, 1−α/2} Γ(T/2) / ( √(T−1) Γ((T−1)/2) ) is decreasing in T ≥ 2
• The more threads, the shorter the HiGrad confidence interval on average
• More contrasting leads to a shorter confidence interval

Really want to set T = 1000?

T = 4 is sufficient

[Plot of t_{T−1, 0.975} Γ(T/2) / ( √(T−1) Γ((T−1)/2) ) against T, for T = 2, ..., 10: the curve drops steeply from T = 2 and has mostly flattened by T = 4.]

• Too many threads result in inaccurate normality (unless N is huge)
• Large T leads to much contrasting and little sharing
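
The plotted factor is easy to tabulate. A sketch with scipy, at the deck's α = 5%:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def ci_length_factor(T, alpha=0.05):
    """t_{T-1, 1-alpha/2} * Gamma(T/2) / (sqrt(T-1) * Gamma((T-1)/2))."""
    tq = stats.t.ppf(1 - alpha / 2, df=T - 1)
    log_ratio = gammaln(T / 2) - gammaln((T - 1) / 2)
    return tq * np.exp(log_ratio) / np.sqrt(T - 1)

for T in range(2, 11):
    print(T, round(ci_length_factor(T), 3))   # decreasing in T; most of the drop happens by T = 4
```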

How to choose (n_0, ..., n_K)?

The sample-size budget is
  n_0 + B_1 n_1 + B_1 B_2 n_2 + B_1 B_2 B_3 n_3 + ⋯ + B_1 B_2 ⋯ B_K n_K = N

Length of each thread: L_K := n_0 + n_1 + ⋯ + n_K

• Sharing: want a larger L_K, obtained by setting n_0 > n_1 > ⋯ > n_K
• Contrasting: want n_0 < n_1 < ⋯ < n_K
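
One concrete way to balance the two is to fix a ratio r = n_{k+1}/n_k (as in the simulations later) and solve the budget constraint for n_0. A small sketch; the rounding is my own simplification, so the rounded lengths may miss N by a few samples:

```python
import numpy as np

def allocate_segments(N, B, r):
    """Solve n_0 + B_1 n_1 + ... + B_1...B_K n_K = N with n_{k+1} = r * n_k."""
    K = len(B)
    coef = sum(np.prod(B[:k]) * r ** k for k in range(K + 1))   # coefficient of n_0
    n0 = N / coef
    return [int(round(n0 * r ** k)) for k in range(K + 1)]

# Example: N = 10^6 samples, two splits into 2 branches each, equal segment lengths
print(allocate_segments(N=10**6, B=(2, 2), r=1.0))   # -> [142857, 142857, 142857]
```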

Outline

1. Deriving HiGrad
2. Constructing Confidence Intervals
3. Configuring HiGrad
4. Empirical Performance

General simulation setup

X has i.i.d. N(0, 1) entries and Z = (X, Y) ∈ R^d × R. Set N = 10^6 and use γ_j = 0.5 j^{−0.55}

• Linear regression: Y ∼ N(μ_X(θ*), 1), where μ_x(θ) = x′θ
• Logistic regression: Y ∼ Bernoulli(μ_X(θ*)), where μ_x(θ) = e^{x′θ} / (1 + e^{x′θ})

Criteria
• Accuracy: ‖θ̄ − θ*‖², where θ̄ is averaged over the T threads
• Coverage probability and length of the confidence intervals
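
The data-generating step is easy to reproduce; a sketch (with a much smaller N than the deck's 10^6 so that it runs instantly; the seed and function names are mine):

```python
import numpy as np

def simulate(theta_star, N, model="linear", seed=0):
    """Draw Z_i = (X_i, Y_i) with X_i having i.i.d. N(0, 1) entries, as in the setup above."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, len(theta_star)))
    eta = X @ theta_star
    if model == "linear":
        Y = rng.normal(loc=eta, scale=1.0)                    # Y ~ N(x' theta*, 1)
    else:
        Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))       # Y ~ Bernoulli(sigmoid(x' theta*))
    return X, Y

gamma = lambda j: 0.5 * j ** (-0.55)                          # step size used in the deck
X, Y = simulate(theta_star=np.zeros(50), N=10_000, model="logistic")
```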

Accuracy

Dimension d = 50. MSE ‖θ̄ − θ*‖², normalized by that of vanilla SGD, in three settings:
• null case, where θ*_1 = ⋯ = θ*_50 = 0
• dense case, where θ*_1 = ⋯ = θ*_50 = 1
• sparse case, where θ*_1 = ⋯ = θ*_5 = √(50/5) and θ*_6 = ⋯ = θ*_50 = 0

Accuracy

[Six panels of normalized risk (1.00 to 1.30) against the total number of steps (10^4 to 5×10^5): linear regression (null, sparse, dense) and logistic regression (null, sparse, dense).]

Coverage and CI length

HiGrad configurations
• K = 1, with n_1 = n_0 (i.e., r = 1)
• K = 2, with n_1/n_0 = n_2/n_1 = r ∈ {0.75, 1, 1.25, 1.5}

Set θ*_i = (i − 1)/d for i = 1, ..., d and α = 5%. Coverage is measured over 20 test points by
  (1/20) Σ_{i=1}^{20} 1( μ_{x_i}(θ*) ∈ CI_{x_i} )
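
The reported coverage is simply the fraction of test points whose true value lands inside its interval; a one-line sketch:

```python
import numpy as np

def empirical_coverage(mu_true, intervals):
    """Fraction of test points x_i with mu_{x_i}(theta*) inside CI_{x_i}."""
    return np.mean([lo <= m <= hi for m, (lo, hi) in zip(mu_true, intervals)])
```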

Linear regression: d = 20

Coverage   Configuration   CI length
0.956      1, 4, 1         0.0851
0.938      1, 8, 1         0.0683
0.9185     1, 12, 1        0.0653
0.887      1, 16, 1        0.0637
0.8488     1, 20, 1        0.0637
0.9425     2, 2, 1         0.0801
0.9472     2, 2, 1.25      0.0811
0.9452     2, 2, 1.5       0.0828
0.9448     2, 2, 2         0.0815
0.924      3, 2, 1         0.061
0.9318     3, 2, 1.25      0.0614
0.935      3, 2, 1.5       0.062
0.9378     3, 2, 2         0.0633
0.925      2, 3, 1         0.0605
0.9185     2, 3, 1.25      0.0606
0.9245     2, 3, 1.5       0.0618
0.9348     2, 3, 2         0.0621

Linear regression: d = 100

Coverage   Configuration   CI length
0.9472     1, 4, 1         0.2403
0.9478     1, 8, 1         0.2197
0.9308     1, 12, 1        0.2312
0.92       1, 16, 1        0.2495
0.9125     1, 20, 1        0.2649
0.9312     2, 2, 1         0.1917
0.9338     2, 2, 1.25      0.1927
0.9358     2, 2, 1.5       0.1946
0.9302     2, 2, 2         0.1972
0.9        3, 2, 1         0.1412
0.9065     3, 2, 1.25      0.1428
0.9148     3, 2, 1.5       0.1453
0.917      3, 2, 2         0.1489
0.894      2, 3, 1         0.1457
0.8992     2, 3, 1.25      0.1466
0.897      2, 3, 1.5       0.1491
0.9115     2, 3, 2         0.15
