Estimating Nonlinear Functions of Means




  1. Estimating Nonlinear Functions of Means Peter J. Haas CS 590M: Simulation Spring Semester 2020 1 / 30

  2. Estimating Nonlinear Functions of Means
     - Overview
     - Delta Method
     - Jackknife Method
     - Bootstrap Confidence Intervals
     - Complete Bias Elimination
     2 / 30

  3. Nonlinear Functions of Means
     Our focus up until now
     - Estimate quantities of the form µ = E[X]
     - E.g., expected win/loss of a gambling game
     - We'll now focus on more complex quantities
     Nonlinear functions of means: α = g(µ_1, µ_2, ..., µ_d), where
     - g is a nonlinear function
     - µ_i = E[X^(i)] for 1 ≤ i ≤ d
     - For simplicity, take d = 2 and focus on α = g(µ_X, µ_Y)
     - µ_X = E[X] and µ_Y = E[Y]
     3 / 30

  4. Example: Retail Outlet
     - Goal: Estimate α = long-run average revenue per customer
     - X_i = R_i = revenue generated on day i
     - Y_i = number of customers on day i
     - Assume that pairs (X_1, Y_1), (X_2, Y_2), ... are i.i.d.
     - Set X̄_n = (1/n) Σ_{i=1}^n X_i and Ȳ_n = (1/n) Σ_{i=1}^n Y_i; then
       α = lim_{n→∞} (X_1 + ··· + X_n)/(Y_1 + ··· + Y_n) = lim_{n→∞} X̄_n/Ȳ_n = µ_X/µ_Y
     - So α = g(µ_X, µ_Y), where g(x, y) = x/y
     4 / 30

  5. Example: Higher-Order Moments
     - Let R_1, R_2, ... be daily revenues as before
     - Assume that the R_i's are i.i.d. (Critique?)
     - α = Var[R] = variance of daily revenue
     - Let X = R² and Y = R
     - α = g(µ_X, µ_Y), where g(x, y) = x − y²
     5 / 30

  6. Estimating Nonlinear Functions of Means
     - Overview
     - Delta Method
     - Jackknife Method
     - Bootstrap Confidence Intervals
     - Complete Bias Elimination
     6 / 30

  7. Delta Method (Taylor Series)
     Assume that the function g(x, y) is smooth
     - Continuously differentiable in a neighborhood of (µ_X, µ_Y)
     - I.e., g is continuous, as are ∂g/∂x and ∂g/∂y
     Point estimate
     - Run simulation n times to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
     - Set α_n = g(X̄_n, Ȳ_n)
     - This estimator is biased:
       E[α_n] = E[g(X̄_n, Ȳ_n)] ≠ g(E[X̄_n], E[Ȳ_n]) = g(µ_X, µ_Y) = α
     - Jensen's inequality (shown for d = 1): E[α_n] = E[g(X̄_n)] ≥ g(µ_X) = α if g is convex
     - By SLLN and continuity of g, we have bias → 0 as n → ∞
       (Estimator α_n is asymptotically unbiased)
     7 / 30
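
A small numerical check of this bias (my own illustration, not from the slides): take the convex one-dimensional case g(x) = x² with X ~ Exponential(1), so µ_X = 1 and α = g(µ_X) = 1.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 10, 100_000

# alpha_n = g(Xbar_n) with g(x) = x^2, replicated many times
alpha_n = rng.exponential(1.0, size=(reps, n)).mean(axis=1) ** 2
print(f"alpha = 1.0, average alpha_n = {alpha_n.mean():.4f}")  # about 1.1

# E[Xbar_n^2] = mu^2 + sigma^2/n = 1 + 1/10, so the O(1/n) bias is visible;
# rerunning with larger n shrinks it, illustrating asymptotic unbiasedness.
```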

  8. Delta Method, Continued
     Confidence interval
     - (X̄_n, Ȳ_n) should be "close" to (µ_X, µ_Y) for large n by SLLN
     - α_n = g(X̄_n, Ȳ_n) should be close to g(µ_X, µ_Y) = α
       α_n − α = g(X̄_n, Ȳ_n) − g(µ_X, µ_Y)
               ≈ (∂g/∂x)(µ_X, µ_Y) · (X̄_n − µ_X) + (∂g/∂y)(µ_X, µ_Y) · (Ȳ_n − µ_Y)
               = Z̄_n
     - Z_i = c(X_i − µ_X) + d(Y_i − µ_Y) and Z̄_n = (1/n) Σ_{i=1}^n Z_i
     - c = (∂g/∂x)(µ_X, µ_Y) and d = (∂g/∂y)(µ_X, µ_Y)
     8 / 30

  9. Delta Method, Continued
     Confidence interval, continued
     - {Z_n : n ≥ 1} are i.i.d. as Z = c(X − µ_X) + d(Y − µ_Y)
     - E[Z] = 0
     - By the CLT, √n Z̄_n/σ ~ N(0, 1) approximately for large n
     - Thus √n (α_n − α)/σ ~ N(0, 1) approximately for large n
     - Here σ² = Var[Z] = E[Z²] = E[(c(X − µ_X) + d(Y − µ_Y))²]
     - So the asymptotic 100(1 − δ)% CI is α_n ± z_δ σ/√n
     - z_δ is the 1 − (δ/2) quantile of the standard normal distribution
     - Estimate c, d, and σ from the data
     9 / 30

  10. Delta Method, Continued
     Delta Method CI Algorithm
     1. Simulate to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
     2. α_n ← g(X̄_n, Ȳ_n)
     3. c_n ← (∂g/∂x)(X̄_n, Ȳ_n) and d_n ← (∂g/∂y)(X̄_n, Ȳ_n)
     4. s²_n ← (n − 1)⁻¹ Σ_{i=1}^n (c_n(X_i − X̄_n) + d_n(Y_i − Ȳ_n))²
     5. Return asymptotic 100(1 − δ)% CI: [α_n − z_δ s_n/√n, α_n + z_δ s_n/√n]
     - SLLN and continuity assumptions imply that, with prob. 1,
       c_n → c, d_n → d, and s²_n → σ²
     10 / 30
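
The algorithm above translates almost line for line into code. Below is a minimal sketch (the function name and the NumPy/SciPy usage are my choices, not from the slides); g and its partial derivatives are assumed supplied as callables.

```python
import numpy as np
from scipy.stats import norm

def delta_method_ci(x, y, g, dg_dx, dg_dy, delta=0.05):
    """Delta-method point estimate and CI for alpha = g(mu_X, mu_Y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    alpha_n = g(xbar, ybar)                              # step 2
    c_n, d_n = dg_dx(xbar, ybar), dg_dy(xbar, ybar)      # step 3
    z = c_n * (x - xbar) + d_n * (y - ybar)
    s2_n = z @ z / (n - 1)                               # step 4
    half = norm.ppf(1 - delta / 2) * np.sqrt(s2_n / n)   # z_delta * s_n / sqrt(n)
    return alpha_n, (alpha_n - half, alpha_n + half)     # step 5
```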

  11. Example: Ratio Estimation: g(x, y) = x/y
     Multi-pass method (apply previous algorithm directly)
     - α = µ_X/µ_Y,  c = 1/µ_Y,  d = −µ_X/µ_Y²
     - α_n = X̄_n/Ȳ_n,  c_n = 1/Ȳ_n,  d_n = −X̄_n/Ȳ_n²
     - s²_n = (n − 1)⁻¹ Σ_{i=1}^n (c_n(X_i − X̄_n) + d_n(Y_i − Ȳ_n))²
     11 / 30
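
As a sanity check on these fill-ins, the generic sketch above can be applied directly to the ratio case. The data below are hypothetical stand-ins for the retail example (daily revenue and customer counts), purely to exercise the code:

```python
import numpy as np

# Hypothetical (revenue, customers) data with true ratio near 20
rng = np.random.default_rng(1)
cust = rng.poisson(50, size=500).astype(float)
rev = 20 * cust + rng.normal(0, 30, size=500)

est, ci = delta_method_ci(
    rev, cust,
    g=lambda a, b: a / b,
    dg_dx=lambda a, b: 1.0 / b,       # c = 1/mu_Y at the means
    dg_dy=lambda a, b: -a / b**2,     # d = -mu_X/mu_Y^2 at the means
)
print(est, ci)  # estimate should be near 20
```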

  12. Example: Ratio Estimation: g(x, y) = x/y
     Single-pass method
       σ² = Var[Z] = Var[c(X − µ_X) + d(Y − µ_Y)]
          = (Var[X] − 2α Cov[X, Y] + α² Var[Y]) / µ_Y²
       s²_n = (s_n(1,1) − 2α_n s_n(1,2) + α_n² s_n(2,2)) / Ȳ_n²
     Use single-pass formulas for:
     - s_n(1,1) = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)²
     - s_n(2,2) = (1/(n−1)) Σ_{i=1}^n (Y_i − Ȳ_n)²
     - s_n(1,2) = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)(Y_i − Ȳ_n)
     - Set S_k^X = Σ_{i=1}^k X_i and S_k^Y = Σ_{i=1}^k Y_i; e.g., for the running
       cross-product term,
       k v_k = (k−1) v_{k−1} + (S_{k−1}^X − (k−1)X_k)(S_{k−1}^Y − (k−1)Y_k) / (k(k−1))
     12 / 30
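
One way to realize the single-pass idea in code is with Welford-style running updates, a minimal sketch under my own formulation (equivalent in spirit to, but not a verbatim transcription of, the slide's sum-based recurrence):

```python
import numpy as np
from scipy.stats import norm

def ratio_ci_single_pass(pairs, delta=0.05):
    """Single-pass CI for alpha = mu_X/mu_Y via Welford-style running moments."""
    n = 0
    mx = my = 0.0           # running means of X and Y
    cxx = cyy = cxy = 0.0   # running sums of squared / cross deviations
    for x, y in pairs:      # one pass; the data never need to be stored
        n += 1
        dx, dy = x - mx, y - my
        mx += dx / n
        my += dy / n
        cxx += dx * (x - mx)
        cyy += dy * (y - my)
        cxy += dx * (y - my)
    alpha_n = mx / my
    s11, s22, s12 = cxx / (n - 1), cyy / (n - 1), cxy / (n - 1)
    s2_n = (s11 - 2 * alpha_n * s12 + alpha_n**2 * s22) / my**2
    half = norm.ppf(1 - delta / 2) * (s2_n / n) ** 0.5
    return alpha_n, (alpha_n - half, alpha_n + half)
```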

  13. Delta Method for Stochastic Root-Finding
     Problem: Find θ̄ such that E[g(X, θ̄)] = 0
     (can replace 0 with any fixed constant)
     Applications:
     - Process control, risk management, finance, quantiles, ...
     - Stochastic optimization: min_θ E[h(X, θ)]
       - Optimality condition: (∂/∂θ) E[h(X, θ)] = 0
       - Can often show that (∂/∂θ) E[h(X, θ)] = E[(∂/∂θ) h(X, θ)]
       - So take g(X, θ) = (∂/∂θ) h(X, θ)
     Point Estimate (Stochastic Average Approximation)
     - Generate X_1, ..., X_n i.i.d. as X
     - Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0 (deterministic problem)
     13 / 30

  14. Delta Method for Stochastic Root-Finding
     Problem: Find θ̄ such that E[g(X, θ̄)] = 0
     Point Estimate (Stochastic Average Approximation)
     - Generate X_1, ..., X_n i.i.d. as X
     - Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0
     How to find a confidence interval for θ̄?
     - Taylor series: g(X_i, θ_n) ≈ g(X_i, θ̄) + (∂g/∂θ)(X_i, θ̄)(θ_n − θ̄)
     - Implies: (1/n) Σ_{i=1}^n g(X_i, θ_n) ≈ (1/n) Σ_{i=1}^n g(X_i, θ̄) − c_n(θ̄ − θ_n)
       where c_n = (1/n) Σ_{i=1}^n (∂g/∂θ)(X_i, θ̄) ≈ E[(∂g/∂θ)(X, θ̄)]
     - Implies: θ̄ − θ_n ≈ (1/c_n) · (1/n) Σ_{i=1}^n g(X_i, θ̄)
     - Implies: θ_n − θ̄ ≈ N(0, σ²/n), where σ² = Var[g(X, θ̄)]/c_n² = E[g(X, θ̄)²]/c_n²
     14 / 30

  15. Delta Method for Stochastic Root-Finding
     Algorithm
     1. Simulate to get X_1, ..., X_n i.i.d.
     2. Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0
     3. ĉ_n ← (1/n) Σ_{i=1}^n (∂g/∂θ)(X_i, θ_n)
     4. s²_n ← (1/n) Σ_{i=1}^n g(X_i, θ_n)² / ĉ_n²
     5. Return asymptotic 100(1 − δ)% CI: [θ_n − z_δ s_n/√n, θ_n + z_δ s_n/√n]
     - Can use pilot runs, etc. in the usual way
     15 / 30
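
A minimal sketch of steps 1-5 (function names, the bracketing interval, and the use of SciPy's brentq root-finder are my choices; g and ∂g/∂θ are assumed user-supplied):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def root_finding_ci(x, g, dg_dtheta, bracket=(-10.0, 10.0), delta=0.05):
    x = np.asarray(x, float)
    n = len(x)
    # Step 2: solve the deterministic sample-average equation for theta_n
    theta_n = brentq(lambda t: g(x, t).mean(), *bracket)
    c_n = dg_dtheta(x, theta_n).mean()            # step 3
    s2_n = (g(x, theta_n) ** 2).mean() / c_n**2   # step 4
    half = norm.ppf(1 - delta / 2) * np.sqrt(s2_n / n)
    return theta_n, (theta_n - half, theta_n + half)

# Hypothetical usage: theta-bar = E[X] via g(x, theta) = x - theta,
# so dg/dtheta = -1 and theta_n is just the sample mean.
rng = np.random.default_rng(0)
data = rng.exponential(2.0, size=1000)
print(root_finding_ci(data, g=lambda x, t: x - t,
                      dg_dtheta=lambda x, t: -np.ones_like(x)))
```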

  16. Estimating Nonlinear Functions of Means
     - Overview
     - Delta Method
     - Jackknife Method
     - Bootstrap Confidence Intervals
     - Complete Bias Elimination
     16 / 30

  17. Jackknife Method
     Estimate α = g(µ_X, µ_Y)
     Overview
     - Naive point estimator α_n = g(X̄_n, Ȳ_n) is biased
     - Jackknife estimator has lower bias
     - Avoids need to compute partial derivatives as in Delta method
     - More computationally intensive
     Starting point: Taylor series + expectation
       E[α_n] = α + b/n + c/n² + ···
     - Thus bias is O(n⁻¹)
     - Estimate b and adjust? α*_n = α_n − b_n/n
     - Messy partial derivative calculation, adds noise
     17 / 30

  18. Jackknife, Continued
     - Observe that
       E[α_n] = α + b/n + c/n² + ···
       E[α_{n−1}] = α + b/(n−1) + c/(n−1)² + ···
     - and so (the b terms cancel in the weighted difference)
       E[n α_n − (n−1) α_{n−1}] = α + c(1/n − 1/(n−1)) + ··· = α − c/(n(n−1)) + ···
     - Bias reduced to O(n⁻²)!
     - Q: What is special about deleting the n-th data point?
     18 / 30

  19. Jackknife, Continued
     - Delete each data point in turn to get a low-bias estimator
     - Average the estimators to reduce variance
     Jackknife CI Algorithm for α = g(µ_X, µ_Y)
     1. Choose n and δ, and set z_δ = the 1 − (δ/2) quantile of N(0, 1)
     2. Simulate to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
     3. α_n ← g(X̄_n, Ȳ_n)
     4. For i = 1 to n
        4.1 α_n^i ← g((1/(n−1)) Σ_{j≠i} X_j, (1/(n−1)) Σ_{j≠i} Y_j)  (leave out observation i)
        4.2 α_n(i) ← n α_n − (n−1) α_n^i  (i-th pseudovalue)
     5. Point estimator: α_n^J ← (1/n) Σ_{i=1}^n α_n(i)
     6. v_n^J ← (1/(n−1)) Σ_{i=1}^n (α_n(i) − α_n^J)²
     7. 100(1 − δ)% CI: [α_n^J − z_δ √(v_n^J/n), α_n^J + z_δ √(v_n^J/n)]
     19 / 30
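
A minimal sketch of this algorithm (the function name is mine; the leave-one-out means are computed via the identity (n·X̄_n − X_i)/(n − 1) rather than by re-summing):

```python
import numpy as np
from scipy.stats import norm

def jackknife_ci(x, y, g, delta=0.05):
    """Jackknife point estimate and CI for alpha = g(mu_X, mu_Y).
    g must accept NumPy arrays elementwise (e.g., lambda a, b: a / b)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    alpha_n = g(x.mean(), y.mean())
    loo_x = (x.sum() - x) / (n - 1)     # leave-one-out means, all i at once
    loo_y = (y.sum() - y) / (n - 1)
    pseudo = n * alpha_n - (n - 1) * g(loo_x, loo_y)   # pseudovalues
    alpha_J = pseudo.mean()                            # step 5
    v_J = pseudo.var(ddof=1)                           # step 6
    half = norm.ppf(1 - delta / 2) * np.sqrt(v_J / n)
    return alpha_J, (alpha_J - half, alpha_J + half)

# Hypothetical usage with the ratio estimator g(x, y) = x/y:
# jackknife_ci(rev, cust, g=lambda a, b: a / b)
```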

  20. Jackknife, Continued
     Observations
     - Not obvious that the CI is correct (why?)
     - Substitutes computational brute force for analytical complexity
     - Not a one-pass algorithm
     - Basic jackknife breaks down for "non-smooth" statistics like quantiles and
       the maximum (but can fix; see next lecture)
     20 / 30

  21. Estimating Nonlinear Functions of Means
     - Overview
     - Delta Method
     - Jackknife Method
     - Bootstrap Confidence Intervals
     - Complete Bias Elimination
     21 / 30

  22. Bootstrap Confidence Intervals
     Another brute force method
     - Key idea: analyze the variability of the estimator using samples of the original data
     - More general than the jackknife (estimates the entire sampling distribution
       of the estimator, not just the mean and variance)
     - Jackknife is somewhat better empirically at variance estimates
     - "Non-repeatable", unlike the jackknife
     - OK for quantiles, still breaks down for the maximum
     22 / 30
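
As a concrete illustration of the key idea, here is a minimal sketch of the percentile bootstrap, one standard variant (my own rendering; the function name, the number of resamples B, and the resampling details are my choices, not the deck's algorithm):

```python
import numpy as np

def bootstrap_ci(x, y, g, B=1000, delta=0.05, seed=0):
    """Percentile-bootstrap CI for alpha = g(mu_X, mu_Y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rng = np.random.default_rng(seed)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample pairs with replacement
        stats[b] = g(x[idx].mean(), y[idx].mean())
    # Read the CI off the empirical quantiles of the resampled estimates
    lo, hi = np.quantile(stats, [delta / 2, 1 - delta / 2])
    return g(x.mean(), y.mean()), (lo, hi)
```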
