Estimating Nonlinear Functions of Means

Peter J. Haas
CS 590M: Simulation
Spring Semester 2020
Outline
- Overview
- Delta Method
- Jackknife Method
- Bootstrap Confidence Intervals
- Complete Bias Elimination
Nonlinear Functions of Means

Our focus up until now
- Estimate quantities of the form µ = E[X]
- E.g., expected win/loss of a gambling game
- We'll now focus on more complex quantities

Nonlinear functions of means: α = g(µ_1, µ_2, ..., µ_d), where
- g is a nonlinear function
- µ_i = E[X^(i)] for 1 ≤ i ≤ d
- For simplicity, take d = 2 and focus on α = g(µ_X, µ_Y)
- µ_X = E[X] and µ_Y = E[Y]
Example: Retail Outlet
- Goal: Estimate α = long-run average revenue per customer
- X_i = R_i = revenue generated on day i
- Y_i = number of customers on day i
- Assume that pairs (X_1, Y_1), (X_2, Y_2), ... are i.i.d.
- Set X̄_n = (1/n) Σ_{i=1}^n X_i and Ȳ_n = (1/n) Σ_{i=1}^n Y_i

    α = lim_{n→∞} (X_1 + ··· + X_n)/(Y_1 + ··· + Y_n) = lim_{n→∞} X̄_n/Ȳ_n = µ_X/µ_Y

- So α = g(µ_X, µ_Y), where g(x, y) = x/y
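A quick numerical sanity check of this ratio estimator; the Poisson/normal daily model below is a made-up illustration, not part of the example:

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical model: Y_i ~ Poisson(20) customers per day, and daily
    # revenue X_i is roughly $30 per customer plus noise.
    n = 10_000
    Y = rng.poisson(20.0, size=n).astype(float)
    X = rng.normal(30.0 * Y, 25.0)

    # Point estimate of long-run revenue per customer: alpha_n = Xbar_n / Ybar_n
    print(X.sum() / Y.sum())  # equivalently X.mean()/Y.mean(); close to 30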
Example: Higher-Order Moments
- Let R_1, R_2, ... be daily revenues as before
- Assume that the R_i's are i.i.d. (Critique?)
- α = Var[R] = variance of daily revenue
- Let X = R² and Y = R; then α = g(µ_X, µ_Y), where g(x, y) = x − y²
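In code, the moment identity Var[R] = E[R²] − (E[R])² looks like this (exponential revenues are a stand-in distribution chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    R = rng.exponential(scale=100.0, size=100_000)  # hypothetical i.i.d. daily revenues

    # alpha = Var[R] = g(mu_X, mu_Y) with X = R^2, Y = R, and g(x, y) = x - y^2
    alpha_n = (R**2).mean() - R.mean()**2
    print(alpha_n, R.var())  # identical: both are the 1/n ("population") variance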
Delta Method (Taylor Series)

Assume that the function g(x, y) is smooth
- Continuously differentiable in a neighborhood of (µ_X, µ_Y)
- I.e., g is continuous, as are ∂g/∂x and ∂g/∂y

Point estimate
- Run simulation n times to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
- Set α_n = g(X̄_n, Ȳ_n)
- This estimator is biased:
  E[α_n] = E[g(X̄_n, Ȳ_n)] ≠ g(E[X̄_n], E[Ȳ_n]) = g(µ_X, µ_Y) = α
- Jensen's inequality: E[α_n] = E[g(X̄_n)] ≥ g(µ_X) = α if g is convex
- By the SLLN and continuity of g, we have bias → 0 as n → ∞
  (estimator α_n is asymptotically unbiased)
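The bias and its decay are easy to see numerically. A minimal sketch using the one-dimensional convex case g(x) = x² (the exponential distribution and all constants are our own illustrative choices): here E[g(X̄_n)] = µ² + Var[X]/n, so the bias should shrink like 1/n:

    import numpy as np

    rng = np.random.default_rng(1)
    mu = 2.0                   # exponential mean, so alpha = g(mu) = mu**2 = 4
    g = lambda x: x**2         # convex, so Jensen gives E[g(Xbar_n)] >= g(mu)

    for n in [5, 50, 500]:
        # 200,000 replications of the sample mean of n observations
        xbar = rng.exponential(scale=mu, size=(200_000, n)).mean(axis=1)
        print(n, g(xbar).mean() - mu**2)   # positive bias, roughly 4/n here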
Delta Method, Continued

Confidence interval
- (X̄_n, Ȳ_n) should be "close" to (µ_X, µ_Y) for large n by the SLLN
- So α_n = g(X̄_n, Ȳ_n) should be close to g(µ_X, µ_Y) = α

    α_n − α = g(X̄_n, Ȳ_n) − g(µ_X, µ_Y)
            ≈ (∂g/∂x)(µ_X, µ_Y) · (X̄_n − µ_X) + (∂g/∂y)(µ_X, µ_Y) · (Ȳ_n − µ_Y)
            = Z̄_n

- Z_i = c(X_i − µ_X) + d(Y_i − µ_Y) and Z̄_n = (1/n) Σ_{i=1}^n Z_i
- c = (∂g/∂x)(µ_X, µ_Y) and d = (∂g/∂y)(µ_X, µ_Y)
Delta Method, Continued

Confidence interval, continued
- {Z_n : n ≥ 1} are i.i.d. as Z = c(X − µ_X) + d(Y − µ_Y)
- E[Z] = 0
- By the CLT, √n Z̄_n / σ ~ N(0, 1) approximately for large n
- Thus √n (α_n − α) / σ ~ N(0, 1) approximately for large n
- Here σ² = Var[Z] = E[Z²] = E[(c(X − µ_X) + d(Y − µ_Y))²]
- So the asymptotic 100(1 − δ)% CI is α_n ± z_δ σ/√n
  - z_δ is the 1 − (δ/2) quantile of the standard normal distribution
  - Estimate c, d, and σ from the data
Delta Method, Continued

Delta Method CI Algorithm
1. Simulate to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
2. α_n ← g(X̄_n, Ȳ_n)
3. c_n ← (∂g/∂x)(X̄_n, Ȳ_n) and d_n ← (∂g/∂y)(X̄_n, Ȳ_n)
4. s_n² ← (n − 1)^{-1} Σ_{i=1}^n (c_n(X_i − X̄_n) + d_n(Y_i − Ȳ_n))²
5. Return asymptotic 100(1 − δ)% CI: [α_n − z_δ s_n/√n, α_n + z_δ s_n/√n]

- SLLN and continuity assumptions imply that, with probability 1,
  c_n → c, d_n → d, and s_n² → σ²
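A direct transcription of the algorithm (a sketch: g and its partials are passed in by the caller, and the test data at the bottom are synthetic):

    import numpy as np
    from scipy import stats

    def delta_method_ci(x, y, g, dgdx, dgdy, delta=0.05):
        # Steps 2-5 of the Delta Method CI Algorithm
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        xbar, ybar = x.mean(), y.mean()
        alpha_n = g(xbar, ybar)
        c_n, d_n = dgdx(xbar, ybar), dgdy(xbar, ybar)
        z = c_n * (x - xbar) + d_n * (y - ybar)    # centered Z_i's (sum to 0)
        s_n = np.sqrt(np.sum(z**2) / (n - 1))
        half = stats.norm.ppf(1 - delta / 2) * s_n / np.sqrt(n)
        return alpha_n, (alpha_n - half, alpha_n + half)

    # Ratio example g(x, y) = x / y with synthetic data:
    rng = np.random.default_rng(7)
    y = rng.poisson(20.0, size=2000).astype(float)
    x = rng.normal(30.0 * y, 25.0)
    print(delta_method_ci(x, y, lambda a, b: a / b,
                          lambda a, b: 1.0 / b, lambda a, b: -a / b**2))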
Example: Ratio Estimation: g(x, y) = x/y

Multi-pass method (apply previous algorithm directly)
- α = µ_X/µ_Y, c = 1/µ_Y, d = −µ_X/µ_Y²
- α_n = X̄_n/Ȳ_n, c_n = 1/Ȳ_n, d_n = −X̄_n/Ȳ_n²
- s_n² = (n − 1)^{-1} Σ_{i=1}^n (c_n(X_i − X̄_n) + d_n(Y_i − Ȳ_n))²
Example: Ratio Estimation: g(x, y) = x/y

Single-pass method

    σ² = Var[Z] = Var[c(X − µ_X) + d(Y − µ_Y)]
       = (Var[X] − 2α Cov[X, Y] + α² Var[Y]) / µ_Y²

    s_n² = (s_n(1,1) − 2 α_n s_n(1,2) + α_n² s_n(2,2)) / Ȳ_n²

- s_n(1,1) = (n − 1)^{-1} Σ_{i=1}^n (X_i − X̄_n)²
- s_n(2,2) = (n − 1)^{-1} Σ_{i=1}^n (Y_i − Ȳ_n)²
- s_n(1,2) = (n − 1)^{-1} Σ_{i=1}^n (X_i − X̄_n)(Y_i − Ȳ_n)
- Use single-pass formulas: set S_k^X = Σ_{i=1}^k X_i and S_k^Y = Σ_{i=1}^k Y_i,
  and update the cross-product accumulator v_k via

    v_k = v_{k−1} + (S_{k−1}^X − (k − 1) X_k)(S_{k−1}^Y − (k − 1) Y_k) / (k(k − 1)),

  with v_1 = 0, so that s_n(1,2) = v_n/(n − 1)
  (the same recurrence with Y = X or X = Y gives s_n(1,1) and s_n(2,2))
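A sketch of the single-pass computation of s_n(1,2) using the recurrence above: one loop over the data, constant memory, no stored deviations:

    import numpy as np

    def single_pass_cov(pairs):
        # Maintains S^X_k, S^Y_k and the cross-product accumulator v_k;
        # returns s_n(1,2) = v_n / (n - 1).
        sx = sy = v = 0.0
        k = 0
        for x, y in pairs:
            k += 1
            if k > 1:
                v += (sx - (k - 1) * x) * (sy - (k - 1) * y) / (k * (k - 1))
            sx += x
            sy += y
        return v / (k - 1)

    xs = [1.0, 2.0, 4.0, 8.0]
    ys = [3.0, 1.0, 5.0, 2.0]
    print(single_pass_cov(zip(xs, ys)))   # matches the two-pass value below
    print(np.cov(xs, ys, ddof=1)[0, 1])   # feeding (x, x) pairs gives s_n(1,1)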
Delta Method for Stochastic Root-Finding

Problem: Find θ̄ such that E[g(X, θ̄)] = 0
(can replace 0 with any fixed constant)

Applications:
- Process control, risk management, finance, quantiles, ...
- Stochastic optimization: min_θ E[h(X, θ)]
  - Optimality condition: (∂/∂θ) E[h(X, θ)] = 0
  - Can often show that (∂/∂θ) E[h(X, θ)] = E[(∂/∂θ) h(X, θ)]
  - So take g(X, θ) = (∂/∂θ) h(X, θ)

Point Estimate (Sample Average Approximation)
- Generate X_1, ..., X_n i.i.d. as X
- Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0 (deterministic problem)
Delta Method for Stochastic Root-Finding

Problem: Find θ̄ such that E[g(X, θ̄)] = 0

Point Estimate (Sample Average Approximation)
- Generate X_1, ..., X_n i.i.d. as X
- Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0

How to find a confidence interval for θ̄?
- Taylor series: g(X_i, θ_n) ≈ g(X_i, θ̄) + (∂g/∂θ)(X_i, θ̄)(θ_n − θ̄)
- Implies: (1/n) Σ_{i=1}^n g(X_i, θ_n) ≈ (1/n) Σ_{i=1}^n g(X_i, θ̄) − c_n(θ̄ − θ_n),
  where c_n = (1/n) Σ_{i=1}^n (∂g/∂θ)(X_i, θ̄) ≈ E[(∂g/∂θ)(X, θ̄)]
- Implies: θ̄ − θ_n ≈ (1/c_n) · (1/n) Σ_{i=1}^n g(X_i, θ̄)
- Implies: θ_n − θ̄ ≈ N(0, σ²/n), where σ² = Var[g(X, θ̄)]/c_n² = E[g(X, θ̄)²]/c_n²
Delta Method for Stochastic Root-Finding

Algorithm
1. Simulate to get X_1, ..., X_n i.i.d.
2. Find θ_n s.t. (1/n) Σ_{i=1}^n g(X_i, θ_n) = 0
3. ĉ_n ← (1/n) Σ_{i=1}^n (∂g/∂θ)(X_i, θ_n)
4. s_n² ← (1/n) Σ_{i=1}^n g(X_i, θ_n)² / ĉ_n²
5. Return asymptotic 100(1 − δ)% CI: [θ_n − z_δ s_n/√n, θ_n + z_δ s_n/√n]

- Can use pilot runs, etc., in the usual way
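A sketch of the full recipe; the choice g(x, θ) = θ³ + θ − x (so θ̄ solves θ̄³ + θ̄ = E[X]) is a toy example of ours, picked because step 2 is easy to solve with a bracketing root-finder:

    import numpy as np
    from scipy import optimize, stats

    def saa_root_ci(x, g, dg_dtheta, bracket, delta=0.05):
        x = np.asarray(x, float)
        n = len(x)
        # Step 2: solve the sample-average equation (deterministic problem)
        theta_n = optimize.brentq(lambda t: g(x, t).mean(), *bracket)
        c_n = dg_dtheta(x, theta_n).mean()                     # step 3
        s_n = np.sqrt(np.mean(g(x, theta_n) ** 2)) / abs(c_n)  # step 4
        half = stats.norm.ppf(1 - delta / 2) * s_n / np.sqrt(n)
        return theta_n, (theta_n - half, theta_n + half)

    rng = np.random.default_rng(3)
    x = rng.gamma(shape=2.0, scale=5.0, size=5000)             # E[X] = 10
    print(saa_root_ci(x,
                      lambda x, t: t**3 + t - x,               # g(X, theta)
                      lambda x, t: (3 * t**2 + 1) + 0 * x,     # dg/dtheta, vectorized
                      bracket=(0.0, 10.0)))                    # true root: theta = 2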
Jackknife Method

Overview
- Naive point estimator α_n = g(X̄_n, Ȳ_n) is biased
- Jackknife estimator has lower bias
- Avoids need to compute partial derivatives, as in the Delta method
- More computationally intensive

Starting point: Taylor series + expectation

    E[α_n] = α + b/n + c/n² + ···

- Thus bias is O(n^{-1})
- Estimate b and adjust? α*_n = α_n − b_n/n
- Messy partial derivative calculation, adds noise
Jackknife, Continued
- Observe that

    E[α_n]     = α + b/n + c/n² + ···
    E[α_{n−1}] = α + b/(n − 1) + c/(n − 1)² + ···

- and so

    E[n α_n − (n − 1) α_{n−1}] = α + c(1/n − 1/(n − 1)) + ··· = α − c/(n(n − 1)) + ···

- Bias reduced to O(n^{-2})!
- Q: What is special about deleting the n-th data point?
Jackknife, Continued
- Delete each data point in turn to get a low-bias estimator
- Average the estimators to reduce variance

Jackknife CI Algorithm for α = g(µ_X, µ_Y)
1. Choose n and δ, and set z_δ = 1 − (δ/2) quantile of N(0, 1)
2. Simulate to get (X_1, Y_1), ..., (X_n, Y_n) i.i.d.
3. α_n ← g(X̄_n, Ȳ_n)
4. For i = 1 to n
   4.1 α_n^i ← g( (1/(n−1)) Σ_{j≠i} X_j, (1/(n−1)) Σ_{j≠i} Y_j )  (leave out (X_i, Y_i))
   4.2 α_n(i) ← n α_n − (n − 1) α_n^i  (i-th pseudovalue)
5. Point estimator: α_n^J ← (1/n) Σ_{i=1}^n α_n(i)
6. v_n^J ← (n − 1)^{-1} Σ_{i=1}^n (α_n(i) − α_n^J)²
7. 100(1 − δ)% CI: [α_n^J − z_δ √(v_n^J/n), α_n^J + z_δ √(v_n^J/n)]
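The algorithm in code (a sketch; the leave-one-out means are formed from running sums rather than the literal n re-averaging passes, which changes nothing mathematically but avoids the n-fold loop):

    import numpy as np
    from scipy import stats

    def jackknife_ci(x, y, g, delta=0.05):
        # g must accept NumPy arrays elementwise (true for the ratio below)
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        alpha_n = g(x.mean(), y.mean())
        loo_x = (x.sum() - x) / (n - 1)        # leave-one-out means (step 4.1)
        loo_y = (y.sum() - y) / (n - 1)
        pseudo = n * alpha_n - (n - 1) * g(loo_x, loo_y)   # pseudovalues (4.2)
        alpha_j = pseudo.mean()                # step 5
        v_j = pseudo.var(ddof=1)               # step 6
        half = stats.norm.ppf(1 - delta / 2) * np.sqrt(v_j / n)
        return alpha_j, (alpha_j - half, alpha_j + half)

    # Same synthetic ratio-estimation data as before:
    rng = np.random.default_rng(11)
    y = rng.poisson(20.0, size=1000).astype(float)
    x = rng.normal(30.0 * y, 25.0)
    print(jackknife_ci(x, y, lambda a, b: a / b))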
Jackknife, Continued

Observations
- Not obvious that the CI is correct (why?)
- Substitutes computational brute force for analytical complexity
- Not a one-pass algorithm
- Basic jackknife breaks down for "non-smooth" statistics like quantiles
  and the maximum (but can be fixed; see next lecture)
Bootstrap Confidence Intervals

Another brute-force method
- Key idea: analyze variability of the estimator using resamples of the original data
- More general than the jackknife (estimates the entire sampling distribution
  of the estimator, not just its mean and variance)
- Jackknife is somewhat better empirically at variance estimates
- "Non-repeatable", unlike the jackknife
- OK for quantiles, still breaks down for the maximum
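A minimal percentile-bootstrap sketch for α = g(µ_X, µ_Y); this is one of several bootstrap CI constructions, and the key move is resampling the (X_i, Y_i) pairs with replacement:

    import numpy as np

    def bootstrap_percentile_ci(x, y, g, delta=0.05, B=2000, seed=0):
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        idx = rng.integers(0, n, size=(B, n))    # B resamples of the pairs
        alpha_star = g(x[idx].mean(axis=1), y[idx].mean(axis=1))
        return tuple(np.quantile(alpha_star, [delta / 2, 1 - delta / 2]))

    rng = np.random.default_rng(5)
    y = rng.poisson(20.0, size=500).astype(float)
    x = rng.normal(30.0 * y, 25.0)
    print(bootstrap_percentile_ci(x, y, lambda a, b: a / b))

The "non-repeatable" point above is visible here: change the seed argument and the interval endpoints move slightly, whereas the jackknife always returns the same answer for the same data.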