CS 147: Computer Systems Performance Analysis
Advanced Regression Techniques
2015-06-15
Overview
◮ Curvilinear Regression
◮ Common Transformations
◮ General Transformations
◮ Handling Outliers
◮ Common Mistakes
Curvilinear Regression
◮ Linear regression assumes a linear relationship between predictor and response
◮ What if it isn't linear?
◮ You need to fit some other type of function to the relationship
When To Use Curvilinear Regression
◮ Easiest to tell by sight
◮ Make a scatter plot
◮ If plot looks non-linear, try curvilinear regression
◮ Or if non-linear relationship is suspected for other reasons
◮ Relationship should be convertible to a linear form
Types of Curvilinear Regression
◮ Many possible types, based on a variety of relationships:
◮ $y = ax^b$
◮ $y = a + b/x$
◮ $y = ab^x$
◮ Etc., ad infinitum
Transform Them to Linear Forms
◮ Apply logarithms, multiplication, division, whatever to produce something in linear form
◮ I.e., $y = a + b \times \text{something}$
◮ Or a similar form
◮ If predictor appears in more than one transformed predictor variable, correlation is likely!
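The warning about correlated transformed predictors can be checked numerically. The following is a minimal sketch, assuming a made-up predictor that enters the model both as x and as x squared; the helper name `pearson` and the data are illustrative and not from the slides.

```python
import math

def pearson(us, vs):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical predictor values; the same x feeds two transformed predictors.
xs = [float(i) for i in range(1, 11)]
x_squared = [x * x for x in xs]

r = pearson(xs, x_squared)
print(r)
```

For this data the coefficient comes out well above 0.9, so a model using both x and x squared would have strongly correlated predictors.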
Sample Transformations
◮ For $y = ae^{bx}$, take the logarithm of $y$, do regression on $\ln y = b_0 + b_1 x$, let $b = b_1$, $a = e^{b_0}$
◮ For $y = a + b \log x$, take log of $x$ before fitting parameters, let $b = b_1$, $a = b_0$
◮ For $y = ax^b$, take log of both $x$ and $y$, let $b = b_1$, $a = e^{b_0}$
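The exponential case can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `fit_line` and `fit_exponential` are hypothetical, the data is made up and noise-free, and natural logarithms are used so that a = e^{b_0} holds.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for ys = b0 + b1 * xs; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x. Requires every y > 0."""
    b0, b1 = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(b0), b1  # a = e^{b0}, b = b1

# Made-up data generated from a = 2.0, b = 0.5 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print(a, b)
```

The power-law case $y = ax^b$ would follow the same pattern, passing the logarithm of both `xs` and `ys` to `fit_line`.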
Corrections to Jain p. 257 (Early Editions)
Nonlinear               Linear
$y = a + b/x$           $y = a + b(1/x)$
$y = 1/(a + bx)$        $(1/y) = a + bx$
$y = x/(a + bx)$        $(x/y) = a + bx$
$y = ab^x$              $\ln y = \ln a + x \ln b$
$y = a + bx^n$          $y = a + b(x^n)$
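As an illustration of one row of the table, the form $y = x/(a + bx)$ can be fitted by regressing $x/y$ on $x$. This is a sketch under assumptions: the helper names are hypothetical, the data is made up and noise-free, and all x and y values are nonzero.

```python
def fit_line(xs, ys):
    """Ordinary least squares for ys = b0 + b1 * xs; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def fit_x_over_linear(xs, ys):
    """Fit y = x / (a + b * x) via the linear form x / y = a + b * x."""
    ratios = [x / y for x, y in zip(xs, ys)]  # transformed response
    a, b = fit_line(xs, ratios)
    return a, b

# Made-up data generated from a = 2.0, b = 0.5 with no noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x / (2.0 + 0.5 * x) for x in xs]

a, b = fit_x_over_linear(xs, ys)
print(a, b)
```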
General Transformations
◮ Use some function of response variable $y$ in place of $y$ itself
◮ Curvilinear regression is one example
◮ But techniques are more generally applicable
When To Transform?
◮ If known properties of measured system suggest it
◮ If data's range covers several orders of magnitude
◮ If homogeneous variance assumption of residuals (homoscedasticity) is violated
Transforming Due To (Lack of) Homoscedasticity
◮ If spread of scatter plot of residual vs. predicted response isn't homogeneous,
◮ Then residuals are still functions of the predictor variables
◮ Transformation of response may solve the problem
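A rough numeric stand-in for eyeballing the residual plot is to compare the residual spread in the lower and upper halves of the predicted response. The sketch below is an assumption-laden illustration, not a method from the slides: `spread_ratio` is a hypothetical helper, the data is made up with a multiplicative error, and logarithms are taken of both the response and the predictor so that the transformed relationship stays linear.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for ys = b0 + b1 * xs; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def spread_ratio(xs, ys):
    """Residual std. dev. for the upper half of predicted values divided by
    the residual std. dev. for the lower half. Near 1 suggests homogeneous spread."""
    b0, b1 = fit_line(xs, ys)
    pairs = sorted((b0 + b1 * x, y - (b0 + b1 * x)) for x, y in zip(xs, ys))
    half = len(pairs) // 2
    low = [residual for _, residual in pairs[:half]]
    high = [residual for _, residual in pairs[half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)

# Made-up data: linear trend with an error that scales with the response.
xs = [float(i) for i in range(1, 21)]
factors = [1.2 if i % 2 == 0 else 0.8 for i in range(1, 21)]
ys = [3.0 * x * f for x, f in zip(xs, factors)]

ratio_raw = spread_ratio(xs, ys)
ratio_log = spread_ratio([math.log(x) for x in xs], [math.log(y) for y in ys])
print(ratio_raw, ratio_log)
```

On this data the ratio is well above 1 before the transformation and close to 1 after it, which is the pattern the slide describes. A real analysis would still look at the scatter plot itself.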
What Transformation To Use?
◮ Compute standard deviation of residuals
◮ Plot as function of mean of observations
◮ Assuming multiple experiments for single set of predictor values
◮ Check for linearity: if linear, use a log transform
◮ If variance against mean of observations is linear, use square-root transform
◮ If standard deviation against mean squared is linear, use inverse ($1/y$) transform
◮ If standard deviation against mean to a power is linear, use power transform
◮ More covered in the book
General Transformation Principle
For some observed relation between standard deviation and mean, $s = g(\bar{y})$:
let $h(y) = \int \frac{1}{g(y)}\,dy$
transform to $w = h(y)$ and regress on $w$
Example: Log Transformation
If standard deviation against mean is linear, then $g(y) = ay$
So $h(y) = \int \frac{1}{ay}\,dy = \frac{1}{a} \ln y$
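The same principle, $h(y) = \int 1/g(y)\,dy$, appears to yield the square-root and inverse transforms as well. The derivation below is a sketch under the assumption that the constant of proportionality $a$ is positive and that constant factors and signs in $h$ can be ignored, since they only rescale the fitted parameters.

```latex
% Variance linear in the mean: s^2 = a y, so g(y) = \sqrt{a y}
h(y) = \int \frac{1}{\sqrt{a y}}\,dy = \frac{2}{\sqrt{a}} \sqrt{y}
\quad\Rightarrow\quad w = \sqrt{y}

% Standard deviation linear in the mean squared: g(y) = a y^2
h(y) = \int \frac{1}{a y^2}\,dy = -\frac{1}{a y}
\quad\Rightarrow\quad w = \frac{1}{y}
```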
Confidence Intervals for Nonlinear Regressions
◮ For nonlinear fits using general (e.g., exponential) transformations:
◮ Confidence intervals apply to transformed parameters
◮ Not valid to perform inverse transformation before calculating intervals
◮ Must express confidence intervals in transformed domain
Outliers
◮ Atypical observations might be outliers
◮ Measurements that are not truly characteristic
◮ By chance, several standard deviations out
◮ Or mistakes might have been made in measurement
◮ Which leads to a problem: Do you include outliers in analysis or not?