
Issues and Solutions in Fitting, Evaluating, and Interpreting Regression Models - PowerPoint PPT Presentation



  1. Issues and Solutions in Fitting, Evaluating, and Interpreting Regression Models
     Florian Jaeger, Victor Kuperman
     March 24, 2009

  2. Hypothesis testing in psycholinguistic research
     ◮ Typically, we make predictions not just about the existence, but also the direction of effects.
     ◮ Sometimes, we're also interested in effect shapes (non-linearities, etc.).
     ◮ Unlike in ANOVA, regression analyses reliably test hypotheses about effect direction and shape without requiring post-hoc analyses if (a) the predictors in the model are coded appropriately and (b) the model can be trusted.
     ◮ Today: Provide an overview of (a) and (b).

  3. Overview
     ◮ Introduce sample data and simple models
     ◮ Towards a model with interpretable coefficients:
       ◮ outlier removal
       ◮ transformation
       ◮ coding, centering, ...
       ◮ collinearity
     ◮ Model evaluation:
       ◮ fitted vs. observed values
       ◮ model validation
       ◮ investigation of residuals
       ◮ case influence, outliers
       ◮ model comparison
     ◮ Reporting the model:
       ◮ comparing effect sizes
       ◮ back-transformation of predictors
       ◮ visualization

  4. Sample Data and Simple Models
     Building an interpretable model
       Data exploration
       Transformation
       Coding
       Centering
       Interactions and modeling of non-linearities
       Collinearity
         What is collinearity?
         Detecting collinearity
         Dealing with collinearity
     Model Evaluation
       Beware overfitting
       Detect overfitting: Validation
       Goodness-of-fit
       Aside: Model Comparison
     Reporting the model
       Describing Predictors
       What to report
       Back-transforming coefficients
       Comparing effect sizes
       Visualizing effects
       Interpreting and reporting interactions
     Discussion

  5. Data 1: Lexical decision RTs
     ◮ Outcome: log lexical decision latency RT
     ◮ Inputs:
       ◮ factors Subject (21 levels) and Word (79 levels),
       ◮ factor NativeLanguage (English and Other),
       ◮ continuous predictors Frequency (log word frequency) and Trial (rank in the experimental list).

       Subject       RT Trial NativeLanguage       Word Frequency
     1      A1 6.340359    23        English        owl  4.859812
     2      A1 6.308098    27        English       mole  4.605170
     3      A1 6.349139    29        English     cherry  4.997212
     4      A1 6.186209    30        English       pear  4.727388
     5      A1 6.025866    32        English        dog  7.667626
     6      A1 6.180017    33        English blackberry  4.060443
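These are the lexdec data that ship with the languageR package (the model calls below use data = lexdec). A minimal sketch of loading and inspecting them, assuming languageR is installed:

    library(languageR)   # provides the lexdec lexical decision data
    # first rows of the variables used in the models below
    head(lexdec[, c("Subject", "RT", "Trial", "NativeLanguage", "Word", "Frequency")])
    # RT is already log-transformed; NativeLanguage is a two-level factor
    str(lexdec[, c("RT", "NativeLanguage", "Frequency")])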

  6. Linear model of RTs

     > lin.lmer = lmer(RT ~ NativeLanguage +
     +     Frequency + Trial +
     +     (1 | Word) + (1 | Subject),
     +     data = lexdec)
     <...>
     Random effects:
      Groups   Name        Variance  Std.Dev.
      Word     (Intercept) 0.0029448 0.054266
      Subject  (Intercept) 0.0184082 0.135677
      Residual             0.0297268 0.172415
     Number of obs: 1659, groups: Word, 79; Subject, 21

     Fixed effects:
                           Estimate  Std. Error t value
     (Intercept)          6.548e+00   4.963e-02  131.94
     NativeLanguageOther  1.555e-01   6.043e-02    2.57
     Frequency           -4.290e-02   5.829e-03   -7.36
     Trial               -2.418e-04   9.122e-05   -2.65
     <...>

     ◮ estimates for the random effects of Subject and Word and for the residual error of the model: standard deviation and variance.
     ◮ estimates for the regression coefficients and their standard errors → t-values.
     ◮ An effect is taken to be significant if the estimate ± 2*SE does not include zero (i.e. if |t| > 2).
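As a quick check of the ± 2*SE criterion, approximate 95% intervals can be computed directly from the fitted model. A minimal sketch, assuming lin.lmer was fit with lme4 as in the call above:

    est <- fixef(lin.lmer)                         # fixed-effect estimates
    se  <- sqrt(diag(as.matrix(vcov(lin.lmer))))   # their standard errors
    # an effect is taken to be reliable if this interval excludes zero
    cbind(lower = est - 2 * se, estimate = est, upper = est + 2 * se)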

  7. Linear model of RTs (cont'd)
     ◮ The t-value is anti-conservative
       → MCMC sampling of the coefficients to obtain estimates that are not anti-conservative.

     > pvals.fnc(lin.lmer, nsim = 10000)
     $fixed
                         Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
     (Intercept)           6.5476   6.5482     6.4653     6.6325 0.0001   0.0000
     NativeLanguageOther   0.1555   0.1551     0.0580     0.2496 0.0012   0.0001
     Frequency            -0.0429  -0.0429    -0.0542    -0.0323 0.0001   0.0000
     Trial                -0.0002  -0.0002    -0.0004    -0.0001 0.0068   0.0109

     $random
       Groups   Name        Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
     1 Word     (Intercept)   0.0564     0.0495   0.0497     0.0384     0.0619
     2 Subject  (Intercept)   0.1410     0.1070   0.1083     0.0832     0.1379
     3 Residual               0.1792     0.1737   0.1737     0.1678     0.1799
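The same criterion can be read off the MCMC output: an effect is supported if its 95% HPD interval excludes zero (equivalently, if pMCMC is small). A small sketch storing the result for later inspection; the object name mcmc.lmer is my own:

    mcmc.lmer <- pvals.fnc(lin.lmer, nsim = 10000)  # languageR; can take a while
    mcmc.lmer$fixed    # MCMC means, 95% HPD intervals, pMCMC per coefficient
    mcmc.lmer$random   # MCMC-based estimates of the variance components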

  8. Data 2: Lexical decision response
     ◮ Outcome: correct or incorrect response (Correct)
     ◮ Inputs: same as in the linear model

     > lmer(Correct == "correct" ~ NativeLanguage +
     +     Frequency + Trial +
     +     (1 | Subject) + (1 | Word),
     +     data = lexdec, family = "binomial")

     Random effects:
      Groups   Name        Variance Std.Dev.
      Word     (Intercept) 1.01820  1.00906
      Subject  (Intercept) 0.63976  0.79985
     Number of obs: 1659, groups: Word, 79; Subject, 21

     Fixed effects:
                           Estimate  Std. Error z value Pr(>|z|)
     (Intercept)         -1.746e+00   8.206e-01  -2.128 0.033344 *
     NativeLanguageOther -5.726e-01   4.639e-01  -1.234 0.217104
     Frequency            5.600e-01   1.570e-01   3.567 0.000361 ***
     Trial                4.443e-06   2.965e-03   0.001 0.998804

     ◮ estimates for the random effects of Subject and Word (no residual term).
     ◮ estimates for the regression coefficients and their standard errors → z- and p-values.
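Because the logit model's coefficients are on the log-odds scale, they can be translated into odds ratios or predicted probabilities. A hedged sketch, assuming the call above was stored in an object named logit.lmer (the name and the predictor values are my own, for illustration only):

    exp(fixef(logit.lmer))   # odds ratios per predictor
    # predicted probability of a correct response for a native English speaker
    # (reference level of NativeLanguage) at Frequency = 5 and Trial = 50:
    b <- fixef(logit.lmer)
    plogis(b["(Intercept)"] + b["Frequency"] * 5 + b["Trial"] * 50)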

  9. Interpretation of coefficients
     ◮ In theory, directionality and shape of effects can be tested and immediately interpreted.
     ◮ e.g. logit model:

     Fixed effects:
                           Estimate  Std. Error z value Pr(>|z|)
     (Intercept)         -1.746e+00   8.206e-01  -2.128 0.033344 *
     NativeLanguageOther  5.726e-01   4.639e-01   1.234 0.217104
     Frequency           -5.600e-01   1.570e-01  -3.567 0.000361 ***
     Trial               -5.725e-06   2.965e-03  -0.002 0.998460

     ◮ ... but can these coefficient estimates be trusted?

  10. Building an interpretable model
      Data exploration
      Transformation
      Coding
      Centering
      Interactions and modeling of non-linearities
      Collinearity
        What is collinearity?
        Detecting collinearity
        Dealing with collinearity

  11. Modeling schema
      [figure: schema of the modeling process]

  12. Data exploration

  13. Data exploration
      ◮ Select and understand the input variables and the outcome based on a-priori theoretical considerations.
      ◮ How many parameters does your data afford (→ overfitting)?
      ◮ Data exploration: before fitting the model, explore inputs and outputs (see the sketch below):
        ◮ Outliers due to missing data or measurement error (e.g. RTs in SPR < 80 ms).
        ◮ NB: postpone distribution-based outlier exclusion until after transformations.
      ◮ Skewness in a distribution can affect the accuracy of the model's estimates (→ transformations).
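A minimal sketch of this kind of pre-modeling exploration on the lexdec data; the 80 ms cut-off is the slide's self-paced-reading example, used here purely as an illustrative threshold:

    # RT in lexdec is log-transformed; back-transform to inspect the raw scale
    hist(exp(lexdec$RT), main = "raw RTs (ms)")   # strongly right-skewed
    hist(lexdec$RT, main = "log RTs")             # much closer to symmetric
    # error-based exclusion of implausibly fast responses (illustrative threshold)
    sum(exp(lexdec$RT) < 80)
    # distribution-based outlier exclusion should wait until after transformation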
