
Research questions... Could missing data - PowerPoint PPT Presentation



  1. Missing Data, PLS and Bootstrap: A Magical Recipe?
     Cordeiro, C.; Machás, A. and Neves, M.
     The useR! Conference, Wien, Austria, June 15-17, 2006

     Research questions...
     - Could missing data method change the quality of the results obtained from a Customer Satisfaction market study?
     - Could standard or classical imputation methods be applied no matter the rate of non-responses?
     - Could Bootstrap improve quality of estimates?

     Missing Data
     Standard practices to treat non-responses are not statistically justified and could result in biased estimates.
     Data imputation methods are used for reconstructing the incomplete data to obtain a complete data set to produce more accurate estimates.
     Most common methods to treat missing data are:
     - Mean imputation
     - Listwise deletion
     - Pairwise deletion
     - Maximum Likelihood

     Missing Data Methods
     IMPUTATION METHODS
     - Mean, Modal and Median
     - Nearest Neighbour (NN)
     MODEL BASED METHODS
     - Multiple Imputation (MI)
     - Maximum Likelihood (ML)
     - Expectation Maximization (EM)
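
As a rough illustration of two of the common methods listed above, the following Python sketch contrasts listwise deletion with mean imputation. The slides report work done in R; the data set, the function names, and the use of `None` to mark a non-response are assumptions made only for this example.

```python
from statistics import mean


def listwise_delete(rows):
    """Keep only the rows that have no missing value (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values
    in the same column. Assumes every column has at least one observed
    value; otherwise statistics.mean raises StatisticsError."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical answers on a 1-10 scale; None marks a non-response.
answers = [
    [8, 7, None],
    [6, None, 5],
    [9, 8, 7],
    [7, 6, 6],
]

complete_cases = listwise_delete(answers)  # half of the respondents are lost
imputed = mean_impute(answers)             # all respondents are kept
```

Listwise deletion shrinks the sample, while mean imputation keeps every respondent but reduces the variability of the imputed columns, which is one reason such standard practices can bias estimates.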

  2. Missing Data and Bootstrap
     Efron (1994) uses the extensive imputation theory developed by Rubin (1987).
     The simplest nonparametric bootstrap approach:
     - The rows in the original data matrix are resampled with replacement
     - A bootstrap matrix is obtained and a bootstrap estimate is calculated for the parameter in study
     So an extensive computer work is performed, repeating the above procedure several times; a large number of estimates are calculated and imputed in the original data.

     Bootstrap and SEM-PLS on CSM Case Study:
     ACSI Model for Mobile Telecom (Fornell, C)
     SEM estimated with PLS algorithm (Chin, W)
     Data treatment for missing data: Standard procedure Mean imputation
     - STRUCTURAL MODEL: [diagram linking Image, Expect, Quality, Value, CS and CL]
     - MEASUREMENT MODEL (number of questions): Image 5, Expectations 3, Quality 8, Value 2, Customer Satisfaction 3, Customer Loyalty 2
     - ACSI Model, with Image like in EPSI Model

     Methodological aspects
     Compared scenarios:
     - 10% Rate of non-responses from Original Data Matrix X = 1st scenario
     - 50% Rate of non-responses from Simulated Data Matrix Y = 2nd scenario
     [Chart: Rate of non responses (%), 1st scenario and 2nd scenario]

     Using R
     Bootstrap application in R
     Step1: matrix rows are resampled with replacement;
     Step2: a bootstrap sample is obtained;
     Step3: a bootstrap estimate is computed according to the missing data method;
     Step4: go to step1.
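
The four bootstrap steps above can be sketched as follows. This is a minimal Python sketch rather than the authors' R code; the function names, the toy data, and the choice of the mean of the observed answers as the estimate in step 3 are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def observed_column_mean(rows, col=0):
    """Stand-in for 'the estimate computed according to the missing data
    method': the mean of the observed (non-None) values in one column.
    Returns None when the sample has no observed value in that column."""
    observed = [row[col] for row in rows if row[col] is not None]
    return mean(observed) if observed else None


def bootstrap_estimates(rows, estimator, r=1000, seed=0):
    """Nonparametric bootstrap over the rows of a data matrix."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(r):  # Step 4: go back to step 1, r times in total
        # Steps 1 and 2: resample rows with replacement to get a bootstrap sample
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        # Step 3: compute the bootstrap estimate on that sample
        estimate = estimator(sample)
        if estimate is not None:
            estimates.append(estimate)
    return estimates


# Hypothetical answers to one survey question; None marks a non-response.
survey = [[8], [6], [None], [9], [7], [5], [8], [7]]

estimates = bootstrap_estimates(survey, observed_column_mean, r=1000, seed=42)
standard_error = stdev(estimates)  # spread of the bootstrap estimates
```

Passing the estimator as a function keeps the resampling loop independent of the missing data method, so a different method can be swapped in at step 3 without touching the loop.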

  3. Case Study questions...
     - How do the classical missing data techniques perform for the two scenarios?
     - How does the Bootstrap perform with the missing data techniques for the two scenarios?
     - What can be concluded from quality measures of model adjustment such as R-Squared and Residual Variance?

     Using R
     - This procedure was repeated r=5000 times;
     - Missing values in scenarios 1 and 2 are replaced with new estimates generated by the 5000 replications;
     - Then, a new PLS estimation is performed.
     Both scenarios, using bootstrap methodology, were compared with the classical situation (CSM estimation based on PLS, where Mean Imputation is the ad hoc procedure adopted for the ECSI/EPSI model).

     Conclusion
     - 1st Scenario (10%): Bootstrap methodology doesn't increase the quality of estimates
     - 2nd Scenario (50%): Bootstrap methodology used with Hot Deck Imputation and K Nearest Neighbor achieves good results
     Overall, for higher non-response rates, bootstrapping the missing values was the best method to adopt when data are missing completely at random.

     The work still goes on...
     - Perform an extensive theoretical work
     - Improve some performance methods
     - Explore other bootstrap approaches to the estimation in the problem

     THANK YOU
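
One possible reading of the replacement step above is sketched here in Python: in each of r replications the rows are resampled, a donor value is drawn for every missing cell (a simple hot deck), and the draws are averaged to give the imputed value. The slides do not say how the replications are combined, so the averaging, the function name, and the toy data are assumptions; the PLS estimation that follows is not shown.

```python
import random
from statistics import mean


def bootstrap_hot_deck_impute(rows, r=5000, seed=0):
    """Fill missing cells (None) with the average of r hot-deck draws,
    each draw taken from a fresh bootstrap sample of the rows."""
    rng = random.Random(seed)
    n, n_cols = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n) for j in range(n_cols)
               if rows[i][j] is None]
    draws = {cell: [] for cell in missing}

    for _ in range(r):
        # Resample the rows with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for (i, j) in missing:
            # Hot deck: pick a random observed value in the same column.
            donors = [row[j] for row in sample if row[j] is not None]
            if donors:  # skip samples with no observed donor
                draws[(i, j)].append(rng.choice(donors))

    completed = [list(row) for row in rows]
    for (i, j), values in draws.items():
        if values:  # a cell stays None if no donor was ever found
            completed[i][j] = mean(values)
    return completed


# Hypothetical answers on a 1-10 scale; None marks a non-response.
survey = [
    [8, 7, None],
    [6, None, 5],
    [9, 8, 7],
    [7, 6, 6],
    [5, 7, 8],
    [8, 9, 7],
]

completed = bootstrap_hot_deck_impute(survey, r=5000, seed=1)
```

The completed matrix would then be passed to whatever model estimation is used, in place of the mean-imputed data.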
