  1. Assessing the Impact of a New Imputation Methodology for the Agricultural Resource Management Survey
     Darcy Miller, National Agricultural Statistics Service
     “. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.”

  2. National Agricultural Statistics Service (NASS)
     • “The National Agricultural Statistics Service provides timely, accurate, and useful statistics in service to U.S. Agriculture.”
     Summer Conference Preview/Review 2014, July 22nd, 2014

  3. Agricultural Resource Management Survey (ARMS)
     ARMS is the USDA’s primary survey for the annual collection of data from farm operators on:
     • Household
       – demographic attributes, labor allocation, and debt
     • Farm
       – ownership, management structure, costs and returns, assets, and debt
     • Production Practices
       – tillage, fertilizer, and pesticides

  4. Background
     • Research effort started in June 2009
     • Cooperative agreement between NASS and the National Institute of Statistical Sciences (NISS)
     • Agreement formed in response to a panel review by the Committee on National Statistics (CNSTAT)

  5. Recommendation from CNSTAT
     Recommendation 6.7: NASS and ERS should consider approaches for imputation of missing data that would be appropriate when analyzing the data using multivariate models. Methods for accounting for the variability due to using imputed values should be investigated. Such methods would depend on the imputation approach adopted.

  6. Current Imputation Methodology
     • Uses conditional mean imputation
     • Form groups of operations believed to be similar (Region, Farm Size, Farm Type)
     • For each item, impute the group's mean value for operations in the group that are missing that item
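The conditional mean approach can be sketched in a few lines of Python. This is a minimal illustration, not NASS production code: records are plain dicts, `None` marks a missing item, and the field names `region`, `size`, `type`, and `fuel` are hypothetical stand-ins for the grouping variables and one expense item.

```python
from collections import defaultdict
from statistics import fmean


def conditional_mean_impute(records, item, group_keys=("region", "size", "type")):
    """Fill missing values (None) of `item` with the mean reported value in the
    record's group. Returns new dicts; the input records are left unchanged."""
    reported = defaultdict(list)
    for r in records:
        if r[item] is not None:
            reported[tuple(r[k] for k in group_keys)].append(r[item])
    group_means = {g: fmean(v) for g, v in reported.items()}

    completed = []
    for r in records:
        r = dict(r)
        if r[item] is None:
            # Stays None if nobody in the group reported the item.
            r[item] = group_means.get(tuple(r[k] for k in group_keys))
        completed.append(r)
    return completed


# Hypothetical records for illustration.
farms = [
    {"region": "A", "size": "small", "type": "crop", "fuel": 100.0},
    {"region": "A", "size": "small", "type": "crop", "fuel": 300.0},
    {"region": "A", "size": "small", "type": "crop", "fuel": None},
    {"region": "B", "size": "large", "type": "livestock", "fuel": 50.0},
]
imputed = conditional_mean_impute(farms, "fuel")
print(imputed[2]["fuel"])  # 200.0
```

Every missing record in a group receives the same value, so the completed data understate variability and ignore relationships between items; that limitation is what the CNSTAT recommendation on multivariate-friendly imputation is aimed at.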

  7. New Imputation Methodology
     • Uses multiple variables in imputation
     • Data are transformed and a regression-based technique is used
     • Various criteria are used to select the covariates
     • Parameter estimates for the sequence of linear models and imputations are obtained using Markov chain Monte Carlo
     • Referred to as Iterative Sequential Regression (ISR)
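The loop below is a rough sketch of the ideas on this slide, not the ISR algorithm as implemented at NASS. Assumptions: every item is nonnegative and `log1p` stands in for the selected transformation; each item with missing values is regressed on all other items (a chained-regression, fully conditional Gibbs scheme swapped in for ISR's ordered sequence of conditional models); regression parameters are drawn under the standard noninformative prior, and missing values are then redrawn from the predictive distribution. Covariate selection, zero handling, and edit constraints are omitted, and the function names are hypothetical.

```python
import numpy as np


def draw_regression(X, y, rng):
    """One posterior draw of (beta, sigma) for y = X @ beta + e, e ~ N(0, sigma^2),
    under the noninformative prior p(beta, sigma^2) proportional to 1 / sigma^2."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = (resid @ resid) / rng.chisquare(n - p)  # scaled inverse-chi-square draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def chained_regression_impute(Y, n_iter=200, seed=0):
    """Return one completed copy of Y (rows = farms, columns = items, NaN = missing)."""
    rng = np.random.default_rng(seed)
    Z = np.log1p(Y)                                  # assumed transformation toward normality
    miss = np.isnan(Z)
    Z = np.where(miss, np.nanmean(Z, axis=0), Z)     # start from observed column means
    n, k = Z.shape
    for _ in range(n_iter):                          # each sweep is one MCMC iteration
        for j in range(k):
            m = miss[:, j]
            if not m.any():
                continue
            # Predictors: intercept plus all other (currently completed) columns.
            X = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
            beta, sigma = draw_regression(X[~m], Z[~m, j], rng)
            # Redraw the missing values from the posterior predictive distribution.
            Z[m, j] = X[m] @ beta + sigma * rng.standard_normal(m.sum())
    return np.expm1(Z)                               # back-transform to the original scale


# Simulated data: a second item related to the first, with 20 values missing.
rng = np.random.default_rng(1)
x = rng.lognormal(3.0, 0.5, size=200)
y = x * rng.lognormal(0.0, 0.2, size=200)
Y = np.column_stack([x, y])
Y[::10, 1] = np.nan
# Five completed data sets (multiple imputation), one per seed.
completed = [chained_regression_impute(Y, n_iter=50, seed=s) for s in range(5)]
```

Running the sampler with different seeds yields multiple completed data sets, which is what allows the variability due to imputation (the second half of the CNSTAT recommendation) to be estimated, for example with Rubin's combining rules.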

  8. Operational Testing
     • R for Operational Use
     • Generalization & User Interface
     • Integrity of Data File
     • Transformations
     • Convergence
     • Impact to Workload
     • Impact to Indications

  9. R for Operational Use
     • R was approved for operational use by the end of the research project.
     • Server Issues
       – Loading
       – Moving Data Across Platforms

  10. Generalization & User Interface
     • Parameter Files
       – Calculated Variables, Variable Groups, Variable Types, Questionnaire Versions, Transformations, Percents, Income Bins, Notification Email, Seed & Iterations & Imputations
     • SAS Programs
       – Convert & Move Data and Run Program
       – Move Data and Convert Data

  11. Integrity of Data File
     • Moving data across platforms and software
       – Character Values
       – Rounded Values
     • Correct Cells and Reasonable Values
     • Zeros
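A check for the transfer problems listed above might compare each column before and after the move. The sketch below is hypothetical: it assumes the source values are numbers or `None`, treats any string arriving on the other side as a character-value problem, flags a value that appears or disappears (for example a blank that became 0), and uses a tolerance to catch values altered by rounding.

```python
import math


def check_transfer(source, transferred, tol=1e-9):
    """Compare one column before and after a cross-platform move.
    Returns row indices grouped by the type of problem found."""
    if len(source) != len(transferred):
        raise ValueError("row counts differ")
    problems = {"character": [], "missing_mismatch": [], "value_changed": []}
    for i, (a, b) in enumerate(zip(source, transferred)):
        if isinstance(b, str):
            problems["character"].append(i)          # numeric field arrived as text
        elif a is None or b is None:
            if a is not b:                           # exactly one side is missing
                problems["missing_mismatch"].append(i)  # e.g. a blank that became 0
        elif not math.isclose(a, b, rel_tol=tol, abs_tol=tol):
            problems["value_changed"].append(i)      # e.g. rounding in transit
    return problems


src = [12.5, None, 0.0, 1234.567, 7.0]
dst = [12.5, 0.0, 0.0, 1234.57, "7"]
report = check_transfer(src, dst)
# {'character': [4], 'missing_mismatch': [1], 'value_changed': [3]}
```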

  12. Efficacy of Transformations, 2008-2012
     • Achieving Normality (Univariate)
     • Across all years, 2008 to 2012, the transformations selected produce a reasonable fit for nearly every variable.
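The slides do not say how the fit of each transformation was judged. One common univariate check is the correlation between the sorted data and standard normal quantiles (a probability-plot correlation): the closer to 1, the more normal the shape. The sketch below uses that criterion to choose among three candidate transformations; the candidate set, the selection rule, and the simulated `expenses` data are assumptions, not the criteria or data NASS used.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_correlation(values):
    """Correlation between the sorted data and standard normal quantiles."""
    x = sorted(values)
    n = len(x)
    q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = fmean(x), fmean(q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxq / math.sqrt(sxx * sqq)


# Hypothetical candidate set for nonnegative survey items.
CANDIDATES = {"identity": lambda v: v, "sqrt": math.sqrt, "log1p": math.log1p}


def pick_transform(values):
    """Return the candidate whose transformed values look most normal, plus all scores."""
    scores = {name: qq_correlation([f(v) for v in values]) for name, f in CANDIDATES.items()}
    return max(scores, key=scores.get), scores


random.seed(0)
expenses = [random.lognormvariate(3.0, 1.0) for _ in range(500)]  # skewed, like many expense items
best, scores = pick_transform(expenses)
```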

  13. Markov Chain Monte Carlo Convergence Diagnostics, 2008-2012
     • Looking across the years 2008 to 2012, convergence seems to be demonstrated by the 100th iteration for most imputed variables, and by the 200th iteration for most of the remainder.
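The diagnostic behind these statements is not named on the slide. A widely used choice is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance across several chains started from dispersed values; values below about 1.1 are commonly read as adequate mixing. The sketch below computes it with NumPy on simulated toy traces, not ARMS output: each chain decays from its own starting value toward the same target, so R-hat is large over the early iterations and close to 1 once the chains have mixed.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat); rows of `chains` are separate chains."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * w + b / n           # pooled variance estimate
    return float(np.sqrt(var_plus / w))


# Toy traces standing in for one imputed variable: four chains started at
# dispersed values, each decaying toward 0 with independent N(0, 1) noise.
rng = np.random.default_rng(42)
starts = np.array([-30.0, -10.0, 10.0, 30.0])
t = np.arange(300)
chains = starts[:, None] * 0.95 ** t + rng.standard_normal((4, 300))

early = gelman_rubin(chains[:, :50])    # well above 1: chains still disagree
late = gelman_rubin(chains[:, 200:])    # close to 1: chains have mixed
```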

  14. Analysis of 2011/2012 Data
     • Evaluated change in workload by analyzing the critical error counts
     • Examined the 18 key variables after the summary
     • Looked at results for 2011, 2012, and 2012 “collapsed” (covariates summed together)

  15. Workload Evaluation
     • Analyzed Critical Error Count Differences and Percent Differences for the following scenarios:
       – 2011 ISR vs. 2011 Mean
       – 2012 ISR vs. 2012 Mean
       – 2012 ISR Collapsed vs. 2012 Mean
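The slide does not define the percent difference; the sketch below assumes it is taken relative to the mean-imputation count, so a positive value means the ISR run generated more critical errors. The function name, edit names, and counts are hypothetical and invented for illustration; they are not the ARMS results.

```python
def workload_difference(isr_counts, mean_counts):
    """Map each edit to (count difference, percent difference) for ISR vs. mean imputation.
    Percent difference is relative to the mean-imputation count (an assumption)."""
    result = {}
    for edit, base in mean_counts.items():
        diff = isr_counts.get(edit, 0) - base
        pct = 100.0 * diff / base if base else float("nan")  # undefined when base is 0
        result[edit] = (diff, pct)
    return result


# Invented counts for illustration only.
result = workload_difference(
    {"edit_A": 120, "edit_B": 45, "edit_C": 3},
    {"edit_A": 100, "edit_B": 50, "edit_C": 0},
)
# {'edit_A': (20, 20.0), 'edit_B': (-5, -10.0), 'edit_C': (3, nan)}
```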

  16. U.S.-Level Results

  17. Workload Assessment Conclusions
     • Indications that the new ISR method will somewhat increase workload
     • Indications that collapsing the variables included in the model will somewhat increase workload compared to the full-variable model
     • Indications that adding a couple of edits to the ISR program will not significantly reduce the workload

  18. Impact to Estimates
     • NASS publishes 18 estimates from data collected on ARMS III
     • 3 estimates include some imputed data
     • No post-edit run after imputation

  19. Impact to Indications
     • Agricultural Chemicals Expenditures
     • Farm Improvements and Construction
     • Farm Services*
     • Farm Supplies and Repairs
     • Feed Expenditures
     • Fertilizer, Lime and Soil Conditioner Expenditures
     • Fuels Expenditures
     • Interest
     • Labor Expenditures
     • Livestock, Poultry, and Related Expenses
     • Miscellaneous Capital Expenses
     • Other Farm Machinery Expenditures
     • Rent
     • Seeds and Plants
     • Taxes*
     • Total Expenditures*
     • Tractor and Self-Propelled Farm Machinery Expenditures
     • Trucks and Autos Expenditures
     * Variable contains imputed values

  20. Calibration Interaction
     • Components that make up GVSALES are imputed
       – e.g., P543 (landlord share of government payments)
     • As GVSALES changes, ECONCLS changes.
     • One of our calibration targets is ECONCLS
       – Movement between ECONCLS classes required us to re-calibrate.
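The calibration problem can be illustrated with a toy classification. In this sketch the class boundaries in `BOUNDS`, the function names, and the records are hypothetical, not the official ECONCLS cut points or ARMS data, and GVSALES is treated simply as a dollar total (presumably gross value of sales). When an imputed component pushes a record across a boundary, the weighted count in each class changes, so weights calibrated to class-level targets no longer hit those targets.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class boundaries in dollars; NOT the official ECONCLS cut points.
BOUNDS = [10_000, 100_000, 1_000_000]


def econ_class(gross_sales):
    """Class index: 0 below 10,000; 1 for 10,000 to 99,999; 2 for 100,000 to 999,999; 3 above."""
    return bisect_right(BOUNDS, gross_sales)


def class_shift(sales_before, sales_after, weights):
    """Weighted record count per class before and after imputation, plus the rows that moved."""
    before, after, movers = Counter(), Counter(), []
    for i, (b, a, w) in enumerate(zip(sales_before, sales_after, weights)):
        cb, ca = econ_class(b), econ_class(a)
        before[cb] += w
        after[ca] += w
        if cb != ca:
            movers.append(i)
    return before, after, movers


# Record 0 gains an imputed 8,000 and crosses the 100,000 boundary.
before, after, movers = class_shift(
    [95_000, 5_000, 2_000_000],
    [103_000, 5_500, 2_000_000],
    [10.0, 25.0, 2.0],
)
```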

  24. Other and Future
     • Checks for ill-conditioned matrices
     • Stress Test and Document I/O Functions
     • Tuning Other Parameters of the Program
     • Speed
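A check for ill-conditioned matrices could run on each regression design before its parameters are drawn. The helper below is a hypothetical sketch using `numpy.linalg.cond`; the threshold `max_cond=1e8` is an arbitrary illustration, not a value from the program. It checks X rather than X'X because forming X'X squares the condition number.

```python
import numpy as np


def check_design(X, max_cond=1e8):
    """Return (2-norm condition number of X, True if it is below the threshold)."""
    c = float(np.linalg.cond(X))
    return c, bool(np.isfinite(c) and c < max_cond)


rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-10 * rng.normal(size=100)   # nearly an exact copy of x1
X_good = np.column_stack([np.ones(100), x1, rng.normal(size=100)])
X_bad = np.column_stack([np.ones(100), x1, x2])
```

A design that fails the check signals near-collinear covariates; one option is to drop or combine them before fitting rather than inverting an unstable matrix.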

  25. Thank You! “. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.”
