
Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey



  1. Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey
     Michael E. Bellow, Heather Ridolfo
     National Agricultural Statistics Service, United States Department of Agriculture
     DC-AAPOR/WSS Summer Conference, Aug. 3, 2015, Washington, DC

  2. Outline
     • Background
     • Methods
     • Results
       – Descriptive Graphics
       – Logistic Regression
     • Summary and Implications

  3. Research Question
     What factors were most influential on the large discrepancies in reported acreage operated between the 2012 JAS and COA?

  4. Background
     • 2007 Classification Error Survey (CES)
       – Misclassification (farms classified as non-farms and vice versa)
       – Substantial acreage discrepancies between Census of Agriculture (COA) and June Agricultural Survey (JAS) for land-related variables (e.g., total acres operated)
     • Re-interviews conducted on 147 operations found that acreage discrepancies were due to:
       – Actual changes in acreage over period between JAS and COA
       – Reporting errors
       – Change in respondents
     • 2012: large acreage discrepancies found again

  5. Definition of Total Acres Operated
     Total Acres Operated = (Acres owned) + (Acres rented/leased from others) - (Acres rented/leased to others)
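
The definition above can be written as a one-line function. This is only an illustrative sketch: the function name, argument names, and the example operation are hypothetical and do not come from the JAS or COA questionnaires.

```python
def total_acres_operated(owned, rented_from_others, rented_to_others):
    """Total acres operated = owned + rented/leased in - rented/leased out."""
    return owned + rented_from_others - rented_to_others


# Hypothetical operation: owns 300 acres, rents 150 from others, rents 50 to others.
example_total = total_acres_operated(300, 150, 50)
print(example_total)
```

Land rented to others is subtracted because the operation owns it but does not operate it.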

  6. June Agricultural Survey (JAS)
     • Area-frame-based sample survey conducted annually in June
     • Sampling unit is segment (generally 1 square mile), divided into tracts
     • Data collected on U.S. crops, livestock, grain storage capacity, type and size of farm for tracts within sampled segments
     • Two-week data collection period (first half of the month)
     • Face-to-face interviewing

  7. Census of Agriculture (COA)
     • Complete enumeration of U.S. farms and ranches conducted every 5 years
     • Data collected on land use and ownership, operator characteristics, income, expenditures and farming practices for the previous year
     • Multiple frame (area and list)
     • Primarily mail survey

  8. Combined JAS/COA Data Set
     • JAS records matched to corresponding records in two COA datasets (unedited and edited)
     • Total number of matched records = 25,983
     • Some COA records were linked to multiple JAS records, each reporting data for the entire operation
     • Some JAS records were linked to multiple COA records (mainly ‘split’ operations)

  9. Adjusted Percent Difference (APD)
     APD = 100*(COA-JAS)/(COA+100)   (if COA > JAS)
         = 100*(JAS-COA)/(JAS+100)   (otherwise)
     Example:
     COA    JAS    %Diff    APD
     7      5      29       1.9
     700    500    29       25
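
A short sketch of the APD formula, checked against the two example rows on this slide. The slide does not define %Diff; the code assumes it is the absolute difference relative to the larger of the two values, which reproduces the 29 shown for both rows. Function names are illustrative.

```python
def percent_diff(coa, jas):
    """Plain percent difference, ASSUMED to be relative to the larger value."""
    larger = max(coa, jas)
    if larger == 0:
        return 0.0
    return 100.0 * abs(coa - jas) / larger


def adjusted_percent_diff(coa, jas):
    """APD as defined on the slide: 100 acres are added to the denominator."""
    if coa > jas:
        return 100.0 * (coa - jas) / (coa + 100)
    return 100.0 * (jas - coa) / (jas + 100)


for coa, jas in [(7, 5), (700, 500)]:
    print(coa, jas, round(percent_diff(coa, jas)), round(adjusted_percent_diff(coa, jas), 1))
```

The added 100 in the denominator damps the measure for small operations: a 2-acre gap on a 7-acre report and a 200-acre gap on a 700-acre report have the same plain percent difference, but only the latter yields a large APD.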

  10. Exploratory Data Analysis
      • Records for which APD of total acres operated is 25 or higher defined to be discrepant
      • 23% of operations (nationwide) identified as discrepant
      • Dependent variable in logistic regression is binary for acreage discrepancy (1 if discrepant, 0 otherwise)
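
The binary dependent variable can be derived from APD with a single threshold test. The sketch below uses the cutoff of 25 stated on the slide; the APD values are made up for illustration and are not survey data.

```python
APD_THRESHOLD = 25  # cutoff stated on the slide: APD of 25 or higher is discrepant


def discrepancy_flag(apd, threshold=APD_THRESHOLD):
    """Return 1 if the record is discrepant, 0 otherwise."""
    return 1 if apd >= threshold else 0


# Hypothetical APD values for five matched records (not real survey data).
apd_values = [1.9, 25.0, 40.2, 0.0, 24.9]
flags = [discrepancy_flag(a) for a in apd_values]
share_discrepant = 100.0 * sum(flags) / len(flags)
print(flags, share_discrepant)
```

Note that the comparison is inclusive, so a record with an APD of exactly 25 counts as discrepant.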

  11. Explanatory Variables
      • Farm type (crop vs livestock)
      • Land rented from others (acres)
      • Land rented to others (acres)
      • Number of operators
      • Operator tenure (years operating farm)
      • Average drought level during JAS (county level)
      • Mode of COA data collection (face-to-face, CATI, etc.)
      • Time between JAS and COA (days)

  12. Drought Intensity Data Set
      • Obtained from Univ. of Nebraska’s National Drought Mitigation Center (NDMC)
      • Drought Monitor Classification Scheme (DMCS)
        - six levels of drought ranging from ‘none’ to ‘exceptional’
        - recorded weekly at county level from May 29 to June 25, 2012
        - data sets give percent of county’s area classified to each drought level
        - overall county level average drought level computed from data
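
The slide does not say how the overall county average was computed. One plausible approach, sketched here purely as an assumption, is to score the six levels 0 through 5, take the area-weighted mean score for each week, and then average the weekly scores. The level labels follow the Drought Monitor's None and D0 to D4 categories; the scoring, the function names, and the sample percentages are hypothetical, and the input is assumed to give non-cumulative percentages per level.

```python
# Six Drought Monitor categories, from 'none' to 'exceptional'.
# ASSUMPTION: levels are scored 0..5 by their position in this list.
LEVELS = ["None", "D0", "D1", "D2", "D3", "D4"]


def weekly_drought_score(pct_by_level):
    """Area-weighted mean level for one county-week.

    pct_by_level maps a level label to the percent of county area in that level.
    """
    total = sum(pct_by_level.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive value")
    weighted = sum(LEVELS.index(level) * pct for level, pct in pct_by_level.items())
    return weighted / total


def average_drought_level(weekly_records):
    """Mean of the weekly scores over the data collection window."""
    scores = [weekly_drought_score(week) for week in weekly_records]
    return sum(scores) / len(scores)


# Hypothetical county observed for two weeks.
weeks = [
    {"None": 50.0, "D0": 50.0},
    {"None": 0.0, "D0": 50.0, "D1": 50.0},
]
county_average = average_drought_level(weeks)
print(county_average)
```

Dividing by the weekly total rather than by 100 keeps the score stable if the percentages do not sum exactly to 100 because of rounding.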

  13. Effect of Data Editing
      Data Set                     JAS/Unedited COA    JAS/Edited COA
      No. Records                  25,983              25,983
      Discrepant Records           6,601 (25.4%)       5,958 (22.9%)
      Discrepant Records Edited    1,351 (20.5%)       _
      Discrepancies Resolved       _                   745 (55%)
      Non-Discrepancies Broken     _                   102 (11%)
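
The counts on this slide can be reconciled with simple arithmetic: discrepancies resolved by editing leave the discrepant set and non-discrepancies broken by editing join it. The sketch below checks that reading. The denominators used for 20.5% and 55% are inferred from the numbers rather than stated on the slide, and the base for the 11% figure cannot be recovered from the slide, so it is not checked.

```python
total_records = 25983
discrepant_unedited = 6601        # JAS vs unedited COA
discrepant_edited = 5958          # JAS vs edited COA
discrepant_records_edited = 1351  # discrepant records that were edited
discrepancies_resolved = 745      # discrepant before editing, not after
non_discrepancies_broken = 102    # not discrepant before editing, discrepant after


def pct(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole


# Net effect of editing on the number of discrepant records.
net_after_editing = discrepant_unedited - discrepancies_resolved + non_discrepancies_broken

print(net_after_editing == discrepant_edited)
print(round(pct(discrepant_unedited, total_records), 1))
print(round(pct(discrepant_edited, total_records), 1))
# INFERRED denominators: share of discrepant records edited, share of those resolved.
print(round(pct(discrepant_records_edited, discrepant_unedited), 1))
print(round(pct(discrepancies_resolved, discrepant_records_edited)))
```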

  14. Preliminary Findings (From Exploratory Data Analysis)
      Independent Variable             More Discrepancy If
      Farm type (crop or livestock)    Livestock farm
      Number of operators              Multiple operators
      Operator tenure                  Newer operators
      Drought level during JAS         Higher drought level
      Mode of COA data collection      Phone/CATI
      Time between JAS and COA         Longer time

  15. Logistic Regression
      • Goal: model probability of discrepancy as function of independent (explanatory) variables
      • Wald Chi-Square Statistic: used to test whether regression parameter estimate for a given independent variable is significantly different from zero
      • Odds Ratio: measures strength of association between dependent variable and a given independent variable
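
For a single coefficient, both quantities follow from the estimate and its standard error. The sketch below uses the standard textbook formulas (Wald chi-square as the squared ratio of estimate to standard error with 1 degree of freedom, odds ratio as the exponentiated estimate). The slides do not show the software or settings actually used, and the coefficient and standard error in the example are hypothetical.

```python
import math


def wald_chi_square(beta, se):
    """Wald statistic for H0: beta = 0 (1 degree of freedom)."""
    return (beta / se) ** 2


def wald_p_value(chi_sq):
    """Upper-tail probability of a chi-square(1) variable.

    For 1 degree of freedom this equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_sq / 2.0))


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with an approximate 95% Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical estimate, not taken from the presentation.
beta, se = 0.10, 0.05
chi_sq = wald_chi_square(beta, se)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta, se)
print(chi_sq, wald_p_value(chi_sq), odds_ratio, ci_low, ci_high)
```

An odds ratio above 1 means the odds of a discrepancy rise with the variable; a confidence interval that excludes 1 corresponds to a Wald test that rejects a zero coefficient at roughly the 5% level.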

  16. Results of Logistic Regression
      Independent Variable    Wald Chi-Square    Wald P-Value    Odds Ratio    95% CI (Odds Ratio)
      Livestock Farm*         20.1               <.0001          1.148         [1.081-1.219]
      No. Operators           2.6                0.11            1.027         [0.994-1.06]
      Operator Tenure         2.19               0.14            1.002         [1.0-1.004]
      Avg. Drought Level      86.1               <.0001          1.137         [1.107-1.169]
      Mode = Phone/CATI*      6.9                0.009           1.198         [1.047-1.371]
      Mode = EDR (Web)*       0.31               0.58            0.973         [0.883-1.072]
      Mode = FTF/CAPI*        0.26               0.61            0.963         [0.832-1.114]
      Days (JAS to COA)       12.0               0.0005          1.001         [1.001-1.002]
      * binary variable
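
As a rough consistency check on the table, the Wald statistic can be approximately recovered from a reported odds ratio and its confidence interval, assuming the interval is a symmetric Wald interval on the log-odds scale (an assumption, since the slides do not say how it was built). Only rows whose odds ratios are far enough from 1 to survive three-decimal rounding are used; for rows such as Days (JAS to COA) the rounding error dominates.

```python
import math


def implied_wald_chi_square(odds_ratio, ci_low, ci_high, z=1.96):
    """Back out the Wald chi-square from an odds ratio and its 95% CI.

    ASSUMES the CI is exp(beta -/+ z * se), so se is the log-scale half-width over z.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return (beta / se) ** 2


# (odds ratio, CI low, CI high, reported Wald chi-square) copied from the table.
rows = {
    "Livestock Farm": (1.148, 1.081, 1.219, 20.1),
    "Avg. Drought Level": (1.137, 1.107, 1.169, 86.1),
    "Mode = Phone/CATI": (1.198, 1.047, 1.371, 6.9),
}

for name, (odds, low, high, reported) in rows.items():
    implied = implied_wald_chi_square(odds, low, high)
    print(name, round(implied, 1), reported)
```

The implied values land within a few percent of the reported statistics, which is about what the rounding of the published odds ratios allows.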

  17. Summary and Future Work
      • Six explanatory variables found to be significant in logistic regression based on Wald chi-square test
      • Of those variables, livestock farm, average drought level and phone/CATI showed most influence in terms of which farms have discrepancies and which do not
      • Next phase of research effort
        - explore explanatory variables further
        - probe largest outliers
        - investigate odd patterns (e.g. 60+ records with COA total land = 1, JAS total land > 100)
        - data mining techniques (classification trees, clustering)?

  18. Acknowledgments
      • Denise Abreu
      • Mark Apodaca
      • Mark Gorsak
      • Noemi Guindin
      • Thomas Jacob
      • Andrea Lamas
      • Jaki McCarthy

  19. Questions/Comments?
      Michael E. Bellow, USDA/NASS
      Sampling and Estimation Research Section
      Mike.Bellow@nass.usda.gov
      Heather Ridolfo, USDA/NASS
      Survey Methodology and Technology Section
      Heather.Ridolfo@nass.usda.gov
