data management

Data Management - PowerPoint PPT Presentation



  1. Data Management “Not everything that can be counted counts, and not everything that counts can be counted.” Attributed to Albert Einstein (physicist)

  2. Golden rules for data tables 1. A row represents a unit
     - All measurements of a unit should normally be in the same row.
     - Different units must be in different rows.
     - Important to think about what your units are.

  3. Golden rules for data tables 2. If in doubt, add more rows
     - If possible, use categorical (character) variables to indicate the independent effects (treatments, environments).
     - Repeated measurements (e.g. time series data) normally get individual rows (e.g. time is added as a column).
     - It is always easy to convert a long table to a wide table (Excel Pivot), but not vice versa.
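
The long-to-wide conversion mentioned above can be sketched in a few lines. This is a minimal illustration in Python (standard library only) rather than Excel or R; the table contents, the column names `plot`, `week` and `height_cm`, and the helper `long_to_wide` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical long table: one row per unit (plot) per measurement time.
long_rows = [
    {"plot": "A", "week": 1, "height_cm": 4.0},
    {"plot": "A", "week": 2, "height_cm": 7.5},
    {"plot": "B", "week": 1, "height_cm": 3.5},
    {"plot": "B", "week": 2, "height_cm": 6.0},
]

def long_to_wide(rows, id_col, key_col, value_col):
    """Pivot a long table to a wide one: one row per id, one column per key."""
    wide = defaultdict(dict)
    for row in rows:
        column = f"{value_col}_{key_col}{row[key_col]}"
        wide[row[id_col]][column] = row[value_col]
    return [{id_col: unit, **cols} for unit, cols in wide.items()]

wide_rows = long_to_wide(long_rows, "plot", "week", "height_cm")
for row in wide_rows:
    print(row)
```

Going the other way requires knowing which wide columns belong to the same variable and what each suffix means, which is why the long layout is the safer one to store.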

  4. Golden rules for data tables 3. Use strong IDs

  5. Weak IDs

  6. Strong IDs

  7. Golden rules for data tables 4. A column represents a variable
     - Each column is a different independent or dependent variable.
     - Every column has to have a name.
       • Don’t start names with symbols or numbers.
       • Avoid duplicate column names.
       • Avoid units in names; keep them as metadata.
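
The naming rules above are easy to check automatically. The following is a rough Python sketch, not part of the original slides; `check_column_names` and the example names are hypothetical, and the unit check is only a heuristic that looks for brackets.

```python
from collections import Counter

def check_column_names(names):
    """Return (name, problem) pairs for column names that break the rules."""
    problems = []
    counts = Counter(names)  # unique names, in first-seen order
    for name, count in counts.items():
        if not name:
            problems.append((name, "missing name"))
        elif not name[0].isalpha():
            problems.append((name, "starts with a symbol or number"))
        if count > 1:
            problems.append((name, "duplicate name"))
        # Heuristic only: brackets often hold a unit, e.g. "dbh(cm)".
        if "(" in name or "[" in name:
            problems.append((name, "possible unit in name"))
    return problems

header = ["plot_id", "2nd_height", "_mass", "height", "height", "dbh(cm)"]
for name, problem in check_column_names(header):
    print(f"{name!r}: {problem}")
```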

  8. Golden rules for data tables 5. Keep a metafile with information about your datafile
     - If possible, keep a record of how your data was collected:
       • latitude/longitude of sites, slope, aspect
       • who collected it
     - Keep a record of useful information:
       • what each of your variable names stands for
       • measurement units
       • resolution of spatial files
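
One possible shape for such a metafile is a small structured record kept next to the data. The Python sketch below is only an illustration: the file name, collector, coordinates and variable entries are placeholders, and `undocumented_columns` is a hypothetical helper that flags columns the metafile does not describe.

```python
import json

# Hypothetical metafile content; every value here is a placeholder.
metadata = {
    "datafile": "growth.csv",
    "collected_by": "J. Doe",
    "site": {"latitude": 53.5, "longitude": -113.5, "slope_deg": 5, "aspect": "N"},
    "variables": {
        "plot_id": {"meaning": "unique plot identifier", "unit": None},
        "height": {"meaning": "tree height", "unit": "cm"},
    },
}

def undocumented_columns(columns, meta):
    """List data columns that have no entry in the metafile."""
    return [c for c in columns if c not in meta["variables"]]

columns = ["plot_id", "height", "dbh"]
missing = undocumented_columns(columns, metadata)

print(json.dumps(metadata, indent=2))
print("Undocumented columns:", missing)
```

Keeping units here, rather than in the column names, leaves the data table itself clean.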

  9. Golden rules for data tables 6. Modify your raw data entries with R scripts
     - It is easy to change something and re-run the analysis (e.g. with or without outliers).
     - Hunting down and fixing errors is efficient, because the script leaves a perfect trail of what you did.
     - Save yourself from repetitive tasks (that are likely to introduce errors).
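
The idea of scripted corrections can be sketched as follows. The slides recommend R; this illustration uses Python only to keep the example self-contained, and the data, the `corrections` table, the `clean` helper and the 500 cm threshold are all hypothetical.

```python
import statistics

# Hypothetical raw entries; the raw list is never edited in place.
raw = [
    {"tree_id": "T1", "height_cm": 152.0},
    {"tree_id": "T2", "height_cm": 148.0},
    {"tree_id": "T3", "height_cm": 1500.0},  # suspected typo for 150.0
    {"tree_id": "T4", "height_cm": 900.0},   # suspected outlier
]

# Each correction is recorded in the script, so the change trail is explicit.
corrections = {"T3": {"height_cm": 150.0}}

def clean(rows, fixes, max_height_cm=None):
    """Return a corrected copy of rows; optionally drop rows above a threshold."""
    cleaned = []
    for row in rows:
        fixed = {**row, **fixes.get(row["tree_id"], {})}
        if max_height_cm is None or fixed["height_cm"] <= max_height_cm:
            cleaned.append(fixed)
    return cleaned

# Re-running the analysis with or without outliers is a one-argument change.
with_outliers = clean(raw, corrections)
without_outliers = clean(raw, corrections, max_height_cm=500.0)

mean_with = statistics.mean(r["height_cm"] for r in with_outliers)
mean_without = statistics.mean(r["height_cm"] for r in without_outliers)
print(mean_with, mean_without)
```

Because the raw entries stay untouched, any correction can be reviewed or reverted by editing the script and running it again.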

  10. The Data Table Concept. Type 1: Multiple populations (figure labels: Crop variety; Dependent variables; Sample of population that you want to learn something about)

  11. The Data Table Concept. Type 2: Single populations (figure labels: Independent variables; Dependent variable; You can think of this representing a population: crop grown without fertilizer)

  12. Variable/Data types
      • Nominal: qualitative measurement where categories or numbers ONLY label the object being measured or identify the object as belonging to a category.
        E.g.
        - Forest plots identified by 1-10 or by location
        - Qualitative categories: Low-Medium-High or Male/Female, etc.
        Don’t calculate statistics: how do you take a mean of male/female?
      • Ordinal: quantitative measurement that indicates a relative amount, arranged in rank order, but DOES NOT imply an equal distance between points.
        E.g.
        - Ranking of growth performance of 10 trees, where 1 is worst and 10 is best
        Percentiles or non-parametric statistics ONLY.
      • Interval: quantitative measurement that indicates BOTH the order of magnitude AND implies equal intervals between the measurements. NOTE: These measurements have ARBITRARY ZEROS.
        E.g.
        - Temperature (°C)
        All statistics allowed, but no multiplication or division (alternative: % change).
      • Ratio: quantitative measurement where numbers indicate a measure with EQUAL intervals and a TRUE ZERO.
        E.g.
        - Precipitation (156 mm)
        - Frequencies (counts of just about anything)
        All statistics allowed.
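
A short worked example shows why multiplication and division are not meaningful on an interval scale. This Python sketch uses two made-up temperatures; the only outside fact it relies on is the standard Celsius-to-Kelvin offset of 273.15.

```python
def celsius_to_kelvin(temp_c):
    """Convert from the Celsius scale (arbitrary zero) to Kelvin (true zero)."""
    return temp_c + 273.15

cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = warm_c - cold_c

# A ratio of Celsius values is not: it depends on where zero was placed.
naive_ratio = warm_c / cold_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius values suggest that 20 °C is "twice" 10 °C, while on the Kelvin scale, which has a true zero, the ratio is only about 1.035.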

  13. Variable/Data types
      • Discrete: values may only fall at particular points on the scale of measurement and cannot exist between points.
        E.g. number of trees, number of cones, etc.
      • Continuous: values can fall anywhere on an unbroken scale of measurements with real limits.
        E.g. temperature, height, volume of fertilizer, etc.

  14. Learning Objectives - Lab 2
      • Learn a complete set of commands to automate data preparation in R & SAS.
      • Work through some simplified examples to understand how they can be applied.
      • Try to apply scripts to your own data.
      • If you run into problems with your own data: let’s solve them together.
