Understanding data & statistical terminology
“You can have data without information, but you cannot have information without data.” Daniel Keys Moran (computer programmer and science fiction author)
Trembling aspen (Populus tremuloides): Western North America distribution
[Figure: distribution map marking the central location of the aspen population; n = 100 samples taken at each. Axis labels: Aspen frequency, % Forest Cover]
Trembling aspen (Populus tremuloides): Western North America distribution
Can the data we collect in AB realistically be used to make inferences about aspen in Colorado? So what is our population?
Trembling aspen (Populus tremuloides): Western North America distribution
What if all the samples came from northern mixedwood ecosystems north of High Level, AB? What can we realistically make inferences about?
Example 1: Lentil dataset (your new best friend)
[Figure: plot layout at Farm 1 and Farm 2. Labels: Plot 1, variety in each plot (A, B, C), individual lentil plants]
Do the yields of the different lentil varieties differ between the 2 farms? Do the varieties differ among themselves?
Golden rules for data tables
1. A row represents a unit
– All measurements of a unit should normally be in the same row.
– Different units must be in different rows.
– It is important to think about what your units are.
Golden rules for data tables
2. If in doubt, add more rows
– If possible, use categorical (character) variables to indicate the independent effects (treatments, environments).
– Repeated measurements (e.g. time series data) normally get individual rows (e.g. time is added as a column).
– It is always easy to convert a long table to a wide table (e.g. with an Excel PivotTable), but not vice versa.
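The long-to-wide conversion mentioned in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`plant`, `variety`, `week`, `height_cm`), all values, and the helper `long_to_wide` are invented for this example and do not come from the lentil dataset.

```python
# Hypothetical long-format table: one row per measurement of one unit.
# All names and values are made up for illustration.
long_rows = [
    {"plant": "P1", "variety": "A", "week": 1, "height_cm": 4.0},
    {"plant": "P1", "variety": "A", "week": 2, "height_cm": 7.5},
    {"plant": "P2", "variety": "B", "week": 1, "height_cm": 3.5},
    {"plant": "P2", "variety": "B", "week": 2, "height_cm": 6.0},
]


def long_to_wide(rows, id_cols, key_col, value_col):
    """Pivot long rows into one wide row per unique combination of id_cols."""
    wide = {}
    for row in rows:
        unit = tuple(row[c] for c in id_cols)
        # Start the wide row with the identifying columns, then add one
        # column per distinct value of key_col (here: one column per week).
        record = wide.setdefault(unit, {c: row[c] for c in id_cols})
        record[f"{value_col}_{key_col}{row[key_col]}"] = row[value_col]
    return list(wide.values())


wide_rows = long_to_wide(long_rows, ["plant", "variety"], "week", "height_cm")
for r in wide_rows:
    print(r)
```

In the long table the time point is an ordinary column, so adding a third week only adds rows. In the wide table the same change adds a new column, which is one reason the long layout is the safer default.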
Example 2: Animal tracks
[Figure: transects recording animal tracks in conifer-dominated and deciduous-dominated forest stands]
Is there a difference in the use of forest corridors in different stand types by ungulates?
Other useful statistical terms
• Experiment – any controlled process of study which results in data collection, and for which the outcome is unknown
• Descriptive statistics – numerical or graphical summary of data
• Inferential statistics – methods used to draw conclusions from data, e.g. to predict or control the values of variables
• Statistical inference – making use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken
• Parameter – an unknown value (needs to be estimated) used to represent a population characteristic (e.g. population mean)
• Statistic – an estimate of a parameter, calculated from a sample (e.g. mean of a sample)
• Probability distribution (for a continuous variable, described by a probability density function) – the probability associated with each possible value of a variable. A sampling distribution is the probability distribution of a statistic over repeated samples.
• Error – difference between an observed (or calculated) value and its true (or expected) value
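The relationship between parameter, statistic and error can be made concrete with a small simulation. The sketch below is an illustration only: the population is simulated (a mean of 50 and a standard deviation of 10 are arbitrary assumptions), so that the parameter is known and can be compared with its estimates. In a real study the parameter would remain unknown.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 simulated values (assumed, not real data).
population = [random.gauss(50.0, 10.0) for _ in range(10_000)]
parameter = statistics.mean(population)   # population mean (the parameter)

sample = random.sample(population, 100)   # one sample of n = 100
statistic = statistics.mean(sample)       # sample mean (the statistic)
error = statistic - parameter             # error of this one estimate

# Sampling distribution of the mean: repeat the sampling many times and
# collect the statistic from each sample.
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(1_000)
]
spread = statistics.stdev(sample_means)

print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={error:.2f}")
print(f"standard deviation of 1,000 sample means: {spread:.2f}")
```

Each sample gives a slightly different statistic. The spread of those statistics across repeated samples is what the sampling distribution describes, and it is much narrower than the spread of the individual values in the population.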