statistical methods for plant biology

Statistical Methods for Plant Biology PBIO 3150/5150, Anirudh V. S. Ruhil



  1. Statistical Methods for Plant Biology, PBIO 3150/5150
     Anirudh V. S. Ruhil, January 14, 2016
     The Voinovich School of Leadership and Public Affairs

  2. Table of Contents
     1 Visualizing Data
     2 Displaying Frequency Distributions
     3 Associations Between Categorical Variables
     4 Comparing Numerical Variables
     5 Principles of Effective Displays

  3. Visualizing Data

  4. Minard’s Map

  5. Super Storm & NYC?

  6. Displaying Frequency Distributions

  7. Frequency Tables: Categorical Data
     Table 1: Frequencies; Table 2: Relative Frequencies (shown side by side)

     Cause                          No. deaths   Relative frequency
     Accidents                            6688                 0.49
     Homicide                             2093                 0.15
     Suicide                              1615                 0.12
     Malignant tumor                       745                 0.05
     Heart disease                         463                 0.03
     Congenital abnormalities              222                 0.02
     Chronic respiratory disease           107                 0.01
     Influenza and pneumonia                73                 0.01
     Cerebrovascular diseases               67                 0.00
     Other tumor                            52                 0.00
     All other causes                     1653                 0.12
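
The relative frequencies in Table 2 are each count in Table 1 divided by the total number of deaths. A minimal Python sketch of that arithmetic follows; the counts are copied from the slide, while the names `deaths` and `relative_frequencies` are illustrative and not part of the original course material.

```python
# Counts copied from Table 1 on the slide (cause of death -> number of deaths).
deaths = {
    "Accidents": 6688,
    "Homicide": 2093,
    "Suicide": 1615,
    "Malignant tumor": 745,
    "Heart disease": 463,
    "Congenital abnormalities": 222,
    "Chronic respiratory disease": 107,
    "Influenza and pneumonia": 73,
    "Cerebrovascular diseases": 67,
    "Other tumor": 52,
    "All other causes": 1653,
}


def relative_frequencies(counts):
    """Divide each count by the grand total of all counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


rel = relative_frequencies(deaths)

# Print cause, frequency, and relative frequency rounded to two decimals.
for cause, share in rel.items():
    print(f"{cause:<30}{deaths[cause]:>7}{share:>8.2f}")
```

Rounded to two decimals, the smallest categories show as 0.00 even though their counts are not zero, which is why the slide lists both tables.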

  8. Bar Graphs: Categorical Data
     Data on humans killed by tigers near Chitwan National Park (Nepal), by the activity the victim was engaged in

     Activity                  Frequency
     Grass/fodder                     44
     Forest products                  11
     Fishing                           8
     Herding                           7
     Disturbing tiger kill             5
     Fuelwood/timber                   5
     Sleeping in house                 3
     Walking                           3
     Toilet                            2
     Sum                              88
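
To make the link between the frequency table and the bar graph concrete, here is a small sketch that draws a text bar chart from the tiger data. The counts come from the slide; the helper `text_bar_chart` and the choice of one `#` per death are assumptions made for illustration, not the plotting method used in the course.

```python
# Frequencies copied from the slide (activity -> number of people killed).
activities = {
    "Grass/fodder": 44,
    "Forest products": 11,
    "Fishing": 8,
    "Herding": 7,
    "Disturbing tiger kill": 5,
    "Fuelwood/timber": 5,
    "Sleeping in house": 3,
    "Walking": 3,
    "Toilet": 2,
}


def text_bar_chart(counts, symbol="#"):
    """Return one text line per category, sorted by descending frequency.

    Every bar starts at zero and uses one symbol per observation, so bar
    length is proportional to frequency.
    """
    width = max(len(label) for label in counts)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in ordered]


chart = text_bar_chart(activities)
print("\n".join(chart))
```

Sorting the categories by frequency is a common convention for nominal data because the categories have no natural order of their own.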

  9. Histograms: Numerical Data
     Data from a survey of breeding birds of Organ Pipe Cactus National Monument in southern Arizona

  10. Tarsus lengths (in mm) of Wrens

      Group         Freq   Relative Freq   Cumulative Freq   Relative Cumulative Freq
      [16.5,17)        1            0.03                 1                       0.03
      [17,17.5)        1            0.03                 2                       0.06
      [17.5,18)       10            0.29                12                       0.35
      [18,18.5)       13            0.38                25                       0.74
      [18.5,19)        6            0.18                31                       0.91
      [19,19.5)        3            0.09                34                       1.00

      • [ and ) indicate “left-closed” and “right-open”, i.e.,
        • include 16.5 and everything up to but not including 17 in the first group
        • include 17 in the second group and then everything above 17 up to but not including 17.5
        • ... and so on
      • Relative Freq is the frequency divided by the total number of units in the sample, e.g., 1/34 = 0.03; 10/34 = 0.29
      • Cumulative Freq for a group is the group’s frequency + all preceding frequencies
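
The columns of the grouped table can be derived from the group frequencies alone. The sketch below does this in Python and also shows how a left-closed, right-open bin is chosen for a single value. The frequencies and bin edges are from the slide; the raw wren measurements are not in the slides, so `bin_index` is only exercised on example values, and all names are illustrative.

```python
from itertools import accumulate

# Bin edges and group frequencies from the slide:
# [16.5,17), [17,17.5), [17.5,18), [18,18.5), [18.5,19), [19,19.5)
edges = [16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5]
freq = [1, 1, 10, 13, 6, 3]

n = sum(freq)                              # total number of wrens
rel_freq = [f / n for f in freq]           # frequency / sample size
cum_freq = list(accumulate(freq))          # running total of frequencies
cum_rel_freq = [c / n for c in cum_freq]   # running total / sample size


def bin_index(x, edges):
    """Index of the left-closed, right-open bin [edges[i], edges[i+1]) holding x."""
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    raise ValueError(f"{x} lies outside [{edges[0]}, {edges[-1]})")


for i, f in enumerate(freq):
    label = f"[{edges[i]},{edges[i + 1]})"
    print(f"{label:<12}{f:>4}{rel_freq[i]:>7.2f}{cum_freq[i]:>5}{cum_rel_freq[i]:>7.2f}")
```

With these edges, a tarsus length of exactly 17.0 mm falls in the second group (`bin_index(17.0, edges)` returns 1), matching the rule stated on the slide.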

  11. Plotting Tarsus Lengths
      [Figure: Tarsus length (in mm) of Wrens. Left panel: histogram of Frequency against Tarsus Length (in mm). Right panel: Cumulative Frequencies against Tarsus Length.]

  12. Describing the Shape of a Histogram

  13. Key Points About Histograms
      • Histograms vary ...
        1 Symmetric (cells split symmetrically)
        2 Skewed Left (easy exam so most score high, only a few low scores)
        3 Skewed Right (tough exam so most score low, only a few high scores)
        4 Uniform (Penguins)
        5 Bimodal (interval between geyser eruptions, drug inactivity in humans)
      • Watch your bin width ... alters the shape of the Histogram
        1 Some pre-set rules:
          • Sturges: $h = 1 + \frac{\ln(n)}{\ln(2)}$; then round up to nearest integer (note that this rule gives the number of bins, not a bin width)
          • Freedman–Diaconis: $h = \frac{2\,\mathrm{IQR}}{n^{1/3}}$
          • Scott: $h = \frac{3.5\,\sigma}{n^{1/3}}$
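
The three pre-set rules can be written out directly. The following is a sketch using only the Python standard library, under a few stated assumptions: the sample standard deviation stands in for σ, the quartiles come from `statistics.quantiles` with its default method (other software may compute the IQR slightly differently), and `toy_sample` is a made-up list, not the wren data.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges: number of bins, 1 + ln(n)/ln(2), rounded up."""
    return math.ceil(1 + math.log2(n))


def freedman_diaconis_width(data):
    """Freedman-Diaconis: bin width 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


def scott_width(data):
    """Scott: bin width 3.5 * s / n^(1/3), s = sample standard deviation (assumed for sigma)."""
    return 3.5 * statistics.stdev(data) / len(data) ** (1 / 3)


# Made-up toy sample for illustration only; it is not the wren data.
toy_sample = [1, 2, 3, 4, 5, 6, 7, 8]

print(sturges_bins(34))                     # n = 34, the wren sample size
print(freedman_diaconis_width(toy_sample))
print(scott_width(toy_sample))
```

For n = 34 the Sturges count works out to 7 bins. The Freedman–Diaconis rule relies on the IQR and so is less sensitive to outliers than Scott's rule, which relies on the standard deviation.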

  14. Associations Between Categorical Variables

  15. Associations Between Categorical Variables
      • Many ways to evaluate how two or more categorical variables are related
      • Easiest method is a contingency table
      • Note: Columns = Explanatory variable; Rows = Outcome of interest (i.e., Response variable)
      • Does reproduction make the wild great tit (Parus major) more susceptible to malaria? See the table below

                        Experimental Treatment Group
                        Control   Egg-Removal   Row Total
      Malaria                 7            15          22
      No Malaria             28            15          43
      Column Total           35            30          65
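
One way to read the contingency table is to compare the proportion of birds with malaria within each treatment group, that is, each cell divided by its column total. The sketch below assumes that reading; the counts are from the slide and the function name `column_proportions` is illustrative.

```python
# Counts from the slide: outer keys are the treatment groups (columns),
# inner keys are the outcomes (rows).
table = {
    "Control": {"Malaria": 7, "No Malaria": 28},
    "Egg-Removal": {"Malaria": 15, "No Malaria": 15},
}


def column_proportions(table):
    """For each treatment group, divide each outcome count by the group total."""
    result = {}
    for group, outcomes in table.items():
        total = sum(outcomes.values())
        result[group] = {outcome: n / total for outcome, n in outcomes.items()}
    return result


props = column_proportions(table)

for group, outcomes in props.items():
    print(f"{group:<12} malaria proportion = {outcomes['Malaria']:.2f}")
```

In these data, 7 of 35 control birds (0.20) and 15 of 30 egg-removal birds (0.50) had malaria.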

  16. Grouped Bar Graphs
      Definition: Grouped bar graphs show the frequency of all combinations of two or more categorical variables

  17. Mosaic Plot
      Definition: Mosaic plots use the area of rectangles to display the relative frequency of occurrence of all combinations of two or more categorical variables

                        Experimental Treatment Group
                        Control   Egg-Removal   Row Total
      Malaria                 7            15          22
      No Malaria             28            15          43
      Column Total           35            30          65
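
The definition can be checked numerically: if each column's width is its share of all birds and each tile's height is the outcome's share within that column, then the tile area equals the relative frequency of that combination. The sketch below computes the tile geometry on a unit square under that common construction; it is an illustration, not the plotting routine used for the slide.

```python
# Counts from the slide: treatment group -> outcome -> number of birds.
counts = {
    "Control": {"Malaria": 7, "No Malaria": 28},
    "Egg-Removal": {"Malaria": 15, "No Malaria": 15},
}


def mosaic_tiles(counts):
    """Return (group, outcome, width, height) for every tile of a unit-square mosaic."""
    grand_total = sum(sum(col.values()) for col in counts.values())
    tiles = []
    for group, col in counts.items():
        col_total = sum(col.values())
        width = col_total / grand_total      # share of all birds in this group
        for outcome, n in col.items():
            height = n / col_total           # share of the group with this outcome
            tiles.append((group, outcome, width, height))
    return tiles


tiles = mosaic_tiles(counts)

for group, outcome, w, h in tiles:
    print(f"{group:<12}{outcome:<11} width={w:.3f} height={h:.3f} area={w * h:.3f}")
```

The areas sum to 1, and the Control/Malaria tile has area 7/65, its relative frequency in the whole sample.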

  18. Comparing Numerical Variables

  19. Comparing Histograms across Groups
      • Do indigenous peoples who live at high altitudes have physiological attributes that compensate for oxygen deprivation?
      • Beall et al. (2002) shed some light; USA (sea-level) versus three high-altitude populations
      • Andean males have higher concentrations of hemoglobin, but Tibetan and Ethiopian males do not (compared to American males)
      [Figure: four stacked histograms of Relative frequency against Hemoglobin concentration (g/dl); panels: USA 0 m, Tibet 4000 m, Ethiopia 3530 m, Andes 4000 m]

  20. Comparing Cumulative Frequencies across Groups
      [Figure: Cumulative relative frequency against Hemoglobin concentration (g/dl); curves: U.S. 0 m, Ethiopia 3530 m, Tibet 4000 m, Andes 4000 m]

  21. Displaying Relationships between Numerical Pairs
      • What explains bright colors and elaborate courtship displays of the males of many species?
      • Brooks (2000) gives us some clues
      • Explored how fathers’ ornamentation (a composite index of color & brightness) is related to sons’ attractiveness (rate of female visits to corralled males, relative to a standard)
      • Presumably females are attracted to more ornamented males
      [Figure: scatter plot of Son’s attractiveness against Father’s ornamentation]

  22. Line Graphs
      Definition: Line graphs connect observations ordered over time (or some other ordered dimension)
      • Lynx pelts turned in at fur trading posts in Canada (1752–1819)
      • Line graph shows patterns over time
      • Note a cyclical pattern of peaks and troughs
      • Note also the steep slopes
      • Useful for multiple time series so long as it isn’t too cluttered
      [Figure: line graph of Lynx fur returns (0 to 8000) against Year]

  23. Maps
      • Ozone concentrations on October 6, 1987 over the Southern Hemisphere
      • Center is the South Pole, outer edge is 15 degrees south of the equator
      • Heat Map shows varying levels of Ozone concentrations (note the “hole” above the South Pole)
      • Note: Maps can also be a graphic with a heatmap; see here for Brain mapping project
      [Figure: polar map of Total ozone (DU) over the Southern Hemisphere, with a color scale from 100 to 500 DU]

  24. Mapping the Path of Super Typhoon Yolanda (Haiyan)
      Source: Analysis with Programming

  25. This map shows carbon emissions from the consumption of goods, with red marking high rates of emissions and green marking low.
      Source: City Carbon Footprint

  26. Principles of Effective Displays

  27. Making Effective Displays
      • Show as much data as you can
        • Plot 1 (left) hints at a curvilinear link between Africanized honeybees and stingless bees
        • Adding the actual data points shows more details
      • Do not distort magnitudes. y-axis must start at 0
      • Minimize chartjunk ... e.g., three-dimensional bars, shadow effects, etc.
      • Avoid jargon for non-technical audience
      • Data graphic ≠ work of art; must be informative
      [Figure: two plots of Number of stingless bees against Number of Africanized honeybees, without and with the individual data points]
      [Figure: two plots of Education spending ($ per student) from 98/99 to 04/05, one with the y-axis running from $5800 to $7000 and bars labeled $5844 to $6748, one with the y-axis running from 0 to 8000]
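
The point about distorted magnitudes can be quantified by comparing the heights at which two bars are drawn under different axis baselines. In this sketch the two spending values are the largest and smallest labels visible on the slide; the baseline of $5800 is assumed from the lowest tick on the truncated plot, and `bar_height_ratio` is an illustrative helper.

```python
def bar_height_ratio(high, low, baseline=0.0):
    """Ratio of the two drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= low:
        raise ValueError("baseline must lie below the smaller value")
    return (high - baseline) / (low - baseline)


# Largest and smallest spending labels on the slide ($ per student).
high, low = 6748, 5844

honest = bar_height_ratio(high, low)                     # y-axis starts at 0
truncated = bar_height_ratio(high, low, baseline=5800)   # assumed truncated axis

print(f"axis from 0:     tallest bar is {honest:.2f}x the shortest")
print(f"axis from $5800: tallest bar is {truncated:.2f}x the shortest")
```

The underlying values differ by roughly 15 percent, yet on the truncated axis the tallest bar is drawn more than twenty times as tall as the shortest.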

  28. • Too much data or too complex a plot can defeat the purpose of visualizations
      • See this map, which simultaneously plots linguistic richness and diversity of bird species
      • You could improve this display with a better color scheme, maybe some labeling of Low-Low, Low-High, High-Low, and High-High blocks
      • Avoid red-green colors; roughly 8% of males (about 1 in 12 among those of Northern European ancestry) cannot distinguish between shades of these colors
      • Better yet, if you can avoid colors altogether, do so
      [Figure: map with a two-way color legend, Species richness by Language richness; caption: WTF Data Visualizations]

  29. Selecting your graphic
      1 Nominal or Ordinal variable(s)
        • Frequency Table
        • Bar-chart
        • Mosaic plot
      2 Continuous or Discrete variable(s)
        • Grouped Frequency table
        • Grouped Histogram
        • Line graph
        • Scatter plot
        • Box-plot (coming soon)
        • Ogive curves (coming soon)
        • Strip charts (coming soon)
        • Violin plots (coming soon)
