
Univariate Graphics (STAT 133, Gaston Sanchez)



  1. Univariate Graphics. STAT 133, Gaston Sanchez, Department of Statistics, UC Berkeley. gastonsanchez.com, github.com/gastonstat/stat133. Course web: gastonsanchez.com/stat133

  2. Looking at one single variable

  3. Univariate Statistical Graphics: Getting started with graphics for exploration requires understanding charts and plots for single variables.

  4. Univariate Statistical Graphics: Getting started with graphics for exploration requires understanding charts and plots for single variables. Many multivariate graphics are extensions or combinations of univariate charts.

  5. Univariate graphics by type of variable
     Qualitative variable:
     ◮ Bar chart
     ◮ Dot chart
     ◮ Pie chart
     ◮ Pareto chart
     Quantitative variable:
     ◮ All of the charts for qualitative variables
     ◮ Histogram
     ◮ Density curve
     ◮ Boxplot
     ◮ Ogive

  6. Bar Charts

  7. From Frequency Tables ...
     Category   Absolute Frequency   Relative Frequency
     C_1        f_1                  f_1/n
     C_2        f_2                  f_2/n
     C_3        f_3                  f_3/n
     ...        ...                  ...
     C_k        f_k                  f_k/n
     Total      n                    1

  8. ... to Bar-charts [Figure: bar chart with categories C1 to C5 on the horizontal axis and bar heights f1 to f5 on a vertical axis labeled "frequency"]

  9. Bar-charts: elements of vertical bar-charts
     ◮ Categories on the horizontal axis
     ◮ Frequencies on the vertical axis
     ◮ Length of each bar equal to the frequency of its category
     (Note that you can also make a horizontal bar-chart, in which case the axes swap roles.)

  10. Bar-chart: predominant color in flags

  11. Predominant Color in Flags
      ##    color count percent
      ## 1  black     5    2.58
      ## 2   blue    40   20.62
      ## 3  brown     2    1.03
      ## 4   gold    19    9.79
      ## 5  green    31   15.98
      ## 6 orange     4    2.06
      ## 7    red    71   36.60
      ## 8  white    22   11.34
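The percent column in the flag table is each count divided by the total number of flags. A minimal Python sketch of that calculation, using the counts from the slide; the function name `frequency_table` is made up for this illustration and is not part of the course material:

```python
# Counts of the predominant flag color, copied from the slide's table.
counts = {
    "black": 5, "blue": 40, "brown": 2, "gold": 19,
    "green": 31, "orange": 4, "red": 71, "white": 22,
}

def frequency_table(counts):
    """Return rows of (category, absolute frequency, percent of total)."""
    n = sum(counts.values())  # total number of observations
    return [(cat, f, round(100 * f / n, 2)) for cat, f in counts.items()]

table = frequency_table(counts)
for cat, f, pct in table:
    print(f"{cat:<7}{f:>4}{pct:>7.2f}")
```

The same rows feed both the count-based and the percentage-based bar charts; only the vertical axis changes.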

  12. Bar-chart example [Figure: vertical bar chart of flag counts by predominant color, bars in alphabetical order (black, blue, brown, gold, green, orange, red, white), each labeled with its count; vertical axis from 0 to 70]

  13. Bar-chart: predominant color in flags [Figure: the same bar chart with percentages instead of counts; bars in alphabetical order; vertical axis from 0% to 40%]

  14. Bar-chart: predominant color in flags [Figure: bar chart of counts with bars in increasing order (brown, orange, black, gold, white, green, blue, red)]

  15. Bar-chart: predominant color in flags [Figure: bar chart of counts with bars in decreasing order (red, blue, green, white, gold, black, orange, brown)]

  16. Bar-chart: predominant color in flags [Figure: horizontal bar chart of counts, from brown (2) at the top to red (71) at the bottom; horizontal axis from 0 to 70]

  17. Dot charts

  18. Dot charts
      ◮ Dot-charts are very similar to bar charts.
      ◮ Instead of using bars, dot-charts display frequencies with dots.
      ◮ They are simpler and cleaner than bar charts.
      ◮ They are also used less often than bar charts.

  19. Dot-chart: predominant color in flags [Figure: dot chart of counts with one row per color (white, red, orange, green, gold, brown, blue, black); horizontal axis from 0 to 80]

  20. Ranked Dot-charts [Figure: dot chart of counts with rows ranked from red down to brown (red, blue, green, white, gold, black, orange, brown); horizontal axis from 0 to 80]

  21. Ranked dot-chart patterns
      ◮ All values roughly the same
      ◮ Differences decrease by roughly the same amount
      ◮ Differences from one value to the next vary significantly
      ◮ Differences from one value to the next increase

  22. Ranked dot-chart patterns
      ◮ Differences from one value to the next decrease
      ◮ Shifting differences from one value to the next
      ◮ One or more values are extraordinarily different from the rest
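The patterns in a ranked dot-chart are read from the gaps between consecutive ranked values. A small Python sketch, assuming the flag-color counts from the earlier table; the variable names `ranked`, `values` and `gaps` are illustrative only:

```python
# Counts of the predominant flag color, copied from the slide's table.
counts = {
    "black": 5, "blue": 40, "brown": 2, "gold": 19,
    "green": 31, "orange": 4, "red": 71, "white": 22,
}

# Rank categories from the largest count to the smallest.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
values = [f for _, f in ranked]

# Difference between each value and the next one in the ranking.
gaps = [a - b for a, b in zip(values, values[1:])]

for (cat, f), gap in zip(ranked, gaps + [None]):
    print(f"{cat:<7}{f:>4}  gap to next: {gap}")
```

For the flag data the first gap (red to blue) is by far the largest, which corresponds to the "one or more values are extraordinarily different from the rest" pattern.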

  23. Pareto charts

  24. Bar-chart with Pareto Line [Figure: bar chart of percentages with bars in decreasing order (red, blue, green, white, gold, black, orange, brown); vertical axis from 0% to 100%]

  25. Bar-chart with Pareto Line [Figure: the same bar chart with a cumulative line labeled 36.6%, 57.22%, 73.2%, 84.54%, 94.33%, 96.91%, 98.97%, 100%]

  26. Bar-chart with Pareto Line [Figure: the same bar chart and cumulative line as on the previous slide]

  27. Pareto charts
      ◮ Pareto charts contain both bars and a line graph
      ◮ Individual values are represented by bars, in descending order
      ◮ Cumulative frequencies are represented by the line
      ◮ The left vertical axis is the frequency of occurrence
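The Pareto line is a running total of the ranked frequencies, expressed as a percentage of the overall total. A minimal Python sketch using the flag counts from the earlier table (variable names are illustrative, not from the slides):

```python
from itertools import accumulate

# Counts of the predominant flag color, copied from the slide's table.
counts = {
    "black": 5, "blue": 40, "brown": 2, "gold": 19,
    "green": 31, "orange": 4, "red": 71, "white": 22,
}

# Bars: categories in descending order of frequency.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
n = sum(counts.values())

# Line: cumulative frequency after each bar, as a percentage of the total.
cumulative = list(accumulate(f for _, f in ranked))
cum_pct = [round(100 * c / n, 2) for c in cumulative]

for (cat, f), pct in zip(ranked, cum_pct):
    print(f"{cat:<7}{f:>4}{pct:>8.2f}%")
```

The resulting percentages match the labels on the cumulative line in the slides (36.6%, 57.22%, and so on up to 100%).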

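  28. Pie charts

  29. Pie Chart [Figure: pie chart of predominant flag color with slices labeled brown, gold, blue, green, black, orange, white, red]

  30. Donut Chart [Figure: donut chart of the same data with slices labeled brown, gold, blue, green, black, orange, white, red]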

  31. Disadvantages of pie charts
      ◮ Pie charts force us to compare either the 2-D areas of the slices or the angles they form
      ◮ Visual perception handles neither of these comparisons easily or accurately

  32. Univariate Quantitative Charts

  33. NFL Ticket prices (2013)
      ##       teams tickets      teams tickets
      ## 1   cowboys  110.20    falcons   83.71
      ## 2  patriots  117.84    vikings   78.69
      ## 3    giants  111.69       rams   74.49
      ## 4     bears  103.60   seahawks   71.21
      ## 5      jets  110.28  cardinals   79.56
      ## 6  redskins   94.80   dolphins   71.14
      ## 7    ravens  100.19    raiders   64.80
      ## 8    eagles   93.01     titans   65.28
      ## 9    texans   88.98      lions   67.60
      ## 10 chargers   84.55    bengals   68.96
      ## 11 steelers   81.13    jaguars   68.44
      ## 12  packers   82.61     chiefs   64.92
      ## 13    49ers   83.54 buccaneers   63.59
      ## 14   saints   74.99      bills   57.75
      ## 15  broncos   84.27   panthers   66.84
      ## 16    colts   86.32     browns   54.20

  34. Bar charts for quantitative variables
      ◮ We can also use bar charts with quantitative variables
      ◮ In this case we first need to categorize the variable, and then build a frequency table

  35. Frequency Table of Ticket Prices
      Category Name    Absolute Frequency   Relative Frequency
      Below $70        10                   0.3125
      $70 - $99.99     16                   0.5000
      $100 or above     6                   0.1875
      Total            32                   1.00
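The frequency table of ticket prices comes from assigning each price to a category and then counting. A short Python sketch under that reading; the helper `price_category` is a hypothetical name, and the cut points ($70 and $100) are taken from the table's category labels:

```python
from collections import Counter

# Ticket prices (2013) for the 32 teams, copied from the slide's table.
prices = [
    110.20, 117.84, 111.69, 103.60, 110.28, 94.80, 100.19, 93.01,
    88.98, 84.55, 81.13, 82.61, 83.54, 74.99, 84.27, 86.32,
    83.71, 78.69, 74.49, 71.21, 79.56, 71.14, 64.80, 65.28,
    67.60, 68.96, 68.44, 64.92, 63.59, 57.75, 66.84, 54.20,
]

def price_category(price):
    """Map a price to one of the three categories used in the table."""
    if price < 70:
        return "Below $70"
    if price < 100:
        return "$70 - $99.99"
    return "$100 or above"

absolute = Counter(price_category(p) for p in prices)
n = len(prices)
relative = {cat: f / n for cat, f in absolute.items()}

for cat in ("Below $70", "$70 - $99.99", "$100 or above"):
    print(f"{cat:<15}{absolute[cat]:>4}{relative[cat]:>9.4f}")
```

Once the variable is categorized this way, the bar chart is built exactly as for a qualitative variable.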

  36. NFL Ticket prices (2013) [Figure: bar chart of absolute frequency for the categories "below $70", "$70 - $99.99" and "$100 or above"; vertical axis from 0 to 16]

  37. Histograms

  38. Histograms: Histograms provide a way of viewing the general distribution of values in a quantitative variable.

  39. NFL Ticket prices (2013) [Figure: histogram of ticket prices; horizontal axis: price, from 50 to 120 in steps of 10; vertical axis: frequency, from 0 to 8]

  40. Building a Histogram
      1. Partition of values: The range of the data values is partitioned into a number of non-overlapping "cells" or bins.
      2. Counting frequencies: The number of data values falling into each cell is counted (either absolute or relative frequencies).
      3. Drawing bars: The observations falling into a cell are represented as a "bar" drawn over the cell.
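The three steps can be sketched in a few lines of Python. This is only an illustration: the function `histogram_counts` is a made-up helper, the bins are assumed to be equal-width and closed on the left (R's `hist()` closes bins on the right by default, which makes no difference here because no price sits exactly on a bin edge), and text bars stand in for a real plot.

```python
# Ticket prices (2013) for the 32 teams, copied from the slide's table.
prices = [
    110.20, 117.84, 111.69, 103.60, 110.28, 94.80, 100.19, 93.01,
    88.98, 84.55, 81.13, 82.61, 83.54, 74.99, 84.27, 86.32,
    83.71, 78.69, 74.49, 71.21, 79.56, 71.14, 64.80, 65.28,
    67.60, 68.96, 68.44, 64.92, 63.59, 57.75, 66.84, 54.20,
]

def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # step 1: find the bin that contains v
        if not 0 <= i < n_bins:
            raise ValueError(f"{v} lies outside the binned range")
        counts[i] += 1                 # step 2: count it
    return counts

# Bins of width 10 from 50 to 120, matching the axis of the first histogram.
counts = histogram_counts(prices, start=50, width=10, n_bins=7)

# Step 3: draw one bar per bin.
for i, c in enumerate(counts):
    lo = 50 + 10 * i
    print(f"[{lo:>3}, {lo + 10:>3}) {'#' * c} {c}")
```

Changing `start`, `width` or `n_bins` changes the counts, which is why the same data can produce histograms of quite different shapes.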

  41. About Histograms
      ◮ The bins represent ranges of values
      ◮ The bins (intervals) must be adjacent, and are usually of equal size
      ◮ The bars are adjacent (no gaps between them)
      ◮ The areas of the bars are meaningful
      ◮ The height of a bar equals the frequency
      ◮ The width of a bar equals the bin size
      ◮ With equal-width bins, the area of a bar is proportional to the share of data values that fall in the bin
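For a bar's area to equal the share of values in its bin exactly, the height has to be rescaled to a density: relative frequency divided by bin width. A small Python sketch of that rescaling, assuming the same equal-width bins of width 10 starting at 50 (an illustrative choice, not something the slides prescribe):

```python
# Ticket prices (2013) for the 32 teams, copied from the slide's table.
prices = [
    110.20, 117.84, 111.69, 103.60, 110.28, 94.80, 100.19, 93.01,
    88.98, 84.55, 81.13, 82.61, 83.54, 74.99, 84.27, 86.32,
    83.71, 78.69, 74.49, 71.21, 79.56, 71.14, 64.80, 65.28,
    67.60, 68.96, 68.44, 64.92, 63.59, 57.75, 66.84, 54.20,
]

start, width, n_bins = 50, 10, 7

# Absolute frequency per bin.
counts = [0] * n_bins
for p in prices:
    counts[int((p - start) // width)] += 1

n = len(prices)
proportions = [c / n for c in counts]          # relative frequency per bin
densities = [p / width for p in proportions]   # bar height on the density scale
areas = [d * width for d in densities]         # height * width = proportion
total_area = sum(areas)

for i, (c, d) in enumerate(zip(counts, densities)):
    lo = start + width * i
    print(f"[{lo}, {lo + width}): count={c}, density={d:.5f}")
```

On the density scale the bar areas add up to 1, which is what allows a histogram to be compared with a density curve.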

  42. Histogram with 4 bins [Figure: "Histogram of price tickets"; horizontal axis: price, ticks from 54 to 118 in steps of 8; vertical axis: Frequency, from 0 to 12]

  43. Histograms with different bins [Figure: four panels titled "Histogram of price tickets (4 bins)", "(5 bins)", "(6 bins)" and "(7 bins)"; horizontal axis: price; vertical axis: frequency]

  44. Avoid too few and too many bins [Figure: two panels titled "Histogram of price tickets (3 bins)" and "Histogram of price tickets (14 bins)"; horizontal axis: price; vertical axis: frequency]

  45. About Histograms
      ◮ The shape of a histogram depends on the chosen bins
      ◮ This means the construction of a histogram is fundamentally unstable: different bins can suggest different shapes for the same data
      ◮ The bars are adjacent (no gaps between them)
      ◮ The areas of the bars are meaningful

  46. Histogram patterns: symmetrical; skewed to the left; skewed to the right

  47. Histogram patterns: curved; flat or uniform; curved downward; multiple peaks (e.g. bimodal, trimodal, etc.)

  48. Histogram patterns: concentration; gap; outlier

  49. Histogram patterns: peak; concentration; concentration

  50. Box plots

  51. Box plots
      1. Box-and-whisker plots, more commonly known as "box plots"
      2. Created by John Tukey
      3. A simple and effective way to display the distribution of values
      4. They rely on the so-called five-number summary

  52. Box plots based on 5-number summary: 5 summary indicators

  53. Box plots based on 5-number summary: 5 summary indicators
      1. minimum
      2. 25th percentile (1st quartile)
      3. 50th percentile (2nd quartile, or median)
      4. 75th percentile (3rd quartile)
      5. maximum
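The five indicators can be computed directly from the ticket prices. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`; this is one of several common quartile definitions, chosen here as an assumption, and other software or other settings may give slightly different quartiles. The function name `five_number_summary` is illustrative.

```python
import statistics

# Ticket prices (2013) for the 32 teams, copied from the slide's table.
prices = [
    110.20, 117.84, 111.69, 103.60, 110.28, 94.80, 100.19, 93.01,
    88.98, 84.55, 81.13, 82.61, 83.54, 74.99, 84.27, 86.32,
    83.71, 78.69, 74.49, 71.21, 79.56, 71.14, 64.80, 65.28,
    67.60, 68.96, 68.44, 64.92, 63.59, 57.75, 66.84, 54.20,
]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

low, q1, median, q3, high = five_number_summary(prices)
iqr = q3 - q1  # interquartile range: the length of the box

print(f"min={low:.2f} Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} max={high:.2f}")
print(f"IQR={iqr:.2f}")
```

In a box plot the box spans Q1 to Q3, the line inside the box marks the median, and in the basic version shown in these slides the whiskers extend to the minimum and maximum.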

  54. Box plot basics [Figure: box plot with its parts labeled "Box" and "whiskers"]

  55. Box plot basics [Figure: annotated box plot]
      ◮ High value (max)
      ◮ 75th percentile (Q3)
      ◮ Median, 50th percentile (Q2)
      ◮ 25th percentile (Q1)
      ◮ Low value (min)
      ◮ Range: from min to max (100% of values)
      ◮ IQR or midspread: from Q1 to Q3 (50% of values)

  56. NFL Ticket Prices [Figure: ticket prices plotted along a horizontal axis labeled "price", from 55 to 120 in steps of 5]

  57. 5 number summary [Figure: the same axis with marks for min, Q1, median, Q3 and max]

  58. Box plot [Figure: box plot of ticket prices; horizontal axis: price, from 55 to 120 in steps of 5]
