
STAT 113: Describing Categorical Data I (Colin Reimer Dawson, Oberlin College)



  1. STAT 113: Describing Categorical Data I
     Colin Reimer Dawson, Oberlin College, September 11, 2020
     Outline: Describing One Categorical Variable; Relationships Between Categorical Variables

  2. Outline
     Describing One Categorical Variable
     Relationships Between Categorical Variables: Contingency Tables; Conditional Proportions

  3. Outline
     Describing One Categorical Variable
     Relationships Between Categorical Variables: Contingency Tables; Conditional Proportions

  4. A data frame with a single categorical variable

          Frequency
     49   N/R
     65   Daily
     25   N/R
     74   Weekly
     18   Monthly
     91   Monthly
     47   Weekly
     24   N/R
     71   Monthly
     37   Monthly

     Table: Partial Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)

  5. Frequency Tables
     How can we summarize a categorical variable? One option is simply a frequency table.

     Daily  Weekly  Monthly  Semesterly  N/R  Total
     9      28      18       23          13   91

     Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)

  6. Relative Frequency Tables
     If we use proportions or percentages, we have a relative frequency table.

     Daily  Weekly  Monthly  Semesterly  N/R    Total
     0.100  0.310   0.200    0.250       0.140  1.000

     Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
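The frequency and relative frequency tables can be reproduced by counting cases and dividing by the total. The following is a minimal Python sketch, not the course's own code: the `published` counts come from the frequency table, while the `responses` list and all variable names are assumptions made for illustration.

```python
from collections import Counter

# Counts taken from the frequency table (via Nolan and Speed, 2000).
published = {"Daily": 9, "Weekly": 28, "Monthly": 18, "Semesterly": 23, "N/R": 13}

# Rebuild one response per student so the counting step is visible.
# (Hypothetical reconstruction: the original row order is not known.)
responses = [category for category, n in published.items() for _ in range(n)]

# Frequency table: number of cases in each category.
freq = Counter(responses)
total = sum(freq.values())

# Relative frequency table: each count divided by the total number of cases.
rel_freq = {category: count / total for category, count in freq.items()}

for category in published:
    print(f"{category:<11}{freq[category]:>4}{rel_freq[category]:>8.3f}")
print(f"{'Total':<11}{total:>4}{sum(rel_freq.values()):>8.3f}")
```

The slide's proportions appear to be rounded to two decimal places before display, so 28/91 is shown as 0.310 where three-decimal rounding would give 0.308.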

  7. Pie Charts
     The pie chart is a popular choice for proportion data...
     [Pie chart slices: Weekly (31%), Daily (10%), Monthly (20%), N/R (14%), Semesterly (25%)]
     Figure: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)

  8. Pie Charts...
     Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6

  9. Pie Charts...
     Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
     • “Pie charts are the Nickelback of data visualization.”

  10. Pie Charts...
      Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
      • “Pie charts are the Nickelback of data visualization.”
      • “Pie charts are the Aquaman of data visualization.”

  11. The One Exception

  12. Bar Charts
      Much easier to see differences between categories.
      [Two bar plots over the categories Daily, Weekly, Monthly, Semesterly, N/R; y-axes: # of respondents and % of respondents]
      Figure: Two bar plots of Video Game data showing frequency (left) and percentages (right)

  13. Bar Charts
      What’s this bar chart telling us?
      Figure: A Fair and Balanced Bar Chart (from FOX News, 8/9/12)

  14. The Cardinal Rule of Bar Charts
      Ratios in area must correspond to ratios in value.
      • The y-axis must start at 0!
      • Equal visual space for equal numerical differences
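To see why the y-axis must start at 0, compare the ratio of two values with the ratio of the bar heights a reader actually sees when the axis is truncated. This is a rough Python sketch under stated assumptions: the function name `visual_ratio` and the numbers are hypothetical, chosen only for illustration and not taken from any chart in the slides.

```python
def visual_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical values for two bars (illustration only).
a, b = 39.6, 35.0

true_ratio = a / b                              # ratio in value
honest = visual_ratio(a, b, baseline=0.0)       # axis starts at 0
truncated = visual_ratio(a, b, baseline=34.0)   # axis starts at 34

print(f"ratio in value:            {true_ratio:.2f}")
print(f"bar ratio, axis from 0:    {honest:.2f}")
print(f"bar ratio, axis from 34:   {truncated:.2f}")
```

With the axis starting at 0, the bar ratio equals the ratio in value. With the truncated axis, a difference of about 13% is drawn as one bar several times taller than the other.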

  15. Outline
      Describing One Categorical Variable
      Relationships Between Categorical Variables: Contingency Tables; Conditional Proportions

  16. Outline
      Describing One Categorical Variable
      Relationships Between Categorical Variables: Contingency Tables; Conditional Proportions

  17. Contingency Tables
      • With one categorical variable, summarize by counting the observations in each category.

  18. Contingency Tables
      • With one categorical variable, summarize by counting the observations in each category.
      • With more than one variable, count combinations.

  19. Contingency Tables
      • With one categorical variable, summarize by counting the observations in each category.
      • With more than one variable, count combinations.
      • With two variables, we can store the counts in a two-way table (also known as a contingency table).

  20. A Simple Contingency Table

      Student  Year  Computer
      1        2nd   PC
      2        3rd   Mac
      3        3rd   PC
      4        1st   PC
      5        2nd   PC
      6        1st   Mac
      7        4th   Mac
      8        2nd   PC

      ⇒

             Computer
      Year   PC  Mac
      1st     1    1
      2nd     3    0
      3rd     1    1
      4th     0    1
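Counting combinations, as in the simple contingency table, amounts to tallying (Year, Computer) pairs. Here is a minimal Python sketch of that step using the eight students from the slide; the variable names are assumptions made for illustration.

```python
from collections import Counter

# One (year, computer) pair per student, in the order listed on the slide.
students = [
    ("2nd", "PC"), ("3rd", "Mac"), ("3rd", "PC"), ("1st", "PC"),
    ("2nd", "PC"), ("1st", "Mac"), ("4th", "Mac"), ("2nd", "PC"),
]

# Count each combination of the two categorical variables.
counts = Counter(students)

years = ["1st", "2nd", "3rd", "4th"]
computers = ["PC", "Mac"]

# Arrange the counts as a two-way table: rows are years, columns are computers.
# A Counter returns 0 for combinations that never occur.
table = {year: {c: counts[(year, c)] for c in computers} for year in years}

print(f"{'Year':<6}" + "".join(f"{c:>5}" for c in computers))
for year in years:
    print(f"{year:<6}" + "".join(f"{table[year][c]:>5}" for c in computers))
```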

  21. Outline
      Describing One Categorical Variable
      Relationships Between Categorical Variables: Contingency Tables; Conditional Proportions

  22. Proportions within a Context
      Example: Driving While Black/Brown
      Armentrout et al. (2007) [1] report data on traffic stops by the Los Angeles Police Department (LAPD). Two of the variables recorded are race of the driver and whether or not the vehicle was searched.
      Question of interest: Of stops, is the proportion that result in a search different for different races of driver?
      [1] Armentrout, M., Goodrich, A., Nguyen, J., Ortega, L., Smith, L., & Khadjavi, L.S. (2007). Cops and stops: Racial profiling and a preliminary statistics analysis of Los Angeles police department traffic stops and searches. Retrieved from http://www.public.asu.edu/ etcamach/AMSSI/reports/copsnstops.pdf

  23. Proportions in a Context
      Question of interest: Of stops, is the proportion that result in a search different for different races of driver?

                    Hisp./Lat.  White  Black  Asian  Others  Total
      Searched             510    109    240     16       7    882
      Not Searched        1826   2081   1008    486     104   5505
      Total               2336   2190   1248    502     111   6387

      On Your Own (3 min.):
      • Identify the cases and the population the cases are drawn from.
      • How would you address this question using this data?

  24. Conditional Proportions

                    Hisp./Lat.  White  Black  Asian  Others  Total
      Searched             510    109    240     16       7    882
      Not Searched        1826   2081   1008    486     104   5505
      Total               2336   2190   1248    502     111   6387

      • We can group the cases according to one variable (e.g., driver race), and look at the distribution of the other (searched or not) within each group.

  25. Conditional Proportions

                    Hisp./Lat.  White  Black  Asian  Others  Total
      Searched             510    109    240     16       7    882
      Not Searched        1826   2081   1008    486     104   5505
      Total               2336   2190   1248    502     111   6387

      • We can group the cases according to one variable (e.g., driver race), and look at the distribution of the other (searched or not) within each group.
      • The resulting proportions are called conditional proportions: proportions are computed within a context, i.e., cases that satisfy a certain condition.

  26. Conditional Proportions

      Driver Race   Searched: Yes   Searched: No   Total
      Hisp./Lat.    510/2336        1826/2336      2336
      White         109/2190        2081/2190      2190
      Black         240/1248        1008/1248      1248
      Asian         16/502          486/502        502
      Others        7/111           104/111        111
      Total         882/6387        5505/6387      6387

  27. Conditional Proportions

      Driver Race   Searched: Yes   Searched: No   Total
      Hisp./Lat.    0.218           0.782          2336
      White         0.050           0.950          2190
      Black         0.192           0.808          1248
      Asian         0.032           0.968          502
      Others        0.063           0.937          111
      Total         0.138           0.862          6387
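The conditional proportions come from dividing each count by the total for its own group (here, each driver race). The sketch below is one way to do this in Python, using the counts from the traffic-stop table; the function name `conditional_proportions` and the data layout are assumptions made for illustration.

```python
# Counts from the traffic-stop table: one row per driver race.
counts = {
    "Hisp./Lat.": {"Searched": 510, "Not Searched": 1826},
    "White":      {"Searched": 109, "Not Searched": 2081},
    "Black":      {"Searched": 240, "Not Searched": 1008},
    "Asian":      {"Searched": 16,  "Not Searched": 486},
    "Others":     {"Searched": 7,   "Not Searched": 104},
}


def conditional_proportions(row: dict) -> dict:
    """Divide each count in a group by that group's total."""
    total = sum(row.values())
    return {outcome: n / total for outcome, n in row.items()}


# Proportion searched / not searched, conditional on driver race.
cond = {race: conditional_proportions(row) for race, row in counts.items()}

# Overall (marginal) proportions, ignoring driver race.
overall_counts = {
    outcome: sum(row[outcome] for row in counts.values())
    for outcome in ("Searched", "Not Searched")
}
overall = conditional_proportions(overall_counts)

for race, props in cond.items():
    print(f"{race:<12}{props['Searched']:>7.3f}{props['Not Searched']:>7.3f}")
print(f"{'Total':<12}{overall['Searched']:>7.3f}{overall['Not Searched']:>7.3f}")
```

Each row sums to 1 because the proportions are computed within a group, which is what makes the search rates comparable across groups of very different sizes.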

  28. Vietnam War Opinions
      From an October 2001 article in The Economist entitled “Treason of the Intellectuals?”:
      “Back in Vietnam days, the anti-war movement spread from the intelligentsia into the rest of the population, eventually paralysing the country’s will to fight.”
      Source: http://www.economist.com/node/806289
