Categorical Data | Contingency Tables
STAT 113
Describing Categorical Data
Colin Reimer Dawson
Oberlin College
September 7, 2017
Frequency Tables
How can we display a data set with categorical values? One option is simply a frequency table.
Daily | Weekly | Monthly | Semesterly | N/R | Total
9 | 28 | 18 | 23 | 13 | 91
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
Relative Frequency Tables
If we use proportions or percentages, we have a relative frequency table.
Daily | Weekly | Monthly | Semesterly | N/R | Total
0.0989 | 0.3077 | 0.1978 | 0.2527 | 0.1429 | 1.0000
Table: Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
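Each relative frequency is just a count divided by the total number of responses. A minimal Python sketch using the counts from the frequency table above (storing them in a dictionary is only one convenient choice):

```python
# Counts from the video game survey frequency table (Nolan and Speed, 2000)
counts = {"Daily": 9, "Weekly": 28, "Monthly": 18, "Semesterly": 23, "N/R": 13}

total = sum(counts.values())  # 91 respondents

# Relative frequency = count / total, rounded to 4 decimals as in the table
rel_freq = {category: round(n / total, 4) for category, n in counts.items()}

for category, p in rel_freq.items():
    print(f"{category:<11} {p:.4f}")

# The unrounded proportions always sum to 1
print(f"{'Total':<11} {sum(n / total for n in counts.values()):.4f}")
```

Multiplying each proportion by 100 gives the percentages used in the charts that follow.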
Graphical Displays
Sometimes a chart is more effective than a table.
Figure: http://www.xkcd.com
Pie Charts
The pie chart is a popular choice for proportion data...
Figure (pie chart): Weekly 30.8%, Semesterly 25.3%, Monthly 19.8%, N/R 14.3%, Daily 9.9%. Results of a Survey of College Students on Frequency of Video Game Playing (via Nolan and Speed, 2000)
Pie Charts...
Figure: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
• "Pie charts are the Nickelback of data visualization."
• "Pie charts are the Aquaman of data visualization."
Bar Charts vs. Pie Charts
• Pie charts convey little useful information once there are more than about 4 categories.
• Even with few categories, it's difficult to judge differences in slice size.
• People are better at judging linear size than area.
Bar Chart > Pie Chart
• Even in situations ideally suited to pie charts, it's probably still better to use bar charts.
The Verdict on Pie Charts
The One Exception
Bar Charts
Much easier to see differences between categories.
Figure: Two bar plots of Video Game data showing frequency (left) and percentages (right); categories Daily, Weekly, Monthly, Semesterly, N/R; right panel y-axis: % of respondents.
Bar Charts
What's this bar chart telling us?
Figure: A Fair and Balanced Bar Chart (from FOX News, 8/9/12)
Bar Chart Tips
• The cardinal rule of bar charts: ratios in area = ratios in value.
• The y-axis must start at 0!
• Equal distances = equal differences.
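Why the y-axis must start at 0: with bars of equal width, the ratio of areas is the ratio of heights. If the axis starts at some baseline instead of 0, each drawn bar has height (value - baseline), so the visual ratio no longer matches the ratio of the values. A small Python sketch; the values 40 and 35 and the baseline 34 are hypothetical, chosen only to illustrate the effect:

```python
def true_ratio(v_high, v_low):
    """Ratio of the underlying values being compared."""
    return v_high / v_low


def drawn_ratio(v_high, v_low, baseline):
    """Ratio of the bar heights actually drawn when the y-axis starts at `baseline`."""
    return (v_high - baseline) / (v_low - baseline)


# Hypothetical values for two categories (illustration only)
v_high, v_low = 40, 35

print(true_ratio(v_high, v_low))       # the honest comparison, about 1.14
print(drawn_ratio(v_high, v_low, 0))   # axis starts at 0: same as the true ratio
print(drawn_ratio(v_high, v_low, 34))  # axis starts at 34: one bar looks 6 times as tall
```

A difference of about 14% in value is drawn as a sixfold difference in bar height, which is exactly what the cardinal rule is meant to prevent.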
Summary: Categorical Data
• Count occurrences of each value. Represent counts with a frequency table and proportions with a relative frequency table.
• Pie charts are pretty useless. Prefer bar charts.
• Cardinal rule of bar charts: ratios in area = ratios in value.
• The y-axis must start at 0!
• Equal distances = equal differences.
Contingency Tables
• Recall: with one categorical variable, we summarized by counting the observations in each category.
• With more than one variable, we do the same thing, but we keep track of combinations.
• With two variables, we can store the counts in a two-way table (also known as a contingency table).
A Simple Contingency Table
Raw data:
Student | Sex | Computer
1 | M | PC
2 | F | Mac
3 | F | PC
4 | M | PC
5 | F | PC
6 | F | Mac
7 | M | Mac
8 | M | PC
⇒ Contingency table:
Sex \ Computer | PC | Mac
M | 3 | 1
F | 2 | 2
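Building a contingency table is just counting each combination of values. A minimal Python sketch using `collections.Counter` on the eight students above, with (sex, computer) pairs as keys:

```python
from collections import Counter

# Raw data from the slide: (student, sex, computer)
students = [
    (1, "M", "PC"), (2, "F", "Mac"), (3, "F", "PC"), (4, "M", "PC"),
    (5, "F", "PC"), (6, "F", "Mac"), (7, "M", "Mac"), (8, "M", "PC"),
]

# Count how many students have each (sex, computer) combination
table = Counter((sex, computer) for _, sex, computer in students)

# Print as a two-way table: rows = Sex, columns = Computer
columns = ["PC", "Mac"]
print("Sex  " + "  ".join(f"{c:>3}" for c in columns))
for sex in ["M", "F"]:
    print(f"{sex:<4} " + "  ".join(f"{table[(sex, c)]:>3}" for c in columns))
```

A `Counter` returns 0 for combinations that never occur, so empty cells of the table need no special handling.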
Proportions in a Context
Sometimes we want to ask about proportions within particular subsets of the data.
Example: Driving While Black/Brown
Armentrout et al. (2007)¹ report data on a variety of outcomes related to traffic stops by the Los Angeles Police Department (LAPD). Two of the variables recorded are the racial category of the driver and whether or not the vehicle was searched.
Question of interest: Does the proportion of stops that lead to a search differ across racial categories?
¹ Armentrout, M., Goodrich, A., Nguyen, J., Ortega, L., Smith, L., & Khadjavi, L.S. (2007). Cops and stops: Racial profiling and a preliminary statistics analysis of Los Angeles police department traffic stops and searches. Retrieved from http://www.public.asu.edu/ etcamach/AMSSI/reports/copsnstops.pdf
Proportions in a Context
Question of interest: Does the proportion of stops that lead to a search differ across racial categories?
Search Outcome \ Driver Race | Hisp./Lat. | White | Black | Asian | Others | Total
Searched | 510 | 109 | 240 | 16 | 7 | 882
Not Searched | 1826 | 2081 | 1008 | 486 | 104 | 5505
Total | 2336 | 2190 | 1248 | 502 | 111 | 6387
Pairs (3 min.):
• Identify the cases and the population the cases are drawn from.
• How would you address this question using this data?
Conditional Proportions
• We can group the cases according to one variable (e.g., driver race) and look at the distribution of the other (searched or not) within each group.
• The resulting proportions are called conditional proportions: proportions computed within a context, i.e., among cases that satisfy a certain condition.
Conditional Proportions
Driver Race | Searched: Yes | Searched: No | Total
Hisp./Lat. | 510/2336 | 1826/2336 | 2336
White | 109/2190 | 2081/2190 | 2190
Black | 240/1248 | 1008/1248 | 1248
Asian | 16/502 | 486/502 | 502
Others | 7/111 | 104/111 | 111
Total | 882/6387 | 5505/6387 | 6387
Conditional Proportions
Driver Race | Searched: Yes | Searched: No | Total
Hisp./Lat. | 0.218 | 0.782 | 2336
White | 0.050 | 0.950 | 2190
Black | 0.192 | 0.808 | 1248
Asian | 0.032 | 0.968 | 502
Others | 0.063 | 0.937 | 111
Total | 0.138 | 0.862 | 6387
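These conditional proportions come straight from the counts: fix a driver race, then divide the number of searched stops by that race's total number of stops. A minimal Python sketch using the LAPD counts (the dictionary layout is just one choice):

```python
# Counts from the LAPD traffic stop table (Armentrout et al., 2007)
searched = {"Hisp./Lat.": 510, "White": 109, "Black": 240, "Asian": 16, "Others": 7}
not_searched = {"Hisp./Lat.": 1826, "White": 2081, "Black": 1008, "Asian": 486, "Others": 104}

# Condition on driver race: within each race, divide by that race's total stops
cond_searched = {}
for race in searched:
    race_total = searched[race] + not_searched[race]
    cond_searched[race] = searched[race] / race_total

# Overall (unconditional) proportion of stops that led to a search
all_stops = sum(searched.values()) + sum(not_searched.values())
overall = sum(searched.values()) / all_stops

for race, p in cond_searched.items():
    print(f"{race:<11} searched: {p:.3f}")
print(f"{'Total':<11} searched: {overall:.3f}")
```

The "No" column needs no separate computation, since within each race it is 1 minus the "Yes" proportion.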
Conditional Proportions to Measure Association
Pairs (1 min.): What do these conditional proportions tell us?
Frequency \ Grade Expected | A | B | C | Total
Daily | 0.56 | 0.33 | 0.11 | 9
Weekly | 0.50 | 0.50 | 0.00 | 28
Monthly | 0.11 | 0.67 | 0.22 | 18
Sem'ly | 0.30 | 0.61 | 0.09 | 23
Total | 0.36 | 0.55 | 0.09 | 78
Table: Results of a survey of college students on frequency of video game playing and expected grade in a stats class (Nolan and Speed, 2000)
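One way to read the Total row: the overall proportion expecting each grade is a weighted average of the row (conditional) proportions, weighted by the number of students in each row. A quick Python check using the rounded proportions from the table, so agreement holds only up to rounding:

```python
# Row totals and conditional proportions (A, B, C) from the table above
rows = {
    "Daily":   (9,  (0.56, 0.33, 0.11)),
    "Weekly":  (28, (0.50, 0.50, 0.00)),
    "Monthly": (18, (0.11, 0.67, 0.22)),
    "Sem'ly":  (23, (0.30, 0.61, 0.09)),
}

n_total = sum(n for n, _ in rows.values())  # 78 students

# Weighted average of each grade column, weighted by row size
marginal = [
    sum(n * props[j] for n, props in rows.values()) / n_total
    for j in range(3)
]

print([round(p, 2) for p in marginal])  # [0.36, 0.55, 0.09]
```

If expected grade were unrelated to gaming frequency, every row would look roughly like this Total row; rows that differ sharply from it (e.g., Weekly, with no C's) are what suggest an association.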
A Three-Way Table
Figure: Religious Affiliation by Political Party, 2006 vs 2016 (via 538)
Religious and Political Affiliation
"In 2016, a whopping 35 percent of Republicans were white evangelical Protestants, 18 percent were white mainline Protestants, and 16 percent were white Catholics; together, those groups account for nearly 70 percent of the Republican base. But since 2006, the proportion of Americans identifying as white evangelical Protestant, white Catholic, and white mainline Protestant have all dropped by 5 or 6 percentage points."
From America's Shifting Religious Makeup Could Spell Trouble For Both Parties, FiveThirtyEight.com
Age Breakdown by Religion
"Religious minorities like Muslims, Hindus and Buddhists; the religiously unaffiliated; and Hispanic Protestants and Catholics all have significant numbers of followers under the age of 30. And all of these groups disproportionately identify as Democrats. This youth and diversity might seem like a gift to the Democratic Party, but it also presents a serious challenge for politicians hoping to present a compelling vision to voters who have a wide range of values and priorities."
Example: Medical Diagnosis
A test for a rare disease (affecting about 1 in 10,000 people) has been developed and shown to have high accuracy: 99% of those with the disease test positive, and 99% of those without it test negative.
Pairs (3 min.): If you test positive, what are the chances you have the disease?
Hint: Construct a contingency table with 1,000,000 people who may or may not have the disease and may or may not test positive.
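One way to fill in the table the hint describes, as a Python sketch. It uses `fractions.Fraction` so the cell counts come out as exact whole numbers; the rates are taken directly from the problem statement:

```python
from fractions import Fraction

# Rates from the problem statement; Fraction keeps every count exact
population = 1_000_000
prevalence = Fraction(1, 10_000)    # about 1 in 10,000 people have the disease
sensitivity = Fraction(99, 100)     # 99% of those with the disease test positive
specificity = Fraction(99, 100)     # 99% of those without it test negative

# Row totals: who has the disease
diseased = population * prevalence          # 100 people
healthy = population - diseased             # 999,900 people

# The four cells of the contingency table
true_pos = diseased * sensitivity           # disease, test positive
false_neg = diseased - true_pos             # disease, test negative
false_pos = healthy * (1 - specificity)     # no disease, test positive
true_neg = healthy - false_pos              # no disease, test negative

# Condition on testing positive: P(disease | positive)
p_disease_given_pos = true_pos / (true_pos + false_pos)

print(f"Positive tests: {true_pos + false_pos}")  # Positive tests: 10098
print(f"P(disease | positive) = {p_disease_given_pos} = {float(p_disease_given_pos):.4f}")
```

Only 99 of the 10,098 positive tests come from people with the disease, so a positive result means roughly a 1% chance of having it: the 1% false-positive rate applied to 999,900 healthy people swamps the 100 true cases.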
Summary: Two Categorical Variables
• A two-way table, called a contingency table, contains the number of times each combination of values appears.
• The conditional proportion of x given y is the proportion of cases with y in which x also occurs.
• Conditional proportions are not symmetric. The direction of conditioning can make a big difference to the apparent pattern when base rates are very different.
• We can use stacked or grouped bar plots to visualize joint frequencies or conditional proportions.
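The asymmetry is easy to see with the LAPD counts: the same cell (240 Black drivers who were searched) divided by two different totals gives two different conditional proportions. A minimal Python sketch:

```python
# LAPD counts (Armentrout et al., 2007)
black_searched = 240   # stops of Black drivers that led to a search
black_total = 1248     # all stops of Black drivers
all_searched = 882     # all stops that led to a search, any race

# P(searched | Black): condition on driver race
p_searched_given_black = black_searched / black_total

# P(Black | searched): condition on search outcome instead
p_black_given_searched = black_searched / all_searched

print(f"P(searched | Black) = {p_searched_given_black:.3f}")
print(f"P(Black | searched) = {p_black_given_searched:.3f}")
```

The first answers "how often are Black drivers searched?" (about 19%); the second answers "what share of searches involve Black drivers?" (about 27%). They share a numerator but condition on different groups.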