Univariate Categorical Data

MATH 185 – Introduction to Computational Statistics
University of California, San Diego
Instructor: Ery Arias-Castro
http://math.ucsd.edu/~eariasca/math185.html

The first 2000 digits of π

We use the pi2000 data in the package UsingR (see ?pi2000).

    > library(UsingR)
    > str(pi2000)
     num [1:2000] 3 1 4 1 5 9 2 6 5 3 ...

Q: Though this is not the role of a statistician per se, what kind of questions would we ask of such data?

Counts/Frequencies

Say we are interested in the number of times each digit appears. We therefore summarize the data as counts in the different categories.

    > table(pi2000)
    pi2000
      0   1   2   3   4   5   6   7   8   9
    181 213 207 189 195 205 200 197 202 211

Alternatively, we can compute relative frequencies.

    > table(pi2000)/length(pi2000)
    pi2000
         0      1      2      3      4      5      6      7      8      9
    0.0905 0.1065 0.1035 0.0945 0.0975 0.1025 0.1000 0.0985 0.1010 0.1055

Barplot

For categorical data with a few categories, a barplot is often useful.

    > barplot(table(pi2000), col = "#ffffcc")

[Figure: barplot of the digit counts; horizontal axis: digits 0 to 9; vertical axis: counts from 0 to 200]

Pie Chart

We can also use a pie chart.

    > pie(table(pi2000))

[Figure: pie chart of the digit counts, one slice per digit 0 to 9]

Testing for equal proportions

The Pearson $\chi^2$ goodness-of-fit test: we observe an i.i.d. sample $\xi_1, \dots, \xi_n$ with $P(\xi_i = r_s) = p_s$.

We want to test

$$H_0 : p_s = p_s^0 \ \text{ for all } s = 1, \dots, t$$
$$H_1 : \text{there is } s \in \{1, \dots, t\} \text{ such that } p_s \neq p_s^0$$

The Pearson $\chi^2$ goodness-of-fit test rejects when $D$ below is large:

$$D = \sum_{s=1}^{t} \frac{(X_s - n p_s^0)^2}{n p_s^0}$$

where $X_s$ is the number of observations equal to $r_s$.

How large? Under the null, $D$ has approximately the $\chi^2$ distribution with $t - 1$ degrees of freedom.
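The quality of the chi-squared approximation can be checked by simulation. The following is a minimal sketch, assuming 10 equally likely categories and n = 2000 as in the digits example; the seed, the number of replications B, and all variable names are illustrative choices, not part of the original slides.

```r
set.seed(185)               # arbitrary seed, for reproducibility only
n    <- 2000                # sample size
ncat <- 10                  # number of categories (t in the slides)
B    <- 2000                # number of simulated samples (illustrative)
p0   <- rep(1 / ncat, ncat)

# Each column of X is one simulated vector of counts (X_1, ..., X_t) under H0.
X <- rmultinom(B, size = n, prob = p0)

# Pearson statistic D for every simulated sample (one value per column).
D <- colSums((X - n * p0)^2 / (n * p0))

# Under H0 the mean of D equals t - 1 = 9, and about 5% of the values
# should exceed the 95% quantile of the chi-squared distribution.
mean(D)
mean(D > qchisq(0.95, df = ncat - 1))
```

A histogram of D overlaid with dchisq(x, df = 9) would give a visual version of the same comparison.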

Testing for equal proportions

Here, $n = 2000$, $t = 10$ (with $r_s = s - 1$, the digits 0 to 9) and $p_s^0 = 1/10$.

    > chisq.test(table(pi2000))

    	Chi-squared test for given probabilities

    data:  table(pi2000)
    X-squared = 4.42, df = 9, p-value = 0.8817

The p-value is fairly large, so there is not enough evidence to reject the null.
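The statistic and p-value reported by chisq.test can be reproduced by hand from the definition of D. This sketch hard-codes the digit counts from the table(pi2000) output so that it runs without the UsingR package; the variable names are illustrative.

```r
# Digit counts copied from the table(pi2000) output.
counts <- c("0" = 181, "1" = 213, "2" = 207, "3" = 189, "4" = 195,
            "5" = 205, "6" = 200, "7" = 197, "8" = 202, "9" = 211)

n        <- sum(counts)               # 2000 digits in total
p0       <- rep(1 / 10, 10)           # null hypothesis: uniform on 0..9
expected <- n * p0                    # 200 expected occurrences per digit

# Pearson statistic: sum over categories of (observed - expected)^2 / expected.
D <- sum((counts - expected)^2 / expected)

# Upper-tail probability of the chi-squared distribution with t - 1 = 9 df.
p_value <- pchisq(D, df = length(counts) - 1, lower.tail = FALSE)

D                   # 4.42
round(p_value, 4)   # 0.8817
```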

Testing for Dependencies

Many dependency structures are possible. Here is an example. Compute the differences of successive digits and group them into $\{-9, \dots, 9\}$.

    > table(diff(pi2000))

     -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9
     18  33  66  93 103 119 145 170 156 190 181 162 131 114 116  83  46  45  28

If the sequence behaved like an i.i.d. sample from the uniform distribution on $\{0, \dots, 9\}$, the differences would have the following distribution on $\{-9, \dots, 9\}$.

    > p0 = c(1:9, 10, 9:1)/100
     [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.09 0.08 0.07 0.06 ...
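The triangular shape of p0 follows from counting pairs: under the null, the 100 ordered pairs of successive digits are equally likely, and the number of pairs with difference d is 10 - |d|. A short sketch that confirms this by enumeration (the names d and p0_enum are illustrative):

```r
# Difference b - a for each of the 100 ordered digit pairs (a, b).
d <- outer(0:9, 0:9, function(a, b) b - a)

# Each pair has probability 1/100 under the null, so the distribution of
# the difference is the pair count per value divided by 100.
p0_enum <- table(d) / 100

# Compare with the vector used in the slides.
p0 <- c(1:9, 10, 9:1) / 100
all.equal(as.vector(p0_enum), p0)   # TRUE
```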

Testing for Dependencies

We therefore perform a $\chi^2$ goodness-of-fit test of the observed differences against this distribution.

    > chisq.test(table(diff(pi2000)), p = p0)

    	Chi-squared test for given probabilities

    data:  table(diff(pi2000))
    X-squared = 19.4219, df = 18, p-value = 0.3663

Again, there is not enough evidence to reject the null.
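To see where the statistic comes from, it can help to put observed and expected counts side by side. This sketch hard-codes the counts from the table(diff(pi2000)) output so that it runs without UsingR; the object names are illustrative.

```r
# Counts of the successive differences, copied from table(diff(pi2000)).
diff_counts <- c(18, 33, 66, 93, 103, 119, 145, 170, 156, 190,
                 181, 162, 131, 114, 116, 83, 46, 45, 28)
names(diff_counts) <- -9:9

# Null distribution of the difference of two independent uniform digits.
p0 <- c(1:9, 10, 9:1) / 100

res <- chisq.test(diff_counts, p = p0)

# Observed counts next to the expected counts n * p0 (n = 1999 differences).
cbind(observed = diff_counts, expected = round(res$expected, 1))

res$statistic   # X-squared, about 19.42
res$parameter   # df = 18
res$p.value     # about 0.366
```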

Testing for Dependencies

- A more detail-oriented method counts the number of transitions from digit $s$ to digit $t$.
- If the sequence behaved like an i.i.d. sample from the uniform distribution on $\{0, \dots, 9\}$, all transitions would be equally likely.
- However, there are many (100) such transitions.
- Many other approaches exist under the name of Tests of Randomness, for example tests based on runs.
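The transition counts can be tabulated by pairing each digit with its successor. The sketch below uses a simulated uniform digit sequence as a stand-in so that it runs without UsingR (with the package loaded, x could be replaced by pi2000); the seed and variable names are illustrative. Because successive pairs overlap, the chi-squared test here should be read as a rough heuristic rather than an exact procedure.

```r
set.seed(185)                            # arbitrary seed
x <- sample(0:9, 2000, replace = TRUE)   # simulated stand-in for the digits

# Pair each digit with its successor; fixing the factor levels guarantees
# a full 10 x 10 table even if some digit never occurs.
from  <- factor(head(x, -1), levels = 0:9)
to    <- factor(tail(x, -1), levels = 0:9)
trans <- table(from, to)

sum(trans)         # 1999 transitions in a sequence of 2000 digits
sum(trans) / 100   # expected count per cell if all transitions are equally likely

# Goodness-of-fit of the 100 cell counts against equal probabilities.
chisq.test(as.vector(trans))
```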
