Statistics
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven

“To understand God’s thoughts, we must study statistics, for these are the measure of his purpose.” – Florence Nightingale
“Statistics: the mathematical theory of ignorance.” – Morris Kline
“Statistics means never having to say you’re certain.” – Anonymous

Marc Mehlman (University of New Haven) Statistics 1 / 48
Table of Contents
1 Introduction
2 Graphical Representation of Distributions
3 Measuring the Center
4 Measuring the Spread
5 Normal Distribution
6 Misuse of Statistics
7 Chapter #1 R Assignment
Statistics
Introduction
Definition
Given a population, one often examines a sample of the population in order to draw inferences about the entire population.
A variable is a measurable characteristic of individuals within the population.
The distribution of a variable tells us what values the variable takes and how often it takes each value.
Data are the values of a variable observed in the sample.
Statistics is the science of drawing inferences about the population from data.
Example
From the 50,000 residents of the town of Milford, 300 were selected randomly and asked what their highest academic degree was. The population is the 50,000 residents, the sample is the 300 randomly selected residents, and the variable is the resident's highest academic degree (level of education). It was too costly to contact all 50,000 residents, so the distribution of terminal degrees among the entire population is inferred from the distribution of terminal degrees among the 300 randomly sampled residents.
Statistics’ Origins: anecdotes and noticing patterns in random happenings.
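The Milford example can be imitated with a small simulation. Everything in it is hypothetical: the 50,000 population values and the category proportions are invented for illustration, not survey results. It draws a simple random sample of 300 residents with base R's `sample()` and compares the sample distribution (the data) with the population distribution, which in practice would be unknown.

```r
# Hypothetical population: 50,000 residents, each with a highest degree.
# The category proportions are made up for illustration only.
set.seed(42)
degrees <- c("None", "High school", "Associate", "Bachelor", "Graduate")
population <- factor(
  sample(degrees, size = 50000, replace = TRUE,
         prob = c(0.10, 0.30, 0.15, 0.30, 0.15)),
  levels = degrees
)

# Simple random sample of 300 residents, drawn without replacement.
sampled <- sample(population, size = 300)

# Distribution of the variable in the sample (the data) ...
sample.dist <- prop.table(table(sampled))
# ... and in the whole population (what we want to infer).
population.dist <- prop.table(table(population))

print(round(sample.dist, 3))
print(round(population.dist, 3))
```

Rerunning with a different seed gives a slightly different sample distribution; that sample-to-sample variability is what inference about the population has to account for.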
“Data. Data. Data. I can’t make bricks without clay.” – Sherlock Holmes
“In God we trust. All others must bring data.” – W. Edwards Deming
Definition (Types of Variables)
qualitative (categorical): descriptive. Examples: color of eyes, gender, city born in.
quantitative: numeric. Examples: height, miles per gallon, temperature, etc.
Definition (Types of Quantitative Variables)
discrete: takes values in a discrete set, such as whole-number counts. Examples: number of children someone has, number of coins in a pocket.
continuous: takes any value in a continuous range (an interval). Examples: weight, speed.
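In R, these variable types usually show up as column classes. The sketch below uses a small hypothetical data frame (all values invented for illustration): the categorical variable is stored as a factor, the discrete quantitative variable as integers, and the continuous one as doubles.

```r
# Hypothetical data for four individuals (invented for illustration).
people <- data.frame(
  eye.color = factor(c("brown", "blue", "green", "brown")),  # qualitative (categorical)
  children  = c(0L, 2L, 1L, 3L),                             # quantitative, discrete
  weight    = c(61.5, 80.2, 72.8, 55.0)                      # quantitative, continuous (kg)
)

# Storage class of each column: "factor", "integer", "numeric".
sapply(people, class)

# A categorical variable has a fixed set of categories (levels).
levels(people$eye.color)
```

The storage class is only a convention: R does not stop you from storing a count as a double, so whether a variable is discrete or continuous depends on what it measures, not on how it is stored.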
Graphical Representation of Distributions
Distribution of a Variable
To examine a single variable, we graphically display its distribution.
The distribution of a variable tells us what values it takes and how often it takes these values.
Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable.
Categorical variable: pie chart, bar graph
Quantitative variable: histogram, stemplot
Categorical Variables
The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category.
Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories.
Bar graphs represent each category as a bar whose height shows the category count or percent.
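Before drawing a pie chart or bar graph, the counts and percents are usually tabulated from the raw observations. A minimal sketch using base R's `table()` and `prop.table()` on a hypothetical vector of survey responses (invented for illustration):

```r
# Hypothetical raw responses: one favorite color per individual.
responses <- c("Red", "Blue", "Red", "Green", "Red", "Blue",
               "Brown", "Red", "Blue", "Green")

counts   <- table(responses)          # count of individuals per category
percents <- 100 * prop.table(counts)  # percent of individuals per category

counts
percents

# Either table can be passed straight to barplot() or pie().
barplot(counts, main = "Favorite Colors (hypothetical data)")
```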
> pie.sales = c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
> lbls = c("Blueberry", "Cherry", "Apple", "Boston Cream", "Other", "Vanilla Cream")
> pie(pie.sales, labels = lbls, main = "Pie Sales")
[Figure: pie chart titled "Pie Sales" with slices labeled Blueberry, Cherry, Apple, Boston Cream, Other, and Vanilla Cream]
> counts = c(40, 30, 20, 10)
> colors = c("Red", "Blue", "Green", "Brown")
> barplot(counts, names.arg = colors, main = "Favorite Colors")
[Figure: bar graph titled "Favorite Colors" with bars of height 40, 30, 20, and 10 for Red, Blue, Green, and Brown]
Quantitative Variables
The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values.
Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals whose values fall within a particular class.
Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable.
Histograms
Histograms suit quantitative variables that take many values and/or large data sets.
1. Divide the possible values into classes of equal width.
2. Count how many observations fall into each class (counts may be converted to percents).
3. Draw a picture representing the distribution: bar heights equal the number (or percent) of observations in each class.
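The three histogram steps can be carried out by hand to see what `hist()` does. This sketch assumes class boundaries 8, 10, ..., 22 (width 2), which appear to match the Girth histogram on the next slide; `cut()` and `table()` do the binning and counting, and `hist(..., plot = FALSE)` is used only to confirm the counts agree.

```r
# Girth (diameter in inches) of 31 black cherry trees, from R's built-in
# `trees` data set, the same data used in the slides.
girth <- trees$Girth

# Step 1: equal-width classes 8-10, 10-12, ..., 20-22 (assumed boundaries).
breaks <- seq(8, 22, by = 2)

# Step 2: count observations per class. right = TRUE puts a value lying exactly
# on a boundary into the class to its left, which is also hist()'s default.
classes <- cut(girth, breaks = breaks, right = TRUE, include.lowest = TRUE)
counts  <- table(classes)
counts
round(100 * prop.table(counts), 1)  # the same classes as percents

# Step 3: bar heights are the class counts; hist() computes the same counts.
h <- hist(girth, breaks = breaks, plot = FALSE)
h$counts
```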
> hist(trees$Girth, main = "Girth of Black Cherry Trees", xlab = "Diameter in Inches")
[Figure: histogram titled "Girth of Black Cherry Trees"; x-axis "Diameter in Inches" from 8 to 22, y-axis "Frequency" from 0 to 12]
Stemplots
To construct a stemplot:
1. Separate each observation into a stem (the first part of the number) and a leaf (the remaining part of the number).
2. Write the stems in a vertical column; draw a vertical line to the right of the stems.
3. Write each leaf in the row to the right of its stem; order the leaves if desired.
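The stemplot recipe can be written out directly. The sketch below defines a hypothetical helper, `simple.stemplot()` (not part of R), that uses the whole-number part of each observation as the stem and the tenths digit as the leaf. It assumes non-negative data recorded to one decimal place; R's built-in `stem()` chooses stems automatically, so its layout can differ.

```r
# Hypothetical helper (not a base R function).
# Stem = integer part, leaf = tenths digit; assumes non-negative data
# with one decimal place.
simple.stemplot <- function(x) {
  stems  <- floor(x)
  leaves <- round(10 * (x - stems))         # tenths digit of each observation
  all.stems <- seq(min(stems), max(stems))  # include empty stems in the column
  for (s in all.stems) {
    row.leaves <- sort(leaves[stems == s])  # order leaves within each stem
    cat(formatC(s, width = 3), "|", paste(row.leaves, collapse = ""), "\n")
  }
  # Return the leaves grouped by stem, without printing them again.
  invisible(split(leaves, factor(stems, levels = all.stems)))
}

simple.stemplot(trees$Girth)
```

For the tree girths these stems are whole inches, so the rows should line up with the `stem(Girth, scale = 2)` output on the next slide.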
> Girth = trees$Girth
> stem(Girth)  # stem and leaf plot

  The decimal point is at the |

   8 | 368
  10 | 57800123447
  12 | 099378
  14 | 025
  16 | 03359
  18 | 00
  20 | 6

> stem(Girth, scale = 2)

  The decimal point is at the |

   8 | 368
   9 |
  10 | 578
  11 | 00123447
  12 | 099
  13 | 378
  14 | 025
  15 |
  16 | 03
  17 | 359
  18 | 00
  19 |
  20 | 6
Examining Distributions
In any graph of data, look for the overall pattern and for striking deviations from that pattern.
You can describe the overall pattern by its shape, center, and spread.
An important kind of deviation is an outlier, an individual that falls outside the overall pattern.
Examining Distributions
A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure: three example distributions labeled Symmetric, Skewed-left, and Skewed-right]
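To see these definitions in action, the sketch below simulates one hypothetical data set of each shape. Using normal data for the symmetric case and exponential data for the skewed cases is an assumption made for illustration. It measures each side of the distribution from the median, the value that splits the observations into a lower and an upper half.

```r
# Hypothetical simulated data (not from the slides), one sample per shape.
set.seed(1)
symmetric  <- rnorm(1000, mean = 10, sd = 2)  # roughly mirror-image about its center
right.skew <- rexp(1000, rate = 1)            # long tail toward larger values
left.skew  <- 10 - rexp(1000, rate = 1)       # long tail toward smaller values

# Length of the left side (median down to the minimum) and of the
# right side (median up to the maximum).
tail.lengths <- function(x) {
  c(left = median(x) - min(x), right = max(x) - median(x))
}
tail.lengths(right.skew)  # right side is the longer one
tail.lengths(left.skew)   # left side is the longer one

# Side-by-side histograms of the three shapes.
par(mfrow = c(1, 3))
hist(symmetric,  main = "Symmetric")
hist(left.skew,  main = "Skewed-left")
hist(right.skew, main = "Skewed-right")
```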
Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
The overall pattern is fairly symmetrical except for two states that clearly do not belong to the main trend. Alaska and Florida have an unusual representation of the elderly in their populations.
[Figure: distribution across the states, with Alaska and Florida labeled]
A large gap in the distribution is typically a sign of an outlier.
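One rough way to act on the last point is to sort the data and look for the widest gap between neighboring values. The sketch below applies this heuristic to the tree girths from the earlier slides. It only flags values for a closer look; deciding whether a flagged value really is an outlier is a judgment call the heuristic does not settle.

```r
# Sort the observations and compute the gaps between neighbors.
x    <- sort(trees$Girth)
gaps <- diff(x)

# Locate the widest gap and report the values on either side of it.
i <- which.max(gaps)
c(below = x[i], above = x[i + 1], gap = gaps[i])

# Values on the upper side of the widest gap: candidates to examine,
# not observations to discard automatically.
x[x > x[i]]
```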
Measuring the Center