Frequency Distribution and Summary Statistics Dongmei Li - PowerPoint PPT Presentation

Frequency Distribution and Summary Statistics
Dongmei Li
Department of Public Health Sciences, Office of Public Health Studies
University of Hawai’i at Mānoa

Outline
1. Stemplot
2. Frequency table
3. Summary statistics

1. Stem-and-leaf plots (stemplots)
• Always start by looking at the data with graphs and plots
• Our favorite technique for looking at a single variable is the stemplot
• A stemplot is a graphical technique that organizes data into a histogram-like display
"You can observe a lot by looking" – Yogi Berra

Stemplot Illustrative Example
• Select a simple random sample (SRS) of 10 ages
• List the data as an ordered array: 05 11 21 24 27 28 30 42 50 52
• Divide each data point into a stem value and a leaf value
• In this example the "tens place" is the stem value and the "ones place" is the leaf value; e.g., 21 has a stem value of 2 and a leaf value of 1

Stemplot illustration (cont.)
• Draw an axis for the stem values:
    0|
    1|
    2|1
    3|
    4|
    5|
    ×10  (axis multiplier; important!)
• Place leaves next to their stem value; here the value 21 has been plotted

Stemplot illustration (cont.)
• Plot all data points and rearrange in rank order:
    0|5
    1|1
    2|1478
    3|0
    4|2
    5|02
    ×10
• Here is the plot rotated to the horizontal (for demonstration purposes):
        8
        7
        4     2
    5 1 1 0 2 0
    ------------
    0 1 2 3 4 5
    ------------
    Rotated stemplot
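The construction steps above (split each value into stem and leaf, then sort the leaves on each stem) can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `stemplot` and the parameter `stem_unit` are assumptions made for the example, and it handles non-negative integers only.

```python
from collections import defaultdict

def stemplot(values, stem_unit=10):
    """Return stemplot rows for non-negative integers.

    The stem is value // stem_unit and the leaf is value % stem_unit,
    so stem_unit=10 puts the tens place on the stem.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves in rank order
        leaves[v // stem_unit].append(v % stem_unit)
    lo, hi = min(leaves), max(leaves)
    # Walk every stem from lowest to highest so empty stems still get a row.
    return [
        f"{stem}|{''.join(str(leaf) for leaf in leaves[stem])}"
        for stem in range(lo, hi + 1)
    ]

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
plot = stemplot(ages)
print("\n".join(plot))
print("x10")  # axis multiplier
```

Keeping empty stems in the output matters: dropping them would hide gaps and distort the shape of the distribution.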

Interpreting Stemplots
• Shape
  ◦ Symmetry
  ◦ Modality (number of peaks)
  ◦ Kurtosis (width of tails)
  ◦ Departures (outliers)
• Location
  ◦ Gravitational center → mean
  ◦ Middle value → median
• Spread
  ◦ Range and interquartile range
  ◦ Standard deviation and variance

Shape
• "Shape" refers to the pattern of the data when plotted
• Here is the silhouette of our data:
        X
        X
        X     X
    X X X X X X
    -----------
    0 1 2 3 4 5
    -----------
• Consider: symmetry, modality, kurtosis

Shape: Idealized Density Curve
A large dataset is introduced. A density curve is superimposed to better discuss shape.

Symmetrical Shapes

Asymmetrical Shapes

Modality (no. of peaks)

Kurtosis (width of tails)
• Mesokurtic (medium)
• Platykurtic (flat) → skinny tails
• Leptokurtic (steep) → fat tails
Kurtosis is not easily judged by eye.

Stemplot – Second Example
• Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
• Stem = ones place
• Leaves = tenths place
• Round to keep one digit after the decimal point (e.g., 1.47 → 1.5)
• Do not plot the decimal point
    1|5
    2|14
    3|4789
    4|4
    (×1)
• Shape: asymmetric, skewed to the left, unimodal, no outliers

Draw a stemplot using JMP
• Open the JMP data set named Stem_and_leaf_plot.jmp
• Analyze > Distribution > Data > Stem and Leaf

Third Illustrative Example (n = 26)
• Age data set from 26 subjects: {14, 17, 18, 19, 22, 22, 23, 24, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38}
• Data set: Stem_and_leaf_plot_example2.jmp
• What is the distribution of the age variable?

2. Frequency Table
  AGE  | Freq  Rel.Freq  Cum.Freq
-------+-------------------------
    3  |    2     0.3%      0.3%
    4  |    9     1.4%      1.7%
    5  |   28     4.3%      6.0%
    6  |   37     5.7%     11.6%
    7  |   54     8.3%     19.9%
    8  |   85    13.0%     32.9%
    9  |   94    14.4%     47.2%
   10  |   81    12.4%     59.6%
   11  |   90    13.8%     73.4%
   12  |   57     8.7%     82.1%
   13  |   43     6.6%     88.7%
   14  |   25     3.8%     92.5%
   15  |   19     2.9%     95.4%
   16  |   13     2.0%     97.4%
   17  |    8     1.2%     98.6%
   18  |    6     0.9%     99.5%
   19  |    3     0.5%    100.0%
-------+-------------------------
 Total |  654   100.0%
• Frequency = count
• Relative frequency = proportion or %
• Cumulative frequency = % less than or equal to the level
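As a rough sketch of how the relative and cumulative columns follow from the counts, the Python below recomputes them from the frequencies listed in the table. The variable names are assumptions for illustration; percentages are rounded to one decimal place to match the table's presentation.

```python
from itertools import accumulate

ages = list(range(3, 20))  # levels 3 through 19
counts = [2, 9, 28, 37, 54, 85, 94, 81, 90, 57, 43, 25, 19, 13, 8, 6, 3]

total = sum(counts)
cumulative = list(accumulate(counts))  # running count of subjects at or below each age

# One row per level: (age, frequency, relative %, cumulative %)
rows = [
    (age, c, round(100 * c / total, 1), round(100 * cum / total, 1))
    for age, c, cum in zip(ages, counts, cumulative)
]

for age, freq, rel, cum in rows:
    print(f"{age:>5} | {freq:>4} {rel:>7.1f}% {cum:>8.1f}%")
print(f"Total | {total:>4}")
```

Computing the cumulative percentage from the running count, rather than by adding rounded relative percentages, avoids accumulating rounding error.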

Frequency Table with Class Intervals
• When data are sparse, group data into class intervals
• Create 4 to 12 class intervals
• Classes can be uniform or non-uniform
• End-point convention: e.g., a first class interval of 0 to 10 will include 0 but exclude 10 (0 to 9.99)
• Tally frequencies
• Calculate relative frequency
• Calculate cumulative frequency

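Class Intervals
Uniform class intervals table (width 10) for data: 05 11 21 24 27 28 30 42 50 52

  Class   | Freq  Relative Freq (%)  Cumulative Freq (%)
----------+---------------------------------------------
  0 – 9   |   1          10                  10
 10 – 19  |   1          10                  20
 20 – 29  |   4          40                  60
 30 – 39  |   1          10                  70
 40 – 49  |   1          10                  80
 50 – 59  |   2          20                 100
----------+---------------------------------------------
  Total   |  10         100                  --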
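The tallying procedure for uniform class intervals can be sketched as follows. This is an illustrative helper, not from the slides; `frequency_table` and its tuple layout are assumptions. It applies the end-point convention described earlier: each class includes its lower end point and excludes its upper one.

```python
def frequency_table(values, width):
    """Tally values into uniform classes of the given width.

    Returns rows of (lower, upper_label, freq, relative %, cumulative %).
    upper_label is lower + width - 1, which suits integer data such as ages.
    """
    n = len(values)
    first = (min(values) // width) * width
    last = (max(values) // width) * width
    rows, running = [], 0
    for lower in range(first, last + 1, width):
        # Include the lower end point, exclude the upper end point.
        freq = sum(lower <= v < lower + width for v in values)
        running += freq
        rows.append((lower, lower + width - 1, freq,
                     100 * freq / n, 100 * running / n))
    return rows

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
table = frequency_table(ages, width=10)
for lower, upper, freq, rel, cum in table:
    print(f"{lower:>2} - {upper:<2} | {freq:>2} {rel:>6.0f} {cum:>6.0f}")
```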

Histogram
A histogram is a frequency chart for a quantitative measurement. Notice how the bars touch.
[Figure: histogram of frequency (0 to 5) by age class: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59]

Bar Chart
A bar chart with non-touching bars is reserved for categorical measurements and non-uniform class intervals.
[Figure: bar chart of counts (0 to 500) by school level: Pre-, Elem., Middle, High]

3. Summary Statistics
• Central location
  ◦ Mean
  ◦ Median
  ◦ Mode
• Spread
  ◦ Range and interquartile range (IQR)
  ◦ Variance and standard deviation

Location: Mean
• "Eyeball method": visualize where the plot would balance
• Arithmetic method: sum the values and divide by n
        8
        7
        4     2
    5 1 1 0 2 0
    ------------
    0 1 2 3 4 5
    ------------
         ^
    Grav. center
• Eyeball method → around 25 to 30 (takes practice)
• Arithmetic method → mean = 290 / 10 = 29

Notation
• $n$ → sample size
• $X$ → the variable (e.g., ages of subjects)
• $x_i$ → the value of individual $i$ for variable $X$
• $\sum$ → sum all values (capital sigma)
• Illustrative data (ages of participants): 21 42 5 11 30 50 28 27 24 52
  $n = 10$
  $X$ = AGE variable
  $x_1 = 21$, $x_2 = 42$, …, $x_{10} = 52$
  $\sum x_i = x_1 + x_2 + \cdots + x_{10} = 21 + 42 + \cdots + 52 = 290$

Central Location: Sample Mean
• "Arithmetic average"
• Traditional measure of central location
• Sum the values and divide by $n$
• "xbar" refers to the sample mean:
  $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$

Example: Sample Mean
Ten individuals selected at random have the following ages: 21 42 5 11 30 50 28 27 24 52
Note that $n = 10$, $\sum x_i = 21 + 42 + \cdots + 52 = 290$, and
  $\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(290) = 29.0$
The sample mean is the gravitational center of a distribution.
[Figure: number line from 0 to 60 balancing at mean = 29]
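A short Python check of this example, written as a sketch with illustrative variable names. It also verifies the "gravitational center" idea: deviations from the mean sum to zero, which is what it means for the plot to balance at that point.

```python
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)          # sample size
total = sum(ages)      # sum of all x_i
mean = total / n       # xbar = (1/n) * sum of x_i

# Deviations above and below the mean cancel out exactly at the balance point.
deviations = [x - mean for x in ages]

print(n, total, mean)
print(sum(deviations))
```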

Uses of the Sample Mean
The sample mean can be used to predict:
• The value of an observation drawn at random from the sample
• The value of an observation drawn at random from the population
• The population mean

Population Mean
  $\mu = \frac{\sum x_i}{N} = \frac{1}{N}\sum x_i$
• Same operation as the sample mean except based on the entire population ($N$ ≡ population size)
• Conceptually important
• Usually not available in practice
• Sometimes referred to as the expected value

Central Location: Median
• Ordered array: 05 11 21 24 27 28 30 42 50 52
• When n is even, the median is the average of the (n ÷ 2)th and (n ÷ 2 + 1)th values.
• When n is odd, the median is the ((n + 1) ÷ 2)th value.
• For the illustrative data, n = 10, so the median falls between the 5th and 6th values, 27 and 28. Average the adjacent values: M = (27 + 28) ÷ 2 = 27.5

More Examples of Medians
• Example A: 2 4 6 → Median = 4
• Example B: 2 4 6 8 → Median = 5 (average of 4 and 6)
• Example C: 6 2 4 → Median ≠ 2 (values must be ordered first)
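The even/odd rule for the median can be sketched in a few lines of Python. The function below is an illustrative implementation (Python's standard library also offers `statistics.median`); note that it sorts first, which is exactly the step Example C warns about.

```python
def median(values):
    """Median by the even/odd rule; values are ordered before indexing."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th value, which is index mid when counting from 0
        return ordered[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 11, 21, 24, 27, 28, 30, 42, 50, 52]))
print(median([2, 4, 6]))
print(median([2, 4, 6, 8]))
print(median([6, 2, 4]))
```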

The Median is Robust
The median is more resistant to skews and outliers than the mean; it is more robust.
This data set has a mean of 1600: 1362 1439 1460 1614 1666 1792 1867
Here is the same data set with a data entry error "outlier" in the last value. This data set has a mean of 2743: 1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances, demonstrating its robustness in the face of outliers.
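The comparison can be recomputed directly from the listed values. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

clean = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = clean[:-1] + [9867]   # last value replaced by the data entry error

for label, data in [("clean", clean), ("with error", with_error)]:
    print(f"{label:>10}: mean = {mean(data):.0f}, median = {median(data)}")
```

A single bad value shifts the mean by more than a thousand units while leaving the median unchanged, because the median depends only on the middle of the ordered data.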

Mode
• The mode is the most commonly encountered value in the data set
• This data set has a mode of 7: {4, 7, 7, 7, 8, 8, 9}
• This data set has no mode: {4, 6, 7, 8} (each point appears only once)
• The mode is useful only in large data sets with repeating values
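A minimal sketch of finding the mode by counting occurrences. The helper name `modes` is an assumption for illustration; it returns an empty list when every value appears only once, following the "no mode" convention above, and it expects a non-empty input.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # each point appears only once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([4, 7, 7, 7, 8, 8, 9]))
print(modes([4, 6, 7, 8]))
```

Returning a list rather than a single value leaves room for data sets in which two or more values tie for the highest count.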

Comparison of Mean, Median, Mode
Note how the mean gets pulled toward the longer tail more than the median does.
• mean = median → symmetrical distribution
• mean > median → positive skew
• mean < median → negative skew
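As a rough illustration of this rule of thumb, the sketch below applies it to the eight values from the second stemplot example, which was described as skewed to the left. The function name `skew_direction` is hypothetical, and the comparison is only a heuristic: with real measurements the mean and median are rarely exactly equal, so a tolerance may be preferable to strict equality.

```python
from statistics import mean, median

def skew_direction(data):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    m, md = mean(data), median(data)
    if m > md:
        return "positive skew"
    if m < md:
        return "negative skew"
    return "symmetrical"

values = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
print(round(mean(values), 2), round(median(values), 3))
print(skew_direction(values))
```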

Spread: Quartiles
• Two distributions can be quite different yet have the same mean
• This data compares particulate matter in air samples (μg/m³) at two sites. Both sites have a mean of 36, but Site 1 exhibits much greater variability. We would miss the high pollution days if we relied solely on the mean.
    Site 1 |   | Site 2
    -------+---+-------
        42 | 2 |
         8 | 2 |
         2 | 3 | 234
        86 | 3 | 6689
         2 | 4 | 0
           | 4 |
           | 5 |
           | 5 |
           | 6 |
         8 | 6 |
            ×10

Spread: Range
• Range = maximum – minimum
• Illustrative example:
  Site 1 range = 68 – 22 = 46
  Site 2 range = 40 – 32 = 8
    Site 1 |   | Site 2
    -------+---+-------
        42 | 2 |
         8 | 2 |
         2 | 3 | 234
        86 | 3 | 6689
         2 | 4 | 0
           | 4 |
           | 5 |
           | 5 |
           | 6 |
         8 | 6 |
            ×10
• Beware: the sample range will tend to underestimate the population range.
• Always supplement the range with at least one additional measure of spread
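The two-site comparison can be checked with a short Python sketch. The value lists below are read off the back-to-back stemplot (stem × 10 plus leaf) and are therefore an assumption about the underlying data; the function name `value_range` is illustrative.

```python
from statistics import mean

# Values read from the back-to-back stemplot (stem * 10 + leaf).
site1 = [22, 24, 28, 32, 36, 38, 42, 68]
site2 = [32, 33, 34, 36, 36, 38, 39, 40]

def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

for name, data in [("Site 1", site1), ("Site 2", site2)]:
    print(f"{name}: mean = {mean(data):.2f}, range = {value_range(data)}")
```

The means are nearly identical (about 36 at both sites), yet the ranges differ by a factor of almost six, which is why a measure of spread should accompany any measure of central location.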
