
Exploring Data: Graphing and Summarizing Univariate Data - PowerPoint PPT Presentation



  1. Exploring Data: Graphing and Summarizing Univariate Data

  2. Graphing the Data • Graphical displays of quantitative data include: ▫ Dotplot ▫ Stemplot ▫ Histogram ▫ Cumulative Frequency Plots (ogives) ▫ Boxplots

  3. Dotplot • As you might guess, a dotplot is made up of dots plotted on a graph. • Each dot can represent a single observation from a set of data, or a specified number of observations from a set of data. • The dots are stacked in a column over a category or value, so that the height of the column represents the frequency of observations in the category.

  4. Dotplot Example: Number of Dogs in Each Home in My Block [Figure: dotplot with 10 dots; horizontal axis: # of Dogs, from 0 to 3]
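As a rough illustration of how the dots stack into columns, here is a minimal Python sketch that prints a text dotplot. The function name `text_dotplot` and the data in `dogs` are hypothetical (the slide's actual counts are not recoverable), and the alignment assumes single-digit integer values.

```python
from collections import Counter

def text_dotplot(values):
    """Build a text dotplot: one '*' per observation, stacked over each value."""
    counts = Counter(values)
    xs = list(range(min(values), max(values) + 1))
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down; a column shows a dot on this row
    # only if its frequency reaches this height.
    for level in range(tallest, 0, -1):
        row = " ".join("*" if counts[x] >= level else " " for x in xs)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in xs))  # axis of values
    return "\n".join(rows)

# Hypothetical data: number of dogs in each of 10 homes.
dogs = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
print(text_dotplot(dogs))
```

The height of each column equals the frequency of that value, which is the property the dotplot slide describes.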

  5. Stemplot
     Stems | Leaves
        15 | 1
        14 |
        13 |
        12 | 2 6
        11 | 4 5 7 9
        10 | 1 2 2 2 5 7 9 9
         9 | 0 2 3 4 4 5 7 8 9 9
         8 | 1 1 4 7 8
     Key: 15 | 1 = 151

  6. Histogram • Note that the bars touch and the variable is quantitative

  7. Cumulative Frequency Plot • Often used for estimating medians, quartiles, and percentiles [Figure: Typical Wait Times; vertical axis: Cum Freq (%); horizontal axis: Wait Times (in Hrs.)]

  8. Boxplot • Based on the 5-Number Summary: Min, Q1, Med, Q3, Max
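A minimal Python sketch of the 5-number summary behind a boxplot. It assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd); other software may use a different quartile rule. The function name and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical wait times in hours.
waits = [1, 2, 2, 3, 4, 5, 6, 7, 9]
print(five_number_summary(waits))
```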

  9. Shapes of Boxplots • The previous boxplot was symmetric • [Figure: boxplot skewed left] • [Figure: boxplot skewed right]

  10. Checking for outliers • An outlier is any value that is either ▫ greater than Q3 + 1.5*IQR, or ▫ less than Q1 - 1.5*IQR • Note that whiskers always end at a data value
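The 1.5*IQR rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the only way to do it: quartiles are taken as medians of the lower and upper halves of the sorted data, and the names `quartiles` and `find_outliers` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return (outliers, whisker_ends) using the 1.5*IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    inside = [v for v in values if low_fence <= v <= high_fence]
    # Whiskers stop at actual data values, not at the fences themselves.
    return outliers, (min(inside), max(inside))

# Hypothetical data with one unusually large value.
data = [2, 3, 3, 4, 5, 5, 6, 7, 20]
print(find_outliers(data))
```

Here Q1 = 3 and Q3 = 6.5, so the fences are -2.25 and 11.75; the value 20 is flagged and the whiskers end at the data values 2 and 7.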

  11. What Is Required on ALL Plots? • Title • Labels on the horizontal and vertical axes - be sure if you are using 3 to represent 3,000 that that information is in the label • Scales on both axes (sometimes this is not needed, for example on boxplots) • Labels for each plot if the graph includes multiple data sets (e.g. parallel boxplots)

  12. How to Describe the Graphs Use your SOCS: o Shape o Outliers and/or other unusual features o Center o Spread Discuss all characteristics IN CONTEXT.

  13. Shape • Four Basic Shapes: • Symmetric • Uniform

  14. • Skewed left or skewed toward small values • Skewed right or skewed toward large values

  15. Should I Say Normal? Be careful when you describe the shape of a mound-shaped, approximately symmetric distribution. The distribution may or may not be normal. Graders will accept the description as approximately normal, but they will not accept that the distribution is normal based only on a mound-shaped, symmetric graph.

  16. Outliers and other Unusual Features The Usual Unusuals: • Gaps • Clusters • Outliers • Peaks – ex. Bimodal

  17. Center • Mean and median are both measures of center • Median – put the values in order and the median is the middle value (or the mean of the two middle values) – the median divides a histogram into two equal areas • Mean – add the values and divide by the number of values you have – the mean is the balance point for a histogram
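The two recipes for center can be written out step by step in Python. This is a small sketch; the function names and the sample lists are hypothetical.

```python
def my_median(values):
    """Sort the values, then take the middle one (or the mean of the two middle ones)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2:                      # odd count: a single middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # even count: mean of the two middle values

def my_mean(values):
    """Add the values and divide by how many there are."""
    return sum(values) / len(values)

odd_list = [3, 1, 7, 5, 9]
even_list = [3, 1, 7, 5]
print(my_median(odd_list), my_median(even_list), my_mean(odd_list))
```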

  18. Spread Several ways to describe: • Range – calculate max - min; the range gives you the total spread in the data. • IQR – calculate Q3 - Q1; IQR gives you the spread of the middle 50% of the data. • Standard deviation – roughly the typical distance of data values from the mean
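A minimal Python sketch of the three measures of spread. It assumes quartiles are the medians of the lower and upper halves of the sorted data, and it uses the sample standard deviation (divisor n - 1), which corresponds to Sx on the calculator. The function name and the data are hypothetical.

```python
from statistics import median, stdev

def spread_summary(values):
    """Return (range, IQR, sample standard deviation)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    data_range = data[-1] - data[0]        # max - min
    iqr = median(upper) - median(lower)    # Q3 - Q1
    return data_range, iqr, stdev(data)

# Hypothetical data.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread_summary(scores))
```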

  19. How Does the shape impact Mean and Median? • If the shape is approximately symmetric, the mean and median are approximately equal. • If the shape is skewed, the mean is closer to the tail than the median. Ex. Salaries – the mean will be larger than the median because salaries are usually skewed right
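A quick numeric check of the salary example, using made-up figures (in thousands of dollars) chosen to be skewed right. The numbers are hypothetical and serve only to show the mean being pulled toward the long tail.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars: most are similar,
# one is much larger, so the distribution is skewed right.
salaries = [30, 32, 35, 38, 40, 42, 45, 150]

print("mean:", mean(salaries))
print("median:", median(salaries))
print("mean is larger than median:", mean(salaries) > median(salaries))
```

The single large salary raises the mean to 51.5 while the median stays at 39.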

  20. The Converse May Not Be True Be careful – If the mean is not equal to the median, you cannot conclude automatically that the shape is skewed.

  21. Comparing Graphs "Compare" means to compare, not just list characteristics • Okay to say o The mean of x = 8 is less than the mean of y = 9. o The medians of x and y are about the same. o The median of x is slightly larger. o The shapes are both skewed left. • Not Okay o The mean of x is 8 and the mean of y is 9. o Median x = 4, median y = 4. o The shapes are similar.

  22. When Do You Use X-Bar/Sx and When Do You Use the 5-Number Summary? • If the distribution is symmetric, use mean and standard deviation. • If the distribution is skewed, use the 5-number summary. • Note that the mean and standard deviation are not resistant to outliers; the median and IQR are resistant.

  23. Other Key Locations on Distributions • Percentile – the smallest value x for which n percent of the data values are less than or equal to x. Ex. If the 80th percentile is 28, then 80% of the data values are 28 or less. • Quartiles – the 25th, 50th, and 75th percentiles. The 25th percentile is the lower or first quartile Q1, the 50th percentile is the median, and the 75th percentile is the upper or third quartile Q3. • Z-score – shows how many standard deviations a value is above or below the mean
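The percentile definition above and the z-score can be sketched in Python as follows. The function names and the data are hypothetical, the percentile follows the "smallest value with at least p percent at or below it" wording from the slide (software packages often interpolate instead), and the z-score here assumes the sample standard deviation.

```python
from statistics import mean, stdev

def percentile(values, p):
    """Smallest data value x such that at least p percent of values are <= x."""
    data = sorted(values)
    n = len(data)
    for x in data:
        at_or_below = sum(1 for v in data if v <= x)
        if at_or_below * 100 >= p * n:   # integer comparison avoids rounding issues
            return x
    return data[-1]

def z_score(x, values):
    """Number of (sample) standard deviations that x lies above or below the mean."""
    return (x - mean(values)) / stdev(values)

# Hypothetical data set of 10 values.
scores = [12, 15, 18, 20, 22, 25, 26, 28, 30, 35]
print(percentile(scores, 80))
print(z_score(35, scores))
```

With these ten values, 8 of them (80%) are 28 or less, so the 80th percentile is 28; a positive z-score means the value lies above the mean.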

  24. How do I get the summary values? • You can calculate most of the summary values using 1-Var Stats. • The order on the calculator is: 1-Var Stats L1, or 1-Var Stats L1, L2 when the data values are in L1 and the frequencies are in L2.
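For readers working without the calculator, here is a rough Python analogue of running 1-Var Stats on a value list with an optional frequency list. The function `one_var_stats` is hypothetical and returns only some of the calculator's outputs; the data are made up.

```python
from statistics import mean, median, stdev

def one_var_stats(values, freqs=None):
    """Summarize values (like L1), optionally repeated by frequencies (like L2)."""
    if freqs is None:
        freqs = [1] * len(values)
    # Expand each value according to its frequency, then sort.
    data = sorted(v for v, f in zip(values, freqs) for _ in range(f))
    return {
        "n": len(data),
        "mean": mean(data),
        "Sx": stdev(data),   # sample standard deviation
        "min": data[0],
        "med": median(data),
        "max": data[-1],
    }

# Hypothetical lists: number of dogs (values) and number of homes (frequencies).
L1 = [0, 1, 2, 3]
L2 = [4, 3, 2, 1]
print(one_var_stats(L1, L2))
```

Passing the frequency list gives the same result as typing out every observation individually.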

  25. Categorical Data Displays

  26. Frequency Tables: Grades Earned on Test 1
      Grade   Frequency
      A       10
      B       15
      C       5
      D       2
      F       1

  27. Bar Chart

  28. Segmented Bar Chart Hobbies By Gender

  29. Two Way Tables: Favorite Leisure Activities
               Dance   Sports   TV   Total
      Men        2      10       8     20
      Women     16       6       8     30
      Total     18      16      16     50

  30. One Other Graph – The Pie Chart Sorry – couldn’t resist GOOD LUCK ON THE EXAM!!!
