Chapter 4.1 Scatter Diagrams and Linear Correlation
Learning Objectives
At the end of this lecture, the student should be able to:
• Explain what a scattergram is and how to make one
• State what “strength” and “direction” mean with respect to correlations
• Compute correlation coefficient r using the computational formula
• Describe why correlation is not necessarily causation

Introduction
• Making a scatter diagram
• Correlation coefficient r
• Causation and lurking variables
Photograph provided by Dr. John Bollinger

Scattergram (Also Called a Scatter Plot)

Scattergrams Graph x,y Pairs
• The explanatory (independent) variable is called x
• It is graphed on the x-axis
• The response (dependent) variable is called y
• It is graphed on the y-axis
• Trick to memorizing: x → y. x comes before y, so x “causes” y.
• A scatter diagram is a graph of these x,y pairs
[Figure: empty grid with the x axis and the y axis each running from 0 to 8]

Scattergrams Graph x,y Pairs
Does the number of diagnoses a patient has correlate with the number of medications s/he takes?

x (# of dx)    y (# of meds)
1              3
3              5
4              4
7              6

[Figure: scatter diagram of the four pairs, plotted one point at a time. x axis: Number of Diagnoses (0 to 8). y axis: Number of Medications (0 to 8).]
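
The plotting step can be sketched in code. This is a minimal illustration, not part of the lecture: the function name ascii_scatter and the text-grid approach are assumptions chosen so the example runs without a plotting library. It marks each (diagnoses, medications) pair from the table with a star on a 0 to 8 grid.

```python
# The four (diagnoses, medications) pairs from the slide's table.
pairs = [(1, 3), (3, 5), (4, 4), (7, 6)]


def ascii_scatter(points, max_value=8):
    """Return a text scatter diagram: x runs left to right, y runs bottom to top.

    Hypothetical helper for illustration only.
    """
    marked = set(points)
    rows = []
    for y in range(max_value, -1, -1):  # the top row holds the largest y
        cells = ["*" if (x, y) in marked else "." for x in range(max_value + 1)]
        rows.append(f"{y} | " + " ".join(cells))
    # Bottom line: the x-axis tick labels, aligned under the cells.
    rows.append("    " + " ".join(str(x) for x in range(max_value + 1)))
    return "\n".join(rows)


print(ascii_scatter(pairs))
```

Each star sits where its x value (column) meets its y value (row), which is the same thing you do by hand when you plot a pair on graph paper.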

Linear Correlation
• Linear correlation means that when you make a scatterplot of x,y pairs, it looks kind of like a line
• “Perfect” linear correlation looks like graphing points in algebra

x    y
1    2
2    4
3    6
4    8

[Figure: the four points plotted on 0 to 8 axes, falling exactly on a line]

Facts About Linear Correlation
• The line can go up. This is a positive correlation. (Example plot: Number of Medications vs. Number of Diagnoses)
• The line can go down. This is a negative correlation. (Example plot: Number of Nurses Staffed on Shift vs. Number of Patient Complaints)
• The line can be flat (horizontal). This is no correlation. (Example plot: Days Spent in Hospital vs. Total Unique Visitors)
• The line can be goofy. This is also no correlation. (Example plot: Number of Books vs. Number of Games)

Correlation Has Two Attributes
Direction
• Positive correlation
• Negative correlation
• No correlation
Strength
• Strength refers to how close to the line all the dots fall.
• If they fall really close to the line, it is strong
• If they fall kind of close to the line, it is moderate
• If they aren’t very close to the line, it is weak
[Figures: four example scatterplots on 0 to 8 axes: strong negative, strong positive, moderate positive, and weak positive. The weak positive plot has a callout on one stray point: “Hey, what’s that?? Outlier!”]

Outliers in Correlation
• Outliers can have a very powerful effect on a correlation
• An outlier in any of the 4 corners of the plot can really affect the direction of the line
• An outlier can also change the correlation from strong or moderate to weak
• It’s good to look at a scatterplot to make sure you identify outliers
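
To see how much one outlier can matter, here is a small sketch. The data are hypothetical (not from the lecture), and the helper pearson_r is an illustrative name. It uses the computational formula for r that the lecture presents later.

```python
from math import sqrt


def pearson_r(pairs):
    """Sample correlation coefficient r via the computational formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical data: six points that fall exactly on a rising line.
on_the_line = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
# The same points plus one outlier in the lower-right corner of the plot.
with_outlier = on_the_line + [(7, 0)]

r_without = pearson_r(on_the_line)
r_with = pearson_r(with_outlier)
print(f"r without the outlier: {r_without:.2f}")
print(f"r with the outlier:    {r_with:.2f}")
```

In this made-up example a single corner point drags r from about 1.00 down to 0.25, which is why it pays to look at the scatterplot before trusting the number.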

Correlation Coefficient r: Putting a Number on Correlation

Correlation Coefficient r
• Remember “coefficient” from CV (coefficient of variation)?
• Coefficient just means a number
• r stands for the sample correlation coefficient
• Remember! Corrrrrrrrrrrrrrrrrrelation
• Population correlation coefficient = ρ (rho)
• We will only focus on r

What is r?
What it is
• A numerical quantification of how correlated a set of x,y pairs are
• Calculated by plugging x,y pairs into an equation
• Has a defining formula and a computational formula
• I will demonstrate the computational formula
How to interpret it
• The r calculation produces a number
• The lowest number possible is -1.0 (perfect negative correlation)
• The highest possible number is 1.0 (perfect positive correlation)
• All others are in-between

Examples of Negative r
[Figures: three scatterplots with r = -0.25, r = -0.70, and r = -0.44]
OPINION!!! For negative correlations:
• 0.0 to -0.40: Weak
• -0.40 to -0.70: Moderate
• -0.70 to -1.0: Strong

Examples of Positive r
[Figures: two scatterplots with r = 0.66 and r = 0.92]
OPINION!!! For positive correlations:
• 0.0 to 0.40: Weak
• 0.40 to 0.70: Moderate
• 0.70 to 1.0: Strong
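
These rule-of-thumb ranges can be written as a small lookup. This is a sketch only: the function name describe_r is hypothetical, and because the slide's ranges share their endpoints, the code assumes that exactly 0.40 counts as moderate and exactly 0.70 counts as strong. Labeling r = 0 as "no correlation" is also an assumption.

```python
def describe_r(r):
    """Label r with a strength and direction using the lecture's opinion-based ranges.

    Assumptions (not stated on the slides): a magnitude of exactly 0.40 is
    treated as moderate, exactly 0.70 as strong, and r == 0 as no correlation.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1.0 and 1.0")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on the distance from zero
    if size < 0.40:
        strength = "weak"
    elif size < 0.70:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"


# The r values shown on the example slides.
for r in (-0.25, -0.70, -0.44, 0.66, 0.92):
    print(f"r = {r:+.2f}: {describe_r(r)}")
```

Remember that the cutoffs themselves are an opinion, so another textbook or instructor may draw the lines elsewhere.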

Calculating r: Computational Formula

Computational Formula
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
• FLASHBACK! …to Chapter 3.2
• Notice all the Σ’s
• As before, we will make columns and make calculations
• Then add up the columns to get these Σ’s
Hypothetical Scenario
• We have 7 patients
• They have come to the clinic for appointments throughout the year
• We predict those with a higher diastolic blood pressure (DBP) will have more appointments
• We take DBP at last appointment as “x”
• We take number of appointments over the year as “y”
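
The computational formula translates almost line for line into code. The sketch below is an illustration, not the lecture's own material: the function name computational_r and the error handling are assumptions. It is tried on the four "perfect" points from the Linear Correlation slide.

```python
from math import sqrt


def computational_r(xs, ys):
    """Compute r from the five sums in the computational formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)                              # Σx
    sum_y = sum(ys)                              # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy
    sum_x2 = sum(x * x for x in xs)              # Σx²
    sum_y2 = sum(y * y for y in ys)              # Σy²

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # Happens when every x (or every y) is the same value.
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator


# The perfectly linear pairs from the Linear Correlation slide.
r = computational_r([1, 2, 3, 4], [2, 4, 6, 8])
print(f"r = {r:.2f}")
```

Note that Σx² (square each x, then add) and (Σx)² (add the x values, then square the total) are different quantities, and the formula needs both.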

x = DBP, y = # of Appointments
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

#    x      y     x²    y²    xy
1    70     3
2    115    45
3    105    21
4    82     7
5    93     16
6    125    62
7    88     12

Σx = 678, Σy = 166
The x², y², and xy columns are still empty. Σxy will go at the bottom of the xy column.
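
The column-building procedure for this table can be sketched as follows. The variable names are illustrative choices. The x and y values are the seven patients from the table, and everything else is computed from them rather than taken from the slides.

```python
from math import sqrt

# x = DBP at last appointment, y = number of appointments over the year.
dbp = [70, 115, 105, 82, 93, 125, 88]
appointments = [3, 45, 21, 7, 16, 62, 12]

# Fill in the three empty columns of the table.
x_squared = [x * x for x in dbp]
y_squared = [y * y for y in appointments]
xy = [x * y for x, y in zip(dbp, appointments)]

print(f"{'#':>2} {'x':>5} {'y':>4} {'x^2':>7} {'y^2':>6} {'xy':>7}")
rows = zip(dbp, appointments, x_squared, y_squared, xy)
for i, (x, y, x2, y2, prod) in enumerate(rows, start=1):
    print(f"{i:>2} {x:>5} {y:>4} {x2:>7} {y2:>6} {prod:>7}")

# Add up each column to get the Σ's the formula needs.
n = len(dbp)
sum_x, sum_y = sum(dbp), sum(appointments)
sum_x2, sum_y2, sum_xy = sum(x_squared), sum(y_squared), sum(xy)
print(f"Σx = {sum_x}, Σy = {sum_y}, Σx² = {sum_x2}, Σy² = {sum_y2}, Σxy = {sum_xy}")

# Plug the sums into the computational formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(f"r = {r:.3f}")
```

The first two sums should match the Σx = 678 and Σy = 166 already shown on the slide, which is a quick check that the data were entered correctly. With these seven patients r comes out to roughly 0.95.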