Probability and Statistics for Computer Science: "Correlation is not Causation" - PowerPoint PPT Presentation

  1. Probability and Statistics for Computer Science: "Correlation is not Causation," but correlation is so beautiful! (Image credit: Wikipedia.) Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020

  2. • Please use the "#" sign in your chat to indicate a formal question or comment. • Please keep your mic muted for the sound quality of the Zoom session. • Please check out the websites. • Code & simulation notebook are in the chat.

  3. Last time • Location parameters: mode, mean, median • Scale parameters: interquartile range (iqr), standard deviation ($\sigma$), variance ($\sigma^2$) • Data: standardizing, $x \to \hat{x}$

  4. Objectives • Median, interquartile range, box plot and outliers • Scatter plots, correlation coefficient, heatmap, 3D bar, time series plots • Visualizing & summarizing relationships

  5. Median • To organize the data, we first sort it • Then, if the number of items N is odd, the median is the middle item's value; if N is even, the median is the mean of the two middle items' values
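
The sort-then-pick rule above can be sketched directly in Python. The function name `median` is our own illustrative choice. The standard library's `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median by the slide's rule: sort, then take the middle item (odd N)
    or the mean of the two middle items (even N)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # odd N: the single middle item
    return (xs[mid - 1] + xs[mid]) / 2   # even N: mean of the two middle items

print(median([3, 1, 2]))      # 2
print(median([4, 1, 3, 2]))   # 2.5
```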

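  6. Properties of Median • Scaling data scales the median: $\mathrm{median}(\{k x_i\}) = k \cdot \mathrm{median}(\{x_i\})$ • Translating data translates the median: $\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$ • The median also minimizes the sum of absolute deviations: $\mathrm{median}(\{x_i\}) = \arg\min_{\mu} \sum_i |x_i - \mu|$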
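
A quick numerical check of these properties, using a small made-up data set and `statistics.median` from the standard library. The last part tests the argmin property by brute force over a grid of candidate values of $\mu$. The grid and the helper `abs_loss` are illustrative assumptions, not part of the slides.

```python
import statistics

data = [2.0, 7.0, 1.0, 9.0, 4.0]   # made-up data
k, c = -3.0, 10.0

# Scaling by k scales the median by k (this holds even for negative k,
# because reversing the order leaves the middle item in the middle).
assert statistics.median([k * x for x in data]) == k * statistics.median(data)

# Translating by c shifts the median by c.
assert statistics.median([x + c for x in data]) == statistics.median(data) + c

# The median minimizes sum_i |x_i - mu|; search candidate mu values 0.00 .. 10.00.
def abs_loss(mu):
    return sum(abs(x - mu) for x in data)

grid = [i / 100 for i in range(0, 1001)]
best = min(grid, key=abs_loss)
print(best, statistics.median(data))   # 4.0 4.0
```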

  7. Percentile • The kth percentile is the value relative to which k% of the data items have smaller or equal values • The median is roughly the 50th percentile
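
One way to read this definition is the "nearest-rank" rule: take the smallest data value such that at least k% of the items are less than or equal to it. The sketch below assumes that reading. Libraries differ here; NumPy's `percentile`, for example, interpolates by default. The function name `percentile` and the data are illustrative.

```python
import math

def percentile(values, k):
    """Smallest data value v such that at least k% of the items are <= v
    (nearest-rank reading of the definition; an assumption, not a library API)."""
    xs = sorted(values)
    if not xs or not 0 < k <= 100:
        raise ValueError("need non-empty data and 0 < k <= 100")
    rank = math.ceil(k * len(xs) / 100)   # 1-based rank of the answer
    return xs[rank - 1]

data = [1, 2, 3, 4, 5, 6, 7]   # made-up data
print(percentile(data, 75))  # 6: 6/7 (about 86%) of the items are <= 6, only 5/7 are <= 5
print(percentile(data, 50))  # 4, which is also the median here
```

For an even number of items, the nearest-rank 50th percentile is one of the two middle items rather than their mean. That gap is why the median is only "roughly" the 50th percentile.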

  8. Interquartile range • $\mathrm{iqr} = (\text{75th percentile}) - (\text{25th percentile})$ • Scaling data scales the interquartile range: $\mathrm{iqr}(\{k x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$ • Translating data does NOT change the interquartile range: $\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$
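
A sketch that checks both iqr properties on made-up data. The quartiles here come from `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between sorted items. That convention is our assumption; a different percentile convention can give slightly different iqr values.

```python
import math
import statistics

def iqr(values):
    """75th minus 25th percentile, with linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [3.0, 8.0, 1.0, 12.0, 5.0, 7.0, 20.0, 2.0]   # made-up data
k, c = -2.5, 100.0

print(iqr(data))  # 6.25
# Scaling by k multiplies the iqr by |k|: a spread can never become negative.
assert math.isclose(iqr([k * x for x in data]), abs(k) * iqr(data))
# Translating by c leaves the iqr unchanged.
assert math.isclose(iqr([x + c for x in data]), iqr(data))
```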

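  9. Box plots • Example: vehicle deaths by region • Box plots are simpler than histograms • Good for spotting outliers • Easier to use for comparison • Data from https://www2.stetson.edu/~jrasp/data.htm

  10. Box plot details and outliers • Box: spans the interquartile range (iqr), with a line at the median • Whiskers: extend to the most extreme data points that lie within 1.5 iqr of the box • How to define outliers? By default, an outlier is a point more than 1.5 iqr beyond the box, drawn individually past the end of the whisker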
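
The default outlier rule above can be sketched as follows. The function `tukey_outliers` and its `whis` argument are hypothetical names; `whis` mirrors the parameter of the same name in matplotlib's `boxplot`, whose default is also 1.5. The quartiles are linearly interpolated, which is an assumption about the convention.

```python
import statistics

def tukey_outliers(values, whis=1.5):
    """Return the points lying more than whis * iqr below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - whis * spread, q3 + whis * spread
    return [x for x in values if x < low or x > high]

data = [4, 5, 5, 6, 6, 7, 7, 8, 30]   # made-up data with one suspicious value
print(tukey_outliers(data))  # [30]
```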

  11. Q. TRUE or FALSE: the mean is more sensitive to outliers than the median. A. True B. False (Answer: A, True)

  12. Q. TRUE or FALSE: the interquartile range is more sensitive to outliers than the standard deviation. A. True B. False (Answer: B, False)

  13. Sensitivity of summary statistics to outliers • The mean and standard deviation are very sensitive to outliers • The median and interquartile range are not sensitive to outliers
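
To see this sensitivity in practice, the sketch below replaces one value of a made-up data set with an outlier and recomputes all four summaries. `statistics.pstdev` is the population standard deviation (dividing by N). The iqr uses linearly interpolated quartiles, which is an assumed convention.

```python
import statistics

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [4, 5, 5, 6, 6, 7, 7, 8, 9]    # made-up data
with_outlier = clean[:-1] + [90]       # replace the largest value (9) with an outlier

summaries = {
    "mean": statistics.mean,
    "std": statistics.pstdev,          # population std (divide by N)
    "median": statistics.median,
    "iqr": iqr,
}
for name, f in summaries.items():
    print(f"{name:>6}: clean = {f(clean):6.2f}, with outlier = {f(with_outlier):6.2f}")
```

The mean and std jump, while the median stays at 6 and the iqr stays at 2.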

  14. Modes • Modes are peaks in a histogram • If there is more than one mode, we should be curious as to why

  15. Multiple modes • We have seen the "iris" data, which appears to have several peaks • Data: "iris" in R

  16. Example: bimodal distribution • Modes may indicate multiple populations • Data: erythrocyte (red blood) cells in healthy humans, Piagnerelli, JCP 2007

  17. Tails and skews [Figure: histogram annotated with its tails and an outlier. Credit: Prof. Forsyth]

  19. Q. How is this skewed? A. Left B. Right • mean = ?, median = 47
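
A common rule of thumb, not a guarantee, is that a long right tail pulls the mean above the median and a long left tail pulls it below. The sketch below checks this on a made-up right-skewed sample and on its mirror image.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15, 30]   # made-up data with a long right tail
left_skewed = [-x for x in right_skewed]              # mirror image: a long left tail

for name, data in [("right tail", right_skewed), ("left tail", left_skewed)]:
    m, med = statistics.mean(data), statistics.median(data)
    relation = "mean > median" if m > med else "mean < median"
    print(f"{name}: mean = {m:.2f}, median = {med}, {relation}")
```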

  20. Looking at relationships in data • Finding relationships between features in a data set, or across many data sets, is one of the most important tasks in data analysis

  21. Relationship between data features • Example: does people's weight relate to their height? • x: HEIGHT, y: WEIGHT

  22. Scatter plot • Body Fat data set

  23. Scatter plot • Scatter plot with density

  24. Scatter plot • With outliers removed and the data standardized

  25. Correlation (closely related to covariance)

  26. Correlation seen from scatter plots [Figure panels: zero correlation, positive correlation, negative correlation. Credit: Prof. Forsyth]

  27. What kind of correlation? • Lines of code in a database and number of bugs • Frequency of hand washing and number of germs on your hands • GPA and hours spent playing video games • Earnings and happiness • Credit: Prof. David Varodayan

  28. Correlation doesn't mean causation • Shoe size is correlated with reading skills, but that doesn't mean making a person's feet grow will make them read faster

  29. Correlation Coefficient • Given a data set consisting of items $\{(x_i, y_i)\}$, i.e. $(x_1, y_1), \ldots, (x_N, y_N)$ • Standardize the coordinates of each feature: $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, $\hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$ • Define the correlation coefficient as: $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i$

  30. Correlation Coefficient • With $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$ and $\hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$: $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$
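
The definition translates directly into code: standardize each feature, multiply the paired standard coordinates, and average. This is a minimal sketch that assumes the population standard deviation (dividing by N, to match the 1/N in the formula) and non-constant data, since a zero std would make standardizing undefined. The height and weight numbers are hypothetical, not taken from the Body Fat data set.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # population standard deviation (divide by N), matching the 1/N in corr
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def standardize(v):
    # standard coordinates; assumes std(v) > 0 (non-constant data)
    m, s = mean(v), std(v)
    return [(a - m) / s for a in v]

def corr(xs, ys):
    # corr = (1/N) * sum_i xhat_i * yhat_i = mean({xhat_i * yhat_i})
    xh, yh = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(xh, yh)])

heights = [160, 165, 170, 175, 180, 185]   # hypothetical heights (cm)
weights = [55, 62, 64, 72, 75, 84]         # hypothetical weights (kg)
print(round(corr(heights, weights), 3))
```

Because both features end up in standard coordinates, changing units (cm to inches, kg to pounds) leaves the result unchanged.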

  31. Q: Which of the following describe(s) the correlation coefficient correctly? A. It's unitless B. It's defined in standard coordinates C. Both A & B • $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i$

  32. A visualization of the correlation coefficient: https://rpsychologist.com/d3/correlation/ • In a data set consisting of items $\{(x_i, y_i)\}$, i.e. $(x_1, y_1), \ldots, (x_N, y_N)$: $\mathrm{corr}(\{(x_i, y_i)\}) > 0$ shows positive correlation; $\mathrm{corr}(\{(x_i, y_i)\}) < 0$ shows negative correlation; $\mathrm{corr}(\{(x_i, y_i)\}) = 0$ shows no correlation

  33. The Properties of the Correlation Coefficient • The correlation coefficient is symmetric: $\mathrm{corr}(\{(x_i, y_i)\}) = \mathrm{corr}(\{(y_i, x_i)\})$ • Translating the data does NOT change the correlation coefficient

  34. The Properties of the Correlation Coefficient • Scaling the data may change the sign of the correlation coefficient (for $a, c \neq 0$): $\mathrm{corr}(\{(a x_i + b, c y_i + d)\}) = \mathrm{sign}(ac)\,\mathrm{corr}(\{(x_i, y_i)\})$
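
The sketch below checks symmetry, translation invariance, and the $\mathrm{sign}(ac)$ rule numerically. The data are synthetic, drawn with a fixed seed from `random.gauss`, so they are an assumption for illustration only. `corr` uses the same 1/N standard-coordinate definition as the slides.

```python
import math
import random

def corr(xs, ys):
    # (1/N) * sum of products of standard coordinates (population std)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [0.6 * x + random.gauss(0, 1) for x in xs]   # synthetic, positively correlated
r = corr(xs, ys)

# symmetric
assert math.isclose(corr(ys, xs), r)
# translating the data does not change corr
assert math.isclose(corr([x + 5 for x in xs], [y - 3 for y in ys]), r)
# corr({(a x + b, c y + d)}) = sign(a c) * corr({(x, y)}) for a, c != 0
for a, c in [(2.0, 3.0), (-2.0, 3.0), (2.0, -0.5), (-1.0, -4.0)]:
    scaled = corr([a * x + 1 for x in xs], [c * y - 7 for y in ys])
    assert math.isclose(scaled, math.copysign(1.0, a * c) * r)
print(round(r, 3))
```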

  37. The Properties of the Correlation Coefficient • The correlation coefficient is bounded within [-1, 1] • $\mathrm{corr}(\{(x_i, y_i)\}) = 1$ if and only if $\hat{x}_i = \hat{y}_i$ for all $i$ • $\mathrm{corr}(\{(x_i, y_i)\}) = -1$ if and only if $\hat{x}_i = -\hat{y}_i$ for all $i$

  38. Which of the following has a correlation coefficient equal to 1? [Figure: three scatter plots, left, middle, and right] A. Left and right B. Left C. Middle

  39. Concept of the correlation coefficient's bound • The correlation coefficient can be written as $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \frac{\hat{x}_i}{\sqrt{N}} \cdot \frac{\hat{y}_i}{\sqrt{N}}$ • It's the inner product of the two vectors $\left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$

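  40. Inner product • Geometric meaning of the inner product: $\nu_1 \cdot \nu_2 = |\nu_1|\,|\nu_2|\cos(\theta)$, where $\theta$ is the angle between $\nu_1$ and $\nu_2$ • The vectors $\nu_1 = \left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\nu_2 = \left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$ both have length 1, because standardized data has mean 0 and standard deviation 1, so $|\nu_1|^2 = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i^2 = 1$ (and likewise for $\nu_2$)

  41. Bound of the correlation coefficient • With $\nu_1 = \left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\nu_2 = \left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$, both of length 1: $|\mathrm{corr}(\{(x_i, y_i)\})| = |\nu_1 \cdot \nu_2| = |\cos(\theta)| \le 1$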
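
The geometric argument can be checked numerically. The sketch builds $\nu_1$ and $\nu_2$ from made-up data, confirms that both have length 1, and shows that their inner product equals $\cos(\theta)$ and therefore lies in [-1, 1].

```python
import math

def standardize(v):
    # standard coordinates with the population std (divide by N)
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]

xs = [1.0, 2.0, 4.0, 5.0, 8.0]   # made-up data
ys = [2.0, 1.0, 5.0, 4.0, 9.0]
n = len(xs)

# nu_1 = xhat / sqrt(N) and nu_2 = yhat / sqrt(N)
nu1 = [a / math.sqrt(n) for a in standardize(xs)]
nu2 = [b / math.sqrt(n) for b in standardize(ys)]

length1 = math.sqrt(sum(a * a for a in nu1))
length2 = math.sqrt(sum(b * b for b in nu2))
inner = sum(a * b for a, b in zip(nu1, nu2))   # this is corr({(x_i, y_i)})
cos_theta = inner / (length1 * length2)

print(round(length1, 12), round(length2, 12))   # 1.0 1.0
print(round(inner, 6), round(cos_theta, 6))     # the same number, inside [-1, 1]
```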

  42. The Properties of the Correlation Coefficient • Symmetric • Translation invariant • Scaling can only change the sign • Bounded within [-1, 1]

  43. Using correlation to predict • Caution! Correlation is NOT causation • Credit: Tyler Vigen

  44. How do we go about the prediction? • Scatter plot with outliers removed and the data standardized

  45. Using correlation to predict • Given a correlated data set $\{(x_i, y_i)\}$, we can predict a value $y^p_0$ that goes with a value $x_0$ • In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$, we can predict a value $\hat{y}^p_0$ that goes with a value $\hat{x}_0$

  46. Q: Which coordinates will you use for the predictor using correlation? A. Standard coordinates B. Original coordinates C. Either

  47. Linear predictor and its error • We will assume that our predictor is linear: $\hat{y}^p = a\hat{x} + b$ • We denote the prediction at each $\hat{x}_i$ in the data set as $\hat{y}^p_i = a\hat{x}_i + b$ • The error in the prediction is denoted $u_i = \hat{y}_i - \hat{y}^p_i = \hat{y}_i - a\hat{x}_i - b$

  48. ⇒ Require the mean of the error to be zero • We make the mean of the error equal to zero so that, like the standardized data, it is centered around 0: $\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i\}) - a\,\mathrm{mean}(\{\hat{x}_i\}) - b = 0 - a \cdot 0 - b = -b$ • Setting $\mathrm{mean}(\{u_i\}) = 0$ gives $b = 0$

  49. Require the variance of the error to be minimal • With $b = 0$ (so that $\mathrm{mean}(\{u_i\}) = 0$) and $r = \mathrm{corr}(\{(x_i, y_i)\})$: $\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{(\hat{y}_i - a\hat{x}_i)^2\}) = \mathrm{mean}(\{\hat{y}_i^2\}) - 2a\,\mathrm{mean}(\{\hat{x}_i\hat{y}_i\}) + a^2\,\mathrm{mean}(\{\hat{x}_i^2\}) = 1 - 2ar + a^2$ • Minimize over $a$: $\frac{d}{da}\mathrm{var}(\{u_i\}) = -2r + 2a = 0$, so $a = r$

  50. Require the variance of the error to be minimal

  51. Here is the linear predictor! • $\hat{y}^p = r\hat{x}$, i.e. $a = r$ and $b = 0$, where $r$ is the correlation coefficient
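
A numerical check of the derivation, using made-up data in standard coordinates. With $b = 0$, the variance of the errors matches the closed form $1 - 2ar + a^2$, and a brute-force search over $a$ lands next to $r$. The grid of candidate slopes and the helper `error_variance` are illustrative assumptions.

```python
import math

def standardize(v):
    # standard coordinates with the population std (divide by N)
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]

xs = [1.0, 2.0, 4.0, 5.0, 8.0]   # made-up data
ys = [2.0, 1.0, 5.0, 4.0, 9.0]
xh, yh = standardize(xs), standardize(ys)
n = len(xh)
r = sum(a * b for a, b in zip(xh, yh)) / n   # correlation coefficient

def error_variance(a):
    # variance of the errors u_i = yhat_i - a * xhat_i (b = 0)
    u = [y - a * x for x, y in zip(xh, yh)]
    mu = sum(u) / n
    return sum((e - mu) ** 2 for e in u) / n

# closed form from the derivation: var({u_i}) = 1 - 2 a r + a^2
for a in [-1.0, 0.0, 0.5, r, 1.5]:
    assert math.isclose(error_variance(a), 1 - 2 * a * r + a * a, abs_tol=1e-12)

# brute-force search over candidate slopes -2.000 .. 2.000
grid = [i / 1000 for i in range(-2000, 2001)]
best_a = min(grid, key=error_variance)
print(round(r, 4), best_a)
```

At the optimum, the remaining error variance is $1 - r^2$. The closer $|r|$ is to 1, the better the prediction.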

  52. Prediction formula • In standard coordinates: $\hat{y}^p_0 = r\hat{x}_0$, where $r = \mathrm{corr}(\{(x_i, y_i)\})$ • In original coordinates: $\frac{y^p_0 - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r\,\frac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
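
The two forms of the formula combine into one recipe: move $x_0$ to standard coordinates, multiply by $r$, then map the result back to the original units of $y$. The function `predict` and the height and weight data below are hypothetical, for illustration only. The resulting line has slope $r\,\mathrm{std}(\{y_i\})/\mathrm{std}(\{x_i\})$ and passes through the point of means, which is the same line as ordinary least squares.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # population standard deviation (divide by N)
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def predict(xs, ys, x0):
    """Correlation-based prediction of y at a new x0:
    (y0 - mean(y)) / std(y) = r * (x0 - mean(x)) / std(x)."""
    mx, my, sx, sy = mean(xs), mean(ys), std(xs), std(ys)
    r = mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])
    x0_hat = (x0 - mx) / sx   # move x0 to standard coordinates
    y0_hat = r * x0_hat       # predict in standard coordinates
    return my + sy * y0_hat   # map the prediction back to original units

heights = [160, 165, 170, 175, 180, 185]   # hypothetical heights (cm)
weights = [55, 62, 64, 72, 75, 84]         # hypothetical weights (kg)
print(round(predict(heights, weights, 172), 1))
```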
