Probability and Statistics for Computer Science

"The statement that 'The average US family has 2.6 children' invites mockery" - Prof. Forsyth reminds us about critical thinking
Credit: Wikipedia

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 1.23.2020

Last lecture
• Course material and survey
• Meet staff team
• Overview of CS361
• Lecture 1 - Data Visualization & Summary (I)

Lecture videos and ClassTranscribe
• Lectures will be videotaped and accessible at https://mediaspace.illinois.edu/
• ClassTranscribe provides transcripts for lecture videos: https://classtranscribe.illinois.edu/home
(To be connected)

We learned
• Visualizing with
  Tables, bar charts, histograms
• Summarizing with location parameter
  Mean

Visualizing Data with Histogram (III)
• Conditional histogram
  Mean (aqua) = 890
  Mean (red) = 760
  Data: Combined Score (HWs, Prj and Exams) grouped by students with full participation or not full in CS361 fall 2019
[Figure: histograms of count versus Total_HWPRJExam, one per Participation value (0 or 1)]

Today: More Summary (descriptive statistics & Data Visualization)
• Mean
• Standard deviation
• Variance
• Standardizing data
• Median, interquartile range, box plots and outliers
• Visualizing & Summarizing relationships
  • Heatmap
  • 3D bar
  • Time series plots
  • Scatter plots
  • Correlation coefficient

Summarizing 1D continuous data
For a data set {x}, also annotated as {x_i}, we summarize with:
• Location Parameters
  • Mean
  • Median
  • Mode
• Scale parameters
  • Standard deviation and variance
  • Interquartile range

Summarizing 1D continuous data
• Mean
  $\mathrm{mean}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} x_i$
  Geometrically, it is the centroid of the data: if each data item is treated as an equal weight placed at its value, the mean is the center of balance.

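As a minimal sketch of the mean formula on this slide, the snippet below computes it directly from the definition. The function name `mean` and the data values are made up for illustration and are not from the lecture.

```python
def mean(xs):
    """Arithmetic mean: sum of the items divided by the number of items N."""
    return sum(xs) / len(xs)

# Hypothetical data set, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
print(mean(data))  # 5.0
```
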
Properties of the mean
• Scaling data scales the mean
  $\mathrm{mean}(\{k \cdot x_i\}) = k \cdot \mathrm{mean}(\{x_i\})$
• Translating the data translates the mean
  $\mathrm{mean}(\{x_i + c\}) = \mathrm{mean}(\{x_i\}) + c$

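The scaling and translation properties can be checked numerically. This is only a sanity check on one made-up data set, not a proof. The values of `k` and `c` are arbitrary choices.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

data = [1.0, 2.0, 6.0]   # hypothetical data, mean is 3.0
k, c = 3.0, 10.0         # arbitrary scale factor and shift

scaled = mean([k * x for x in data])    # should equal k * mean(data)
shifted = mean([x + c for x in data])   # should equal mean(data) + c

print(scaled, k * mean(data))    # 9.0 9.0
print(shifted, mean(data) + c)   # 13.0 13.0
```
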
Less obvious properties of the mean
• The signed distances from the mean sum to 0
  $\sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\})) = 0$
• Among all real values $\mu$, the mean is the one that minimizes the sum of squared distances to the data
  $\operatorname{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$

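Both properties can be illustrated on a small example. The sketch below uses a made-up data set and a brute-force search over a grid of candidate values. The grid search is an illustrative assumption, not how the minimizer is derived.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def sum_sq_dist(xs, mu):
    """Sum of squared distances from the data items to a candidate value mu."""
    return sum((x - mu) ** 2 for x in xs)

data = [1.0, 3.0, 4.0, 8.0]   # hypothetical data, mean is 4.0
m = mean(data)

# Property 1: the signed distances from the mean sum to 0.
signed_sum = sum(x - m for x in data)

# Property 2: on a grid of candidates 0.0, 0.1, ..., 10.0,
# the candidate with the smallest sum of squared distances is the mean.
candidates = [i / 10 for i in range(0, 101)]
best = min(candidates, key=lambda mu: sum_sq_dist(data, mu))

print(signed_sum)  # 0.0
print(best, m)     # 4.0 4.0
```
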
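Prove: $\sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\})) = 0$
  LHS $= \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} \mathrm{mean}(\{x_i\})$
  $= \sum_{i=1}^{N} x_i - N \cdot \mathrm{mean}(\{x_i\})$
  $= \sum_{i=1}^{N} x_i - N \cdot \frac{1}{N} \sum_{i=1}^{N} x_i = 0$

Prove: $\operatorname{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$
  $\frac{d}{d\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \sum_{i=1}^{N} \frac{d}{d\mu} (x_i - \mu)^2 = \sum_{i=1}^{N} 2 (x_i - \mu)(-1) = 0$
  $\Rightarrow \sum_{i=1}^{N} x_i - N\mu = 0$
  $\Rightarrow \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \mathrm{mean}(\{x_i\})$
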
Qs:
• What is the answer for mean(mean({x_i}))?
  A. mean({x_i})   B. unsure   C. 0
• Recall: in which application in Lecture 1 were the means of experiments compared?

Standard Deviation (σ)
• The standard deviation
  $\mathrm{std}(\{x_i\}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2}$
  $= \sqrt{\mathrm{mean}(\{(x_i - \mathrm{mean}(\{x_i\}))^2\})}$

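A minimal sketch of this definition follows. It divides by N as on the slide, not by N - 1, so it corresponds to `statistics.pstdev` in the Python standard library rather than `statistics.stdev`. The data values are made up for illustration.

```python
import math
import statistics

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def std(xs):
    """Standard deviation as defined on the slide (divides by N)."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))

# Hypothetical data: mean is 5.0, squared deviations sum to 32, 32 / 8 = 4.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data))                 # 2.0
print(statistics.pstdev(data))   # same definition from the standard library
```
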
Can a standard deviation of a dataset be -1?
A. YES
B. NO

Properties of the standard deviation
• Scaling data scales the standard deviation
  $\mathrm{std}(\{k \cdot x_i\}) = |k| \cdot \mathrm{std}(\{x_i\})$
• Translating the data does NOT change the standard deviation
  $\mathrm{std}(\{x_i + c\}) = \mathrm{std}(\{x_i\})$

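These two properties can be checked on one example. The sketch uses a negative scale factor to show why the absolute value $|k|$ appears. The data and the values of `k` and `c` are arbitrary illustrative choices.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def std(xs):
    """Standard deviation, dividing by N."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical, std is 2.0
k, c = -3.0, 100.0                               # arbitrary scale and shift

std_scaled = std([k * x for x in data])    # expected |k| * std(data)
std_shifted = std([x + c for x in data])   # expected std(data), unchanged

print(std_scaled, abs(k) * std(data))
print(std_shifted, std(data))
```
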
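Standard deviation: Chebyshev's inequality (1st look)
• At most $\frac{N}{k^2}$ items are k or more standard deviations (σ) away from the mean
• Rough justification: assume mean = 0 and put $N - \frac{N}{k^2}$ items at 0, $\frac{0.5N}{k^2}$ items at $-k\sigma$ and $\frac{0.5N}{k^2}$ items at $k\sigma$. Then
  $\mathrm{std} = \sqrt{\frac{1}{N} \left[ \left( N - \frac{N}{k^2} \right) 0^2 + \frac{N}{k^2} (k\sigma)^2 \right]} = \sigma$
  Placing any more items that far from the mean would make the standard deviation larger than σ.
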
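The bound can be checked empirically by counting how many items lie at least k standard deviations from the mean. This is a sketch on a made-up data set with one extreme value. The helper name `count_far` is hypothetical.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def std(xs):
    """Standard deviation, dividing by N."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))

def count_far(xs, k):
    """Number of items at least k standard deviations away from the mean."""
    m, s = mean(xs), std(xs)
    return sum(1 for x in xs if abs(x - m) >= k * s)

# Hypothetical data with one extreme value.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 30.0]

for k in (1.5, 2.0, 3.0):
    bound = len(data) / k ** 2   # Chebyshev: at most N / k^2 items
    print(k, count_far(data, k), "<=", bound)
```

The observed counts are usually far below the bound, which holds for any data set and is therefore loose for most of them.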
Variance (σ²)
• Variance = (standard deviation)²
  $\mathrm{var}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2$
• Scaling and translating behave similarly to the standard deviation
  $\mathrm{var}(\{k \cdot x_i\}) = k^2 \cdot \mathrm{var}(\{x_i\})$
  $\mathrm{var}(\{x_i + c\}) = \mathrm{var}(\{x_i\})$

Q: Standard deviation
• What is the value of std(mean({x_i}))?
  A. 0   B. 1   C. unsure

Standard Coordinates/normalized data
• The mean tells where the data set is and the standard deviation tells how spread out it is. If we are interested only in comparing the shape, we could define:
  $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
• We say $\{\hat{x}_i\}$ is in standard coordinates

Q: Mean of standard coordinates
• μ of $\{\hat{x}_i\}$ is:
  A. 1   B. 0   C. unsure
  $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q: Standard deviation (σ) of standard coordinates
• σ of $\{\hat{x}_i\}$ is:
  A. 1   B. 0   C. unsure
  $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q: Variance of standard coordinates
• Variance of $\{\hat{x}_i\}$ is:
  A. 1   B. 0   C. unsure
  $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q: Estimate the range of data in standard coordinates
• Estimate as closely as possible: 90% of the data is within:
  A. [-10, 10]
  B. [-100, 100]
  C. [-1, 1]
  D. [-4, 4]
  E. others
  $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Standard Coordinates/normalized data to μ=0, σ=1, σ²=1
• Data in standard coordinates always has
  mean = 0; standard deviation = 1; variance = 1.
• Such data is unit-less, so plots based on it are sometimes more comparable
• We see such normalization very often in statistics

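The conversion to standard coordinates can be sketched as below. The function name `standardize` and the data are illustrative assumptions. The sketch also assumes the standard deviation is nonzero, since a constant data set cannot be standardized this way.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def std(xs):
    """Standard deviation, dividing by N."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))

def standardize(xs):
    """Subtract the mean and divide by the standard deviation (assumed nonzero)."""
    m, s = mean(xs), std(xs)
    return [(x - m) / s for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical, mean 5, std 2
z = standardize(data)

print(z)        # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z))  # 0.0
print(std(z))   # 1.0
```
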
Median
• To organize the data we first sort it
• Then if the number of items N is odd
    median = middle item's value
  if the number of items N is even
    median = mean of middle 2 items' values

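The two cases above translate directly into code. This is a minimal sketch with made-up inputs. The standard library's `statistics.median` follows the same rule.

```python
def median(xs):
    """Median: sort, then take the middle item (odd N) or the mean of the middle two (even N)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([5, 1, 3]))     # 3
print(median([5, 1, 3, 2]))  # 2.5
```
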
Properties of Median
• Scaling data scales the median
  $\mathrm{median}(\{k \cdot x_i\}) = k \cdot \mathrm{median}(\{x_i\})$
• Translating data translates the median
  $\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$

Percentile
• The k-th percentile is the value such that k% of the data items are smaller than or equal to it
• Median is the 50th percentile

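One way to turn this definition into code is the nearest-rank convention: take the smallest data value that has at least k% of the items at or below it. This convention is an assumption made here for illustration. Other conventions interpolate between neighboring items, and for an even number of items nearest-rank can disagree with the "mean of the middle two" median rule.

```python
import math

def percentile(xs, k):
    """Nearest-rank k-th percentile: smallest item with at least k% of items <= it."""
    s = sorted(xs)
    rank = max(1, math.ceil(k / 100 * len(s)))  # 1-based rank into the sorted data
    return s[rank - 1]

data = [15, 20, 35, 40, 50]   # hypothetical data
print(percentile(data, 25))   # 20
print(percentile(data, 50))   # 35
print(percentile(data, 100))  # 50
```
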
Q: Scaling effect on percentiles
• Scaling data scales the percentile
  A. True   B. False

Q: Translating effect on percentiles
• Translating data does NOT change the percentile
  A. True   B. False

Interquartile range
• iqr = (75th percentile) - (25th percentile)
• Scaling data scales the interquartile range
  $\mathrm{iqr}(\{k \cdot x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$
• Translating data does NOT change the interquartile range
  $\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$

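A sketch of the interquartile range and its two properties follows. It relies on `statistics.quantiles` from the Python standard library (Python 3.8+) with its default method. Quartile conventions differ between tools, so the exact iqr value may not match other software. The data, `k` and `c` are made up.

```python
import statistics

def iqr(xs):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # three quartile cut points
    return q3 - q1

data = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]  # hypothetical data
k, c = -2.0, 100.0                             # arbitrary scale and shift

print(iqr(data))                      # iqr of the original data
print(iqr([k * x for x in data]))     # expected |k| * iqr(data)
print(iqr([x + c for x in data]))     # expected iqr(data), unchanged
```
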
Summarizing 1D continuous data
• Location Parameters
  • Mean
  • Median
  • Mode
• Scale parameters
  • Standard deviation and variance
  • Interquartile range

Box plots
• Boxplots
  • Simpler than histogram
  • Good for outliers
  • Easier to use for comparison
[Figure: box plots of vehicle death by region; vertical axis: DEATH]
Data from https://www2.stetson.edu/~jrasp/data.htm

Boxplots details, outliers
• How to define outliers? (the default)
[Figure: annotated box plot]
  Box: interquartile range (iqr), from the 25th to the 75th percentile, with the median marked inside
  Whisker: < 1.5 iqr from the box
  Outlier: > 1.5 iqr from the box

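The default outlier rule can be sketched as follows: any item more than 1.5 iqr below the 25th percentile or above the 75th percentile is flagged. The quartiles come from `statistics.quantiles` with its default method, which is an assumption, since plotting libraries may compute quartiles slightly differently. The function name `find_outliers` and the data are hypothetical.

```python
import statistics

def find_outliers(xs, factor=1.5):
    """Return the items lying more than factor * iqr outside the box."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr  # outlier fences
    return [x for x in xs if x < low or x > high]

# Hypothetical data with one value far from the rest.
data = [10, 12, 13, 14, 15, 16, 18, 60]
print(find_outliers(data))  # [60]
```
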
Sensitivity of summary statistics to outliers
• mean and standard deviation are very sensitive to outliers
• median and interquartile range are not sensitive to outliers

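A small experiment illustrates the contrast. The sketch replaces the largest item of a made-up data set with an extreme value and recomputes all four summaries using the Python standard library. `statistics.pstdev` divides by N, and the iqr uses the default quartile method of `statistics.quantiles`, which is an assumption.

```python
import statistics

def iqr(xs):
    """Interquartile range from the standard library's quartile cut points."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]    # hypothetical data
dirty = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 900.0]  # largest item replaced by an outlier

for name, xs in (("clean", clean), ("with outlier", dirty)):
    print(name,
          "mean:", statistics.mean(xs),
          "std:", statistics.pstdev(xs),
          "median:", statistics.median(xs),
          "iqr:", iqr(xs))
```

In this example the mean and standard deviation change drastically, while the median and interquartile range stay the same.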
Group Discussion

Modes
• Modes are peaks in a histogram
• If there is more than 1 mode, we should be curious as to why

Multiple modes
• We have seen the "iris" data, which looks to have several peaks
[Figure: histogram of the "iris" data]
Data: "iris"

Example bimodal distribution
• Modes may indicate multiple populations
[Figure: distribution with two modes]
Data: Erythrocyte cells in healthy humans
Piagnerelli, JCP 2007

Tails and Skews
Credit: Prof. Forsyth

Assignments
• HW1, due on 1/30 Thurs.
• Reading: Chapter 2 of the textbook
• Next time: Looking for relationships in data; correlation coefficient

Additional References
• Peter Dalgaard, "Introductory Statistics with R"
• Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability"
• Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

See you next time