
Probability and Statistics for Computer Science - PowerPoint PPT Presentation



  1. Probability and Statistics for Computer Science
     "All models are wrong, but some models are useful" -- George Box (credit: Wikipedia)
     Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 4.14.2020

  2. Last time
     • Stochastic Gradient Descent
     • Naïve Bayesian Classifier

  3. An example of Naive Bayes training
     • Model P(x^(1)|y) as normal, P(x^(2)|y) as Poisson, and P(y) as Bernoulli.
     • Training data:

         x^(1)   x^(2)   y
          3.5     10     1
          1.0      8     1
          0.0     10    -1
         -3.0     14    -1

     • For y = 1:   μ_MLE = (3.5 + 1.0)/2 = 2.25,  σ_MLE = 1.25;  λ_MLE = (10 + 8)/2 = 9
     • For y = -1:  μ_MLE = (0.0 - 3.0)/2 = -1.5,  σ_MLE = 1.5;   λ_MLE = (10 + 14)/2 = 12
     • Priors: P(y = 1) = 2/4 = 0.5,  P(y = -1) = 0.5
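The fitted parameters above can be reproduced in a few lines. A minimal sketch, assuming NumPy and the four-row training set from the slide (the code is not part of the original deck); note the normal MLE uses the biased, divide-by-N standard deviation:

    import numpy as np

    X = np.array([[3.5, 10], [1.0, 8], [0.0, 10], [-3.0, 14]])
    y = np.array([1, 1, -1, -1])

    params = {}
    for label in (1, -1):
        rows = X[y == label]
        params[label] = {
            "mu": rows[:, 0].mean(),       # normal mean for x^(1)
            "sigma": rows[:, 0].std(),     # MLE std (ddof=0, i.e. divide by N)
            "lam": rows[:, 1].mean(),      # Poisson rate for x^(2)
            "prior": (y == label).mean(),  # Bernoulli prior P(y)
        }

    # params: mu=2.25, sigma=1.25, lam=9.0, prior=0.5 for y=1;
    #         mu=-1.5, sigma=1.5, lam=12.0, prior=0.5 for y=-1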

  4. Classification example
     For a new feature vector x = [x^(1), x^(2), ...], e.g. x = [3, 9] in the
     example, predict the label

       argmax_y [ Σ_{j=1}^{d} log P(x^(j)|y) + log P(y) ]

  5. Classification example (cont.)
     Define g(y) = Σ_{j=1}^{d} log P(x^(j)|y) + log P(y) and evaluate it at both
     candidate labels, y = 1 and y = -1; predict the label with the larger g(y).
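Continuing the training sketch above (an illustration under the same assumptions, not from the deck), the argmax can be evaluated for x = [3, 9] with hand-written log densities:

    import math

    def log_gaussian(x, mu, sigma):
        # log of the normal pdf
        return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

    def log_poisson(k, lam):
        # log of the Poisson pmf; lgamma(k + 1) = log(k!)
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    x_new = (3.0, 9)
    scores = {}
    for label, p in params.items():   # params fitted in the sketch above
        scores[label] = (log_gaussian(x_new[0], p["mu"], p["sigma"])
                         + log_poisson(x_new[1], p["lam"])
                         + math.log(p["prior"]))

    print(max(scores, key=scores.get))   # 1: the y = 1 class wins for x = [3, 9]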

  6. Example of Naïve Bayesian Model
     • "Bag of words" Naive Bayesian models for document classes
     • Each document is represented as a bag-of-words bit vector; each column
       of the data table is a word
     • Example classes: X-windows vs. MS-windows documents
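For concreteness, a tiny sketch of the bit-vector representation (the vocabulary and document here are hypothetical, not from the deck):

    vocab = ["subject", "window", "mouse", "server", "driver"]

    def to_bit_vector(document, vocab):
        words = set(document.lower().split())   # crude whitespace tokenizer
        return [1 if w in words else 0 for w in vocab]

    print(to_bit_vector("the X server window crashes", vocab))  # [0, 1, 0, 1, 0]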

  7. What about the decision boundary?
     • Not explicit, as it is in the case of a decision tree
     • This method is parametric and generative
     • The model is specified with parameters and used to generate labels for
       test data

  8. Pros and Cons of the Naïve Bayesian Classifier
     • Pros:
       • Simple approach
       • Good accuracy
       • Good for high-dimensional data
     • Cons:
       • The assumption of conditional independence of features
       • No explicit decision boundary
       • Sometimes has numerical issues

  9. Content
     • Naïve Bayesian Classifier (cont.)
     • Linear regression
       • The problem
       • The least squares solution
       • Training and prediction
       • R-squared for the evaluation of the fit

 10. Some popular topics in Ngram

 11. Regression models are machine learning methods
     • Regression models have been around for a while
     • Dr. Kevin Murphy's Machine Learning book has 3+ chapters on regression

 12. Content
     • Linear regression
       • The problem
       • The least squares solution
       • Training and prediction
       • R-squared for the evaluation of the fit

 13. Wait, have we seen linear regression before?

 14. It's about the relationship between data features
     • Example: does the height of people relate to people's weight?
     • x: HEIGHT,  y: WEIGHT

 15. Chicago socioeconomic census
     • The census included 77 communities in Chicago
     • The census evaluated the average hardship index of the residents
     • The census evaluated the following parameters for each community:
       • PERCENT_OF_HOUSING_CROWDED
       • PERCENT_HOUSEHOLD_BELOW_POVERTY
       • PERCENT_AGED_16p_UNEMPLOYED
       • PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA
       • PERCENT_AGED_UNDER_18_OR_OVER_64
       • PER_CAPITA_INCOME
     • Given a new community and its parameters, can you predict its average
       hardship index from all these parameters?

 16. Chicago socioeconomic census
     • The scatter plots and the k-means clusters
     • Take the log of the income, as it shows a better trend

 17. The regression problem
     • Given a set of feature vectors x_i, where each has a numerical label y_i,
       we want to train a model that can map unlabeled vectors to numerical values
     • We can think of regression as fitting a line (or curve, or hyperplane,
       etc.) to data
     • Regression is like classification except that the prediction target is a
       real-valued number, not a class label (predicting a class label can be
       considered a special case of regression)

 18. Some terminology
     • Suppose the dataset {(x, y)} consists of N labeled items (x_i, y_i)
     • If we represent the dataset as a table:
       • The d columns representing {x}, written x^(j), are called the
         explanatory variables
       • The numerical column y is called the dependent variable

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5

 19. Variables of the Chicago census
     [1] "PERCENT_OF_HOUSING_CROWDED"
     [2] "PERCENT_HOUSEHOLDS_BELOW_POVERTY"
     [3] "PERCENT_AGED_16p_UNEMPLOYED"
     [4] "PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA"
     [5] "PERCENT_AGED_UNDER_18_OR_OVER_64"
     [6] "PER_CAPITA_INCOME"
     [7] "HardshipIndex"

 20. Which is the dependent variable in the census example?
     A. "PERCENT_OF_HOUSING_CROWDED"
     B. "PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA"
     C. "HardshipIndex"   (answer)
     D. "PERCENT_AGED_UNDER_18_OR_OVER_64"

 21. Linear model
     • We begin by modeling y as a linear function of the x^(j) plus randomness:

         y = x^(1) β_1 + x^(2) β_2 + ... + x^(d) β_d + ξ

       where ξ is a zero-mean random variable that represents model error
     • In vector notation:

         y = x^T β + ξ

       where β is the d-dimensional vector of coefficients that we train
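As a small simulation sketch (the coefficients and noise scale are hypothetical, not from the deck), the model can be read generatively:

    import numpy as np

    rng = np.random.default_rng(0)
    beta = np.array([2.0, -1.0 / 3.0])     # hypothetical true coefficients
    X = rng.uniform(0, 5, size=(100, 2))   # 100 feature vectors, d = 2
    xi = rng.normal(0, 0.1, size=100)      # zero-mean model error
    y = X @ beta + xi                      # y = x^T beta + xi, row by row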

 22. Each data item gives an equation
     • The model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • Each row of the training data gives one equation:

         0 = 1·β_1 + 3·β_2 + ξ_1
         2 = 2·β_1 + 3·β_2 + ξ_2
         5 = 3·β_1 + 6·β_2 + ξ_3

       Training data:

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5

 23. Which together form a matrix equation
     • The model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • Stacking the three training equations:

         [ 0 ]   [ 1  3 ]           [ ξ_1 ]
         [ 2 ] = [ 2  3 ] [ β_1 ] + [ ξ_2 ]
         [ 5 ]   [ 3  6 ] [ β_2 ]   [ ξ_3 ]

 24. Which together form a matrix equation (cont.)
     • In matrix form, the model for the whole training set is

         y = X · β + e

       where X stacks the feature vectors as rows, y stacks the labels, and e
       stacks the error terms ξ_i
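A minimal sketch (not from the deck) of the same matrix equation in NumPy, using the three-row training set:

    import numpy as np

    X = np.array([[1.0, 3.0],
                  [2.0, 3.0],
                  [3.0, 6.0]])   # one row per data item
    y = np.array([0.0, 2.0, 5.0])

    # For any candidate beta, the residual vector e = y - X @ beta collects
    # the per-equation errors xi_1, xi_2, xi_3.
    beta = np.array([1.0, 0.0])  # an arbitrary trial value
    print(y - X @ beta)          # [-1.  0.  2.]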

 25. Q. What's the dimension of matrix X?
     A. N × d   (answer)
     B. d × N
     C. N × N
     D. d × d

 26. Training the model is to choose β
     • Given a training dataset {(x, y)}, we want to fit the model y = x^T β + ξ
     • Define

         y = [y_1, ..., y_N]^T,   X = [x_1^T; ...; x_N^T],   e = [ξ_1, ..., ξ_N]^T

     • To train the model, we need to choose the β that makes e small in the
       matrix equation y = X · β + e

 27. Training using least squares
     • In the least squares method, we aim to minimize ||e||^2:

         ||e||^2 = ||y - Xβ||^2 = (y - Xβ)^T (y - Xβ)   (a scalar)

     • Differentiating with respect to β and setting the result to zero gives

         X^T X β - X^T y = 0

     • If X^T X is invertible, the least squares estimate of the coefficients is

         β̂ = (X^T X)^{-1} X^T y
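A minimal sketch (not from the deck) of the estimate in NumPy. Solving the normal equations with np.linalg.solve avoids forming the explicit inverse; np.linalg.lstsq is the more robust library route:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
    print(beta_hat)                                # [ 2.         -0.33333333]

    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # equivalent, safer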

 28. (Derivation of the least squares estimate)

       ||e||^2 = (y - Xβ)^T (y - Xβ)
               = y^T y - β^T X^T y - y^T X β + β^T X^T X β
               = y^T y - 2 β^T X^T y + β^T X^T X β

     using (AB)^T = B^T A^T, and noting that every term in ||e||^2 is a scalar,
     so y^T X β = (β^T X^T y)^T = β^T X^T y.

     Useful matrix derivatives, for vectors a, b and a matrix A:

       ∂(b^T a)/∂a = b          ∂(a^T A a)/∂a = (A + A^T) a

     Since X^T X is symmetric, differentiating with respect to β gives

       ∂||e||^2/∂β = -2 X^T y + 2 X^T X β = 0
       ⇒  X^T X β = X^T y
       ⇒  β̂ = (X^T X)^{-1} X^T y
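A quick numerical check (not from the deck) that the gradient -2 X^T y + 2 X^T X β vanishes at the least squares solution:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat
    print(np.allclose(grad, 0))   # True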

 29. Convex set and convex function
     • If a set is convex, any line segment connecting two points in the set is
       completely included in the set
     • A convex function: the area above the curve is a convex set, i.e.

         f(λx + (1 - λ)y) < λ f(x) + (1 - λ) f(y)

     • The least squares function is convex
     Credit: Dr. Kevin Murphy
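A small numerical illustration (not from the deck) that the least squares objective f(β) = ||y - Xβ||^2 satisfies the convexity inequality:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])

    def f(beta):
        r = y - X @ beta      # residual vector for this beta
        return r @ r          # ||e||^2

    b1, b2, lam = np.array([0.0, 0.0]), np.array([4.0, -2.0]), 0.3
    lhs = f(lam * b1 + (1 - lam) * b2)
    rhs = lam * f(b1) + (1 - lam) * f(b2)
    print(lhs <= rhs)   # True (27.32 <= 29.0 for these points)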

 30. What's the dimension of matrix X^T X?
     A. N × d
     B. d × N
     C. N × N
     D. d × d

 31. What's the dimension of matrix X^T X?
     A. N × d
     B. d × N
     C. N × N
     D. d × d   (answer: X is N × d, so X^T X is d × d)

 32. Is this statement true?
     If the matrix X^T X does NOT have zero-valued eigenvalues, it is invertible.
     A. TRUE
     B. FALSE
     (Hint from the board work: X^T X is symmetric and d × d, and its
     determinant is the product of its eigenvalues, so det(X^T X) ≠ 0.)
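A minimal sketch (not from the deck) of checking invertibility through the eigenvalues of X^T X:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    A = X.T @ X                          # symmetric d x d matrix
    eigvals = np.linalg.eigvalsh(A)      # eigvalsh is for symmetric matrices
    print(np.all(np.abs(eigvals) > 1e-12))   # True: no zero eigenvalue, invertible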

 33. Is this statement true?
     If the matrix X^T X does NOT have zero-valued eigenvalues, it is invertible.
     A. TRUE   (answer)
     B. FALSE

 34. Training using least squares: example
     • Model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • With the training data below,

         β̂ = (X^T X)^{-1} X^T y = [2, -1/3]^T,  i.e.  β̂_1 = 2,  β̂_2 = -1/3

       Training data:

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5
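A follow-up sketch (not from the deck): plug β̂ back in and inspect the residuals for this worked example:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])
    beta_hat = np.array([2.0, -1.0 / 3.0])   # the estimate from the slide

    residuals = y - X @ beta_hat
    print(residuals)                # [-1. -1.  1.]
    print(residuals @ residuals)    # 3.0, the minimized ||e||^2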
