
Probability and Statistics for Computer Science - PowerPoint PPT Presentation



  1. Probability and Statistics for Computer Science
     "All models are wrong, but some models are useful" -- George Box (credit: Wikipedia)
     Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 4.14.2020

  2. Last time
     • Stochastic Gradient Descent
     • Naïve Bayesian Classifier

  3. An example of Naive Bayes training
     • Model P(x^(1)|y) as normal, P(x^(2)|y) as Poisson, and P(y) as Bernoulli.
     • Training data:

         x^(1)   x^(2)   y
          3.5     10     1
          1.0      8     1
          0.0     10    -1
         -3.0     14    -1

     • For y = 1:   μ_MLE = (3.5 + 1.0)/2 = 2.25,  σ_MLE = 1.25;  λ_MLE = (10 + 8)/2 = 9
     • For y = -1:  μ_MLE = (0.0 - 3.0)/2 = -1.5,  σ_MLE = 1.5;   λ_MLE = (10 + 14)/2 = 12
     • Priors: P(y = 1) = 2/4 = 0.5,  P(y = -1) = 0.5
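The fitted parameters above can be reproduced in a few lines. A minimal sketch, assuming NumPy and the four-row training set from the slide (the code is not part of the original deck); note the normal MLE uses the biased, divide-by-N standard deviation:

    import numpy as np

    X = np.array([[3.5, 10], [1.0, 8], [0.0, 10], [-3.0, 14]])
    y = np.array([1, 1, -1, -1])

    params = {}
    for label in (1, -1):
        rows = X[y == label]
        params[label] = {
            "mu": rows[:, 0].mean(),       # normal mean for x^(1)
            "sigma": rows[:, 0].std(),     # MLE std (ddof=0, i.e. divide by N)
            "lam": rows[:, 1].mean(),      # Poisson rate for x^(2)
            "prior": (y == label).mean(),  # Bernoulli prior P(y)
        }

    # params: mu=2.25, sigma=1.25, lam=9.0, prior=0.5 for y=1;
    #         mu=-1.5, sigma=1.5, lam=12.0, prior=0.5 for y=-1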

  4. Classification example
     For a new feature vector x = [x^(1), x^(2), ...], e.g. x = [3, 9] in the
     example, predict the label

       argmax_y [ Σ_{j=1}^{d} log P(x^(j)|y) + log P(y) ]

  5. Classification example (cont.)
     Define g(y) = Σ_{j=1}^{d} log P(x^(j)|y) + log P(y) and evaluate it at both
     candidate labels, y = 1 and y = -1; predict the label with the larger g(y).
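Continuing the training sketch above (an illustration under the same assumptions, not from the deck), the argmax can be evaluated for x = [3, 9] with hand-written log densities:

    import math

    def log_gaussian(x, mu, sigma):
        # log of the normal pdf
        return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

    def log_poisson(k, lam):
        # log of the Poisson pmf; lgamma(k + 1) = log(k!)
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    x_new = (3.0, 9)
    scores = {}
    for label, p in params.items():   # params fitted in the sketch above
        scores[label] = (log_gaussian(x_new[0], p["mu"], p["sigma"])
                         + log_poisson(x_new[1], p["lam"])
                         + math.log(p["prior"]))

    print(max(scores, key=scores.get))   # 1: the y = 1 class wins for x = [3, 9]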

  6. Example of Naïve Bayesian Model
     • "Bag of words" Naive Bayesian models for document classes
     • Each document is represented as a bag-of-words bit vector; each column
       of the data table is a word
     • Example classes: X-windows vs. MS-windows documents
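For concreteness, a tiny sketch of the bit-vector representation (the vocabulary and document here are hypothetical, not from the deck):

    vocab = ["subject", "window", "mouse", "server", "driver"]

    def to_bit_vector(document, vocab):
        words = set(document.lower().split())   # crude whitespace tokenizer
        return [1 if w in words else 0 for w in vocab]

    print(to_bit_vector("the X server window crashes", vocab))  # [0, 1, 0, 1, 0]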

  7. What about the decision boundary?
     • Not explicit, as it is in the case of a decision tree
     • This method is parametric and generative
     • The model is specified with parameters and used to generate labels for
       test data

  8. Pros and Cons of the Naïve Bayesian Classifier
     • Pros:
       • Simple approach
       • Good accuracy
       • Good for high-dimensional data
     • Cons:
       • The assumption of conditional independence of features
       • No explicit decision boundary
       • Sometimes has numerical issues

  9. Content
     • Naïve Bayesian Classifier (cont.)
     • Linear regression
       • The problem
       • The least squares solution
       • Training and prediction
       • R-squared for the evaluation of the fit

 10. Some popular topics in Ngram

 11. Regression models are machine learning methods
     • Regression models have been around for a while
     • Dr. Kevin Murphy's Machine Learning book has 3+ chapters on regression

 12. Content
     • Linear regression
       • The problem
       • The least squares solution
       • Training and prediction
       • R-squared for the evaluation of the fit

 13. Wait, have we seen linear regression before?

 14. It's about the relationship between data features
     • Example: does the height of people relate to people's weight?
     • x: HEIGHT,  y: WEIGHT

 15. Chicago socioeconomic census
     • The census included 77 communities in Chicago
     • The census evaluated the average hardship index of the residents
     • The census evaluated the following parameters for each community:
       • PERCENT_OF_HOUSING_CROWDED
       • PERCENT_HOUSEHOLD_BELOW_POVERTY
       • PERCENT_AGED_16p_UNEMPLOYED
       • PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA
       • PERCENT_AGED_UNDER_18_OR_OVER_64
       • PER_CAPITA_INCOME
     • Given a new community and its parameters, can you predict its average
       hardship index from all these parameters?

 16. Chicago socioeconomic census
     • The scatter plots and the k-means clusters
     • Take the log of the income, as it shows a better trend

 17. The regression problem
     • Given a set of feature vectors x_i, where each has a numerical label y_i,
       we want to train a model that can map unlabeled vectors to numerical values
     • We can think of regression as fitting a line (or curve, or hyperplane,
       etc.) to data
     • Regression is like classification except that the prediction target is a
       real-valued number, not a class label (predicting a class label can be
       considered a special case of regression)

 18. Some terminology
     • Suppose the dataset {(x, y)} consists of N labeled items (x_i, y_i)
     • If we represent the dataset as a table:
       • The d columns representing {x}, written x^(j), are called the
         explanatory variables
       • The numerical column y is called the dependent variable

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5

 19. Variables of the Chicago census
     [1] "PERCENT_OF_HOUSING_CROWDED"
     [2] "PERCENT_HOUSEHOLDS_BELOW_POVERTY"
     [3] "PERCENT_AGED_16p_UNEMPLOYED"
     [4] "PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA"
     [5] "PERCENT_AGED_UNDER_18_OR_OVER_64"
     [6] "PER_CAPITA_INCOME"
     [7] "HardshipIndex"

 20. Which is the dependent variable in the census example?
     A. "PERCENT_OF_HOUSING_CROWDED"
     B. "PERCENT_AGED_25p_WITHOUT_HIGH_SCHOOL_DIPLOMA"
     C. "HardshipIndex"   (answer)
     D. "PERCENT_AGED_UNDER_18_OR_OVER_64"

 21. Linear model
     • We begin by modeling y as a linear function of the x^(j) plus randomness:

         y = x^(1) β_1 + x^(2) β_2 + ... + x^(d) β_d + ξ

       where ξ is a zero-mean random variable that represents model error
     • In vector notation:

         y = x^T β + ξ

       where β is the d-dimensional vector of coefficients that we train
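As a small simulation sketch (the coefficients and noise scale are hypothetical, not from the deck), the model can be read generatively:

    import numpy as np

    rng = np.random.default_rng(0)
    beta = np.array([2.0, -1.0 / 3.0])     # hypothetical true coefficients
    X = rng.uniform(0, 5, size=(100, 2))   # 100 feature vectors, d = 2
    xi = rng.normal(0, 0.1, size=100)      # zero-mean model error
    y = X @ beta + xi                      # y = x^T beta + xi, row by row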

 22. Each data item gives an equation
     • The model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • Each row of the training data gives one equation:

         0 = 1·β_1 + 3·β_2 + ξ_1
         2 = 2·β_1 + 3·β_2 + ξ_2
         5 = 3·β_1 + 6·β_2 + ξ_3

       Training data:

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5

 23. Which together form a matrix equation
     • The model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • Stacking the three training equations:

         [ 0 ]   [ 1  3 ]           [ ξ_1 ]
         [ 2 ] = [ 2  3 ] [ β_1 ] + [ ξ_2 ]
         [ 5 ]   [ 3  6 ] [ β_2 ]   [ ξ_3 ]

 24. Which together form a matrix equation (cont.)
     • In matrix form, the model for the whole training set is

         y = X · β + e

       where X stacks the feature vectors as rows, y stacks the labels, and e
       stacks the error terms ξ_i
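A minimal sketch (not from the deck) of the same matrix equation in NumPy, using the three-row training set:

    import numpy as np

    X = np.array([[1.0, 3.0],
                  [2.0, 3.0],
                  [3.0, 6.0]])   # one row per data item
    y = np.array([0.0, 2.0, 5.0])

    # For any candidate beta, the residual vector e = y - X @ beta collects
    # the per-equation errors xi_1, xi_2, xi_3.
    beta = np.array([1.0, 0.0])  # an arbitrary trial value
    print(y - X @ beta)          # [-1.  0.  2.]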

 25. Q. What's the dimension of matrix X?
     A. N × d   (answer)
     B. d × N
     C. N × N
     D. d × d

 26. Training the model is to choose β
     • Given a training dataset {(x, y)}, we want to fit the model y = x^T β + ξ
     • Define

         y = [y_1, ..., y_N]^T,   X = [x_1^T; ...; x_N^T],   e = [ξ_1, ..., ξ_N]^T

     • To train the model, we need to choose the β that makes e small in the
       matrix equation y = X · β + e

 27. Training using least squares
     • In the least squares method, we aim to minimize ||e||^2:

         ||e||^2 = ||y - Xβ||^2 = (y - Xβ)^T (y - Xβ)   (a scalar)

     • Differentiating with respect to β and setting the result to zero gives

         X^T X β - X^T y = 0

     • If X^T X is invertible, the least squares estimate of the coefficients is

         β̂ = (X^T X)^{-1} X^T y
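A minimal sketch (not from the deck) of the estimate in NumPy. Solving the normal equations with np.linalg.solve avoids forming the explicit inverse; np.linalg.lstsq is the more robust library route:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
    print(beta_hat)                                # [ 2.         -0.33333333]

    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # equivalent, safer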

 28. (Derivation of the least squares estimate)

       ||e||^2 = (y - Xβ)^T (y - Xβ)
               = y^T y - β^T X^T y - y^T X β + β^T X^T X β
               = y^T y - 2 β^T X^T y + β^T X^T X β

     using (AB)^T = B^T A^T, and noting that every term in ||e||^2 is a scalar,
     so y^T X β = (β^T X^T y)^T = β^T X^T y.

     Useful matrix derivatives, for vectors a, b and a matrix A:

       ∂(b^T a)/∂a = b          ∂(a^T A a)/∂a = (A + A^T) a

     Since X^T X is symmetric, differentiating with respect to β gives

       ∂||e||^2/∂β = -2 X^T y + 2 X^T X β = 0
       ⇒  X^T X β = X^T y
       ⇒  β̂ = (X^T X)^{-1} X^T y
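A quick numerical check (not from the deck) that the gradient -2 X^T y + 2 X^T X β vanishes at the least squares solution:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat
    print(np.allclose(grad, 0))   # True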

 29. Convex set and convex function
     • If a set is convex, any line segment connecting two points in the set is
       completely included in the set
     • A convex function: the area above the curve is a convex set, i.e.

         f(λx + (1 - λ)y) < λ f(x) + (1 - λ) f(y)

     • The least squares function is convex
     Credit: Dr. Kevin Murphy
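A small numerical illustration (not from the deck) that the least squares objective f(β) = ||y - Xβ||^2 satisfies the convexity inequality:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])

    def f(beta):
        r = y - X @ beta      # residual vector for this beta
        return r @ r          # ||e||^2

    b1, b2, lam = np.array([0.0, 0.0]), np.array([4.0, -2.0]), 0.3
    lhs = f(lam * b1 + (1 - lam) * b2)
    rhs = lam * f(b1) + (1 - lam) * f(b2)
    print(lhs <= rhs)   # True (27.32 <= 29.0 for these points)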

 30. What's the dimension of matrix X^T X?
     A. N × d
     B. d × N
     C. N × N
     D. d × d

 31. What's the dimension of matrix X^T X?
     A. N × d
     B. d × N
     C. N × N
     D. d × d   (answer: X is N × d, so X^T X is d × d)

 32. Is this statement true?
     If the matrix X^T X does NOT have zero-valued eigenvalues, it is invertible.
     A. TRUE
     B. FALSE
     (Hint from the board work: X^T X is symmetric and d × d, and its
     determinant is the product of its eigenvalues, so det(X^T X) ≠ 0.)
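A minimal sketch (not from the deck) of checking invertibility through the eigenvalues of X^T X:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    A = X.T @ X                          # symmetric d x d matrix
    eigvals = np.linalg.eigvalsh(A)      # eigvalsh is for symmetric matrices
    print(np.all(np.abs(eigvals) > 1e-12))   # True: no zero eigenvalue, invertible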

 33. Is this statement true?
     If the matrix X^T X does NOT have zero-valued eigenvalues, it is invertible.
     A. TRUE   (answer)
     B. FALSE

 34. Training using least squares: example
     • Model: y = x^T β + ξ = x^(1) β_1 + x^(2) β_2 + ξ
     • With the training data below,

         β̂ = (X^T X)^{-1} X^T y = [2, -1/3]^T,  i.e.  β̂_1 = 2,  β̂_2 = -1/3

       Training data:

         x^(1)   x^(2)   y
           1       3     0
           2       3     2
           3       6     5
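A follow-up sketch (not from the deck): plug β̂ back in and inspect the residuals for this worked example:

    import numpy as np

    X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 6.0]])
    y = np.array([0.0, 2.0, 5.0])
    beta_hat = np.array([2.0, -1.0 / 3.0])   # the estimate from the slide

    residuals = y - X @ beta_hat
    print(residuals)                # [-1. -1.  1.]
    print(residuals @ residuals)    # 3.0, the minimized ||e||^2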
