Probability and Statistics for Computer Science
“All models are wrong, but some models are useful”--- George Box
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 11.19.2020 Credit: wikipedia
Last time: Linear regression
✺ A linear model will not produce a good fit if the dependent variable is not a linear combination of the explanatory variables
(Figure: example of a poor linear fit, R² = 0.1)
✺ In the word-frequency example, log-transforming both variables would allow a linear model to fit the data well.
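A minimal sketch of that idea, using made-up word-frequency numbers (rank and freq are hypothetical, not the course's dataset): log-transform both variables, then fit an ordinary linear model.

```python
import numpy as np

# Hypothetical word-frequency data: rank of each word and its count.
rank = np.array([1, 2, 3, 5, 10, 20, 50, 100, 200, 500], dtype=float)
freq = np.array([9000, 4500, 3100, 1800, 950, 480, 200, 95, 50, 21], dtype=float)

# Log-transform both variables, then fit y = a*x + b by least squares.
x = np.log(rank)
y = np.log(freq)
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef
print(f"log(freq) ≈ {a:.2f} * log(rank) + {b:.2f}")
```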
Yellow Perch
✺ Perch (a kind of fish) in a lake in Finland, 56 data points
✺ Variables include: Weight, Length, Height, Width
✺ To illustrate the point, let's model Weight as the dependent variable and Length as the explanatory variable.
✺ An R-squared of 0.87 may suggest the model is OK
✺ But the trend of the data suggests a non-linear relationship
✺ Intuition tells us length is not linearly related to weight, given that a fish is 3-dimensional
✺ We can do better!
(Figures: fits with transformed variables, e.g. Weight vs. Length³ and Weight^(1/3) or √Weight vs. Length)
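A sketch of the transformation under the cube-law idea on the slide, with hypothetical stand-in measurements (length and weight below are illustrative, not the 56 perch records): regress Weight against Length³ rather than Length directly.

```python
import numpy as np

# Hypothetical stand-in for the perch data (cm, g); not the actual 56 records.
length = np.array([15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 40.0])
weight = np.array([40.0, 70.0, 110.0, 165.0, 240.0, 330.0, 440.0, 580.0, 800.0])

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Untransformed: Weight on Length.
a0, b0 = fit_line(length, weight)

# Transformed: Weight on Length**3 (a fish is roughly 3-dimensional).
a1, b1 = fit_line(length**3, weight)
print("Weight ≈ %.4f * Length^3 + %.1f" % (a1, b1))
```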
✺ Linear regression is sensitive to outliers
✺ Method 1: validation
✺ Use a validation set to choose the transformed explanatory variables
✺ The difficulty is that the number of combinations is exponential in the number of variables.
✺ Method 2: regularization
✺ Impose a penalty on the complexity of the model during training
✺ Encourage smaller model coefficients
✺ We can use validation to select the regularization parameter λ
✺ In ordinary least squares, the cost function is:
∥e∥² = ∥y − Xβ∥² = (y − Xβ)ᵀ(y − Xβ)
✺ In regularized least squares, we add a penalty with a weight parameter λ (λ > 0):
∥y − Xβ∥² + λ∥β∥₂² = (y − Xβ)ᵀ(y − Xβ) + λβᵀβ
✺ Differentiating the cost function and setting it to zero gives (XᵀX + λI)β = Xᵀy
✺ (XᵀX + λI) is always invertible for λ > 0, so the regularized least squares estimate of the coefficients is:
β̂ = (XᵀX + λI)⁻¹Xᵀy
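A small NumPy sketch of the regularized estimate β̂ = (XᵀX + λI)⁻¹Xᵀy, with λ chosen on a held-out validation set as the slides suggest; the data here is synthetic and the names are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized least squares: solve (X^T X + lam*I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Illustrative synthetic data split into train / validation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)
X_tr, y_tr, X_val, y_val = X[:70], y[:70], X[70:], y[70:]

# Use the validation set to choose the regularization parameter lambda.
best_lam, best_err = None, np.inf
for lam in [0.01, 0.1, 1.0, 10.0]:
    beta = ridge_fit(X_tr, y_tr, lam)
    err = np.mean((y_val - X_val @ beta) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err
print("chosen lambda:", best_lam)
```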
Claim: (XᵀX + λI) is invertible for λ > 0. Proof:
Energy-based definition of positive semi-definite: a matrix A is positive semi-definite if, for any nonzero vector f, fᵀAf ≥ 0; positive definite means fᵀAf > 0.
If A is positive definite, then all eigenvalues of A are positive, so A is invertible.
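Writing out the energy argument the slide asks for (standard algebra, filled in here):

```latex
% For any nonzero vector f and any \lambda > 0:
f^{T}(X^{T}X + \lambda I)f
  = (Xf)^{T}(Xf) + \lambda f^{T}f
  = \|Xf\|^{2} + \lambda \|f\|^{2} > 0
% so X^{T}X + \lambda I is positive definite, hence invertible.
```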
✺ In addition to linear regression and its generalizations, we can do regression with nearest neighbors
✺ When there is plenty of data, nearest neighbors regression works well
The idea is very similar to the k-nearest neighbor classifier, but the regression model predicts numbers; k = 1 gives piecewise-constant predictions
✺ The goal is to predict y₀ᵖ from x₀ using a training set {(x, y)}
✺ Let {(xⱼ, yⱼ)} be the set of k items in the training data set that are closest to x₀
✺ The prediction is:
y₀ᵖ = Σⱼ wⱼ yⱼ / Σⱼ wⱼ
where the wⱼ are weights that drop off as xⱼ gets further away from x₀.
✺ Inverse distance: wⱼ = 1 / ∥x₀ − xⱼ∥
✺ Exponential function: wⱼ = exp(−∥x₀ − xⱼ∥² / (2σ²))
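A compact sketch of weighted k-nearest-neighbor regression supporting both weight functions above; X_train, y_train, the synthetic data, and the function name are illustrative placeholders, and distances are computed by brute force for clarity.

```python
import numpy as np

def knn_predict(x0, X_train, y_train, k=5, kind="inverse", sigma=1.0):
    """Predict y at x0 as a weighted average of the k nearest training ys."""
    d = np.linalg.norm(X_train - x0, axis=1)      # distances to x0
    idx = np.argsort(d)[:k]                       # the k closest items
    dk, yk = d[idx], y_train[idx]
    if kind == "inverse":
        w = 1.0 / np.maximum(dk, 1e-12)           # inverse distance weights
    else:
        w = np.exp(-dk**2 / (2.0 * sigma**2))     # exponential (Gaussian) weights
    return np.sum(w * yk) / np.sum(w)

# Illustrative usage with synthetic 1-D data.
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=200)
print(knn_predict(np.array([3.0]), X_train, y_train, k=5, kind="exp"))
```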
✺ Which methods do you use to choose k and the weight functions?
✺ Pros:
✺ The method is very intuitive and simple
✺ You can predict more than numbers, as long as you can define a similarity measure
✺ Cons:
✺ The method doesn't work well for very high dimensional data
✺ The model depends on the scale of the data