Logistic Regression
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
September 1, 2014
Recall: Linear Regression

[Figure: scatter plot of Power (bhp) against Engine displacement (cc)]

§ Assume: the relation is linear
§ Then for a given x (= 1800), predict the value of y
§ Both the dependent and the independent variables are continuous
Scenario: Heart Disease vs. Age

[Figure: training set plotted as Heart disease (Y: Yes/No) against Age (X, 0–100)]

§ Age (numerical): independent variable
§ Heart disease (Yes/No): dependent variable with two classes
§ Task: given a new person's age, predict if (s)he has heart disease
§ The task: calculate P(Y = Yes | X)
Scenario: Heart Disease vs. Age

[Figure: the same training set, with a probability curve over Age (X)]

§ Age (numerical): independent variable
§ Heart disease (Yes/No): dependent variable with two classes
§ Task: given a new person's age, predict if (s)he has heart disease
§ Calculate P(Y = Yes | X) for different ranges of X
§ Fit a curve that estimates the probability P(Y = Yes | X)
The Logistic Function

§ Logistic function on t: takes values between 0 and 1

$$\mathrm{Logistic}(t) = \frac{e^t}{1 + e^t} = \frac{1}{1 + e^{-t}}$$

§ If t is a linear function of x, $t = \beta_0 + \beta_1 x$, the logistic function becomes

$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

§ F(x): the probability of the dependent variable Y taking one value against the other

[Figure: the logistic curve L(t)]
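A minimal Python sketch of this function. The values of β₀ and β₁ below are illustrative assumptions, not fitted parameters:

```python
import numpy as np

def logistic(t):
    """Logistic function: maps any real t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# With t = beta0 + beta1 * x, F(x) is the class probability.
# beta0 and beta1 here are made-up values for illustration.
beta0, beta1 = -5.0, 0.1
x = np.array([20.0, 50.0, 80.0])       # e.g. ages
print(logistic(beta0 + beta1 * x))     # [0.047..., 0.5, 0.952...]
```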
The Likelihood Function

§ Let a discrete random variable X have a probability distribution $p(x; \theta)$ that depends on a parameter $\theta$
§ For the Bernoulli distribution: $p(x; \theta) = \theta^x (1 - \theta)^{1-x}$
§ Intuitively, likelihood measures "how likely" an outcome is under the parameter $\theta$
  – For x = 1, $p(x; \theta) = \theta$
  – For x = 0, $p(x; \theta) = 1 - \theta$
§ Given a set of data points $x_1, x_2, \ldots, x_n$, the likelihood function is defined as

$$l(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$$
About the Likelihood Function

$$l(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$$

§ The actual value has no meaning on its own; only the relative likelihood matters, since we want to estimate the parameter $\theta$
  – Constant factors do not matter
§ Likelihood is not a probability density function
  – The sum (or integral) does not add up to 1
§ In practice it is often easier to work with the log-likelihood
  – Provides the same relative comparison
  – The expression becomes a sum

$$L(\theta) = \ln l(\theta) = \ln\left( \prod_{i=1}^{n} p(x_i; \theta) \right) = \sum_{i=1}^{n} \ln p(x_i; \theta)$$
Example

§ Experiment: a coin toss; the coin is not known to be unbiased
§ Random variable X takes value 1 for heads and 0 for tails
§ Data: 100 outcomes, 75 heads, 25 tails

$$L(\theta) = 75 \ln(\theta) + 25 \ln(1 - \theta)$$

§ Relative likelihood: if $L(\theta_1) > L(\theta_2)$, then $\theta_1$ explains the observed data better than $\theta_2$
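A small sketch that evaluates this log-likelihood on a grid of θ values; it confirms the maximum sits at θ = 75/100 = 0.75. The grid search is only for illustration, not the estimation method developed in the following slides:

```python
import numpy as np

def log_likelihood(theta, heads=75, tails=25):
    """L(theta) = 75 ln(theta) + 25 ln(1 - theta)."""
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

# Evaluate on a grid of candidate values in (0, 1).
thetas = np.linspace(0.01, 0.99, 99)
best = thetas[np.argmax(log_likelihood(thetas))]
print(best)   # 0.75
```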
Maximum Likelihood Estimate

§ Maximum likelihood estimation: estimating the set of values of the parameters (for example, $\theta$) that maximizes the likelihood function
§ Estimate:

$$\hat{\theta} = \operatorname{argmax}_\theta L(\theta) = \operatorname{argmax}_\theta \left[ \sum_{i=1}^{n} \ln p(x_i; \theta) \right]$$

§ One method: Newton's method
  – Start with some value of $\theta$ and iteratively improve
  – Converge when the improvement is negligible
§ May not always converge
Taylor's Theorem

§ If f is a
  – real-valued function,
  – k times differentiable at a point a, for an integer k > 0,
then f has a polynomial approximation at a
§ In other words, there exists a function $h_k$ such that

$$f(x) = \underbrace{f(a) + \frac{f'(a)}{1!}(x - a) + \cdots + \frac{f^{(k)}(a)}{k!}(x - a)^k}_{P(x)} + h_k(x)(x - a)^k$$

and $\lim_{x \to a} h_k(x) = 0$
§ P(x) is the polynomial approximation (the k-th order Taylor polynomial)
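A quick numerical illustration of the theorem; the choice f(x) = eˣ around a = 0 is an assumed example, not from the slides. The error of the second-order Taylor polynomial shrinks faster than (x − a)², as the remainder term promises:

```python
import numpy as np

# Second-order Taylor polynomial of f(x) = exp(x) around a = 0:
# P(x) = 1 + x + x**2 / 2, so f(x) - P(x) = h_2(x) * x**2 with h_2 -> 0.
def f(x):
    return np.exp(x)

def taylor2(x):
    return 1.0 + x + x**2 / 2.0

for x in (0.5, 0.1, 0.01):
    print(x, f(x) - taylor2(x))   # error shrinks like x**3 / 6
```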
Newton's Method

§ Goal: find the maximum $w^*$ of a function f of one variable
§ Assumptions:
  1. The function f is smooth
  2. The derivative of f at $w^*$ is 0, and the second derivative is negative
§ Start with a value $w = w_0$
§ Near the maximum, approximate the function using a second-order Taylor polynomial:

$$f(w) \approx f(w_0) + (w - w_0) \left. \frac{df}{dw} \right|_{w = w_0} + \frac{1}{2} (w - w_0)^2 \left. \frac{d^2 f}{dw^2} \right|_{w = w_0}$$
$$\approx f(w_0) + (w - w_0) f'(w_0) + \frac{1}{2} (w - w_0)^2 f''(w_0)$$

§ Iteratively improve the estimate of the maximum of f using this quadratic approximation
Newton's Method

$$f(w) \approx f(w_0) + (w - w_0) f'(w_0) + \frac{1}{2} (w - w_0)^2 f''(w_0)$$

§ Take the derivative w.r.t. w and set it to zero at a point $w_1$:

$$f'(w_1) \approx 0 = f'(w_0) + \frac{1}{2} f''(w_0) \times 2 (w_1 - w_0) \;\Rightarrow\; w_1 = w_0 - \frac{f'(w_0)}{f''(w_0)}$$

§ Iteratively:

$$w_{n+1} = w_n - \frac{f'(w_n)}{f''(w_n)}$$

§ Converges very fast, if it converges at all
§ In practice: use the optim function in R
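A minimal Python sketch of this update rule, applied to the coin-toss log-likelihood from the earlier example (the function names and tolerance below are assumptions for illustration; in R one would call optim, as the slide notes):

```python
def newton_maximize(df, d2f, w0, tol=1e-10, max_iter=100):
    """Newton's method for a 1-D maximum: w_{n+1} = w_n - f'(w_n)/f''(w_n).

    df, d2f: first and second derivatives of f.
    Assumes f is smooth and f'' < 0 near the maximum.
    """
    w = w0
    for _ in range(max_iter):
        step = df(w) / d2f(w)
        w -= step
        if abs(step) < tol:
            return w
    raise RuntimeError("Newton's method did not converge")

# Coin-toss example: L(t) = 75 ln(t) + 25 ln(1 - t)
dL  = lambda t: 75.0 / t - 25.0 / (1.0 - t)
d2L = lambda t: -75.0 / t**2 - 25.0 / (1.0 - t)**2
print(newton_maximize(dL, d2L, w0=0.5))   # converges to 0.75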
Logistic Regression: Estimating β₀ and β₁

§ Logistic function:

$$F(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

§ Log-likelihood function:
  – Say we have n data points $x_1, x_2, \ldots, x_n$
  – Outcomes $y_1, y_2, \ldots, y_n$, each either 0 or 1
  – Each $y_i$ is 1 with probability $p(x_i)$ and 0 with probability $1 - p(x_i)$

$$L(\beta) = \ln l(\beta) = \sum_{i=1}^{n} \left[ y_i \ln p(x_i) + (1 - y_i) \ln(1 - p(x_i)) \right] = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta_1 x_i) - \ln\left(1 + e^{\beta_0 + \beta_1 x_i}\right) \right]$$
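A sketch of maximizing this log-likelihood numerically. The toy age/label data are assumptions for illustration, and a general-purpose optimizer (BFGS) is used here in place of a hand-rolled Newton iteration, much as R's optim would be:

```python
import numpy as np
from scipy.optimize import minimize

# Toy training data (assumed, not from the slides): age and disease label.
x = np.array([25., 30., 35., 40., 45., 50., 55., 60., 65., 70.])
y = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1  ])

def neg_log_likelihood(beta):
    """-L(beta) = -sum_i [ y_i*(b0 + b1*x_i) - ln(1 + e^(b0 + b1*x_i)) ]."""
    t = beta[0] + beta[1] * x
    # np.logaddexp(0, t) computes ln(1 + e^t) without overflow
    return -np.sum(y * t - np.logaddexp(0.0, t))

res = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
b0, b1 = res.x
print(b0, b1)   # fitted intercept and slope
```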
Visualization

§ Fit some curve with parameters β₀ and β₁

[Figure: a logistic curve over Age (X), with probability levels 0.25, 0.5, 0.75 marked between No and Yes]
Visualization

§ Fit some curve with parameters β₀ and β₁
§ Iteratively adjust the curve and the probabilities of each point being classified as one class vs. the other

[Figure: the fitted logistic curve over Age (X), with probability levels 0.25, 0.5, 0.75 marked]

§ For a single independent variable x, the separation is a point x = a
Two Independent Variables

§ The separation is a line where the probability becomes 0.5

[Figure: Income (thousand rupees) against Age (years), with probability contours at 0.25, 0.5, 0.75 and a linear decision boundary]
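A sketch of why the boundary is a line. With two predictors, $p = 0.5$ exactly where $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0$, which can be solved for income as a linear function of age. The coefficients below are assumed for illustration, not fitted:

```python
import numpy as np

# Illustrative (assumed) coefficients for age and income (thousand rupees).
b0, b1, b2 = -10.0, 0.1, 0.04

def p_yes(age, income):
    """P(Y = Yes | age, income) under a two-variable logistic model."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * age + b2 * income)))

# The p = 0.5 boundary is the line b0 + b1*age + b2*income = 0,
# i.e. income = -(b0 + b1*age) / b2.
for age in (30, 50, 70):
    print(age, -(b0 + b1 * age) / b2)   # boundary income at each age
```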
Classification: Wrapping Up
Binary and Multi-class Classification

§ Binary classification:
  – Target class has two values
  – Example: heart disease Yes / No
§ Multi-class classification:
  – Target class can take more than two values
  – Example: text classification into several labels (topics)
§ Many classifiers are simple to use for binary classification tasks
§ How can they be applied to multi-class problems?
Compound and Monolithic Classifiers

§ Compound models
  – Built by combining binary submodels
  – 1-vs-all: for each class c, determine if an observation belongs to c or to some other class (see the sketch below)
  – 1-vs-last
§ Monolithic models (a single classifier)
  – Examples: decision trees, k-NN
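A sketch of the 1-vs-all scheme. The fit_binary trainer and its predict_proba interface are assumptions for illustration, not a specific library's API; any binary classifier that reports P(class = 1), such as the logistic regression above, could be plugged in:

```python
import numpy as np

class OneVsAll:
    """Compound multi-class classifier built from binary submodels.

    fit_binary(X, y01) is any binary trainer returning a model with
    predict_proba(X) -> P(class = 1); both names are assumed here.
    """
    def __init__(self, fit_binary):
        self.fit_binary = fit_binary
        self.models = {}

    def fit(self, X, y):
        for c in np.unique(y):
            # One binary problem per class: c vs. all other classes
            self.models[c] = self.fit_binary(X, (y == c).astype(int))
        return self

    def predict(self, X):
        classes = list(self.models)
        # Pick the class whose binary submodel is most confident
        probs = np.column_stack([self.models[c].predict_proba(X)
                                 for c in classes])
        return np.array([classes[i] for i in probs.argmax(axis=1)])
```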