ST 430/514 Introduction to Regression Analysis / Statistics for Management and the Social Sciences II

Special Topics

Some complex model-building problems can be handled using the linear regression approach covered up to this point: for example, piecewise regression, including piecewise linear regression and spline regression. Others require more general nonlinear approaches: for example, logistic and probit regression for binary responses.
Logistic Regression

Linear regression methods are used to evaluate the impact of various factors on a response, but when the response Y is binary (0 or 1), linear methods run into problems. Because E(Y | x) = P(Y = 1 | x), the linear regression model E(Y | x) = β_0 + β_1 x will often predict probabilities that are negative or greater than 1.
The most common alternative is based on modeling the log odds:

π(x) = P(Y = 1 | x)

π(x) / (1 − π(x)) = the odds

log[π(x) / (1 − π(x))] = the log odds, or logit.

In the logistic regression model, we assume

log[π(x) / (1 − π(x))] = β_0 + β_1 x_1 + · · · + β_k x_k.
Solving for π(x), we find

P(Y = 1 | x) = π(x) = exp(β_0 + β_1 x_1 + · · · + β_k x_k) / [1 + exp(β_0 + β_1 x_1 + · · · + β_k x_k)].

Consequently

P(Y = 0 | x) = 1 − π(x) = 1 / [1 + exp(β_0 + β_1 x_1 + · · · + β_k x_k)].

As a function of any x_j, π(x) changes smoothly between 0 and 1: increasing if β_j > 0, and decreasing if β_j < 0.
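As a quick numerical sketch of these formulas (the coefficients below are hypothetical, chosen only to illustrate the transform; R's built-in plogis() computes the same function):

```r
# Hypothetical coefficients, not from any fitted model
b0 <- -2
b1 <- 0.8

eta <- b0 + b1 * 3                  # linear predictor (log odds) at x = 3
pi_x <- exp(eta) / (1 + exp(eta))   # P(Y = 1 | x)

pi_x                                # about 0.6
all.equal(pi_x, plogis(eta))        # plogis() is the logistic cdf
all.equal(1 - pi_x, 1 / (1 + exp(eta)))  # P(Y = 0 | x)
```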
The function

F(x) = exp(x) / (1 + exp(x))

is the cdf of the logistic distribution:

curve(exp(x) / (1 + exp(x)), from = -5, to = 5)

It is similar to the cdf of the normal distribution with the matching variance (π²/3):

curve(pnorm(x, 0, sqrt(pi^2/3)), add = TRUE, col = "red")
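The similarity can be quantified: over a fine grid, the two cdfs differ by at most a couple of hundredths:

```r
x <- seq(-5, 5, by = 0.01)

# logistic cdf vs. normal cdf with the same variance, pi^2 / 3
d <- max(abs(plogis(x) - pnorm(x, 0, sqrt(pi^2 / 3))))
d   # roughly 0.02
```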
Interpreting the parameters

The coefficient β_j measures the change in the log odds associated with a change of +1 in x_j, so e^{β_j} is the multiplicative change in the odds associated with the same change.

When x_j is an indicator variable, e^{β_j} is the odds ratio comparing the group where x_j = 1 with the group where x_j = 0; it is often interpreted as a relative risk that Y = 1, which is a good approximation when the event Y = 1 is rare.
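A small check that e^{β_j} multiplies the odds by the same factor whatever the baseline (the coefficient and baseline log odds here are hypothetical):

```r
bj <- 0.5                        # hypothetical coefficient beta_j
odds <- function(eta) exp(eta)   # odds = pi/(1 - pi) = exp(log odds)

eta0 <- -1                       # some baseline log odds
# the ratio of odds after a +1 change in x_j is exp(bj),
# no matter what the baseline eta0 is
odds(eta0 + bj) / odds(eta0)
exp(bj)                          # same value
```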
Example: fraud detection

Data are credit card transactions. The response is Y, where Y = 1 if the transaction is fraudulent and Y = 0 otherwise.

The predictors are information about the card holder (credit limit, etc.) and about the transaction (amount, etc.). The fitted π̂(x) can be used to predict the probability that a new transaction will prove to be fraudulent.
Estimation

The usual approach to estimating β_0, β_1, . . . , β_k is maximum likelihood. It is implemented in proc logistic and proc genmod in SAS, and in the glm() function in R.

The names "genmod" and "glm" are abbreviations of generalized linear model, of which logistic regression is a particular case.
Example: collusive bidding in Florida road construction.

bids <- read.table("Text/Exercises&Examples/ROADBIDS.txt", header = TRUE)
pairs(bids)

Using glm() is very similar to using lm():

g <- glm(STATUS ~ NUMBIDS + DOTEST, bids, family = binomial)
summary(g)

The argument family = binomial specifies that the response, STATUS, has the binomial (strictly, the Bernoulli) distribution.
The output is also similar to that of lm(). Note that instead of a column of t-values, there is a column of z-values.

Like a t-value, a z-value is the ratio of a parameter estimate to its standard error. The label indicates that you test the significance of the parameter using the normal distribution, not the t-distribution.
Because this is not a least squares fit, there are no sums of squares; deviance plays a similar role. For example, to test the utility of the model, use the statistic

Null deviance − Residual deviance = 21.756

which, under H_0: β_1 = β_2 = 0, is χ²-distributed with 30 − 28 = 2 degrees of freedom. P(χ²_2 ≥ 21.756) = 1.9 × 10⁻⁵, so we reject H_0.
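The quoted p-value can be reproduced directly with pchisq():

```r
stat <- 21.756   # null deviance minus residual deviance, from summary(g)

# upper-tail probability of the chi-squared distribution with 2 df
pchisq(stat, df = 2, lower.tail = FALSE)   # about 1.9e-05
```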
You also use deviance to compare nested models, such as the first-order model

log[π(x) / (1 − π(x))] = β_0 + β_1 x_1 + β_2 x_2

against the complete second-order model

log[π(x) / (1 − π(x))] = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1 x_2 + β_4 x_1² + β_5 x_2².

summary(glm(STATUS ~ NUMBIDS * DOTEST + I(NUMBIDS^2) + I(DOTEST^2), bids, family = binomial))
To test H_0: β_3 = β_4 = β_5 = 0, the test statistic is

(Residual deviance for reduced model) − (Residual deviance for complete model).

Under H_0, this statistic has the χ²-distribution with 28 − 25 = 3 degrees of freedom.

Here we have 22.843 − 13.820 = 9.023, which we compare with the χ²_3-distribution. We find P(χ²_3 ≥ 9.023) = .029, so we would reject H_0 at α = .05 but not at α = .01. That is, there is some evidence that we need the second-order terms.
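The same calculation in R:

```r
# reduced-model minus complete-model residual deviance
stat <- 22.843 - 13.820   # = 9.023

pchisq(stat, df = 3, lower.tail = FALSE)   # about 0.029
```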
Prediction

Suppose that a new auction has 4 bidders, and the difference between the winning bid and the engineer's estimate is 30%. What is the probability that the auction was collusive?

predict(g, data.frame(NUMBIDS = 4, DOTEST = 30), type = "response", se.fit = TRUE)

The predicted probability is .85, but the standard error of .13 shows that it is not very precisely determined. If you do not specify type = "response", the prediction is on the scale of the log odds, not the probability itself.
Do not use the standard error of the predicted probability to construct a confidence interval! Instead, construct a confidence interval for the log odds and transform it into a corresponding confidence interval for the probability:

p <- predict(g, data.frame(NUMBIDS = 4, DOTEST = 30), se.fit = TRUE)
logOdds <- p$fit + qnorm(c(.025, .5, .975)) * p$se.fit
exp(logOdds) / (1 + exp(logOdds))
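A sketch of the transformation with made-up numbers (fit and se below are hypothetical values standing in for p$fit and p$se.fit):

```r
fit <- 1.7   # hypothetical predicted log odds
se  <- 1.0   # hypothetical standard error on the log-odds scale

# 95% interval endpoints and point estimate on the log-odds scale
logOdds <- fit + qnorm(c(.025, .5, .975)) * se

# back-transform through the logistic cdf; the result always lies in (0, 1)
plogis(logOdds)
```

Unlike an interval built as probability ± 1.96 × se, an interval constructed this way can never extend below 0 or above 1.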