Bayesian Analysis of Choice Data
Simon Jackman, Stanford University
http://jackman.stanford.edu/BASS
February 3, 2012
Discrete Choice
- binary (e.g., the probit model, which we looked at via data augmentation)
- ordinal (ordinal logit or probit)
- multinomial models for unordered choices: e.g., multinomial logit (MNL), multinomial probit (MNP)
We won't consider models for "tree-like" choice structures (nested logit, GEV, etc.).
Binary Choices: logit or probit
- For "standard" models (e.g., no "fancy" hierarchical structure, no concerns re missing data, etc.), there are other avenues besides BUGS/JAGS, e.g., MCMCpack (see the sketch below).
- Implementations in BUGS/JAGS don't use data augmentation a la Albert & Chib (1993): declare the outcome with dbern or dbin and sample from the conditional distributions using Metropolis-within-Gibbs or slice sampling.
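For a standard binary model, MCMCpack is often the quickest route from R. A minimal sketch, assuming the turnout data sit in a data frame called nagler with a 0/1 turnout variable; the data frame, its column names, and the simplified formula are assumptions for illustration:

R code
## minimal MCMCpack sketch; 'nagler' and its column names are assumed
library(MCMCpack)

## logit fit via random-walk Metropolis; b0/B0 specify a vague normal prior
m1 <- MCMClogit(turnout ~ educ + I(educ^2) + age + I(age^2) +
                  south + govelec + closing + closing:educ,
                data = nagler,
                b0 = 0, B0 = 0.001,
                burnin = 1000, mcmc = 10000)

summary(m1)   ## posterior summaries via coda

MCMCprobit() works the same way for the probit link.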
Binary Choices: logit or probit
Voter turnout example.

JAGS code
model{
   for (i in 1:N){              ## loop over observations
      y[i] ~ dbern(p[i])        ## binary outcome
      logit(p[i]) <- ystar[i]   ## logit link
      ystar[i] <- beta[1]       ## regression structure for covariates
         + beta[2]*educ[i]
         + beta[3]*(educ[i]*educ[i])
         + beta[4]*age[i]
         + beta[5]*(age[i]*age[i])
         + beta[6]*south[i]
         + beta[7]*govelec[i]
         + beta[8]*closing[i]
         + beta[9]*(closing[i]*educ[i])
         + beta[10]*(educ[i]*educ[i]*closing[i])
   }

   ## priors
   beta[1:10] ~ dmnorm(mu[], B[,])  ## diffuse multivariate normal prior; mu and B in the data file
}
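One way to run this model from R is with rjags. A sketch, assuming the model block above is saved as turnout.bug and that the covariates live in the nagler data frame under the names used in the model (the file name, data frame, and column names are assumptions):

R code
## sketch: fitting the turnout model with rjags; file and object names assumed
library(rjags)

turnoutData <- list(y = nagler$turnout,
                    educ = nagler$educ, age = nagler$age,
                    south = nagler$south, govelec = nagler$govelec,
                    closing = nagler$closing,
                    N = nrow(nagler),
                    mu = rep(0, 10),          ## prior mean for beta
                    B = diag(0.001, 10))      ## diffuse prior precision for beta

m <- jags.model("turnout.bug", data = turnoutData, n.chains = 2)
update(m, 5000)                               ## burn-in
out <- coda.samples(m, variable.names = "beta", n.iter = 20000)
summary(out)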
Binary Data Is Binomial Data when Grouped (§8.1.4)
- big, micro-level data sets with binary data (e.g., the CPS): MCMC gets slow
- collapse the data into covariate classes and treat them as binomial data: a much smaller data set, much shorter run times
- $y_i \mid x_i \sim \mathrm{Bernoulli}(F[x_i \beta])$, where $x_i$ is a vector of covariates
- covariate class: a set $\mathcal{C} = \{ i : x_i = x_{\mathcal{C}} \}$, i.e., the set of respondents who share the covariate vector $x_{\mathcal{C}}$
- probability assignments over $y_i \; \forall \, i \in \mathcal{C}$ are conditionally exchangeable given their common $x_i$ and $\beta$
- binomial model: $r_{\mathcal{C}} \sim \mathrm{Binomial}(p_{\mathcal{C}}; n_{\mathcal{C}})$, where $p_{\mathcal{C}} = F(x_{\mathcal{C}} \beta)$, $r_{\mathcal{C}} = \sum_{i \in \mathcal{C}} y_i$ is the number of "successes" in $\mathcal{C}$, and $n_{\mathcal{C}}$ is the cardinality of $\mathcal{C}$
Example 8.5: binomial model for grouped binary data
Form the covariate classes and a groupedData object; the original data set has n ≈ 99,000 observations but only 636 unique covariate classes.

R code
## collapse by covariate classes
X <- cbind(nagler$age, nagler$educYrs)
X <- apply(X, 1, paste, collapse=":")
covClasses <- match(X, unique(X))
covX <- matrix(unlist(strsplit(unique(X), ":")), ncol=2, byrow=TRUE)
r <- tapply(nagler$turnout, covClasses, sum)     ## successes per class
n <- tapply(nagler$turnout, covClasses, length)  ## class sizes
groupedData <- list(n=n, r=r,
                    age=as.numeric(covX[,1]),
                    educYrs=as.numeric(covX[,2]),
                    NOBS=length(n))
Example 8.5: binomial model for grouped binary data
We can then pass the groupedData list to JAGS. We specify the binomial model $r_i \sim \mathrm{Binomial}(p_i; n_i)$ with $p_i = F(x_i \beta)$ and vague normal priors on $\beta$ with the following code:

JAGS code
model{
   for (i in 1:NOBS){
      logit(p[i]) <- beta[1] + age[i]*beta[2]
         + pow(age[i],2)*beta[3]
         + educYrs[i]*beta[4]
         + pow(educYrs[i],2)*beta[5]
      r[i] ~ dbin(p[i], n[i])   ## binomial model for each covariate class
   }

   ## vague multivariate normal prior
   beta[1:5] ~ dmnorm(b0[], B0[,])
}
Ordinal Responses
e.g., the 7-point scale used to measure party identification in the U.S., assigning the numerals $y_i \in \{0, \ldots, 6\}$ to the categories {"Strong Republican", "Weak Republican", ..., "Strong Democrat"}.

Censored, latent-variable representation:
$y_i^* = x_i \beta + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$.
$y_i = 0 \iff y_i^* < \tau_1$
$y_i = j \iff \tau_j < y_i^* \le \tau_{j+1}$, $j = 1, \ldots, J-1$
$y_i = J \iff y_i^* > \tau_J$
The threshold parameters obey the ordering constraint $\tau_1 < \tau_2 < \cdots < \tau_J$.

The assumption of normality for $\epsilon_i$ generates the probit version of the model; a logistic density generates the ordinal logistic model.

Bayesian analysis: we want $p(\beta, \tau \mid y, X) \propto p(y \mid X, \beta, \tau) \, p(\beta, \tau)$.
Ordinal responses, $y_i^* \sim N(x_i \beta, \sigma^2)$:
$\Pr[y_i = j] = \Phi[(\tau_{j+1} - x_i \beta)/\sigma] - \Phi[(\tau_j - x_i \beta)/\sigma]$
[Figure: the density of $y_i^*$ partitioned by the six thresholds into regions corresponding to $\Pr(y=0), \ldots, \Pr(y=6)$.]
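The category probabilities are just differences of the normal CDF evaluated at adjacent thresholds (with the conventions $\tau_0 = -\infty$ and $\tau_{J+1} = +\infty$). A small R sketch; the thresholds and linear predictor below are illustrative values, not estimates:

R code
## ordinal probit category probabilities:
## Pr(y = j) = pnorm((tau[j+1] - xb)/sigma) - pnorm((tau[j] - xb)/sigma)
ordProbitProbs <- function(xb, tau, sigma = 1){
   cuts <- c(-Inf, tau, Inf)     ## augment the thresholds with -Inf and +Inf
   p <- pnorm((cuts[-1] - xb)/sigma) - pnorm((cuts[-length(cuts)] - xb)/sigma)
   names(p) <- paste0("Pr(y=", seq_along(p) - 1, ")")
   p
}

## a 7-point scale has 6 thresholds; the probabilities sum to 1
ordProbitProbs(xb = 0.5, tau = c(-2, -1, -0.25, 0.25, 1, 2))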
Identification
$y_i^* = x_i \beta + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$.
$y_i = 0 \iff y_i^* < \tau_1$
$y_i = j \iff \tau_j < y_i^* \le \tau_{j+1}$, $j = 1, \ldots, J-1$
$y_i = J \iff y_i^* > \tau_J$
The model needs identification constraints; options include:
- set one of the $\tau$ to a point (zero) and set $\sigma$ to a constant (one)
- drop the intercept and fix $\sigma$
- fix two of the $\tau$ parameters.
Priors on thresholds
$\tau_j \sim N(0, 10^2)$, subject to the ordering constraint $\tau_j > \tau_{j-1}$, $\forall \, j = 2, \ldots, J$.

In JAGS only, use the nifty sort function:

JAGS code
for(j in 1:4){
   tau0[j] ~ dnorm(0, .01)
}
tau[1:4] <- sort(tau0)   ## JAGS only, not in WinBUGS!

In BUGS: $\tau_1 \sim N(t_1, T_1)$; $\delta_j \sim \mathrm{Exponential}(d)$ and $\tau_j = \tau_{j-1} + \delta_j$, $j = 2, \ldots, J$.

BUGS code
tau[1] ~ dnorm(0, .01)
for(j in 1:3){
   delta[j] ~ dexp(2)
   tau[j+1] <- tau[j] + delta[j]
}
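The two constructions induce different joint priors on the thresholds (sorted iid normals versus a normal plus positive exponential increments). A quick R simulation of the prior implied by the BUGS-style code above, matching its dnorm(0, .01) and dexp(2) settings:

R code
## simulate from the BUGS-style prior on the thresholds:
## tau[1] ~ N(0, 10^2), delta[j] ~ Exponential(2), tau[j+1] = tau[j] + delta[j]
set.seed(123)
nsim  <- 10000
tau1  <- rnorm(nsim, 0, 10)                        ## dnorm(0, .01) is precision .01, i.e., sd 10
delta <- matrix(rexp(nsim * 3, rate = 2), nsim, 3)
tau   <- cbind(tau1, tau1 + t(apply(delta, 1, cumsum)))
colnames(tau) <- paste0("tau", 1:4)
apply(tau, 2, quantile, probs = c(.025, .5, .975)) ## ordering holds by construction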
Example 8.6: interviewer ratings of respondents
A 5-point rating scale is used by interviewers to assess respondents' levels of political information. In the 2000 ANES:

Label        y    n    %
Very Low     0   105    6
Fairly Low   1   334   19
Average      2   586   33
Fairly High  3   450   25
Very High    4   325   18

Covariates: education, gender, age, home-owner, public-sector employment.
Ordinal Logistic Model

JAGS code
model{
   for(i in 1:N){   ## loop over observations
      ## form the linear predictor (no intercept)
      mu[i] <- x[i,1]*beta[1] +
               x[i,2]*beta[2] +
               x[i,3]*beta[3] +
               x[i,4]*beta[4] +
               x[i,5]*beta[5] +
               x[i,6]*beta[6]

      ## cumulative logistic probabilities
      logit(Q[i,1]) <- tau[1] - mu[i]
      p[i,1] <- Q[i,1]
      for(j in 2:4){
         logit(Q[i,j]) <- tau[j] - mu[i]
         ## trick to get the slice of the cdf we need
         p[i,j] <- Q[i,j] - Q[i,j-1]
      }
      p[i,5] <- 1 - Q[i,4]
      y[i] ~ dcat(p[i,1:5])   ## p[i,] sums to 1 for each i
   }

   ## priors over betas
   beta[1:6] ~ dmnorm(b0[], B0[,])

   ## thresholds
   for(j in 1:4){
      tau0[j] ~ dnorm(0, .01)
   }
   tau[1:4] <- sort(tau0)   ## JAGS only, not in BUGS!
}
Redundant Parameterization
- exploit the lack of identification: run the MCMC algorithm in the space of unidentified parameters
- post-processing: map the MCMC output back to the identified parameters (see the sketch below)
- this typically yields a better-mixing Markov chain than running the MCMC algorithm in the space of the identified parameters
- in the ordinal model, exploit the lack of identification between the thresholds and the intercept
- take care!
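For instance, if the ordinal model is run with both a free intercept and free thresholds, only the differences between the thresholds and the intercept are identified. A post-processing sketch, assuming the MCMC output is a matrix of draws with columns named "alpha" and "tau[1]", ..., "tau[4]" (names assumed for illustration):

R code
## post-processing sketch: the intercept (alpha) and thresholds (tau) are not
## separately identified -- only tau[j] - alpha is.  Map each draw to the
## identified, no-intercept parameterization.
identifyDraws <- function(draws){
   tauCols <- grep("^tau", colnames(draws), value = TRUE)
   out <- draws
   out[, tauCols] <- draws[, tauCols] - draws[, "alpha"]  ## tau*_j = tau_j - alpha
   out[, "alpha"] <- 0                                    ## intercept absorbed into the thresholds
   out
}

## e.g., identified <- identifyDraws(as.matrix(out[[1]]))  ## one coda chain as a matrix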
Interviewer heterogeneity in scale use
Different interviewers use the rating scale differently: e.g., interviewer $k$ is a tougher grader than interviewer $k'$. We tap this with a set of interviewer effects, varying over interviewers $k = 1, \ldots, K$. We augment the usual ordinal model as follows:
$\Pr(y_i \le j) = F(\tau_{j+1} - \lambda_i)$, $j = 0, \ldots, J-1$
$\Pr(y_i = J) = 1 - F(\tau_J - \lambda_i)$
$\lambda_i = x_i \beta + \gamma_{k[i]}$, with $\gamma_k \sim N(0, \sigma_\gamma^2)$, $k = 1, \ldots, K$.
A positive $\gamma_k$ is equivalent to the thresholds being shifted down (i.e., interviewer $k$ is an easier-than-average grader); see the simulation sketch below. Why the zero-mean restriction on the $\gamma_k$?
Alternative model: each interviewer gets their own set of thresholds, perhaps fit hierarchically.
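A small R simulation illustrates the claim: adding a positive interviewer effect to the linear predictor shifts the rating distribution toward higher categories, exactly as if the thresholds had been lowered. The logit link matches the JAGS code above; the thresholds and linear predictor are illustrative values, not estimates from the example:

R code
## how a positive interviewer effect gamma shifts the rating distribution
ratingProbs <- function(lambda, tau){
   Q <- plogis(tau - lambda)   ## cumulative probabilities Pr(y <= j)
   diff(c(0, Q, 1))            ## category probabilities Pr(y = 0), ..., Pr(y = 4)
}
tau <- c(-2, -0.5, 1, 2.5)     ## 5 categories, 4 thresholds (assumed values)
xb  <- 0.3                     ## linear predictor for some respondent

rbind("average interviewer (gamma = 0)" = ratingProbs(xb + 0, tau),
      "easy grader (gamma = 1)"         = ratingProbs(xb + 1, tau))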