Bayesian Analysis of Choice Data
Simon Jackman, Stanford University
http://jackman.stanford.edu/BASS
February 3, 2012
Discrete Choice
- binary (e.g., the probit model, which we looked at via data augmentation)
- ordinal (ordinal logit or probit)
- multinomial models for unordered choices: e.g., multinomial logit (MNL), multinomial probit (MNP)
We won't consider models for "tree-like" choice structures (nested logit, GEV, etc.).
Binary Choices: logit or probit
- For "standard" models (e.g., no "fancy" hierarchical structure, no concerns re missing data, etc.), there are other avenues besides BUGS/JAGS, e.g., MCMCpack (see the sketch below).
- Implementations in BUGS/JAGS don't use data augmentation a la Albert & Chib (1993): declare the outcome with dbern or dbin and sample from the conditional distributions using Metropolis-within-Gibbs or slice sampling.
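For a standard binary model, MCMCpack is often the quickest route from R. A minimal sketch, assuming the turnout data sit in a data frame called nagler with a 0/1 turnout variable; the data frame, its column names, and the simplified formula are assumptions for illustration:

R code
## minimal MCMCpack sketch; 'nagler' and its column names are assumed
library(MCMCpack)

## logit fit via random-walk Metropolis; b0/B0 specify a vague normal prior
m1 <- MCMClogit(turnout ~ educ + I(educ^2) + age + I(age^2) +
                  south + govelec + closing + closing:educ,
                data = nagler,
                b0 = 0, B0 = 0.001,
                burnin = 1000, mcmc = 10000)

summary(m1)   ## posterior summaries via coda

MCMCprobit() works the same way for the probit link.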
Binary Choices: logit or probit
Voter turnout example.

JAGS code
model{
   for (i in 1:N){              ## loop over observations
      y[i] ~ dbern(p[i])        ## binary outcome
      logit(p[i]) <- ystar[i]   ## logit link
      ystar[i] <- beta[1]       ## regression structure for covariates
         + beta[2]*educ[i]
         + beta[3]*(educ[i]*educ[i])
         + beta[4]*age[i]
         + beta[5]*(age[i]*age[i])
         + beta[6]*south[i]
         + beta[7]*govelec[i]
         + beta[8]*closing[i]
         + beta[9]*(closing[i]*educ[i])
         + beta[10]*(educ[i]*educ[i]*closing[i])
   }

   ## priors
   beta[1:10] ~ dmnorm(mu[], B[,])  ## diffuse multivariate normal prior; mu and B in the data file
}
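One way to run this model from R is with rjags. A sketch, assuming the model block above is saved as turnout.bug and that the covariates live in the nagler data frame under the names used in the model (the file name, data frame, and column names are assumptions):

R code
## sketch: fitting the turnout model with rjags; file and object names assumed
library(rjags)

turnoutData <- list(y = nagler$turnout,
                    educ = nagler$educ, age = nagler$age,
                    south = nagler$south, govelec = nagler$govelec,
                    closing = nagler$closing,
                    N = nrow(nagler),
                    mu = rep(0, 10),          ## prior mean for beta
                    B = diag(0.001, 10))      ## diffuse prior precision for beta

m <- jags.model("turnout.bug", data = turnoutData, n.chains = 2)
update(m, 5000)                               ## burn-in
out <- coda.samples(m, variable.names = "beta", n.iter = 20000)
summary(out)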
Binary Data Is Binomial Data when Grouped (§8.1.4)
- big, micro-level data sets with binary data (e.g., the CPS): MCMC gets slow
- collapse the data into covariate classes and treat them as binomial data: a much smaller data set, much shorter run times
- $y_i \mid x_i \sim \mathrm{Bernoulli}(F[x_i \beta])$, where $x_i$ is a vector of covariates
- covariate class: a set $\mathcal{C} = \{ i : x_i = x_{\mathcal{C}} \}$, i.e., the set of respondents who share the covariate vector $x_{\mathcal{C}}$
- probability assignments over $y_i \; \forall \, i \in \mathcal{C}$ are conditionally exchangeable given their common $x_i$ and $\beta$
- binomial model: $r_{\mathcal{C}} \sim \mathrm{Binomial}(p_{\mathcal{C}}; n_{\mathcal{C}})$, where $p_{\mathcal{C}} = F(x_{\mathcal{C}} \beta)$, $r_{\mathcal{C}} = \sum_{i \in \mathcal{C}} y_i$ is the number of "successes" in $\mathcal{C}$, and $n_{\mathcal{C}}$ is the cardinality of $\mathcal{C}$
Example 8.5: binomial model for grouped binary data
Form the covariate classes and a groupedData object; the original data set has n ≈ 99,000 observations but only 636 unique covariate classes.

R code
## collapse by covariate classes
X <- cbind(nagler$age, nagler$educYrs)
X <- apply(X, 1, paste, collapse=":")
covClasses <- match(X, unique(X))
covX <- matrix(unlist(strsplit(unique(X), ":")), ncol=2, byrow=TRUE)
r <- tapply(nagler$turnout, covClasses, sum)     ## successes per class
n <- tapply(nagler$turnout, covClasses, length)  ## class sizes
groupedData <- list(n=n, r=r,
                    age=as.numeric(covX[,1]),
                    educYrs=as.numeric(covX[,2]),
                    NOBS=length(n))
Example 8.5: binomial model for grouped binary data
We can then pass the groupedData list to JAGS. We specify the binomial model $r_i \sim \mathrm{Binomial}(p_i; n_i)$ with $p_i = F(x_i \beta)$ and vague normal priors on $\beta$ with the following code:

JAGS code
model{
   for (i in 1:NOBS){
      logit(p[i]) <- beta[1] + age[i]*beta[2]
         + pow(age[i],2)*beta[3]
         + educYrs[i]*beta[4]
         + pow(educYrs[i],2)*beta[5]
      r[i] ~ dbin(p[i], n[i])   ## binomial model for each covariate class
   }

   ## vague multivariate normal prior
   beta[1:5] ~ dmnorm(b0[], B0[,])
}
Ordinal Responses
e.g., the 7-point scale used to measure party identification in the U.S., assigning the numerals $y_i \in \{0, \ldots, 6\}$ to the categories {"Strong Republican", "Weak Republican", ..., "Strong Democrat"}.

Censored, latent-variable representation:
$y_i^* = x_i \beta + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$.
$y_i = 0 \iff y_i^* < \tau_1$
$y_i = j \iff \tau_j < y_i^* \le \tau_{j+1}$, $j = 1, \ldots, J-1$
$y_i = J \iff y_i^* > \tau_J$
The threshold parameters obey the ordering constraint $\tau_1 < \tau_2 < \cdots < \tau_J$.

The assumption of normality for $\epsilon_i$ generates the probit version of the model; a logistic density generates the ordinal logistic model.

Bayesian analysis: we want $p(\beta, \tau \mid y, X) \propto p(y \mid X, \beta, \tau) \, p(\beta, \tau)$.
Ordinal responses, $y_i^* \sim N(x_i \beta, \sigma^2)$:
$\Pr[y_i = j] = \Phi[(\tau_{j+1} - x_i \beta)/\sigma] - \Phi[(\tau_j - x_i \beta)/\sigma]$
[Figure: the density of $y_i^*$ partitioned by the six thresholds into regions corresponding to $\Pr(y=0), \ldots, \Pr(y=6)$.]
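The category probabilities are just differences of the normal CDF evaluated at adjacent thresholds (with the conventions $\tau_0 = -\infty$ and $\tau_{J+1} = +\infty$). A small R sketch; the thresholds and linear predictor below are illustrative values, not estimates:

R code
## ordinal probit category probabilities:
## Pr(y = j) = pnorm((tau[j+1] - xb)/sigma) - pnorm((tau[j] - xb)/sigma)
ordProbitProbs <- function(xb, tau, sigma = 1){
   cuts <- c(-Inf, tau, Inf)     ## augment the thresholds with -Inf and +Inf
   p <- pnorm((cuts[-1] - xb)/sigma) - pnorm((cuts[-length(cuts)] - xb)/sigma)
   names(p) <- paste0("Pr(y=", seq_along(p) - 1, ")")
   p
}

## a 7-point scale has 6 thresholds; the probabilities sum to 1
ordProbitProbs(xb = 0.5, tau = c(-2, -1, -0.25, 0.25, 1, 2))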
Identification
$y_i^* = x_i \beta + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$.
$y_i = 0 \iff y_i^* < \tau_1$
$y_i = j \iff \tau_j < y_i^* \le \tau_{j+1}$, $j = 1, \ldots, J-1$
$y_i = J \iff y_i^* > \tau_J$
The model needs identification constraints; options include:
- set one of the $\tau$ to a point (zero) and set $\sigma$ to a constant (one)
- drop the intercept and fix $\sigma$
- fix two of the $\tau$ parameters.
Priors on thresholds
$\tau_j \sim N(0, 10^2)$, subject to the ordering constraint $\tau_j > \tau_{j-1}$, $\forall \, j = 2, \ldots, J$.

In JAGS only, use the nifty sort function:

JAGS code
for(j in 1:4){
   tau0[j] ~ dnorm(0, .01)
}
tau[1:4] <- sort(tau0)   ## JAGS only, not in WinBUGS!

In BUGS: $\tau_1 \sim N(t_1, T_1)$; $\delta_j \sim \mathrm{Exponential}(d)$ and $\tau_j = \tau_{j-1} + \delta_j$, $j = 2, \ldots, J$.

BUGS code
tau[1] ~ dnorm(0, .01)
for(j in 1:3){
   delta[j] ~ dexp(2)
   tau[j+1] <- tau[j] + delta[j]
}
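The two constructions induce different joint priors on the thresholds (sorted iid normals versus a normal plus positive exponential increments). A quick R simulation of the prior implied by the BUGS-style code above, matching its dnorm(0, .01) and dexp(2) settings:

R code
## simulate from the BUGS-style prior on the thresholds:
## tau[1] ~ N(0, 10^2), delta[j] ~ Exponential(2), tau[j+1] = tau[j] + delta[j]
set.seed(123)
nsim  <- 10000
tau1  <- rnorm(nsim, 0, 10)                        ## dnorm(0, .01) is precision .01, i.e., sd 10
delta <- matrix(rexp(nsim * 3, rate = 2), nsim, 3)
tau   <- cbind(tau1, tau1 + t(apply(delta, 1, cumsum)))
colnames(tau) <- paste0("tau", 1:4)
apply(tau, 2, quantile, probs = c(.025, .5, .975)) ## ordering holds by construction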
Example 8.6: interviewer ratings of respondents
A 5-point rating scale is used by interviewers to assess respondents' levels of political information. In the 2000 ANES:

Label        y    n    %
Very Low     0   105    6
Fairly Low   1   334   19
Average      2   586   33
Fairly High  3   450   25
Very High    4   325   18

Covariates: education, gender, age, home-owner, public-sector employment.
Ordinal Logistic Model

JAGS code
model{
   for(i in 1:N){   ## loop over observations
      ## form the linear predictor (no intercept)
      mu[i] <- x[i,1]*beta[1] +
               x[i,2]*beta[2] +
               x[i,3]*beta[3] +
               x[i,4]*beta[4] +
               x[i,5]*beta[5] +
               x[i,6]*beta[6]

      ## cumulative logistic probabilities
      logit(Q[i,1]) <- tau[1] - mu[i]
      p[i,1] <- Q[i,1]
      for(j in 2:4){
         logit(Q[i,j]) <- tau[j] - mu[i]
         ## trick to get the slice of the cdf we need
         p[i,j] <- Q[i,j] - Q[i,j-1]
      }
      p[i,5] <- 1 - Q[i,4]
      y[i] ~ dcat(p[i,1:5])   ## p[i,] sums to 1 for each i
   }

   ## priors over betas
   beta[1:6] ~ dmnorm(b0[], B0[,])

   ## thresholds
   for(j in 1:4){
      tau0[j] ~ dnorm(0, .01)
   }
   tau[1:4] <- sort(tau0)   ## JAGS only, not in BUGS!
}
Redundant Parameterization
- exploit the lack of identification: run the MCMC algorithm in the space of unidentified parameters
- post-processing: map the MCMC output back to the identified parameters (see the sketch below)
- this typically yields a better-mixing Markov chain than running the MCMC algorithm in the space of the identified parameters
- in the ordinal model, exploit the lack of identification between the thresholds and the intercept
- take care!
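For instance, if the ordinal model is run with both a free intercept and free thresholds, only the differences between the thresholds and the intercept are identified. A post-processing sketch, assuming the MCMC output is a matrix of draws with columns named "alpha" and "tau[1]", ..., "tau[4]" (names assumed for illustration):

R code
## post-processing sketch: the intercept (alpha) and thresholds (tau) are not
## separately identified -- only tau[j] - alpha is.  Map each draw to the
## identified, no-intercept parameterization.
identifyDraws <- function(draws){
   tauCols <- grep("^tau", colnames(draws), value = TRUE)
   out <- draws
   out[, tauCols] <- draws[, tauCols] - draws[, "alpha"]  ## tau*_j = tau_j - alpha
   out[, "alpha"] <- 0                                    ## intercept absorbed into the thresholds
   out
}

## e.g., identified <- identifyDraws(as.matrix(out[[1]]))  ## one coda chain as a matrix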
Interviewer heterogeneity in scale use
Different interviewers use the rating scale differently: e.g., interviewer $k$ is a tougher grader than interviewer $k'$. We tap this with a set of interviewer effects, varying over interviewers $k = 1, \ldots, K$. We augment the usual ordinal model as follows:
$\Pr(y_i \le j) = F(\tau_{j+1} - \lambda_i)$, $j = 0, \ldots, J-1$
$\Pr(y_i = J) = 1 - F(\tau_J - \lambda_i)$
$\lambda_i = x_i \beta + \gamma_{k[i]}$, with $\gamma_k \sim N(0, \sigma_\gamma^2)$, $k = 1, \ldots, K$.
A positive $\gamma_k$ is equivalent to the thresholds being shifted down (i.e., interviewer $k$ is an easier-than-average grader); see the simulation sketch below. Why the zero-mean restriction on the $\gamma_k$?
Alternative model: each interviewer gets their own set of thresholds, perhaps fit hierarchically.
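A small R simulation illustrates the claim: adding a positive interviewer effect to the linear predictor shifts the rating distribution toward higher categories, exactly as if the thresholds had been lowered. The logit link matches the JAGS code above; the thresholds and linear predictor are illustrative values, not estimates from the example:

R code
## how a positive interviewer effect gamma shifts the rating distribution
ratingProbs <- function(lambda, tau){
   Q <- plogis(tau - lambda)   ## cumulative probabilities Pr(y <= j)
   diff(c(0, Q, 1))            ## category probabilities Pr(y = 0), ..., Pr(y = 4)
}
tau <- c(-2, -0.5, 1, 2.5)     ## 5 categories, 4 thresholds (assumed values)
xb  <- 0.3                     ## linear predictor for some respondent

rbind("average interviewer (gamma = 0)" = ratingProbs(xb + 0, tau),
      "easy grader (gamma = 1)"         = ratingProbs(xb + 1, tau))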