
Multivariate GLMs. Author: Nicholas Reich, transcribed by Kate Hoff Shutta and Herb Susmann



  1. Multivariate GLMs
     Author: Nicholas Reich, transcribed by Kate Hoff Shutta and Herb Susmann
     Course: Categorical Data Analysis (BIOSTATS 743)
     Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.

  2. Overview: Models for Multinomial Responses
     Note: This lecture focuses mainly on the Baseline Category Logit Model (see Agresti Ch. 8). For the exam, we are responsible for reading Chapter 8 of the text and being familiar with all of the models for multinomial responses introduced there.
     ◮ GLMs for Nominal Responses
        ◮ Baseline Category Logit Model (Multinomial Logit Model)
        ◮ Multinomial Probit Model
     ◮ GLMs for Ordinal Responses
        ◮ Cumulative Logit Model
        ◮ Cumulative Link Models
           ◮ Cumulative Probit Model
           ◮ Cumulative Log-Log Model
        ◮ Adjacent-Categories Logit Models
        ◮ Continuation-Ratio Logit Models
     ◮ Discrete-Choice Models
        ◮ Conditional Logit Models (and relationship to Multinomial Logit Model)
        ◮ Multinomial Probit Discrete-Choice Models
        ◮ Extension to Nested Logit and Mixed Logit Models
        ◮ Extension to Discrete Choice Model with Ordered Categories

  3. Baseline Category Logit Model
     The Baseline Category Logit (BCL) model is appropriate for modeling nominal response data as a function of one or more categorical or quantitative covariates.
     ◮ Example: Modeling a voter's choice of candidate as a function of voter age (quantitative), gender (categorical nominal), race (categorical nominal), and socioeconomic status (categorical ordinal).
     ◮ Example: Modeling transcription factor binding to a promoter region as a function of transcription factor abundance (quantitative), affinity for the binding site (quantitative), and primary immune response activation status (categorical binary).
     ◮ Non-Example: Modeling consumer choice of soda size as a function of air temperature (quantitative) and time of day (quantitative). Soda size is an ordinal categorical variable, so although the BCL model can technically be fit, it ignores the ordering information that the data contain.

  4. BCL Model Formulation
     Consider the set of $J$ possible values of a categorical response variable, $\{C_1, C_2, \ldots, C_J\}$, and the vector of $P$ covariates $\vec{X} = (X_1, X_2, \ldots, X_P)$.
     Goal: For a particular vector of covariates $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$, predict $Y_i$, the category to which the observation with covariates $\vec{x}_i$ belongs. (Note that $Y_i \in \{C_1, \ldots, C_J\}$.)
     Intermediate Goal: For all $j \in \{1, \ldots, J\}$, use training data to fit
     $$\pi_j(\vec{x}_i) = P(Y_i = C_j \mid \vec{x}_i)$$
     under the constraint that
     $$\sum_{j=1}^{J} \pi_j(\vec{x}_i) = 1$$
     Conditional on the observed covariates and the estimates for the functions $\pi_j$, $Y_i$ is Multinomial:
     $$Y_i \mid \vec{x}_i \sim \text{Multinomial}\left(1, \{\pi_1(\vec{x}_i), \ldots, \pi_J(\vec{x}_i)\}\right)$$

  5. Overview of Modeling Process
     ◮ Choose one of the $J$ categories as a baseline. Without loss of generality, use $C_J$ (since the $C_j$ are nominal and ordering is irrelevant).
     ◮ Let $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jP})^T$ be the category-specific coefficients of the covariates $\vec{x}_i$ for a particular category $C_j$. (Note that the dimensions of $\boldsymbol{\beta}_j$ are $P \times 1$.)
     ◮ Recall that $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$ is $P \times 1$.
     ◮ We can now calculate the following scalar quantity, a log probability ratio that is modeled as a linear function of the covariates $\vec{x}_i$, for each $j \in \{1, \ldots, J-1\}$:
     $$\log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = \alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i$$

  6. Overview of Modeling Process, continued
     ◮ Specifying the probabilities $\pi_j$ relative to the reference category $\pi_J$ also specifies the log probability ratio for any two categories $\pi_a$, $\pi_b$, $a \neq b$, since
     $$\log\left(\frac{\pi_a(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) - \log\left(\frac{\pi_b(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = \log\left(\frac{\pi_a(\vec{x}_i)}{\pi_b(\vec{x}_i)}\right)$$
     ◮ Note that we only need to model $(J-1)$ of the probabilities $\pi_j$, since the constraint $\sum_{j=1}^{J} \pi_j(\vec{x}_i) = 1$ uniquely determines the $J$th probability given the other $(J-1)$.
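As a worked step, assuming the linear form $\log(\pi_j(\vec{x}_i)/\pi_J(\vec{x}_i)) = \alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i$ from the previous slide, the log ratio for any two non-baseline categories $a$ and $b$ is again linear in the covariates, with coefficients given by differences of the baseline-category coefficients:

```latex
\log\frac{\pi_a(\vec{x}_i)}{\pi_b(\vec{x}_i)}
  = \log\frac{\pi_a(\vec{x}_i)}{\pi_J(\vec{x}_i)}
    - \log\frac{\pi_b(\vec{x}_i)}{\pi_J(\vec{x}_i)}
  = (\alpha_a - \alpha_b) + (\boldsymbol{\beta}_a - \boldsymbol{\beta}_b)^T \vec{x}_i
```

This suggests that the choice of baseline category changes the parameterization but not the fitted probabilities.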

  7. Formulation of the BCL Model as a Multivariate GLM
     Response Vector: $\vec{y}_i = (y_{i1}, y_{i2}, \ldots, y_{i(J-1)})$, where $y_{ij} = 1$ if $Y_i = C_j$ and $0$ otherwise
     Expected Response Vector (the argument to the link function):
     $$E[\vec{y}_i] = \vec{\mu}_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{i(J-1)}) = (\pi_1(\vec{x}_i), \pi_2(\vec{x}_i), \ldots, \pi_{J-1}(\vec{x}_i))$$
     Link Function:
     $$g(\vec{\mu}_i) = \left(\log\frac{\pi_1(\vec{x}_i)}{\pi_J(\vec{x}_i)}, \log\frac{\pi_2(\vec{x}_i)}{\pi_J(\vec{x}_i)}, \ldots, \log\frac{\pi_{J-1}(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right)^T = X_i \boldsymbol{\beta}$$
     where $X_i$ and $\boldsymbol{\beta}$ are defined on the next slide.

  8. Formulation of the BCL Model as a Multivariate GLM
     Matrix of Covariates: $X_i$ is a $(J-1) \times (P+1)(J-1)$ matrix (recall that $P$ is the number of covariates) constructed from blocks of the form $(1, x_{i1}, x_{i2}, \ldots, x_{iP})$. Each block has $P+1$ entries: a 1 for the intercept $\alpha_j$, followed by the $P$ covariates.
     $$X_i = \begin{pmatrix}
     1 & x_{i1} & \cdots & x_{iP} & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots & 0 \\
     0 & 0 & \cdots & 0 & 1 & x_{i1} & \cdots & x_{iP} & \cdots & 0 & 0 & \cdots & 0 \\
     \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots & & \vdots \\
     0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1 & x_{i1} & \cdots & x_{iP}
     \end{pmatrix}$$
     Vector of Parameters: $\boldsymbol{\beta}$ is a column vector with dimension $(J-1)(P+1) \times 1$, containing the category-specific coefficients $\alpha_j$ and $\beta_{jk}$ for $j \in \{1, \ldots, J-1\}$ and $k \in \{1, \ldots, P\}$:
     $$\boldsymbol{\beta} = (\alpha_1, \beta_{11}, \ldots, \beta_{1P}, \alpha_2, \beta_{21}, \ldots, \beta_{2P}, \ldots, \alpha_{J-1}, \beta_{(J-1)1}, \ldots, \beta_{(J-1)P})^T$$
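The block layout of $X_i$ and the stacking of $\boldsymbol{\beta}$ can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `build_design_matrix` and `stack_parameters` and the numeric values are hypothetical, and it assumes each block is $(1, x_{i1}, \ldots, x_{iP})$, so each row has $(J-1)(P+1)$ entries.

```python
def build_design_matrix(x, J):
    """Return X_i as a list of J-1 rows, each of length (J-1)*(P+1).

    Row j (0-based) holds the block (1, x_1, ..., x_P) in block position j
    and zeros everywhere else.
    """
    block = [1.0] + [float(v) for v in x]   # (1, x_i1, ..., x_iP)
    width = len(block)                      # P + 1
    rows = []
    for j in range(J - 1):
        row = [0.0] * (width * (J - 1))
        row[j * width:(j + 1) * width] = block
        rows.append(row)
    return rows


def stack_parameters(alphas, betas):
    """Stack (alpha_j, beta_j1, ..., beta_jP) for j = 1, ..., J-1 into one list."""
    stacked = []
    for alpha_j, beta_j in zip(alphas, betas):
        stacked.append(float(alpha_j))
        stacked.extend(float(b) for b in beta_j)
    return stacked


# Hypothetical example: J = 3 categories, P = 2 covariates.
X_i = build_design_matrix([2.0, 5.0], J=3)
for row in X_i:
    print(row)
```

With $J = 3$ and $P = 2$, each of the two rows has six entries, and the stacked parameter list has the same length.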

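  9. Multivariate GLM: The Mechanics of Prediction
     ◮ $X_i$ is $(J-1) \times (P+1)(J-1)$ and $\boldsymbol{\beta}$ is $(P+1)(J-1) \times 1$ (each non-reference category contributes one intercept and $P$ slopes)
     ◮ $g(\vec{\mu}_i) = X_i \boldsymbol{\beta}$ is a $(J-1) \times 1$ column vector
     Let $X_i^{(j)}$ refer to the $j$th row vector of $X_i$. Then the dot product of $X_i^{(j)}$ with the parameter vector $\boldsymbol{\beta}$ is the predicted log probability ratio for observation $i$ and non-reference category $C_j$, i.e. the $j$th component of $g(\vec{\mu}_i)$:
     $$\left[g(\vec{\mu}_i)\right]_j = \log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = X_i^{(j)} \cdot \boldsymbol{\beta}$$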

  10. Multivariate GLM: Example of the Mechanics of Prediction
     Suppose we wish to calculate the first component of $g(\vec{\mu}_i)$, the log probability ratio for category $C_1$. The first row vector of $X_i$ is:
     $$X_i^{(1)} = (1, x_{i1}, x_{i2}, \ldots, x_{iP}, 0, 0, 0, \ldots, 0)$$
     The column vector of parameters $\boldsymbol{\beta}$ is the same for all $i$:
     $$\boldsymbol{\beta} = (\alpha_1, \beta_{11}, \ldots, \beta_{1P}, \alpha_2, \beta_{21}, \ldots, \beta_{2P}, \ldots, \alpha_{J-1}, \beta_{(J-1)1}, \ldots, \beta_{(J-1)P})^T$$
     Their dot product gives the predicted log probability ratio:
     $$\begin{aligned}
     \log\left(\frac{\pi_1(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) &= X_i^{(1)} \cdot \boldsymbol{\beta} \\
     &= 1 \cdot \alpha_1 + x_{i1}\beta_{11} + \cdots + x_{iP}\beta_{1P} + 0 \cdot \alpha_2 + 0 \cdot \beta_{21} + \cdots + 0 \cdot \beta_{2P} + \cdots + 0 \cdot \alpha_{J-1} + 0 \cdot \beta_{(J-1)1} + \cdots + 0 \cdot \beta_{(J-1)P} \\
     &= \alpha_1 + x_{i1}\beta_{11} + \cdots + x_{iP}\beta_{1P}
     \end{aligned}$$
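A small numeric check of this dot product, as a sketch under stated assumptions: the covariate and coefficient values are made up for illustration, the helper names `row_of_design_matrix` and `dot` are hypothetical, and $J = 3$, $P = 2$. It compares the dot product with the zero-padded first row against the direct sum $\alpha_1 + x_{i1}\beta_{11} + x_{i2}\beta_{12}$.

```python
def row_of_design_matrix(x, j, J):
    """Row j (0-based) of X_i: the block (1, x_1, ..., x_P) in position j, zeros elsewhere."""
    block = [1.0] + list(x)
    width = len(block)
    row = [0.0] * (width * (J - 1))
    row[j * width:(j + 1) * width] = block
    return row


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical values: J = 3 categories, P = 2 covariates.
x_i = [2.0, 5.0]
alphas = [0.5, -1.0]
betas = [[0.1, 0.2], [0.3, -0.4]]
# Stacked parameter vector (alpha_1, beta_11, beta_12, alpha_2, beta_21, beta_22).
beta_stacked = [0.5, 0.1, 0.2, -1.0, 0.3, -0.4]

first_row = row_of_design_matrix(x_i, 0, J=3)
via_design_row = dot(first_row, beta_stacked)   # zero entries drop out of the sum
direct = alphas[0] + dot(betas[0], x_i)         # alpha_1 + beta_1^T x_i

print(via_design_row, direct)
```

The two quantities agree up to floating-point rounding, because the entries of the row that multiply the other categories' parameters are all zero.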

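  11. Response Probabilities
     Note the following relationship, for $j \in \{1, \ldots, J-1\}$:
     $$\log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = \alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i \implies \pi_j(\vec{x}_i) = \frac{\exp(\alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i)}{1 + \sum_{n=1}^{J-1} \exp(\alpha_n + \boldsymbol{\beta}_n^T \vec{x}_i)}$$
     The baseline probability follows from the sum-to-one constraint:
     $$\pi_J(\vec{x}_i) = \frac{1}{1 + \sum_{n=1}^{J-1} \exp(\alpha_n + \boldsymbol{\beta}_n^T \vec{x}_i)}$$
     The argument of the log function here is sometimes referred to as the "relative risk" in the public health setting.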
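The mapping from baseline-category logits to response probabilities can be sketched as follows. This is an illustrative implementation rather than the lecture's code: the function name `bcl_probabilities` and the coefficient values are hypothetical, and the shift by the largest linear predictor is a standard numerical safeguard added here, not something stated on the slide.

```python
import math


def bcl_probabilities(x, alphas, betas):
    """Return [pi_1, ..., pi_J] for covariates x, with category J as the baseline.

    alphas has J-1 intercepts; betas has J-1 coefficient lists of length P.
    """
    # Linear predictors eta_j = alpha_j + beta_j^T x for j = 1, ..., J-1.
    etas = [a + sum(b * v for b, v in zip(beta_j, x))
            for a, beta_j in zip(alphas, betas)]
    # The baseline has linear predictor 0. Subtracting the largest value
    # before exponentiating leaves the ratios unchanged and avoids overflow.
    shift = max(etas + [0.0])
    exps = [math.exp(e - shift) for e in etas]
    base = math.exp(-shift)
    denom = base + sum(exps)
    return [e / denom for e in exps] + [base / denom]


# Hypothetical coefficients: J = 3 categories, P = 2 covariates.
probs = bcl_probabilities([2.0, 5.0], alphas=[0.5, -1.0],
                          betas=[[0.1, 0.2], [0.3, -0.4]])
print(probs, sum(probs))
```

Taking `math.log(probs[j] / probs[-1])` should recover the linear predictor for category `j`, which is one way to sanity-check the inversion.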

  12. Response Probabilities
     Plotting the $\pi_j(\vec{x}_i)$ as a function of a single covariate $x_{ik}$ gives a useful picture of how the category-specific response probabilities compare with one another in the $(x_{ik}, \pi_j)$ plane (i.e., compare the response probabilities across different values of the $k$th covariate for subject $i$, with all other covariates held constant).
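The data behind such a plot can be generated by varying one covariate over a grid while holding the others fixed. The sketch below only computes the curves and leaves the plotting library open; the names `bcl_probabilities` and `probability_curves`, the grid, and all coefficient values are hypothetical.

```python
import math


def bcl_probabilities(x, alphas, betas):
    """[pi_1, ..., pi_J] for one covariate vector x, with category J as the baseline."""
    etas = [a + sum(b * v for b, v in zip(beta_j, x))
            for a, beta_j in zip(alphas, betas)]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas] + [1.0 / denom]


def probability_curves(x_fixed, k, grid, alphas, betas):
    """Vary covariate k (0-based) over grid, holding the other covariates at x_fixed."""
    curves = []
    for value in grid:
        x = list(x_fixed)      # copy so the caller's list is not modified
        x[k] = value
        curves.append((value, bcl_probabilities(x, alphas, betas)))
    return curves


# Hypothetical coefficients: J = 3 categories, P = 2 covariates.
alphas = [0.5, -1.0]
betas = [[0.1, 0.2], [0.3, -0.4]]
grid = [0.0, 2.5, 5.0, 7.5, 10.0]

for value, probs in probability_curves([2.0, 5.0], k=0, grid=grid,
                                       alphas=alphas, betas=betas):
    print(value, [round(p, 3) for p in probs])
```

Each returned pair holds a grid value and the $J$ probabilities at that value, so plotting one line per category against the grid gives the comparison described above.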

  13. Using $\chi^2$ or $G^2$ as a Model Check
     When all predictors in a model are categorical and the training data can be represented in a contingency table that is not sparse, the $\chi^2$ or $G^2$ goodness-of-fit tests used earlier in the semester can be used to assess whether or not the fitted BCL model is appropriate. (Generate an "expected" contingency table from the predicted results; the "residuals" are then the differences between the observed and expected counts.)
     If some predictors are not categorical or the contingency table is sparse, these statistics are "valid only for comparing nested models differing by relatively few terms" (A. Agresti, Categorical Data Analysis, p. 294). This means that they cannot validly be used as an overall model check, but they can be used to compare the fit of full vs. reduced models if the full model has only "relatively few" more covariates than the reduced one(s).
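A minimal sketch of the two statistics for a single covariate pattern, assuming the usual definitions $\chi^2 = \sum (O - E)^2 / E$ and $G^2 = 2 \sum O \log(O / E)$. The counts and fitted probabilities are invented for illustration, and the function names are hypothetical. For a full table, the sums would run over every cell of every covariate pattern.

```python
import math


def expected_counts(n_in_group, fitted_probs):
    """Expected cell counts for one covariate pattern: group size times fitted probability."""
    return [n_in_group * p for p in fitted_probs]


def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def deviance_g2(observed, expected):
    """Likelihood-ratio statistic: 2 * sum over cells of O * log(O / E).

    Cells with an observed count of 0 contribute 0 to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


# Hypothetical data: 100 subjects sharing one covariate pattern, J = 3 categories.
observed = [30, 50, 20]
expected = expected_counts(100, [0.25, 0.55, 0.20])

print(pearson_chi2(observed, expected), deviance_g2(observed, expected))
```

The reference distribution and degrees of freedom depend on the table and the number of fitted parameters, which this sketch does not compute.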

  14. Example: Using Symptoms to Classify Disease (Reich Lab Research)
     Motivating Question: Confirmatory clinical tests are expensive and take time, meaning they are not a reasonable diagnostic option in many public health settings. Can we instead design a model that uses routinely observable symptoms to classify sick individuals accurately? (Adapted from work in progress by Brown et al.)
     Categories:
     ◮ $C_1$: Dengue
     ◮ $C_2$: Zika
     ◮ $C_3$: Flu
     ◮ $C_4$: Chikungunya
     ◮ $C_5$: Other
     ◮ $C_6$: No Diagnosis
     Covariates (a few of many in the actual model):
     ◮ Age
     ◮ Headache
     ◮ Rash
     ◮ Conjunctivitis
     ◮ ...
