Estimation in Mixed Models with Dirichlet Process Random Effects


  1. The Fourth Erich L. Lehmann Symposium, May 9-12, 2011. Estimation in Mixed Models with Dirichlet Process Random Effects: Both Sides of the Story. George Casella and Chen Li, Department of Statistics, University of Florida; Minjung Kyung and Jeff Gill, Center for Applied Statistics, Washington University. Supported by NSF Grants SES-0958982 and SES-0959054.

  2. Estimation in Dirichlet Process Random Effects Models: Introduction [1]
     ◮ The Beginning: prior distributions in the social sciences
     ◮ Transition: after the data analysis, model properties
     ◮ Dirichlet Process Random Effects: likelihood, subclusters, precision parameter
     ◮ MCMC: parameter expansion, convergence, optimality
     ◮ Example: Scottish election, normal random effects
     ◮ Some Theory: why are the intervals shorter?
     ◮ Classical Mixed Models: OLS, BLUE
     ◮ Conclusions: and other remarks

  3. Estimation in Dirichlet Process Random Effects Models: Introduction [2]
     But First, Here is the Big Picture
     ◮ Usual Random Effects Model
       $Y \mid \psi \sim N(X\beta + \psi, \sigma^2 I), \qquad \psi_i \sim N(0, \tau^2)$
       ⊲ Subject-specific random effect
     ◮ Dirichlet Process Random Effects Model
       $Y \mid \psi \sim N(X\beta + \psi, \sigma^2 I), \qquad \psi_i \sim DP(m, N(0, \tau^2))$
     ◮ Results in
       ⊲ Fewer assumptions
       ⊲ Better estimates
       ⊲ Shorter credible intervals
       ⊲ Straightforward classical estimation
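A minimal simulation sketch (my addition, not from the slides) of data generated under both models; the values of n, m, beta, sigma, and tau are hypothetical. The DP draw uses the Blackwell-MacQueen urn developed on slide 7, so exact ties among the psi_i create subclusters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma, tau = 50, 3.0, 1.0, 2.0          # all hypothetical values
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])

# Usual model: psi_i iid N(0, tau^2)
psi_normal = rng.normal(0.0, tau, size=n)

# DP model: psi_i via the Blackwell-MacQueen urn (see slide 7), base measure
# N(0, tau^2) and precision m; exact ties among the psi_i create subclusters
psi_dp = []
for i in range(n):
    if rng.random() < m / (i + m):            # new draw from the base measure
        psi_dp.append(rng.normal(0.0, tau))
    else:                                     # copy an earlier psi at random
        psi_dp.append(psi_dp[rng.integers(i)])
psi_dp = np.array(psi_dp)

y_normal = X @ beta + psi_normal + rng.normal(0.0, sigma, size=n)
y_dp     = X @ beta + psi_dp     + rng.normal(0.0, sigma, size=n)
print("distinct DP random effects:", len(np.unique(psi_dp)))
```

Running this typically yields far fewer than n distinct psi values under the DP model, which is the source of the subcluster structure exploited below.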

  4. Estimation in Dirichlet Process Random Effects Models: How This All Started [3]
     The Use of Prior Distributions in the Social Sciences
     ◮ When do priors matter in social science research?
     ◮ How do we specify known prior information?
     ◮ Bayesian social scientists like uninformative priors
     ◮ Reviewers are often skeptical about informative priors
     ◮ Guiding question: can more flexible priors help us recover latent hierarchical information?
     ◮ Survey of Political Executives (Gill and Casella 2008, JASA)
       ⊲ Outcome variable: stress, a surrogate for self-perceived effectiveness and job satisfaction
       ⊲ Five-point scale from "not stressful at all" to "very stressful"
       ⊲ Ordered probit model

  5. Estimation in Dirichlet Process Random Effects Models: How This All Started [4]
     Survey of Political Executives: Some Coefficient Estimates

     Variable                           Posterior Mean   95% HD Interval
     Government Experience                   0.120       [-0.086 :  0.141]
     Republican                              0.076       [-0.031 :  0.087]
     Committee Relationship                 -0.181       [-0.302 : -0.168]
     Confirmation Preparation               -0.316       [-0.598 : -0.286]
     Hours/Week                              0.447       [ 0.351 :  0.457]
     President Orientation                  -0.338       [-0.621 : -0.309]
     Cutpoint (None | Little)               -1.598       [-1.958 : -1.488]
     Cutpoint (Little | Some)               -1.078       [-1.410 : -0.959]
     Cutpoint (Some | Significant)          -0.325       [-0.786 :  0.454]
     Cutpoint (Significant | Extreme)        0.730       [ 0.411 :  0.844]

     ◮ Intervals are very tight
     ◮ Most do not overlap zero
     ◮ Seems typical of the Dirichlet process random effects model (more later)
     ◮ Reasonable subject-matter interpretations

  6. Estimation in Dirichlet Process Random Effects Models: Motivation [5]
     Transition: What Did We Learn?
     Analyzing Social Science Data
     ◮ Dirichlet process random effects models are
       ⊲ Accepted by social scientists
       ⊲ Computationally feasible
       ⊲ Providers of good estimates
     Understanding the Methodology
     ◮ "Off the shelf" MCMC: can we do better?
     ◮ Precision parameter m: arbitrarily fixed; are answers really insensitive to m?
     ◮ Next: better understanding of the MCMC, and estimation of m
     ◮ Performance evaluations and wider applications

  7. Estimation in Dirichlet Process Random Effects Models: Details of the Model [6]
     A Dirichlet Process Random Effects Model: Estimating the Dirichlet Process Parameters
     ◮ A general random effects Dirichlet process model can be written
       $(Y_1, \ldots, Y_n) \sim f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n) = \prod_i f(y_i \mid \theta, \psi_i)$
       ⊲ $\psi_1, \ldots, \psi_n$ iid from $G \sim DP$
       ⊲ DP is the Dirichlet process, with base measure $\phi_0$ and precision parameter $m$
       ⊲ The vector $\theta$ contains all model parameters
     ◮ Blackwell and MacQueen (1973) proved
       $\psi_i \mid \psi_1, \ldots, \psi_{i-1} \sim \frac{m}{i-1+m}\,\phi_0(\psi_i) + \frac{1}{i-1+m}\sum_{l=1}^{i-1}\delta(\psi_l = \psi_i)$
       ⊲ where $\delta$ denotes the Dirac delta function
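A small numerical check (my addition) that the urn scheme behaves as this conditional implies: the expected number of distinct values among psi_1, ..., psi_n is sum_{i=1}^{n} m/(m+i-1), a standard consequence of the Blackwell-MacQueen representation. The values of n, m, and the replication count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, reps = 30, 3.0, 20000               # hypothetical sample size, precision

def distinct_values(n, m, rng):
    """One Blackwell-MacQueen urn; returns the number of distinct psi's."""
    psi, new = [], 0
    for i in range(n):
        if rng.random() < m / (i + m):    # fresh draw from the base measure
            new += 1
            psi.append(new)               # track subclusters by integer label
        else:
            psi.append(psi[rng.integers(i)])
    return len(set(psi))

sim = np.mean([distinct_values(n, m, rng) for _ in range(reps)])
exact = sum(m / (m + i) for i in range(n))    # sum_{i=1}^{n} m/(m+i-1)
print(f"simulated E[k] = {sim:.3f}, exact = {exact:.3f}")
```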

  8. Estimation in Dirichlet Process Random Effects Models: Details of the Model [7]
     Some Distributional Structure
     ◮ Freedman (1963), Ferguson (1973, 1974), and Antoniak (1974)
       ⊲ Dirichlet process prior for nonparametric $G$
       ⊲ A random probability measure on the space of all measures
     ◮ Notation
       ⊲ $G_0$, a base distribution (finite non-null measure)
       ⊲ $m > 0$, a precision parameter (finite, non-negative scalar) giving the spread of distributions around $G_0$
       ⊲ Prior specification: $G \sim DP(m, G_0) \in \mathcal{P}$
     ◮ For any finite partition $\{B_1, \ldots, B_K\}$ of the parameter space,
       $(G(B_1), \ldots, G(B_K)) \sim \text{Dirichlet}(m G_0(B_1), \ldots, m G_0(B_K))$
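To make the partition property concrete, here is a sketch (my addition) that draws G ~ DP(m, G0) by truncated stick-breaking (Sethuraman's construction, equivalent to the definition above) and checks the mean of (G(B1), G(B2), G(B3)) against the Dirichlet mean. G0 = N(0, 1), the cells, m, and the truncation level are all hypothetical choices.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
m, T, reps = 2.0, 500, 4000                 # precision, truncation, replications
edges = [-1.0, 1.0]                         # B1=(-inf,-1], B2=(-1,1], B3=(1,inf)
cdf = NormalDist().cdf
G0 = np.diff([0.0, cdf(edges[0]), cdf(edges[1]), 1.0])   # G0 mass of each cell

masses = np.zeros((reps, 3))
for rep in range(reps):
    v = rng.beta(1.0, m, size=T)                          # stick-breaking fractions
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    atoms = rng.normal(size=T)                            # atom locations ~ G0
    cells = np.searchsorted(edges, atoms)                 # cell index of each atom
    for kcell in range(3):
        masses[rep, kcell] = w[cells == kcell].sum()

print("simulated mean of (G(B1), G(B2), G(B3)):", masses.mean(axis=0).round(3))
print("Dirichlet(m*G0) mean, i.e. G0:          ", G0.round(3))
```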

  9. Estimation in Dirichlet Process Random Effects Models: Details of the Model [8]
     A Mixed Dirichlet Process Random Effects Model: Likelihood Function
     ◮ The likelihood function is integrated over the random effects:
       $L(\theta \mid y) = \int f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n)\, \pi(\psi_1, \ldots, \psi_n)\, d\psi_1 \cdots d\psi_n$
     ◮ From Lo (1984, Annals) Lemma 2 and Liu (1996, Annals),
       $L(\theta \mid y) = \frac{\Gamma(m)}{\Gamma(m+n)} \sum_{k=1}^{n} m^k \sum_{C : |C| = k} \prod_{j=1}^{k} \Gamma(n_j) \int f(y_{(j)} \mid \theta, \psi_j)\, \phi_0(\psi_j)\, d\psi_j$
       ⊲ The partition $C$ defines the subclusters
       ⊲ $y_{(j)}$ is the vector of the $y_i$s in subcluster $j$, which has $n_j$ observations
       ⊲ $\psi_j$ is the common parameter for that subcluster
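The combinatorial weight attached to each partition, Gamma(m)/Gamma(m+n) * m^k * prod_j Gamma(n_j), is exactly the Chinese-restaurant (Ewens) distribution over set partitions, so the weights sum to 1. A brute-force check of that fact for a small hypothetical n (my addition):

```python
from itertools import combinations
from math import gamma, prod

def set_partitions(items):
    """Enumerate all set partitions of `items` as lists of tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # choose the block containing `first`, then partition the remainder
    for r in range(len(rest) + 1):
        for block in combinations(rest, r):
            remaining = [x for x in rest if x not in block]
            for sub in set_partitions(remaining):
                yield [(first,) + block] + sub

n, m = 6, 2.5                                      # hypothetical small example
total = gamma(m) / gamma(m + n) * sum(
    m ** len(C) * prod(gamma(len(S)) for S in C)   # m^k * prod_j Gamma(n_j)
    for C in set_partitions(list(range(n)))
)
print(total)   # prints 1.0 up to floating point error
```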

  10. Estimation in Dirichlet Process Random Effects Models: Details of the Model [9]
      A Mixed Dirichlet Process Random Effects Model: Matrix Representation of Partitions
      ◮ Start with the model
        $Y \mid \psi \sim N(X\beta + \psi, \sigma^2 I)$, where $\psi_i \sim DP(m, N(0, \tau^2))$, $i = 1, \ldots, n$
      ◮ with likelihood function
        $L(\theta \mid y) = \frac{\Gamma(m)}{\Gamma(m+n)} \sum_{k=1}^{n} m^k \sum_{C : |C| = k} \prod_{j=1}^{k} \Gamma(n_j) \int f(y_{(j)} \mid \theta, \psi_j)\, \phi_0(\psi_j)\, d\psi_j$
      ◮ Associate a binary matrix $A_{n \times k}$ with a partition $C$:
        $C = \{S_1, S_2, S_3\} = \{\{3,4,6\}, \{1,2\}, \{5\}\} \leftrightarrow A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$
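Building A from a partition is mechanical; a short sketch (my addition) reproducing the slide's example:

```python
import numpy as np

C = [{3, 4, 6}, {1, 2}, {5}]          # S1, S2, S3 from the slide
n, k = 6, len(C)
A = np.zeros((n, k), dtype=int)
for j, S in enumerate(C):
    for i in S:
        A[i - 1, j] = 1               # observations are 1-indexed on the slide

print(A)                              # matches the 6 x 3 matrix above
print(A.sum(axis=0))                  # column sums = subcluster sizes (3, 2, 1)
```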

  11. Estimation in Dirichlet Process Random Effects Models: Details of the Model [10]
      A Mixed Dirichlet Process Random Effects Model: Matrix Representation of Partitions
      ◮ Write $\psi = A\eta$, so that
        $Y \mid A, \eta \sim N(X\beta + A\eta, \sigma^2 I), \qquad \eta \sim N_k(0, \tau^2 I)$
        ⊲ Rows: $a_i$ is a $1 \times k$ vector of all zeros except for a 1 in the column of its subcluster
        ⊲ Columns: the column sums of $A$ are the numbers of observations in the subclusters
        ⊲ Variables: $i \in S_j \Rightarrow \psi_i = \eta_j$ (constant within subclusters)
        ⊲ Monte Carlo: only $k$ normal random variables need to be generated
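A matching sketch (my addition) of the Monte Carlo point: with A fixed, the whole random-effects vector comes from only k normal draws. A is retyped from slide 10; tau and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 2.0                                          # hypothetical
A = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0],
              [1, 0, 0], [0, 0, 1], [1, 0, 0]])    # the matrix from slide 10
eta = rng.normal(0.0, tau, size=A.shape[1])        # only k = 3 normal draws
psi = A @ eta                                      # psi_i = eta_j for i in S_j
print(psi)   # observations 3, 4, 6 share eta_1; 1, 2 share eta_2; 5 is eta_3
```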

  12. Estimation in Dirichlet Process Random Effects Models: MCMC [11]
      MCMC Sampling Scheme: Posterior Distribution
      ◮ The joint posterior distribution is
        $\pi(\theta, A \mid y) = \frac{m^k f(y \mid \theta, A)\, \pi(\theta)}{\sum_A \int_\Theta m^k f(y \mid \theta, A)\, \pi(\theta)\, d\theta}$
      ◮ Model parameters $\theta$: sampling is straightforward
      ◮ Random effects $A$: the subclusters
      ◮ Dirichlet process parameter $m$: the precision parameter

  13. Estimation in Dirichlet Process Random Effects Models: MCMC [12]
      MCMC Sampling Scheme: Model Parameters and Dirichlet Process Parameters
      ◮ For $t = 1, \ldots, T$, starting from $(\theta^{(t)}, A^{(t)})$ at iteration $t$:
      ◮ Model parameters:
        $\theta^{(t+1)} \sim \pi(\theta \mid A^{(t)}, y)$
      ◮ Dirichlet process parameters: given $\theta^{(t+1)}$, draw $A^{(t+1)}$ via
        $q^{(t+1)} \sim \text{Dirichlet}(n_1^{(t)} + 1, \ldots, n_k^{(t)} + 1, 1, \ldots, 1)$, a vector of total length $n$,
        $A^{(t+1)} \propto \binom{n}{n_1 \cdots n_n}\, m^k f(y \mid \theta^{(t+1)}, A) \prod_{j=1}^{n} [q_j^{(t+1)}]^{n_j}$
      ◮ where $n_j \geq 0$ and $n_1 + \cdots + n_n = n$
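The following is a heavily simplified, hypothetical rendering of one such sampler for the normal model (my sketch, not the authors' exact scheme): sigma and tau are fixed, beta gets a flat prior, and the rows of A are relabeled one at a time given q and eta, which drops the m^k and multinomial-coefficient factors of the joint display above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, tau = 40, 1.0, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, sigma, n)   # toy data
z = rng.integers(0, 5, size=n)        # z[i] = subcluster label of obs i
eta = rng.normal(0.0, tau, size=n)    # one eta slot per possible subcluster

for t in range(200):
    # theta-step: beta | eta, A, y (flat prior => conjugate normal draw)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(XtX_inv @ X.T @ (y - eta[z]),
                                   sigma**2 * XtX_inv)
    # theta-step: eta_j | beta, A, y for each occupied subcluster
    r = y - X @ beta
    labels, counts = np.unique(z, return_counts=True)
    for j, nj in zip(labels, counts):
        var = 1.0 / (nj / sigma**2 + 1.0 / tau**2)
        eta[j] = rng.normal(var * r[z == j].sum() / sigma**2, np.sqrt(var))
    # q-step: Dirichlet(n_1+1, ..., n_k+1, 1, ..., 1), total length n
    alpha = np.ones(n)
    alpha[labels] += counts
    q = rng.dirichlet(alpha)
    # A-step (simplified): refresh etas of empty slots from the base measure,
    # then relabel each row with P(z_i = j) prop. to q_j * N(y_i; x_i'beta + eta_j, sigma^2);
    # the m^k factor of the exact joint update is omitted in this sketch
    empty = np.setdiff1d(np.arange(n), labels)
    eta[empty] = rng.normal(0.0, tau, empty.size)
    for i in range(n):
        w = q * np.exp(-0.5 * ((r[i] - eta) / sigma) ** 2)
        z[i] = rng.choice(n, p=w / w.sum())

print("subclusters after burn-in:", len(np.unique(z)))
```

The point of the q-draw is that, conditional on q, the row updates decouple; this is the sense in which q expands the parameter space, anticipating the comparison on the next slide.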

  14. Estimation in Dirichlet Process Random Effects Models: MCMC [13]
      MCMC Sampling Scheme: Convergence of the Dirichlet Process Sampler
      ◮ Neal (2000) describes 8 algorithms, all using "stick-breaking" conditionals
      ◮ Stick-breaking chain:
        $P(a_j = 1 \mid A_{-j}) \propto \begin{cases} \frac{n_j}{n-1+m} & j = 1, \ldots, k \\ \frac{m}{n-1+m} & j = k+1 \end{cases}$
      ◮ Our chain:
        $P(a_j = 1 \mid A_{-j}) \propto \begin{cases} \frac{n_j}{n-1+m}\, q_j^{n_j+1} & j = 1, \ldots, k \\ \frac{m}{n-1+m}\, q_{k+1} & j = k+1, \ldots, n \end{cases}$
      ◮ Ours is a parameter expansion, and parameter expansion dominates:
        $\text{Var}\, h(Y)$ is smaller for any square-integrable function $h$
        (Liu/Wu 1999; van Dyk/Meng 2001; Hobert/Marchev 2008; Mira/Geyer 1999; Mira 2001)
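For concreteness, a tiny computation (my addition) of both normalized conditional probability vectors for hypothetical subcluster sizes and a q drawn as in the sampler; it mirrors the reconstructed formulas above rather than any published code.

```python
import numpy as np

rng = np.random.default_rng(5)
n_j = np.array([5, 3, 2])                 # hypothetical subcluster sizes
k = len(n_j)
n = n_j.sum() + 1                         # n - 1 observations besides row j
m = 2.0

stick = np.append(n_j, m) / (n - 1 + m)   # k + 1 stick-breaking probabilities
q = rng.dirichlet(np.concatenate([n_j + 1, np.ones(n - k)]))
ours = np.append(n_j * q[:k] ** (n_j + 1), m * q[k:]) / (n - 1 + m)

print("stick-breaking:", (stick / stick.sum()).round(3))
print("our chain:     ", (ours / ours.sum()).round(3))
```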

  15. Estimation in Dirichlet Process Random Effects Models: Scottish Election Data [14]
      Scottish Election Data
      History:
      ◮ In 1997, Scottish voters overwhelmingly (74.3%) approved the creation of the first Scottish parliament
      ◮ The voters also gave strong support (63.5%) to granting this parliament UK taxation powers
      Our Interest:
      ◮ Who subsequently voted Conservative in Scotland?
      The Data:
      ◮ British General Election Study of 880 Scottish nationals
      ◮ Outcome: party choice (Conservative or not) in the UK general election
      ◮ Independent variables: political and social measures
      ◮ Probit model
