How Much Can Be Inferred From Almost Nothing?
A Two-Stage Maximum Entropy Approach to Uncertainty in Problems of Ecological Inference

Martin Elff (1), Thomas Gschwend (1), and Ron Johnston (2)
(1) University of Mannheim, (2) University of Bristol
useR 2006, R User Conference, Wirtschaftsuniversität Wien, 15-17 June 2006, Vienna

Martin Elff, Thomas Gschwend, and Ron Johnston: Maximum Entropy and Ecological Inference

Ecological Inference: The Problem
Aim: estimation of individual-level behavior or properties from aggregate summaries.
If the behavior or properties are categorical: estimation of an I × J × K data cube from the I × K and J × K (sometimes also the I × J) marginal tables.
Big problem: there are more items of data to be estimated than items of data known.
Usual trick: use a model with fewer parameters.

Ecological Inference: The Problem of Modelling Indeterminacy
A restrictive model is necessary to find estimates in an ecological inference problem.
The assumptions of the restrictive model cannot be tested, because the individual-level data are missing.
The assumptions may be wrong, and a wrong model may lead to biased estimates.
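To make the "more items to estimate than items known" problem concrete, here is a small Python count for two of the cube sizes used in the simulation study. The helper name `count_items` is hypothetical. It counts raw margin entries only; because the margins share the per-k totals (and the I × J margin shares further totals), the number of independent known items is smaller still.

```python
def count_items(I, J, K, with_ij=False):
    """Cells to estimate vs. margin entries known for an I x J x K data cube."""
    unknown = I * J * K            # every cell x_ijk of the cube is unobserved
    known = I * K + J * K          # entries of the I x K and J x K marginal tables
    if with_ij:
        known += I * J             # optional I x J marginal table
    return unknown, known


for dims in [(3, 3, 50), (7, 7, 200)]:
    print(dims, count_items(*dims), count_items(*dims, with_ij=True))
# (3, 3, 50) (450, 300) (450, 309)
# (7, 7, 200) (9800, 2800) (9800, 2849)
```

Even with all three margins, the 7 × 7 × 200 cube has more than three times as many unknown cells as known margin entries.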
A Solution Template – A Two-Stage Approach
Main principle: consider the possible bias caused by model failure as a source of extra variation in the parameter estimates.
First stage: use a "neutral" model, that is, maximize entropy subject to the constraints implied by the known data.
Second stage: use an entropy-maximizing conjugate distribution of the means derived from the first-stage model.
Use the means (expectations) from the first-stage model to derive point estimates.
Use the second-stage model to derive confidence intervals.

Maximizing Entropy at the First Stage – Example: The Johnston-Hay Model I
Formulation of the Johnston-Hay model: a model for the unknown counts in the data cube, given the marginal tables.
Entropy is maximized subject to the condition that the sums of probabilities in each direction equal the proportions in the marginal tables.

Maximizing Entropy at the First Stage – Example: The Johnston-Hay Model II
First-stage probability model of the unknown data $x_{ijk}$:
$$f_{\mathrm{Mt}}(x) = \frac{n!}{\prod_{i,j,k} x_{ijk}!} \prod_{i,j,k} p_{ijk}^{x_{ijk}}$$
Expectations:
$$E(x_{ijk}) = n p_{ijk} = n e^{\alpha_{ij} + \beta_{ik} + \gamma_{jk}} e^{\tau - 1} = n \frac{e^{\alpha_{ij} + \beta_{ik} + \gamma_{jk}}}{\sum_{r,s,t} e^{\alpha_{rs} + \beta_{rt} + \gamma_{st}}}$$
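The maximum-entropy cell probabilities under these margin constraints can be computed without solving for the multipliers $\alpha$, $\beta$, $\gamma$, $\tau$ explicitly: iterative proportional scaling started from a uniform table converges to the table that matches the margins and is closest to uniform, which is the maximum-entropy table. The following is a minimal pure-Python sketch of that idea, not the R code behind the slides; the function name `ipf_max_entropy` and its arguments are hypothetical, and margins are assumed to be mutually consistent (same per-k and overall totals).

```python
def ipf_max_entropy(n_ik, n_jk, n_ij=None, max_iter=1000, tol=1e-10):
    """Maximum-entropy I x J x K cell probabilities matching given margins.

    n_ik[i][k] and n_jk[j][k] are required marginal counts; n_ij[i][j] is
    optional. Starting from a uniform table, iterative proportional scaling
    converges to the table that matches the margins and is closest to
    uniform, i.e. the maximum-entropy table. Returns p[i][j][k] = x / n.
    """
    I, K, J = len(n_ik), len(n_ik[0]), len(n_jk)
    x = [[[1.0] * K for _ in range(J)] for _ in range(I)]  # uniform start

    def scale(cells, target):
        # Rescale the cells of one margin entry so they sum to `target`;
        # return how far their sum was off before rescaling.
        s = sum(x[i][j][k] for i, j, k in cells)
        f = target / s if s > 0 else 0.0
        for i, j, k in cells:
            x[i][j][k] *= f
        return abs(s - target)

    for _ in range(max_iter):
        dev = 0.0
        for i in range(I):                      # match the I x K margin
            for k in range(K):
                dev = max(dev, scale([(i, j, k) for j in range(J)], n_ik[i][k]))
        for j in range(J):                      # match the J x K margin
            for k in range(K):
                dev = max(dev, scale([(i, j, k) for i in range(I)], n_jk[j][k]))
        if n_ij is not None:                    # match the optional I x J margin
            for i in range(I):
                for j in range(J):
                    dev = max(dev, scale([(i, j, k) for k in range(K)], n_ij[i][j]))
        if dev < tol:  # every margin was already (nearly) matched in this sweep
            break
    n = sum(map(sum, n_ik))
    return [[[x[i][j][k] / n for k in range(K)] for j in range(J)] for i in range(I)]


n_ik = [[30, 10], [20, 40]]   # rows i, columns k
n_jk = [[25, 20], [25, 30]]   # rows j, columns k
p = ipf_max_entropy(n_ik, n_jk)
```

With only the I × K and J × K margins, the fitted counts reduce to $n_{i.k} n_{.jk} / n_{..k}$, so here `p[0][0][0]` equals $30 \cdot 25 / 50 / 100 = 0.15$. Passing an `n_ij` table as well makes the scaling genuinely iterative.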
Maximizing Entropy at the First Stage – Example: The Johnston-Hay Model III
Entropy is maximized subject to the constraints, that is, the following Lagrangian is maximized:
$$L(p) = -n \sum_{i,j,k} p_{ijk} \log p_{ijk} + \sum_{i,j} \alpha_{ij} \Big( n \sum_{k} p_{ijk} - n_{ij.} \Big) + \sum_{i,k} \beta_{ik} \Big( n \sum_{j} p_{ijk} - n_{i.k} \Big) + \sum_{j,k} \gamma_{jk} \Big( n \sum_{i} p_{ijk} - n_{.jk} \Big) + \tau \Big( n \sum_{i,j,k} p_{ijk} - n \Big)$$
Setting $\partial L / \partial p_{ijk} = 0$ gives $\log p_{ijk} = \alpha_{ij} + \beta_{ik} + \gamma_{jk} + \tau - 1$, and the constraint $\sum_{i,j,k} p_{ijk} = 1$ fixes $e^{\tau - 1}$; this is the log-linear form of the expectations $E(x_{ijk}) = n p_{ijk}$ given above.

Maximizing Entropy at the Second Stage – Extending the Johnston-Hay Model by an Infinite Mixture of the $p_{ijk}$
Mixing distribution: Dirichlet
$$f_{\mathrm{Dt}}(p) = \frac{\Gamma\big(\sum_{i,j,k} \theta_{ijk}\big)}{\prod_{i,j,k} \Gamma(\theta_{ijk})} \prod_{i,j,k} p_{ijk}^{\theta_{ijk} - 1}$$
Maximize $H_{\mathrm{Dt}} := -\int f_{\mathrm{Dt}}(p) \ln f_{\mathrm{Dt}}(p) \, dp$ over all $\theta_{ijk}$ subject to
$$\pi_{ijk} := E(p_{ijk}) = \frac{\theta_{ijk}}{\sum_{r,s,t} \theta_{rst}} = \hat{p}_{ijk},$$
that is, with $\theta_0 := \sum_{r,s,t} \theta_{rst}$, maximize
$$\sum_{i,j,k} \ln\Gamma(\theta_0 \hat{p}_{ijk}) - \ln\Gamma(\theta_0) + (\theta_0 - IJK)\,\Psi(\theta_0) - \sum_{i,j,k} (\theta_0 \hat{p}_{ijk} - 1)\,\Psi(\theta_0 \hat{p}_{ijk})$$
with respect to $\theta_0$, and set $\theta_{ijk} = \theta_0 \hat{p}_{ijk}$. Here $\Psi(x) := d \ln\Gamma(x) / dx$ is the digamma function.

Implementation in R
MaxEntMultinomial3(): produces the cell probability estimates $\hat{p}_{ijk}$ from the marginal table counts $n_{ij.}$, $n_{i.k}$, and $n_{.jk}$ using iterative proportional scaling.
DirichletParms(): produces the entropy-maximizing parameters $\tilde{\theta}_{ijk}$ of the Dirichlet distribution subject to $\theta_{ijk} / \sum_{r,s,t} \theta_{rst} = \hat{p}_{ijk}$.
DirichletToBetaCI(): produces confidence intervals for each $p_{ijk}$ based on the $\tilde{\theta}_{ijk}$ and the marginal Beta distribution of $p_{ijk}$, which is $\mathrm{Beta}(\tilde{\theta}_{ijk}, \tilde{\theta}_0 - \tilde{\theta}_{ijk})$ with $\tilde{\theta}_0 = \sum_{r,s,t} \tilde{\theta}_{rst}$.
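The second-stage step reduces to a one-dimensional maximization over $\theta_0$. Below is a minimal Python sketch of that step under stated assumptions; it is not the R code behind DirichletParms(), and the names `digamma`, `dirichlet_entropy`, `max_entropy_theta0`, and `dirichlet_parms` are hypothetical. Python's `math` module has no digamma function, so a standard recurrence-plus-asymptotic-series approximation is used. The coarse log-grid followed by golden-section search is our own choice and assumes the entropy is unimodal in $\theta_0$; all $\hat{p}_{ijk}$ are assumed to be strictly positive.

```python
import math


def digamma(x):
    """Digamma Psi(x) for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:                 # Psi(x) = Psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252
                         - inv2 * (1.0 / 240 - inv2 / 132)))))
    return result


def dirichlet_entropy(theta0, p_hat):
    """Entropy of Dirichlet(theta0 * p_hat), i.e. the objective on the slide."""
    m = len(p_hat)                 # number of cells (IJK on the slide)
    thetas = [theta0 * p for p in p_hat]
    return (sum(math.lgamma(t) for t in thetas) - math.lgamma(theta0)
            + (theta0 - m) * digamma(theta0)
            - sum((t - 1.0) * digamma(t) for t in thetas))


def max_entropy_theta0(p_hat, lo=-2.0, hi=6.0, n_grid=161, tol=1e-10):
    """Maximize the entropy over u = log10(theta0): grid, then golden section."""
    def h(u):
        return dirichlet_entropy(10.0 ** u, p_hat)

    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    best = max(range(n_grid), key=lambda i: h(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = h(c), h(d)
    while b - a > tol:
        if fc > fd:                           # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = h(c)
        else:                                 # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = h(d)
    return 10.0 ** ((a + b) / 2.0)


def dirichlet_parms(p_hat):
    """Entropy-maximizing Dirichlet parameters theta_k = theta0 * p_hat_k."""
    theta0 = max_entropy_theta0(p_hat)
    return [theta0 * p for p in p_hat]
```

A useful sanity check: for a uniform $\hat{p}$ over $m$ cells the maximum is at $\theta_0 = m$, because the flat Dirichlet (all $\theta = 1$) is the uniform distribution on the simplex, which has the largest differential entropy there. So `dirichlet_parms([0.25] * 4)` should return parameters close to `[1.0, 1.0, 1.0, 1.0]`.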
A Simulation Study – Check of the Two-Stage Maximum Entropy Approach
RMSE of first-stage point estimates: contrary to asymptotic theory, the RMSE is practically unaffected by n; raising the population size from 100,000 to 10,000,000 lowers the TRMSE by at most 0.038.
Confidence intervals from the second-stage distribution: nominal coverage ≈ real coverage if n → ∞ (?). Undercoverage is largest for the largest cube at the smaller population size (86.9%).

Total root mean square error (TRMSE) of prediction after 2,000 replications with an arbitrary configuration of "true" counts (simulation study of the extended maximum entropy approach):

Number of cells   Population size 100,000   Population size 10,000,000
3 × 3 × 50        0.565                     0.564
3 × 3 × 200       0.579                     0.574
7 × 7 × 50        0.827                     0.817
7 × 7 × 200       0.867                     0.829

Mean effective coverage (percentage) of true cell counts after 2,000 replications:

Number of cells   Population size 100,000   Population size 10,000,000
3 × 3 × 50        94.7                      95.0
3 × 3 × 200       93.2                      95.0
7 × 7 × 50        92.7                      94.4
7 × 7 × 200       86.9                      94.3

Possible Causes of Undercoverage
The proposed method rests on the approximation of the compound multinomial distribution by the Dirichlet distribution.
If the data cube is large and n is "small", the approximation is not so good.
Confidence intervals based on the compound multinomial distribution are difficult to construct (it is a mixture of a discrete distribution with a continuous distribution).

Application to Split-Ticket Voting: see poster!
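The undercoverage discussion can be explored with a toy check. The sketch below is a minimal Python illustration, not the simulation code behind the tables: it builds equal-tailed Beta intervals in the way DirichletToBetaCI() is described (each $p_k$ is marginally $\mathrm{Beta}(\theta_k, \theta_0 - \theta_k)$) and estimates their effective coverage. All function names and the $\theta$ values used in the usage note are hypothetical. Since Python's standard library has no Beta quantile function, the regularized incomplete beta function is computed with the standard continued-fraction expansion (modified Lentz method, as in Numerical Recipes) and inverted by bisection.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),         # even step
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def beta_quantile(q, a, b, tol=1e-12):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_intervals(theta, level=0.95):
    """Equal-tailed interval for each p_k ~ Beta(theta_k, theta0 - theta_k)."""
    theta0 = sum(theta)
    alpha = (1.0 - level) / 2.0
    return [(beta_quantile(alpha, t, theta0 - t),
             beta_quantile(1.0 - alpha, t, theta0 - t)) for t in theta]


def coverage(theta, n=None, reps=2000, seed=1):
    """Share of simulated cell proportions falling inside their intervals.

    n=None: draw p directly from Dirichlet(theta).
    n=int:  compound multinomial; draw p, then n counts, and check x_k / n.
    """
    rng = random.Random(seed)
    ints = beta_intervals(theta)
    hits = total = 0
    for _ in range(reps):
        g = [rng.gammavariate(t, 1.0) for t in theta]  # Dirichlet draw via gammas
        s = sum(g)
        p = [v / s for v in g]
        if n is not None:
            draws = rng.choices(range(len(p)), weights=p, k=n)
            p = [draws.count(k) / n for k in range(len(p))]
        for pk, (lo, hi) in zip(p, ints):
            hits += lo <= pk <= hi
            total += 1
    return hits / total
```

With, say, `theta = [6.0, 3.0, 1.0]`, `coverage(theta)` draws proportions from the Dirichlet itself, so it should be close to 0.95 up to Monte Carlo error. Calling `coverage(theta, n=20)` adds the multinomial layer of the compound distribution, where $x_k / n$ is discrete and the Beta interval is only an approximation.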