
Spatial Mapping of Multivariate Profiles - PowerPoint PPT Presentation

Spatial Mapping of Multivariate Profiles. John Molitor, Imperial College, London. Aug 26, 2010. Motivation: deal with correlated data.


  1. Spatial Mapping of Multivariate Profiles. John Molitor, Imperial College, London. Aug 26, 2010.

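  2. Motivation: deal with correlated data
     - Diagram: covariates X1 (Smoking), X2 (Poor), X3 (Healthy), X4 (Educ) and their interactions feed into Disease.
     - Two covariates: $D = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \text{error}$
     - Four covariates: $\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_1 X_2 + \dots + \beta_{10} X_3 X_4$
     - With P = 20 covariates the main effects run up to $X_P$ and the pairwise terms up to $X_{P-1} X_P$, giving 190 two-way interactions and 1140 three-way interactions.
     - Problems: multicollinearity; the pattern of interaction effects may be elusive.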
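
The interaction counts quoted on this slide are binomial coefficients, which can be checked in a couple of lines. This is only an arithmetic check, not part of the original deck; the variable names are illustrative.

```python
from math import comb

p = 20  # number of covariates, as on the slide

two_way = comb(p, 2)    # number of distinct pairs of covariates
three_way = comb(p, 3)  # number of distinct triples of covariates

print(two_way, three_way)
```

Both counts grow polynomially in P, which is why fitting every interaction term quickly becomes impractical.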

  3. Individual Covariates versus Profiles
     - Covariates: X1 (Smoke), X2 (Poor), X3 (Healthy), X4 (Educ); outcome: Disease.
     - Use a sequence of covariate values to form different profiles:
     - profile 1: 1, 1, 0, 0 (Smoke, Poor)
     - profile 2: 1, 0, 0, 1 (Smoke, Educ)
     - ...
     - profile N: 0, 0, 1, 1 (Healthy, Educ)
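
A small sketch of how covariate vectors can be treated as profiles. The individuals listed here are made up for illustration, and `describe` is a hypothetical helper name, not something from the deck.

```python
from collections import Counter

covariates = ("Smoke", "Poor", "Healthy", "Educ")

# Hypothetical individuals; each tuple is one covariate pattern (profile).
individuals = [
    (1, 1, 0, 0),
    (1, 0, 0, 1),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
]

def describe(profile):
    """Return the names of the covariates that are switched on in a profile."""
    return tuple(name for name, value in zip(covariates, profile) if value == 1)

# Count how many individuals share each profile.
profile_counts = Counter(individuals)
for profile, count in profile_counts.items():
    print(profile, describe(profile), count)
```

Treating the whole tuple as the unit of analysis means that no separate interaction term has to be specified for each combination of covariates.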

  4. Profile Regression
     - Idea: use the pattern as the basic unit of inference. Cluster these patterns into a relatively small number of risk groups and use these risk groups to predict an outcome of interest.
     - Diagram: Pattern 1, Pattern 2, ..., Pattern C-1, Pattern C are grouped into risk groups 1, ..., L with risks θ, which in turn predict the disease outcome.

  5. Profile Regression: modeling framework
     - Assignment model: model the probability that an individual is assigned to a particular cluster.
       $$f(x_i) = \sum_{c=1}^{C} \psi_c \, f(x_i \mid \theta_c)$$
     - Disease model: model the risk associated with an individual pattern group.
       $$\operatorname{logit}(y_i) = \theta_{z_i} + \beta W_i, \qquad z_i = c$$
     - Or, alternatively,
       $$\operatorname{logit}(y_i) = \alpha + \theta^{*}_{z_i} + \beta W_i, \qquad \sum_{c=1}^{C} \theta^{*}_c = 0$$
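
The two sub-models can be sketched numerically as follows. This is a minimal illustration under stated assumptions: binary covariates are taken to be independent Bernoulli variables within each cluster (the slide does not state the form of f(x | θ)), and all function names and numeric values are hypothetical.

```python
import math

def bernoulli_profile_likelihood(x, probs):
    """f(x | cluster): product of independent Bernoulli terms (an assumption)."""
    like = 1.0
    for value, p in zip(x, probs):
        like *= p if value == 1 else 1.0 - p
    return like

def mixture_density(x, weights, cluster_probs):
    """Assignment model: f(x) = sum over clusters of psi_c * f(x | cluster c)."""
    return sum(
        w * bernoulli_profile_likelihood(x, probs)
        for w, probs in zip(weights, cluster_probs)
    )

def disease_probability(theta_z, beta, w):
    """Disease model: inverse logit of theta_{z_i} + beta * W_i."""
    eta = theta_z + beta * w
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical two-cluster example with four binary covariates.
weights = [0.5, 0.5]
cluster_probs = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
print(mixture_density((1, 1, 0, 0), weights, cluster_probs))
print(disease_probability(theta_z=-1.0, beta=0.5, w=1.0))
```

In the full model both pieces are fitted jointly, so the cluster labels are informed by the outcome as well as by the covariates.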

  6. How to decide the number of clusters?
     $$f(x_i) = \sum_{c=1}^{C} \psi_c \, f(x_i \mid \theta_c)$$
     - Reversible jump: complicated split/merge moves.
     - Flexible approach: finite number of clusters.
     - Truncated Dirichlet process.
     - Choose more clusters than needed. (Clusters are allowed to be empty.)
     - Choose only enough clusters, to avoid estimating a large number of unnecessary cluster parameters.

  7. Stick-breaking prior cluster probabilities
     - Determines prior probabilities for cluster allocations.
     - The prior probability assigned to the first cluster is obtained by breaking a stick of length one.
     - Subsequent probabilities are obtained by breaking the "left over" part of the stick.
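
The stick-breaking description can be sketched as below. The slide does not say how the break points are drawn; this sketch assumes the standard Dirichlet process choice of Beta(1, α) fractions and truncates at a fixed number of clusters by letting the last cluster take whatever stick remains. The function name and parameter values are illustrative.

```python
import random

def truncated_stick_breaking(alpha, n_clusters, rng):
    """Draw cluster weights psi_1..psi_C by repeatedly breaking a unit stick."""
    weights = []
    remaining = 1.0  # length of stick not yet allocated
    for c in range(n_clusters):
        if c == n_clusters - 1:
            fraction = 1.0  # truncation: last cluster takes the remainder
        else:
            fraction = rng.betavariate(1.0, alpha)  # assumed Beta(1, alpha) break
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights

rng = random.Random(0)
psi = truncated_stick_breaking(alpha=1.0, n_clusters=10, rng=rng)
print(psi, sum(psi))
```

Smaller values of α tend to put most of the mass on the first few clusters, so many of the later clusters end up nearly empty.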

  8. Truncated Dirichlet Process
     - Infinite (Dirichlet process) versus finite (truncated Dirichlet process, used when the number of clusters is specified):
       $$f(x_i) = \sum_{c=1}^{\infty} \psi_c \, f(x_i \mid \theta_c) \approx \sum_{c=1}^{C} \psi_c \, f(x_i \mid \theta_c)$$

  9. Markov Chain Monte Carlo (MCMC) Parameter Estimation
     - Fits the model as a unit.
     - Both outcome (y's) and covariates (x's) influence cluster membership.
     - Flexible (e.g. easy to change the form of the disease model).
     - Implemented in WinBUGS (could use JAGS or custom code).

  10. Model Averaging through Post-Processing
      - Estimating the risk of a new profile.
      - Examination of average clustering.
      - Estimate the partition of interest.
      - Deal with typical clustering algorithm problems such as label-switching.

  11. Estimating the Risk of a New Profile: A Model Averaging Approach
      1. Probabilistically assign the profile to the appropriate cluster:
         $$\Pr(z_{\text{new}} \mid x_{\text{new}}) \propto \Pr(x_{\text{new}} \mid z_{\text{new}}) \, \Pr(z_{\text{new}})$$
      2. The profile risk is equal to the risk of the cluster to which the pattern is assigned:
         $$\text{profile risk} = \theta_{z_{\text{new}}}$$
      3. Average over the varying number of clusters used at each iteration of the MCMC sampler.
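
The three steps above can be sketched as a post-processing loop over stored MCMC draws. This is an illustration only: the draw format (`psi`, `covariate_probs`, `theta`), the Bernoulli covariate likelihood, and all numbers are assumptions made for the example, not the author's implementation.

```python
import random

def assignment_probabilities(x_new, weights, cluster_probs):
    """Pr(z_new = c | x_new), proportional to Pr(x_new | z_new = c) * Pr(z_new = c)."""
    unnormalised = []
    for w, probs in zip(weights, cluster_probs):
        like = 1.0
        for value, p in zip(x_new, probs):
            like *= p if value == 1 else 1.0 - p  # assumed Bernoulli covariates
        unnormalised.append(w * like)
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def new_profile_risk(x_new, draws, rng):
    """Average theta_{z_new} over draws, sampling z_new afresh at each draw."""
    risks = []
    for draw in draws:
        probs = assignment_probabilities(x_new, draw["psi"], draw["covariate_probs"])
        z_new = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        risks.append(draw["theta"][z_new])
    return sum(risks) / len(risks)

# Two hypothetical draws; the number of clusters differs between them.
draws = [
    {"psi": [0.6, 0.4], "covariate_probs": [[0.9, 0.8], [0.2, 0.1]], "theta": [0.8, 0.1]},
    {"psi": [0.5, 0.3, 0.2], "covariate_probs": [[0.9, 0.9], [0.5, 0.5], [0.1, 0.1]],
     "theta": [0.7, 0.4, 0.1]},
]
print(new_profile_risk((1, 1), draws, random.Random(1)))
```

Because the cluster is resampled at every draw, the spread of the per-draw risks reflects uncertainty in both the clustering and the cluster risks.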

  12. Examination of Average Clustering (invariant to label switching)
      - At every iteration of the MCMC sampler, we have a partition of individuals:
      - $z^{1}$ = (2, 2, 2, 5, 5, 5, 7, 7, 7, 5)
      - $z^{2}$ = (2, 2, 2, 5, 5, 5, 5, 7, 7, 7)
      - $z^{3}$ = (2, 2, 2, 5, 5, 5, 5, 7, 5, 7)
      - $z^{4}$ = (2, 2, 2, 5, 5, 7, 5, 7, 7, 5)
      - ...
      - Find the best partition, $z^{\text{best}}$. It represents the average way in which the algorithm groups individuals into clusters.
      - e.g. $z^{\text{best}}$ = (2, 2, 2, 5, 5, 5, 5, 7, 7, 7), or equivalently $z^{\text{best}}$ = (a, a, a, b, b, b, b, c, c, c)

  13. Best Partition $z^{\text{best}}$
      - Construct the score matrix ($S^{z}$): record 1 if individuals i and j are in the same cluster and 0 otherwise (repeating for each iteration).
      - Average the score matrices obtained at each iteration.
      - Define $S_{ij}$ as the empirical probability that individuals i and j are in the same cluster.
      - Finding $z^{\text{best}}$: use the following "least squares" formula (Dahl 2006):
        $$z^{\text{best}} = \operatorname*{argmin}_{z \in Z} \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} \left( S^{z}_{ij} - S_{ij} \right)^2 \right\}$$
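
A minimal sketch of the least-squares search, using the four example partitions listed in this deck as the sampled draws. It assumes the candidate set Z is the set of sampled partitions; the function names are illustrative.

```python
def coclustering_matrix(z):
    """S^z: entry (i, j) is 1 if individuals i and j share a cluster label, else 0."""
    n = len(z)
    return [[1.0 if z[i] == z[j] else 0.0 for j in range(n)] for i in range(n)]

def least_squares_partition(partitions):
    """Return the sampled partition closest to the averaged score matrix."""
    matrices = [coclustering_matrix(z) for z in partitions]
    n = len(partitions[0])
    m = len(matrices)
    # Average score matrix: empirical co-clustering probabilities S_ij.
    s_bar = [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
             for i in range(n)]
    # Squared distance of each draw's score matrix from the average.
    losses = [
        sum((mat[i][j] - s_bar[i][j]) ** 2 for i in range(n) for j in range(n))
        for mat in matrices
    ]
    best_index = min(range(m), key=losses.__getitem__)
    return partitions[best_index], losses

partitions = [
    (2, 2, 2, 5, 5, 5, 7, 7, 7, 5),
    (2, 2, 2, 5, 5, 5, 5, 7, 7, 7),
    (2, 2, 2, 5, 5, 5, 5, 7, 5, 7),
    (2, 2, 2, 5, 5, 7, 5, 7, 7, 5),
]
z_best, losses = least_squares_partition(partitions)
print(z_best, losses)
```

The score matrix depends only on which individuals share a label, not on the label values, which is what makes the procedure invariant to label switching. For these four draws the search picks the second partition, (2, 2, 2, 5, 5, 5, 5, 7, 7, 7).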

  14. Accounting for uncertainty when finding the best partition using model averaging
      - Individuals in each single group of $z^{\text{best}}$ may appear in different clusters at each iteration.
      - Variability in cluster assignment is used to assess the uncertainty related to the groups defined by $z^{\text{best}}$.
      - At each iteration of the MCMC sampler, we find the average risk for all individuals in each subgroup of the best partition, $z^{\text{best}}$. (Same procedure for covariate probabilities.)
      - It is important to properly assess uncertainty, as all datasets will have a "best" grouping.

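  15. Subgroup Assignment at Each Iteration of MCMC Sampler
      - Figure: individuals 1 to 10, each carrying the risk θ of the cluster it is assigned to at the current iteration, grouped into Subgroup 1, Subgroup 2 and Subgroup 3.
      - Subgroup risks are the averages of the members' cluster risks:
        $$\theta_a = \frac{\theta_2 + \theta_2 + \theta_2}{3}, \qquad \theta_b = \frac{\theta_5 + \theta_5 + \theta_6 + \theta_6}{4}, \qquad \theta_c = \frac{\theta_6 + \theta_7 + \theta_7}{3}$$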

  16. Cluster Risks
      - Cluster risks: $\theta_1 = 0.2$, $\theta_2 = 0.4$, $\theta_3 = 0.6$, $\theta_4 = 0.1$, $\theta_5 = 0.7$, $\theta_6 = 0.8$
      - Partition sub-groups (individuals): {1, 8, 5}, {2, 6, 4}, {7, 3}
      - First iteration: cluster assignments for individuals 1 to 8 are (1, 3, 5, 3, 2, 3, 5, 1), so z = (1, 1, 2), (3, 3, 3), (5, 5) for the three sub-groups, with sub-group risks θ = 0.27, 0.6, 0.7.
      - Second iteration: cluster assignments are (1, 3, 3, 3, 4, 3, 5, 1), so z = (1, 1, 4), (3, 3, 3), (5, 3), with sub-group risks θ = 0.17, 0.6, 0.65.
      - Mean for the first sub-group at the first iteration: (0.2 + 0.2 + 0.4)/3 = 0.27
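
The sub-group risk calculation on this slide can be reproduced with a short script. The cluster risks, sub-groups and cluster assignments are taken from the slide; the dictionary layout and the sub-group names `first`, `second`, `third` are illustrative. As on the slide, the cluster risks are held fixed across the two iterations, whereas in an actual MCMC run they would also change from draw to draw.

```python
# Cluster risks from the slide, keyed by cluster label.
theta = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.1, 5: 0.7, 6: 0.8}

# Sub-groups of the best partition, as tuples of individual ids.
subgroups = {"first": (1, 8, 5), "second": (2, 6, 4), "third": (7, 3)}

# Cluster assignment of individuals 1..8 at two iterations.
iterations = [
    {1: 1, 2: 3, 3: 5, 4: 3, 5: 2, 6: 3, 7: 5, 8: 1},
    {1: 1, 2: 3, 3: 3, 4: 3, 5: 4, 6: 3, 7: 5, 8: 1},
]

def subgroup_risks(z, theta, subgroups):
    """Average the cluster risks of the members of each sub-group."""
    return {
        name: sum(theta[z[i]] for i in members) / len(members)
        for name, members in subgroups.items()
    }

risks_per_iteration = [subgroup_risks(z, theta, subgroups) for z in iterations]
for risks in risks_per_iteration:
    print({name: round(value, 2) for name, value in risks.items()})
```

Collecting these per-iteration averages over the whole chain gives a distribution of risk for each sub-group of the best partition, from which intervals can be read off.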
