
Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models

Sylvia Frühwirth-Schnatter and Christoph Pamminger
Department of Applied Statistics and Econometrics, Johannes Kepler University Linz, Austria
UseR! 2006

Collaboration with Rudolf Winter-Ebmer, Department of Economics, Johannes Kepler University Linz
Supported by the Austrian Science Foundation (FWF) under grant P 17 959 ("Gibbs Sampling for Discrete Data")

Outline

Clustering
Motivating Example
• Research Question
• Data Description
Markov Chain Model
Dirichlet Multinomial Model
• Bayesian Analysis
• MCMC-Estimation
Estimation Results

Clustering

Clustering is a widely used statistical tool for determining subsets of similar observations.
Frequently used clustering methods are based on distance measures.
However, distance measures are difficult to define for more complex data (e.g. time series).
⇒ Model-based clustering methods (mixture models)
We present an approach for model-based clustering of discrete-valued time series data, following ideas discussed in Frühwirth-Schnatter and Kaufmann (2004).

Motivating Example

Wage mobility in the Austrian labor market
Wage mobility describes the chances, but also the risks, of an individual moving between wage categories.
Assumption: employees follow different career progressions (income careers).
Task: Find groups of employees with similar behavior in terms of transition probabilities (focus on one-year transitions).
Data provided by the Austrian social security authority.

Data Description

Time series for N = 9,809 individuals (only men, because of data inconsistencies with e.g. female part-time workers).
Gross monthly wage in May of successive years (with individual length $T_i$), divided into 6 categories: the quintiles of the respective income distribution (1-5) and zero income (0), according to Weber (2002).
→ $y_i = (y_{i0}, y_{i1}, y_{i2}, \ldots, y_{it}, \ldots, y_{i,T_i})$, $i = 1, \ldots, N$

Income careers of the first four employees in the data set:
[1] 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[2] 1 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 4 4 4
[3] 4 0 0 1 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 0 5
[4] 3 2 3 5 4 4 4 4 5 5 2 3 3 2 3 3 3 4 4 4 4 4 4 4 4 4 4

Illustration

[Figure 1: Individual wage mobility time series of nine selected employees. Nine panels, each plotting wage category (0 to 5) against year index (0 to 25).]

Markov Chain Model

$y_{it} = k$ if subject $i \in \{1, \ldots, N\}$ belongs to wage category $k \in \{0, 1, \ldots, K\}$ in year $t \in \{0, \ldots, T_i\}$.
The chain $y_i$ is modeled as a (time-homogeneous) Markov process with unknown transition matrix $\xi$, where

$\xi_{jk} = P\{y_{it} = k \mid y_{i,t-1} = j\}$ and $\sum_{k=0}^{K} \xi_{jk} = 1$,

$$\xi = \begin{pmatrix} \xi_{0\cdot} \\ \xi_{1\cdot} \\ \vdots \\ \xi_{K\cdot} \end{pmatrix} = \begin{pmatrix} \xi_{00} & \xi_{01} & \cdots & \xi_{0K} \\ \xi_{10} & \xi_{11} & \cdots & \xi_{1K} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{K0} & \xi_{K1} & \cdots & \xi_{KK} \end{pmatrix}$$
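The transition counts behind such a matrix can be sketched in a few lines. The following is only an illustration, not the authors' code: it counts one-year moves in the first eight years of career [3] and the first ten years of career [4] listed above, then normalizes each row to relative frequencies. The function names `transition_counts` and `row_normalize` are hypothetical.

```python
def transition_counts(series_list, n_states=6):
    """Return counts[j][k] = number of one-year moves from category j to k."""
    counts = [[0] * n_states for _ in range(n_states)]
    for y in series_list:
        # Pair each observation with its successor: (y_{t-1}, y_t).
        for prev, cur in zip(y, y[1:]):
            counts[prev][cur] += 1
    return counts


def row_normalize(counts):
    """Turn counts into row-wise relative frequencies (rows without data stay zero)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Opening years of careers [3] and [4] from the data description.
careers = [
    [4, 0, 0, 1, 0, 5, 5, 5],
    [3, 2, 3, 5, 4, 4, 4, 4, 5, 5],
]

counts = transition_counts(careers)
xi_hat = row_normalize(counts)
for j, row in enumerate(xi_hat):
    print(j, [round(p, 2) for p in row])
```

With only two short careers most rows rest on very few transitions, which is one reason a prior on each row is useful.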

Bayesian Analysis

Prior distribution of $\xi_{j\cdot}$, $j = 0, \ldots, K$:

$\xi_{j\cdot} \sim D(e_{0,j0}, \ldots, e_{0,jK})$.

Posterior distribution of $\xi_{j\cdot}$:

$\xi_{j\cdot} \sim D(e_{N,j0}, \ldots, e_{N,jK})$ with $e_{N,jk} = e_{0,jk} + N_{jk}$,

where $N_{jk} = \#\{y_{it} = k, y_{i,t-1} = j\}$ is the number of transitions from state $j$ to state $k$ over all subjects $i = 1, \ldots, N$.
⇒ $\xi \sim$ product of ($K + 1$ independent) Dirichlet distributions

Modeling Hidden Groups

Assumptions and notation
• $H$ hidden groups with group-specific transition matrices $\xi_h$, $h = 1, \ldots, H$
• Individual transition matrices $\xi^s_i$, $i = 1, \ldots, N$
• Latent indicator variable $S = (S_1, \ldots, S_N)$ for group membership: $S_i = h$ if subject $i$ belongs to group $h$
• Relative group sizes $\eta = (\eta_1, \ldots, \eta_H)$: $P\{S_i = h \mid \eta\} = \eta_h$, $h = 1, \ldots, H$

Modeling Heterogeneity

1. Simple model:
   $\xi^s_i \mid (S_i = h) = \xi_h$ (fixed)
   ⇒ $\xi_h \mid S \sim$ product of ($K + 1$ independent) Dirichlet distributions
2. Apply a multinomial logit model with random effects (Rossi et al., 2005). Highly parameterized model including high-dimensional covariance matrices.
3. Dirichlet Multinomial Model:
   $\xi^s_{i,j\cdot} \mid (S_i = h) \sim D(e_{h,j0}, \ldots, e_{h,jK})$
   with group-specific parameter $e_h = \{e_{h,j\cdot}\}$, $j = 0, \ldots, K$

Dirichlet Multinomial Model

The group-specific transition matrix $\xi_h$ is given by

$$\xi_{h,jk} = E(\xi^s_{i,jk} \mid S_i = h, e_h) = \frac{e_{h,jk}}{\sum_{l=0}^{K} e_{h,jl}}$$

So each row of $e_h$ determines the corresponding row of $\xi_h$.

Finite mixture model representation:
$Y_i \sim p_h(y_i \mid e_h)$ (product of $K + 1$ Dirichlet-distributions)

Unconditional density:

$$p(Y_i \mid e_1, \ldots, e_H) = \sum_{h=1}^{H} \eta_h \, p_h(y_i \mid e_h)$$
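The group density $p_h(y_i \mid e_h)$ is not written out on the slides. A minimal sketch, assuming it is the standard Dirichlet-multinomial marginal obtained by integrating the individual rows $\xi^s_{i,j\cdot}$ out of the Markov chain likelihood (conditional on the first observation), is given here. All function names are hypothetical, and the parameter values in the usage lines are made up for illustration.

```python
from math import exp, lgamma


def count_matrix(y, n_states):
    """Transition counts N_i,jk of a single series y."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(y, y[1:]):
        counts[prev][cur] += 1
    return counts


def group_mean(e_h):
    """xi_h[j][k] = e_h[j][k] / sum_l e_h[j][l] (row-wise normalization)."""
    return [[v / sum(row) for v in row] for row in e_h]


def log_marginal(y, e_h):
    """log p_h(y | e_h) for one series, given its first state y[0].

    Row j contributes
    Gamma(sum_k e_jk) / Gamma(sum_k e_jk + sum_k N_jk)
      * prod_k Gamma(e_jk + N_jk) / Gamma(e_jk).
    """
    counts = count_matrix(y, len(e_h))
    total = 0.0
    for e_row, n_row in zip(e_h, counts):
        total += lgamma(sum(e_row)) - lgamma(sum(e_row) + sum(n_row))
        for e_jk, n_jk in zip(e_row, n_row):
            total += lgamma(e_jk + n_jk) - lgamma(e_jk)
    return total


def mixture_density(y, eta, e_groups):
    """Unconditional density: sum_h eta_h * p_h(y | e_h)."""
    return sum(w * exp(log_marginal(y, e_h)) for w, e_h in zip(eta, e_groups))


# Made-up two-state example: one persistent group, one switching group.
e_groups = [[[9.0, 1.0], [1.0, 9.0]], [[1.0, 9.0], [9.0, 1.0]]]
eta = [0.5, 0.5]
print(group_mean(e_groups[0]))
print(mixture_density([0, 0, 0, 0], eta, e_groups))
```

Working on the log scale with `lgamma` avoids overflow for long series; `exp` is applied only when the mixture is formed.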

Group-specific parameter $e_h$

The variance of $\xi^s_{i,jk}$ is given by

$$\mathrm{Var}(\xi^s_{i,jk} \mid S_i = h, e_h) = \xi_{h,jk}^2 \cdot \frac{\sum_{l \neq k} e_{h,jl}}{e_{h,jk} \left(1 + \sum_{l=0}^{K} e_{h,jl}\right)}$$

If $\sum_{l=0}^{K} e_{h,jl}$ is very large (for each row in each group) → the amount of heterogeneity (in each group) is small ⇒ this leads to the simple model with fixed $\xi_h$.
If $\sum_{l=0}^{K} e_{h,jl}$ is small ⇒ the individual transition matrices are allowed to deviate from the group mean within each group.

Bayesian Analysis

Prior assumptions:
• All $e_{h,j\cdot}$ are independent and $e_{h,j\cdot} - 1 \geq 0$ (to avoid problems with empty groups and non-informative priors)
• $e_{h,j\cdot} - 1$ is a discrete-valued multivariate random variable
• $e_{h,j\cdot} - 1 \sim$ negative multinomial distribution
• $\eta \sim$ Dirichlet distribution

All parameters $e_1, \ldots, e_H$, $S$, $\eta$ are jointly estimated by means of MCMC sampling.

MCMC-Estimation (Gibbs Sampler)

Choose initial values for $\eta$ and $e_1, \ldots, e_H$ ($H$ fixed in advance) and repeat the following steps ($m = 1, \ldots, M$):
1. Bayes classification for each subject $i$: draw $S_i^{(m)}$ from $p(S_i \mid y_i, \eta^{(m-1)}, e_1^{(m-1)}, \ldots, e_H^{(m-1)})$.
2. Sample group sizes $\eta$: draw $\eta^{(m)}$ from $D(\alpha_1^{(m)}, \ldots, \alpha_H^{(m)})$ with $\alpha_h^{(m)} = N_h^{(m)} + \alpha_0$ and $N_h^{(m)} = \#\{S_i^{(m)} = h\}$.
3. Sample group-specific parameters $e_1, \ldots, e_H$: draw $e_{h,j\cdot}^{(m)}$ row-by-row from $p(e_{h,j\cdot} \mid y, S^{(m)})$ (not of closed form!) using a Metropolis-Hastings step (with discrete random walk proposal).

Estimation Results

Here we show the results for 3 groups, which allow very sensible interpretations according to our economist ($M$ = 10,000 with 2,000 burn-in):
• Transition probabilities
• Typical group members
• Classification probabilities
• Equilibrium distributions
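Steps 1 and 2 of the Gibbs sampler described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the group parameters $e_h$ are held fixed (the Metropolis-Hastings step 3 is omitted), $p_h(y_i \mid e_h)$ is taken to be the standard Dirichlet-multinomial marginal, groups are indexed from 0 rather than 1, and the value $\alpha_0 = 1$ as well as all names and toy data are hypothetical.

```python
import random
from math import exp, lgamma, log


def log_marginal(y, e_h):
    """log p_h(y | e_h): Dirichlet-multinomial marginal, given the first state y[0]."""
    n_states = len(e_h)
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(y, y[1:]):
        counts[prev][cur] += 1
    total = 0.0
    for e_row, n_row in zip(e_h, counts):
        total += lgamma(sum(e_row)) - lgamma(sum(e_row) + sum(n_row))
        for e_jk, n_jk in zip(e_row, n_row):
            total += lgamma(e_jk + n_jk) - lgamma(e_jk)
    return total


def classification_probs(y, eta, e_groups):
    """P(S_i = h | y_i, eta, e_1..e_H), normalized on the log scale for stability."""
    logs = [log(w) + log_marginal(y, e_h) for w, e_h in zip(eta, e_groups)]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def draw_memberships(data, eta, e_groups, rng):
    """Step 1: draw a group label for every subject."""
    groups = range(len(eta))
    return [
        rng.choices(groups, weights=classification_probs(y, eta, e_groups))[0]
        for y in data
    ]


def draw_eta(memberships, n_groups, alpha0, rng):
    """Step 2: draw eta from D(N_1 + alpha0, ..., N_H + alpha0).

    A Dirichlet draw is obtained by normalizing independent Gamma variates.
    """
    sizes = [memberships.count(h) for h in range(n_groups)]
    gammas = [rng.gammavariate(n_h + alpha0, 1.0) for n_h in sizes]
    total = sum(gammas)
    return [g / total for g in gammas]


# Toy two-state data and made-up group parameters (not from the slides).
rng = random.Random(1)
e_groups = [[[9.0, 1.0], [1.0, 9.0]], [[1.0, 9.0], [9.0, 1.0]]]
data = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
eta = [0.5, 0.5]
for m in range(5):
    S = draw_memberships(data, eta, e_groups, rng)
    eta = draw_eta(S, len(e_groups), 1.0, rng)
print(S, [round(v, 3) for v in eta])
```

Averaging the step 1 probabilities over the retained iterations gives a natural estimate of each subject's classification probabilities.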

Transition Probabilities

[Figure 2: 3D-Visualizations of transition probabilities $\hat{\xi}_h$ (volumes of balls are proportional to probabilities) and estimated group sizes $\hat{\eta}$ indicated in brackets (posterior means). Panels: S = 1 (0.2152), S = 2 (0.2487), S = 3 (0.5361); axes labelled ti and ti.1, each with categories 0 to 5.]

Typical Group Members

[Figure 3: Selected typical group members (with high classification probability). Three panels each for members of group 1, group 2 and group 3, plotting wage category (0 to 5) against year index.]

Classification Probabilities

i \ h      1         2         3
1       0.00016   0.35852   0.64132
2       0.01319   0.98676   0.00005
3       0.13440   0.25522   0.61039
4       0.34690   0.00462   0.64848
5       0.00035   0.99965   0.00000
6       0.13326   0.86632   0.00042
7       0.00011   0.99989   0.00000
8       0.81248   0.18748   0.00004
9       0.00008   0.99992   0.00000
10      0.05821   0.18316   0.75863
...
9809    0.51099   0.29038   0.19863

Table 1: Classification probabilities for each individual.

Equilibrium Distributions

j \ h      1         2         3
0       0.25028   0.60154   0.03993
1       0.22435   0.10482   0.10655
2       0.13299   0.06598   0.13688
3       0.14742   0.03524   0.16979
4       0.15030   0.03786   0.23205
5       0.09466   0.15456   0.31480

Table 2: Equilibrium distributions in each group.
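An equilibrium distribution like those in Table 2 is the stationary distribution $\pi$ of a group's transition matrix, i.e. the solution of $\pi = \pi \xi_h$. The slides do not say how it was computed; one simple option, sketched here with a hypothetical function name and a made-up two-state matrix (the estimated $\hat{\xi}_h$ are only shown graphically), is power iteration. It assumes the chain is irreducible and aperiodic.

```python
def stationary_distribution(xi, tol=1e-12, max_iter=100_000):
    """Iterate pi <- pi * xi from the uniform distribution until it stops changing."""
    n = len(xi)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: new[k] = sum_j pi[j] * xi[j][k].
        new = [sum(pi[j] * xi[j][k] for j in range(n)) for k in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Made-up two-state transition matrix, for illustration only.
xi_example = [
    [0.9, 0.1],
    [0.5, 0.5],
]
print(stationary_distribution(xi_example))
```

For this small matrix the result can be checked by hand: the balance equation $0.1\,\pi_0 = 0.5\,\pi_1$ together with $\pi_0 + \pi_1 = 1$ gives $\pi = (5/6, 1/6)$.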


