Motivation Intra-block model I: Mixture of two extreme distributions Intra-block model II: Conditional dependency modes

Model-based clustering of categorical data by relaxing conditional independence

M. Marbac, C. Biernacki, V. Vandewalle

Classification Society Meeting 2015, McMaster University, 5 June 2015
Outline
1. Motivation
2. Intra-block model I: Mixture of two extreme distributions
3. Intra-block model II: Conditional dependency modes
Model-based clustering

The data $x = (x_1, \ldots, x_n)$ are clustered into $\hat g$ clusters, giving the partition $\hat z = (\hat z_1, \ldots, \hat z_n)$.

[Figure: the data in the (X1, X2) plane, before and after clustering]

Mixture model: a well-posed problem

$$p(x; \theta \mid g) = \sum_{k=1}^{g} \pi_k\, p(x; \theta_k \mid g), \qquad \theta = ((\pi_1, \ldots, \pi_g), (\alpha_1, \ldots, \alpha_g)).$$

It can be used both to estimate the partition, $x \to \hat\theta \to p(z \mid x, g; \hat\theta) \to \hat z$, and to choose the number of clusters, $x \to \hat p(g \mid x) \to \hat g$.
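The path $x \to \hat\theta \to p(z \mid x, g; \hat\theta) \to \hat z$ can be sketched in a few lines once $\hat\theta$ is known. The slide does not fix a component family, so this minimal sketch assumes univariate Gaussian components (echoing the figure) purely for illustration; the helper names `posterior`, `map_label` and `gaussian_pdf` are hypothetical, and labels are 0-based.

```python
import math

def posterior(x, weights, densities):
    """Return [p(z = k | x) for k in 0..g-1] for a g-component mixture.

    weights:   mixing proportions (pi_1, ..., pi_g), summing to 1
    densities: callables with densities[k](x) = p(x; theta_k | g)
    """
    joint = [pi_k * f_k(x) for pi_k, f_k in zip(weights, densities)]
    total = sum(joint)  # mixture density p(x; theta | g)
    return [v / total for v in joint]

def map_label(x, weights, densities):
    """MAP rule: z_hat is the k maximising p(z = k | x)."""
    post = posterior(x, weights, densities)
    return max(range(len(post)), key=post.__getitem__)

def gaussian_pdf(mu, sd):
    """Univariate Gaussian density, an illustrative component family only."""
    return lambda x: math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

weights = [0.5, 0.5]
components = [gaussian_pdf(0.0, 1.0), gaussian_pdf(3.0, 1.0)]
print([map_label(x, weights, components) for x in (0.2, 2.9)])
```

Choosing $\hat g$ through $\hat p(g \mid x)$ would need a model-selection criterion on top of this and is not shown.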
Categorical data

d categorical variables, variable j having $m_j$ response levels:
$x_i = \{x_i^j : j = 1, \ldots, d\}$, $x_i^j = \{x_i^{jh} : h = 1, \ldots, m_j\}$,
where $x_i^{jh} = 1$ if individual i has response level h for variable j and $x_i^{jh} = 0$ otherwise.

Example ("Genes Diffusion" company): n = 4270 calves, d = 9 variables related to behavior (1) and health (2).
Response levels of TRC (j = 3): TRC ∈ {"curative", "preventive", "no"} ($m_3 = 3$)
$x_1^3$ = "curative" = (1 0 0)
$x_2^3$ = "no" = (0 0 1)
$x_3^3$ = "no" = (0 0 1)
...

(1) Aptitude for sucking (Apt), behavior of the mother just before calving (Iso).
(2) Treatment against omphalitis (TOC), respiratory disease (TRC) and diarrhea (TDC); umbilicus disinfection (Dis); umbilicus emptying (Emp); preventive treatment of the mother against respiratory disease (TRM) and diarrhea (TDM).
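The indicator coding of a response can be sketched as below. The level order follows the TRC example; the function names and the second, yes/no variable in the usage check are hypothetical.

```python
def one_hot(value, levels):
    """Indicator vector x_i^j = (x_i^{j1}, ..., x_i^{jm_j}) for one categorical response."""
    if value not in levels:
        raise ValueError(f"unknown response level: {value!r}")
    return tuple(1 if level == value else 0 for level in levels)

def encode_individual(responses, levels_per_variable):
    """x_i = {x_i^j : j = 1..d}: one indicator vector per variable."""
    return [one_hot(v, levels) for v, levels in zip(responses, levels_per_variable)]

# Response levels of TRC (j = 3), in the order given on the slide (m_3 = 3).
TRC_LEVELS = ("curative", "preventive", "no")

for i, answer in enumerate(["curative", "no", "no"], start=1):
    print(i, one_hot(answer, TRC_LEVELS))
```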
Intra-class correlations

- A topical issue
- More frequent (in the population) as d increases
- More observable (in the sample) as n increases
- Risk of bias when models do not take such correlations into account

Example of bias (on z) with Gaussians:
[Figure: two clusterings of the same data in the (X1, X2) plane; panels "Independent Gaussians" and "Correlated Gaussians"]
Classical categorical models

Conditional independence model (CIM), linked to some χ²-distance-based methods:
$$p(x; \theta_k) = p(x; \alpha_k) = \prod_{j=1}^{d} p(x^j; \alpha_k^j) = \prod_{j=1}^{d} \prod_{h=1}^{m_j} (\alpha_k^{jh})^{x^{jh}},$$
where $\alpha_k = \{\alpha_k^{jh} : j = 1, \ldots, d;\ h = 1, \ldots, m_j\}$ and $\alpha_k^{jh} = p(x^{jh} = 1 \mid z = k)$.
⊖ bias when intra-class dependencies exist

Dependence trees: allow only certain dependencies.
⊖ too many parameters and unstable estimation of the tree

Latent Trait Analyzers: a continuous latent variable c explains the intra-class dependency:
$$p(x; \alpha_k) = \int_{\mathbb{R}^{|c|}} \prod_{j=1}^{d} \prod_{h=1}^{m_j} p(x^{jh} \mid c; \alpha_k)\, p(c)\, dc$$
⊖ difficult to explain the correlations meaningfully

The "gold rule": a model should be flexible + parsimonious + meaningful.
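The CIM class-conditional density is a product over variables of the probability of the observed level. A minimal sketch in log scale, assuming the indicator coding of the categorical-data slide; the function name and the numbers are illustrative only.

```python
import math

def cim_log_density(x, alpha_k):
    """log p(x; alpha_k) = sum_j sum_h x^{jh} log alpha_k^{jh} (conditional independence).

    x:       one indicator tuple per variable, e.g. [(1, 0, 0), (0, 1)]
    alpha_k: one probability tuple per variable, alpha_k^{jh} = p(x^{jh} = 1 | z = k)
    """
    log_p = 0.0
    for x_j, alpha_kj in zip(x, alpha_k):
        for x_jh, a_jh in zip(x_j, alpha_kj):
            if x_jh:  # factors with x^{jh} = 0 equal 1 and are skipped
                log_p += math.log(a_jh)
    return log_p

x = [(1, 0, 0), (0, 1)]
alpha_k = [(0.2, 0.3, 0.5), (0.6, 0.4)]
print(math.exp(cim_log_density(x, alpha_k)))  # 0.2 * 0.4
```

Working in log scale avoids underflow when d is large, since the product then involves many factors below 1.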
Dependence per blocks (1/3)

Conditionally on class k, the variables are grouped into $B_k$ independent blocks.
- Partition of the variables: $\sigma_k = (\sigma_{k1}, \ldots, \sigma_{kB_k})$ of $\{1, \ldots, d\}$
- Number of variables in block b of component k: $d^{\{kb\}} = \mathrm{card}(\sigma_{kb})$
- Subset of x associated with $\sigma_{kb}$: $x^{\{kb\}} = x^{\sigma_{kb}} = (x^{\{kb\}j};\ j = 1, \ldots, d^{\{kb\}})$
- Variable j of block b for component k: $x^{\{kb\}j} = (x^{\{kb\}jh};\ h = 1, \ldots, m_j^{\{kb\}})$
- Number of modalities of $x^{\{kb\}j}$: $m_j^{\{kb\}}$
- All block partitions: $\sigma = (\sigma_1, \ldots, \sigma_g)$

Distribution per class:
$$p(x; \theta_k \mid \sigma_k, g) = \prod_{b=1}^{B_k} p(x^{\{kb\}}; \theta_{kb}), \qquad \theta_k = (\theta_{k1}, \ldots, \theta_{kB_k}).$$

Inter-block model: the choice of $\sigma_k$ verifies the "gold rule".
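The per-class distribution is a product over blocks, so its log is a sum of block log-densities. At this point the intra-block distribution is still left open, so this sketch uses a hypothetical stand-in: a saturated probability table per block, keyed by the observed levels (0-based variable and level indices, unlike the slide's 1-based ones). Function and variable names are illustrative.

```python
import math

def block_log_density(levels, sigma_k, block_tables):
    """log p(x; theta_k | sigma_k, g) = sum over blocks b of log p(x^{kb}; theta_kb).

    levels:       observed level index of every variable, levels[j] for j = 0..d-1
    sigma_k:      partition of the variable indices into blocks, e.g. [[0, 1], [2]]
    block_tables: block_tables[b] maps the tuple of levels observed in block b to its
                  probability (hypothetical saturated intra-block model)
    """
    log_p = 0.0
    for block, table in zip(sigma_k, block_tables):
        x_kb = tuple(levels[j] for j in block)  # sub-vector x^{kb} = x^{sigma_kb}
        log_p += math.log(table[x_kb])
    return log_p

sigma_k = [[0, 1], [2]]  # two blocks over d = 3 binary variables
tables = [
    {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4},  # dependent pair
    {(0,): 0.7, (1,): 0.3},
]
print(math.exp(block_log_density([0, 0, 1], sigma_k, tables)))  # 0.4 * 0.3
```

A saturated table needs one parameter per joint configuration minus one, which grows quickly with block size; this is the parsimony concern that motivates structured intra-block distributions.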
Dependence per blocks (2/3)

Example with g = 2, d = 5:
- k = 1, $B_1 = 2$: $\sigma_1 = (\{1, 2\}, \{3, 4, 5\})$
- k = 2, $B_2 = 3$: $\sigma_2 = (\{1, 5\}, \{2, 4\}, \{3\})$

The present work: the intra-block distribution $p(x^{\{kb\}}; \theta_{kb})$ should also verify the "gold rule". Two intra-block distributions are now proposed.
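A quick check that each $\sigma_k$ in the example is a valid partition of $\{1, \ldots, 5\}$, together with $B_k$ and the block sizes $d^{\{kb\}}$. The helper `is_partition` is hypothetical; blocks are written with the slide's 1-based variable indices.

```python
def is_partition(sigma_k, d):
    """True if the blocks of sigma_k are non-empty, disjoint and cover {1, ..., d}."""
    seen = [j for block in sigma_k for j in block]
    return all(sigma_k) and len(seen) == len(set(seen)) and set(seen) == set(range(1, d + 1))

sigma = {1: [{1, 2}, {3, 4, 5}], 2: [{1, 5}, {2, 4}, {3}]}
for k, sigma_k in sigma.items():
    B_k = len(sigma_k)
    d_kb = [len(block) for block in sigma_k]  # d^{kb} = card(sigma_kb)
    print(k, is_partition(sigma_k, 5), B_k, d_kb)
```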
Maximum dependency distribution

Main idea:
- The "opposite" of the independence distribution, according to Cramér's V computed on all pairs of variables
- Knowing the variable with the largest number of modalities determines all the others exactly
- In each block, variables are ordered by decreasing number of modalities
- Successive surjections map the space of $x^{\{kb\}j}$ onto the space of $x^{\{kb\}j+1}$

$$p(x^{\{kb\}}; \tau_{kb}, \delta_{kb}) = \underbrace{p(x^{\{kb\}1}; \tau_{kb})}_{\text{1st variable}} \underbrace{\prod_{j=2}^{d^{\{kb\}}} p\big(x^{\{kb\}j} \mid x^{\{kb\}1}; \{\delta_{kb}^{hj}\}_{h=1,\ldots,m_1^{\{kb\}}}\big)}_{\text{other variables}} = \prod_{h=1}^{m_1^{\{kb\}}} \Big( \tau_{kb}^{h} \prod_{j=2}^{d^{\{kb\}}} \prod_{h'=1}^{m_j^{\{kb\}}} (\delta_{kb}^{hjh'})^{x^{\{kb\}jh'}} \Big)^{x^{\{kb\}1h}},$$

with $\tau_{kb}^{h} \in (0, 1)$, $\delta_{kb}^{hjh'} \in \{0, 1\}$, $\tau_{kb} = (\tau_{kb}^{h})$, $\delta_{kb} = (\delta_{kb}^{hj})$ and $\delta_{kb}^{hj} = (\delta_{kb}^{hjh'})$.
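Because each $\delta_{kb}^{hj}$ selects exactly one level of variable j for each level h of the first variable, the density reduces to $\tau_{kb}^{h_1}$ when the observed levels agree with those maps and to 0 otherwise. In this minimal sketch $\delta_{kb}$ is encoded, as an assumption, by one list per non-leading variable giving the level it is forced to; indices are 0-based and the numbers are hypothetical.

```python
def max_dep_density(levels, tau, maps):
    """p(x^{kb}; tau_kb, delta_kb) for the maximum dependency distribution.

    levels: observed level indices (h_1, ..., h_d) of the block's variables (0-based),
            variable 1 having the largest number of levels
    tau:    tau_kb = (tau^1, ..., tau^{m_1}), the distribution of variable 1
    maps:   maps[j - 2][h] = h' such that delta^{h j h'} = 1, i.e. the level of
            variable j determined by level h of variable 1 (hypothetical encoding)
    """
    h1 = levels[0]
    for f_j, h_j in zip(maps, levels[1:]):
        if f_j[h1] != h_j:  # delta^{h1 j h_j} = 0: impossible configuration
            return 0.0
    return tau[h1]

def is_surjective(f_j, m_j):
    """True if the map reaches every one of the m_j levels of variable j."""
    return set(f_j) == set(range(m_j))

tau = (0.5, 0.3, 0.2)
maps = [[0, 1, 1]]  # 3 levels of variable 1 mapped onto 2 levels of variable 2
print(max_dep_density((2, 1), tau, maps), max_dep_density((2, 0), tau, maps))
```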
Example

Block 11: $m_1^{\{11\}} = 4$, $m_2^{\{11\}} = 3$; $\delta_{11}^{h1h} = 1$ for h = 1, 2, 3 and $\delta_{11}^{413} = 1$; $\tau_{11} = (0.1, 0.3, 0.2, 0.4)$.

Block 12: $m_1^{\{12\}} = m_2^{\{12\}} = m_3^{\{12\}} = 2$; $\delta_{12}^{hjh'} = 1$ iff $h = h'$; $\tau_{12} = (0.5, 0.5)$.
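Reading $\delta_{11}$ as the map sending level h of the first variable to level h of the second for h = 1, 2, 3 and level 4 to level 3 (our interpretation of the example's notation), the implied probabilities follow directly from the maximum dependency formula. Here $x^{\{11\}} = (4, 1)$ denotes the configuration "level 4 for variable 1, level 1 for variable 2".

```latex
\begin{aligned}
P(x^{\{11\}2} = 1) &= \tau_{11}^{1} = 0.1, \\
P(x^{\{11\}2} = 2) &= \tau_{11}^{2} = 0.3, \\
P(x^{\{11\}2} = 3) &= \tau_{11}^{3} + \tau_{11}^{4} = 0.2 + 0.4 = 0.6, \\
P(x^{\{11\}} = (4, 1)) &= 0 \quad \text{(ruled out by } \delta_{11}\text{)}, \\[4pt]
P(x^{\{12\}} = (1, 1, 1)) &= P(x^{\{12\}} = (2, 2, 2)) = 0.5, \\
P(x^{\{12\}} = \text{any other configuration}) &= 0 .
\end{aligned}
```

In block 12 the identity maps force the three binary variables to be equal, which is the strongest possible dependency between them.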
Mixture of extreme distributions (CCM1)

$$p(x^{\{kb\}}; \theta_{kb}) = (1 - \rho_{kb})\, \underbrace{p(x^{\{kb\}}; \alpha_{kb})}_{\text{independence}} + \rho_{kb}\, \underbrace{p(x^{\{kb\}}; \tau_{kb}, \delta_{kb})}_{\text{extreme dependency}}, \qquad \theta_{kb} = (\rho_{kb}, \alpha_{kb}, \tau_{kb}, \delta_{kb}).$$

Meaningful:
- $\rho_{kb}$: global inter-variable correlation in the block ($0 \le \rho_{kb} \le 1$)
- $\delta_{kb}$: intra-variable correlation in the block ($\in \{0, 1\}$)

Parsimonious:
$$\nu_{\mathrm{ccm1}} = \nu_{\mathrm{cim}} + \sum_{\{(k,b)\,:\,d^{\{kb\}} > 1\}} \underbrace{m_1^{\{kb\}}}_{\text{number of modalities of the 1st variable in the block}}$$
Each block with more than one variable adds one parameter for $\rho_{kb}$ and $m_1^{\{kb\}} - 1$ free parameters for $\tau_{kb}$; $\delta_{kb}$ is discrete.

Identifiable if $d^{\{kb\}} > 2$ or $m_2^{\{kb\}} > 2$ (additional constraints are added otherwise).
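A minimal sketch of the CCM1 block density and of the extra parameter count. It assumes 0-based level indices, independence parameters given per variable, and the same hypothetical map encoding of $\delta_{kb}$ as a list of forced levels; names and numbers are illustrative, not the authors' implementation.

```python
import math

def ccm1_block_density(levels, rho, alpha, tau, maps):
    """(1 - rho) * independence + rho * extreme dependency, for one block.

    levels: observed level index of each variable of the block (variable 1 has most levels)
    rho:    mixing weight rho_kb in [0, 1]
    alpha:  alpha[j][h], level probabilities of variable j under independence
    tau:    distribution of variable 1 under extreme dependency
    maps:   maps[j - 2][h], level of variable j forced by level h of variable 1 (delta_kb)
    """
    p_indep = math.prod(alpha[j][h] for j, h in enumerate(levels))
    h1 = levels[0]
    consistent = all(f_j[h1] == h_j for f_j, h_j in zip(maps, levels[1:]))
    p_extreme = tau[h1] if consistent else 0.0
    return (1.0 - rho) * p_indep + rho * p_extreme

def extra_params(blocks):
    """nu_ccm1 - nu_cim: sum of m_1^{kb} over the blocks with d^{kb} > 1.

    blocks: (d_kb, m1_kb) pairs, i.e. block size and number of levels of its first variable.
    """
    return sum(m1 for d, m1 in blocks if d > 1)

alpha = [(0.5, 0.5), (0.5, 0.5)]
tau = (0.5, 0.5)
maps = [[0, 1]]  # identity link between two binary variables
print(ccm1_block_density((0, 0), 0.3, alpha, tau, maps))  # 0.7 * 0.25 + 0.3 * 0.5
print(extra_params([(2, 4), (3, 2), (1, 5)]))  # singleton blocks add nothing
```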
$\rho_{kb}$ vs. Cramér's V

[Figure: empirical link between $\rho_{kb}$ and Cramér's V for two binary variables]
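The link can be explored numerically by building the joint table that CCM1 implies for two binary variables and computing its population Cramér's V. This sketch assumes the identity link for $\delta_{kb}$; the function names are hypothetical. In the symmetric case (all margins equal to 0.5) V equals $\rho_{kb}$ exactly, while with unequal margins the two differ in general.

```python
import math

def cramers_v(table):
    """Population Cramér's V of a joint probability table (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    phi2 = sum((table[i][j] - rows[i] * cols[j]) ** 2 / (rows[i] * cols[j])
               for i in range(len(rows)) for j in range(len(cols)))
    return math.sqrt(phi2 / (min(len(rows), len(cols)) - 1))

def ccm1_binary_table(rho, a1, a2, t):
    """Joint table of two binary variables under CCM1 with the identity link.

    a1, a2: P(level 1) of each variable under independence; t: P(level 1) under tau.
    """
    indep = [[a1 * a2, a1 * (1 - a2)], [(1 - a1) * a2, (1 - a1) * (1 - a2)]]
    extreme = [[t, 0.0], [0.0, 1 - t]]
    return [[(1 - rho) * indep[i][j] + rho * extreme[i][j] for j in range(2)]
            for i in range(2)]

for rho in (0.0, 0.25, 0.5, 1.0):
    # Symmetric case: the printed V coincides with rho.
    print(rho, round(cramers_v(ccm1_binary_table(rho, 0.5, 0.5, 0.5)), 6))
```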
Estimation of θ (1/3)

$\hat\theta = \arg\max_{\theta} L(\theta; x \mid g, \sigma)$, with the model $(g, \sigma)$ fixed.

Global GEM algorithm
- E global step:
$$z_{ik}^{(r)} = \frac{\pi_k^{(r)}\, p(x_i; \sigma_k, \theta_k^{(r)})}{\sum_{k'=1}^{g} \pi_{k'}^{(r)}\, p(x_i; \sigma_{k'}, \theta_{k'}^{(r)})}$$
- GM global step:
$$\pi_k^{(r+1)} = \frac{n_k^{(r)}}{n}, \qquad n_k^{(r)} = \sum_{i=1}^{n} z_{ik}^{(r)},$$
$$\forall (k, b), \quad \theta_{kb}^{(r+1)} = \arg\max_{\theta_{kb}} L(\theta_{kb}; x, z^{(r)} \mid g, \sigma) \longrightarrow \text{MH algorithm}$$
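The E step and the proportion update of the GM step can be sketched as follows, taking the class-conditional log-densities as given (the $\theta_{kb}$ update by MH is not shown). The sketch works in log scale with the log-sum-exp trick, a standard numerical safeguard rather than something stated on the slide; function names are hypothetical.

```python
import math

def e_step(log_dens, log_pi):
    """z_ik = pi_k p(x_i; sigma_k, theta_k) / sum_k' pi_k' p(x_i; sigma_k', theta_k').

    log_dens: log_dens[i][k] = log p(x_i; sigma_k, theta_k^{(r)})
    log_pi:   log_pi[k] = log pi_k^{(r)}
    """
    z = []
    for row in log_dens:
        a = [lp + ld for lp, ld in zip(log_pi, row)]
        m = max(a)  # subtract the max before exponentiating to avoid underflow
        s = sum(math.exp(v - m) for v in a)
        z.append([math.exp(v - m) / s for v in a])
    return z

def update_proportions(z):
    """pi_k^{(r+1)} = n_k^{(r)} / n with n_k^{(r)} = sum_i z_ik^{(r)}."""
    n, g = len(z), len(z[0])
    return [sum(row[k] for row in z) / n for k in range(g)]

log_dens = [[math.log(0.2), math.log(0.6)], [math.log(0.5), math.log(0.5)]]
log_pi = [math.log(0.5), math.log(0.5)]
z = e_step(log_dens, log_pi)
print(z, update_proportions(z))
```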
Estimation of θ (2/3)

$\forall (k, b)$, $\theta_{kb}^{(r+1)} = \arg\max_{\theta_{kb}} L(\theta_{kb}; x, z^{(r)} \mid g, \sigma)$, with $(z^{(r)}, g, \sigma)$ fixed.

Metropolis-Hastings algorithm (for the discrete parameters $\delta_{kb}$)
- Proposal distribution:
$\delta_{kb}^{(r,s+\frac12)} \sim$ uniform distribution on a neighborhood $\Delta(\delta_{kb}^{(r,s)})$,
$(\rho_{kb}, \alpha_{kb}, \tau_{kb})^{(r,s+\frac12)} = \arg\max_{\bullet} L(\bullet; x, z^{(r)}, \delta_{kb}^{(r,s+\frac12)} \mid g, \sigma) \longrightarrow$ EM algorithm
- Acceptance probability:
$$\mu^{(r,s+1)} = \min\left( \frac{\prod_{i=1}^{n} p(x_i^{\{kb\}}; \theta_{kb}^{(r,s+\frac12)})^{z_{ik}^{(r)}} \,/\, |\Delta(\delta_{kb}^{(r,s+\frac12)})|}{\prod_{i=1}^{n} p(x_i^{\{kb\}}; \theta_{kb}^{(r,s)})^{z_{ik}^{(r)}} \,/\, |\Delta(\delta_{kb}^{(r,s)})|},\ 1 \right)$$
(the neighborhood sizes are the Hastings correction for the uniform proposal)
$$\theta_{kb}^{(r,s+1)} = \begin{cases} \theta_{kb}^{(r,s+\frac12)} & \text{with probability } \mu^{(r,s+1)} \\ \theta_{kb}^{(r,s)} & \text{otherwise} \end{cases}$$
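One MH move on $\delta_{kb}$ can be sketched generically: draw a neighbor uniformly, refit the continuous parameters for it, and accept with probability $\mu$. Here `neighbours` and `fit` are hypothetical callbacks standing in for the neighborhood $\Delta(\cdot)$ and for the EM refit; the toy run on a 4-cycle is illustrative only.

```python
import math
import random

def acceptance_probability(loglik_new, loglik_old, n_nbrs_new, n_nbrs_old):
    """min(1, [L_new / |Delta(delta_new)|] / [L_old / |Delta(delta_old)|]), in log scale.

    loglik_*: sum_i z_ik log p(x_i^{kb}; theta_kb) at the proposed / current parameters
    n_nbrs_*: neighbourhood sizes |Delta(delta_new)| and |Delta(delta_old)|
    """
    log_ratio = (loglik_new - loglik_old) + math.log(n_nbrs_old) - math.log(n_nbrs_new)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def mh_step(state, neighbours, fit, rng):
    """One Metropolis-Hastings move; state = (delta, theta, loglik)."""
    delta, _, loglik = state
    nbrs = neighbours(delta)
    proposal = rng.choice(nbrs)  # uniform proposal on Delta(delta)
    theta_new, loglik_new = fit(proposal)  # continuous parameters refitted (EM on the slide)
    mu = acceptance_probability(loglik_new, loglik, len(neighbours(proposal)), len(nbrs))
    if rng.random() < mu:
        return (proposal, theta_new, loglik_new)
    return state

# Toy problem: delta lives on a 4-cycle and state 2 has by far the highest log-likelihood.
def neighbours(d):
    return [(d - 1) % 4, (d + 1) % 4]

def fit(d):
    return (None, 0.0 if d == 2 else -50.0)

rng = random.Random(0)
state = (0, *fit(0))
for _ in range(200):
    state = mh_step(state, neighbours, fit, rng)
print(state[0])
```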