MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data
https://modal-research.lille.inria.fr/BigStat/
Christophe Biernacki (with Thibault Deregnaucourt and Vincent Kubicki)
Tutorial at the MissData Conference, June 17th, 2015
Take-home message

- Imputation should take into account the final analysis purpose.
- Clustering: no imputation is needed in the model-based context.
- Mixture models are flexible enough for accurate multiple imputation.
- MixtComp software: clustering/imputation for mixed data.
Outline

1. Classification(s): overview
2. Mixture model solution
3. Estimation
4. Clustering with MixtComp
5. Imputation with MixtComp
6. Conclusion
Today's data (1/2)

Today it is easy to collect many features, which favours data variety and, in particular:
- mixed data
- missing data
- uncertainty (or interval data)

Mixed, missing, uncertain — observed individuals x^O ∈ X:

  ?            0.5            ?              5
  0.3          0.1            green          3
  0.3          0.6            {red,green}    3
  0.9          [0.25 0.45]    red            ?
  ↓            ↓              ↓              ↓
  continuous   continuous     categorical    integer
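A minimal sketch (not from the slides) of one way such a table could be encoded in code: NaN marks missing values, while uncertain entries (a set of possible categories, an interval) are stored as plain Python objects, since pandas has no native type for them.

```python
import numpy as np
import pandas as pd

# Hypothetical encoding of the slide's example table.
df = pd.DataFrame({
    "v1": [np.nan, 0.3, 0.3, 0.9],                     # continuous, one missing
    "v2": [0.5, 0.1, 0.6, (0.25, 0.45)],               # continuous, one interval
    "v3": [np.nan, "green", {"red", "green"}, "red"],  # categorical, one uncertain
    "v4": [5, 3, 3, np.nan],                           # integer, one missing
})
print(df)
```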
Today's data (2/2)

And also:
- ranking data
- directional data
- ordinal data
- functional data
- graphical data
- ...
Supervised classification (1/3)

Data: a learning dataset D = (x^O, z)
- n individuals: x = (x_1, ..., x_n) = (x^O, x^M), belonging to a space X
  - x^O: observed individuals
  - x^M: missing individuals
- Partition into K groups G_1, ..., G_K: z = (z_1, ..., z_n), z_i = (z_{i1}, ..., z_{iK})', with
  x_i ∈ G_k ⇔ z_{ih} = 1{h = k}

Aim: estimation of an allocation rule r from D:
  r : X → {1, ..., K},  x^O_{n+1} ↦ r(x^O_{n+1})
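A minimal sketch of learning such an allocation rule, assuming complete continuous data (no missing or uncertain values, unlike the MixtComp setting). It uses linear discriminant analysis, one of the generative methods cited later in the deck.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian groups in R^2 standing in for the labelled sample (x^O, z).
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
z = np.repeat([0, 1], 50)

r = LinearDiscriminantAnalysis().fit(x, z)  # the allocation rule r
x_new = np.array([[1.5, 1.5]])              # a new individual x^O_{n+1}
print(r.predict(x_new))                     # predicted label z_{n+1}
```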
Supervised classification (2/3)

Mixed, missing, uncertain — individuals x^O with a fully observed partition z:

  ?            0.5            red            5   |  0 1 0  ⇔  G_2
  0.3          0.1            green          3   |  1 0 0  ⇔  G_1
  0.3          0.6            {red,green}    3   |  1 0 0  ⇔  G_1
  0.9          [0.25 0.45]    red            ?   |  0 0 1  ⇔  G_3
  ↓            ↓              ↓              ↓
  continuous   continuous     categorical    integer
Supervised classification (3/3)

[Figure: left, the labelled sample (x^O, z) and a new unlabelled point x^O_{n+1}; right, the estimated rule r̂ and predicted label ẑ_{n+1}]
Semi-supervised classification (1/3)

Data: a learning dataset D = (x^O, z^O)
- n individuals: x = (x_1, ..., x_n) = (x^O, x^M), belonging to a space X
  - x^O: observed individuals
  - x^M: missing individuals
- Partition: z = (z_1, ..., z_n) = (z^O, z^M)
  - z^O: observed partition
  - z^M: missing partition

Aim: estimation of an allocation rule r from D:
  r : X → {1, ..., K},  x^O_{n+1} ↦ r(x^O_{n+1})

Idea: x is cheaper to collect than z, so #z^M ≫ #z^O
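A minimal sketch of the semi-supervised setting, again assuming complete continuous data. This uses scikit-learn's self-training wrapper (a generic predictive illustration, not the mixture-model route the tutorial develops); unobserved labels z^M follow scikit-learn's convention of being encoded as -1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
z = np.repeat([0, 1], 50)

z_partial = np.full(100, -1)      # z^M: unlabelled individuals (#z^M >> #z^O)
z_partial[[0, 99]] = z[[0, 99]]   # z^O: only two observed labels

r = SelfTrainingClassifier(LogisticRegression()).fit(x, z_partial)
print(r.predict([[1.5, 1.5]]))    # predicted label for a new individual
```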
Semi-supervised classification (2/3)

Mixed, missing, uncertain — individuals x^O with a partially observed partition z^O:

  ?            0.5            red            5   |  0 ? ?  ⇔  G_2 or G_3
  0.3          0.1            green          3   |  1 0 0  ⇔  G_1
  0.3          0.6            {red,green}    3   |  ? ? ?  ⇔  ???
  0.9          [0.25 0.45]    red            ?   |  0 0 1  ⇔  G_3
  ↓            ↓              ↓              ↓
  continuous   continuous     categorical    integer
Semi-supervised classification (3/3)

[Figure: left, the partially labelled sample (x^O, z^O) and a new unlabelled point x^O_{n+1}; right, the estimated rule r̂ and predicted label ẑ_{n+1}]
Unsupervised classification (1/3)

Data: a learning dataset D = x^O, so z^O = ∅
Aim: estimation of the partition z and of the number of groups K
Also known as: clustering
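A minimal sketch of clustering as mixture estimation, assuming complete continuous data (MixtComp itself handles mixed, missing and uncertain data, which scikit-learn's Gaussian mixture does not): fit mixtures for several K, pick K by BIC, then read off the estimated partition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

# Fit a Gaussian mixture for each candidate number of groups K.
models = {k: GaussianMixture(n_components=k, random_state=0).fit(x)
          for k in range(1, 5)}
k_hat = min(models, key=lambda k: models[k].bic(x))  # estimated K (lowest BIC)
z_hat = models[k_hat].predict(x)                     # estimated partition
print(k_hat, z_hat[:5])
```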
Unsupervised classification (2/3)

Mixed, missing, uncertain — individuals x^O with a fully unobserved partition:

  ?            0.5            red            5   |  ? ? ?  ⇔  ???
  0.3          0.1            green          3   |  ? ? ?  ⇔  ???
  0.3          0.6            {red,green}    3   |  ? ? ?  ⇔  ???
  0.9          [0.25 0.45]    red            ?   |  ? ? ?  ⇔  ???
  ↓            ↓              ↓              ↓
  continuous   continuous     categorical    integer
Unsupervised classification (3/3)

[Figure: left, the unlabelled sample x^O; right, the same sample with the estimated partition (x^O, ẑ)]
Traditional solutions (1/3)

Two main frameworks:

Generative models
- Model p(x, z)
- Thus a direct model for p(x) = Σ_z p(x, z)
- Easy to take into account some missing z and x

Predictive models
- Model p(z | x), or sometimes only 1{p(z | x) > 1/2}, or a ranking based on p(z | x)
- Avoid assumptions on p(x), thus avoid the associated modelling error
- Difficult to take into account some missing z and x
Traditional solutions (2/3)

No mixed, missing or uncertain data:

Supervised classification [1]
- Generative models: linear/quadratic discriminant analysis
- Predictive models: logistic regression, support vector machines (SVM), k-nearest neighbours, classification trees, ...

Semi-supervised classification [2]
- Generative models: mixture models
- Predictive models: low density separation (transductive SVM), graph-based methods, ...

Unsupervised classification [3]
- Generative models: k-means-like criteria, hierarchical clustering, mixture models
- Predictive models: -

[1] Govaert et al., Data Analysis, Chap. 6, 2009
[2] Chapelle et al., Semi-Supervised Learning, 2006
[3] Govaert et al., Data Analysis, Chap. 7-9, 2009
Traditional solutions (3/3)

But things are more complex with mixed, missing or uncertain data:
- Missing/uncertain data: multiple imputation is possible, but it should ideally take into account the classification purpose at hand
- Mixed data: some heuristic methods based on recoding

How to marry the classification aim with mixed, missing or uncertain data?
Outline

1. Classification(s): overview
2. Mixture model solution
3. Estimation
4. Clustering with MixtComp
5. Imputation with MixtComp
6. Conclusion
Density estimation (1/2)

Data: a learning dataset D = x^O, so z^O = ∅
Aim: estimation of the distribution p(x)
Easy extension to D = (x^O, z^O) with z^O ≠ ∅
Useful for: data imputation and multi-purpose classification!
Density estimation (2/2)

[Figure: left, the sample x^O; right, the estimated density surface p̂(x)]
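A minimal sketch of mixture-based density estimation, again assuming complete continuous data: `score_samples` returns log p̂(x), which can then be reused for imputation or any downstream classification purpose.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

gm = GaussianMixture(n_components=2, random_state=0).fit(x)
log_density = gm.score_samples(np.array([[0.0, 0.0], [4.0, 4.0]]))
print(np.exp(log_density))  # estimated density p_hat(x) at two query points
```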
The mixture model answer in {∅, semi, un} classification

Rigorous definition of a group:
  x_1 ∈ G_k ⇔ x_1 is a realization of X_1 ∼ p_k(x_1)

Mixture formulation:
  X_1 | Z_{1k} = 1 ∼ p_k(x_1),   Z_1 ∼ Mult_K(1, π_1, ..., π_K), with π = (π_1, ..., π_K)

Joint and marginal (or mixture) distributions:
  (X_1, Z_1) ∼ ∏_{k=1}^K [π_k p_k(x_1)]^{z_{1k}}
  X_1 ∼ p(x_1) = Σ_{k=1}^K π_k p_k(x_1)

Maximum a posteriori (MAP): with t_k(x^O_1) = p(Z_{1k} = 1 | x^O_1) = π_k p_k(x^O_1) / p(x^O_1),
  r(x^O_1) = argmax_{k ∈ {1,...,K}} t_k(x^O_1)
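A direct implementation of the MAP rule above for a two-component univariate Gaussian mixture; the mixing proportions π_k, means and standard deviations are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.4, 0.6])                 # mixing proportions pi_k (hypothetical)
means = np.array([0.0, 3.0])              # component means (hypothetical)
sds = np.array([1.0, 1.0])                # component standard deviations

def map_rule(x):
    p_k = norm.pdf(x, loc=means, scale=sds)   # component densities p_k(x)
    t_k = pi * p_k / np.sum(pi * p_k)         # posteriors t_k(x) = pi_k p_k(x) / p(x)
    return np.argmax(t_k), t_k                # MAP group and posterior probabilities

k_hat, t = map_rule(1.2)
print(k_hat, t)
```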