



IPUC, Neuchâtel, February 23-24, 2007

Innovative Data Mining based approaches for life course analysis

Gilbert Ritschard
Alexis Gabadinho, Nicolas Müller, Matthias Studer
University of Geneva, Switzerland

Outline
1 Aim of the research project
2 Our first results
  2.1 Mobility trees
  2.2 Survival trees
  2.3 Characteristic sequences
3 Foreseen Developments

http://mephisto.unige.ch

IPUC07, 22/2/2007, gr


1 Aim of the research project

FNS project "Mining event histories: Towards new insight on personal Swiss life courses", which started on February 1, 2007.

Methodological concern
Explore and develop data mining approaches for individual longitudinal data:
• Methods for time-to-event analysis
• Methods for sequence data analysis

Socio-demographic concern
Using mainly SHP (Swiss Household Panel) data, but also other sources, gain original insight into
• How familial, professional and other socio-demographic events are entwined,
• Typical characteristics of Swiss life trajectories,
• Changes in these characteristics over time.


What is data mining?

“Data Mining is the process of finding new and potentially useful knowledge from data”
  Gregory Piatetsky-Shapiro, editor of http://www.kdnuggets.com

“Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner” (Hand et al., 2001)

Also called Knowledge Discovery in Databases, KDD.
Origin: IJCAI Workshop, 1989, Piatetsky-Shapiro (1989)
Textbooks: Han and Kamber (2001), Hand et al. (2001)


What is data mining? (2)

Concerned with the characterization of interesting patterns
• per se (unsupervised learning)
  – Clustering
  – Frequent itemsets
  – Association rules
• for classification or prediction purposes (supervised learning)
  – Decision trees
  – Bayesian networks
  – SVM and kernel methods
  – CBR (case-based reasoning), k-NN (k nearest neighbors)

Data mining proceeds mainly heuristically: unlike statistical modeling, it makes no assumptions about the process generating the data.


2 Our first results
• Mobility trees
• Survival trees
• Characteristic sequences


Typology of methods for individual longitudinal data
(rows: questions; columns: nature of data)

Descriptive questions
• Time-stamped events: survival curves, with parametric (Weibull, Gompertz) and non-parametric (Kaplan-Meier, Nelson-Aalen) estimators
• State/event sequences: optimal matching clustering; frequencies of typical patterns; discovering typical patterns

Causality questions
• Time-stamped events: hazard regression models; survival trees
• State/event sequences: Markov models, mobility trees; association rules between subsequences
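
The descriptive cell for time-stamped events in the typology of methods lists the non-parametric Kaplan-Meier and Nelson-Aalen estimators. As a minimal sketch (not the project's own code), the Python function below computes both from right-censored durations. The function name `km_na_estimates` and the toy data are hypothetical, and the sketch assumes that a case censored at time t is still counted as at risk at t.

```python
from collections import Counter


def km_na_estimates(durations, observed):
    """Hypothetical helper: Kaplan-Meier survival and Nelson-Aalen cumulative hazard.

    durations: time until the event or until censoring, one value per individual
    observed:  True if the event was observed, False if the case is right-censored
    Returns (times, survival, cumulative_hazard) evaluated at each distinct event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    # d_i: number of observed events at each distinct event time
    events = Counter(t for t, e in zip(durations, observed) if e)
    times, survival, cum_hazard = [], [], []
    s, h = 1.0, 0.0
    for t in sorted(events):
        # n_i: individuals still at risk at t (assumption: cases censored at t count as at risk)
        at_risk = sum(1 for d in durations if d >= t)
        d_i = events[t]
        s *= 1.0 - d_i / at_risk   # Kaplan-Meier product-limit step
        h += d_i / at_risk         # Nelson-Aalen increment
        times.append(t)
        survival.append(s)
        cum_hazard.append(h)
    return times, survival, cum_hazard


# Made-up toy durations (not SHP data); False marks a right-censored case
durations = [2, 3, 3, 5, 6, 8]
observed = [True, True, False, True, False, True]
times, surv, cumhaz = km_na_estimates(durations, observed)
for t, s, h in zip(times, surv, cumhaz):
    print(f"t={t}: S(t)={s:.3f}, H(t)={h:.3f}")
```

For this toy input the event times are 2, 3, 5 and 8, and the Kaplan-Meier survival drops to 0 at t = 8 because the only individual still at risk experiences the event then.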

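Optimal matching clustering, listed in the typology for descriptive questions on state/event sequences, starts from pairwise distances between sequences. The sketch below computes an optimal matching (edit) distance by dynamic programming. The cost values (indel = 1, constant substitution cost = 2), the function names and the one-letter state coding (E = education, W = working, U = unemployed) are illustrative assumptions; the resulting matrix would then be passed to a clustering procedure such as hierarchical clustering.

```python
def om_distance(seq_a, seq_b, indel=1.0, sub_cost=2.0):
    """Optimal matching distance between two state sequences.

    Minimal total cost of insertions/deletions (cost `indel`) and substitutions
    (constant cost `sub_cost`) turning seq_a into seq_b, computed by dynamic
    programming. The cost values are illustrative assumptions.
    """
    n, m = len(seq_a), len(seq_b)
    # dist[i][j]: cost of turning the first i states of seq_a into the first j of seq_b
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel
    for j in range(1, m + 1):
        dist[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + indel,    # delete a state of seq_a
                dist[i][j - 1] + indel,    # insert a state of seq_b
                dist[i - 1][j - 1] + sub,  # keep (match) or substitute a state
            )
    return dist[n][m]


def om_matrix(sequences, indel=1.0, sub_cost=2.0):
    """Symmetric matrix of pairwise OM distances, a typical input to clustering."""
    return [[om_distance(a, b, indel, sub_cost) for b in sequences] for a in sequences]


# Hypothetical yearly states: E = education, W = working, U = unemployed
sequences = ["EEWWWW", "EEEWWW", "EEUUWW"]
for row in om_matrix(sequences):
    print(row)
```

With these costs one substitution costs exactly as much as a deletion plus an insertion, so `om_distance("EEWWWW", "EEEWWW")` is 2.0 whichever way the sequences are aligned.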

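Markov models, listed in the typology for causality questions on state/event sequences, describe how the next state depends on the current one. A minimal sketch, assuming a first-order, time-homogeneous chain shared by all individuals, estimates the transition probabilities as the observed share of each transition (the maximum-likelihood estimate for such a chain). The function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities P(next = b | current = a).

    Assumes a time-homogeneous chain shared by all individuals; the estimate is the
    observed share of transitions a -> b among all transitions leaving a.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Hypothetical state sequences (E = education, W = working, U = unemployed)
P = transition_matrix(["EEWW", "EWUW"])
for state in sorted(P):
    print(state, P[state])
```

Each row of `P` sums to 1; states never left within the observed sequences (for example a final absorbing state) simply get no row.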

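Association rules and the frequent itemsets they are built from appear both among the unsupervised data mining methods and in the typology. The sketch below computes support and confidence by brute-force enumeration, not with the Apriori algorithm, and it treats each individual's events as an unordered set, so it ignores the ordering that rules between subsequences would take into account. Event labels and function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Share of transactions (sets of items) containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search (not Apriori) for itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: supp(X u Y) / supp(X)."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_x


# Hypothetical events experienced by four individuals (order deliberately ignored)
transactions = [
    {"leave_home", "marriage", "child"},
    {"leave_home", "marriage"},
    {"leave_home", "job"},
    {"marriage", "child"},
]
for itemset, s in frequent_itemsets(transactions, min_support=0.5).items():
    print(sorted(itemset), s)
print(confidence({"marriage"}, {"child"}, transactions))
```

In this toy example the rule marriage → child has support 0.5 and confidence 0.5 / 0.75 ≈ 0.67. Brute force is only practical for a handful of items; real applications would prune the search as Apriori does.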