
1 Longitudinal Analysis Survival Trees Mining Frequent Episodes - PDF document



  1. Mining Event Histories: A Social Scientist View
     Gilbert Ritschard
     Department of Econometrics, University of Geneva
     http://mephisto.unige.ch
     IASC 2007, Aveiro, Portugal, August 30 - September 1
     10/8/2007gr 1/34

     Outline
     1 Longitudinal Analysis
         Motivation
         Methods for Longitudinal Data
     2 Survival Trees
         Principle
         Example
         Social Science Issues
     3 Mining Frequent Episodes
         What Is It About?
         Example: Counting Alternate Episode Structures
         Issues Regarding Episode Rules

     Motivation
     • Individual life course paradigm.
         Following macro quantities (e.g. #divorces, fertility rate, mean education level, ...) over time insufficient for understanding social behavior.
         Need to follow individual life courses.
     • Data availability
         Large panel surveys in many countries (SHP, ...).
         Biographical retrospective surveys (FFS, ...).
         Statistical matching of censuses, population registers and other administrative data.

     Motivation
     • Need for suited methods for discovering interesting knowledge from these individual longitudinal data.
     • Social scientists use
         Essentially Survival analysis (Event History Analysis)
         More rarely sequential data analysis (Optimal Matching, Markov Chain Models)
     • Could social scientists benefit from data-mining approaches?
         Which methods?
         Are there specific issues with those methods for social scientists?
     Alternative views of Individual Longitudinal Data

     Table: Time stamped events, record for Sandra
       ending secondary school   in 1970
       first job                 in 1971
       marriage                  in 1973

     Table: State sequence view, Sandra
       year              1969      1970       1971       1972       1973
       civil status      single    single     single     single     married
       education level   primary   secondary  secondary  secondary  secondary
       job               no        no         first      first      first

     Issues with life course data
     • Incomplete sequences
         Censored and truncated data: Cases falling out of observation before experiencing an event of interest.
         Sequences of varying length.
     • Time varying predictors.
         Example: When analysing time to divorce, presence of children is a time varying predictor.
     • Data collected by clusters
         Example: Household panel surveys.
         Multi-level analysis to account for unobserved shared characteristics of members of a same cluster.
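The two views of Sandra's record contain the same information, and the state sequence view can be derived from the time stamped events. The following is a minimal sketch of that conversion, not code from the presentation. It assumes that an event changes the state from the year in which it occurs, and that the 1969 states (single, primary, no job) are the starting point. The function name `events_to_states` and the tuple layout are hypothetical.

```python
def events_to_states(events, years, initial):
    """Expand time-stamped events into one state per year and per variable.

    events  : list of (year, variable, new_state) tuples
    years   : iterable of observation years, in increasing order
    initial : dict mapping each variable to its state before any event
    """
    # Group the events by the year in which they occur.
    by_year = {}
    for year, variable, new_state in events:
        by_year.setdefault(year, []).append((variable, new_state))

    current = dict(initial)                     # state held at the current year
    states = {variable: [] for variable in initial}
    for year in years:
        # Assumption: an event takes effect in the year it is recorded.
        for variable, new_state in by_year.get(year, []):
            current[variable] = new_state
        for variable in states:
            states[variable].append(current[variable])
    return states


# Sandra's record, taken from the time stamped events table.
sandra_events = [
    (1970, "education level", "secondary"),   # ending secondary school
    (1971, "job", "first"),                   # first job
    (1973, "civil status", "married"),        # marriage
]
# Assumed starting states, read off the 1969 column of the state sequence table.
sandra_initial = {"civil status": "single", "education level": "primary", "job": "no"}

sandra_states = events_to_states(sandra_events, range(1969, 1974), sandra_initial)
for variable, sequence in sandra_states.items():
    print(variable, sequence)
```

The reverse conversion is also possible, since each change of state between two consecutive years marks an event.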

  2. Classical statistical approaches: Survival Approaches
     Survival or Event history analysis (Blossfeld and Rohwer, 2002)
     • Focuses on one event.
     • Concerned with duration until event occurs or with hazard of experiencing event.
     • Survival curves: Distribution of duration until event occurs
         $S(t) = p(T \geq t)$.
     • Hazard models: Regression like models for $S(t, x)$ or hazard $h(t) = p(T = t \mid T \geq t)$
         $h(t, x) = g\big(t, \beta_0 + \beta_1 x_1 + \beta_2 x_2(t) + \cdots\big)$.

     Multi-level: Simple linear regression example
     [Figure: Children (vertical axis, 0 to 9) against Education (horizontal axis, 1 to 15), with fitted lines y = 15.6 - 0.8 x, y = 12.5 - 0.8 x, y = 3.2 + 0.2 x and y = 6.2 - 0.8 x.]

     Survival curves (Switzerland, SHP 2002 biographical survey)
     [Figure: Survival probability (vertical axis, 0 to 1) against AGE (years) (horizontal axis, 0 to 80); panel: Women.]

     Analysis of sequences
     • Frequencies of given subsequences
         Essentially event sequences.
         Subsequences considered as categories ⇒ Methods for categorical data apply (Frequencies, cross tables, log-linear models, logistic regression, ...).
     • Markov chain models
         State sequences.
         Focuses on transition rates between states.
         Does the rate also depend on previous states?
         How many previous states are significant?
     • Optimal Matching (Abbott and Forrest, 1986).
         State sequences.
         Edit distance (Levenshtein, 1966; Needleman and Wunsch, 1970) between pairs of sequences.
         Clustering of sequences.
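Optimal matching relies on an edit distance between pairs of state sequences. The following is a minimal sketch of such a distance, computed by dynamic programming. The cost values (`indel=1.0`, `sub=2.0`), the function name `om_distance` and the one-letter state codes in the usage lines are assumptions chosen for illustration; the slides do not specify costs.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit distance between sequences a and b.

    indel : assumed cost of inserting or deleting one state
    sub   : assumed cost of substituting one state for another
    """
    n, m = len(a), len(b)
    # prev[j] is the cost of turning the first i-1 states of a
    # into the first j states of b; row 0 uses insertions only.
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m      # column 0 uses deletions only
        for j in range(1, m + 1):
            change = 0.0 if a[i - 1] == b[j - 1] else sub
            curr[j] = min(
                prev[j] + indel,            # delete a[i-1]
                curr[j - 1] + indel,        # insert b[j-1]
                prev[j - 1] + change,       # match or substitute
            )
        prev = curr
    return prev[m]


# Hypothetical yearly state codes: F = full time, P = part time.
print(om_distance("FFFF", "FFPP"))          # two substitutions
print(om_distance("FFPP", "FPPP"))          # one substitution
```

With `indel=1.0` and `sub=1.0` the function reduces to the classical Levenshtein distance. The resulting matrix of pairwise distances is what a clustering procedure would then take as input.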
     Legend of the survival curves figure (Switzerland, SHP 2002 biographical survey): Leaving home, Marriage, 1st Childbirth, Parents' death, Last child left, Divorce, Widowing.

     Optimal Matching
     Example from (Gauthier, Widmer, Bucher, and Notredame, 2007)
     • Professional life course, age 16-64, Switzerland
     • SHP retrospective survey, ∼ 3000 cases
     • 5 clusters: Full Time, Part Time, Come Back, Home, Erratic
     [Figure: two panels, "Full Time, 53%" and "Come Back, 16%", showing the share of each state (0% to 100%) by age (16 to 64); states: Full time, Part time, Negative interruption, Positive interruption, Home, Retired, Education.]

     Typology of methods for life course data

       Questions     Issues: duration/hazard               Issues: state/event sequencing
       descriptive   • Survival curves: Parametric         • Optimal matching clustering
                       (Weibull, Gompertz, ...) and        • Frequencies of given patterns
                       non parametric (Kaplan-Meier,       • Discovering typical episodes
                       Nelson-Aalen) estimators.
       causality     • Hazard regression models            • Markov models
                       (Cox, ...)                          • Mobility trees
                     • Survival trees                      • Association rules among episodes
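Among the descriptive duration methods listed in the typology, the Kaplan-Meier estimator is the usual non parametric way to draw a survival curve from censored data. The following is a minimal sketch, not code from the presentation. The durations and censoring flags are hypothetical toy values, and the function name `kaplan_meier` is an assumption. Note that the product-limit estimate returned at an event time t is the probability of surviving beyond t, whereas the slides write $S(t) = p(T \geq t)$; the two conventions differ only at the event times themselves.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of the survival curve.

    durations : time until the event or until censoring, one value per case
    observed  : True if the event was observed, False if the case is censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Cases still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Cases experiencing the event exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: five cases, two of them censored.
toy_durations = [2, 3, 3, 5, 8]
toy_observed = [True, True, False, True, False]

km_curve = kaplan_meier(toy_durations, toy_observed)
for time, survival in km_curve:
    print(time, round(survival, 3))
```

Censored cases never count as events, but they stay in the risk set until their censoring time, which is how the estimator uses incomplete sequences instead of discarding them.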
