Sequential data analysis - 1
Sequential data analysis with TraMineR, Part 1
Gilbert Ritschard
Department of Econometrics and Laboratory of Demography, University of Geneva
http://mephisto.unige.ch/biomining
APA-ATI Workshop on Exploratory Data Mining
University of Southern California, Los Angeles, CA, July 2009
8/7/2009

Outline
  1 Introduction
  2 Concepts and definitions
  3 Rendering and summarizing state sequences

Section outline
  1 Introduction
      Objectives
      Overview of what you will learn

Objectives
  Concepts and questions about sequential categorical data
  Types of sequences: with or without time content, states, transitions, events
  Principles of sequence analysis
    exploratory approaches
    more causal and predictive approaches
  Practice of sequence analysis (TraMineR)

The research project
  Course mainly based on results of NSF project
    Mining event histories: Towards new insights on personal Swiss life courses
    Project FN 100012-113998 and FN-100015-122230
    Start: February 1, 2007
    End: January 31, 2011
  Gilbert Ritschard, main applicant
  Eric Widmer, professor of Sociology, co-applicant
  Alexis Gabadinho, Demography
  Nicolas S. Müller, Sociology, Computer science
  Matthias Studer, Economics, Sociology

Overview of what you will learn: Rendering sequences

Characterizing a set of sequences
  Sequence of transversal measures (modal state, between entropy, ...): each column of the table is summarized
    id   t1   t2   t3   ...
    1    B    B    D    ...
    2    A    B    C    ...
    3    B    B    A    ...
  Summary of longitudinal measures (within entropy, transition rates, mean duration, ...): each row of the same table is summarized
  Other global characteristics: centro-type sequence, diversity of sequences, ...

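The transversal measures named on this slide can be illustrated on the three toy sequences of the table. The following is a minimal sketch in plain Python, not TraMineR's R interface; the function names are hypothetical, and the entropy is the unnormalized Shannon entropy with natural logarithm (a normalized variant would divide by the log of the alphabet size).

```python
from collections import Counter
from math import log

# Toy sequences from the slide: one row per individual, columns t1, t2, t3.
sequences = [
    ["B", "B", "D"],
    ["A", "B", "C"],
    ["B", "B", "A"],
]

def shannon_entropy(states):
    """Unnormalized Shannon entropy (natural log) of a collection of states."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * log(c / n) for c in counts.values())

def transversal_summary(seqs):
    """Return (modal state, entropy) for the cross-section at each time point."""
    summary = []
    for column in zip(*seqs):  # zip(*seqs) iterates over time points
        # On ties, most_common keeps the state encountered first.
        modal_state, _ = Counter(column).most_common(1)[0]
        summary.append((modal_state, shannon_entropy(column)))
    return summary

transversal = transversal_summary(sequences)
for t, (modal_state, entropy) in enumerate(transversal, start=1):
    print(f"t{t}: modal state = {modal_state}, entropy = {entropy:.3f}")
```

At t2 all three individuals are in state B, so the entropy is zero; at t3 all states differ, so the entropy reaches its maximum for three observations, log(3).
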
Mean time in each state
  [Figure: mean time in each state, panels Men and Women. y-axis: Mean time in years (0 to 15). x-axis: State (Missing, Full time, Part time, Neg. break, Pos. break, At home, Retired, Education).]

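As a rough illustration of what "mean time in each state" measures, the sketch below counts, for each state, the average number of time points a sequence spends in it. It is plain Python on the three toy sequences used elsewhere in these slides, not the data behind the figure; the function name is hypothetical, and one position is assumed to represent one time unit.

```python
from collections import Counter

# Toy sequences (one row per individual, one column per time unit).
sequences = [
    ["B", "B", "D"],
    ["A", "B", "C"],
    ["B", "B", "A"],
]

def mean_time_in_state(seqs):
    """Average number of positions spent in each state, per sequence."""
    totals = Counter()
    for seq in seqs:
        totals.update(seq)  # add this sequence's state counts
    return {state: totals[state] / len(seqs) for state in sorted(totals)}

mean_time = mean_time_in_state(sequences)
for state, duration in mean_time.items():
    print(f"{state}: {duration:.2f}")
```

When all sequences have the same length, the mean times add up to that length (here 3). Splitting by a covariate such as sex, as in the figure, would amount to calling the function on each subgroup.
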
Transition rates
  Rows: origin state. Columns: destination state, apparently numbered in the same order as the rows (0 = Missing, ..., 7 = Education).

              [-> 0]  [-> 1]  [-> 2]  [-> 3]  [-> 4]  [-> 5]  [-> 6]  [-> 7]
  Missing      0.969   0.005   0.004   0.001   0.001   0.011   0.000   0.008
  Full time    0.003   0.971   0.009   0.001   0.001   0.013   0.000   0.003
  Part time    0.005   0.026   0.939   0.001   0.001   0.018   0.000   0.010
  Neg. break   0.040   0.047   0.027   0.880   0.000   0.007   0.000   0.000
  Pos. break   0.105   0.316   0.105   0.000   0.404   0.018   0.000   0.053
  At home      0.003   0.007   0.032   0.000   0.000   0.956   0.000   0.002
  Retired      0.000   0.000   0.000   0.000   0.000   0.000   1.000   0.000
  Education    0.044   0.236   0.045   0.001   0.002   0.006   0.000   0.664

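A transition rate from state a to state b can be estimated as the number of times b directly follows a, divided by the number of times a is observed at any position except the last. The sketch below applies that definition to the three toy sequences; it is plain Python with a hypothetical function name, and it assumes complete sequences without missing values.

```python
from collections import Counter, defaultdict

# Toy sequences (one row per individual, one column per time point).
sequences = [
    ["B", "B", "D"],
    ["A", "B", "C"],
    ["B", "B", "A"],
]

def transition_rates(seqs):
    """Estimate rates[origin][destination] from consecutive state pairs."""
    pair_counts = Counter()
    origin_counts = Counter()
    for seq in seqs:
        # zip(seq, seq[1:]) yields every (state at t, state at t+1) pair.
        for origin, destination in zip(seq, seq[1:]):
            pair_counts[(origin, destination)] += 1
            origin_counts[origin] += 1
    rates = defaultdict(dict)
    for (origin, destination), count in pair_counts.items():
        rates[origin][destination] = count / origin_counts[origin]
    return dict(rates)

rates = transition_rates(sequences)
for origin in sorted(rates):
    for destination in sorted(rates[origin]):
        print(f"{origin} -> {destination}: {rates[origin][destination]:.2f}")
```

As in the table, each row of observed rates sums to 1. States that never appear as an origin (C and D in the toy data) get no row in this sketch.
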
Heterogeneity: Sequence of transversal entropies
  [Figure: two panels, Cohabitational Trajectories and Occupational Trajectories. x-axis: Age (A20 to A44). y-axis: Entropy (0.3 to 0.8). Legend: 1910-1924, 1925-1945, 1946-1957.]

Longitudinal entropy
  [Figure: two panels, Men: Occupational Trajectories and Women: Occupational Trajectories. x-axis: 1910-1924, 1925-1945, 1946-1957. y-axis: 0.0 to 0.7.]

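Longitudinal (within-sequence) entropy describes how varied the states are inside one individual trajectory. A minimal sketch, assuming a fixed alphabet A to D and normalization by the log of the alphabet size so that values lie between 0 and 1; the function name and the toy data are illustrative, not the data behind the figure.

```python
from collections import Counter
from math import log

# Toy sequences (one row per individual) and the assumed state alphabet.
sequences = [
    ["B", "B", "D"],
    ["A", "B", "C"],
    ["B", "B", "A"],
]
alphabet = ["A", "B", "C", "D"]

def within_entropy(seq, states):
    """Shannon entropy of the state distribution inside one sequence,
    divided by log(len(states)) so that the maximum possible value is 1."""
    counts = Counter(seq)
    n = len(seq)
    raw = -sum((c / n) * log(c / n) for c in counts.values())
    return raw / log(len(states))

longitudinal = [within_entropy(seq, alphabet) for seq in sequences]
for i, value in enumerate(longitudinal, start=1):
    print(f"sequence {i}: normalized within entropy = {value:.3f}")
```

A sequence that stays in a single state has entropy 0; the second toy sequence, which visits three different states, has the highest value of the three.
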
Dissimilarities between pairs of sequences
  Distance between sequences
    Different metrics: LCP (longest common prefix), LCS (longest common subsequence), OM (optimal matching)
  Once we have pairwise dissimilarities, we can
    Determine a central sequence (centro-type)
    Measure the discrepancy between sequences
    Cluster a set of sequences
    Represent sequences in an MDS scatterplot
    Analyze the heterogeneity of a set of sequences (ANOH)
    Run a dissimilarity analysis (induction trees)

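The three metrics can be sketched on the toy sequences as follows. This is plain Python under stated assumptions, not TraMineR's implementation: the LCP and LCS distances use the convention d(x, y) = |x| + |y| - 2 * (length of the common part), OM is a standard edit distance with constant costs (indel = 1, substitution = 2, both chosen for illustration), and the centro-type is taken to be the medoid, that is, the sequence with the smallest sum of distances to the others. All function names are hypothetical.

```python
sequences = [
    ["B", "B", "D"],
    ["A", "B", "C"],
    ["B", "B", "A"],
]

def lcp_length(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n

def lcs_length(x, y):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcp_distance(x, y):
    return len(x) + len(y) - 2 * lcp_length(x, y)

def lcs_distance(x, y):
    return len(x) + len(y) - 2 * lcs_length(x, y)

def om_distance(x, y, indel=1.0, sub=2.0):
    """Minimal total cost of insertions, deletions and substitutions
    turning x into y (constant costs assumed)."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * indel]
        for j, b in enumerate(y, start=1):
            cost = 0.0 if a == b else sub
            curr.append(min(prev[j] + indel, curr[j - 1] + indel, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def pairwise(seqs, distance):
    """Full symmetric matrix of pairwise dissimilarities."""
    return [[distance(x, y) for y in seqs] for x in seqs]

def medoid_index(matrix):
    """Index of the sequence with the smallest sum of distances (first on ties)."""
    sums = [sum(row) for row in matrix]
    return sums.index(min(sums))

lcs_matrix = pairwise(sequences, lcs_distance)
centre = medoid_index(lcs_matrix)
print("LCS distance matrix:", lcs_matrix)
print("medoid:", "".join(sequences[centre]))
```

With indel = 1 and substitution = 2, a substitution costs the same as a deletion followed by an insertion, so the OM distance coincides with the LCS distance; other cost settings give different values. The pairwise matrix is the input that clustering or MDS would then work from.
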
Cluster analysis: determining typologies
  [Figure: five panels of state frequencies by age. x-axis: A20 to A44. y-axis: Freq. (0.0 to 1.0).]
    Type 1: Full Time Trajectories (53 %)
    Type 2: Mixed Part Time - Home Trajectories (13 %)
    Type 3: At Home Trajectories (16 %)
    Type 4: Part Time Trajectories (7 %)
    Type 5: Missing Data (11 %)
  Panel sizes as extracted, in order: n=795, n=155, n=277 (top row); n=101, n=175 (bottom row)
  Legend: Missing, Full time, Part time, Negative break, Positive break, At home, Retired, Education
