Symbolic Analysis of Hierarchical-Structured Data. Application to Veterinary Epidemiology

C. Fablet (1), E. Diday (2), S. Bougeard (1), C. Toque (3) & L. Billard (4)
(1) French Agency for Food, Environmental and Occupational Health Safety (Anses), France
(2) University of Paris Dauphine, France
(3) SYROKKO, France
(4) University of Georgia, Athens, USA

19th International Conference on Computational Statistics, Paris, August 22-27, 2010

Context of veterinary epidemiological surveys

Statistical issue
1. Description of the relationships between the dependent variables → variable selection,
2. Summary of the dependent variables into an overall single variable (i.e. the disease),
… with a hierarchical structure of observations (P animals each within N farms).

[Diagram: dependent variables y (farms x animals) and dependent variable Y (disease, farms); disease intensity scale: unapparent disease, average disease, fatal disease]

Dataset: Study of pig respiratory diseases

125 farms x 30 animals, 19 variables describing pig respiratory diseases:
• Pneumonia (0 to 28), pleuritis (0 to 4),
• Lung abscess (0/1), lung nodules (0/1), healing from pneumonia (0/1),
• Hypertrophy of lung lymph nodes (0 to 3), pericarditis (0/1),
• Frequency of coughs at 16 and 22 weeks of age.

[Diagram: dependent variables y (125 farms x 30 animals) and dependent variable Y (disease, 125 farms); disease intensity scale: unapparent disease, average disease, fatal disease]

Step 1: Variable synthesis

Classical procedure
• Input: 125 farms x 30 animals, 19 variables describing pig respiratory diseases,
• Categorical variable: animal frequencies,
• Continuous variable: median score,
• Output: 125 farms, 64 variables describing pig respiratory diseases.

Symbolic procedure
• Categorical variable: histogram of the frequencies based on 30 animals,
• Continuous variable: histogram, which keeps the data variation.
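
The two ways of synthesising animal-level data into farm-level data can be sketched as follows. This is only a minimal illustration in plain Python, not the TABSYR implementation: the records, the farm names, the variable names and the bin edges `EDGES` are all hypothetical, and each farm has 4 animals instead of 30.

```python
from collections import Counter
from statistics import median

# Hypothetical animal-level records: (farm, pneumonia score 0 to 28, lung abscess 0/1).
records = [
    ("farm_A", 0, 0), ("farm_A", 4, 0), ("farm_A", 12, 1), ("farm_A", 26, 1),
    ("farm_B", 0, 0), ("farm_B", 1, 0), ("farm_B", 2, 0), ("farm_B", 3, 0),
]

# Assumed bin edges for the continuous score; the slides do not state the bins.
EDGES = [0, 7, 14, 21, 28]


def group_by_farm(rows):
    """Collect the animal-level values of each variable, farm by farm."""
    farms = {}
    for farm, score, abscess in rows:
        entry = farms.setdefault(farm, {"pneumonia": [], "abscess": []})
        entry["pneumonia"].append(score)
        entry["abscess"].append(abscess)
    return farms


def frequencies(values):
    """Relative frequency of each observed modality."""
    counts = Counter(values)
    return {k: counts[k] / len(values) for k in sorted(counts)}


def histogram(values, edges):
    """Relative frequency per bin [edges[i], edges[i+1]); the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


farms = group_by_farm(records)

# Classical synthesis: one number (median) per continuous variable,
# one frequency per modality of each categorical variable.
classical = {
    farm: {"pneumonia_median": median(d["pneumonia"]),
           "abscess_freq": frequencies(d["abscess"])}
    for farm, d in farms.items()
}

# Symbolic synthesis: each farm keeps a histogram-valued description,
# so the within-farm variation of the score is retained.
symbolic = {
    farm: {"pneumonia_hist": histogram(d["pneumonia"], EDGES),
           "abscess_hist": frequencies(d["abscess"])}
    for farm, d in farms.items()
}

print(classical["farm_A"])
print(symbolic["farm_A"])
```

In this toy data, farm_A and farm_B differ far more in the spread of their scores than a single median conveys, which is the information the histogram-valued description keeps.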

Step 1: Variable synthesis (symbolic results)

SYR software with the TABSYR & STATSYR modules

Step 2: Variable selection

Classical procedure
• Principal Component Analysis of the 64 variables,
• Selection of the variables with the best contribution,
• Principal Component Analysis of the selected variables.

Symbolic procedure
• Symbolic Principal Component Analysis of the 19 variables,
• ‘Global’ variable selection (best var. contribution),
• ‘Quadrants’ variable selection (best var. correlation),
• Final symbolic PCA representation of the selected ‘bins’ variables.
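
The classical selection step (keep the variables that contribute most to a principal axis) can be illustrated with a small sketch. It is not the ACPSYR procedure and makes several assumptions: the farm-level table and its variable names are hypothetical, only the first principal axis is computed (by power iteration on the correlation matrix), the contribution of a variable is taken as its squared loading on that axis, and the cut-off `1 / p` (the average contribution) is an arbitrary choice that the slides do not specify.

```python
from math import sqrt

# Hypothetical farm-level table: 6 farms x 3 variables.
names = ["pneumonia", "pleuritis", "cough"]
data = [
    [1, 2, 3],
    [2, 4, 1],
    [3, 5, 2],
    [4, 8, 3],
    [5, 10, 1],
    [6, 13, 2],
]


def standardize(rows):
    """Centre each column and scale it to unit (population) standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(len(cols))] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of standardized data z (n rows, p columns)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def first_axis(matrix, iterations=200):
    """Leading unit eigenvector of a symmetric matrix, by power iteration."""
    p = len(matrix)
    v = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


axis = first_axis(correlation_matrix(standardize(data)))

# Squared loadings of a unit vector sum to 1, so they read as shares of the axis.
contributions = {name: loading ** 2 for name, loading in zip(names, axis)}

# Assumed rule: keep variables contributing more than the average share 1/p.
threshold = 1 / len(names)
selected = [name for name in names if contributions[name] > threshold]

print(contributions)
print(selected)
```

With this toy table the two strongly correlated lesion scores dominate the first axis and are retained, while the weakly related variable falls below the threshold. A full analysis would repeat the computation on the retained variables, as the third bullet of the classical procedure indicates.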

Step 2: Variable selection (symbolic results)

• Var. group PNEU+: severe pneumonia,
• Var. group PLEU_PNEU: average level of pleuritis and pneumonia,
• Var. group PLEU0_PNEU0: few lung lesions,
• Var. group PNEU-: light pneumonia lesions.

[Figure: symbolic PCA of the 8 selected ‘bins’ variables]

SYR software with the ACPSYR module

Step 3: Individual clustering

Classical procedure
• Hierarchical Ascendant Classification (Ward criterion),
• Cluster description:
  • Comparison of the variable means (& standard deviations) of each cluster with the variable means on the whole sample.

Symbolic procedure
• Symbolic partitioning (inertia criterion),
• Cluster description:
  • Variables sorted in order of overall discriminant power,
  • Cluster description with the most discriminant variables (or variable modalities).
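
The classical cluster description (cluster means and standard deviations compared with the means on the whole sample) can be sketched as below. The clustering itself is not reproduced: the cluster labels are assumed to come from an earlier step such as a Ward hierarchical classification, and the farm values and variable names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical farm-level data with cluster labels assumed to be already assigned.
farms = [
    {"cluster": 1, "pneumonia": 2.0, "pleuritis": 0.5},
    {"cluster": 1, "pneumonia": 4.0, "pleuritis": 0.5},
    {"cluster": 2, "pneumonia": 12.0, "pleuritis": 2.0},
    {"cluster": 2, "pneumonia": 14.0, "pleuritis": 3.0},
]
VARIABLES = ["pneumonia", "pleuritis"]


def describe_clusters(rows, variables):
    """Per cluster and variable: mean, standard deviation, and gap to the overall mean."""
    overall = {v: mean([r[v] for r in rows]) for v in variables}
    description = {}
    for label in sorted({r["cluster"] for r in rows}):
        members = [r for r in rows if r["cluster"] == label]
        description[label] = {}
        for v in variables:
            values = [r[v] for r in members]
            description[label][v] = {
                "mean": mean(values),
                "sd": pstdev(values),  # population standard deviation within the cluster
                "diff_from_overall": mean(values) - overall[v],
            }
    return overall, description


overall, description = describe_clusters(farms, VARIABLES)

for label, stats in description.items():
    for v in VARIABLES:
        s = stats[v]
        print(f"cluster {label} {v}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
              f"diff={s['diff_from_overall']:+.2f}")
```

A large positive or negative gap to the overall mean flags the variables that characterise a cluster; the symbolic procedure goes further by ranking variables by overall discriminant power.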

Step 3: Individual clustering (symbolic results)

SYR software with the CLUSTSYR module

Conclusion & perspectives

Conclusion
• Symbolic analysis to process hierarchical-structured data without reducing information,
• Relevant and useful methods for veterinary epidemiological surveys (competes with GEE including a random measurement effect),
• Available software (SYR).

Perspectives
• Other symbolic methods available for various aims,
• Extension to multiblock modelling (hierarchical-structured observations and variables).