Analysis of multivariate data depending on several factors: ANOVA-PLS

A. El Ghaziri (1), E.M. Qannari (1), T. Moyon (2), M.-C. Alexandre-Gouabau (2)
(1) ONIRIS, Sensometrics and Chemometrics unit, Nantes
(2) INRA, Physiologie des Adaptations Nutritionnelles (PhAN), Nantes

Metabolomics data and two factors

Context
Several metabolites (variables) are measured on animals (rat pups) according to an experimental design involving two factors:
• Gestation: nutritional protein restriction on the mothers during pregnancy ⇒ levels: YES/NO
• Lactation: nutritional protein restriction on the mothers during lactation ⇒ levels: YES/NO

Aim
Study the effect of the two nutritional periods (fetal and post-natal) on the growth of the rat pups through the metabolites.

Outline

1 Existing methods
   • ASCA
   • ANOVA-PCA
2 New method: ANOVA-PLS
   • Particular case
3 Comparison of methods
4 Benefits of ANOVA-PLS
5 Application to metabolomics data
   • Factor Gestation
   • Factor Lactation
   • Interaction
6 Conclusion

Existing methods

• ANOVA on each metabolite
• Multivariate ANOVA (MANOVA)
• ANOVA-Simultaneous Component Analysis (ASCA, Smilde et al. (2005))
• ANOVA-PCA (APCA, Harrington et al. (2005))

Decomposition

• For each variable (i.e. metabolite), the two-way ANOVA decomposition is:

  $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$

  • $\mu$ is the overall mean
  • $\alpha_i$ is the effect due to level $i$ of the first factor
  • $\beta_j$ is the effect due to level $j$ of the second factor
  • $\gamma_{ij}$ is the effect due to the interaction between the two factors
  • $\epsilon_{ijk}$ is the residual for observation $k$ in cell $(i, j)$

Extension to multivariate data for 2 factors G, L with interaction:

  $X = \bar{X} + X_G + X_L + X_{GL} + E$

In the following, we assume that the data matrix $X$ is centered by column ($\bar{X} = 0$).

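The multivariate decomposition can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the design is taken to be balanced, effect matrices are built from level and cell means of the column-centered data, and the function names (`col_means`, `level_means`, `decompose`) and the toy data are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def col_means(rows):
    """Column means of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def level_means(X, labels):
    """Map each label to the column means of the rows carrying it."""
    by_label = defaultdict(list)
    for row, lab in zip(X, labels):
        by_label[lab].append(row)
    return {lab: col_means(rows) for lab, rows in by_label.items()}

def decompose(X, g, l):
    """Return (mean, X_G, X_L, X_GL, E); assumes a balanced two-factor design."""
    mean = col_means(X)
    Xc = [[x - m for x, m in zip(row, mean)] for row in X]  # column centering
    mg, ml = level_means(Xc, g), level_means(Xc, l)          # main-effect means
    mgl = level_means(Xc, list(zip(g, l)))                   # cell means
    XG = [list(mg[a]) for a in g]
    XL = [list(ml[b]) for b in l]
    # Interaction = cell mean minus both main effects.
    XGL = [[c - p - q for c, p, q in zip(mgl[a, b], mg[a], ml[b])]
           for a, b in zip(g, l)]
    # Residual = centered observation minus its cell mean.
    E = [[x - c for x, c in zip(row, mgl[a, b])]
         for row, (a, b) in zip(Xc, zip(g, l))]
    return mean, XG, XL, XGL, E

# Toy data (hypothetical): 8 pups, 2 metabolites, 2 pups per G x L cell.
g = ["R", "R", "R", "R", "C", "C", "C", "C"]
l = ["R", "R", "C", "C", "R", "R", "C", "C"]
X = [[5.0, 1.0], [5.4, 1.2], [4.1, 2.0], [4.3, 2.2],
     [3.0, 1.1], [3.2, 0.9], [2.0, 2.1], [2.2, 1.9]]
mean, XG, XL, XGL, E = decompose(X, g, l)
```

Adding the five returned pieces row by row gives back `X`, which is the matrix form of the decomposition stated on the slide.
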
ANOVA-Simultaneous Component Analysis (ASCA)

  $X = X_G + X_L + X_{GL} + E$

• PCA is applied separately to the effect matrices.
• For each effect, this amounts to maximizing the between-group variance.

Limitation
Lack of tools to assess the significance of the effects. To cope with this difficulty, permutation tests were proposed (Vis et al., 2007; Zwanenburg et al., 2011).

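The idea of a permutation test for one effect can be sketched as follows. This is a generic label-permutation scheme and not necessarily the procedure of the cited papers: the test statistic (the sum of squares of the effect matrix), the function names and the toy data are all assumptions made for illustration.

```python
import random

def effect_ss(X, labels):
    """Sum of squares of the effect matrix (level means of column-centered X)."""
    n, p = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(p)]
    ss = 0.0
    for lab in sorted(set(labels)):
        rows = [r for r, l in zip(X, labels) if l == lab]
        for j in range(p):
            dev = sum(r[j] for r in rows) / len(rows) - mean[j]
            ss += len(rows) * dev * dev
    return ss

def permutation_p_value(X, labels, n_perm=999, seed=0):
    """Share of label permutations whose effect is at least as large as observed."""
    rng = random.Random(seed)
    observed = effect_ss(X, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that ties are not lost to rounding.
        if effect_ss(X, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): the first metabolite separates the two levels.
labels = ["R"] * 4 + ["C"] * 4
X = [[5.1, 2.0], [4.9, 2.2], [5.3, 1.9], [5.0, 2.1],
     [1.0, 2.1], [1.2, 1.8], [0.9, 2.0], [1.1, 2.2]]
p_value = permutation_p_value(X, labels)
```

A small `p_value` indicates that few random relabellings reproduce an effect as large as the observed one.
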
ANOVA-PCA (APCA)

• $Z_G = X_G + E$ → PCA($Z_G$)
• $Z_L = X_L + E$ → PCA($Z_L$)
• $Z_{GL} = X_{GL} + E$ → PCA($Z_{GL}$)

The rationale behind this strategy is that if a factor is significant, it is likely to emerge on the first principal components.

Problem
The noise may dominate the first component. A proposed remedy is a gradual reduction of the residual variance combined with a permutation test (Climaco-Pinto et al., 2009).

New idea: ANOVA-PLS

• $Z_G = X_G + E$ → PLS($X_G \sim Z_G$)
• $Z_L = X_L + E$ → PLS($X_L \sim Z_L$)
• $Z_{GL} = X_{GL} + E$ → PLS($X_{GL} \sim Z_{GL}$)

Rationale
If the factor has a significant effect, it will emerge; otherwise it will be "diluted" in the noise and the regression model will be irrelevant.

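A minimal sketch of the first ANOVA-PLS component for one effect, assuming the usual PLS criterion with unit-norm weight vectors (the slides do not state the normalization). Under that assumption the weights are the dominant singular vectors of $X_G^\top Z_G$, obtained here by power iteration. The helper names and the toy matrices are hypothetical, and later components (which would need deflation) are not shown.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def unit(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def first_pls_weights(XG, Z, n_iter=200):
    """Unit-norm weights (v for X_G, w for Z) maximizing cov^2(X_G v, Z w)."""
    M = matmul(transpose(XG), Z)      # cross-product matrix X_G' Z
    MtM = matmul(transpose(M), M)
    w = unit([1.0] * len(MtM))        # arbitrary starting vector
    for _ in range(n_iter):           # power iteration on M'M
        w = unit(matvec(MtM, w))
    v = unit(matvec(M, w))            # matching left singular vector
    return v, w

# Toy matrices (hypothetical): 4 observations, 2 variables, 2 levels of G.
XG = [[1.0, 0.5], [1.0, 0.5], [-1.0, -0.5], [-1.0, -0.5]]
E = [[0.2, -0.3], [-0.2, 0.3], [0.1, 0.4], [-0.1, -0.4]]
Z = [[x + e for x, e in zip(rx, re)] for rx, re in zip(XG, E)]  # Z_G = X_G + E

v, w = first_pls_weights(XG, Z)
u = matvec(XG, v)   # scores on the effect matrix
t = matvec(Z, w)    # scores on effect + residuals
```

In this toy case both weight vectors are proportional to (2, 1), the direction of the difference between the two level means.
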
The case of one factor: F

  $X = X_F + E$

• PLS($X_F \sim X$) ⇒ PLS-DA (Kemsley, 1996; Barker & Rayens, 2003; Nocairi et al., 2005)

Thus, our approach appears as an extension of PLS-DA to the case of several factors.

Comparison of methods

• ASCA: PCA($X_G$)
  $\max \operatorname{var}(u)$ with $u = X_G \nu$

• APCA: PCA($X_G + E$)
  $\max \operatorname{var}(t)$ with $t = (X_G + E)\,\omega$

• ANOVA-PLS: PLS($X_G \sim (X_G + E)$)
  $\max \operatorname{cov}^2(u, t) = \operatorname{var}(u)\,\operatorname{var}(t)\,\operatorname{cor}^2(u, t)$

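The factorization of the ANOVA-PLS criterion follows directly from the definition of the correlation coefficient. The short derivation below uses only the symbols of the slide and is meant as a reading aid.

```latex
\begin{align*}
\operatorname{cor}(u, t)
  &= \frac{\operatorname{cov}(u, t)}
          {\sqrt{\operatorname{var}(u)}\,\sqrt{\operatorname{var}(t)}} \\
\Longrightarrow\quad
\operatorname{cov}^2(u, t)
  &= \operatorname{var}(u)\,\operatorname{var}(t)\,\operatorname{cor}^2(u, t),
\qquad u = X_G\,\nu,\quad t = (X_G + E)\,\omega .
\end{align*}
```

Read this way, the ANOVA-PLS criterion combines the quantity maximized by ASCA, $\operatorname{var}(u)$, the quantity maximized by APCA, $\operatorname{var}(t)$, and the squared correlation between the two score vectors.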
Benefits of ANOVA-PLS

Tools are available to highlight the relevance of the model (and the significance of the effects):

• PLS principal components (scores)
• RMSEP (Root Mean Square Error of Prediction)
• $Q^2$ (index used in cross-validation to assess the significance of a new component)
• VIP: Variable Importance in the Projection

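Two of these indices are simple to compute once cross-validated predictions are available. The sketch below uses one common definition of each ($Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$) for a single response variable. Other variants exist, the slides do not say which one was used, and the function names and numbers are hypothetical.

```python
def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n) ** 0.5

def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / TSS, with y_cv_pred the cross-validated predictions."""
    mean = sum(y_true) / len(y_true)
    press = sum((a - b) ** 2 for a, b in zip(y_true, y_cv_pred))
    tss = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - press / tss

# Toy values (hypothetical): observed responses and their CV predictions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_cv = [2.0, 2.0, 3.0, 3.0]
print(rmsep(y_obs, y_cv), q2(y_obs, y_cv))
```

A $Q^2$ close to 1 means the cross-validated predictions are close to the observations, while a value at or below 0 means the model predicts no better than the mean.
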
Application to metabolomics data

Experimental design on rat pups during the gestation and lactation stages:

Gestation                                                Lactation                      Group
Protein-Restricted dams (8 g of protein/100 g of food)   8 % Protein-Restricted dams    RR
                                                         20 % Protein-Control dams      RC
Protein-Control dams (20 g of protein/100 g of food)     8 % Protein-Restricted dams    CR
                                                         20 % Protein-Control dams      CC

Two factors + interaction
• Factor Gestation (G), 2 levels: first letter R or first letter C
• Factor Lactation (L), 2 levels: second letter R or second letter C

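The two-letter group codes carry both factors, so the factor levels can be recovered from them directly. The helper below is a hypothetical illustration of that coding. Only the codes and the protein percentages come from the design above.

```python
# Protein content of the diet for each level (from the design table).
PROTEIN_PERCENT = {"R": 8, "C": 20}

def split_group(code):
    """Split a two-letter group code into (gestation level, lactation level)."""
    if len(code) != 2 or any(c not in PROTEIN_PERCENT for c in code):
        raise ValueError(f"unexpected group code: {code!r}")
    return code[0], code[1]

# Map every group of the design to its (G, L) levels.
design = {code: split_group(code) for code in ["RR", "RC", "CR", "CC"]}
```

For instance, `design["RC"]` is `("R", "C")`: restricted during gestation, control during lactation.
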
Factor Gestation

PLS($X_G \sim X_G + E$)

Cumulative percentage of variation:

Component   X_G     X_G + E
comp 1      17.8    10.7
comp 2      36.4    20.4
comp 3      55.5    25.0
comp 4      70.4    27.8
comp 5      81.0    30.4
comp 6      87.8    32.7
comp 7      91.3    35.6
comp 8      93.0    40.7
comp 9      94.8    43.3

[Figure: percentage correctly classified by leave-one-out (LOO) cross-validation versus component index]

⇒ 7 components

Factor Gestation

7 components were retained and submitted to Linear Discriminant Analysis (LDA).

[Figure: boxplot of the discriminant component of LDA (LDA scores) for groups GC and GR]

⇒ Discrimination between the two levels of the factor Gestation on the LDA component