SLIDE 1
Comparison of complementary statistical analysis approaches in metabolomic food traceability
Raúl González-Domínguez 1,2*, Ana Sayago 1,2, Ángeles Fernández-Recamales 1,2
1 Department of Chemistry, Faculty of Experimental Sciences,
SLIDE 2
SLIDE 3
Abstract: Metabolomics generates large datasets that require advanced and complementary statistical tools to extract the maximum amount of useful information. In this work, we show the advantages, limitations and complementarities of these techniques in food analysis, on the basis of data acquired in various traceability studies performed in our research group with strawberry and extra virgin olive oil.
Keywords: food traceability; machine learning; pattern recognition
SLIDE 4
Introduction
Omic technologies generate large datasets
Pattern recognition techniques: principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), soft independent modelling of class analogy (SIMCA)
Machine learning techniques: random forest (RF), support vector machines (SVM), artificial neural network (ANN)
SLIDE 5
Introduction
Principal component analysis: overview of data and identification of outliers and trends
Partial least squares discriminant analysis: discrimination between previously defined categories
[Figure: PCA and PLS-DA score plots, axes t[1] and t[2]]
PCA and PLS-DA are the most commonly employed tools in metabolomics
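For orientation, the t[1] and t[2] labels on score plots of this kind conventionally denote the first two score vectors of a bilinear decomposition of the data matrix. A sketch in standard notation follows; the symbols X, T, P, E and A are assumptions and do not appear on the slides.

```latex
% X: mean-centred data matrix (samples x variables)
% T: scores, P: loadings, E: residuals, A: number of components retained
X = T P^{\top} + E, \qquad T = [\, t_1 \;\; t_2 \;\; \dots \;\; t_A \,]
```

Plotting t_2 against t_1 gives the two-dimensional overview in which groupings, trends and outliers are inspected.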
SLIDE 6
Introduction
Soft independent modelling of class analogy (SIMCA): looks for possible overlap among the study groups
[Figure: SIMCA distance-to-model plot, axes M1.DModXPS+[2](Norm) and M2.DModXPS+[2](Norm), with critical distance lines D-Crit(0.05)]
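The reading of such a distance-to-model plot can be sketched in a few lines: a sample is accepted by a class model when its normalised distance to that model does not exceed the critical distance, and samples accepted by both models indicate overlap between groups. This is a minimal illustration only; the function name, the sample distances and the threshold values are hypothetical and are not taken from the studies.

```python
def simca_assign(d_m1, d_m2, dcrit_m1, dcrit_m2):
    """Assign a sample from its distances to two class models.

    d_m1, d_m2: normalised distances of the sample to models M1 and M2.
    dcrit_m1, dcrit_m2: critical distances of each model (e.g. at 0.05).
    """
    fits_m1 = d_m1 <= dcrit_m1
    fits_m2 = d_m2 <= dcrit_m2
    if fits_m1 and fits_m2:
        return "both (overlap)"
    if fits_m1:
        return "M1 only"
    if fits_m2:
        return "M2 only"
    return "neither"


# Hypothetical critical distances and samples, for illustration only.
DCRIT_M1 = 1.5
DCRIT_M2 = 1.5
samples = {
    "s1": (0.8, 3.9),  # close to M1, far from M2
    "s2": (4.2, 1.1),  # far from M1, close to M2
    "s3": (1.2, 1.3),  # close to both models
    "s4": (3.0, 5.5),  # far from both models
}

for name, (d1, d2) in samples.items():
    print(name, simca_assign(d1, d2, DCRIT_M1, DCRIT_M2))
```

Samples falling in the "both (overlap)" region are the ones that a purely discriminant model such as PLS-DA would still force into a single class.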
SLIDE 7
Introduction
Machine learning techniques: support vector machines, random forest, artificial neural network
Model performance:
- sensitivity (SENS): percentage of cases belonging to a given class that are correctly classified
- specificity (SPEC): percentage of cases not belonging to a class that are correctly rejected by that class model
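The two performance measures defined above can be computed per class from true and predicted labels. The sketch below uses a one-versus-rest count for each class; the function name and the example labels are hypothetical and are not data from the studies.

```python
def sens_spec(y_true, y_pred, target):
    """Return (SENS, SPEC) in percent for the class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)  # correctly accepted
    fn = sum(t == target and p != target for t, p in pairs)  # wrongly rejected
    tn = sum(t != target and p != target for t, p in pairs)  # correctly rejected
    fp = sum(t != target and p == target for t, p in pairs)  # wrongly accepted
    sens = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    spec = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical labels for illustration only (not data from the studies).
y_true = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "A"]

for cls in ("A", "B", "C"):
    sens, spec = sens_spec(y_true, y_pred, cls)
    print(f"{cls}: SENS = {sens:.1f} %, SPEC = {spec:.1f} %")
```

Reporting both values per class matters because a model can accept every member of a class (high SENS) while also accepting many outsiders (low SPEC), or the reverse.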
SLIDE 8
Materials and Methods
Three varieties, 2 macrotunnel types, 3 conductivities of irrigation, 3 soilless substrates
GC-MS untargeted metabolomics (1)
LC-MS targeted metabolomics (2)
ICP-MS multielemental profiling (3)
1H-NMR + GC/LC profiling of the unsaponifiable fraction (4)
(1) Akhatou et al. Plant Physiol. Biochem. 101 (2016) 14-22
(2) Akhatou et al. J. Agric. Food Chem. 65 (2017) 9559-9567
(3) Sayago et al. Food Chem. 261 (2018) 42–50
(4) Sayago et al. Under preparation
SLIDE 9
Results and Discussion
Differentiation of strawberry cultivars based on GC-MS metabolomic profiles
- PCA showed good clustering of the study groups
- PLS-DA was used to search for discriminant metabolites between varieties: sugars, organic acids, amino acids
- This is the conventional statistical pipeline in metabolomics
Akhatou et al. Plant Physiol. Biochem. 101 (2016) 14-22
[Figure panels: PCA, PLS-DA]
SLIDE 10
Results and Discussion
Differentiation of strawberry cultivars based on LC-MS metabolomic profiles
Akhatou et al. J. Agric. Food Chem. 65 (2017) 9559-9567
[Figure panels: PLS-DA, RF]
- Similar metabolic changes were observed in both models: anthocyanins, ellagic acid derivatives
- RF modelling provided higher sensitivity and similar specificity
SLIDE 11
Results and Discussion
Differentiation of olive oil provenance based on ICP-MS mineral profiles
Sayago et al. Food Chem. 261 (2018) 42–50
- Three predictive modelling approaches were compared to classify EVOOs according to three geographical origins
- Machine learning tools (RF and SVM) provided higher sensitivity than PLS-DA models
- Specificity was slightly higher in PLS-DA models
SLIDE 12
Results and Discussion
Sayago et al. Under preparation
Differentiation of olive oil variety based on 1H-NMR and the unsaponifiable fraction
Sensitivity (SENS) and specificity (SPEC), in %, per olive variety:

Model   Arbequina        Picual           Verdial
        SENS    SPEC     SENS    SPEC     SENS    SPEC
SVM     100     100      100     96       87.5    100
RF      100     93.3     100     85.3     12.5    100
ANN     100     100      100     100      100     100
[Figure panels: PLS-DA, SIMCA]
- SIMCA complements PLS-DA by looking for possible overlap among the study groups
- Machine learning tools provide similar statistical performance
SLIDE 13