Large chromatographic data sets analysis on the example of metabolomic data

  1. Large chromatographic data sets analysis on the example of metabolomic data. Aneta Sawikowska (1, 2), Paweł Krajewski (2). (1) Poznan University of Life Sciences, Poznan, Poland; (2) Institute of Plant Genetics, Polish Academy of Sciences, Poznan, Poland. 02.12.2016. Running header: Parameters of experimental design, Preprocessing, Statistical analysis, Correlation network analysis. Running footer: A. Sawikowska, P. Krajewski, Large chromatographic data analysis.

  2. Plan: (1) Introduction; (2) Parameters of experimental design; (3) Preprocessing; (4) Statistical analysis; (5) Correlation network analysis; (6) Conclusions.

  4. Peaks can be interpreted as intervals in which a metabolite, or a group of metabolites with similar properties, occurs.

  5. Experimental factors: v varieties, d drought treatments, p time points, r replications. Total: about 12 million observations.

  6. Parameters of experimental design: 9 varieties; 3 drought treatments (I, II, I+II) and a control; 8 time points; 4 biological replications. Preprocessing was done with the authors' own scripts in R; statistical analysis was done in Genstat.
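
The design parameters fix the number of chromatograms. A minimal sketch of that arithmetic, assuming a fully crossed design with no missing samples; the variable names are illustrative, and the points-per-chromatogram figure is only what the quoted total of about 12 million observations would imply if each observation is one intensity at one retention time point:

```python
# Factor levels as listed on the slide; a full factorial layout is assumed.
varieties = 9
treatments = 3 + 1        # drought I, II, I+II, plus the control
time_points = 8
replications = 4

chromatograms = varieties * treatments * time_points * replications

# Assumption: the total of about 12 million counts one intensity value
# per retention time point per chromatogram.
total_observations = 12_000_000
points_per_chromatogram = total_observations / chromatograms

print(chromatograms)                    # 1152
print(round(points_per_chromatogram))   # 10417
```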

  7. Preprocessing steps: baseline removal (differentiation), retention time alignment (COW), peak detection. Different baseline for the same data.

  9. The baseline-estimation problem. A vector $x = \{x_1, x_2, \ldots, x_i\}$ of $i$ observed intensities can be modeled as the sum of an ideal spectrum $s$ and a background $b$, convolved with a blurring function $p$, with noise $n$ added to the result: $x = (s + b) * p + n$, where $*$ denotes convolution. The noise is often taken to be Gaussian or Poissonian. The goal is to recover $s$: $s = (x - n) * p^{-1} - b$, where $p^{-1}$ is the inverse of the blurring function. The difficulty is that knowledge of $p^{-1}$, $b$, and $n$ is often incomplete or totally lacking.
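
To make the forward model concrete, here is a minimal sketch that simulates $x = (s + b) * p + n$ with the Python standard library. Everything numeric in it is an assumption chosen for illustration (two spike-like peaks for $s$, a slowly drifting line for $b$, a Gaussian blurring kernel for $p$, Gaussian noise for $n$); the helper names `convolve_same` and `gaussian_kernel` are hypothetical and not taken from the authors' R scripts.

```python
import math
import random


def convolve_same(signal, kernel):
    """Discrete convolution with an odd-length kernel, output as long as signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i - (k - half)           # signal index paired with this kernel tap
            if 0 <= j < len(signal):     # samples outside the record count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def gaussian_kernel(sigma, radius):
    """Gaussian blurring function p, normalised to sum to one."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


n_points = 200
s = [0.0] * n_points                              # ideal spectrum: two sharp peaks
s[60], s[130] = 10.0, 6.0
b = [2.0 + 0.01 * j for j in range(n_points)]     # slowly drifting background
p = gaussian_kernel(sigma=1.5, radius=5)          # blurring function
rng = random.Random(0)                            # fixed seed for reproducibility
n = [rng.gauss(0.0, 0.05) for _ in range(n_points)]

blurred = convolve_same([s_j + b_j for s_j, b_j in zip(s, b)], p)
x = [c_j + n_j for c_j, n_j in zip(blurred, n)]   # observed intensities
```

In this simulation $s$, $b$, $p$ and $n$ are all known, which is exactly what is missing for real chromatograms.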

  10. Baseline removal by differentiation. Baseline: the line that serves as a base for measurements. It causes problems with the measurement of peak area, can negatively affect all subsequent steps, and therefore needs to be removed. Differentiation is a common approach to baseline removal: compute the vector $y_j = x_{j+1} - x_j$ for $j = 1, \ldots, J - 1$, where $x_j$ is the observation at the $j$-th retention time and $J$ is the number of retention time points.
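
The differencing formula can be sketched in a few lines. The toy signal and the function name `first_difference` are illustrative assumptions; the point is that a constant baseline vanishes after differencing and a linear drift shrinks to a constant offset.

```python
def first_difference(x):
    """Return y_j = x_{j+1} - x_j for j = 1, ..., J-1 (one point shorter than x)."""
    return [nxt - cur for cur, nxt in zip(x, x[1:])]


peak = [0, 0, 1, 4, 1, 0, 0]                            # toy peak, no baseline
with_offset = [value + 5 for value in peak]             # constant baseline of 5
with_drift = [value + 2 * j for j, value in enumerate(peak)]  # linear baseline

print(first_difference(peak))         # [0, 1, 3, -3, -1, 0]
print(first_difference(with_offset))  # [0, 1, 3, -3, -1, 0]
print(first_difference(with_drift))   # [2, 3, 5, -1, 1, 2]
```

The differenced signal has $J - 1$ points, so every later step works on a vector one point shorter than the raw chromatogram.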

  11. Baseline removal by differentiation

  12. Retention time alignment (COW). Why correlation optimised warping? Advantages of COW: it aligns profiles by matching shapes; the aligned profiles are as similar as possible while peak shape and area are preserved (automated COW).

  13. Reference chromatogram selection: similarity index. For a given chromatogram $y_T$, $\text{similarity index} = \prod_{i=1}^{I} |\rho(y_T, y_i)|$, where $\rho$ is Pearson's correlation coefficient between $y_T$ and $y_i$, and $0 \le \text{similarity index} \le 1$.
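
A minimal sketch of the similarity index, using only the standard library. The three toy chromatograms are made up, the function names are hypothetical, and taking the chromatogram with the largest index as the reference is an assumption about how the index is used, not something the slide states.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def similarity_index(target, chromatograms):
    """Product of |rho(target, y_i)| over all chromatograms y_i."""
    return math.prod(abs(pearson(target, y)) for y in chromatograms)


def pick_reference(chromatograms):
    """Index of the chromatogram with the largest similarity index (assumed rule)."""
    scores = [similarity_index(y, chromatograms) for y in chromatograms]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: the same peak shifted by zero, one and two retention time points.
chromatograms = [
    [0, 1, 3, 5, 3, 1, 0, 0, 0],
    [0, 0, 1, 3, 5, 3, 1, 0, 0],
    [0, 0, 0, 1, 3, 5, 3, 1, 0],
]
reference, scores = pick_reference(chromatograms)
print(reference)  # 1
```

The product includes the term $i = T$, which equals 1 and does not change the ranking. In the toy data the middle chromatogram wins because it is the closest in shape to both of the others.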

  14. Theory: COW on two chromatograms. $m$: segment length; $t$: slack size; $L_P + 1$, $L_T + 1$: the number of data points in profiles $P$ and $T$; $N = L_P / m$: the number of segments; $\Delta = L_T / N - m$: the difference in segment length between $P$ and $T$; $(\Delta - t;\ \Delta + t)$: the interval in which warpings are allowed; $x_i$: the position of the beginning of segment $i$ after warping.
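
The quantities defined on this slide can be computed directly. The sketch below only evaluates $N$, $\Delta$ and the allowed warping interval; it does not perform the warping itself. The function name and the example lengths are illustrative, and it assumes that $m$ divides $L_P$ evenly.

```python
def cow_segment_parameters(len_p, len_t, m, t):
    """Compute N, Delta and the warping interval from the slide's definitions.

    len_p and len_t are L_P and L_T, i.e. the number of data points in
    profiles P and T minus one; m is the segment length, t the slack size.
    """
    if len_p % m != 0:
        # Simplifying assumption of this sketch: segments tile P exactly.
        raise ValueError("segment length m must divide L_P")
    n_segments = len_p // m                 # N = L_P / m
    delta = len_t / n_segments - m          # Delta = L_T / N - m
    warp_interval = (delta - t, delta + t)  # (Delta - t; Delta + t)
    return n_segments, delta, warp_interval


# Hypothetical profiles: P has 101 points, T has 111 points.
print(cow_segment_parameters(len_p=100, len_t=110, m=20, t=3))
# (5, 2.0, (-1.0, 5.0))
```

A larger slack $t$ widens the interval and allows more flexible alignment, at the cost of a larger search space.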

  15. COW on chromatograms for individual varieties

  17. Individual peaks - for individual chromatograms

  18. Second difference and its smoothing

  19. Common peaks - sum of all individual peaks

  20. Common peaks - deconvolution problem

  23. Statistical analysis: mixed linear model. $y$ is the observation of a peak, i.e. the content of a metabolite in a sample: $y = \mu + \text{Variety} + \text{Drought treatment} + \text{Variety} \times \text{Drought treatment} + e$, where the product term is the interaction. Steps: (1) log transformation; (2) analysis of variance by REML (all effects fixed); (3) selection of significant peaks by tests based on the F approximation with Bonferroni correction; (4) hierarchical group-average clustering (UPGMA), boxplots and correlation analysis based on the significant peaks.
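
Step (3) can be sketched as follows. This is not the Genstat analysis: the p-values are made up, the significance level of 0.05 is an assumption (the slide does not state one), and `bonferroni_select` is a hypothetical helper name. It only shows how a Bonferroni correction turns per-peak p-values into a list of significant peaks.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of tests with p < alpha / m, and the corrected threshold."""
    threshold = alpha / len(p_values)   # Bonferroni: divide alpha by the number of tests
    selected = [i for i, p in enumerate(p_values) if p < threshold]
    return selected, threshold


# Hypothetical p-values of the F tests for five peaks.
p_values = [0.0001, 0.02, 0.004, 0.30, 0.012]
selected, threshold = bonferroni_select(p_values)
print(selected)  # [0, 2]
```

With many peaks tested at once the corrected threshold becomes strict, which keeps the family-wise error rate at the chosen level at the price of lower power.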

  24. Conclusions. A large set of chromatographic data was analysed within a reasonably short time on computing clusters at the Poznan Supercomputing and Networking Centre. Statistical analysis was performed on a 3-factorial experiment with a large number of data: 100 lines (a population of recombinant inbred lines derived from a cross between European and Syrian barley), about 100 metabolites, treatment and control, 2 time points, 3 biological replications; in total about 120 000 observations. Correlation analysis was done.

  25. Acknowledgements. Piasecka, A.; Sawikowska, A.; Kuczyńska, A.; Krystkowiak, K.; Mikołajczak, K.; Ogrodowicz, P.; Gudyś, K.; Guzy-Wróbelska, J.; Krajewski, P.; Kachlicki, P. Drought related secondary metabolites of barley (Hordeum vulgare L.) leaves and their mQTLs. The Plant Journal, doi: 10.1111/tpj.13430, accepted. dr Anna Piasecka (1, 2), prof. Piotr Kachlicki (1). (1) IPG PAS, (2) IBC PAS.
