Large chromatographic data sets analysis on the example of metabolomic data
Aneta Sawikowska 1,2, Paweł Krajewski 2
1 Poznan University of Life Sciences, Poznan, Poland
2 Institute of Plant Genetics, Polish Academy of Sciences, Poznan, Poland
02.12.2016

Plan
1 Introduction
2 Parameters of experimental design
3 Preprocessing
4 Statistical analysis
5 Correlation network analysis
6 Conclusions

Peaks can be interpreted as intervals in which a metabolite, or a group of metabolites with similar properties, occurs.

v varieties, d drought treatments, p time points, r replications
Total: about 12 million observations

Parameters of experimental design:
9 varieties,
3 drought treatments (I, II, I+II) and control,
8 time points,
4 biological replications.
Preprocessing: own scripts in the R system.
Statistical analysis: Genstat.
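The size of this design can be checked with a little arithmetic. The sketch below assumes one chromatogram per combination of variety, treatment, time point and replicate, and it assumes that the "about 12 million observations" quoted earlier counts individual intensity readings; neither assumption is stated on the slides, so the implied chromatogram length is only indicative.

```python
# Design factors as listed on the slide.
varieties = 9
treatments = 3 + 1        # three drought treatments (I, II, I+II) plus control
time_points = 8
replications = 4

# Assumption: one chromatogram per variety x treatment x time point x replicate.
n_chromatograms = varieties * treatments * time_points * replications

# Assumption: the quoted total counts intensity readings,
# i.e. chromatograms x retention-time points.
total_observations = 12_000_000
points_per_chromatogram = total_observations / n_chromatograms

print(n_chromatograms)                  # 1152
print(round(points_per_chromatogram))   # roughly ten thousand points each
```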

Preprocessing steps: baseline removal (differentiation), retention time alignment (COW), peak detection.
Different baseline for the same data

The baseline-estimation problem
A given vector $x = \{x_1, x_2, \ldots, x_i\}$ of $i$ observed intensities can be modeled as the sum of an ideal spectrum $s$ and a background $b$, convolved with a blurring function $p$, with noise $n$ added to the result:
$x = (s + b) \ast p + n$
with $\ast$ denoting convolution. The noise is often taken to be Gaussian or Poissonian.
The problem is to recover $s$, hence
$s = (x - n) \ast p^{-1} - b$
with $p^{-1}$ being the inverse of the blurring function.
The problem is that knowledge of $p^{-1}$, $b$, and $n$ is often incomplete or totally lacking.
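To make the forward model x = (s + b) * p + n concrete, here is a minimal simulation in plain Python. The peak shape, the linear drift, the kernel width and the noise level are all hypothetical values chosen for illustration; nothing here is taken from the barley data.

```python
import math
import random

def convolve_same(signal, kernel):
    """Discrete convolution with zero padding; output has len(signal) points."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k          # kernel is centred on position i
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out

random.seed(0)
n_points = 200

# Ideal spectrum s: a single Gaussian peak (hypothetical position and width).
s = [10.0 * math.exp(-0.5 * ((j - 100) / 4.0) ** 2) for j in range(n_points)]
# Background b: a slowly rising linear drift (hypothetical).
b = [2.0 + 0.01 * j for j in range(n_points)]
# Blurring function p: Gaussian kernel normalised to sum to 1.
raw = [math.exp(-0.5 * (k / 2.0) ** 2) for k in range(-6, 7)]
total = sum(raw)
p = [w / total for w in raw]
# Noise n: Gaussian.
n = [random.gauss(0.0, 0.05) for _ in range(n_points)]

# Observed intensities: x = (s + b) * p + n
blurred = convolve_same([si + bi for si, bi in zip(s, b)], p)
x = [v + e for v, e in zip(blurred, n)]
```

Recovering s from x alone is the hard direction: in this toy setting b, p and n are known by construction, whereas in practice they are not.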

Baseline removal by differentiation
Baseline: a line that is a base for measurements. It leads to problems with the measurement of peak area, can negatively affect all subsequent steps, and needs to be removed.
Differentiation: a common approach to removing the baseline by calculating the vectors
$y_j = x_{j+1} - x_j$, for $j = 1, \ldots, J - 1$,
where $x_j$ is the observation at the $j$-th retention time and $J$ is the number of retention time points.
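The first-difference formula is short enough to sketch directly. The example below uses a hypothetical toy chromatogram (a small triangular peak sitting on a constant offset) to show that the offset disappears after differencing; indices are 0-based in the code, 1-based in the formula.

```python
def first_difference(x):
    """Return y_j = x_{j+1} - x_j for every consecutive pair of points."""
    return [x[j + 1] - x[j] for j in range(len(x) - 1)]

# Hypothetical data: a triangular peak on a constant baseline of 5.
baseline = 5.0
peak = [0, 0, 1, 3, 6, 3, 1, 0, 0]
x = [baseline + v for v in peak]

y = first_difference(x)
print(y)  # [0.0, 1.0, 2.0, 3.0, -3.0, -2.0, -1.0, 0.0]
```

A constant baseline is removed exactly, and a linear drift is reduced to a constant offset, which is why the method works best when the baseline changes slowly compared with the peaks.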

Retention time alignment (COW)
Why correlation optimised warping? Advantages of COW are:
it aligns profiles by matching shapes,
the profiles are as similar as possible while preserving the peak shape and area (automated COW).

Reference chromatogram selection
Similarity index for a given chromatogram $y_T$:
$\text{similarity index} = \prod_{i=1}^{I} |\rho(y_T, y_i)|$,
where $\rho$ is Pearson's correlation coefficient between $y_T$ and $y_i$, and
$0 \le \text{similarity index} \le 1$.
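A minimal sketch of the similarity index follows, using three hypothetical toy chromatograms whose single peak is shifted by one point each time. It assumes the reference is taken to be the chromatogram with the largest index; the slide defines the index but does not spell out the selection rule, so treat that step as an assumption. The function names are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)   # fails on constant chromatograms

def similarity_index(target, chromatograms):
    """Product of |rho(target, y_i)| over all chromatograms y_i."""
    index = 1.0
    for y in chromatograms:
        index *= abs(pearson(target, y))
    return index

def pick_reference(chromatograms):
    """Assumed rule: the chromatogram with the highest similarity index."""
    scores = [similarity_index(y, chromatograms) for y in chromatograms]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical data: the same peak at three neighbouring positions.
chromatograms = [
    [0, 0, 0, 1, 4, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 4, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 4, 1, 0, 0],
]
best, scores = pick_reference(chromatograms)
print(best)  # 1: the middle chromatogram is closest to both others
```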

Theory - COW on two chromatograms
$m$ - "segment length", $t$ - "slack size"
$L_P + 1$, $L_T + 1$ - the number of data points in profiles $P$ and $T$
$N = L_P / m$ - the number of segments
$\Delta = L_T / N - m$ - the difference in segment length in $P$ and $T$
$(\Delta - t;\ \Delta + t)$ - the interval in which warpings are allowed
$x_i$ - the position of the beginning of segment $i$ after warping
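The definitions above can be turned into a small calculation. The sketch below computes N and Δ and then, for each segment boundary x_i, the range of positions that keeps every warped segment length within m + Δ - t and m + Δ + t while still ending exactly at L_T. It covers only this bookkeeping step, not the search for the best warping; the input values and function names are hypothetical, and m is assumed to divide L_P.

```python
def cow_parameters(L_P, L_T, m):
    """Number of segments N and mean change in segment length delta."""
    N = L_P // m              # assumes m divides L_P exactly
    delta = L_T / N - m
    return N, delta

def boundary_bounds(L_P, L_T, m, t):
    """Allowed (lowest, highest) position of each segment start x_i, i = 0..N."""
    N, delta = cow_parameters(L_P, L_T, m)
    shortest = m + delta - t  # minimum warped segment length
    longest = m + delta + t   # maximum warped segment length
    bounds = []
    for i in range(N + 1):
        # Reachable from the start with i segments, and able to reach
        # the end point L_T with the remaining N - i segments.
        lowest = max(i * shortest, L_T - (N - i) * longest)
        highest = min(i * longest, L_T - (N - i) * shortest)
        bounds.append((lowest, highest))
    return bounds

# Hypothetical example: profile P has 101 points, target T has 111 points.
N, delta = cow_parameters(L_P=100, L_T=110, m=20)
bounds = boundary_bounds(L_P=100, L_T=110, m=20, t=2)
print(N, delta)    # 5 2.0
print(bounds[1])   # (20.0, 24.0)
```

The first and last boundaries are pinned to 0 and L_T, so only the interior boundaries are free to move; COW then chooses among the allowed positions the ones that maximise the correlation between the warped segments and the target.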

COW on chromatograms for individual varieties

Individual peaks - for individual chromatograms

Second difference and its smoothing
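The slide names the technique without giving details, so the following is only one plausible reading, not the authors' procedure: take the second difference of the chromatogram, smooth it with a moving average, and flag local minima that fall below a negative threshold as peak candidates. The window width, the threshold and the test signal are hypothetical.

```python
import math

def second_difference(x):
    """d2_j = x_{j+1} - 2*x_j + x_{j-1}, defined for interior points only."""
    return [x[j + 1] - 2 * x[j] + x[j - 1] for j in range(1, len(x) - 1)]

def moving_average(v, width):
    """Centred moving average; windows are truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(v)):
        window = v[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def peak_candidates(x, width=3, threshold=0.01):
    """Indices of x where the smoothed second difference has a deep local minimum."""
    d2 = moving_average(second_difference(x), width)
    peaks = []
    for k in range(1, len(d2) - 1):
        if d2[k] < -threshold and d2[k] <= d2[k - 1] and d2[k] < d2[k + 1]:
            peaks.append(k + 1)   # d2[k] is centred on x[k + 1]
    return peaks

# Hypothetical signal: two Gaussian peaks at positions 20 and 50.
x = [math.exp(-0.5 * ((j - 20) / 3.0) ** 2)
     + 0.6 * math.exp(-0.5 * ((j - 50) / 3.0) ** 2)
     for j in range(80)]
print(peak_candidates(x))  # [20, 50]
```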

Common peaks - sum of all individual peaks

Common peaks - deconvolution problem

Statistical analysis - Mixed linear model
$y$ - observation of a peak, the content of a metabolite in a sample
$y = \mu + \text{Variety} + \text{Drought treatment} + \text{Variety} \times \text{Drought treatment} + e$
1 Log transformation.
2 Analysis of variance by REML (all effects fixed).
3 Significant peaks selection by tests based on F approximation with Bonferroni correction.
4 The hierarchical group-average method (UPGMA), boxplots and correlation analysis based on significant peaks.
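Step 3 can be illustrated with a short sketch of the Bonferroni selection rule: a peak is kept when its p-value is below the significance level divided by the number of peaks tested. The p-values, the peak names and the level of 0.05 are hypothetical; in the analysis described here the p-values would come from the F tests computed in Genstat.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return the names of tests significant after Bonferroni correction."""
    threshold = alpha / len(p_values)   # per-test significance level
    selected = [name for name, p in p_values.items() if p < threshold]
    return selected, threshold

# Hypothetical p-values for four peaks.
p_values = {
    "peak_01": 1e-6,
    "peak_02": 0.004,
    "peak_03": 0.03,
    "peak_04": 0.2,
}
selected, threshold = bonferroni_select(p_values)
print(threshold)  # 0.0125
print(selected)   # ['peak_01', 'peak_02']
```

Note that peak_03 would pass an uncorrected test at 0.05 but is dropped after correction, which is the intended effect when many peaks are tested at once.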

Conclusions
A large set of chromatographic data was analysed within a reasonably short time on computing clusters at Poznan Supercomputing and Networking Centre.
Statistical analysis was performed on a 3-factorial experiment with a large amount of data:
100 lines (the population of recombinant inbred lines derived from the cross between European and Syrian barley),
about 100 metabolites,
treatment and control,
2 time points,
3 biological replications,
total: about 120 000 observations.
Correlation analysis was done.
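The quoted total follows from multiplying the design factors, as the short check below shows. It assumes exactly 100 metabolites (the slide says "about 100") and one observation per combination of factors.

```python
lines = 100
metabolites = 100      # "about 100" on the slide; exact value assumed here
conditions = 2         # treatment and control
time_points = 2
replications = 3

total = lines * metabolites * conditions * time_points * replications
print(total)  # 120000
```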

Acknowledgements
dr Anna Piasecka 1,2, prof. Piotr Kachlicki 1
1 IPG PAS, 2 IBC PAS
Piasecka, A.; Sawikowska, A.; Kuczyńska, A.; Krystkowiak, K.; Mikołajczak, K.; Ogrodowicz, P.; Gudyś, K.; Guzy-Wróbelska, J.; Krajewski, P.; Kachlicki, P., Drought related secondary metabolites of barley (Hordeum vulgare L.) leaves and their mQTLs, The Plant Journal, doi: 10.1111/tpj.13430, accepted.
