“Interrogating the Gut Microbiome: Estimation of Growth Dynamics and Prediction of Biosynthetic Gene Clusters” Hongzhe Li, Perelman Professor of Biostatistics, Epidemiology and Informatics; Professor of Biostatistics and Statistics; Vice Chair of Integrative Research; Director, Center for Statistics in Big Data; Perelman School of Medicine, University of Pennsylvania
Interrogating the Gut Microbiome: Estimation of Growth Dynamics and Prediction of Biosynthetic Gene Clusters Hongzhe Li University of Pennsylvania 05/01/2020 1
Microbiome and its Function https://ep.bmj.com/content/102/5/257 (Amon and Sanderson, 2016) 2
The Human Microbiome and Cancer Rajagopala (2017 Cancer Prevention Research). Question - microbiome-based individual treatment assignment? 3
Microbiome, metabolites and immunology Levy, Blacher and Elinav (2017, Current Opinion in Microbiology) Question: how microbiome produces different metabolites? 4
Shotgun Metagenomics Slide from Katie Pollard Question: can we understand the growth dynamics? 5
Microbiome configurations/features in shotgun metagenomic data. Static features: composition of taxa; microbial genes/gene set or pathway abundance; diversity of microbes; metagenomic SNPs/structural variants. Dynamic features: bacterial growth rates; dynamic interactions. Statistical questions - how to quantify and model these features? 6
Topics to be discussed. Basic microbiology science: estimation of bacterial growth dynamics based on genome assemblies. Functional microbiome: deep learning approach for predicting biosynthetic gene clusters. 7
Bacterial Growth Dynamics in Metagenomics Pienkowska et al., 2019. 8
Bacterial DNA Replication and Growth Dynamics. Uneven coverage of read counts reveals bacterial growth rates. Growth dynamics for species with complete genome sequences: Korem et al. 2015 Science. Growth dynamics for genome assemblies (new species): Brown et al. 2016 Nature Biotechnology; Gao and Li, 2018 Nature Methods. 9
Genome assemblies from shotgun data Sangwan et al (2016): Microbiome 10
Illustration of the Statistical/Computational Problem. For a given bacterium: 11
Coverages of contigs - 6 PLEASE samples. Top 3: normal. Bottom 3: IBD patients. [Figure: six panels of normalized log-coverage (y-axis) against unordered contig index, 0 to 140 (x-axis).] 12
PCA vs Coverages - 6 PLEASE samples. Top 3: normal. Bottom 3: IBD patients. [Figure: six panels of normalized log-coverage (y-axis) against ordered contig index, 0 to 140 (x-axis).] 13
Optimal permutation recovery
For a given assembly bin (species), Permuted Monotone Matrix Model: $X$ is GC-adjusted log-read counts along the genome, with $n$ samples and $p$ contigs,
$$Y_{n \times p} = \pi(X_{n \times p}), \qquad X_{n \times p} = \Theta_{n \times p} + Z_{n \times p},$$
where $X, \Theta, Z \in \mathbb{R}^{n \times p}$, $\pi$ is a column-permutation operator, and
$$\Theta \in D = \left\{ \Theta = (\theta_{ij}) : 0 < \theta_{i,j} \le \theta_{i,j+1} < \infty, \ \forall i, j \right\}.$$
$Z$: some additive noise (i.i.d. Gaussian, $N(0, \sigma^2)$). The goal is to recover $\pi$ based on observed $Y$.
Solution: 1st PC, $\hat{\pi} = r(\hat{w}_1^\top Y)$ as an estimate of $\pi$, where $\hat{w}_1$ is the vector of loading coefficients of the 1st PC. 14
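The first-PC solution can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes $r(\cdot)$ is the ranking operator, that the PCA is taken after centering each sample (row) across contigs, and that the sign of $\hat{w}_1$ is fixed so its entries sum to a nonnegative value. The function name `recover_order` and all simulation settings are hypothetical.

```python
import numpy as np

def recover_order(Y):
    """Order the columns (contigs) of Y by their first-PC score.

    Assumptions (not from the slides): rows are centered across contigs
    before the SVD, and the loading sign is chosen so that sum(w1) >= 0.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)            # center each sample across contigs
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)  # left singular vectors = loadings over samples
    w1 = U[:, 0]
    if w1.sum() < 0:                                  # resolve the sign ambiguity of the PC
        w1 = -w1
    scores = w1 @ Y                                   # one score per contig
    return np.argsort(scores), scores

# Simulated data: monotone rows theta_ij = a_i * eta_j + b_i, plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, sigma = 20, 50, 0.005
eta = np.linspace(0.0, 1.0, p)
a = rng.uniform(0.5, 1.5, size=n)
b = rng.uniform(0.5, 1.0, size=n)
X = np.outer(a, eta) + b[:, None] + sigma * rng.normal(size=(n, p))

perm = rng.permutation(p)     # unknown column permutation
Y = X[:, perm]                # observed, column-shuffled matrix

order, scores = recover_order(Y)
recovered = perm[order]       # original contig index at each recovered position
print(np.array_equal(recovered, np.arange(p)))
```

With a small noise level relative to the spacing of the $\eta_j$, sorting the contigs by score puts the shuffled columns back in their original order.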
Theoretical Properties (Ma, Cai and Li 2020 JASA)
Linear growth model - the parameter space for $\Theta$:
$$D_L = \left\{ \Theta \in \mathbb{R}^{n \times p} : \theta_{ij} = a_i \eta_j + b_i, \text{ where } a_i, b_i \ge 0 \text{ for } 1 \le i \le n, \ 0 \le \eta_j \le \eta_{j+1} \text{ for } 1 \le j \le p - 1 \right\}.$$
A key quantity:
$$\Gamma(\Theta) = \left( n^{-1} \sum_{i=1}^{n} a_i^2 \right)^{1/2} \cdot \min_{1 \le i < j \le p} |\eta_i - \eta_j|.$$
Theorem (Exact Recovery). Suppose the noise $Z$ are i.i.d. $N(0, \sigma^2)$. Then under some mild conditions, whenever $\Gamma \gtrsim \sigma \sqrt{\log p / n}$, we have $\hat{\pi} = \pi$ with probability at least $1 - p^{-c}$. 15
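To make the key quantity concrete, the sketch below computes $\Gamma(\Theta)$ from slopes $a_i$ and positions $\eta_j$, together with the rate $\sigma \sqrt{\log p / n}$. The function names are hypothetical. The constant hidden in the theorem's condition is not given on the slide, so the rate is shown only as a scale for comparison, not as an exact threshold.

```python
import numpy as np

def gamma_quantity(a, eta):
    """Gamma(Theta) = sqrt(mean(a_i^2)) * min_{i<j} |eta_i - eta_j|."""
    a = np.asarray(a, dtype=float)
    eta_sorted = np.sort(np.asarray(eta, dtype=float))
    # The minimum pairwise distance is attained by neighbours after sorting.
    min_gap = np.min(np.diff(eta_sorted))
    return np.sqrt(np.mean(a ** 2)) * min_gap

def recovery_rate(sigma, n, p):
    """sigma * sqrt(log p / n); the theorem's constant factor is unspecified."""
    return sigma * np.sqrt(np.log(p) / n)

a = [3.0, 4.0]            # slopes for n = 2 samples (illustrative values)
eta = [0.0, 1.0, 3.0]     # contig positions for p = 3 contigs (illustrative values)
g = gamma_quantity(a, eta)
rate = recovery_rate(sigma=0.1, n=len(a), p=len(eta))
print(g, rate)
```

Larger slopes or better-separated contig positions increase $\Gamma$, which makes the exact-recovery condition easier to satisfy.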
Estimation of PTR
Proposed estimators of peak/trough coverage, $\hat{\Theta}_{\max}$ / $\hat{\Theta}_{\min}$:
1. Obtain the optimal permutation estimator $\hat{\pi}$ to reorder the columns (contigs);
2. Fit simple linear regression for each row (sample);
3. Define $\hat{\Theta}_{\max}$ and $\hat{\Theta}_{\min}$ as the fitted maximum and minimum values.
$\Rightarrow$ DEMIC algorithm.
Optimal and adaptive estimation of PTR and the two extreme values (peak and trough) for general growth model. Ma, Cai and Li: 2020 submitted 16
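The three steps above can be sketched as follows. This is a minimal illustration, not the DEMIC implementation. It assumes the regression covariate is the contig's position in the recovered order, and that coverage is on a log scale whose base is supplied through `log_base`, so the ratio is `log_base ** (max - min)`. Neither choice is stated on the slide, and the names `estimate_ptr` and `log_base` are hypothetical.

```python
import numpy as np

def estimate_ptr(Y, order, log_base=2.0):
    """Estimate peak/trough log-coverage and their ratio for each sample.

    Y        : (n samples x p contigs) matrix of log-coverages
    order    : contig ordering from the permutation-recovery step
    log_base : base of the logarithm used for Y (assumption)
    """
    Y_ord = Y[:, order]                        # step 1: reorder the contigs
    p = Y_ord.shape[1]
    x = np.arange(p, dtype=float)              # covariate: position in the order (assumption)
    A = np.column_stack([x, np.ones(p)])       # design matrix [slope, intercept]
    coef, *_ = np.linalg.lstsq(A, Y_ord.T, rcond=None)   # step 2: one fit per sample
    fitted = A @ coef                          # shape (p, n)
    theta_max = fitted.max(axis=0)             # step 3: fitted peak per sample
    theta_min = fitted.min(axis=0)             #         fitted trough per sample
    ptr = log_base ** (theta_max - theta_min)  # back-transform the log difference
    return theta_max, theta_min, ptr

# Two samples with exactly linear log2-coverage along four ordered contigs.
Y_demo = np.array([[0.0, 1.0, 2.0, 3.0],
                   [1.0, 1.5, 2.0, 2.5]])
theta_max, theta_min, ptr = estimate_ptr(Y_demo, order=np.arange(4))
print(theta_max, theta_min, ptr)
```

Taking the extremes of the fitted line rather than of the raw coverages makes the estimate less sensitive to noise in any single contig.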
" # ! /"01-*)+-$#&$,-2-# DEMIC Software =")A *$9",)2" D,$.2@& Dynamics Estimator of Microbial Communities (DEMIC) *$9",)2" !"#$%"&$'& 3-4-,"*+-$#)1&,"01-*)+-$#& https://github.com/scottdaniel/sbx demic (Scott Daniel) =")AB+$B+,$.2@&,)+-$&$'&*$9",)2"&C@"#& ()*+",-.% $'&()*+",-)1&2"#$%" *$%01"+"&2"#$%"&-5&)9)-1)(1" $ H 6$2 7 8*$9",)2":& < 6$2 7 8*$9",)2":& -#&5)%01"5 -#&5)%01"&; >$#+-25 ! 3-##"4&*$#+-25 $'& *$#5-5+"#*K )&50"*-"5 ! 6$2 7 8*$9",)2":& 6$2 7 8*$9",)2":& 7 -#&5)%01"&< -#&5)%01"5 =>?&'$,&,"1)+-9"&4-5+)#*"& E)%01"&'-1+,)+-$# -#'","#*"&$'&*$#+-25 6-#"),&,"2,"55-$#&()5"4&$#&+@"& -#'",,"4&,"1)+-9"&4-5+)#*"5 E"F."#*-#2&*$9",)2"5& -#&51-4-#2&C-#4$C5 >.%.1)+-9"&0,$()(-1-+K < 7 6GG&'$,&*$,,"*+-#2& >$#+-2 '-1+,)+-$# !>&(-)5 DC$&,)#4$%&5.(5"+5& I+",)+"&.#+-1&*$#9",2"#*"&'$,& I+",)+-$#&'$,&)11&)9)-1)(1"& $'&*$#+-25 ")*@&5.(5"+ 50"*-"5J(-##-#25 17
Penn PLEASE Study (Lewis et al. (2015): Cell Host & Microbe)
PLEASE (Pediatric Crohn’s Disease) study at Penn: 90 × 4 shotgun metagenomic samples and 26 normal children (average 11 × 10^6 paired-end reads).
Outcome: fecal calprotectin (FCP) (reduction below 250 mcg/g). Metabolomics: fecal metabolites.
Design: 90 children with active Crohn’s disease; treatment at discretion of treating physician: diet therapy (n=38) or anti-TNF therapy (n=52).
Anti-TNF: 26 (50%) a reduction in FCP below 250 mcg/g.
Enteral diet: 12 (32%) a reduction in FCP below 250 mcg/g.
Baseline: stool microbiome, dietary recalls x 3, FCP, PCDAI
Week 1: stool microbiome, dietary recalls x 3, FCP
Week 4: stool microbiome, dietary recalls x 3, FCP
Week 8: stool microbiome, dietary recalls x 3, FCP, PCDAI
Lewis, Chen et al. (2015): Cell Host & Microbe. 18
Species with differential growth dynamics
DEMIC estimated growth dynamics for 278 species, 20% in 50 or more samples.
The assembly quality and marker lineage of seven contig clusters with different growth rates in healthy and Crohn’s disease samples of the PLEASE data set (FDR < 0.05):
Contig cluster   Completeness   Contamination   Control vs Crohn’s   Marker lineage
metabat2.187     61.7%          0               High                 kBacteria
metabat2.239     58.5%          1.8%            High                 oClostridiales
metabat2.250     66.6%          0.8%            High                 pProteobacteria
metabat2.259     79.3%          2.1%            High                 kBacteria
metabat2.270     72.0%          2.0%            High                 fLachnospiraceae
metabat2.369     68.8%          2.8%            High                 fLachnospiraceae
metabat2.55      55.2%          1.9%            Low                  oClostridiales
19
Shift of growth dynamics after treatment. oClostridiales, oClostridiales, kbacteria (uncharacterized). [Figure: sample number by type (Control; Crohn Baseline, Crohn Week 1, Crohn Week 4, Crohn Week 8); ePTR by disease status (Control vs. Crohn) and ePTR by time point (1 to 4) for metabat2.239, metabat2.259 and metabat2.55.] 20