Robust and Scalable Models of Microbiome Dynamics
Travis E. Gibson¹ (tgibson@mit.edu), Georg K. Gerber¹,² (ggerber@bwh.harvard.edu)
¹ Massachusetts Host Microbiome Center, Brigham and Women's Hospital & Harvard Medical School
² Harvard-MIT Health Sciences and Technology
35th International Conference on Machine Learning, July 12th, 2018
Poster #2
Outline
1. What is the microbiome
2. From sequences to reads to microbial interactions
3. Bayesian nonparametric model for microbe dynamics
The Microbiome
1. The microbiome is the aggregate of microorganisms that resides on or within any of a number of human tissues and biofluids [Wikipedia]:
   • skin, mammary glands, placenta, seminal fluid, uterus, ovarian follicles, lung, saliva, oral mucosa, conjunctiva, biliary and gastrointestinal tracts
2. $10^{14}$ microbes in/on your body [Sender et al. PLoS Biology 2016]
3. 3.3 million genes compared to 23,000 human genes [Qin et al. Nature 2010]
4. Plays a role in a variety of human diseases:
   • infections, arthritis, food allergy, cancer, inflammatory bowel disease, neurological diseases, and obesity/diabetes
Bacteriotherapy
Bacteriotherapy: communities of bacteria administered to patients for specific therapeutic applications
• "bugs-as-drugs"
Clostridium difficile infection
• Causes serious diarrhea (14K deaths/yr)
• Antibiotics disrupt helpful bacteria in gut
• Increasingly difficult to treat with conventional therapies (more antibiotics): 20-30% recurrence rate
Pharmacology meets Ecology
• Positive interaction: microbe A produces a small molecule (metabolite) that microbe B needs
• Negative interaction: two microbes competing for the same niche
• What if there were 300 bugs in the network?
[Figure: microbial interaction network, including C. diff]
Workflow in our lab
Experiments: batch experiments, chemostat, animal experiments
• 16S rRNA on MiSeq (reads) for relative abundances of species
  [Figure: relative abundance vs. time]
• 16S rRNA qPCR (universal primers) for bacterial biomass
  [Figure: total biomass vs. time]
Sequencing to obtain microbe relative abundances
• 16S rRNA on MiSeq (reads) for relative abundances of species
1. Fastq file. Each record holds a sequence id, the nucleobase sequence, a "+" line, and the quality scores:
    @sequence-id
    TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
    +
    <>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=
2. Sequences are clustered
3. Read Table
                 sample1   sample2   ...
    bacteria1         11      1004
    bacteria2          0         7
    bacteria3        301       275
    ...
• Measurements are irregular, sparse & noisy
• Reads ∼ Negative Binomial
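The four-line Fastq record shown on this slide can be read with a few lines of code. The following is a minimal sketch, not the lab's pipeline: the function name `parse_fastq` is hypothetical, and decoding the quality string with a Phred+33 offset is an assumption, since the slide does not state the encoding.

```python
def parse_fastq(text):
    """Parse FASTQ text into (read_id, sequence, quality_scores) tuples.

    Assumes four lines per record and Phred+33 quality encoding
    (an assumption; the slide does not name the encoding).
    """
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    records = []
    for start in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # One integer quality score per base.
        scores = [ord(ch) - 33 for ch in quality]
        records.append((header[1:], sequence, scores))
    return records


# The record printed on the slide.
example = (
    "@sequence-id\n"
    "TCGCACTCAACGCCCTGCATATGACAAGACAGAATC\n"
    "+\n"
    "<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=\n"
)
records = parse_fastq(example)
read_id, sequence, scores = records[0]
```

Clustering the parsed sequences and counting reads per cluster and sample would then yield a read table like the one on the slide.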
Quantitative PCR for total bacterial biomass
• 16S rRNA qPCR (universal primers) for bacterial biomass
• Inputs: control templates and library
• Dilute control template and library for quantification
[Figure: Amplification Plot, relative fluorescence vs. cycle, for a dilution series of 1 µg, 0.1 µg, 10 ng, 1 ng, 0.1 ng, 10 pg, 1 pg]
[Figure: Standard Curve, Ct vs. initial quantity]
• Measurements are irregular, sparse & noisy
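A standard curve relates Ct to the logarithm of the initial quantity, so an unknown sample can be quantified by inverting a fitted line. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the standard quantities follow the dilution series on the slide (converted to ng), and the Ct values are synthetic placeholders generated from an assumed line (slope -3.3, intercept 20), not measurements.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the initial quantity."""
    return 10 ** ((ct - intercept) / slope)


# Dilution series from the slide, expressed in ng.
standards_ng = [1000.0, 100.0, 10.0, 1.0, 0.1, 0.01, 0.001]
# Synthetic placeholder Ct values from an assumed line (not real data).
standards_ct = [20.0 - 3.3 * math.log10(q) for q in standards_ng]

slope, intercept = fit_standard_curve(standards_ng, standards_ct)
unknown_ng = quantity_from_ct(23.3, slope, intercept)
```

With real data the fit would not be exact, and the residual scatter around the line is one source of the measurement noise mentioned on the slide.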
Irregular and sparse measurements
[Figure: abundance vs. time, with the measurements and the perturbation marked]
• Measurements are irregular, sparse & noisy
Learning microbial interaction networks
Abundance of microbe i at time t: $x_{t,i}$
$$\frac{dx_{t,i}}{dt} = \alpha_i x_{t,i} + \beta_{ii} x_{t,i}^2 + \sum_{j \neq i} \beta_{ij} x_{t,i} x_{t,j}$$
The three terms are growth, self limiting, and interaction. The coefficients $\beta_{ij}$ define the interaction network.
Previous literature specific to the microbiome
• Log transform dynamics → Linear Regression + L2 [Stein et al. PLoS Comput Biology 2013]
• Sparse linear regression with bootstrap aggregation [Fisher et al. PLoS One 2014]
• Bayesian model with deterministic dynamics (independent filtering) [Bucci et al. Genome Biology 2016]
• Extended Kalman Filter [Alshawaqfeh et al. BMC Genomics 2017]
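The growth, self-limiting, and interaction terms on this slide can be evaluated and integrated directly. The following is a minimal sketch using forward Euler integration. The function names, the two-microbe parameter values, and the step size are all hypothetical choices for illustration.

```python
def glv_rhs(x, alpha, beta):
    """Right-hand side dx_i/dt = alpha_i x_i + sum_j beta_ij x_i x_j.

    The j == i term of the sum is the self-limiting term beta_ii x_i^2.
    """
    n = len(x)
    return [
        x[i] * (alpha[i] + sum(beta[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]


def integrate(x0, alpha, beta, dt, steps):
    """Forward Euler integration; returns the list of visited states."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = glv_rhs(x, alpha, beta)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trajectory.append(list(x))
    return trajectory


# Hypothetical two-microbe system with mutual competition.
alpha = [1.0, 1.0]
beta = [[-1.0, -0.5],
        [-0.5, -1.0]]
trajectory = integrate([0.1, 0.2], alpha, beta, dt=0.01, steps=5000)
final_state = trajectory[-1]
```

For these illustrative parameters both abundances settle at the coexistence equilibrium 2/3, where growth is balanced by self limitation and competition.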
Goal with our model and short literature review
Three main contributions in our model
1. Clustering (interaction modules)
   • Dirichlet Process (DP) [Rasmussen, Advances in Neural Information Processing Systems, 2000] [Neal, Journal of Computational and Graphical Statistics, 2000]
2. Edge selection (structure learning, variable selection)
   • Bayesian Networks [George and McCulloch, Journal of the ASA, 1993] [Heckerman, A Tutorial on Learning with Bayesian Networks, 2008]
3. Introduction of an auxiliary variable between the measurement model and latent state
[Figure: Interaction Network collapsed into Interaction Modules. Annotations: 300 species, 90,000 interactions; redundant gene function]
Back to the basic time series model
• Abundance of microbe i at time t: $x_{t,i}$
$$\frac{dx_{t,i}}{dt} = \alpha_i x_{t,i} + \beta_{ii} x_{t,i}^2 + \sum_{j \neq i} \beta_{ij} x_{t,i} x_{t,j} + \frac{dw_{t,i}}{dt}$$
  The terms are growth, self limiting, interaction, and stochastic disturbance.
• Convert to discrete time
$$x_{k+1,i} = x_{k,i} + \Big( \alpha_i x_{k,i} + \beta_{ii} x_{k,i}^2 + \sum_{j \neq i} \beta_{ij} x_{k,i} x_{k,j} \Big) \Delta_k + (w_{k+1,i} - w_{k,i})$$
  where $\Delta_k$ is the discrete time step size.
Next we discuss the three main ingredients of our model
1. Clustering (interaction modules)
2. Edge selection (structure learning, variable selection)
3. Introduction of an auxiliary variable between the measurement model and latent state
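The discrete-time update on this slide can be simulated on an irregular time grid. This is a minimal sketch under one explicit assumption: the disturbance increment $w_{k+1,i} - w_{k,i}$ is drawn as Normal(0, $\sigma_w^2 \Delta_k$), i.e. $w$ is treated as scaled Brownian motion. All function names and parameter values are hypothetical, and abundances are not constrained to stay positive here.

```python
import math
import random


def glv_drift(x, alpha, beta):
    """Drift alpha_i x_i + beta_ii x_i^2 + sum_{j != i} beta_ij x_i x_j."""
    n = len(x)
    return [
        x[i] * (alpha[i] + sum(beta[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]


def simulate(x0, alpha, beta, times, sigma_w, rng):
    """Simulate x_{k+1} = x_k + drift * delta_k + (w_{k+1} - w_k)."""
    x = list(x0)
    path = [list(x)]
    for k in range(len(times) - 1):
        delta_k = times[k + 1] - times[k]  # discrete time step size
        drift = glv_drift(x, alpha, beta)
        # Assumed increment distribution: Normal(0, sigma_w^2 * delta_k).
        noise_sd = sigma_w * math.sqrt(delta_k)
        x = [
            xi + di * delta_k + rng.gauss(0.0, noise_sd)
            for xi, di in zip(x, drift)
        ]
        path.append(list(x))
    return path


# Hypothetical parameters and an irregular sampling grid.
rng = random.Random(1)
times = [0.0, 0.5, 0.7, 1.5, 2.0]
alpha = [1.0, 0.8]
beta = [[-1.0, 0.2],
        [-0.3, -1.0]]
path = simulate([0.5, 0.5], alpha, beta, times, sigma_w=0.05, rng=rng)
noiseless = simulate([1.0], [0.5], [[-0.25]], [0.0, 0.1], 0.0, rng)
```

Setting `sigma_w` to zero recovers the deterministic forward Euler step, which makes the role of the disturbance term easy to isolate.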
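Complete Model
Dirichlet Process
  $\pi_c \mid \alpha \sim \mathrm{Stick}(\alpha)$
  $c_i \mid \pi_c \sim \mathrm{Multinomial}(\pi_c)$
Edge Selection (Structure)
  $z_{c_i,c_j} \mid \pi_z \sim \mathrm{Bernoulli}(\pi_z)$
  $b_{c_i,c_j} \mid \sigma_b \sim \mathrm{Normal}(0, \sigma_b^2)$
Self Interactions
  $a_{i,1}, a_{i,2} \mid \sigma_a \sim \mathrm{Normal}(0, \sigma_a^2)$
Dynamics
  $x_{k+1,i} \mid x_k, a_i, b, c, z, \sigma_w \sim \mathrm{Normal}\Big( x_{k,i} + x_{k,i} \big( a_{i,1} + a_{i,2} x_{k,i} + \sum_{c_j \neq c_i} b_{c_i,c_j} z_{c_i,c_j} x_{k,j} \big),\ \Delta_k \sigma_w^2 \Big)$
Constraint and Measurement Model
  auxiliary variable: $q_{k,i} \mid x_{k,i} \sim \mathrm{Normal}(x_{k,i}, \sigma_q^2)$
  reads: $y_{k,i} \mid q_{k,i} \sim \mathrm{NegBin}(\phi(q_k), \epsilon(q_k))$
  qPCR: $Q_k \mid q_k \sim \mathrm{Normal}\big( \sum_i q_{k,i}, \sigma_{Q_k}^2 \big)$
[Figure: plate diagram with nodes $\alpha$, $\pi_c$, $c_i$, $\sigma_a$, $a_i$, $\sigma_b$, $b_{\ell,m}$, $\pi_z$, $z_{\ell,m}$, $x_{k,i}$, $q_{k,i}$, $y_{k,i}$, $Q_k$ and plates over $i \in [n]$, $k \in [m]$, $\ell \in \mathbb{Z}^+$, $m \in \mathbb{Z}^+$]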
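To make the prior structure concrete, the sketch below draws one interaction network from the Dirichlet Process and edge selection priors and expands it to microbe-level coefficients $b_{c_i,c_j} z_{c_i,c_j}$. It is an illustration under stated assumptions, not the authors' implementation: Stick(α) is taken to be the usual stick-breaking construction with Beta(1, α) fractions, the infinite sequence is truncated at a fixed number of sticks, module-level variables are drawn only for occupied modules, and all names and hyperparameter values are hypothetical.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights (truncation is an assumption)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # leftover mass, so the weights sum to one
    return weights


def sample_network(n_microbes, alpha, pi_z, sigma_b, truncation, rng):
    """Draw assignments c, edge indicators z, strengths b, and the
    microbe-level interaction matrix with entries b[c_i, c_j] * z[c_i, c_j]."""
    pi_c = stick_breaking(alpha, truncation, rng)
    c = rng.choices(range(truncation), weights=pi_c, k=n_microbes)
    occupied = sorted(set(c))
    z, b = {}, {}
    for source in occupied:
        for target in occupied:
            if source != target:
                z[(source, target)] = 1 if rng.random() < pi_z else 0
                b[(source, target)] = rng.gauss(0.0, sigma_b)
    matrix = [[0.0] * n_microbes for _ in range(n_microbes)]
    for i in range(n_microbes):
        for j in range(n_microbes):
            if c[i] != c[j]:  # interactions act only between modules
                key = (c[i], c[j])
                matrix[i][j] = b[key] * z[key]
    return pi_c, c, z, b, matrix


rng = random.Random(7)
pi_c, c, z, b, matrix = sample_network(
    n_microbes=13, alpha=1.0, pi_z=0.5, sigma_b=1.0, truncation=20, rng=rng
)
```

Because interactions are defined between modules, every microbe in a module shares the same interaction coefficients, so the number of distinct coefficients scales with the number of module pairs rather than microbe pairs.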
Simplified model unraveled in time - auxiliary variable
  $x_{t+1,i} \mid x_t, a \sim \mathrm{Normal}(a_i^T f(x_t), \sigma_{x_i}^2)$
  $q_{k,i} \mid x_{k,i} \sim \mathrm{Normal}(x_{k,i}, \sigma_q^2)$
  $y_{k,i} \mid \sigma_y, q_{k,i} \sim \mathrm{Normal}_{\geq 0}(q_{k,i}, \sigma_y^2)$
  $a_i \sim \mathrm{Normal}(0, \sigma_{a_i}^2)$
• Prior on q is positive, $q_{k,i} \sim \mathrm{Uniform}[0, L)$, relaxing the distribution on the dynamics for x
[Figure: graphical model unrolled in time. $a$ points to the chain $x_1, x_2, x_3, \dots, x_n$; each $x_k$ points to $q_k$, and each $q_k$ points to $y_k$]
Parameter inference
• Gibbs step: $a^{(g+1)} \sim p_{a \mid x}(\cdot \mid x^{(g)})$
• Direct sampling from the posterior possible (Bayesian Regression!)
Sampling for other variables
• Collapsed Gibbs sampling for Dirichlet Process and Edge Selection (integrate out a)
• Filtering is still challenging but easy to design proposals for (MH)
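The Gibbs step for $a$ is a Bayesian linear regression: with a Normal prior and Normal dynamics noise of known variance, the conditional posterior of $a_i$ given the latent trajectory is Normal and can be sampled directly. The sketch below works this out for a single microbe under assumptions that are not on the slide: the feature map is taken to be $f(x) = (x, x^2)$, $\sigma_x$ is known, the prior is isotropic with standard deviation $\sigma_a$, and the trajectory is simulated from hypothetical true coefficients.

```python
import math
import random


def posterior_a(features, targets, sigma_x, sigma_a):
    """Posterior mean and covariance of a two-dimensional coefficient a.

    Precision = F^T F / sigma_x^2 + I / sigma_a^2, mean = cov * F^T y / sigma_x^2.
    """
    inv_noise = 1.0 / sigma_x ** 2
    p11 = inv_noise * sum(f[0] * f[0] for f in features) + 1.0 / sigma_a ** 2
    p12 = inv_noise * sum(f[0] * f[1] for f in features)
    p22 = inv_noise * sum(f[1] * f[1] for f in features) + 1.0 / sigma_a ** 2
    r1 = inv_noise * sum(f[0] * y for f, y in zip(features, targets))
    r2 = inv_noise * sum(f[1] * y for f, y in zip(features, targets))
    det = p11 * p22 - p12 * p12
    cov = [[p22 / det, -p12 / det],
           [-p12 / det, p11 / det]]
    mean = [cov[0][0] * r1 + cov[0][1] * r2,
            cov[1][0] * r1 + cov[1][1] * r2]
    return mean, cov


def sample_a(mean, cov, rng):
    """Draw a ~ Normal(mean, cov) using a 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mean[0] + l11 * e1, mean[1] + l21 * e1 + l22 * e2]


# Simulated latent trajectory from hypothetical true coefficients.
rng = random.Random(0)
true_a = [1.5, -0.5]
sigma_x, sigma_a = 0.01, 10.0
x = [0.1]
for _ in range(50):
    mean_next = true_a[0] * x[-1] + true_a[1] * x[-1] ** 2
    x.append(rng.gauss(mean_next, sigma_x))

features = [[value, value * value] for value in x[:-1]]  # assumed f(x_t)
targets = x[1:]                                          # x_{t+1}
mean, cov = posterior_a(features, targets, sigma_x, sigma_a)
a_draw = sample_a(mean, cov, rng)  # one Gibbs draw of a given x
```

Within a full sampler this draw would alternate with updates of the latent trajectory and the other variables. Only the regression step is shown here.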
Synthetic experiment
• Comparing inference with and without clustering enabled
• Ground truth interaction network: three modules, {1, 5, 7, 9, 11}, {2, 4, 6, 8, 10, 12}, and {3, 13}
[Figure: Microbe Interactions (Truth), a 13 × 13 matrix ordered 1, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12, 3, 13, in units of 1/(abundance time)]
  • Diagonal (self interaction): -5 for every microbe
  • Rows 1, 5, 7, 9, 11: 3 in columns 2, 4, 6, 8, 10, 12 and -1 in columns 3, 13
  • Rows 2, 4, 6, 8, 10, 12: 0 in all off-diagonal columns
  • Rows 3, 13: 2 in columns 1, 5, 7, 9, 11 and -4 in columns 2, 4, 6, 8, 10, 12
  • All other entries (within a module, off the diagonal): 0
[Figure: Forecast Trajectories, RMSE per biological replicate (1 to 5), Module Learning Off vs. Module Learning On]
[Figure: Interaction Coefficients, RMSE (log scaling) per biological replicate (1 to 5), Module Learning Off vs. Module Learning On]
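The ground truth matrix has a block structure determined by the three modules, so it can be rebuilt from a handful of numbers. The sketch below does that and defines an RMSE over matrix entries. The module labels "A", "B", "C" and the function names are hypothetical, and computing the RMSE over all 13 × 13 entries is an assumption, since the slides do not spell out how the reported RMSE was calculated.

```python
import math

# Module membership from the slide; the labels "A", "B", "C" are hypothetical.
MODULES = {
    "A": [1, 5, 7, 9, 11],
    "B": [2, 4, 6, 8, 10, 12],
    "C": [3, 13],
}
SELF_INTERACTION = -5.0
# (row module, column module) -> coefficient, as printed in the truth matrix.
BLOCKS = {("A", "B"): 3.0, ("A", "C"): -1.0, ("C", "A"): 2.0, ("C", "B"): -4.0}


def build_truth():
    """Return microbe ids 1..13 and the 13 x 13 matrix indexed by id order."""
    module_of = {m: name for name, members in MODULES.items() for m in members}
    ids = sorted(module_of)
    matrix = []
    for i in ids:
        row = []
        for j in ids:
            if i == j:
                row.append(SELF_INTERACTION)
            else:
                row.append(BLOCKS.get((module_of[i], module_of[j]), 0.0))
        matrix.append(row)
    return ids, matrix


def rmse(estimate, truth):
    """Root mean squared error over all matrix entries (assumed definition)."""
    squared = [
        (e - t) ** 2
        for row_e, row_t in zip(estimate, truth)
        for e, t in zip(row_e, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


ids, truth = build_truth()
# Hypothetical estimate: every coefficient off by exactly one.
shifted = [[value + 1.0 for value in row] for row in truth]
error = rmse(shifted, truth)
```

Note that this matrix is indexed by microbe id 1 to 13, whereas the slide orders rows and columns by module.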
Synthetic experiment continued
Panel A
[Figure: Microbe Co-cluster Proportions, 13 × 13 matrix ordered 1, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12, 3, 13, color scale 0 to 1]
[Figure: Microbe Interactions (RMSE=9.49), in units of 1/(abundance time)]
[Figure: Forecasted Trajectories (RMSE=1.88), microbiota abundances vs. time, 0 to 100]
Panel B
[Figure: Microbe Co-cluster Proportions, 13 × 13 identity matrix (each microbe co-clusters only with itself)]
[Figure: Microbe Interactions (RMSE=15.9), in units of 1/(abundance time)]
[Figure: Forecasted Trajectories (RMSE=2.06), microbiota abundances vs. time, 0 to 100]