Robust and Scalable Models of Microbiome Dynamics for Bacteriotherapy Design
Travis E. Gibson (1), Georg K. Gerber (1, 2)
(1) Massachusetts Host Microbiome Center, Brigham and Women's Hospital and Harvard Medical School
(2) Health Sciences and Technology Division, Harvard-MIT
December 9, 2017
NIPS 2017 Workshop on Machine Learning in Computational Biology
Outline
1. Background on the Human Microbiome
2. From Experimental Design to Bacteriotherapies
3. Model of Microbial Dynamics
4. Inference Model
5. Applications
Gerber Lab is looking for post-docs and PhD students
The Microbiome
1. The microbiome is the aggregate of microorganisms that resides on or within any of a number of human tissues and biofluids [wikipedia]:
   • skin, mammary glands, placenta, seminal fluid, uterus, ovarian follicles, lung, saliva, oral mucosa, conjunctiva, biliary and gastrointestinal tracts
2. $10^{14}$ microbes in/on your body [Sender et al. PLoS Biology 2016]
3. 3.3 million microbial genes compared to 23,000 human genes [Qin et al. Nature 2010]
4. Large component of the immune system
5. Plays a role in a variety of human diseases:
   • infections, arthritis, food allergy, cancer, inflammatory bowel disease, neurological diseases, and obesity/diabetes
Bacteriotherapy
Bacteriotherapy: communities of bacteria administered to patients for specific therapeutic applications
• "bugs-as-drugs"
Clostridium difficile infection
• Causes serious diarrhea (14K deaths/yr)
• Antibiotics disrupt helpful bacteria in the gut
• Increasingly difficult to treat with conventional therapies (more antibiotics): 20-30% recurrence rate
Pharmacology meets Ecology
• Positive interaction: microbe A produces a small molecule (metabolite) that microbe B needs
• Negative interaction: two microbes competing for the same niche
• What if there were 300 bugs in the network?
[Figure: microbial interaction network including C. diff]
Workflow in our lab
• Experiments: batch, chemostat, and animal experiments
• 16S rRNA on MiSeq (reads) for relative abundances of species
• 16S rRNA qPCR (universal primers) for bacterial biomass
• 300 species
• Measurements: irregular, sparse & noisy
• 90,000 interactions
[Figure: workflow from experiments to abundance-versus-time data, then to the inferred Interaction Network and Interaction Modules]
Microbial Dynamics
• Abundance of microbe $i$ at time $t$: $x_{t,i}$

$$\frac{dx_{t,i}}{dt} = \alpha_i x_{t,i} + \beta_{ii} x_{t,i}^2 + \sum_{j \neq i} \beta_{ij} x_{t,i} x_{t,j} + \frac{dw_{t,i}}{dt}$$

  The four terms are, in order: growth, carrying capacity, interaction, and stochastic disturbance.
• Convert to discrete time, where $\Delta_k$ is the discrete time step size:

$$x_{k+1,i} = x_{k,i} + \Big( \alpha_i x_{k,i} + \beta_{ii} x_{k,i}^2 + \sum_{j \neq i} \beta_{ij} x_{k,i} x_{k,j} \Big) \Delta_k + (w_{k+1,i} - w_{k,i})$$

Next we discuss the three main ingredients of our model:
1. Clustering (interaction modules)
2. Edge selection (structure learning, variable selection)
3. Introduction of an auxiliary variable between the dynamics and the measurement model
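The discrete-time update above can be sketched as a short simulation. This is a minimal illustration, not the authors' code: the function name `simulate_glv`, the noise scale `sigma_w`, the Gaussian form of the increment $w_{k+1,i} - w_{k,i}$ with variance $\Delta_k \sigma_w^2$, and the clipping of abundances at zero are all assumptions made for this example.

```python
import numpy as np

def simulate_glv(x0, alpha, beta, dt, n_steps, sigma_w=0.0, seed=0):
    """Simulate the discretized stochastic dynamics with a fixed step size dt.

    alpha: growth rates, shape (n,)
    beta:  interaction matrix, shape (n, n); the diagonal holds beta_ii
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    x = np.empty((n_steps + 1, x0.size))
    x[0] = x0
    for k in range(n_steps):
        # alpha_i x_i + beta_ii x_i^2 + sum_{j != i} beta_ij x_i x_j
        # equals x_i * (alpha_i + (beta @ x)_i)
        drift = x[k] * (alpha + beta @ x[k])
        # Assumed disturbance: w_{k+1} - w_k ~ Normal(0, dt * sigma_w^2)
        dw = rng.normal(0.0, sigma_w * np.sqrt(dt), size=x0.size)
        # Clipping at zero is an assumption that keeps abundances non-negative
        x[k + 1] = np.maximum(x[k] + drift * dt + dw, 0.0)
    return x

# Two hypothetical species: species 1 suppresses species 0
alpha = np.array([1.0, 0.8])
beta = np.array([[-0.1, -0.05],
                 [0.0, -0.1]])
trajectory = simulate_glv([1.0, 1.0], alpha, beta, dt=0.01, n_steps=2000, sigma_w=0.1)
```

With `sigma_w=0.0` and a single species, the update reduces to Euler integration of logistic growth, which settles at the carrying capacity $-\alpha_i / \beta_{ii}$.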
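Complete Model
[Figure: graphical model with nodes $\alpha$, $\pi_c$, $c_i$, $\sigma_a$, $a_i$, $x_{k,i}$, $q_{k,i}$, $y_{k,i}$, $b_{\ell,m}$, $z_{\ell,m}$, $\sigma_b$, $\pi_z$ and plates $i \in [n]$, $k \in [m]$, $\ell \in \mathbb{Z}_+$, $m \in \mathbb{Z}_+$]

Dirichlet Process
$$\pi_c \mid \alpha \sim \mathrm{Stick}(\alpha)$$
$$c_i \mid \pi_c \sim \mathrm{Multinomial}(\pi_c)$$

Edge Selection (Structure)
$$z_{c_i,c_j} \mid \pi_z \sim \mathrm{Bernoulli}(\pi_z)$$
$$b_{c_i,c_j} \mid \sigma_b \sim \mathrm{Normal}(0, \sigma_b^2)$$

Self Interactions
$$a_{i,1}, a_{i,2} \mid \sigma_a \sim \mathrm{Normal}(0, \sigma_a^2)$$

Dynamics
$$x_{k+1,i} \mid x_k, a_i, b, c, z, \sigma_w \sim \mathrm{Normal}\Big( x_{k,i} + x_{k,i} \Big( a_{i,1} + a_{i,2} x_{k,i} + \sum_{c_j \neq c_i} b_{c_i,c_j} z_{c_i,c_j} x_{k,j} \Big),\ \Delta_k \sigma_w^2 \Big)$$

Constraint and Measurement Model
$$q_{k,i} \mid x_{k,i} \sim \mathrm{Normal}(x_{k,i}, \sigma_q^2)$$
$$y_{k,i} \mid \sigma_y, q_{k,i} \sim f(q_{k,i}), \quad f \in \{\text{Neg. Bin.}, \text{Log Norm.}, \ldots\}$$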
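To make the clustering and edge-selection priors concrete, the sketch below draws one sample from them and expands the module-level interactions into a species-level matrix. It is an illustration under stated assumptions, not the authors' implementation: the stick-breaking construction is truncated at a hypothetical `max_clusters`, the stick proportions are taken as Beta(1, alpha), and all function names and default hyperparameter values are invented for this example.

```python
import numpy as np

def sample_structure_prior(n, alpha=1.0, pi_z=0.2, sigma_a=1.0, sigma_b=1.0,
                           max_clusters=20, seed=0):
    """Draw cluster weights, assignments, edges, and coefficients from the priors."""
    rng = np.random.default_rng(seed)
    # pi_c | alpha ~ Stick(alpha), truncated at max_clusters (assumption)
    v = rng.beta(1.0, alpha, size=max_clusters)
    v[-1] = 1.0  # close the stick so the weights sum to one
    pi_c = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # c_i | pi_c ~ Multinomial(pi_c): one module label per species
    c = rng.choice(max_clusters, size=n, p=pi_c)
    # z | pi_z ~ Bernoulli(pi_z): edge indicators between modules
    z = rng.random((max_clusters, max_clusters)) < pi_z
    # b | sigma_b ~ Normal(0, sigma_b^2): module-level interaction strengths
    b = rng.normal(0.0, sigma_b, size=(max_clusters, max_clusters))
    # a_{i,1}, a_{i,2} | sigma_a ~ Normal(0, sigma_a^2): growth and self-interaction
    a = rng.normal(0.0, sigma_a, size=(n, 2))
    return pi_c, c, z, b, a

def species_interactions(c, z, b):
    """Expand module-level b * z into an n-by-n species interaction matrix."""
    B = (b * z)[np.ix_(c, c)]
    # The sum in the dynamics runs only over c_j != c_i
    B[c[:, None] == c[None, :]] = 0.0
    return B

pi_c, c, z, b, a = sample_structure_prior(n=10)
B = species_interactions(c, z, b)
```

Species in the same module share a row and column pattern in `B`, which is how clustering reduces the number of free interaction parameters.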
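Simple example without the intermediate auxiliary variable
[Figure: graphical model in which $a$ (prior parameters $0$, $\Sigma_a$) drives the chain $x_1, x_2, x_3, \ldots, x_n$, with observations $y_1, y_2, y_3, \ldots, y_n$]

Model (note the truncated distributions for $x$ and $y$):
$$x_{t+1,i} \mid x_t, a \sim \mathrm{Normal}_{\geq 0}(a_i^T f(x_t), \sigma_{x_i}^2)$$
$$y_{t,i} \mid x_{t,i} \sim \mathrm{Normal}_{\geq 0}(x_{t,i}, \sigma_{y_i}^2)$$
$$a_i \sim \mathrm{Normal}(0, \sigma_{a_i}^2)$$

Parameter inference
Gibbs step: $a^{(g+1)} \sim p_{a|x}(\cdot \mid x^{(g)})$

$$p_{a|x} \propto p_{x|a}\, p_a, \qquad p_{x|a} = \mathrm{Normal}_{\geq 0}(x; \mu(a,x), \sigma^2), \qquad p_a = \mathrm{Normal}(a; 0, \sigma^2)$$

$$p_{x|a}\, p_a = \frac{e^{-\frac{1}{2\sigma^2}(x - \mu(a,x))^2}}{\sqrt{2\pi}\,\sigma \left( \Phi(\infty) - \Phi\!\left( \frac{-\mu(a,x)}{\sigma} \right) \right)} \cdot \frac{e^{-\frac{1}{2\sigma^2} a^2}}{\sqrt{2\pi}\,\sigma}$$

Sampling for other variables
• Filtering (sampling from the posterior of $x$) is challenging
• Cannot use collapsed Gibbs sampling for Dirichlet Process or Edge Selection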
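The difficulty with this Gibbs step is that the truncation normalizer $\Phi(\infty) - \Phi(-\mu(a,x)/\sigma)$ itself depends on $a$, so the conditional for $a$ is not a plain Gaussian. The sketch below evaluates the unnormalized log posterior for a scalar case. It assumes the hypothetical choice $\mu(a,x) = a\,x_t$ and uses separate scales `sigma_x` and `sigma_a` (the slide writes both as $\sigma$); the function names are illustrative.

```python
import math

def std_normal_cdf(u):
    """Standard normal CDF, Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def log_post_a(a, x_prev, x_next, sigma_x, sigma_a):
    """Unnormalized log p(a | x) for one transition of a scalar truncated model."""
    mu = a * x_prev  # assumed form of mu(a, x)
    log_lik = -0.5 * ((x_next - mu) / sigma_x) ** 2
    # Truncation normalizer: Phi(inf) - Phi(-mu / sigma_x) = 1 - Phi(-mu / sigma_x)
    # This term varies with a, which is what breaks conjugacy.
    log_norm = math.log(1.0 - std_normal_cdf(-mu / sigma_x))
    log_prior = -0.5 * (a / sigma_a) ** 2
    return log_lik - log_norm + log_prior

# Evaluate on a small grid of candidate values for a
grid = [0.0, 0.5, 1.0, 1.5]
values = [log_post_a(a, x_prev=1.0, x_next=1.0, sigma_x=1.0, sigma_a=1.0) for a in grid]
```

For strongly negative `mu` the normalizer underflows toward zero, so a production version would need a numerically stable log-CDF; this sketch does not handle that case.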
Introducing an auxiliary variable
[Figure: graphical model in which $a$ (prior parameters $0$, $\Sigma_a$) drives the chain $x_1, x_2, x_3, \ldots, x_n$, each $x$ linked to an auxiliary variable $q_1, q_2, q_3, \ldots, q_n$, which in turn generates the observations $y_1, y_2, y_3, \ldots, y_n$]

Model (the prior on $q$ is positive, relaxing the distribution on the dynamics for $x$):
$$x_{t+1,i} \mid x_t, a \sim \mathrm{Normal}(a_i^T f(x_t), \sigma_{x_i}^2)$$
$$q_{k,i} \mid x_{k,i} \sim \mathrm{Normal}(x_{k,i}, \sigma_q^2)$$
$$q_{k,i} \sim \mathrm{Uniform}[0, L)$$
$$y_{k,i} \mid \sigma_y, q_{k,i} \sim \mathrm{Normal}_{\geq 0}(q_{k,i}, \sigma_y^2)$$
$$a_i \sim \mathrm{Normal}(0, \sigma_{a_i}^2)$$

Parameter inference
Gibbs step: $a^{(g+1)} \sim p_{a|x}(\cdot \mid x^{(g)})$
• Direct sampling from the posterior is now possible (Bayesian regression!)

Sampling for other variables
• Collapsed Gibbs sampling for Dirichlet Process and Edge Selection (integrate out $a$)
• Filtering is still challenging, but proposals are easier to design than before (MH)
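With an untruncated Normal likelihood for $x$ and a Normal prior on $a_i$, the Gibbs step reduces to standard conjugate Bayesian linear regression. The sketch below shows that step under stated assumptions: `F` is a hypothetical design matrix whose rows are $f(x_t)$, `y` holds the corresponding $x_{t+1,i}$, and a single scalar `sigma_a` is used for all coefficients. The function name is illustrative and not taken from the source.

```python
import numpy as np

def sample_a_posterior(F, y, sigma_x, sigma_a, rng):
    """Draw a ~ p(a | x) for y ~ Normal(F a, sigma_x^2 I), a ~ Normal(0, sigma_a^2 I)."""
    d = F.shape[1]
    # Posterior precision: likelihood precision plus prior precision
    precision = F.T @ F / sigma_x**2 + np.eye(d) / sigma_a**2
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ (F.T @ y) / sigma_x**2
    return rng.multivariate_normal(mean, cov), mean, cov

# Synthetic check with hypothetical coefficients
rng = np.random.default_rng(0)
a_true = np.array([0.5, -1.0])
F = rng.normal(size=(500, 2))
y = F @ a_true + 0.1 * rng.normal(size=500)
a_draw, post_mean, post_cov = sample_a_posterior(F, y, sigma_x=0.1, sigma_a=10.0, rng=rng)
```

Because this conditional is Gaussian in closed form, $a$ can also be integrated out analytically, which is what makes the collapsed Gibbs updates for the clustering and edge-selection variables possible.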
Synthetic consortia of small microbial community
• Microbes engineered to overproduce one amino acid
• Microbes engineered to need three amino acids
• Compare inference on WT and engineered strains to prove that engineering was performed
• Species: B. fragilis, E. coli, S. typhimurium, B. theta
• Marika Ziesack, Silver Lab, Harvard

Synthetic Data
[Figure: learning from 2 batch experiments. Panels: Example Trajectory and Simulated Trajectories (abundance versus time), Interaction Coefficients (units of 1/(abundance time)), Bayes Factors]
[Figure: learning from 4 batch experiments. Panels: Simulated Trajectories (abundance versus time), Ground Truth and inferred Interaction Coefficients (units of 1/(abundance time)), Bayes Factors]