Statistical Foundations for Analyzing Human Microbiome Data
Patricio S. La Rosa¹, Paul Brooks², Yanjiao Zhou¹, Elena Deych¹, Berkley Shands¹, Ed Boone², David Edwards², Qin Wang², Erica Sodergren¹, George Weinstock¹, and Bill Shannon¹
¹ Washington University in St. Louis Medical School
² Virginia Commonwealth University
Probability Models Simplify Data
• Replace data by model and parameters:
  $P(X = x_i; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
  – Mean and standard deviation define normal data
  – Statistical tests compare parameters (e.g., t-test)
• What probability models will work for HMP data?
3/9/2011 IHMC Vancouver
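The sketch below is a minimal illustration of "replace data by model and parameters". It reduces each sample to its fitted normal parameters (mean, standard deviation) and then computes an equal-variance two-sample t statistic from those parameters alone. The function names are hypothetical. A p-value would then come from a t distribution with the returned degrees of freedom (for example, via scipy.stats.t).

```python
import math
import statistics


def fit_normal(sample):
    """Replace a sample by its two normal-model parameters (mean, std. dev.)."""
    return statistics.mean(sample), statistics.stdev(sample)


def pooled_t_statistic(a, b):
    """Equal-variance two-sample t statistic computed only from the fitted
    parameters of each sample, not from the raw observations."""
    mean_a, sd_a = fit_normal(a)
    mean_b, sd_b = fit_normal(b)
    n_a, n_b = len(a), len(b)
    # Pooled variance combines the two fitted standard deviations.
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    df = n_a + n_b - 2
    return (mean_a - mean_b) / se, df
```

The test needs only six numbers (two means, two standard deviations, two sample sizes). That is the sense in which the model "simplifies" the data.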
Dirichlet-Multinomial Distribution
• Relative abundance data
  – Numbers of individuals observed for each taxon
  – Multivariate descriptor of ecological community
  $P(X_i = x_i; \{\pi_j\}, \theta) = \frac{N_i!}{x_{i1}! \cdots x_{iK}!} \cdot \frac{\prod_{j=1}^{K} \prod_{r=1}^{x_{ij}} \{\pi_j (1 - \theta) + (r - 1)\theta\}}{\prod_{r=1}^{N_i} \{(1 - \theta) + (r - 1)\theta\}}$
  {π_j} = mean proportion of taxon j, θ = measure of dispersion
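The DM probability above can be evaluated directly. This Python sketch works on the log scale so that the factorials do not overflow. The function name dm_pmf is hypothetical, and the sketch assumes 0 ≤ θ < 1 and π_j > 0 for every taxon. Setting θ = 0 recovers the ordinary multinomial, which is a quick check that the parameterization is read correctly.

```python
import math


def dm_pmf(x, pi, theta):
    """Dirichlet-multinomial probability of one count vector x in the
    (pi, theta) parameterization: pi = mean taxon proportions, theta = dispersion."""
    if len(x) != len(pi):
        raise ValueError("x and pi must have the same length K")
    n = sum(x)
    # Multinomial coefficient N! / (x_1! ... x_K!), on the log scale.
    log_p = math.lgamma(n + 1) - sum(math.lgamma(xj + 1) for xj in x)
    # Numerator: product over taxa j and r = 1..x_j of {pi_j (1 - theta) + (r - 1) theta}.
    for xj, pj in zip(x, pi):
        for r in range(1, xj + 1):
            log_p += math.log(pj * (1 - theta) + (r - 1) * theta)
    # Denominator: product over r = 1..N of {(1 - theta) + (r - 1) theta}.
    for r in range(1, n + 1):
        log_p -= math.log((1 - theta) + (r - 1) * theta)
    return math.exp(log_p)
```

For example, dm_pmf([3, 0], [0.5, 0.5], 0.2) returns 0.2, whereas the multinomial value (θ = 0) is 0.125. Overdispersion puts extra weight on extreme compositions.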
Does DM Fit HMP Data?
• Goodness-of-fit
  – Power > 99% to correctly decide that data are Dirichlet-multinomial
  – Size of test (probability of wrongly rejecting the multinomial when the data are multinomial) ~5%
• Simulations indicate that the DM is a good fit to HMP data
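The slide does not name its goodness-of-fit test. One standard check for overdispersion relative to the multinomial is Tarone's C(α) score statistic, shown here as an assumed stand-in rather than the authors' procedure. Under the multinomial, E[x_ij(x_ij − 1)] = N_i(N_i − 1)π_j². It follows that N_c = Σ_j (1/π_j) Σ_i x_ij(x_ij − 1) is centered at Σ_i N_i(N_i − 1). The standardized difference is approximately standard normal, and large positive values point toward the DM. The function name is hypothetical.

```python
import math


def c_alpha_test(counts):
    """C(alpha) score test for overdispersion relative to the multinomial.

    counts: list of samples, each a list of K taxon counts.
    Returns (z, one-sided p). Large positive z suggests a Dirichlet-multinomial.
    """
    k_all = len(counts[0])
    n_i = [sum(row) for row in counts]
    total = sum(n_i)
    # Pooled estimate of the mean proportion of each taxon.
    pi_hat = [sum(row[j] for row in counts) / total for j in range(k_all)]
    observed = [j for j in range(k_all) if pi_hat[j] > 0]
    k = len(observed)  # taxa never observed carry no information
    # N_c = sum_j (1 / pi_j) * sum_i x_ij (x_ij - 1)
    n_c = sum(sum(row[j] * (row[j] - 1) for row in counts) / pi_hat[j] for j in observed)
    s = sum(n * (n - 1) for n in n_i)
    # Null mean of N_c is s; null variance is 2 (K - 1) s.
    z = (n_c - s) / math.sqrt(2 * (k - 1) * s)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p
```

Two samples that sit at opposite extremes, such as [[10, 0], [0, 10]], give z ≈ 9.5 and signal strong overdispersion. Two identical balanced samples give a negative z.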
What Hypotheses Can We Test?
• Test hypotheses about model parameters
  – [3] analogous to a one-sample t-test
  – [4] analogous to a two-sample t-test or ANOVA
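The sketch below illustrates the one-sample analog by testing H0: π = π0 with a Wald-type statistic. It assumes the dispersion θ is known; in practice θ would be estimated. This is not necessarily the statistic labeled [3]. The reasoning runs in three steps:

• Under the DM, the pooled estimate π̂ = Σ_i x_i / Σ_i N_i has covariance (diag(π) − ππᵀ) Σ_i N_i(1 + (N_i − 1)θ) / (Σ_i N_i)².
• With that covariance evaluated at π0, the quadratic form collapses to a Pearson-type sum Σ_j (π̂_j − π0_j)²/π0_j, rescaled by the DM variance inflation.
• The result is compared with a χ² distribution on K − 1 degrees of freedom.

The function name is hypothetical.

```python
def dm_one_sample_statistic(counts, pi0, theta):
    """Wald-type statistic for H0: pi = pi0 under a Dirichlet-multinomial
    with dispersion theta treated as known (a simplifying assumption).
    Returns (statistic, degrees of freedom)."""
    k = len(pi0)
    n_i = [sum(row) for row in counts]
    total = sum(n_i)
    # Pooled estimate of the mean taxon proportions.
    pi_hat = [sum(row[j] for row in counts) / total for j in range(k)]
    # Pearson-type distance between estimated and hypothesized proportions.
    pearson = sum((pi_hat[j] - pi0[j]) ** 2 / pi0[j] for j in range(k))
    # The DM inflates the multinomial variance of sample i by 1 + (N_i - 1) theta.
    inflation = sum(n * (1 + (n - 1) * theta) for n in n_i)
    return pearson * total**2 / inflation, k - 1
```

With θ = 0 and a single sample, the statistic is the ordinary Pearson χ². For counts [30, 20] against π0 = [0.5, 0.5] it equals 2.0. Any θ > 0 shrinks it, because overdispersed data carry less information.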
Power and Sample Sizes?
Table 3. Power for comparing RAD means from 2 populations using hypothesis test [5]. Rows: P = number of subjects per population; columns: Nr = number of reads per subject.

P \ Nr   100     500     1000    10000   20000
10       0.78    0.87    0.89    0.90    0.90
20       0.89    0.97    0.98    0.98    0.98
40       0.98    >0.99   >0.99   >0.99   >0.99
60       >0.99   >0.99   >0.99   >0.99   >0.99
100      >0.99   >0.99   >0.99   >0.99   >0.99
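Power tables like Table 3 are typically built by simulation. The Python sketch below is one such recipe under stated assumptions; it is not the procedure behind Table 3. It works as follows:

• Draw DM samples through Dirichlet mixing, with α_j = π_j(1 − θ)/θ, which matches the (π, θ) parameterization of the DM slide.
• Apply a two-sample Wald-type statistic with θ treated as known.
• Report the rejection rate.

To stay dependency-free, the sketch hard-codes K = 3 taxa. The χ² reference then has 2 degrees of freedom, and its upper-α quantile is −2 ln α. All function names and parameter values are hypothetical.

```python
import math
import random


def draw_dm_counts(rng, pi, theta, n_reads):
    """One DM count vector: Dirichlet proportions with
    alpha_j = pi_j (1 - theta) / theta, then n_reads multinomial draws."""
    gammas = [rng.gammavariate(p * (1 - theta) / theta, 1.0) for p in pi]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    counts = [0] * len(pi)
    for j in rng.choices(range(len(pi)), weights=probs, k=n_reads):
        counts[j] += 1
    return counts


def two_sample_statistic(group_a, group_b, theta):
    """Wald-type statistic comparing the pooled mean proportions of two groups,
    with the covariance evaluated at the combined estimate (theta known)."""
    k = len(group_a[0])

    def pooled(group):
        total = sum(sum(row) for row in group)
        return [sum(row[j] for row in group) / total for j in range(k)]

    def variance_factor(group):
        # sum_i N_i (1 + (N_i - 1) theta) / (sum_i N_i)^2
        n_i = [sum(row) for row in group]
        return sum(n * (1 + (n - 1) * theta) for n in n_i) / sum(n_i) ** 2

    pi_a, pi_b, pi_all = pooled(group_a), pooled(group_b), pooled(group_a + group_b)
    pearson = sum((pi_a[j] - pi_b[j]) ** 2 / pi_all[j] for j in range(k) if pi_all[j] > 0)
    return pearson / (variance_factor(group_a) + variance_factor(group_b))


def simulated_power(pi_a, pi_b, theta, n_subjects, n_reads, n_sims=200, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0: equal mean proportions.
    Requires 0 < theta < 1 and exactly K = 3 taxa."""
    if len(pi_a) != 3 or len(pi_b) != 3:
        raise ValueError("this sketch hard-codes K = 3 taxa (chi-square, 2 df)")
    rng = random.Random(seed)
    critical = -2 * math.log(alpha)  # upper-alpha quantile of chi-square with 2 df
    rejections = 0
    for _ in range(n_sims):
        a = [draw_dm_counts(rng, pi_a, theta, n_reads) for _ in range(n_subjects)]
        b = [draw_dm_counts(rng, pi_b, theta, n_reads) for _ in range(n_subjects)]
        if two_sample_statistic(a, b, theta) > critical:
            rejections += 1
    return rejections / n_sims
```

For example, simulated_power([0.5, 0.3, 0.2], [0.3, 0.3, 0.4], theta=0.05, n_subjects=10, n_reads=200) estimates the power to detect that shift. Passing the same vector for both groups estimates the test size instead. Looping over n_subjects and n_reads produces a grid shaped like Table 3.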
Object Data Analysis (ODA)
• Apply probability model to graphical (tree) objects
  – Sequence reads map to paths in a tree
  – Samples map to a tree
  $P(G = g_i; g^*, \tau) = c(g^*, \tau) \times \exp\{-\tau \, d(g_i, g^*)\}$
  g* = core microbiome, τ = dispersion, d = distance
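The sketch below makes the graph model concrete under two assumptions of its own:

• Each sample is a set of tree edges, and d is the Hamming distance (the number of edges present in exactly one of the two graphs).
• Every subset of M candidate edges counts as a possible sample, not only valid trees.

Under these assumptions the normalizing constant has a closed form. Each edge either matches g* (contributing 1) or does not (contributing e^{−τ}). Hence Σ_g exp{−τ d(g, g*)} = (1 + e^{−τ})^M, and c(g*, τ) = (1 + e^{−τ})^{−M}. The function names are hypothetical.

```python
import math


def hamming_distance(g, h):
    """Number of edges present in exactly one of the two graphs."""
    return len(set(g) ^ set(h))


def graph_probability(g, core, tau, n_candidate_edges):
    """P(G = g; core, tau) = c(core, tau) * exp(-tau * d(g, core)).

    Assumes a sample may be any subset of n_candidate_edges edges and that
    d is the Hamming distance, so c = (1 + exp(-tau)) ** (-M).
    """
    c = (1 + math.exp(-tau)) ** (-n_candidate_edges)
    return c * math.exp(-tau * hamming_distance(g, core))
```

Larger τ concentrates probability on graphs close to the core; τ = 0 makes every graph equally likely.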
[Figure: example taxonomic tree with node values. Bacteria 2.97; Bacteroidetes 0.99; Firmicutes 1.49; Bacteroidia 0.99; Clostridia 0.53; Bacilli 0.91; Bacteroidales 0.99; Lactobacillales 0.9; Clostridiales 0.53; Enterococcaceae 0.44; Prevotellaceae 0.99; Veillonellaceae 0.53; Prevotella 0.99; Pilibacter 0.41; Megasphaera 0.52]
How do we estimate the core?
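One way to estimate the core g* is maximum likelihood. The derivation below assumes a Hamming-distance version of the ODA model in which a sample can be any subset of M candidate edges, so that c = (1 + e^{−τ})^{−M}. It is one estimator, not necessarily the authors'.

• For n samples, the log-likelihood is −nM log(1 + e^{−τ}) − τD, where D = Σ_i d(g_i, g*).
• Because c does not depend on g*, the maximum-likelihood core minimizes D. For Hamming distance, that is a majority vote over edges.
• Setting the τ-derivative to zero gives e^{−τ} = q/(1 − q), with q = D/(nM).

The function name is hypothetical.

```python
import math


def estimate_core(samples, n_candidate_edges):
    """Maximum-likelihood (core, tau) under a Hamming-distance graph model in
    which each sample is any subset of M = n_candidate_edges edges."""
    n = len(samples)
    counts = {}
    for g in samples:
        for edge in set(g):
            counts[edge] = counts.get(edge, 0) + 1
    # Majority vote: keep an edge if it appears in more than half of the samples;
    # this minimizes the total Hamming distance from the core to the samples.
    core = {edge for edge, c in counts.items() if c > n / 2}
    total_distance = sum(len(set(g) ^ core) for g in samples)
    q = total_distance / (n * n_candidate_edges)  # mean mismatch rate per edge
    # Likelihood equation: exp(-tau) = q / (1 - q); q <= 0.5 here, so tau >= 0.
    tau = math.inf if q == 0 else math.log((1 - q) / q)
    return core, tau
```

For the samples {a, b}, {a, b, c}, {a} with M = 3, the estimated core is {a, b}, and τ = ln 3.5 ≈ 1.25.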
Are Variable Region Cores Equal?
Are Body Site Cores Equal?
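Both questions (variable regions, body sites) ask whether two groups of trees share one core. The sketch below answers that question under its own assumptions; it is not the authors' test.

• Assume a Hamming-distance version of the ODA model in which a sample is any subset of the M observed edges.
• With the core and τ at their maximum-likelihood values (a majority-vote core, and e^{−τ} = q/(1 − q) with q the mean per-edge mismatch rate), the maximized log-likelihood reduces to nM[q log q + (1 − q) log(1 − q)].
• A likelihood-ratio statistic compares one core fitted to the pooled samples against separate cores per group.
• Permuting group labels calibrates the statistic, rather than relying on an asymptotic χ² reference.

The function names are hypothetical.

```python
import math
import random


def max_log_likelihood(samples, m):
    """Maximized log-likelihood of the Hamming-distance graph model
    (majority-vote core, tau at its MLE) for a list of frozensets."""
    n = len(samples)
    counts = {}
    for g in samples:
        for edge in g:
            counts[edge] = counts.get(edge, 0) + 1
    core = {edge for edge, c in counts.items() if c > n / 2}
    # q = mean per-edge mismatch rate between the samples and the core.
    q = sum(len(g ^ core) for g in samples) / (n * m)
    if q == 0:
        return 0.0  # every sample equals the core: probability 1
    return n * m * (q * math.log(q) + (1 - q) * math.log(1 - q))


def core_equality_test(group_a, group_b, n_permutations=999, seed=0):
    """Permutation likelihood-ratio test of H0: both groups share one core.
    Returns (observed statistic, permutation p-value)."""
    a = [frozenset(g) for g in group_a]
    b = [frozenset(g) for g in group_b]
    pooled = a + b
    # Assumption: the candidate edges are the edges seen in any sample.
    m = max(1, len(frozenset().union(*pooled)))

    def lr_stat(x, y):
        # Separate cores for x and y versus a single core for x + y.
        return 2 * (max_log_likelihood(x, m) + max_log_likelihood(y, m)
                    - max_log_likelihood(x + y, m))

    observed = lr_stat(a, b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if lr_stat(shuffled[:len(a)], shuffled[len(a):]) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)
```

Two groups whose samples all share one tree each, but a different tree per group, yield a small p-value. Two groups with identical compositions yield p = 1.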
Why Use Probability Models?
• Parameters simplify interpretation of data (e.g., core defined by central graph)
• Formal hypotheses and P values (e.g., DM t-test and ANOVA analogs)
• Existing statistical machinery (e.g., power calculations for study design)
• All estimates come with error (e.g., confidence intervals)
Two Posters
• Dirichlet-Multinomial Power Calculations and Statistical Tests for Microbiome Data
  – La Rosa, Brooks, Deych, Boone, Edwards, Wang, Sodergren, Weinstock, Shannon
• Statistical Analysis of Taxonomic Trees in Microbiome Research
  – La Rosa, Zhou, Deych, Shands, Sodergren, Weinstock, Shannon