
  1. Statistics in high-content biology
     Rebecca Walls, Advanced Science & Technology Laboratory

  2. Outline
     • Introduction and aim of high-content biology
     • Predicting liver toxicity in vivo
     • Distinguishing distinct modes of compound action
     Rebecca Walls, Non-Clinical Statistics Conference 2008, Leuven

  3. Current issues facing the pharmaceutical industry
     • All pharmaceutical companies face high attrition of compounds through the discovery and development process
     • Two key issues that affect project progression are:
         • Safety and toxicity
         • Efficacy in the disease process
     • We need to know more about the mechanism of action and toxicity of our compounds at an earlier stage in the discovery process
     • More information enables front-loading of risk, early go/no-go decisions and a reduction in toxicology-related attrition

  4. High-content biological assays
     • In vitro cell models are used in an attempt to mimic the complexity of the in vivo situation
     • Advanced imaging techniques are used to generate large, complex datasets describing the response of a population of cells to a compound
     • The aim is to build predictive models, or 'fingerprints', from the multiparametric assay data for well-characterised compounds that elicit known responses
     • The fingerprints are then applied to new drugs to predict their biological mechanism of action and toxicity

  5. Cell culture
     • Cells are extracted from a source tissue, e.g. rat hepatocytes or tumour-derived cell lines
     • Cells are plated into multi-well plates, typically hundreds or thousands of cells per well
     • Each well is like a test tube in which we can test a single prototype drug
     • Cells grown in the well can be labelled and imaged
     [Figure: schematic of a well, showing the media layer above the cells]

  6. HCB (high-content biology) cellular profiling
     • Apoptosis: membrane markers, blebbing, necrosis
     • ER/Golgi: protein trafficking, secretion
     • Nucleus: DNA content, size, shape, cell division, fragmentation, micronuclei, mitotic arrest
     • Cytoskeleton: tubulin, actin, fibre content, length
     • Mitochondria: viability, mass, activity, cellular distribution, pre-apoptotic indicators
     • Cell morphology and general imaging indicators: count, area, form, roundness, length/breadth, perimeter

  7. Statistical challenges
     • The information captured for each feature is a dynamic response to the compound over an 8-point dose range
     • Datasets therefore have a three-dimensional, cube-like structure (compounds × doses × features)
     • Traditional multivariate approaches are difficult to apply directly to this type of data
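One common way to make a compounds × doses × features cube usable by standard multivariate tools is to unfold it into a two-dimensional matrix. The sketch below is illustrative and not taken from the presentation: it uses simulated values with the case study 2 dimensions (102 compounds, 8 doses, 138 features), and the function names `unfold_compound_mode` and `unfold_well_mode` are hypothetical.

```python
import numpy as np

# Dimensions from case study 2; the values themselves are simulated stand-ins.
n_compounds, n_doses, n_features = 102, 8, 138
rng = np.random.default_rng(0)
cube = rng.normal(size=(n_compounds, n_doses, n_features))


def unfold_compound_mode(cube):
    """One row per compound, one column per (dose, feature) pair.

    With C-order reshaping, column d * n_features + f holds dose d, feature f.
    """
    n_c, n_d, n_f = cube.shape
    return cube.reshape(n_c, n_d * n_f)


def unfold_well_mode(cube):
    """One row per (compound, dose) combination, one column per feature."""
    n_c, n_d, n_f = cube.shape
    return cube.reshape(n_c * n_d, n_f)


per_compound = unfold_compound_mode(cube)
per_well = unfold_well_mode(cube)
print(per_compound.shape)  # (102, 1104)
print(per_well.shape)      # (816, 138)
```

Unfolding by compound keeps each compound as a single observation but produces very wide rows (1104 columns here, against only 102 compounds), which illustrates why direct multivariate analysis is awkward; summarising each dose response first, as in the polynomial model of case study 1, is one way round this.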

  8. Case study 1: Predicting liver toxicity in vivo
     • Drug-induced liver toxicity is one of the most common causes of drug non-approval
     • Early in vitro identification of compounds with hepatotoxic risk would allow them to be de-selected early in the drug development process
     [Figure: types of drug-induced liver toxicity (cell death/necrosis, fatty liver/steatosis, cholestasis, phospholipidosis), as seen in the animal and in the lab]

  9. Predicting steatosis: data
     • Primary rat hepatocytes were treated with a 60-compound set, consisting of known steatotics and non-steatotics, at a range of doses
     • Bespoke algorithms were designed to quantify differences in the localisation and morphology of lipid droplets in the cells
     • These generate 32 different continuous measurements per cell
     • Measurements are averaged over the cell population to give well-level values for each compound and dose combination
     • Partial least squares (PLS) modelling (stepwise) is used, with the steatotic annotation as a binary response
     [Figure: example dose-response plots for compounds HT0053, HT1042 and HT1102, doses 0.5 to 10000]
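A minimal sketch of the two modelling steps on this slide, assuming numpy: per-cell measurements are averaged within each well, and a PLS model with a single response (PLS1, fitted here with the NIPALS algorithm) is trained against a 0/1 steatotic annotation. The simulated data, the one-well-per-compound layout, the two-component choice and the function names are hypothetical, and the stepwise variable selection mentioned on the slide is not reproduced.

```python
import numpy as np


def well_means(cell_values, well_ids):
    """Average per-cell measurements (n_cells x n_features) within each well."""
    wells = np.unique(well_ids)
    means = np.vstack([cell_values[well_ids == w].mean(axis=0) for w in wells])
    return wells, means


def pls1_fit(X, y, n_components=2):
    """PLS regression with one response (PLS1), fitted by NIPALS.

    Returns the column means, the response mean and the regression coefficients.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        q = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^{-1} q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


# Simulated example: 60 wells (hypothetically one per compound), 50 cells each,
# 32 measurements per cell; the first 30 compounds are labelled steatotic (1).
rng = np.random.default_rng(1)
n_wells, cells_per_well, n_features = 60, 50, 32
well_ids = np.repeat(np.arange(n_wells), cells_per_well)
y = (np.arange(n_wells) < 30).astype(float)
cells = rng.normal(size=(n_wells * cells_per_well, n_features))
cells[:, 0] += 2.0 * y[well_ids]  # simulated effect on one measurement only

wells, X = well_means(cells, well_ids)
model = pls1_fit(X, y, n_components=2)
scores = pls1_predict(model, X)
accuracy = np.mean((scores > 0.5) == (y == 1))  # in-sample, for illustration only
```

Regressing a 0/1 label in this way is often called PLS discriminant analysis; the fitted scores can be thresholded or used to rank compounds, and in practice the number of components would be chosen by cross-validation.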

  10. Polynomial model
     • Fit a cubic polynomial to the dose-response data for each feature
     • The t-statistics for each term in the cubic form a new set of variables
     • Only a small number of these variables is required to give the greatest predictivity
     • After cross-validation, the polynomial model is approximately 10% better than the range model
     [Figure: fitted cubic curves of 'proportion of edge fat' for a steatotic and a non-steatotic compound (y against x, x from -2 to 2); ROC curves (sensitivity against specificity) for models with 1 variable and 50 variables]
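The polynomial summary can be sketched as an ordinary least-squares cubic fit with a t-statistic for each coefficient. This is an illustration under assumptions: dose is taken as standardised log10 dose (the plotted x-axis runs from about -2 to 2, but the transformation is not stated), the 8 dose values are hypothetical, all four terms including the intercept are kept, and `cubic_t_statistics` and `polynomial_features` are illustrative names. With 8 doses and 4 coefficients, the residual variance has only 4 degrees of freedom.

```python
import numpy as np


def cubic_t_statistics(x, y):
    """OLS fit of y = b0 + b1*x + b2*x^2 + b3*x^3.

    Returns the coefficient estimates and their t-statistics (estimate / standard error).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, N=4, increasing=True)      # columns: 1, x, x^2, x^3
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # residual variance on n - 4 df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


def polynomial_features(responses, x):
    """responses: (n_doses, n_features) well-level values for one compound.

    Returns the four t-statistics for every feature, concatenated into one vector.
    """
    return np.concatenate(
        [cubic_t_statistics(x, responses[:, j])[1] for j in range(responses.shape[1])]
    )


# Hypothetical 8-point dose range, log10-transformed and standardised.
doses = np.array([0.5, 1, 5, 10, 50, 100, 500, 1000])
log_dose = np.log10(doses)
x = (log_dose - log_dose.mean()) / log_dose.std()

rng = np.random.default_rng(2)
y = 0.02 + 0.03 * x + rng.normal(scale=0.002, size=x.size)  # simulated feature
beta, t = cubic_t_statistics(x, y)

compound = rng.normal(size=(8, 32))   # 32 simulated features for one compound
features = polynomial_features(compound, x)
```

Each compound then contributes one row of 4 × 32 = 128 variables regardless of how many doses were tested, which turns the dose dimension of the data cube into a fixed-length summary.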

  11. Advantages of the model
     • Based on their predicted scores, compounds can be ranked in order of steatotic effect
     • Bootstrapping, incorporating random x-resampling, is used to generate 95% confidence intervals for the predicted scores
     • Compounds with a high steatotic effect, predicted with high confidence, can be de-selected
     [Figure: predicted steatotic effect with 95% confidence intervals for each compound]
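The bootstrap intervals can be sketched with case resampling, in which whole (x, y) rows are resampled and the model is refitted each time; this is what "random-x" resampling usually refers to, although the presentation gives no further details. To keep the sketch short, a least-squares linear score stands in for the PLS model, the data are simulated, and the 0.5 de-selection threshold is a hypothetical choice.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear score model with intercept (stand-in for PLS)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def bootstrap_score_intervals(X_train, y_train, X_new, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals for predicted scores.

    Each replicate resamples training compounds (rows of X and y together) with
    replacement, refits the model and predicts X_new.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.empty((n_boot, len(X_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        preds[b] = predict_linear(fit_linear(X_train[idx], y_train[idx]), X_new)
    alpha = 1.0 - level
    lower = np.quantile(preds, alpha / 2, axis=0)
    upper = np.quantile(preds, 1 - alpha / 2, axis=0)
    point = predict_linear(fit_linear(X_train, y_train), X_new)
    return point, lower, upper


# Simulated data: 60 compounds, 3 summary variables, 0/1 steatotic annotation.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(float)

point, lower, upper = bootstrap_score_intervals(X, y, X)
ranking = np.argsort(-point)              # highest predicted steatotic effect first
deselect = np.nonzero(lower > 0.5)[0]     # hypothetical rule: whole interval above 0.5
```

Requiring the whole interval to clear the threshold, rather than just the point estimate, is one way to express "high confidence, high steatotic effect" as a de-selection rule.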

  12. Case study 2: Identifying distinct modes of compound action
     • A morphology high-content assay was developed specifically to examine microtubules and actin filaments as oncology targets
     • It describes how drugs influence the entire complex cellular phenotype (i.e. multiple targets)
     • 102 compounds were screened through the morphology assay
     • The primary aims are to:
         • Identify which compounds are active in the assay, i.e. which are 'hits'
         • Differentiate compound hits that have distinct morphological effects
         • Cluster together hits that have similar effects
     • 138 features for each compound, tested over 8 doses
     • 310 control wells

  13. Principal components analysis
     • PCA was used in an attempt to reduce the dimension of the dataset, yielding 6 principal components that explain close to 80% of the variation
     • The Mahalanobis distance is a powerful means of determining how similar an unknown sample is to a known one
     • It differs from the Euclidean distance in that it takes into account the covariance between variables
     • The Mahalanobis distance of a multivariate vector $x = (x_1, x_2, \ldots, x_p)^T$ from a group of values with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ and covariance matrix $\Sigma$ is defined as
       $$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
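A sketch of both steps on this slide, assuming numpy: PCA via the singular value decomposition, keeping the smallest number of components that reaches a variance target (80% here, matching the slide), and the Mahalanobis distance computed from the formula on this slide. It assumes the features have already been put on comparable scales; the function names and simulated data are illustrative.

```python
import numpy as np


def pca_scores(X, var_target=0.80):
    """PCA via SVD of the centred data.

    Keeps the smallest number of components whose cumulative explained variance
    reaches var_target; returns the scores and the retained variance fractions.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    return Xc @ Vt[:k].T, explained[:k]


def mahalanobis(x, mu, cov):
    """D_M(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)), using a linear solve."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


# Two points at the same Euclidean distance from the mean of a strongly
# correlated pair of variables.
mu = np.zeros(2)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
d_along = mahalanobis([1.0, 1.0], mu, cov)    # roughly 1.03
d_across = mahalanobis([1.0, -1.0], mu, cov)  # roughly 4.47

# Simulated high-dimensional data with a few dominant directions.
rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 3)) * np.array([5.0, 3.0, 2.0])
X = latent @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(300, 20))
scores, explained = pca_scores(X)
```

Both example points lie at Euclidean distance √2 from the mean, but the one that runs against the correlation is more than four times further away in Mahalanobis terms, which is the property that makes the distance useful for judging whether a treated well looks unlike the controls.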

  14. Using the Mahalanobis distance
     • Working with the PCA scores on the 6 principal components, the covariance matrix of the control cloud was calculated
     • For each compound at every dose, the squared Mahalanobis distance to the centre of mass of the control cloud was calculated and compared with a chi-squared distribution with 6 degrees of freedom at a pre-chosen significance level, α
     • An adjustment was made to control the false discovery rate
     • A compound with a significant result at at least one of the doses in its range was deemed to be an 'active hit'
     [Figure: density of squared Mahalanobis distances (0 to 120) for hits and non-hits]
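The hit-calling procedure can be sketched as follows, assuming numpy. The chi-squared upper-tail probability for 6 degrees of freedom is computed from its closed form, which holds for even degrees of freedom, so no statistics library is needed. The presentation does not say which FDR adjustment was used; applying the Benjamini-Hochberg procedure across all compound-dose tests is an assumption, as are q = 0.05 and the simulated data.

```python
import numpy as np


def chi2_sf_df6(x):
    """P(X > x) for a chi-squared variable with 6 df.

    For even df = 2m this equals exp(-x/2) * sum_{i<m} (x/2)^i / i!; here m = 3.
    """
    h = np.asarray(x, dtype=float) / 2.0
    return np.exp(-h) * (1.0 + h + h**2 / 2.0)


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject.reshape(np.shape(pvals))


def call_hits(scores, control_scores, q=0.05):
    """scores: (n_compounds, n_doses, 6) PC scores; control_scores: (n_controls, 6)."""
    mu = control_scores.mean(axis=0)                 # centre of the control cloud
    cov_inv = np.linalg.inv(np.cov(control_scores, rowvar=False))
    d = scores - mu
    d2 = np.einsum("cdi,ij,cdj->cd", d, cov_inv, d)  # squared Mahalanobis distances
    pvals = chi2_sf_df6(d2)
    significant = benjamini_hochberg(pvals, q)
    return significant.any(axis=1), d2, pvals        # hit if any dose is significant


rng = np.random.default_rng(5)
control_scores = rng.normal(size=(310, 6))  # 310 control wells, 6 PC scores
scores = rng.normal(size=(102, 8, 6))       # 102 compounds x 8 doses
scores[:10, 4:, 0] += 8.0                   # simulated strong effect at the top doses
hits, d2, pvals = call_hits(scores, control_scores, q=0.05)
```

Treating the squared distance as chi-squared relies on the PC scores being roughly multivariate normal in the controls and on the control covariance being well estimated; with 310 control wells and 6 dimensions the latter is a reasonable approximation.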

  15. Distinguishing distinct phenotypes
     [Figure: example cell images for buffer and Compounds A to G, illustrating the phenotypes below]
     • Homogeneous nuclei and cell shape
     • Aneuploidy: big nuclei
     • Increased cell size
     • Stabilised cell-cell junctions, resulting in 'clumpy' cells
     • No single cells
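The presentation does not state how hits were clustered. As one possible approach, the sketch below groups hit profiles (for example, each hit's PC scores across all doses, unfolded into one vector) by average-linkage agglomerative clustering on correlation distance. The method, the naive O(n³) implementation, the simulated phenotype prototypes and the number of clusters are all assumptions for illustration.

```python
import numpy as np


def average_linkage_clusters(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Distance between two profiles is 1 - Pearson correlation; returns one
    integer cluster label per row of `profiles`.
    """
    dist = 1.0 - np.corrcoef(profiles)  # rows are profiles
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(profiles), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Simulated hits: three hypothetical phenotypes, five hits each, 48-value profiles
# (for example 6 PC scores x 8 doses).
rng = np.random.default_rng(6)
prototypes = rng.normal(size=(3, 48))
profiles = np.repeat(prototypes, 5, axis=0) + 0.1 * rng.normal(size=(15, 48))
labels = average_linkage_clusters(profiles, n_clusters=3)
```

Correlation distance groups hits by the shape of their response profile rather than its overall magnitude; for larger numbers of hits a library implementation of hierarchical clustering would be preferable to this quadratic-memory, cubic-time loop.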
