CSEP 590B “Summary” Below, as a somewhat unusual “course summary,” I have decided to give the bulk of a research talk I presented in our CompBio seminar last spring, partly because I think the content is interesting, but more to show how deeply “computation” is embedded in modern “bio” research, and to show that many of the themes of the course are directly relevant. Asides emphasizing these connections are highlighted in a sprinkling of boxes like this. Also note that the last ~40 slides of Lecture 9 “CMs” were actually presented in Lecture 10, but conceptually and logistically it was easier to split the slides this way...
April 2010 Goal: To give you a sense of where the “comp” fits in a modern “bio” paper.
Outline: Transcription factors & MyoD; Chromatin Immunoprecipitation (ChIP); ChIP-seq; Computational Methods; Results
Transcription factors & TFBS motif discovery ... [Figure labels: Enhancer, TF, TSS, Gene, Promoter]
Myogenesis. Myoblast: a muscle precursor. Myotube: differentiated skeletal muscle cell. MyoD is the “master regulator”. Other players: Mef2, MyoG, ... [Aside box: Network motifs] [Figure: regulatory diagram linking MyoD, Mef2 and Myogenin]
“Standard Model”. MyoD absent or low in myoblasts. Triggering it in myoblasts (or many other cell types) starts a cascade leading to myotubes. 500-1500 genes show differential expression between myoblasts & myotubes. Expectation: MyoD drives those changes by binding their promoters, plus a few enhancer sites. [Aside box: Again, familiar ground]
Chromatin Immunoprecipitation. [Figure label: Antibody] Readout: qPCR, microarray, or deep seq
MyoD Experimental Design. C2C12 myoblasts and C2C12 myotubes; chromatin IP with anti-Myod antisera; gene-specific QC-PCR; Solexa sequencing. [Aside box: Recall sequencing]
ChIP-seq Sample Prep. Steps: ChIP DNA → end repair → 3’-dA overhang → adapter ligation → size selection (150-250bp) → PCR amplification → load 2 picomoles on the machine. Adaptor total length: 67bp. [Aside box: Recall PCR]
Sequences shown at the adapter ligation step:
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG3’
5’ACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’GTTCGTCTTCTGCCGTATGCTCGAGAAGGCTAG
TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA5’
Sequences shown at the PCR amplification step:
5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’TCTAGCCTTCTCGAGCATACGGCAGAAGACGAAC
Bioanalyzer Analysis. ChIP fragment: mean = 130, range = 60-200. [Aside box: Recall gel electrophoresis]
Analysis & Methods
ChIP-seq Analysis. [Aside box: Latest technology → ~10^9 reads per run] Yields 5-20M “reads” per lane (8 lanes per run, usually 8 different samples). Reads (35-55 bp, depending on run) are mapped back to the mouse reference genome: only one copy of duplicate reads is retained (PCR artifacts?); up to 2 bp of mismatch is tolerated among the first 28 bp; reads not mapping uniquely are discarded. “Extended read”: pretend each is 200 bp. Overlapping extended reads presumably mark binding sites.
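The duplicate-removal, read-extension and overlap steps on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the talk: the read tuple layout `(chrom, start, strand, read_len)`, the half-open coordinates, and the function names `extend_read` and `islands` are all assumptions made for the example, and read mapping itself is taken as already done.

```python
from collections import defaultdict

EXTENDED_LEN = 200  # "pretend each is 200 bp"

def extend_read(start, strand, read_len, ext=EXTENDED_LEN):
    """Return a half-open interval [s, e) covering the read extended to ext bp.

    '+' reads are extended rightward from their start; '-' reads are
    extended leftward from their right end (assumed convention).
    """
    if strand == '+':
        return start, start + ext
    end = start + read_len          # exclusive right end of the mapped read
    return max(0, end - ext), end

def islands(reads, ext=EXTENDED_LEN):
    """Collapse duplicate reads, extend them, and report overlap islands.

    reads: iterable of (chrom, start, strand, read_len) tuples.
    Returns a list of (chrom, island_start, island_end, height), where
    height is the maximum number of overlapping extended reads.
    """
    unique = set(reads)             # keep only one copy of duplicate reads
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, strand, read_len in unique:
        s, e = extend_read(start, strand, read_len, ext)
        deltas[chrom][s] += 1       # coverage rises where a read begins
        deltas[chrom][e] -= 1       # and falls where it ends
    out = []
    for chrom in sorted(deltas):
        depth, height, island_start = 0, 0, None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth > 0 and island_start is None:
                island_start, height = pos, 0
            height = max(height, depth)
            if depth == 0 and island_start is not None:
                out.append((chrom, island_start, pos, height))
                island_start = None
    return out
```

A sweep over start/end events keeps memory proportional to the number of reads rather than the genome length, which matters at millions of reads per lane.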
Identification of Binding Regions. Few reads = short, flat peaks. Many reads = high, sharp peaks.
Myod Locus. [Figure: MyoD signal at the Myod locus; axis label 12] [Aside box, partly illegible: Recall gene prediction, TFBS, ...]
Are the antibodies any good? [Figure: panels A and B; label Myog]
Analysis questions: R e c a r l a l t l i i o k e t l i e h s o t s o How tall must a peak be? , d e t c . Estimate Poisson null model from “islands” of height 1, 2. How likely is height 6? 12?
Results
“Standard Model”. MyoD binding absent or rare in myoblasts. Triggering it in myoblasts (or many other cell types) starts a cascade leading to myotubes. 500-1500 genes show differential expression between blasts & tubes. Expectation: MyoD binds their promoters & drives those changes. [Aside box: Again, familiar ground]
How Many Peaks? As opposed to the 500-1500 genes changed, we find MyoD bound to 25,956 loci in myotubes (at 12-read cutoff; FDR < 10^-6). In myotubes and myoblasts, both > 60,000 at ~0.01 FDR. (Excludes X, Y, repetitive regions.)
Where are peaks? Concentrated at promoters. But 50% are >10 kb from any TSS. Binds 41% of genes. [Aside boxes: much computational analysis; Again, familiar ground; A surprise] [Figure: histogram, y-axis Count]
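A statistic like "50% are more than 10k from any TSS" comes down to a nearest-neighbor lookup between peak positions and annotated transcription start sites. The sketch below is a minimal illustration under simplifying assumptions: a single chromosome, each peak reduced to one coordinate, a non-empty TSS list, and hypothetical function names. It is not the analysis code behind the slide.

```python
import bisect

def distance_to_nearest_tss(peak_pos, sorted_tss):
    """Distance in bp from a peak position to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates.
    """
    i = bisect.bisect_left(sorted_tss, peak_pos)
    # Only the TSS just left and just right of the peak can be nearest.
    candidates = sorted_tss[max(0, i - 1): i + 1]
    return min(abs(peak_pos - t) for t in candidates)

def fraction_distal(peaks, tss, cutoff=10_000):
    """Fraction of peaks lying more than cutoff bp from every TSS."""
    sorted_tss = sorted(tss)
    far = sum(1 for p in peaks
              if distance_to_nearest_tss(p, sorted_tss) > cutoff)
    return far / len(peaks)
```

Sorting the TSS list once and using binary search keeps each lookup logarithmic, which is the usual choice when tens of thousands of peaks are compared against a genome-wide annotation.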
Where are peaks? (continued) [Aside box: much computational analysis]
What’s it doing? [Aside box: much computational analysis]
What else is it doing? [Aside box: much computational analysis]
[Figure: ROC curves, y-axis TP. Panels in order: CTCF domains, AUC=0.70; CTCF domains, AUC=0.77; Promoter regions, AUC=0.64. Legends: Pos: upregulated genes (504 or 384); Neg: non-regulated expressed genes (3789 or 2278) or intergenic (1294).] [Aside box: much computational analysis]
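The AUC values on this slide summarize how well some score separates the positive set from the negative set. As a hedged illustration, AUC can be computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. The slide does not say which score was used; treating it as something like peak height is an assumption here, and the function name is hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive has the higher score, counting ties as one half.
    Quadratic in input size, which is fine for a few thousand genes.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Under this definition 0.5 corresponds to a score with no discriminating power and 1.0 to perfect separation, which is the scale on which values such as 0.64 to 0.77 should be read.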
Binding Site/Cofactor Motifs (see paper). Discriminative motif discovery, on a very large scale. E.g., 3 papers with related approaches appeared in Bioinformatics today.
Summary. MyoD present (& bound) in both myoblasts & myotubes. Binds most genes, not just differentially expressed ones. Significant genome-wide binding. Although differentially bound peaks are associated with changed expression, peak height is a weak predictor of function. Implicated in broad chromatin modifications (histone H4 acetylation). Motif discovery possible (but of limited predictive value in isolation).
Summary (continued). And math, stat, computational analysis is deeply interwoven with everything here...
Thanks for a fun quarter!