The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang - PowerPoint PPT Presentation

The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1

Outline • Epigenome: basics review • ChIP-seq overview • ChIP-seq data analysis 2

Epigenome The epigenome is a multitude of chemical compounds that can tell the genome what to do. The epigenome is made up of chemical compounds and proteins that can attach to DNA and direct such actions as turning genes on or off, controlling the production of proteins in particular cells. -- from genome.gov [Figure labels: histone, nucleosome] 3 Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI)

Epigenomic marks • DNA methylation • Histone marks – Covalent modifications – Histone variants • Chromatin regulators – Histone modifying enzymes – Chromatin remodeling complexes • * Transcription factors 4

Histone modifications • Nucleosome Core Particles • Core Histones: H2A, H2B, H3, H4 Notation: H3K4me3 • Covalent modifications on histone tails include: methylation (me), acetylation (ac), phosphorylation … • Histone variants • Histone modifications are implicated in influencing gene expression. Allis C. et al. Epigenetics. 2006 5

Histone modifications associate with regulation of gene expression [Figure: differential expression, log2 (fold-change), and fractions of enhancers, shown for the marks H2AK5ac, H2AK9ac, H2A.Z, H2BK5ac, H2BK5me1, H2BK12ac, H2BK20ac, H2BK120ac, H3K4ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me1, H3K9me2, H3K9me3, H3K14ac, H3K18ac, H3K23ac, H3K27ac, H3K27me1, H3K27me2, H3K27me3, H3K36ac, H3K36me1, H3K36me3, H3K79me1, H3K79me2, H3K79me3, H3R2me1, H3R2me2, H4K5ac, H4K8ac, H4K12ac, H4K16ac, H4K20me1, H4K20me3, H4K91ac] 6 Wang, Zang et al. Nat Genet 2008

“Functions” of histone marks
Table 3. Distinctive Chromatin Features of Genomic Elements
Functional Annotation                 Histone Marks
Promoters                             H3K4me3
Bivalent/Poised Promoter              H3K4me3/H3K27me3
Transcribed Gene Body                 H3K36me3
Enhancer (both active and poised)     H3K4me1
Poised Developmental Enhancer         H3K4me1/H3K27me3
Active Enhancer                       H3K4me1/H3K27ac
Polycomb Repressed Regions            H3K27me3
Heterochromatin                       H3K9me3
7 Rivera & Ren Cell 2013

H3K4me3/H3K27me3 Bivalent Domain [Figure: a bivalent domain carrying H3K4me3 and H3K27me3, with three labeled outcomes: Repressed, Remained Poised, Induced] From: https://pubs.niaaa.nih.gov/publications/arcr351/77-85.htm 8

ChIP-seq: Profiling epigenomes with sequencing [Figure labels: histone, nucleosome, ATAC-seq] 9 Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI)

Published ChIP-seq datasets are skyrocketing We are entering the Big Data era [Figure: number of ChIP-seq datasets on GEO, y-axis 0 to 3000] Mei et al. Nucleic Acids Research 2016 10

Chromatin ImmunoPrecipitation (ChIP) 11

Protein-DNA crosslinking in vivo (for TF) 12

Chop the chromatin using sonication (TF) or micrococcal nuclease (MNase) digestion (histone) 13

Specific factor-targeting antibody 14

Immunoprecipitation 15

DNA purification 16

PCR amplification and sequencing 17

ChIP-seq data analysis overview
[Figure: genome browser view of a User Supplied Track, hg19 chr19: 15,308,000-15,309,200, scale bar 500 bases]
@ILLUMINA-8879DC:231:KK:3:1:1070:945 1:Y:0:
NNNAATACAGTCAGAAACATATCATATTGGAGAATA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1153:945 1:Y:0:
NNNAAGCACACAGAAGATAACTAAACAATCAAGTAG
####################################
@ILLUMINA-8879DC:231:KK:3:1:1222:945 1:Y:0:
NNNAAGGGTCTTGAGAAGAAATCATTCTGGATGGCA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1304:939 1:Y:0:
NNNCCAGGCTCCCGCGATTCTCCTGCCTCAGCTTCT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1354:945 1:Y:0:
NNNCTCTTCCTTAGCTAAACTTTCAACTAAGCCAAA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1411:932 1:Y:0:
NNNGTAGGACCATTGGCGTTGCGACACAAAAAATTT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1496:937 1:Y:0:
NNNTTCATCGGGTTGAGAGTCCCCTTGTTGCATGCA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1533:939 1:Y:0:
NNNATTTTCCCGTTCCAGGTCGCAATTTCCGCCGTT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1573:940 1:Y:0:
NNNGGGGTGCGCCTTTAGTCCCAGCTACTCAGGAAC
####################################
18

ChIP-seq data analysis overview • Where in the genome do these sequence reads come from? - Sequence alignment and quality control • What does the enrichment of sequences mean? - Peak calling • What can we learn from these data? – Downstream analysis and integration 19

ChIP-seq data analysis: basic processing • alignment of each sequence read: bowtie or BWA – cannot map to the reference genome ✗ – can map to multiple loci in the genome ✗ – can map to a unique location in the genome ✔ • redundancy control ✔ Langmead et al. 2009, Zang et al. 2009 20
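The two filters on this slide (keep only uniquely mapped reads, then control redundancy) can be sketched in a few lines of Python. This is a minimal illustration, not what bowtie, BWA, or any peak caller does internally: the `Alignment` record, its `n_hits` field, and the `max_copies` parameter are hypothetical names introduced here, and a real pipeline would read these values from SAM/BAM records.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """One aligned read. Hypothetical record, not a real SAM parser."""
    chrom: str
    start: int    # 0-based leftmost position of the read
    strand: str   # "+" or "-"
    n_hits: int   # loci reported by the aligner (0 = unmapped)


def keep_unique_nonredundant(reads: Iterable[Alignment],
                             max_copies: int = 1) -> List[Alignment]:
    """Keep uniquely mapped reads, at most max_copies per position and strand."""
    seen: Dict[Tuple[str, int, str], int] = {}
    kept: List[Alignment] = []
    for read in reads:
        # Drop unmapped reads (0 hits) and multi-mapped reads (>1 hit).
        if read.n_hits != 1:
            continue
        # Redundancy control: reads sharing chromosome, start, and strand
        # are treated as copies of the same fragment (assumption).
        key = (read.chrom, read.start, read.strand)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= max_copies:
            kept.append(read)
    return kept
```

Setting `max_copies` above 1 would tolerate a few identical reads per position, which some workflows prefer for very deep libraries; the default of 1 here is only an illustrative choice.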

ChIP-seq data analysis: Peak calling • pile-up profiling • DNA fragment size estimation: peak model, cross-correlation • Peak/signal detection [Figure: two panels labeled d and s; the peak model panel plots the percentage of forward tags and reverse tags against distance to the middle, the other panel is a cross-correlation plot] 21
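For single-end data, fragment size can be estimated from the offset between forward-strand and reverse-strand tags. The sketch below uses a raw (unnormalized) cross-correlation: for each candidate shift it counts how many forward/reverse tag pairs are exactly that far apart, and returns the shift with the highest count. The function name, the `max_shift` default, and the use of 5' end positions as input are assumptions made for illustration; published tools normalize and smooth the curve and handle much larger inputs.

```python
from collections import Counter
from typing import Iterable


def estimate_fragment_size(forward_5p: Iterable[int],
                           reverse_5p: Iterable[int],
                           max_shift: int = 400) -> int:
    """Return the shift d (in bp) that best aligns forward tags with reverse tags.

    forward_5p / reverse_5p are 5' end positions of tags on one chromosome.
    The score for shift d is sum over x of forward(x) * reverse(x + d).
    """
    fwd_counts = Counter(forward_5p)
    rev_counts = Counter(reverse_5p)
    best_shift, best_score = 0, -1
    for d in range(1, max_shift + 1):
        score = sum(n * rev_counts.get(pos + d, 0)
                    for pos, n in fwd_counts.items())
        if score > best_score:  # ties keep the smallest shift
            best_shift, best_score = d, score
    return best_shift
```

The loop is quadratic in spirit (shifts times occupied positions), which is fine for a toy example but would be replaced by a vectorized or FFT-based correlation on whole-genome data.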

ChIP-seq data analysis: Peak calling
• Sharp peaks: transcription factor binding, DNase, ATAC-seq
  – MACS (Zhang, 2008): dynamic background Poisson model
• Broad peaks: histone modifications, “super-enhancers”; diffuse
  – SICER (Zang, 2009): spatial clustering of localized weak signal and integrative Poisson model
[Figure: example tracks labeled NOTCH1 and H3K27ac] 22 Wang, Zang et al. 2014

MACS
• Model-based Analysis of ChIP-Seq
• Tag distribution along the genome ~ Poisson distribution ($\lambda_{BG}$ = total tag count / genome size)
• ChIP-seq shows local biases in the genome
  – Chromatin and sequencing bias
  – 200-300bp control windows have too few tags
  – But can look further
• Dynamic $\lambda_{local} = \max(\lambda_{BG}, [\lambda_{ctrl}, \lambda_{1k},] \lambda_{5k}, \lambda_{10k})$
[Figure: ChIP and Control tracks with 300bp, 1kb, 5kb and 10kb windows around a candidate peak]
http://liulab.dfci.harvard.edu/MACS/ Zhang et al, Genome Bio, 2008
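The dynamic background on this slide can be illustrated with a short sketch: take the largest of the available background rates as the local lambda, then ask how surprising the observed tag count is under a Poisson model with that rate. This is not the MACS source code. The function names are hypothetical, the bracketed terms on the slide are treated as optional inputs, and every lambda is assumed to be already scaled to the expected tag count in one candidate-peak window.

```python
import math
from typing import Optional


def local_lambda(lam_bg: float, lam_5k: float, lam_10k: float,
                 lam_ctrl: Optional[float] = None,
                 lam_1k: Optional[float] = None) -> float:
    """lambda_local = max(lambda_BG, [lambda_ctrl, lambda_1k,] lambda_5k, lambda_10k)."""
    candidates = [lam_bg, lam_5k, lam_10k]
    # Bracketed terms are included only when supplied (assumption).
    candidates += [x for x in (lam_ctrl, lam_1k) if x is not None]
    return max(candidates)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail.

    Adequate for illustration; it loses precision for extremely small p-values.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Usage: p-value of seeing 12 tags where the local background expects at most 3.5
p_value = poisson_sf(12, local_lambda(lam_bg=2.0, lam_5k=3.5, lam_10k=3.0))
```

Taking the maximum makes the test conservative: a region sitting in a locally noisy neighborhood has to beat the noisiest estimate, not the genome-wide average.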

SICER • Spatial-clustering Identification of ChIP-Enriched Regions [Figure: 5kb and 10kb scale labels; five-star rating from omictools.com] 24 Zang et al. Bioinformatics 2009
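The spatial-clustering idea can be sketched as follows: split the genome into fixed windows, mark windows with enough tags as eligible, and merge eligible windows separated by only a few ineligible ones into a single island. This is a simplified illustration rather than the SICER implementation, which additionally scores islands with a Poisson-based statistic; `find_islands`, `min_count`, and `max_gap_windows` are hypothetical names chosen for this example.

```python
from typing import List, Sequence, Tuple


def find_islands(window_counts: Sequence[int],
                 min_count: int,
                 window_size: int = 200,
                 max_gap_windows: int = 2) -> List[Tuple[int, int, int]]:
    """Cluster eligible windows into islands.

    Returns (start_bp, end_bp, tags_in_eligible_windows) per island.
    Tags that fall in the ineligible gap windows are not counted.
    """
    islands: List[List[int]] = []  # each entry: [first_idx, last_idx, tag_total]
    current = None
    for idx, count in enumerate(window_counts):
        if count < min_count:
            continue  # ineligible window
        gap = idx - current[1] - 1 if current is not None else None
        if current is not None and gap <= max_gap_windows:
            current[1] = idx          # extend the open island across the gap
            current[2] += count
        else:
            if current is not None:
                islands.append(current)
            current = [idx, idx, count]
    if current is not None:
        islands.append(current)
    return [(first * window_size, (last + 1) * window_size, total)
            for first, last, total in islands]
```

Allowing gaps is what lets diffuse histone-mark domains come out as one broad region instead of many fragmented short peaks.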

ChIP-seq peak calling: Parameters
Parameter                            Remarks
Genome                               Species and reference genome version, e.g. hg38, hg18, mm10, mm9
Effective genome rate                Fraction of the genome that is mappable; varies with species, read length, etc.
DNA fragment size                    Estimated by default; can be specified otherwise
Window size                          Data resolution, usually the nucleosome periodicity length, i.e. 200bp
Gap size                             (for SICER only) Allowable gaps between eligible windows, usually 2 or 3 windows
P-value cut-off                      Threshold for peak calling, from the model
False discovery rate (FDR) cut-off   Threshold for peak calling, BH correction from the p-value
25
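The FDR cut-off row refers to the Benjamini-Hochberg (BH) correction. A minimal sketch of that adjustment is shown below, assuming one p-value per candidate peak; the function name is introduced here for illustration, and peak callers may compute their reported q-values differently.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Usage: keep candidate peaks whose q-value passes a hypothetical 5% FDR cut-off
p = [0.01, 0.04, 0.03, 0.005]
passing = [i for i, q in enumerate(benjamini_hochberg(p)) if q <= 0.05]
```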

ChIP-seq data analysis: Review
1. Read mapping (sequence alignment)
2. Peak calling: MACS or SICER
   1. QC
   2. DNA fragment size estimation (for single-end)
   3. Pile-up profile generation
   4. Peak/signal detection
3. Downstream analysis/integration
26

Data formats
• fastq: raw sequences
• BED:
chr11 10344210 10344260 255 0 -
chr4 76649430 76649480 255 0 +
chr3 77858754 77858804 255 0 +
chr16 62688333 62688383 255 0 +
chr22 33031123 33031173 255 0 -
• SAM/BAM: aligned sequencing reads
• bedGraph, Wig, bigWig: pile-up profiles for browser visualization
27
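To make the relationship between these formats concrete, the sketch below parses six-column BED read lines like the ones on this slide and turns them into bedGraph-style pile-up rows by extending each read to an assumed fragment length in its strand direction. Both BED and bedGraph use 0-based, half-open coordinates. The function names and the `fragment_size` default of 150 are assumptions for illustration; columns 4 and 5 are ignored here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def parse_bed_line(line: str) -> Read:
    """Parse one whitespace-separated BED line with at least six columns."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2]), fields[5]


def pileup_bedgraph(reads: Iterable[Read],
                    fragment_size: int = 150) -> List[Tuple[str, int, int, int]]:
    """Extend reads to fragment_size and return (chrom, start, end, depth) rows."""
    events: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, strand in reads:
        if strand == "+":
            frag_start, frag_end = start, start + fragment_size
        else:
            frag_start, frag_end = max(0, end - fragment_size), end
        events[chrom][frag_start] += 1   # coverage rises at fragment start
        events[chrom][frag_end] -= 1     # and falls at fragment end
    rows = []
    for chrom in sorted(events):
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            delta = events[chrom][pos]
            if delta == 0:
                continue                 # no change in depth at this position
            if depth > 0 and prev is not None:
                rows.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return rows
```

Regions with zero coverage are simply omitted from the output, which is permitted in bedGraph and keeps the file small.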

Data flow
Raw sequence reads • fastq
  ↓ Bowtie/BWA, reference genome
Aligned reads • BAM/BED
  ↓ MACS/SICER
Profile • bedGraph/Wig/bigWig; Peaks • BED
28

Galaxy: web-interface analysis platform • https://usegalaxy.org/ 29

Run MACS on Cistrome, a Galaxy-based platform • http://cistrome.org/ap/ 30

Run SICER on Galaxy-based platforms • http://services.cbib.u-bordeaux.fr/galaxy/ 31

ChIP-seq: Downstream analysis • Data visualization – UCSC genome browser: http://genome.ucsc.edu/ – WashU epigenome browser: http://epigenomegateway.wustl.edu/ – IGV: http://software.broadinstitute.org/software/igv/ • Meta analysis – CEAS: http://liulab.dfci.harvard.edu/CEAS/ • Integration with gene expression – BETA: http://cistrome.org/BETA/ – MARGE: http://cistrome.org/MARGE/ • Integration with other epigenomic data – GREAT: http://great.stanford.edu – ENCODE SCREEN: http://screen.umassmed.edu/ – MANCIE: https://cran.r-project.org/package=MANCIE – Cistrome DB: http://cistrome.org/db/ 32

BETA: Binding Expression Target Analysis • Regulatory Potential $P(g_i) = \sum_{j \in S(i)} \exp\left(-\frac{\Delta_{ij}}{\lambda}\right)$ [Figure labels: TSS, i, j] 33
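A regulatory potential of this kind can be sketched as a distance-weighted sum over binding sites near a gene. The example below assumes the form P(g_i) = sum over sites j in S(i) of exp(-Δ_ij / λ), where Δ_ij is read as the distance between site j and the transcription start site (TSS) of gene i, and S(i) as the sites within a fixed distance of that TSS. The `decay` and `max_distance` values are placeholders chosen for illustration, not parameters taken from the BETA software.

```python
import math
from typing import Iterable


def regulatory_potential(tss: int,
                         site_positions: Iterable[int],
                         decay: float = 10_000.0,
                         max_distance: int = 100_000) -> float:
    """Sum exp(-distance / decay) over binding sites near one gene's TSS.

    decay plays the role of lambda; max_distance defines the site set S(i).
    Both defaults are assumptions for this sketch.
    """
    total = 0.0
    for pos in site_positions:
        delta = abs(pos - tss)
        if delta <= max_distance:
            total += math.exp(-delta / decay)
    return total


# Usage: three peaks near a gene whose TSS is at position 500,000
score = regulatory_potential(500_000, [499_000, 520_000, 900_000])
```

Sites close to the TSS contribute almost a full unit each, while distant sites decay toward zero, so a gene with several nearby peaks scores higher than one with a single distal peak.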

MARGE: A big data driven, integrative regression and semi-supervised approach for predicting functional enhancers [Figure: workflow with stages labeled sample selection, samples, and enhancer prediction] 34 Wang, Zang et al. Genome Res 2016

ENCODE https://www.encodeproject.org/ 35

Cistrome Data Browser http://cistrome.org/db/ 36
