Epigenomic enrichment analysis using Bioconductor
EuroBioc 2019, Brussels
Dario Righelli, PhD
Istituto per le Applicazioni del Calcolo «M. Picone», CNR, Napoli
d.righelli@na.iac.cnr.it || dario.righelli@gmail.com
drighelli
What’s the aim?
Compare methods and provide guidelines on epigenomic data analysis
ATAC-seq dataset
o Before fear induction: condition E0, 4 biological replicates
o After fear induction: condition E1, 4 biological replicates
o Goal: catch differences in open chromatin regions
Yijing Su et al. 2017, Nature Neuroscience: Neuronal activity modifies the chromatin accessibility landscape in the adult brain
ChIP-seq dataset (NULL dataset)
o Home cage controls: histone H3 lysine 9 acetylation (H3K9ac), 9 biological replicates
o How many random differences are we able to catch inside a control dataset?
BWA and Bowtie2 perform the same
o The most used aligners for epigenomic data
o Correlation computed on ChIP-seq data coverages
o Used the deepTools plotCorrelation tool
o Correlations between the coverages of the same samples aligned with BWA and with Bowtie2 are equal to 1
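A minimal sketch of this comparison. It assumes each sample's coverage has already been summarized into per-bin read counts, as deepTools multiBamSummary does before plotCorrelation, which can then compute Pearson or Spearman correlations. The code computes a plain Pearson correlation in pure Python. The bin counts are made-up values for illustration, not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant coverage")
    return cov / (sd_x * sd_y)


# Hypothetical per-bin read counts for one sample aligned with each tool.
bwa_bins = [12, 0, 5, 40, 33, 7, 0, 19]
bowtie2_bins = [12, 0, 5, 40, 33, 7, 0, 19]
print(round(pearson(bwa_bins, bowtie2_bins), 6))  # 1.0
```

Identical coverage vectors, or vectors that differ only by a positive scale factor, give a correlation of 1. This matches the result reported for the two aligners.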
A Bioconductor Approach
o MACS2 (not in Bioconductor)
  o Most used peak caller
  o Broad and narrow peak options
o DEScan2
  o Has a peak detector in R
  o Peak resolution -> bin size
  o Can work with external peaks
o DiffBind
  o No peak detection
  o Fast on matrix construction
  o Uses external peaks
o CSAW
  o Starts from BAM files
  o Computes a matrix of bins x samples
o edgeR
  o Widely used method
  o Very flexible in usage
Figure: workflow with Peak Callers (DEScan2, MACS2 broad/narrow, CSAW) -> Peak Consensus & Matrices (DEScan2, DiffBind, CSAW) -> Differential Enrichment (edgeR)
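The sketch below shows CSAW's starting point, a bins x samples count matrix. It is a simplified stand-in, not csaw code. It assumes reads have already been reduced to (chromosome, position) pairs and uses non-overlapping fixed-width bins. By contrast, csaw's windowCounts() reads BAM files directly and has its own windowing options. Sample names and positions are hypothetical.

```python
from collections import defaultdict


def bin_count_matrix(reads_by_sample, bin_width):
    """Count reads into fixed-width, non-overlapping bins for each sample.

    reads_by_sample maps a sample name to a list of (chrom, pos) pairs,
    with pos a 0-based read position. Returns (samples, bins, matrix):
    bins is a sorted list of (chrom, bin_start) and matrix[i][j] is the
    number of reads of samples[j] falling in bins[i]. Bins with no reads
    in any sample are omitted.
    """
    samples = list(reads_by_sample)
    counts = defaultdict(lambda: [0] * len(samples))
    for j, sample in enumerate(samples):
        for chrom, pos in reads_by_sample[sample]:
            bin_start = (pos // bin_width) * bin_width
            counts[(chrom, bin_start)][j] += 1
    bins = sorted(counts)
    return samples, bins, [counts[b] for b in bins]


reads = {
    "E0_1": [("chr1", 10), ("chr1", 120), ("chr1", 150)],
    "E1_1": [("chr1", 130), ("chr2", 5)],
}
samples, bins, matrix = bin_count_matrix(reads, bin_width=100)
for b, row in zip(bins, matrix):
    print(b, row)
# ('chr1', 0) [1, 0]
# ('chr1', 100) [2, 1]
# ('chr2', 0) [0, 1]
```

In the workflow on this slide, a count matrix like this is what goes on to edgeR for differential enrichment testing.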
Count Normalization Affects Differentially Accessible Regions (DARs)
ATAC-seq dataset
o Pay attention to the normalization process
o We tried applying a classic RNA-seq normalization
o The process does not always give consistent results
o A normalization more specific to this kind of data may be required
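The sketch below shows why the choice of normalization matters. It compares two common RNA-seq-style scalings on the same toy count matrix: plain library-size scaling and DESeq-style median-of-ratios size factors. These are swapped-in illustrations, not necessarily the normalizations used in this study, and the counts are invented. One very strong region in sample 2 inflates that sample's library size almost fivefold. The median-of-ratios factors stay equal because most regions are unchanged.

```python
import math
from statistics import median


def library_size_factors(counts):
    """Total counts per sample (counts: rows = regions, columns = samples)."""
    return [sum(col) for col in zip(*counts)]


def median_of_ratios_factors(counts):
    """DESeq-style size factors: median ratio of each sample to the
    per-region geometric mean, using only regions with no zero counts."""
    n_samples = len(counts[0])
    rows = [r for r in counts if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("every region has at least one zero count")
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(r[j]) - g for r, g in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy regions x samples matrix: three unchanged regions, one huge peak in sample 2.
counts = [[10, 10], [20, 20], [30, 30], [1000, 5000]]
print(library_size_factors(counts))      # [1060, 5060]
print(median_of_ratios_factors(counts))  # [1.0, 1.0]
```

Dividing by these two sets of factors would lead to very different normalized counts, and therefore to different sets of DARs.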
Comparing DARs across methods
ATAC-seq dataset
o All the methods share the biggest overlap of detected peaks
o CSAW and DiffBind show a large number of non-overlapping regions
o DEScan2 shows the lowest number of non-overlapping regions
o The large number of non-overlapping regions found by CSAW and DiffBind suggests a possibly high level of false-positive detections
o Ad-hoc UpSet plot on GRanges
o Based on the findOverlaps method
Figure: UpSet plot "ATAC-seq DARs"; sets: DEScan_Z10_K4_DARs, DiffBindNarrow, CSAW, DiffBindBroad (y-axis: Intersection Size, from 282 to 16652; x-axis: Set Size)
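The UpSet plot above was built on GRanges with findOverlaps. The Python sketch below shows one way to derive such intersection counts, assuming half-open (chrom, start, end) intervals. It first merges all regions into a union, similar in spirit to GenomicRanges reduce(). It then records which methods overlap each union region. This is not the author's ad-hoc implementation, and the regions are hypothetical. The naive overlap loop is quadratic; real tools use interval trees or sorted sweeps.

```python
from collections import Counter


def merge_intervals(intervals):
    """Merge overlapping half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def upset_counts(region_sets):
    """Map each combination of methods to the number of union regions
    overlapped by exactly that combination (the UpSet bar heights)."""
    union = merge_intervals([iv for ivs in region_sets.values() for iv in ivs])
    counts = Counter()
    for region in union:
        members = frozenset(
            name for name, ivs in region_sets.items()
            if any(overlaps(region, iv) for iv in ivs)
        )
        counts[members] += 1
    return counts


# Hypothetical DARs from three methods.
dars = {
    "DEScan2": [("chr1", 100, 200), ("chr1", 500, 600)],
    "DiffBind": [("chr1", 150, 250), ("chr2", 10, 50)],
    "CSAW": [("chr1", 180, 220), ("chr2", 300, 400)],
}
for members, n in upset_counts(dars).items():
    print(sorted(members), n)
# ['CSAW', 'DEScan2', 'DiffBind'] 1
# ['DEScan2'] 1
# ['DiffBind'] 1
# ['CSAW'] 1
```

A region found by only one method shows up as a single-set bar. These are the "non-overlapping regions" discussed on the slide.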
Peak contrasts on the NULL dataset show no results
H3K9ac ChIP-seq dataset
o Compared performance on a null dataset of H3K9ac ChIP-seq samples
o Performed 126 permutations of the samples
o Samples are randomly divided into two groups
o All the possible splits of the 9 samples (126)
o All the methods mostly find 0 differentially enriched peaks between the two random conditions
o Sometimes some differences were found
o With and without normalization
Figure: number of differentially enriched peaks (nElem, 0 to 8) per method, with and without normalization (normalized: NO/YES)
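The figure of 126 matches the number of ways to choose 4 of the 9 samples for one group, with the remaining 5 forming the other: C(9, 4) = 126. A 4 vs 5 split is the only choice of group sizes that gives exactly 126, so the sketch assumes it. The sample names are hypothetical. Each split would then go through the differential enrichment pipeline. Because the groups are random, any peak called significant is a false positive by construction.

```python
from itertools import combinations


def two_group_splits(samples, group_size):
    """Yield every split of samples into a group of group_size and the rest."""
    for group_a in combinations(samples, group_size):
        group_b = tuple(s for s in samples if s not in group_a)
        yield group_a, group_b


# Hypothetical names for the nine H3K9ac home-cage control samples.
samples = [f"H3K9ac_{i}" for i in range(1, 10)]
splits = list(two_group_splits(samples, 4))
print(len(splits))  # 126
```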
What’s Next?
Ongoing and future work
Some comparisons are still needed
o Compare CSAW on ChIP-seq
o Compare normalization methods across all the epigenomic methods
o Explore in silico the biological functions of the results
o Test on a single-cell ATAC-seq dataset
Acknowledgements
• Dr. Claudia Angelini, Istituto per le Applicazioni del Calcolo, CNR
• Dr. Davide Risso, University of Padua
• Dr. Lucia Peixoto, Elson S. Floyd College of Medicine, Washington State University
• Dr. Timothy Triche Jr., Van Andel Research Institute
• Dr. Ben Johnson, Van Andel Research Institute
Thank you for your attention!
Napoli R/Bioconductor Meetup
o Since Nov 2018
o R Consortium Array Group
o At least 25 people at every event, with a good turnover of attendees
o Eight meetups so far
  o R Package Creation
  o scRNA-seq Analysis
  o Differentially Methylated Regions Analysis
  o Microscope Image Processing
  o Chromosomal Copy Number Changes Detection
  o Bulk RNA-seq Differential Expression
  o Hi-C data analysis using HiCeekR
  o Metagenomics analysis workflow
https://www.facebook.com/pg/NapoliRBiocMeetup
http://lists.moo.gs/mailman/listinfo/biocmeetup.naples
napoli.r.bioc@gmail.com
Napoli R/Bioconductor Meetup
• Part of a wider idea
• Third city in the world
  • Boston (USA)
  • New York (USA)
  • Napoli (IT)
• Useful to
  • share ideas and workflows
  • create new collaborations
  • extend the bioinfo community
Is there a best aligner? Bowtie2 vs BWA
Comparing DARs across methods (2)
ATAC-seq dataset
o Ad-hoc UpSet plot on GRanges
o Based on the findOverlaps method
o Results description
Figure: UpSet plot "ATAC-seq Regions Narrow/Broad & DEScan2"; sets: DiffBindMACSNarrow, DEScanZ10K4, DiffBindMACSBroad, DEScanMACSBroad, DEScanMACSNarrow (y-axis: Intersection Size, largest intersections 73600 and 60794; x-axis: Set Size)
Duplicate Removal Doesn’t Impact Peak Detection
o The diagonals of the correlation matrices on the counts show no big differences between samples with and without duplicates
o Duplicates removed with samtools rmdup
o DEScan2 count matrices
o DiffBind count matrices
Figure: DEScan2 and DiffBind correlation heatmaps (scale -1 to 1) between withDup_E0_1 … withDup_E1_4 and noDup_E0_1 … noDup_E1_4; bar chart "Final Peaks with/without Duplicates" (0 to 40000) for Dup_DEScan2, noDup_DEScan2, Dup_DiffBind, noDup_DiffBind
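A simplified Python sketch of positional duplicate removal for single-end reads, in the spirit of samtools rmdup. Reads that share chromosome, strand and 5' end count as duplicates, and only the one with the highest mapping quality is kept. The sketch assumes reads are already parsed into a hypothetical Read record. The real tool also handles paired-end reads and has its own tie-breaking rules.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned single-end read."""
    name: str
    chrom: str
    start: int   # 0-based leftmost aligned base
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    mapq: int


def five_prime_end(read):
    """5' end of a read: leftmost base on +, rightmost base on -."""
    return read.start if read.strand == "+" else read.end - 1


def remove_duplicates(reads):
    """Keep one read per (chrom, 5' end, strand), preferring the highest MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, five_prime_end(read), read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.name))


reads = [
    Read("r1", "chr1", 100, 150, "+", 30),
    Read("r2", "chr1", 100, 148, "+", 40),  # same 5' end as r1, higher MAPQ
    Read("r3", "chr1", 60, 100, "-", 20),
    Read("r4", "chr1", 70, 100, "-", 25),   # same 5' end (99) as r3, higher MAPQ
    Read("r5", "chr1", 100, 150, "-", 10),
]
print([r.name for r in remove_duplicates(reads)])  # ['r4', 'r2', 'r5']
```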
DEScan2: Differential Enriched Scan 2
• Filters out peaks with a score lower than a user-defined threshold
• Aligns the peaks across samples, keeping those found in at least a user-defined number of samples
• Different thresholds produce different trends in the number of final peaks detected
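A simplified Python sketch of the two steps above. It filters each sample's peaks by score, merges the surviving peaks across samples, and keeps merged regions supported by a minimum number of samples. This is not DEScan2's implementation. The scores and regions are hypothetical, and reading labels such as DEScan_Z10_K4 as "threshold 10, at least 4 samples" is an assumption. Raising the threshold from 10 to 13 empties the result, which mirrors how different thresholds change the number of final peaks.

```python
def consensus_regions(peaks_by_sample, min_score, min_samples):
    """Simplified consensus: drop low-score peaks, merge the rest, and keep
    merged regions supported by peaks in at least min_samples samples.

    peaks_by_sample maps a sample name to (chrom, start, end, score) peaks
    with half-open coordinates. Returns (chrom, start, end, n_samples).
    """
    kept = {
        sample: [(c, s, e) for c, s, e, score in sample_peaks if score >= min_score]
        for sample, sample_peaks in peaks_by_sample.items()
    }
    # Merge all surviving peaks into candidate regions.
    merged = []
    for chrom, start, end in sorted(iv for ivs in kept.values() for iv in ivs):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    # Count how many samples have at least one peak in each region.
    result = []
    for chrom, start, end in merged:
        n_samples = sum(
            1 for ivs in kept.values()
            if any(c == chrom and s < end and start < e for c, s, e in ivs)
        )
        if n_samples >= min_samples:
            result.append((chrom, start, end, n_samples))
    return result


peaks = {
    "E0_1": [("chr1", 100, 200, 12.0), ("chr1", 400, 450, 3.0)],
    "E0_2": [("chr1", 150, 260, 15.5)],
    "E1_1": [("chr1", 120, 180, 11.0), ("chr2", 50, 90, 20.0)],
}
print(consensus_regions(peaks, min_score=10, min_samples=2))  # [('chr1', 100, 260, 3)]
print(consensus_regions(peaks, min_score=13, min_samples=2))  # []
```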