CHiCAGO: Statistical methodology for signal detection in Capture Hi-C data
Jonathan Cairns (jonathan.cairns@babraham.ac.uk, @jonathancairns)
Fraser/Spivakov labs, Babraham Institute
4th October 2016
Table of Contents
1 Introduction
2 The CHiCAGO model
3 Results
J. Cairns (Babraham Institute) regulatorygenomicsgroup.org/chicago 2 / 20
1 Introduction
Motivation
CHi-C: improved resolution at promoters compared with Hi-C (Lieberman-Aiden et al 2009)
Approx. 12-fold increase in read coverage
Schönfelder et al (2015), Mifsud et al (2015), Sahlén et al (2015)
The data
Align reads and filter out artefacts with HiCUP (Wingett et al 2016).
Obtain counts $X_{ij}$ for each other end $i$ (823,000 other ends) and bait $j$ (22,000 baits).
[Table: excerpt of the count matrix, baits (j) by other ends (i); the example entries are small integers from 0 to 10.]
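As a rough illustration of what the count matrix holds, here is a minimal Python sketch that tallies $X_{ij}$ from read pairs that have already been aligned, filtered and assigned to fragments. The input format (a list of (other end, bait) ID pairs) and the function name `count_matrix` are hypothetical. A dictionary of nonzero counts is used because a dense 823,000 × 22,000 array would hold mostly zeros.

```python
from collections import Counter


def count_matrix(read_pairs):
    """Tally X[(i, j)], the number of read pairs linking other end i to bait j.

    read_pairs: iterable of (other_end_id, bait_id) tuples (hypothetical format),
    assumed to be already aligned, filtered and assigned to fragments.
    Only nonzero entries are stored; missing keys read as 0.
    """
    return Counter((i, j) for i, j in read_pairs)


# Usage with toy IDs: other end 10 is seen twice with bait 1.
pairs = [(10, 1), (10, 1), (11, 1), (10, 2)]
X = count_matrix(pairs)
print(X[(10, 1)], X[(11, 1)], X[(99, 1)])  # 2 1 0
```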
[Figure: read count N against distance from viewpoint (-600 kb to +600 kb) for two baits. MIR625 (MIR625-201; 224546): no interaction. PPP1CB (PPP1CB-004, PPP1CB-006, PPP1CB-005, PPP1CB-003, PPP1CB-001, PPP1CB-009, ...; 340147): interaction.]
2 The CHiCAGO model
CHiCAGO: Capture Hi-C Analysis of Genomic Organization
Model
Background comes from two sources:

                       Brownian             Technical
Source                 Random collisions    Sequencing artefacts
Depends on distance?   Yes (decreasing)     No
Dominates              Close to bait        Far from bait

Under $H_0$ (no interaction), counts are the sum of the two components: $X_{ij} = B_{ij} + T_{ij}$.
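To make the null model concrete, here is a minimal Python sketch of the upper-tail probability $P(X_{ij} \ge x)$ under $H_0$, obtained by convolving the two background components. Assumptions not stated on these slides: $B_{ij}$ is negative binomial with mean `mu_b` and dispersion `size`, $T_{ij}$ is Poisson with mean `lam_t`, and the two are independent (a negative binomial plus an independent Poisson follows a Delaporte distribution). All function names and parameter values are hypothetical.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial P(B = k) with mean mu > 0 and dispersion size > 0."""
    log_p = math.log1p(-mu / (size + mu))       # log(size / (size + mu))
    log_q = math.log(mu) - math.log(size + mu)  # log(mu / (size + mu))
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * log_p + k * log_q)


def poisson_pmf(k, lam):
    """Poisson P(T = k) with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def null_pmf(x, mu_b, size, lam_t):
    """P(X = x) for X = B + T under H0, with B and T independent."""
    return sum(nb_pmf(k, mu_b, size) * poisson_pmf(x - k, lam_t)
               for k in range(x + 1))


def null_pvalue(x, mu_b, size, lam_t):
    """Upper tail P(X >= x) under H0: one minus the CDF up to x - 1."""
    cdf = sum(null_pmf(y, mu_b, size, lam_t) for y in range(x))
    return max(0.0, 1.0 - cdf)


# Usage with illustrative parameter values (not estimates from real data):
print(null_pvalue(15, mu_b=3.0, size=2.0, lam_t=0.5))
```

The direct summation keeps the sketch readable; for very large counts a numerically safer tail computation would be preferable.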
Brownian background estimation
$X_{ij} = B_{ij} + T_{ij}$, with $B_{ij} \sim \mathrm{NB}$ and
$E(B_{ij}) = f(d_{ij}) \times (\text{bait bias})_j \times (\text{other end bias})_i$,
where $d_{ij}$ is the distance between bait $j$ and other end $i$.
[Figure: expected count (0 to 50) against distance from bait (-500 kb to +500 kb), panels labelled 250, 1177, 5348, 5373 and 5382.]
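The mean of the Brownian component is a product of three factors. A minimal Python sketch, with a made-up decreasing power law `toy_f` standing in for the estimated distance function; the function names are hypothetical, and folding signed offsets from the bait (as on the plot axis) into absolute distances is our assumption.

```python
def toy_f(d):
    """Hypothetical distance function: a power-law decay, for illustration only."""
    return 50.0 * (d / 10_000) ** -1.0


def expected_brownian(d, f, bait_bias, other_end_bias):
    """E(B_ij) = f(d_ij) * (bait bias)_j * (other end bias)_i.

    d is the signed offset of other end i from bait j; it is folded to an
    absolute distance (an assumption) before f is applied.
    """
    return f(abs(d)) * bait_bias * other_end_bias


def expected_profile(distances, f, bait_bias, other_end_biases):
    """Expected Brownian counts for every other end i of a single bait j."""
    return [expected_brownian(d, f, bait_bias, s_i)
            for d, s_i in zip(distances, other_end_biases)]


# Usage: a bait with bias 2.0 and two other ends at -20 kb and +10 kb.
print(expected_profile([-20_000, 10_000], toy_f, 2.0, [0.5, 1.0]))  # [25.0, 100.0]
```

Because the factors multiply, doubling a bait's bias doubles the expected background at every distance from that bait.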
Distance function $f(d)$:
estimated close to the bait ($< 1.5$ Mb) in 20 kb bins;
bin-wise estimates $f(d_b)$ are taken from the geometric mean across baits;
interpolation: cubic fit on the log-log scale.
[Figure: $\log f(d)$ (-0.5 to 2.5) against $\log(\text{distance})$ (10 to 14).]
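The three estimation steps for $f(d)$ can be sketched in Python with NumPy. This is an illustration under our own assumptions, not the CHiCAGO implementation: the input format and function names are hypothetical, the per-bait value in a bin is taken as the mean count over that bait's other ends in the bin, zero values are skipped before the geometric mean (log 0 is undefined), and the cubic is fitted with `numpy.polyfit` against log bin midpoints.

```python
import math
from collections import defaultdict

import numpy as np

BIN_SIZE = 20_000       # 20 kb distance bins
MAX_DIST = 1_500_000    # estimate f(d) only close to the bait (< 1.5 Mb)


def binwise_f(per_bait_counts):
    """Estimate f(d_b) for each distance bin b as a geometric mean across baits.

    per_bait_counts: {bait_id: [(distance, count), ...]} (hypothetical format).
    The per-bait value for a bin is the mean count over that bait's other ends
    in the bin; zero values are skipped because log(0) is undefined.
    """
    per_bin = defaultdict(list)  # bin index -> per-bait values
    for pairs in per_bait_counts.values():
        sums, ns = defaultdict(float), defaultdict(int)
        for d, count in pairs:
            d = abs(d)
            if d >= MAX_DIST:
                continue  # outside the fitting range
            b = int(d // BIN_SIZE)
            sums[b] += count
            ns[b] += 1
        for b, total in sums.items():
            value = total / ns[b]
            if value > 0:
                per_bin[b].append(value)
    # Geometric mean = exp(mean of logs)
    return {b: math.exp(sum(math.log(v) for v in vals) / len(vals))
            for b, vals in per_bin.items()}


def fit_distance_function(fb):
    """Cubic fit of log f(d_b) on log d_b (bin midpoints); returns d -> f(d)."""
    bins = sorted(fb)
    log_d = np.log([(b + 0.5) * BIN_SIZE for b in bins])
    log_f = np.log([fb[b] for b in bins])
    coefs = np.polyfit(log_d, log_f, 3)
    return lambda d: float(np.exp(np.polyval(coefs, np.log(d))))


# Usage with synthetic counts that decay as 1/d for two baits.
mids = [(b + 0.5) * BIN_SIZE for b in range(75)]
toy = {"bait_A": [(d, 1e6 / d) for d in mids],
       "bait_B": [(d, 4e6 / d) for d in mids]}
f = fit_distance_function(binwise_f(toy))
print(f(50_000))
```

Fitting on the log-log scale turns a power-law decay into a straight line, so the cubic only has to capture departures from it; values beyond 1.5 Mb would be extrapolations.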
Bait-specific bias: