
Exome Sequencing in the Rotterdam Study, Jeroen van Rooij



  1. Characterization of exome sequencing data in population studies: Exome Sequencing in the Rotterdam Study. Jeroen van Rooij, PhD student, Department of Internal Medicine, Department of Neurology. SNPs and Human Diseases (15-11-2017)

  2. ERGO: The Rotterdam Study. Genome-Wide Association Studies

  3. Exome Sequencing of 3000 samples in the Rotterdam Study. Funded by local and European grants (NGI-NCHA, NWO, BBMRI). n = 3,000 samples from RS-I. Sequencing July 2011-2014; Nimblegen v2. 3,000 RS-I samples were genotyped with Illumina’s Exome Array (overlap ~1,500). Additional 1,000 ADSP samples. • CCDS (Sept 2009) • miRBase (v14, Sept 2009) • RefSeq (Jan 2010) • 2,100,000 probes • 30,246 coding genes • 329,028 exons • 710 miRNAs • 36.5 Mb primary target • 44.1 Mb capture target

  4. Sequencing is done in-house, followed by standard BWA-GATK processing. Illumina HiSeq 2000 x2, Illumina Compute, Isilon Storage (150 TB), Dell Compute (128 cores).
     Demultiplexing: • BclToFastQ (CASAVA) • Chastity Filter
     Alignment: • BWA (paired) • SortSam, MarkDuplicates (picard)
     Processing: • BaseQualityScore Recalibration, IndelRealignment
     Variant-Calling: • HaplotypeCaller • VQSR (GATK) • VarEval
     Analysis: • ANNOVAR, VCFtools • PlinkSeq, SKAT, R • Spotfire
     Second freeze released in November 2014; 2,628 samples and ~700,000 variants. Third freeze released in March 2016; added QC for consortia-based analysis.

  5. Annotating Dataset • After all QC’ing procedures, the dataset is frozen and released to other researchers • Each variant is annotated with a range of databases • Use the genotyping matrix + annotations for various purposes: • Normal metrics • Backward genetics • Population screening • Forward genetics • Population genetics

  6. Determining “normal” dataset characteristics • Population properties • QC (outliers, missing data) • Technical feedback (i.e., depth of coverage needed) [Figure: #SNVs versus depth of sequencing]

  7. Cases versus controls in ERGO, or in combined datasets. Adding to GWAS: fine-mapping GWAS hits and associating rare variants

  8. Determining variants in a (healthy) population. I.e., someone sequenced a case series and needs controls

  9. Compare variants across populations: ESP (N=6,503), ExAC (N=60,706), 1000G (N=2,504), UK10K (N=3,781), GONL (N=500); Combined (N=73,994). [Figure: overlap of variants between the datasets; the per-region counts cannot be assigned to regions from the extracted text]
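As a quick sanity check on the cohort sizes listed on this slide, the short sketch below sums them and reports each resource's share of the combined total. It uses only the N values given on the slide; the variable names are illustrative.

```python
# Cohort sizes (number of individuals) as listed on the slide.
cohort_sizes = {
    "ESP": 6503,
    "ExAC": 60706,
    "1000G": 2504,
    "UK10K": 3781,
    "GONL": 500,
}

# The "Combined" figure on the slide is the plain sum of the five cohorts.
combined_n = sum(cohort_sizes.values())

# Fraction of the combined set contributed by each resource.
shares = {name: n / combined_n for name, n in cohort_sizes.items()}

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:6s} {cohort_sizes[name]:>6d} {share:6.1%}")
print("Combined", combined_n)
```

The sum reproduces the combined N of 73,994, with ExAC contributing roughly 82% of it.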

  10. And respective variant frequencies

  11. Population genetics; rare versus common SNVs

  12. Different composition of common vs rare variants

  13. Allowing us to characterize possibly damaging variants

  14. And understand the layout of our genome. Count the number of variants per gene (correcting for gene size): Score = Gene Size / #SNVs. A higher score means fewer variants. Compare the tails: mutation-tolerant vs. intolerant genes
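The score on this slide can be sketched in a few lines. This is a minimal illustration, not the study's actual code: the gene names and numbers are made up, the function names are hypothetical, and treating a gene with zero SNVs as having an infinite score is an assumption the slide does not specify.

```python
import math
from typing import Dict, List, Tuple


def gene_scores(gene_sizes: Dict[str, int], snv_counts: Dict[str, int]) -> Dict[str, float]:
    """Score = gene size / number of SNVs (higher score = fewer variants per base)."""
    scores = {}
    for gene, size in gene_sizes.items():
        n_snvs = snv_counts.get(gene, 0)
        # Assumption: a gene without any observed SNV gets an infinite score.
        scores[gene] = size / n_snvs if n_snvs > 0 else math.inf
    return scores


def tails(scores: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Return the k lowest-scoring and the k highest-scoring genes."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:k], ranked[-k:]


# Hypothetical input: coding size in bp and number of SNVs seen per gene.
sizes = {"GENE_A": 3000, "GENE_B": 3000, "GENE_C": 1200}
counts = {"GENE_A": 30, "GENE_B": 3, "GENE_C": 0}

scores = gene_scores(sizes, counts)
lowest, highest = tails(scores, 1)
print(scores)
print("most variants per base:", lowest, "fewest variants per base:", highest)
```

Dividing by the SNV count rather than subtracting it keeps the score independent of gene length, which is what "correcting for gene size" asks for.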

  15. 500 genes with lowest gene density scores
     [Figure: gene symbols from the slide; some labels are truncated in the source] TTN MBNL3 NCOR1 RAB3IP BTBD9 CLASP1 MLLT4 EPM2AIP1 SC5D VPS13C APOOL DCLRE1C NCAPH TGFB2 IKZF1 PIK3R1 BPTF RAB5B RNF41 CYB561D1 SYNE1 ANKRD12 CCDC88A NCOR2 NFIB CACNA1G CBFA2T2 ARNTL2 NHSL1 CEP170B GRSF1 KCNMA1 ZDHHC15 THSD7B SLC4A10 WAC COL18A1 GNE EMP2 STRIP2 NEB MKLN1 PHACTR2 BEND4 NEDD4L PPM1A REL MDM2 GPR107 TMEM59 NUCKS1 EPB41L1 DST IQGAP2 PLEKHA2 EIF5A2 GABRA4 LUZP2 LGSN NSUN4 DST KCNQ3 CACNA1B JMJD1C NR2C2 NFIC DLG2 COL11A1 DYSF MLLT3 SLC35D1 KLHL15 SYNGAP1 NPAS3 LRRTM2 RXRA KLF8 CEP63 SLITRK1 NR2F2 DGKH ATRX IKZF3 CACNG8 UBE2W DLG2 C1orf95 FAM168A KLHL3 ABI2 GOLGA7B OPA1 PARD3 DCAF17 ARHGEF40 KDM6A MAP3K4 RLIM GON4L ERG TNRC6B MPRIP BCAT1 ATP2B4 UHMK1 GAS7 SLC4A4 TRIM13 GPR126 GRSF1 KLHDC10 ARPIN,C15o SMAD3 EPB41 GIT2 ENC1 OCLN EIF4G1 LCOR UCHL5 OPRM1 RORA CLCN5 RREB1 ACVR1C KLF7 DMBT1 MAPKBP1 PCDH15 PEG10 NCOA7 PCDH9 KIF13A SMAD3 GRIA4 DYRK1A KCNJ2 MATR3 MON2 FGF5 OPRM1 APC TACC2 ROBO2 ZAN CACNA1G HNRNPR DOCK8 NEBL EPB41L5 RIMS2 SAP30L GABRA4 FAM160B1 TMEM194A PPFIA2 PUM1 SCRN1 PKNOX1 ATP8B3 PLEC MAST4 PPP1R12B TTN MGAT4A PTBP3 CACNA1D SLC45A4 GPR107 KIF21A NEURL1B CEP41 RUNX1 CTNND1 GOLGB1 STON1 PTPRE UBE3A IGF2 UTP15 CACNA1C ORAI2 SESN3 PAX5 NEDD4L KLF7 MLLT4 PLCB1 LY75 NRP2 SYT7 GNAO1 TRPM3 ZNF562 PAFAH1B1 TTC33 MBNL1 PLEKHG5 MAP3K7 UBE2K AFF2 COL6A3 ZKSCAN1 GFPT1 USP15 CACNB4 NFIB IGSF3 MYH14 ORC4 RASGRF1 PTPN3 ANO6 EFCAB14 TP53INP1 LIN28B TPCN1 YAP1 LDLR ZMYND8 AFF2 SMAD2 IKZF3 BTBD9 POLH CHL1 FAM199X NEDD4 TMEM56,TM LAMP2 MLEC FGF12 KDM6A PREPL ZNF662 RGPD2 KIAA1598 SLC8A3 MRO RNF38 ENAH MTR ANKRD17 TSC1 IRAK3 PRDM2 SLC4A4 ZAK MYO16 ATP2B3 FOXP2 RAPGEF1 CDC73 IKZF1 KIAA0586 ACTR3 ALPK1 FBXO28 VAMP4 TLK2 NSL1 PTPRB ADAM22 CENPE ATP2A2 VAPB ACOX1 AGL PLAG1 SNX2 OSBPL3 PHKA1 AP3S2,C15o RGS5 HLF VCL KITLG PVRL3 SIN3B NR5A2 RAB3B MON2 PHACTR2 IL17RA PDE11A TRAF6 MR1 PARP8 PDPK1 ETV1 SP3 DDX6 RASSF6 ZNF652 ZNF407 EIF4G1 SLC39A9 CPT1A MBNL1 TLE4 DCUN1D5 BAHCC1 CELF2 PAX5 PCDH11X CREB5 VAPB SECISBP2L UACA MOB3B SEZ6L POGZ PIK3CB PVR TBL1X CXADR PKNOX1 POLR1B PLB1 ISY1,ISY1-RA LSM8 SPTBN1 IKZF3 NRXN3 PTPRD CD84 KCMF1 PITPNM3 KLF13 TMEM2 PTPRT NFYA PHF8 TLE3 ZNF75D HMHA1 CUL4B SORBS2 BIVM,BIVM SYT14 ENTPD1 ITPR1 ADAM22 DGKH PCDH11X MTX3 HNRNPR PCGF5 PDE5A TBC1D5 ZDHHC3 DLG2 SMIM12 FBXO45 MKLN1 MCUR1 FAM13B DACH1 ZMIZ2 GRIP1 ABL2 TRPS1 LDLRAD4 CYLD PPM1A MTX3 EIF4EBP2 OSBPL6 SAMD8 FBXO32 MFAP3L SPATS2L DCP1A ZFAND5 MXD1 MAP2 IPO8 YAP1 FBXL20 BAG5 ERBB4 CACNA1I LDLRAD4 PTPN13 SYNRG GAB1 MAP4K4 HIPK3 CDHR1 CDH4 RALGPS1 ABI2 JADE1 GRIA1 ZMYM3 MEF2A ZDHHC20 BCL2L11 RBM47 MTSS1 ABL2 MDM4 TMOD2 PAX5 ABCA2 KIRREL ZFP14 EGLN1 CACNA1C KALRN EPB41L1 PSEN1 KRAS TNIK PPP1R12A ARHGAP19 EFNA5 WAC L1CAM FAM217B RELN TFDP2 IL6ST ZNF678 ZYG11B TACC1 HNRNPR SYNJ2BP CPEB2 ADCYAP1R1 MAP4 TBC1D2B CCSER2 GNRHR CXXC4 TMEM236 CRKL WNK1 CDK14 DPYSL3 SCN8A CLCN5 ANKRD17 DNAL1 ILDR2 SPG11 IYD SENP6 PCLO PDE7A NRCAM TRIM44 MAVS COL25A1 ELK4 TOM1L2 LDOC1L MROH1 ZNF527 GCLM DMXL1 NIPBL OBSCN SCN5A PDE4D CREBRF GGCX PCDH15 ZBTB20 ORC4 ZNF24 KIAA0586 CACNA1F NFATC4 PPP1R12B TBL1X FOXJ3 CNOT1 CXADR VPS8 WNK3 ARHGEF12 SRGAP3 BTBD7 PAPD5 DSTYK AGL SORT1 NOL9 MAPK1IP1L ZDHHC3 GNAL LIMCH1 USP8 GNAL ADCY10 AKT2 RAB5B UNKL LTBP4

     | KEGG Pathway Name | Genes in Pathway | Genes Observed | Genes Expected | Enrichment Ratio | Enrichment P-value | Adjusted P-value |
     |---|---|---|---|---|---|---|
     | MAPK signaling pathway | 268 | 23 | 5.7 | 4.1 | 1.4E-08 | 1.4E-06 |
     | Calcium signaling pathway | 177 | 15 | 3.7 | 4.0 | 5.6E-06 | 5.0E-04 |
     | Renal cell carcinoma | 70 | 9 | 1.5 | 6.1 | 1.6E-05 | 1.5E-03 |
     | Adherens junction | 73 | 9 | 1.5 | 5.8 | 2.2E-05 | 2.1E-03 |
     | Chronic myeloid leukemia | 73 | 9 | 1.5 | 5.8 | 2.2E-05 | 2.1E-03 |
     | Colorectal cancer | 62 | 8 | 1.3 | 6.1 | 4.6E-05 | 4.4E-03 |
     | Chagas disease (American trypanosomiasis) | 104 | 10 | 2.2 | 4.6 | 7.0E-05 | 6.7E-03 |
     | Neurotrophin signaling pathway | 127 | 11 | 2.7 | 4.1 | 8.1E-05 | 7.7E-03 |
     | Endocytosis | 201 | 14 | 4.2 | 3.3 | 9.9E-05 | 9.4E-03 |
     | Regulation of actin cytoskeleton | 212 | 14 | 4.5 | 3.1 | 2.0E-04 | 1.9E-02 |
     | Hypertrophic cardiomyopathy (HCM) | 83 | 8 | 1.8 | 4.6 | 4.0E-04 | 3.8E-02 |
     | Type II diabetes mellitus | 48 | 6 | 1.0 | 5.9 | 5.0E-04 | 4.8E-02 |
     | Tight junction | 132 | 10 | 2.8 | 3.6 | 5.0E-04 | 4.8E-02 |
     | ErbB signaling pathway | 87 | 8 | 1.8 | 4.4 | 5.0E-04 | 4.8E-02 |
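The enrichment columns in the KEGG table relate observed to expected gene counts. The sketch below shows one common way such numbers are computed. The slide does not name the tool, the statistical test, or the background gene set, so the hypergeometric test and every background size used here are assumptions, and the function names are hypothetical.

```python
from math import comb


def expected_hits(list_size: int, pathway_size: int, background_size: int) -> float:
    """Expected number of pathway genes in a random gene list of the given size."""
    return list_size * pathway_size / background_size


def enrichment_ratio(observed: float, expected: float) -> float:
    """Observed / expected number of pathway genes in the list."""
    return observed / expected


def hypergeom_upper_tail(observed: int, list_size: int,
                         pathway_size: int, background_size: int) -> float:
    """P(X >= observed) when drawing list_size genes without replacement.

    Assumption: an over-representation test of this kind was used;
    the slide itself does not say which test produced the P-values.
    """
    upper = min(list_size, pathway_size)
    numerator = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(background_size, list_size)


# Ratio for the MAPK row, using the observed and expected counts from the table.
print(round(enrichment_ratio(23, 5.7), 2))

# Toy p-value: 10 background genes, 4 in the pathway, 3 drawn, at least 2 hits.
print(hypergeom_upper_tail(2, 3, 4, 10))
```

Because the expected counts in the table are rounded to one decimal, ratios recomputed from them (about 4.0 for the MAPK row) can differ slightly from the tabulated enrichment ratios (4.1).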
