
Visualizing ENCODE Data in the UCSC Genome Browser, Pauline Fujita (PowerPoint presentation)



  1. Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, Ph.D. UCSC Genome Bioinformatics Group

  2. Training Resources • genome@soe.ucsc.edu • Genomewiki: genomewiki.ucsc.edu • Mailing list archives: genome.ucsc.edu/FAQ/ • Training page: genome.ucsc.edu/training.html • Twitter @GenomeBrowser • Tutorial videos: YouTube channel • OpenHelix: openhelix.com/ucsc

  3. Outline • Basics: search, display, more info • Tools for finding ENCODE data • Annotating a BED file: RNAseq example • Annotating a VCF file • Track Hubs: What are they? How do I make one? • Exercises

  4. Basic Navigation: Main Display genome.ucsc.edu/cgi-bin/hgTracks?db=hg19

  5. Display Configuration • Visibility: hide, dense, squish, pack, full • Track ordering: drag and drop • Drag and zoom/highlighting • Configuration page • Right click menu
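A browser view configured this way can also be reproduced as a link. The following is a minimal sketch, assuming hgTracks accepts `db`, `position`, and one `trackName=visibility` parameter per track. The helper `hgtracks_url` is hypothetical, and the track name is borrowed from the Data Integrator output on slide 34; in practice a parent track name may be required instead.

```python
from urllib.parse import urlencode

# The five visibility modes listed on the Display Configuration slide.
VISIBILITY_MODES = ("hide", "dense", "squish", "pack", "full")


def hgtracks_url(db, position, track_visibility):
    """Build a Genome Browser link (hypothetical helper).

    Assumes hgTracks reads `db`, `position`, and one
    `<trackName>=<visibility>` parameter per track.
    """
    params = {"db": db, "position": position}
    for track, vis in track_visibility.items():
        if vis not in VISIBILITY_MODES:
            raise ValueError(f"unknown visibility {vis!r} for track {track}")
        params[track] = vis
    # urlencode percent-encodes ':' in the position as %3A
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


url = hgtracks_url("hg19", "chr2:191876000-191881000",
                   {"wgEncodeGencodeBasicV19": "pack"})
print(url)
```

Validating the visibility value up front catches typos before the link is shared.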

  6. How to find more info Item Description Track Description

  7. More info: Track Description

  8. More info: Item Description

  9. ENCODE

  10. ENCODE: Super-track Settings

  11. ENCODE: Track Settings

  12. ENCODE: Item Details

  13. ENCODE Tools

  14. ENCODE genome.ucsc.edu/ENCODE/

  15. ENCODE: Experiment Matrix

  16. ENCODE: ChIP-Seq Matrix

  17. ENCODE: Experiment Summary

  18. ENCODE: Track Search

  19. File Formats [Figure: UCSC Genome Browser view of hg19 chr2:191,876,000-191,881,000 (2 kb scale, around STAT1), with tracks labeled by file format (BED, wig(gle), BAM, VCF): Basic Gene Annotation Set from GENCODE Version 19; GM12878 DNaseI hypersensitivity peaks and raw signal from ENCODE/University of Washington; K562 Znf143 TFBS uniform peaks (ENCODE/Stanford/Analysis) and ChIP-seq peaks and signal (ENCODE/SYDH); K562 polyA+ IFNa30 RNA-seq alignments from ENCODE/SYDH; Exome Aggregation Consortium (ExAC) variants from 60,706 exomes] bit.ly/fileformatsession

  20. File Formats • BED: positional annotations, e.g. regions with enriched ChIP-seq signal for TF binding, methylation, splice junctions from RNA-seq • wig(gle): continuous signal data, e.g. number of reads (DNase I HS and ChIP-seq signals) • BAM: alignments of sequencing reads mapped to the genome (e.g. RNA-seq alignments) • VCF: variation data: SNPs, indels, copy number variants, structural variants (e.g. ExAC data) [Figure: the STAT1-locus browser view from slide 19, annotated with these descriptions]
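BED coordinates are 0-based and half-open: chromStart is the first base, and chromEnd is one past the last base. The following is a minimal sketch of reading the first four BED columns, using the example line from the Custom Tracks slide. `BedRecord` and `parse_bed_line` are hypothetical names. Splitting on any whitespace assumes that names contain no spaces.

```python
from dataclasses import dataclass


@dataclass
class BedRecord:
    chrom: str
    start: int      # chromStart: 0-based, inclusive
    end: int        # chromEnd: exclusive (half-open interval)
    name: str = ""

    @property
    def length(self):
        # Half-open intervals: length is simply end - start
        return self.end - self.start


def parse_bed_line(line):
    """Parse chrom, chromStart, chromEnd and optional name from a BED line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("BED needs at least chrom, chromStart, chromEnd")
    name = fields[3] if len(fields) > 3 else ""
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


rec = parse_bed_line("chr7 127471196 127472363 Gene1")
print(rec.chrom, rec.start, rec.end, rec.name, rec.length)
```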

  21. Indexed File Formats [Figure: the STAT1-locus browser view (hg19 chr2:191,876,000-191,881,000) with each text format paired with its indexed counterpart: BED/bigBed, wig(gle)/bigWig, alongside BAM and VCF]

  22. Indexed File Formats • Only the portions of a file that are being displayed are transferred to UCSC • Large files can be displayed (uploading them whole would time out) • File + index live on your web-accessible server (http, https, or ftp) • Faster display • More user control
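The first point above can be illustrated with a toy model. An index maps genomic ranges to byte ranges in the remote file, so the browser only requests the blocks that overlap the window on screen. This is a deliberately simplified sketch: the flat list of `Block` entries and the byte sizes are invented for illustration. Real bigBed/bigWig files use a tree-based index, and BAM/VCF use separate index files.

```python
from dataclasses import dataclass


@dataclass
class Block:
    chrom: str
    start: int   # first base covered by this block (0-based)
    end: int     # end of covered region (exclusive)
    offset: int  # byte offset of the block in the remote file
    size: int    # block size in bytes


def blocks_for_window(index, chrom, start, end):
    """Return only the blocks that overlap the displayed window."""
    return [b for b in index
            if b.chrom == chrom and b.start < end and start < b.end]


def range_headers(blocks):
    """One HTTP Range header value per block (byte ranges are inclusive)."""
    return [f"bytes={b.offset}-{b.offset + b.size - 1}" for b in blocks]


# Hypothetical index for a 10,240-byte file covering part of chr2
index = [
    Block("chr2", 0, 100_000_000, 0, 4096),
    Block("chr2", 100_000_000, 200_000_000, 4096, 4096),
    Block("chr2", 200_000_000, 240_000_000, 8192, 2048),
]
needed = blocks_for_window(index, "chr2", 191_876_000, 191_881_000)
print(range_headers(needed), sum(b.size for b in needed), "bytes fetched")
```

Only 4,096 of the 10,240 bytes are fetched for this window, which is why indexed files display quickly no matter how large they are.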

  23. File Formats

  24. File Formats

  25. File Formats

  26. File Formats www.encodeproject.org/help/file-formats/ Help File formats

  27. Custom Tracks

  28. Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom

  29. Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom
      track name="BED_custom_track"
      chr7 127471196 127472363 Gene1
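A custom track is plain text: a `track` line followed by data lines. The following is a minimal sketch that generates the slide 29 example programmatically. `custom_track_text` is a hypothetical helper. It writes plain ASCII double quotes, because typographic quotes copied from slides are a common source of track-line errors, and it tab-separates the BED fields.

```python
def custom_track_text(track_name, records):
    """Build custom-track text: one track line, then one BED line per record."""
    lines = [f'track name="{track_name}"']   # ASCII quotes, not curly ones
    for chrom, start, end, name in records:
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


text = custom_track_text("BED_custom_track",
                         [("chr7", 127471196, 127472363, "Gene1")])
print(text, end="")
```

The resulting text can be pasted into the hgCustom page or saved as a file and uploaded there.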

  30. Annotating your data: BED (Tools > Data Integrator)

  31. Data Integrator genome.ucsc.edu/cgi-bin/hgIntegrator

  32. Data Integrator

  33. Data Integrator

  34. Data Integrator output (http://genome.ucsc.edu/cgi-bin/hgIntegrator?hgsid=43297266..., 6/26/15)
      #ct_SYDHTFBS_4733.chrom  ct_SYDHTFBS_4733.chromStart  ct_SYDHTFBS_4733.chromEnd  ct_SYDHTFBS_4733.name  ct_SYDHTFBS_4733.score  wgEncodeGencodeBasicV19.name  wgEncodeGencodeBasicV19.name2
      chr21  33031473  33032186  .  608  ENST00000449339.1  AP000253.1
      chr21  33031473  33032186  .  608  ENST00000270142.6  SOD1
      chr21  33031473  33032186  .  608  ENST00000389995.4  SOD1
      chr21  33031473  33032186  .  608  ENST00000470944.1  SOD1
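The Data Integrator writes one row per overlapping pair, so a single ChIP-seq peak is repeated once for each GENCODE transcript it overlaps. The following is a minimal sketch of collapsing that output by gene. It uses the rows shown above, tab-separated, with the column names from the header line. `transcripts_by_gene` is a hypothetical helper, not part of the Data Integrator.

```python
from collections import defaultdict

HEADER = ["#ct_SYDHTFBS_4733.chrom", "ct_SYDHTFBS_4733.chromStart",
          "ct_SYDHTFBS_4733.chromEnd", "ct_SYDHTFBS_4733.name",
          "ct_SYDHTFBS_4733.score", "wgEncodeGencodeBasicV19.name",
          "wgEncodeGencodeBasicV19.name2"]
PEAK = ["chr21", "33031473", "33032186", ".", "608"]
ROWS = [PEAK + ["ENST00000449339.1", "AP000253.1"],
        PEAK + ["ENST00000270142.6", "SOD1"],
        PEAK + ["ENST00000389995.4", "SOD1"],
        PEAK + ["ENST00000470944.1", "SOD1"]]
TEXT = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"


def transcripts_by_gene(text):
    """Group GENCODE transcript IDs by (peak, gene name)."""
    lines = text.strip().split("\n")
    header = lines[0].lstrip("#").split("\t")   # drop the leading '#'
    groups = defaultdict(list)
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        peak = (row["ct_SYDHTFBS_4733.chrom"],
                int(row["ct_SYDHTFBS_4733.chromStart"]),
                int(row["ct_SYDHTFBS_4733.chromEnd"]))
        gene = row["wgEncodeGencodeBasicV19.name2"]
        groups[(peak, gene)].append(row["wgEncodeGencodeBasicV19.name"])
    return dict(groups)


for (peak, gene), tx in transcripts_by_gene(TEXT).items():
    print(peak, gene, tx)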

  35. Annotating your VCF file 1. Make a VCF custom track 2. Go to the Variant Annotation Integrator 3. Choose your track 4. Add annotations
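When VCF calls are compared with BED annotations, keep in mind that the two formats count differently. VCF POS is 1-based, while BED chromStart is 0-based with an exclusive chromEnd. The reference allele therefore spans [POS - 1, POS - 1 + len(REF)) in BED terms. The following is a minimal sketch of that conversion. `vcf_to_bed` is a hypothetical helper, and the VCF line in the example is invented for illustration.

```python
def vcf_to_bed(line):
    """Convert one VCF data line to a BED interval covering the REF allele.

    VCF POS is 1-based; BED chromStart is 0-based and chromEnd exclusive.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
    start = pos - 1                 # shift to 0-based
    return chrom, start, start + len(ref), var_id


# Hypothetical deletion call: REF "ACT" replaced by ALT "A"
print(vcf_to_bed("chr21\t33032000\t.\tACT\tA\t50\tPASS\t."))
```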

  36. Remotely Hosted Custom Tracks • Put the data file (bigBed/bigWig/BAM/VCF, etc.) in an internet-accessible location • Must have: 1. track info, 2. bigDataUrl • VCF example:
      track type=vcfTabix name="VCF_Example" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs" bigDataUrl=http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz
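A remotely hosted custom track needs only a single track line. The following is a minimal sketch that assembles one from the values in the slide 36 example, with bigDataUrl written as one unbroken URL. `remote_track_line` is a hypothetical helper. Its checks reflect two points from this deck: the URL must be reachable over http, https, or ftp, and it must not contain spaces. For a vcfTabix track, the matching `.tbi` index is also expected to sit next to the `.vcf.gz` file.

```python
def remote_track_line(track_type, name, description, big_data_url):
    """Build a track line pointing at a remotely hosted indexed file."""
    if any(ch.isspace() for ch in big_data_url):
        raise ValueError("bigDataUrl must be a single URL with no spaces")
    if not big_data_url.startswith(("http://", "https://", "ftp://")):
        raise ValueError("bigDataUrl must be an http, https, or ftp URL")
    # name and description are quoted because they may contain spaces
    return (f'track type={track_type} name="{name}" '
            f'description="{description}" bigDataUrl={big_data_url}')


line = remote_track_line(
    "vcfTabix", "VCF_Example",
    "VCF Ex. 1: 1000 Genomes phase 1 interim SNVs",
    "http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz")
print(line)
```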

  37. Variant Annotation Integrator • Upload a pgSnp or VCF custom track • Associate UCSC annotations with your uploaded variant calls • Add dbSNP info if a dbSNP identifier is found • Select your custom track and VAI options

  38. Variant Annotation Integrator Tools Variant Annotation Integrator

  39. Variant Annotation Integrator genome.ucsc.edu/cgi-bin/hgVai

  40. Track Data Hubs • Remotely hosted • Data persistence • File formats: bigBed, bigWig, BAM, VCF • Track organization: groups, supertracks • multiWigs • Assembly hubs

  41. Track Hubs My Data Track Hubs

  42. Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs

  43. My Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs

  44. Make Your Own Track Hub You will need: • Data (compressed binary index formats: bigBed, bigWig, BAM, VCF) • Text files to define properties of the track hub • Internet-enabled web/ftp server • Assembly Hubs: a twoBit sequence file

  45. Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs
      myHub/ - directory containing track hub files
        hub.txt - a short description of hub properties
        genomes.txt - list of genome assemblies included
        hg19/ - directory of data for the hg19 human assembly
          Data files! BAM, bigBed, bigWig, VCF
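The hub layout above can be scaffolded with a short script. The following is a minimal sketch using the standard `hub.txt` keys (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`). Each `genomes.txt` stanza pairs a `genome` with a `trackDb` path. The per-assembly `trackDb.txt` file, which defines the tracks, is part of the usual hub layout but is not shown on the slide, and this sketch does not create it. `write_hub_skeleton` and the labels and email address in the example are hypothetical.

```python
import tempfile
from pathlib import Path


def write_hub_skeleton(root, hub_name, short_label, long_label, email, genomes):
    """Create hub.txt, genomes.txt and one data directory per assembly."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {short_label}\n"
        f"longLabel {long_label}\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n")
    stanzas = []
    for db in genomes:
        (root / db).mkdir(exist_ok=True)    # data files (bigBed, bigWig, ...) go here
        stanzas.append(f"genome {db}\ntrackDb {db}/trackDb.txt\n")
    (root / "genomes.txt").write_text("\n".join(stanzas))  # blank line between stanzas
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as d:
    print(write_hub_skeleton(Path(d) / "myHub", "myHub", "My Hub",
                             "Example track hub", "you@example.org", ["hg19"]))
```

Once the data files and a `trackDb.txt` are in place, the public URL of `hub.txt` is what gets entered on the Track Hubs page.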

  46. An Example Assembly Hub An Arabidopsis hub: http://genome-test.cse.ucsc.edu/~pauline/hubs/Plants/hub.txt

  47. Acknowledgements UCSC Genome Browser team • David Haussler: co-PI • Jim Kent: Browser Concept, BLAT, Team Leader, PI • Bob Kuhn: Associate Director, Outreach, co-PI • Donna Karolchik, Ann Zweig: Project Management • Engineering; QA, Docs, Support; Sys-admins: Katrina Learned, Jorge Garcia, Angie Hinrichs, Pauline Fujita, Erich Weiler, Kate Rosenbloom, Luvina Guruvadoo, Gary Moro, Hiram Clawson, Steve Heitner, Galt Barber, Brian Lee, Brian Raney, Jonathan Caspar, Max Haeussler, Matt Speir

  48. THE GB TEAM UC Santa Cruz Genomics Institute

  49. Funding Sources • National Human Genome Research Institute (NHGRI) • National Cancer Institute (NCI) • National Institute for Dental and Craniofacial Research (NIDCR) • National Institute for Child Health and Human Development (NICHD) • QB3 (UC Berkeley, UCSF, UCSC) • American Recovery and Reinvestment Act (ARRA) stimulus funds • UC Santa Cruz Genomics Institute

  50. genome.ucsc.edu THANK YOU! UC Santa Cruz Genomics Institute
