Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, Ph.D. UCSC Genome Bioinformatics Group
Training Resources genome@soe.ucsc.edu • Genomewiki: genomewiki.ucsc.edu • Mailing list archives: genome.ucsc.edu/FAQ/ • Training page: genome.ucsc.edu/training.html • Twitter @GenomeBrowser • Tutorial videos: YouTube channel • Open Helix: openhelix.com/ucsc
Outline • Basics: search, display, more info • Tools for finding ENCODE data • Annotating a BED file: RNAseq example • Annotating a VCF file • Track Hubs: What are they? How do I make one? • Exercises
Basic Navigation: Main Display genome.ucsc.edu/cgi-bin/hgTracks?db=hg19
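A link to a specific view can be assembled from the URL on this slide. The sketch below is a minimal illustration, not an official client: the `https` scheme, the `position` parameter, and the helper name `browser_url` are assumptions added here, and the example region is the one shown in the File Formats slides.

```python
from urllib.parse import urlencode

# Base URL from the slide; the https scheme is an assumption.
BASE_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, chrom, start, end):
    """Build an hgTracks URL for one region (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Keep ':' unescaped so the position reads as it does in the browser.
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"}, safe=":")
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(browser_url("hg19", "chr2", 191876000, 191881000))
```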
Display Configuration • Visibility: hide, dense, squish, pack, full • Track ordering: drag and drop • Drag and zoom/highlighting • Configuration page • Right click menu
How to find more info Item Description Track Description
More info: Track Description
More info: Item Description
ENCODE
ENCODE: Super-track Settings
ENCODE: Track Settings
ENCODE: Item Details
ENCODE Tools
ENCODE ENCODE genome.ucsc.edu/ENCODE/
ENCODE: Experiment Matrix
ENCODE: ChIP-Seq Matrix
ENCODE: Experiment Summary
ENCODE: Track Search
File Formats bit.ly/fileformatsession
[Figure: browser view of chr2:191,876,000-191,881,000 (hg19), with example tracks labeled by file format. BED: GENCODE Version 19 basic gene annotation (STAT1) and ENCODE/UW GM12878 DNaseI hypersensitivity. wig(gle): GM12878 DNaseI HS raw signal from ENCODE/UW and K562 Znf143 ChIP-seq signal from ENCODE/SYDH. BAM: K562 polyA+ IFNa30 RNA-seq alignments from ENCODE/SYDH. VCF: ExAC variants from 60,706 exomes.]
File Formats
• BED: positional annotations (e.g., regions with enriched ChIP-seq signal for TF binding, differential methylation, splice junctions from RNA-seq)
• wig(gle): continuous signal data, such as number of reads (e.g., DNase I HS and ChIP-seq signals)
• BAM: alignments of sequencing reads mapped to the genome (e.g., RNA-seq alignments)
• VCF: variation data (SNPs, indels, copy number variants, structural variants; e.g., ExAC data)
Indexed File Formats
[Figure: the same browser view of chr2:191,876,000-191,881,000 (hg19), with indexed formats labeled alongside their text counterparts: BED with bigBed, wig(gle) with bigWig, plus BAM and VCF.]
Indexed File Formats • Only the displayed portions of the files are transferred to UCSC • Lets you display large files (which would otherwise time out) • File + index stay on your web-accessible server (http, https, or ftp) • Faster display • More user control
File Formats www.encodeproject.org/help/file-formats/ Help File formats
Custom Tracks
Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom
Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom
track name="BED_custom_track"
chr7 127471196 127472363 Gene1
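The track line and BED row on this slide can also be generated from a list of features before pasting into the custom track form. This is a minimal sketch under stated assumptions: the helper name `bed_custom_track` is hypothetical, fields are tab-separated, and coordinates follow the usual BED convention (0-based start, exclusive end).

```python
def bed_custom_track(name, features):
    """Return custom-track text: one track line, then one BED row per feature.

    features: iterable of (chrom, start, end, label) tuples (hypothetical layout).
    """
    lines = [f'track name="{name}"']
    for chrom, start, end, label in features:
        # BED starts are 0-based and must come before the end coordinate.
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bed_custom_track("BED_custom_track",
                           [("chr7", 127471196, 127472363, "Gene1")]), end="")
```

Straight double quotes matter in the track line; curly quotes copied from slides or word processors are a common reason a pasted track line misbehaves.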
Annotating your data: BED Tools Data Integrator
Data Integrator genome.ucsc.edu/cgi-bin/hgIntegrator
Data Integrator http://genome.ucsc.edu/cgi-bin/hgIntegrator?hgsid=43297266...
#ct_SYDHTFBS_4733.chrom ct_SYDHTFBS_4733.chromStart ct_SYDHTFBS_4733.chromEnd ct_SYDHTFBS_4733.name ct_SYDHTFBS_4733.score wgEncodeGencodeBasicV19.name wgEncodeGencodeBasicV19.name2
chr21 33031473 33032186 . 608 ENST00000449339.1 AP000253.1
chr21 33031473 33032186 . 608 ENST00000270142.6 SOD1
chr21 33031473 33032186 . 608 ENST00000389995.4 SOD1
chr21 33031473 33032186 . 608 ENST00000470944.1 SOD1
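Because each custom-track item is repeated once per overlapping transcript, it can help to collapse the Data Integrator output to one entry per region. The sketch below assumes the output is saved as tab-separated text with a `#`-prefixed header; the function name `genes_by_region` is hypothetical, and the sample rows are the ones on this slide.

```python
import csv
import io

# Rows copied from the slide; tab separation is an assumption.
HEADER = [
    "#ct_SYDHTFBS_4733.chrom", "ct_SYDHTFBS_4733.chromStart",
    "ct_SYDHTFBS_4733.chromEnd", "ct_SYDHTFBS_4733.name",
    "ct_SYDHTFBS_4733.score", "wgEncodeGencodeBasicV19.name",
    "wgEncodeGencodeBasicV19.name2",
]
ROWS = [
    ["chr21", "33031473", "33032186", ".", "608", "ENST00000449339.1", "AP000253.1"],
    ["chr21", "33031473", "33032186", ".", "608", "ENST00000270142.6", "SOD1"],
    ["chr21", "33031473", "33032186", ".", "608", "ENST00000389995.4", "SOD1"],
    ["chr21", "33031473", "33032186", ".", "608", "ENST00000470944.1", "SOD1"],
]
SAMPLE_OUTPUT = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"


def genes_by_region(text, gene_column="wgEncodeGencodeBasicV19.name2"):
    """Map each (chrom, start, end) to its distinct gene symbols, in file order."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    gene_col = header.index(gene_column)
    regions = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = (row[0], int(row[1]), int(row[2]))
        genes = regions.setdefault(key, [])
        if row[gene_col] not in genes:
            genes.append(row[gene_col])
    return regions


if __name__ == "__main__":
    for region, genes in genes_by_region(SAMPLE_OUTPUT).items():
        print(region, ", ".join(genes))
```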
Annotating your VCF file 1. Make a VCF custom track 2. Go to the Variant Annotation Integrator 3. Choose your track 4. Add annotations
Remotely Hosted Custom Tracks
• Put data file (bigBed/bigWig/BAM/VCF, etc.) in internet-accessible location
• Must have: 1. track info, 2. bigDataUrl
• VCF example:
track type=vcfTabix name="VCF_Example" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs" bigDataUrl=http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz
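The two required pieces (track info and `bigDataUrl`) can be assembled programmatically. The sketch below is illustrative only: the helper names are hypothetical, and the index lookup rests on the assumption that tabix-indexed VCF and BAM files have their `.tbi` or `.bai` index hosted next to the data file under the same URL plus that suffix.

```python
# Index file suffixes expected beside the data file (assumption stated above).
INDEX_SUFFIX = {"vcfTabix": ".tbi", "bam": ".bai"}


def remote_track_line(track_type, name, description, big_data_url):
    """Return a one-line custom track definition pointing at a remote file."""
    return (f'track type={track_type} name="{name}" '
            f'description="{description}" bigDataUrl={big_data_url}')


def index_url(track_type, big_data_url):
    """Return the URL where a separate index is expected, or None.

    bigBed and bigWig carry their index inside the file, so they return None.
    """
    suffix = INDEX_SUFFIX.get(track_type)
    return big_data_url + suffix if suffix else None


if __name__ == "__main__":
    url = "http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz"
    print(remote_track_line("vcfTabix", "VCF_Example",
                            "VCF Ex. 1: 1000 Genomes phase 1 interim SNVs", url))
    print(index_url("vcfTabix", url))
```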
Variant Annotation Integrator • Upload pgSnp or VCF custom track • Associate UCSC annotations with your uploaded variant calls • Add dbSNP info if dbSNP identifier found • Select custom track and VAI options
Variant Annotation Integrator Tools Variant Annotation Integrator
Variant Annotation Integrator genome.ucsc.edu/cgi-bin/hgVai
Track Data Hubs • Remotely hosted • Data persistence • File formats: bigBed, bigWig, BAM, VCF • Track organization: groups, supertracks • multiWigs • Assembly hubs
Track Hubs My Data Track Hubs
Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs
My Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs
Make Your Own Track Hub You will need: • Data (compressed, indexed binary formats: bigBed, bigWig, BAM, VCF) • Text files to define properties of the track hub • Internet-accessible web/ftp server • Assembly Hubs: a twoBit sequence file
Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect My Data Track Hubs
myHub/ - directory containing track hub files
  hub.txt - a short description of hub properties
  genomes.txt - list of genome assemblies included
  hg19/ - directory of data for the hg19 human assembly
    Data files! BAM, bigBed, bigWig, VCF
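The text files in this layout are short enough to generate with a script. The sketch below is one possible way to do it, under stated assumptions: the function name `hub_files` and all labels, e-mail addresses, and URLs in the example are hypothetical, and the per-assembly `trackDb.txt` file (not shown on the slide) is assumed to hold one stanza per track.

```python
TRACK_KEYS = ("track", "bigDataUrl", "shortLabel", "longLabel", "type")


def hub_files(hub_name, short_label, long_label, email, genome, tracks):
    """Return {relative_path: file_text} for a single-assembly track hub.

    tracks: list of dicts, each with the keys listed in TRACK_KEYS.
    """
    hub_txt = (
        f"hub {hub_name}\n"
        f"shortLabel {short_label}\n"
        f"longLabel {long_label}\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    # One stanza per track, separated by a blank line.
    stanzas = ["".join(f"{key} {t[key]}\n" for key in TRACK_KEYS) for t in tracks]
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": "\n".join(stanzas),
    }


if __name__ == "__main__":
    files = hub_files(
        "myHub", "My Hub", "Example hub (hypothetical)", "user@example.org", "hg19",
        [{"track": "signal1", "bigDataUrl": "signal1.bw",
          "shortLabel": "Signal 1", "longLabel": "Example signal track",
          "type": "bigWig"}],
    )
    for path, text in files.items():
        print(f"--- myHub/{path}\n{text}")
```

Writing the returned strings into `myHub/` on a web-accessible server and pasting the `hub.txt` URL into the hub connect page would complete the setup.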
An Example Assembly Hub An Arabidopsis hub: http://genome-test.cse.ucsc.edu/~pauline/hubs/Plants/hub.txt
Acknowledgements
UCSC Genome Browser team
– David Haussler – co-PI
– Jim Kent – Browser Concept, BLAT, Team Leader, PI
– Bob Kuhn – Associate Director, Outreach – co-PI
– Donna Karolchik, Ann Zweig – Project Management
Engineering, QA/Docs/Support, and Sys-admins: Katrina Learned, Jorge Garcia, Angie Hinrichs, Pauline Fujita, Erich Weiler, Kate Rosenbloom, Luvina Guruvadoo, Gary Moro, Hiram Clawson, Steve Heitner, Galt Barber, Brian Lee, Brian Raney, Jonathan Caspar, Max Haeussler, Matt Speir
THE GB TEAM UC Santa Cruz Genomics Institute
Funding Sources
• National Human Genome Research Institute (NHGRI)
• National Cancer Institute (NCI)
• National Institute for Dental and Craniofacial Research (NIDCR)
• National Institute for Child Health and Human Development (NICHD)
• QB3 (UC Berkeley, UCSF, UCSC)
• American Recovery and Reinvestment Act (ARRA) stimulus funds
genome.ucsc.edu THANK YOU! UC Santa Cruz Genomics Institute