Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, Ph.D. UCSC Genome Bioinformatics Group

Training Resources
• genome@soe.ucsc.edu
• Genomewiki: genomewiki.ucsc.edu
• Mailing list archives: genome.ucsc.edu/FAQ/
• Training page: genome.ucsc.edu/training.html
• Twitter: @GenomeBrowser
• Tutorial videos: YouTube channel
• OpenHelix: openhelix.com/ucsc

Outline
• Basics: search, display, more info
• Tools for finding ENCODE data
• Annotating a BED file: RNA-seq example
• Annotating a VCF file
• Track Hubs: What are they? How do I make one?
• Exercises

Basic Navigation: Main Display genome.ucsc.edu/cgi-bin/hgTracks?db=hg19
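The main display can also be opened at a specific region by URL. The following is a minimal sketch, assuming the `position` URL parameter (not shown on the slide, which only shows `db=hg19`) and using the STAT1 region from the File Formats slides as the example coordinates. The helper name `browser_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base CGI from the slide; the https scheme is an assumption.
BASE = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, chrom, start, end):
    """Return an hgTracks URL for an assembly and a chrom:start-end region."""
    # `position` is assumed here; only `db` appears on the slide.
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{BASE}?{query}"


url = browser_url("hg19", "chr2", 191876000, 191881000)
print(url)
```

The colon in the region is percent-encoded by `urlencode`, which is harmless in a query string.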

Display Configuration
• Visibility: hide, dense, squish, pack, full
• Track ordering: drag and drop
• Drag and zoom/highlighting
• Configuration page
• Right-click menu

How to find more info
• Item Description
• Track Description

More info: Track Description

More info: Item Description

ENCODE

ENCODE: Super-track Settings

ENCODE: Track Settings

ENCODE: Item Details

ENCODE Tools

ENCODE genome.ucsc.edu/ENCODE/

ENCODE: Experiment Matrix

ENCODE: ChIP-Seq Matrix

ENCODE: Experiment Summary

ENCODE: Track Search

File Formats bit.ly/fileformatsession
[Figure: Genome Browser view of hg19 chr2:191,876,000-191,881,000 at STAT1, with tracks labeled by file format (BED, wig(gle), BAM, VCF). Tracks shown: Basic Gene Annotation Set from GENCODE Version 19; DNaseI Hypersensitivity by Digital DNaseI from ENCODE/University of Washington (GM12878); GM12878 DNaseI HS Raw Signal Rep 2 from ENCODE/UW; K562 TFBS Uniform Peaks of Znf143_(16618-1-AP) from ENCODE/Stanford/Analysis; K562 Znf143 IgG-rab ChIP-seq Peaks and Signal from ENCODE/SYDH; K562 polyA+ IFNa30 RNA-seq Alignments from ENCODE/SYDH; Exome Aggregation Consortium (ExAC) Variants from 60,706 Exomes.]

File Formats
• BED: positional annotations (ex. regions with enriched ChIP-seq signal for TF binding, differential methylation, splice junctions from RNA-seq)
• wig(gle): continuous signal data, number of reads (ex. DNase I HS and ChIP-seq signals)
• BAM: alignments of sequence reads, mapped to the genome (ex. RNA-seq alignments)
• VCF: variation data: SNPs, indels, Copy Number Variants, Structural Variants (ex. ExAC data)
[Figure: Genome Browser view of hg19 chr2:191,876,000-191,881,000 at STAT1, with each description placed next to the tracks of that format.]

Indexed File Formats
[Figure: Genome Browser view of hg19 chr2:191,876,000-191,881,000 at STAT1, with format labels extended to their indexed counterparts: BED is paired with bigBed and wig(gle) with bigWig, alongside BAM and VCF.]

Indexed File Formats
• Only the displayed portions of files are transferred to UCSC
• Display large files (that would otherwise time out)
• File + index on your web-accessible server (http, https, or ftp)
• Faster display
• More user control
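To make the "only displayed portions are transferred" point concrete, here is a toy sketch of the idea, not the actual bigBed/bigWig layout: a byte-offset index lets a reader seek straight to the records that overlap the viewing window instead of reading the whole file. The record layout, the function names, and the in-memory "file" are all assumptions made for illustration.

```python
import io


def build_index(records):
    """Serialize (start, end, name) records and index each one by byte offset."""
    buf = io.BytesIO()
    index = []
    for start, end, name in records:
        line = f"{start}\t{end}\t{name}\n".encode()
        index.append((start, end, buf.tell(), len(line)))
        buf.write(line)
    return buf.getvalue(), index


def fetch_window(blob, index, win_start, win_end):
    """Read only the records overlapping the half-open window [win_start, win_end)."""
    stream = io.BytesIO(blob)
    hits, bytes_read = [], 0
    for start, end, offset, length in index:
        if start < win_end and end > win_start:
            stream.seek(offset)  # jump directly to the record
            fields = stream.read(length).decode().rstrip("\n").split("\t")
            hits.append(fields[2])
            bytes_read += length
    return hits, bytes_read


# 1000 hypothetical features, each 100 bp long and spaced 1 kb apart.
records = [(i * 1000, i * 1000 + 100, f"feat{i}") for i in range(1000)]
blob, index = build_index(records)
hits, bytes_read = fetch_window(blob, index, 5000, 8000)
print(hits, f"read {bytes_read} of {len(blob)} bytes")
```

In a remotely hosted setup the seek-and-read step would correspond to fetching a byte range from your server, which is why the file and its index both need to be web-accessible.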

File Formats

File Formats www.encodeproject.org/help/file-formats/ (Help > File formats)

Custom Tracks

Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom

Custom Tracks genome.ucsc.edu/cgi-bin/hgCustom
track name="BED_custom_track"
chr7 127471196 127472363 Gene1
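The custom track text above can be generated rather than typed by hand. This is a minimal sketch that assumes a four-column BED (chrom, start, end, name) with 0-based, half-open coordinates; the helper name `bed_custom_track` is hypothetical, and the output is meant to be pasted into the hgCustom text box.

```python
def bed_custom_track(name, features):
    """Build custom-track text: one track line followed by BED rows.

    features: iterable of (chrom, start, end, label) tuples using
    0-based, half-open coordinates.
    """
    lines = [f'track name="{name}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        # Tab-separate the fields so names containing spaces stay in one column.
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


text = bed_custom_track(
    "BED_custom_track",
    [("chr7", 127471196, 127472363, "Gene1")],
)
print(text, end="")
```

Note the straight double quotes around the track name; curly quotes pasted from slides or word processors are a common source of track-line errors.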

Annotating your data: BED (Tools > Data Integrator)

Data Integrator genome.ucsc.edu/cgi-bin/hgIntegrator

Data Integrator

Data Integrator http://genome.ucsc.edu/cgi-bin/hgIntegrator?hgsid=43297266...
#ct_SYDHTFBS_4733.chrom	ct_SYDHTFBS_4733.chromStart	ct_SYDHTFBS_4733.chromEnd	ct_SYDHTFBS_4733.name	ct_SYDHTFBS_4733.score	wgEncodeGencodeBasicV19.name	wgEncodeGencodeBasicV19.name2
chr21	33031473	33032186	.	608	ENST00000449339.1	AP000253.1
chr21	33031473	33032186	.	608	ENST00000270142.6	SOD1
chr21	33031473	33032186	.	608	ENST00000389995.4	SOD1
chr21	33031473	33032186	.	608	ENST00000470944.1	SOD1
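Because the Data Integrator emits one row per overlapping transcript, a custom-track item that overlaps several transcripts is repeated. The sketch below collapses such output to one entry per region with its distinct gene names. It assumes whitespace-separated columns in the order shown on the slide (region in the first three columns, gene name in the last); the function name `genes_by_region` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the Data Integrator output shown on the slide.
OUTPUT = """\
#ct_SYDHTFBS_4733.chrom ct_SYDHTFBS_4733.chromStart ct_SYDHTFBS_4733.chromEnd ct_SYDHTFBS_4733.name ct_SYDHTFBS_4733.score wgEncodeGencodeBasicV19.name wgEncodeGencodeBasicV19.name2
chr21 33031473 33032186 . 608 ENST00000449339.1 AP000253.1
chr21 33031473 33032186 . 608 ENST00000270142.6 SOD1
chr21 33031473 33032186 . 608 ENST00000389995.4 SOD1
chr21 33031473 33032186 . 608 ENST00000470944.1 SOD1
"""


def genes_by_region(text):
    """Map each (chrom, start, end) region to its sorted, distinct gene names."""
    grouped = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        fields = line.split()
        region = (fields[0], int(fields[1]), int(fields[2]))
        grouped[region].add(fields[-1])  # last column: gene name (name2)
    return {region: sorted(names) for region, names in grouped.items()}


result = genes_by_region(OUTPUT)
for (chrom, start, end), names in result.items():
    print(f"{chrom}:{start}-{end}\t{', '.join(names)}")
```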

Annotating your VCF file
1. Make a VCF custom track
2. Go to the Variant Annotation Integrator
3. Choose your track
4. Add annotations

Remotely Hosted Custom Tracks
• Put the data file (bigBed, bigWig, BAM, VCF, etc.) in an internet-accessible location
• Must have: 1. track info, 2. bigDataUrl
• VCF example:
track type=vcfTabix name="VCF_Example" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs" bigDataUrl=http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz
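The two required pieces, track info and bigDataUrl, can be assembled into a single track line programmatically. This is a minimal sketch using the values from the VCF example on the slide; the helper name `remote_track_line` is hypothetical. For `vcfTabix` the browser also expects the tabix index (`.tbi`) to sit next to the compressed VCF at the same location, which is not shown on the slide.

```python
ALLOWED_SCHEMES = ("http://", "https://", "ftp://")


def remote_track_line(track_type, name, description, big_data_url):
    """Return a one-line custom track definition for a remotely hosted file."""
    if not big_data_url.startswith(ALLOWED_SCHEMES):
        raise ValueError("bigDataUrl must be an http, https, or ftp URL")
    if " " in big_data_url:
        # A space inside the URL would be read as the start of a new setting.
        raise ValueError("bigDataUrl must not contain spaces")
    return (
        f'track type={track_type} name="{name}" '
        f'description="{description}" bigDataUrl={big_data_url}'
    )


line = remote_track_line(
    "vcfTabix",
    "VCF_Example",
    "VCF Ex. 1: 1000 Genomes phase 1 interim SNVs",
    "http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz",
)
print(line)
```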

Variant Annotation Integrator
• Upload pgSnp or VCF custom track
• Associate UCSC annotations with your uploaded variant calls
• Add dbSNP info if a dbSNP identifier is found
• Select custom track and VAI options

Variant Annotation Integrator (Tools > Variant Annotation Integrator)

Variant Annotation Integrator genome.ucsc.edu/cgi-bin/hgVai

Track Data Hubs
• Remotely hosted
• Data persistence
• File formats: bigBed, bigWig, BAM, VCF
• Track organization: groups, supertracks
• multiWigs
• Assembly hubs

Track Hubs (My Data > Track Hubs)

Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect (My Data > Track Hubs)

My Hubs genome.ucsc.edu/cgi-bin/hgHubConnect (My Data > Track Hubs)

Make Your Own Track Hub
You will need:
• Data (compressed, binary indexed formats: bigBed, bigWig, BAM, VCF)
• Text files to define properties of the track hub
• Internet-enabled web/ftp server
• Assembly Hubs: a twoBit sequence file

Track Hubs genome.ucsc.edu/cgi-bin/hgHubConnect (My Data > Track Hubs)
myHub/ - directory containing track hub files
  hub.txt - a short description of hub properties
  genomes.txt - list of genome assemblies included
  hg19/ - directory of data for the hg19 human assembly
    Data files! BAM, bigBed, bigWig, VCF
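The directory layout above can be scaffolded with a short script. This is a sketch under stated assumptions: the hub name, email, labels, and data file name are placeholders, and the per-assembly `trackDb.txt` file (which lists the tracks and is referenced from `genomes.txt`) is not shown on the slide. The function name `write_hub` is hypothetical.

```python
import tempfile
from pathlib import Path


def write_hub(root, hub_name, email, assembly, tracks):
    """Create hub.txt, genomes.txt, and <assembly>/trackDb.txt under root."""
    root = Path(root)
    (root / assembly).mkdir(parents=True, exist_ok=True)

    # hub.txt: a short description of hub properties.
    (root / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} example track hub\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )

    # genomes.txt: list of genome assemblies included.
    (root / "genomes.txt").write_text(
        f"genome {assembly}\n"
        f"trackDb {assembly}/trackDb.txt\n"
    )

    # trackDb.txt: one stanza per data file, separated by blank lines.
    stanzas = [
        f"track {t['name']}\n"
        f"bigDataUrl {t['file']}\n"
        f"shortLabel {t['name']}\n"
        f"longLabel {t['label']}\n"
        f"type {t['type']}\n"
        for t in tracks
    ]
    (root / assembly / "trackDb.txt").write_text("\n".join(stanzas))
    return root


# Placeholder values; a real hub would live on a web-accessible server.
hub_dir = Path(tempfile.mkdtemp()) / "myHub"
write_hub(
    hub_dir,
    hub_name="myHub",
    email="you@example.org",
    assembly="hg19",
    tracks=[{"name": "myPeaks", "file": "myPeaks.bb",
             "label": "Example peak calls", "type": "bigBed"}],
)
for path in sorted(hub_dir.rglob("*")):
    print(path.relative_to(hub_dir.parent))
```

The data files themselves (here the placeholder `myPeaks.bb`) would be copied into the `hg19/` directory, and the URL of `hub.txt` is what gets entered on the hgHubConnect page.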

An Example Assembly Hub
An Arabidopsis hub: http://genome-test.cse.ucsc.edu/~pauline/hubs/Plants/hub.txt

Acknowledgements
UCSC Genome Browser team
• David Haussler: co-PI
• Jim Kent: Browser Concept, BLAT, Team Leader, PI
• Bob Kuhn: Associate Director, Outreach, co-PI
• Donna Karolchik, Ann Zweig: Project Management
Engineering; QA, Docs, Support; Sys-admins: Katrina Learned, Jorge Garcia, Angie Hinrichs, Pauline Fujita, Erich Weiler, Kate Rosenbloom, Luvina Guruvadoo, Gary Moro, Hiram Clawson, Steve Heitner, Galt Barber, Brian Lee, Brian Raney, Jonathan Caspar, Max Haeussler, Matt Speir

THE GB TEAM UC Santa Cruz Genomics Institute

Funding Sources
• National Human Genome Research Institute (NHGRI)
• National Cancer Institute (NCI)
• National Institute for Dental and Cranio-Facial Research (NIDCR)
• National Institute for Child Health and Human Development (NICHD)
• QB3 (UC Berkeley, UCSF, UCSC)
• American Recovery and Reinvestment Act (ARRA) stimulus funds

genome.ucsc.edu THANK YOU!
