British Columbia Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010. EBI Interfaces Interest Forum. Cydney Nielsen
Discovery path: Biological Sample → Genomic Data → Scientific Insight
Components of Data Analysis: Genomic Data → Analysis (Automation, Human Judgment) → Scientific Insight
Outline • Genome Assembly Visualization – ABySS-Explorer • Complement to genome browsing – Using clustering and interactive data exploration
Genome Sequencing (shotgun approach): cell population → extracted DNA → sheared DNA → sequencing reads: AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
ABySS – Assembly By Short Sequences (Simpson et al., Genome Res 2009). Sequencing read set (read length = 7 nt): GGACATC, GGACAGA. Corresponding de Bruijn graph (k = 5 nt): [graph figure not recovered]. ABySS merges unambiguously connected vertices to form contigs.
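The merging step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ABySS's actual implementation: it ignores reverse complements, coverage, and error correction, and it does not report cycles made only of unambiguous vertices. The function names are hypothetical. Vertices are k-mers, two k-mers are adjacent when they overlap by k-1 bases, and a chain of vertices is merged only while each step has exactly one way forward and one way back.

```python
def kmers(read, k):
    """All k-length substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Return (successors, predecessors) maps over the k-mers seen in the reads."""
    vertices = set()
    for read in reads:
        vertices.update(kmers(read, k))
    succ = {v: set() for v in vertices}
    pred = {v: set() for v in vertices}
    for v in vertices:
        for base in "ACGT":
            w = v[1:] + base  # candidate neighbour overlapping v by k-1 bases
            if w in vertices:
                succ[v].add(w)
                pred[w].add(v)
    return succ, pred


def merge_contigs(succ, pred):
    """Merge unambiguously connected vertices into contig sequences."""
    def unambiguous_next(v):
        # v -> w is unambiguous when v has one successor and w has one predecessor
        if len(succ[v]) != 1:
            return None
        (w,) = succ[v]
        return w if len(pred[w]) == 1 else None

    contigs = []
    for v in sorted(succ):
        # skip vertices that are reached through an unambiguous edge
        if len(pred[v]) == 1 and unambiguous_next(next(iter(pred[v]))) == v:
            continue
        seq, cur, seen = v, v, {v}
        while True:
            w = unambiguous_next(cur)
            if w is None or w in seen:
                break
            seq += w[-1]  # each merged vertex contributes one new base
            seen.add(w)
            cur = w
        contigs.append(seq)
    return sorted(contigs)


succ, pred = build_graph(["GGACATC", "GGACAGA"], k=5)
contigs = merge_contigs(succ, pred)
print(contigs)  # ['GACAGA', 'GACATC', 'GGACA']
```

The shared k-mer GGACA has two successors, so it stays a contig on its own and the two branches become separate contigs.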
Assembly Ambiguities. True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG. Shown alongside: assembled sequence; de Bruijn graph representation.
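A short sketch of why this sequence is ambiguous for an assembler: once a repeat is longer than k, the k-mers inside it have more than one possible continuation. The snippet below lists the k-mers of the true genome sequence that have more than one distinct successor. The choice k = 5 is an assumption made for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

GENOME = "GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG"


def successors_in_genome(genome, k):
    """Map each k-mer to the set of k-mers that directly follow it in the genome."""
    succ = defaultdict(set)
    for i in range(len(genome) - k):
        succ[genome[i:i + k]].add(genome[i + 1:i + 1 + k])
    return succ


succ_map = successors_in_genome(GENOME, k=5)
# k-mers with more than one distinct successor are branch points in the graph
branching = {kmer: sorted(nxt) for kmer, nxt in succ_map.items() if len(nxt) > 1}
print(branching)  # {'AAAAA': ['AAAAA', 'AAAAG', 'AAAAT']}
```

The poly-A k-mer loops onto itself and exits toward two different flanks, so the graph alone cannot tell how long each A run is or which flank follows which run.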
Starting Point (Shaun Jackman)
Example of existing tools: Consed
Properties of DNA
Capture sequence strand: AAAAAT (2+, 1+); TTTTTA (2-, 1-)
Capture sequence strand: AAAAAT over TTTTTA, labels 1+, 2+
Capture sequence strand: AAAAAT over TTTTTA, labels 1-, 2-
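The "+" and "-" labels refer to the two strands of the same contig. As a minimal sketch (function names are hypothetical), the sequence of a contig in "-" orientation is the reverse complement of its "+" sequence. The slide draws TTTTTA directly beneath AAAAAT, which is the complementary strand written left to right; read in its own 5' to 3' direction it is ATTTTT.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, keeping left-to-right order (as drawn on the slide)."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


def oriented(seq, strand):
    """Sequence of a contig in the requested orientation, '+' or '-'."""
    if strand == "+":
        return seq
    if strand == "-":
        return reverse_complement(seq)
    raise ValueError("strand must be '+' or '-'")


print(complement("AAAAAT"))     # TTTTTA
print(oriented("AAAAAT", "-"))  # ATTTTT
```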
Capture sequence length: one oscillation = 100 nt
Genome Sequencing (read pair information): cell population → extracted DNA → sheared DNA → dsDNA fragment (known size) with a read at each end → sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
Capture read pair information: After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.
Capture read pair information: Paired-end read information is used to construct paired-end (PE) contigs, e.g. … 13+ 44- 46+ 4+ 79+ 70+ … (blue gradient = paired-end contig; orange = selected single-end contig)
ABySS-Explorer • Visual representation of: contig adjacency information; contig strand; contig length; paired-end relationships; paired-end contigs • Implemented using the Java Universal Network/Graph Framework (JUNG) • Applied the Kamada-Kawai layout algorithm (JUNG implementation) • Uses ABySS files as input (version 1.1.0 and higher)
http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer
Part 1: Conclusions and Future Work • Graph encoding provides an integrated display of genome assemblies and associated meta-data • This representation is particularly powerful for revealing high-level genome assembly structure, not readily viewable in any other interactive tool • Future work includes: support for other assembly algorithm outputs; enabling flexible annotation display; integration with existing assembly editing tools
Outline • Genome Assembly Visualization – ABySS-Explorer • Complement to genome browsing – Using clustering and interactive data exploration
Genome Sequencing: cell population → extracted DNA → sheared DNA → sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG
Genome Sequencing with selection: Chromatin Immunoprecipitation and Sequencing (ChIP-Seq). cell population, extracted DNA, sheared DNA, sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG; the reads GTACAGCCTGACAGAAGC and TTCAGAATGGTACAGCAG are shown a second time after selection.
Align sequences to the genome. Aligned reads: CCGAGTACAGCCTGACAGA; GCATGACAGTCCGAGTAC; TTGCATGACAGTCCGAGT; AGCGGATTGCATGACAGT (three copies). Reference genome: AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA. Plot: read coverage against genomic coordinate.
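The read coverage track on this slide can be reproduced with a small sketch. It places each read by exact string match on the forward strand, which is a deliberate simplification: real aligners tolerate mismatches, handle both strands, and scale to millions of reads. The function name is hypothetical, and the three copies of the first read follow how the slide text appears to list them.

```python
REFERENCE = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"

READS = [
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "TTGCATGACAGTCCGAGT",
    "GCATGACAGTCCGAGTAC",
    "CCGAGTACAGCCTGACAGA",
]


def read_coverage(reference, reads):
    """Count, for every reference position, how many reads overlap it."""
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)  # exact match, first occurrence only
        if start == -1:
            continue  # read does not align under this simplified scheme
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage


coverage = read_coverage(REFERENCE, READS)
print(coverage)
```

Plotting `coverage` against its index gives the per-coordinate signal that a genome browser displays as one track.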
Genome browser can reveal local patterns. Tracks: H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE
Difficult to get global overview
Focus on regions of interest. 1. For example, transcriptional start sites (TSS +/- 3000 nt); tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE. 2. Extract data matrices. Normalization for bin i, sample h: [equation not recovered]. 3. Cluster matrices (k-means clustering with Euclidean distance).
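Step 3 can be sketched as plain Lloyd-style k-means with Euclidean distance. Everything concrete here is an assumption for illustration: the toy matrix (one row per TSS, one column per bin) is invented, the initial centroids are taken from evenly spaced rows to keep the run deterministic, and the normalization of step 2 is skipped because its formula did not survive in the slide text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, k, iterations=100):
    """Cluster rows into k groups; returns (labels, centroids)."""
    # deterministic initialisation from evenly spaced rows (a simplification)
    centroids = [list(rows[i * len(rows) // k]) for i in range(k)]
    labels = None
    for _ in range(iterations):
        # assignment step: nearest centroid for every row
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# hypothetical signal profiles: three low-signal and three high-signal regions
profiles = [
    [0.1, 0.2, 0.1],
    [0.0, 0.3, 0.1],
    [0.2, 0.1, 0.0],
    [0.9, 1.0, 0.8],
    [1.0, 0.9, 0.9],
    [0.8, 1.0, 1.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The returned centroids are the per-cluster average profiles, and the count of each label gives the cluster sizes.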
Enable interactive exploration. 4. Interactive cluster visualization (data from H1 cells). [Figure: cluster size indicator (total n = 15,618); cluster view (average values displayed) and individual TSS view, with the HOXC12 gene labeled; tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA; scroll bar to explore all cluster members.] 5. Link-out to UCSC genome browser.
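A link-out of the kind named in step 5 can be sketched by building a UCSC Genome Browser URL for the window around a TSS. The slides do not say how the tool builds its links, so this is only one plausible approach: it uses the browser's `hgTracks` page with `db` and `position` query parameters. The assembly name, chromosome, and TSS coordinate below are hypothetical placeholders, and the function name is an assumption.

```python
from urllib.parse import urlencode

UCSC_TRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_link(chrom, tss, flank=3000, db="hg19"):
    """URL showing the region TSS +/- flank in the UCSC genome browser."""
    start = max(1, tss - flank)  # keep the window on the chromosome
    end = tss + flank
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{UCSC_TRACKS}?{query}"


# hypothetical TSS coordinate, not taken from the slides
link = ucsc_link("chr1", 1_000_000)
print(link)
```

The 3000 nt flank matches the TSS window used when the data matrices were extracted, so the browser view covers the same region as the clustered profile.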
Part 2: Conclusions and Future Work • Clustering reveals patterns that were not obvious using a genome browser • Access to both global and detailed views is valuable • Future work includes: search functionality (e.g. by region id); integration with other clustering tools; richer analysis functionality (e.g. interactive clustering)
Acknowledgements. NIH Epigenomics Roadmap; ABySS-Explorer; Joe Costello, UCSF; Shaun Jackman; Peggy Farnham, UC Davis; İnanç Birol; Thea Tlsty, UCSF; Jason Chang; Marco Marra; Martin Hirst; Lymphoma Project Analyst; Yongjun Zhao; Karen Mungall; Nina Thiessen; Richard Varhol; Supervisor; Primary Data Generation; Steven Jones; Lymphoma Genomics Team