Complementing Computation with Visualization in Genomics

British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010. EBI Interfaces Interest Forum. Cydney Nielsen.


  1. British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010. EBI Interfaces Interest Forum. Cydney Nielsen.

  2. Discovery path: Biological Sample; Genomic Data; Scientific Insight

  4. Components of Data Analysis: Automation; Analysis; Genomic Data; Scientific Insight; Human Judgment

  5. Outline
      • Genome Assembly Visualization
        – ABySS-Explorer
      • Complement to genome browsing
        – Using clustering and interactive data exploration

  7. Genome Sequencing (shotgun approach)
      • cell population
      • extracted DNA
      • sheared DNA
      • sequencing reads: AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  8. ABySS – Assembly By Short Sequences (Simpson et al. Genome Res 2009)
      • Sequencing read set (read length = 7 nt): GGACATC, GGACAGA
      • Corresponding de Bruijn graph (k = 5 nt)

  9. ABySS – Assembly By Short Sequences (Simpson et al. Genome Res 2009)
      • Sequencing read set (read length = 7 nt): GGACATC, GGACAGA
      • Corresponding de Bruijn graph (k = 5 nt)
      • ABySS merges unambiguously connected vertices to form contigs
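
The merging step on this slide can be sketched in a few lines. This is a minimal illustration, not ABySS's implementation: it assumes vertices are k-mers, edges join consecutive k-mers of a read, and it ignores reverse complements, sequencing errors, and pure cycles. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """All k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Vertices are k-mers; an edge joins consecutive k-mers (overlap k-1)."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        nodes.update(ks)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def merge_unambiguous(nodes, succ, pred):
    """Merge chains of vertices with exactly one way in and one way out."""
    def unambiguous(a, b):
        return succ[a] == {b} and pred[b] == {a}

    contigs = []
    for node in sorted(nodes):
        p = pred[node]
        # Skip vertices that simply continue their single predecessor.
        if len(p) == 1 and unambiguous(next(iter(p)), node):
            continue
        seq, cur = node, node
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if not unambiguous(cur, nxt):
                break
            seq += nxt[-1]  # each step adds one new base
            cur = nxt
        contigs.append(seq)
    return contigs


nodes, succ, pred = build_de_bruijn(["GGACATC", "GGACAGA"], k=5)
print(merge_unambiguous(nodes, succ, pred))
```

For the two reads on the slide, the shared k-mer GGACA branches into two paths, so it stays a contig on its own and each branch is merged into a separate contig.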

  10. Assembly Ambiguities
      • True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG

  11. Assembly Ambiguities
      • True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG
      • Assembled sequence
      • de Bruijn graph representation
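
One way to see where such ambiguities come from is to list the k-mers that occur more than once in the true sequence: each repeated k-mer is a single vertex in a de Bruijn graph, so the paths through it cannot be told apart from k-mers alone. The sketch below assumes k = 5 purely for illustration; the function name is hypothetical.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return the k-mers that occur more than once, with their counts."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


genome = "GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG"
for kmer, n in sorted(repeated_kmers(genome, k=5).items()):
    print(kmer, n)
```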

  12. Starting Point: Shaun Jackman

  13. Example of existing tools: Consed

  15. Properties of DNA

  16. Capture sequence strand: AAAAAT; 2+; 1+

  17. Capture sequence strand: AAAAAT; 2+; 1+; TTTTTA; 2-; 1-

  18. Capture sequence strand: AAAAAT; 1+; 2+; TTTTTA

  19. Capture sequence strand: AAAAAT; 1-; 2-; TTTTTA
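
The relationship between the two strands shown on these slides can be sketched as follows. TTTTTA is the base-by-base complement of AAAAAT; read in the conventional 5'→3' direction, the opposite strand is the reverse complement, ATTTTT. The `flip_strand` helper and the `id+`/`id-` string format are assumptions made for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, same left-to-right order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Opposite strand read 5'->3'."""
    return complement(seq)[::-1]


def flip_strand(contig):
    """Toggle the orientation sign of a label such as '2+' (hypothetical format)."""
    ident, sign = contig[:-1], contig[-1]
    if sign not in "+-":
        raise ValueError(f"expected a trailing '+' or '-': {contig!r}")
    return ident + ("-" if sign == "+" else "+")


print(complement("AAAAAT"))          # TTTTTA
print(reverse_complement("AAAAAT"))  # ATTTTT
print(flip_strand("2+"))             # 2-
```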

  20. Capture sequence length: one oscillation = 100 nt
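
As a rough sketch of this encoding, a contig's length maps to a number of wave oscillations at 100 nt per oscillation. How ABySS-Explorer actually draws the edge is not described in the slides; the sine shape, the sampling density, and the function names below are assumptions.

```python
import math

NT_PER_OSCILLATION = 100  # from the slide: one oscillation = 100 nt


def oscillations(length_nt):
    """Number of oscillations used to draw a contig of the given length."""
    return length_nt / NT_PER_OSCILLATION


def wave_points(length_nt, samples_per_oscillation=16, amplitude=1.0):
    """Sample (t, y) points along a sine wave, with t running from 0 to 1."""
    n = oscillations(length_nt)
    total = max(1, round(n * samples_per_oscillation))
    return [
        (i / total, amplitude * math.sin(2 * math.pi * n * i / total))
        for i in range(total + 1)
    ]


print(oscillations(250))       # 2.5
print(len(wave_points(100)))   # 17
```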

  21. Genome Sequencing
      • cell population
      • extracted DNA
      • sheared DNA
      • read pair information: read; dsDNA fragment (known size); read
      • sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  22. Capture read pair information: After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.

  23. Capture read pair information: Paired-end read information is used to construct paired-end (PE) contigs
      • … 13+ 44- 46+ 4+ 79+ 70+ …
      • blue gradient = paired-end contig
      • orange = selected single-end contig

  24. ABySS-Explorer
      • Visual representation of:
        • contig adjacency information
        • contig strand
        • contig length
        • paired-end relationships
        • paired-end contigs
      • Implemented using the Java Universal Network/Graph Framework (JUNG)
      • Applied the Kamada-Kawai layout algorithm (JUNG implementation)
      • Use ABySS files as input (version 1.1.0 and higher)

  25. http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

  26. Part 1: Conclusions and Future Work
      • Graph encoding provides an integrated display of genome assemblies and associated meta-data
      • This representation is particularly powerful for revealing high-level genome assembly structure, not readily viewable in any other interactive tool
      • Future work includes:
        • support for other assembly algorithm outputs
        • enable flexible annotation display
        • integrate with existing assembly editing tools

  27. Outline
      • Genome Assembly Visualization
        – ABySS-Explorer
      • Complement to genome browsing
        – Using clustering and interactive data exploration

  28. Genome Sequencing
      • cell population
      • extracted DNA
      • sheared DNA
      • sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  30. Genome Sequencing: Chromatin Immunoprecipitation and Sequencing (ChIP-Seq)
      • cell population
      • extracted DNA
      • sheared DNA
      • selection: GTACAGCCTGACAGAAGC, TTCAGAATGGTACAGCAG
      • sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  31. Align sequences to the genome
      • Reads: CCGAGTACAGCCTGACAGA; GCATGACAGTCCGAGTAC; TTGCATGACAGTCCGAGT; AGCGGATTGCATGACAGT (three copies)
      • Reference Genome: AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA
      • Plot axes: Read coverage; Genomic coordinate
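
The read coverage track on this slide can be reproduced with a small sketch. It assumes exact, ungapped matches located with a plain substring search, which is far simpler than a real short-read aligner; the function name is hypothetical.

```python
def read_coverage(reference, reads):
    """Count, for each reference position, how many reads cover it.

    Each read is placed at its first exact match; unmatched reads are skipped.
    """
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start < 0:
            continue
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage


reference = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"
reads = [
    "CCGAGTACAGCCTGACAGA",
    "GCATGACAGTCCGAGTAC",
    "TTGCATGACAGTCCGAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
]
print(read_coverage(reference, reads))
```

Plotting the returned list against its index gives read coverage as a function of genomic coordinate.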

  32. Genome browser can reveal local patterns (tracks: H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE)

  33. Difficult to get global overview

  34. Focus on regions of interest
      1. For example, transcriptional start sites (TSS +/- 3000 nt); tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE
      2. Extract data matrices; normalization for bin i, sample h (formula not preserved)
      3. Cluster matrices (k-means clustering with Euclidean distance)
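
Step 3 can be sketched with a plain k-means loop using Euclidean distance. This is a generic illustration under stated assumptions, not the tool's implementation: each region is taken to be one flattened row of numbers, the initial centroids are drawn at random with a fixed seed, and the normalization of step 2 is left out because its formula is not given here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(rows, k, iterations=100, seed=0):
    """Cluster rows into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy rows standing in for flattened per-region signal profiles.
rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = k_means(rows, k=2)
print(labels)
```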

  37. Enable interactive exploration
      4. Interactive cluster visualization (data from H1 cells)
         • cluster size indicator (total n = 15,618)
         • tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA
         • cluster (average values displayed)
         • individual TSS
         • HOXC12 gene
         • scroll bar to explore all cluster members
      5. Link-out to UCSC genome browser

  38. Enable interactive exploration
      4. Interactive cluster visualization (data from H1 cells)
         • cluster size indicator (total n = 15,618)
         • tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA
         • cluster (average values displayed)
         • individual TSS
         • scroll bar to explore all cluster members

  40. Part 2: Conclusions and Future Work
      • Clustering reveals patterns that were not obvious using a genome browser.
      • Access to both global and detailed view is valuable
      • Future work includes:
        • search functionality (e.g. by region id)
        • integration with other clustering tools
        • richer analysis functionality (e.g. interactive clustering)

  41. Acknowledgements: NIH Epigenomics Roadmap; ABySS-Explorer; Joe Costello, UCSF; Shaun Jackman; Peggy Farnham, UC Davis; İnanç Birol; Thea Tlsty, UCSF; Jason Chang; Marco Marra; Martin Hirst; Lymphoma Project Analyst; Yongjun Zhao; Karen Mungall; Nina Thiessen; Richard Varhol; Supervisor; Primary Data Generation; Steven Jones; Lymphoma Genomics Team
