Genomics Virtual Laboratory: Mike Pheasant (UQ), Andrew Lonie (VLSCI)
What is the Genomics Virtual Laboratory? A NeCTAR-funded, nationally distributed platform for genomics, built on the Research Cloud and RDSI
NeCTAR? Research Cloud? RDSI? ● NeCTAR = National eResearch Collaboration Tools and Resources: http://www.nectar.org.au ● NeCTAR Research Cloud: http://www.nectar.org.au/research-cloud ● RDSI = Research Data Storage Initiative: http://www.rdsi.uq.edu.au/
What is the Genomics Virtual Laboratory? A NeCTAR-funded, nationally distributed platform for genomic analyses. Infrastructure: ● Workflow management system ● Bioinformatics toolkit (for command-line users) ● Visualisation services ● Scalable compute infrastructure. Resources: ● Tutorials and exemplar workflows targeted at common high-throughput genomics tasks ● Data catalogues and coordination centres ● Subscription-based support
What is the Genomics Virtual Lab?
Workflow platforms
Workflow platforms: interactive platforms for developing genomics workflows and for interactive data analysis ● Galaxy ● GenePattern; others possible (Bioflow, ...) What's Galaxy? "an open, web-based platform for performing accessible, reproducible, and transparent genomic science." http://galaxyproject.org ● Accessible: users without programming experience can easily specify parameters and run tools and workflows ● Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis ● Transparent: users share and publish analyses via the web
Visualisation platforms
Cluster-on-the-cloud
Cluster-on-the-cloud ● CloudBioLinux: Linux with a comprehensive, actively maintained suite of bioinformatics tools http://cloudbiolinux.org/ ● CloudMan: platform for launching and scaling CloudBioLinux clusters and Galaxy clusters on the cloud http://usecloudman.org ● Research Cloud: ~25,000 CPUs to be spread across 6-10 research centres around Australia, to host research activities 'on demand' http://www.nectar.org.au/research-cloud
Data catalogues
Data catalogues ● UCSC databases ● Ensembl databases ● ENCODE ● dbSNP, HapMap ● ICGC, COSMIC ● BPA Framework Datasets ○ sarcoma ○ wheat ○ soil diversity
Tutorials and workshops
Tutorials and education resources ● NGS School: summer schools, 2-day workshops ● Galaxy-based online tutorials: ○ Intro to NGS ○ Genome browsers ○ Common analyses: differential gene expression, variant calling, ChIP-seq, ...
Exemplar best practice workflows
Exemplar workflows ● Variant calling: GATK best practice ○ microbial ○ cancer-optimised ● RNA-seq differential expression ● Fusion gene discovery from RNA-seq ● MicroRNA analysis ● De novo genome and transcriptome assembly ● Metagenomics ● ChIP-seq ● Variant annotation ● Pathway analysis ● Methylation
Support
Genomics Informatics Network Institutional subscriptions: ● genomics support (% of FTE) ● large compute and data resources ● managed instances of GVL ● new GVL tool development ● advocacy to funding bodies for resources ● communities of best practice
Or... roll your own GVL
Progress and timelines ● Dec 2012: prototype at Qld (UQ) and Vic (UoM) ○ Galaxy ○ UCSC browser + databases ○ Bioinformatics cluster-on-the-cloud ○ Initial tutorials and exemplars ● Jun 2013: production at Qld (UQ) and Vic (UoM), prototype at other Research Cloud nodes ○ Data coordination centres, data catalogues ● Dec 2013: additional workflows and tutorials ○ Additional nodes ● Jun 2014: operations (support centres, subscriptions)