Reproducible genomics analysis pipelines with GNU Guix
R. Wurmus, B. Uyar, B. Osberg, V. Franke, A. Gosdschan, K. Wreczycka, J. Ronen, A. Akalin
https://doi.org/10.1093/gigascience/giy123
[Figure: lab notebook. a = 10 ml, b = 30 ml; b: Supplier: ACME; a: Temp: 22 °C]
To repeat an experiment we first need to reproduce its environment
How hard could this possibly be?
Very.
[Figure: package dependency graph with dozens of nodes, including coreutils-8.24, perl-5.22.1, guile-2.0.11, gcc-4.9.3, glibc-2.22 and the bootstrap packages gcc-bootstrap-0, binutils-bootstrap-0, glibc-bootstrap-0 and bootstrap-binaries-0]
Containers to the rescue?
Containers lack transparency (whale oil? strawberry?)
Design goal 1: Automate genomics analyses (RNAseq, ChIPseq, BSseq, single cell)
PiGx ChIPseq: Check sequencing quality (FastQC); Improve read quality (Trim-Galore); Align reads (Bowtie2); Compute read coverage (R scripts); Call peaks (MACS2); ChIP QC & reproducibility (ChIPQC + IDR); Peak annotation (genomation); Pan-sample quality check (MultiQC)
Design goal 2: Simple user interface (sample sheet and settings in; interactive reports, browser tracks, alignments, QC reports and sample clustering out)
Design goal 3: Easy to install reproducibly: guix package --install pigx
Guix: reproducible package manager; full environment declarations; builds software in isolation; source / binary transparency
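A full environment declaration can be sketched as a manifest file. This is a minimal sketch and not taken from the talk: the file name `manifest.scm`, the profile path `./pigx-profile`, the choice of `samtools` next to `pigx`, and the helper `run` are all assumptions for illustration. The script only prints the Guix command by default; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Sketch: declare an environment in a file, then let Guix instantiate it.
# File name, profile path and package selection are illustrative assumptions.

# Print a command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The manifest lists every package the environment should contain.
cat > manifest.scm <<'EOF'
(specifications->manifest
 '("pigx" "samtools"))
EOF

# Build a separate profile containing exactly the declared packages.
run guix package --manifest=manifest.scm --profile=./pigx-profile
```

Keeping the manifest under version control next to the analysis records which software the analysis expects.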
Pack an application bundle [Figure labels: higher order, lower-level, binary, source, description, application bundles]
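Bundles of this kind can be produced with `guix pack`. The following is a minimal sketch, assuming `pigx` is the package to bundle; the `/bin=bin` symlink, the choice of formats and the helper `run` are illustrative assumptions rather than the commands used in the talk. The commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: bundle a package and all of its dependencies with `guix pack`.

# Print a command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# A plain tarball containing pigx and everything it depends on,
# with a /bin symlink pointing at the bundled executables.
run guix pack -S /bin=bin pigx

# The same package set, packed as a Docker image instead.
run guix pack --format=docker -S /bin=bin pigx
```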
Status: PiGx BSseq, PiGx ChIPseq, PiGx RNAseq, PiGx scRNAseq [Figure: per-pipeline reproducibility chart; legend: not reproducible, minor problems, reproducible; 90% ~ 98% all pipelines]
1. Constrain software variables
2. Containers are not transparent (smoothies)
3. Guix builds software reproducibly and transparently
4. PiGx shows that Guix makes reproducibility easy
5. PiGx brings analysis to non-bioinformaticians
Learn more: http://bioinformatics.mdc-berlin.de/pigx/ , https://hpc.guixsd.org , https://gnu.org/s/guix
Let's talk! #guix on irc.freenode.net, ricardo.wurmus@mdc-berlin.de
PiGx RNAseq: Check sequencing quality (FastQC); Improve read quality (Trim-Galore); Align reads (STAR); Compute read coverage (Bedtools); Quantify expression (STAR / Salmon); Analyze differential expression (DESeq2); Find enriched GO terms (g:ProfileR); Pan-sample quality check (MultiQC)
PiGx BSseq: Check sequencing quality (FastQC); Improve read quality (Trim-Galore); Align reads (Bismark); Call methylation (methylkit); Methylation segmentation (methylkit); Differential methylation (methylkit); Annotate DMRs and segments (genomation); Pan-sample quality check (MultiQC)
PiGx single cell RNAseq: Check sequencing quality (FastQC); Improve read quality (Trim-Galore); Align reads (STAR); Compute read coverage (Bedtools); Determine cell number (Dropbead); Dropout rate and QC (Scater); Dimension reduction (tSNE + PCA); Pan-sample quality check (MultiQC)
[Figure: build inputs (headers, sources, build tools, libraries, ...) produce the output directory cabba9e-samtools-1.7/ containing bin/samtools, lib, ...]
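The slide above suggests that the name of an output directory depends on everything that went into the build. The sketch below illustrates only that idea: `store_name` is a hypothetical helper, the input strings are made up, and the hashing is deliberately simplified and is not the scheme Guix actually uses.

```shell
#!/bin/sh
# Sketch: derive a directory name from a package name and its build inputs.
# NOT Guix's real hashing scheme; it only illustrates the dependence on inputs.

store_name() {
  name=$1
  shift
  # Hash all inputs, one per line, and keep a short prefix of the digest.
  hash=$(printf '%s\n' "$@" | sha256sum | cut -c1-7)
  printf '%s-%s\n' "$hash" "$name"
}

# Identical inputs give an identical name.
store_name samtools-1.7 "gcc-4.9.3" "zlib-1.2.8" "samtools-1.7.tar.bz2"

# Changing a single input (here a made-up zlib version) changes the name.
store_name samtools-1.7 "gcc-4.9.3" "zlib-1.2.9" "samtools-1.7.tar.bz2"
```

Because the name changes whenever an input changes, two builds with different inputs can never end up in the same directory.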