Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs Bertil Schmidt, Christian Hundt
Contents • Gene Set Enrichment Analysis (GSEA) – Background – Algorithmic details • cudaGSEA • Performance evaluation
GSEA and Bioinformatics • High-throughput technologies generate large-scale gene expression data sets – RNA-Seq – Microarrays • GSEA uses annotated gene sets to mine a given gene expression matrix – MSigDB contains over 10K signatures, each containing around 100 gene identifiers on average • Typical GSEA study: – identify metabolic pathways that are differentially changed in human type-2 diabetes
Gene Set Enrichment Analysis • Reveals correlation between gene sets and diseases using gene expression data • State-of-the-art tool with over 10,000 citations • Written in (multi-threaded) Java • Highly time consuming – analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with GSEA 2.2.2 software on a CPU • We present – GSEA parallelization on a GPU using CUDA (cudaGSEA) – cudaGSEA is around two orders of magnitude faster than BroadGSEA
GSEA Algorithm – Gene Ranking • Gene expression matrix D obtained from RNA-Seq or Microarray experiments • For each gene i and patient j, the expression value D[i, j] is stored; each patient has an associated (binary) phenotype C • Diseases are driven by complex gene interactions, so simply reporting top-ranked genes produces many false positives • Domain experts provide sets of genes that might explain the observed phenotypes
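The slide does not name the ranking metric, so the following is only a minimal sketch that assumes the difference of class means (one of the local deviation measures listed under cudaGSEA Features). The function name `rank_genes` and the list-of-lists layout for D are illustrative assumptions, not part of cudaGSEA.

```python
def rank_genes(D, C):
    """Rank genes by difference of class means (assumed metric).

    D: list of rows, one row of expression values per gene.
    C: list of 0/1 phenotype labels, one per patient (column of D).
    Returns (order, scores): gene indices sorted by descending score,
    and the per-gene scores.
    """
    scores = []
    for row in D:
        cases = [x for x, c in zip(row, C) if c == 1]
        controls = [x for x, c in zip(row, C) if c == 0]
        scores.append(sum(cases) / len(cases) - sum(controls) / len(controls))
    order = sorted(range(len(D)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data: 3 genes, 4 patients (two controls followed by two cases).
D = [[1.0, 1.0, 5.0, 5.0],
     [5.0, 5.0, 1.0, 1.0],
     [3.0, 3.0, 3.0, 3.0]]
C = [0, 0, 1, 1]
order, scores = rank_genes(D, C)
```

Genes at the top of `order` are up-regulated in cases, genes at the bottom are down-regulated; the whole ordering, not just the top entries, feeds into the enrichment score.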
GSEA Algorithm – Enrichment score • Enrichment score (ES) measures the correlation between a given gene set S and the calculated gene ranking g(i) – Report the maximum deviation from zero of a running sum indexed by rank position k – The sum increases if we hit a member of S and decreases otherwise • How significant is ES = 0.857? – p-value calculation using permutation testing
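The running sum can be sketched as follows. This is a simplified, unweighted (Kolmogorov-Smirnov-style) variant in which every hit adds 1/|S| and every miss subtracts 1/(N - |S|); the Broad GSEA statistic additionally weights hits by the ranking metric, which is omitted here. The function name `enrichment_score` is an illustrative assumption.

```python
def enrichment_score(order, gene_set):
    """Unweighted enrichment score of gene_set along a ranked gene list.

    order: gene identifiers sorted by the ranking metric.
    gene_set: iterable of gene identifiers (the set S).
    Returns the signed running-sum value with the largest magnitude.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in order if g in members)
    n_miss = len(order) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    step_up = 1.0 / n_hit      # increment when a member of S is hit
    step_down = 1.0 / n_miss   # decrement otherwise
    running = 0.0
    best = 0.0
    for g in order:
        running += step_up if g in members else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# A set concentrated at the top of the ranking scores +1,
# one concentrated at the bottom scores -1.
top = enrichment_score([0, 1, 2, 3], {0, 1})
bottom = enrichment_score([0, 1, 2, 3], {2, 3})
```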
GSEA Algorithm – Permutation testing
GSEA Algorithm • Histogram of 1,000,000 enrichment scores gained by permuting patient phenotypes (tails marked at -|ES| and |ES|) • Estimate p-value by counting events in both tails • Why so many permutations? – When testing 1,000 gene sets at significance level p < 0.001 we need more than 1,000,000 samples to reject the null hypothesis at the Bonferroni-corrected level p < 0.001/1,000
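A minimal sketch of the two-tailed estimate and the Bonferroni arithmetic, under stated assumptions: `statistic` stands in for the full pipeline (rank genes, then compute ES) as a function of the phenotype labels, the estimate is the plain fraction of extreme permutations (some estimators add 1 to numerator and denominator), and all function names are illustrative.

```python
import random


def permutation_p_value(statistic, labels, n_perm, seed=0):
    """Fraction of label permutations at least as extreme as the observation.

    Counts events in both tails: |statistic(permuted)| >= |statistic(observed)|.
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the patient phenotypes
        if abs(statistic(shuffled)) >= abs(observed):
            extreme += 1
    return extreme / n_perm


def bonferroni_level(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def mean_difference(values):
    """Toy statistic standing in for an enrichment score (assumption)."""
    def statistic(labels):
        cases = [v for v, c in zip(values, labels) if c == 1]
        controls = [v for v, c in zip(values, labels) if c == 0]
        return sum(cases) / len(cases) - sum(controls) / len(controls)
    return statistic


stat = mean_difference([10.0, 11.0, 12.0, 0.0, 1.0, 2.0])
p = permutation_p_value(stat, [1, 1, 1, 0, 0, 0], n_perm=2000)
level = bonferroni_level(0.001, 1000)  # about 1e-6
```

Because the smallest non-zero estimate is 1/n_perm, resolving p-values below the corrected level of about 1e-6 requires more than 1,000,000 permutations, which matches the slide's argument.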
CUDA Parallelization • Transpose D to ensure coalesced memory accesses
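Why a transpose can help is easiest to see from the address arithmetic. The sketch below is plain Python, not CUDA, and it assumes a hypothetical thread mapping in which consecutive threads process consecutive genes for the same patient; the slides do not state the actual mapping used by cudaGSEA.

```python
def linear_offsets(n_genes, n_patients, patient, n_threads, transposed):
    """Element offsets read by threads 0..n_threads-1.

    Hypothetical mapping: thread t reads the value of gene t for one
    fixed patient.
    """
    if transposed:
        # D^T stored row-major (n_patients x n_genes):
        # neighbouring threads touch adjacent elements.
        return [patient * n_genes + t for t in range(n_threads)]
    # D stored row-major (n_genes x n_patients):
    # neighbouring threads are n_patients elements apart.
    return [t * n_patients + patient for t in range(n_threads)]


def stride(offsets):
    """Distance in elements between the reads of threads 0 and 1."""
    return offsets[1] - offsets[0]


original = stride(linear_offsets(20639, 200, 0, 4, transposed=False))
after_transpose = stride(linear_offsets(20639, 200, 0, 4, transposed=True))
```

With unit stride, the reads of neighbouring threads fall into the same memory segments and can be served by few transactions; with a stride of 200 elements each thread touches a different segment.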
CUDA Implementation Details • Support for single-precision and double-precision • Resulting matrix of enrichment scores (#gene sets x #permutations) can be large – e.g. 5K x 1M x 8 B = 40 GB • p-value estimation, family-wise error rate (FWER), and normalized enrichment score (NES) computation can be accomplished on the GPU with (sum/max) reduction kernels without the need for storing this matrix • For false discovery rate (FDR) computation, this matrix is transferred to the CPU for post-processing
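The idea of reducing on the fly instead of storing the full matrix can be sketched in plain Python. This is an illustration of the principle only, not the authors' kernels: permutations arrive in batches, and only one counter per gene set is kept. The function names and the batch layout (one row of null scores per gene set) are assumptions.

```python
def full_matrix_bytes(n_gene_sets, n_permutations, bytes_per_value):
    """Memory needed to store every enrichment score explicitly."""
    return n_gene_sets * n_permutations * bytes_per_value


def stream_p_values(observed, batches):
    """Two-tailed p-value estimates from batches of permutation scores.

    observed: observed ES per gene set.
    batches: iterable of batches; each batch holds one row per gene set
             with the null ES values of that batch's permutations.
    Only one integer counter per gene set is kept between batches.
    """
    counts = [0] * len(observed)
    total = 0
    for batch in batches:
        for s, row in enumerate(batch):
            # Sum reduction over this batch's permutations.
            counts[s] += sum(1 for es in row if abs(es) >= abs(observed[s]))
        total += len(batch[0])
    return [c / total for c in counts]


size = full_matrix_bytes(5000, 10**6, 8)  # 40 * 10**9 bytes = 40 GB
batches = [
    [[0.1, 0.6], [0.2, 0.3]],     # permutations 1-2, gene sets 0 and 1
    [[-0.7, 0.0], [1.0, -0.95]],  # permutations 3-4
]
p_values = stream_p_values([0.5, 0.9], batches)
```

The counters grow with the number of gene sets only, so the memory footprint is independent of the number of permutations.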
cudaGSEA Features • Reading data sets directly in Broad Institute-compatible file formats • Supporting several local deviation measures – Mean-based measures (difference/quotient/log-quotient of means) – Mean and standard deviation-based measures (signal-to-noise ratio, t-tests, one/two-pass estimation) – Numerically stable summation schemes for local measures and ES (Kahan etc.) • Package for the R framework and standalone application • Multi-threaded CPU version in C++ using OpenMP
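Kahan (compensated) summation, named above as one of the numerically stable schemes, can be sketched as follows. This is the textbook algorithm in plain Python, shown for illustration; it is not taken from the cudaGSEA source.

```python
def kahan_sum(values):
    """Compensated summation: carries the rounding error of each addition."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for x in values:
        y = x - compensation            # re-inject the error lost so far
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


# Many tiny terms added to a large one: each single addition rounds away
# in naive_sum, while kahan_sum accumulates them.
values = [1.0] + [1e-16] * 10000
naive = naive_sum(values)
compensated = kahan_sum(values)
```

In this example the naive result stays at exactly 1.0, whereas the compensated result lands close to 1 + 1e-12. The same effect matters when a running sum takes very many small steps, as the enrichment score does over roughly 20,000 genes.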
Performance Evaluation • GSE19429 dataset – collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls) • Hallmark: 50 gene sets – smallest gene set collection in MSigDB 5.1 • GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA 7.5 • 10-core Xeon E5-2660v3 @ 2.60 GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK • BroadGSEA v2.2.2
Performance Evaluation • GSE19429 dataset – collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls) • C2: 4,726 gene sets – largest gene set collection in MSigDB 5.1 • GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA 7.5 • 10-core Xeon E5-2660v3 @ 2.60 GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK • BroadGSEA v2.2.2
Conclusion • High-throughput technologies establish the need for scalable bioinformatics tools that can process large-scale gene expression data sets • CUDA is a suitable technology to address this need • cudaGSEA on one GPU achieves a speedup of around two orders of magnitude versus BroadGSEA on a CPU – analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with GSEA 2.2.2 on a Xeon E5-2660v3 CPU but less than 1 hour on a GeForce Titan X • Source code available at: – https://github.com/gravitino/cudaGSEA • Group Website: – https://www.hpc.informatik.uni-mainz.de/
Thank you! Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs Bertil Schmidt, Christian Hundt Institute of Computer Science Johannes Gutenberg University Mainz {bertil.schmidt, hundt}@uni-mainz.de