AN NVIDIA-POWERED BIO-GRAPH ALIGNMENT AND VISUALIZATION TOOL
G3NA-V: Global GPU-based Gene Network Alignment Visualization
Karan Sapra, Melissa C. Smith, Alex Feltus, Joshua Levine
ACCELERATING DISCOVERY
Problem Statement
• World population estimates
• Cancer and infectious diseases
• Crop yield
“Human prosperity depends on our ability to understand the genomes of organisms and change them for the better (or worse).”
Overview of Genomic Discoveries
• Sequencing and Sampling: collecting multiple samples from tissues, organisms, etc.
• Construct Gene Expression and Identify Modules: correlation, topological analysis, thresholding, clustering
• Network Alignment (G3NA)
• Relate Modules using external information: GO ontology, evolutionary tree, molecular structure
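The correlation and thresholding step named in this overview can be illustrated with a minimal Python sketch. It is not the G3NA-V pipeline: the use of Pearson correlation, the 0.9 cutoff, the function names, and the toy gene names are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists of intensities."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_a * var_b)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b) whose |r| meets the threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the slides.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b))
    return edges


# Hypothetical toy data: gene name -> intensities across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # scales with geneA
    "geneC": [1.0, -1.0, 1.0, -1.0],  # unrelated pattern
}
toy_edges = coexpression_edges(toy_expression)
```

Only the geneA/geneB pair survives the cutoff here, so the resulting network has a single edge. Real networks at the scale described in the talk would need a vectorized or GPU implementation rather than this pairwise loop.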
G3NA-V Workflow
• Conserved Graph Alignment (Compute)
• Gene Expression Matrix
• Clustering and GUI
• Gene Expression Network
• Multi Network Alignment
• Ontologies
• Sample Curve Distribution
• Evolutionary Tree
• Molecular Visualization
Complex biological systems can be modeled as graphs…
• Node: gene
• Edge: gene interaction
• Examples: Alzheimer’s (plaque in brain); rice graph mapped to genome (higher yield)
Genenet Engine: sysbio.genome.clemson.edu
Paleogenomics: Conserved subgraphs can be detected by graph alignment…
[Figure: maize and rice graphs, aligned graphs, conserved subgraphs]
Evidence: the maize-rice ancestor shared similar gene interaction patterns 50-70 million years ago.
Ficklin & Feltus, "Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice." Plant Physiology 156:3 (2011)
G3NA-V Overview
• Compute Engine: preprocessing, node matching, edge and cluster matching, graph algorithm, postprocessing
• Daemon (Message Passing Control Unit): osg, mpi, tcp, shared-memory
• Visualization Engine: visualization (graph, molecule, network alignment matrix, etc.), update visualization, apply filtering
Daemon (Message Passing Control Unit)
• User activity offloads computation tasks such as multiple alignment, clustering, data reduction, ray-casting, parsing, etc.
• Uses shared memory / TCP / UDP
• Fast offloading to the daemon
• Daemon offloads using MPI / OSG
• MPI obtains multiple nodes during the initial launch
• Can launch multiple GPU/CPU
• Daemon monitors resources
• Working on integration with Open Science Grid (OSG)
[Diagram: a super daemon above daemons, each managing nodes with GPUs]
Compute Engine
• CUDA 7 enabled global pairwise aligner: GPU-enabled Global Gene Network Aligner (G3NA)
• CUDA-enabled graph processing libraries: Thrust, MapGraph, etc.
• Uses multiple GPUs for alignment of multiple graphs
• Utilizes various algorithms: clustering, PageRank, filtering, max-flow min-cut, etc.
Visualization Engine
• Orientation and visual flexibility
• GPU-enabled OpenGL and GLUI based visualization
• Support for multiple viewports and data types
• CUDA-based layout algorithms for graphs and trees
• Dual/multi-GPU support for compute and visualization
INPUT DATA FORMAT
• Input Data: tab-separated data for each graph (Maize, Rice)
• Undirected edge list pair
• Size: 2,000 nodes and 40,000 edges per graph
• Alignment Graph (Maize - Rice): tab-separated data for each pair of graphs
• Undirected edge list pair
• Size: ~700 nodes and ~1,000 edges
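A minimal Python sketch of reading such a tab-separated undirected edge list follows. The slide does not show a sample file, so the two-column layout (one node name per column), the skipping of self-loops, and the function names are assumptions.

```python
import io


def read_edge_list(handle):
    """Parse 'nodeA<TAB>nodeB' lines into an undirected adjacency map."""
    adjacency = {}
    for raw in handle:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # skip blank or malformed lines
        a, b = fields[0], fields[1]
        if a == b:
            continue  # assumption: self-loops are ignored
        # Store both directions so each edge is undirected.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def count_edges(adjacency):
    """Each undirected edge appears twice in the adjacency map."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Hypothetical three-line file; the last line repeats the first edge reversed.
sample = io.StringIO("g1\tg2\ng2\tg3\ng2\tg1\n")
graph = read_edge_list(sample)
edge_total = count_edges(graph)
```

Using sets for neighbors means a pair listed in both orders is counted once, which matches the undirected reading of the format.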
SUPPORTING DATA
• Cluster File: tab-separated, one file per graph
• Each file contains node and clusterID
• Network Information File: tab-separated, one file per graph
• Each file contains information about the species, including extra information that is not used
• Used to obtain ontology information
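A cluster file of this kind could be read as in the sketch below. The column order (node first, clusterID second) follows the wording of the slide but is otherwise an assumption, as are the function names and the sample rows.

```python
import io
from collections import defaultdict


def read_clusters(handle):
    """Map each node to its clusterID from 'node<TAB>clusterID' lines."""
    node_to_cluster = {}
    for raw in handle:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        node_to_cluster[fields[0]] = fields[1]
    return node_to_cluster


def group_by_cluster(node_to_cluster):
    """Invert the mapping: clusterID -> list of member nodes."""
    members = defaultdict(list)
    for node, cluster_id in node_to_cluster.items():
        members[cluster_id].append(node)
    return dict(members)


# Hypothetical cluster file contents for one graph.
sample = io.StringIO("g1\t0\ng2\t0\ng3\t1\n")
clusters = read_clusters(sample)
modules = group_by_cluster(clusters)
```

The inverted mapping is convenient for coloring or laying out each module as a group.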
ONTOLOGY FORMAT
• Gene Ontology (GO) Basic File
• Available at: http://geneontology.org/ontology/go-basic.obo
• Id: GO ID (GO:xxxxxxx)
• Name: Gene Ontology information
• NameSpace: Gene Ontology classification
• Definition: description of the GO term
• is_a, consider, synonym, obsolete, etc.
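The fields listed above appear in the OBO file as `tag: value` lines grouped under `[Term]` stanzas. The following Python sketch reads a small subset of those tags; it is not a complete OBO parser, the function name is hypothetical, and the embedded sample is an abbreviated illustration rather than a verbatim copy of go-basic.obo.

```python
import io


def parse_obo_terms(handle):
    """Collect id, name, namespace, def, is_a, and is_obsolete per [Term]."""
    terms = {}
    current = None
    for raw in handle:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue  # header lines, other stanzas, blanks, comments
        tag, _, value = line.partition(": ")
        if tag == "id":
            terms[value] = current
        elif tag in ("name", "namespace", "def"):
            current[tag] = value
        elif tag == "is_a":
            # Keep only the parent ID, dropping the trailing '! label'.
            current.setdefault("is_a", []).append(value.split(" ! ")[0])
        elif tag == "is_obsolete":
            current["is_obsolete"] = value == "true"
    return terms


# Abbreviated, illustrative sample in OBO layout.
sample = io.StringIO(
    "format-version: 1.2\n"
    "\n"
    "[Term]\n"
    "id: GO:0000001\n"
    "name: mitochondrion inheritance\n"
    "namespace: biological_process\n"
    "is_a: GO:0048308 ! organelle inheritance\n"
    "is_a: GO:0048311 ! mitochondrion distribution\n"
    "\n"
    "[Typedef]\n"
    "id: part_of\n"
    "name: part of\n"
)
go_terms = parse_obo_terms(sample)
```

The `is_a` lists give the parent links needed to navigate from one GO term to broader terms.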
JSON FILE
• Used for user-directed layout and input of graph and supporting data
• Contains position in 3D space
• Contains initial size
• Contains alignment information between graphs

{
  "graph": {
    "graph1": {
      "id": 1,
      "name": "Maize",
      "fileLocation": "M.tab",
      "clusterLocation": "M.tab.cluster",
      "Ontology": "Maize_info2.txt",
      "x": -300,
      "y": 0,
      "z": 0,
      "w": 200,
      "h": 200
    },
    "graph2": {
      "id": 2,
      "name": "Rice",
      "fileLocation": "R.tab",
      "clusterLocation": "R.tab.cluster",
      "Ontology": "Rice_info2.txt",
      "x": 0,
      "y": 0,
      "z": 0,
      "w": 200,
      "h": 200
    }
  },
  "alignment": {
    "alignment1": {
      "graphID1": 1,
      "graphID2": 2,
      "filelocation": "output.gna"
    }
  }
}
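A layout file with this structure can be loaded with Python's standard `json` module, as in the sketch below. The key names come from the slide; the function name `load_layout` and the trimmed inline copy of the configuration (the cluster and ontology entries are omitted) are assumptions for illustration.

```python
import json

# Trimmed copy of the slide's configuration, kept inline so the sketch runs.
CONFIG_TEXT = """
{
  "graph": {
    "graph1": {"id": 1, "name": "Maize", "fileLocation": "M.tab",
               "x": -300, "y": 0, "z": 0, "w": 200, "h": 200},
    "graph2": {"id": 2, "name": "Rice", "fileLocation": "R.tab",
               "x": 0, "y": 0, "z": 0, "w": 200, "h": 200}
  },
  "alignment": {
    "alignment1": {"graphID1": 1, "graphID2": 2, "filelocation": "output.gna"}
  }
}
"""


def load_layout(text):
    """Return graphs keyed by id and (name1, name2, file) alignment triples."""
    config = json.loads(text)
    graphs = {entry["id"]: entry for entry in config["graph"].values()}
    alignments = []
    for entry in config["alignment"].values():
        first = graphs[entry["graphID1"]]
        second = graphs[entry["graphID2"]]
        # Note the lowercase 'filelocation' key used in the alignment block.
        alignments.append((first["name"], second["name"], entry["filelocation"]))
    return graphs, alignments


graphs, alignments = load_layout(CONFIG_TEXT)
maize_position = (graphs[1]["x"], graphs[1]["y"], graphs[1]["z"])
```

Keying the graphs by `id` makes it straightforward to resolve the `graphID1` and `graphID2` references in each alignment entry.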
Enabling Systems Genetics using HPC
• Enabling anonymous pairwise alignment using the Palmetto Supercomputer at Clemson University
• Snappy overview visualization of alignment using WebGL
• Palmetto: 721.9 teraflops; 2021 compute nodes and 22,336 cores; 374 nodes with dual K20/K40 for acceleration and visualization; 56 Gbit interconnect
http://network.genome.clemson.edu
G3NA-V Workflow: Gene Expression Matrix, Sample Curve Distribution
Gene Expression Matrix
• Raw genomic data is a list of genes associated with a species and a number of samples
• Each sample is an intensity value expressed by the gene
• The raw data matrix is visualized as a heatmap
Gene Sample Distribution
• The sample curve distribution identifies outliers in the raw genomic expression data.
• Normalized histogram of intensities over the range [-1, 1].
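One way to read "normalized histogram of intensities over [-1, 1]" is sketched below in Python. The slide does not say how the normalization is done, so the min-max scaling of intensities to [-1, 1], the bin count, and the division of counts by the sample size are all assumptions; the input is assumed to be non-empty.

```python
def scale_to_unit_range(values):
    """Min-max scale intensities to [-1, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant sample: place at the center
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def normalized_histogram(values, bins=4):
    """Histogram over [-1, 1] whose bin heights sum to 1."""
    scaled = scale_to_unit_range(values)
    counts = [0] * bins
    width = 2.0 / bins
    for v in scaled:
        # The maximum value (exactly 1.0) falls into the last bin.
        index = min(int((v + 1.0) / width), bins - 1)
        counts[index] += 1
    total = len(scaled)
    return [count / total for count in counts]


# Hypothetical intensities for one sample.
intensities = [0.0, 2.0, 4.0, 6.0, 8.0]
histogram = normalized_histogram(intensities)
```

Comparing such histograms across samples is one plausible way to spot a sample whose curve departs from the rest.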
G3NA-V Workflow: Gene Expression Matrix, Gene Expression Network, Multi Network Alignment, Sample Curve Distribution
G3NA Result
[Figures: Performance with IsoRankN; G3NA Scalability]
All performance data are from a single K40 node.
G3NA Profiling Overview
All performance data are from a single K40 node.
Ontology Visualization
• A gene may be associated with multiple GO terms
• Every GO term is part of an ontology
• We navigate the ontologies through the GO terms (and their descriptions)
Genomic Molecular Vis
• We can visualize the protein structure for each gene node
• Files are obtained from the Protein Data Bank (PDB) archive
• Example: Crystal Structure of Protective Ebola Virus Antibody 114
Conclusion
• NVIDIA-powered tool for alignment and visualization of graphs and network-related information
• Support for various formats and visualizations
• Accelerates discovery by incorporating tools.