SWAMP+: Enhanced Smith-Waterman Search for Parallel Models
Shannon Steinfadt, Ph.D.
Los Alamos National Laboratory
shannon@lanl.gov
UNCLASSIFIED
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA
Outline
• Motivation for Sequence Alignment
• Smith-Waterman Local Sequence Alignment
• SWAMP
• ASC
  - SWAMP using ASC Emulator
• SWAMP+
• SWAMP and SWAMP+ on Metal
  - ClearSpeed
  - Convey Computer
• Contributions
• Future Work
• Questions?
LA-UR-12-20189
Motivation: Sequence Alignment
Given two sequences:
• DNA: nucleotides {A, G, T, C}
• Proteins: amino acids {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
Align them to find the longest, most common subsequence.
Query:
IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA
Subject:
MFCVQCEQTIRTPAGNGCSYAQGMCGKTAETSDLQDLLIAALQGLSAWAVKAREYGIINHDVDSFAPRAFFST
LTNVNFDSPRIVGYAREAIALREALKAQCLAVDANARVDNPMADLQLVSDDLGELQRQAAEFTPNKDKAAIGENILGLRL
LCLYGLKGAAAYMEHAHVLGQYDNDIYAQYHKIMAWLGTWPADMNALLECSMEIGQMNFKVMSILDAGETGKYGHPTPTQ
VNVKATAGKCILISGHDLKDLYNLLEQTEGTGVNVYTHGEMLPAHGYPELRKFKHLVGNYGSGWQNQQVEFARFPGPIVM
TSNCIIDPTVGAYDDRIWTRSIVGWPGVRHLDGDDFSAVITQAQQMAGFPYSEIPHLITVGFGRQTLLGAADTLIDLVSR
EKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGKYRFNKLEFGDIEGLPRLVDAGQCNDAYSAIILAVTLAE
KLGCGVNDLPLSLVLSWFEQKAIVILLTLLSLGVKNIVTGPTAPGFLTPDLLAVLNEKFGLRSITTVEEDMKQLLSA
Motivation: Sequence Alignment
Given two sequences:
• DNA: nucleotides {A, G, T, C}
• Proteins: amino acids {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
Align them to find the longest, most common subsequence.
Local sequence alignment is one of the most common and fundamental tasks.
Query:   VIA-EPYRE-RLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDK
             :  :   :: :   ::         :         :   : :
Subject: LVSREKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGK
Pairwise Local Sequence Alignment
• Similar Characters
• Similar Structure
• Similar Function
• Homologous Sequences (derived by humans)
• Ancestral Relationships (preserved by evolution)
• Gene Functionality
• Aid in Drug Discovery
• Assembly of Raw Data
Aligning using Smith-Waterman Algorithm
Compare all possible combinations of sequence characters against each other.
Cost Key:
  Match         +10
  Miss           -3
  Insert a Gap   -3
  Extend a Gap   -1
Aligning using Smith-Waterman Algorithm
Compare all possible combinations, subject to dynamic programming data dependencies: each cell depends on previously computed neighboring cells.
Cost Key:
  Match         +10
  Miss           -3
  Insert a Gap   -3
  Extend a Gap   -1
Smith-Waterman Recursive Matrix Equations

\[
D_{i,j} = \max \begin{Bmatrix} C_{i-1,j} - \sigma \\ D_{i-1,j} \end{Bmatrix} - g
\]
\[
I_{i,j} = \max \begin{Bmatrix} C_{i,j-1} - \sigma \\ I_{i,j-1} \end{Bmatrix} - g
\]
\[
C_{i,j} = \max \begin{Bmatrix} D_{i,j} \\ I_{i,j} \\ C_{i-1,j-1} + d(S1_i, S2_j) \\ 0 \end{Bmatrix}
\]
\[
d(S1_i, S2_j) = \begin{cases} \text{match\_cost} & \text{if } S1_i = S2_j \\ \text{miss\_cost} & \text{if } S1_i \neq S2_j \end{cases}
\]

g: gap extension cost
σ: gap opening cost
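The recurrences above can be sketched in Python as a plain sequential matrix fill. This is a minimal illustration, not the SWAMP implementation: the function name `sw_affine_fill`, the zero-initialized borders of C, and the negative-infinity borders of D and I are assumptions, and the default costs are taken from the Cost Key (match +10, miss -3, σ = 3, g = 1).

```python
NEG_INF = float("-inf")

def sw_affine_fill(s1, s2, match=10, miss=-3, sigma=3, g=1):
    """Fill C, D, I per the recurrences; return (C, best score, its position).

    Hypothetical helper for illustration only. sigma is the gap opening
    cost and g the gap extension cost, both given as positive penalties.
    """
    n, m = len(s1), len(s2)
    # Assumed boundary conditions: C borders are 0, D and I borders are -inf.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # D reads the cell above (i-1, j); I reads the cell to the left (i, j-1).
            D[i][j] = max(C[i - 1][j] - sigma, D[i - 1][j]) - g
            I[i][j] = max(C[i][j - 1] - sigma, I[i][j - 1]) - g
            d = match if s1[i - 1] == s2[j - 1] else miss
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + d, 0)
            if C[i][j] > best:
                best, best_pos = C[i][j], (i, j)
    return C, best, best_pos
```

Note that under these equations the first character of a gap costs σ + g, and each further character costs g.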
Traceback in the Smith-Waterman Algorithm
1) Find the maximum computed value
Cost Key:
  Match         +10
  Miss           -3
  Insert a Gap   -3
  Extend a Gap   -1
Traceback in the Smith-Waterman Algorithm
1) Find the maximum computed value
2) Traceback until you reach ‘0’s
Alignment:
  CATTG
  C--TG
Cost Key:
  Match         +10
  Miss           -3
  Insert a Gap   -3
  Extend a Gap   -1
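The two traceback steps can be sketched as follows. This is a hedged illustration under stated assumptions: the function name `sw_align`, the boundary values, the tie-breaking order (diagonal first, then gap extension before gap opening), and the input pair `CATTG` / `CTG` are all hypothetical choices, picked so the result can be compared with the alignment shown on the slide. The fill uses the affine-gap recurrences with σ = 3 and g = 1.

```python
NEG_INF = float("-inf")

def sw_align(s1, s2, match=10, miss=-3, sigma=3, g=1):
    """Return (score, aligned s1, aligned s2) for the best local alignment."""
    n, m = len(s1), len(s2)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(C[i - 1][j] - sigma, D[i - 1][j]) - g
            I[i][j] = max(C[i][j - 1] - sigma, I[i][j - 1]) - g
            d = match if s1[i - 1] == s2[j - 1] else miss
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + d, 0)
            if C[i][j] > best:
                # Step 1: remember the maximum computed value and where it is.
                best, bi, bj = C[i][j], i, j

    # Step 2: walk back from the maximum until a 0 is reached in C.
    a1, a2 = [], []
    i, j, state = bi, bj, "C"
    while i > 0 and j > 0:
        if state == "C":
            if C[i][j] == 0:
                break
            d = match if s1[i - 1] == s2[j - 1] else miss
            if C[i][j] == C[i - 1][j - 1] + d:
                a1.append(s1[i - 1]); a2.append(s2[j - 1])
                i -= 1; j -= 1
            elif C[i][j] == D[i][j]:
                state = "D"          # gap in s2, resolved on the next pass
            else:
                state = "I"          # gap in s1, resolved on the next pass
        elif state == "D":
            a1.append(s1[i - 1]); a2.append("-")
            # Stay in D if the gap was extended, else return to C (gap opened).
            state = "D" if D[i][j] == D[i - 1][j] - g else "C"
            i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1])
            state = "I" if I[i][j] == I[i][j - 1] - g else "C"
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))
```

Under these assumptions, `sw_align("CATTG", "CTG")` returns a score of 25 with the aligned strings `CATTG` and `C--TG`: three matches (+30), one gap opened (-4), and one gap extension (-1).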
Smith-Waterman Vectorization Approaches
• Parallel Processing
  - Allows high-quality results in less time using the Smith-Waterman algorithm
• Rognes described four basic approaches:
  - Vectors along the anti-diagonal (a wavefront approach), described by Wozniak
  - Vectors along the query (a single column split downward), described by Rognes and Seeberg
  - A striped approach, introduced by Farrar
  - Multi-sequence vectors, described by Alpern et al. and again by Rognes
Parallelizing the Smith-Waterman Algorithm
Sequential matrix of computed values.
Parallelizing the Smith-Waterman Algorithm
Tilted data arrangement to parallelize and process a diagonal at a time.
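Processing a diagonal at a time can be illustrated with a sequential Python sketch that visits the matrix by anti-diagonals. It is not the SWAMP or ASC code: the function name `sw_wavefront_score` and the boundary values are assumptions, and the inner loop merely marks where a parallel machine could update every cell of one diagonal simultaneously.

```python
NEG_INF = float("-inf")

def sw_wavefront_score(s1, s2, match=10, miss=-3, sigma=3, g=1):
    """Best local alignment score, computed one anti-diagonal at a time."""
    n, m = len(s1), len(s2)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Anti-diagonal k holds every cell (i, j) with i + j == k.
    for k in range(2, n + m + 1):
        i_lo, i_hi = max(1, k - m), min(n, k - 1)
        # Cells on diagonal k read only diagonals k-1 (up, left) and
        # k-2 (up-left), never each other, so the iterations of this
        # loop are independent and could run in parallel.
        for i in range(i_lo, i_hi + 1):
            j = k - i
            D[i][j] = max(C[i - 1][j] - sigma, D[i - 1][j]) - g
            I[i][j] = max(C[i][j - 1] - sigma, I[i][j - 1]) - g
            d = match if s1[i - 1] == s2[j - 1] else miss
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + d, 0)
            best = max(best, C[i][j])
    return best
```

An n-by-m matrix has n + m - 1 anti-diagonals, so with enough processing elements the number of sequential steps drops from n·m cell updates to n + m - 1 diagonal updates.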
Parallelizing the Algorithm: “Tilting” the Matrix