
SWAMP+: Enhanced Smith-Waterman Search for Parallel Models


  1. SWAMP+: Enhanced Smith-Waterman Search for Parallel Models
     Shannon Steinfadt, Ph.D., Los Alamos National Laboratory, shannon@lanl.gov
     UNCLASSIFIED. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA

  2. Outline
     • Motivation for Sequence Alignment
     • Smith-Waterman Local Sequence Alignment
     • SWAMP
     • ASC
       • SWAMP using ASC Emulator
     • SWAMP+
     • SWAMP and SWAMP+ on Metal
       • ClearSpeed
       • Convey Computer
     • Contributions
     • Future Work
     • Questions?
     [Figure: example gapped alignment of two DNA sequences]
     UNCLASSIFIED. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA - LA-UR-12-20189

  3. Motivation: Sequence Alignment
     Given two sequences:
     • DNA nucleotides {A, G, T, C}
     • Proteins {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
     Align them to find the longest, most common subsequence.
     Query:
       IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA
     Subject:
       MFCVQCEQTIRTPAGNGCSYAQGMCGKTAETSDLQDLLIAALQGLSAWAVKAREYGIINHDVDSFAPRAFFST
       LTNVNFDSPRIVGYAREAIALREALKAQCLAVDANARVDNPMADLQLVSDDLGELQRQAAEFTPNKDKAAIGENILGLRL
       LCLYGLKGAAAYMEHAHVLGQYDNDIYAQYHKIMAWLGTWPADMNALLECSMEIGQMNFKVMSILDAGETGKYGHPTPTQ
       VNVKATAGKCILISGHDLKDLYNLLEQTEGTGVNVYTHGEMLPAHGYPELRKFKHLVGNYGSGWQNQQVEFARFPGPIVM
       TSNCIIDPTVGAYDDRIWTRSIVGWPGVRHLDGDDFSAVITQAQQMAGFPYSEIPHLITVGFGRQTLLGAADTLIDLVSR
       EKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGKYRFNKLEFGDIEGLPRLVDAGQCNDAYSAIILAVTLAE
       KLGCGVNDLPLSLVLSWFEQKAIVILLTLLSLGVKNIVTGPTAPGFLTPDLLAVLNEKFGLRSITTVEEDMKQLLSA

  4. Motivation: Sequence Alignment
     Given two sequences:
     • DNA nucleotides {A, G, T, C}
     • Proteins {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
     Align them to find the longest, most common subsequence.
     One of the most common fundamental tasks is local sequence alignment.
     Query:   VIA-EPYRE-RLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDK
     Subject: LVSREKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGK

  5. Pairwise Local Sequence Alignment
     • Similar Characters, Similar Structure, Similar Function
     • Homologous Sequences (derived by humans)
     • Ancestral Relationships (preserved by evolution)
     • Gene Functionality
     • Aid in Drug Discovery
     • Assembly of Raw Data

  6. Aligning using Smith-Waterman Algorithm
     Compare all possible combinations of sequence characters against each other.
     Cost Key: Match +10, Miss -3, Insert a Gap -3, Extend a Gap -1
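
The all-against-all comparison on this slide can be sketched as a grid of match/miss scores. This is a minimal illustration, not the presenter's code: the function name `substitution_grid` and the example sequences are assumptions, and only the Match and Miss values come from the cost key.

```python
# Match/Miss values from the slide's cost key; everything else is illustrative.
MATCH, MISS = 10, -3


def substitution_grid(s1, s2):
    """Score every character of s1 (rows) against every character of s2 (columns)."""
    return [[MATCH if a == b else MISS for b in s2] for a in s1]


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to show the shape of the grid.
    for row in substitution_grid("CATTG", "CTG"):
        print(row)
```

The grid only records per-character scores; gap costs enter later, when the dynamic programming matrix is filled.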

  11. Aligning using Smith-Waterman Algorithm
      Compare all possible combinations, but the computation has dynamic programming data dependencies.
      Cost Key: Match +10, Miss -3, Insert a Gap -3, Extend a Gap -1
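
A small sketch of what these data dependencies imply for the fill order. It assumes the usual Smith-Waterman pattern in which a cell reads its north, west, and northwest neighbours; the helper names `dependencies` and `is_valid_order` are hypothetical.

```python
def dependencies(i, j):
    """Cells read when computing cell (i, j): north, west, northwest."""
    return [(i - 1, j), (i, j - 1), (i - 1, j - 1)]


def is_valid_order(order):
    """True if every interior dependency of a cell is computed before the cell.

    Row 0 and column 0 are treated as a pre-filled border, so they are ignored.
    """
    done = set()
    for i, j in order:
        for di, dj in dependencies(i, j):
            if di >= 1 and dj >= 1 and (di, dj) not in done:
                return False
        done.add((i, j))
    return True


rows, cols = 3, 4  # interior cells are (1..rows, 1..cols)
row_major = [(i, j) for i in range(1, rows + 1) for j in range(1, cols + 1)]
by_diagonal = sorted(row_major, key=lambda cell: cell[0] + cell[1])

print(is_valid_order(row_major))                  # sequential fill
print(is_valid_order(by_diagonal))                # one anti-diagonal at a time
print(is_valid_order(list(reversed(row_major))))  # violates the dependencies
```

Both the row-by-row order and the anti-diagonal order respect the dependencies, which is why a diagonal can be processed as a unit.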

  18. Smith-Waterman Recursive Matrix Equations

      \[
      \begin{aligned}
      D_{i,j} &= \max \left\{ C_{i-1,j} - \sigma,\; D_{i-1,j} \right\} - g \\
      I_{i,j} &= \max \left\{ C_{i,j-1} - \sigma,\; I_{i,j-1} \right\} - g \\
      C_{i,j} &= \max \left\{ D_{i,j},\; I_{i,j},\; C_{i-1,j-1} + d(S1_i, S2_j),\; 0 \right\} \\
      d(S1_i, S2_j) &=
        \begin{cases}
          \text{match\_cost} & \text{if } S1_i = S2_j \\
          \text{miss\_cost}  & \text{if } S1_i \neq S2_j
        \end{cases}
      \end{aligned}
      \]

      g: gap extension cost
      σ: gap opening cost
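
The three recurrences can be read as a single-cell update. The sketch below is illustrative rather than the SWAMP implementation: `cell_update` and its argument names are hypothetical, and it assumes the cost key maps to match_cost = 10, miss_cost = -3, σ = 3, and g = 1, with σ and g subtracted as positive magnitudes.

```python
def cell_update(c_north, d_north, c_west, i_west, c_diag, a, b,
                match=10, miss=-3, sigma=3, g=1):
    """Apply the D, I and C recurrences to one cell.

    c_north, d_north: C and D values of the cell above (i-1, j)
    c_west,  i_west:  C and I values of the cell to the left (i, j-1)
    c_diag:           C value of the upper-left cell (i-1, j-1)
    a, b:             the characters S1_i and S2_j
    Default costs are an assumed reading of the slides' cost key.
    """
    d = max(c_north - sigma, d_north) - g   # D_{i,j}
    ins = max(c_west - sigma, i_west) - g   # I_{i,j}
    sub = match if a == b else miss         # d(S1_i, S2_j)
    c = max(d, ins, c_diag + sub, 0)        # C_{i,j}
    return c, d, ins


if __name__ == "__main__":
    # Made-up neighbour values, only to exercise the three formulas.
    print(cell_update(7, 2, 5, -4, 6, "T", "T"))
```

Under this reading, the first character of a gap costs σ + g and each further character costs g.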

  19. Traceback in the Smith-Waterman Algorithm
      1) Find the maximum computed value
      Cost Key: Match +10, Miss -3, Insert a Gap -3, Extend a Gap -1

  20. Traceback in the Smith-Waterman Algorithm
      1) Find the maximum computed value
      2) Trace back until you reach a 0
      Alignment:
        CATTG
        C--TG
      Cost Key: Match +10, Miss -3, Insert a Gap -3, Extend a Gap -1
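
The two traceback steps can be sketched as follows. This is a sequential illustration under stated assumptions, not the presenter's code: the function name `smith_waterman` is hypothetical, the costs are read from the cost key as σ = 3 and g = 1, and the second sequence `"CTG"` is a guess chosen so that the result matches the alignment shown on the slide (the original input sequences are not recoverable from the transcript).

```python
NEG = float("-inf")


def smith_waterman(s1, s2, match=10, miss=-3, sigma=3, g=1):
    """Fill C, D, I with affine gaps, then trace back from the maximum to a 0."""
    n, m = len(s1), len(s2)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(C[i - 1][j] - sigma, D[i - 1][j]) - g
            I[i][j] = max(C[i][j - 1] - sigma, I[i][j - 1]) - g
            sub = match if s1[i - 1] == s2[j - 1] else miss
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + sub, 0)
            if C[i][j] > best:  # step 1: remember the maximum computed value
                best, bi, bj = C[i][j], i, j

    # Step 2: walk back from the maximum until a 0 is reached.
    top, bottom = [], []
    i, j, state = bi, bj, "C"
    while i > 0 and j > 0:
        if state == "C":
            if C[i][j] == 0:
                break
            sub = match if s1[i - 1] == s2[j - 1] else miss
            if C[i][j] == C[i - 1][j - 1] + sub:
                top.append(s1[i - 1])
                bottom.append(s2[j - 1])
                i, j = i - 1, j - 1
            elif C[i][j] == D[i][j]:
                state = "D"  # gap in s2
            else:
                state = "I"  # gap in s1
        elif state == "D":
            top.append(s1[i - 1])
            bottom.append("-")
            if D[i][j] != D[i - 1][j] - g:  # the gap was opened here
                state = "C"
            i -= 1
        else:
            top.append("-")
            bottom.append(s2[j - 1])
            if I[i][j] != I[i][j - 1] - g:  # the gap was opened here
                state = "C"
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a, b = smith_waterman("CATTG", "CTG")
    print(score)
    print(a)
    print(b)
```

When several cells tie, this sketch prefers the diagonal move and then gap extension; other tie-breaking rules give equally valid alignments.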

  21. Smith-Waterman Vectorization Approaches
      Parallel Processing
      • Allows high-quality results in less time using the Smith-Waterman algorithm
      Rognes described four basic approaches:
      • Vectors along the anti-diagonal (a wavefront approach), described by Wozniak
      • Vectors along the query (a single column split downward), described by Rognes and Seeberg
      • A striped approach, introduced by Farrar
      • Multi-sequence vectors, described by Alpern et al. and again by Rognes

  22. Parallelizing the Smith-Waterman Algorithm
      [Figure: Sequential matrix of computed values]

  23. Parallelizing the Smith-Waterman Algorithm
      Tilted data arrangement to parallelize and process a diagonal at a time.
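
Processing a diagonal at a time can be sketched as a wavefront loop over k = i + j. The code below runs sequentially and only illustrates the ordering; `wavefront_score`, the cost values (σ = 3, g = 1), and the example sequences are assumptions. On a parallel machine the inner loop is the part that could run concurrently, since cells on one anti-diagonal do not depend on each other.

```python
NEG = float("-inf")


def wavefront_score(s1, s2, match=10, miss=-3, sigma=3, g=1):
    """Best local alignment score, filling the matrices one anti-diagonal at a time."""
    n, m = len(s1), len(s2)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0
    for k in range(2, n + m + 1):  # k = i + j identifies one anti-diagonal
        # Every cell in this inner loop reads only diagonals k-1 and k-2,
        # so the iterations are independent of one another.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            D[i][j] = max(C[i - 1][j] - sigma, D[i - 1][j]) - g
            I[i][j] = max(C[i][j - 1] - sigma, I[i][j - 1]) - g
            sub = match if s1[i - 1] == s2[j - 1] else miss
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + sub, 0)
            best = max(best, C[i][j])
    return best


if __name__ == "__main__":
    print(wavefront_score("CATTG", "CTG"))  # hypothetical example sequences
```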

  24. Parallelizing the Algorithm: “Tilting” the Matrix
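
One way to picture the tilt, assuming it means shifting column j down by j positions so that each anti-diagonal of the original matrix becomes a single row. The slide's figure is not available in this transcript, so the layout below is a guess at the idea rather than the exact SWAMP data arrangement; `tilt` is a hypothetical helper.

```python
def tilt(matrix):
    """Shift column j down by j rows; row k of the result holds anti-diagonal k."""
    rows, cols = len(matrix), len(matrix[0])
    tilted = [[None] * cols for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            tilted[i + j][j] = matrix[i][j]
    return tilted


if __name__ == "__main__":
    for row in tilt([[1, 2, 3],
                     [4, 5, 6]]):
        print(row)
```

In this layout, each column could be held by one processing element, and a single row-wide operation would touch exactly the cells of one anti-diagonal.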
