Dataflow acceleration of Smith-Waterman with Traceback




  1. Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing
     Konstantina Koliogeorgi*, Nils Voss†, Sotiria Fytraki†, Sotirios Xydis*, Georgi Gaydadjiev†, Dimitrios Soudris*
     * National Technical University of Athens, Greece, {konstantina, sxydis, dsoudris}@microlab.ntua.gr
     † Maxeler Technologies UK, {nvoss, sfytraki, georgi}@maxeler.com
     FPL 2019 Conference

  2. Genome Sequencing
     • The genome represents the entire genetic information of an organism
     • Next-Generation Sequencing technologies allow an individual's genome to be compared against a reference genome
     • Typical genomic workflow, e.g. SeqMule
       • Short read alignment: reads ~100 bases long
       • Operates on huge amounts of data
     • Aligners are the bottleneck in the workflow => in need of acceleration!
     [Figure: workflow diagram; stages shown include QC assessment on input sequences, alignment, alignment/coverage statistics, variant calling, extraction of consensus calls]
     [Figure: "WES Analysis Workflows"; time in hrs split into Bowtie2 and Other for the SomaticVarscan, SomaticSamtools and NormalVarscan workflows]
     [Figure: "Bowtie2 in SeqMule WES Workflows: Time for Increasing Input Size"; time in hrs per workflow for 10 GB, 14.8 GB and 19.3 GB inputs]
     8 September 2019

  3. Problem Statement
     • Most aligners use the Seed & Extend model
       • Fragment reads into short pieces (seeds) that align exactly to the genome
       • Extend seeds to a full alignment with Smith-Waterman
     • Smith-Waterman
       • Matrix Fill stage followed by Traceback
       • Takes up 60% (55% + 5% respectively) of total time
       • Distributed over hundreds of tasks per read
       • Calling & data transfer overhead
     • Challenge
       • Co-designed solution to avoid overhead
       • Extract parallelism to further boost performance
     [Figure: histogram of % of total reads (y-axis, 0 to 60) against number of calls (x-axis, 1 to 200); annotated "83%"]
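The two Smith-Waterman stages named on this slide, Matrix Fill followed by Traceback, can be sketched in software as below. This is a minimal reference sketch, not the authors' implementation: it assumes the common affine-gap formulation with three matrices (H, E, F), and the function name `smith_waterman` and all scoring parameters are illustrative choices that do not come from the slides.

```python
NEG_INF = float("-inf")

def smith_waterman(read, ref, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Local alignment with affine gaps. Scoring values are illustrative assumptions."""
    n, m = len(read), len(ref)
    # H: best score of a local alignment ending at cell (i, j)
    # E: best score ending with a gap in the read (horizontal move)
    # F: best score ending with a gap in the reference (vertical move)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Matrix Fill: every cell depends on its left, upper and upper-left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            score = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: walk back from the best cell until the score drops to zero.
    i, j = best_pos
    state = "H"
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            if H[i][j] == E[i][j]:
                state = "E"
            elif H[i][j] == F[i][j]:
                state = "F"
            else:  # diagonal move: match or mismatch
                aligned_read.append(read[i - 1])
                aligned_ref.append(ref[j - 1])
                i, j = i - 1, j - 1
        elif state == "E":  # gap in the read, consume one reference base
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            if E[i][j] == H[i][j - 1] + gap_open:
                state = "H"
            j -= 1
        else:  # state "F": gap in the reference, consume one read base
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            if F[i][j] == H[i - 1][j] + gap_open:
                state = "H"
            i -= 1

    return best, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref))


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

The traceback needs the filled matrices, which is why running it on the host would require transferring them back from the accelerator.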

  4. Standalone Optimized Dataflow Implementation
     • Matrix Fill calculates matrices E, H, F
     • Traceback traverses the matrices in reverse order to construct the alignment path
     • Interleaving data scheme
       • Interlace data from subsequent read-reference pairs
     • Double buffering
       • Operate in pipeline fashion
     [Figure: example score matrices for Matrix Fill and Traceback, processed along antidiagonals; labels: reference stream, 1st row, nth antidiagonal, current checks, computed checks, next check, up and left elements (not yet past), up-left element]
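Two ideas from this slide can be illustrated with a small software sketch: visiting matrix cells one antidiagonal at a time, and interlacing data from several read-reference pairs into one stream. The sketch only illustrates the data ordering and is not the hardware design from the talk. The helper names `antidiagonals`, `interleave` and `deinterleave` and the padding symbol `PAD` are hypothetical.

```python
from itertools import zip_longest

PAD = "N"  # hypothetical padding symbol for sequences shorter than the longest one


def antidiagonals(rows, cols):
    """Yield the cells of a rows x cols matrix grouped by antidiagonal (i + j constant).

    Cells on one antidiagonal depend only on the two previous antidiagonals,
    so they can be computed independently of each other.
    """
    for d in range(rows + cols - 1):
        first_row = max(0, d - cols + 1)
        last_row = min(rows, d + 1)
        yield [(i, d - i) for i in range(first_row, last_row)]


def interleave(seqs, pad=PAD):
    """Round-robin merge: element k of every sequence comes before element k+1 of any."""
    stream = []
    for column in zip_longest(*seqs, fillvalue=pad):
        stream.extend(column)
    return stream


def deinterleave(stream, count):
    """Undo interleave(): every count-th element belongs to the same sequence."""
    return ["".join(stream[k::count]) for k in range(count)]


if __name__ == "__main__":
    for cells in antidiagonals(2, 3):
        print(cells)
    reads = ["ACG", "TT", "GGCA"]
    stream = interleave(reads)
    print("".join(stream))
    print(deinterleave(stream, len(reads)))
```

In an interleaved stream, consecutive elements belong to different pairs, so a value that depends on the previous cell of the same pair is only needed several stream positions later. A plausible reading is that this keeps a deep pipeline busy, though the slide does not spell out the motivation.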

  5. Proposed Integration Architecture
     • Key architectural decisions
       • Move Traceback to hardware to alleviate transfer cost
       • Major software restructuring to constrain the number of accelerator calls
     • Results
       • 18x speedup standalone
       • 1.55x speedup end to end
     [Figure: integration architecture; chains of seed-extend (SmW) alignments per read are gathered into L-interleaved pairs, sent over PCIe to processing elements PE0, PE1, ..., PEn with Traceback, and the resulting alignments (e.g. "C-CTACC", "ACGT--CG", "ACGTGCC") are returned over PCIe; phases: data gathering, HW execution, data distribution]
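As a rough cross-check of the two speedup figures, Amdahl's law can be applied to the numbers quoted in the slides. This calculation is an illustration added here, not an analysis from the presentation. It assumes that the 60% Smith-Waterman share from slide 3 and the 18x standalone speedup describe the same workload, and it ignores calling and data transfer overhead. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only a fraction of the runtime is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Figures taken from the slides; combining them this way is an assumption.
SW_FRACTION = 0.60      # share of total time spent in Smith-Waterman (slide 3)
KERNEL_SPEEDUP = 18.0   # standalone accelerator speedup (slide 5)

bound = amdahl_speedup(SW_FRACTION, KERNEL_SPEEDUP)
ceiling = 1.0 / (1.0 - SW_FRACTION)  # limit for an infinitely fast kernel

if __name__ == "__main__":
    print(f"overhead-free bound: {bound:.2f}x, ceiling: {ceiling:.2f}x")
```

Under these assumptions the overhead-free bound is roughly 2.3x and the ceiling is 2.5x. The reported 1.55x end-to-end figure sits below both, which is consistent with the calling and data transfer overhead the slides identify.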

  6. Thank you for your attention!
