Accelerating Tandem MS Protein Database Searches Using OpenCL


slide-1
SLIDE 1

Accelerating Tandem MS Protein Database Searches Using OpenCL

Rick Weber, David D. Jenkins, Nicholas Lineback, Robert Hettich, Gregory D. Peterson

slide-2
SLIDE 2

Programming devices the intractable way

slide-3
SLIDE 3

Programming devices with OpenCL

slide-4
SLIDE 4

Tandem MS/MS experiment

• Collect a sample
• Clean it
  - Try to remove things that aren’t proteins
• Digest proteins into peptides
  - Trypsin
• Shoot mixture through mass spectrometer
• Mass spectrometer gives ~100k scans containing m/z and intensities

slide-5
SLIDE 5

Peptide searching with database

slide-6
SLIDE 6

Search algorithms

• Mostly differ in the scoring algorithm
  - Consequently, different execution rates
• Sequest
  - Cross-correlation
  - Most widely used
• X! Tandem
  - Dot product
• Myrimatch
  - Multivariate hypergeometric (MVH) distribution
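
To make the difference between scoring functions concrete, here is a minimal C sketch of two of them: a dot product over binned intensities and the textbook multivariate hypergeometric probability. The function names, the shared m/z binning, and the class layout are assumptions made for illustration; the real X! Tandem and Myrimatch scores include details the slide does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two spectra binned onto the same m/z grid
   (the binning scheme is assumed, not taken from the slides). */
double dot_product_score(const double *observed, const double *theoretical,
                         size_t bins)
{
    double score = 0.0;
    for (size_t i = 0; i < bins; ++i)
        score += observed[i] * theoretical[i];
    return score;
}

/* Binomial coefficient C(n, k) as a double, via the multiplicative formula. */
static double choose(unsigned n, unsigned k)
{
    if (k > n)
        return 0.0;
    double c = 1.0;
    for (unsigned i = 1; i <= k; ++i)
        c = c * (double)(n - k + i) / (double)i;
    return c;
}

/* Textbook multivariate hypergeometric probability:
   class_size[i] items exist in class i and matched[i] of them were drawn.
   P = prod_i C(class_size[i], matched[i]) / C(N, n),
   where N and n are the sums of class_size and matched. */
double mvh_probability(const unsigned *class_size, const unsigned *matched,
                       size_t classes)
{
    unsigned total = 0, drawn = 0;
    double numerator = 1.0;
    for (size_t i = 0; i < classes; ++i) {
        assert(matched[i] <= class_size[i]);
        numerator *= choose(class_size[i], matched[i]);
        total += class_size[i];
        drawn += matched[i];
    }
    return numerator / choose(total, drawn);
}
```

With two classes of sizes 3 and 2 and one match in each, `mvh_probability` evaluates C(3,1) * C(2,1) / C(5,2) = 6/10. The dot product is a single pass over the bins, while the MVH score needs per-class match counts first, which is one reason the algorithms run at different rates.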

slide-7
SLIDE 7

Specmaster

• OpenCL Myrimatch implementation
  - Runs correctly on AMD and Nvidia GPUs and on AMD and Intel CPUs
  - Not tested on anything else
  - Designed from the ground up for speed
• Myrimatch is already multi-threaded
  - No 400x speedup from using a GPU
  - 10x is more reasonable

slide-8
SLIDE 8

Algorithm design

• Make peptides from proteins sequentially on CPU
  - Needs to be done in OpenCL (future work)
  - Amdahl’s law: this sequential step limits the overall speedup
• Perform search using OpenCL devices
  - Each workgroup processes a different MS2+ scan
  - Each work item searches a different candidate
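
The mapping of scans to workgroups and candidates to work items can be sketched in plain C. This is a serial, host-side emulation, not the Specmaster kernel: `score_candidate` is a hypothetical stand-in for the real scoring function, and every scan is assumed to have the same number of candidates to keep the indexing simple. In an actual OpenCL kernel the loop indices would come from the built-ins `get_group_id(0)`, `get_local_id(0)`, and `get_local_size(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-candidate score; a stand-in for the real scoring function. */
static double score_candidate(size_t scan, size_t candidate)
{
    return (double)(scan * 1000 + candidate);
}

/* Emulates one work item. `group` plays the role of get_group_id(0),
   `local` of get_local_id(0), and `local_size` of get_local_size(0).
   The strided loop lets any work group size cover any number of candidates. */
static void work_item(size_t group, size_t local, size_t local_size,
                      size_t num_candidates, double *scores)
{
    for (size_t c = local; c < num_candidates; c += local_size)
        scores[group * num_candidates + c] = score_candidate(group, c);
}

/* Serial stand-in for the device: one work group per scan. */
void score_all_scans(size_t num_scans, size_t num_candidates,
                     size_t local_size, double *scores)
{
    assert(local_size > 0);
    for (size_t group = 0; group < num_scans; ++group)
        for (size_t local = 0; local < local_size; ++local)
            work_item(group, local, local_size, num_candidates, scores);
}
```

Calling `score_all_scans(2, 5, 4, scores)` fills all ten entries even though five candidates do not divide evenly among four work items, because each work item strides through the candidate list.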

slide-9
SLIDE 9

Search

• Binary search for candidates
  - Precursor masses within tolerance for assumed charge state
• Binary search for ions
  - Look for peaks theoretically predicted for the peptide’s amino acid sequence in multiple charge states
  - Compute MVH as a function of number of found peaks by intensity class
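
Both binary searches can be sketched in C as follows. The function names, the sorted-array layout, and the absolute (rather than ppm) tolerance are assumptions for illustration, and the proton mass is rounded to 1.007276 Da; the actual kernels may organize their data differently.

```c
#include <assert.h>
#include <stddef.h>

/* Neutral mass implied by an observed m/z and an assumed charge state. */
double neutral_mass(double mz, unsigned charge)
{
    const double proton_mass = 1.007276; /* Da, rounded */
    return mz * charge - proton_mass * charge;
}

/* Index of the first element >= value in a sorted array. */
size_t lower_bound(const double *sorted, size_t n, double value)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Candidate window: peptides whose mass lies in [mass - tol, mass + tol].
   Writes the half-open index range [*first, *last). The upper end is found
   by a short linear scan, since the window is usually small. */
void candidate_range(const double *peptide_masses, size_t n,
                     double precursor_mass, double tol,
                     size_t *first, size_t *last)
{
    assert(tol >= 0.0);
    *first = lower_bound(peptide_masses, n, precursor_mass - tol);
    *last = *first;
    while (*last < n && peptide_masses[*last] <= precursor_mass + tol)
        ++*last;
}

/* Returns 1 if some observed peak lies within tol of the predicted ion m/z. */
int peak_found(const double *peak_mz, size_t n, double ion_mz, double tol)
{
    size_t i = lower_bound(peak_mz, n, ion_mz - tol);
    return i < n && peak_mz[i] <= ion_mz + tol;
}
```

A scoring loop would call `candidate_range` once per scan and assumed charge state, then call `peak_found` for each predicted fragment ion of each candidate and tally the hits per intensity class.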

slide-10
SLIDE 10

OpenCL and the lack of free lunch

• Little performance portability
• Different devices have:
  - Different memories
  - Different SIMD sizes
  - Different branch penalties
  - Different execution models

slide-11
SLIDE 11

Memory speeds

Device        __constant  __local   __global (cached)  __global (raw)
E5-2680       518 GB/s    425 GB/s  469 GB/s           51 GB/s
GTX 480       1.29 TB/s   1.3 TB/s  588 GB/s           152 GB/s
Radeon 7970   7 TB/s      3.6 TB/s  1.7 TB/s           213 GB/s

slide-12
SLIDE 12

Preferred work group sizes

• CPU: 1
• AMD GPU: multiple of 32 or 64
• Nvidia GPU: multiple of 32 or 64
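
A host program can turn these preferences into launch dimensions along the following lines. This is a sketch under assumptions: `device_kind`, `pick_work_group_size`, and `global_work_size` are hypothetical names, and the preferred multiple is passed in as a plain number. In real host code it could be queried with `clGetKernelWorkGroupInfo` using `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classification used only by this sketch. */
enum device_kind { DEVICE_CPU, DEVICE_GPU };

/* One work item per group on CPUs; the device's preferred multiple on GPUs. */
size_t pick_work_group_size(enum device_kind kind, size_t preferred_multiple)
{
    assert(preferred_multiple > 0);
    return kind == DEVICE_CPU ? 1 : preferred_multiple;
}

/* With one work group per scan, the global size is scans times group size,
   so it is always an exact multiple of the work group size. */
size_t global_work_size(size_t num_scans, size_t work_group_size)
{
    return num_scans * work_group_size;
}
```

For 1000 scans on a GPU with a preferred multiple of 64, this gives a local size of 64 and a global size of 64000; on a CPU the same scans launch as 1000 single-item groups.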

slide-13
SLIDE 13

Performance (as of time of publication)

slide-14
SLIDE 14

“Future work” already completed

• Portable device-specific tuning
  - Still running with same kernel code on all devices!
  - Preprocessor abuse
  - Kernel apathetic to work group size
• Heterogeneous scan scoring
  - Use every device in the system (CPUs and GPUs) to score
  - Up to 90% of peak strong-scaled throughput using 32 cores and 3 Radeon 7970s
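
One common way to get device-specific tuning from a single kernel source, which may be what "preprocessor abuse" refers to, is to pass per-device macro definitions to the OpenCL compiler. The sketch below only formats such an option string; `WORK_GROUP_SIZE` and `USE_LOCAL_MEMORY` are hypothetical macro names, not ones taken from Specmaster. The resulting string would be passed as the `options` argument of `clBuildProgram`, once per device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats compiler options that specialize one kernel source for a device.
   The macro names are hypothetical. Returns the number of characters that
   snprintf would write, excluding the terminating null byte. */
int format_build_options(char *buffer, size_t size,
                         unsigned work_group_size, int use_local_memory)
{
    assert(buffer != NULL && size > 0);
    return snprintf(buffer, size,
                    "-D WORK_GROUP_SIZE=%u -D USE_LOCAL_MEMORY=%d",
                    work_group_size, use_local_memory ? 1 : 0);
}
```

Inside the kernel, `#if USE_LOCAL_MEMORY` blocks can then select a code path per device while the source file stays identical everywhere.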

slide-15
SLIDE 15

Actual future work

• Post-translational modifications
  - When generating peptides, create each modified variant of peptides on CPU
    - Easy (don’t need to modify kernels)
    - Probably slow
  - Take existing unmodified list and modify on the fly on the device
    - Hard due to lack of recursion in OpenCL
    - Amortizes sequential execution and PCIe transfers
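
Enumerating modified variants does not strictly require recursion. One possible approach, offered here as an assumption and not as the authors' plan, is to count through a bitmask in which each bit marks whether a modifiable site carries its modification. The function name and the array layout below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the mass of every modified variant of one peptide.
   Bit i of `mask` says whether modifiable site i carries its modification,
   so counting mask from 0 to 2^num_sites - 1 visits every variant
   without recursion. Returns the number of variants written. */
size_t variant_masses(double base_mass, const double *site_delta,
                      unsigned num_sites, double *out, size_t out_capacity)
{
    assert(num_sites < 32);
    size_t count = (size_t)1 << num_sites;
    assert(count <= out_capacity);
    for (size_t mask = 0; mask < count; ++mask) {
        double mass = base_mass;
        for (unsigned site = 0; site < num_sites; ++site)
            if (mask & ((size_t)1 << site))
                mass += site_delta[site];
        out[mask] = mass;
    }
    return count;
}
```

A peptide with two modifiable sites yields four variants, including the unmodified one at `mask == 0`. The count doubles with every additional site, which is consistent with the slide's expectation that generating all variants on the CPU would be slow.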

slide-16
SLIDE 16

Acknowledgements

• The University of Tennessee
• NSF and SCALE-IT
• Intel
  - For donating the research machine

slide-17
SLIDE 17

Questions?