Fast Binding Site Mapping using GPUs and CUDA
Bharat Sukhwani, Martin C. Herbordt
Computer Architecture and Automated Design Laboratory
Department of Electrical and Computer Engineering, Boston University
http://www.bu.edu/caadlab
* This work was supported, in part, by the U.S. NIH/NCRR.
Why Bother?
• Problem: combat the bird flu virus
• Method: inhibit its function by "gumming up" neuraminidase, a surface protein, with an inhibitor
  - Neuraminidase helps release progeny viruses from the cell.
• Procedure*:
  - Search the protein surface for likely binding sites
  - Find a molecule that binds there (and only there)
• Binding site mapping:
  - Very compute-intensive: usually run on clusters
  - GPU-based desktop alternative
*Landon, et al. Chem. Biol. Drug Des. 2008
Image: From New Scientist, www.newscientist.com/channel/health/bird-flu
Outline
• Overview of Binding Site Mapping
• Rigid Docking
• Energy Minimization
• Overview of NVIDIA GPUs / CUDA
• Rigid Docking on GPU
• Energy Minimization on GPU
• Results
Binding Site Mapping
• Purpose: identification of hot spots
• Significance: very effective for drug discovery
• Rationale:
  - Hot spots are major contributors to the binding energy
  - They bind a large variety of small molecules
• Process: docking small probes
  - Rigid docking
  - Energy minimization
Mapping: Two-Step Process
• Rigid docking of probes into the protein
  - Grid-based computation
  - Exhaustive 6D search (3 rotational + 3 translational degrees of freedom)
  - Finds an approximate conformation
  [Figure: example probe placements showing a good fit, a collision, and a poor fit]
• Local refinement: energy minimization
  - Models the flexibility in the side chains
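To make the 6D search space concrete, here is a toy Python sketch of an exhaustive rigid search (our illustration, not PIPER's code). It assumes a sparse protein energy grid stored as a dict, treats lower scores as better fits, and tries only four 90-degree rotations about the z-axis instead of a full rotation set; `rotate_z90` and `exhaustive_search` are hypothetical names.

```python
from itertools import product


def rotate_z90(point, k):
    """Rotate an integer point (x, y, z) by k * 90 degrees about the z-axis."""
    x, y, z = point
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y, z)


def exhaustive_search(protein, probe, n):
    """Score every (rotation, translation) pose of `probe` against `protein`.

    protein: dict mapping voxel (x, y, z) -> energy value (absent voxels count as 0).
    probe:   list of integer atom positions relative to the probe origin.
    n:       edge length of the protein grid; every voxel is tried as a translation.
    Returns (best_score, rotation_index, translation); lower scores are better here.
    """
    best = None
    for k in range(4):  # rotational part of the search (toy: 4 rotations)
        rotated = [rotate_z90(atom, k) for atom in probe]
        for t in product(range(n), repeat=3):  # translational part (3D)
            score = sum(
                protein.get((x + t[0], y + t[1], z + t[2]), 0.0)
                for (x, y, z) in rotated
            )
            if best is None or score < best[0]:
                best = (score, k, t)
    return best
```

With favorable voxels at (2, 2, 2) and (3, 2, 2), a two-atom probe lying along x scores best at rotation 0, translation (2, 2, 2). Every pose is scored explicitly here, which is why the translational part is normally computed as a correlation instead.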
FTMap*
• 16 small-molecule probes
• Dock each probe into the protein
  - 500 rotations, 10^6 translations per rotation
  - 30 minutes on a single CPU
• Energy-minimize 2000 conformations per protein-probe complex
  - Up to 30 seconds per conformation
  - 16 hours per probe!
* Brenke R, Kozakov D, Chuang G-Y, Beglov D, Mattos C, and Vajda S. Fragment-based identification of druggable "hot spots" of proteins using Fourier domain correlation. Bioinformatics.
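A quick back-of-the-envelope check of the slide's numbers in Python. It uses the 30-second upper bound per conformation, so the minimization times are worst-case estimates; the variable names are ours.

```python
# Figures quoted on the FTMap slide
probes = 16
rotations = 500
translations_per_rotation = 10**6
conformations_per_complex = 2000
max_seconds_per_conformation = 30  # upper bound

# Rigid docking: poses scored for one probe
poses_per_probe = rotations * translations_per_rotation

# Energy minimization: worst-case time for one probe, and for all 16 probes
minimization_hours_per_probe = conformations_per_complex * max_seconds_per_conformation / 3600
minimization_hours_all_probes = probes * minimization_hours_per_probe

print(f"poses per probe: {poses_per_probe:,}")
print(f"minimization per probe: {minimization_hours_per_probe:.1f} h")
print(f"minimization for all probes: {minimization_hours_all_probes:.1f} h")
```

At the quoted upper bound this gives 5 × 10^8 poses per probe, about 16.7 hours of minimization per probe, and about 267 hours (roughly 11 days) for the full set of 16 probes.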
NVIDIA GPU Architecture
[Figure: NVIDIA Tesla C1060 architecture, showing Streaming Processors (SPs) grouped into Streaming Multiprocessors (SMs), all connected to device memory]
• 4 GB device memory
* Source: NVIDIA Corporation
Memory Hierarchy
• CPU main memory to device memory: ~3 GB/s
• Device memory (on-board): ~100 GB/s
• Shared memory and registers (on-chip): ~1000 GB/s
• Read-only constant cache (on-chip)
* Source: NVIDIA Corporation
CUDA Programming Model
• Thread
• Block of threads (on-chip): threads within a block can be synchronized
• Grid of blocks (on-board): different blocks must be independent
* Source: NVIDIA Corporation
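The thread/block/grid hierarchy can be mimicked sequentially in Python to show how a thread derives its global index (the CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`). This is a minimal emulation under simplifying assumptions: one-dimensional grids, threads executed one after another, and no `__syncthreads()` barriers; `launch` and `scale_kernel` are hypothetical helpers.

```python
def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate a 1D CUDA kernel launch <<<grid_dim, block_dim>>>.

    Real blocks may run in any order, so the kernel must not depend on the
    order in which blocks execute.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, src, dst, factor):
    """Each thread handles one element, mirroring blockIdx.x * blockDim.x + threadIdx.x."""
    i = block_idx * block_dim + thread_idx
    if i < len(src):  # guard: the last block may have idle threads
        dst[i] = factor * src[i]


src = list(range(10))
dst = [0] * len(src)
block_dim = 4
grid_dim = (len(src) + block_dim - 1) // block_dim  # ceiling division -> 3 blocks
launch(grid_dim, block_dim, scale_kernel, src, dst, 2)
```

Ten elements with 4 threads per block need 3 blocks; the last block has two idle threads, which the `i < len(src)` guard handles.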
Rigid Docking: Procedure
Protein + Probe → Rotation → Grid Assignment → Pose Score: 3D FFT Correlation → Scoring and Filtering
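The grid-assignment step can be sketched as rotating the probe and depositing each atom's value into its nearest voxel. This is a simplified Python illustration, assuming rotation about the z-axis only, nearest-voxel assignment, and a periodic grid (indices wrap modulo n, as in FFT-based correlation); PIPER's actual assignment scheme may differ, and the function names are hypothetical.

```python
import math


def rotate_z(atom, theta):
    """Rotate a point (x, y, z) by angle theta (radians) about the z-axis."""
    x, y, z = atom
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def assign_to_grid(atoms, values, theta, spacing, n):
    """Rotate the probe, then add each atom's value to its nearest voxel.

    The n^3 grid is treated as periodic (indices wrap modulo n).
    Returns a dict holding only the nonzero voxels.
    """
    grid = {}
    for atom, value in zip(atoms, values):
        rotated = rotate_z(atom, theta)
        # floor(c + 0.5) rounds half up consistently, unlike Python's round()
        voxel = tuple(math.floor(c / spacing + 0.5) % n for c in rotated)
        grid[voxel] = grid.get(voxel, 0.0) + value
    return grid
```

Atoms that land in the same voxel accumulate, and negative coordinates wrap to the far side of the grid.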
PIPER Rigid Docking Program
• From the Structural Bioinformatics Lab at BU
• Complex energy functions
• Top scorer in the CAPRI* challenge
[Figure: execution-time profile. FFT correlation takes about 93% of the time; rotation + grid assignment, accumulation, and scoring and filtering take about 2.3 to 2.4% each]
Energy function:
$E = E_{\mathrm{shape}} + w_2 E_{\mathrm{elec}} + w_3 E_{\mathrm{desol}}$
$E_{\mathrm{shape}} = E_{\mathrm{attr}} + w_1 E_{\mathrm{repul}}$
$E_{\mathrm{elec}} = E_{\mathrm{coulomb}} + E_{\mathrm{born}}$
$E_{\mathrm{desol}} = \sum_{k=0}^{P-1} E_{\mathrm{pairpot},k}$
Program flow:
• Perform once:
  - Read receptor and ligand files
  - Read parameters, rotations, and coefficients
  - Compute FFT size
  - Create receptor grids for the different energy functions
  - Perform (P + 4) forward FFTs
  - Compute the complex conjugate of the FFT grids
• Repeat for each rotation:
  - Rotate the ligand grid by the next incremental angle
  - Create ligand grids for the different energy functions
  - Repeat for each of the (P + 4) grids:
    - Perform a forward FFT on the ligand grid
    - Modulate the transformed receptor and ligand grids
    - Perform an inverse FFT on the product grid
  - Accumulate the pairwise-potential product grids
  - Perform weighted scoring and filtering
• Output: best fit
• Up to 22 FFT correlations are required per rotation
* Janin J, Henrick K, Moult J, Eyck L, Sternberg M, Vajda S, Vakser I, and Wodak S. CAPRI: A critical assessment of predicted interactions. Proteins, 52 (2003), 2-9.
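A minimal Python sketch of the weighted scoring step, combining per-translation correlation results the way the slide's energy function is written. Reading the four non-pairwise grids as E_attr, E_repul, E_coulomb, and E_born is our interpretation of the (P + 4) count, which with 22 correlations would mean P = 18; the weights and function names are hypothetical.

```python
def num_correlations(p):
    """Correlations per rotation: P pairwise-potential grids plus 4 other grids."""
    return p + 4


def piper_score(attr, repul, coulomb, born, pairpot, w1, w2, w3):
    """Weighted score at each translation.

    attr, repul, coulomb, born: lists holding one correlation result per translation.
    pairpot: list of P such lists, one per pairwise-potential term k = 0..P-1.
    """
    scores = []
    for t in range(len(attr)):
        e_shape = attr[t] + w1 * repul[t]              # E_shape = E_attr + w1 * E_repul
        e_elec = coulomb[t] + born[t]                  # E_elec = E_coulomb + E_born
        e_desol = sum(term[t] for term in pairpot)     # E_desol = sum_k E_pairpot,k
        scores.append(e_shape + w2 * e_elec + w3 * e_desol)
    return scores
```

For example, with attr = 1, repul = 2, coulomb = 3, born = -1, two pairwise terms 0.5 and 0.25, and weights w1 = 2, w2 = 0.5, w3 = 4, the score is 5 + 1 + 3 = 9.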
Rigid Docking on GPUs: Correlation
• Direct correlation (outperforms FFT!)
  - For small grid sizes
  - Replaces FFT, voxel-voxel summation, and IFFT
• Each multiprocessor accesses both grids
  - Protein grid in global memory
  - Probe grid duplicated in each multiprocessor's shared memory
• Multiple correlations together
  - Each voxel represents multiple energy functions
[Figure: several SMPs, each holding a copy of the probe grid in its shared memory, reading the protein grid from global memory]
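A sequential Python sketch of direct correlation with multi-term voxels: each protein voxel stores a tuple of K energy-term values, so one grid read feeds all K correlations. It assumes sparse dict grids, a periodic n^3 protein grid, and a weighted sum of the K terms as the pose score; the names are hypothetical, and a GPU version would split the translation loop across threads and blocks.

```python
from itertools import product


def direct_correlation(protein, probe, n, weights):
    """Direct (non-FFT) correlation with multi-term voxels.

    protein: dict voxel (x, y, z) -> tuple of K energy-term values; the grid
             is n^3 and periodic, so indices wrap modulo n.
    probe:   dict voxel -> tuple of K values for the (small) probe grid.
    weights: K weights used to combine the terms into one score.
    Returns dict translation -> weighted score.
    """
    k_terms = len(weights)
    zero = (0.0,) * k_terms
    result = {}
    for t in product(range(n), repeat=3):
        acc = [0.0] * k_terms  # one running sum per energy term
        for p, probe_terms in probe.items():
            v = tuple((p[d] + t[d]) % n for d in range(3))
            protein_terms = protein.get(v, zero)  # one read serves all K terms
            for k in range(k_terms):
                acc[k] += protein_terms[k] * probe_terms[k]
        result[t] = sum(w * a for w, a in zip(weights, acc))
    return result
```

Each translation costs one pass over the probe voxels regardless of K, which is the point of packing several energy functions into one voxel.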
Direct Correlation on GPUs
• Shared memory limits the probe size
  - With 8 correlations: at most 8 × 8 × 8
  - Probe grids are typically 4 × 4 × 4
• Multiple rotations together
  - 8 rotations
  - Effectively loop unrolling
  - Multiple computations per global memory fetch
  - 2.7x additional performance improvement
[Figure: SMPs with their shared memories above global memory]
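Rotation batching can be sketched the same way: for each translation, each protein-grid value is read once and multiplied against the matching voxel of every rotated probe in the batch. This single-term Python sketch (hypothetical names, dense m^3 probe footprints, periodic protein grid) also counts protein-grid reads, which drop from R·n^3·m^3 for R separate passes to n^3·m^3. It does not model the 2.7x figure, which depends on the GPU.

```python
from itertools import product


def correlate_rotation_batch(protein, probes, n, m):
    """Correlate R rotated probe grids against one protein grid in a single sweep.

    protein: dict voxel -> value (one energy term, for brevity), periodic n^3 grid.
    probes:  list of R dicts, each a rotated probe grid with voxels in range(m)^3.
    Returns (results, fetches): results[r][t] is the score of rotation r at
    translation t; fetches counts protein-grid reads.
    """
    results = [{} for _ in probes]
    fetches = 0
    for t in product(range(n), repeat=3):
        acc = [0.0] * len(probes)  # one accumulator per rotation
        for x in product(range(m), repeat=3):
            v = tuple((x[d] + t[d]) % n for d in range(3))
            value = protein.get(v, 0.0)  # one read of the protein grid ...
            fetches += 1
            for r, probe in enumerate(probes):  # ... reused by every rotation
                acc[r] += value * probe.get(x, 0.0)
        for r in range(len(probes)):
            results[r][t] = acc[r]
    return results, fetches
```

Batched results match R independent single-rotation passes; only the number of protein-grid reads changes.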
Direct Correlation on GPUs
• Distribution of work among threads / blocks
  - Scheme 1: an entire 2D plane of the result grid to a thread block
  - Scheme 2: part of a 2D plane to a thread block
  - Both yield similar results
[Figure: result grid partitioned across SMPs]
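The two work distributions can be written down as explicit voxel-to-(block, thread) assignments. In this Python sketch, threads stride across their block's portion of a plane, and scheme 2 splits each plane into equal bands of rows; both choices, and all names, are our assumptions. The check confirms that each scheme covers every result voxel exactly once.

```python
def scheme1(n, threads):
    """Scheme 1: block z owns the whole 2D plane z; threads stride across it."""
    work = []
    for block in range(n):
        for thread in range(threads):
            for i in range(thread, n * n, threads):
                work.append(((i % n, i // n, block), block, thread))
    return work


def scheme2(n, threads, parts):
    """Scheme 2: each plane is split into `parts` bands of rows, one block per band.

    Assumes `parts` divides n evenly.
    """
    rows = n // parts
    work = []
    for block in range(n * parts):
        z, part = divmod(block, parts)
        for thread in range(threads):
            for i in range(thread, rows * n, threads):
                work.append(((i % n, part * rows + i // n, z), block, thread))
    return work


def covers_grid_once(work, n):
    """True if every voxel of the n^3 result grid is assigned exactly once."""
    voxels = [v for v, _, _ in work]
    return len(voxels) == n ** 3 and len(set(voxels)) == n ** 3
```

Scheme 2 launches n × parts blocks instead of n; one plausible reason to prefer it is keeping more multiprocessors busy when the grid has few planes, although the slide reports similar results for both.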
Scoring and Filtering on GPUs
• Score computation (N^3 scores)
  - Divide the work among M threads (T0 ... T(M-1))
  - Sync and serialize to find the best of the best
  - Only one multiprocessor utilized
• Flagging for exclusion
  - Serial code: exclusion bit-vector (N^3 entries)
  - GPU Solution 1: exclusion index array (100 entries)
  - GPU Solution 2: exclusion bit-vector in GPU global memory (N^3 entries)
[Figure: threads T0 ... T(M-1) write partial results to shared memory and T0 selects the best score; example bit-vector 1 1 0 0 0 1 0 and index array 4 5 16 28 45]
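A sequential Python sketch of the two-phase best-score search and of exclusion with a short index array, in the spirit of GPU Solution 1. It assumes lower scores are better, 1D pose indices, and a hypothetical `neighbors` callback that decides which poses to exclude around each pick; the slide does not specify the exclusion rule.

```python
def best_index(scores, m, excluded=()):
    """Two-phase search for the lowest score.

    Phase 1: each of m 'threads' scans a strided slice and keeps its local best.
    Phase 2: after a (notional) barrier, thread 0 serially compares the m
    partial results. Indices listed in `excluded` are skipped.
    """
    partial = []
    for t in range(m):
        local = None
        for i in range(t, len(scores), m):
            if i in excluded:  # linear scan of a short index array
                continue
            if local is None or scores[i] < scores[local]:
                local = i
        partial.append(local)
    best = None
    for i in partial:  # serial step done by thread 0
        if i is not None and (best is None or scores[i] < scores[best]):
            best = i
    return best


def top_poses(scores, count, m, neighbors):
    """Repeatedly pick the best pose, then exclude it and its neighbors.

    Exclusions live in a small index array rather than an N^3 bit-vector.
    `neighbors` is a hypothetical callback returning indices to exclude.
    """
    excluded = []
    picks = []
    for _ in range(count):
        best = best_index(scores, m, excluded)
        if best is None:
            break
        picks.append(best)
        excluded.extend(i for i in neighbors(best) if 0 <= i < len(scores))
    return picks
```

For the scores [5, 3, 9, 1, 7, 2, 8] the search picks index 3, excludes indices 2 to 4, and then picks index 5. Membership tests on the index array are linear scans, which stay cheap only while the array is short (about 100 entries on the slide).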
Energy Minimization
• Minimizing the energy between two molecules
• Iterative process: apply optimization moves until convergence
• Used to model flexible side chains
• An N-body problem with a cut-off
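To illustrate the iterate-until-convergence loop, here is a toy steepest-descent minimizer in Python over a cut-off pair potential. The potential (r - r0)^2, the fixed step size, and the energy-change convergence test are all assumptions for illustration; they are not FTMap's energy function or minimizer.

```python
import math


def energy_and_grad(pos, r0, cutoff):
    """Toy energy: sum of (r - r0)^2 over atom pairs closer than `cutoff`."""
    n = len(pos)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            if r == 0.0 or r >= cutoff:  # N-body sum with a cut-off
                continue
            energy += (r - r0) ** 2
            scale = 2.0 * (r - r0) / r  # (dE/dr) / r
            for k in range(3):
                grad[i][k] += scale * d[k]
                grad[j][k] -= scale * d[k]
    return energy, grad


def minimize(pos, r0=1.0, cutoff=2.5, step=0.1, tol=1e-8, max_iter=1000):
    """Steepest descent: move every atom against its gradient until the
    energy change falls below `tol` (the convergence test)."""
    energy, grad = energy_and_grad(pos, r0, cutoff)
    for it in range(1, max_iter + 1):
        new_pos = [[p[k] - step * g[k] for k in range(3)] for p, g in zip(pos, grad)]
        new_energy, new_grad = energy_and_grad(new_pos, r0, cutoff)
        if abs(energy - new_energy) < tol:
            return new_pos, new_energy, it
        pos, energy, grad = new_pos, new_energy, new_grad
    return pos, energy, max_iter
```

Two atoms started 1.5 apart relax to a separation of about r0 = 1.0 within a few dozen iterations.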
Looks like MD, but it's not
Different geometry:
• Performed on a local region
• Many fewer atoms, typically a few thousand
• Much smaller atom neighborhoods
• Very small cut-off radius
Different computations:
• Move to the next position
  - Coordinate adjustments only, no motion / velocity updates
• No cell lists / efficient filtering
• Refinement step, close to the destination: small motions
• Neighbor lists are very sparse, with a non-uniform distribution
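A brute-force Python sketch of building sparse half neighbor lists with no cell lists. It assumes Cartesian coordinates and a plain distance cut-off; `build_neighbor_lists` is a hypothetical name. An O(n^2) scan may be acceptable for the few thousand atoms in a local region, and the example shows how unevenly list lengths can vary.

```python
def build_neighbor_lists(positions, cutoff):
    """Half neighbor lists: for each atom i, the atoms j > i within `cutoff`.

    A brute-force O(n^2) scan over all pairs, with no cell lists.
    """
    cutoff_sq = cutoff * cutoff
    lists = []
    for i, pi in enumerate(positions):
        nbrs = []
        for j in range(i + 1, len(positions)):
            pj = positions[j]
            if sum((a - b) ** 2 for a, b in zip(pi, pj)) < cutoff_sq:
                nbrs.append(j)
        lists.append(nbrs)
    return lists
```

For a tight cluster of three atoms plus one distant atom with cut-off 1.0, the lists are [[1, 2], [2], [], []]: most atoms have few or no entries.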
Energy Minimization Step of FTMap
[Figure: breakdown of the FTMap minimization step, including the energy evaluation phase]
• Absolute time: ~10 ms per iteration (on a single core)
FTMap Electrostatics Model
Analytic Continuum Electrostatics (ACE)
• Atom self energy: electrostatic energy due to the charge itself
  $E_i^{\mathrm{self}} = \frac{q_i^2}{2\varepsilon_s R_i} + \sum_{k \neq i} E_{ik}^{\mathrm{self}}$
  $E_{ik}^{\mathrm{self}} = \omega_{ik}\, e^{-r_{ik}^2/\sigma_{ik}^2} + \frac{\tau q_i^2}{8\pi} V_k \left(\frac{r_{ik}^3}{r_{ik}^4 + \mu_{ik}^4}\right)^4$
• Pairwise interaction (generalized Born equation): electrostatic energy due to the presence of other charges
  $E^{\mathrm{int}} = 332 \sum_i \sum_{j \neq i} \frac{q_i q_j}{r_{ij}} - 166\,\tau \sum_i \sum_{j \neq i} \frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_i \alpha_j\, e^{-r_{ij}^2/(4\alpha_i \alpha_j)}}}$
• Born radii $\alpha_i$ depend on $E_i^{\mathrm{self}}$
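The generalized Born screening term of the pairwise interaction can be evaluated directly. This Python sketch covers only the second term, -166 τ Σ_i Σ_{j≠i} q_i q_j / f_GB(r_ij); it takes the Born radii α_i as given inputs rather than deriving them from E_i^self, and treats τ as a parameter (conventionally 1/ε_p - 1/ε_s in generalized Born models). Function names are hypothetical.

```python
import math


def f_gb(r, alpha_i, alpha_j):
    """Generalized Born distance: sqrt(r^2 + a_i a_j exp(-r^2 / (4 a_i a_j)))."""
    aa = alpha_i * alpha_j
    return math.sqrt(r * r + aa * math.exp(-r * r / (4.0 * aa)))


def gb_screening_energy(charges, positions, alphas, tau):
    """-166 * tau * sum_i sum_{j != i} q_i q_j / f_GB(r_ij, alpha_i, alpha_j)."""
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self terms are handled by E_i^self, not here
            r = math.dist(positions[i], positions[j])
            total += charges[i] * charges[j] / f_gb(r, alphas[i], alphas[j])
    return -166.0 * tau * total
```

At large separations f_GB approaches r_ij, so two opposite unit charges 10 apart (Born radii 1, τ = 1) give about +33.2 in the units implied by the 332 constant: solvent screening weakens their Coulomb attraction.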
FTMap Data Structure: Neighbor Lists
[Figure: per-atom arrays of first atoms, their lists of second atoms, and self energies]
• Cycle through the first atoms, updating the partial energies of both atoms in each pair
  - Random updates for the second atoms
• Can't distribute the atom lists across multiprocessors
  - A second atom might appear in multiple lists
  - Write conflicts during updates
  - Serialization during accumulation
• Not suitable for parallel implementations
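The update pattern on this slide can be contrasted with a common conflict-free alternative in a short Python sketch. `scatter_half_lists` mirrors the serial scheme, updating both atoms of each stored pair; `gather_full_lists` is a standard owner-computes rewrite (not the slide's method) in which each atom keeps a full neighbor list and writes only its own energy. The pair term `term(i, k)` is a hypothetical stand-in for E_ik^self-style contributions.

```python
def scatter_half_lists(n, half_lists, term):
    """Serial pass over half neighbor lists: each pair (i, k) is stored once,
    and the partial energies of BOTH atoms are updated. The write to energy[k]
    lands on an arbitrary 'second atom', which causes write conflicts if first
    atoms are split across multiprocessors."""
    energy = [0.0] * n
    for i, nbrs in enumerate(half_lists):
        for k in nbrs:
            energy[i] += term(i, k)
            energy[k] += term(k, i)
    return energy


def to_full_lists(n, half_lists):
    """Expand half lists so each atom lists every neighbor."""
    full = [[] for _ in range(n)]
    for i, nbrs in enumerate(half_lists):
        for k in nbrs:
            full[i].append(k)
            full[k].append(i)
    return full


def gather_full_lists(n, full_lists, term):
    """Owner-computes alternative: atom i only ever writes energy[i]."""
    return [sum(term(i, k) for k in full_lists[i]) for i in range(n)]
```

Both give the same energies; the gather form lets each first atom go to a different thread with no write conflicts, at the cost of storing and traversing every pair twice.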