Red Blood Cell Simulations with Chemical Transport Properties Ansel - PowerPoint PPT Presentation

Red Blood Cell Simulations with Chemical Transport Properties Ansel L. Blumers Karniadakis Group, Brown University, Rhode Island, USA San José || GTC 2017 || May, 2017

Scientific Inquiries Aim to investigate Chemical-driven plaque and thrombus formation. Model chemical transport Released from red and white blood cells to plasma, Sieved through the vessel wall to surrounding tissue. Model red blood cells Red blood cell dynamics in the blood. 2 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Dissipative Particle Dynamics A coarse-grained particle method for mesoscopic simulations. Pairwise Force Interaction X ( F C ij + F D ij + F R F i = ij ) i 6 = j F R ij = σ ij ω R ( r ij ) ξ ij δ t − 1 / 2 e ij random F D ij = − γ ij ω D ( r ij )( e ij · v ij ) e ij dissipative F C ij = α ij ω C ( r ij ) e ij conservative Groot, R.D., Warren, P.B., The Journal of Chemical Physics, 1997 3 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Transport Properties Solves the Advection-Diffusion-Reaction equation. Pairwise Chemical Transport source / external random flux Fickian flux Li, Z., Yazdani, A., Tartakovsky, A., Karniadakis, G.E., The Journal of Chemical Physics, 2015 4 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Modeling Red Blood Cell (RBC) Discretized with 500 vertices and held together by 3 types of bonded potentials. Bending rigidity Visco-elastic + hydrostatic-elastic Global area and volume constraints + local area constraint Fedosov, D.A., Caswell, B., Karniadakis, G.E., Biophysical Journal., 2010 5 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Need a Fast & Robust Program Fusion 6 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

USER MESO 2.0 https://github.com/AnselGitAccount USER MESO Concen. & ADR Injects new capabilities into USER MESO * RBC Combining Open MPI, OpenMP and CUDA Simultaneously simulate ... chemical concentration field advection-diffusion-reaction processes USER MESO 2.0 red blood cell dynamics * S4518 , GTC 2014, Y.-H. Tang, Accelerating Dissipative Particle Dynamics Simulation on Kepler: Algorithm, Numerics and Application USERMESO 2.0 : Blumers, A., Tang, Y.-H., Li, Z., Li, X., Karniadakis, G. E., Computer Physics Communications, 2017 USERMESO : Tang, Y.-H., Karniadakis, G.E., Computer Physics Communications, 2014 7 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Finding Neighbors Particles are binned into cells. Finding neighbors by calculating relative distance to other particles in adjacent cells. Particles in … Center Cell N Neighboring Cells M N x M predicates Strategy #1 : Each warp takes on particles in Center Cell. Strategy #2 : Each warp takes on particles in Neighboring Cells. 3 x 3 cells 8 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Neighbor Kernel Optimization Atomics-free and parallel committing at warp-level dependency. Optimize : Save particle IDs to a neighbor list orderly . Solution : 1. Use __balloc(int) which returns a bit mask called ballot. 2. Use __popc(int) which returns number of bits set. 3. Return value of __popc(int) is broadcasted to each thread and further masked by a lane- specific mask 4. Place ID accordingly. S4518 , GTC 2014, Y.-H. Tang, Accelerating Dissipative Particle Dynamics Simulation on Kepler: Algorithm, Numerics and Application 9 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Data Layout – Chemical Concentration Each particle carries an arbitrary number of chemical species. Flux is calculated using particle location and chemical concentration. Global Load 2D Texture Hardware: K20X Stall – Data Request ~ 23% ~ 44% Texture Cache Hit Rate ~ 52% ~ 36% Kernel Duration 33.7ms 43.7ms 30% 2D Texture Implementation: Each layer holds the concentration of all particles for one species. Culprit – Texture cache depletion Data locality in coordinates is optimized for texture cache hit rate. The concentration texture depletes the cache and thus disrupt the data locality optimization. Overall texture cache hit rate is therefore reduced significantly. 10 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Parallel GPU/MPI Synchronization Domain decomposition: Broken up into sub-domains. Each sub-domain is computed by one CPU-GPU pair. Steps for RBC computation: 1) Compute total area and volume of local RBCs. 2) All-reduce across all nodes. 3) Enforce area and volume constraints for each RBC. Complication - Domain decomposition causes large synchronization overhead. Naive approach K-Gather Compute total area and volume of each RBC. Prior Processes prior to RBC computation. K-Apply Enforce area and volume constraints. Subsequent Processes subsequent to RBC computation. 11 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Attempted Optimization #1: Multi-stream Strategy: Overlapping K-Gather and K-Apply . Uses the total area and volume from previous time step. Complication: Streaming multiprocessor saturation – kernels competing for computing resources. The resulting higher cache refresh rate is detrimental to efficiency. Consequence: Worse performance 12 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Attempted Optimization #2: Multi-stream + Non-blocking Strategy: Overlapping data transfer and computation + Utilizing Non-blocking communication Algorithm: 1) Wait for the completion of MPI-Iallreduce from last time step. 5) Download data to host with asynchronous Memcpy-HtD . 2) Upload data to device with asynchronous Memcpy-HtD . 6) Enforce the area and volume constraints in K-Apply . 3) Compute the total area and volume of each RBC in K-Gather . 7) Wait for the completion of Memcpy-DtH . 4) Place asynchronous Memcpy-DtH in execution queue. 8) Sum total area and volume with MPI-Iallreduce . 13 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Benchmark – Weak & Strong Scalings Metrics: (million particles) • (steps per second) à MPS/second Strong Weak Global volume - 2,097,152 Local volume (per node) - 32,768 For example: Hct 7% for a system volume of 32,768 translates to 24 RBCs and 131,072 pure fluid particles. Hct 35% for a system volume of 32,768 translates to 123 RBCs and 49,768 pure fluid particles. 14 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Benchmark - Speedup Hardware: Titan, 8 Opteron 6274 clusters + 1 K20X / Cray XK7 node. Time (Non-blocking impl.) Speedup: Runtime: 1 rank with 8 OpenMP threads per node. Time (Blocking impl.) Strong Weak 15 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Benchmark – Single Node Time (CPU-GPU hybrid) Speedup: Time (CPU only) Hct System Volume Solvent Particles RBC Count Total Particle Count Speedup 7% 8,192 32,768 6 37,768 3.8 16,382 65,536 12 71,536 5.1 32,768 131,072 24 143,072 5.4 65,536 262,144 49 286,644 5.7 35% 8,192 32,768 30 49,768 4.5 16,384 65,536 61 96,036 5.3 32,768 131,072 123 192,572 5.9 65,536 262,144 246 385,144 6.7 For example: Hct 7% for a system volume of 8,192 translates to 6 RBCs and 32,768 pure fluid particles. 16 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Bonus Benchmark Hardware: Two Intel Xeon E5-2630L CPUs at 2.0 GHz, GeForce TITAN X Maxwell or TITAN X Pascal Runtime: 1 rank with 8 OpenMP threads. System Total particle TITAN X Maxwell TITAN X Pascal volume count speedup speedup 8,192 47,768 4.8 6.5 16,382 96,036 5.8 9.2 32,768 192,572 7.2 9.9 65,536 385,144 7.2 10.1 17 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

720,778 particles ~12x speedup 5% represents RBCs @ than CPU-only 500 particles per RBC version 18 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Summary USER MESO 2.0 offers GPU-accelerated ability to model advection-diffusion-reaction of chemical and RBC dynamics. Multi-stream scheduling and non-blocking communication are employed to maximize GPU- CPU concurrency. Comparing with CPU counterpart, USER MESO 2.0 produces up to 10.1 times speedup on one GPU over 16 cores in a single node. It is able to achieve a weak scaling efficiency of 91% across 256 nodes and almost linear strong scaling. 19 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Thank you for listening! Acknowledgement: This work is supported by NIH and US Army Research Laboratory. The benchmarks were performed on TITAN at Oak Ridge National Laboratory through the Innovative and Novel Computational Impact on Theory and Experiment program under project BIP118. A.L.B. would like to acknowledge Wayne Joubert (ORNL) for his effort to coordinate machine reservations. We would like also to acknowledge the support of NVIDIA Corporation with the donation of TITAN X Pascal GPU. 20 Ansel L. Blumers || ansel_blumers@brown.edu || GTC 2017 || May, 2017

Red Blood Cell Simulations with Chemical Transport Properties Ansel - PowerPoint PPT Presentation

Red Blood Cell Simulations with Chemical Transport Properties Ansel L. Blumers Karniadakis Group, Brown University, Rhode Island, USA San Jos || GTC 2017 || May, 2017 Scientific Inquiries Aim to investigate Chemical-driven plaque

[LE,RO] red red red red red red red red red red red red red red red red red red

! Petchey Academy Blood Pres.entation BLOOD Blood groups Red blood cell ! Red blood

Uniqueness for a class of linear quadratic mean field games with common noise Foguen Tchuendom

Every ONE matters Patient Blood M anagement Guidelines www.blood.gov.au www.blood.gov.au/

Red Eyes, Red Spots, and Red Flags Red Eyes Common reason for primary care visits Red

1 st Blood Moon April 15, 2014 1 st Blood Moon April 15, 2014 A Warning to America 2 nd Blood

Disorders of Blood Cells & Blood Coagulation g HIHIM 409 CBC Red cell indices WBC

Chemical Equations and Chemical Reactions Symbols Used in Chemical Equations Chemical Equations

I Know it Was the Blood Verse 1 I know it was the blood I know it was the blood I know it was

Bacteria Without a Cell Wall L-forms Pros & Cons of Cell Wall Cell membrane Cell wall DNA

Red fox By Hunter.K Red fox traits A Red fox is a mammal.(Mammals have hair and are warm

Red- -Light Running Light Running Red Red-Light Running 2 Traffic Signals Traffic Signals

Red- -Light Running Light Running Red Red-Light Running 2 Traffic Signals Traffic Signals

80% of Code Red 2 Code Red 2 re-re- Code Red 1 and Code Red 2 Code Red 2 re- cleaned up

COMPUTER COMPUTER COMPUTER COMPUTER SIMULATIONS SIMULATIONS SIMULATIONS SIMULATIONS

Cell Communication and Cell Signaling Why is cell signaling important? Why is cell signaling

A Framework for Advancing Excellence throughout The University of Texas System: Action Plan

Pilot Study on WAP Diet, Dr. Beverly Rubik Live Blood Analysis of Persons Consuming the Weston A.

Manchest nchester er Hea ealth lth Depa epartm tment ent Cosmetolog logy y Es Esta

COSMETICS FOR YOUR HOME HOUSEHOLD CHEMISTRY SCOPE OF PURPOSE OF MILL CLEAN PRODUCTS PROFESSIONAL

Safe and Reliable Test Results Handling Running a practice session on results handling How to

Maryland Health Care Commission Quality Review and Chart Audit M A R Y A N D R U S , B A , R N

NHSN Data Quality QIA January, 2017 Mission Statement The Mission of the IPRO End Stage Renal

cultures improves patient management Martin McHugh Clinical Scientist 1 Staphylococcal

Sambuz

Useful Links

Newsletter

Mail Us

Red Blood Cell Simulations with Chemical Transport Properties Ansel - PowerPoint PPT Presentation

Red Blood Cell Simulations with Chemical Transport Properties Ansel L. Blumers Karniadakis Group, Brown University, Rhode Island, USA San Jos || GTC 2017 || May, 2017 Scientific Inquiries Aim to investigate Chemical-driven plaque

[LE,RO] red red red red red red red red red red red red red red red red red red

! Petchey Academy Blood Pres.entation BLOOD Blood groups Red blood cell ! Red blood

Uniqueness for a class of linear quadratic mean field games with common noise Foguen Tchuendom

Every ONE matters Patient Blood M anagement Guidelines www.blood.gov.au www.blood.gov.au/

Red Eyes, Red Spots, and Red Flags Red Eyes Common reason for primary care visits Red

1 st Blood Moon April 15, 2014 1 st Blood Moon April 15, 2014 A Warning to America 2 nd Blood

Disorders of Blood Cells &amp; Blood Coagulation g HIHIM 409 CBC Red cell indices WBC

Chemical Equations and Chemical Reactions Symbols Used in Chemical Equations Chemical Equations

I Know it Was the Blood Verse 1 I know it was the blood I know it was the blood I know it was

Bacteria Without a Cell Wall L-forms Pros &amp; Cons of Cell Wall Cell membrane Cell wall DNA

Red fox By Hunter.K Red fox traits A Red fox is a mammal.(Mammals have hair and are warm

Red- -Light Running Light Running Red Red-Light Running 2 Traffic Signals Traffic Signals

Red- -Light Running Light Running Red Red-Light Running 2 Traffic Signals Traffic Signals

80% of Code Red 2 Code Red 2 re-re- Code Red 1 and Code Red 2 Code Red 2 re- cleaned up

COMPUTER COMPUTER COMPUTER COMPUTER SIMULATIONS SIMULATIONS SIMULATIONS SIMULATIONS

Cell Communication and Cell Signaling Why is cell signaling important? Why is cell signaling

A Framework for Advancing Excellence throughout The University of Texas System: Action Plan

Pilot Study on WAP Diet, Dr. Beverly Rubik Live Blood Analysis of Persons Consuming the Weston A.

Manchest nchester er Hea ealth lth Depa epartm tment ent Cosmetolog logy y Es Esta

COSMETICS FOR YOUR HOME HOUSEHOLD CHEMISTRY SCOPE OF PURPOSE OF MILL CLEAN PRODUCTS PROFESSIONAL

Safe and Reliable Test Results Handling Running a practice session on results handling How to

Maryland Health Care Commission Quality Review and Chart Audit M A R Y A N D R U S , B A , R N

NHSN Data Quality QIA January, 2017 Mission Statement The Mission of the IPRO End Stage Renal

cultures improves patient management Martin McHugh Clinical Scientist 1 Staphylococcal

Sambuz

Useful Links

Newsletter

Mail Us

Disorders of Blood Cells & Blood Coagulation g HIHIM 409 CBC Red cell indices WBC

Bacteria Without a Cell Wall L-forms Pros & Cons of Cell Wall Cell membrane Cell wall DNA