Accelerated Photodynamic Cancer Therapy Planning with FullMonte
Jeffrey Cassidy, PhD Candidate, University of Toronto (Canada)
#OpenPOWERSummit
Photodynamic Therapy (PDT) for Cancer
▪ Photosensitizer (drug): multiple FDA-approved drugs; topical, IV, or oral
▪ Light exposure (fluence, J/cm²): surface illumination, implanted fibre, or intraoperative
▪ Tissue oxygen: normally present in tissue
Photosensitizer + light exposure + tissue oxygen -> cells killed
Photodynamic Therapy (PDT) for Cancer
▪ Benefits
  ▪ Very low systemic toxicity
  ▪ Repeatable
  ▪ Highly targeted
  ▪ Simple, inexpensive delivery
  ▪ Single-shot, possibly outpatient basis
▪ Challenges
  ▪ Variable outcomes
  ▪ Compute-intensive to model
Photodynamic Therapy (PDT) for Cancer
▪ Current use
  ▪ Skin (basal-cell carcinoma, actinic keratosis)
  ▪ Superficial oral cavity
▪ Trials
  ▪ Prostate
  ▪ Bladder
  ▪ Head & neck
  ▪ Brain
[Figure: bladder model]
PDT Treatment planning
▪ Many sims required
▪ Many free parameters
▪ Properties variable
▪ Minutes per sim on CPU
Workflow steps (figure): image; delineate organs; define planning volume; propose plan parameters; simulate dose; approve plan
Image courtesy Robert Weersink, Princess Margaret Cancer Centre
PDT Treatment planning
FullMonte simulation kernel
Monte Carlo simulation traces photons through tetrahedral mesh
Flow stages (figure): Launch, Draw step, Region lookup, Hop, Interface, Drop, Exit, Spin, Dead
[Figure: "Digimouse" open-source mouse atlas (standard pre-clinical model)]
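The hop/drop/spin loop named on this slide can be illustrated with a much-simplified sketch. This is not FullMonte code: it assumes a single homogeneous slab instead of a tetrahedral mesh, so the region-lookup and interface stages are omitted, and it uses the textbook Henyey-Greenstein phase function and Russian roulette. The function names and the optical properties in the usage line are hypothetical.

```python
import math
import random


def scatter_direction(d, g, rng):
    """Spin: deflect unit vector d by a Henyey-Greenstein polar angle and a uniform azimuth."""
    u = rng.random()
    if abs(g) < 1e-6:
        cos_t = 2.0 * u - 1.0  # isotropic limit
    else:
        f = (1.0 - g * g) / (1.0 - g + 2.0 * g * u)
        cos_t = (1.0 + g * g - f * f) / (2.0 * g)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * rng.random()
    ux, uy, uz = d
    if abs(uz) > 0.99999:  # travelling (almost) along z: use the degenerate form
        return (sin_t * math.cos(phi), sin_t * math.sin(phi),
                cos_t * math.copysign(1.0, uz))
    den = math.sqrt(1.0 - uz * uz)
    nx = sin_t * (ux * uz * math.cos(phi) - uy * math.sin(phi)) / den + ux * cos_t
    ny = sin_t * (uy * uz * math.cos(phi) + ux * math.sin(phi)) / den + uy * cos_t
    nz = -sin_t * math.cos(phi) * den + uz * cos_t
    return (nx, ny, nz)


def simulate(n_photons, mu_a, mu_s, g, thickness, seed=1):
    """Trace photon packets through a slab 0 <= z <= thickness; return (absorbed, exited) weight."""
    rng = random.Random(seed)
    mu_t = mu_a + mu_s
    absorbed = 0.0
    exited = 0.0
    for _ in range(n_photons):
        # Launch: unit-weight packet at the surface, heading into the slab.
        z, d, w = 0.0, (0.0, 0.0, 1.0), 1.0
        while w > 0.0:
            # Draw step: exponentially distributed free path length.
            s = -math.log(1.0 - rng.random()) / mu_t
            # Hop: move along the current direction (only z matters for a slab).
            z_new = z + s * d[2]
            if z_new < 0.0 or z_new > thickness:
                exited += w  # Exit: remaining weight leaves the geometry.
                break
            z = z_new
            # Drop: deposit the absorbed fraction of the packet weight.
            dw = w * mu_a / mu_t
            absorbed += dw
            w -= dw
            # Spin: pick a new direction.
            d = scatter_direction(d, g, rng)
            # Dead: Russian roulette terminates low-weight packets without bias.
            if w < 1e-4:
                w = w * 10.0 if rng.random() < 0.1 else 0.0
    return absorbed, exited


absorbed, exited = simulate(200, mu_a=0.1, mu_s=10.0, g=0.9, thickness=1.0)
print(f"absorbed={absorbed:.2f} exited={exited:.2f} launched=200")
```

Absorbed plus exited weight should stay close to the launched weight, which is a quick sanity check on any kernel of this kind. A mesh-based simulator additionally has to find which tetrahedron the packet is in and handle refractive-index boundaries at each step.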
FullMonte simulation kernel
▪ Altera Stratix V FPGA
▪ Flow stages <-> hardware modules
▪ Queues at join points
▪ 1 step/clk @ 250 MHz
[Figure: flow diagram with stages Launch, Draw step, Region lookup, Hop, Interface, Drop, Exit, Spin, Dead]
Preliminary Results
Performance metric       FPGA*, 1 instance (<25% area)   4 FPGA + server @ 4 inst/FPGA (projected)
Throughput / node        4x                              64x
Throughput / $ capital   0.95x                           3.6x
Throughput / W           67x                             41x
*Prototype x86-hosted system, limited (48k) mesh size
CPU baseline: Intel Sandy Bridge (32nm) i7-2600K, 3.6 GHz, 4-core HT
▪ gcc -O3, multithreaded, hand-tuned SSE4
▪ Price $1200 (excl. GPU & monitor)
FPGA: Altera Stratix V (28nm) at Fmax = 280 MHz
▪ Quartus II PowerPlay power estimation: 4.5 W single instance
▪ Price $5000 (Nallatech 385 list price)
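The throughput-per-dollar figures can be roughly reproduced from the prices listed on this slide. The cost model below is an assumption, since the slide does not state one: the single-instance case is costed as one $5000 FPGA board, and the projected system as four boards plus one $1200 server, each compared against the $1200 CPU system. The function name is hypothetical.

```python
CPU_PRICE = 1200.0   # i7-2600K system price from the slide
FPGA_PRICE = 5000.0  # Nallatech 385 list price from the slide


def throughput_per_dollar_ratio(speedup, system_price, baseline_price=CPU_PRICE):
    """Throughput per capital dollar, relative to the CPU baseline (baseline = 1.0)."""
    return speedup / (system_price / baseline_price)


# One FPGA instance at 4x CPU throughput, costed as a single board (assumption).
single = throughput_per_dollar_ratio(4.0, FPGA_PRICE)

# Projection: 4 boards x 4 instances x 4x = 64x, costed as 4 boards + 1 server (assumption).
projected_speedup = 4 * 4 * 4.0
projected = throughput_per_dollar_ratio(projected_speedup, 4 * FPGA_PRICE + CPU_PRICE)

print(f"single instance: {single:.2f}x, projected system: {projected:.2f}x")
```

Under these assumptions the ratios come out at 0.96x and 3.62x, within rounding of the 0.95x and 3.6x in the table. The throughput-per-watt column cannot be checked the same way because the slide gives no CPU power figure.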
OpenPOWER/CAPI Scale-up
CAPI platform:
▪ Enables large meshes (no more 64k limit)
  ▪ On-chip cache for most frequently accessed elements
  ▪ Host serves long tail via CAPI
▪ Maintains power & performance advantage
▪ Provides support code (host & FPGA side)
▪ Supports fast host-accelerator communication
OpenPOWER/CAPI Scale-up
[Figure: Digimouse (lung tumour). Miss rate (0 to 1) vs. cache size (1 to 64k elements) for Static, LRU, Hybrid, and Oracle caches. Annotations: FPGA memory capacity; misses served by CAPI.]
OpenPOWER/CAPI Scale-up
Simple hybrid cache
▪ L1: 4-element LRU
▪ L2: N-element static
  ▪ Most frequent
  ▪ Banked
  ▪ Host-managed
-32% misses vs. pure LRU, and simpler to implement
+60% vs. clairvoyant (perfect prediction)
[Figure: Digimouse (lung tumour). Miss rate relative to Oracle = 1.0 (axis 0 to 4) vs. cache size (1 to 64k elements) for Static, LRU, and Hybrid caches.]
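The three policies compared on this slide can be sketched as a small trace-driven miss counter. This is an illustration only. The access trace is synthetic (a Zipf-like distribution, not Digimouse data), the static set is profiled from the same trace it is evaluated on, and the hybrid lookup order (static set first, then a 4-entry LRU that is filled only on static misses) is an assumption. All function names are hypothetical, and the numbers it prints will not match the slide.

```python
import random
from collections import Counter, OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a fully associative LRU cache holding `capacity` elements."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


def most_frequent(trace, n):
    """Return the n most frequently accessed elements (profile over the whole trace)."""
    return {key for key, _ in Counter(trace).most_common(n)}


def static_misses(trace, capacity):
    """Count misses for a preloaded, never-updated cache of the most frequent elements."""
    resident = most_frequent(trace, capacity)
    return sum(1 for key in trace if key not in resident)


def hybrid_misses(trace, l2_capacity, l1_capacity=4):
    """Count misses for a static set backed by a small LRU that catches short-term reuse."""
    resident = most_frequent(trace, l2_capacity)
    l1 = OrderedDict()
    misses = 0
    for key in trace:
        if key in resident:
            continue                       # hit in the static set
        if key in l1:
            l1.move_to_end(key)            # hit in the small LRU
            continue
        misses += 1                        # would be served by the host
        l1[key] = True
        if len(l1) > l1_capacity:
            l1.popitem(last=False)
    return misses


def synthetic_trace(n_elements=2000, length=20000, seed=0):
    """Zipf-like access trace: element popularity falls off as 1/rank."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_elements)]
    return rng.choices(range(n_elements), weights=weights, k=length)


trace = synthetic_trace()
for size in (16, 256, 1024):
    print(size, static_misses(trace, size), lru_misses(trace, size),
          hybrid_misses(trace, size))
```

With this lookup order the hybrid can never miss more often than the pure static cache of the same size, because every static hit is also a hybrid hit. A real photon trace has spatial locality (consecutive steps visit neighbouring tetrahedra) that this independent-draw trace lacks, which is where a small LRU stage would be expected to help most.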
& BlueLink
▪ Designed with Bluespec SystemVerilog (BSV)
  ▪ Atomic rules for complex concurrency
  ▪ High performance
  ▪ Strong typing, good IP library, fast sim
▪ Created open-source BSV library BlueLink
  ▪ Interface to IBM CAPI hardware & sim env.
  ▪ Simplified module interface
  ▪ IP for host <-> FPGA xfer to reg/MLAB/BRAM
  ▪ Examples
github.com/jeffreycassidy/bluelink (work in progress)
Summary
MC photon transport simulation on tetrahedral mesh
▪ Performance per Watt -> FPGA
  ▪ Fixed-point (18b) arithmetic
  ▪ Spatial dataflow pipeline
  ▪ Inexpensive, effective custom caching
▪ Tight host-FPGA coupling -> OpenPOWER CAPI
  ▪ Large meshes (>> FPGA on-chip mem.)
  ▪ Host mem. serves infrequent items
  ▪ Host-managed static cache set
>40x more performance/W vs CPU
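As an illustration of the 18-bit fixed-point arithmetic mentioned above, the sketch below quantizes values in [-1, 1) to a signed 18-bit word and multiplies them. The slides do not give the number format, so the split used here (1 sign bit, 17 fractional bits), the rounding and saturation behaviour, and all names are assumptions chosen for illustration.

```python
WIDTH = 18                            # total bits, from the slide
FRAC_BITS = 17                        # assumed split: 1 sign bit + 17 fractional bits
SCALE = 1 << FRAC_BITS
FX_MIN = -(1 << (WIDTH - 1))          # -131072, represents -1.0
FX_MAX = (1 << (WIDTH - 1)) - 1       # 131071, represents 1 - 2**-17


def saturate(raw):
    """Clamp an integer to the signed 18-bit range instead of wrapping around."""
    return max(FX_MIN, min(FX_MAX, raw))


def to_fixed(x):
    """Quantize a real value to the nearest representable 18-bit fixed-point word."""
    return saturate(int(round(x * SCALE)))


def from_fixed(raw):
    """Convert an 18-bit fixed-point word back to a float."""
    return raw / SCALE


def fx_mul(a, b):
    """Multiply two fixed-point words: 36-bit product, then drop 17 fractional bits."""
    return saturate((a * b) >> FRAC_BITS)


# Example: scale a direction cosine by a weight-like factor.
cos_theta = to_fixed(0.5)
factor = to_fixed(0.25)
print(from_fixed(fx_mul(cos_theta, factor)))
```

In this format the quantization error of a single value is at most 2**-18 (about 4e-6), and +1.0 itself is not representable, so it saturates to the largest word. Stratix V DSP blocks offer an 18x18 multiplier mode, although the slides do not say whether that motivated the word size.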
Acknowledgements
PhD Supervisors
▪ Prof. Vaughn Betz, Univ. Toronto ECE
▪ Prof. Lothar Lilge, Princess Margaret Cancer Centre
Funding: IBM, Altera, CIHR, NSERC
In-Kind Support: IBM, SOSCIP, Altera, Bluespec
Discussion
▪ Robert Weersink, PMCC
▪ Henry Wong, U Toronto