Dataflow Super-Computing
Jacob Bower
YU INFO, February 2012
Maxeler Technologies
• Maxeler offers complete hardware, software and application acceleration solutions for high-performance computing
• ~70 people, offices in London, UK and Palo Alto, CA
• Hardware
 – Card: PCI Express x16, compute, memory and local interconnect
 – Node: 1U solutions with 1 or 4 cards
 – Rack: 10U, 20U or 40U, balancing compute, storage & network
• Software
 – MaxelerOS: resource management of dataflow computing
 – Runtime support: memory management and data choreography
 – MaxCompiler: providing programmability
• Consulting
 – HPC system performance architecture
 – Algorithms and numerical optimization
 – Integration into business and technical processes
Overview
• Dataflow Computing
• Programming Dataflow Systems
• Dataflow Engines and Platforms
• Case Study: Accelerating Risk Computation
DATAFLOW COMPUTING
Computing with Instruction Processors
Instruction Processor Spectrum
[Figure: a spectrum running from single-core CPU through multi-core (Intel, AMD) to many-core (GPU (NVIDIA, AMD); Tilera, XMOS etc.), plus hybrids, e.g. AMD Fusion, IBM Cell]
Computing with Dataflow
Computation Resources
[Figure: Intel 6-Core X5680 “Westmere” compared with Xilinx Virtex-6 SX 475T; regions labelled Computation on each chip, and MaxelerOS]
PROGRAMMING DATAFLOW SYSTEMS
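Programming with MaxCompiler
[Figure: a host application on the CPU runs on MaxCompilerRT and MaxelerOS and connects over PCI Express to the dataflow engine, whose kernels (a graph of +, + and * operations) and memory are coordinated by a memory manager (MaxelerOS)]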
MaxCompiler Architecture
Dataflow Kernel Programming
$y_i = (x_{i-1} + x_i + x_{i+1}) / 3$
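The kernel formula can be mimicked in software to show the dataflow style: each input value flows in once, and every output is formed from a small window of neighbouring inputs rather than from arbitrary memory accesses. The sketch below is a plain-Python emulation, not MaxCompiler code (real kernels are written with MaxCompiler's own tools, not shown here). The name `moving_average_kernel` is hypothetical, and averaging the two available values at either end of the stream is an assumption, since the formula alone leaves the boundaries open.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def moving_average_kernel(stream: Iterable[float]) -> Iterator[float]:
    """Software emulation of y_i = (x_{i-1} + x_i + x_{i+1}) / 3 over a stream.

    Assumption: at either end of the stream, where one neighbour is missing,
    the two available values are averaged instead.
    """
    window: Deque[float] = deque(maxlen=3)  # sliding window x_{i-1}, x_i, x_{i+1}
    count = 0
    for x in stream:
        window.append(x)
        count += 1
        if count == 2:
            # First output y_0: only x_0 and x_1 are available.
            yield (window[0] + window[1]) / 2
        elif count >= 3:
            # Steady state: window holds x_{i-1}, x_i, x_{i+1} for i = count - 2.
            yield sum(window) / 3
    # Flush the last output once the stream has ended.
    if count == 1:
        yield window[0]
    elif count >= 2:
        yield (window[-2] + window[-1]) / 2


print(list(moving_average_kernel([3.0, 6.0, 9.0, 12.0])))  # → [4.5, 6.0, 9.0, 10.5]
```

Each input is read exactly once and each output depends only on the current window, which is what makes this kind of computation a natural fit for a pipelined dataflow engine.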
DATAFLOW ENGINES AND PLATFORMS
Various Dataflow Systems
• MaxWorkstation: desktop development system
• MaxNode: high-density compute system
 – 1-4 dataflow engines with up to 192GB RAM
• MaxNode10G: low-latency connectivity platform
 – 1-2 dataflow engines with up to six 10GE connections
• MaxRack: 10, 20 or 40 node rack systems
 – balanced compute, networking & storage
MaxNode with MAX3
• 1U form factor
• 4x MAX3 cards with Virtex-6 FPGAs
• MaxRing interconnect
• 2x Intel Xeon CPUs
• Up to 192GB host RAM
• Up to 192GB FPGA RAM
• 3x 3.5” disks
• ~700W power
CASE STUDY: ACCELERATING J.P. MORGAN RISK COMPUTATION
Computational Finance
• Compute value of complex financial products
• Compute risk: what’s the sensitivity to X changing?
• Typically computed overnight on hundreds to thousands of CPU cores
• But:
 – the market changes throughout the day!
 – we really need to evaluate scenarios: what happens if country X defaults?
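One common way to answer "what’s the sensitivity to X changing?" is bump-and-revalue: reprice with X nudged up and down and take the difference. The sketch below uses a deliberately simple, hypothetical pricing model (`risky_zero_price`: a zero-coupon bond with a flat hazard rate, a flat interest rate and zero recovery) purely to have something to differentiate; it is an assumption for illustration, not the model behind this case study.

```python
from math import exp
from typing import Callable


def bump_and_revalue(price: Callable[[float], float], x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of d(price)/dx: two full revaluations."""
    return (price(x + h) - price(x - h)) / (2.0 * h)


def risky_zero_price(hazard: float, rate: float = 0.02, maturity: float = 5.0) -> float:
    """Hypothetical toy model: zero-recovery zero-coupon bond paying 1 at `maturity`."""
    return exp(-(rate + hazard) * maturity)


hazard = 0.01
sensitivity = bump_and_revalue(risky_zero_price, hazard)
analytic = -5.0 * risky_zero_price(hazard)  # exact derivative of exp(-(r + hazard) * T), T = 5
print(sensitivity, analytic)
```

Each sensitivity costs two extra revaluations, so a book with many risk factors and many scenarios multiplies the pricing work; that is one plausible reading of why the overnight runs need so many cores.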
Credit Derivatives 101
• Bonds are a way for companies/countries to borrow
 – Investors make profit through coupon payments
• Investors mitigate risk using
 – Credit Default Swaps (CDS)
 – Collateralized Debt Obligations (CDO)...
CDOs
[Figure, built up over four slides: a pool of CDS contracts is sliced into tranches, from high risk to low risk]
Losses for Different Tranches
[Figure: amount of loss in tranche ($) for $1M notional in various tranches of a 125-name pool (0%-100% (CDSI), 0%-3%, 3%-7%, 7%-15%, 15%-30%, 30%-100%), plotted against number of names defaulted]
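The curves in this chart follow from the usual tranche payoff: a tranche with attachment point a and detachment point d absorbs only the slice of pool loss between a and d. The sketch below assumes equal notionals across the 125 names and a single recovery rate; the chart does not state the recovery rate, so it is left as a parameter whose default of 0 is an assumption. The function name `tranche_loss` is hypothetical.

```python
def tranche_loss(defaults: int, attach: float, detach: float,
                 notional: float = 1_000_000.0, names: int = 125,
                 recovery: float = 0.0) -> float:
    """Loss on `notional` invested in the [attach, detach] tranche of an equal-weight pool.

    `recovery` is an assumption; the chart does not state the recovery rate used.
    """
    pool_loss = defaults / names * (1.0 - recovery)  # fraction of pool notional lost
    width = detach - attach
    absorbed = min(max(pool_loss - attach, 0.0), width)  # part of the loss landing in this tranche
    return notional * absorbed / width


for attach, detach in [(0.0, 0.03), (0.03, 0.07), (0.0, 1.0)]:
    print(f"{attach:.0%}-{detach:.0%}", tranche_loss(4, attach, detach))
```

With zero recovery, four defaults (3.2% of the pool) wipe out the 0%-3% tranche and take about 5% of the 3%-7% tranche, while more senior tranches are untouched: the junior, high-risk tranches are hit first.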
Valuing Tranched Credit Derivatives
[Figure: each name's unconditional survival probability, the market factor M and the correlation determine a conditional survival probability for this name. Two panels plot probability (0 to 1) against amount of loss (0% to 100%), one for a good market (M >> 0) and one for a bad market (M << 0)]
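The slide does not name the model linking the market factor M, the correlation and each name's conditional survival probability. A standard choice for this picture is the one-factor Gaussian copula, assumed in the sketch below: a name survives while a latent variable √ρ·M + √(1−ρ)·Z stays above a threshold fixed by its unconditional survival probability. The function name `conditional_survival` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def conditional_survival(p_survive: float, rho: float, m: float) -> float:
    """Survival probability of one name given market factor M = m.

    Assumed model: one-factor Gaussian copula with correlation 0 <= rho < 1.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - p_survive)  # default if latent variable < threshold
    p_default_given_m = STD_NORMAL.cdf((threshold - sqrt(rho) * m) / sqrt(1.0 - rho))
    return 1.0 - p_default_given_m


for m in (3.0, 0.0, -3.0):  # good, neutral and bad market
    print(m, conditional_survival(0.95, 0.3, m))
```

In a good market (M >> 0) every name's survival probability rises together, so the pool loss distribution piles up near 0%; in a bad market it shifts toward large losses, matching the contrast between the two panels.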
Application Analysis
Convoluter Design
[Figure: convoluter datapath taking conditional survival probabilities, weights and notional sizes as inputs, with credits (c) and market factors (m) unrolled, producing the accumulated loss distribution (weighted sum)]
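The convoluter builds the pool loss distribution one credit at a time for each market factor, then takes a weighted sum over market factors. A plain-Python sketch of that recursion follows. It assumes notional sizes are whole numbers of a common loss unit and that the weights are the probabilities (e.g. quadrature weights) attached to each market factor; the function names are hypothetical. The slide indicates the hardware unrolls the loops over credits (c) and market factors (m), whereas here they simply run one after another.

```python
from typing import List, Sequence


def conditional_loss_distribution(survival: Sequence[float], sizes: Sequence[int]) -> List[float]:
    """P(pool loss = k units) for one market factor, adding one credit at a time."""
    dist = [0.0] * (sum(sizes) + 1)
    dist[0] = 1.0  # no credits yet: zero loss with certainty
    support = 0    # largest loss reachable so far
    for s, n in zip(survival, sizes):
        new = [0.0] * len(dist)
        for k in range(support + 1):
            new[k] += s * dist[k]              # credit survives: loss unchanged
            new[k + n] += (1.0 - s) * dist[k]  # credit defaults: loss grows by n units
        dist = new
        support += n
    return dist


def loss_distribution(survival_by_factor: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      sizes: Sequence[int]) -> List[float]:
    """Accumulated loss distribution: weighted sum over market factors."""
    total = [0.0] * (sum(sizes) + 1)
    for w, survival in zip(weights, survival_by_factor):
        for k, p in enumerate(conditional_loss_distribution(survival, sizes)):
            total[k] += w * p
    return total


print(conditional_loss_distribution([0.9, 0.8], [1, 1]))
```

Each credit touches every entry of the running distribution with the same multiply-add pattern, which is the regular, data-independent work a dataflow datapath handles well.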
Credit Derivatives Acceleration
• Calculation of current value and credit spread risk for a population of 2,925 bespoke tranches
• Speedup from 1 MAX2 card:
 – 219x-270x compared to 1 core
 – ~30x compared to an 8-core node
• Power consumption drops from 250W/node to 240W/node with acceleration
 – >30x more power efficient
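The ">30x more power efficient" figure follows from combining the speedup with the power change. For a fixed workload, energy is power times runtime, and acceleration divides the runtime T by the speedup S. Assuming the ~30x node-level speedup applies to the whole workload, with the per-node powers above:

$$\frac{E_{\text{CPU}}}{E_{\text{MAX2}}} = \frac{P_{\text{CPU}}\,T}{P_{\text{MAX2}}\,T/S} = S\,\frac{P_{\text{CPU}}}{P_{\text{MAX2}}} \approx 30 \times \frac{250\,\text{W}}{240\,\text{W}} \approx 31$$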
Summary & Conclusions
• Dataflow computing allows:
 – massive parallelism in computation
 – highly efficient use of silicon area on chips
• Maxeler creates:
 – dataflow engines and systems
 – MaxCompiler to program these easily
• Dataflow computing is used at J.P. Morgan
 – around a 30x speedup