Counterparty Credit Risk and IM Computation for CCP on Multicore Systems

  1. Counterparty Credit Risk and IM Computation for CCP on Multicore Systems
     Prasad Pawar, Nishant Kumar, Amit Kalele
     Tata Consultancy Services Limited

  2. Overview
     • Introduction
     • Counterparty Credit Risk
       • Basic Terminology
     • Sequential Algorithm
     • Parallel Algorithm Using CUDA
     • Optimizations Applied on GPGPU
     • Parallelized and Optimized Algorithm on Intel Platform
     • Comparison Results
     • Conclusion and Future Work

  3. Introduction
     Counterparty credit risk
     Counterparty credit risk is the risk that the counterparty to a transaction could default before the final settlement of the transaction's cash flows.
     Basic Terminology
     • IRS trade - An interest rate swap (IRS) trade is done by market participants primarily for hedging or for speculating on the direction of interest rates.
     • Cash flow - For IRS trades, the sum of money to be paid or received on the predefined cash flow dates specified in the trade.
     • Zero coupon yield curve - A curve that plots the yield of zero coupon bonds against their time to maturity. It provides the forward rates and spot rates used for calculating and discounting cash flows.
     • MTM value - The mark to market value of the IRS trade, which reflects the monetary gain or loss on the trade for its two parties.

  4. Counterparty Credit Risk
     Mark to Market Computations
     • The central counterparty (CCP) values the complete portfolio of interest rate swap trades received from all members on an intraday basis, using the current yield curve.
     • It calculates the mark to market (MTM) margin requirement for each member.
     • It blocks the margin from the member's collateral and, if the collateral is not sufficient, makes a margin call to that member.
     Initial Margin Computations
     • The IM computation requires valuing the member's current portfolio 250 times, using 250 different yield curves picked from historical data.

  5. Challenges
     • On traditional systems (database systems), MTM computation takes ~25 min for 20,000 trades, each with ~150 cash flows.
     • Initial Margin (IM) computation takes ~10 min on a .NET-based solution for 250 different yield curves, each applied to 20,000 trades with ~150 cash flows.
     Such high timings cause several problems:
     • The process is inefficient, because the user is unproductive during the 25 minutes the valuation takes.
     • The timings are too high when senior executives or regulators need information urgently.
     • If trade volumes increase to, say, 100,000 (a realistic possibility), the valuation will take more than 2 hours, which is virtually unacceptable.
     • Until the IM result is computed, a member can continue trading, but trades are guaranteed for settlement from the point of trade in TS, which increases the risk for the CCP.
     An efficient solution is required to solve this problem.

  6. Computational Steps
     Yield curve generation:
     1. Using linear interpolation, compute the intermediate swap rates for tenors whose swap rates are not provided. Each point (x, y) represents a tenor and the corresponding interest rate.
     2. Zero rates for tenors up to one year are computed using the continuous compounding method.
     3. The standard bootstrapping method is used to compute zero rates for tenors beyond one year.
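
Step 1 can be illustrated with a minimal Python sketch. The slide does not show the authors' code, so the function name `interpolate_swap_rate`, the tenor unit (years) and the sample quotes below are assumptions made purely for illustration; only the linear interpolation between neighbouring (tenor, rate) points comes from the slide.

```python
from bisect import bisect_left


def interpolate_swap_rate(tenors, rates, x):
    """Linearly interpolate the swap rate at tenor x (hypothetical helper).

    tenors must be sorted in ascending order; rates[i] is the quote for tenors[i].
    """
    if not tenors[0] <= x <= tenors[-1]:
        raise ValueError("tenor lies outside the quoted range")
    i = bisect_left(tenors, x)
    if tenors[i] == x:
        # The tenor is quoted directly, so no interpolation is needed.
        return rates[i]
    x0, x1 = tenors[i - 1], tenors[i]
    y0, y1 = rates[i - 1], rates[i]
    # Straight line through (x0, y0) and (x1, y1), evaluated at x.
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Illustrative quotes only: 1Y, 2Y and 5Y swap rates, with the 3Y rate missing.
quoted_tenors = [1.0, 2.0, 5.0]
quoted_rates = [0.060, 0.064, 0.070]
rate_3y = interpolate_swap_rate(quoted_tenors, quoted_rates, 3.0)
```

The interpolated rates would then feed the zero rate computation in steps 2 and 3, which this sketch does not attempt to reproduce.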

  7. Computational Steps
     MTM values
     The inputs to the mark to market computation are the business date, the immediately previous and next cash flow dates, principal amount, accrued interest, fixed rate, floating rate and zero rate.
     1. Compute the fixed cash flows, and compute the floating cash flows using the discrete equivalent formula for future floating interest rates, i.e. forward rates.
     2. Calculate the MTM value of each trade by discounting the fixed and floating cash flows using the zero rate and netting them off.
        Discounted Value = Present value of cash flows
        Trade MTM = Sum (Discounted Value)
     3. The MTM value, i.e. the margin requirement, for each member is obtained by aggregating the MTM values of all the member's trades.
        MTM for Member = Sum (Trade MTM of that Member)
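
The three steps above can be sketched in Python as follows. The slide omits the actual formulas, so everything specific here is an assumption: continuously compounded zero rates, a simple forward rate implied by two zero rates, times measured in years, and valuation from the point of view of the party that pays fixed and receives floating. The names `discount_factor`, `forward_rate`, `trade_mtm` and `member_mtm` are hypothetical.

```python
import math


def discount_factor(zero_rate, t):
    # Assumes a continuously compounded zero rate for maturity t (in years).
    return math.exp(-zero_rate * t)


def forward_rate(z1, t1, z2, t2):
    # Simple (discretely compounded) forward rate for [t1, t2] implied by
    # continuously compounded zero rates z1 and z2. This is an assumption,
    # not necessarily the authors' "discrete equivalent formula".
    return (math.exp(z2 * t2 - z1 * t1) - 1.0) / (t2 - t1)


def trade_mtm(notional, fixed_rate, periods):
    """Net present value of one swap for the fixed-rate payer.

    periods is a list of (t_start, t_end, zero_start, zero_end) tuples.
    """
    mtm = 0.0
    for t1, t2, z1, z2 in periods:
        accrual = t2 - t1
        floating_cf = notional * forward_rate(z1, t1, z2, t2) * accrual
        fixed_cf = notional * fixed_rate * accrual
        # Net the two legs, then discount to the business date.
        mtm += (floating_cf - fixed_cf) * discount_factor(z2, t2)
    return mtm


def member_mtm(trade_mtms):
    # MTM for Member = Sum (Trade MTM of that Member)
    return sum(trade_mtms)


# Illustrative trade: two semi-annual periods on a flat 5% zero curve.
example_periods = [(0.0, 0.5, 0.05, 0.05), (0.5, 1.0, 0.05, 0.05)]
example_mtm = trade_mtm(1_000_000, 0.04, example_periods)
```

A useful sanity check on any such sketch is that a swap whose fixed rate equals the forward rate on a flat curve should have an MTM of zero.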

  8. Computational Steps
     Initial Margin calculation
     1. Value the complete IRS trade portfolio of a member 250 times using 250 sets of historical zero rates.
     2. Compute the daily percentage change in the MTM value of the portfolio and record the 249 results.
     3. Assign a weight to each result using an EWMA scheme, so that the more recent the result, the higher its weight.
     4. Sort the 249 results in ascending order.
     5. Add the weights from the top. Where the cumulative weight reaches 0.05, i.e. the worst 5 percent, multiply the corresponding percentage change by the portfolio value and adjust for the holding period to compute the IM.
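
Steps 2 to 5 might look like the following Python sketch. Several details are not given on the slide and are assumptions here: the decay factor `decay=0.94`, normalising the weights so they sum to 1, dividing by the absolute value of the previous MTM, and scaling by the square root of the holding period in days. The function name `initial_margin` is hypothetical.

```python
import math


def initial_margin(mtm_values, portfolio_value, decay=0.94, holding_days=1):
    """Weighted 5% worst-case move, scaled to a margin amount (sketch).

    mtm_values holds the portfolio valuations ordered oldest first and is
    assumed to contain no zeros.
    """
    if len(mtm_values) < 2:
        raise ValueError("need at least two valuations")
    # Step 2: daily percentage changes (249 results for 250 valuations).
    changes = [(b - a) / abs(a) for a, b in zip(mtm_values, mtm_values[1:])]
    n = len(changes)
    # Step 3: EWMA weights, largest for the most recent change.
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    weighted = [(c, w / total) for c, w in zip(changes, raw)]
    # Step 4: sort ascending, so the worst losses come first.
    weighted.sort(key=lambda cw: cw[0])
    # Step 5: accumulate weights until the worst 5 percent is covered.
    cumulative = 0.0
    quantile_change = weighted[-1][0]
    for change, weight in weighted:
        cumulative += weight
        if cumulative >= 0.05:
            quantile_change = change
            break
    return abs(quantile_change) * abs(portfolio_value) * math.sqrt(holding_days)


# Illustrative history: 21 valuations with a single 20% drop in the middle.
history = [100.0] * 10 + [80.0] * 11
im_equal_weights = initial_margin(history, 1000.0, decay=1.0)
```

With `decay=1.0` every change gets the same weight, which makes the result easy to check by hand; a production scheme would use a decay below 1 so that recent moves dominate.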

  9. Implementation Details

  10. Input and Output of IRS
      • Input - Swap Rate, Swap Tenor
      • Output - Yield curve
      • Input for MTM Computation - Cash_flow_dates, Prev_cash_flow_dates, Notional_amt, accrued_int, fixed_rate
      • Output - Present CashFlow, MTM value

  11. Sequential Algorithm
      Single MTM Computation
      Compute zero rates
      MTMfinal = 0
      for trade = 0 to nTrade do
          MTM[trade] = 0
          for CF = 0 to nCf do
              Compute Present_CashFlow[CF]
              MTM[trade] = MTM[trade] + Present_CashFlow[CF]
          end for
          MTMfinal = MTMfinal + MTM[trade]
      end for

  12. Sequential Algorithm
      Single MTM Computation
      for CF = 0 to nCf do
          if (CF == 0)
              Read eff_date, curr_cash_flow_date
          else
              Read last_cash_flow_date, curr_cash_flow_date
          1. Calculate the number of days between curr_date and last_date to obtain the tenor.
          2. Calculate the intermediate values fw_rate, comp_fw_rate, dist_fw_rate, etc.
          3. Calculate the floating cash flow of the trade from the values above and inputs such as notional_amt, accrued_int and fixed_rate.
          4. Compute the present value of the cash flow using the fixed/floating cash flow and the yield curve.
          5. MTM = MTM + Present_CashFlow[CF]
      end for

  13. Sequential Algorithm
      Initial Margin Computation using 250 MTM values:
      Compute the current date's zero rates and retrieve 249 different zero rates from the database.
      MTMfinal[nRate] = 0
      for rate = 0 to nRate do
          for trade = 0 to nTrade do        (Single MTM Computation)
              MTM[trade] = 0
              for CF = 0 to nCf do
                  Compute Present_CashFlow[CF]
                  MTM[trade] = MTM[trade] + Present_CashFlow[CF]
              end for
              MTMfinal[rate] = MTMfinal[rate] + MTM[trade]
          end for
      end for
      Compute IM using MTMfinal[nRate]
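
The triple loop above can be written as a short Python sketch. The data layout is an assumption made for illustration: each trade is a list of (amount, time) cash flows, each curve is a single flat zero rate, and `present_value` is a hypothetical stand-in for the per-cash-flow valuation described on the earlier slides.

```python
import math


def present_value(cash_flow, flat_zero_rate):
    # Hypothetical valuation: discount one (amount, time) cash flow
    # with a flat, continuously compounded zero rate.
    amount, t = cash_flow
    return amount * math.exp(-flat_zero_rate * t)


def mtm_per_curve(zero_curves, trades, present_cashflow):
    """Return one portfolio MTM per zero curve (MTMfinal in the pseudocode)."""
    mtm_final = [0.0] * len(zero_curves)
    for r, curve in enumerate(zero_curves):      # for rate = 0 to nRate
        for trade in trades:                     # for trade = 0 to nTrade
            mtm_trade = 0.0
            for cf in trade:                     # for CF = 0 to nCf
                mtm_trade += present_cashflow(cf, curve)
            mtm_final[r] += mtm_trade
    return mtm_final


# Tiny illustrative portfolio: two trades, valued on two flat curves.
toy_curves = [0.0, 0.05]
toy_trades = [[(100.0, 1.0), (100.0, 2.0)], [(-50.0, 1.0)]]
toy_result = mtm_per_curve(toy_curves, toy_trades, present_value)

# Size of the full problem from the slides: curves x trades x cash flows.
evaluations = 250 * 20000 * 150
```

At the sizes quoted in the slides the innermost statement runs 750 million times, and every (curve, trade) pair is independent of the others, which is what makes the computation a natural candidate for the GPU.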

  14. NVIDIA GPU Systems
      Kepler K20x
      • Device: NVIDIA Kepler K20x GPU, 796 MHz, 2496 cores, 5 GB RAM.
      • Host: Intel Xeon(R) CPU E5-2697 v3 @ 2.1 GHz, dual socket, 6 cores/socket, 16 GB RAM.
      Kepler K40
      • Device: NVIDIA Kepler K40 GPU, 745 MHz, 2880 cores, 12 GB RAM.
      • Host: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60 GHz, dual socket, 14 cores/socket, 64 GB RAM.
      Kepler K80
      • Device: NVIDIA Kepler K80 GPU, 562 MHz, 2x2496 cores, 2x12 GB RAM.
      • Host: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60 GHz, dual socket, 14 cores/socket, 64 GB RAM.

  15. GPU Algorithm for single MTM
      Compute zero rates
      MTMfinal = 0
      Launch CUDA kernel with nTrade = 20000 threads (T1, T2, T3, T4, ..., T20000)
      Kernel computation, per thread Ti:
          MTM[Ti] = 0
          for CF = 0 to nCf do
              Compute discount rate[CF]
              MTM[Ti] = MTM[Ti] + discount rate[CF]
          end for
      MTMfinal = ∑ MTM

  16. Results
      Sr. No. | Experiment | Time in Sec | Performance Gain
      1 | Sequential computation of 20000 trades with 150 cash flows each on 250 diff. yield curves | 81.15 | -
      2 | Parallel computation of 20000 trades with 150 cash flows each on 250 diff. yield curves on K20x | 9.612 | 8.44x
      Results taken on Kepler K20x system

  17. Further Optimization
      NVIDIA GPU optimization:
      • Multi level parallelism using Hyper-Q
      • Using shared memory with coalesced memory access
      • Modified data structure
      • Resolved the issue of warp divergence
      • Using constant memory
      • Read-only cache memory using const __restrict__

  18. Multi level parallelism using Hyper-Q
      • Allows connections from multiple CUDA streams, Message Passing Interface (MPI) processes, or multiple threads of the same process.
      • Provides 32 concurrent work queues, so the GPU can receive work from 32 process cores at the same time.
      • 1.5x performance benefit achieved.
      Figure source: nvidia.com

  19. GPU Algorithm for 250 MTM & IM Computation
      Compute zero rates and retrieve 249 previous zero rates
      Set 32 CUDA streams and nRate = 250
      Distribute nRate/32 computations to each of the 32 streams (S0, S1, S2, ..., S31)
      MTMfinal[nRate] = 0
      Each stream computes the MTM values for its share of the zero rates
      Compute IM using MTMfinal[nRate]
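
Since 250 is not a multiple of 32, the split across streams cannot be perfectly even. The slide does not say how the remainder is handled, so the round-robin assignment below is only one plausible scheme, and `assign_to_streams` is a hypothetical helper written in Python to show the arithmetic rather than any CUDA API.

```python
def assign_to_streams(n_rate, n_streams):
    """Assign zero-rate indices to streams in round-robin order (assumed scheme)."""
    assignment = [[] for _ in range(n_streams)]
    for rate_idx in range(n_rate):
        assignment[rate_idx % n_streams].append(rate_idx)
    return assignment


streams = assign_to_streams(250, 32)
sizes = [len(s) for s in streams]

# Number of streams carrying the larger and the smaller share.
streams_with_8 = sizes.count(8)
streams_with_7 = sizes.count(7)
```

Under this scheme 26 streams value 8 curves each and the remaining 6 streams value 7 each, so the longest stream does at most one curve more than the shortest.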

  20. Hyper-Q using default streaming
      nvcc --default-stream per-thread -c MTM_value.cu -arch sm_35 -w -Xcompiler -fopenmp

  21. Results
      Sr. No. | Experiment | Time in Sec | Performance Gain
      1 | Sequential computation of 20000 trades with 150 cash flows each on 250 diff. yield curves | 81.15 | -
      2 | Parallel computation of 20000 trades with 150 cash flows each on 250 diff. zero rates | 9.612 | 1x
      3 | Using default streaming flag | 6.24 | 1.54x
      Results taken on Kepler K20x system
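
The gains in the two result tables follow directly from the timings, as this small Python check shows. The dictionary keys and the `speedup` helper are names chosen here for illustration; the overall figure on the last line is derived from the reported timings and is not itself stated on the slides.

```python
# Timings in seconds as reported on the Kepler K20x system.
timings = {
    "sequential": 81.15,
    "parallel": 9.612,
    "parallel_default_stream": 6.24,
}


def speedup(baseline_seconds, optimized_seconds):
    # Gain is the ratio of the slower time to the faster time.
    return baseline_seconds / optimized_seconds


gain_parallel = speedup(timings["sequential"], timings["parallel"])
gain_streams = speedup(timings["parallel"], timings["parallel_default_stream"])
# Derived here: sequential baseline versus the streamed GPU version.
gain_overall = speedup(timings["sequential"], timings["parallel_default_stream"])
```

Note that the second table re-bases the gain on the plain parallel run (1x), so its 1.54x compounds with the 8.44x from the first table rather than replacing it.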

  22. Using Shared Memory
      • Read/write per-block
      • Speed equivalent to local cache
      • 100x faster than global memory
      • Limit up to 48 KB
      • Zero rate and swap tenor are kept in shared memory
      [Figure: GPU memory hierarchy with a grid of blocks (0, 0) and (1, 0), per-block shared memory, per-thread registers for threads (0, 0) and (1, 0), and global and constant memory accessible from the host. Figure source: nvidia.com]
