Using NVIDIA cuDF to Simplify and Accelerate Data Prep for Credit Card Algo. Prediction


  1. Using NVIDIA cuDF to Simplify and Accelerate Data Prep for Credit Card Algo. Prediction
     March 19, 2019
     Richard Liu, Vice President

  2. Agenda
     • Macroeconomic trends
     • Behavioral surplus
     • Paradigm shift
     • Deep dive into the data
     • How RAPIDS/cuDF helps

  3. Perspective on the challenges
     Business case: The credit card business now faces challenges in risk management and, more importantly, in understanding payment and transaction behavior. Conventional balance-sheet data approaches can hardly meet these new requirements.
     [Figure: U.S. Credit Cards, 1984Q1 to 2018Q1. Series: total outstanding (amounts in $ million, left axis) and noncurrent rate (%, right axis). Source: FDIC, Loan Performance (as of 2018/Q4).]

  4. Trade secret on behavioral surplus
     Traditional vs. Digital Age: action to behavior to data to prediction; surveillance capitalism; pool-level thinking; bookkeeping.

  5. Paradigm Shift
     But … how to walk the talk?

  6. Now We Have a New Way to Look at Data
     Examples (simulated data for illustration purposes):
     • Customer ID: cust_id
     • Merchant category code: mcc
     • Transaction date: trans_date
     • Dollar amount: trans_amt
     Array objects after the pivoting process: [array of mcc], [array of trans_date], [array of trans_amt]
     Neuroscience observation on customer behavior (visualization)
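     The pivoting step on this slide can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the rows are made-up values, the helper name `pivot_by_customer` is hypothetical, and ordering each customer's arrays by `trans_date` is an assumption the slide does not state. A dataframe library would typically express the same idea as a group-by on `cust_id` that collects each column into a list.

```python
from collections import defaultdict
from datetime import date

# Simulated transactions (illustrative values only): one row per purchase.
transactions = [
    {"cust_id": "c1", "mcc": 5411, "trans_date": date(2019, 1, 3), "trans_amt": 42.10},
    {"cust_id": "c2", "mcc": 5812, "trans_date": date(2019, 1, 4), "trans_amt": 18.00},
    {"cust_id": "c1", "mcc": 5541, "trans_date": date(2019, 1, 9), "trans_amt": 30.25},
    {"cust_id": "c1", "mcc": 5411, "trans_date": date(2019, 1, 6), "trans_amt": 12.50},
]


def pivot_by_customer(rows):
    """Collect each customer's rows into three parallel, date-ordered arrays.

    Hypothetical helper: the name and the date ordering are assumptions.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cust_id"]].append(row)

    pivoted = {}
    for cust_id, cust_rows in grouped.items():
        # Assumption: arrays are ordered by transaction date.
        cust_rows.sort(key=lambda r: r["trans_date"])
        pivoted[cust_id] = {
            "mcc": [r["mcc"] for r in cust_rows],
            "trans_date": [r["trans_date"] for r in cust_rows],
            "trans_amt": [r["trans_amt"] for r in cust_rows],
        }
    return pivoted


pivoted = pivot_by_customer(transactions)
```

     After the pivot, each customer is a single record holding three aligned arrays, so position k in every array refers to the same transaction.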

  7. Why RAPIDS cuDF
     “Progress so far has largely been toward demonstrating general approaches for building narrow systems rather than general approaches for building general systems. Progress toward the former does not entail substantial progress toward the latter.”
     AlphaGo and AI Progress. Retrieved October 24, 2017, from http://www.milesbrundage.com/blog-posts/alphago-and-ai-progress.
     Our expectation:
     • An efficient way to deal with very sparse data against computation
     • Performance with ease of programming (Python Pandas like)
     • Much better return on GPU solution investment
     The advantage of modern computation:
     Functional language:
       increment :: [Int] -> [Int]
       increment = map (1+)
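     For readers more at home in Python than in functional notation, the `increment` one-liner on this slide corresponds roughly to the sketch below. It is an illustrative translation, not code from the talk. The point of the style is that each element is transformed independently of the others, which is what makes such operations straightforward to spread across many cores.

```python
def increment(values):
    """Return a new list with 1 added to every element; the input is left unchanged."""
    # map applies the same function to each element independently.
    return list(map(lambda x: 1 + x, values))


amounts = [10, 20, 30]
incremented = increment(amounts)
```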

  8. How RAPIDS Helps on Transactions over a Time Horizon
     An easier yet efficient way to resolve the chronic “horizon stacking” data problem.
     Conventional:
       SELECT COUNT(), SUM(), STD(),
              PARTITION BY ()   # Window function
       FROM … LEFT JOIN …
       GROUP BY …
     Distributed over GPU cores:
       PATTERN = HORIZON();
       Data object
       (0 until array.length)
         .map( i => PATTERN
         .addData( attributes( i ), array( i )))
     Smart distributed computation by RAPIDS
     Time interval = 1…n
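     One way to read “horizon stacking” is computing the same aggregates (count, sum, standard deviation) over several look-back windows for each customer. The plain-Python sketch below illustrates that reading on one customer's pivoted arrays. The function name `horizon_features`, the day-based windows, the half-open window boundary, and the use of the population standard deviation are all assumptions made for illustration; none of them comes from the slide or from the RAPIDS API.

```python
from datetime import date, timedelta
from statistics import pstdev


def horizon_features(trans_dates, trans_amts, as_of, horizons):
    """Summarise amounts per look-back horizon (in days) ending at as_of.

    Hypothetical helper: a transaction falls in a horizon of `days` when
    as_of - days < trans_date <= as_of.
    """
    features = {}
    for days in horizons:
        start = as_of - timedelta(days=days)
        window = [
            amt
            for d, amt in zip(trans_dates, trans_amts)
            if start < d <= as_of
        ]
        features[days] = {
            "count": len(window),
            "sum": sum(window),
            # Population standard deviation; 0.0 for an empty window (assumption).
            "std": pstdev(window) if window else 0.0,
        }
    return features


# One customer's pivoted arrays (simulated values).
dates = [date(2019, 1, 3), date(2019, 1, 6), date(2019, 1, 9)]
amts = [40.0, 10.0, 30.0]
feats = horizon_features(dates, amts, as_of=date(2019, 1, 10), horizons=[3, 7, 30])
```

     Because each customer's arrays are processed independently, the same function can be applied to every customer in parallel, which is the property the slide attributes to the GPU approach.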

  9. Challenges from DL computation
     With the conventional table approach, how do we find a departure from the prevailing deep learning zeitgeist that prizes learning from scratch, tabula rasa?
     Table with system records; Hebbian-learning-like representation; SDR

  10. cuDF with a Better Format for Deep-Learning-Like Computation
      Inspiration from Recursive Cortical Network, Hierarchical Temporal Memory

      function feature_map( hierarchical, data[1..T], C )
        levels[1..L] <- hierarchical.levels
        for l <- 1 to L do
          regions <- levels[l].regions
          for all r in regions do
            Until spatialPooling converged for r
              for t <- 1 to T do
                spatialPooling( r, data[t] )
              end for
            end Until
            for c <- 1 to C do
              for t <- 1 to T do
                spatialPooling( r, data[t] )
                Sparse_Data_Representation <- pivoted_array
                Time_Horizon_Pooling( r, Sparse_Data_Representation )
              end for
            end for
          end for
        end for
      end function

      * Dileep George et al. Science 2017;358:eaag2612 (published by AAAS)

  11. How Much RAPIDS Helps
      • Speed, speed, speed! Things you should know by yesterday.
      • More time to think (smart machine for smart people).
        – Feature engineering (more, and more accurate)
        – Computational significance (less data yet robust to noise)
      Dileep George et al. Science 2017;358:eaag2612 (published by AAAS)
      GitHub scripts: https://github.com/vicariousinc/science_rcn

  12. Thank you
