Near-Data Processing for Differentiable Machine Learning Models
  1. Near-Data Processing for Differentiable Machine Learning Models
Hyeokjun Choe¹, Seil Lee¹, Hyunha Nam¹, Seongsik Park¹, Seijoon Kim¹, Eui-Young Chung² and Sungroh Yoon¹,³*
¹Electrical and Computer Engineering, Seoul National University
²Electrical and Electronic Engineering, Yonsei University
³Neurology and Neurological Sciences, Stanford University
*Correspondence: sryoon@snu.ac.kr — Homepage: http://dsl.snu.ac.kr
Hyeokjun Choe et al. MSST 2017 May 19th, 2017 1 / 33

  2. Outline
1 Introduction
2 Background
3 Proposed Methodology
4 Experimental Results
5 Discussion and Conclusion

  4. Machine Learning's Success
Big data + powerful parallel processors ⇒ sophisticated models

  5. Issues with the Conventional Memory Hierarchy
Data movement across the memory hierarchy: computational efficiency ⇓, power consumption ⇑

  6. Near-Data Processing (NDP)
Memory or storage with intelligence (i.e., computing power)
Processes the data where it is stored in memory or storage
Reduces data movement and offloads the CPU

  7. ISP-ML
ISP-ML: a full-fledged ISP-supporting SSD platform
Easy to implement machine learning algorithms in C/C++
For validation, three SGD algorithms were implemented and experimented with on ISP-ML
[Figure: host system (CPU, main memory, OS, application) and SSD internals (host interface, SSD controller with embedded ARM processor and SRAM running ISP SW, cache controller, DRAM, per-channel controllers with ISP HW, NAND flash arrays)]

  9. Machine Learning as an Optimization Problem
Machine learning categories: supervised learning, unsupervised learning, reinforcement learning
The main goal of supervised machine learning: find the optimal θ that minimizes F(D, θ)
F(D, θ) = L(D, θ) + r(θ)   (1)
D: input data, θ: model parameters, L: loss function, r: regularization term, F: objective function
[Figure: neural network with input and output layers]

  10. Gradient Descent
θ_{t+1} = θ_t − η ∇F(D, θ_t)          (2)
        = θ_t − η Σ_i ∇F(D_i, θ_t)    (3)
η: learning rate, t: iteration index, i: data sample index
Gradient descent: 1st-order iterative optimization algorithm; uses all samples per iteration
Stochastic gradient descent (SGD): uses only one sample per iteration
Minibatch stochastic gradient descent: between gradient descent and SGD; uses multiple samples per iteration
(Image source: https://sebastianraschka.com/faq/docs/closed-form-vs-gd.html)

  11. Parallel and Distributed SGD
Synchronous SGD: the parameter server aggregates ∇θ_slave from the workers synchronously
Downpour SGD: workers communicate with the parameter server asynchronously
Elastic Averaging SGD (EASGD): each worker has its own parameters; workers transfer (θ_slave − θ_master), not ∇θ_slave

  12. Fundamentals of Solid-State Drives (SSDs)
SSD controller: embedded processor for the FTL (HDD emulation, wear leveling, garbage collection, etc.), cache controller, channel controllers
DRAM: cache and buffer, 512 MB - 2 GB
NAND flash arrays: simultaneously accessible across channels
Host interface logic: SATA, PCIe

  13. Previous Work on Near-Data Processing: PIM
Performs computation inside the main memory
3D-stacked memory (e.g., HMC) has recently been used for PIM, with processing units implemented in the logic layer
Applications: sorting, string matching, CNN, matrix multiplication, etc.

  14. Previous Work on Near-Data Processing: ISP
Performs computation inside the storage device
ISP with an embedded processor — Pros: easy to implement, flexible; Cons: no parallelism
ISP with dedicated hardware logic — Pros: channel parallelism, hardware acceleration; Cons: hard to implement and change
Applications: DB queries (scan, join), linear regression, k-means, string matching, etc.

  16. ISP-ML: ISP Platform for Machine Learning on SSDs
ISP-supporting SSD simulator implemented in SystemC on the Synopsys Platform Architect
Software/hardware co-simulation; easily executes various machine learning algorithms in C/C++
Transaction-level simulator, for reasonable simulation speed
ISP components: SRAM (ISP SW), embedded processor, ISP HW
[Figure: simulator architecture — host (CPU, main memory, OS, application) and SSD (host interface, embedded processor with SRAM running ISP SW, cache controller, DRAM, per-channel controllers with ISP HW, NAND flash)]

  17. ISP-ML: ISP Platform for Machine Learning on SSDs
We implemented two types of ISP hardware components in a master-slave architecture, communicating with each other:
Channel controllers (slaves): perform primitive operations on the stored data
Cache controller (master): collects the results from each of the channel controllers
[Figure: SSD internals highlighting the cache controller and the per-channel ISP HW]

  18. Parallel SGD Implementation on ISP-ML
[Figure: step-by-step animation of the parallel SGD data flow on ISP-ML]

  35. Methodology for IHP-ISP Performance Comparison
Ideal ways to fairly compare ISP and in-host processing (IHP):
1 Implement ISP-ML in a real semiconductor chip — high chip manufacturing cost
2 Simulate IHP in the ISP-ML framework — prohibitively long simulation time
3 Implement both ISP and IHP using FPGAs — requires another significant development effort
⇒ Hard to fairly compare the performance of ISP and IHP
⇒ We propose a practical comparison methodology
