Learning Systems Research at the Intersection of Machine Learning & Data Systems


  1. Learning Systems Research at the Intersection of Machine Learning & Data Systems. Joseph E. Gonzalez, Asst. Professor, UC Berkeley (jegonzal@cs.berkeley.edu)

  2. Learning Systems: How can machine learning techniques be used to address systems challenges? How can systems techniques be used to address machine learning challenges?


  4. How can machine learning techniques be used to address systems challenges? Systems are getting increasingly complex:
  • Resource disaggregation → a growing diversity of system configurations and the freedom to add resources as needed
  • New pricing models → dynamic pricing and the potential to bid for different types of resources
  • Data-centric workloads → performance depends on the interaction between system, algorithms, and data

  5. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • What VM type should I use to run my experiment? [Figure: a cloud of AWS instance types, e.g. t2.micro, m3.medium, c4.xlarge, r3.2xlarge, g2.8xlarge, x1.32xlarge, …]

  6. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • What VM type should I use to run my experiment? [Figure: the same cloud of instance types, now labeled "54 Instance Types"]

  7. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • What VM type should I use to run my experiment? [Figure: clouds of instance types, labeled 54, 25, and 18]
  • Answer: workload specific, and depends on cost & runtime goals

  8. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • The best VM type depends on the workload as well as on cost & runtime goals
  • [Figure: price vs. runtime for each VM type. Which VM will cost me the least? Is m1.small, the cheapest per hour, the answer?]

  9. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • The best VM type depends on the workload as well as on cost & runtime goals
  • Cost = Price × Job Runtime, so choosing well requires accurate runtime prediction. (A sketch of this cost model follows below.)
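A tiny illustrative sketch (in Python; the prices and runtimes below are hypothetical, not from the talk) of the slide's cost model: the cheapest VM per hour is not necessarily the cheapest for the job, which is exactly why accurate runtime prediction matters.

    # Cost = Price x Job Runtime; all numbers below are made up for illustration.
    prices = {"m1.small": 0.04, "c4.xlarge": 0.20, "r3.2xlarge": 0.67}   # $/hour
    predicted_hours = {"m1.small": 9.0, "c4.xlarge": 1.5, "r3.2xlarge": 0.8}

    cost = {vm: prices[vm] * predicted_hours[vm] for vm in prices}
    best = min(cost, key=cost.get)
    # m1.small is cheapest per hour, yet c4.xlarge wins on total cost here.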

  10. PARIS: Performance-Aware Runtime Inference System (Neeraja Yadwadkar, Bharath Hariharan, Randy Katz)
  • Goal: predict the runtime of workload w on VM type v
  • Challenge: how do we model workloads and VM types?
  • Insight: benchmarking. Extensive benchmarking models the relationships between VM types (vm1, vm2, …, vm100); it is costly, but it runs once for all workloads.
  • Lightweight workload "fingerprinting" on a small set of test VMs then generalizes the workload's performance to the other VM types.
  • Results: runtime prediction with 17% relative RMSE (vs. a 56% baseline). (A sketch of the fingerprint-then-predict idea follows below.)
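A minimal sketch of the fingerprint-then-predict idea, assuming scikit-learn and a random-forest regressor; the features, training data, and model choice below are placeholders for illustration, not PARIS's actual implementation.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Training rows: offline benchmark features of a VM type, concatenated with a
    # workload fingerprint gathered on a few test VMs; target: observed runtime.
    X_train = rng.random((500, 20))   # placeholder (vm features || fingerprint)
    y_train = rng.random(500)         # placeholder runtimes
    model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

    def predict_runtime(vm_features, fingerprint):
        # Predict the fingerprinted workload's runtime on an unseen VM type.
        x = np.concatenate([vm_features, fingerprint])[None, :]
        return model.predict(x)[0]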

  11. Hemingway*: Modeling Throughput and Convergence for ML Workloads (Shivaram Venkataraman, Xinghao Pan, Zi Zheng)
  • What is the best algorithm and level of parallelism for an ML task?
  • Trade-off: parallelism, coordination, & convergence
  • Research challenge: can we model this trade-off explicitly?
  • Systems metric: I(p), iterations per second as a function of cores p. We can estimate I from data on many systems.
  • ML metric: L(i, p), loss as a function of iterations i and cores p. We can estimate L from data for our problem.
  *Follow-up work to Shivaram's Ernest paper.

  12. Hemingway*: Modeling Throughput and Convergence for ML Workloads (Shivaram Venkataraman, Xinghao Pan, Zi Zheng)
  • What is the best algorithm and level of parallelism for an ML task?
  • Trade-off: parallelism, coordination, & convergence
  • Research challenge: can we model this trade-off explicitly?
  • Composing the two models gives loss as a function of wall-clock time and cores: loss(t, p) = L(t · I(p), p)
  • How long does it take to reach a given loss? Given a time budget and a number of cores, which algorithm will give the best result? (A sketch of this composition follows below.)
  *Follow-up work to Shivaram's Ernest paper.
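A minimal sketch of composing the two fitted models; the functional forms of I(p) and L(i, p) below are hypothetical stand-ins for curves Hemingway would fit from data, used only to show how the composition answers the planning question.

    # I(p): iterations/second as a function of cores p (throughput model).
    def I(p, a=10.0, b=0.05):
        return a * p / (1.0 + b * p)            # sublinear scaling with cores

    # L(i, p): loss after i iterations on p cores (convergence model).
    def L(i, p, c=1.0):
        return c * (1.0 + 0.1 * p) / (1.0 + i)  # more parallelism costs convergence per iteration

    # loss(t, p) = L(t * I(p), p): loss as a function of time and cores.
    def loss_at_time(t, p):
        return L(t * I(p), p)

    # Given a time budget, choose the core count with the lowest predicted loss.
    budget = 600.0                               # seconds (hypothetical)
    best_p = min([1, 2, 4, 8, 16, 32, 64], key=lambda p: loss_at_time(budget, p))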

  13. Deep Code Completion: neural architectures for reasoning about programs (Xin Wang, Chang Liu, Dawn Song)
  • Goals: smart naming of variables and routines; learn coding styles and patterns; predict large code fragments
  • Approach: char and symbol LSTMs
  • But programs are more tree shaped… Example from the slide:
        def fib(x):
            if x < 2:
                return x
            else:
                y = fib(x-1) + fib(x-2)
                return y

  14. Deep Code Completion: neural architectures for reasoning about programs (Xin Wang, Chang Liu, Dawn Song)
  • Goals: smart naming of variables and routines; learn coding styles and patterns; predict large code fragments
  • Approach: char and symbol LSTMs
  • Programs are more tree shaped… [Figure: the fib example rendered as a parse tree: def fib(x); if x < 2; return x; else y = fib(x-1) + fib(x-2); return y]

  15. Deep Code Completion: neural architectures for reasoning about programs (Xin Wang, Chang Liu, Dawn Song)
  • Goals: smart naming of variables and routines; learn coding styles and patterns; predict large code fragments
  • Approach: char and symbol LSTMs; exploring Tree LSTMs
  • Issue: dependencies flow in both directions [Figure: the fib parse tree]
  Kai Sheng Tai, Richard Socher, Christopher D. Manning. "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks." ACL 2015.

  16. Deep Code Completion: neural architectures for reasoning about computer programs (Xin Wang, Chang Liu, Dawn Song)
  • Goals: smart naming of variables and routines; learn coding styles and patterns; predict large code fragments
  • Currently studying the char-LSTM (vanilla LSTM) and the Tree-LSTM on benchmark C++ and JavaScript code
  • Plan to extend the Tree-LSTM with downward information flow. (A minimal char-LSTM sketch follows below.)
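A minimal char-LSTM sketch for next-character code completion, assuming PyTorch; it illustrates the general technique named on the slide, not the project's actual model or hyperparameters.

    import torch
    import torch.nn as nn

    class CharLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            # x: (batch, seq_len) tensor of character ids
            h, state = self.lstm(self.embed(x), state)
            return self.head(h), state          # logits over the next character

    def complete(model, prefix_ids, n_chars):
        # Greedy decoding: feed a code prefix, then emit the most likely next chars.
        model.eval()
        out = []
        with torch.no_grad():
            logits, state = model(torch.tensor([prefix_ids]))
            nxt = int(logits[0, -1].argmax())
            for _ in range(n_chars):
                out.append(nxt)
                logits, state = model(torch.tensor([[nxt]]), state)
                nxt = int(logits[0, -1].argmax())
        return out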

  17. Fun Code Sample Generated by the Char-LSTM [Figure: a code prefix and the generated code sample]
  • For now, the neural network can learn some code patterns, such as matching parentheses and if-else blocks, but the variable-naming issue still hasn't been solved.
  • Trained on LeetCode OJ code submissions from GitHub.

  18. Learning Systems: How can machine learning techniques be used to address systems challenges? How can systems techniques be used to address machine learning challenges?


  20. Systems for Machine Learning: Training [Figure: Big Data → Training → Big Model]
  • Timescale: minutes to days
  • Systems: offline and batch optimized
  • Heavily studied... the primary focus of ML research

  21. Training [Figure: Big Data → Training → Big Model]: CoCoA, Splash (please make a logo!)

  22. Training [Figure: Big Data → Training → Big Model]: CoCoA, Splash (please make a logo!), Temgine

  23. Temgine: A Scalable Multivariate Time Series Analysis Engine (Francois Belletti, Xin Wang, Evan Sparks)
  • Challenge: estimate second-order statistics (e.g. auto-correlation, auto-regressive models, …) for high-dimensional & irregularly sampled time series
  • [Figure: regularly sampled sensors, whose samples are easy to align, vs. irregularly sampled sensors at times t0…t6, which are difficult to align (alignment requires sorting)]

  24. Temgine: A Scalable Multivariate Time Series Analysis Engine (Francois Belletti, Xin Wang, Evan Sparks)
  • Challenge: estimate second-order statistics (e.g. auto-correlation, auto-regressive models, …) for high-dimensional & irregularly sampled time series
  • Solution: project onto a Fourier basis (does not require data alignment) and infer statistics in the frequency domain (equivalent to kernel smoothing, with an analysis of the bias-variance tradeoff)

  25. Temgine: A Scalable Multivariate Time Series Analysis Engine (Francois Belletti, Xin Wang, Evan Sparks)
  • Challenge: estimate second-order statistics (e.g. auto-correlation, auto-regressive models, …) for high-dimensional & irregularly sampled time series
  • Solution: project onto a Fourier basis (does not require data alignment) and infer statistics in the frequency domain (equivalent to kernel smoothing, with an analysis of the bias-variance tradeoff)
  • Temgine: define an operator DAG (like TensorFlow) and then rely on query optimization for efficient execution. (A sketch of the Fourier projection follows below.)
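A minimal NumPy sketch of the frequency-domain idea: project each irregularly sampled series onto a Fourier basis (no alignment or sorting needed) and form second-order statistics there. The function names and the simple unsmoothed estimator are my own illustration, not Temgine's API; in practice the raw cross-spectrum would be kernel-smoothed.

    import numpy as np

    def fourier_coeffs(t, x, freqs):
        # Project samples x taken at arbitrary times t onto a Fourier basis:
        # X(f) = sum_k x_k * exp(-2*pi*i*f*t_k); no alignment of timestamps needed.
        return np.exp(-2j * np.pi * np.outer(freqs, t)) @ x

    def cross_spectrum(t1, x1, t2, x2, freqs):
        # Second-order statistic in the frequency domain (raw, unsmoothed).
        return fourier_coeffs(t1, x1, freqs) * np.conj(fourier_coeffs(t2, x2, freqs))

    def cross_correlation(t1, x1, t2, x2, freqs, lags):
        # Transform the cross-spectrum back to the lag domain.
        S = cross_spectrum(t1, x1, t2, x2, freqs)
        return np.real(np.exp(2j * np.pi * np.outer(lags, freqs)) @ S) / len(freqs)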

  26. Learning [Figure: Big Data → Training → Big Model]

  27. Learning and Inference [Figure: learning: Big Data → Training → Big Model; inference: Query → Big Model → Decision, feeding an application]

  28. Inference [Figure: learning: Big Data → Training → Big Model; inference: Query → Big Model → Decision, feeding an application]
  • Timescale: ~10 milliseconds
  • Systems: online and latency optimized
  • Less studied…
