NSML: A Machine Learning Platform That Enables You to Focus on Your Models. ML-Sys WS 2017 @ NIPS. Nako Sung, Minkyu Kim, Hyunwoo Jo, Youngil Yang, Jinwoong Kim, Leonard Lausen, Youngkwan Kim, Gayoung Lee, Donghyun Kwak, Jung-Woo Ha, and Sunghun Kim. CLOVA AI Research (CLAIR), NAVER | LINE, Search Solution, NAVER Webtoon, HKUST

What is NSML? • A machine learning platform that enables you to focus on your models • Two options: on-premise / PaaS

https://xkcd.com/303/

https://www.youtube.com/watch?v=lxZyxxHOw3Y Wasted Time

https://www.formula1.com/en/latest/features/2017/2/F1-cars-of-2017.html Importance of Fast Machines (Multiple Servers and GPUs)

https://www.sportskeeda.com/f1/what-happens-during-f1-pit-stop ML Research Challenges: Incidental Tasks

ML Research Challenges: Resource Scheduling and Utilization • 14 GPUs available but only 7 GPUs can be used in a single machine. [Figure: four machines with four GPUs each; two GPUs on the first machine are busy and the rest are idle, while heavy models wait to be placed]

https://livingthing.danmackinlay.name/automl.html ML Research Challenges: Hyperparameter Tuning

ML Research Challenges: Multiple Experiments [Figure: TensorBoard and Visdom dashboards for four sessions, two marked TRAINING and two marked DONE, with settings γ=1e-2; γ=0.3, K=1; γ=0.1; γ=0.2]

https://www.linkedin.com/pulse/protecting-workers-who-work-alone-sandie-baillargeon ML Research Challenges: Isolated Researchers

Challenges • Slack • Incidental tasks • Inefficient resource utilization • Naive hyperparameter tuning • Painful tracking of multiple sessions • Isolated researchers

Requirements of ML Platforms • Resource Management • Better computational resource management • Data Management • Post datasets once and reuse them for multiple models • Share datasets with others • Serverless Configuration • No framework / library lock-in • Easy and lightweight task submission

Requirements of ML Platforms • Experiment Management and Visualization • Parallel runs with different job priorities • Automatic visualization and summarization of learning progress • Leaderboard • Leaderboard for each dataset to compare models and hyperparameters • AutoML • Experiment performance prediction based on previously run experiments • Automatic hyperparameter optimization based on the performance predictions

Limitations of Previous Solutions • Vendor lock-in (Cloud service) • Inefficient model experiments • Inconsistent research environments • Still hard to keep track of experiments

This work was done for NCSoft and was presented at Nvidia GTC Korea 2015. My Previous Work in Early 2015 MINI

URI: {Dataset} / {User id} / {Session id} / {Model id} • Every dataset, session, and model has a uniform resource identifier. • CIFAR_10: CIFAR 10 dataset • CIFAR_10/researcher_A/24: researcher_A's 24th session for CIFAR_10 • CIFAR_10/researcher_A/24/322: snapshot from epoch 322
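The URI scheme above can be illustrated with a small parser. This is a minimal sketch, not NSML's implementation: the `ResourceURI` class, the `parse_uri` function, and the choice to treat session and model ids as integers are assumptions made for illustration, and only the three URI shapes shown on the slide are accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceURI:
    """Hypothetical container for a {Dataset}/{User id}/{Session id}/{Model id} URI."""
    dataset: str
    user_id: Optional[str] = None
    session_id: Optional[int] = None
    model_id: Optional[int] = None

    def kind(self) -> str:
        # The most specific component present decides what the URI names.
        if self.model_id is not None:
            return "model"
        if self.session_id is not None:
            return "session"
        return "dataset"


def parse_uri(uri: str) -> ResourceURI:
    """Parse one of the three URI shapes shown on the slide."""
    parts = [p.strip() for p in uri.split("/")]
    if any(not p for p in parts):
        raise ValueError(f"empty component in URI: {uri!r}")
    if len(parts) == 1:
        return ResourceURI(parts[0])
    if len(parts) == 3:
        return ResourceURI(parts[0], parts[1], int(parts[2]))
    if len(parts) == 4:
        return ResourceURI(parts[0], parts[1], int(parts[2]), int(parts[3]))
    raise ValueError(f"unsupported URI shape: {uri!r}")


if __name__ == "__main__":
    for text in ("CIFAR_10", "CIFAR_10/researcher_A/24", "CIFAR_10/researcher_A/24/322"):
        parsed = parse_uri(text)
        print(text, "is a", parsed.kind())
```

Keeping the dataset as the first component matches the dataset-centric organization described later in the deck: every session and snapshot is addressable under the dataset it was trained on.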

Easy One-Liner CLI • Dataset registration • Train • Serve

Parallel Experiments to Kill Slack • Distributed responses [Figure: timeline of Exp. #1, Exp. #2 variation 1, Exp. #2 variation 2, and Exp. #3 plotted against time]

https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Need to Visualize • Balance your brain to understand without effort

Flexible Analysis [Figure: three versions of your code (@1, @2, @3), labeled DONE or TRAINING, connected to the NSML visualization tool]

Dynamic Control Flow • Typical training loop: forward pass, backward pass, communicate to NSML • NSML command queue: 1. model (watch a variable) 2. change_lr(0.2) (change a hyperparameter on the fly) 3. nsml.save('quick') (save current snapshot) 4. nsml.load(424) (load saved snapshot) 5. vis.image(model.generate(2)) (generate an image to visdom) …
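The idea of a training loop that drains a command queue after each step can be sketched as follows. This is an illustrative sketch under stated assumptions, not NSML's API: the `Trainer` class, its `(name, args)` command format, and the in-memory snapshot dictionary are all hypothetical, and the forward and backward passes are left as placeholders.

```python
import queue


class Trainer:
    """Hypothetical trainer that applies queued commands between steps."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.step = 0
        self.snapshots = {}            # snapshot key -> saved state
        self.commands = queue.Queue()  # filled by an external controller
        # Only these commands may be invoked from the queue.
        self._handlers = {
            "change_lr": self.change_lr,
            "save": self.save,
            "load": self.load,
        }

    def change_lr(self, value):
        self.lr = value

    def save(self, key):
        self.snapshots[key] = {"step": self.step, "lr": self.lr}

    def load(self, key):
        state = self.snapshots[key]
        self.step, self.lr = state["step"], state["lr"]

    def _drain_commands(self):
        # Apply every pending command without blocking the training loop.
        while True:
            try:
                name, args = self.commands.get_nowait()
            except queue.Empty:
                return
            self._handlers[name](*args)

    def train(self, num_steps):
        for _ in range(num_steps):
            # Placeholder for the forward and backward passes.
            self.step += 1
            # "Communicate" phase: pick up commands sent during this step.
            self._drain_commands()


if __name__ == "__main__":
    trainer = Trainer(lr=0.1)
    trainer.commands.put(("change_lr", (0.2,)))
    trainer.commands.put(("save", ("quick",)))
    trainer.train(3)
    print(trainer.lr, trainer.snapshots)
```

Draining the queue at a fixed point in each iteration keeps hyperparameter changes and snapshot operations from interleaving with a half-finished update.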

CLI • Base of advanced features like save, load, infer, …

Bring Your Own Workspace • (Almost) Nothing to learn • Cached (Fast)

No Framework Lock-in

Interactive Mode [Figure: GPU server 10.0.0.1 running python your_model.py, with its stdout]

Pragmatic Research

New Workflow for ML Research Collaboration and Competition Leaderboard, CI-ML

Collaborative Research • Easy to reproduce and extend others' research.

Cohesive and Competitive • Dataset-centric environment • Models are ranked automatically • Standardized and quantified • Easy to compete • Towards AutoML

AutoML • Quantitative model analysis turns the ML workflow into a gym for AutoML

Seamless Connection to Services • Dataset ASR leaderboard: Bob's model 12 (98.2%), Bob's model 13 (94.2%), Alice's model 4 (92.1%); then Alice's model 5 (98.3%) is added • REST API • SOTA server: https://service.nsml.navercorp.com/ASR
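Automatic ranking on a per-dataset leaderboard can be sketched in a few lines. This is only an illustration using the scores shown on the slide: the `Entry` class and the `rank` and `best` helpers are hypothetical names, and treating a higher percentage as better is an assumption, since the slide does not name the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical leaderboard row for one model on one dataset."""
    owner: str
    model_id: int
    score: float  # percentage from the slide; assumed higher is better


def rank(entries):
    """Return entries ordered from best to worst score."""
    return sorted(entries, key=lambda e: e.score, reverse=True)


def best(entries):
    """Return the top entry, i.e. the model a serving endpoint could pick."""
    return max(entries, key=lambda e: e.score)


if __name__ == "__main__":
    asr = [
        Entry("Bob", 12, 98.2),
        Entry("Bob", 13, 94.2),
        Entry("Alice", 4, 92.1),
    ]
    print("top before:", best(asr))
    asr.append(Entry("Alice", 5, 98.3))
    print("top after:", best(asr))
    for position, entry in enumerate(rank(asr), start=1):
        print(position, entry.owner, entry.model_id, entry.score)
```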

Q1. 2018

https://research.clova.ai/nsml-alpha Thank you Several Hundreds of GPUs for this alpha (free)
