  1. CS 6453: Parameter Server Soumya Basu March 7, 2017

  2. What is a Parameter Server? • Server for large-scale machine learning problems • Machine learning tasks in a nutshell: raw data → Feature Extraction → feature vectors (1, 1, 1), (2, -1, 3), (5, 6, 7), … → Training • Design a server that makes the above fast!

  3. Why Now? • Machine learning is important! • Read the news to see why… • Feature extraction fits nicely into Map-Reduce • Many systems take care of this problem… • So, the parameter server focuses on training models

  4. Training in ML • Training consists of the following steps: 1. Initialize the model with small random values 2. Try to guess the right answer for your input set 3. Adjust the model 4. Repeat steps 2-3 until your error is small enough
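
A minimal sketch of this loop in Python (a hypothetical least-squares model using NumPy; illustrative only, not code from the talk):

    import numpy as np

    def train(X, y, lr=0.01, epochs=100, tol=1e-4):
        """Iterative training: random init, guess, adjust, repeat."""
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.01, size=X.shape[1])   # 1. initialize with small random values
        for _ in range(epochs):                       # 4. repeat steps 2-3
            pred = X @ w                              # 2. guess the answer for every input
            grad = X.T @ (pred - y) / len(y)          # how wrong the guesses were
            w -= lr * grad                            # 3. adjust the model slightly
            if np.linalg.norm(grad) < tol:            # stop once the error is small enough
                break
        return w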

  5. Systems View of Training • Initialize the model with small random values • Paid once; fairly trivial to parallelize • Try to guess the right answer for your input set • Iterate through the input set many, many times • Adjust the model • Send a small update to the model parameters

  6. Key Challenges • Three main challenges of implementing a parameter server: • Accessing parameters requires lots of network bandwidth • Training is sequential and synchronization is hard to scale • Fault tolerance at scale (~25% failure rate for 10k machine-hour jobs)

  7. First Attempts • First attempts used memcached for synchronization [VLDB 2010] • Key-value stores have very large overheads • Synchronization costs are expensive and not always necessary

  8. Second Generation • Second-generation attempts were application-specific parameter servers [WSDM 2012, NIPS 2012, NIPS 2013] • This fails to factor out the difficulties common to many different types of problems • Difficult to deploy multiple algorithms in parallel

  9. General Purpose ML • General purpose machine-learning frameworks • Many have synchronization points -> difficult to scale • Key observation: cache state between iterations

  10. GraphLab • Distributed GraphLab [PVLDB 2012] • Uses coarse-grained snapshots for fault tolerance, impeding scalability • Doesn’t scale elastically like map-reduce frameworks • Asynchronous task scheduling is the main contribution

  11. Piccolo • Piccolo [OSDI 2010] • The prior system most similar to this paper • Not optimized for machine learning workloads, though

  12. Technical Contribution • Recall the three main challenges: • Accessing parameters requires lots of network bandwidth • Training is sequential and synchronization is hard to scale • Fault tolerance at scale

  13. Dealing with Parameters • What are parameters of a ML model? • Usually an element of a vector, matrix, etc. • Need to do lots of linear algebra operations • Introduce new constraint: ordered keys • Typically some index into a linear algebra structure

  15. Dealing with Parameters • High model complexity leads to overfitting, so models are kept sparse through regularization • Individual updates therefore touch only a small fraction of the parameters • Range push-and-pull: update a whole range of keys in one request instead of a single key at a time • When sending ranges, compress the data to exploit this sparsity
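
A rough sketch of range pull and push from a worker's point of view (PSClient and its methods are hypothetical stand-ins, not the paper's actual API):

    import numpy as np

    class PSClient:
        """Toy stand-in for a parameter server storing ordered integer keys."""
        def __init__(self):
            self.store = {}

        def pull(self, lo, hi):
            # Pull a contiguous key range [lo, hi) in one request.
            return np.array([self.store.get(k, 0.0) for k in range(lo, hi)])

        def push(self, lo, hi, deltas):
            # Push updates for a key range; skipping zeros mimics the effect
            # of compressing a sparse range before sending it.
            for k, d in zip(range(lo, hi), deltas):
                if d != 0.0:
                    self.store[k] = self.store.get(k, 0.0) + d

    ps = PSClient()
    w = ps.pull(0, 8)                   # one round trip for a slice of the model
    grad = np.zeros(8); grad[3] = -0.5
    ps.push(0, 8, grad)                 # only the touched key actually changes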

  16. Synchronization • ML models try to find a good local optimum • Updates only need to be generally in the right direction • Strong consistency guarantees are not always necessary • The parameter server introduces a bounded-delay consistency model
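
One way to picture bounded delay, as a hypothetical scheduling check rather than the paper's implementation: a worker may start iteration t only if no iteration older than t - max_delay is still unfinished. A bound of 0 recovers bulk-synchronous execution; a very large bound approaches the fully asynchronous case.

    def can_start(iteration, finished, max_delay):
        """Return True if `iteration` may begin, given the set of finished iterations."""
        oldest_unfinished = min(
            (t for t in range(iteration) if t not in finished),
            default=iteration,
        )
        return iteration - oldest_unfinished <= max_delay

    # max_delay = 0: iteration 3 must wait for 0, 1, and 2 (bulk synchronous).
    # max_delay = 2: iteration 3 may run while iteration 1 is still in flight.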

  17. Fault Tolerance • Servers store all state; workers are stateless • However, workers cache state across iterations • Key ranges are replicated across servers for fault tolerance • Tasks are rerun if a worker fails
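
A simplified picture of the replication idea (round-robin placement of key ranges is assumed here; the paper's actual replication protocol is not shown):

    def assign_replicas(key_ranges, servers, k=2):
        """Give each key range a primary plus k backup servers, so that
        losing a single server does not lose any parameter state."""
        n = len(servers)
        placement = {}
        for i, key_range in enumerate(key_ranges):
            owners = [servers[(i + j) % n] for j in range(k + 1)]
            placement[key_range] = {"primary": owners[0], "backups": owners[1:]}
        return placement

    # assign_replicas([(0, 100), (100, 200), (200, 300)], ["s0", "s1", "s2"], k=1)
    # -> {(0, 100): {'primary': 's0', 'backups': ['s1']}, ...}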

  18. Evaluation

  19. Evaluation

  20. Limitations • Evaluation was done on specially designed ML algorithms • Distributed regression and distributed gradient descent • How fast is it on a sequential algorithm? • Count-Min Sketch is trivially parallelizable • No neural networks evaluated?

  21. Future Work • What happens to sequential ML algorithms? • Synchronization cost ignored, rather than resolved • Where are the bottlenecks of synchronization? • Lots of waiting time, but on what resource(s)?
