Distributed Deep Learning Using Hopsworks
SF Machine Learning, Mesosphere
Kim Hammar, kim@logicalclocks.com
Distributed Computing + Deep Learning = ?

[Figure: a distributed compute cluster alongside a feed-forward neural network]

Why combine the two?
- We like challenging problems
- More productive data science
- Unreasonable effectiveness of data [1]
- To achieve state-of-the-art results [2]

[1] Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[2] Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223-1231.
Distributed Deep Learning (DDL): Predictable Scaling [3]

[3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL Is Not a Secret Anymore [4]

Companies using DDL: [Figure: company logos]
Frameworks for DDL: TensorflowOnSpark, Distributed TF, CaffeOnSpark

[4] Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
DDL Requires an Entire Software/Infrastructure Stack

[Figure: distributed training (workers e1-e4 exchanging gradients) sits at the center of a much larger stack]
- Data Collection
- Data Validation
- Feature Engineering
- Distributed Training
- Hyperparameter Tuning
- Model Serving
- A/B Testing
- Monitoring
- Hardware Management
- Pipeline Management
- Distributed Systems
Outline

1. Hopsworks: background of the platform
2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and TensorFlow
3. Black-Box Optimization using Hopsworks, Metadata Store, PySpark, and Maggy [5]

[5] Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
Hopsworks

Platform layers (bottom to top):
- HopsFS + Distributed Metadata (available from a REST API)
- HopsYARN (GPU/CPU as a resource)
- Frameworks (ML/Data)
- ML/AI Assets: Feature Store, Pipelines, Experiments, Models
- APIs, e.g.:

    from hops import featurestore
    from hops import experiment

    features = featurestore.get_features(
        ["average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
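To make the API slide concrete, below is a rough sketch of how the two modules are typically combined in a HopsML experiment: features are read from the feature store, and a training function is handed to the experiment module, which runs it on the cluster. The exact signature of experiment.collective_all_reduce (whether it takes a zero-argument training function, and which keyword arguments it accepts) is an assumption here and should be checked against the hops-util-py documentation; the function and metric below are illustrative placeholders.

    from hops import featurestore, experiment

    def train_fn():
        # Assumption: features are read inside the training function so that
        # each worker can build its own input pipeline from the feature store.
        df = featurestore.get_features(
            ["average_attendance", "average_player_age"])
        # ... build and train a model on df, then return the tracked metric ...
        return {"loss": 0.0}  # placeholder return value

    # Assumption: collective_all_reduce launches train_fn on several executors
    # and synchronizes gradients between them during training.
    experiment.collective_all_reduce(train_fn, name="attendance_model")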
Inner and Outer Loop of Large-Scale Deep Learning

[Figure: the outer loop is a search method that proposes hyperparameters h and receives back a metric τ; the inner loop is distributed training over the data, where workers 1..N compute gradients ∇_1..∇_N on their replicas and synchronize them.]
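A minimal, self-contained sketch of the outer loop, assuming plain random search: a search method proposes hyperparameters, the inner loop (simulated here by a stand-in train function) returns a metric τ, and the best configuration is kept. On Hopsworks this outer loop is what Maggy parallelizes across executors; the search space, train function, and metric below are illustrative placeholders, not the Maggy API.

    import math
    import random

    search_space = {
        "learning_rate": (1e-4, 1e-1),     # sampled log-uniformly
        "batch_size": [32, 64, 128, 256],  # sampled uniformly
    }

    def sample(space):
        """Draw one hyperparameter configuration from the search space."""
        lo, hi = space["learning_rate"]
        return {
            "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
            "batch_size": random.choice(space["batch_size"]),
        }

    def train(hparams):
        """Stand-in for the inner loop: run (distributed) training and
        return the metric tau that the search method minimizes."""
        # Hypothetical metric; a real implementation would train a model here.
        return hparams["learning_rate"] * hparams["batch_size"]

    best_tau, best_hparams = float("inf"), None
    for trial in range(20):                # outer loop: 20 trials
        hparams = sample(search_space)     # search method proposes hparams
        tau = train(hparams)               # inner loop returns metric tau
        if tau < best_tau:
            best_tau, best_hparams = tau, hparams

    print("best metric:", best_tau, "with", best_hparams)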
Inner Loop: Distributed Deep Learning

[Figure: four workers e1-e4, each reading its own data partition p1-p4, computing gradients ∇ locally and exchanging them with the other workers.]
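A minimal single-process simulation of what the figure shows, written with NumPy rather than a real DDL framework: each "worker" computes the gradient of a linear-regression loss on its own data partition, the gradients are averaged (the role the synchronization/all-reduce step plays), and every worker applies the same update so the model replicas stay identical. In a real deployment the same pattern runs in parallel across executors, e.g. via the collective_all_reduce experiment shown earlier.

    import numpy as np

    np.random.seed(0)
    num_workers, n_per_worker, dim = 4, 256, 3

    # Synthetic dataset, split into one partition per worker (p1..p4).
    true_w = np.array([2.0, -1.0, 0.5])
    partitions = []
    for _ in range(num_workers):
        X = np.random.randn(n_per_worker, dim)
        y = X @ true_w + 0.1 * np.random.randn(n_per_worker)
        partitions.append((X, y))

    def local_gradient(w, X, y):
        """Gradient of the mean-squared-error loss on one worker's partition."""
        return 2.0 / len(y) * X.T @ (X @ w - y)

    w = np.zeros(dim)   # every replica starts from the same weights
    lr = 0.1
    for step in range(100):
        # Each worker computes a gradient on its own partition (in parallel in
        # a real deployment; sequentially here for illustration).
        grads = [local_gradient(w, X, y) for X, y in partitions]
        # Synchronization step: average the gradients (what all-reduce does).
        avg_grad = np.mean(grads, axis=0)
        # Every worker applies the same update, so the replicas stay in sync.
        w -= lr * avg_grad

    print("learned weights:", np.round(w, 3))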