From Python to PySpark and Back Again - Unifying Single-host and Distributed Machine Learning with Maggy
Moritz Meister, @morimeister - Software Engineer, Logical Clocks
Jim Dowling, @jim_dowling - Associate Professor, KTH Royal Institute of Technology
ML Model Development - a simplified view
Stages: Feature Pipelines, Exploration, Experimentation, Model Training, Explainability and Validation, Serving
ML Model Development - it’s simple, only four steps:
Explore and Design → Experimentation: Tune and Search → Explainability and Ablation Studies → Model Training (Distributed)
Artifacts and Non-DRY Code
Explore and Design → Experimentation: Tune and Search → Explainability and Ablation Studies → Model Training (Distributed)
What It’s Really Like … not linear but iterative
What It’s Really Really Like … not linear but iterative
Root Cause: Iterative Development of ML Models
Explore and Design → Experimentation: Tune and Search → Explainability and Ablation Studies → Model Training (Distributed)
Iterative Development Is a Pain, We Need DRY Code!
Each step requires different implementations of the training code:
EDA → HParam Tuning → Ablation Studies → Training (Dist)
The Oblivious Training Function

    # RUNS ON THE WORKERS
    def train():
        def input_fn():
            # return dataset
            …
        model = …
        optimizer = …
        model.compile(…)
        rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
        keras_estimator = tf.keras.estimator.model_to_estimator(…)
        tf.estimator.train_and_evaluate(keras_estimator, input_fn)

One oblivious training function across all steps: EDA → HParam Tuning → Ablation Studies → Training (Dist)
Challenge: Obtrusive Framework Artifacts - Example: TensorFlow
▪ TF_CONFIG
▪ Distribution Strategy
▪ Dataset (Sharding, DFS)
▪ Integration in Python - hard from inside a notebook
▪ Keras vs. Estimator vs. Custom Training Loop
(A sketch of this boilerplate follows below.)
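As an illustration (a minimal sketch, not code from the talk), this is roughly the boilerplate these artifacts impose in multi-worker TensorFlow: a TF_CONFIG environment variable describing the cluster, plus a distribution strategy whose scope must wrap model construction.

    import json
    import os
    import tensorflow as tf

    # Every worker needs a TF_CONFIG describing the whole cluster and its own
    # role in it; the hostnames, ports and task index here are placeholders.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"worker": ["host1:12345", "host2:12345"]},
        "task": {"type": "worker", "index": 0},
    })

    # The strategy must be created before the model, and the model has to be
    # built and compiled inside its scope.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        # any Keras model works here, e.g. the CNN shown on a later slide
        model = tf.keras.Sequential([tf.keras.Input(shape=(28, 28)),
                                     tf.keras.layers.Flatten(),
                                     tf.keras.layers.Dense(10)])
        model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd')

    # The input pipeline then also needs a global batch size and per-worker sharding.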
Where is Deep Learning headed?
Productive High-Level APIs - or why data scientists love Keras and PyTorch
[Diagram: the loop from Idea to Results runs through Framework, Experiment Tracking, Visualization, and Infrastructure. Francois Chollet, “Keras: The Next 5 Years”]
Productive High-Level APIs - or why data scientists love Keras and PyTorch
[Same diagram, with the Infrastructure box filled in: Hopsworks (Open Source), Databricks, Apache Spark, Cloud Providers. Francois Chollet, “Keras: The Next 5 Years”]
How do we keep our high-level APIs transparent and productive?
What Is Transparent Code?
The single-host version and the distributed version of the code are identical - NO CHANGES!

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.losses import SparseCategoricalCrossentropy
    from tensorflow.keras.optimizers import SGD

    def dataset(batch_size):
        (x_train, y_train) = load_data()  # e.g. MNIST images and labels
        x_train = x_train / np.float32(255)
        y_train = y_train.astype(np.int64)
        train_dataset = tf.data.Dataset.from_tensor_slices(
            (x_train, y_train)).shuffle(60000).repeat().batch(batch_size)
        return train_dataset

    def build_and_compile_cnn_model(lr):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            tf.keras.layers.Reshape((28, 28, 1)),  # add channel dimension for Conv2D
            tf.keras.layers.Conv2D(32, 3, activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
        model.compile(
            loss=SparseCategoricalCrossentropy(from_logits=True),
            optimizer=SGD(learning_rate=lr))
        return model
Building Blocks for Distribution Transparency
Distribution Context
Single-host vs. parallel multi-host vs. distributed multi-host
[Diagrams: a single-host experiment on one machine; a parallel multi-host setup with a driver coordinating Workers 1..N; a distributed multi-host ring of Workers 1-8 with a driver/controller and TF_CONFIG.]
Distribution Context
Single-host vs. parallel multi-host vs. distributed multi-host
[Same diagrams, mapped onto the steps: Explore and Design → Experimentation: Tune and Search → Explainability and Ablation Studies → Model Training (Distributed).]
Model Development Best Practices
▪ Modularize
▪ Parametrize
▪ Higher-order training functions: separate Training Logic, Model Generation, and Dataset Generation (see the sketch below)
▪ Usage of callbacks at runtime
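As an illustration of these practices (a minimal sketch, not code from the talk), the training logic can be written as a function of its hyperparameters that composes separate dataset- and model-generation functions, with callbacks injected at runtime:

    import tensorflow as tf

    def gen_dataset(batch_size):
        # Dataset generation, kept separate from the training logic
        (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
        return (tf.data.Dataset.from_tensor_slices((x_train / 255.0, y_train))
                .shuffle(60000).repeat().batch(batch_size))

    def gen_model(lr):
        # Model generation, parametrized by the learning rate
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
            metrics=['accuracy'])
        return model

    def train(lr, batch_size, callbacks=None):
        # Training logic: a plain function of its (hyper)parameters, so the same
        # code can be driven by a tuner, an ablator, or a distributed launcher.
        model = gen_model(lr)
        data = gen_dataset(batch_size)
        history = model.fit(data, epochs=1, steps_per_epoch=100,
                            callbacks=callbacks or [])
        return history.history['accuracy'][-1]

Because the function takes only plain parameters and returns a metric, a hyperparameter tuner can simply call it with different values, and a distributed launcher can run the same function unchanged on every worker.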
Oblivious Training Function as an Abstraction
Let the system handle the complexities. The system takes care of:
▪ generating new trials and launching them (parametrized instantiations of the function)
▪ fixing parameters
▪ setting up TF_CONFIG
▪ wrapping the function in a Distribution Strategy
▪ launching the function on the workers
▪ collecting and logging results
Maggy
Make the Oblivious Training Function a core abstraction on Hopsworks.
Spark+AI Summit 2019 → Today: with Hopsworks and Maggy, we provide a unified development and execution environment for distribution-transparent ML model development.
Hopsworks - Award-Winning Platform
Recap: Maggy - Asynchronous Trials on Spark
Spark is bulk-synchronous.
[Diagram: three stages of tasks (Task 11..1N, 21..2N, 31..3N) separated by barriers, exchanging Metrics 1-3 with the driver via HopsFS; with early-stopping, the synchronous barriers leave wasted compute in every stage.]
Recap: The Solution
Add communication and long-running tasks.
[Diagram: a single long-running stage (Task 11..1N) ending in one barrier; workers stream metrics to the driver and receive new trials in return.]
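Conceptually, the exchange between the long-running tasks and the driver looks like the sketch below. This is a simplified illustration using a hypothetical driver client (request_trial, report_metric, should_stop); it is not Maggy’s actual RPC code.

    def worker_loop(driver, train_fn):
        # 'driver' is a hypothetical client for the driver's RPC server;
        # 'train_fn' is assumed to yield a metric after every epoch.
        while True:
            trial = driver.request_trial()      # block until a new trial (None means shut down)
            if trial is None:
                break
            for metric in train_fn(**trial.params):
                driver.report_metric(trial.id, metric)   # stream metrics to the optimizer
                if driver.should_stop(trial.id):         # early stopping decided centrally
                    break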
What’s New? Worker discovery and distribution context set-up
[Diagram: within the same long-running tasks (Task 11..1N), the driver discovers the workers and launches the Oblivious Training Function in the chosen distribution context.]
What’s New: Distribution Context

    from maggy import experiment

    experiment.set_dataset_generator(gen_dataset)
    experiment.set_model_generator(gen_model)

    # Hyperparameter optimization
    experiment.set_context('optimization', 'randomsearch', searchspace)
    result = experiment.lagom(train_fun)
    params = result.get('best_hp')

    # Distributed Training
    experiment.set_context('dist_training', 'MultiWorkerMirroredStrategy', params)
    experiment.lagom(train_fun)

    # Ablation study
    experiment.set_context('ablation', 'loco', ablation_study, params)
    experiment.lagom(train_fun)
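The snippet above assumes user-defined gen_dataset, gen_model, train_fun, searchspace and ablation_study objects. As a rough sketch of the intended shape of the training function (an illustrative assumption, not the exact Maggy signature), it stays oblivious to the distribution context: it only consumes the generated model and dataset plus the trial’s hyperparameters and returns the metric being optimized.

    def train_fun(model, train_set, hparams):
        # 'model' and 'train_set' would be produced by the registered
        # gen_model/gen_dataset; 'hparams' holds the trial's hyperparameters.
        # The argument names here are illustrative assumptions.
        history = model.fit(train_set,
                            epochs=hparams.get('epochs', 1),
                            steps_per_epoch=100)
        # Return the value the optimizer (or ablator) should track.
        return history.history['loss'][-1]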
DEMO Code changes required to go from standard Python code to scale-out hyperparameter tuning and distributed training.
What’s Next Extend the platform to provide a unified development and execution environment for distribution transparent Jupyter Notebooks.
Summary ▪ Moving between distribution contexts requires code rewriting ▪ Factor out obtrusive framework artifacts ▪ Let system handle distribution context ▪ Keep productive high-level APIs
Thank You!
Get Started: hopsworks.ai, github.com/logicalclocks/maggy
Thanks to the Logical Clocks Team! Contributions from colleagues: Sina Sheikholeslami, Robin Andersson, Alex Ormenisan, Kai Jeggle
Twitter: @morimeister, @jim_dowling, @logicalclocks, @hopsworks
Web: www.logicalclocks.com