Simplifying ML Workflows with Apache Beam & TensorFlow Extended
Tyler Akidau (@takidau), Software Engineer at Google, Apache Beam PMC
Apache Beam: Portable data-processing pipelines
Example pipelines: Python and Java
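As a rough illustration of the kind of pipeline shown on this slide (not code from the talk), here is a minimal Beam Python word-count sketch; the input and output paths are placeholders.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (pipeline
         | 'Read' >> beam.io.ReadFromText('input.txt')        # placeholder input path
         | 'Split' >> beam.FlatMap(lambda line: line.split())
         | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
         | 'CountPerWord' >> beam.CombinePerKey(sum)
         | 'Format' >> beam.Map(lambda kv: '%s: %d' % (kv[0], kv[1]))
         | 'Write' >> beam.io.WriteToText('output'))          # placeholder output prefix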
Cross-language portability framework: SDKs for multiple languages (Language A, B, C) all target the shared Beam Model, which is then executed by any of several runners (Runner 1, 2, 3).
Python-compatible runners
Direct runner (local machine): now
Google Cloud Dataflow: now
Apache Flink: Q2-Q3
Apache Spark: Q3-Q4
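For context (not part of the original slide), choosing between these runners in the Python SDK is just a pipeline option; the sketch below assumes the Direct runner, and the Dataflow flags mentioned in the comment are the usual ones rather than anything specific to this talk.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Run locally on the Direct runner; swap in 'DataflowRunner' (plus project,
    # region, and temp_location options) to run the same pipeline on Cloud Dataflow.
    options = PipelineOptions(['--runner=DirectRunner'])

    with beam.Pipeline(options=options) as pipeline:
        _ = (pipeline
             | beam.Create(['hello', 'world'])
             | beam.Map(print))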
TensorFlow Extended: End-to-end machine learning in production
“Doing ML in production is hard.” -Everyone who has ever tried
Because, in addition to the actual ML code...
...you have to worry about so much more: configuration, data collection, feature extraction, data verification, analysis tools, machine resource management, process management tools, serving infrastructure, and monitoring, with the ML code itself only a small box in the middle. (Source: Sculley et al., Hidden Technical Debt in Machine Learning Systems)
In this talk, we will...
Show you how to apply transformations consistently between training and serving: TensorFlow Transform, TensorFlow Estimators, and TensorFlow Serving.
Introduce something new: TensorFlow Model Analysis.
TensorFlow Transform: Consistent in-graph transformations in training and serving
Typical ML pipeline: during training, the data is prepared by a batch processing job; during serving, request data goes through separate "live" processing code.
With TensorFlow Transform: during training, tf.Transform performs the batch processing; during serving, the same transformation is applied to request data as a tf.Graph.
Defining a preprocessing function in TF Transform

    def preprocessing_fn(inputs):
        # inputs is a dict of the raw input features, here X, Y, and Z.
        x = inputs['X']
        y = inputs['Y']
        z = inputs['Z']
        return {
            "A": tft.bucketize(tft.normalize(x) * y),
            "B": tensorflow_fn(y, z),
            "C": tft.ngrams(z),
        }

The accompanying diagram builds up the corresponding graph: mean and stddev analyzers feed a normalize op on X, the result is multiplied by Y, and a quantiles analyzer feeds bucketize to produce output A; an arbitrary TensorFlow function of Y and Z produces B; and ngrams over Z produces C.

Many operations are available for dealing with text and numeric features, and users can define their own.
Analyzers vs. Transforms: analyzers (mean, stddev, quantiles) are full-pass reductions over the dataset, implemented as a distributed data pipeline; transforms (normalize, multiply, bucketize) are instance-to-instance operations that don't change the batch dimension, implemented as pure TensorFlow.
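To make that distinction concrete, here is a small sketch (not from the slides) using a real tf.Transform analyzer: tft.mean is computed in a full pass over the data at analysis time and baked into the emitted graph as a constant, while the subtraction is an ordinary instance-level TensorFlow op.

    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        x = inputs['x']
        # Analyzer: tft.mean requires a full pass over the dataset; its result
        # becomes a constant tensor in the emitted transform graph.
        x_mean = tft.mean(x)
        # Transform: plain TensorFlow, applied independently to each instance.
        return {'x_centered': x - x_mean}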
At analysis time, the analyzers (mean, stddev, quantiles) are computed over the data and replaced by constant tensors, leaving a pure TensorFlow graph of transforms (normalize, multiply, bucketize) that can be applied anywhere.
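As a hedged sketch of how that analysis step is actually run with Beam (exact module paths and API names vary across tf.Transform versions; the data, schema, and temp directory below are placeholders), the preprocessing function is handed to a Beam transform that performs the full-pass analysis and returns both the transformed data and a reusable transform function:

    import apache_beam as beam
    import tensorflow as tf
    import tensorflow_transform as tft
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    # Schema of the raw features (illustrative).
    RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec({
            'x': tf.io.FixedLenFeature([], tf.float32),
            'y': tf.io.FixedLenFeature([], tf.float32),
        }))

    def preprocessing_fn(inputs):
        # Scale x to [0, 1] using full-pass analyzers; pass y through unchanged.
        return {'x_scaled': tft.scale_to_0_1(inputs['x']), 'y': inputs['y']}

    with beam.Pipeline() as pipeline:
        with tft_beam.Context(temp_dir='/tmp/tft'):    # placeholder scratch space
            raw_data = pipeline | beam.Create([{'x': 1.0, 'y': 2.0},
                                               {'x': 3.0, 'y': 4.0}])
            # Runs the full-pass analyzers over the data and returns the
            # transformed dataset plus a reusable transform_fn graph.
            transformed_dataset, transform_fn = (
                (raw_data, RAW_DATA_METADATA)
                | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
            transformed_data, transformed_metadata = transformed_dataset

The transform_fn is the piece that gets attached to the serving graph, which is what keeps training and serving consistent.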
What can be done with TF Transform? Pretty much anything: any transformation that can be expressed as a TensorFlow graph can run in the tf.Transform batch processing step and be embedded in the serving graph.
Some common use cases:
Scale to ...
Bag of words / n-grams
Bucketization
Feature crosses
Apply another TensorFlow model
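A hedged sketch of what a few of these use cases can look like inside a preprocessing function (not from the talk): the feature names are made up, and while tft.scale_to_0_1, tft.bucketize, tft.ngrams, and tft.compute_and_apply_vocabulary are real tf.Transform ops, their availability and exact signatures depend on the library version.

    import tensorflow as tf
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        # Scale a numeric feature to [0, 1] (uses full-pass min/max analyzers).
        age_scaled = tft.scale_to_0_1(inputs['age'])

        # Bucketize a numeric feature into 10 quantile buckets.
        income_bucket = tft.bucketize(inputs['income'], num_buckets=10)

        # Bag of words / bigrams from a free-text feature
        # (string_split yields the SparseTensor of tokens that ngrams expects).
        tokens = tf.compat.v1.string_split(inputs['text'])
        bigrams = tft.ngrams(tokens, ngram_range=(1, 2), separator=' ')

        # Simple feature cross: join two categorical features, then map to ints
        # with a vocabulary computed over the whole dataset.
        cross = tf.strings.join([inputs['country'], inputs['device']], separator='_')
        cross_id = tft.compute_and_apply_vocabulary(cross)

        return {
            'age_scaled': age_scaled,
            'income_bucket': income_bucket,
            'bigrams': bigrams,
            'cross_id': cross_id,
        }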
github.com/tensorflow/transform
Introducing… TensorFlow Model Analysis: scalable, sliced, and full-pass metrics
Let's talk about metrics...
● How accurate is the model?
● Has the model converged?
● What about my TB-sized eval set?
● What about slices / subsets of the data?
● How do metrics compare across model versions?
ML Fairness: analyzing model mistakes by subgroup. [Figure: ROC curves plotting sensitivity (true positive rate) against false positive rate; the curve for all groups combined hides noticeably different curves for Group A and Group B.] Learn more at ml-fairness.com
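As an illustrative sketch of what this kind of per-subgroup analysis means in code (not from the talk, and using scikit-learn rather than TFMA), one can compute a separate ROC curve for each group; the inputs are hypothetical arrays of labels, scores, and group ids.

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    def roc_by_group(y_true, y_score, group):
        """Compute an ROC curve (FPR, TPR, AUC) separately for each subgroup."""
        results = {}
        for g in np.unique(group):
            mask = group == g
            fpr, tpr, _ = roc_curve(y_true[mask], y_score[mask])
            results[g] = (fpr, tpr, auc(fpr, tpr))
        return results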
ML Fairness: understand the failure modes of your models
ML Fairness: Learn More ml-fairness.com
How does it work? A standard Estimator export produces the inference graph (a SavedModel with a serving SignatureDef); TFMA additionally exports an eval graph (a SavedModel with an eval SignatureDef plus metadata):

    estimator = DNNLinearCombinedClassifier(...)
    estimator.train(...)

    # Exports the inference graph: a SavedModel with a serving SignatureDef.
    estimator.export_savedmodel(
        serving_input_receiver_fn=serving_input_fn)

    # Exports the eval graph: a SavedModel with an eval SignatureDef and metadata,
    # which TensorFlow Model Analysis uses for full-pass evaluation.
    tfma.export.export_eval_savedmodel(
        estimator=estimator,
        eval_input_receiver_fn=eval_input_fn)
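To show where that eval SavedModel goes next, here is a hedged sketch of running a sliced evaluation with TFMA; the API names follow early tensorflow_model_analysis releases and have shifted across versions, and the paths and the slice column ('trip_start_hour') are placeholders.

    import tensorflow_model_analysis as tfma

    # Point TFMA at the exported eval SavedModel.
    eval_shared_model = tfma.default_eval_shared_model(
        eval_saved_model_path='/path/to/eval_saved_model')    # placeholder path

    # Compute metrics overall and sliced by a feature column (placeholder name).
    eval_result = tfma.run_model_analysis(
        eval_shared_model=eval_shared_model,
        data_location='/path/to/eval_data.tfrecord',          # placeholder path
        slice_spec=[tfma.slicer.SingleSliceSpec(),             # overall metrics
                    tfma.slicer.SingleSliceSpec(columns=['trip_start_hour'])])

    # In a notebook, render the sliced metrics interactively.
    tfma.view.render_slicing_metrics(eval_result,
                                     slicing_column='trip_start_hour')

Because the evaluation runs as a Beam pipeline, the same code scales from a local run to a TB-sized eval set on a distributed runner.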
github.com/tensorflow/model-analysis
Summary
Apache Beam: a data-processing framework that runs locally and scales to massive datasets, in the cloud now (Cloud Dataflow) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4). It powers the large-scale data processing in the TF libraries below.
tf.Transform: consistent in-graph transformations in training and serving.
tf.ModelAnalysis: scalable, sliced, and full-pass metrics.