Simplifying ML Workflows with Apache Beam & TensorFlow Extended



  1. Simplifying ML Workflows with Apache Beam & TensorFlow Extended. Tyler Akidau (@takidau), Software Engineer at Google, Apache Beam PMC.

  2. Apache Beam: portable data-processing pipelines.

  3. Example pipelines, in Python and Java (a minimal Python sketch follows).
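
For context, a minimal Beam pipeline in the Python SDK; the element values and output path are placeholders, not taken from the talk:

    import apache_beam as beam

    # A tiny word-count-style pipeline; the Java SDK exposes the same Beam
    # model with equivalent transforms.
    with beam.Pipeline() as pipeline:
        (pipeline
         | 'Read' >> beam.Create(['simplify ml workflows', 'beam and tfx'])
         | 'Split' >> beam.FlatMap(str.split)
         | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
         | 'Count' >> beam.CombinePerKey(sum)
         | 'Format' >> beam.Map(lambda kv: '%s: %d' % kv)
         | 'Write' >> beam.io.WriteToText('/tmp/word_counts'))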

  4. Cross-language Portability Framework (diagram): SDKs for Language A, B, and C all target the Beam Model, which is then executed by any of Runner 1, 2, or 3, making pipelines portable across languages and runners.

  5. Python-compatible runners: Direct runner (local machine): now; Google Cloud Dataflow: now; Apache Flink: Q2-Q3; Apache Spark: Q3-Q4. (See the runner-selection sketch below.)
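
Switching runners is a pipeline-options change rather than a code change. A sketch, assuming the standard runner names (DirectRunner, DataflowRunner, FlinkRunner, SparkRunner):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Local execution with the direct runner; swap in '--runner=DataflowRunner'
    # (plus project/region/temp_location flags) to run the same pipeline on
    # Google Cloud Dataflow, or FlinkRunner / SparkRunner as they become available.
    options = PipelineOptions(['--runner=DirectRunner'])

    with beam.Pipeline(options=options) as pipeline:
        _ = (pipeline
             | beam.Create([1, 2, 3])
             | beam.Map(lambda x: x * x)
             | beam.Map(print))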

  6. TensorFlow Extended: end-to-end machine learning in production.

  7. “Doing ML in production is hard.” -Everyone who has ever tried

  8. Because, in addition to the actual ML code...

  9. ...you have to worry about so much more: Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring, all surrounding a small ML Code box. Source: Sculley et al., Hidden Technical Debt in Machine Learning Systems.

  10. In this talk, we will...

  11. In this talk, we will... show you how to apply transformations (TensorFlow Transform)...

  12. In this talk, we will... show you how to apply transformations... consistently between training and serving (TensorFlow Transform, TensorFlow Estimators, TensorFlow Serving).

  13. In this talk, we will... introduce something new: TensorFlow Model Analysis (alongside TensorFlow Transform, Estimators, and Serving).

  14. TensorFlow Transform: consistent in-graph transformations in training and serving.

  15. Typical ML pipeline: during training, data goes through batch processing; during serving, request data goes through separate "live" processing.


  17. With TensorFlow Transform: during training, tf.Transform performs the batch processing and emits the transform as a tf.Graph; during serving, that same graph is applied to request data.

  18. Defining a preprocessing function in TF Transform (inputs X, Y, Z):

      def preprocessing_fn(inputs):
          x = inputs['X']
          ...
          return {
              "A": tft.bucketize(tft.normalize(x) * y),
              "B": tensorflow_fn(y, z),
              "C": tft.ngrams(z)
          }

      Many operations are available for dealing with text and numeric features, and users can define their own.

  19-23. (Animation builds of the previous slide: the return expression is drawn as a dataflow graph in which mean and stddev feed normalize, normalize and Y feed multiply, and quantiles feed bucketize, producing the outputs A, B, and C.)
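
The slide's tft.normalize and tensorflow_fn appear to be illustrative names. A minimal sketch of a comparable preprocessing function against the released tensorflow_transform API (where the z-score analyzer is tft.scale_to_z_score); the feature names and bucket count are placeholders:

    import tensorflow as tf
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        """Sketch of a TF Transform preprocessing function."""
        x = inputs['X']  # numeric feature
        y = inputs['Y']  # numeric feature
        z = inputs['Z']  # string feature
        return {
            # Full-pass analyzers (mean/stddev, quantiles) run over the whole
            # dataset; their results are embedded as constants in the graph.
            'A': tft.bucketize(tft.scale_to_z_score(x) * y, num_buckets=10),
            # Arbitrary TensorFlow ops can be mixed in freely.
            'B': tf.math.log1p(y),
            # Vocabulary computation is another full-pass analyzer.
            'C': tft.compute_and_apply_vocabulary(z),
        }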

  24. Analyzers vs. Transforms:
      Analyzers (mean, stddev, quantiles): reduce over the full dataset (full pass); implemented as a distributed data pipeline.
      Transforms (normalize, multiply, bucketize): instance-to-instance (don't change the batch dimension); pure TensorFlow.

  25. (Diagram: the Analyze phase runs the analyzers (mean, stddev, quantiles) over the data and replaces them with constant tensors; the transforms (normalize, multiply, bucketize) remain as ops in the graph.)
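
Under the hood, the Analyze step is driven by an Apache Beam pipeline. A minimal sketch, assuming the tensorflow_transform.beam APIs (Context, AnalyzeAndTransformDataset) and reusing the preprocessing_fn sketched above; the in-memory data and schema are placeholders:

    import tempfile

    import tensorflow as tf
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    raw_data = [
        {'X': 1.0, 'Y': 2.0, 'Z': 'hello'},
        {'X': 3.0, 'Y': 4.0, 'Z': 'world'},
    ]
    raw_data_metadata = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec({
            'X': tf.io.FixedLenFeature([], tf.float32),
            'Y': tf.io.FixedLenFeature([], tf.float32),
            'Z': tf.io.FixedLenFeature([], tf.string),
        }))

    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        # Analyze runs the full-pass analyzers over the data; Transform then
        # applies the resulting constants instance by instance.
        transformed_dataset, transform_fn = (
            (raw_data, raw_data_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
    transformed_data, transformed_metadata = transformed_dataset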

  26. What can be done with tf.Transform batch processing? Pretty much anything.

  27. What can be done with tf.Transform batch processing? Pretty much anything, that is, anything that can be expressed as a TensorFlow graph; the result becomes part of the serving graph.

  28. Some common use-cases: scale to ..., bag of words / n-grams, bucketization, feature crosses.

  29. ...plus applying another TensorFlow model as part of preprocessing. (A sketch of a few of these follows.)
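
A hedged sketch of a couple of these use-cases inside a preprocessing function; the feature names are placeholders, and it assumes tft.scale_to_0_1, tft.bucketize, and tft.compute_and_apply_vocabulary from the released API:

    import tensorflow as tf
    import tensorflow_transform as tft

    def usecase_preprocessing_fn(inputs):
        # Scale a numeric feature to [0, 1] via a full-pass min/max analysis.
        scaled = tft.scale_to_0_1(inputs['age'])

        # Feature cross: join two string features, then map the crossed strings
        # to integer ids with a dataset-wide vocabulary.
        cross = tf.strings.join([inputs['city'], inputs['device']], separator='_')
        cross_id = tft.compute_and_apply_vocabulary(cross, top_k=10000)

        # Bucketize the scaled feature into quantile-based buckets.
        bucketed = tft.bucketize(scaled, num_buckets=10)

        return {
            'age_scaled': scaled,
            'age_bucket': bucketed,
            'city_device_id': cross_id,
        }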

  30. github.com/tensorflow/transform

  31. Introducing... TensorFlow Model Analysis: scalable, sliced, and full-pass metrics.

  32. Let's talk about metrics... ● How accurate is the model? ● Has it converged? ● What about my TB-sized eval set? ● What about slices / subsets? ● How does it compare across model versions?

  33. ML Fairness: analyzing model mistakes by subgroup. (ROC curve for all groups; y-axis: sensitivity (true positive rate), x-axis: false positive rate, i.e. 1 - specificity.) Learn more at ml-fairness.com

  34. (The same ROC plot, now with separate curves for Group A and Group B alongside the all-groups curve.)

  35. ML Fairness: understand the failure modes of your models

  36. ML Fairness: Learn More ml-fairness.com

  37. How does it work? Exporting the trained Estimator produces an inference graph (a SavedModel with a serving SignatureDef):

      estimator = DNNLinearCombinedClassifier(...)
      estimator.train(...)
      estimator.export_savedmodel(
          serving_input_receiver_fn=serving_input_fn)

  38. ...and tfma additionally exports an eval graph (a SavedModel with an eval SignatureDef plus metadata):

      tfma.export.export_eval_savedmodel(
          estimator=estimator,
          eval_input_receiver_fn=eval_input_fn)
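
To then compute the sliced metrics, a minimal sketch assuming a TFMA release that supports the EvalSavedModel workflow (tfma.default_eval_shared_model, tfma.run_model_analysis, tfma.slicer.SingleSliceSpec); the paths and the slicing column are placeholders:

    import tensorflow_model_analysis as tfma

    # Load the eval graph exported above (path is a placeholder).
    eval_shared_model = tfma.default_eval_shared_model(
        eval_saved_model_path='/tmp/eval_saved_model')

    # Full-pass metrics, overall and sliced by a feature column.
    result = tfma.run_model_analysis(
        eval_shared_model=eval_shared_model,
        data_location='/tmp/eval_data.tfrecord',
        slice_spec=[
            tfma.slicer.SingleSliceSpec(),                      # overall
            tfma.slicer.SingleSliceSpec(columns=['subgroup']),  # per subgroup
        ])

    # In a notebook, render the sliced metrics interactively.
    tfma.view.render_slicing_metrics(result, slicing_column='subgroup')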

  39. github.com/tensorflow/model-analysis

  40. Summary. Apache Beam: a data-processing framework that runs locally and scales to massive data, in the Cloud (now) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4); it powers the large-scale data processing in the TF libraries below. tf.Transform: consistent in-graph transformations in training and serving. tf.ModelAnalysis: scalable, sliced, and full-pass metrics.
