EuroPython 2020 Real Time Machine Learning with Python

EuroPython 2020 Real Time Machine Learning with Python
Alejandro Saucedo | as@seldon.io
Twitter: @AxSaucedo

Hello, my name is Alejandro
Engineering Director, Seldon Technologies
Chief Scientist, The Institute for Ethical AI & ML
Head of Solutions Eng & Sci, Eigen Technologies
Software Engineer & DevX Lead, Bloomberg LP
Alejandro Saucedo

Seldon: OSS Production ML Deployment

The Institute for Ethical AI & Machine Learning

We are part of the LFAI

Today
● Conceptual intro to stream processing
● Machine learning for real time
● Tradeoffs across tools
● Hands on use-case

Real Time Reddit Processing
● Real time ML model for reddit comments
● 200k comments for training model
● /r/science comments removed by mods
We will be fixing the front page of the internet

A trip to the past present: ETL
E - Extract
T - Transform
L - Load

Variations
● ETL - Extract Transform Load
● ELT - Extract Load Transform
● EL - Extract Load
● LT - Load Transform
● WTF - LOL

Specialised Tools

EL: Nifi, Flume, …
ETL: Oozie, Airflow, Jupyter notebook?
ELT: Elasticsearch, Data Warehouse

Batch VS Streaming
The spectrum of data processing

Batch VS AND Streaming
The right tool for the challenge

Unifying Worlds
Massive drive on converging worlds

Streaming Concepts: Windows
Processing of batches in real time
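One way to picture a window is as a fixed time bucket that each event falls into. The sketch below is a minimal tumbling-window count in plain Python, not taken from the talk; the function name, the event tuples, and the 10-second window size are all assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count (timestamp, value) events per fixed, non-overlapping window.

    Returns a dict mapping each window's start time to the number of
    events whose timestamp falls in [start, start + window_seconds).
    """
    counts = defaultdict(int)
    for timestamp, _value in events:
        # Integer division finds the window this event belongs to.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


# Hypothetical comment arrival times, in seconds since the stream started.
events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (25, "e")]
window_counts = tumbling_window_counts(events, window_seconds=10)
```

Stream processors usually offer other window shapes as well (for example hopping or sliding windows); the tumbling case is only the simplest one to sketch.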

Streaming Concepts: Checkpoints
Keeping track of stream progress
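A checkpoint can be thought of as a persisted "next offset to read". The following is a rough sketch in plain Python, assuming a list stands in for the stream and a JSON file stands in for the checkpoint store; the function name and file layout are hypothetical and not from the talk.

```python
import json
import os
import tempfile


def process_with_checkpoint(records, checkpoint_path, handler):
    """Handle records in order, persisting the offset of the next record.

    Returns the number of records handled in this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]
    for offset in range(start, len(records)):
        handler(records[offset])
        # Save progress after each record so a restart resumes from here.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_offset": offset + 1}, f)
    return len(records) - start


checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
seen = []
first_run = process_with_checkpoint(["c1", "c2", "c3"], checkpoint_path, seen.append)
# A restart that sees two new records only handles the unprocessed ones.
second_run = process_with_checkpoint(
    ["c1", "c2", "c3", "c4", "c5"], checkpoint_path, seen.append
)
```

Because the handler runs before the offset is saved, a crash between the two steps would replay one record on restart, so this sketch gives at-least-once rather than exactly-once handling.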

Streaming Concepts: Watermarks
Considering data that comes late in windows and stream batches
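A watermark can be sketched as "the latest event time seen so far, minus an allowed lateness": events for a window are still accepted until the watermark passes that window's end. The code below is an illustrative simplification in plain Python; the function name, the tumbling windows, and the drop-when-late policy are assumptions, and real stream processors differ in how they compute and act on watermarks.

```python
def assign_with_watermark(events, window_seconds, allowed_lateness):
    """Assign events to tumbling windows, dropping those behind the watermark.

    events: iterable of (event_time, value) in arrival order.
    The watermark trails the largest event time seen so far by
    allowed_lateness; a window is closed once the watermark reaches its end.
    """
    windows = {}
    dropped = []
    max_event_time = float("-inf")
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_seconds) * window_seconds
        window_end = window_start + window_seconds
        if window_end <= watermark:
            # The window this event belongs to has already closed.
            dropped.append((event_time, value))
        else:
            windows.setdefault(window_start, []).append(value)
    return windows, dropped


# Events "c" and "e" arrive out of order; only "e" is too late to count.
events = [(1, "a"), (12, "b"), (8, "c"), (27, "d"), (9, "e")]
windows, dropped = assign_with_watermark(
    events, window_seconds=10, allowed_lateness=5
)
```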

Some Stream Processing Tools
● Flink (Multiple Languages)
● Kafka Streams (Multiple Languages)
● Spark Streaming (Multiple Languages)
● Faust (Python)
● Apache Beam (Python)

Today we’re using
Stream Processing
ML Serving
ML Training

Machine Learning Workflow

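Model Training

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from ml_utils import CleanTextTransformer, SpacyTokenTransformer

    # Clean Text
    clean_text_transformer = CleanTextTransformer()
    # SpaCy Tokenizer
    spacy_tokenizer = SpacyTokenTransformer()
    # TFIDF Vectorizer (input is already tokenized, so the preprocessor
    # and tokenizer are identity functions)
    tfidf_vectorizer = TfidfVectorizer(
        min_df=3,
        max_features=1000,
        preprocessor=lambda x: x,
        tokenizer=lambda x: x,
        token_pattern=None,
        ngram_range=(1, 3),
        use_idf=1,
        smooth_idf=1,
        sublinear_tf=1)
    # Logistic Regression
    lr_model = LogisticRegression(C=1.0, verbose=True)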

Model Training

    x_train_clean = clean_text_transformer.transform(x_train)
    x_train_tokenized = spacy_tokenizer.transform(x_train_clean)
    tfidf_vectorizer.fit(x_train_tokenized[TOKEN_COLUMN].values)
    x_train_tfidf = tfidf_vectorizer.transform(
        x_train_tokenized[TOKEN_COLUMN].values)
    lr_model.fit(x_train_tfidf, y_train)
    pred = lr_model.predict(x_test_tfidf)

Example: “You are a DUMMY!!!!!” → “You are dummy” → [ PRON, IS, DUMB ] → [ 1000, 0100, 0010 ] → [ 1 ]

More on EDA & Model Evaluation
https://github.com/axsaucedo/reddit-classification-exploration/

Overview of Components
[Diagram: Reddit Source; Queue with topics reddit_stream, prediction, alert; ML Service (seldon model); Stream processor with processors fetch_stream and ml_predict]

Generating comments

    @app.timer(0.1)
    async def generate_reddit_comments():
        reddit_sample = await fetch_reddit_comment()
        reddit_data = {
            "id": reddit_sample["id"].values[0],
            "score": int(reddit_sample["score"].values[0]),
            ... # Cut down for simplicity
        }
        await app.topic("reddit_stream").send(
            key=reddit_data["id"],
            value=reddit_data)

[Diagram: component overview, showing the fetch_stream processor]

ML Stream Processing Step

    @app.agent(app.topic("reddit_stream"))
    async def predict_reddit_content(tokenized_stream):
        async for key, comment_extended in tokenized_stream.items():
            tokens = comment_extended["body_tokens"]
            probability = seldon_prediction_req(tokens)
            data = {
                "probability": probability,
                "original": comment_extended["body"]
            }
            await app.topic("reddit_prediction").send(
                key=key,
                value=data)
            if probability > MODERATION_THRESHOLD:
                await reddit_mod_alert_topic.send(
                    key=key,
                    value=data)

[Diagram: component overview, showing the ml_predict processor]
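The routing decision in this step can be isolated as a pure function, which makes it easy to check without a running queue. This is only a sketch of the logic above: the threshold value of 0.5 and the topic name "reddit_mod_alert" are assumptions (the slide shows neither), and the function itself is hypothetical.

```python
MODERATION_THRESHOLD = 0.5  # assumed value; the slides do not state it


def route_prediction(key, body, probability, threshold=MODERATION_THRESHOLD):
    """Return the (topic, key, value) messages the agent would send.

    Every prediction goes to the prediction topic; predictions above the
    threshold are also sent to the moderator alert topic.
    """
    data = {"probability": probability, "original": body}
    messages = [("reddit_prediction", key, data)]
    if probability > threshold:
        # "reddit_mod_alert" is a placeholder name for the alert topic.
        messages.append(("reddit_mod_alert", key, data))
    return messages


flagged = route_prediction("k1", "You are a DUMMY!!!!!", 0.9)
passed = route_prediction("k2", "Interesting study, thanks.", 0.1)
```

Keeping the decision separate from the `await ... send(...)` calls means the threshold behaviour can be unit tested on its own.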

ML Model Request Step

    import numpy as np
    from seldon_core.seldon_client import SeldonClient

    sc = SeldonClient(
        gateway_endpoint="istio-ingress.istio-system.svc.cluster.local",
        deployment_name="reddit-model",
        namespace="default")

    def seldon_prediction_req(tokens):
        data = np.array(tokens)
        output = sc.predict(data=data)
        return output.response["data"]["ndarray"]

[Diagram: component overview, showing the ML Service (seldon model) and the fetch_stream and ml_predict processors]
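The value under ["data"]["ndarray"] is a list rather than a single number, so comparing it against a threshold may need one more unpacking step. The helper below is a hedged sketch, assuming the model returns one row of class probabilities per input (as a predict_proba call would) and that index 1 is the "removed by mods" class; the helper name, the sample payload, and the class ordering are all assumptions.

```python
def positive_class_probability(response, positive_index=1):
    """Extract one probability from a response shaped like
    {"data": {"ndarray": [[p_class0, p_class1]]}}.

    Assumes a single input row; positive_index selects the class of interest.
    """
    rows = response["data"]["ndarray"]
    return float(rows[0][positive_index])


# Hypothetical response payload, for illustration only.
sample_response = {"data": {"names": ["t:0", "t:1"], "ndarray": [[0.2, 0.8]]}}
probability = positive_class_probability(sample_response)
```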

Overview of Seldon Model Serving

Wrapping ML models for Serving with Seldon

    import dill
    from ml_utils import CleanTextTransformer, SpacyTokenTransformer

    class RedditClassifier:
        def __init__(self):
            self._clean_text_transformer = CleanTextTransformer()
            self._spacy_tokenizer = SpacyTokenTransformer()
            with open('tfidf_vectorizer.model', 'rb') as model_file:
                self._tfidf_vectorizer = dill.load(model_file)
            with open('lr.model', 'rb') as model_file:
                self._lr_model = dill.load(model_file)

        def predict(self, X, feature_names):
            clean_text = self._clean_text_transformer.transform(X)
            spacy_tokens = self._spacy_tokenizer.transform(clean_text)
            tfidf_features = self._tfidf_vectorizer.transform(spacy_tokens)
            predictions = self._lr_model.predict_proba(tfidf_features)
            return predictions

Overview of Components
[Diagram: Reddit Source; Queue with topics reddit_stream, prediction, alert; ML Service (seldon model); Stream processor with processors fetch_stream and ml_predict]

Recap of Today
● Conceptual intro to stream processing
● Machine learning for real time
● Tradeoffs across tools
● Hands on use-case

EuroPython 2020 Real Time Machine Learning with Python Alejandro Saucedo | as@seldon.io @AxSaucedo
