A practical approach of different programming techniques to implement a real-time application using Django Dipl.-Math. Sebastian Stigler sebastian.stigler@hs-aalen.de Marina Burdack, MSc marina.burdack@hs-aalen.de Aalen University of Applied Sciences, Germany
Motivation Dipl.-Math. Stigler, Burdack, MSc 2/18
Aims for the Django Application: How far do we get with a Python-only approach? A tool to configure and run a DA/ML pipeline: data source, preprocessing tasks, machine learning tasks, presentation of the result.
The Focus of this Paper: the preprocessing part of the pipeline. How does our application scale? What are the knobs we can use to scale the application?
Preprocessing App (in German). Source: own graphic.
Methodology
Types of Scaling: Singlethreaded ✗, Multithreaded [4, 8] ✗, Multiprocessing ✓, Distributed Task Queue ✓
Multiprocessing. Figure: Multiprocessing Workflow (source: own graphic). The multiprocessing pool is realized with the ProcessPoolExecutor class of the concurrent.futures module [5] from Python 3.7's standard library.
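A minimal sketch of such a pool, assuming each message is a list of values and the subtask replaces missing values with zero. The names `preprocess` and `run_pool` are hypothetical and are not taken from the application itself.

```python
from concurrent.futures import ProcessPoolExecutor


def preprocess(message):
    """Hypothetical CPU-bound subtask: replace missing values (None) with 0.0."""
    return [0.0 if value is None else value for value in message]


def run_pool(messages, max_workers=2):
    """Distribute messages over worker processes; results keep the input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(preprocess, messages))
```

Called from a script under an `if __name__ == "__main__":` guard, `run_pool([[1.0, None, 3.0], [None, 5.0]])` returns `[[1.0, 0.0, 3.0], [0.0, 5.0]]`. The guard matters on platforms where worker processes are started by re-importing the main module.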
Task Queue. Figure: Celery Workflow (source: own graphic). The task queue is realized with Celery 4.3 [7] and Redis [6].
Structure of a Chained Task. Figure: Processing of a chained task (source: own graphic).
The Math: Queueing Theory [1]. A queue with $c$ servers is stable (won't grow without bound) if the following inequality holds:

$$\rho = \frac{\lambda}{c\,\mu} < 1 \qquad (1)$$

where $\rho$ is the server utilization, $\lambda$ is the arrival rate and $\mu$ is the service rate (the inverse of the service time) for one task.
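The stability condition can be checked numerically. The following is a small sketch; the function names and the example rates are illustrative assumptions, not measurements from the paper.

```python
def utilization(arrival_rate, service_rate, servers):
    """Return rho = lambda / (c * mu) for a queue with c servers."""
    if service_rate <= 0 or servers < 1:
        raise ValueError("service_rate must be > 0 and servers >= 1")
    return arrival_rate / (servers * service_rate)


def is_stable(arrival_rate, service_rate, servers):
    """The queue is stable only if rho is strictly below 1."""
    return utilization(arrival_rate, service_rate, servers) < 1


# Assumed example: 50 messages/s arrive, one worker serves 10 messages/s.
print(is_stable(50, 10, 4))  # rho = 1.25
print(is_stable(50, 10, 8))  # rho = 0.625
```

With these assumed rates, four workers are not enough (the queue grows without bound), while eight workers keep the utilization below 1.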
Evaluation
Test Data
750'000 measurements (rows) from a Davis weather station; 33 values per row in total, 26 of them numerical.
75 to 750'000 messages (msg) are the output of the buffer, with a rows/msg rate from 10'000 down to 1.
16 subtasks: the prepare and result tasks, 6 tasks which directly use methods from Pandas [3], and 8 tasks which use preprocessing methods from scikit-learn [2].
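The relation between the rows/msg rate and the number of messages can be sketched as follows. This is only an illustration of the buffering idea; `buffer_rows` and `message_count` are hypothetical helpers, not the application's actual buffer.

```python
from itertools import islice


def buffer_rows(rows, rows_per_msg):
    """Group an iterable of rows into messages of at most rows_per_msg rows."""
    if rows_per_msg < 1:
        raise ValueError("rows_per_msg must be >= 1")
    iterator = iter(rows)
    while True:
        message = list(islice(iterator, rows_per_msg))
        if not message:
            return
        yield message


def message_count(total_rows, rows_per_msg):
    """Number of messages the buffer emits (ceiling division)."""
    return -(-total_rows // rows_per_msg)


print(message_count(750_000, 10_000))  # 75
print(message_count(750_000, 1))       # 750000
```

The two printed values match the range of 75 to 750'000 messages stated for the test data.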
Test Runs. Figure: mean service time per message and mean service time per row, each plotted against rows/msg (1 to 10'000) for task_prepare (GEN), task_fillna_zero (PAN) and task_normalizer (SKN). Series: task queue, multiprocessing, groundtruth, saturation. Source: own graphic.
Conclusion
Python libraries are sophisticated enough for scaling real-time applications. Buffering incoming data rows can compensate for the overhead of task queues. The condition $\lambda / \mu < c$ determines the scaling of the application. All results are applicable to the machine learning process too.
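To illustrate how buffering and the condition λ/µ < c interact, the sketch below assumes a simple linear cost model: each message costs a fixed overhead plus a per-row cost. This model and all numbers are assumptions for illustration only; they are not results from the paper.

```python
import math


def min_workers(row_rate, rows_per_msg, overhead, cost_per_row):
    """Smallest c with lambda / mu < c under the assumed linear cost model."""
    arrival_rate = row_rate / rows_per_msg                 # lambda, messages per second
    service_time = overhead + cost_per_row * rows_per_msg  # 1 / mu, seconds per message
    offered_load = arrival_rate * service_time             # lambda / mu
    return math.floor(offered_load) + 1


# Assumed values: 1000 rows/s, 10 ms overhead per message, 0.5 ms per row.
for rows_per_msg in (1, 10, 100):
    print(rows_per_msg, min_workers(1000, rows_per_msg, 0.01, 0.0005))
```

Under these assumed values, sending every row as its own message needs 11 workers, while buffering 10 rows per message needs 2 and buffering 100 rows per message needs 1, because the fixed overhead is paid less often.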
Thank you for your attention!
References
[1] U. Narayan Bhat. An Introduction to Queueing Theory. Modelling and Analysis in Applications. Birkhäuser Basel, 2015. doi: 10.1007/978-0-8176-8421-1.
[2] David Cournapeau and contributors. scikit-learn. url: https://scikit-learn.org.
[3] Wes McKinney et al. Pandas. Python Data Analysis Library. url: https://pandas.pydata.org/.
[4] Python Software Foundation. Thread State and the Global Interpreter Lock. url: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock.
[5] Brian Quinlan. concurrent.futures — Launching parallel tasks. url: https://docs.python.org/3/library/concurrent.futures.html.
[6] Salvatore Sanfilippo and contributors. Redis. url: https://redis.io.
[7] Ask Solem and contributors. Celery: Distributed Task Queue. url: www.celeryproject.org.
[8] Thomas Wouters. GlobalInterpreterLock. url: https://wiki.python.org/moin/GlobalInterpreterLock.