Learn more from your logfiles using machine learning

[DEV1156] Adam.Spiers@suse.com, Dirk.Mueller@suse.com. CC BY-NC 2.0 Thomas Hawk


  1. Learn more from your logfiles using machine learning [DEV1156] Adam.Spiers@suse.com, Dirk.Mueller@suse.com. CC BY-NC 2.0 Thomas Hawk

  2. We are SUSE OpenStack Cloud software engineers

  3. We love green CI

  4. We care about upstream OpenStack CI too

  5. OpenStack Health

  6. ?

  7. Did you find it?

  8. Manual Process

  9. Idea: Reducing scrolling by pattern matching

         warning /(?i)warning/
         error   /Traceback \(most recent call last\)/
         error   /(?i)error/
         error   /(?i)\bfail(ure|ed)?\b/
         error   /(?i)fatal/
         error   /$h1!!/
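The pattern list on this slide can be tried out with a few lines of Python. This is only a minimal sketch: the `(severity, pattern)` tuple layout, the `classify_line` and `scan` names, and the "highest severity wins" policy are illustrative assumptions, not the format or behaviour of the tool shown in the talk. The last rule from the slide is left out because its exact form is unclear in the transcript.

```python
import re

# Rules transcribed from the slide. The tuple layout is an assumption
# made for this sketch, not the original rule-file format.
RULES = [
    ("warning", re.compile(r"(?i)warning")),
    ("error", re.compile(r"Traceback \(most recent call last\)")),
    ("error", re.compile(r"(?i)error")),
    ("error", re.compile(r"(?i)\bfail(ure|ed)?\b")),
    ("error", re.compile(r"(?i)fatal")),
]

# Assumed policy: when several rules match, report the most severe one.
SEVERITY_RANK = {"warning": 1, "error": 2}


def classify_line(line):
    """Return "error", "warning", or None if no rule matches."""
    best = None
    for severity, pattern in RULES:
        if pattern.search(line):
            if best is None or SEVERITY_RANK[severity] > SEVERITY_RANK[best]:
                best = severity
    return best


def scan(lines):
    """Return (line_number, severity, text) for every line worth reading."""
    hits = []
    for number, text in enumerate(lines, start=1):
        severity = classify_line(text)
        if severity is not None:
            hits.append((number, severity, text))
    return hits
```

Calling `scan(open(path))` on a log then yields only the lines a human would otherwise have to scroll to.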

  10. Dealing with false positives

         # Successful tempest run
         ok /^ - (Expected Fail|Failed): 0$/
         ok /Warning: Turning on '--gpg-auto-import-keys'/
         ok /Warning: Permanently added .* to the list of known hosts/
         ok /WARNING: Device for PV .* not found or rejected by a filter/
         ok /WARNING: \w+ signature detected on .* offset \d+. Wipe it?/
         ok /grep -v failed\b/
         # rpms containing "Error"
         ok /perl-Error[ -]|libsamba-errors|mariadb-errormessages/
         # https://bugzilla.suse.com/show_bug.cgi?id=1030822
         warning /Cleaning up (vip-admin-\S+) on \S+, removing fail-count-\1/
         # https://bugzilla.suse.com/show_bug.cgi?id=971832
         ok /Failed to try-restart vsftpd@.service: Unit name vsftpd@.service is not va
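One way to read the rules above is as an ordered list in which known-harmless patterns are checked before the generic ones, so the first match decides. The sketch below assumes exactly that "first match wins" ordering; the talk does not spell out the precedence, and the names `SPECIFIC_RULES`, `GENERIC_RULES` and `classify_line` are hypothetical. The sample log lines in the usage note are made up for illustration.

```python
import re

# Known false positives, checked first (a subset of the slide's rules).
SPECIFIC_RULES = [
    ("ok", re.compile(r"Warning: Permanently added .* to the list of known hosts")),
    ("ok", re.compile(r"perl-Error[ -]|libsamba-errors|mariadb-errormessages")),
    # Downgrade to a warning; \1 must repeat the resource name captured earlier.
    ("warning", re.compile(
        r"Cleaning up (vip-admin-\S+) on \S+, removing fail-count-\1")),
]

# Generic catch-all patterns, checked last.
GENERIC_RULES = [
    ("error", re.compile(r"(?i)error")),
    ("error", re.compile(r"(?i)\bfail(ure|ed)?\b")),
    ("warning", re.compile(r"(?i)warning")),
]


def classify_line(line):
    """Return the verdict of the first matching rule, or "ok" if none match."""
    for verdict, pattern in SPECIFIC_RULES + GENERIC_RULES:
        if pattern.search(line):
            return verdict
    return "ok"
```

With this ordering, a made-up line such as `Installing perl-Error-0.17` is reported as `ok` even though it contains "Error", while `ERROR: something broke` still surfaces as an error. The downside the talk goes on to address is visible here too: every new false positive needs another hand-written rule.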

  11. Vision: Machine Learning

  12. Log-Classify

  13. Today's plan
      • Intro to Machine Learning
      • Log-Classify Implementation
      • Demo

  14. AI vs ML vs DL

  15. Why Machine Learning? (word cloud of machine learning terms)

  16. CI Logfiles: ML Challenges
      • Each instance of a CI logfile executes the same steps
        – Install, Build, Test
        – Result is recorded (success, failures)
      • The individual logfiles are quickly evolving
        – Every check-in changes them 😑
      • Each run has a lot of completely unique noise 😓
        – Timestamps, UUIDs, passwords
        – Ordering due to parallel execution

  17. Learning model Variations
      Instance-based
      • Directly stores training instances
      • Derives hypotheses directly from training instances
      • Model can quickly react to new training input
      • Model can be incrementally updated, discarding old training input
      • Example: k-Nearest-Neighbor
      Generalizing
      • Abstracts a model from training data
      • Requires a much longer training phase
      • Cannot "untrain" previously learned data
      • Example: Artificial Neural Networks (DL)

  18. Overfitting / Underfitting

  19. Machine Learning Variations
      Supervised
      • Classification: Naive Bayes, NearestNeighbor, Support Vector Machines (SVM), Neural Networks, ...
      • Regression: Decision Trees, Linear Regression, Neural Networks, ...
      Unsupervised
      • Clustering: K-Means, Hidden Markov Model, Neural Networks, ...

  20. Supervised Learning: Classification (figure labels: Banana, Banana)

  21. Using machine learning for CI log files

  22. Machine Learning Workflow
      • Build: an individual CI log file
      • Baseline: collection of log files from good CI runs
      • Target: the logfile of the failed CI run to be analyzed

  23. log-classify: Analogy using pictures

  24. Generic Training Workflow

  25. Generic Testing Workflow

  26. Generic Testing Workflow

  27. Log Input transformation example
      • Splitting by lines

            Mar 11 02:43:28 localhost sudo[5195]: pam_unix(sudo:session): session opened for user root by (uid=5)

      • Tokenization

            DATE localhost sudo pam_unix sudo session session opened for user root uid

      • Hashing

            hash(DATE) hash(localhost) hash(sudo) hash(pam_unix) hash(sudo) hash(session) hash(opened) ...

      • Transformation

            [0, ...., 0, 1, 0, ..., 0, 1, 0, ...]
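The four steps on this slide (split, tokenize, hash, transform) can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the 32-slot vector size, the use of CRC32 as the hash, the timestamp regex, and the rule of keeping only alphabetic tokens of three or more characters (inferred from the slide's token list, where `by` and the numbers are missing). A real setup would more likely use a library vectorizer such as scikit-learn's `HashingVectorizer`.

```python
import re
import zlib

N_FEATURES = 32  # assumption: tiny on purpose so the vector is readable

# Assumed syslog-style timestamp at the start of the line, e.g. "Mar 11 02:43:28".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")


def tokenize(line):
    """Replace the timestamp with DATE, then keep words of 3+ letters."""
    line = TIMESTAMP.sub("DATE", line)
    return re.findall(r"[A-Za-z_]{3,}", line)


def hash_token(token):
    """Map a token to a slot index (the "hashing trick")."""
    return zlib.crc32(token.encode("utf-8")) % N_FEATURES


def vectorize(line):
    """Turn one log line into a fixed-length 0/1 vector."""
    vector = [0] * N_FEATURES
    for token in tokenize(line):
        vector[hash_token(token)] = 1
    return vector


line = ("Mar 11 02:43:28 localhost sudo[5195]: pam_unix(sudo:session): "
        "session opened for user root by (uid=5)")
tokens = tokenize(line)
vector = vectorize(line)
```

Because the timestamp and the PID never reach the vector, two runs of the same command at different times map to the same point, which is what makes lines comparable across CI runs.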

  28. Input transformation: Replace irrelevant pieces with fixed strings

         Raw text                                       Token
         months/days/date                               DATE
         UUIDs                                          RNGU
         IPv4 or IPv6 addresses                         RNGI
         words that are exactly 32, 64 or 128 chars     RNGN
         numbers of at least 3 digits                   RNGD
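The replacement table above might be implemented as an ordered list of regex substitutions. The sketch below is an approximation under stated assumptions: the regular expressions are simplified guesses (three-letter month and day abbreviations, ISO dates, and only the full eight-group IPv6 form), the order of application is my own choice so that UUIDs and addresses are replaced before the generic number rule can break them up, and `normalize` is a hypothetical name.

```python
import re

HEX = r"[0-9a-fA-F]"

# (pattern, token) pairs, applied top to bottom. Patterns are simplified
# assumptions for illustration, not the ones used by log-classify.
REPLACEMENTS = [
    (re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
                r"|Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b"), "DATE"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "DATE"),
    (re.compile(rf"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}"), "RNGU"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "RNGI"),
    (re.compile(rf"\b(?:{HEX}{{1,4}}:){{7}}{HEX}{{1,4}}\b"), "RNGI"),
    (re.compile(r"\b(?:\w{128}|\w{64}|\w{32})\b"), "RNGN"),
    (re.compile(r"\b\d{3,}\b"), "RNGD"),
]


def normalize(line):
    """Replace run-specific noise with fixed placeholder tokens."""
    for pattern, token in REPLACEMENTS:
        line = pattern.sub(token, line)
    return line
```

For a made-up input such as `Mar 11 02:43:28 node 10.1.2.3 req-3f2b8c1e-9d4a-4c6e-8b1f-0a1b2c3d4e5f took 12345 ms`, this yields `DATE 11 02:43:28 node RNGI req-RNGU took RNGD ms`; the two-digit time fields survive because the number rule only fires at three digits or more.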

  29. Example matrix of a CI logfile

  30. k-Nearest Neighbors (k=1)

  31. Example distance calculation in kNeighbors queries
      • VARIABLE IS NOT DEFINED is not part of the baseline
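The idea behind the k=1 query can be illustrated without any ML library: vectorize every baseline line, then for each target line report the distance to its closest baseline line. This is a sketch under assumptions, not the talk's implementation: it uses word counts and cosine distance, a linear search, a hypothetical threshold of 0.2, and made-up baseline lines. It also assumes a non-empty baseline. scikit-learn's `NearestNeighbors` provides the same kind of query for real workloads.

```python
import math
import re
from collections import Counter


def vectorize(line):
    """Bag-of-words vector: lower-cased words of 3+ letters with counts."""
    return Counter(re.findall(r"[a-z_]{3,}", line.lower()))


def cosine_distance(a, b):
    """0.0 for identical direction, 1.0 when no word is shared."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_distance(baseline_vectors, vector):
    """k=1 nearest neighbor by linear search over the baseline."""
    return min(cosine_distance(vector, b) for b in baseline_vectors)


def anomalies(baseline_lines, target_lines, threshold=0.2):
    """Return (distance, line) for target lines far from every baseline line."""
    baseline_vectors = [vectorize(line) for line in baseline_lines]
    found = []
    for line in target_lines:
        distance = nearest_distance(baseline_vectors, vectorize(line))
        if distance > threshold:
            found.append((distance, line))
    return found


baseline = ["TASK install packages ok", "TASK start service ok"]
target = ["TASK install packages ok", "fatal: VARIABLE IS NOT DEFINED"]
report = anomalies(baseline, target)
```

The line that also occurs in the baseline has a distance of about zero and is dropped, while `VARIABLE IS NOT DEFINED` shares no word with any baseline line and is reported at the maximum distance.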

  32. Limitations
      • Nearest Neighbor performs a linear search in the model
      • Complexity grows linearly with sample size
      • Unfiltered noise may distract from important information
      • Logs containing too many features

  33. Unique vectors over training set instances

  34. Lookup time per sample size

  35. Introducing log-classify

  36. Log-classify
      • Python 3
      • scikit: http://scikit-learn.org/
        – not yet: https://www.tensorflow.org/
        – https://github.com/facebookresearch/pysparnn
      • Multiple Text Extraction Models
      • Assumes text, line-based, log-like input

  37. scikit-learn

  38. log-classify: Installation
      • openSUSE Leap/Tumbleweed/SLE 15 SUSE Package Hub:

            $ zypper install python3-logreduce

      • Others install from PyPI:

            $ pip3 install --user logreduce

      • NOTE:
        – log-classify is the new name
        – Rename from logreduce hasn't been completed yet
