A Semantic-aware Representation Framework for Online Log Analysis
Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang, Federico Zaiter, Bingjin Chen, Dan Pei
2020/8/28
Outline
1. Background
2. Design
3. Evaluation
4. Summary
Background
Internet Services
■ Growing rapidly
■ Various types of services
■ Stability is important
Logs
■ Monitoring data: logs, traffic, PV (page views), etc.
■ Logs are one of the most valuable data sources for service management
■ General: every service generates logs
■ Diverse: logs record a vast range of runtime information (7*24)
Logs
■ Logs are unstructured text
■ designed by developers
■ printed by logging statements (e.g., printf())

L1. Interface ae3, changed state to down
L2. Interface ae3, changed state to up
L3. Interface ae1, changed status to down
L4. Interface ae1, changed status to up
L5. Vlan-interface vlan22, changed state to down
L6. Vlan-interface vlan22, changed state to up

Logs are similar to natural language
Manual inspection of logs
■ Manual inspection of logs is infeasible
■ A large-scale service is often implemented/maintained by hundreds of developers/operators
■ The volume of logs is growing rapidly
■ The traditional way is labor-intensive and time-consuming
→ Automatic log analysis
Automatic log analysis
■ Automatic log analysis approaches, which are employed for service management, have been widely studied:
■ Failure prediction [SIGMETRICS'18]
■ Anomaly detection [CCS'17]
■ Monitoring [INFOCOM'19]
■ Problem identification [FSE'18]
Log representation
■ Most automatic log analysis approaches require structured input, but logs are unstructured text
■ Log representation therefore serves as the first step of automatic log analysis
■ Existing representations lose semantic information:
■ Template index
■ Template count vector
→ A semantic-aware log representation approach is needed
Challenges
1. Domain-specific semantic information
• Logs contain lots of domain-specific words
2. Out-of-vocabulary (OOV) words
• The vocabulary grows continuously because the service can be upgraded to add new features and fix bugs
Idea
■ Logs are designed by developers and "printf"-ed by services
■ The original goal of logs: "logs are for users to read"
■ Thus the intuitions and methods of NLP can be applied to log representation → Log2Vec
Design
Overview of Log2Vec
Offline stage:
1. Log-specific word embedding: trained on historical logs using synonyms & antonyms and relation triples, producing a vocabulary and word vectors
2. Out-of-vocabulary (OOV) word processor: built on the vocabulary and word vectors
Online stage:
3. Log vector generation: converts real-time logs into log vectors
Open source toolkit: https://github.com/WeibinMeng/Log2Vec
Log-specific semantics
■ When embedding the words of logs, we should consider additional information:
■ Antonyms
■ Synonyms
■ Relation triples
■ Others (future work)
■ Traditional word embedding methods (e.g., word2vec) assume that words with a similar context tend to have a similar meaning, and thus fail to capture log-specific meanings
Prepare log-specific information
■ Automatically extract:
■ Antonyms & synonyms: searched from WordNet [1], a lexical database for English
■ Triples: extracted from dependency trees [2], e.g., (interface, changed, state)
■ Manually modify:

Relations | Word pairs             | Adding method
Synonyms  | (interface, port)      | Operators
Antonyms  | (DOWN, UP)             | WordNet
Antonyms  | (powerDown, powerUp)   | Operators

[1] Fellbaum C. WordNet. The Encyclopedia of Applied Linguistics, 2012.
[2] Culotta A, Sorensen J. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 2004: 423-429.
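The preparation step above can be sketched as follows. This is a minimal illustration, not Log2Vec's actual code: `WORDNET_ANTS` stands in for real WordNet lookups, and `OPERATOR_SYNS`/`OPERATOR_ANTS` stand in for the operators' manual additions.

```python
# Minimal sketch of collecting synonym/antonym pairs for log-specific
# embedding. WORDNET_ANTS is a stand-in for real WordNet lookups;
# OPERATOR_SYNS/OPERATOR_ANTS are stand-ins for manual additions.
WORDNET_ANTS = {("down", "up"), ("add", "remove")}
OPERATOR_SYNS = {("interface", "port")}
OPERATOR_ANTS = {("powerdown", "powerup")}

def collect_pairs(vocab):
    """Keep only pairs whose words both occur in the log vocabulary."""
    syns = {p for p in OPERATOR_SYNS if set(p) <= vocab}
    ants = {p for p in WORDNET_ANTS | OPERATOR_ANTS if set(p) <= vocab}
    return syns, ants

vocab = {"interface", "port", "down", "up", "powerdown", "powerup"}
syns, ants = collect_pairs(vocab)
```

Filtering against the log vocabulary matters because WordNet covers general English, while only pairs that actually appear in logs are useful as embedding constraints.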
Log-specific word embedding
■ Log-specific word embedding combines two existing methods, both of which share embeddings with CBOW (a model of word2vec):
■ Lexical-information word embedding (LWE) [1] → antonyms & synonyms
■ Semantic word embedding (SWE) [2] → relation triples

[1] Luchen Tan, Haotian Zhang, Charles Clarke, and Mark Smucker. Lexical comparison between Wikipedia and Twitter corpora by using word embeddings. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 657-661, 2015.
[2] Quan Liu, Hui Jiang, Si Wei, Zhen-Hua Ling, and Yu Hu. Learning semantic word embeddings based on ordinal knowledge constraints. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1501-1511, 2015.
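To give a flavor of how lexical constraints enter an embedding objective, here is a simplified sketch. It is an assumption for illustration, not the exact LWE/SWE objectives: on top of the CBOW loss, a penalty pulls synonym vectors together and pushes antonym vectors at least a margin apart.

```python
import numpy as np

# Simplified lexical-constraint penalty (illustrative assumption, not
# the papers' exact objectives): added to the CBOW loss, it pulls
# synonyms together and pushes antonyms at least `margin` apart.
def constraint_penalty(emb, syns, ants, margin=1.0):
    loss = 0.0
    for a, b in syns:
        loss += np.sum((emb[a] - emb[b]) ** 2)   # synonyms should be close
    for a, b in ants:
        dist = np.sum((emb[a] - emb[b]) ** 2)
        loss += max(0.0, margin - dist)          # antonyms should be far apart
    return loss

emb = {
    "interface": np.array([0.0, 1.0]),
    "port": np.array([0.0, 1.0]),
    "up": np.array([1.0, 0.0]),
    "down": np.array([1.0, 0.0]),
}
penalty = constraint_penalty(emb, [("interface", "port")], [("up", "down")])
```

Here "up" and "down" coincide, so the antonym hinge contributes the full margin of 1.0, which is exactly the situation a plain word2vec model tends to produce: antonyms share contexts and end up with near-identical vectors.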
OOV processor
■ We adopt MIMICK [3] to handle OOV words at runtime
■ It learns a function from a word's spelling to its distributional embedding

[3] Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. Mimicking word embeddings using subword RNNs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 102-112, 2017.
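The spelling-to-embedding idea can be illustrated with a much simpler stand-in than MIMICK's character RNN: hash character trigrams into a fixed-size vector, so similarly spelled words get similar embeddings. This is an assumption for illustration only; the dimension of 50 is arbitrary and none of this is MIMICK's actual model.

```python
import zlib

import numpy as np

# Simplified stand-in for MIMICK's idea (not its actual model): derive
# an OOV embedding from the word's spelling by hashing character
# trigrams into a fixed-size vector, so similarly spelled words end up
# with similar embeddings.
def spelling_embedding(word, dim=50):
    vec = np.zeros(dim)
    padded = f"<{word}>"                            # mark word boundaries
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[zlib.crc32(trigram.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

v1 = spelling_embedding("interface")
v2 = spelling_embedding("interfaces")   # shared spelling -> similar vector
v3 = spelling_embedding("vlan")         # different spelling -> dissimilar
```

MIMICK instead trains a bidirectional character RNN so that its output matches the pre-trained embeddings of in-vocabulary words, which lets it generalize to unseen spellings at runtime.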
Log vector generation (Online stage)
1. Determine whether each word in a log is in the vocabulary
2. Convert in-vocabulary words to their word vectors
3. Assign a new embedding vector to each OOV word
4. Compute the log vector by averaging its word vectors
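The four steps above can be sketched as one small function. The names `vocab_vectors` (word-to-vector map from the offline stage) and `oov_embed` (the OOV processor) are illustrative assumptions, not Log2Vec's API.

```python
import numpy as np

# Sketch of the online log-vector step. `vocab_vectors` and `oov_embed`
# are assumed inputs (offline-stage vectors and the OOV processor);
# the names are illustrative, not Log2Vec's API.
def log_to_vector(log_line, vocab_vectors, oov_embed):
    vecs = []
    for word in log_line.lower().split():
        if word in vocab_vectors:        # steps 1-2: known word -> vector
            vecs.append(vocab_vectors[word])
        else:                            # step 3: OOV word -> new embedding
            vecs.append(oov_embed(word))
    return np.mean(vecs, axis=0)         # step 4: average the word vectors

vocab_vectors = {"interface": np.array([1.0, 0.0]), "down": np.array([0.0, 1.0])}
vec = log_to_vector("Interface ae3 down", vocab_vectors, lambda w: np.zeros(2))
```

Averaging keeps the log vector in the same space as the word vectors, so any downstream task that consumes word embeddings can consume log vectors unchanged.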
Evaluation
Experimental setting
■ Datasets:

Dataset   | Description                    | # of logs
HPC       | High performance cluster       | 433,489
HDFS      | Hadoop distributed file system | 11,175,629
ZooKeeper | ZooKeeper service              | 74,380
Hadoop    | Hadoop MapReduce job           | 394,308

■ Experimental setup: Linux server with Intel Xeon 2.40 GHz CPU
Measurement of OOV
■ To highlight the challenge of processing OOV words:
■ Generate training sets containing from 10% to 90% of the original logs and regard the remaining logs as the testing set
■ Findings (figures: measurement of logs with OOV words; measurements of OOV words): OOV words make up a large percentage when trained on a smaller sample, and more than 90% of logs always contain OOV words in Spark/Windows
→ It is important to handle OOV words
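The measurement above can be reproduced in miniature: take a prefix of the logs as the training set and count how many remaining logs contain at least one word unseen in training. This is a sketch of the methodology as described, not the paper's measurement script.

```python
# Sketch of the OOV measurement: use the first `train_fraction` of the
# logs as the training set and count how many of the remaining (test)
# logs contain at least one word unseen in training.
def oov_log_ratio(logs, train_fraction):
    split = int(len(logs) * train_fraction)
    train_vocab = {w for line in logs[:split] for w in line.split()}
    test = logs[split:]
    if not test:
        return 0.0
    with_oov = sum(
        1 for line in test
        if any(w not in train_vocab for w in line.split())
    )
    return with_oov / len(test)

# Toy example: "status" never appears in the first half of the logs.
logs = ["state down", "state up", "status down", "state up"]
ratio = oov_log_ratio(logs, 0.5)
```

Even this toy split shows the effect the slide reports: the smaller the training sample, the more test logs hit at least one unseen word.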
Evaluation of OOV processor
■ Randomly select a word in each log and change one of its letters to make it an OOV word
■ Test the similarity between the changed log and the original log

Dataset    | Spark | HDFS  | Windows | Hadoop
Similarity | 0.964 | 0.984 | 0.993   | 0.996

Average similarity when Log2Vec processes logs with OOV words (figure: distribution of logs' similarity)
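The similarity metric in the table is computed between log vectors; a cosine-similarity sketch is below. The two vectors are toy values standing in for the original and perturbed log vectors, not actual Log2Vec outputs.

```python
import numpy as np

# Sketch of the similarity check: cosine similarity between the vector
# of an original log and the vector of its one-letter-perturbed version.
# The vectors below are toy values, not actual Log2Vec outputs.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

original = np.array([0.8, 0.6])    # e.g., vector of the original log
perturbed = np.array([0.6, 0.8])   # e.g., vector of the perturbed log
sim = cosine(original, perturbed)
```

A similarity close to 1 (as in the 0.96-1.00 range reported) means the OOV processor assigns the misspelled word an embedding near the original word's, so the log vector barely moves.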
Log-based service management task
■ Online log classification
■ Baselines: LogSig, FT-tree, Spell, template2Vec
■ Split: 50% training set and 50% testing set
■ Results: the average F-score of Log2Vec is 0.944, versus 0.745 for the baselines, and Log2Vec is stable
(Figure: comparison of log classification when using 50% of logs for training)
Summary
Summary
■ Log2Vec: a semantic-aware representation framework for online log analysis
■ OOV processor: a mechanism for generating OOV word embeddings when new types of logs appear
■ Open-source toolkit: we have open-sourced Log2Vec
■ Experiments: the results show clear improvements over the baselines
Thanks
mwb16@mails.tsinghua.edu.cn
Open source toolkit: https://github.com/WeibinMeng/Log2Vec