A Graphical representation for identifier structure in application - PowerPoint PPT Presentation

UC Berkeley A Graphical representation for identifier structure in application logs Ari Rabkin, Wei Xu, Avani Wildani, Armando Fox, David Patterson and Randy Katz SLAML October 3, 2010

Motivation & Summary • Log analysis is fundamentally constrained by the information content of the underlying logs • Developers need tools to help them spot flaws in their logging • We propose a compact graph-based representation for log structure • Differs from previous work in analyzing logging behavior, not logs of particular executions

Focus on identifiers • We focus on identifiers in logs – Variable fields that refer to entities in a system. – Can be operationally defined as variable fields with increasingly many possible strings [Xu 09] • Previous work has modeled logs as sets of concurrent state machines. [Fu 09, Tan 08] – Identifiers tie together messages that correlate to the same state machine

Some defects • Imagine a transaction processing system. 3:45 Starting transaction t123   3:46 Transaction failed   3:50 Starting transaction t123   3:51 Finished trans that was started at 3:50.

Missing IDs • Imagine a transaction processing system. 3:45 Starting transaction t123   3:46 Transaction failed [No ID]   3:50 Starting transaction t123   3:51 Finished trans that was started at 3:50.

Inconsistent IDs • Imagine a transaction processing system. 3:45 Starting transaction t123   3:46 Transaction failed   3:50 Starting transaction t123   3:51 Finished trans that was started at 3:50. [Inconsistent identification]

Ambiguous IDs • Imagine a transaction processing system. 3:45 Starting transaction t123   3:46 Transaction failed   3:50 Starting transaction t123   3:51 Finished trans that was started at 3:50. [Ambiguous identification]

Goals • Seek a compact representation for logs • Make common logging flaws visible • Facilitate comparison across related logs • Not depend on details of particular execution traces

A real example [Figure: identifier graph for Hadoop datanode logs from the Yahoo! M45 cluster]

Definitions • A log message is a string. • Each log message is associated with a specific message type. • All messages of a type are structurally identical (same set of identifier fields). • Identifiers belong to identifier classes.

Assumptions • Have a representative sample of logs • Can find the message type from a message • Can extract identifiers from messages • Have an identifier class for each identifier field in a message type

Core structure • Ex: Starting task t123 on node n [Figure: message node "Starting task…" linked to identifier nodes "Task ID" and "Host name"] • Formally: a graph with V = {identifier classes} ∪ {message types}, E = {(i, m) | message type m includes an identifier of class i}
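The formal definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' prototype: the input format (pairs of message type and a dict of identifier class to value) and the name build_graph are assumptions made for the example.

```python
# Hypothetical parsed input: (message_type, {identifier_class: value}) pairs.
parsed = [
    ("Starting task", {"Task ID": "t123", "Host name": "n1"}),
    ("Starting task", {"Task ID": "t124", "Host name": "n2"}),
    ("Task finished", {"Task ID": "t123"}),
]

def build_graph(records):
    """Return (id_classes, msg_types, edges) for the identifier graph.

    V is the union of id_classes and msg_types; E contains (i, m)
    whenever message type m includes an identifier of class i.
    """
    id_classes, msg_types, edges = set(), set(), set()
    for msg_type, ids in records:
        msg_types.add(msg_type)
        for id_class in ids:
            id_classes.add(id_class)
            edges.add((id_class, msg_type))
    return id_classes, msg_types, edges

id_classes, msg_types, edges = build_graph(parsed)
```

The graph depends only on which identifier classes occur in which message types, not on message order or timing.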

Subsumption • Sometimes, one identifier includes another. • Model this by adding a graph edge between two identifiers if one includes another. • Call this subsumption – E.g., URLs subsume host names [Figure: Host name linked to URL]
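The slides do not say how subsumption is detected. One possible heuristic, shown here purely as an assumption, is a substring test: class A subsumes class B if every observed value of B occurs inside some value of A. The function name and sample values are hypothetical.

```python
def subsumption_edges(values_by_class):
    """Return (outer, inner) pairs where outer appears to subsume inner.

    Assumed heuristic: every observed inner value is a proper
    substring of at least one observed outer value.
    """
    edges = set()
    for outer, outer_vals in values_by_class.items():
        for inner, inner_vals in values_by_class.items():
            if outer == inner or not inner_vals:
                continue
            if all(any(iv in ov and iv != ov for ov in outer_vals)
                   for iv in inner_vals):
                edges.add((outer, inner))
    return edges

values = {
    "URL": {"http://n1.example.com/logs", "http://n2.example.com/"},
    "Host name": {"n1.example.com", "n2.example.com"},
}
sub_edges = subsumption_edges(values)
```

A real log sample would likely need a looser threshold than "every value", since some host names may never appear inside a URL.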

Frequency • Can encode frequency information on the diagram [Figure legend: Rare, Medium, Common] • Scaled relative to the most frequent message or identifier • γ-correction: scale by sqrt(freq / max(freq))
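As a worked example of the scaling rule, the sketch below applies sqrt(freq / max(freq)) to some made-up message counts. The counts and names are illustrative assumptions only.

```python
import math

def scaled_weight(freq, max_freq):
    """Square-root scaling relative to the most frequent item."""
    return math.sqrt(freq / max_freq)

# Hypothetical message-type counts.
counts = {"Starting task": 400, "Task finished": 100, "Abnormal failure": 4}
max_count = max(counts.values())
weights = {m: scaled_weight(c, max_count) for m, c in counts.items()}
```

With these counts, a message seen 100 times less often than the most common one still gets a weight of about 0.1 rather than 0.01, so rare messages remain visible on the diagram.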

Ubiquity • Can show information about joint ID-message statistics • Want to distinguish normal from abnormal messages • Definition: The ubiquity of identifier class C for message type T is the fraction of identifiers belonging to class C that appear in messages of type T. • Orthogonal to frequency of message
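A minimal sketch of the ubiquity computation follows. It assumes one reading of the definition: count distinct identifier values of class C across the whole sample, and take the fraction that appear at least once in a message of type T. The record format and function name are assumptions for illustration.

```python
def ubiquity(records, id_class, msg_type):
    """Fraction of distinct id_class values seen in messages of msg_type."""
    all_ids, seen_in_type = set(), set()
    for mtype, ids in records:
        value = ids.get(id_class)
        if value is None:
            continue
        all_ids.add(value)
        if mtype == msg_type:
            seen_in_type.add(value)
    return len(seen_in_type) / len(all_ids) if all_ids else 0.0

# Hypothetical sample: four tasks start, one fails abnormally.
records = [
    ("Starting task", {"Task ID": "t1"}),
    ("Starting task", {"Task ID": "t2"}),
    ("Starting task", {"Task ID": "t3"}),
    ("Starting task", {"Task ID": "t4"}),
    ("Abnormal failure", {"Task ID": "t3"}),
]
u_start = ubiquity(records, "Task ID", "Starting task")
u_fail = ubiquity(records, "Task ID", "Abnormal failure")
```

Here every task ID appears in a "Starting task" message, while only one in four appears in "Abnormal failure", which is what marks the latter as a non-routine message.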

Drawing ubiquity • Line thickness proportional to ubiquity [Figure: Task ID linked to "Starting task…" and "Abnormal failure"]

Diagramming defects • Missing ID: [Figure: Message 1, Message 2] • Inconsistent IDs: [Figure: Message 1, Message 2, ID 1, ID 2]
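One way to surface the missing-ID defect programmatically, assuming it shows up as a message-type node with no identifier edge, is sketched below. The message types and identifier classes are hypothetical, loosely modeled on the transaction example earlier in the talk.

```python
def message_types_without_ids(msg_types, edges):
    """Return message types that have no edge to any identifier class."""
    linked = {m for _, m in edges}
    return sorted(set(msg_types) - linked)

# Hypothetical graph for the transaction example.
msg_types = ["Starting transaction", "Transaction failed", "Finished trans"]
edges = {
    ("Transaction ID", "Starting transaction"),
    ("Start time", "Finished trans"),  # identified by start time, not by ID
}
missing = message_types_without_ids(msg_types, edges)
```

In this sample, "Starting transaction" and "Finished trans" attach to different identifier classes and share none, which is the pattern the slide labels inconsistent IDs.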

Our prototype • Have a prototype that converts logs into .dot files for rendering with GraphViz • Pluggable parsers • Omit message strings; output alongside
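A minimal sketch of emitting a .dot file for such a graph is shown below. It is not the prototype's code: the function name, node shapes, and the use of readable names as node labels are assumptions (the slide notes that the prototype omits message strings from the diagram).

```python
def to_dot(id_classes, msg_types, edges):
    """Render an undirected identifier graph as GraphViz DOT text.

    Assumes names contain no double quotes and that no identifier
    class shares a name with a message type.
    """
    lines = ["graph logstructure {"]
    for c in sorted(id_classes):
        lines.append(f'  "{c}" [shape=ellipse];')  # identifier classes
    for m in sorted(msg_types):
        lines.append(f'  "{m}" [shape=box];')      # message types
    for c, m in sorted(edges):
        lines.append(f'  "{c}" -- "{m}";')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(
    {"Task ID", "Host name"},
    {"Starting task"},
    {("Task ID", "Starting task"), ("Host name", "Starting task")},
)
```

Saving dot_text to a file and running GraphViz's dot command on it (for example with -Tpng) should produce a rendering.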

A real example, part 2 [Figure: identifier graph for Hadoop datanode logs from the Yahoo! M45 cluster]

Inconsistent identifiers [Figure: two identifier graphs, labeled Old and New] Logs from Chukwa, an open-source log collection system [Boulon 08, Rabkin 10]

Ambiguous identifiers Logs from SCADS, an experimental system at Berkeley

Comparing logs • Comparing Hadoop JobTracker logs [Figure: two identifier graphs, one from a 15-node cluster at Berkeley and one from the M45 cluster (professional management); callout: Missing ID/message]

Conclusions • Aspects of log structure can be encoded in succinct diagrams. • Our choice of representation captures: – missing identifiers, inconsistent identifiers, and ambiguous identifiers – How much detail about different topics – Ratio of routine vs peculiar messages + types • Usable on real systems, even with limited understanding of system and logs • No need for temporal information

Questions?

A note on parsing • I used semi-hand-written parsers. • Wrote rules to tag identifiers: – e.g., "job_..." is a job ID • Tokenized lines, identified line by token sequence + constants – Special cases for numbers • Explored using program analysis to extract messages – Came out ugly, but cleanable. – Need to fix names – Need to merge some categories
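The rule-based approach described above might look roughly like the sketch below. Only the "job_..." rule comes from the slide; the task-ID pattern, the <NUM> placeholder, and the function name are hypothetical, and the real parsers were semi-hand-written per system.

```python
import re

# Tagging rules: (identifier class, pattern). The "job_" rule follows the
# slide; the task-ID rule is a made-up example.
ID_RULES = [
    ("Job ID", re.compile(r"^job_\w+$")),
    ("Task ID", re.compile(r"^t\d+$")),
]
NUMBER = re.compile(r"^\d+(\.\d+)?$")  # special case for plain numbers

def parse_line(line):
    """Return (message_type, identifiers) for one log line.

    The message type is the token sequence with identifiers and numbers
    replaced by placeholders, leaving only the constant tokens. If a class
    occurs twice in a line, only the last value is kept.
    """
    template, ids = [], {}
    for token in line.split():
        for id_class, pattern in ID_RULES:
            if pattern.match(token):
                ids[id_class] = token
                template.append(f"<{id_class}>")
                break
        else:
            template.append("<NUM>" if NUMBER.match(token) else token)
    return " ".join(template), ids

msg_type, ids = parse_line("Starting job job_2010_0001 with 12 tasks")
```

Two lines that differ only in their identifier values or numbers map to the same message type, which is the property the graph construction relies on.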

Related work • Xu 09 • State machines • Entropy as metric?
