CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors Meng Jiang (UIUC), Christos Faloutsos (CMU), Jiawei Han (UIUC)
2 What is Tartan? Visited CMU in 2012-13 Watched lots of Tartans’ games…
3 What is Behavior? Is it valuable? Behavior: interactions made by individuals or organisms in conjunction with themselves or their environment. (Wikipedia) v Tweeting behavior: 20:03:09 @ebekahwsm "this better be the best halftime show ever in the history of halftimes shows. ever. #SuperBowl" v Publishing-paper behavior: 2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395… Q: What can we discover from behavioral data? Ex. Given every phone call / message between the military leaders, scientists, businesspersons, Find …
4 Why Do We Talk about Behavior Today? Physical Environment / Online Environment. Human behaviors are now recorded broadly and deeply, at an unprecedented level. For the first time, we can gain insights into human behavior and society from large-scale real data.
5 Representing and Summarizing Behavior v Representing: raw data to math v Summarizing: patterns (trends, events, campaigns…) v Understanding: factors underlying the patterns (influence, intentions…) v Predicting: what will happen in the future? v Intervening: recommendation, spam/fraud detection…
6 Given the behavioral data (e.g., DBLP data, tweets) 2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395… Return behavioral summaries (e.g., research trends, events)
7 Behaviors: Dynamic and Multicontextual v Tweeting behavior: 20:03:09 @ebekahwsm "this better be the best halftime show ever in the history of halftimes shows. ever. #SuperBowl" [Figure: the tweet annotated with its contextual factors t, u, h, p, p, p. Labels in the figure: Dynamic; One-guaranteed value; Set value; Empty (set of) value (twice)]
8 Behaviors: Dynamic and Multicontextual v Publishing-paper behavior: 2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395… [Figure: the paper record annotated with its contextual factors t, a, a, a, v, c, c, p, p, p. Labels in the figure: Dynamic; One-guaranteed value; Set value (three times)]
9 Summarizing Behaviors v Dynamic: taking a set of consecutive time slices v Multicontextual: taking a set of dimensions and a set of dimensional values in each dimension
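The two ingredients on this slide can be sketched as a small data structure. This is only an illustrative sketch: the class name `Tartan`, its fields, and the method names are hypothetical, and time slices are assumed to be integers so that "consecutive" can be checked by differences of 1.

```python
from dataclasses import dataclass


@dataclass
class Tartan:
    """Hypothetical container for a behavioral summary (names are assumptions)."""
    time_slices: list   # dynamic part: a set of consecutive time slices
    values: dict        # multicontextual part: dimension -> set of dimensional values

    def has_consecutive_slices(self):
        # True when the sorted time slices form an unbroken run (step of 1).
        s = sorted(self.time_slices)
        return all(b - a == 1 for a, b in zip(s, s[1:]))

    def covers(self, time, dimension, value):
        # True when (time, dimension, value) falls inside the summary.
        return time in self.time_slices and value in self.values.get(dimension, ())
```

For instance, `Tartan([2009, 2010, 2011], {"author": {"a1"}, "venue": {"v1"}})` covers the entry `(2010, "author", "a1")` but not any entry in a dimension it does not select.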
10 Tensor Fails v Tensor: modeling multidimensions: FEMA (KDD’14), CrossSpot (ICDM’15) v Representation (multicontextual) v Empty values? [Figure: an example entry with values ∅, http://XXX.YYY, ∅, Bob, ∅]
11 Tensor Fails (cont.) v Tensor: modeling multidimensions: FEMA (KDD’14), CrossSpot (ICDM’15) v Summarization (dynamic) v Temporal patterns? [Figure: example entries for http://XXX.YYY and Bob at time slices t10, t11, t24, t25, t26, t31, t37]
12 Our Representations for Behavior and Behavioral Summary v Behavior: “Two-level matrix” v Behavioral summary: “Tartan”
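A minimal sketch of how a "two-level matrix" might be built from raw records, following the layout shown later in the deck (first-level columns are dimensions, second-level columns are dimensional values, rows are behaviors grouped by time slice). The record format, the function name `build_two_level_matrix`, and the toy dimension names are assumptions made for illustration.

```python
def build_two_level_matrix(behaviors, dimensions):
    """Return (row_labels, columns, matrix) for a list of behavior records.

    Each record is assumed to be a dict with "time", "id", and one set of
    values per dimension; an empty set models an empty value.
    """
    # Second-level columns: every (dimension, value) pair that occurs,
    # grouped under its first-level dimension.
    columns = [
        (d, v)
        for d in dimensions
        for v in sorted({v for b in behaviors for v in b[d]})
    ]
    # Rows: one per behavior, ordered by time slice and then by id.
    rows = sorted(behaviors, key=lambda b: (b["time"], b["id"]))
    matrix = [[1 if v in b[d] else 0 for (d, v) in columns] for b in rows]
    row_labels = [(b["time"], b["id"]) for b in rows]
    return row_labels, columns, matrix


# Toy data (invented for this example).
papers = [
    {"time": 1, "id": "p1", "author": {"a1", "a2"}, "venue": {"v1"}},
    {"time": 1, "id": "p2", "author": {"a1"}, "venue": {"v1"}},
    {"time": 2, "id": "p3", "author": {"a2"}, "venue": {"v2"}},
    {"time": 2, "id": "p4", "author": set(), "venue": {"v2"}},
]
row_labels, columns, matrix = build_two_level_matrix(papers, ["author", "venue"])
```

Unlike a tensor cell, a row here can hold several values in one dimension (p1 has two authors) or none at all (p4 has no author).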
13 The Problem of Behavioral Summarization
14 CatchTartan v Employing a lossless encoding scheme v The Minimum Description Length (MDL) principle v Estimating the number of bits saved by encoding the Tartan, i.e., by merging the meaningful pattern into the encoding of the data
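The "bits saved" idea can be illustrated with a generic MDL calculation on a binary block. This is not the encoding scheme from the talk: the function names, the Bernoulli cost model, and the flat `model_bits` charge for describing the pattern are all assumptions chosen to keep the example short.

```python
import math


def entropy_bits(p):
    """Binary entropy in bits; 0 when p is exactly 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bits_saved(block_ones, block_cells, background_density, model_bits):
    """Bits saved by describing a block as a pattern instead of entry by entry.

    Assumes 0 < background_density < 1 and block_cells > 0.
    """
    p = background_density
    # Cost of the block's cells when coded with the background density.
    baseline = -(block_ones * math.log2(p)
                 + (block_cells - block_ones) * math.log2(1 - p))
    # Cost when coded with the block's own density, plus the pattern itself.
    with_pattern = block_cells * entropy_bits(block_ones / block_cells) + model_bits
    return baseline - with_pattern
```

Under these assumptions a dense block in sparse data saves many bits, while a block whose density matches the background saves nothing and still pays `model_bits`, so it would not be reported.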
15 Objective Function to Maximize [Equation not recovered; its annotated terms are: Tartan, Data, First-level matrix, Individual entries]
16 Objective Function to Maximize (cont.)
17 Encoding the Tartan: Dimensions
18 Encoding the Tartan: Dimensional Values
19 Encoding the Tartan: Time Slices
20 Encoding the Tartan: Behaviors
21 Encoding the Tartan: Entries
22 Greedy Search for the Local Minimum Time complexity:
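A generic greedy local search can be sketched as follows. It is a stand-in rather than the procedure from the talk: the function name `greedy_search`, the toggle move, and the toy scoring function are assumptions. The `score` argument plays the role of the objective (for example, bits saved), and the loop stops at a local optimum where no single change improves it.

```python
def greedy_search(candidates, score, seed=()):
    """Toggle one candidate at a time while the score strictly improves."""
    current = set(seed)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for item in candidates:
            trial = current ^ {item}  # add the item if absent, remove it if present
            trial_score = score(trial)
            if trial_score > best:
                current, best, improved = trial, trial_score, True
    return current, best


# Toy objective (invented): each selected value contributes a fixed gain or loss.
weights = {"a1": 3, "a2": -1, "v1": 2}
selected, total = greedy_search(list(weights), lambda s: sum(weights[x] for x in s))
```

Because every accepted move strictly increases the score and the candidate set is finite, the loop always terminates.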
23 Qualitative Analysis: DBLP data
24 Qualitative Analysis: Super Bowl 2013
25 Quantitative Analysis: Accuracy and Efficiency in Synthetic Experiments v Tartan distribution v Data distribution
27 Summary v Novel representations v Behavior: “two-level matrix” vs. tensor v Behavioral summary: “Tartan” vs. dense block v A new summarization algorithm v Principled-scoring and Parameter-free: Objective function based on Minimum Description Length v Scalable: Greedy search for local optimum v Effectiveness, discovery and efficiency
28 THANK YOU! CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors www.meng-jiang.com
29 The Distributions in Real Data
30 Qualitative Analysis: Grammy 2013
31 Convergence v Synthetic test v Real-data test
32 Efficiency and Tartan Distributions
33 Things Related with “Two-Level” Matrix v Time-evolving heterogeneous networks v Bipartite one-to-many graph [Figure: papers p1, p2 (time slice t1) and p3, p4 (time slice t2) linked to authors a1, a2 and venues v1, v2; the same data as a two-level matrix with first-level columns Author and Venue, second-level columns a1, a2, v1, v2, and one row per paper]
34 Things Related with “Two-Level” Matrix (cont.) v Time-evolving heterogeneous networks v Relationships [Figure: authors a1, a2, papers p1 to p4, and venues v1, v2 across time slices t1 and t2; the same data as a two-level matrix with first-level columns Author, Paper, Venue, second-level columns a1, a2, p1, p2, v1, v2, and one row per relationship]
35 Things Related with “Two-Level” Matrix (cont.) v The Meta-Path similarity metric v Author – Paper – Author v Author – Paper – Venue – Paper – Author [Figure: two-level matrices with first-level columns Author and Venue, second-level columns a1, a2, v1, v2, and rows for papers p1 to p4 grouped by time slices t1 and t2]