Motivation & Objective | Temporal term association model | Query operators | Running example | Conclusions

Querying Term Associations and their Temporal Evolution in Social Data
Vassilis Plachouras, Yannis Stavrakas
IMIS / "ATHENA" R.C., Greece
August 31, 2012

Motivation
• Many applications use data from online social networks (OSNs) or microblogging services
• The data are collected by searching for terms related to the application domain
• The selection of terms can have a significant impact on the results
• It is therefore important to be able to explore the context and associations of terms

Objective
• We aim to develop a platform that enables the definition of data analysis campaigns on data from OSNs
• Example: a journalist exploring Twitter data can issue the following query concerning the financial crisis:
  "For the period during which there is a strong association between the hashtags #crisis and #protest, which other hashtags are associated with both #crisis and #protest? Which are the relevant tweets?"

Preliminaries
• Model applies to any temporally evolving collection of documents
• We focus on tweets
• Downloaded tweets are processed at regular time instances t = 1, 2, ..., i
• At time instance t = i, we process tweets downloaded between i − 1 and i
  • load tweets in relation TT with attributes tweet id, publication time and term
  • build model for tweets published between i − 1 and i

Model definition
Model M is a set of quintuples M = {⟨n, c, w, T, g⟩} where
• n and c are target and context nodes, respectively, corresponding to terms
• T is the set of time instances for which the tuple is valid
• g is the time granularity
• w is the weight of the association from n to c, with the sums ranging over the tweets tw published during T:
  $$w = P_T(n \to c) = \frac{\sum_{n, c \in tw} \frac{1}{|tw| - 1}}{\sum_{n \in tw} 1}$$
  or, for the association of a node with itself,
  $$w = P_T(n \to n) = \frac{\sum_{n \in tw,\ |tw| = 1} 1}{\sum_{n \in tw} 1}$$

Example of Model
Build model M for the tweets tw_i in two time instances:
t = 1: tw_1 = {a}, tw_2 = {a}, tw_3 = {a, b}, tw_4 = {c}, tw_5 = {a, c}
t = 2: tw_6 = {a}, tw_7 = {a, c}
• For tuple ⟨a, b, w, {1}, 1⟩ ∈ M, w = 1/4 = 0.25, since tw_3 is the only one of the four tweets containing a at t = 1 that also contains b
The model M is
M = {⟨a, b, 0.25, {1}, 1⟩, ⟨a, c, 0.25, {1}, 1⟩, ⟨b, a, 1.00, {1}, 1⟩,
     ⟨c, a, 0.50, {1}, 1⟩, ⟨a, a, 0.50, {1}, 1⟩, ⟨c, c, 0.50, {1}, 1⟩,
     ⟨a, c, 0.50, {2}, 1⟩, ⟨c, a, 1.00, {2}, 1⟩, ⟨a, a, 0.50, {2}, 1⟩}
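
The weights of the example can be reproduced with a few lines of Python. This is a minimal sketch of the weight definition, not the authors' implementation: the function name `build_model`, the dictionary layout of the input and the use of `frozenset` for T are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def build_model(tweets_by_time, g=1):
    """Return the quintuples (n, c, w, T, g), one time instance at a time.

    tweets_by_time maps a time instance t to a list of tweets,
    each tweet being a set of terms (hypothetical input layout).
    """
    model = set()
    for t, tweets in tweets_by_time.items():
        containing = defaultdict(int)   # number of tweets that contain n
        mass = defaultdict(Fraction)    # numerator of P_T(n -> c)
        for tw in tweets:
            for n in tw:
                containing[n] += 1
                if len(tw) == 1:
                    # A single-term tweet contributes to the self-association.
                    mass[(n, n)] += 1
                else:
                    # Each co-occurring term receives an equal share.
                    for c in tw - {n}:
                        mass[(n, c)] += Fraction(1, len(tw) - 1)
        for (n, c), m in mass.items():
            w = float(m / containing[n])
            model.add((n, c, w, frozenset({t}), g))
    return model


tweets = {
    1: [{"a"}, {"a"}, {"a", "b"}, {"c"}, {"a", "c"}],
    2: [{"a"}, {"a", "c"}],
}
M = build_model(tweets)
```

With the seven example tweets, `M` contains the same nine quintuples that are listed in the example, for instance `("a", "b", 0.25, frozenset({1}), 1)`.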

Model as a graph
[Figure: the example model M drawn as a graph with nodes a, b and c; each edge is labeled with the w, T, g values of one quintuple, e.g. 0.25,{1},1.]

Query operators
Manipulating the quintuples of models with operators
• filter
• fold
• jump
• merge
• join

Filter operator
Notation: filter(M, cond)
Input
• Model M
• Condition cond
Returns: Set of quintuples in M that satisfy cond
Example: M_2 = filter(M_1, T inside {5...12} ∧ w ∈ top(10))
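
A filter of this kind can be sketched as a set comprehension over a predicate. The sketch below is illustrative only: the `Quintuple` type, the names `filter_model` and `top_weights`, and the toy terms and weights are assumptions. It also assumes that "T inside {5...12}" means that T is a subset of that range and that top(k) denotes the k largest weights in the model; the slides do not spell out either reading.

```python
from typing import NamedTuple


class Quintuple(NamedTuple):
    n: str
    c: str
    w: float
    T: frozenset
    g: int


def filter_model(M, cond):
    """Keep the quintuples of M for which cond(q) is true."""
    return {q for q in M if cond(q)}


def top_weights(M, k):
    """Assumed reading of top(k): the k largest distinct weights in M."""
    return set(sorted({q.w for q in M}, reverse=True)[:k])


# Toy model with made-up terms and weights.
M1 = {
    Quintuple("x", "y", 0.9, frozenset({6}), 1),
    Quintuple("x", "z", 0.2, frozenset({7}), 1),
    Quintuple("x", "y", 0.8, frozenset({20}), 1),
}

window = frozenset(range(5, 13))   # time instances 5..12
best = top_weights(M1, 2)          # {0.9, 0.8}
M2 = filter_model(M1, lambda q: q.T <= window and q.w in best)
```

Only the first quintuple survives: the second is not among the two largest weights and the third lies outside the time window.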

Fold operator
Notation: fold(M, g)
Input
• Model M
• integer g = g_o / g_i, where g_o and g_i are the time granularities of the output and input models respectively
Returns: Set of folded quintuples with time granularity g × g_i

Fold operator
Example
For the input model M_1
M_1 = {⟨n_1, c_1, w_1, {1}, 1⟩, ⟨n_1, c_1, w_2, {2}, 1⟩, ⟨n_1, c_1, w_3, {3}, 1⟩,
       ⟨n_2, c_1, w_4, {1}, 1⟩, ⟨n_2, c_1, w_5, {4}, 1⟩}
the operation M_2 = fold(M_1, 3) returns
M_2 = {⟨n_1, c_1, w_6, {1, 2, 3}, 3⟩, ⟨n_2, c_1, w_4, {1, 2, 3}, 3⟩, ⟨n_2, c_1, w_5, {4, 5, 6}, 3⟩}
where w_6 = P_{1,2,3}(n_1 → c_1)
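
The time bucketing behind this example can be sketched as follows. The folded weight P_T(n → c) is defined over the underlying tweets, so it cannot in general be derived from the input weights alone. The sketch therefore delegates it to a caller-supplied `weight_fn`, which is a hypothetical hook and not part of the slides. The helper `bucket` and the convention that T is always expressed in base time instances are likewise assumptions.

```python
from collections import defaultdict


def bucket(t, width):
    """Base time instances covered by the folded instance that contains t."""
    start = ((t - 1) // width) * width + 1
    return frozenset(range(start, start + width))


def fold(M, g, weight_fn):
    """Group quintuples (n, c, w, T, g_i) into instances of granularity g * g_i.

    weight_fn(n, c, T, ws) returns the weight for the folded time set T;
    ws holds the input weights that fell into that folded instance.
    """
    groups = defaultdict(list)
    for n, c, w, T, g_i in M:
        g_o = g * g_i
        groups[(n, c, bucket(min(T), g_o), g_o)].append(w)
    return {(n, c, weight_fn(n, c, T, ws), T, g_o)
            for (n, c, T, g_o), ws in groups.items()}


# The input model of the example, with symbolic weights kept as strings.
M1 = {
    ("n1", "c1", "w1", frozenset({1}), 1),
    ("n1", "c1", "w2", frozenset({2}), 1),
    ("n1", "c1", "w3", frozenset({3}), 1),
    ("n2", "c1", "w4", frozenset({1}), 1),
    ("n2", "c1", "w5", frozenset({4}), 1),
}

# Instead of recomputing weights, record which inputs were grouped together.
M2 = fold(M1, 3, lambda n, c, T, ws: tuple(sorted(ws)))
```

The result has three quintuples with time sets {1, 2, 3}, {1, 2, 3} and {4, 5, 6}, matching the example; the first one groups w1, w2 and w3, which is where the recomputed weight w6 belongs.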

Jump operator
Notation: jump(M, k)
Input
• Model M
• integer k
Output: A model with expanded contexts and weights equal to the probability of a path of length k between two nodes

Jump operator
Example
For t = 1 the transition matrix is
$$P_{\{1\}} = \begin{pmatrix} 0.50 & 0.25 & 0.25 \\ 1.00 & 0.00 & 0.00 \\ 0.50 & 0.00 & 0.50 \end{pmatrix}$$
For M′ = jump(M, 2), the weight w of tuple ⟨a, a, w, {1}, 1⟩ ∈ M′ is
$$w = p^{2}_{\{1\}}(1, 1)$$
[Figure: graph of the example model M, with nodes a, b and c and edges labeled w, T, g.]
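
The value of this weight can be checked by squaring the transition matrix. The sketch below uses plain Python lists and assumes that rows and columns are ordered a, b, c, which agrees with the weights of the example model at t = 1. The helper names `matmul` and `matrix_power` are illustrative.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(P, k):
    """k-th power of a square matrix (k >= 1) by repeated multiplication."""
    result = P
    for _ in range(k - 1):
        result = matmul(result, P)
    return result


# Transition matrix for t = 1; rows and columns ordered a, b, c.
P1 = [[0.50, 0.25, 0.25],
      [1.00, 0.00, 0.00],
      [0.50, 0.00, 0.50]]

P1_squared = matrix_power(P1, 2)
w_aa = P1_squared[0][0]   # 0.5*0.5 + 0.25*1.0 + 0.25*0.5 = 0.625
```

The first row of the squared matrix is [0.625, 0.125, 0.25], so the tuple ⟨a, a, w, {1}, 1⟩ in M′ has w = 0.625, and the row still sums to 1.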

Merge operator
Notation: merge(M)
Input
• Model M
Output: A model where all tuples with the same n and c are aggregated

Merge operator
Example
If the input model is
M_1 = {⟨n_1, c_1, w_1, T_1, g⟩, ⟨n_2, c_1, w_2, T_1, g⟩, ⟨n_1, c_1, w_3, T_2, g⟩}
then the output model M_2 = merge(M_1) is
M_2 = {⟨n_1, c_1, w_4, T_1 ∪ T_2, g⟩, ⟨n_2, c_1, w_2, T_1, g⟩}
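
A merge along these lines can be sketched by grouping on the node pair and taking the union of the time sets. The slides do not say how w_4 is obtained from w_1 and w_3, so the aggregation is passed in as a parameter; the mean used in the usage lines, the grouping key that also includes g, and the concrete weights and time sets are assumptions chosen for illustration.

```python
from collections import defaultdict


def merge(M, aggregate):
    """Aggregate all quintuples that share n and c (and granularity g)."""
    groups = defaultdict(list)
    for n, c, w, T, g in M:
        groups[(n, c, g)].append((w, T))
    merged = set()
    for (n, c, g), items in groups.items():
        weights = [w for w, _ in items]
        # The merged tuple is valid for the union of the input time sets.
        T = frozenset().union(*(T for _, T in items))
        merged.add((n, c, aggregate(weights), T, g))
    return merged


# Hypothetical values standing in for w_1, w_2, w_3, T_1 and T_2.
T1, T2 = frozenset({1, 2}), frozenset({3, 4})
M1 = {
    ("n1", "c1", 0.25, T1, 1),
    ("n2", "c1", 0.50, T1, 1),
    ("n1", "c1", 0.75, T2, 1),
}

M2 = merge(M1, aggregate=lambda ws: sum(ws) / len(ws))
```

With the mean as aggregate, the two ⟨n_1, c_1⟩ tuples collapse into one with weight 0.5 and time set {1, 2, 3, 4}, while the ⟨n_2, c_1⟩ tuple is returned unchanged.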

Join operator
Notation: join(M_1, M_2, cond)
Input
• Models M_1 and M_2
• Condition cond
Output: A subset of M_1 which satisfies condition cond on variables of M_1 and M_2

Join operator
Example
Given M_1
M_1 = {⟨n_1, c_1, 0.5, {1, 2}, 1⟩, ⟨n_1, c_2, 0.5, {1, 2}, 1⟩,
       ⟨n_1, c_1, 0.7, {3, 4}, 1⟩, ⟨n_1, c_2, 0.3, {3, 4}, 1⟩}
a query that asks for the tuples whose weight increases over time,
join(M_1 as m, M_1 as m′, m.n = m′.n ∧ m.c = m′.c ∧ min(m.T) > max(m′.T) ∧ m.w > m′.w)
returns
M_2 = {⟨n_1, c_1, 0.7, {3, 4}, 1⟩}
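
The example can be replayed with a small semi-join. The sketch assumes that a tuple m of the first model is kept when at least one tuple m′ of the second model satisfies the condition, which is consistent with the result shown in the example. The `Quintuple` type and the function names are illustrative.

```python
from typing import NamedTuple


class Quintuple(NamedTuple):
    n: str
    c: str
    w: float
    T: frozenset
    g: int


def join(M1, M2, cond):
    """Keep each m in M1 for which some m2 in M2 satisfies cond(m, m2)."""
    return {m for m in M1 if any(cond(m, m2) for m2 in M2)}


M1 = {
    Quintuple("n1", "c1", 0.5, frozenset({1, 2}), 1),
    Quintuple("n1", "c2", 0.5, frozenset({1, 2}), 1),
    Quintuple("n1", "c1", 0.7, frozenset({3, 4}), 1),
    Quintuple("n1", "c2", 0.3, frozenset({3, 4}), 1),
}


def increasing(m, m2):
    """Same node pair, m strictly later than m2, and a larger weight."""
    return (m.n == m2.n and m.c == m2.c
            and min(m.T) > max(m2.T) and m.w > m2.w)


M2 = join(M1, M1, increasing)
```

Only ⟨n_1, c_1, 0.7, {3, 4}, 1⟩ is returned: the weight of (n_1, c_1) rises from 0.5 to 0.7, whereas the weight of (n_1, c_2) drops from 0.5 to 0.3.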

Dataset
• Set of 16.5 million tweets
• tracking a set of 74 Greek stop-words
• collected between March 20 and June 20, 2012
• processed every 4 hours
• Two most frequent hashtags are #ff and #elections12
[Figure: Volume of tweets with hashtags per day (x-axis: Date; y-axis: # of tweets)]
[Figure: Volume of tweets per day (x-axis: Date; y-axis: # of tweets)]

Example query
Query
Find the hashtags that are associated with #ekloges12 and for which the association weight increases for two consecutive weeks.

Example query
Query expressed with operators
M_2 = filter(M_1, n = #ekloges12)
M_3 = fold(M_2, 42)
M_4 = join(M_3 as m, M_3 as m′, cond)
M_5 = join(M_4 as m, M_4 as m′, cond)
where
cond = m.n <> m.c ∧ m.n = m′.n ∧ m.c = m′.c ∧ m.w > m′.w ∧ min(m.T) = max(m′.T) + 1
The fold factor 42 turns the 4-hour time instances into weeks (6 instances per day × 7 days).
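
The effect of applying the join twice can be traced on a small weekly model. The sketch starts from an already filtered and folded model M_3. The hashtags `#h1` and `#h2`, their weights and the helper names are invented toy values, not results from the dataset. The join is read as a semi-join that keeps m when some m′ satisfies cond.

```python
from typing import NamedTuple


class Quintuple(NamedTuple):
    n: str
    c: str
    w: float
    T: frozenset
    g: int


def join(M1, M2, cond):
    """Keep each m in M1 for which some m2 in M2 satisfies cond(m, m2)."""
    return {m for m in M1 if any(cond(m, m2) for m2 in M2)}


def week(k):
    """The 42 four-hour time instances that make up week k (k = 1, 2, ...)."""
    return frozenset(range(42 * (k - 1) + 1, 42 * k + 1))


def cond(m, m2):
    """Weight of the same association grew since the preceding week."""
    return (m.n != m.c and m.n == m2.n and m.c == m2.c
            and m.w > m2.w and min(m.T) == max(m2.T) + 1)


def q(c, w, k):
    """Toy weekly quintuple for target #ekloges12 and context c."""
    return Quintuple("#ekloges12", c, w, week(k), 42)


# Toy weights: #h1 grows in both steps, #h2 drops first and then grows.
M3 = {q("#h1", 0.125, 1), q("#h1", 0.25, 2), q("#h1", 0.5, 3),
      q("#h2", 0.5, 1), q("#h2", 0.25, 2), q("#h2", 0.5, 3)}

M4 = join(M3, M3, cond)   # weight increased relative to the previous week
M5 = join(M4, M4, cond)   # ... and the previous week was itself an increase
```

M_4 keeps the week-2 and week-3 tuples of `#h1` and the week-3 tuple of `#h2`. M_5 keeps only the week-3 tuple of `#h1`, the one association whose weight increased for two consecutive weeks.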