Mining Interesting Link Formation Rules in Social Networks
Cane Wing-Ki Leung, Ee-Peng Lim, David Lo, Jianshu Weng
School of Information Systems, Singapore Management University
Outline
• Introduction
• Methodology
• Empirical Study
• Conclusions
27/10/2010 CIKM'10
Introduction
• Propose the task of mining interesting link formation rules in social networks
• Goal: examine how links are formed in social networks as a structural effect
Example: Reciprocity Effect
• A simple example: the reciprocity effect
  – Given a pair of nodes, called the start node s and the end node e
  – Suppose we know that e trusts s at a certain time point. Questions:
    • Will s also trust e later?
    • How frequently/likely will this happen?
    • What other connections between s and e may lead to link formation?
More on the task
• Will s trust e later?
  – A temporal constraint
  – A partial order in which “s trusts e” is formed after all other links connecting s and e
• How frequently/likely will this happen?
  – Quantifying the interestingness of the observed patterns
More on the task
• What other connections between s and e may lead to link formation?
  – Structural constraints require that s and e be connected in some way
  – We consider dyadic and triadic structures, also known as local structures, as they have long been used in sociology for studying and predicting the dynamics of large, complex networks
  – We seek to mine interesting patterns that obey such constraints
Outline
• Introduction
• Methodology
• Empirical Study
• Conclusions
Methodology
• We propose to study local structures for link formation in social networks
  – Introduce link formation rules (LF-rules) as special subgraph patterns
  – Formulate our task as a subgraph mining task in a social network, modeled as a directed, labeled, temporal graph
  – Devise a subgraph mining approach (introduced next)
  – Apply the proposed approach to two real-world datasets
Methodology Overview
– Mine LF-rules from a given social network
– Apply a randomization technique to the network to estimate the expected support of LF-rules in a random graph
– Evaluate interesting rules, i.e. those with higher-than-expected support
LF-Patterns and Rules
• LF-pattern:
  – a graph pattern built upon dyadic and/or triadic structures
  – in any actual occurrence of an LF-pattern, the link from s to e, or simply (s,e), is formed after all other links in the same pattern
LF-Patterns and Rules
• LF-rule:
  – generated from an LF-pattern
  – consists of a precondition and a postcondition
  – the (s,e) link in our illustrations is always the postcondition
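The temporal constraint and the precondition/postcondition split can be sketched in a few lines of Python. This is only an illustration: the `Link` type, the `split_occurrence` helper, the "+"/"-" label encoding and the integer timestamps are all assumptions made for the example, and "formed after" is read here as strictly later.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    src: str
    dst: str
    label: str  # assumed encoding: "+" for trust/friend, "-" for distrust/foe
    time: int   # assumed integer formation timestamp


def split_occurrence(links, s, e) -> Optional[tuple]:
    """Return (precondition, postcondition) if `links` is a valid occurrence.

    Valid means: exactly one (s, e) link, at least one other link,
    and the (s, e) link is formed strictly after all the others.
    """
    post = [l for l in links if l.src == s and l.dst == e]
    pre = [l for l in links if not (l.src == s and l.dst == e)]
    if len(post) != 1 or not pre:
        return None
    if any(l.time >= post[0].time for l in pre):
        return None  # temporal constraint violated
    return pre, post[0]


# Reciprocity: e linked to s at time 1, s linked back at time 2.
occurrence = [Link("e", "s", "+", 1), Link("s", "e", "+", 2)]
result = split_occurrence(occurrence, "s", "e")
```

For the reciprocity occurrence above, the precondition is the single e-to-s link and the postcondition is the later s-to-e link; swapping the two timestamps makes the helper return `None`.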
Mining LF-Rules
• LF-patterns define the structural constraints of LF-rules
  – they capture the formation of a link from a node s, called the start node, to another node e, called the end node
• Mining LF-rules:
  – we are given a graph G, a predefined minimum frequency (support) and a predefined minimum confidence
  – find all LF-patterns that satisfy the frequency threshold
  – generate LF-rules from the frequent LF-patterns and compute their confidence values
  – retain those that satisfy the confidence threshold
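The two thresholds act as a two-stage filter, which the sketch below illustrates. It assumes the support and confidence of each candidate have already been computed; the function name `select_rules`, the rule names and all numbers are made up for the example and are not results from the study.

```python
def select_rules(pattern_scores, min_support, min_confidence):
    """Two-stage filter: keep frequent patterns, then confident rules."""
    # Stage 1: patterns that satisfy the frequency (support) threshold.
    frequent = {
        name: s for name, s in pattern_scores.items()
        if s["support"] >= min_support
    }
    # Stage 2: rules from frequent patterns that satisfy the confidence threshold.
    return {
        name: s for name, s in frequent.items()
        if s["confidence"] >= min_confidence
    }


# Illustrative scores only (hypothetical rule names and values).
scores = {
    "rule_a": {"support": 0.30, "confidence": 0.45},
    "rule_b": {"support": 0.20, "confidence": 0.25},
    "rule_c": {"support": 0.03, "confidence": 0.60},
}
selected = select_rules(scores, min_support=0.05, min_confidence=0.30)
```

Here `rule_c` is dropped in the first stage despite its high confidence, and `rule_b` is dropped in the second stage, so only `rule_a` is retained.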
Mining LF-Rules
• Each LF-rule is associated with
  – a support value: the percentage of nodes in G that served as the node s of the rule at least once
  – a confidence value: the likelihood that the (s,e) link exists given that the precondition connecting s and e exists
• Example (the reciprocity rule):
  – Support: ~24% of nodes in G served as node s of this rule
  – Confidence: among the nodes that received a link from another node, ~32% reciprocated the link
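For the reciprocity rule, both measures can be computed directly from a timestamped edge list. The sketch below is one possible reading of the definitions above, not the authors' implementation: it assumes a single link label, counts confidence at the node level (as in the example), and takes the earliest timestamp when a link appears more than once. The function name and the toy edges are hypothetical.

```python
def reciprocity_scores(edges):
    """edges: iterable of (src, dst, time) tuples for one link label.

    Returns (support, confidence) for the rule
    "e links to s  =>  s later links to e".
    """
    first_time = {}  # (src, dst) -> earliest formation time
    nodes = set()
    for src, dst, t in edges:
        nodes.update((src, dst))
        key = (src, dst)
        if key not in first_time or t < first_time[key]:
            first_time[key] = t

    pre_nodes = set()   # nodes s that satisfy the precondition (received a link)
    rule_nodes = set()  # nodes s that also formed the later (s, e) link
    for (e, s), t_in in first_time.items():
        pre_nodes.add(s)
        t_out = first_time.get((s, e))
        if t_out is not None and t_out > t_in:
            rule_nodes.add(s)

    support = len(rule_nodes) / len(nodes) if nodes else 0.0
    confidence = len(rule_nodes) / len(pre_nodes) if pre_nodes else 0.0
    return support, confidence


# Toy graph: only b reciprocates a link it received earlier.
toy_edges = [("a", "b", 1), ("b", "a", 2), ("c", "b", 3), ("a", "d", 4)]
support, confidence = reciprocity_scores(toy_edges)
```

In the toy graph, one of four nodes serves as s (support 0.25), and one of the three nodes that received a link reciprocated it (confidence 1/3).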
Graph Randomization
• Why?
  – LF-rules may exist in the network just by chance
• How?
  – One possibility is graph randomization: randomize an input graph G, but preserve important nodal properties
  – Compute the support of LF-rules in the randomized graph, called the expected support
• We randomized the connectivity in G while preserving its in-degree, out-degree, label and timestamp distributions
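One common way to randomize connectivity while keeping every node's in-degree and out-degree is repeated edge-target swapping. The sketch below uses that generic technique as an assumption; the slides do not state which procedure was actually used. It also assumes at most one link per ordered node pair, and the function name, tuple layout and swap count are hypothetical.

```python
import random


def randomize_graph(edges, n_swaps, seed=None):
    """Degree-preserving rewiring by swapping the targets of two edges.

    edges: list of (src, dst, label, time) tuples, assumed to contain
    at most one link per ordered (src, dst) pair.
    Each edge keeps its source, label and timestamp; only targets move,
    so in-degrees, out-degrees, labels and timestamps are preserved.
    """
    rng = random.Random(seed)
    edges = list(edges)
    if len(edges) < 2:
        return edges
    present = {(u, v) for u, v, _, _ in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b, l1, t1 = edges[i]
        c, d, l2, t2 = edges[j]
        # Skip swaps that would create a self-loop or a duplicate link.
        if a == d or c == b:
            continue
        if (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i] = (a, d, l1, t1)
        edges[j] = (c, b, l2, t2)
    return edges


original = [
    ("a", "b", "+", 1), ("b", "c", "+", 2), ("c", "d", "-", 3),
    ("d", "a", "+", 4), ("a", "c", "-", 5),
]
shuffled = randomize_graph(original, n_swaps=200, seed=7)
```

Because a label and timestamp stay attached to the source side of each edge, this variant preserves slightly more than the overall label and timestamp distributions; a looser randomization could also shuffle those attributes among edges.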
Measuring (Un)Expectedness
• Expected support of a rule w.r.t. G
  – its support in G’, the randomized version of G
• Surprise of a rule
  – the support of the rule divided by its expected support
  – the higher the value, the more “surprising” or “unexpected” the rule
• If link formation does follow some rules, we should expect those rules to have higher support in G than in G’
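The surprise ratio is simple enough to state as code. In the sketch below, the expected support is taken as the mean over several randomized graphs; averaging is an assumption made for the example, as is the choice to return infinity when the expected support is zero. The function name is hypothetical.

```python
import math
from statistics import mean


def surprise(support, sampled_supports):
    """Support in G divided by the expected support.

    sampled_supports: support of the same rule in each randomized graph;
    the expected support is assumed here to be their mean.
    """
    expected = mean(sampled_supports)
    if expected == 0:
        return math.inf  # assumed convention: never observed by chance
    return support / expected


# A rule with 25.42% support and 13.54% expected support.
value = surprise(0.2542, [0.1354])
```

With 25.42% support against 13.54% expected support, the ratio rounds to 1.88, so the rule occurs almost twice as often as chance alone would suggest.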
Summary of Methodology
• Introduce LF-patterns and LF-rules
  – capture structural and temporal constraints
• Devise a subgraph mining algorithm to find and count such patterns in a graph G
  – output: a set of LF-rules R with sufficient support and confidence
• Conduct graph randomization on G
  – measure the expected support and surprise values of all rules in R
• Present interesting rules in R with high surprise values
Outline
• Introduction
• Methodology
• Empirical Study
• Conclusions
Datasets
• Epinions
  – Web of Trust, with trust (+ve) and distrust (-ve) links
• myGamma, courtesy of BuzzCity
  – friendship network, with friend (+ve) and foe (-ve) links
• Expected support computed based on 10 randomized samples of the graphs
Interesting LF-rules in myGamma
• We focus on myGamma, for which the complete history and ordering of friendship links are available
• Top-5 rules in terms of support
  – we report their interestingness scores in terms of support, expected support, surprise (supp/exp. supp), and confidence
Interestingness scores
support   expected support   surprise (supp/exp. supp)   confidence
28.91%    22.41%             1.29                        43.22%
28.38%    22.37%             1.27                        43.1%
25.42%    13.54%             1.88                        39.15%
24.37%    1.22%              20.06                       31.98%
20.55%    11.49%             1.79                        27.52%
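Ranking by support and ranking by surprise can give quite different orders. The short sketch below re-sorts the five reported score rows by their reported surprise values; the tuple layout and variable names are assumptions made for the example, and the numbers are copied from the scores above.

```python
# (support %, expected support %, surprise, confidence %), as reported above
rows = [
    (28.91, 22.41, 1.29, 43.22),
    (28.38, 22.37, 1.27, 43.10),
    (25.42, 13.54, 1.88, 39.15),
    (24.37, 1.22, 20.06, 31.98),
    (20.55, 11.49, 1.79, 27.52),
]

# Sort by the surprise column, most surprising rule first.
by_surprise = sorted(rows, key=lambda r: r[2], reverse=True)

for support, expected, surprise, confidence in by_surprise:
    print(f"{support:6.2f}%  {expected:6.2f}%  {surprise:6.2f}  {confidence:6.2f}%")
```

The rule ranked fourth by support moves to the top when sorted by surprise, since its expected support in the randomized graphs is very low.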
Other Observations
• Users tend to rely more on mutually trusted friends in forming new friendship links. For example,
  – R12 (right) has much higher confidence (~34% vs. ~22%) and surprise (5.32 vs. 3.52) than R11 (left)
Other Observations
• In myGamma, 3.45% of users reciprocated a friend link from another user with a foe link, but with a much lower likelihood (15.98%) than reciprocal friend links (31.98%)
  – probably due to “unwanted friendship”
  – not frequent/interesting in Epinions, as an “unwanted trustor” is not an issue
Other Observations
• If a user has formed a link based on a given precondition through an intermediary (e.g. a common friend), then there is a good chance that (s)he has formed a link based on multiple occurrences of the same precondition
  – 29% of users support R5 (left)
    • About two-thirds of them also support R32 (middle)
    • About one-third of them also support R34 (right)
Outline
• Background
  – our task and motivations
• Methodology
• Results on myGamma
• Conclusions
Conclusions
• We proposed the task of mining interesting link formation rules in social networks
  – Introduced the notions of LF-patterns and LF-rules, in which a new link between a node pair is formed as a structural effect of preexisting links
  – Formulated the task as subgraph mining from a directed, labeled, temporal graph
• Proposed a comprehensive subgraph mining approach
  – Devised an LF-rule mining algorithm based on gSpan
  – Presented LF-rules with higher-than-expected support
Thank You!