Modeling Topic Diffusion in Multi-Relational Bibliographic Information Networks Huan Gui, Yizhou Sun, Jiawei Han, George Brova UIUC
Multi-relational Information Networks • In the real world, objects are connected via different types of relationships, forming multi-relational heterogeneous information networks • E.g. – In a bibliographic information network, researchers can be linked via different types of relationships: collaboration, citation, sharing common co-authors, co-attending conferences, etc. – In a social network, people are connected via friendships, colleague relationships, family relationships, etc.
Goal of this paper • They address the problem of modeling information diffusion in multi-relational information networks – Propose a multi-relational diffusion model • Two models, both extending the Linear Threshold model – Learn the parameters of the diffusion model • Learned from an action log (a sequence of object sets recording when each object is activated) • Estimated by maximum likelihood estimation (MLE)
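To make the Linear Threshold idea concrete, here is a minimal sketch in Python. The function `linear_threshold` is the standard single-relation model: a node becomes active once the total weight of its active in-neighbors reaches its threshold. The helper `combine_relations`, the relation names, and the per-relation importance coefficients are assumptions made for illustration only; they show one simple way several relations could feed a single threshold test and are not claimed to be either of the two models proposed in the paper.

```python
def combine_relations(relation_weights, relation_importance):
    """Merge per-relation edge weights into one weight per (u -> v) pair.

    Hypothetical aggregation: a weighted sum over relations. The paper's
    own aggregation rules are not given on the slide.
    relation_weights[rel][v] maps each in-neighbor u to the weight w_uv.
    """
    combined = {}
    for rel, weights in relation_weights.items():
        for v, in_nbrs in weights.items():
            row = combined.setdefault(v, {})
            for u, w in in_nbrs.items():
                row[u] = row.get(u, 0.0) + relation_importance[rel] * w
    return combined


def linear_threshold(in_weights, thresholds, seeds, max_steps=100):
    """Run synchronous Linear Threshold diffusion.

    Returns a list of sets: the nodes activated at step 0 (the seeds),
    step 1, step 2, and so on until no new node activates.
    """
    active = set(seeds)
    history = [set(seeds)]
    for _ in range(max_steps):
        newly_active = {
            v
            for v, in_nbrs in in_weights.items()
            if v not in active
            and sum(w for u, w in in_nbrs.items() if u in active) >= thresholds[v]
        }
        if not newly_active:
            break
        active |= newly_active
        history.append(newly_active)
    return history


# Toy network with two hypothetical relation types.
relation_weights = {
    "coauthor": {"b": {"a": 0.6}, "c": {"b": 0.4}},
    "citation": {"c": {"a": 0.4}, "d": {"c": 1.0}},
}
relation_importance = {"coauthor": 0.5, "citation": 0.5}
thresholds = {"b": 0.25, "c": 0.35, "d": 0.45}

weights = combine_relations(relation_weights, relation_importance)
history = linear_threshold(weights, thresholds, seeds={"a"})
```

In this toy run the topic starts at `a` and reaches `b`, then `c`, then `d`, one node per step. The per-step activation sets in `history` have the same shape as the action log described above, which is the kind of data from which parameters would be estimated.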
Dataset • They extracted topics from papers’ titles and abstracts: – 79 topics in the DBLP dataset and 30 topics in the APS dataset – They study the diffusion of these topics during selected periods in which the topics show increasing popularity
Distributed Graph Summarization
Graph Summarization • Give a compressed representation of the graph
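As an illustration of what a compressed representation can look like, the sketch below collapses groups of nodes into supernodes and counts the original edges inside and between groups. The function name `summarize` and the node-to-group mapping are hypothetical; how the grouping is chosen, and how that choice is distributed, is the subject of the paper and is not shown here.

```python
from collections import Counter


def summarize(edges, group_of):
    """Collapse an undirected graph into supernodes and weighted superedges.

    edges:    iterable of (u, v) pairs
    group_of: dict mapping each node to its supernode label (assumed given)
    Returns (supernode_sizes, superedge_counts).
    """
    supernode_sizes = Counter(group_of.values())
    superedge_counts = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        # Store each undirected superedge under one canonical key.
        key = (gu, gv) if gu <= gv else (gv, gu)
        superedge_counts[key] += 1
    return dict(supernode_sizes), dict(superedge_counts)


edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
group_of = {1: "A", 2: "A", 3: "B", 4: "B"}
sizes, superedges = summarize(edges, group_of)
```

Four nodes and four edges reduce to two supernodes and three weighted superedges. The summary is lossy: it keeps how many edges connect two groups, not which ones.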
Distributed graph processing systems • Giraph: an open-source implementation of Pregel [8], the system proposed by Google – Used in this paper • Others – GraphLab: proposed by Carlos Guestrin et al. – Trinity: A Distributed Graph Engine on a Memory Cloud [SIGMOD 2013] by Microsoft Research Asia • Other distributed systems from the database community – Hadoop: an open-source implementation of Google's MapReduce – Hyracks: by Michael Carey et al. (ICDE 2011)
Algorithm
MapReduce Triangle Enumeration With Guarantees
Idea • Divide the graph into multiple overlapping partitions and distribute each partition to a mapper • Based on the TTP (Triangle Type Partition) algorithm [CIKM 2013] • Uses multiple rounds to reduce the memory cost
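The partitioning idea can be sketched on a single machine as follows. Each node gets a color, each sorted color triple plays the role of one task, and a task receives only the edges whose endpoints both have colors in its triple, so the partitions overlap. A triangle is reported only by the task whose triple equals the sorted colors of its three vertices, so it appears exactly once. This is a simplified illustration under assumptions (integer node ids, color assigned as `node % num_colors`, one round); it is not the TTP or CTTP algorithm itself.

```python
from itertools import combinations, combinations_with_replacement


def enumerate_triangles_partitioned(edges, num_colors):
    """Enumerate triangles by processing one color triple at a time."""

    def color_of(v):
        # Assumed coloring; a real system would use a random hash.
        return v % num_colors

    triangles = []
    for key in combinations_with_replacement(range(num_colors), 3):
        allowed = set(key)
        # Build the overlapping partition for this triple.
        adj = {}
        for u, v in edges:
            if u != v and color_of(u) in allowed and color_of(v) in allowed:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # Find triangles u < v < w inside the partition.
        for u, nbrs in adj.items():
            for v, w in combinations(sorted(nbrs), 2):
                if u < v and w in adj[v]:
                    colors = tuple(sorted((color_of(u), color_of(v), color_of(w))))
                    # Report only in the task that owns this color pattern.
                    if colors == key:
                        triangles.append((u, v, w))
    return sorted(triangles)


# Complete graph on nodes 0..3 plus one dangling edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
triangles = enumerate_triangles_partitioned(edges, num_colors=2)
```

With more colors each partition holds fewer edges, at the cost of more tasks and more copies of each edge. Balancing that memory against the work is the trade-off the multi-round scheme targets.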
Contributions • They propose Colored Triangle Type Partition (CTTP), a multi-round randomized MapReduce algorithm for triangle enumeration – The worst-case number of rounds is bounded in terms of E, m, and M, where • E is the total number of edges • m is the expected memory size of a reducer • M is the total available space – Uses M/E space per mapper, m space per reducer, and M words of total aggregate space
Results • They are the first to report triangle enumeration results for this graph
Component Detection in Directed Networks
Directional community • They propose a novel type of community, the directional community – Nodes play two different roles in a directed network: source and terminal
Proposed Methods • They adapt Markov Clustering (MCL) and its variant, Regularized MCL (R-MCL) • Both are based on simulating stochastic flows on the network
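For readers unfamiliar with flow simulation, the sketch below shows plain MCL on a small undirected graph: alternate expansion (squaring the column-stochastic matrix, which spreads flow) and inflation (raising entries to a power and renormalizing, which strengthens strong flows), then read clusters off the rows that retain flow. It is the baseline algorithm only. The directional adaptation proposed by the authors and the R-MCL regularization step are not shown, and the parameter defaults here are assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=30, eps=1e-6):
    """Basic Markov Clustering on a dense adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Add self-loops, then convert to transition probabilities.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: two-step random walk (matrix squared).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: boost strong flows, weaken weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])

    # Nodes whose flow ends at a common attractor row share a cluster.
    label = list(range(n))

    def find(x):
        while label[x] != x:
            x = label[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                label[find(j)] = find(i)

    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), []).append(v)
    return sorted(clusters.values())


# Two disjoint triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(two_triangles)
```

A larger `inflation` value generally yields more, smaller clusters. The dense pure-Python matrix product is only suitable for toy graphs.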
Case Study: Twitter • Detecting communities in the Twitter interaction network – A directed edge from a source node to a terminal node is created if any of the following interactions happens: • retweet (forward) a tweet • reply to a tweet • mention someone
Case Study: Twitter • Source: posts tweets • Terminal: spreads the tweets • This hashtag represents the “No vull pagar” (“I don’t want to pay”) campaign, a protest in Catalonia in early April 2012