
Modeling Topic Diffusion in Multi-Relational Bibliographic Information Networks (Huan Gui, Yizhou Sun, Jiawei Han, George Brova)



  1. Modeling Topic Diffusion in Multi-Relational Bibliographic Information Networks • Huan Gui, Yizhou Sun, Jiawei Han, George Brova (UIUC)

  2. Multi-relational Information Networks • In the real world, objects are connected via different types of relationships, forming multi-relational heterogeneous information networks • E.g. – In a bibliographic information network, researchers can be linked via different types of relationships: collaboration, citation, sharing common co-authors, co-attending conferences, etc. – In a social network, people are connected via friendships, colleague relationships, family relationships, etc.

  3. Multi-relational Information Networks

  4. Goal of this paper • They address the problem of modeling information diffusion in multi-relational information networks – Propose a multi-relational diffusion model • Two models are proposed, both extending the Linear Threshold model – Learn the parameters of the diffusion model • Learned from an action log (a sequence of object sets recording when each object is activated) • Using maximum likelihood estimation (MLE)
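
The slide names the Linear Threshold (LT) model as the starting point. As a rough illustration, the sketch below runs the standard LT rule (a node activates once the summed weight of its active in-neighbors reaches its threshold) on a network with several relation types. How the relations are combined here, a fixed weighted sum with hypothetical `relation_weight` coefficients, is an assumption made for illustration and is not one of the two models proposed in the paper. All function and variable names are hypothetical.

```python
def combine_relations(relations, relation_weight):
    """Merge per-relation influence weights into one weight per (u -> v) pair.

    relations[r][v][u] is the influence of u on v under relation r.
    ASSUMPTION: relations are combined by a fixed weighted sum.
    """
    in_weights = {}
    for name, per_node in relations.items():
        for v, sources in per_node.items():
            merged = in_weights.setdefault(v, {})
            for u, w in sources.items():
                merged[u] = merged.get(u, 0.0) + relation_weight[name] * w
    return in_weights


def linear_threshold(in_weights, thresholds, seeds, max_steps=100):
    """Standard LT diffusion: repeat until no new node becomes active."""
    active = set(seeds)
    for _ in range(max_steps):
        newly_active = {
            v for v, sources in in_weights.items()
            if v not in active
            and sum(w for u, w in sources.items() if u in active) >= thresholds[v]
        }
        if not newly_active:
            break
        active |= newly_active
    return active


# Toy bibliographic network with two relation types (made-up numbers).
relations = {
    "coauthor": {"b": {"a": 0.6}, "c": {"b": 0.2}},
    "citation": {"c": {"a": 0.4, "b": 0.4}},
}
in_weights = combine_relations(relations, {"coauthor": 0.5, "citation": 0.5})
adopters = linear_threshold(in_weights, {"b": 0.25, "c": 0.45}, seeds={"a"})
```

In this toy run, `a` activates `b` in the first step, and `a` and `b` together push `c` over its threshold in the second step. In the paper the parameters are learned from the action log by MLE rather than fixed by hand as they are here.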

  5. Dataset • They extracted topics from papers’ titles and abstracts – 79 topics in the DBLP dataset and 30 topics in the APS dataset – They study the diffusion of these topics during selected periods in which the topics show increasing popularity

  6. Distributed Graph Summarization

  7. Graph Summarization • Gives a compressed representation of the graph
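
One common form of compressed representation groups nodes into supernodes and records how many original edges fall within or between groups. The sketch below assumes the grouping is already given (the hypothetical `group_of` mapping) and that the graph is undirected. It illustrates the general idea only and is not the distributed algorithm from the paper.

```python
from collections import Counter


def summarize(edges, group_of):
    """Collapse an undirected graph into supernodes and weighted superedges.

    edges    : iterable of (u, v) pairs
    group_of : dict mapping each node to its supernode label (assumed given)
    """
    supernode_sizes = Counter(group_of.values())
    superedge_counts = Counter()
    for u, v in edges:
        # Sort the two labels so (A, B) and (B, A) count as the same superedge.
        a, b = sorted((group_of[u], group_of[v]))
        superedge_counts[(a, b)] += 1
    return dict(supernode_sizes), dict(superedge_counts)


sizes, superedges = summarize(
    edges=[(1, 2), (2, 3), (3, 4), (1, 3)],
    group_of={1: "A", 2: "A", 3: "B", 4: "B"},
)
```

Here four nodes and four edges are reduced to two supernodes and three weighted superedges.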

  8. Distributed graph processing systems • Giraph: an open-source implementation of Pregel [8], which was proposed by Google – used in this paper • Others – GraphLab: proposed by Carlos Guestrin et al. – Trinity: A Distributed Graph Engine on a Memory Cloud [SIGMOD 2013], by Microsoft Research Asia • Other distributed systems from the database community – Hadoop: an open-source implementation of Google's MapReduce – Hyracks: by Michael Carey et al. (ICDE 2011)

  9. Algorithm

  10. MapReduce Triangle Enumeration With Guarantees

  11. Idea • Divide the graph into multiple overlapping partitions and distribute each partition to a mapper • Based on the TTP (Triangle Type Partition) algorithm [CIKM 2013] • Uses multiple rounds to reduce the memory cost
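
The partitioning idea can be sketched on a single machine. The code below is a simplified color-partition scheme and is not TTP or CTTP itself: each vertex gets a color, each unordered color triple defines one overlapping subproblem, and a triangle is reported only by the subproblem whose triple matches the triangle's own colors, so it is emitted exactly once. The coloring function `color_of` (vertex id modulo the number of colors) is a hypothetical stand-in for a random hash, and vertices are assumed to be integers.

```python
from itertools import combinations, combinations_with_replacement


def color_of(v, num_colors):
    # ASSUMPTION: deterministic stand-in for a random vertex coloring.
    return v % num_colors


def enumerate_triangles(edges, num_colors=3):
    triangles = []
    # One subproblem per unordered color triple; subproblems overlap on edges.
    for colors in combinations_with_replacement(range(num_colors), 3):
        allowed = set(colors)
        # "Map" step: keep edges whose endpoint colors both belong to the triple.
        adj = {}
        for u, v in edges:
            if color_of(u, num_colors) in allowed and color_of(v, num_colors) in allowed:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # "Reduce" step: enumerate triangles u < v < w inside the subproblem.
        for u in adj:
            for v, w in combinations(sorted(adj[u]), 2):
                if u < v and w in adj[v]:
                    tri_colors = tuple(sorted(color_of(x, num_colors) for x in (u, v, w)))
                    # Emit only in the subproblem that "owns" this color triple.
                    if tri_colors == colors:
                        triangles.append((u, v, w))
    return sorted(triangles)


# K4 on vertices 0..3 plus a pendant edge (3, 4): four triangles.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
found = enumerate_triangles(toy_edges)
```

In a MapReduce setting each subproblem would be handled by a separate task. Running the subproblems in several rounds instead of all at once is the lever the slide mentions for reducing memory cost.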

  12. Contributions • They propose Colored Triangle Type Partition (CTTP), a multi-round randomized MapReduce algorithm for triangle enumeration – The worst-case number of rounds is stated in terms of • E, the total number of edges • m, the expected memory size of a reducer • M, the total available space – It uses M/E space per mapper, m space per reducer, and M words of total aggregate space

  13. Results • They are the first to obtain the result for this graph

  14. Component Detection in Directed Networks

  15. Directional community • They propose a novel community concept, the directional community – Nodes play two different roles, source and terminal, in a directed network

  16. Proposed Methods • They adapt Markov Clustering (MCL) and its variant R-MCL • Both are based on simulating stochastic flows on the network
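
For readers unfamiliar with MCL, the sketch below shows the plain algorithm on an undirected graph: alternate expansion (squaring the column-stochastic flow matrix) and inflation (element-wise power followed by column normalization), then read clusters off the nonzero rows. This is standard MCL only, not the directional variant proposed in the paper. The inflation parameter `r`, the iteration count, and the cutoff `1e-6` are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: one more step of flow, i.e. the matrix product m * m."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r=2.0):
    """Inflation: strengthen strong flows, weaken weak ones, renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, r=2.0, iterations=20):
    n = len(adjacency)
    # Add self-loops, as is customary for MCL, then normalize.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(iterations):
        m = inflate(expand(m), r)
    # Each row with nonzero entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
tri = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in tri:
    adjacency[u][v] = adjacency[v][u] = 1.0
clusters = mcl(adjacency)
```

On this toy input no flow crosses between the two triangles, so each triangle comes out as its own cluster.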

  17. Case Study: Twitter • Detecting communities in the Twitter interaction network – A directed edge from a source node to a terminal node is created if any of the following interactions occurs • a retweet (forward) of a tweet • a reply to a tweet • a mention of someone
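
The edge-construction rule above can be sketched as follows. The record layout `(source, terminal, kind)` and the interaction labels are hypothetical. The slide does not say how raw tweets are mapped to source and terminal roles, so that mapping is assumed to have been done already.

```python
# Interaction types that create an edge, following the slide's list.
INTERACTIONS = {"retweet", "reply", "mention"}


def build_interaction_edges(events):
    """Return the set of directed (source, terminal) edges.

    events: iterable of (source, terminal, kind) tuples (hypothetical layout).
    Repeated interactions between the same pair collapse into one edge.
    """
    edges = set()
    for source, terminal, kind in events:
        if kind in INTERACTIONS:
            edges.add((source, terminal))
    return edges


events = [
    ("alice", "bob", "retweet"),
    ("alice", "bob", "reply"),      # same pair again: still one edge
    ("carol", "alice", "mention"),
    ("bob", "carol", "like"),       # not in the slide's list: ignored
]
edges = build_interaction_edges(events)
```

A set keeps the graph unweighted. Counting interactions per pair instead would give edge weights, which may suit flow-based methods better, but the slide does not say which was used.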

  18. Case Study: Twitter • Source: posts tweets • Terminal: spreads the tweets • This hashtag represents the “No vull pagar” (“I don’t want to pay”) campaign, a protest in Catalonia in early April 2012
