Influence Identification on Independent Cascade Model
Part 1 Introduction
Part 2 Related Work
Part 3 Algorithm
Part 4 Experiment
Part 5 Conclusion
Part 6 References
Part 1 Introduction
Social network
• Modeled as a graph
• Relationships and interactions
• Plays a vital role in information diffusion

Information diffusion
• Spread of information, ideas, and influence
• From several sources to a scope of vertices
• Instrumental in privacy protection etc.
• Viral marketing is a business strategy that uses existing social networks to promote a product.
• A fraction of customers are given free copies of a product, and the retailer wants to maximize the number of adoptions triggered by these trials.
• If the first customers are regarded as virus carriers, viral marketing aims for the largest possible scale of infection.
• Select a fixed number of nodes as seeds so that the resulting spread is maximized.
• The problem also has important applications beyond social graphs, such as placing sensors in water distribution networks to detect contamination.
Part 2 Related Work
High degree (HD)
• Ranks nodes directly according to degree
• Only needs local information
• Easy to implement
• Cannot handle the case in which hubs form a tight community, so that their spreading areas heavily overlap

Adaptive high degree
• Refined adaptive version of HD
• Recalculates the degrees after each removal
• Represents the one-body scenario in which the influencers are considered in isolation, so the collective influence effects from the neighborhood are missing
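The difference between the two degree heuristics can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments; the adjacency-dict representation and the names `build_adj`, `high_degree`, and `adaptive_high_degree` are assumptions made for the example.

```python
def build_adj(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def high_degree(adj, k):
    """Static HD: rank all nodes once by degree and take the top k."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


def adaptive_high_degree(adj, k):
    """Adaptive HD: pick the highest-degree node, remove it, recompute, repeat."""
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    seeds = []
    for _ in range(min(k, len(g))):
        best = max(g, key=lambda v: len(g[v]))
        seeds.append(best)
        # Removing the node lowers the degree of each of its neighbors.
        for u in g.pop(best):
            g[u].discard(best)
    return seeds
```

On a graph where two adjacent hubs share their neighbors, the static ranking selects both hubs, while the adaptive version moves on to a hub elsewhere after the first removal.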
K-core
• Nodes are ranked based on their ks values
• k-shell decomposition: iteratively remove nodes
• All nodes are divided into different shells according to their relative locations in the network
• Core nodes have higher probabilities of causing large-scale diffusion
• With multiple seed nodes, the influence areas heavily overlap

PageRank
• Once used by Google to rank websites
• The calculation formula uses C(A), defined as the number of links going out of page A
• Outputs a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page
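The k-shell decomposition can be sketched in a few lines. This is an illustrative version under the assumption that the graph is stored as an adjacency dict `{node: set(neighbors)}`; the function name `k_shell` is chosen for the example.

```python
def k_shell(adj):
    """Return {node: ks} by iteratively pruning nodes of remaining degree <= k."""
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    ks = {}
    k = 0
    while g:
        # Nodes whose remaining degree is at most k belong to shell k.
        layer = [v for v in g if len(g[v]) <= k]
        if not layer:
            # Nothing left to prune at this level, so move to the next shell.
            k += 1
            continue
        for v in layer:
            ks[v] = k
            for u in g.pop(v):
                g[u].discard(v)
    return ks
```

For a triangle with one pendant node attached, the pendant node gets ks = 1 and the three triangle nodes get ks = 2.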
Collective influence (CI)
• The idea is to measure the influential power of each node based on its local topological structure. The seeds can then be selected greedily, each time choosing the node with the highest influential power.
• This concept was originally proposed for solving the optimal percolation problem, that is, disconnecting the network by removing as few nodes as possible.
• Define $\nu_{i \to j}$ as the probability that node $i$ belongs to the giant component in $G \setminus j$.
• Clearly, $\nu_{i \to j} = 0$ for all $i, j$ is a solution. To be attainable by iteration, however, it must be stable.
• Define $\hat{M}$ as the Jacobian matrix at the point where all $\nu_{i \to j} = 0$. The solution is stable if and only if the leading eigenvalue of $\hat{M}$ is less than 1.
• The leading eigenvalue (spectral radius) can be computed by the power method.
• The cost function can then be simplified.
• The cost function can be written as the sum of the collective influence of each node.
• To make the zero solution stable, we can iteratively remove the node with the maximum CI value until the leading eigenvalue satisfies $\lambda < 1$.
• $\ell$ is proportional to the order of the power iteration. Higher values of $\ell$ are more accurate but harder to compute.
• As CI depends on the status of neighboring nodes, it should be recomputed in each iteration.
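The greedy CI procedure can be sketched as below. The sketch uses the definition from the original collective influence paper, CI(i) = (k_i - 1) times the sum of (k_j - 1) over the nodes j at distance exactly `radius` from i, where k is the degree. Two simplifications are assumptions of this example: it removes a fixed number of nodes instead of checking the leading eigenvalue, and it recomputes every CI value from scratch in each round. The function names are hypothetical.

```python
from collections import deque


def frontier(adj, source, radius):
    """Nodes at shortest-path distance exactly `radius` from `source`."""
    dist = {source: 0}
    queue = deque([source])
    out = []
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            out.append(v)
            continue  # do not expand beyond the requested radius
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return out


def collective_influence(adj, node, radius):
    """CI of `node`: (k_i - 1) * sum of (k_j - 1) over the ball frontier."""
    k_i = len(adj[node])
    return (k_i - 1) * sum(len(adj[j]) - 1 for j in frontier(adj, node, radius))


def ci_greedy(adj, num_nodes, radius=2):
    """Repeatedly remove the node with maximum CI, recomputing CI each round."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen = []
    for _ in range(min(num_nodes, len(g))):
        scores = {v: collective_influence(g, v, radius) for v in g}
        best = max(scores, key=scores.get)
        chosen.append(best)
        for u in g.pop(best):
            g[u].discard(best)
    return chosen
```

A larger `radius` looks further into the neighborhood at the cost of a longer breadth-first search per node, which mirrors the accuracy and cost trade-off noted above.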
Part 3 Algorithm
• Collective influence was not designed for influence maximization, but it gives a way to measure the influential power of each node.
• Our work is to derive the formulas of collective influence for the independent cascade model.
• The problem is complicated because there are now two types of interactions: transmission of infection and the giant component.
• Define $\mu_{ij}$ as the probability that node $i$ is infected in $G \setminus j$. Given the seed set, this can be calculated by iteration.
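One way such an iteration could look is sketched below. The update rule is an assumption: it is the standard message-passing form for the independent cascade model on a tree-like graph with a single transmission probability `p` on every edge, not necessarily the exact equations of this work. The names `infection_messages` and `infection_marginals` are hypothetical.

```python
def infection_messages(adj, seeds, p, iterations=50):
    """Iterate m[(i, j)]: probability that i is infected when j is removed."""
    seeds = set(seeds)
    # Start from the seed set: seeds are infected with certainty.
    m = {(i, j): (1.0 if i in seeds else 0.0) for i in adj for j in adj[i]}
    for _ in range(iterations):
        new = {}
        for (i, j) in m:
            if i in seeds:
                new[(i, j)] = 1.0
                continue
            # i stays healthy only if no neighbor other than j infects it.
            escape = 1.0
            for k in adj[i]:
                if k != j:
                    escape *= 1.0 - p * m[(k, i)]
            new[(i, j)] = 1.0 - escape
        m = new
    return m


def infection_marginals(adj, seeds, p, iterations=50):
    """Probability that each node is infected, using all incoming messages."""
    seeds = set(seeds)
    m = infection_messages(adj, seeds, p, iterations)
    out = {}
    for i in adj:
        if i in seeds:
            out[i] = 1.0
            continue
        escape = 1.0
        for k in adj[i]:
            escape *= 1.0 - p * m[(k, i)]
        out[i] = 1.0 - escape
    return out
```

On the path 0-1-2 with seed 0 and p = 0.5, node 1 is infected with probability 0.5 and node 2 with probability 0.25, which matches a direct calculation.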
• We are interested in the appearance of giant components among infected nodes. A breakout of infection occurs when most infected nodes are connected to each other.
• Now we define $\eta_{ij}$ as the probability that node $i$ is infected and belongs to the giant component without $j$.
• To list the equations, three more symbols need to be defined:
• $\alpha_{ij}$ and $\beta_{ij}$: the conditional probabilities of the event in $\eta_{ij}$ given, respectively, the occurrence or non-occurrence of the event in $\mu_{ij}$ (node $i$ is infected in $G \setminus j$). These two variables are independent for each edge $(i, j)$.
• $\gamma_{ij}$: the conditional probability of the event in $\eta_{ij}$ given that node $j$ is not successfully infected by $i$.
• Now we can write the relationships between these variables, one pair at a time.
• Then we can take partial derivatives to write the Jacobian matrix.
• Our formula of collective influence can then be written in a similar way as in [1].
• Finally, our algorithm for seed selection.
Part 4 Experiment
• Language and module: Python 2.7, networkx 1.11
• Network: a random graph that follows a power law
• Node state: state = 1 if infected; otherwise state = 0
• Spread range:
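The slides do not say which generator produced the power-law graph. As a hedged illustration only, the sketch below grows a preferential attachment tree in pure Python, where every new node attaches to one existing node chosen with probability proportional to its degree. This choice is an assumption; it yields n - 1 edges for n nodes, which is consistent with the reported 3000 nodes and 2999 edges. The function name is hypothetical.

```python
import random


def preferential_attachment_tree(n, seed=None):
    """Grow an n-node tree (n >= 2) where new nodes prefer high-degree nodes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    # Each node appears in `targets` once per incident edge, so a uniform
    # draw from the list picks a node with probability proportional to degree.
    targets = [0, 1]
    for new in range(2, n):
        old = rng.choice(targets)
        adj[new] = {old}
        adj[old].add(new)
        targets.extend([new, old])
    return adj
```

Calling `preferential_attachment_tree(3000, seed=1)` returns an adjacency dict with 3000 nodes and average degree just under 2.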
• Spread simulation is essentially a graph traversal.
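A spread simulation as a graph traversal might look like the sketch below. It assumes the independent cascade rule with one transmission probability `p` for all edges and uses the node states 0 and 1 described in the setup. The `spread_range` helper, defined here as the fraction of infected nodes, is an assumption, since the slide's own definition is not preserved.

```python
import random
from collections import deque


def simulate_ic(adj, seeds, p, rng=None):
    """Run one independent cascade as a breadth-first traversal."""
    rng = rng or random.Random()
    state = {v: 0 for v in adj}  # 1 = infected, 0 = not infected
    queue = deque()
    for s in seeds:
        if state[s] == 0:
            state[s] = 1
            queue.append(s)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            # Each infected node gets one chance to infect each healthy neighbor.
            if state[u] == 0 and rng.random() < p:
                state[u] = 1
                queue.append(u)
    return state


def spread_range(state):
    """Assumed definition: fraction of nodes that end up infected."""
    return sum(state.values()) / float(len(state))
```

Because each run is random, a single call gives only one sample; averaging `spread_range` over many runs gives a more stable estimate.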
• Random information spread can make the comparison unfair.
• A sampled graph is used for a fair comparison.
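One plausible reading of the sampled graph is that the edge outcomes are drawn once and every algorithm's seed set is then evaluated on the same outcome, so that no method benefits from a lucky run. The sketch below follows that reading; it is an assumption, as are the function names and the uniform probability `p`.

```python
import random
from collections import deque


def sample_live_graph(adj, p, rng):
    """Keep each undirected edge independently with probability p."""
    live = {v: set() for v in adj}
    done = set()
    for v in adj:
        for u in adj[v]:
            # Flip the coin once per edge, when its first endpoint is visited.
            if u not in done and rng.random() < p:
                live[v].add(u)
                live[u].add(v)
        done.add(v)
    return live


def reachable(live, seeds):
    """All nodes reachable from the seeds in the sampled graph."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for u in live[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def compare_on_samples(adj, seed_sets, p, runs=100, seed=0):
    """Average spread of each named seed set over the same sampled graphs."""
    rng = random.Random(seed)
    totals = {name: 0 for name in seed_sets}
    for _ in range(runs):
        live = sample_live_graph(adj, p, rng)
        for name, seeds in seed_sets.items():
            totals[name] += len(reachable(live, seeds))
    return {name: totals[name] / float(runs) for name in totals}
```

Here `seed_sets` maps a method name to its selected seeds, for example `{"HD": [...], "CI": [...]}`, and each method is scored on identical samples.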
• 3000 nodes, 2999 edges, average degree 2
• Our algorithm achieves the highest spread range at a very small q ratio (1-100 seed nodes).
• The performance of HD, PageRank, and CI is close, but all are slightly worse than our algorithm.
• K-core performs worst because it destroys the cluster structure of the power-law graph.

• Cannot outperform the other heuristic algorithms.
• May perform better in a more complicated spreading model, such as the linear threshold model.