Outline
- "Topic-aware Social Influence Propagation Models" by N. Barbieri et al., ICDM 2012
- "Scalable topic-specific influence analysis on microblogs" by Bin Bi et al., WSDM 2014
Introduction
The problem of influence maximization has received a good deal of attention from the data mining research community in the last decade but, quite surprisingly, the characteristics of the item that is the subject of the viral marketing campaign have been left out of the picture.
Observations:
- Users have different interests.
- Items have different characteristics.
- Similar items are likely to interest the same users.
Outline of the paper
- Extend the IC and LT models to be topic-aware (TIC and TLT).
- Use an EM approach to estimate the parameters of TIC.
- Introduce a new influence propagation model, AIR (Authoritativeness-Interest-Relevance).
- Devise a generalized expectation maximization (GEM) approach to learn the parameters of AIR.
Background
- Under both the IC and LT propagation models, influence maximization is NP-hard.
- The expected spread $\sigma_m(S)$ is monotone, i.e., $\sigma_m(S) \le \sigma_m(T)$ whenever $S \subseteq T$, and submodular, i.e., $\sigma_m(S \cup \{w\}) - \sigma_m(S) \ge \sigma_m(T \cup \{w\}) - \sigma_m(T)$ whenever $S \subseteq T$.
- There is a $(1 - 1/e - \phi)$-approximation algorithm for the influence maximization problem.
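The approximation guarantee above is usually obtained with a greedy seed selection whose spread estimates come from Monte Carlo simulation. The following is a minimal sketch under the plain IC model, not the authors' implementation. The names `simulate_ic`, `estimate_spread`, and `greedy_seeds`, the dictionary layout (`prob` keyed by `(source, target)`), and the default number of runs are all assumptions made for illustration.

```python
import random

def simulate_ic(out_edges, prob, seeds, rng):
    """Run one Independent Cascade; return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            # Each newly active node gets one chance per inactive neighbor.
            for u in out_edges.get(v, []):
                if u not in active and rng.random() < prob[(v, u)]:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active

def estimate_spread(out_edges, prob, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread sigma(S)."""
    total = 0
    for _ in range(runs):
        total += len(simulate_ic(out_edges, prob, seeds, rng))
    return total / runs

def greedy_seeds(nodes, out_edges, prob, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        base = estimate_spread(out_edges, prob, chosen, runs, rng)
        best_node, best_gain = None, float("-inf")
        for x in nodes:
            if x in chosen:
                continue
            gain = estimate_spread(out_edges, prob, chosen + [x], runs, rng) - base
            if gain > best_gain:
                best_node, best_gain = x, gain
        if best_node is None:
            break  # fewer than k candidate nodes
        chosen.append(best_node)
    return chosen
```

Because the spread is only estimated, the selected seeds can vary with `runs` and `seed` whenever edge probabilities are strictly between 0 and 1.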
Topic-aware Independent Cascade Model (TIC)
In the topic-aware version of the IC model, the user-to-user influence probabilities depend on the topic.
- For each arc $(u, v) \in E$ and each topic $z \in [1, K]$, $p^z_{u,v}$ represents the strength of the influence exerted by user $v$ on user $u$ on topic $z$.
- For each item $i$ that propagates in the network, we have a distribution over the topics: $\gamma_i^z = P(Z = z \mid i)$, with $\sum_{z=1}^{K} \gamma_i^z = 1$.
Topic-aware Independent Cascade Model (TIC)
In this model a propagation happens as in the IC model: when a node $v$ first becomes active on item $i$, it has one chance of influencing each inactive neighbor $u$, independently of the history thus far. The attempt succeeds with a probability that is the weighted average of the link probability w.r.t. the topic distribution of item $i$:
$p^i_{u,v} = \sum_{z=1}^{K} \gamma_i^z \, p^z_{u,v}$
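As a rough illustration of the TIC mechanism, the sketch below first collapses the per-topic link probabilities into one item-specific probability and then runs an ordinary IC cascade with it. The function names and data layout are assumptions of this sketch: `topic_probs` is keyed by `(source, target)` and holds one probability per topic, and `gamma_i` is the item's topic distribution.

```python
import random

def item_edge_prob(gamma_i, p_edge):
    """Item-specific probability: sum over topics z of gamma_i[z] * p_edge[z]."""
    return sum(g * p for g, p in zip(gamma_i, p_edge))

def simulate_tic(out_edges, topic_probs, gamma_i, seeds, rng):
    """Run one TIC cascade for a single item; return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in out_edges.get(v, []):
                if u in active:
                    continue
                # One activation attempt per (newly active v, inactive u) pair.
                if rng.random() < item_edge_prob(gamma_i, topic_probs[(v, u)]):
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active
```

For example, with `gamma_i = [0.5, 0.5]` and per-topic probabilities `[0.2, 0.4]` on an arc, the attempt on that arc succeeds with probability 0.5·0.2 + 0.5·0.4 = 0.3.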
Topic-aware Linear Threshold Model (TLT)
- $p^z_{u,v}$ represents the weight of the edge $(u, v)$ on topic $z$. For each node and each topic, the sum of the incoming weights is no more than 1.
- $\theta_u$ is the threshold of node $u$, chosen uniformly at random from $[0, 1]$.
- Influence weight: $W_i^t(u) = \sum_{v \in F_i(u,t)} \sum_{z=1}^{K} \gamma_i^z \, p^z_{u,v}$, where $F_i(u, t)$ denotes the set of users that have a link to $u$ and that at time $t$ have already adopted item $i$.
- If $W_i^t(u) \ge \theta_u$, then $u$ activates on item $i$ at time $t + 1$.
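The TLT activation rule can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes `in_edges[u][v]` holds the per-topic weight vector of the link from `v` to `u`, and that thresholds are passed in explicitly (in the model they are drawn uniformly at random from [0, 1]). All names are hypothetical.

```python
def tlt_weight(gamma_i, in_weights, adopters):
    """Influence weight W on one node: sum over adopting in-neighbors v
    of sum over topics z of gamma_i[z] * in_weights[v][z]."""
    return sum(
        sum(g * p for g, p in zip(gamma_i, in_weights[v]))
        for v in adopters
        if v in in_weights
    )

def simulate_tlt(in_edges, gamma_i, seeds, thresholds):
    """Run a TLT cascade in synchronous steps; return the set of active nodes."""
    active = set(seeds)
    while True:
        # Nodes whose weight at time t reaches the threshold activate at t + 1.
        newly_active = [
            u
            for u, weights in in_edges.items()
            if u not in active
            and tlt_weight(gamma_i, weights, active) >= thresholds[u]
        ]
        if not newly_active:
            return active
        active.update(newly_active)
```

To mimic the random thresholds of the model, one could build `thresholds` with `random.Random(seed).random()` for each node and average the resulting spread over many draws.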
Observation: in both cases only the model parameters are topic-aware, while the overall mechanism of propagation does not change.
Proposition: The expected spread $\sigma_m(S)$ remains monotone and submodular for $m = \mathrm{TIC}$ or $m = \mathrm{TLT}$.
Corollary: The greedy algorithm provides a $(1 - 1/e - \phi)$-approximation for the influence maximization problem also under the TIC and TLT propagation models.
AIR (Authoritativeness-Interest-Relevance)
- In TIC and TLT we have $K(|E| + |I|)$ parameters.
- The AIR propagation model assumes that social influence depends on a user's authority in the context of a given topic and on the interest of the user's social neighborhood in that topic.
The AIR model has the following parameters:
- Authoritativeness of a user in a topic: $p^z_v \in \mathbb{R}$; positive values indicate authoritativeness, negative values distrust.
- Interest of a user for a topic: $\vartheta^z_u = P(Z = z \mid u)$, with $\sum_{z=1}^{K} \vartheta^z_u = 1$.
- Relevance of an item for a topic: $\vec{\varphi}_z \in \mathbb{R}^{|I|}$, with $\varphi^z_i \in \mathbb{R}$.
The working principle of AIR is that of a general threshold model. At the beginning of the process each user $u$ chooses a threshold $\theta_u$ uniformly at random from $[0, 1]$. User $u$ activates on item $i$ once
$P(i \mid u, t) = \sum_{z} P(z \mid u) \, P(i \mid u, z, t) \ge \theta_u$,
where $P(z \mid u) = \vartheta^z_u$, and $f_v(i, u, t)$ and $f(i, u, t)$ are scaling factors in $P(i \mid u, z, t)$, with $f_v(i, u, t) = 0$ if $v \notin F_i(u, t)$.
Influence Maximization in AIR
Although AIR is a general threshold model, the fact that user authoritativeness can be negative makes $\sigma_{AIR}$ not submodular and not even monotone. Two seed-selection strategies:
- Greedy: at each iteration, add to the seed set $S$ the node $x$ that brings the largest marginal gain, i.e., for which $\sigma_{AIR}(S \cup \{x\}) - \sigma_{AIR}(S)$ is maximal. $\sigma_{AIR}(S)$ is estimated for a given $S$ by Monte Carlo simulation.
- Top-k authorities: given the new item $i$ and its distribution over topics $\gamma_i^z$, select the top-$k$ users $v$ w.r.t. $\sum_{z=1}^{K} \gamma_i^z \, p^z_v$.
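The top-k authorities heuristic needs no simulation, only the topic-weighted authoritativeness score of each user. A minimal sketch follows. It assumes `authority[v]` is the per-topic authoritativeness vector of user `v` (entries may be negative) and `gamma_i` is the item's topic distribution. The function name and data layout are hypothetical.

```python
import heapq

def top_k_authorities(authority, gamma_i, k):
    """Return the k users with the largest score sum_z gamma_i[z] * authority[v][z]."""
    def score(v):
        return sum(g * p for g, p in zip(gamma_i, authority[v]))
    # nlargest returns the users sorted by decreasing score.
    return heapq.nlargest(k, authority, key=score)
```

For instance, with `gamma_i = [0.25, 0.75]` a user with authoritativeness `[0.1, 0.8]` scores 0.625 and outranks a user with `[0.9, -0.2]`, who scores 0.075, because the item is mostly about the second topic.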
Dataset
They use two real-world, publicly available datasets, both containing a social graph $G = (V, E)$ and a log of past propagations $D = \{(\text{User}, \text{Item}, \text{Time})\}$. The DIGG social graph contains 11,142 users and 99,846 directed arcs, while FLIXSTER contains 6,353 users and 84,606 directed arcs.
EXPERIMENTAL EVALUATION
Introduction Although a few prior works do support topic-specific influence analysis, they either separate the analysis of content from that of network structure, or assume that content is the only cause of links, which is clearly an inappropriate assumption for microblog networks.
Contributions
- They propose a new Bayesian Bernoulli-Multinomial mixture model, FLDA, to jointly model both content and links in the same generative process.
- They discuss and implement a distributed Gibbs-sampling technique for training FLDA over large clusters.
Followship-LDA (FLDA)
- Topic-Specific Influence: the influence of user $e$ on topic $x$ is measured by $\sigma_{e \mid x}$, the probability of $e$ being followed for topic $x$ in the FLDA model.
- Content-Independent Popularity: the content-independent popularity of user $e$ is measured by $\pi_e$, the probability of $e$ being followed for any content-independent reason in the FLDA model.
Gibbs Sampling for FLDA
Distributed FLDA is implemented using Spark, a large-scale distributed processing framework specifically targeted at iterative machine-learning workloads.
QUERYING TOPICAL INFLUENCERS
They propose a general search framework for topic-specific key influencers, called SKIT. SKIT lets a user freely express his/her interests by typing a set of keywords, and returns a list of key influencers that satisfy the user's intent, ordered by influence score.
- $\mathrm{INFL}(t; u) = \sigma_{e=u \mid x=t}$
- $W(t; q) = \theta_{z=t \mid m=q}$
Dataset