Damping Effect on PageRank Distribution
IEEE High Performance Extreme Computing, Waltham, MA, USA, September 26, 2018
Tiancheng Liu, Yuchen Qian, Xi Chen, Xiaobai Sun
Department of Computer Science, Duke University, USA
Outline
⋄ Personalized PageRank model: invention by Brin and Page (1998), in need of innovative extension
⋄ The PageRank model family: an analytic apparatus with increased description power and scope
⋄ Analysis: damping effects on PageRank distributions
⋄ Algorithm: exploiting structures of the personalized stochastic Krylov (PSK) space
⋄ Findings: from experiments on real-world network data
Sparse graphs in sparse matrix representations
[Figure: a 20-node link graph $G(V, E)$, its adjacency matrix $A$, and its probability transition matrix $P$, shown as sparsity patterns]
⋄ A directed edge $(u, v) \in E$ is stored as $A(v, u) = 1$
⋄ $d_{in}$: in-degrees; $d_{out}$: out-degrees
⋄ $P = A \cdot \mathrm{diag}(1 ./ d_{out})$, kept in factor form in storage
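Keeping $P$ in factor form means storing $A$ and $d_{out}$ and never forming $P$ itself. A minimal sketch in Python with SciPy, assuming 0-based node ids and an edge list of $(u, v)$ pairs for links $u \to v$. The helper names `build_factors` and `apply_P` are hypothetical, and giving dangling nodes ($d_{out} = 0$) a zero scaling factor is our assumption; the slide does not say how they are handled.

```python
import numpy as np
import scipy.sparse as sp


def build_factors(edges, n):
    """Return the adjacency matrix A and the degree vectors d_in, d_out.

    Each edge (u, v) means a link u -> v and is stored as A[v, u] = 1,
    so column u of A holds the out-links of node u.
    """
    src = np.array([u for u, _ in edges], dtype=int)
    dst = np.array([v for _, v in edges], dtype=int)
    A = sp.csc_matrix((np.ones(len(src)), (dst, src)), shape=(n, n))
    d_out = np.asarray(A.sum(axis=0)).ravel()  # column sums: out-degrees
    d_in = np.asarray(A.sum(axis=1)).ravel()   # row sums: in-degrees
    return A, d_in, d_out


def apply_P(A, d_out, x):
    """Compute P @ x = A @ diag(1 ./ d_out) @ x without forming P.

    Assumption: dangling nodes (d_out == 0) get a scaling factor of 0.
    """
    inv = np.divide(1.0, d_out, out=np.zeros_like(d_out, dtype=float),
                    where=d_out > 0)
    return A @ (x * inv)


# Usage: a 3-node example with links 0->1, 0->2, 1->2, 2->0.
A, d_in, d_out = build_factors([(0, 1), (0, 2), (1, 2), (2, 0)], n=3)
print(d_out, d_in)  # [2. 1. 1.] [1. 1. 2.]
print(apply_P(A, d_out, np.ones(3) / 3))
```

Each matrix-vector product costs one diagonal scaling plus one sparse product with $A$, which is why the factor form is convenient for storage.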
Precursor: Personalized PageRank
Web surfing is modeled as a random walk on $M_\alpha(v)$, a Markov chain with a personalized term $S$:
$M_\alpha(v) = \alpha P + (1 - \alpha) S, \quad S = v e^T$
where $\alpha$ is the damping factor, $P$ the transition matrix of the link graph, $v$ the personalized vector, and $e^T$ the gathering vector (all ones).
[Figure: link graph + personalized direct links = personalized Markov chain, on the 20-node example]
⋄ Bernoulli decision at each click: follow a P-link with probability $\alpha \in (0, 1)$, a.k.a. the damping factor, or an S-link with probability $1 - \alpha$
⋄ The personalized term $S$: direct links to the v-nodes (yellow) for gathering/broadcasting; $S$ is rank-1 and stochastic
[Figure: the same decomposition in matrix form with $\alpha = 0.85$, i.e. $M_\alpha = 0.85\,P + 0.15\,S$, shown as sparsity patterns of the 20-node example]
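The Bernoulli decision at each click can be simulated directly. The sketch below, in Python with NumPy, assumes the graph is given as out-link lists and that $\alpha$ is the probability of following a P-link; `simulate_surfer` is a hypothetical helper. It also assumes that a surfer at a dangling node always takes an S-link, a convention the slides do not specify. The long-run visit frequencies estimate the steady state of $M_\alpha(v)$.

```python
import numpy as np


def simulate_surfer(out_links, v, alpha, steps, seed=0):
    """Monte Carlo random walk on M_alpha(v) = alpha*P + (1-alpha)*v*e^T.

    out_links[u] lists the nodes that u links to; v is the personalized
    distribution. Returns the fraction of steps spent at each node.
    """
    rng = np.random.default_rng(seed)
    n = len(v)
    cdf = np.cumsum(v)
    counts = np.zeros(n)

    def jump_to_v():
        # S-link: jump to a node drawn from the personalized vector v.
        return min(int(np.searchsorted(cdf, rng.random(), side="right")), n - 1)

    node = jump_to_v()
    for _ in range(steps):
        counts[node] += 1
        links = out_links[node]
        # Bernoulli decision: follow a P-link with probability alpha.
        # Assumption: a dangling node (no out-links) always takes an S-link.
        if links and rng.random() < alpha:
            node = links[rng.integers(len(links))]
        else:
            node = jump_to_v()
    return counts / steps
```

For example, `simulate_surfer([[1, 2], [2], [0, 3], [0]], np.array([0.4, 0.2, 0.2, 0.2]), 0.85, 100_000)` should come close to the solution of $(I - 0.85 P)x = 0.15 v$ for that 4-node graph, with sampling error shrinking as `steps` grows.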
Equivalent expressions of the PageRank distribution vector
Purpose: multi-aspect investigation for interpretation and computational analysis
1. Steady-state distribution of $M_\alpha$: $M_\alpha x = x$, i.e. $\left(\alpha P + (1 - \alpha) v e^T\right) x = x$; the power method computes $x = \lim_{k \to \infty} M_\alpha^k x_0$, an asymptotic walk on $M_\alpha$, memoryless of $x_0$
2. Solution to the sparse linear system $(I - \alpha P)\, x = (1 - \alpha)\, v$; many iterative solution methods apply
3. Explicit representation in a Neumann series in $P$, $v$, $\alpha$: $x = (1 - \alpha) \sum_k \alpha^k (P^k v)$, the cumulative propagation of $v$ on $P$
4. Differential transition equation: $\dot{x}(\alpha) = \left[P (I - \alpha P)^{-1} - (1 - \alpha)^{-1} I\right] x(\alpha)$; spectrum-based method
[Figure: link graph $P$, vector $v$, the iterates $M_\alpha^k x_0$, and the resulting $x$ on the 20-node example]
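The four expressions can be checked against one another numerically. This is a small dense sketch in Python with NumPy, assuming $P$ is column-stochastic (no dangling nodes) and small enough to handle densely; the function names are hypothetical and this is not the authors' implementation. Expression 4 is verified against a central finite difference of expression 2.

```python
import numpy as np


def pagerank_power(P, v, alpha, tol=1e-12, max_iter=10_000):
    """Expression 1: steady state of M_alpha by the power method.

    M_alpha @ x = alpha * P @ x + (1 - alpha) * v * (e^T x), so the rank-1
    term S = v e^T is applied without ever being formed.
    """
    x = np.full(len(v), 1.0 / len(v))  # any start x_0; the limit does not depend on it
    for _ in range(max_iter):
        x_new = alpha * (P @ x) + (1 - alpha) * v * x.sum()
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


def pagerank_solve(P, v, alpha):
    """Expression 2: solve (I - alpha P) x = (1 - alpha) v (dense, small P)."""
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)


def pagerank_series(P, v, alpha, tol=1e-12, max_terms=10_000):
    """Expression 3: truncated Neumann series (1 - alpha) * sum_k alpha^k P^k v."""
    x = np.zeros(len(v))
    term = np.asarray(v, dtype=float)  # holds alpha^k P^k v
    for _ in range(max_terms):
        x += (1 - alpha) * term
        term = alpha * (P @ term)
        if np.abs(term).sum() < tol:   # equals alpha^k when P is stochastic
            break
    return x


def pagerank_dalpha(P, x, alpha):
    """Expression 4: right-hand side [P (I - alpha P)^{-1} - (1 - alpha)^{-1} I] x."""
    y = np.linalg.solve(np.eye(len(x)) - alpha * P, x)  # (I - alpha P)^{-1} x
    return P @ y - x / (1 - alpha)
```

The derivative formula follows from differentiating $x(\alpha) = (1 - \alpha)(I - \alpha P)^{-1} v$ and substituting $(I - \alpha P)^{-1} v = x/(1 - \alpha)$, using the fact that $P$ commutes with $(I - \alpha P)^{-1}$.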
PageRank model family: characterizing various propagation patterns
Model description in equivalent expressions:
⋄ Propagation kernel functions
⋄ Cumulative propagation on $P$
⋄ Linear systems
⋄ Differential transitions
PageRank distribution response to damping variation
[Figure: a few particular subfamilies of propagation kernel functions, each plotted as weight against step k = 0, ..., 20: geometric kernels (Brin-Page), Poisson kernels (Chung), Conway-Maxwell-Poisson kernels (slow and fast), Negative Binomial kernels, logarithmic kernels]
Propagation kernel functions
Propagation kernel function: $f_\rho(\lambda) = \sum_k w_k(\rho)\, \lambda^k$
PageRank vector (model solution) for a particular network $P$ and personalized distribution vector $v$:
$x = f_\rho(P)\, v = \sum_k w_k(\rho) \cdot P^k v$, where $w_k(\rho)$ is the damping on the $k$-th step and $P^k v$ is the propagation at the $k$-th step
$\{w_k(\rho)\}$: any probability mass function (pmf) of the variable $\rho$, with or without additional parameters
[Figure: PageRank distributions of 3 propagation patterns with $P$ for the link graph Twitter(www)¹, shown as histograms of node counts, alongside each kernel function over the graph eigenvalues and its discrete pmf]
¹ H. Kwak et al. (2009)
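A sketch of the kernel form in Python with NumPy, assuming the pmf is supplied as a finite array of weights, i.e. truncated at $K$ terms where we assume the tail is negligible. With geometric weights $w_k = (1 - \alpha)\alpha^k$ it should reproduce the solution of $(I - \alpha P)x = (1 - \alpha)v$. Function names are hypothetical.

```python
import numpy as np


def kernel_pagerank(P, v, weights):
    """x = f_rho(P) v = sum_k w_k(rho) P^k v for a given (truncated) pmf.

    weights[k] is w_k(rho) for k = 0, 1, ..., K-1; the series is cut at K terms.
    """
    x = np.zeros(len(v))
    pk_v = np.asarray(v, dtype=float)  # P^k v, starting with k = 0
    for w_k in weights:
        x += w_k * pk_v                # damping on the k-th step
        pk_v = P @ pk_v                # propagation to the (k+1)-th step
    return x


def kernel_function(lam, weights):
    """Scalar kernel f_rho(lambda) = sum_k w_k(rho) lambda^k (truncated)."""
    return sum(w_k * lam**k for k, w_k in enumerate(weights))
```

Any other pmf from the family can be plugged in as `weights` without changing the propagation loop; only the weight array differs between models.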
Propagation pattern kernels: CMP sub-family
Conway-Maxwell-Poisson (CMP): $w_k(\rho, \nu) = \dfrac{\rho^k}{(k!)^\nu Z}$, with damping variable $\rho$, damping speed $\nu$, and normalization constant $Z = \sum_{j \ge 0} \rho^j / (j!)^\nu$
Damping speed parameter $\nu \ge 0$, including the Brin-Page (B-P) model and Chung's model:
⋄ $\nu = 0$: geometric (B-P, 1998)
⋄ $\nu = 1$: Poisson (Chung, 2007)
⋄ $\nu < 1$: slow decay with $k$
⋄ $\nu > 1$: fast decay with $k$
[Figure: slow and fast propagation patterns of the CMP distribution; left, slow damping speed $0 \le \nu \le 1$ ($\rho = 0.9$); right, fast damping speed $\nu \ge 1$ ($\rho = 5$)]
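Because $(k!)^\nu$ overflows quickly, CMP weights are easiest to tabulate in log space. This Python sketch uses SciPy's `gammaln` and `logsumexp` and approximates $Z$ by a finite sum over $k < K$; the cutoff $K$ and the name `cmp_weights` are our assumptions. Setting $\nu = 0$ or $\nu = 1$ should recover the geometric and Poisson kernels.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def cmp_weights(rho, nu, K=400):
    """Conway-Maxwell-Poisson weights w_k = rho^k / ((k!)^nu Z), k = 0..K-1.

    Z is approximated by summing the first K terms, which assumes the tail is
    negligible (this requires rho < 1 when nu = 0). Requires rho > 0.
    """
    k = np.arange(K)
    log_terms = k * np.log(rho) - nu * gammaln(k + 1)  # log(rho^k / (k!)^nu)
    return np.exp(log_terms - logsumexp(log_terms))    # divide by Z in log space
```

Normalizing with `logsumexp` keeps the computation stable for both the slow ($\rho = 0.9$) and fast ($\rho = 5$) settings shown in the figure.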
Propagation pattern kernels: NB sub-family
Negative Binomial (NB): $w_k(\rho, r) = \dbinom{k + r - 1}{k} \rho^k (1 - \rho)^r$, with step $k$, damping variable $\rho$, and distribution shape parameter $r$
Distribution shape parameter $r$:
⋄ $r = 1$: geometric distribution
⋄ $r \to \infty$: Poisson distribution, with $r \cdot \rho / (1 - \rho) = \text{const}$
[Figure: propagation patterns of the NB distribution, weight against step k = 0, ..., 20]
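A Python sketch of the NB weights, writing the binomial coefficient through log-gamma functions so that non-integer shapes $r > 0$ also work; the name `nb_weights` and the cutoff $K$ are our assumptions. With $r = 1$ it reduces to the geometric kernel, and for large $r$ with $r\rho/(1 - \rho)$ held fixed it approaches a Poisson pmf.

```python
import numpy as np
from scipy.special import gammaln


def nb_weights(rho, r, K=400):
    """Negative Binomial weights w_k = C(k + r - 1, k) rho^k (1 - rho)^r, k = 0..K-1.

    The binomial coefficient is evaluated as Gamma(k + r) / (Gamma(k + 1) Gamma(r)),
    so non-integer shape parameters r > 0 also work.
    """
    k = np.arange(K)
    log_w = (gammaln(k + r) - gammaln(k + 1) - gammaln(r)
             + k * np.log(rho) + r * np.log1p(-rho))
    return np.exp(log_w)
```

To follow the Poisson limit, choose $\rho = \lambda / (r + \lambda)$ for a target mean $\lambda$, so that $r\rho/(1 - \rho) = \lambda$ stays constant as $r$ grows.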
Propagation pattern kernels: logarithmic distribution
Logarithmic: $w_k(\rho) = \dfrac{-1}{\ln(1 - \rho)} \cdot \dfrac{\rho^k}{k}$, $\rho \in (0, 1)$, $k = 1, 2, \dots$
A unique new model in the model family:
⋄ weights decay faster than in the geometric distribution
⋄ weights decay slower than in the Poisson distribution
⋄ no extra control parameters
[Figure: propagation patterns of logarithmic distributions, weight against step k]
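The logarithmic weights need no normalization loop: $-1/\ln(1 - \rho)$ is exactly the normalizing constant of $\sum_{k \ge 1} \rho^k / k$. A short Python sketch (the name `log_weights` and the cutoff $K$ are hypothetical) that also makes the "faster than geometric" claim concrete: the ratio $w_k / ((1 - \rho)\rho^k) = -1/(k(1 - \rho)\ln(1 - \rho))$ shrinks like $1/k$.

```python
import numpy as np


def log_weights(rho, K=2000):
    """Logarithmic weights w_k = -rho^k / (k ln(1 - rho)) for k = 1..K."""
    k = np.arange(1, K + 1)
    return -(rho ** k) / (k * np.log1p(-rho))


rho = 0.9
w = log_weights(rho)
k = np.arange(1, len(w) + 1)
ratio = w / ((1 - rho) * rho ** k)  # log weight relative to the geometric weight
print(w[:3], ratio[:3])
```

Note that the support starts at $k = 1$, so the unpropagated vector $v$ itself ($k = 0$) receives no weight under this kernel.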
Propagation pattern kernels: precursor models and new model
Precursor models:
⋄ Brin-Page¹ model: geometric distribution, $w_k(\alpha) = (1 - \alpha)\, \alpha^k$
⋄ Chung's² model: Poisson distribution, $w_k(\beta) = e^{-\beta} \beta^k / k!$
New model in the family:
⋄ log-$\gamma$ model: logarithmic distribution, $w_k(\gamma) = \dfrac{-1}{\ln(1 - \gamma)} \cdot \dfrac{\gamma^k}{k}$
[Figure: the three kernels plotted against step k = 0, ..., 20]
¹ L. Page and S. Brin, 1998
² F. Chung, PNAS, 2007
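Summing each of these kernels against $\lambda^k$ gives a closed form for $f_\rho(\lambda)$ (a standard generating-function computation, not stated on the slide): $f_\alpha(\lambda) = (1 - \alpha)/(1 - \alpha\lambda)$, $f_\beta(\lambda) = e^{\beta(\lambda - 1)}$, and $f_\gamma(\lambda) = \ln(1 - \gamma\lambda)/\ln(1 - \gamma)$. Hence $x = f(P)v$ becomes $(1 - \alpha)(I - \alpha P)^{-1} v$, $e^{-\beta} e^{\beta P} v$, and $\ln(I - \gamma P)\, v / \ln(1 - \gamma)$. The dense Python sketch below checks these against the truncated weight series, assuming a small column-stochastic $P$; the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, logm


def closed_form_pageranks(P, v, alpha, beta, gamma):
    """Closed forms of x = f(P) v for the three kernels (dense, small P)."""
    I = np.eye(len(v))
    x_geo = (1 - alpha) * np.linalg.solve(I - alpha * P, v)      # Brin-Page
    x_poi = np.exp(-beta) * (expm(beta * P) @ v)                 # Chung
    x_log = np.real(logm(I - gamma * P)) @ v / np.log1p(-gamma)  # log-gamma
    return x_geo, x_poi, x_log


def series_pageranks(P, v, alpha, beta, gamma, K=400):
    """The same vectors from the kernel weights, truncated at K terms."""
    x_geo, x_poi, x_log = (np.zeros(len(v)) for _ in range(3))
    pk_v = np.asarray(v, dtype=float)  # P^k v
    w_poi = np.exp(-beta)              # w_0 of the Poisson kernel
    for k in range(K):
        x_geo += (1 - alpha) * alpha**k * pk_v
        x_poi += w_poi * pk_v
        if k >= 1:                     # logarithmic support starts at k = 1
            x_log += -(gamma**k) / (k * np.log1p(-gamma)) * pk_v
        w_poi *= beta / (k + 1)        # w_{k+1} = w_k * beta / (k + 1)
        pk_v = P @ pk_v
    return x_geo, x_poi, x_log
```

The matrix logarithm is well defined here because the eigenvalues of $I - \gamma P$ lie in the disk of radius $\gamma < 1$ around 1, so the series $-\sum_{k \ge 1} \gamma^k P^k / k$ converges to the principal logarithm.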
Cumulative propagation on P
[Figure: link graph $P$ and personalized vector $v$; propagation on $P$ as the sequence $v, Pv, P^2 v, \dots, P^{m-1} v$; and the resulting PageRank vectors with their kernel weights for each model]
⋄ Geometric kernel (Brin-Page): $x(\alpha) = z_\alpha \sum_k \alpha^k P^k v$
⋄ Poisson kernel (Chung): $x(\beta) = z_\beta \sum_k \dfrac{\beta^k}{k!} P^k v$
⋄ Logarithmic kernel (log-$\gamma$): $x(\gamma) = z_\gamma \sum_{k \ge 1} \dfrac{\gamma^k}{k} P^k v$
with normalizing constants $z_\alpha = 1 - \alpha$, $z_\beta = e^{-\beta}$, $z_\gamma = -1/\ln(1 - \gamma)$
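All three cumulative-propagation formulas reuse the same vectors $v, Pv, \dots, P^{m-1}v$; only the weights change. The Python sketch below builds that sequence once and then evaluates the geometric kernel for several damping values by reweighting it. It illustrates this reuse under our own assumptions (dense NumPy arrays, truncation at $m$ terms, hypothetical function names) and is not the PSK algorithm named in the outline.

```python
import numpy as np


def propagation_sequence(P, v, m):
    """Columns are v, P v, P^2 v, ..., P^{m-1} v."""
    V = np.empty((len(v), m))
    V[:, 0] = v
    for k in range(1, m):
        V[:, k] = P @ V[:, k - 1]
    return V


def sweep_geometric(V, alphas):
    """x(alpha) = (1 - alpha) * sum_k alpha^k P^k v for each alpha, from one sequence.

    Truncation error is at most alpha^m in the 1-norm when P is column-stochastic
    and v is a probability vector.
    """
    k = np.arange(V.shape[1])
    W = np.array([(1 - a) * a**k for a in alphas])  # one row of weights per alpha
    return W @ V.T                                  # row i is x(alphas[i])
```

Swapping the weight rows for Poisson or logarithmic weights gives $x(\beta)$ or $x(\gamma)$ from the same `V`, so a sweep over damping values costs one dense product after the $m - 1$ sparse propagations.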