
A sub-linear method for computing columns of functions of sparse matrices
Kyle Kloster and David F. Gleich, Purdue University
March 3, 2014
Supported by NSF CAREER 1149756-CCF



  2. Overview
  1. f(A): problem description and applications
  2. Description of "sub-linear" results
  3. The algorithm for f(A)b
  4. Intuition for the proof
  5. Experiments on real-world social networks

  3. The Problem - Functions of Matrices: background
  We can apply most functions, e.g. f(x) = cos(x), to any square matrix A if f is defined on the eigenvalues of A. One definition: the Taylor series!
  cos(x) = 1/0! - x^2/2! + x^4/4! - ...
  cos(A) = I/0! - A^2/2! + A^4/4! - ...
  Then we can think of f(A)b as the action of the operator f(A) on b, or as a diffusion on a graph underlying the matrix A.
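The Taylor-series definition can be checked numerically. A minimal sketch (not from the slides; the small matrix A here is an arbitrary example), comparing the truncated series against SciPy's built-in matrix cosine:

```python
import numpy as np
from scipy.linalg import cosm

# Arbitrary small symmetric matrix, purely for illustration.
A = np.array([[0.0, 1.0],
              [1.0, 0.5]])

# Truncated Taylor series: cos(A) = I - A^2/2! + A^4/4! - ...
N = 20
cosA = np.zeros_like(A)
term = np.eye(2)                 # the A^0 / 0! term
for k in range(N):
    cosA += term
    # next even-power term: multiply by -A^2 / ((2k+1)(2k+2))
    term = -term @ A @ A / ((2 * k + 1) * (2 * k + 2))

assert np.allclose(cosA, cosm(A))
```

With 20 even-power terms the series already agrees with `cosm` to machine precision for a matrix this small.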


  5. The Problem - Functions of Matrices: applications
  Action:
  f(x) = e^x: dx/dt = Ax, x(0) = x_0 has solution x(t) = exp{tA} x_0
  f(x) = x^{1/p}: if P(1) is the transition matrix of a Markov process over a year, then P^{1/12} describes the process over a month
  Diffusion:
  f(x) = (1 - αx)^{-1}: the resolvent yields the PageRank diffusion; f(P) e_i is interpreted as the nodes' importance to node i
  f(x) = e^{tx}: e^{tP} e_i, the heat kernel diffusion, offers an alternative ranking of nodes' importance
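Both diffusions can be computed directly on a toy graph. A sketch (the 4-node graph is hypothetical, chosen only so every node is reachable from the seed):

```python
import numpy as np
from scipy.linalg import expm

# Toy 4-node directed graph: edges 0->1, 0->2, 1->2, 2->3, 3->0.
A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)        # column-stochastic random-walk matrix

alpha, t = 0.85, 1.0
e_i = np.zeros(4)
e_i[0] = 1.0                 # seed diffusion at node 0

# PageRank diffusion: the resolvent (I - alpha P)^{-1} applied to e_i
pr = np.linalg.solve(np.eye(4) - alpha * P, e_i)

# Heat kernel diffusion: exp{tP} e_i
hk = expm(t * P) @ e_i

# The graph is strongly connected, so both diffusions reach every node.
assert pr.min() > 0 and hk.min() > 0
```

Both vectors rank the nodes by their connection strength to the seed; the two functions just weight walk lengths differently, as a later slide shows.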

  6. The Problem - Parameters of f(A)b
  A: Original motivation: A is a normalized version of an adjacency matrix from a social network, e.g. the Laplacian or random-walk matrix. Sparse, small diameter, stochastic, degree distribution follows a power law.
  Generalized: any nonnegative A with ||A||_1 ≤ 1.
  b: Originally b = e_i, i.e. compute a column f(A) e_i.
  Generalized: b can be any sparse, stochastic vector.
  f(·): Originally f(x) = e^x, (1 - αx)^{-1}.
  Generalized: f can be any function decaying "fast enough".


  8. The Problem - Columns of the Matrix Exponential
  exp{A} is used for link prediction, node centrality, and clustering. Why?
  exp{A} = sum_{k=0}^∞ (1/k!) A^k
  (A^k)_{ij} gives the number of length-k walks from i to j, so...
  Large entries of exp{A} denote "important" nodes/links.
  Used for link prediction, node ranking, clustering.
  exp{A} is common, but other f(A) can be used. PageRank can be defined from the resolvent:
  (I - αA)^{-1} = sum_{k=0}^∞ α^k A^k
  So: replace 1/k! with other coefficients?
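The walk-counting identity behind this slide is easy to verify. A sketch on a hypothetical 3-node path graph (not from the slides):

```python
import numpy as np
from scipy.linalg import expm

# Path graph 0 - 1 - 2 (toy adjacency matrix for illustration).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# (A^k)_{ij} counts the length-k walks from i to j.
A2 = A @ A
assert A2[0, 2] == 1        # exactly one length-2 walk: 0-1-2
assert A2[0, 0] == 1        # the walk 0-1-0

# exp{A} sums A^k / k!, so longer walks contribute with rapidly
# decaying weight; the directly connected pair gets the larger entry.
E = expm(A)
assert E[0, 1] > E[0, 2]
```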

  9. The Problem - f(A) as a weighted sum of walks
  For f(A) = e^{tA} and f(A) = (1 - αA)^{-1}, how are walks weighted?
  f(A) b = (f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + ...) b
  [Plot: weight f_k vs. walk length k, for the heat kernel with t = 1, 5, 15 and the resolvent with α = 0.85, 0.99.]
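The qualitative shape of that plot can be reproduced from the coefficients alone; a minimal sketch (t and α values chosen to match the slide's plot, nothing else assumed):

```python
import math

# Coefficients weighting length-k walks in the two series:
# heat kernel exp{tA}: f_k = t^k / k!
# resolvent (I - alpha A)^{-1}: f_k = alpha^k
t, alpha = 5.0, 0.85
heat = [t**k / math.factorial(k) for k in range(101)]
reso = [alpha**k for k in range(101)]

# For t = 5 the heat kernel initially up-weights medium-length walks...
assert heat[5] > reso[5]
# ...but its factorial decay eventually beats the geometric decay.
assert heat[100] < reso[100]
```

This coefficient decay is exactly what "decays fast enough" refers to in the theorems later in the deck.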

  10. The Problem - Big Graphs from Social Networks
  We've seen the computation (f); what does the domain of inputs look like?
  Social networks like Twitter, YouTube, Friendster, LiveJournal.
  Large: n = 10^6, 10^7, 10^9+
  Sparse: |E| = O(n), often ≤ 50n
  Difficulty: the "small world" property: diameter ≈ 4 (!)
  Helpful: power-law degree distribution (picture)

  11. The Problem - Power-law degree distribution
  [Plot: outdegree vs. frequency on log-log axes, showing a power-law degree distribution.]
  [Laboratory for Web Algorithms, http://law.di.unimi.it/index.php]

  12. The Problem - Difficulties with current methods
  Sidje, TOMS 1998; Al-Mohy and Higham, SISC 2011
  Leading methods for f(A)b use Krylov or Taylor methods: "basically" repeated mat-vecs.
  The "small world" property (graph diameter ≤ 4) means repeated mat-vecs fill in rapidly (see picture).
  These methods are not designed specifically for sparse networks.

  13. The Problem - Fill-in from repeated matvecs
  [Picture: the vectors P^k e_i for k = 1, 2, 3, 4 on a graph with n = 1133.]
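The fill-in effect is easy to reproduce synthetically. A sketch on a hypothetical Erdős–Rényi graph standing in for a small-diameter network (not the n = 1133 graph from the slide):

```python
import numpy as np

# Random sparse graph as a stand-in for a small-world network.
rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.03).astype(float)
A = np.maximum(A, A.T)                  # symmetrize
np.fill_diagonal(A, 0)
P = A / np.maximum(A.sum(axis=0), 1)    # column-stochastic random walk

x = np.zeros(n)
x[0] = 1.0                              # start from e_i with i = 0
nnz = []
for _ in range(4):
    x = P @ x                           # one mat-vec = one walk step
    nnz.append(int(np.count_nonzero(x)))

# On a small-diameter graph, a handful of mat-vecs fills in the vector.
assert nnz[0] < nnz[-1]
assert nnz[-1] > n // 3
```

After one step the support is just the seed's neighborhood; by the fourth mat-vec it covers a large fraction of the graph, which is the cost that a local method avoids paying.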

  14. The Problem - f(P) e_i is a localized vector
  [Plot: x-axis: vector index, y-axis: magnitude of entry; the column of exp{P} produced by the previous slide's matvecs. Almost all of the mass sits on a few entries.]

  15. The Problem - Local Method
  New method: avoid mat-vecs! Instead, use a local method.
  Local algorithms run in time proportional to the size of the output: a sparse solution vector means a small runtime.
  Instead of matvecs, we do specially selected vector adds using a relaxation method.
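The slides do not spell the relaxation out here, but a well-known instance of this idea is the PageRank "push" procedure, which relaxes one residual coordinate at a time and only ever touches entries above a threshold. The sketch below is illustrative of that general pattern, not the authors' exact algorithm, and uses a dense matrix only for clarity:

```python
import numpy as np
from collections import deque

def pagerank_push(P, i, alpha=0.85, eps=1e-8):
    """Local relaxation sketch for x = (1 - alpha)(I - alpha P)^{-1} e_i.

    Only coordinates whose residual exceeds eps are ever relaxed, so the
    work scales with the output size rather than with n."""
    n = P.shape[0]
    x = np.zeros(n)
    r = np.zeros(n)
    r[i] = 1.0                        # residual of the linear system
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if r[u] <= eps:
            continue
        rho, r[u] = r[u], 0.0
        x[u] += (1 - alpha) * rho     # relax coordinate u
        for v in np.nonzero(P[:, u])[0]:
            below = r[v] <= eps
            r[v] += alpha * P[v, u] * rho
            if below and r[v] > eps:  # v entered the active set
                queue.append(v)
    return x

# Compare against a direct solve on a tiny column-stochastic matrix.
P = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
x = pagerank_push(P, 0, eps=1e-12)
exact = 0.15 * np.linalg.solve(np.eye(3) - 0.85 * P, np.eye(3)[:, 0])
assert np.allclose(x, exact, atol=1e-8)
```

Each relaxation is one sparse "vector add" (a scaled column of P into the residual), which is exactly the operation the slide contrasts with full mat-vecs.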


  17. Main results - Main Result 1
  Theorem 1 [action of f on b]: Given nonnegative A satisfying ||A||_1 ≤ 1, with power-law degree distribution and max degree d, and sparse stochastic b, our method computes x ≈ f(A)b such that
  ||f(A)b - x||_1 < ε in work(ε) = O((1/ε)^{C_f} log(1/ε) d^2 log(d)^2),
  so the "work" "scales as" d^2 log(d)^2 in the graph size, for any function f that decays "fast enough". The constant C_f depends on how quickly the Taylor coefficients of f decay.
  For f(x) = (1 - αx)^{-1}: C_f = 1/(1 - α) (note: α ∈ (0, 1)).
  For f(x) = e^x: C_f = 3/2.
  For f(x) = x^{1/p}: C_f = 3p/(5p - 1) (note: p ∈ (0, 1)).

  18. Main results - Main Result 2
  Theorem 2 [diffusion of f across a graph]: Given column-stochastic A and b, x̃ ≈ f̃(tA)b can be computed such that
  ||f̃(P)b - x̃||_∞ < ε in work(ε) = O(2f(t)/ε).
  (Remark: the tilde denotes a degree-normalized version for the diffusion: D^{-1} exp{tP} b, for example. We normalize by degrees to adjust for the influence of the stationary distribution of P.)
  Corollary: f(A)b is a local vector. Proof: because sub-linear work is done, f(A)b cannot have O(n) nonzeros.

  19. Our method: Nexpokit - Overview
  Outline of the Nexpokit method (our second method, hk-relax, is related):
  1. Express f(A)b via a Taylor polynomial
  2. Form a large linear system out of the Taylor terms
  3. Use a sparse solver to approximate each term's largest entries
  4. Combine the approximated terms into a solution

  20. Our method: Nexpokit - In terms of Taylor terms
  Taylor polynomial: f(A)b ≈ (f_0 I + f_1 A + f_2 A^2 + f_3 A^3 + ... + f_N A^N) b
  Compute the terms recursively:
  v_k = f_k A^k e_i = (f_k / f_{k-1}) A (f_{k-1} A^{k-1} e_i), so
  v_k = (f_k / f_{k-1}) A v_{k-1}
  Then f(A)b ≈ v_0 + v_1 + ... + v_{N-1} + v_N.
  (But we want to avoid computing the v_j in full...)
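For f_k = 1/k! the ratio f_k / f_{k-1} is 1/k, and the recursion can be run directly. A minimal sketch (the 5x5 column-stochastic A is a hypothetical example, not from the slides):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical small A with ||A||_1 <= 1, and seed vector b = e_0.
rng = np.random.default_rng(1)
A = rng.random((5, 5))
A /= A.sum(axis=0)               # column-stochastic, so ||A||_1 = 1
b = np.zeros(5)
b[0] = 1.0

# Recursion from the slide with f_k = 1/k!: v_k = (1/k) A v_{k-1}
N = 15
v = b.copy()                     # v_0 = f_0 A^0 b = b
x = v.copy()
for k in range(1, N + 1):
    v = (A @ v) / k
    x = x + v                    # accumulate v_0 + v_1 + ... + v_N

assert np.allclose(x, expm(A) @ b, atol=1e-10)
```

Each step here is still a full mat-vec; the point of the linear-system view on the next slide is to open the door to approximating the terms sparsely instead.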


  22. Our method: Nexpokit - Forming a linear system
  So we convert the Taylor polynomial into a linear system. For simplicity's sake, we use the example of exp{A} e_i here.
  [ I                     ] [v_0]   [e_i]
  [ -A/1   I              ] [v_1]   [ 0 ]
  [       -A/2   I        ] [v_2] = [ 0 ]
  [            ...   ...  ] [ . ]   [ . ]
  [             -A/N   I  ] [v_N]   [ 0 ]
  where we use the identity v_k = (1/k) A v_{k-1}, which comes from v_k = (f_k / f_{k-1}) A v_{k-1}: since f_k = 1/k!, we have f_k / f_{k-1} = (k-1)!/k! = 1/k.
  Then exp{A} e_i ≈ v_0 + v_1 + ... + v_{N-1} + v_N.
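The block system above can be built and solved explicitly on a small example. This sketch assembles it densely purely for verification (Nexpokit itself would apply a sparse, local solver to this system; the matrix sizes here are hypothetical):

```python
import numpy as np
from scipy.linalg import expm

# Block bidiagonal system from the slide: diagonal blocks I,
# subdiagonal blocks -A/k, right-hand side (e_i, 0, ..., 0).
rng = np.random.default_rng(2)
n, N = 4, 12
A = rng.random((n, n))
A /= A.sum(axis=0)                        # column-stochastic

M = np.eye((N + 1) * n)
for k in range(1, N + 1):
    M[k * n:(k + 1) * n, (k - 1) * n:k * n] = -A / k

rhs = np.zeros((N + 1) * n)
rhs[0] = 1.0                              # e_i with i = 0, stacked over zeros

v = np.linalg.solve(M, rhs)               # blocks satisfy v_k = (1/k) A v_{k-1}
x = v.reshape(N + 1, n).sum(axis=0)       # v_0 + v_1 + ... + v_N

assert np.allclose(x, expm(A)[:, 0], atol=1e-8)
```

Block back-substitution on this system reproduces the term recursion exactly; the advantage of the system formulation is that a relaxation solver can work on all the blocks at once, touching only large entries.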
