Spectral analysis of Wikipedia and PhysRev networks

Klaus Frahm
Quantware MIPS Center, Université Paul Sabatier
Laboratoire de Physique Théorique, UMR 5152, IRSAMC, CNRS
Supported by EC FET Open project NADINE
FET NADINE Workshop, Directed Networks Days 2013, Milano, 13 June 2013

Google matrix for directed networks

Define the adjacency matrix $A$ by $A_{ij} = 1$ if there is a link from node $j$ to node $i$ in the network (of size $N$) and $A_{ij} = 0$ otherwise. Let $S_{ij} = A_{ij} / \sum_i A_{ij}$, and $S_{ij} = 1/N$ if $\sum_i A_{ij} = 0$ (dangling nodes). $S$ is of Perron-Frobenius type, but for many networks the eigenvalue $\lambda_1 = 1$ is highly degenerate, which causes a convergence problem in reaching the stationary limit of $p(t+1) = S\,p(t)$.

Therefore define the Google matrix:

$$G(\alpha) = \alpha S + (1 - \alpha)\,\frac{1}{N}\, e e^T$$

where $e = (1, \ldots, 1)^T$ and $\alpha = 0.85$ is a typical damping factor. Here there is a unique eigenvector for $\lambda_1 = 1$, called the PageRank $P$, and the convergence goes as $\alpha^t$. (The CheiRank $P^*$ is obtained by replacing $A \to A^* = A^T$.)
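
The construction above can be sketched in a few lines of plain Python. This is only an illustration on a hypothetical 4-node network, not the code used for the talk: the function names `google_matrix` and `pagerank`, the dense list-of-rows storage and the tolerance are all assumptions made for the example (real networks of this size need sparse storage).

```python
def google_matrix(links, n, alpha=0.85):
    """Dense G(alpha); links is a list of (j, i) pairs meaning a link j -> i."""
    A = [[0.0] * n for _ in range(n)]
    for j, i in links:
        A[i][j] = 1.0                      # A_ij = 1 for a link from j to i
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # dangling column (no outgoing links): S_ij = 1/N
            s_ij = A[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n
            G[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return G


def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration p(t+1) = G p(t), started from the uniform vector."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


# Hypothetical toy network: node 3 has no outgoing links (dangling node).
links = [(0, 1), (1, 2), (2, 0), (2, 1)]
G = google_matrix(links, 4)
P = pagerank(G)
```

Every column of `G` sums to 1, so the iteration preserves the normalization of `p`. For the CheiRank, the same functions can be called with every pair `(j, i)` reversed.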

Arnoldi method

The Arnoldi method (partly) diagonalizes large sparse non-symmetric $d \times d$ matrices:

• choose an initial normalized vector $\xi_0$ (random or “otherwise”)
• determine the Krylov space of dimension $n_A$ (typically $1 \ll n_A \ll d$) spanned by the vectors $\xi_0, G\xi_0, \ldots, G^{n_A-1}\xi_0$
• determine by Gram-Schmidt orthogonalization an orthonormal basis $\{\xi_0, \ldots, \xi_{n_A-1}\}$ and the representation of $G$ in this basis:

$$G\,\xi_k = \sum_{j=0}^{k+1} H_{jk}\,\xi_j$$

• diagonalize the Arnoldi matrix $H$, which has Hessenberg form:

$$H = \begin{pmatrix} * & * & \cdots & * & * \\ * & * & \cdots & * & * \\ 0 & * & \cdots & * & * \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & * & * \\ 0 & 0 & \cdots & 0 & * \end{pmatrix}$$

This provides the Ritz eigenvalues, which are very good approximations to the “largest” eigenvalues of $G$.
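
The steps listed above can be sketched with NumPy as follows. This is a minimal illustration, assuming a dense random column-stochastic test matrix in place of a real network; the function name `arnoldi`, the breakdown threshold and the test sizes are choices made for the example, and the orthogonalization is done in the modified Gram-Schmidt form.

```python
import numpy as np


def arnoldi(matvec, xi0, n_A):
    """Return the n_A x n_A Arnoldi (Hessenberg) matrix H for the map matvec."""
    d = xi0.shape[0]
    Q = np.zeros((d, n_A + 1))            # orthonormal Krylov basis xi_0, xi_1, ...
    H = np.zeros((n_A + 1, n_A))
    Q[:, 0] = xi0 / np.linalg.norm(xi0)
    for k in range(n_A):
        w = matvec(Q[:, k])
        for j in range(k + 1):            # modified Gram-Schmidt step
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)   # coupling element to the next vector
        if H[k + 1, k] < 1e-14:           # Krylov space is invariant: stop early
            return H[:k + 1, :k + 1]
        Q[:, k + 1] = w / H[k + 1, k]
    return H[:n_A, :n_A]


# Hypothetical test matrix: random column-stochastic matrix of size d = 200.
rng = np.random.default_rng(0)
M = rng.random((200, 200))
M = M / M.sum(axis=0)                     # each column sums to 1
H = arnoldi(lambda v: M @ v, rng.random(200), n_A=40)
ritz = np.linalg.eigvals(H)               # Ritz eigenvalues
```

For this test matrix the Ritz eigenvalue of largest modulus reproduces the Perron-Frobenius eigenvalue 1 to high accuracy, although $n_A = 40$ is much smaller than $d = 200$.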

Invariant subspaces

In realistic WWW or other networks, invariant subspaces of nodes create (possibly) large degeneracies of $\lambda_1$ (or of $\lambda_2$ if $\alpha < 1$), which is very problematic for the Arnoldi method.

Therefore one needs to determine the invariant subspaces, defined as subsets of nodes such that for any node in a subspace each outgoing link stays in the subspace. One can efficiently find all subspaces up to a maximal size (or dimension) $N_c$ (with $N_c = bN$ a certain fraction of the network size $N$, e.g. $b = 0.1$). All subspaces with common members are then merged, resulting in a decomposition of the network into many separate subspaces with $N_s$ nodes in total and a “big” core space of the remaining $N - N_s$ nodes.

Note that dangling nodes are by construction core space nodes.

Possible: core space node → subspace node
Impossible: subspace node → core space node
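
One way to carry out this decomposition is sketched below. It is an assumption about how the search could be organized, not a description of the author's implementation: from every node, follow outgoing links breadth-first and give up as soon as more than `max_size` nodes are reached or a dangling node is hit. The function name and the 8-node example network are hypothetical.

```python
from collections import deque


def find_invariant_subspaces(out_links, max_size):
    """out_links[j] lists the nodes that node j links to. Returns (subspaces, core)."""
    n = len(out_links)
    parent = list(range(n))               # union-find to merge overlapping subspaces

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def closure(start):
        """Nodes reachable from start, or None if start is a core space node."""
        seen = {start}
        queue = deque([start])
        while queue:
            j = queue.popleft()
            if not out_links[j]:          # dangling node reached: core space
                return None
            for i in out_links[j]:
                if i not in seen:
                    seen.add(i)
                    if len(seen) > max_size:
                        return None       # too large: treat as core space
                    queue.append(i)
        return seen

    in_subspace = [False] * n
    for start in range(n):
        reached = closure(start)
        if reached is not None:
            for node in reached:
                in_subspace[node] = True
                parent[find(node)] = find(start)

    groups = {}
    for node in range(n):
        if in_subspace[node]:
            groups.setdefault(find(node), []).append(node)
    core = [node for node in range(n) if not in_subspace[node]]
    return sorted(groups.values()), core


# Hypothetical network: {0, 1} and {2, 3} are closed, 7 feeds into {0, 1},
# node 6 is dangling, nodes 4 and 5 leak into everything.
out_links = [[1], [0], [3], [2], [0, 5], [4, 2, 6], [], [0]]
subspaces, core = find_invariant_subspaces(out_links, max_size=3)
```

Here the closures of nodes 0, 1 and 7 share members and are merged into one subspace, while the dangling node 6 ends up in the core space as stated on the slide.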

Invariant subspaces

The decomposition into subspaces and a core space implies a block structure of the matrix $S$:

$$S = \begin{pmatrix} S_{ss} & S_{sc} \\ 0 & S_{cc} \end{pmatrix}, \qquad S_{ss} = \begin{pmatrix} S_1 & 0 & \cdots \\ 0 & S_2 & \\ \vdots & & \ddots \end{pmatrix}$$

where $S_{ss}$ is block diagonal according to the subspaces. The subspace blocks of $S_{ss}$ are all matrices of PF type with at least one eigenvalue $\lambda_1 = 1$, which explains the high degeneracies.

To determine the spectrum of $S$, apply:

• Exact (or Arnoldi) diagonalization on each subspace.
• The Arnoldi method to $S_{cc}$ to determine the largest core space eigenvalues $\lambda_j$ (note: $|\lambda_j| < 1$).

The largest eigenvalues of $S_{cc}$ are no longer degenerate, but other degeneracies are possible (e.g. $\lambda_j = 0.9$ for Wikipedia).
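
Because of the zero block, the spectrum of $S$ is the union of the spectra of the subspace blocks and of $S_{cc}$. A small numerical check with NumPy, on a hypothetical 5-node matrix made up for this illustration (two subspaces and a two-node core space):

```python
import numpy as np

# Hypothetical blocks: subspace {0, 1} (a 2-cycle), subspace {2} (self-link),
# core nodes 3 and 4 that link partly into the subspaces.
S1 = np.array([[0.0, 1.0], [1.0, 0.0]])
S2 = np.array([[1.0]])
S_ss = np.block([[S1, np.zeros((2, 1))], [np.zeros((1, 2)), S2]])
S_sc = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5]])   # core -> subspace links
S_cc = np.array([[0.0, 0.5], [0.5, 0.0]])               # core -> core links
S = np.block([[S_ss, S_sc], [np.zeros((2, 3)), S_cc]])

# Spectrum of the full matrix versus the union of the block spectra
# (all eigenvalues are real in this example).
full = np.sort(np.linalg.eigvals(S).real)
blocks = np.sort(np.concatenate(
    [np.linalg.eigvals(S1), np.linalg.eigvals(S2), np.linalg.eigvals(S_cc)]
).real)
core_radius = np.max(np.abs(np.linalg.eigvals(S_cc)))
```

In this example each subspace block contributes one eigenvalue 1, giving a twofold degenerate $\lambda_1 = 1$, while the core space eigenvalues have modulus 0.5.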

Spectrum of Wikipedia

L. Ermann, KMF and D.L. Shepelyansky, Eur. Phys. J. B 86, 193 (2013)

Wikipedia 2009: $N = 3282257$ nodes, $N_\ell = 71012307$ network links.

[Figure: spectrum of $S$ ($N_s = 515$) and spectrum of $S^*$ ($N_s = 21198$); $n_A = 6000$ for both cases.]

Spectrum of Wikipedia

Some eigenvectors; left (right) panel: PageRank (CheiRank).

• black: PageRank (CheiRank) at $\alpha = 0.85$
• grey: PageRank (CheiRank) at $\alpha = 1 - 10^{-8}$
• red and green: first two core space eigenvectors
• blue and pink: two eigenvectors with large imaginary part in the eigenvalue

Spectrum of Wikipedia

Detailed study of 200 selected eigenvectors with eigenvalues “close” to the unit circle.

Spectrum of Wikipedia

Power law decay of eigenvectors: $|\psi_i(K_i)| \sim K_i^{\,b}$ for $K_i \ge 10^4$.

[Figure axis: $\varphi = \arg(\lambda_i)$.]

Spectrum of Wikipedia

Inverse participation ratio of eigenvectors:

$$\xi_{\mathrm{IPR}} = \Bigl(\sum_j |\psi_i(j)|^2\Bigr)^2 \Big/ \sum_j |\psi_i(j)|^4$$

[Figure axis: $\varphi = \arg(\lambda_i)$.]
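
The inverse participation ratio roughly counts the number of nodes on which an eigenvector is concentrated. A minimal sketch in plain Python; the function name and the test vectors are made up for the example:

```python
def inverse_participation_ratio(psi):
    """xi_IPR = (sum_j |psi(j)|^2)^2 / sum_j |psi(j)|^4 for a real or complex vector."""
    s2 = sum(abs(x) ** 2 for x in psi)
    s4 = sum(abs(x) ** 4 for x in psi)
    return s2 * s2 / s4


# A vector spread uniformly over 10 of 100 nodes has xi_IPR = 10,
# a vector localized on a single node has xi_IPR = 1.
ipr_spread = inverse_participation_ratio([1] * 10 + [0] * 90)
ipr_single = inverse_participation_ratio([0, 0, 3.0, 0])
```

The ratio does not depend on the normalization of the vector, so the eigenvectors need not be normalized first.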

Spectrum of Wikipedia

“Themes” of certain eigenvectors:

[Figure: eigenvalues labeled by the theme of the corresponding eigenvector, in three panels.]

• Panel with horizontal axis from −1 to 1: math (function, geometry, surface, logic-circuit), England, poetry, Iceland, aircraft, Kuwait, Bangladesh, football, biology, song, muscle-artery, New Zealand, DNA, Austria, Bible, Poland, music
• Panel with horizontal axis from −0.82 to −0.72: Australia, Canada, protein, Brazil, China, RNA, skin, war, rail, Texas-Dallas-Houston, Gaafu Alif Atoll
• Panel with horizontal axis from 0.8 to 0.98: Quantum Leap, Language, Switzerland, Australia, England, mathematics

Spectrum of Wikipedia

Number of links between or inside sets $A$ and $B$ defined by the index $K_i$, with nodes ordered by decreasing absolute value of the Wikipedia eigenstates:

$$A = \{1, \ldots, K_i\}, \qquad B = \{K_i + 1, \ldots, N\}$$
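
The counting described above can be sketched as follows, assuming the links are given as `(j, i)` pairs for a link from node `j` to node `i`. The function name, the toy eigenvector and the toy link list are hypothetical.

```python
def count_links(links, psi, K):
    """Count links inside/between A (the K nodes with largest |psi|) and B (the rest)."""
    order = sorted(range(len(psi)), key=lambda node: -abs(psi[node]))
    in_A = [False] * len(psi)
    for node in order[:K]:
        in_A[node] = True
    counts = {"A->A": 0, "A->B": 0, "B->A": 0, "B->B": 0}
    for j, i in links:
        key = ("A" if in_A[j] else "B") + "->" + ("A" if in_A[i] else "B")
        counts[key] += 1
    return counts


# Toy example: nodes 1 and 3 carry the largest amplitudes, so A = {1, 3} for K = 2.
psi = [0.1, 0.7, 0.05, 0.6]
links = [(0, 1), (1, 3), (3, 1), (2, 0), (1, 2)]
counts = count_links(links, psi, K=2)
```

Scanning `K` from 1 to $N$ gives the link counts as a function of the index $K_i$.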

Physical Review network

(work in progress: KMF, Young-Ho Eom, D. Shepelyansky)

$N = 463347$ nodes and $N_\ell = 4691015$ links.

Coarse-grained matrix structure ($500 \times 500$ cells):

[Figure: left panel, time ordered; right panel, ordered by journal and then by time.]

The “11” journals of Physical Review: (Phys. Rev. Series I), Phys. Rev., Phys. Rev. Lett., (Rev. Mod. Phys.), Phys. Rev. A, B, C, D, E, (Phys. Rev. STAB and Phys. Rev. STPER).

Physical Review network

⇒ nearly triangular structure of the adjacency matrix: most citation links $t \to t'$ have $t > t'$ (“past citations”), but there is a small number ($12126 = 2.6 \times 10^{-3} N_\ell$) of links $t \to t'$ with $t \le t'$, corresponding to future citations.

Spectrum computed by the “double-precision” Arnoldi method with $n_A = 8000$.

Numerical problem: eigenvalues with $|\lambda| < 0.3$ to $0.4$ are not reliable!

Reason: large Jordan subspaces associated with the eigenvalue $\lambda = 0$.

Physical Review network

“Very bad” Jordan perturbation theory: consider a “perturbed” Jordan block of size $D$:

$$\begin{pmatrix} 0 & 1 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ \varepsilon & 0 & \cdots & 0 & 0 \end{pmatrix}$$

Characteristic polynomial: $\lambda^D - (-1)^D \varepsilon$

$\varepsilon = 0 \Rightarrow \lambda = 0$
$\varepsilon \neq 0 \Rightarrow \lambda_j = -\varepsilon^{1/D} \exp(2\pi i j/D)$

For $D \approx 10^2$ and $\varepsilon = 10^{-16}$ ⇒ “Jordan cloud” of artificial eigenvalues due to rounding errors in the region $|\lambda| < 0.3$ to $0.4$.
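
The size of the Jordan cloud follows from the modulus $\varepsilon^{1/D}$ alone. The sketch below, in plain Python with hypothetical helper names, builds the perturbed block for a small $D$, checks that its $D$-th power is $\varepsilon$ times the identity (so that every eigenvalue has modulus $\varepsilon^{1/D}$), and then evaluates that radius for a few block sizes. The values of $D$ are chosen for illustration and are not taken from the Physical Review data.

```python
def perturbed_jordan_block(D, eps):
    """D x D matrix with ones on the superdiagonal and eps in the lower-left corner."""
    M = [[0.0] * D for _ in range(D)]
    for k in range(D - 1):
        M[k][k + 1] = 1.0
    M[D - 1][0] = eps
    return M


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(M, p):
    """M multiplied by itself p times (p >= 1), by repeated multiplication."""
    result = M
    for _ in range(p - 1):
        result = matmul(result, M)
    return result


def cloud_radius(D, eps):
    """Modulus eps**(1/D) shared by all eigenvalues of the perturbed block."""
    return eps ** (1.0 / D)


eps = 1e-16
power = matrix_power(perturbed_jordan_block(4, eps), 4)   # equals eps * identity
radius_100 = cloud_radius(100, eps)   # about 0.69
radius_40 = cloud_radius(40, eps)     # about 0.40
```

A perturbation at the level of double-precision rounding errors is therefore enough to move eigenvalues from 0 out to a circle of radius of order one, once the Jordan block has a few dozen dimensions.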

Triangular approximation

Remove the small number of links due to “future citations”. Semi-analytical diagonalization is then possible:

$$S = S_0 + e\,d^T/N$$

where $e_n = 1$ for all nodes $n$, $d_n = 1$ for dangling nodes $n$ and $d_n = 0$ otherwise. $S_0$ is the pure link matrix, which is nilpotent:

$$S_0^{\,l} = 0 \quad \text{with} \quad l = 352.$$

Let $\psi$ be an eigenvector of $S$ with eigenvalue $\lambda$ and $C = d^T\psi$.

• If $C = 0$ ⇒ $\psi$ is an eigenvector of $S_0$ ⇒ $\lambda = 0$ since $S_0$ is nilpotent. These eigenvectors belong to large Jordan blocks and are responsible for the numerical problems.

Note: the situation is similar in the network of integer numbers, where $l = [\log_2(N)]$ and the numerical instability occurs for $|\lambda| < 0.01$.

Triangular approximation

• If $C \neq 0$ ⇒ $\lambda \neq 0$ since the equation $S_0\psi = -C\,e/N$ does not have a solution ⇒ $\lambda 1 - S_0$ is invertible.

$$\Rightarrow \quad \psi = C\,(\lambda 1 - S_0)^{-1} e/N = \frac{C}{\lambda} \sum_{j=0}^{l-1} \Bigl(\frac{S_0}{\lambda}\Bigr)^{j} e/N .$$

From $\lambda^l = (d^T\psi/C)\,\lambda^l$ ⇒ $P_r(\lambda) = 0$ with the reduced polynomial of degree $l = 352$:

$$P_r(\lambda) = \lambda^l - \sum_{j=0}^{l-1} \lambda^{l-1-j} c_j = 0, \qquad c_j = d^T S_0^{\,j}\, e/N .$$

⇒ at most $l = 352$ eigenvalues $\lambda \neq 0$, which can be determined numerically as the zeros of $P_r(\lambda)$.

However, there are still numerical problems:

• $c_{l-1} \approx 3.6 \times 10^{-352}$
• alternating sign problem with a strong loss of significance
• strong sensitivity of the eigenvalues to the $c_j$
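
The coefficients $c_j$ can be computed by applying $S_0$ repeatedly to $e$ until the vector vanishes. The sketch below does this exactly with Python's `fractions` module on a hypothetical 5-paper citation network in which every paper cites only earlier ones. The function names and the network are assumptions made for the illustration, and the loop terminates only because the network has no cycles.

```python
from fractions import Fraction


def reduced_polynomial_coeffs(cites):
    """cites[j] lists the papers cited by paper j; returns [c_0, ..., c_{l-1}]."""
    n = len(cites)
    dangling = [j for j in range(n) if not cites[j]]   # d_j = 1 for these nodes
    v = [Fraction(1)] * n                              # v = S_0^j e, starting at j = 0
    coeffs = []
    while any(v):                                      # stops when S_0^l e = 0
        coeffs.append(sum((v[j] for j in dangling), Fraction(0)) / n)
        w = [Fraction(0)] * n
        for j, targets in enumerate(cites):
            for i in targets:                          # (S_0)_ij = 1 / outdegree(j)
                w[i] += v[j] / len(targets)
        v = w
    return coeffs


def reduced_polynomial(coeffs, lam):
    """P_r(lam) = lam^l - sum_j lam^(l-1-j) c_j."""
    l = len(coeffs)
    return lam ** l - sum(c * lam ** (l - 1 - j) for j, c in enumerate(coeffs))


# Hypothetical time-ordered network: paper 0 cites nothing (dangling node).
cites = [[], [0], [0, 1], [1], [2, 3]]
coeffs = reduced_polynomial_coeffs(cites)
value_at_one = reduced_polynomial(coeffs, Fraction(1))
```

For this network the loop stops after four steps, so $l = 4$. Since $S$ is column stochastic, $\lambda = 1$ is always an eigenvalue with $C \neq 0$, which means the $c_j$ sum to 1 and $P_r(1) = 0$.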

Triangular approximation

Solution: using the multi-precision library GMP with 256 binary digits, the zeros of $P_r(\lambda)$ can be determined with accuracy $\sim 10^{-18}$. Furthermore, the Arnoldi method can also be implemented with higher precision.

[Figure legend:]
• red crosses: zeros of $P_r(\lambda)$ from the 256 binary digits calculation
• blue squares: eigenvalues from the Arnoldi method with 52, 256, 512, 1280 binary digits. In the last case: ⇒ break off at $n_A = 352$ with vanishing coupling element.
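
Why double precision is insufficient here can be seen directly from the numbers quoted on the slides. The sketch below uses Python's standard `decimal` module purely as a stand-in for a multi-precision library such as GMP (it is not the library used in the talk), with about 77 decimal digits corresponding to 256 binary digits.

```python
from decimal import Decimal, localcontext

BINARY_DIGITS = 256
DECIMAL_DIGITS = int(BINARY_DIGITS * 0.30103)   # log10(2) ~ 0.30103, gives 77

# In double precision the smallest coefficient underflows to zero and a
# difference of order 1e-18 next to 1 is lost completely.
c_last_double = float("3.6e-352")
diff_double = (1.0 + 1e-18) - 1.0

# With 77 significant decimal digits both quantities are represented.
with localcontext() as ctx:
    ctx.prec = DECIMAL_DIGITS
    c_last_mp = Decimal("3.6e-352")
    diff_mp = (Decimal(1) + Decimal("1e-18")) - Decimal(1)
```

The first comparison reflects the tiny coefficient $c_{l-1}$, the second the accuracy of $\sim 10^{-18}$ quoted for the zeros of $P_r(\lambda)$, which is below the resolution of double precision near 1.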

Full Physical Review network

High precision Arnoldi method for the full Physical Review network (including the “future citations”) for 52, 256, 512, 768 binary digits and $n_A = 2000$.
