Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
Pin-Yu Chen (IBM Research AI)
Joint work with Lingfei Wu (IBM Research AI), Sijia Liu (IBM Research AI), and Indika Rajapakse (University of Michigan, Ann Arbor)
Poster: Tuesday 6:30-9:00 pm, Pacific Ballroom #265
P.-Y. Chen, ICML 2019, June 10, 2019
Graph as a Data Representation
Information-Theoretic Measures between Graphs
Structural reducibility of multilayer networks (unsupervised learning)
De Domenico et al., "Structural reducibility of multilayer networks." Nature Communications 6 (2015).
Von Neumann Graph Entropy (VNGE): Introduction
Quantum information theory: $\Phi$ is an $n \times n$ density matrix that is symmetric, positive semidefinite, and satisfies $\mathrm{trace}(\Phi) = 1$; $\{\lambda_i\}_{i=1}^n$ are the eigenvalues of $\Phi$.
Von Neumann entropy: $H = -\mathrm{trace}(\Phi \ln \Phi) = -\sum_{i:\, \lambda_i > 0} \lambda_i \ln \lambda_i$
→ Shannon entropy over the eigenspectrum $\{\lambda_i\}_{i=1}^n$, since $\sum_i \lambda_i = 1$
⇒ generally requires $O(n^3)$ computational complexity for $H$.
Graph $G = (V, E, W) \in \mathcal{G}$: undirected weighted graph with nonnegative edge weights; $G$ has $|V| = n$ nodes and $|E| = m$ edges.
$L = D - W$: combinatorial graph Laplacian of $G$; $D = \mathrm{diag}(\{d_i\})$: diagonal degree matrix; $[W]_{ij} = w_{ij}$: edge weight.
Von Neumann graph entropy (VNGE): $\Phi = L_N = c \cdot L$, where $c = \frac{1}{\mathrm{trace}(L)} = \frac{1}{\sum_{i \in V} d_i} = \frac{1}{2 \sum_{(i,j) \in E} w_{ij}}$.
$H \le \ln(n-1)$, with equality when $G$ is a complete graph with identical edge weights.
Braunstein, Samuel L., Sibasish Ghosh, and Simone Severini. "The Laplacian of a graph as a density matrix: a basic combinatorial approach to separability of mixed states." Annals of Combinatorics 10.3 (2006): 291-317.
Passerini, Filippo, and Simone Severini. "The von Neumann entropy of networks." (2008).
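As a concrete reference point, the definition above can be computed directly from the weight matrix — a minimal NumPy sketch of the exact $O(n^3)$ baseline (the function name is illustrative; this is not the FINGER approximation):

```python
import numpy as np

def exact_vnge(W):
    """Exact VNGE via full eigendecomposition: the O(n^3) baseline FINGER avoids.

    W: symmetric (n x n) nonnegative weight matrix with zero diagonal.
    """
    d = W.sum(axis=1)                      # degrees d_i
    L = np.diag(d) - W                     # combinatorial Laplacian L = D - W
    c = 1.0 / d.sum()                      # c = 1/trace(L) = 1/(2 * sum of edge weights)
    lam = np.linalg.eigvalsh(c * L)        # eigenvalues of the density matrix L_N = c*L
    lam = lam[lam > 1e-12]                 # keep only (numerically) positive eigenvalues
    return float(-(lam * np.log(lam)).sum())

W = np.ones((4, 4)) - np.eye(4)            # complete graph K_4 with unit weights
print(exact_vnge(W))                       # attains the maximum ln(n-1) = ln 3
```

For $K_4$, $L_N$ has eigenvalues $0$ and $\frac{1}{3}$ (multiplicity 3), so $H = \ln 3$, matching the upper bound on the slide.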
Von Neumann Graph Entropy (VNGE): Introduction
VNGE characterizes the structural complexity of a graph and enables computation of the Jensen-Shannon distance (JSdist) between graphs.
Applications in network learning, computer vision, and data science:
1. Structural reducibility of multilayer networks (hierarchical clustering)
De Domenico et al., "Structural reducibility of multilayer networks." Nature Communications 6 (2015).
2. Depth analysis for image processing
Han, Lin, et al. "Graph characterizations from von Neumann entropy." Pattern Recognition Letters 33.15 (2012): 1958-1967.
Bai, Lu, and Edwin R. Hancock. "Depth-based complexity traces of graphs." Pattern Recognition 47.3 (2014): 1172-1186.
3. Network-ensemble comparison via edge rewiring
Li, Zichao, Peter J. Mucha, and Dane Taylor. "Network-ensemble comparisons with stochastic rewiring and von Neumann entropy." SIAM Journal on Applied Mathematics 78.2 (2018): 897-920.
4. Structure-function analysis in genetic networks
Liu et al., "Dynamic network analysis of the 4D nucleome." bioRxiv, 268318 (2018).
High consistency with the classical Shannon graph entropy, which is defined via a probability distribution of a function on subgraphs of $G$.
Anand, Kartik, Ginestra Bianconi, and Simone Severini. "Shannon and von Neumann entropy of random networks with heterogeneous expected degree." Physical Review E 83.3 (2011): 036109.
Anand, Kartik, and Ginestra Bianconi. "Entropy measures for networks: Toward an information theory of complex topologies." Physical Review E 80.4 (2009): 045102.
Li, Angsheng, and Yicheng Pan. "Structural Information and Dynamical Complexity of Networks." IEEE Transactions on Information Theory 62.6 (2016): 3290-3339.
Outline
Main challenge of exact VNGE computation: it generally requires cubic complexity $O(n^3)$ to obtain the full eigenspectrum → NOT scalable to large graphs.
Our solution: FINGER, a scalable and provably asymptotically correct approximate computation framework for VNGE.
FINGER supports two data modes:
(a) Batch mode: $O(n + m)$
(b) Online mode: $O(\Delta n + \Delta m)$
New applications:
1. Anomaly detection in evolving Wikipedia hyperlink networks
2. Bifurcation detection of cellular networks during cell reprogramming
3. Synthesized denial-of-service attack detection in router networks
Efficient VNGE Computation via FINGER
Recall $H = -\sum_{i=1}^n \lambda_i \ln \lambda_i$ ⇒ $O(n^3)$ cubic complexity.
FINGER enables fast and incremental computation of $H$ with an asymptotic approximation guarantee.
Lemma (Quadratic approximation of $H$). The quadratic approximation of the von Neumann graph entropy $H$ via Taylor expansion is equivalent to
$Q = 1 - c^2 \left( \sum_{i \in V} d_i^2 + 2 \sum_{(i,j) \in E} w_{ij}^2 \right)$,
where $d_i$ is the degree (sum of incident edge weights) of node $i$, $w_{ij}$ is the weight of edge $(i,j)$, and $c = \frac{1}{2 \sum_{(i,j) \in E} w_{ij}}$.
Computing $Q$ has $O(n + m)$ linear complexity, with $|V| = n$ and $|E| = m$.
$Q$ can be incrementally updated given graph changes $\Delta G$ ⇒ $O(\Delta n + \Delta m)$ complexity.
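The lemma reduces the spectral computation to a single pass over nodes and edges. A minimal sketch of $Q$ on an edge list (the edge-list format and function name are assumptions for illustration):

```python
import numpy as np

def quadratic_q(n, edges):
    """Quadratic approximation Q = 1 - c^2 * (sum_i d_i^2 + 2 * sum_(i,j) w_ij^2).

    edges: iterable of (i, j, w_ij); one pass over nodes and edges, O(n + m),
    with no eigendecomposition.
    """
    d = np.zeros(n)                        # degrees d_i = sum of incident edge weights
    sum_w2 = 0.0                           # sum over edges of w_ij^2
    for i, j, w in edges:
        d[i] += w
        d[j] += w
        sum_w2 += w * w
    c = 1.0 / d.sum()                      # c = 1 / (2 * sum of edge weights)
    return 1.0 - c * c * ((d ** 2).sum() + 2.0 * sum_w2)

edges = [(i, j, 1.0) for i in range(4) for j in range(i + 1, 4)]   # K_4
print(quadratic_q(4, edges))               # 2/3, i.e. 1 - sum_i lambda_i^2 for K_4
```

Sanity check for $K_4$: $d_i = 3$, $c = \frac{1}{12}$, so $Q = 1 - \frac{1}{144}(36 + 12) = \frac{2}{3}$, which equals $1 - \sum_i \lambda_i^2$ for eigenvalues $\{0, \frac{1}{3}, \frac{1}{3}, \frac{1}{3}\}$.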
Approximate VNGE with Asymptotic Guarantees
Let $\lambda_{\max}$ ($\lambda_{\min}$) be the largest (smallest) positive eigenvalue in $\{\lambda_i\}$.
Approximate VNGE for batch graph sequences: $\widehat{H}(G) = -Q \ln \lambda_{\max}$
Approximate VNGE for online graph sequences: $\widetilde{H}(G) = -Q \ln(2c \cdot d_{\max})$
Relation: $\widetilde{H} \le \widehat{H} \le H$
Theorem ($o(\ln n)$ approximation error with balanced eigenspectrum). If the number of positive eigenvalues satisfies $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then the scaled approximation errors (SAE) $\frac{H - \widehat{H}}{\ln n} \to 0$ and $\frac{H - \widetilde{H}}{\ln n} \to 0$ as $n \to \infty$.
Here $f(n) = o(h(n))$ and $f(n) = \Omega(h(n))$ mean $\lim_{n \to \infty} \frac{f(n)}{h(n)} = 0$ and $\limsup_{n \to \infty} \left| \frac{f(n)}{h(n)} \right| > 0$, respectively.
Computing $\lambda_{\max}$ only requires $O(n + m)$ operations via power iteration ⇒ $O(n + m)$ linear complexity for $\widehat{H}$.
Theorem (Incremental update of $\widetilde{H}$ with $O(\Delta n + \Delta m)$ complexity). The VNGE $\widetilde{H}(G \oplus \Delta G)$ can be updated by $\widetilde{H}(G \oplus \Delta G) = F(\widetilde{H}(G), \Delta G)$.
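The batch-mode estimate combines $Q$ with power iteration for $\lambda_{\max}$. A minimal dense-matrix sketch (dense only for brevity; with a sparse Laplacian each power-iteration step is one matrix-vector product, i.e. $O(n + m)$; the function name is illustrative):

```python
import numpy as np

def approx_vnge_batch(W, iters=100):
    """Batch-mode sketch of H_hat = -Q * ln(lambda_max(L_N))."""
    d = W.sum(axis=1)
    c = 1.0 / d.sum()                          # c = 1 / trace(L)
    # (W**2).sum() equals 2 * sum over edges of w_ij^2 (each edge appears twice in W)
    Q = 1.0 - c * c * ((d ** 2).sum() + (W ** 2).sum())
    L = np.diag(d) - W
    x = np.random.default_rng(0).standard_normal(len(d))
    for _ in range(iters):                     # power iteration on the PSD matrix L
        x = L @ x
        x /= np.linalg.norm(x)
    lam_max = c * (x @ L @ x)                  # Rayleigh quotient, scaled to L_N = c*L
    return -Q * np.log(lam_max)

W = np.ones((4, 4)) - np.eye(4)                # complete graph K_4
print(approx_vnge_batch(W))                    # (2/3)*ln 3, a lower bound on H = ln 3
```

For $K_4$, $\lambda_{\max}(L_N) = \frac{1}{3}$ and $Q = \frac{2}{3}$, so $\widehat{H} = \frac{2}{3} \ln 3 \le H = \ln 3$, consistent with the relation above.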
Numerical Validation on Synthetic Random Graphs
[Figure: scaled approximation error (SAE) and computation time reduction ratio versus number of nodes (500 to 5000), for Erdős–Rényi graphs with average degree $d \in \{2, 5, 10, 20, 50, 100, 200\}$ and Watts–Strogatz graphs with rewiring probability $p_{WS} \in \{0, 0.1, 0.2, 0.4, 0.6, 0.8, 1\}$.]
scaled approximation error (SAE) $= \frac{H - H_{\mathrm{approx}}}{\ln n}$
computation time reduction ratio $= \frac{\mathrm{Time}_H - \mathrm{Time}_{H_{\mathrm{approx}}}}{\mathrm{Time}_H}$
Takeaways:
- almost 100% speed-up ($O(n^3)$ vs. $O(n + m)$)
- approximation error decreases as average degree increases
- regular (random) graphs have smaller (larger) approximation error
Jensen-Shannon Distance between Graphs using FINGER
Consider two graphs $G$ and $\widetilde{G}$ on the same node set $V$.
KL divergence: $D_{\mathrm{KL}}(G \,\|\, \widetilde{G}) = \mathrm{trace}\left( L_N(G) \cdot [\ln L_N(G) - \ln L_N(\widetilde{G})] \right)$ (not symmetric).
Let $\overline{G} = \frac{G \oplus \widetilde{G}}{2}$ denote the averaged graph of $G$ and $\widetilde{G}$, where $L_N(\overline{G}) = \frac{L_N(G) + L_N(\widetilde{G})}{2}$.
Jensen-Shannon divergence: $\mathrm{DIV}_{\mathrm{JS}}(G, \widetilde{G}) = \frac{1}{2} D_{\mathrm{KL}}(G \,\|\, \overline{G}) + \frac{1}{2} D_{\mathrm{KL}}(\widetilde{G} \,\|\, \overline{G}) = H(\overline{G}) - \frac{1}{2}[H(G) + H(\widetilde{G})]$ (symmetric).
Jensen-Shannon distance: $\mathrm{JSdist}(G, \widetilde{G}) = \sqrt{\mathrm{DIV}_{\mathrm{JS}}}$, which is proved to be a valid distance metric.
Briët, Jop, and Peter Harremoës. "Properties of classical and quantum Jensen-Shannon divergence." Physical Review A 79.5 (2009): 052311.
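The entropy-based form of the divergence above can be sketched directly with exact entropies (for clarity; the FINGER algorithms on the next slide replace the exact entropy with $\widehat{H}$ or $\widetilde{H}$; helper names are illustrative):

```python
import numpy as np

def density(W):
    """Density matrix L_N = c * L of a weighted graph, from its weight matrix."""
    d = W.sum(axis=1)
    return (np.diag(d) - W) / d.sum()

def entropy(P):
    """Von Neumann entropy of a density matrix, ignoring zero eigenvalues."""
    lam = np.linalg.eigvalsh(P)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

def js_distance(W1, W2):
    """JSdist = sqrt( H(G_bar) - [H(G) + H(G~)] / 2 ),
    where L_N(G_bar) = (L_N(G) + L_N(G~)) / 2."""
    P1, P2 = density(W1), density(W2)
    div = entropy((P1 + P2) / 2.0) - 0.5 * (entropy(P1) + entropy(P2))
    return float(np.sqrt(max(div, 0.0)))   # clip tiny negative rounding noise

K4 = np.ones((4, 4)) - np.eye(4)           # complete graph K_4
star = np.zeros((4, 4))                    # star graph with center node 0
star[0, 1:] = star[1:, 0] = 1.0
print(js_distance(K4, K4))                 # 0.0: identical graphs
print(js_distance(K4, star))               # positive: structurally different graphs
```

Note that the averaging happens on the density matrices $L_N$, not on the weight matrices; by concavity of the entropy, the divergence is nonnegative.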
FINGER Algorithms for Jensen-Shannon Distance
Jensen-Shannon distance via FINGER-$\widehat{H}$ (batch mode):
Input: two graphs $G$ and $\widetilde{G}$
Output: $\mathrm{JSdist}(G, \widetilde{G})$
1. Obtain $\overline{G} = \frac{G \oplus \widetilde{G}}{2}$ and compute $\widehat{H}(G)$, $\widehat{H}(\widetilde{G})$, and $\widehat{H}(\overline{G})$ via FINGER (Fast).
2. $\mathrm{JSdist}(G, \widetilde{G}) = \sqrt{\widehat{H}(\overline{G}) - \frac{1}{2}\left[ \widehat{H}(G) + \widehat{H}(\widetilde{G}) \right]}$
⇒ $O(n + m)$ complexity inherited from $\widehat{H}$.
Jensen-Shannon distance via FINGER-$\widetilde{H}$ (online mode):
Input: graph $G$ and its changes $\Delta G$; approximate VNGE $\widetilde{H}(G)$ of $G$
Output: $\mathrm{JSdist}(G, G \oplus \Delta G)$
1. Compute $\widetilde{H}(G \oplus \frac{\Delta G}{2})$ and $\widetilde{H}(G \oplus \Delta G)$ via FINGER (Inc.).
2. $\mathrm{JSdist}(G, G \oplus \Delta G) = \sqrt{\widetilde{H}(G \oplus \tfrac{\Delta G}{2}) - \frac{1}{2}\left[ \widetilde{H}(G) + \widetilde{H}(G \oplus \Delta G) \right]}$
⇒ $O(\Delta n + \Delta m)$ complexity inherited from $\widetilde{H}$.
$o(\sqrt{\ln n})$ approximation guarantee of JSdist via FINGER (see paper).
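The slides do not spell out the update function $F$ for the online mode. The sketch below illustrates one way to realize it: maintain the sufficient statistics of $Q$ (degrees, $\sum_i d_i^2$, $\sum_{(i,j)} w_{ij}^2$, $\mathrm{trace}(L)$) so each edge-weight change costs $O(1)$. The class and method names are assumptions, and this sketch handles weight additions only (a running maximum degree is valid only under additions):

```python
import numpy as np

class IncrementalVNGE:
    """Online-mode sketch: H_tilde = -Q * ln(2 * c * d_max), with the
    sufficient statistics of Q updated in O(1) per edge-weight addition."""

    def __init__(self, n):
        self.d = np.zeros(n)   # node degrees d_i
        self.d_max = 0.0       # running max degree (valid under additions only)
        self.sum_d2 = 0.0      # sum_i d_i^2
        self.sum_w2 = 0.0      # sum over edges of w_ij^2
        self.vol = 0.0         # trace(L) = sum_i d_i = 2 * sum of edge weights
        self.w = {}            # current edge weights, keyed by sorted node pair

    def add_edge(self, i, j, w):
        """Add weight w > 0 to edge (i, j), updating all statistics in O(1)."""
        key = (min(i, j), max(i, j))
        w_old = self.w.get(key, 0.0)
        self.w[key] = w_old + w
        self.sum_w2 += (w_old + w) ** 2 - w_old ** 2
        for v in (i, j):
            self.sum_d2 += (self.d[v] + w) ** 2 - self.d[v] ** 2
            self.d[v] += w
            self.d_max = max(self.d_max, self.d[v])
        self.vol += 2.0 * w

    def entropy(self):
        """Current online-mode estimate H_tilde = -Q * ln(2 * c * d_max)."""
        c = 1.0 / self.vol
        Q = 1.0 - c * c * (self.sum_d2 + 2.0 * self.sum_w2)
        return -Q * np.log(2.0 * c * self.d_max)

inc = IncrementalVNGE(4)
for i in range(4):
    for j in range(i + 1, 4):
        inc.add_edge(i, j, 1.0)   # build K_4 one edge at a time
print(inc.entropy())              # (2/3) * ln 2 for K_4
```

For $K_4$, $2c \cdot d_{\max} = \frac{1}{2} \ge \lambda_{\max} = \frac{1}{3}$, so $\widetilde{H} = \frac{2}{3} \ln 2 \le \widehat{H} = \frac{2}{3} \ln 3$, matching the relation $\widetilde{H} \le \widehat{H} \le H$.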