Mining Rich Graphs: Ranking, Classification, and Anomaly Detection


  1. Mining Rich Graphs: Ranking, Classification, and Anomaly Detection. Leman Akoglu, Feb 9th, 2018

  2. Networks are ubiquitous! [Figure: example networks (terrorist network, food web, Internet map, social network, protein network, web graph). Sources: Krebs 2002; 2007; Koren 2009; Newman 2005; Salthe 2004.]

  3. Graph problems: ranking, classification, clustering & anomaly mining, link prediction, role discovery, similarity search, influence, evolution, ... [Figure: the example networks from the previous slide.]

  4. Ranking in networks. Src: wiki/PageRank

  5. Classification in networks. Src: [Adamic+ 2005]

  6. Community detection in networks. Src: [McAuley & Leskovec 2012]

  7. Rich networks

  8. Rich networks are also ubiquitous! Example: Read the Web.

  9. Graph problems on rich networks: ranking, classification, clustering & anomaly mining, link prediction, role discovery, similarity search, influence, evolution, ... Example: Read the Web.

  10. Graph problems on rich networks: ranking, classification, clustering & anomaly mining, link prediction, role discovery, similarity search, influence, evolution, ... Example: Read the Web.

  11. Ranking in rich networks: example. Medical referral network (weighted, directed).

  12. Ranking in rich networks: example. Medical referral network + physician expertise.

  13. Ranking in rich networks: example. Medical referral network + physician expertise + location (e.g., Town A vs. Town B).

  14. Ranking in rich networks. Ranking problem: which are the top-k nodes of a certain type? E.g., who are the best cardiologists in the network, in my town, etc.? [Ranking in Heterogeneous Networks with Geo-Location Information. Abhinav Mishra & Leman Akoglu. SIAM SDM 2017.]

  15. Modeling the ranking problem. Goal: ranking in directed heterogeneous information networks (HIN) with geo-location. The HINside model: (1) relation strength, (2) relation distance, (3) neighbor authority, (4) authority transfer rates, (5) competition. Closed-form solution; parameter estimation.

  16. HINside model: relation strength and distance. Edge weights: W(i, j) = log(w(i, j) + 1). Pair-wise distances: D(i, j) = log(d(l_i, l_j) + 1). For the relation distance, the two are combined element-wise into the relation matrix M = W ⊙ D (3.1).
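
A minimal numpy sketch of the W, D, and M matrices as stated on this slide (Eq. 3.1). The function name relation_matrix, the input layout, and the plain Euclidean distance are illustrative assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def relation_matrix(edge_weights: np.ndarray, locations: np.ndarray) -> np.ndarray:
    """edge_weights: (n x n) raw weights w(i, j); locations: (n x 2) coordinates l_i."""
    W = np.log(edge_weights + 1.0)                      # W(i, j) = log(w(i, j) + 1)
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                # d(l_i, l_j)
    D = np.log(dist + 1.0)                              # D(i, j) = log(d(l_i, l_j) + 1)
    M = W * D                                           # element-wise combination (Eq. 3.1)
    return M
```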

  17. HINside model: in-neighbor authority. r_i = Σ_{j ∈ V} M(j, i) r_j (3.2), where r_i is the authority score of node i. Authority transfer rates (ATR): r_i = Σ_{j ∈ V} Γ(t_j, t_i) M(j, i) r_j (3.3), where t_i is the type of node i.
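
A minimal sketch of computing authority scores under the type-aware propagation of Eq. 3.3 by power iteration. The helper name authority_scores and the normalization step are assumptions for illustration.

```python
import numpy as np

def authority_scores(M: np.ndarray, types: np.ndarray, Gamma: np.ndarray,
                     iters: int = 100, tol: float = 1e-9) -> np.ndarray:
    """M: (n x n) relation matrix; types: (n,) integer node types; Gamma: (m x m) ATR."""
    n = M.shape[0]
    G = Gamma[types[:, None], types[None, :]]   # G[j, i] = Gamma(t_j, t_i)
    P = G * M                                   # element-wise: type-modulated relation strength
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = P.T @ r                         # r_i = sum_j Gamma(t_j, t_i) M(j, i) r_j
        r_new /= np.linalg.norm(r_new) + 1e-12  # normalize (assumption)
        if np.linalg.norm(r_new - r) < tol:
            break
        r = r_new
    return r
```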

  18. HINside model: competition. N(u, v) = g(d(l_u, l_v)) for u, v ∈ V, u ≠ v, and N(u, v) = 0 for u = v, where g is monotonically decreasing, e.g. g(z) = e^{−z}. The authority scores become r_i = Σ_{j ∈ V} Γ(t_j, t_i) M(j, i) (r_j + Σ_{v: t_v = t_i} N(v, j) r_v) (3.4), where the inner sum ranges over the other nodes of type t_i in the vicinity of node j.
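
A minimal sketch of the competition matrix N from this slide, using g(z) = e^{-z}. The helper name competition_matrix and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def competition_matrix(locations: np.ndarray) -> np.ndarray:
    """locations: (n x 2) coordinates l_u; returns the (n x n) matrix N."""
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)     # d(l_u, l_v)
    N = np.exp(-dist)                        # g(z) = e^{-z}, monotonically decreasing in distance
    np.fill_diagonal(N, 0.0)                 # N(u, u) = 0
    return N
```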

  19. Closed form. The authority score vector r can be written in closed form (and computed by power iteration) as r = [L' + (L' N') ⊙ E] r = H r, where L = M ⊙ (T Γ T'). Here T is the (n x m) type-indicator matrix with T(i, c) = 1 if t_i = T(c), Γ is the (m x m) matrix of authority transfer rates (ATR), and E(u, v) = 1 if t_u = t_v and 0 otherwise, i.e., E = T T' in matrix form. (n: #nodes, m: #types.)
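
A minimal sketch assembling H = L' + (L' N') ⊙ E as on this slide and computing r by power iteration. It assumes the M and N matrices from the earlier hypothetical sketches; the normalization step is an added assumption.

```python
import numpy as np

def hinside_scores(M: np.ndarray, N: np.ndarray, types: np.ndarray,
                   Gamma: np.ndarray, iters: int = 100) -> np.ndarray:
    """M, N: (n x n); types: (n,) integer node types; Gamma: (m x m) ATR."""
    n, m = len(types), Gamma.shape[0]
    T = np.zeros((n, m))
    T[np.arange(n), types] = 1.0               # T(i, c) = 1 if t_i = c
    L = M * (T @ Gamma @ T.T)                  # L = M (Hadamard) (T Gamma T')
    E = T @ T.T                                # E(u, v) = 1 iff t_u = t_v
    H = L.T + (L.T @ N.T) * E                  # H = L' + (L' N') (Hadamard) E
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = H @ r                              # power iteration on r = H r
        r /= np.linalg.norm(r) + 1e-12         # normalize (assumption)
    return r
```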

  20. Modeling the ranking problem. Goal: ranking in directed heterogeneous information networks (HIN) with geo-location. The HINside model: (1) relation strength, (2) relation distance, (3) neighbor authority, (4) authority transfer rates, (5) competition. Closed-form solution; parameter estimation.

  21. Parameter estimation. HINside's parameters consist of the m² authority transfer rates (ATR) in Γ. Starting from r_i = Σ_{j ∈ V} Γ(t_j, t_i) M(j, i) (r_j + Σ_{v: t_v = t_i} N(v, j) r_v) (3.4), r_i can be written as a vector-vector product: r_i = Σ_t Γ(t, t_i) Σ_{j: t_j = t} M(j, i) (r_j + Σ_{v: t_v = t_i} N(v, j) r_v) = Σ_t Γ(t, t_i) X(t, i) = Γ'(t_i, :) · X(:, i) = Γ'_{t_i} · x_i (4.8). That is, r_i = f(x_i) = ⟨w, x_i⟩ is a linear function of a feature vector x_i, a representation to be used for learning-to-rank.
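
A minimal sketch of the feature matrix X implied by Eq. 4.8, so that r_i = ⟨Γ'(t_i, :), X(:, i)⟩. The helper name feature_matrix and the loop-based aggregation are illustrative, not the paper's implementation.

```python
import numpy as np

def feature_matrix(M: np.ndarray, N: np.ndarray, types: np.ndarray,
                   r: np.ndarray, m: int) -> np.ndarray:
    """Returns X of shape (m x n) with X(t, i) = sum_{j: t_j = t} M(j, i) (r_j + sum_{v: t_v = t_i} N(v, j) r_v)."""
    n = len(types)
    X = np.zeros((m, n))
    for i in range(n):
        same_type = (types == types[i])              # nodes v with t_v = t_i
        boost = N[same_type, :].T @ r[same_type]     # boost[j] = sum_v N(v, j) r_v
        contrib = M[:, i] * (r + boost)              # contribution of each in-neighbor j
        for t in range(m):
            X[t, i] = contrib[types == t].sum()      # aggregate by neighbor type
    return X
```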

  22. An alternating optimization scheme to estimate Γ and r. Given: graph G and (partial) lists ranking a subset of nodes of a certain type. Output: ATR matrix Γ and authority scores r. Randomly initialize Γ^0, set k = 0; compute authority scores r using Γ^0. Repeat: compute feature vectors X^k using r; learn new parameters Γ^{k+1} from X^k by learning-to-rank; compute authority scores r using Γ^{k+1}. Until convergence.
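
A minimal sketch of the alternating loop on this slide. hinside_scores, feature_matrix, and learn_gamma_by_ranksvm are the hypothetical helpers sketched elsewhere in this transcript (the last one after slide 24), and checking convergence on Γ is an assumption.

```python
import numpy as np

def estimate_parameters(M, N, types, m, ranked_lists, iters=20, tol=1e-4):
    rng = np.random.default_rng(0)
    Gamma = rng.random((m, m))                          # randomly initialize Gamma^0
    r = hinside_scores(M, N, types, Gamma)              # authority scores under Gamma^0
    for _ in range(iters):
        X = feature_matrix(M, N, types, r, m)           # feature vectors X^k from current r
        Gamma_new = learn_gamma_by_ranksvm(X, ranked_lists)   # learning-to-rank step
        r = hinside_scores(M, N, types, Gamma_new)      # recompute r under Gamma^{k+1}
        if np.abs(Gamma_new - Gamma).max() < tol:       # until convergence (assumption)
            break
        Gamma = Gamma_new
    return Gamma, r
```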

  23. An alternating optimization scheme to estimate Γ and r. Given: graph G and (partial) lists ranking a subset of nodes of a certain type. Output: ATR matrix Γ and authority scores r. Randomly initialize Γ^0, set k = 0; compute authority scores r using Γ^0. Repeat: compute feature vectors X^k using r; learn new parameters Γ^{k+1} from X^k by learning-to-rank; compute authority scores r using Γ^{k+1}. Until convergence.

  24. RankSVM formulation (alternatively, a cross-entropy based objective optimized by gradient descent). Given partial ranked lists: create all pairs (u, v) of same-type nodes; add training instance ((x_u, x_v), +1) if u is ranked ahead of v, and ((x_u, x_v), −1) otherwise. As a result, the training data is D = {((x^1_d, x^2_d), y_d)}_{d=1}^{|D|}, where the feature vectors in each pair belong to nodes of the same type. For each type t, solve: min_{Γ_t} (1/2) ||Γ_t||² + λ Σ_{d ∈ D_t} ε_d, subject to Γ_t' (x^1_d − x^2_d) y_d ≥ 1 − ε_d and ε_d ≥ 0 for all d ∈ D_t (pairs with t_{x^1_d} = t_{x^2_d} = t), and Γ_t(c) ≥ 0 for all c = 1, ..., m.
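
One possible sketch of this per-type learning-to-rank step (the hypothetical learn_gamma_by_ranksvm used above): train a linear SVM on pairwise feature differences with scikit-learn's LinearSVC. Dropping the explicit nonnegativity constraint and clipping the learned weights at zero afterwards is a simplification of the QP on this slide, not the paper's solver.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def learn_gamma_by_ranksvm(X, ranked_lists, C=1.0):
    """X: (m x n) feature matrix; ranked_lists: list of (node_type, [node ids, best first])."""
    m = X.shape[0]
    Gamma = np.zeros((m, m))
    for t in range(m):
        diffs, labels = [], []
        for list_type, nodes in ranked_lists:
            if list_type != t:
                continue
            for u, v in combinations(nodes, 2):          # u ranked ahead of v
                d = X[:, u] - X[:, v]
                diffs.extend([d, -d])
                labels.extend([1, -1])
        if not diffs:
            continue
        svm = LinearSVC(C=C, fit_intercept=False).fit(np.array(diffs), np.array(labels))
        Gamma[:, t] = np.clip(svm.coef_.ravel(), 0.0, None)   # approximate Gamma_t(c) >= 0
    return Gamma
```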

  25. Graph problems on rich networks: ranking, classification, clustering & anomaly mining, link prediction, role discovery, similarity search, influence, evolution, ... Example: Read the Web.

  26. Attributed graphs. Attributed graph: each node has one or more properties. [Figure: example nodes labeled with attributes such as teenager, adult, skater, scientist, telemarketer, doctor, data.]

  27. Communities in rich networks. Attributed graph: each node has one or more properties.

  28. Anomalous subgraphs. Given a set of attributed subgraphs* (e.g. Google+ circles), find poorly-defined ones. (*social circles, communities, egonetworks, ...)

  29. Communities in attributed networks. Given an attributed subgraph*, how to quantify its quality? (*social circles, communities, egonetworks, ...)

  30. Communities in attributed networks. Given a subgraph, how to quantify its quality?

  31. Communities in attributed networks. Given a subgraph, how to quantify its quality? Structure-only: internal measures, e.g. average degree.

  32. Communities in attributed networks. Given a subgraph, how to quantify its quality? Structure-only: internal-only (average degree), boundary-only (cut edges), internal + boundary (conductance).
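
A minimal networkx sketch of the three structure-only measures on this slide, evaluated for a candidate node set C; the function name structure_quality is illustrative.

```python
import networkx as nx

def structure_quality(G: nx.Graph, C):
    C = set(C)
    sub = G.subgraph(C)
    avg_degree = 2.0 * sub.number_of_edges() / max(len(C), 1)  # internal-only: average degree
    cut = nx.cut_size(G, C)                                    # boundary-only: number of cut edges
    cond = nx.conductance(G, C)                                # internal + boundary: conductance
    return avg_degree, cut, cond

# Example: score one half of Zachary's karate club as a candidate community.
print(structure_quality(nx.karate_club_graph(), range(17)))
```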

  33. Communities in attributed networks. Given an attributed subgraph, how to quantify its quality? Structure-only: internal-only (average degree), boundary-only (cut edges), internal + boundary (conductance). Structure + attributes? [Scalable Anomaly Ranking of Attributed Neighborhoods. Bryan Perozzi and Leman Akoglu. SIAM SDM 2016.]

  34. What’s an Anomaly, Anyhow? v Given an attributed subgraph how to quantify quality? high low 34

  35. Normality (intuition). Given an attributed subgraph, how to quantify its quality? Internal: structural density. [Figure: high vs. low examples.]

  36. Normality (intuition). Given an attributed subgraph, how to quantify its quality? Internal: structural density AND attribute coherence, i.e., the neighborhood's "focus" (e.g., chess, biking). [Figure: high vs. low examples.]

  37. Normality (intuition). Given an attributed subgraph, how to quantify its quality? Internal: structural density AND attribute coherence (the neighborhood's "focus"). Boundary: structural sparsity OR external separation, i.e., "exoneration". [Figure: high vs. low examples.]

  38. Normality (intuition). Motivation: real-world graphs have no good cuts [Leskovec+ '08], and social circles overlap [McAuley+ '14]. "Exoneration" of boundary edges: (a) by the null model, since edges to hubs are expected and not surprising (hub effect); (b) by attributes, since the external neighbor has a different "focus" (neighborhood overlap).

  39. The measure of Normality: N = I + E = Σ_{i ∈ C, j ∈ C} (A_ij − k_i k_j / (2m)) · s(x_i, x_j | w) − Σ_{i ∈ C, b ∈ B, (i,b) ∈ E} (1 − min(1, k_i k_b / (2m))) · s(x_i, x_b | w) (3.4)

  40. The measure of Normality, annotated: in N = I + E, the internal term compares the observed edges A_ij against the null-model expectation k_i k_j / (2m) and weighs each pair by the attribute similarity s(x_i, x_j | w), where w is the "focus" weight vector capturing internal consistency (e.g., a shared focus on chess or biking); the external term discounts boundary edges that the null model makes expected.
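
A minimal numpy sketch of the normality score N from Eq. 3.4 on slide 39, for a node set C with boundary B (nodes outside C with an edge into C). The weighted dot-product similarity used for s(·, ·|w) and the input layout are illustrative assumptions.

```python
import numpy as np

def normality(A: np.ndarray, X: np.ndarray, C, w: np.ndarray) -> float:
    """A: (n x n) adjacency; X: (n x d) node attributes; C: member node ids; w: (d,) focus weights."""
    C = list(C)
    k = A.sum(axis=1)                            # node degrees k_i
    two_m = A.sum()                              # 2m (sum of all degrees)
    def s(i, j):                                 # attribute similarity s(x_i, x_j | w), an assumption
        return float(np.dot(w * X[i], X[j]))
    in_C = np.zeros(A.shape[0], dtype=bool)
    in_C[C] = True
    internal = sum((A[i, j] - k[i] * k[j] / two_m) * s(i, j) for i in C for j in C)
    boundary = [b for b in np.where(~in_C)[0] if A[C, b].sum() > 0]   # B: outside nodes touching C
    external = -sum((1.0 - min(1.0, k[i] * k[b] / two_m)) * s(i, b)
                    for i in C for b in boundary if A[i, b] > 0)
    return internal + external
```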
