Community Detection & Modularity
The search for clustered and overlapping nodes
“In the end, more than they wanted freedom, they wanted security. They wanted a comfortable life and they lost it all: security, comfort and freedom.... When the Athenians finally wanted not to give to society but for society to give to them, when the freedom they wished for most was freedom from responsibility, then Athens ceased to be free.” -- Edward Gibbon
Community Detection
› Community detection is the process of seeking out community structures within a network.
But what is a community?
› Community structure is the occurrence of groups of nodes in a network that are more densely connected internally than with the rest of the network.
A community is essentially a subgraph selected from within a network.
While this makes sense as a simplification, a community need not be a complete graph, and it may overlap with other communities.
Barabási's Hypotheses
› A network's community structure is uniquely encoded in its wiring diagram.
› A community corresponds to a connected subgraph (connectedness).
› Communities correspond to locally dense neighborhoods of a network (density).
› Randomly wired networks are not expected to have a community structure.
As networks become larger and more complex, it is harder to detect defined communities.
Thus we must employ algorithms to detect them where mere inference fails.
Basic Partitioning
› In the minimum-cut method, the network is divided into a predetermined number of parts, usually of approximately the same size, chosen such that the number of edges between groups is minimized.
› The Kernighan-Lin algorithm attempts to find an optimal series of interchange operations between elements of A and B that maximizes the reduction in the total weight of the edges crossing between A and B.
Kernighan-Lin Algorithm
To create partitions A and B, let $I_a$ be the internal cost of a, that is, the sum of the costs of edges between a and other nodes in A, and let $E_a$ be the external cost of a, that is, the sum of the costs of edges between a and nodes in B. Furthermore, let
$$D_a = E_a - I_a$$
be the difference between the external and internal costs of a. If a and b are interchanged, then the reduction in cost is
$$T_{old} - T_{new} = D_a + D_b - 2c_{a,b}$$
where $c_{a,b}$ is the cost of the possible edge between a and b.
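A small worked example may help make the D values concrete. The sketch below assumes an undirected graph stored as a dictionary of edge costs; the helper names `d_values`, `cut_cost`, and `swap_gain` are hypothetical and chosen only for illustration. It checks that the gain D[a] + D[b] - 2*c(a, b) equals the actual drop in cut cost when a and b are interchanged.

```python
def d_values(weights, part_a, part_b):
    """Return D[v] = external cost - internal cost for every node."""
    side = {v: "A" for v in part_a}
    side.update({v: "B" for v in part_b})
    d = {v: 0 for v in side}
    for (u, v), w in weights.items():
        if side[u] == side[v]:
            # Internal edge: counts against both endpoints.
            d[u] -= w
            d[v] -= w
        else:
            # External edge: counts in favor of both endpoints.
            d[u] += w
            d[v] += w
    return d


def cut_cost(weights, part_a):
    """Total cost of edges with exactly one endpoint in part_a."""
    a = set(part_a)
    return sum(w for (u, v), w in weights.items() if (u in a) != (v in a))


def swap_gain(weights, d, a, b):
    """Reduction in cut cost if a (in A) and b (in B) are interchanged."""
    c_ab = weights.get((a, b), weights.get((b, a), 0))
    return d[a] + d[b] - 2 * c_ab


# Illustrative data (an assumption, not from the slides):
# two heavy pairs {1, 2} and {3, 4} joined by a light edge 2-3.
weights = {(1, 2): 3, (3, 4): 3, (2, 3): 1}
A, B = {1, 3}, {2, 4}
d = d_values(weights, A, B)
gain = swap_gain(weights, d, 3, 2)
before = cut_cost(weights, A)
after = cut_cost(weights, {1, 2})  # A after interchanging 3 and 2
```

With this data the predicted gain and the measured difference `before - after` agree, which is the property the algorithm relies on when ranking candidate swaps.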
Kernighan-Lin Algorithm
function Kernighan-Lin(G(V, E)):
    determine a balanced initial partition of the nodes into sets A and B
    do
        compute D values for all a in A and b in B
        let gv, av, and bv be empty lists
        for n := 1 to |V| / 2
            find a from A and b from B, such that g = D[a] + D[b] - 2*c(a, b) is maximal
            remove a and b from further consideration in this pass
            add g to gv, a to av, and b to bv
            update D values for the elements of A = A \ a and B = B \ b
        end for
        find k which maximizes g_max, the sum of gv[1], ..., gv[k]
        if g_max > 0 then
            exchange av[1], av[2], ..., av[k] with bv[1], bv[2], ..., bv[k]
    until g_max <= 0
    return G(V, E)
Kernighan-Lin Algorithm
› Partition a network into two groups of predefined size. This partition is called a cut.
› Inspect each pair of nodes, one from each group. Identify the pair that results in the largest reduction of the cut size (links between the two groups) if we swap them.
› Swap them.
› If no pair reduces the cut size, we swap the pair that increases the cut size the least.
› The process is repeated until each node is moved once.
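The procedure above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes an undirected graph given as a dictionary from node pairs to edge costs, searches all free pairs by brute force, and uses the hypothetical function name `kernighan_lin`. The D-value update after each tentative swap uses the standard rule D'[u] = D[u] + 2c(u, x) - 2c(u, y) for nodes u on the same side as the swapped node x.

```python
def kernighan_lin(weights, part_a, part_b):
    """Refine a balanced bipartition. `weights` maps (u, v) pairs to edge costs."""
    a, b = set(part_a), set(part_b)

    def cost(u, v):
        # Undirected lookup; a missing edge costs 0.
        return weights.get((u, v), weights.get((v, u), 0))

    while True:
        # D[v] = external cost - internal cost under the current partition.
        d = {}
        for v in a | b:
            own, other = (a, b) if v in a else (b, a)
            d[v] = (sum(cost(v, u) for u in other)
                    - sum(cost(v, u) for u in own if u != v))

        free_a, free_b = set(a), set(b)
        gains, swaps = [], []
        for _ in range(min(len(a), len(b))):
            # Pick the free pair with the largest gain, even if it is negative.
            pairs = ((d[x] + d[y] - 2 * cost(x, y), x, y)
                     for x in free_a for y in free_b)
            g, x, y = max(pairs, key=lambda t: t[0])
            free_a.remove(x)
            free_b.remove(y)
            gains.append(g)
            swaps.append((x, y))
            # Update D as if x and y had already been exchanged.
            for u in free_a:
                d[u] += 2 * cost(u, x) - 2 * cost(u, y)
            for u in free_b:
                d[u] += 2 * cost(u, y) - 2 * cost(u, x)

        # Find the prefix of tentative swaps with the largest total gain.
        best_k, g_max, running = 0, 0, 0
        for k, g in enumerate(gains, start=1):
            running += g
            if running > g_max:
                best_k, g_max = k, running
        if g_max <= 0:
            return a, b
        for x, y in swaps[:best_k]:
            a.remove(x)
            a.add(y)
            b.remove(y)
            b.add(x)


# Illustrative data (an assumption, not from the slides).
weights = {(1, 2): 3, (3, 4): 3, (2, 3): 1}
side_a, side_b = kernighan_lin(weights, {1, 3}, {2, 4})
```

Starting from the poor split {1, 3} / {2, 4}, the sketch separates the two heavy pairs so that only the light edge 2-3 is cut. Which group ends up labeled A or B depends on how ties are broken.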
Kernighan-Lin Algorithm: Bipartitioning market data
Kernighan-Lin Algorithm: Mixed Market Highs
Kernighan-Lin Algorithm: Stock Market Indexes
Kernighan-Lin Algorithm: Cryptocurrency Partition
Hierarchical Clustering
Divisive Clustering
Divisive algorithms split communities by removing links that connect nodes with low similarity.
› Girvan-Newman algorithm
Agglomerative Clustering
Agglomerative algorithms merge nodes and communities with high similarity.
› Clauset-Newman-Moore algorithm
› Louvain algorithm
Girvan-Newman Algorithm
› The Girvan–Newman algorithm focuses on edges that are most likely "between" communities.
› Vertex betweenness is an indicator of highly central nodes in networks.
› The Girvan–Newman algorithm extends this definition to the case of edges, defining the "edge betweenness" of an edge as the number of shortest paths between pairs of nodes that run along it.
Girvan-Newman Algorithm
1. The betweenness of all existing edges in the network is calculated first.
2. The edge with the highest betweenness is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
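The four steps can be sketched in pure Python as below. This is an illustrative sketch under stated assumptions: the graph is unweighted, undirected, has no self-loops, and is given as a dictionary of neighbor sets; the function names are hypothetical. For simplicity it recomputes the betweenness of every edge after each removal instead of only the affected ones, and when several shortest paths tie, each path contributes a fractional share.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path count per edge, using Brandes-style accumulation."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every node pair was visited from both endpoints, so halve the totals.
    return {edge: total / 2 for edge, total in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        parts.append(comp)
    return parts


def girvan_newman(adj):
    """Yield the partition each time removing an edge splits a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    n_parts = len(components(adj))
    while any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > n_parts:
            n_parts = len(parts)
            yield parts


# Illustrative data (an assumption): two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
first_split = next(girvan_newman(graph))
```

In this example the bridge 2-3 carries every shortest path between the two triangles, so it is removed first and the first partition yielded is the two triangles.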
Girvan-Newman Algorithm: As applied to stock indexes
Girvan-Newman Algorithm: Partitions stock indexes
Girvan-Newman Algorithm: High Currency Values
Girvan-Newman Algorithm: Cryptocurrency partitioned by volume
Divisive Clustering: Cryptocurrency and Foreign Exchange
Agglomerative Clustering
› Modularity is a scalar value between -1/2 and 1 that measures the density of edges inside communities relative to edges outside communities.
› For a weighted graph, modularity is defined as:
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where
● $A_{ij}$ represents the edge weight between nodes $i$ and $j$;
● $k_i$ and $k_j$ are the sum of the weights of the edges attached to nodes $i$ and $j$, respectively;
● $m$ is the sum of all of the edge weights in the graph;
● $c_i$ and $c_j$ are the communities of the nodes; and
● $\delta$ is a simple delta function, equal to 1 when $c_i = c_j$ and 0 otherwise.
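As a rough illustration of the modularity definition Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), the sketch below evaluates it community by community, which is algebraically the same sum. It assumes an undirected graph without self-loops, stored as a dictionary of edge weights, plus a dictionary assigning each node a community label; the function name `modularity` is hypothetical.

```python
def modularity(weights, community):
    """Modularity Q of a node-to-community assignment on a weighted graph."""
    m = sum(weights.values())  # total edge weight in the graph
    strength = {}              # k_i: weight of edges attached to node i
    inside = {}                # weight of edges with both ends in a community
    for (u, v), w in weights.items():
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + w
    total = {}                 # sum of k_i over the nodes of each community
    for v, k in strength.items():
        c = community[v]
        total[c] = total.get(c, 0) + k
    # Per community: (internal weight / m) - (total strength / 2m)^2.
    return sum(inside.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


# Illustrative data (an assumption): two unit-weight triangles joined by one edge.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1}
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}
q_split = modularity(edges, by_triangle)
q_single = modularity(edges, one_group)
```

Splitting along the two triangles gives Q = 5/14, while placing every node in a single community gives Q = 0, matching the intuition that modularity rewards dense groups joined by few links.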
Clauset-Newman-Moore Algorithm: Partitioning Currencies by Optimizing Modularity
Louvain Algorithm
› First, each node in the network is assigned to its own community.
› Then, for each node i, the change in modularity is calculated for removing i from its own community and moving it into the community of each neighbor j of i.
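The two steps above describe the start of the local-moving phase. A minimal sketch of that phase follows. It assumes the standard continuation, which the slide does not state: each node moves to the neighboring community with the largest positive modularity gain, and sweeps repeat until no node moves. The later aggregation phase is not shown. The graph format (a dictionary of undirected edge weights without self-loops) and the name `louvain_local_moving` are assumptions made for illustration.

```python
def louvain_local_moving(weights):
    """Run only the first (local-moving) phase; return node -> community label."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    m = sum(weights.values())
    k = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    community = {v: v for v in adj}  # each node starts in its own community
    total = dict(k)                  # total[c]: sum of k over members of c

    improved = True
    while improved:
        improved = False
        for i in adj:
            old = community[i]
            total[old] -= k[i]       # take i out of its community
            # Weight linking i to each neighboring community.
            links = {}
            for j, w in adj[i].items():
                links[community[j]] = links.get(community[j], 0) + w
            # Modularity gained by inserting the isolated node i into c.
            gains = {c: links.get(c, 0) / m - total[c] * k[i] / (2 * m * m)
                     for c in set(links) | {old}}
            best = max(gains, key=gains.get)
            if gains[best] <= gains[old] + 1e-12:
                best = old           # move only on a strict improvement
            total[best] += k[i]
            if best != old:
                community[i] = best
                improved = True
    return community


# Illustrative data (an assumption): two unit-weight triangles joined by one edge.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1}
labels = louvain_local_moving(edges)
```

On this small graph the sweep settles with each triangle in its own community. The labels themselves are arbitrary node identifiers and carry no meaning beyond grouping.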