Quick detection of nodes with large degrees
Nelly Litvak
University of Twente, Stochastic Operations Research group
NADINE meeting, 14-06-2013
Finding top-k largest degree nodes
with Konstantin Avrachenkov, Marina Sokol, Don Towsley

What if we would like to find the top-k largest-degree nodes in a network? Some applications:
◮ Routing via large-degree nodes
◮ Proxy for various centrality measures
◮ Node clustering and classification
◮ Epidemic processes on networks
Top-k largest degree nodes

If the adjacency list of the network is known, the top-k list of nodes can be found by HeapSort with complexity O(n + k log n), where n is the total number of nodes. Even this modest complexity can be quite demanding for large networks.
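As an illustration, a minimal sketch of this heap-based selection in Python (the adjacency-list dictionary and node labels are invented for the example):

```python
import heapq

def top_k_degrees(adjacency, k):
    """Return the k nodes with largest degree via a heap:
    O(n) to build the heap, then O(k log n) for the k extractions."""
    # Max-heap via negated degrees.
    heap = [(-len(neighbors), node) for node, neighbors in adjacency.items()]
    heapq.heapify(heap)                                 # O(n)
    return [heapq.heappop(heap)[1] for _ in range(k)]   # O(k log n)

# Toy undirected graph as an adjacency list
adjacency = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3], 5: [1]}
print(top_k_degrees(adjacency, 2))   # -> [1, 3], both of degree 3
```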
Random walk approach

Let us now try a random walk on the network. We actually recommend the random walk with jumps with the following transition probabilities:

    p_ij = (α/n + 1)/(d_i + α),  if i has a link to j,
           (α/n)/(d_i + α),      if i does not have a link to j,     (1)

where d_i is the degree of node i and α is a parameter.

The introduced random walk is time-reversible, and its stationary distribution is given by a simple formula:

    π_i(α) = (d_i + α)/(2|E| + nα),  for all i ∈ V.     (2)
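The transition probabilities (1) can equivalently be simulated in two stages: from node i, with probability d_i/(d_i + α) follow a uniformly chosen edge, and with probability α/(d_i + α) jump to a node chosen uniformly from all n nodes (each neighbor then indeed gets probability 1/(d_i + α) + (α/n)/(d_i + α)). A sketch on a toy star graph (graph and parameter values invented for the example):

```python
import random
from collections import Counter

def random_walk_with_jumps(neighbors, alpha, steps, seed=0):
    """Two-stage simulation of the transition probabilities (1)."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    visits = Counter()
    i = rng.choice(nodes)
    for _ in range(steps):
        d_i = len(neighbors[i])
        if rng.random() < d_i / (d_i + alpha):
            i = rng.choice(neighbors[i])   # follow a uniformly chosen edge
        else:
            i = rng.choice(nodes)          # artificial jump, uniform over all nodes
        visits[i] += 1
    return visits

# Star graph: hub 0 with four leaves; alpha = average degree 2|E|/n = 8/5
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
visits = random_walk_with_jumps(star, alpha=1.6, steps=2000)
# By (2), pi_0 = (4 + 1.6)/(8 + 5*1.6) = 0.35: the hub is visited most often
```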
Random walk approach

Example: If we run a random walk on the web graph of the UK domain (about 18,500,000 nodes), the random walk spends on average only about 5,800 steps to detect the largest degree node. Three orders of magnitude faster than HeapSort!
Random walk approach

We propose the following algorithm for detecting the top-k list of largest degree nodes:

1. Set k, α and m.
2. Execute a random walk step according to (1). If it is the first step, start from the uniform distribution.
3. Check if the current node has a larger degree than one of the nodes in the current top-k candidate list. If so, insert the new node into the top-k candidate list and remove the worst node from the list.
4. If the number of random walk steps is less than m, return to Step 2. Stop otherwise.
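A sketch of Steps 1–4 in Python; the walk-or-jump step implements (1), and the candidate list is kept as a min-heap of (degree, node) pairs so the worst candidate is cheap to evict (graph and parameters invented for the example):

```python
import heapq
import random

def top_k_via_walk(neighbors, k, alpha, m, seed=0):
    """Random-walk top-k sketch: m walk steps, candidate list of size k."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    candidates = []                          # min-heap of (degree, node)
    i = rng.choice(nodes)                    # first step: uniform start
    for _ in range(m):
        entry = (len(neighbors[i]), i)
        if entry not in candidates:          # Step 3: update the candidate list
            if len(candidates) < k:
                heapq.heappush(candidates, entry)
            elif entry > candidates[0]:
                heapq.heapreplace(candidates, entry)   # evict the worst candidate
        d_i = len(neighbors[i])              # Step 2: walk or jump per (1)
        if rng.random() < d_i / (d_i + alpha):
            i = rng.choice(neighbors[i])
        else:
            i = rng.choice(nodes)
    return sorted(candidates, reverse=True)  # Step 4: m steps reached, stop

star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(top_k_via_walk(star, k=1, alpha=1.6, m=200))   # -> [(4, 0)]: the hub
```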
How to choose α

W_t – state of the random walk at time t = 0, 1, ...

    P_π[W_t = i | jump] = 1/n,
    P_π[W_t = i | no jump] = d_i/(2|E|) = π_i(0).

If α is too small, the random walk can get 'lost' in the network; if α is too large, jumps are too frequent and yield no useful information. We therefore maximize the long-run fraction of independent samples from π(0):

    (1 − P_π[jump]) / (P_π[jump])^{−1} = P_π[jump](1 − P_π[jump]) → max.

Since P_π[jump] = nα/(2|E| + nα), setting this equal to 1/2 gives α = 2|E|/n, the average degree.
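A quick numerical check of this optimization (toy sizes invented for the example): scanning α confirms that P_π[jump](1 − P_π[jump]) peaks exactly at the average degree.

```python
def jump_prob(alpha, n, num_edges):
    """Stationary probability of an artificial jump: n*alpha/(2|E| + n*alpha)."""
    return n * alpha / (2 * num_edges + n * alpha)

n, num_edges = 1000, 5000   # toy graph: average degree 2|E|/n = 10
best = max((a / 10 for a in range(1, 301)),   # scan alpha over 0.1, 0.2, ..., 30.0
           key=lambda a: jump_prob(a, n, num_edges) * (1 - jump_prob(a, n, num_edges)))
print(best)   # -> 10.0, i.e. alpha = 2|E|/n, where P[jump] = 1/2
```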
Stopping rules

◮ Objective: on average at least b̄ of the top-k nodes are identified correctly.
◮ Let us compute the expected number of top-k elements observed in the candidate list up to trial m:

    H_j = 1, if node j has been observed at least once,
          0, if node j has not been observed.

Assuming we sample in an i.i.d. fashion from the distribution (2), and letting X_j denote the number of times node j is sampled in m trials, we can write

    E[ Σ_{j=1}^k H_j ] = Σ_{j=1}^k E[H_j] = Σ_{j=1}^k P[X_j ≥ 1]
                       = Σ_{j=1}^k (1 − P[X_j = 0]) = Σ_{j=1}^k (1 − (1 − π_j)^m).     (3)
Stopping rules (cont.)

Figure: Average number of correctly detected elements in top-10 for UK; (a) α = 0.001, (b) α = 28.6.
Stopping rules (cont.)

Here we can use the Poisson approximation

    E[ Σ_{j=1}^k H_j ] ≈ Σ_{j=1}^k (1 − e^{−m π_j})

and propose a stopping rule. Denote

    b_m = Σ_{i=1}^k (1 − e^{−X_{j_i}}),

where j_1, ..., j_k are the nodes in the current candidate list and X_{j_i} is the number of times node j_i has been observed up to step m.

Stopping rule: Stop at m = m_0, where m_0 = arg min { m : b_m ≥ b̄ }.
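A sketch of this stopping rule on a stream of sampled (node, degree) pairs; here the candidate list j_1, ..., j_k consists of the k largest-degree nodes seen so far, and a hit counter plays the role of X (the stream and threshold are invented for the example):

```python
import math

def walk_with_stopping_rule(sample_stream, k, b_bar):
    """Stop at the first m with b_m = sum_i (1 - exp(-X_{j_i})) >= b_bar."""
    hits, degree = {}, {}
    for m, (node, d) in enumerate(sample_stream, start=1):
        hits[node] = hits.get(node, 0) + 1
        degree[node] = d
        top_k = sorted(degree, key=degree.get, reverse=True)[:k]  # candidates j_1..j_k
        b_m = sum(1 - math.exp(-hits[j]) for j in top_k)
        if b_m >= b_bar:
            return m, top_k
    return None, None

# Deterministic toy stream alternating between two nodes of degree 10 and 5
stream = [(1, 10), (2, 5)] * 5
m0, top = walk_with_stopping_rule(stream, k=2, b_bar=1.6)
print(m0, top)   # -> 4 [1, 2]: stops once both candidates have been seen twice
```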
Example

◮ UK domain, about 18,500,000 nodes
◮ The random walk spends on average only about 5,800 steps to detect the largest degree node
◮ With b̄ = 7 we obtain on average 9.22 correct elements out of the top-10 list, for an average of 65,802 random walk steps on the UK network.
Directed networks: Twitter
with Konstantin Avrachenkov and Liudmila Ostroumova

◮ Huge network (more than 500M users)
◮ Network accessed only through the Twitter API
◮ The rate of requests is limited
◮ One request returns either:
  ◮ the IDs of at most 5000 followers of a node, or
  ◮ the number of followers of a node
Random walk?

A random walk quickly arrives at a large node, and then cannot sample uniformly from that node's followers/followees because there are many more than 5000 of them.
Algorithm for finding the top-k most followed on Twitter

1. Choose n_1 nodes at random.
2. Retrieve the IDs of at most 5000 users followed by each of the n_1 nodes.
3. Let S_j be the number of followers of node j discovered among the n_1 nodes.
4. Check the number of followers for the n_2 users with the largest values of S_j.
5. Return the identified top-k most followed users.

In total, there are n = n_1 + n_2 requests to the API.
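A sketch of Steps 1–5; `followees_of` and `follower_count` are hypothetical stand-ins for the two kinds of API requests, and the toy network below is invented for the example:

```python
import random
from collections import Counter

def top_k_two_stage(followees_of, follower_count, node_ids, k, n1, n2, seed=0):
    """Two-stage sketch: n1 'followees' requests, then n2 'follower count' checks."""
    rng = random.Random(seed)
    S = Counter()
    for u in rng.sample(node_ids, n1):               # Steps 1-3: sample and count
        for j in followees_of(u):                    # at most 5000 IDs per request
            S[j] += 1
    candidates = [j for j, _ in S.most_common(n2)]   # Step 4: best n2 candidates
    exact = {j: follower_count(j) for j in candidates}
    return sorted(exact, key=exact.get, reverse=True)[:k]   # Step 5

# Toy network of 50 users: everyone (from user 2 on) follows node 0,
# and every second user also follows node 1
users = list(range(50))
followees = lambda u: ([0, 1] if u % 2 == 0 else [0]) if u >= 2 else []
counts = lambda j: {0: 48, 1: 24}[j]
print(top_k_two_stage(followees, counts, users, k=2, n1=20, n2=5))   # -> [0, 1]
```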
Performance prediction

◮ Heuristic: Let 1, 2, ..., k be the top-k nodes.
◮ Approximate the probability that node j is discovered by P(S_j > max{S_{n_2}, 1}). Then the fraction of correctly identified nodes is

    (1/k) Σ_{j=1}^k P(S_j > max{S_{n_2}, 1}),

and the S_j have approximately a Poisson(n_1 d_j / N) distribution, where N is the number of users.
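A crude numerical illustration of the Poisson heuristic (degree values and sample sizes invented). This sketch uses only the necessary condition S_j ≥ 1, i.e. P(S_j ≥ 1) = 1 − e^{−n_1 d_j / N}, and ignores the competition term max{S_{n_2}, 1}, so it overestimates the discovery probability when n_2 is small:

```python
import math

def discovery_fraction_upper(top_degrees, n1, N):
    """Upper-bound sketch: with S_j ~ Poisson(n1*d_j/N), node j can only be
    discovered if S_j >= 1, so average P(S_j >= 1) = 1 - exp(-n1*d_j/N)."""
    return sum(1 - math.exp(-n1 * d / N) for d in top_degrees) / len(top_degrees)

# Twitter-like scale: N = 5e8 users, top follower counts around tens of millions
frac = discovery_fraction_upper([40e6, 30e6, 20e6], n1=50, N=5e8)
print(round(frac, 3))   # even 50 random users already reveal most of the top-3
```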
Extreme value theory

Theorem (Extreme value theory). Let D_1, D_2, ..., D_n be i.i.d. with 1 − F(x) = P(D > x) = C x^{−α+1}. Then

    lim_{n→∞} P( (max{D_1, D_2, ..., D_n} − b_n)/a_n ≤ x ) = exp(−(1 + δx)^{−1/δ}),

with δ = 1/(α − 1), a_n = δ C^δ n^δ, b_n = C^δ n^δ.

(Therefore, the maximum is 'of the order' n^{1/(α−1)}.)
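A small simulation of this order-of-magnitude claim (pure-Pareto tail with C = 1 and α = 3 invented for the example): with 1 − F(x) = x^{−(α−1)}, maxima grow like n^{1/(α−1)} = √n, so quadrupling n should roughly double the typical maximum.

```python
import random
from statistics import median

def pareto_max(n, tail_exponent, rng):
    """Max of n i.i.d. samples with P(D > x) = x^(-tail_exponent), x >= 1,
    via inverse-transform sampling: D = (1 - U)^(-1/tail_exponent)."""
    return max((1.0 - rng.random()) ** (-1.0 / tail_exponent) for _ in range(n))

rng = random.Random(1)
# Medians over 500 replications (medians are robust to the heavy tail)
m1 = median(pareto_max(1000, 2.0, rng) for _ in range(500))
m2 = median(pareto_max(4000, 2.0, rng) for _ in range(500))
print(m2 / m1)   # close to sqrt(4000/1000) = 2
```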
Prediction based on identified top-m, m < k

◮ We do not know d_1, d_2, ..., d_n, but we can predict their values using the quantile estimation from Extreme Value Theory (Dekkers et al., 1989):

    d̂_j = d_m ( m/(j − 1) )^{γ̂},   j > 1,  j ≪ N,

where

    γ̂ = (1/(m − 1)) Σ_{i=1}^{m−1} ( log(d_i) − log(d_m) ).

◮ If m is small enough, then we can be almost sure that we discovered the top-m correctly.
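A sketch of this extrapolation (the exact-power-law input is invented): given the m largest degrees d_1 ≥ ... ≥ d_m, estimate γ̂ as above and extrapolate d̂_j for ranks j = m+1, ..., k.

```python
import math

def predict_degrees(top_m_degrees, k):
    """Quantile-estimation sketch: gamma_hat from the known top-m degrees,
    then d_hat_j = d_m * (m/(j-1))^gamma_hat for j = m+1, ..., k."""
    m = len(top_m_degrees)
    d_m = top_m_degrees[-1]
    gamma_hat = sum(math.log(d / d_m) for d in top_m_degrees[:-1]) / (m - 1)
    return [d_m * (m / (j - 1)) ** gamma_hat for j in range(m + 1, k + 1)]

# Degrees following an exact power law d_j = 1000 * j^(-1/2)
top5 = [1000 / math.sqrt(j) for j in range(1, 6)]
preds = predict_degrees(top5, k=8)   # predictions for ranks 6, 7, 8
```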
Caveats in the prediction based on top-m, m < k

◮ We do not know the top-m degrees either. However, we can find them with high precision.
◮ The consistency of the estimator d̂_j is proved for j < m, but we use it for j > m. Can we prove consistency, and if not, can we encounter some pathological behaviour?