
Efficient Computation of Change-Graph Scores
David Eppstein
(includes joint work with Emma Spiro, Mike Goodrich, Darren Strash, Lowell Trott, and Maarten Löffler)

Context: analysis of social networks
Represent interactions among people and their environments as graphs (often: vertices = people, edges = pairwise interactions)
Goals:
• Predict human behavior
• Detect anomalous behavior
• Handle varied types of graph data and scale well to large networks
Efficient computation of change-graph scores, D. Eppstein, UC Irvine, 2009

Mathematical modeling of social networks
Develop mathematical models with a small number of meaningful numerical parameters that generate graphs resembling real social networks
Why?
– Fitting the parameters to real data tells us how real social nets behave
– The parts of the real networks that do not match the model may be anomalous
– We can use the model to generate test data for other analysis algorithms
[Image: Not a pipe, but a model of a pipe. René Magritte, The Treachery of Images, 1928–9]

Exponential random graph model: graphs shaped by their local structures
Define local features that may be present in a graph:
• Presence of an edge
• Degree of a vertex
• Small subgraphs
Assign weights to features: positive = more likely, negative = less likely
Log-likelihood of G = sum of weights of features + normalizing constant
Different feature sets and weights give different models capable of fitting different types of social network
[Image: public-domain image by Mohylek on Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Magnifying.jpg]
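The weighted-feature sum on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the feature set (edges, two-edge paths, triangles), the function names `feature_counts` and `unnormalized_log_likelihood`, and the example weights are all hypothetical, and the normalizing constant is deliberately left out because it is not computable from a single graph.

```python
from itertools import combinations


def feature_counts(n, edges):
    """Count edges, two-edge paths and triangles in a simple undirected
    graph on vertices 0..n-1 (brute force, for illustration only)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # A two-edge path is a pair of distinct neighbors of a middle vertex.
    two_paths = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return {"edges": len(edges), "two_paths": two_paths, "triangles": triangles}


def unnormalized_log_likelihood(counts, weights):
    """Sum of weight * count over the weighted features.
    The normalizing constant of the model is NOT included."""
    return sum(weights[name] * counts[name] for name in weights)


# Hypothetical weights: edges are discouraged, triangles are encouraged.
counts = feature_counts(3, [(0, 1), (1, 2), (0, 2)])
score = unnormalized_log_likelihood(counts, {"edges": -1.0, "triangles": 2.0})
```

Recomputing the counts from scratch like this costs far more than a single edge change should, which is the motivation for maintaining them incrementally.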

Probabilistic reasoning in exponential random graphs
Most basic problem: pull the handle, generate a random graph from the model
With a generation subroutine, we can also:
• Find the normalizing constant
• Fit weights to data
• Understand typical behavior of graphs in this model (e.g. how many edges?)
• Detect unusual structures in real-world graphs
[Image: crop of CC-BY-SA licensed image “Slot Machine” by Jeff Kubina on Flickr, http://www.flickr.com/photos/95118988@N00/347687569]

Standard method for random generation: Markov Chain Monte Carlo (random walk)
• Start with any graph
• Repeatedly choose a random edge to add or remove
• Calculate the change to the log-likelihood
• Choose whether to perform the update (positive change score: always perform; negative change score: sometimes reject)
After enough steps, the graph is random with the correct probability distribution
[Image: “The Mambo”, public artwork by Jack Mackie and Chuck Greening, Seattle, 1979. Modified from GFDL-licensed photo by Joe Mabel on Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Seattle_B%27way_Mambo_02.jpg]
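A minimal sketch of this random walk follows, assuming a model with only two weighted features (edges and triangles) and a standard Metropolis acceptance rule, where a negative change score delta is accepted with probability e^delta. The names `change_score`, `mcmc_sample`, `w_edge` and `w_triangle` are hypothetical, and picking a uniformly random vertex pair to toggle is one possible reading of "choose a random edge to add or remove".

```python
import math
import random


def change_score(adj, u, v, w_edge, w_triangle):
    """Change in the weighted feature sum if the pair (u, v) is toggled."""
    common = len(adj[u] & adj[v])          # triangles that use edge (u, v)
    delta = w_edge + w_triangle * common   # effect of ADDING the edge
    return -delta if v in adj[u] else delta


def mcmc_sample(n, steps, w_edge, w_triangle, seed=0):
    """Run `steps` Metropolis proposals starting from the empty graph."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for _ in range(steps):
        u, v = rng.sample(range(n), 2)     # two distinct vertices
        delta = change_score(adj, u, v, w_edge, w_triangle)
        # Non-negative change: always accept. Negative: accept with prob e^delta.
        if delta >= 0 or rng.random() < math.exp(delta):
            if v in adj[u]:
                adj[u].discard(v)
                adj[v].discard(u)
            else:
                adj[u].add(v)
                adj[v].add(u)
    return adj


graph = mcmc_sample(n=10, steps=1000, w_edge=-1.0, w_triangle=0.5)
```

Here the change score is found by intersecting two neighbor sets, which is simple but takes time proportional to the vertex degrees on every step.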

The key algorithmic subproblem:
Add and remove edges in a dynamic graph
At each step, update feature counts (how many of each type of small subgraph it has)
Because this is in the inner loop, it must be very fast
[Image: a telephone switchboard, an early example of a dynamic graph. Photo by Joseph A. Carr, 1975, available online under a free license at http://commons.wikimedia.org/wiki/File:JT_Switchboard_770x540.jpg]

MURI-funded work on this problem:
• The h-index of a graph and its application to dynamic subgraph statistics (with E. S. Spiro). Presented at WADS, Banff, Canada, 2009. Lecture Notes in Comp. Sci. 5664, 2009, pp. 278-289. Shortlisted for best paper award. Undirected graphs, feature = subgraph with ≤ 3 vertices
• Extended dynamic subgraph statistics using h-index parameterized data structures (with M. T. Goodrich, D. Strash, and L. Trott), in preparation. Directed graphs, larger numbers of vertices per feature. See poster session
• New research still under development (with M. T. Goodrich, M. Löffler). Geometric graphs and geometric features

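Interdependence among 3-vertex feature counts

\[
\begin{pmatrix}
1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3 \\
0 & 0 & 1 & 3 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} n_0 \\ n_1 \\ n_2 \\ n_3 \end{pmatrix}
=
\begin{pmatrix}
n(n-1)(n-2)/6 \\
m(n-2) \\
\sum_v \deg(v)\,(\deg(v)-1)/2 \\
\text{number of triangles}
\end{pmatrix}
\]

Here n_i denotes the number of vertex triples that induce exactly i edges, n the number of vertices, and m the number of edges.
So if we can maintain the number of triangles in a dynamic graph, we can easily compute all other counts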
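The linear system on this slide is triangular, so it can be solved by back-substitution once the triangle count is known. The sketch below assumes the unknowns are the numbers of vertex triples inducing 0, 1, 2 and 3 edges, and that the third right-hand side is summed over all vertices; the function name `three_vertex_counts` and its parameters are hypothetical.

```python
from math import comb


def three_vertex_counts(n, m, degrees, triangles):
    """Return (n0, n1, n2, n3): the number of vertex triples inducing
    exactly 0, 1, 2 and 3 edges, given the vertex count n, the edge
    count m, the degree sequence and the number of triangles."""
    two_paths = sum(comb(d, 2) for d in degrees)   # sum of deg(v) choose 2
    n3 = triangles
    n2 = two_paths - 3 * n3            # each triangle contains 3 two-edge paths
    n1 = m * (n - 2) - 2 * n2 - 3 * n3  # each edge pairs with n - 2 third vertices
    n0 = comb(n, 3) - n1 - n2 - n3      # all triples sum to n choose 3
    return n0, n1, n2, n3


# A triangle on vertices 0, 1, 2 plus an isolated vertex 3.
counts = three_vertex_counts(n=4, m=3, degrees=[2, 2, 2, 0], triangles=1)
```

Since n, m and the degree sum are trivial to maintain under edge changes, only the triangle count needs a nontrivial data structure.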

Degree-based partitioning of a graph
Select a number D
Partition vertices into two subsets:
• L: many vertices with degree less than D
• H: few vertices with degree greater than D
[Image: boys choosing sides for hockey on Sarnia Bay, Ontario, December 29, 1908. Public domain image from Library and Archives Canada / John Boyd Collection / PA-060732, http://www.collectionscanada.gc.ca/hockey/024002-2300-e.html]

What we store: number of paths through low-degree vertices
Maintain a hash table C indexed by pairs (u, v) of vertices
C[u, v] = number of two-edge paths u—L—v
[Image: Hollerith 1890 census tabulator, from http://www.columbia.edu/acis/history/census-tabulator.html]

When edge (u, v) is added or removed:
• The number of triangles with the third vertex in L is stored in C[u, v] (look it up there)
• The number of triangles with a third vertex w in H can be counted by examining all possibilities for w (loop over all vertices in H and test whether each one forms a triangle)
• If u belongs to L, add one to (or, for a removal, subtract one from) C[v, w] for each other neighbor w of u, since each such w gains or loses the path v—u—w (perform a symmetric update if v belongs to L)
• (Very infrequently) update the partition into low and high degree
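The update procedure can be sketched as a small class. This is an illustration under simplifying assumptions, not the authors' implementation: the high-degree set H is fixed at construction time and never repartitioned, vertices are assumed to be comparable values such as integers, and the names `TriangleCounter`, `paths`, `add_edge` and `remove_edge` are hypothetical. The counts stay correct for any choice of H; only the running time depends on it.

```python
from collections import defaultdict


class TriangleCounter:
    """Maintain the triangle count of a dynamic simple undirected graph."""

    def __init__(self, vertices, high):
        self.adj = {v: set() for v in vertices}
        self.high = set(high)          # H; every other vertex is in L
        self.paths = defaultdict(int)  # C[u, v]: two-edge paths u-L-v
        self.triangles = 0

    @staticmethod
    def _key(a, b):
        return (a, b) if a <= b else (b, a)

    def _triangles_on(self, u, v):
        """Triangles that contain (or would contain) the edge (u, v)."""
        through_low = self.paths[self._key(u, v)]
        through_high = sum(
            1 for w in self.high if w in self.adj[u] and w in self.adj[v]
        )
        return through_low + through_high

    def _update_paths(self, u, v, sign):
        """Adjust C for two-edge paths whose middle vertex is u or v."""
        for mid, end in ((u, v), (v, u)):
            if mid not in self.high:
                for w in self.adj[mid]:
                    if w != end:
                        self.paths[self._key(end, w)] += sign

    def add_edge(self, u, v):
        if v in self.adj[u]:
            return 0                   # edge already present
        delta = self._triangles_on(u, v)
        self._update_paths(u, v, +1)
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles += delta
        return delta

    def remove_edge(self, u, v):
        if v not in self.adj[u]:
            return 0                   # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        delta = self._triangles_on(u, v)
        self._update_paths(u, v, -1)
        self.triangles -= delta
        return -delta


tc = TriangleCounter(range(4), high=[0])
for e in [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]:
    tc.add_edge(*e)
```

The value returned by `add_edge` or `remove_edge` is the change in the triangle count, which is the quantity a change score for a triangle feature would be built from.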

How much time does it take per change?
• Finding triangles involving the changed edge takes O(|H|)
• Each edge is involved in O(D) x—L—x paths, so updating the hash table after a change takes O(D)
• If the L/H partition ever changes, update counts for all x—L—x paths through the moved vertex, taking time O(D^2)
How to choose D so |H| + D is small and the partition changes infrequently?
[Image: modified from CC-BY licensed photo by smaedli on Flickr, http://www.flickr.com/photos/smaedli/3271558744/]

A detour into bibliometrics
How to measure the productivity of an academic researcher?
• Total publication count: encourages many low-impact papers
• Total citation count: unduly influenced by a few high-impact pubs
• h-index [J. E. Hirsch, PNAS 2005]: maximum number h such that h papers each have ≥ h citations
[Images: CC-BY-SA-licensed image by Jhodson from Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Bookspile.jpg; public-domain image by Ael 2 from Wikimedia Commons, http://commons.wikimedia.org/wiki/File:H-index_plot.PNG]

The h-index of a graph:
Maximum number h such that h vertices each have ≥ h neighbors
• H = set of h high-degree vertices
• L = remaining vertices, degree ≤ h
Provides optimal tradeoff between |H| and D
Never more than sqrt(2m): otherwise the h vertices of H, each of degree ≥ h, would account for more than 2m edge endpoints
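For a static graph, the h-index can be read off the sorted degree sequence. The sketch below recomputes it from scratch, which is not the constant-time dynamic maintenance the talk refers to; the function names `h_index` and `graph_h_index` are hypothetical.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ordered = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ordered, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def graph_h_index(n, edges):
    """h-index of a simple undirected graph on vertices 0..n-1,
    computed from scratch from its degree sequence."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return h_index(degree)


# Complete graph on 4 vertices: every vertex has degree 3.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
h = graph_h_index(4, k4)
```

The same `h_index` function applies to a list of citation counts, which is the bibliometric definition the graph version borrows from.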

Results:
• We can maintain the h-index of a dynamic graph in constant time per update (details beyond the scope of this talk)
• A relaxed degree partition based on the h-index changes very rarely: on average, some vertex changes sides once in every O(h) updates
• As a consequence, we can maintain triangle counts and change scores in time O(h) per update
• All algorithms are simple and implementable
• Later work (Trott poster) generalizes this to more complex features
• Still need to do: implement them and test their actual performance
