

  1. Announcements: - Thank you for participating in our mid-quarter evaluation! - Thank you for participating in our homework feedback polls! ☺ - Course project: the average was ~80%; don't worry about the grade, but take the feedback seriously - Project Milestone due Sun (moved from Thu) - No late days and no exceptions - Consider meeting with your assigned TA

  2. We often think of networks being organized into modules, clusters, communities:

  4. [Figure: a network and its adjacency matrix, with rows and columns indexed by nodes]

  5. Find micro-markets by partitioning the query-to-advertiser graph: [Figure: bipartite query-advertiser graph with clusters] [Andersen, Lang: Communities from seed sets, 2006]

  6. Clusters in the Movies-to-Actors graph: [Andersen, Lang: Communities from seed sets, 2006]

  7. Discovering social circles, circles of trust: [McAuley, Leskovec: Discovering social circles in ego networks, 2012]

  8. The graph is large: ▪ Assume the graph fits in main memory ▪ For example, to work with a graph of 200M nodes and 2B edges one needs approx. 16 GB of RAM ▪ But the graph is too big for running anything more than linear-time algorithms. We will cover a PageRank-based algorithm for finding dense clusters ▪ The runtime of the algorithm will be proportional to the cluster size (not the graph size!)
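A quick back-of-the-envelope check of the 16 GB figure, assuming the graph is stored as an edge list of 4-byte node IDs (the storage layout is an assumption; the slide does not specify one):

```python
# Rough memory estimate for an in-memory edge list, assuming each
# edge stores two 4-byte (32-bit) node IDs -- an assumption, chosen
# because 200M node IDs still fit comfortably in 32 bits.
num_nodes = 200_000_000
num_edges = 2_000_000_000
bytes_per_edge = 2 * 4                      # two 32-bit endpoints
edge_list_gb = num_edges * bytes_per_edge / 1e9
print(f"edge list: ~{edge_list_gb:.0f} GB")  # ~16 GB
```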

  9. Discovering clusters based on seed nodes: ▪ Given: seed node s ▪ Compute (approximate) Personalized PageRank (PPR) around node s (teleport set = {s}) ▪ Idea: if s belongs to a nice cluster, the random walk will get trapped inside the cluster [Figure: random walk spreading out from the seed node]

  10. Algorithm outline: ▪ Pick a seed node s of interest ▪ Run PPR with teleport set = {s} ▪ Sort the nodes by decreasing PPR score ▪ Sweep over the nodes and find good clusters [Figure: cluster "quality" (lower is better) vs. node rank in decreasing PPR score; dips mark good clusters near the seed node]

  11. Undirected graph G(V, E): [Figure: example graph on nodes 1-6] Partitioning task: ▪ Divide vertices into two disjoint groups A and B = V\A Question: ▪ How can we define a "good" cluster in G?

  12. What makes a good cluster? ▪ Maximize the number of within-cluster connections ▪ Minimize the number of between-cluster connections

  13. Express cluster quality as a function of the "edge cut" of the cluster. Cut: the set of edges (edge weights) with only one node in the cluster: cut(A) = Σ_{i∈A, j∉A} w_ij. Note: this works for weighted and unweighted (set all w_ij = 1) graphs. [Figure: example graph where cut(A) = 2]
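A minimal sketch of the cut score for an unweighted graph, assuming an adjacency-set representation (the names `cut` and `adj` are illustrative, not from the slides):

```python
def cut(A, adj):
    """Number of edges with exactly one endpoint in A.
    `A` is a set of nodes; `adj` maps node -> set of neighbors.
    For a weighted graph, sum w[u][v] instead of counting 1s."""
    return sum(1 for u in A for v in adj[u] if v not in A)
```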

  14. Partition quality: cut score ▪ Quality of a cluster is the weight of connections pointing outside the cluster ▪ Degenerate case: [Figure: the "optimal cut" vs. a degenerate minimum cut] ▪ Problem: ▪ Only considers external cluster connections ▪ Does not consider internal cluster connectivity

  15. [Shi-Malik] Criterion: Conductance: connectivity of the group to the rest of the network relative to the density of the group: φ(A) = |{(i, j) ∈ E; i ∈ A, j ∉ A}| / min(vol(A), 2m − vol(A)) ▪ vol(A): total weight of the edges with at least one endpoint in A: vol(A) = Σ_{i∈A} d_i ▪ d_i … degree of node i ▪ E … edge set ▪ m … number of edges of the graph ▪ vol(A) = 2·#edges inside A + #edges pointing out of A ▪ Why use this criterion? ▪ Produces more balanced partitions
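A small sketch of conductance for an unweighted graph, reusing the hypothetical adjacency-set layout from the cut example above (names are illustrative):

```python
def conductance(A, adj):
    """phi(A) = cut(A) / min(vol(A), 2m - vol(A)) for an unweighted
    graph; `adj` maps node -> set of neighbors. Assumes A is a
    non-empty proper subset of the nodes, so the denominator is > 0."""
    cut = sum(1 for u in A for v in adj[u] if v not in A)
    vol = sum(len(adj[u]) for u in A)        # sum of degrees in A
    two_m = sum(len(adj[u]) for u in adj)    # 2 * number of edges
    return cut / min(vol, two_m - vol)
```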

  16. [Figure: two example cuts of the same graph] φ = 2/4 = 0.5 and φ = 6/92 = 0.065

  17. Algorithm outline: ▪ Pick a seed node s of interest ▪ Run PPR with teleport = {s} ▪ Sort the nodes by decreasing PPR score ▪ Sweep over the nodes and find good clusters. Sweep: ▪ Sort nodes in decreasing PPR score r_1 > r_2 > ⋯ > r_n ▪ For each i compute φ(A_i = {r_1, …, r_i}) ▪ Local minima of φ(A_i) correspond to good clusters [Figure: conductance φ(A_i) vs. node rank i in decreasing PPR score; local minima mark good clusters]

  18. The whole sweep curve can be computed in linear time: ▪ For-loop over the nodes ▪ Keep a hash table of the nodes in the set A_i ▪ To compute φ(A_{i+1}) = Cut(A_{i+1}) / Vol(A_{i+1}): ▪ Vol(A_{i+1}) = Vol(A_i) + d_{i+1} ▪ Cut(A_{i+1}) = Cut(A_i) + d_{i+1} − 2·#(edges of u_{i+1} to A_i)
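A runnable sketch of this linear-time sweep, assuming the nodes arrive already sorted by decreasing PPR score and using the same hypothetical adjacency-set layout as before; the min(vol, 2m − vol) denominator follows the conductance definition from slide 15:

```python
def sweep_curve(order, adj):
    """Compute phi(A_i) for the nested sets A_1 ⊂ A_2 ⊂ ..., where
    A_i holds the first i nodes of `order` (decreasing PPR score).
    Cut and Vol are updated incrementally, so the whole curve costs
    O(sum of degrees of the swept nodes)."""
    in_set = set()
    cut = vol = 0
    two_m = sum(len(adj[u]) for u in adj)   # 2 * number of edges
    phis = []
    for u in order:
        d = len(adj[u])
        edges_into_set = sum(1 for v in adj[u] if v in in_set)
        vol += d                        # Vol(A_{i+1}) = Vol(A_i) + d_{i+1}
        cut += d - 2 * edges_into_set   # add d new cut edges, absorb internal ones
        in_set.add(u)
        denom = min(vol, two_m - vol)
        phis.append(cut / denom if denom else 1.0)
    return phis  # local minima of this curve mark good clusters
```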

  19. How to compute Personalized PageRank (PPR) without touching the whole graph? ▪ The power method won't work, since each single iteration accesses all nodes of the graph: r^(t+1) = β M · r^(t) + (1 − β) a ▪ a is the teleport vector: a = [0 … 0 1 0 … 0]^T (with the 1 at the index of seed node s) ▪ r is the personalized PageRank vector. Approximate PageRank [Andersen, Chung, Lang, '07]: ▪ A fast method for computing approximate Personalized PageRank (PPR) with teleport set = {s} ▪ ApproxPageRank(s, β, ε) ▪ s … seed node ▪ β … teleportation parameter ▪ ε … approximation error parameter
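For contrast, a minimal power-method PPR sketch; note that the matrix-vector product touches every edge on every iteration, which is exactly what ApproxPageRank avoids (the dense numpy layout and iteration count are assumptions for illustration):

```python
import numpy as np

def ppr_power_method(M, s, beta=0.8, iters=50):
    """Personalized PageRank by power iteration:
    r <- beta * M @ r + (1 - beta) * a, teleport vector a = e_s.
    M is the column-stochastic n x n transition matrix, so every
    iteration costs O(#edges) -- it touches the whole graph."""
    n = M.shape[0]
    a = np.zeros(n)
    a[s] = 1.0
    r = a.copy()
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * a
    return r
```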

  20. Overview of the approximate PPR: ▪ Lazy random walk: a variant of a random walk that stays put with probability 1/2 at each time step, and walks to a random neighbor the other half of the time ▪ Keep track of the residual PPR score q_u = p_u − r_u^(t) ▪ The residual tells us how well the PPR score p_u of node u is approximated ▪ p_u … the "true" PageRank of node u ▪ r_u^(t) … the PageRank estimate of node u at round t ▪ d_i … degree of node i ▪ If the residual q_u of node u is too big, q_u/d_u ≥ ε, then push the walk further (distribute some of the residual q_u to all of u's neighbors along outgoing edges); else don't touch the node
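A compact sketch of the push loop in the spirit of Andersen-Chung-Lang; the exact update constants vary across presentations, so the 1/2 lazy-walk split below is one common variant and not necessarily the slide's exact formulation:

```python
from collections import defaultdict

def approx_ppr(s, adj, beta=0.8, eps=1e-4):
    """Push-style approximate PPR with teleport set {s}.
    p: current PPR estimate, q: residual. While some node u has
    q[u]/deg(u) >= eps, push at u: settle (1-beta)*q[u] into p[u],
    keep half of the rest at u (lazy step), and spread the other
    half over u's neighbors. Only nodes near the seed are touched."""
    p = defaultdict(float)
    q = defaultdict(float)
    q[s] = 1.0
    work = [s]
    while work:
        u = work.pop()
        d = len(adj[u])
        if d == 0 or q[u] / d < eps:
            continue                      # stale work-list entry
        ru = q[u]
        p[u] += (1 - beta) * ru           # settle mass at u
        q[u] = beta * ru / 2              # lazy walk: half stays put
        for v in adj[u]:                  # other half goes to neighbors
            q[v] += beta * ru / (2 * d)
            if q[v] / len(adj[v]) >= eps:
                work.append(v)
        if q[u] / d >= eps:
            work.append(u)
    return p
```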

  21. A different way to look at PageRank [Jeh & Widom: Scaling Personalized Web Search, 2002]: p_β(a) = (1 − β) a + β p_β(M · a) ▪ p_β(a) is the true PageRank vector with teleport parameter β and teleport vector a ▪ p_β(M · a) is the PageRank vector with teleportation vector M · a and teleportation parameter β ▪ where M is the stochastic PageRank transition matrix ▪ Notice: M · a is one step of a random walk
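A quick numerical sanity check of this identity on a toy 3-node cycle, in the spirit of the power-method sketch above (the graph, β, and tolerance are made up for illustration):

```python
import numpy as np

# Column-stochastic transition matrix of a toy 3-node cycle graph.
M = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
beta = 0.8
a = np.array([1.0, 0.0, 0.0])   # teleport to node 0

def ppr(v, iters=500):
    """p_beta(v) via long power iteration (converges geometrically)."""
    r = v.copy()
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * v
    return r

lhs = ppr(a)
rhs = (1 - beta) * a + beta * ppr(M @ a)
assert np.allclose(lhs, rhs, atol=1e-8)   # the identity holds
```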

  22. Proving p_β(a) = (1 − β) a + β p_β(M · a): ▪ We can break this probability into two cases: ▪ walks of length 0, and ▪ walks of length longer than 0 ▪ The probability of a length-0 walk is 1 − β, and the walk ends where it started, with walker distribution a ▪ The probability of a walk of length > 0 is β; the walk then starts at distribution a, takes a step (so it has distribution Ma), and takes the rest of the random walk with distribution p_β(Ma) ▪ Note that we used the memoryless nature of the walk: once we know that the second step of the walk has distribution Ma, the rest of the walk can forget where it started and behave as if it started at Ma. This is the key idea of the proof.
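Unrolling this identity (a standard expansion, added here for completeness rather than taken from the slides) shows why PageRank is a β-discounted sum over walk lengths:

```latex
% Repeatedly applying p_beta(a) = (1 - beta) a + beta p_beta(M a):
\begin{align*}
p_{\beta}(a) &= (1-\beta)\,a + \beta\,p_{\beta}(M a) \\
             &= (1-\beta)\,a + \beta(1-\beta)\,M a + \beta^{2}\,p_{\beta}(M^{2} a) \\
             &= (1-\beta)\sum_{k=0}^{\infty} \beta^{k} M^{k} a,
\end{align*}
% i.e., a walk of length k occurs with probability (1 - beta) * beta^k.
```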
