Mathematics for Computer Science
MIT 6.042J/18.062J

PageRank
(by Google founder Larry Page)

Albert R Meyer, May 13, 2015

Google Rankings
Which webpages are "more important"?
Model of the internet:
• Users click a random link on a page.
• Occasionally they start over.
A page is "more important" if it is viewed a large fraction of the time.

Random Walk on the Web
View the entire web as a digraph:
• vertices are webpages
• edge (V,W) exists if there is a link from page V to page W
• edges out of V are equally likely: Pr[(V,W)] = 1/outdeg(V)
  (may get customized probabilities)

Random Walk on the Web
To model starting over:
• add a "supernode" to the graph
• add an edge from the supernode to each other node
• add edges from each other node back to the supernode
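The digraph and supernode construction from the two Random Walk slides can be sketched in Python. This is a minimal illustration under stated assumptions: the dictionary-of-links input format, the function name transition_matrix, and the label "super" are hypothetical, and every out-edge of a vertex (including the added edge to the supernode) is taken to be equally likely, so Pr[(V,W)] = 1/outdeg(V) in the augmented graph. Customized probabilities are not modeled. Exact fractions keep the probabilities readable.

```python
from fractions import Fraction

SUPER = "super"  # hypothetical label for the added supernode


def transition_matrix(links):
    """Return {V: {W: Pr[(V, W)]}} for a web digraph augmented with a supernode.

    `links` maps each page to the pages it links to (hypothetical input format).
    Every page also gets an edge back to the supernode, and the supernode gets an
    edge to every page. Out-edges of each vertex are equally likely, so
    Pr[(V, W)] = 1/outdeg(V) in the augmented graph.
    """
    # All pages: those with outgoing links plus any page that is only linked to.
    pages = sorted(set(links) | {w for targets in links.values() for w in targets})
    # Duplicate links collapse to one edge; add each page's edge to the supernode.
    out_edges = {v: sorted(set(links.get(v, ())) | {SUPER}) for v in pages}
    # The supernode has an edge to every page.
    out_edges[SUPER] = pages
    return {
        v: {w: Fraction(1, len(targets)) for w in targets}
        for v, targets in out_edges.items()
    }


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical 3-page web
M = transition_matrix(web)
print(M["A"])    # {'B': Fraction(1, 3), 'C': Fraction(1, 3), 'super': Fraction(1, 3)}
print(M[SUPER])  # {'A': Fraction(1, 3), 'B': Fraction(1, 3), 'C': Fraction(1, 3)}
```

A page with no outgoing links (one that appears only as a link target) ends up with a single edge to the supernode, taken with probability 1, so the walk never gets stuck there.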
Super-node
[Figure: a coin-flipping digraph with vertices H, T, HH, HT, TH, TT, win, and lose, edges labeled H or T, plus an added supernode "super"]

PageRank
Compute s, the stationary distribution.
$\mathrm{PageRank}(V) ::= s_V$
Rank V above W when $s_V > s_W$.

Importance of Super-node
The super-node ensures:
• a unique stable distribution s
• every initial distribution p converges to s:
  $\lim_{t \to \infty} p \cdot M^t = s$
• convergence is rapid: t is small, so s is easy to compute

Resistance to Scamming
• Creating fake nodes pointing to self
• Adding links to important nodes
won't improve PageRank.
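The computation on the PageRank and Importance of Super-node slides can be sketched as power iteration, using the same row-vector convention as $\lim_{t \to \infty} p \cdot M^t = s$: start from a distribution p and keep multiplying by the transition matrix M until it stops changing. The three-page web and its links, the helper names step and stationary, and the stopping tolerance are all hypothetical choices for illustration; this is plain power iteration, not Google's production method.

```python
# Transition probabilities Pr[(V, W)] for a hypothetical web with links
# A->B, A->C, B->C, C->A, after adding the supernode "super"
# (each vertex's out-edges equally likely).
M = {
    "A": {"B": 1 / 3, "C": 1 / 3, "super": 1 / 3},
    "B": {"C": 1 / 2, "super": 1 / 2},
    "C": {"A": 1 / 2, "super": 1 / 2},
    "super": {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3},
}


def step(p, M):
    """One step of the random walk: return the row vector p . M."""
    q = {v: 0.0 for v in M}
    for v, pv in p.items():
        for w, prob in M[v].items():
            q[w] += pv * prob
    return q


def stationary(M, p=None, tol=1e-12, max_steps=10_000):
    """Approximate s = lim p . M^t by power iteration (hypothetical helper).

    Starts from the uniform distribution unless an initial p is given.
    Convergence relies on the supernode making the graph strongly connected
    and aperiodic (here there are cycles of length 2 and 3 through "super").
    """
    if p is None:
        p = {v: 1 / len(M) for v in M}
    else:
        p = {v: p.get(v, 0.0) for v in M}
    for _ in range(max_steps):
        q = step(p, M)
        # Stop once one more step changes the distribution by less than tol.
        if sum(abs(q[v] - p[v]) for v in M) < tol:
            return q
        p = q
    raise RuntimeError("did not converge within max_steps")


s = stationary(M)
# PageRank(V) ::= s_V; rank V above W when s_V > s_W (supernode excluded).
ranking = sorted((v for v in M if v != "super"), key=lambda v: s[v], reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

For this small graph the exact stationary distribution is s_A = 21/88, s_B = 16/88, s_C = 24/88, and s_super = 27/88, which gives the ranking C, A, B. Starting instead from all probability on A, as in stationary(M, {"A": 1.0}), converges to the same s, matching the claim that every initial distribution converges.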
Actual Google Rank
Google's rank rules are a closely held trade secret, using text, location, payment, and other criteria that have evolved over 15 years. But PageRank continues to play a significant role.
MIT OpenCourseWare
https://ocw.mit.edu

6.042J / 18.062J Mathematics for Computer Science
Spring 2015

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.