Chapter 5: Link Analysis for Authority Scoring
5.1 PageRank (S. Brin and L. Page 1997/1998)
5.2 HITS (J. Kleinberg 1997/1999)
5.3 Comparison and Extensions
5.4 Topic-specific and Personalized PageRank
5.5 Efficiency Issues
5.6 Online Page Importance
5.7 Spam-Resilient Authority Scoring
IRDM WS 2005

5.3 Comparison and Extensions
The literature contains a plethora of variations on PageRank and HITS. The key points are:
• mutual reinforcement between hubs and authorities
• re-scaling of edge weights (normalization)
Unified notation (for a link graph with n nodes):
- $L$: n × n link matrix with $L_{ij} = 1$ if there is an edge (i,j), 0 otherwise
- $d_{in}$: n × 1 vector with $d_{in,i} = \mathrm{indegree}(i)$; $D_{in} = \mathrm{diag}(d_{in})$ (n × n)
- $d_{out}$: n × 1 vector with $d_{out,i} = \mathrm{outdegree}(i)$; $D_{out} = \mathrm{diag}(d_{out})$ (n × n)
- $x$: n × 1 authority vector
- $y$: n × 1 hub vector
- Iop: operation applied to incoming links
- Oop: operation applied to outgoing links

HITS and PageRank in a Unified Framework
HITS: $x = \mathrm{Iop}(y)$, $y = \mathrm{Oop}(x)$ with $\mathrm{Iop}(y) = L^T y$, $\mathrm{Oop}(x) = L x$
PageRank: $x = \mathrm{Iop}(x)$ with $\mathrm{Iop}(x) = P^T x$, where $P^T = L^T D_{out}^{-1}$ or $P^T = \alpha L^T D_{out}^{-1} + (1-\alpha)\frac{1}{n} e e^T$ (e: n × 1 vector of ones)
SALSA (PageRank-style computation with mutual reinforcement): $x = \mathrm{Iop}(y)$ with $\mathrm{Iop}(y) = P^T y$, $P^T = L^T D_{out}^{-1}$; $y = \mathrm{Oop}(x)$ with $\mathrm{Oop}(x) = Q x$, $Q = L D_{in}^{-1}$
Other models of link analysis can be cast into this framework, too.
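
A minimal pure-Python sketch of the three update rules in the Iop/Oop notation. It assumes the graph is a list of directed edges (i, j) over nodes 0..n-1; the helper names (`degrees`, `normalize`, `hits`, `pagerank`, `salsa`), the L1 normalization after each step, α = 0.85 and the fixed iteration counts are illustrative choices, not taken from the slides. The PageRank function also assumes that every node has at least one out-link, so that $D_{out}^{-1}$ is defined.

```python
def degrees(n, edges):
    """Return (din, dout): in- and out-degree vectors of the link matrix L."""
    din, dout = [0] * n, [0] * n
    for i, j in edges:
        dout[i] += 1
        din[j] += 1
    return din, dout


def normalize(v):
    """Scale v to sum 1 (L1 normalization); leave an all-zero vector unchanged."""
    s = sum(v)
    return [t / s for t in v] if s else v


def hits(n, edges, iters=100):
    """HITS: x = Iop(y) = L^T y, y = Oop(x) = L x."""
    x, y = [1.0] * n, [1.0] * n
    for _ in range(iters):
        x = [0.0] * n
        for i, j in edges:          # authority j collects the hub score of i
            x[j] += y[i]
        y = [0.0] * n
        for i, j in edges:          # hub i collects the authority score of j
            y[i] += x[j]
        x, y = normalize(x), normalize(y)
    return x, y


def pagerank(n, edges, alpha=0.85, iters=100):
    """PageRank: x = (alpha L^T Dout^-1 + (1 - alpha)(1/n) e e^T) x."""
    _, dout = degrees(n, edges)
    x = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - alpha) / n] * n             # random-jump term, using sum(x) == 1
        for i, j in edges:
            new[j] += alpha * x[i] / dout[i]    # alpha * L^T Dout^-1 x
        x = new
    return x


def salsa(n, edges, iters=100):
    """SALSA: x = P^T y with P^T = L^T Dout^-1, y = Q x with Q = L Din^-1."""
    din, dout = degrees(n, edges)
    x, y = [1.0 / n] * n, [1.0 / n] * n
    for _ in range(iters):
        x = [0.0] * n
        for i, j in edges:          # L^T Dout^-1 y
            x[j] += y[i] / dout[i]
        y = [0.0] * n
        for i, j in edges:          # L Din^-1 x
            y[i] += x[j] / din[j]
        x, y = normalize(x), normalize(y)
    return x, y
```

On the edge list [(0, 2), (1, 2), (0, 3)], for instance, HITS ranks node 2 as the top authority and node 0 as the top hub, and SALSA converges to authority scores 2/3 and 1/3 for nodes 2 and 3.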

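A Family of Link Analysis Methods
General scheme: $\mathrm{Iop}(\cdot) = D_{in}^{-p} L^T D_{out}^{-q}(\cdot)$ and $\mathrm{Oop}(\cdot) = \mathrm{Iop}^T(\cdot)$
Specific instances, each applied to x and y as $x = \mathrm{Iop}(y)$, $y = \mathrm{Oop}(x)$:
- Out-link normalized Rank (Onorm-Rank): $\mathrm{Iop}(\cdot) = L^T D_{out}^{-1/2}(\cdot)$, $\mathrm{Oop}(\cdot) = D_{out}^{-1/2} L(\cdot)$
- In-link normalized Rank (Inorm-Rank): $\mathrm{Iop}(\cdot) = D_{in}^{-1/2} L^T(\cdot)$, $\mathrm{Oop}(\cdot) = L D_{in}^{-1/2}(\cdot)$
- Symmetric normalized Rank (Snorm-Rank): $\mathrm{Iop}(\cdot) = D_{in}^{-1/2} L^T D_{out}^{-1/2}(\cdot)$, $\mathrm{Oop}(\cdot) = D_{out}^{-1/2} L D_{in}^{-1/2}(\cdot)$
Some properties of Snorm-Rank (assuming all in- and out-degrees are positive): $x = \mathrm{Iop}(y) = \mathrm{Iop}(\mathrm{Oop}(x))$ → $\lambda x = A^{(S)} x$ with $A^{(S)} = D_{in}^{-1/2} L^T D_{out}^{-1} L D_{in}^{-1/2}$ → solution: $\lambda = 1$, $x = d_{in}^{1/2}$ (componentwise square root), and analogously for hub scores: $\lambda y = H^{(S)} y$ with $H^{(S)} = D_{out}^{-1/2} L D_{in}^{-1} L^T D_{out}^{-1/2}$ → $\lambda = 1$, $y = d_{out}^{1/2}$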
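
A quick numerical check of the Snorm-Rank property, as a sketch: applying $A^{(S)}$ to $x = d_{in}^{1/2}$ (and $H^{(S)}$ to $y = d_{out}^{1/2}$) should return the same vector, i.e. eigenvalue λ = 1. The function names and the toy graph are hypothetical, and the code assumes every node has positive in- and out-degree.

```python
import math


def degrees(n, edges):
    """Return (din, dout) for an edge list of links i -> j."""
    din, dout = [0] * n, [0] * n
    for i, j in edges:
        dout[i] += 1
        din[j] += 1
    return din, dout


def snorm_authority_map(n, edges, x):
    """Apply A(S) = Din^-1/2 L^T Dout^-1 L Din^-1/2 to x."""
    din, dout = degrees(n, edges)
    u = [x[j] / math.sqrt(din[j]) for j in range(n)]   # Din^-1/2 x
    v = [0.0] * n
    for i, j in edges:                                  # L u
        v[i] += u[j]
    w = [v[i] / dout[i] for i in range(n)]              # Dout^-1 L u
    z = [0.0] * n
    for i, j in edges:                                  # L^T w
        z[j] += w[i]
    return [z[j] / math.sqrt(din[j]) for j in range(n)]


def snorm_hub_map(n, edges, y):
    """Apply H(S) = Dout^-1/2 L Din^-1 L^T Dout^-1/2 to y."""
    din, dout = degrees(n, edges)
    u = [y[i] / math.sqrt(dout[i]) for i in range(n)]  # Dout^-1/2 y
    v = [0.0] * n
    for i, j in edges:                                  # L^T u
        v[j] += u[i]
    w = [v[j] / din[j] for j in range(n)]               # Din^-1 L^T u
    z = [0.0] * n
    for i, j in edges:                                  # L w
        z[i] += w[j]
    return [z[i] / math.sqrt(dout[i]) for i in range(n)]
```

Tracing the operators by hand gives the same result: $D_{in}^{-1/2} d_{in}^{1/2}$ is the all-ones vector e, $L e = d_{out}$, $D_{out}^{-1} d_{out} = e$, $L^T e = d_{in}$, and a final $D_{in}^{-1/2}$ yields $d_{in}^{1/2}$.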

Experimental Results
Construct the neighborhood graph from the result of the query "star" and compare the authority-scoring ranks:

Rank | HITS | OnormRank | PageRank
1 | www.starwars.com | www.starwars.com | www.starwars.com
2 | www.lucasarts.com | www.lucasarts.com | www.lucasarts.com
3 | www.jediknight.net | www.jediknight.net | www.paramount.com
4 | www.sirstevesguide.com | www.paramount.com | www.4starads.com/romanc
5 | www.paramount.com | www.sirstevesguide.com | www.starpages.net
6 | www.surfthe.net/swma/ | www.surfthe.net/swma/ | www.dailystarnews.com
7 | insurrection.startrek.com | insurrection.startrek.com | www.state.mn.us
8 | www.startrek.com | www.fanfix.com | www.star-telegram.com
9 | www.fanfix.com | shop.starwars.com | www.starbulletin.com
10 | www.physics.usyd.edu.au/.../starwars | www.physics.usyd.edu.au/.../starwars | www.kansascity.com
... | | | ...
19 | | | www.jediknight.net
21 | | | insurrection.startrek.com
23 | | | www.surfthe.net/swma

Bottom line: the differences between all kinds of authority ranking methods are fairly minor!

More LAR (Link Analysis Ranking) Methods
HubAveraging (similar to ONorm for hubs): $a(q) = \sum_{p \in IN(q)} h(p)$, $h(p) = \frac{1}{|OUT(p)|} \sum_{q \in OUT(p)} a(q)$
AuthorityThreshold (only the k best authorities per hub): $a(q) = \sum_{p \in IN(q)} h(p)$, $h(p) = \sum_{q \in OUT_k(p)} a(q)$ with $OUT_k(p) = \mathrm{argmax}_k \{a(q) \mid q \in OUT(p)\}$ (the k out-neighbors of p with the highest authority scores)
Max (AuthorityThreshold with k = 1): $a(q) = \sum_{p \in IN(q)} h(p)$, $h(p) = a(\mathrm{argmax}\{a(q) \mid q \in OUT(p)\})$
BreadthFirstSearch (transitive citations up to depth k): $a(q) = \sum_{j=1}^{k} \left(\tfrac{1}{2}\right)^{j-1} |N^{(j)}(q)|$, where $N^{(j)}(q)$ are the nodes that have a path to q by alternating o OUT and i IN steps with j = o + i
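
A sketch of these four methods in plain Python, assuming an edge list (p, q) for links p → q. The generic driver `lar_iterate`, the L1 normalization of a after each round and the iteration count are assumptions. For BreadthFirstSearch, the code reads the definition as "first step back along an in-link of q, then alternate", counting each node once at the first depth j where it is reached; that bookkeeping is one possible interpretation, not necessarily the one used by Borodin et al.

```python
import heapq


def lar_iterate(n, edges, hub_rule, iters=50):
    """Alternate h(p) = hub_rule(authority scores over OUT(p)) and a(q) = sum_{p in IN(q)} h(p)."""
    out = [[] for _ in range(n)]
    for p, q in edges:
        out[p].append(q)
    a = [1.0] * n
    for _ in range(iters):
        h = [hub_rule([a[q] for q in out[p]]) if out[p] else 0.0 for p in range(n)]
        new_a = [0.0] * n
        for p, q in edges:
            new_a[q] += h[p]
        s = sum(new_a)
        a = [t / s for t in new_a] if s else new_a   # L1 normalization (assumption)
    return a


def hub_averaging(vals):
    """HubAveraging: mean authority score over OUT(p)."""
    return sum(vals) / len(vals)


def authority_threshold(k):
    """AuthorityThreshold: sum of the k largest authority scores over OUT(p)."""
    return lambda vals: sum(heapq.nlargest(k, vals))


max_rule = authority_threshold(1)   # Max = AuthorityThreshold with k = 1


def bfs_authority(n, edges, q, k):
    """BreadthFirstSearch: a(q) = sum_{j=1..k} (1/2)^(j-1) |N^(j)(q)|."""
    inn = [[] for _ in range(n)]
    out = [[] for _ in range(n)]
    for p, r in edges:
        out[p].append(r)
        inn[r].append(p)
    seen, frontier, score = {q}, {q}, 0.0
    for j in range(1, k + 1):
        step = inn if j % 2 == 1 else out    # odd j: back along in-links; even j: along out-links
        reached = {v for u in frontier for v in step[u]}
        new = reached - seen                  # count each node once, at its first depth
        score += 0.5 ** (j - 1) * len(new)
        seen |= new
        frontier = reached
    return score
```

With Max on the links 0 → 1, 0 → 2, 3 → 2, for example, node 2 ends with twice the authority of node 1, because both hubs pass on the score of node 2.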

LAR as Bayesian Learning
Postulate a probability model for p → q:
$P[p \to q] = \frac{\exp(h_p a_q + e_p)}{1 + \exp(h_p a_q + e_p)}$
with parameters $\theta = (h_1, \dots, h_n, a_1, \dots, a_n, e_1, \dots, e_n)$.
Postulate a prior $f(\theta)$ for the parameters θ: a normal distribution (μ, σ) for each $e_i$, an exponential distribution (λ = 1) for each $a_i$, $h_i$.
Posterior $f(\theta \mid G)$ for the links i → j ∈ G: $f(\theta \mid G) \propto f(G \mid \theta)\, f(\theta)$
Theorem:
$f(\theta \mid G) \propto \prod_{i=1..n} e^{-a_i - h_i - (e_i - \mu)^2/(2\sigma^2)} \cdot \prod_{(i,j) \in G} e^{h_i a_j + e_i} \Big/ \prod_{i,j} \left(1 + e^{h_i a_j + e_i}\right)$
Estimate $\hat{\theta} = E[\theta \mid G]$ using numerical algorithms.
Alternative simpler model: $P[p \to q] = \frac{h_p a_q}{1 + h_p a_q}$
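
As a sketch, the unnormalized log-posterior from the theorem can be evaluated directly; this is the quantity a numerical estimator (for example a Markov chain Monte Carlo sampler, not shown) would need. The function names and the defaults μ = 0, σ = 1 are assumptions. The product in the denominator is taken over all ordered pairs (i, j), as written in the formula, and $a_i$, $h_i$ are assumed non-negative so that they lie in the support of the exponential prior.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def link_prob(h_p, a_q, e_p):
    """P[p -> q] = exp(h_p a_q + e_p) / (1 + exp(h_p a_q + e_p))."""
    return math.exp(-softplus(-(h_p * a_q + e_p)))


def log_posterior(h, a, e, edges, mu=0.0, sigma=1.0):
    """Unnormalized log f(theta | G), dropping additive constants."""
    n = len(h)
    links = set(edges)
    lp = 0.0
    for i in range(n):                           # prior: exponential(1) for a_i, h_i; normal for e_i
        lp += -a[i] - h[i] - (e[i] - mu) ** 2 / (2 * sigma ** 2)
    for i in range(n):                           # likelihood terms over all pairs (i, j)
        for j in range(n):
            z = h[i] * a[j] + e[i]
            if (i, j) in links:
                lp += z                          # numerator: observed link i -> j
            lp -= softplus(z)                    # denominator: 1 + exp(z)
    return lp
```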

LAR Quality Measures: Score Distances
Consider two n-dimensional authority score vectors a and b.
$d_1$ distance: $d_1(a,b) = \min_{\alpha, \beta \ge 1} \sum_{i=1..n} |\alpha a_i - \beta b_i|$
with scaling weights α, β to compensate for normalization distortions; one could alternatively use the $L_q$ norm rather than $L_1$.
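
A minimal sketch for computing the $d_1$ distance exactly, assuming non-negative score vectors. It relies on two observations that are not on the slide: scaling α and β down by a common factor shrinks the sum, so at the optimum one weight equals 1; and with one weight fixed at 1, the sum is convex and piecewise linear in the other, so only 1 and the breakpoints need to be tried. The function name is hypothetical.

```python
def d1_distance(a, b):
    """d1(a, b) = min over alpha, beta >= 1 of sum_i |alpha a_i - beta b_i| (a, b non-negative)."""

    def one_sided(u, v):
        # min over s >= 1 of sum_i |u_i - s v_i|: convex and piecewise linear in s,
        # so the minimum lies at s = 1 or at a breakpoint u_i / v_i >= 1.
        candidates = [1.0] + [ui / vi for ui, vi in zip(u, v) if vi > 0 and ui / vi >= 1]
        return min(sum(abs(ui - s * vi) for ui, vi in zip(u, v)) for s in candidates)

    # one_sided(a, b) fixes alpha = 1; one_sided(b, a) fixes beta = 1.
    return min(one_sided(a, b), one_sided(b, a))
```

For example, (1, 0) and (0.5, 0) have distance 0 because β = 2 removes the normalization difference, whereas (1, 2) and (2, 1) keep distance 2.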

LAR Quality Measures: Rank Distances
Consider the top-k of two rankings τ1 and τ2, or full permutations of 1..n.
• Overlap similarity: $OSim(\tau_1, \tau_2) = |top(k, \tau_1) \cap top(k, \tau_2)| / k$
• Kendall's τ measure: $KDist(\tau_1, \tau_2) = \frac{|\{(u,v) \mid u, v \in U,\ u \ne v,\ \tau_1 \text{ and } \tau_2 \text{ disagree on the relative order of } u, v\}|}{|U| \cdot (|U| - 1)}$ with $U = top(k, \tau_1) \cup top(k, \tau_2)$ (missing items are set to rank k+1). For pairs tied in one ranking and ordered in the other, count a penalty p with 0 ≤ p ≤ 1: p = 0 gives the weak KDist, p = 1 the strict KDist.
• Footrule distance (normalized): $FDist(\tau_1, \tau_2) = \frac{1}{|U|} \sum_{u \in U} |\tau_1(u) - \tau_2(u)|$
FDist is an upper bound for KDist, and FDist/2 is a lower bound.
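
A sketch of the three rank measures for top-k lists given as Python lists (best item first). The function names are hypothetical; following the definitions, items outside a top-k list get rank k + 1, KDist counts ordered pairs and divides by |U|(|U| − 1), and the parameter p is the penalty for pairs tied in exactly one ranking.

```python
from itertools import permutations


def _ranks(tau, universe, k):
    """Rank of each item of universe in the top-k list tau; missing items get rank k + 1."""
    pos = {item: r for r, item in enumerate(tau[:k], start=1)}
    return {u: pos.get(u, k + 1) for u in universe}


def osim(t1, t2, k):
    """Overlap similarity of the two top-k sets."""
    return len(set(t1[:k]) & set(t2[:k])) / k


def kdist(t1, t2, k, p=0.0):
    """Kendall distance over U = top(k, t1) | top(k, t2); p = 0 weak, p = 1 strict."""
    U = set(t1[:k]) | set(t2[:k])
    if len(U) < 2:
        return 0.0
    r1, r2 = _ranks(t1, U, k), _ranks(t2, U, k)
    total = 0.0
    for u, v in permutations(U, 2):              # ordered pairs with u != v
        d1, d2 = r1[u] - r1[v], r2[u] - r2[v]
        if d1 * d2 < 0:                           # opposite relative order
            total += 1.0
        elif (d1 == 0) != (d2 == 0):              # tied in exactly one ranking
            total += p
    return total / (len(U) * (len(U) - 1))


def fdist(t1, t2, k):
    """Normalized footrule distance over U."""
    U = set(t1[:k]) | set(t2[:k])
    r1, r2 = _ranks(t1, U, k), _ranks(t2, U, k)
    return sum(abs(r1[u] - r2[u]) for u in U) / len(U)
```

For the disjoint top-2 lists ["a", "b"] and ["c", "d"], the pairs (a, b) and (c, d) are tied in one ranking only, so the weak KDist is 2/3 and the strict KDist is 1.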

LAR Similarity
Two LAR algorithms A and B are similar on the class $\mathcal{G}$ of graphs with n nodes under the authority distance measure d if for n → ∞:
$\max\{d(A(G), B(G)) \mid G \in \mathcal{G}\} = o(M_n(d, L_q))$
where $M_n(d, L_q)$ is the maximum distance under d for any two n-dimensional vectors x and y that have $L_q$ norm 1 (which is $\Theta(n^{1-1/q})$ for the $d_1$ distance and the $L_q$ norm).
Two LAR algorithms A and B are weakly (strictly) rank-similar on the class $\mathcal{G}$ of graphs with n nodes under the weak (strict) rank distance r if for n → ∞:
$\max\{r(A(G), B(G)) \mid G \in \mathcal{G}\} = o(1)$
Theorems: SALSA and Indegree are similar and strictly rank-similar. No other LAR algorithms are known to be similar or weakly rank-similar.
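
The SALSA/Indegree result can be illustrated, though not proved, by substitution: with $x = d_{in}$, $L D_{in}^{-1} d_{in} = L e = d_{out}$, $D_{out}^{-1} d_{out} = e$ and $L^T e = d_{in}$, so the indegree vector is a fixed point of SALSA's authority iteration. The sketch below runs that iteration on one hypothetical toy graph whose authorities are connected through shared hubs (so the fixed point is unique) and compares the result to the normalized indegrees. The function names and the iteration count are assumptions.

```python
def indegree_share(n, edges):
    """Indegree scores normalized to sum 1."""
    din = [0] * n
    for _, j in edges:
        din[j] += 1
    total = sum(din)
    return [d / total for d in din]


def salsa_authority(n, edges, iters=200):
    """SALSA authority scores by power iteration on x <- L^T Dout^-1 L Din^-1 x."""
    din, dout = [0] * n, [0] * n
    for i, j in edges:
        dout[i] += 1
        din[j] += 1
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [0.0] * n
        for i, j in edges:            # y = L Din^-1 x (nodes without in-links drop out)
            y[i] += x[j] / din[j]
        new = [0.0] * n
        for i, j in edges:            # x = L^T Dout^-1 y
            new[j] += y[i] / dout[i]
        s = sum(new)
        x = [t / s for t in new]
    return x
```

On the links 0 → 1, 0 → 2, 1 → 2, 2 → 1, 3 → 2, 3 → 0, both functions return approximately (1/6, 1/3, 1/2, 0).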

LAR Stability
For graphs G = (V,E) and G' = (V,E'), the link distance $d_{link}$ is: $d_{link}(G, G') = |(E \cup E') - (E \cap E')|$
For a graph $G \in \mathcal{G}$, we define $C_k(G) = \{G' \in \mathcal{G} \mid d_{link}(G, G') \le k\}$.
An LAR algorithm A is stable on the class $\mathcal{G}$ of graphs with n nodes under the authority distance measure d if for every k > 0, for n → ∞:
$\max\{d(A(G), A(G')) \mid G \in \mathcal{G}, G' \in C_k(G)\} = o(M_n(d, L_q))$
An LAR algorithm A is weakly (strictly) rank-stable on the class $\mathcal{G}$ of graphs with n nodes under the weak (strict) rank distance r if for every k > 0, for n → ∞:
$\max\{r(A(G), A(G')) \mid G \in \mathcal{G}, G' \in C_k(G)\} = o(1)$
Theorems: Indegree is stable. No other LAR algorithm is known to be stable or weakly rank-stable (but some are under modified stability definitions). PageRank is stable with high probability for power-law graphs.
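
A small sketch of the link distance, together with the intuition behind the Indegree stability theorem: each link in the symmetric difference of E and E' changes exactly one node's indegree by 1, so the unnormalized indegree vectors of G and any G' ∈ C_k(G) differ by at most k in L1 distance. The helper names and the example graphs are hypothetical, and the sketch does not check the asymptotic o(M_n) condition itself.

```python
def link_distance(e1, e2):
    """d_link(G, G') = |(E u E') - (E n E')|, the size of the symmetric difference."""
    s1, s2 = set(e1), set(e2)
    return len((s1 | s2) - (s1 & s2))


def indegree(n, edges):
    """Unnormalized Indegree authority scores."""
    d = [0] * n
    for _, j in set(edges):
        d[j] += 1
    return d


def l1(u, v):
    """L1 distance between two score vectors."""
    return sum(abs(p - q) for p, q in zip(u, v))
```

Replacing the link 2 → 0 by 0 → 2 in a 3-cycle, for instance, gives $d_{link} = 2$ and changes the indegree vector from (1, 1, 1) to (0, 1, 2), an L1 change of exactly 2.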

LAR Experimental Comparison: Queries
Experimental setup:
• 34 queries
• root sets of 200 pages each, obtained from Google
• base sets computed using Google, with the first 50 predecessors per page
Source: Borodin et al., ACM TOIT 2005

LAR Experimental Comparison: Precision@10
Source: Borodin et al., ACM TOIT 2005

LAR Experimental Comparison: Key Authorities
Is there a winner at all?
Source: Borodin et al., ACM TOIT 2005

LAR Results for Query "Classical Guitar" (1)
Source: Borodin et al., ACM TOIT 2005

5.4 Topic-specific PageRank [Haveliwala 2003]
Given: a (small) set of topics $c_k$, each with a set $T_k$ of authorities (taken from a directory such as ODP (www.dmoz.org) or a bookmark collection).
Key idea: change the PageRank random walk by biasing the random-jump probabilities toward the topic authorities $T_k$:
$\vec{r}_k = \varepsilon \vec{p}_k + (1 - \varepsilon) A' \vec{r}_k$ with $A'_{ij} = 1/\mathrm{outdegree}(j)$ for $(j,i) \in E$, 0 otherwise,
and $(p_k)_j = 1/|T_k|$ for $j \in T_k$, 0 otherwise (instead of $p_j = 1/n$).
Approach:
1) Precompute topic-specific PageRank vectors $\vec{r}_k$.
2) Classify the user query q (incl. query context) w.r.t. each topic $c_k$ → probability $w_k := P[c_k \mid q]$.
3) The total authority score of doc d is $\sum_k w_k\, r_k(d)$.
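
A minimal sketch of steps 1 and 3, assuming a graph without dangling nodes, ε = 0.15 and a fixed number of power iterations (none of these values is given in the approach above). The function names, the topic sets and the weights passed to `combined_score` are hypothetical stand-ins for the directory-derived sets $T_k$ and the classifier output $w_k = P[c_k \mid q]$ of step 2.

```python
def topic_pagerank(n, edges, topic, eps=0.15, iters=200):
    """Iterate r_k = eps * p_k + (1 - eps) * A' r_k, with A'_ij = 1/outdegree(j) for (j, i) in E.

    Assumes every node has at least one out-link (no dangling nodes).
    """
    dout = [0] * n
    for j, _ in edges:
        dout[j] += 1
    p = [1.0 / len(topic) if j in topic else 0.0 for j in range(n)]   # (p_k)_j
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [eps * p[i] for i in range(n)]        # biased random jump
        for j, i in edges:                           # edge j -> i
            new[i] += (1 - eps) * r[j] / dout[j]
        r = new
    return r


def combined_score(vectors, weights):
    """Total authority score sum_k w_k r_k(d) for every document d."""
    n = len(vectors[0])
    return [sum(w * r[d] for w, r in zip(weights, vectors)) for d in range(n)]
```

On a 4-node ring with $T_k = \{0\}$, the iteration converges to $r_k(0) = \varepsilon / (1 - (1 - \varepsilon)^4)$, and the scores shrink by a factor 1 − ε at each step along the ring.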
