Link Analysis. Stony Brook University, CSE545, Fall 2016.
The Web, circa 1998. Two ways to find a page:
- Match keywords, language (information retrieval): easy to game with "term spam".
- Explore a directory: time-consuming; not open-ended.
Enter PageRank ...
PageRank. Key idea: consider the citations of a website in addition to its keywords. Who links to it? And what are their citations? This treats the Web as a directed graph.
- Innovation 1: What pages would a "random Web surfer" end up at?
- Innovation 2: Not just a page's own terms, but what terms are used by its citations?

Flow model: in-links (citations) as votes. But citations from important pages should count more, so use recursion to figure out how important each page is.

How to compute? Each page j has an importance (i.e. rank) r_j, where n_j is |out-links| of j. A page's rank is the sum of the votes flowing into it:

    r_j = Σ_{i → j} r_i / n_i
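To make the flow equation concrete, here is a minimal Python sketch; the rank values are made-up placeholders (not from the slides), and the link structure matches the worked example below (A has 1 out-link, B has 4, C has 2):

    # Flow equation: r_j = sum of r_i / n_i over the pages i that link to j.
    in_links = {"D": ["A", "B", "C"]}       # pages citing D
    out_degree = {"A": 1, "B": 4, "C": 2}   # n_i = |out-links| of page i
    rank = {"A": 0.3, "B": 0.4, "C": 0.3}   # hypothetical current rank estimates

    r_D = sum(rank[i] / out_degree[i] for i in in_links["D"])
    print(r_D)  # 0.3/1 + 0.4/4 + 0.3/2 = 0.55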
Example: pages A, B, C, and D, where D is cited by A, B, and C (n_A = 1, n_B = 4, n_C = 2):

    r_D = r_A/1 + r_B/4 + r_C/2

A system of equations? Writing one such equation per page provides intuition, but is impractical to solve at scale.
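For a toy graph this small, the system can in fact be solved directly; a numpy sketch, assuming the 4-node transition matrix M defined on the next slide, with the normalization sum(r) = 1 standing in for one redundant equation:

    import numpy as np

    # M[j, i] = 1/n_i if page i links to page j, else 0 (see next slide).
    M = np.array([[0,   1/2, 1, 0  ],
                  [1/3, 0,   0, 1/2],
                  [1/3, 0,   0, 1/2],
                  [1/3, 1/2, 0, 0  ]])

    # r = M·r  <=>  (M - I)·r = 0; the rows of M - I are linearly dependent,
    # so replace one row with the constraint sum(r) = 1.
    A = M - np.eye(4)
    A[-1, :] = 1.0
    b = np.array([0.0, 0.0, 0.0, 1.0])
    r = np.linalg.solve(A, b)
    print(r)  # [1/3, 2/9, 2/9, 2/9]: the exact ranks for this graph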
Innovation 1: What pages would a "random Web surfer" end up at? Encode the graph of A, B, C, D as a "Transition Matrix" M, where entry (j, i) is 1/n_i if page i links to page j, and 0 otherwise:

    to \ from    A     B     C     D
    A            0     1/2   1     0
    B            1/3   0     0     1/2
    C            1/3   0     0     1/2
    D            1/3   1/2   0     0

To start: N = 4 nodes, so r = [¼, ¼, ¼, ¼]
After the first iteration: M·r = [3/8, 5/24, 5/24, 5/24]
After the second iteration: M·(M·r) = M²·r = [15/48, 11/48, …]
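The iterations above can be reproduced numerically; a small numpy sketch:

    import numpy as np

    M = np.array([[0,   1/2, 1, 0  ],
                  [1/3, 0,   0, 1/2],
                  [1/3, 0,   0, 1/2],
                  [1/3, 1/2, 0, 0  ]])

    r = np.full(4, 1/4)  # initial ranks [1/4, 1/4, 1/4, 1/4]
    print(M @ r)         # [3/8, 5/24, 5/24, 5/24]
    print(M @ (M @ r))   # [15/48, 11/48, 11/48, 11/48]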
Power iteration algorithm:

    Initialize: t = 0; r[0] = [1/N, …, 1/N]; r[-1] = [0, …, 0]
    while err_norm(r[t], r[t-1]) > min_err:
        r[t+1] = M·r[t]
        t += 1
    solution = r[t]

    err_norm(v1, v2) = |v1 - v2|   # L1 norm
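A runnable version of the pseudocode, as a Python/numpy sketch (the function name and tolerance are mine, not from the slides):

    import numpy as np

    def pagerank_power_iteration(M, min_err=1e-9):
        """Repeat r <- M·r until the L1 change between iterations < min_err."""
        N = M.shape[0]
        r = np.full(N, 1.0 / N)   # r[0] = [1/N, ..., 1/N]
        r_prev = np.zeros(N)      # r[-1] = [0, ..., 0]
        while np.abs(r - r_prev).sum() > min_err:   # err_norm: L1 norm
            r_prev, r = r, M @ r
        return r

    M = np.array([[0,   1/2, 1, 0  ],
                  [1/3, 0,   0, 1/2],
                  [1/3, 0,   0, 1/2],
                  [1/3, 1/2, 0, 0  ]])
    print(pagerank_power_iteration(M))  # ~[0.333, 0.222, 0.222, 0.222]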
As err_norm gets smaller, we are moving toward r = M·r. Power iteration is really just finding an eigenvector of M: a vector x is an eigenvector of a matrix A, with eigenvalue λ, if A·x = λ·x. Here λ = 1, since the columns of M each sum to 1; thus 1·r = M·r, and the solution r is the eigenvector of M with eigenvalue 1.
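This can be checked with an off-the-shelf eigendecomposition (a numpy sanity-check sketch for the toy M above, not how PageRank is computed at scale):

    import numpy as np

    M = np.array([[0,   1/2, 1, 0  ],
                  [1/3, 0,   0, 1/2],
                  [1/3, 0,   0, 1/2],
                  [1/3, 1/2, 0, 0  ]])

    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)   # leading eigenvalue of column-stochastic M is 1
    print(vals[k].real)        # ~1.0
    v = vecs[:, k].real
    print(v / v.sum())         # rescaled to sum to 1: the power-iteration solution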