
Announcements:
- Thank you for your course feedback!
- Watch out for the homework 2 feedback poll
- Course project: TAs will reach out with feedback
- Regrade requests for HW1: deadline Thu next week at 23:59
- Today: HW2 due / HW3 release

4/27/20 Tim Althoff, UW CS547: Machine Learning for Big Data, http://www.cs.washington.edu/cse547

[Figure: example graph with PageRank scores (in %): A 3.3, B 38.4, C 34.3, D 3.9, E 8.1, F 3.9; five further nodes at 1.6 each]

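Example with β = 0.8 on the three nodes y, a, m (rows and columns ordered y, a, m; m links only to itself, i.e., it is a spider trap):

$$A = 0.8\,M + 0.2\left[\tfrac{1}{N}\right]_{N\times N} = 0.8\begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix} + 0.2\begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} = \begin{pmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{pmatrix}$$

where 7/15 = 0.8·½ + 0.2·⅓, 13/15 = 0.8 + 0.2·⅓ and 1/15 = 0.2·⅓.

Power iteration r = A r:

| Node | Iteration 0 | 1 | 2 | 3 | … | limit |
|---|---|---|---|---|---|---|
| y | 1/3 | 0.33 | 0.28 | 0.26 | … | 7/33 |
| a | 1/3 | 0.20 | 0.20 | 0.18 | … | 5/33 |
| m | 1/3 | 0.46 | 0.52 | 0.56 | … | 21/33 |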
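A minimal sketch of the y, a, m example above in plain Python (no libraries assumed). It builds the dense matrix A = βM + (1 − β)/N from M and β = 0.8 and repeats r ← A r from the uniform vector; the function names `google_matrix` and `power_iterate` and the stopping tolerance are our own choices, not part of the slides.

```python
def google_matrix(M, beta):
    """Return A = beta*M + (1-beta)*[1/N]_{NxN} for a column-stochastic M."""
    n = len(M)
    return [[beta * M[i][j] + (1 - beta) / n for j in range(n)] for i in range(n)]


def power_iterate(A, eps=1e-12, max_iter=1000):
    """Repeat r <- A r from the uniform vector until the L1 change is below eps."""
    n = len(A)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_new = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(r_new, r)) < eps:
            return r_new
        r = r_new
    return r


# Column j says where node j links; rows/columns are ordered y, a, m.
M = [
    [0.5, 0.5, 0.0],  # y receives from y and a
    [0.5, 0.0, 0.0],  # a receives from y
    [0.0, 0.5, 1.0],  # m receives from a and from itself (spider trap)
]
A = google_matrix(M, beta=0.8)
r = power_iterate(A)
print([round(x, 4) for x in r])  # [0.2121, 0.1515, 0.6364], i.e. 7/33, 5/33, 21/33
```

Building A densely is fine for three nodes but not at Web scale; the next slide's formulation avoids materializing A.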

- Input: Graph G and parameter β
  - Directed graph G (can have spider traps and dead ends)
  - Parameter β
- Output: PageRank vector r
  - Set: $r_j^{(0)} = \frac{1}{N}$, $t = 1$
  - Do:
    - $\forall j:\ r^{\prime(t)}_j = \sum_{i \to j} \beta \frac{r^{(t-1)}_i}{d_i}$, where $d_i$ is the out-degree of node $i$
    - $r^{\prime(t)}_j = 0$ if the in-degree of $j$ is 0
    - Now re-insert the leaked PageRank: $\forall j:\ r^{(t)}_j = r^{\prime(t)}_j + \frac{1 - S}{N}$, where $S = \sum_j r^{\prime(t)}_j$
    - $t = t + 1$
  - while $\sum_j \left| r^{(t)}_j - r^{(t-1)}_j \right| > \varepsilon$

If the graph has no dead ends, then the amount of leaked PageRank is 1 − β. But since we have dead ends, the amount of leaked PageRank may be larger. We have to explicitly account for it by computing S.
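The algorithm above can be sketched with sparse adjacency lists, assuming the graph is given as a dict from each node to the list of nodes it links to (an empty list marks a dead end). The name `pagerank`, the default β = 0.8 and the tolerance are illustrative choices. The usage runs it on the y, a, m spider-trap graph and on a hypothetical variant in which m is a dead end.

```python
def pagerank(out_links, beta=0.8, eps=1e-10, max_iter=1000):
    """PageRank with leaked-mass re-insertion (handles dead ends and spider traps).

    out_links maps every node to the list of nodes it links to; an empty list
    marks a dead end.
    """
    nodes = list(out_links)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}                  # r_j^(0) = 1/N
    for _ in range(max_iter):
        # r'_j = sum over i->j of beta * r_i / d_i; stays 0 if in-degree of j is 0
        r_prime = {v: 0.0 for v in nodes}
        for i, targets in out_links.items():
            for j in targets:                        # dead ends have no targets
                r_prime[j] += beta * r[i] / len(targets)
        s = sum(r_prime.values())                    # S = sum_j r'_j
        # Re-insert the leaked mass (1 - S) uniformly over all N nodes.
        r_new = {v: r_prime[v] + (1.0 - s) / n for v in nodes}
        diff = sum(abs(r_new[v] - r[v]) for v in nodes)
        r = r_new
        if diff < eps:                               # loop while the L1 change > eps
            break
    return r


trap = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}  # m is a spider trap
dead = {"y": ["y", "a"], "a": ["y", "m"], "m": []}     # m is a dead end
print({v: round(x, 4) for v, x in pagerank(trap).items()})  # {'y': 0.2121, 'a': 0.1515, 'm': 0.6364}
print({v: round(x, 4) for v, x in pagerank(dead).items()})  # {'y': 0.4321, 'a': 0.3086, 'm': 0.2593}
```

Without dead ends S equals β in every iteration, so this reproduces the Google-matrix result exactly; with the dead end, S is smaller and the extra leaked mass is still spread uniformly, keeping r a probability vector.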

- Measures generic popularity of a page
  - Will ignore/miss topic-specific authorities
  - Solution: Topic-Specific PageRank (next)
- Uses a single measure of importance
  - Other models of importance
  - Solution: Hubs-and-Authorities
- Susceptible to link spam
  - Artificial link topologies created in order to boost PageRank
  - Solution: TrustRank


- Instead of generic popularity, can we measure popularity within a topic?
- Goal: Evaluate Web pages not just according to their popularity, but also by how close they are to a particular topic, e.g. “sports” or “history”
- Allows search queries to be answered based on interests of the user
  - Example: Query “Trojan” wants different pages depending on whether you are interested in sports, history, or computer security

- Random walker has a small probability of teleporting at any step
- Teleport can go to:
  - Standard PageRank: any page with equal probability
    - To avoid dead-end and spider-trap problems
  - Topic-Specific PageRank: a topic-specific set of “relevant” pages (teleport set)
- Idea: Bias the random walk
  - When the walker teleports, she picks a page from a set S
  - S contains only pages that are relevant to the topic
    - E.g., Open Directory (DMOZ) pages for a given topic/query
  - For each teleport set S, we get a different vector r_S

- To make this work, all we need is to update the teleportation part of the PageRank formulation:

$$A_{ij} = \begin{cases} \beta M_{ij} + \dfrac{1-\beta}{|S|} & \text{if } i \in S \\ \beta M_{ij} + 0 & \text{otherwise} \end{cases}$$

  - A is a stochastic matrix!
- We weighted all pages in the teleport set S equally
  - Could also assign different weights to pages!
- Compute as for regular PageRank:
  - Multiply by M, then add a vector
  - Maintains sparseness
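A minimal sketch of "multiply by M, then add a vector", assuming a graph without dead ends given as adjacency lists. The teleport set is passed as a dict of non-negative weights that the function normalizes, so equal weights give (1 − β)/|S| per page in S and unequal weights cover the weighted variant. The function name is ours. The usage assumes the four-node example that follows has edges 1→2, 1→3, 2→1, 3→4, 4→3 (our reading of its figure).

```python
def topic_specific_pagerank(out_links, teleport, beta=0.8, eps=1e-12, max_iter=1000):
    """Power iteration for r = A r with A_ij = beta*M_ij + (1-beta)*w_i.

    out_links: node -> list of nodes it links to. Assumes no dead ends, so the
               column-stochastic M is well defined (our simplification).
    teleport:  node -> non-negative weight for the teleport set S. Weights are
               normalised, so equal weights give (1 - beta)/|S| per node in S.
    """
    total = float(sum(teleport.values()))
    w = {v: teleport.get(v, 0.0) / total for v in out_links}
    r = {v: 1.0 / len(out_links) for v in out_links}
    for _ in range(max_iter):
        # Sparse part: multiply by beta*M, touching only the existing edges ...
        r_new = {v: 0.0 for v in out_links}
        for src, targets in out_links.items():
            share = beta * r[src] / len(targets)
            for dst in targets:
                r_new[dst] += share
        # ... then add the teleport vector, which is nonzero only on S.
        for v in out_links:
            r_new[v] += (1.0 - beta) * w[v]
        diff = sum(abs(r_new[v] - r[v]) for v in out_links)
        r = r_new
        if diff < eps:
            break
    return r


# Assumed example graph: 1->2, 1->3, 2->1, 3->4, 4->3.
graph = {1: [2, 3], 2: [1], 3: [4], 4: [3]}
r = topic_specific_pagerank(graph, teleport={1: 1.0}, beta=0.8)
print({v: round(x, 3) for v, x in r.items()})  # {1: 0.294, 2: 0.118, 3: 0.327, 4: 0.261}
```

Passing `teleport={1: 1, 2: 1, 3: 1, 4: 1}` spreads teleports over every node, which for this graph is ordinary PageRank.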

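Suppose S = {1}, β = 0.8

[Figure: four-node graph. Edges 1→2 and 1→3 have transition probability 0.5 (0.4 after scaling by β); edges 2→1, 3→4 and 4→3 have probability 1 (0.8 after scaling); the teleport probability 0.2 goes to node 1.]

| Node | Iteration 0 | 1 | 2 | … | stable |
|---|---|---|---|---|---|
| 1 | 0.25 | 0.4 | 0.28 | … | 0.294 |
| 2 | 0.25 | 0.1 | 0.16 | … | 0.118 |
| 3 | 0.25 | 0.3 | 0.32 | … | 0.327 |
| 4 | 0.25 | 0.2 | 0.24 | … | 0.261 |

Varying the teleport set S (β = 0.8):

| S | β | r1 | r2 | r3 | r4 |
|---|---|---|---|---|---|
| {1,2,3,4} | 0.8 | 0.13 | 0.10 | 0.39 | 0.36 |
| {1,2,3} | 0.8 | 0.17 | 0.13 | 0.38 | 0.30 |
| {1,2} | 0.8 | 0.26 | 0.20 | 0.29 | 0.23 |
| {1} | 0.8 | 0.29 | 0.12 | 0.33 | 0.26 |

Varying β (S = {1}):

| S | β | r1 | r2 | r3 | r4 |
|---|---|---|---|---|---|
| {1} | 0.9 | 0.17 | 0.07 | 0.40 | 0.36 |
| {1} | 0.8 | 0.29 | 0.12 | 0.33 | 0.26 |
| {1} | 0.7 | 0.39 | 0.14 | 0.27 | 0.19 |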
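Assuming the example's edges are 1→2, 1→3, 2→1, 3→4 and 4→3 (a reading of the figure that reproduces the example's numbers), the "stable" values for S = {1}, β = 0.8 can be checked by solving r = A r directly:

$$
\begin{aligned}
r_1 &= 0.8\,r_2 + 0.2, \qquad r_2 = 0.4\,r_1, \qquad r_3 = 0.4\,r_1 + 0.8\,r_4, \qquad r_4 = 0.8\,r_3 \\
r_1 &= 0.32\,r_1 + 0.2 \;\Rightarrow\; r_1 = \tfrac{0.2}{0.68} = \tfrac{5}{17} \approx 0.294, \qquad r_2 = \tfrac{2}{17} \approx 0.118 \\
r_3 &= 0.4\,r_1 + 0.64\,r_3 \;\Rightarrow\; r_3 = \tfrac{2/17}{0.36} = \tfrac{50}{153} \approx 0.327, \qquad r_4 = \tfrac{40}{153} \approx 0.261
\end{aligned}
$$

As a sanity check, 45/153 + 18/153 + 50/153 + 40/153 = 1, so the result is a probability vector.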

- Create different PageRanks for different topics
  - The 16 DMOZ top-level categories:
    - Arts, Business, Sports, …
- Which topic ranking to use?
  - User can pick from a menu
  - Classify query into a topic
  - Can use the context of the query
    - E.g., query is launched from a web page talking about a known topic
    - History of queries, e.g., “basketball” followed by “Jordan”
    - User context, e.g., user’s bookmarks, …

Random Walk with Restarts: set S is a single node

[Tong-Faloutsos, ’06]

[Figure: example graph with nodes A through J joined by edges of weight 1]

a.k.a.: Relevance, Closeness, ‘Similarity’…

- Shortest path is not good:
  - No effect of degree-1 nodes (E, F, G)!
  - Multi-faceted relationships

- Network flow is not good:
  - Does not punish long paths

- Need a method that considers:
  - Multiple connections
  - Multiple paths
  - Direct and indirect connections
  - Degree of the node

- SimRank: Random walks from a fixed node on k-partite graphs

  [Figure: k-partite graph with node types Conferences, Tags, Authors]

- Setting: k-partite graph with k types of nodes
  - E.g.: Authors, Conferences, Tags
- Topic-Specific PageRank from node u: teleport set S = {u}
- Resulting scores measure similarity/proximity to node u
- Problem:
  - Must be done once for each node u
  - Only suitable for sub-Web-scale applications

Q: What is the most related conference to ICDM?

A: Topic-Specific PageRank with teleport set S = {ICDM}

[Figure: bipartite Conference–Author graph; conferences include IJCAI, KDD, ICDM, SDM, AAAI, NIPS, …; authors include Philip S. Yu, Ning Zhong, R. Ramakrishnan, M. Jordan, …]
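To make "Topic-Specific PageRank with S = {u}" concrete, here is a small sketch on a bipartite author–conference graph. The real ICDM graph's edges are not recoverable here, so the node names and edges below are hypothetical, and `proximity` / `build_undirected` are our own helper names. Each iteration moves every node's score to its neighbours in equal shares (scaled by β) and sends all teleport mass back to u.

```python
def build_undirected(edges):
    """Adjacency lists for an undirected (here: bipartite) graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def proximity(adj, u, beta=0.8, n_iter=200):
    """Topic-Specific PageRank with teleport set S = {u}.

    Iterates r <- beta*M r + (1 - beta)*e_u, where M moves a node's score to
    its neighbours in equal shares. Higher score = closer to u.
    """
    r = {v: (1.0 if v == u else 0.0) for v in adj}
    for _ in range(n_iter):
        nxt = {v: 0.0 for v in adj}
        for src, nbrs in adj.items():
            share = beta * r[src] / len(nbrs)
            for dst in nbrs:
                nxt[dst] += share
        nxt[u] += 1.0 - beta                     # every teleport returns to u
        r = nxt
    return r


# Hypothetical author-conference edges ("author published at conference").
edges = [
    ("author_1", "conf_A"), ("author_1", "conf_B"),
    ("author_2", "conf_A"), ("author_2", "conf_B"),
    ("author_3", "conf_A"), ("author_3", "conf_C"),
    ("author_4", "conf_C"), ("author_4", "conf_D"),
]
adj = build_undirected(edges)
scores = proximity(adj, "conf_A")
others = [v for v in adj if v.startswith("conf_") and v != "conf_A"]
print(sorted(others, key=scores.get, reverse=True))  # ['conf_B', 'conf_C', 'conf_D']
```

conf_B shares two authors with conf_A, conf_C shares one, and conf_D is reachable only through conf_C, so the ranking reflects both the number of connecting paths and their length, as the preceding slides ask for. Each query node u needs its own run, which is the scalability problem noted above.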

[Figure: a Pin and a Board]

- Pins belong to Boards

[Figures: two examples, each showing an Input and the resulting Recommendations]

- Idea:
  - Every node has some importance
  - Importance gets evenly split among all edges and pushed to the neighbors
- Given a set of QUERY NODES Q, simulate a random walk:

  [Figure: random walk starting from the query nodes Q]
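The simulation idea can be sketched as a Monte Carlo random walk with restarts on a pin–board bipartite graph: from a pin, pick one of its boards uniformly, then one of that board's pins uniformly (the "evenly split" rule), and with some probability jump back to a query pin. The restart probability, step count, function name and toy graph are all assumptions for illustration, not the parameters of any production system.

```python
import random
from collections import Counter


def random_walk_with_restarts(pin_to_boards, board_to_pins, query_pins,
                              n_steps=100_000, restart_prob=0.5, seed=0):
    """Monte Carlo random walk with restarts on a pin-board bipartite graph.

    Each step either jumps back to a random query pin (probability
    restart_prob) or moves pin -> random board of that pin -> random pin of
    that board; the pin it lands on gets one visit. Visit counts approximate
    proximity to the query set Q.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = rng.choice(query_pins)
    for _ in range(n_steps):
        if rng.random() < restart_prob:
            current = rng.choice(query_pins)         # restart at one of the query nodes
            continue
        board = rng.choice(pin_to_boards[current])   # uniform over the pin's edges
        current = rng.choice(board_to_pins[board])   # uniform over the board's edges
        visits[current] += 1
    return visits


# Hypothetical toy graph: p4 and p5 form a separate component.
pin_to_boards = {"p1": ["b1"], "p2": ["b1", "b2"], "p3": ["b2"],
                 "p4": ["b3"], "p5": ["b3"]}
board_to_pins = {}
for pin, boards in pin_to_boards.items():
    for board in boards:
        board_to_pins.setdefault(board, []).append(pin)

visits = random_walk_with_restarts(pin_to_boards, board_to_pins, query_pins=["p1"])
print(visits.most_common(3))
```

Pins that are never reachable from Q (here p4 and p5) get zero visits, and pins that share more boards with the query accumulate more visits than those further away.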


- Proximity to query node(s) Q:

  [Figure: pins and boards around the query node Q annotated with visit counts; boards include “Yummm Strawberries Smoothies” and “Smoothie Madness!•!•!•!”]

- Pixie:
  - Outputs top 1k pins with highest visit count

Extensions:
- Weighted edges:
  - The walk prefers to traverse certain edges:
    - Edges to pins in your local language
- Early stopping:
  - Don’t need to walk a fixed big number of steps
  - Walk until 1k-th pin has at least 20 visits
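The two extensions can be sketched on top of a plain random walk with restarts. This is not Pixie's actual implementation: the function name, the 3:1 language bias, the restart probability and the toy data are hypothetical, and k = 2 stands in for the slide's 1k pins so the example stays tiny (n_v = 20 visits is kept). Board-to-pin moves use `random.choices` with per-pin weights, and the stopping rule is checked every `check_every` steps.

```python
import random
from collections import Counter


def weighted_walk_early_stop(board_to_pins, pin_to_boards, query_pin, weight,
                             k=2, n_v=20, restart_prob=0.5,
                             max_steps=1_000_000, check_every=100, seed=0):
    """Random walk with restarts plus weighted edges and early stopping.

    - Weighted edges: board -> pin moves are drawn with probability
      proportional to weight(pin), e.g. to prefer pins in the user's language.
    - Early stopping: every check_every steps, stop as soon as the k-th most
      visited pin has at least n_v visits.
    Returns (visit counts, number of steps taken).
    """
    rng = random.Random(seed)
    visits = Counter()
    current = query_pin
    for step in range(1, max_steps + 1):
        if rng.random() < restart_prob:
            current = query_pin                      # restart at the query pin
        else:
            board = rng.choice(pin_to_boards[current])
            pins = board_to_pins[board]
            current = rng.choices(pins, weights=[weight(p) for p in pins])[0]
            visits[current] += 1
        if step % check_every == 0:
            top = visits.most_common(k)
            if len(top) == k and top[-1][1] >= n_v:  # k-th pin reached n_v visits
                return visits, step
    return visits, max_steps


# Hypothetical toy data: boards, their pins, and each pin's language.
board_to_pins = {"b1": ["p1", "p2", "p3"], "b2": ["p2", "p4"]}
pin_to_boards = {}
for board, pins in board_to_pins.items():
    for pin in pins:
        pin_to_boards.setdefault(pin, []).append(board)
language = {"p1": "en", "p2": "en", "p3": "fr", "p4": "en"}


def prefer_english(pin):
    return 3.0 if language[pin] == "en" else 1.0     # arbitrary 3:1 bias


visits, steps = weighted_walk_early_stop(board_to_pins, pin_to_boards, "p1", prefer_english)
print(steps, visits.most_common(2))
```

Checking the stopping rule only every `check_every` steps keeps the cost of `most_common` off the hot path; the walk length then adapts to how quickly the top results stabilize instead of being fixed in advance.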
