1. Web Dynamics Part 7 – Human Behaviour on the Web
7.1 Recommendation
7.2 Personalized Search
Summer Term 2010

2. High-Level View of Recommendation
Input: collected data on the behavior of users
• Items (books, DVDs, CDs, …) purchased
• Items (books, movies, hotels, …) rated
• Web sites browsed or bookmarked
• Searches and clicked search results
• Sequences of activities (browsing, searching, …)
• Mails and documents read and written
• Profile in social networks (contacts)
⇒ build extensive user models

3. High-Level View of Recommendation
Output: items of potential interest to the user
• Items (books, movies, hotels, …) to purchase/view/visit/…
• Web sites to visit
• Improved search results
• Potential query expansions/refinements
• People to meet in social networks

4. Three Orthogonal Approaches
User-centric approach („nearest neighbors"):
If (the model of) user B is similar to (the model of) user A, and user A likes/buys/visits item X, then user B may like item X as well.
Item-centric approach:
If item Y is similar to item X, and user A likes/buys/visits item X, then user A may like item Y as well.
Static approach:
Many people buy X ⇒ recommend X to everyone.

5. Example 1: Web Site Suggestion

6. Example 1: Web Site Suggestion
⇒ item-centric approach, (seemingly) no user model used

7. Example 2: Product Recommendations
⇒ static and item-centric approach

8. Example 2: Product Recommendation

9. Example 3: Book Recommendations

10. Towards User-Centric Recommendations
Assume n users U and m items I. Model the user-item relation as an n × m matrix V:
• V ∈ {0,1}^(n×m): binary purchase matrix
• V ∈ [min,max]^(n×m): quantified preference matrix
Both are very sparse! (LibraryThing: 1,000,000 users, 52 million books, fewer than 200 books for most users ⇒ about 0.0004% non-zero entries)
„Semantics": v_ij is seen as the „vote" of user i for item j
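The sparsity noted above is why such a matrix is never stored densely. A minimal sketch, storing the binary purchase matrix as a dict from user to the set of purchased items (the user and item names are made up for illustration):

```python
# Sparse representation of a binary purchase matrix:
# each user maps to the set of item ids with v_ij = 1.
purchases = {
    "alice": {"book1", "book2"},
    "bob":   {"book2", "book3"},
}

def density(purchases, n_items):
    """Fraction of non-zero entries in the implicit n x m matrix."""
    nonzero = sum(len(items) for items in purchases.values())
    return nonzero / (len(purchases) * n_items)

# With 2 users and 4 items, 4 of the 8 cells are non-zero:
print(density(purchases, 4))  # 0.5
```

At LibraryThing's scale the same dict would hold well under 200 entries per user, which is what makes the 0.0004% density manageable.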

11. Recommendation Problem
Inputs:
• the votes of user u for the items I_u
• the votes of all other users
Goal: predict the votes of u for the items in I \ I_u (to identify the items with the highest votes)
⇒ yields a scalability problem (|I| is large!)

12. Vote Prediction
Initial vote calibration (to remove per-user bias):
v*_ij = v_ij − v̄_i   with   v̄_i = (1/|I_i|) · Σ_{j∈I_i} v_ij
Predict the vote of user u for item j as a weighted average over the calibrated votes of all other users:
v̂_uj = v̄_u + (1/C) · Σ_{i=1}^{n} w_ui · v*_ij   with   C = Σ_{i=1}^{n} |w_ui|
where w_ui is the similarity of users u and i.
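A hedged sketch of the two formulas above (the names and the tiny vote dictionary are illustrative): mean-center each user's votes, then predict user u's vote for item j as u's mean plus the similarity-weighted average of the other users' centered votes.

```python
def mean_vote(votes):
    """v_bar_i for a user's votes, given as a dict item -> vote."""
    return sum(votes.values()) / len(votes)

def predict(u, j, all_votes, weights):
    """all_votes: user -> {item: vote}; weights: other user i -> w_ui."""
    v_bar_u = mean_vote(all_votes[u])
    num, norm = 0.0, 0.0
    for i, w in weights.items():
        if i == u or j not in all_votes[i]:
            continue                                   # user i cast no vote for j
        centered = all_votes[i][j] - mean_vote(all_votes[i])   # v*_ij
        num += w * centered
        norm += abs(w)                                 # accumulates C
    return v_bar_u + (num / norm if norm else 0.0)

votes = {"u": {"a": 4, "b": 2}, "i1": {"a": 5, "b": 1, "c": 5}}
print(predict("u", "c", votes, {"i1": 1.0}))  # ≈ 4.333
```

With a single neighbor of weight 1, the prediction is just v̄_u plus that neighbor's centered vote for c.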

13. Estimating User-User Similarity
• Correlation-based similarity:
w_ai = (1/C) · Σ_{j∈I_a∩I_i} (v_aj − v̄_a)(v_ij − v̄_i)
with C = [ Σ_{j∈I_a∩I_i} (v_aj − v̄_a)² · Σ_{j∈I_a∩I_i} (v_ij − v̄_i)² ]^(1/2)
Results are unreliable if the overlap between users a and i is small.
• Vector similarity (cosine):
w_ai = Σ_{j∈I_a∩I_i} v_aj · v_ij / ( sqrt(Σ_{k∈I_a} v_ak²) · sqrt(Σ_{k∈I_i} v_ik²) )
Remaining problem: high dimensionality (number of users and items)
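Both measures can be sketched directly over vote dictionaries; the example votes below are made up. The correlation version centers votes by each user's mean and sums over the common items only, as in the formula above.

```python
import math

def pearson(va, vi):
    """Correlation-based similarity of two users' vote dicts."""
    common = set(va) & set(vi)
    if not common:
        return 0.0                     # no overlap: similarity unreliable
    ma = sum(va.values()) / len(va)    # v_bar_a
    mi = sum(vi.values()) / len(vi)    # v_bar_i
    num = sum((va[j] - ma) * (vi[j] - mi) for j in common)
    c = math.sqrt(sum((va[j] - ma) ** 2 for j in common)
                  * sum((vi[j] - mi) ** 2 for j in common))
    return num / c if c else 0.0

def cosine(va, vi):
    """Vector (cosine) similarity of two users' vote dicts."""
    num = sum(va[j] * vi[j] for j in set(va) & set(vi))
    na = math.sqrt(sum(v * v for v in va.values()))
    ni = math.sqrt(sum(v * v for v in vi.values()))
    return num / (na * ni)

a = {"x": 5, "y": 1}
b = {"x": 4, "y": 2}
print(cosine(a, b), pearson(a, b))
```

Note how the two users agree perfectly under correlation (both rank x above y) while cosine also reflects the absolute vote magnitudes.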

14. Reducing Dimensionality: SVD
Replace V by a rank-k approximation of V using SVD:
V = A × S × Bᵀ
A: user-concept similarity matrix (n × r)
S: diagonal matrix of singular values (with r non-zero entries, where r = rank(V)), corresponding to topics
Bᵀ: concept-item similarity matrix (r × m)
Additionally, restrict to the k largest singular values to further reduce dimensionality.
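A minimal sketch of the rank-k approximation using NumPy (the 3 × 3 matrix is an arbitrary example): `numpy.linalg.svd` returns the three factors with singular values in descending order, so truncation just keeps the first k columns, values, and rows.

```python
import numpy as np

def rank_k_approx(V, k):
    """Best rank-k approximation of V (in the least-squares sense)."""
    A, s, Bt = np.linalg.svd(V, full_matrices=False)
    return A[:, :k] @ np.diag(s[:k]) @ Bt[:k, :]

V = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
V1 = rank_k_approx(V, 1)
print(np.round(V1, 3))
```

Keeping all r singular values reproduces V exactly; truncating to k < r yields the dimensionality-reduced matrix used for prediction.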

15. SVD Example

    V  =  [ 1 1 1 0 0 0 ]
          [ 1 0 1 0 0 0 ]
          [ 0 1 1 0 0 0 ]
          [ 0 0 0 1 1 1 ]
          [ 0 0 0 0 1 1 ]

V = A × S × Bᵀ with (entries rounded to three decimals):

    A  =  [ 0.707  0      0      0      0.707 ]
          [ 0.5    0      0.707  0     -0.5   ]
          [ 0.5    0     -0.707  0     -0.5   ]
          [ 0      0.788  0      0.615  0     ]
          [ 0      0.615  0     -0.788  0     ]

    S  =  diag(2.414, 2.136, 1, 0.662, 0.414)

    Bᵀ =  [ 0.5    0.5    0.707  0      0      0     ]
          [ 0      0      0      0.369  0.657  0.657 ]
          [ 0.707 -0.707  0      0      0      0     ]
          [ 0      0      0      0.929 -0.261 -0.261 ]
          [ 0.5    0.5   -0.707  0      0      0     ]

16. SVD Example (continued)
Restricting to the k = 2 largest singular values (2.414 and 2.136) keeps only the first two columns of A and the first two rows of Bᵀ:
A_2 = [0.707 0; 0.5 0; 0.5 0; 0 0.788; 0 0.615],
S_2 = diag(2.414, 2.136),
B_2ᵀ = [0.5 0.5 0.707 0 0 0; 0 0 0 0.369 0.657 0.657],
giving

    V  ≈  A_2 × S_2 × B_2ᵀ  =  [ 0.854  0.854  1.207  0      0      0     ]
                               [ 0.604  0.604  0.854  0      0      0     ]
                               [ 0.604  0.604  0.854  0      0      0     ]
                               [ 0      0      0      0.621  1.106  1.106 ]
                               [ 0      0      0      0.485  0.864  0.864 ]
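The example can be checked numerically: the sketch below recomputes the SVD of the slide's matrix V and its rank-2 reconstruction, confirming the top two singular values of about 2.414 and 2.136 and the two user/item blocks.

```python
import numpy as np

V = np.array([[1, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)

A, s, Bt = np.linalg.svd(V, full_matrices=False)
# Rank-2 approximation: keep the 2 largest singular values.
V2 = A[:, :2] * s[:2] @ Bt[:2, :]
print(np.round(s, 3))    # singular values, largest first
print(np.round(V2, 3))   # the rank-2 block structure
```

The off-diagonal blocks of V2 are exactly zero here because the first two singular vectors separate the two disjoint user/item groups.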

17. Recommendations with SVD
• Predict votes on A, not on V ⇒ compute an estimate v′_uj for each topic j
• Extend the vote estimate from topics back to items:
v̂_ui = Σ_{j=1}^{k} v′_uj · S_jj · Bᵀ_ji
New issue: maintaining the SVD when the data changes.
The SVD generates an implicit clustering of items.

18. Reducing Dimensionality: Clustering
• Reduce the number of users by precomputing K clusters of similar users
• Represent each cluster P by its centroid c(P):
c(P)_i = (1/|P|) · Σ_{u∈P} v_ui
• For prediction:
– assign the user to one of the clusters
– compute the „nearest neighbor" prediction over clusters instead of users
• Potential problem: users may belong to multiple clusters
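The two steps above can be sketched as follows (the vote vectors, squared-distance assignment, and hard single-cluster membership are illustrative choices, not prescribed by the slides):

```python
def centroid(cluster):
    """c(P): componentwise mean of the cluster's vote vectors."""
    m = len(cluster[0])
    return [sum(v[i] for v in cluster) / len(cluster) for i in range(m)]

def nearest(user, centroids):
    """Assign a user's vote vector to the closest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: dist(user, c))

clusters = [[[5, 4, 0], [4, 5, 1]],     # users who like items 1-2
            [[0, 1, 5], [1, 0, 4]]]     # users who like item 3
centroids = [centroid(c) for c in clusters]
print(nearest([5, 5, 0], centroids))    # centroid of the first cluster
```

Prediction then runs over the K centroids instead of all n users; soft (multi-cluster) assignment would address the problem noted in the last bullet.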

19. User-Centric Recommendation is Expensive
• User actions are highly dynamic
– difficult to precompute and maintain similarities
– the best recommendations are based on items just bought
• One recommendation takes time O(n+m):
– needs to scan all users and their items
– most users have ≤ C1 items
– few users (at most C2 of them) have > C1 items
– cost bounded by (n−C2)·C1 + C2·m = O(n+m)
– n and m are large
• Recommendations need to be computed in real time (≤ 200 ms)

20. Item-Centric Recommendations
Observation: relationships between items (i.e., correlations in purchases) are much less dynamic than relationships between users:
– information from yesterday is still reasonably accurate today
– not recommending brand-new items is tolerable
Predict the vote of user u for item j as a weighted average over the votes of user u for the other items:
v̂_uj = v̄_u + (1/C) · Σ_{i=1}^{m} w_ji · v*_ui   with   C = Σ_{i=1}^{m} |w_ji|
where w_ji is the similarity of items j and i.
Requires only limited knowledge about the user.

21. Estimating Item-Item Similarity
Use correlation-based or cosine similarity (analogous to user-user similarity).
Example: cosine similarity
w_ji = Σ_{u∈U} v_uj · v_ui / ( sqrt(Σ_{k∈U} v_kj²) · sqrt(Σ_{k∈U} v_ki²) )
Computing all similarities is expensive (O(m²·n)), but can be done offline.
Computing predictions is cheap (O(m) if only a constant number of items is considered).
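A sketch of the offline step: item-item cosine similarity computed over the columns of a small example vote matrix (the matrix itself is made up). Each of the m² pairs scans all n users, matching the O(m²·n) bound.

```python
import math

def item_cosine(V, j, i):
    """Cosine similarity of item columns j and i of vote matrix V."""
    num = sum(row[j] * row[i] for row in V)
    nj = math.sqrt(sum(row[j] ** 2 for row in V))
    ni = math.sqrt(sum(row[i] ** 2 for row in V))
    return num / (nj * ni) if nj and ni else 0.0

V = [[5, 4, 0],     # rows: users, columns: items
     [4, 5, 1],
     [0, 1, 5]]
m = len(V[0])
# Offline O(m^2 * n) pass over all item pairs:
sim = [[item_cosine(V, j, i) for i in range(m)] for j in range(m)]
print(round(sim[0][1], 3))   # items 0 and 1 are bought together
```

Online, a prediction for one user only needs the precomputed row sim[j], which is the O(m) step noted above.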

22. Using Search to Recommend
Assume we can identify features of items (genre, actors, director, keywords, …):
• identify frequent/characteristic features of the user's items
• submit a search for those features and recommend the results
Problems:
• does not scale well when the user owns many items
• does not provide good recommendations

23. Probabilistic Models for Recommendation
Consider the joint probability distribution over the m-dimensional set of items (binary preferences):
P[v_1 … v_m]: probability that a random user has vote vector (v_1, …, v_m)
Predict the unknown value v_ui as P[v_i = 1 | v_j = 1 for j ∈ I_u].
Impossible to maintain explicitly (2^m parameters!) ⇒ approximate through a finite mixture:
P[v_1 … v_m] ≈ Σ_{k=1}^{K} P[v_1 … v_m | c = k] · P[c = k]
and assume independence of items within each component:
P[v_1 … v_m | c = k] = Π_{j=1}^{m} P[v_j | c = k]
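A minimal sketch of evaluating the mixture for one binary vote vector; the priors and per-item probabilities below are invented parameters, not fitted values (fitting them, e.g. with EM, is a separate step the slide does not cover).

```python
def mixture_prob(v, priors, item_probs):
    """Mixture probability of a binary vote vector v.
    priors[k] = P[c = k]; item_probs[k][j] = P[v_j = 1 | c = k]."""
    total = 0.0
    for k, pk in enumerate(priors):
        p = pk
        for j, vj in enumerate(v):
            pj = item_probs[k][j]
            p *= pj if vj else (1 - pj)   # independence within component k
        total += p
    return total

priors = [0.5, 0.5]
item_probs = [[0.9, 0.9, 0.1],   # component 0: likes items 1-2
              [0.1, 0.1, 0.9]]   # component 1: likes item 3
print(mixture_prob([1, 1, 0], priors, item_probs))
```

Only K·(m+1) parameters are stored instead of 2^m, which is the point of the approximation.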

24. Evaluating Recommender Systems
Goal: out of several recommendation algorithms, determine which gives the best recommendations.
Required components of such a benchmark:
• a set of (user, item, rating) tuples for training (known to the algorithm in advance)
• a set of (user, item, rating) tuples for testing (where the algorithm must predict the rating)
– can be offline (a held-out part of the data) or a live user experiment
• metrics for quantifying result quality
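The slide does not name specific metrics; two standard choices for rating prediction are mean absolute error and root mean squared error over the withheld test ratings, sketched below on made-up predictions.

```python
import math

def mae(pred, truth):
    """Mean absolute error between predicted and withheld ratings."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    """Root mean squared error; penalizes large errors more strongly."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

pred  = [4.0, 3.5, 2.0]   # algorithm's predictions on the test tuples
truth = [5.0, 3.0, 2.0]   # withheld ratings
print(mae(pred, truth), rmse(pred, truth))   # 0.5 and ≈ 0.645
```

For top-N recommendation lists, ranking metrics (e.g. precision at N) are used instead, since absolute rating error matters less there.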
