Recommendation Systems – part 2
School for advanced sciences of Luchon, 2015
Debora Donato debora@stumbleupon.com
Today's presentation
• Similarity-based methods
  – User similarity
  – Item similarity
• Similarity score
  – Rating-based similarity
  – Structural similarity
• Serendipitous recommendations
  – LDA
Similarity-based methods
• Also known as memory-based collaborative filtering.
• Divided into two main classes:
  – User similarity: people who agreed in their past evaluations tend to agree again in their future evaluations.
  – Item similarity: users tend to like objects that are similar to those they have collected before.
User similarity • For a given user, find other similar users whose ratings strongly correlate with the current user. • Recommend items rated highly by these similar users, but not rated by the current user.
User-similarity method
• Weight all users with respect to similarity with the active user.
• Select a subset of the users (neighbors) to use as predictors.
• Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
• Present items with highest predicted ratings as recommendations.
Neighbor selection
• Let $s_{uv}$ denote the similarity score between user u and user v.
• To select the set $\hat U_u$ of users that are most similar to user u, there are two neighborhood selection strategies:
  1. maximum number of neighbors: use the k users most similar to u based on similarity score;
  2. correlation threshold: select all the users whose similarity weight is above a given threshold.
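A minimal sketch of the two selection strategies, assuming `sims` is a hypothetical dict that maps each candidate user v to a precomputed score $s_{uv}$:

```python
def top_k_neighbors(sims, k):
    """Strategy 1: keep the k users most similar to u."""
    return sorted(sims, key=sims.get, reverse=True)[:k]

def threshold_neighbors(sims, threshold):
    """Strategy 2: keep every user whose similarity exceeds the threshold."""
    return [v for v, s in sims.items() if s > threshold]

sims = {"v1": 0.9, "v2": 0.4, "v3": 0.75, "v4": 0.1}
print(top_k_neighbors(sims, k=2))       # ['v1', 'v3']
print(threshold_neighbors(sims, 0.5))   # ['v1', 'v3']
```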
User-similarity ratings prediction
The predicted rating of user u on object α is

$$\hat r_{u\alpha} = \bar r_u + k \sum_{v \in \hat U_u} s_{uv} \left( r_{v\alpha} - \bar r_v \right)$$

where
• $r_{u\alpha}$: rating from user u on object α
• $\Gamma_u$: set of objects that user u has evaluated
• $\bar r_u = \frac{1}{|\Gamma_u|} \sum_{\alpha \in \Gamma_u} r_{u\alpha}$: average rating given by u
• $k = 1 / \sum_{v \in \hat U_u} |s_{uv}|$: normalization factor
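A minimal sketch of this prediction formula; the container layout (`ratings[v]` as a dict of {object: rating}, `sims[v]` = $s_{uv}$, `neighbors` = $\hat U_u$) is chosen only for illustration:

```python
import numpy as np

def predict_user_based(u, alpha, ratings, sims, neighbors):
    """Predicted rating of user u on object alpha (formula above)."""
    mean_u = np.mean(list(ratings[u].values()))           # \bar{r}_u
    # Keep only neighbors that actually rated alpha.
    rated = [v for v in neighbors if alpha in ratings[v]]
    if not rated:
        return mean_u                                     # fall back to u's mean
    k = 1.0 / sum(abs(sims[v]) for v in rated)            # normalization factor
    dev = sum(sims[v] * (ratings[v][alpha] - np.mean(list(ratings[v].values())))
              for v in rated)                             # weighted deviations
    return mean_u + k * dev
```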
Item-similarity ratings prediction
The predicted rating of user u on object α is

$$\hat r_{u\alpha} = \frac{\sum_{\beta \in \Gamma_u} s_{\alpha\beta}\, r_{u\beta}}{\sum_{\beta \in \Gamma_u} s_{\alpha\beta}}$$

where
• $s_{\alpha\beta}$: item-item similarity score
• $\Gamma_u$: set of objects that user u has evaluated
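The item-based counterpart, again as a sketch: it assumes a hypothetical `item_sims` dict that holds a score $s_{\alpha\beta}$ for every pair it is asked about:

```python
def predict_item_based(u, alpha, ratings, item_sims):
    """Similarity-weighted average over the items in Γ_u that u has rated."""
    num = sum(item_sims[(alpha, beta)] * r for beta, r in ratings[u].items())
    den = sum(item_sims[(alpha, beta)] for beta in ratings[u])
    return num / den if den else 0.0
```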
Similarity score
• Computing the similarity of users/objects is the key problem.
• Two scenarios:
  – Ratings available -> correlation metrics
  – No ratings available -> structural properties of the input data
• External information such as users' attributes, tags and objects' content meta-information can also be utilized.
Cosine index
• Used when explicit rating information is available (e.g., 5 levels from 1 to 5):

$$s^{\cos}_{xy} = \frac{\vec r_x \cdot \vec r_y}{\|\vec r_x\|\,\|\vec r_y\|}$$

where
– for user similarity, $\vec r_x$ and $\vec r_y$ are rating vectors in the N-dimensional object space;
– for item similarity, $\vec r_x$ and $\vec r_y$ are rating vectors in the N-dimensional user space.
• It is important to take users' rating 'tendencies' (individual biases) into consideration.
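A minimal sketch, encoding unrated entries as 0 as is common in this setting:

```python
import numpy as np

def cosine_similarity(r_x, r_y):
    """Cosine index between two rating vectors (users in object space,
    or items in user space)."""
    denom = np.linalg.norm(r_x) * np.linalg.norm(r_y)
    return float(r_x @ r_y / denom) if denom else 0.0

print(cosine_similarity(np.array([5, 3, 0, 1]), np.array([4, 0, 0, 1])))
```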
Pearson coefficient in the user space
• Pearson coefficient for measuring rating correlation between users u and v:

$$s^{PC}_{uv} = \frac{\sum_{\alpha \in O_{uv}} (r_{u\alpha} - \bar r_u)(r_{v\alpha} - \bar r_v)}{\sqrt{\sum_{\alpha \in O_{uv}} (r_{u\alpha} - \bar r_u)^2}\sqrt{\sum_{\alpha \in O_{uv}} (r_{v\alpha} - \bar r_v)^2}}$$

where
– $O_{uv} = \Gamma_u \cap \Gamma_v$ is the set of items rated by both u and v.
Pearson coefficient in the item space
• Pearson coefficient for measuring rating correlation between items α and β:

$$s^{PC}_{\alpha\beta} = \frac{\sum_{u \in U_{\alpha\beta}} (r_{u\alpha} - \bar r_\alpha)(r_{u\beta} - \bar r_\beta)}{\sqrt{\sum_{u \in U_{\alpha\beta}} (r_{u\alpha} - \bar r_\alpha)^2}\sqrt{\sum_{u \in U_{\alpha\beta}} (r_{u\beta} - \bar r_\beta)^2}}$$

where
– $U_{\alpha\beta}$ is the set of users who rated both α and β.
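A single sketch covers both spaces: fed two users' rating rows it computes $s^{PC}_{uv}$; fed two items' rating columns it computes $s^{PC}_{\alpha\beta}$. Missing ratings are marked with `np.nan`, and the means are taken over the co-rated set only (one common convention):

```python
import numpy as np

def pearson_similarity(r_a, r_b):
    """Pearson coefficient restricted to co-rated entries (O_uv or U_ab)."""
    both = ~np.isnan(r_a) & ~np.isnan(r_b)   # co-rated mask
    if both.sum() < 2:
        return 0.0                            # not enough overlap to correlate
    da = r_a[both] - r_a[both].mean()
    db = r_b[both] - r_b[both].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom else 0.0
```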
Correlation coefficient properties
• Also used for binary vectors
  – Amazon use case: "Users who bought this also bought"
• Constrained Pearson coefficient
  – Takes positive and negative ratings into consideration
  – $\bar r_x$ is substituted by the "central rating" (3 stars on a 1-5 scale)
• Weighted Pearson coefficient
  – Captures confidence in the correlation:

$$s^{WPC}_{uv} = \begin{cases} s^{PC}_{uv} \cdot \dfrac{|O_{uv}|}{H} & \text{for } |O_{uv}| \le H \\[4pt] s^{PC}_{uv} & \text{otherwise} \end{cases}$$
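Sketches of both variants; the threshold H=50 is an arbitrary illustrative default, not a value from the slides:

```python
import numpy as np

def constrained_pearson(r_u, r_v, central=3.0):
    """Constrained Pearson: deviations from the central rating (3 stars)
    instead of each user's mean; np.nan marks missing ratings."""
    both = ~np.isnan(r_u) & ~np.isnan(r_v)
    du, dv = r_u[both] - central, r_v[both] - central
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom else 0.0

def weighted_pearson(s_pc, n_common, H=50):
    """Weighted Pearson: shrink the score when |O_uv| is below H."""
    return s_pc * n_common / H if n_common <= H else s_pc
```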
Structural similarity
• Similarity can also be defined using external attributes such as tag and content information (often difficult to obtain).
• Structural similarity exploits only the structure of the data network.
• For sparse data, structural similarity outperforms correlation-based metrics.
• Computed by projecting the rating bipartite network onto a monopartite user-user or item-item network.
Node-dependent similarity
The node similarity is given by the number of Common Neighbors (CN). Many possible variations:
• Salton Index, Jaccard Index, Sørensen Index, Hub Promoted Index (HPI), Hub Depressed Index (HDI) and Leicht-Holme-Newman Index (LHN1)
• Variations that reward less-connected neighbors with a higher weight: Adamic-Adar Index (AA) and Resource Allocation Index (RA)
• The Preferential Attachment Index (PA) builds on the classical preferential attachment rule in network science
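A sketch of several of these local indices computed from neighbor sets alone; it assumes every common neighbor has degree > 1 so the Adamic-Adar log term is defined:

```python
import math

def node_similarities(neighbors, degree, x, y):
    """neighbors[i] is the set Γ_i; degree[i] = k_i = |Γ_i|."""
    common = neighbors[x] & neighbors[y]
    cn = len(common)                                      # Common Neighbors
    jaccard = cn / len(neighbors[x] | neighbors[y])       # Jaccard
    salton = cn / math.sqrt(degree[x] * degree[y])        # Salton (cosine)
    aa = sum(1 / math.log(degree[z]) for z in common)     # Adamic-Adar
    ra = sum(1 / degree[z] for z in common)               # Resource Allocation
    pa = degree[x] * degree[y]                            # Preferential Attachment
    return dict(CN=cn, Jaccard=jaccard, Salton=salton, AA=aa, RA=ra, PA=pa)
```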
Path-dependent similarity
• Two nodes are similar if they are connected by many paths.
• $(A^n)_{ij}$: number of paths of length n between nodes i and j
• Local Path index: $s^{LP}_{xy} = (A^2)_{xy} + \varepsilon (A^3)_{xy}$
• Katz similarity: $s^{Katz}_{xy} = \beta A_{xy} + \beta^2 (A^2)_{xy} + \beta^3 (A^3)_{xy} + \dots$
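Both indices are matrix expressions, so a dense-matrix sketch is short; the ε and β defaults are illustrative only:

```python
import numpy as np

def local_path_index(A, eps=0.01):
    """Local Path index: S = A^2 + eps * A^3 (paths of length 2 and 3)."""
    A2 = A @ A
    return A2 + eps * (A2 @ A)

def katz_index(A, beta=0.05):
    """Katz index in closed form: sum_{n>=1} beta^n A^n = (I - beta*A)^{-1} - I.
    Converges when beta < 1 / largest eigenvalue of A."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)
```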
Random-walk-based similarity. Image courtesy: http://parkcu.com/blog/pagerank/
Topic-Sensitive or Personalized PageRank. Image courtesy: http://parkcu.com/blog/pagerank/
Many other variations
• SimRank: based on the assumption that two nodes are similar if they are connected to similar nodes:

$$s^{SimRank}_{xy} = C \, \frac{\sum_{z \in \Gamma_x} \sum_{z' \in \Gamma_y} s^{SimRank}_{zz'}}{k_x k_y}$$

• Local Random Walk (LRW): to measure the similarity between nodes x and y, a random walker is introduced at node x
  – the initial occupancy vector is $\pi_x(0) = e_x$
  – at each step t: $\pi_x(t+1) = P^T \pi_x(t)$
  – $s^{LRW}_{xy}(t) = q_x \pi_{xy}(t) + q_y \pi_{yx}(t)$
  – q is the initial configuration function and t denotes the time step
  – q may be determined by the node degree: $q_x = k_x / M$
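A sketch of the LRW score; following the slide it takes $q_x = k_x / M$, reading M as the total degree sum so that the q values sum to 1 (an assumption about the slide's notation):

```python
import numpy as np

def lrw_similarity(A, x, y, t):
    """Local Random Walk score s_xy(t) for a dense adjacency matrix A."""
    k = A.sum(axis=1)                         # node degrees k_x
    P = A / k[:, None]                        # row-stochastic transition matrix
    M = A.sum()                               # total degree (2x number of edges)
    n = A.shape[0]
    pi_x, pi_y = np.eye(n)[x], np.eye(n)[y]   # initial occupancy vectors e_x, e_y
    for _ in range(t):                        # pi(t+1) = P^T pi(t)
        pi_x, pi_y = P.T @ pi_x, P.T @ pi_y
    return (k[x] / M) * pi_x[y] + (k[y] / M) * pi_y[x]
```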
Similarity based on external information
• User attributes:
  – u: <age, gender, location, career, …>
• Content meta-information
  – Information retrieval
• User-generated tags
SERENDIPITOUS RECS
Hybrid methodology
• Content feature extraction
  – Dimensionality reduction
  – Build an LDA model using "head" URLs
  – Use the model to classify "tail" URLs in the latent topic space
• Document graph
  – Compute pairwise similarity between documents with topic overlaps (cosine similarity, weighted Jaccard)
  – Build a graph where documents make up the nodes and the similarity scores make up the edge weights
• PageRank
  – Run topic-sensitive PageRank over the document graph
  – Spot influential documents per topic and index them for fast retrieval
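A sketch of the last two stages of this pipeline, assuming `theta` is a hypothetical (documents × topics) matrix of LDA topic proportions already inferred in the first stage:

```python
import networkx as nx
import numpy as np

theta = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.1, 0.8]])   # toy per-document topic mixtures

# Document graph: nodes are documents, edge weights are topic-space cosines.
G = nx.Graph()
n_docs = theta.shape[0]
for i in range(n_docs):
    for j in range(i + 1, n_docs):
        w = theta[i] @ theta[j] / (np.linalg.norm(theta[i]) * np.linalg.norm(theta[j]))
        if w > 0:
            G.add_edge(i, j, weight=w)

# Topic-sensitive PageRank: bias the teleport vector toward documents
# that load heavily on the topic of interest (topic 0 here).
personalization = {i: float(theta[i, 0]) for i in range(n_docs)}
scores = nx.pagerank(G, alpha=0.85, personalization=personalization, weight="weight")
print(sorted(scores, key=scores.get, reverse=True))  # influential docs for topic 0
```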
Content Categorization: Discovering Semantic Groups
Properties
• Unsupervised (classic LDA) and generative
• Well suited for domain adaptation (taxonomy shift)
• Allows making topic clusters as loose/tight as needed
  – α controls the peakedness of the per-document topic distributions
  – β controls the peakedness of the per-topic word distributions
• Can be extended to discover relations, hierarchies, etc.
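The α/β knobs map directly onto library hyperparameters; a minimal sketch using scikit-learn, where `doc_topic_prior` is α and `topic_word_prior` is β (smaller values give peakier, tighter clusters):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cats purr and chase mice", "dogs bark and chase cats",
        "stocks rallied as markets opened", "investors sold stocks and bonds"]
X = CountVectorizer().fit_transform(docs)   # document-term count matrix

lda = LatentDirichletAllocation(n_components=2,
                                doc_topic_prior=0.1,    # alpha
                                topic_word_prior=0.01,  # beta
                                random_state=0)
theta = lda.fit_transform(X)                # per-document topic mixtures
print(theta.round(2))
```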
Evaluation + relearning
• Periodically evaluate the model
  – Perplexity: $2^{Entropy} = 2^{-\sum p \log_2 p}$
  – A measure of how surprised the model is, on average, when having to guess between k equally probable choices
  – The average log probability of the trained model having seen the test samples
• Use human judgment from word-intrusion and topic-intrusion tasks
• Good topic associations can be initialized from previous trainings or from separate topic clustering
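A minimal sketch of the perplexity formula for a single discrete distribution, illustrating the "k equally probable choices" reading:

```python
import numpy as np

def perplexity(p):
    """2^entropy of a discrete distribution; equals k for a uniform k-way guess."""
    p = np.asarray(p, float)
    p = p[p > 0]                                  # ignore zero-probability events
    return 2 ** (-(p * np.log2(p)).sum())

print(perplexity([0.25, 0.25, 0.25, 0.25]))   # 4.0: as surprised as a 4-way guess
print(perplexity([0.9, 0.05, 0.03, 0.02]))    # ~1.5: much less surprised
```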
Topic Mixtures
Controlling serendipity
• Given an initial document d, we can pick similar documents, i.e., documents with a similar distribution over the topic space.
• Topical PageRank can be used to control serendipity.

        T1  T2  T3  T4  T5
    D1   1   1   0   0   1
    D2   1   1   0   1   1
    D3   1   1   0   1   0
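A sketch of ranking candidates by topic overlap using weighted Jaccard (one of the two similarity options named in the pipeline slide), fed the binary topic indicators from the table above:

```python
import numpy as np

def weighted_jaccard(t_a, t_b):
    """Weighted Jaccard overlap between two topic vectors."""
    t_a, t_b = np.asarray(t_a, float), np.asarray(t_b, float)
    return np.minimum(t_a, t_b).sum() / np.maximum(t_a, t_b).sum()

D = {"D1": [1, 1, 0, 0, 1], "D2": [1, 1, 0, 1, 1], "D3": [1, 1, 0, 1, 0]}
seed = D["D1"]
ranked = sorted(D, key=lambda d: weighted_jaccard(seed, D[d]), reverse=True)
print(ranked)   # D1 first (1.0), then D2 (0.75), then D3 (0.5)
```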
Evaluation
• A/B testing
  – Measure the difference in user behavior (implicit/explicit signals and retention):
    • "Recommended item" vs. "randomly picked item from the set"
    • "Serendipity-free stumbling session" vs. "sessions with serendipitous recommendations"