
Collaborative Filtering: basic ideas


  1. Collaborative Filtering: basic ideas (slides based on chapter 2 of Programming Collective Intelligence book by Toby Segaran)
     Fernando Lobo
     Data mining

  2. Recommendation Systems
     ◮ Use the preferences of a group of people to make recommendations to other people.
     ◮ Applications:
       ◮ product recommendation for online shopping (like Amazon)
       ◮ suggesting interesting websites
       ◮ helping people find music and movies

  3. Low-tech solution
     ◮ Ask friends for suggestions.
     ◮ You want to ask friends who have good taste (they should usually like the same things you do).
     ◮ It’s a good approach, but it’s limited.
       ◮ Shall we ask all of them?
       ◮ Even if we do, we don’t have that many friends...
       ◮ And even with lots of friends, how do we integrate the results?

  4. Collaborative Filtering
     ◮ Searches a large group of people and finds a smaller set with tastes similar to yours.
     ◮ Looks at other things they like and combines them to create a ranked list of suggestions.

  5. Example: rows = people, columns = movies (- = not rated)

                Lady  Snake  Luck  Superman  Dupree  Night
     Lisa        2.5    3.5   3.0       3.5     2.5    3.0
     Gene        3.0    3.5   1.5       5.0     3.5    3.0
     Michael     2.5    3.0     -       3.5       -    4.0
     Claudia       -    3.5   3.0       4.0     2.5    4.5
     Mike        3.0    4.0   2.0       3.0     2.0    3.0
     Jack        3.0    4.0     -       5.0     3.5    3.0
     Toby          -    4.5     -       4.0     1.0      -

  6. Finding Similar Users
     ◮ We need a way to determine how similar people are in their tastes.
     ◮ We need a similarity measure (just like in clustering or nearest neighbor algorithms).
     ◮ Various similarity measures (distance functions) can be used.

  7. Finding Similar Users
     ◮ The similarity measure is usually applied to the items (movies) rated in common.
     ◮ Example based on Euclidean distance (gives a value between 0 and 1):

       $$\mathrm{Similarity}(X, Y) = \frac{1}{1 + \mathrm{EuclideanDistance}(X, Y)}$$

       $$\mathrm{Similarity}(\text{'Michael'}, \text{'Claudia'}) = \frac{1}{1 + \sqrt{(3.0 - 3.5)^2 + (3.5 - 4.0)^2 + (4.0 - 4.5)^2}} = 0.536$$
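The Euclidean-based similarity on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name `euclidean_similarity`, the nested-dictionary layout, and the choice to return 0 when two people share no items are all assumptions. Only the two people compared on the slide are included.

```python
from math import sqrt

# Ratings for the two people compared on the slide; a missing key means "not rated".
ratings = {
    "Michael": {"Lady": 2.5, "Snake": 3.0, "Superman": 3.5, "Night": 4.0},
    "Claudia": {"Snake": 3.5, "Luck": 3.0, "Superman": 4.0, "Dupree": 2.5, "Night": 4.5},
}

def euclidean_similarity(prefs, a, b):
    """Return 1 / (1 + Euclidean distance), using only items both people rated."""
    common = [item for item in prefs[a] if item in prefs[b]]
    if not common:
        return 0.0  # assumption: no shared items means no evidence of similarity
    distance = sqrt(sum((prefs[a][item] - prefs[b][item]) ** 2 for item in common))
    return 1 / (1 + distance)

print(round(euclidean_similarity(ratings, "Michael", "Claudia"), 3))  # 0.536
```

Only Snake, Superman, and Night enter the sum, because those are the three movies both Michael and Claudia rated.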

  8. Another measure: Pearson Correlation Score

       $$\mathrm{Pearson}(X, Y) = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$

     ◮ Measures how well two sets of data fit on a straight line.
     ◮ Interesting property: corrects for grade inflation.
     ◮ Jack tends to give higher scores than Lisa, but the line still fits because they have relatively similar preferences.
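The Pearson formula above translates almost term by term into Python. The sketch below is illustrative only: the name `pearson_similarity`, the dictionary layout, and the decision to return 0 when the score is undefined (no shared items, or one person gave identical ratings to all shared items) are assumptions. Lisa's and Jack's ratings come from the example table, with Jack's unrated movie taken to be Luck, as in the book's dataset.

```python
from math import sqrt

# Ratings for Lisa and Jack; a missing key means "not rated".
ratings = {
    "Lisa": {"Lady": 2.5, "Snake": 3.5, "Luck": 3.0, "Superman": 3.5, "Dupree": 2.5, "Night": 3.0},
    "Jack": {"Lady": 3.0, "Snake": 4.0, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
}

def pearson_similarity(prefs, a, b):
    """Pearson correlation over the items rated by both a and b."""
    common = [item for item in prefs[a] if item in prefs[b]]
    n = len(common)
    if n == 0:
        return 0.0  # assumption: undefined score is reported as 0
    xs = [prefs[a][item] for item in common]
    ys = [prefs[b][item] for item in common]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sum(x * x for x in xs) - sum_x ** 2
    var_y = n * sum(y * y for y in ys) - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        return 0.0  # one side has constant ratings, so no line can be fitted
    return (n * sum_xy - sum_x * sum_y) / (sqrt(var_x) * sqrt(var_y))

print(round(pearson_similarity(ratings, "Lisa", "Jack"), 3))  # 0.747
```

Jack's ratings are on average higher than Lisa's, yet the score is still high, which is the grade-inflation correction the slide mentions.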

  9. Lisa and Jack have a high Pearson Correlation Score

  10. Ranking people
      ◮ Now we can rank people according to how similar their tastes are to mine (or to those of any other person):
        ◮ just compute the similarity score between myself and every other person, then sort by that score.
        ◮ this is just a kind of nearest neighbor algorithm.
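Ranking people then amounts to computing one similarity score per person and sorting. The sketch below assumes Pearson correlation as the similarity measure and uses hypothetical names (`pearson`, `rank_people`); any other measure with the same signature could be passed in instead. The ratings are the example table, with Jack's unrated movie taken to be Luck, as in the book's dataset.

```python
from math import sqrt

# Ratings from the example table; a missing key means "not rated".
ratings = {
    "Lisa":    {"Lady": 2.5, "Snake": 3.5, "Luck": 3.0, "Superman": 3.5, "Dupree": 2.5, "Night": 3.0},
    "Gene":    {"Lady": 3.0, "Snake": 3.5, "Luck": 1.5, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
    "Michael": {"Lady": 2.5, "Snake": 3.0, "Superman": 3.5, "Night": 4.0},
    "Claudia": {"Snake": 3.5, "Luck": 3.0, "Superman": 4.0, "Dupree": 2.5, "Night": 4.5},
    "Mike":    {"Lady": 3.0, "Snake": 4.0, "Luck": 2.0, "Superman": 3.0, "Dupree": 2.0, "Night": 3.0},
    "Jack":    {"Lady": 3.0, "Snake": 4.0, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
    "Toby":    {"Snake": 4.5, "Superman": 4.0, "Dupree": 1.0},
}

def pearson(prefs, a, b):
    """Pearson correlation over items rated by both a and b (0 if undefined)."""
    common = [item for item in prefs[a] if item in prefs[b]]
    n = len(common)
    xs = [prefs[a][item] for item in common]
    ys = [prefs[b][item] for item in common]
    var_x = n * sum(x * x for x in xs) - sum(xs) ** 2
    var_y = n * sum(y * y for y in ys) - sum(ys) ** 2
    if n == 0 or var_x <= 0 or var_y <= 0:
        return 0.0
    cov = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return cov / (sqrt(var_x) * sqrt(var_y))

def rank_people(prefs, person, similarity=pearson):
    """Return (name, score) pairs for everyone else, most similar first."""
    scores = [(other, similarity(prefs, person, other))
              for other in prefs if other != person]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

for name, score in rank_people(ratings, "Toby")[:3]:
    print(f"{name}: {score:.3f}")
# Lisa: 0.991
# Mike: 0.924
# Claudia: 0.893
```

Passing the similarity function as a parameter keeps the ranking step independent of the measure, which mirrors the slide's point that this is a nearest neighbor search under whatever distance is chosen.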

  11. Recommending Items
      ◮ We can find someone with similar tastes to mine.
      ◮ But what we want is a movie recommendation.
      ◮ Solution: score each item (movie) with a weighted average of the ratings given by other people, where each person's rating is weighted by their similarity to me.
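The weighted average described on this slide can be written out explicitly. The notation here is assumed for illustration and does not appear in the slides: $u$ is the person receiving recommendations, $m$ is a movie $u$ has not rated, $r_{v,m}$ is person $v$'s rating of $m$, and $R(m)$ is the set of other people who rated $m$. Restricting $R(m)$ to people with positive similarity to $u$ is also an assumption, made so that the denominator stays positive.

```latex
\mathrm{score}(u, m) =
  \frac{\sum_{v \in R(m)} \mathrm{Similarity}(u, v)\, r_{v,m}}
       {\sum_{v \in R(m)} \mathrm{Similarity}(u, v)}
```

Dividing by the sum of similarities keeps the score on the same scale as the original ratings, and prevents a movie from scoring higher merely because more people reviewed it.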

  12. Example: recommendations for Toby

  13. Example: recommendations for Toby
      ◮ The table shows the movies Toby hasn’t seen (Night, Lady, Luck).
      ◮ Columns starting with S.x give the similarity multiplied by the rating.
      ◮ Each column total must be divided by the sum of the similarities of the people who reviewed that movie (the Sim. Sum row in the table).
      ◮ The last row shows the resulting scores (recommendations) for the movies Toby hasn’t seen.
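The steps listed on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the slides' own code: Pearson correlation is used as the similarity, people whose similarity to Toby is zero or negative are skipped, and the names `pearson` and `recommend` are hypothetical. The ratings are the example table, with Jack's unrated movie taken to be Luck, as in the book's dataset.

```python
from math import sqrt

# Ratings from the example table; a missing key means "not rated".
ratings = {
    "Lisa":    {"Lady": 2.5, "Snake": 3.5, "Luck": 3.0, "Superman": 3.5, "Dupree": 2.5, "Night": 3.0},
    "Gene":    {"Lady": 3.0, "Snake": 3.5, "Luck": 1.5, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
    "Michael": {"Lady": 2.5, "Snake": 3.0, "Superman": 3.5, "Night": 4.0},
    "Claudia": {"Snake": 3.5, "Luck": 3.0, "Superman": 4.0, "Dupree": 2.5, "Night": 4.5},
    "Mike":    {"Lady": 3.0, "Snake": 4.0, "Luck": 2.0, "Superman": 3.0, "Dupree": 2.0, "Night": 3.0},
    "Jack":    {"Lady": 3.0, "Snake": 4.0, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
    "Toby":    {"Snake": 4.5, "Superman": 4.0, "Dupree": 1.0},
}

def pearson(prefs, a, b):
    """Pearson correlation over items rated by both a and b (0 if undefined)."""
    common = [item for item in prefs[a] if item in prefs[b]]
    n = len(common)
    xs = [prefs[a][item] for item in common]
    ys = [prefs[b][item] for item in common]
    var_x = n * sum(x * x for x in xs) - sum(xs) ** 2
    var_y = n * sum(y * y for y in ys) - sum(ys) ** 2
    if n == 0 or var_x <= 0 or var_y <= 0:
        return 0.0
    cov = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return cov / (sqrt(var_x) * sqrt(var_y))

def recommend(prefs, person, similarity=pearson):
    """Score each unseen item by a similarity-weighted average of others' ratings."""
    totals = {}    # per item: sum of similarity * rating (the S.x columns)
    sim_sums = {}  # per item: sum of similarities of its raters (Sim. Sum row)
    for other, rated in prefs.items():
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue  # assumption: ignore people with non-positive similarity
        for item, rating in rated.items():
            if item in prefs[person]:
                continue  # only score items the person has not rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    scores = {item: totals[item] / sim_sums[item] for item in totals}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

for movie, score in recommend(ratings, "Toby"):
    print(f"{movie}: {score:.2f}")
# Night: 3.35
# Lady: 2.83
# Luck: 2.53
```

Michael shares only two movies with Toby and ends up with a correlation of -1, so under the assumption above his ratings do not contribute to any score.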

  14. Matching products

  15. Matching products
      ◮ We find people with tastes similar to ours in order to get movie recommendations.
      ◮ We can also find which products (movies) are similar to each other.
      ◮ The algorithm is the same (just swap the roles of people and movies).
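Swapping the roles of people and movies can be done by transposing the ratings before running the same similarity code. The sketch below assumes the ratings are stored as a dictionary of dictionaries keyed by person; the helper name `transpose` is hypothetical, and only two people from the example table are included to keep it short.

```python
# A small slice of the example table: {person: {movie: rating}}.
ratings = {
    "Michael": {"Lady": 2.5, "Snake": 3.0, "Superman": 3.5, "Night": 4.0},
    "Toby":    {"Snake": 4.5, "Superman": 4.0, "Dupree": 1.0},
}

def transpose(prefs):
    """Turn {person: {item: rating}} into {item: {person: rating}}."""
    flipped = {}
    for person, rated in prefs.items():
        for item, rating in rated.items():
            flipped.setdefault(item, {})[person] = rating
    return flipped

by_movie = transpose(ratings)
print(by_movie["Snake"])  # {'Michael': 3.0, 'Toby': 4.5}
```

Once the data is keyed by movie, any person-to-person similarity function that works on this layout can be applied unchanged to compare two movies by the people who rated both.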

  16. More things
      ◮ We’ve just seen a basic method that belongs to a class of so-called memory-based methods.
      ◮ These methods have severe limitations if the data matrix is sparse (the usual case in real applications).
      ◮ There are more advanced algorithms to deal with this issue.
