Coping with Noisy Search Experiences
Pierre-Antoine Champin (LIRIS, Université de Lyon, France)
Peter Briggs, Maurice Coyle, Barry Smyth (Clarity, University College Dublin, Ireland)
16 December 2009
Structure of the Talk
1. Context: Recommender Systems; Context Aware Recommendation; Social Search
2. Addressed Problem
3. Proposals and results
4. Perspectives
HeyStaks
HeyStaks is a social, context-aware recommender system for web search.
Recommender Systems
Recommender systems aim to present users with information that suits their particular preferences or needs.
General-purpose search engines provide results based on an objective measure of relevance w.r.t. the query → the same results for everyone.
Recommender systems for web search aim to personalise the results of search engines.
HeyStaks as a Recommender System
HeyStaks is an extension for Firefox. It aims to integrate into users' habits rather than force them to change.
It recognises result pages from popular search engines and alters them, based on the user's past search experiences, in order to:
- promote links (move them up in the list),
- insert new links.
Past search experiences are acquired through (a sketch of such a record follows):
- implicit feedback: query, result click-through;
- explicit feedback: tagging a page, voting, sharing.
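To make the feedback acquisition concrete, here is a minimal sketch of what one recorded search experience might look like. The record structure and field names are illustrative assumptions, not HeyStaks' actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SearchExperience:
    """One unit of search experience filed in the active stak.

    Hypothetical structure: HeyStaks' real schema is not shown in the talk.
    """
    stak_id: str                  # the stak that was active during the search
    user_id: str
    query: str                    # terms submitted to the search engine
    page_url: str                 # result page the user interacted with
    selected: bool = False        # implicit feedback: click-through
    tags: list[str] = field(default_factory=list)  # explicit feedback
    voted_up: bool = False        # explicit feedback
    shared: bool = False          # explicit feedback
    timestamp: datetime = field(default_factory=datetime.now)
```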
Recommendations in HeyStaks
[Figure omitted]
Context Aware Recommendation
Search engines provide the same results for every user. Recommender systems provide personalised results... but they provide the same personalisation every time.
Nobody is only one user: their needs depend on the context, especially when considering web searches.
My searches are sometimes related to my research, my teaching, my leisure...
→ need for different recommendations in different contexts.
Search Staks
A search stak is a repository of search experiences all related to the same context.
Users can create as many staks as they need, and manually select the active stak (their current context).
The active stak is where search experiences are collected, and where they are tapped to provide recommendations.
Social Search
Social search is the process of sharing search experiences between like-minded web searchers.
In HeyStaks, social search is made possible by shared staks: several users can contribute to, and receive recommendations from, the same stak.
Staks can be:
- private: only people I invite can join;
- public: anyone can join.
HeyStaks Portal
[Figure omitted]
Addressed Problem
The Problem of Stak Selection
Users fail to select the appropriate stak before starting a search. As a result:
- the recommendations they get are less relevant;
- their search experience is filed in the wrong stak.
→ HeyStaks ends up with noisy experience repositories, and provides less accurate recommendations (even when the correct stak is selected).
Implemented Workarounds
Fall back to the default stak when idle:
- limits the input noise;
- potentially reduces context awareness.
Signal when other staks provide recommendations:
- improves the relevance of recommendations, despite a wrong active stak;
- encourages users to select the right stak.
Other Possible Solutions
Automatically select the right stak at query time:
- almost impossible if based on the query terms alone;
- hazardous if based on the available recommendations;
- technically complicated if based on external indicators (time-tracking tools, browsing history, ...).
Help stak owners to maintain (or curate) their staks:
- use classification techniques;
- recommend the correct stak for a page (sketched below);
- pages are easier to classify than queries.
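A minimal sketch of the curation idea: a text classifier that, given a page's content, suggests the stak the page most plausibly belongs to, so the owner can move misfiled pages. The pipeline (TF-IDF features, multinomial naive Bayes) and all of the example data are illustrative assumptions; the talk's own experiments use Weka's J48 and naive Bayes classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: page texts and the stak they were filed in.
pages = ["semantic web ontology rdf",
         "premier league football results",
         "semantic annotation linked data rdf"]
staks = ["research", "leisure", "research"]

# A simple bag-of-words classifier over page content.
stak_recommender = make_pipeline(TfidfVectorizer(), MultinomialNB())
stak_recommender.fit(pages, staks)

# Curation support: suggest the stak a new or misfiled page belongs to.
print(stak_recommender.predict(["rdf schema inference engine"]))
# expected: ['research']
```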
Training the Page Classifier
Most work on coping with noise in recommender systems assumes that a clean training set is available before noise is encountered.
We need to find the kernel of each stak: the set of (or a subset of) the pages actually relevant to that stak.
- How can we find a reliable kernel?
- How can we evaluate its reliability?
Proposals and results
- Clustering
- Popularity weighting
- Popularity-based kernel
Clustering
Idea: use clustering techniques to identify a candidate kernel (sketched below).
Rationale: kernel pages must be somehow similar, while noisy pages will be heterogeneous.
Problem: huge variability depending on numerous parameters:
- comparing terms or pages;
- different similarity measures;
- different clustering algorithms;
- threshold values.
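As an illustration, here is a minimal sketch of this approach, assuming pages are represented by TF-IDF term vectors and grouped by agglomerative clustering over cosine distance, with the largest cluster taken as the candidate kernel. The representation, algorithm and threshold are precisely the kind of arbitrary parameter choices the slide warns about.

```python
from collections import Counter
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical page texts from one stak: a coherent kernel plus noise.
page_texts = [
    "rdf ontology reasoning sparql",
    "owl ontology classes rdf",
    "sparql query rdf store",
    "cheap flights dublin lyon",     # likely noise
    "football transfer rumours",     # likely noise
]

# Represent pages by TF-IDF vectors and cluster by cosine distance.
X = TfidfVectorizer().fit_transform(page_texts).toarray()
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0,  # arbitrary threshold
    metric="cosine", linkage="average",
).fit_predict(X)

# Take the largest cluster as the candidate kernel.
kernel_label = Counter(labels).most_common(1)[0][0]
kernel = [t for t, l in zip(page_texts, labels) if l == kernel_label]
print(kernel)  # expected: the three rdf/sparql pages
```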
Popularity Weighting
Idea: use a measure of the popularity of pages as a proxy for relevance, in order to provide a fuzzy kernel.
Rationale: kernel pages are repeatedly selected, while noisy pages will only be accidentally selected.
Popularity Measure
[Figure, built up over four slides, defining the popularity measure; the definition itself is not recoverable from the transcript, but popularity is normalised to a [0, 1] scale, as used in the following slides]
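Since the transcript loses the actual definition, the following is only one plausible formalisation, stated as an assumption: count the positive events (selections, tags, votes, shares) recorded for each page in a stak and normalise by the stak's maximum count. This yields the normalised [0, 1] popularity used in the poll and the experiments, but it is not necessarily the exact measure from the talk.

```python
def popularity(events_per_page: dict[str, int]) -> dict[str, float]:
    """Hypothetical popularity measure (assumption, not the talk's formula).

    events_per_page maps a page URL to the number of positive events
    (selections, tags, votes, shares) recorded for it in one stak.
    Popularity is that count normalised by the stak's maximum, so the
    most popular page scores 1.0 and rarely-touched pages score near 0.
    """
    peak = max(events_per_page.values())
    return {url: n / peak for url, n in events_per_page.items()}

# Example: repeated selections push kernel pages towards 1.0.
print(popularity({"a.org": 9, "b.org": 7, "noise.com": 1}))
# {'a.org': 1.0, 'b.org': 0.777..., 'noise.com': 0.111...}
```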
User Evaluation
Poll: for each of the 20 biggest shared staks,
- the 15 most popular pages and the 15 least popular pages
- were presented in random order to the stak owner,
- who was asked whether each page is relevant to the stak.
Poll Results
[Figure: histogram of the number of documents per popularity bin (0.00 to 0.90), with each document judged Relevant, Irrelevant, or "I don't know"]
Experiment
Classifier: decision tree / naive Bayes,
- for each of the 20 biggest shared staks,
- trained with every page, weighted by its normalised popularity,
- with 10-fold cross-validation.
Accuracy: each page contributes to the accuracy proportionally to its normalised popularity
→ it is more important for the classifier to recognise popular pages than unpopular pages.
(A sketch of this weighted evaluation follows.)
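A minimal sketch of popularity-weighted training and evaluation, assuming scikit-learn stand-ins for the Weka classifiers named on the next slide (J48 ≈ DecisionTreeClassifier, NaiveBayes ≈ MultinomialNB, ZeroR ≈ DummyClassifier); the same popularity weights are fed both to training and to the accuracy computation.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

def weighted_cv_accuracy(texts, labels, popularity, clf, folds=10):
    """Cross-validated accuracy where each page counts in proportion
    to its normalised popularity, for both training and scoring."""
    X = TfidfVectorizer().fit_transform(texts)
    y, w = np.asarray(labels), np.asarray(popularity)
    scores = []
    for train, test in StratifiedKFold(n_splits=folds).split(X, y):
        clf.fit(X[train], y[train], sample_weight=w[train])
        correct = clf.predict(X[test]) == y[test]
        # Weighted accuracy: recognising popular pages matters more.
        scores.append(np.average(correct, weights=w[test]))
    return float(np.mean(scores))

# Hypothetical comparison, mirroring the J48 / NaiveBayes / ZeroR set-up:
# for clf in (DecisionTreeClassifier(), MultinomialNB(), DummyClassifier()):
#     print(type(clf).__name__, weighted_cv_accuracy(texts, labels, pops, clf))
```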
Experimental Results
[Figure: weighted accuracy (roughly 0.1 to 0.8) of J48, NaiveBayes and ZeroR, comparing popularity-weighted training against boolean unweighted training]