TimeRank
Personalising Web-search Results Using Time And Topic
Gina Horscroft, Jordan Kadish, Tashiv Sewpersad
Supervisor: Jivashi Nagar
Co-Supervisor: Hussein Suleman
Project Context
What problem do we aim to address in the field of Information Retrieval?
Context: Information Retrieval (IR)
Field within Computer Science which aims to:
● Maximise relevance of search results for a given user’s query.
● Satisfy user search intentions quickly.
Current Problem in IR: “Short User Queries”
● In the context of the Internet.
● Users are unwilling to state search intentions explicitly.
● Difficult to retrieve relevant results due to ambiguities [1].
● Example: “Java”
  ○ Related to programming?
  ○ Related to coffee?
Context: Web Search Personalization
Web search personalization as a solution:
● Uses implicit user information like search query history to improve the relevance of search results.
● Often done through re-ranking or query extension.
● Shown to improve retrieval quality of IR systems [3].
● Used by Google - same results for same query from same user.
Problem in Web search personalization:
● To our knowledge, no approach can model the diverging interests of users at different times as observed by Mandl [4].
● Example: work versus leisure interests.
Same Result for Same Query from Same User
User Objectives:
● Coffee @ 11PM?
● Programming @ 8AM?
Project Overview
What components make up Web search personalization?
Project Aim
By the end of the project, we want to:
● Provide a more personalised overall search experience
  ○ Using time as well as topic
● Improve the search results returned by a user submitted query through re-ranking.
● Do so without being obtrusive
Web Search Personalization Components
Work Allocation
Project Parts:
● Browser Plugin for User Data Collection (Jordan Kadish)
● User Modelling (Tashiv Sewpersad)
● Web-Search Ranking Algorithm (Gina Horscroft)
Project Progression:
● One part engineering, two parts research...
● Parts are dependent on one another,
● But will be developed in parallel...
Legal & Ethical Issues
User privacy
● Informed user consent
● Secure storage of user information
Result censorship
● Re-ranking of results may be viewed as censorship
● Allow user to disable re-ranking and view original results
Browser Plugin
A framework to support data collection and re-ranking
Browser Plugin - Related Works
Why Queries were chosen:
● Queries represent user’s general interests [5] [2].
● Shown to improve retrieval quality when used in a user’s profile [3].
● Especially beneficial for modelling short term (within session) user behaviour which becomes more useful as a search session progresses [5].
Browser Plugin - Related Works
Other Data Collection Sources:
● Bookmarks shown to be an insufficient information source [2]
  ○ Not all internet users use bookmarks
● Internet History beneficial for modelling long term user behaviour
  ○ Useful at start of browsing session [5][9].
  ○ Already being used by Google (amongst others)
● Web Server logs difficult to access publicly [12]
  ○ May be difficult to track user IPs
Browser Plugin - Overview
Engineering Goal:
“Build an unobtrusive browser plugin that collects, stores, and encrypts user queries.”
The feature requirements:
● Must automatically collect user information
  ○ After consent has been given
● Encrypt queries
● Re-rank results displayed to user
  ○ Using the methods provided by team members
Assumptions:
● One user per PC
Browser Plugin - Overview
Scope:
● Currently Chrome, could be expanded to other browsers
● Chrome offers great support for developers
Choice: Plugin vs. Toolbar?
● Toolbars are outdated, intrusive
● Don’t mess with the user’s normal flow of searching
Browser Plugin - Methodology
● Queries need to be stored and manipulated
  ○ Client-side storage
  ○ AES-256 file encryption
  ○ Data not transferred to server (prevent server leaks)
● Allow users the choice to opt in
  ○ Plugin can display alert on install
● Unobtrusive
  ○ Runs in the background
● Combination of JavaScript, Java & Python
  ○ Compilation issues? Libraries convert Python/Java to JavaScript (Jiphy, Transcrypt)
Browser Plugin - Evaluation
Unit test evaluation:
● Robust test cases need to be developed
  ○ Are queries written to file? Encrypted? Etc.
User Modelling
To what degree can a time and topic-based user-profile be used to predict future user searches?
User Modelling - Overview
What is User Modelling?
● Building a user profile based on query topics
● A representation of user interests
● Can be used to personalize Web search results
Research Question:
● To what degree can a time and topic-based user-profile be used to predict future user searches?
Novelty:
● Associating query times to the topics that represent them.
● Investigating 24 hours, 1 week and fortnight encodings.
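The three time encodings named above can be pictured as cyclic slot indices. The following is a minimal sketch only: the hourly granularity, the function name `time_slot`, the encoding labels and the fortnight anchor date are all assumptions for illustration and are not taken from the project.

```python
from datetime import date, datetime, timedelta

# Hypothetical anchor (a Monday) used to align fortnights; an assumption.
FORTNIGHT_ANCHOR = date(1970, 1, 5)

def time_slot(ts, encoding):
    """Map a timestamp to an hourly slot index under a cyclic encoding."""
    if encoding == "24h":
        return ts.hour                           # 0..23
    if encoding == "1w":
        return ts.weekday() * 24 + ts.hour       # 0..167, Monday = day 0
    if encoding == "2w":
        day = (ts.date() - FORTNIGHT_ANCHOR).days % 14
        return day * 24 + ts.hour                # 0..335
    raise ValueError(f"unknown encoding: {encoding}")

if __name__ == "__main__":
    t = datetime(2017, 5, 3, 23, 2)              # a Wednesday evening
    for enc in ("24h", "1w", "2w"):
        print(enc, time_slot(t, enc))
    # Under the fortnight encoding, the same slot recurs every 14 days.
    print(time_slot(t, "2w") == time_slot(t + timedelta(days=14), "2w"))
```

A coarser granularity (for example morning/afternoon/evening) would work the same way with fewer slots per day.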
User Modelling - Topic Modelling Approaches
Latent Semantic Indexing (LSI) [6]
● Assumes one topic per query.
Probabilistic Latent Semantic Indexing (PLSI) [7]
● Limited to the number of topics detected during training.
● I.e. cannot set the number of topics
Latent Dirichlet Allocation (LDA) [8]
● Queries related to multiple topics.
● Not limited to a set number of topics.
● I.e. can set the number of topics
User Modelling - Methodology
Building a User Profile:
1. Query Log:
   23:00 - “java”
   23:02 - “coffee houses near me”
   08:00 - “java”
   08:01 - “java programming guides”
   ...
   ↓ Preprocess
2. Pre-Processed Query Log:
   23:00 - “java”
   23:02 - “coffee house”
   08:00 - “java”
   08:01 - “java program guide”
   ...
   ↓ Topic Modelling
3. Topics:
   “java”
   “coffee house”
   “program”
   ...
   ↓ Associate Time
4. User Profile:
   0.2 - “java”
   0.1 - “coffee house”
   0.6 - “program”
   ...
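The pipeline above (preprocess, extract topics, associate time) can be sketched roughly as follows. This is an illustration under stated assumptions: the stop-word list and `crude_stem` are toy stand-ins for a real preprocessing step, and plain term frequency per hour stands in for the topic model (it is not LDA). All names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy stop-word list; an assumption for illustration only.
STOPWORDS = {"near", "me", "the", "a", "of", "to", "in"}

def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]                  # "programm" -> "program"
    elif token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]                      # "houses" -> "house"
    return token

def preprocess(query):
    """Lower-case, drop stop words and stem the remaining tokens."""
    return [crude_stem(t) for t in query.lower().split() if t not in STOPWORDS]

def build_profile(log):
    """Build {hour: {term: weight}} from (hour, query) pairs.

    Weights are relative term frequencies within each hour, used here
    in place of topic-model probabilities.
    """
    counts = defaultdict(Counter)
    for hour, query in log:
        counts[hour].update(preprocess(query))
    profile = {}
    for hour, counter in counts.items():
        total = sum(counter.values())
        profile[hour] = {term: n / total for term, n in counter.items()}
    return profile

if __name__ == "__main__":
    log = [(23, "java"), (23, "coffee houses near me"),
           (8, "java"), (8, "java programming guides")]
    print(build_profile(log))
```

Swapping the term-frequency step for a trained topic model would change only `build_profile`; the time association stays the same.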
User Modelling - Evaluation
Step 1 - Prepare AOL query log:
1. AOL Query Log:
   1 - 23:02 - “coffee houses”
   2 - 08:01 - “java programming”
   ...
   ↓ Split
2. Training Set (80%):
   1 - 23:02 - “coffee houses”
   ...
3. Testing Set (20%):
   2 - 08:01 - “java programming”
   ...
   ↓ Topic Modelling
4. Testing Set Topics:
   1 - “Java”, “Programming”
   ...
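The 80/20 split in Step 1 might look like the sketch below. The slides do not say how entries are assigned to each set; splitting chronologically (oldest 80% for training) is an assumption made here because the research question is about predicting future searches. The tuple layout and the name `split_log` are hypothetical.

```python
from datetime import datetime

def split_log(entries, train_fraction=0.8):
    """Split (entry_id, timestamp, query) tuples into train and test sets.

    Entries are ordered by timestamp first, so the test set contains only
    queries issued after every training query (an assumption).
    """
    ordered = sorted(entries, key=lambda e: e[1])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

if __name__ == "__main__":
    log = [(i, datetime(2006, 3, 1 + i, 8, 0), f"query {i}") for i in range(10)]
    train, test = split_log(log)
    print(len(train), len(test))
```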
User Modelling - Evaluation
Step 2 - Build Different Profiles:
2. Training Set (80%):
   23:02 - “coffee houses near me”
   ...
   ↓ Build Profiles
● 24 Hour Profile
● 1 Week Profile
● 2 Week Profile
User Modelling - Evaluation
Step 3 - Check prediction ability:
4. Testing Set Topics:
   1 - “Java”, “Programming”
   ...
   ↓ Test Profiles
● 24 Hour Profile: should be looking for Java
● 1 Week Profile: should be looking for Cats
* Sliding Window Approach
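One simple way to quantify "prediction ability" is a hit rate: the fraction of test queries whose topics overlap the topics the profile ranks highest for that time slot. This is only a sketch of one possible metric; the slides do not specify the measure or the details of the sliding window, and the names `hit_rate` and `top_k` are hypothetical.

```python
def hit_rate(profile, test_set, top_k=3):
    """Fraction of test entries whose topics match the profile's top-k.

    profile:  {slot: {topic: weight}}
    test_set: list of (slot, [topics]) pairs
    """
    if not test_set:
        return 0.0
    hits = 0
    for slot, topics in test_set:
        weights = profile.get(slot, {})
        predicted = sorted(weights, key=weights.get, reverse=True)[:top_k]
        if set(predicted) & set(topics):
            hits += 1
    return hits / len(test_set)

if __name__ == "__main__":
    profile = {8: {"java": 0.5, "program": 0.25, "guide": 0.25},
               23: {"coffee": 0.6, "house": 0.4}}
    test_set = [(8, ["java", "program"]), (23, ["cat"])]
    print(hit_rate(profile, test_set))
```

Running the same function against the 24 hour, 1 week and 2 week profiles would give one comparable number per encoding.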
User Modelling - Evaluation
AOL query logs are controversial:
● Poor anonymisation
● Now redacted by AOL
● Terms of Use: for non-commercial research use only
AOL Query Log Snippet:
Re-ranking Algorithm
A means for improving search result relevance
Re-ranking Algorithm - Related Works
● Teevan et al. [1] identified an issue in providing more personalised search results to users
  ○ Users are unlikely to specify their intentions
    ■ Though different users have different intentions
● Solution - use implicit data about the user to improve results
  ○ Re-rank returned results for a query based on this implicit data to improve relevance
● Efficient client-side computation is able to provide improvements in search rankings
  ○ Scalable
Re-ranking Algorithm - Related Works
● Mandl [4] observed that while different personalisation algorithms can improve results, no such algorithm takes the diverging interests of users at different times into account
● Why re-ranking?
  ○ Search engines already provide results that satisfy a wide range of interests
    ■ Re-ranking makes these results more individualised
  ○ Mandl [4] showed that re-ranking is more effective than query modification
    ■ Adapts to users’ interests
Re-ranking Algorithm - Aim
Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?
● Currently, no Web-search personalisation methods factor in time as implicit information to determine rankings
Goal: determine if re-ordering search results factoring in time can improve the relevance of results
● Produce a solution to re-rank search results based on a user’s habits over time
Re-ranking Algorithm - Methodology
1. Create 10-12 “Dummy” profiles of ideal users
  ○ Create independence from other stages of the project
  ○ Use user profiles generated in step 2 as input
  ○ Dummy profiles as temporary stand-ins
2. Retrieve top ~20 results for a query
  ○ Using JSoup to extract HTML data
Re-ranking Algorithm - Methodology
3. Analyse snippets of results for topics
  ○ Dictionary analysis
Re-ranking Algorithm - Methodology
4. Search user history for overlap with topics
5. Re-rank list of returned documents based on each document and the user profile
  ● With respect to topic and time
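Steps 3 to 5 can be illustrated with a small scoring sketch. It assumes a profile of the form {hour: {term: weight}} and scores each result by summing the weights of profile terms that appear in its snippet; the scoring rule, the result dictionary layout and the name `rerank` are assumptions for illustration, not the project's algorithm.

```python
def rerank(results, profile, hour):
    """Re-order results by overlap between snippet terms and the profile.

    results: list of {"title": str, "snippet": str} in original rank order
    profile: {hour: {term: weight}}
    Ties keep the search engine's original order (sorted() is stable).
    """
    weights = profile.get(hour, {})

    def score(result):
        terms = set(result["snippet"].lower().split())
        return sum(w for term, w in weights.items() if term in terms)

    return sorted(results, key=score, reverse=True)

if __name__ == "__main__":
    profile = {8: {"program": 0.6, "java": 0.2},
               23: {"coffee": 0.5, "java": 0.2}}
    results = [
        {"title": "Java coffee beans", "snippet": "java coffee from indonesia"},
        {"title": "Java tutorial", "snippet": "learn to program in java"},
    ]
    print([r["title"] for r in rerank(results, profile, 8)])
    print([r["title"] for r in rerank(results, profile, 23)])
```

With no profile entry for the current hour every score is zero, so the original ranking is returned unchanged.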