TimeRank Personalising Web-search Results Using Time And Topic - PowerPoint PPT Presentation

TimeRank: Personalising Web-search Results Using Time And Topic
Gina Horscroft, Jordan Kadish, Tashiv Sewpersad
Supervisor: Jivashi Nagar
Co-Supervisor: Hussein Suleman

Project Context
What problem do we aim to address in the field of Information Retrieval?

Context: Information Retrieval (IR)
A field within Computer Science which aims to:
● Maximise the relevance of search results for a given user’s query.
● Satisfy user search intentions quickly.
Current Problem in IR: “Short User Queries”
● In the context of the Internet.
● Users are unwilling to state search intentions explicitly.
● Difficult to retrieve relevant results due to ambiguities [1].
● Example: “Java”
  ○ Related to programming?
  ○ Related to coffee?

Context: Web Search Personalization
Web search personalization as a solution:
● Uses implicit user information like search query history to improve the relevance of search results.
● Often done through re-ranking or query extension.
● Shown to improve retrieval quality of IR systems [3].
● Used by Google - same results for same query from same user.
Problem in Web search personalization:
● To our knowledge, no approach can model the diverging interests of users at different times as observed by Mandl [4].
● Example: work versus leisure interests.

Same Result for Same Query from Same User
User Objectives:
● Coffee @ 11PM?
● Programming @ 8AM?

Project Overview
What components make up Web search personalization?

Project Aim
By the end of the project, we want to:
● Provide a more personalised overall search experience
  ○ Using time as well as topic
● Improve the search results returned by a user-submitted query through re-ranking.
● Do so without being obtrusive

Web Search Personalization Components

Work Allocation
Project Parts:
● Browser Plugin for User Data Collection (Jordan Kadish)
● User Modelling (Tashiv Sewpersad)
● Web-Search Ranking Algorithm (Gina Horscroft)
Project Progression:
● One part engineering, two parts research...
● Parts are dependent on one another,
● But will be developed in parallel...

Legal & Ethical Issues
User privacy
● Informed user consent
● Secure storage of user information
Result censorship
● Re-ranking of results may be viewed as censorship
● Allow user to disable re-ranking and view original results

Browser Plugin
A framework to support data collection and re-ranking

Browser Plugin - Related Works
Why Queries were chosen:
● Queries represent user’s general interests [5] [2].
● Shown to improve retrieval quality when used in a user’s profile [3].
● Especially beneficial for modelling short term (within session) user behaviour which becomes more useful as a search session progresses [5].

Browser Plugin - Related Works
Other Data Collection Sources:
● Bookmarks shown to be an insufficient information source [2]
  ○ Not all internet users use bookmarks
● Internet History beneficial for modelling long term user behaviour
  ○ Useful at start of browsing session [5][9].
  ○ Already being used by Google (amongst others)
● Web Server logs difficult to access publicly [12]
  ○ May be difficult to track user IPs

Browser Plugin - Overview
Engineering Goal: “Build unobtrusive browser plugin that collects, stores, and encrypts user queries.”
The feature requirements:
● Must automatically collect user information
  ○ After consent has been given
● Encrypt queries
● Re-rank results displayed to user
  ○ Using the methods provided by team members
Assumptions:
● One user per PC

Browser Plugin - Overview
Scope:
● Currently Chrome, could be expanded to other browsers
● Chrome offers great support for developers
Choice: Plugin vs. Toolbar?
● Toolbars are outdated, intrusive
● Don’t mess with the user’s normal flow of searching

Browser Plugin - Methodology
● Queries need to be stored and manipulated
  ○ Client-side storage
  ○ AES-256 file encryption
  ○ Data not transferred to server (prevent server leaks)
● Allow users the choice to opt in
  ○ Plugin can display alert on install
● Unobtrusive
  ○ Run in the background
● Combination of JavaScript, Java & Python
  ○ Compilation issues? Libraries convert Python/Java to JavaScript (Jiphy, Transcrypt)

Browser Plugin - Evaluation
Unit test evaluation:
● Robust test cases need to be developed
  ○ Are queries written to file? Encrypted? Etc.

User Modelling
To what degree can a time and topic-based user-profile be used to predict future user searches?

User Modelling - Overview
What is User Modelling?
● Building a user profile based on query topics
● A representation of user interests
● Can be used to personalize Web search results
Research Question:
● To what degree can a time and topic-based user-profile be used to predict future user searches?
Novelty:
● Associating query times to the topics that represent them.
● Investigating 24 hours, 1 week and fortnight encodings.
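
The three time encodings could be sketched as a mapping from a query timestamp to a slot index. This is only an illustration under stated assumptions: the slides do not specify the slot granularity, so hourly slots are assumed, and the function name `time_slot`, the encoding labels and the Monday-aligned fortnight anchor are all hypothetical.

```python
from datetime import datetime

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def time_slot(ts: datetime, encoding: str) -> int:
    """Map a timestamp to an hourly slot index (assumed granularity).

    "24h"       -> 0..23   (hour of day)
    "week"      -> 0..167  (hour of week, Monday 00:00 is slot 0)
    "fortnight" -> 0..335  (hour within a two-week cycle)
    """
    if encoding == "24h":
        return ts.hour
    hour_of_week = ts.weekday() * HOURS_PER_DAY + ts.hour
    if encoding == "week":
        return hour_of_week
    if encoding == "fortnight":
        # Ordinal day 1 is a Monday, so this counts Monday-aligned weeks.
        # Which week of the pair comes first is an arbitrary assumption.
        week_index = (ts.toordinal() - 1) // 7
        return (week_index % 2) * HOURS_PER_WEEK + hour_of_week
    raise ValueError(f"unknown encoding: {encoding!r}")
```

Under this sketch, a query issued at 23:02 lands in slot 23 of the 24-hour encoding regardless of the day, while the week and fortnight encodings keep weekday and alternate-week habits apart.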

User Modelling - Topic Modelling Approaches
Latent Semantic Indexing (LSI) [6]
● Assumes one topic per query.
Probabilistic Latent Semantic Indexing (PLSI) [7]
● Limited to the number of topics detected during training.
● I.e. cannot set the number of topics
Latent Dirichlet Allocation (LDA) [8]
● Queries related to multiple topics.
● Not limited to a set number of topics.
● I.e. can set the number of topics

User Modelling - Methodology
Building a User Profile:
1. Query Log:
   23:00 - “java”
   23:02 - “coffee houses near me”
   08:00 - “java”
   08:01 - “java programming guides”
   ...
   ↓ Preprocess
2. Pre-Processed Query Log:
   23:00 - “java”
   23:02 - “coffee house”
   08:00 - “java”
   08:01 - “java program guide”
   ...
   ↓ Topic Modelling
3. Topics:
   “java”
   “coffee house”
   “program”
   ...
   ↓ Associate Time
4. User Profile:
   0.2 - “java”
   0.1 - “coffee house”
   0.6 - “program”
   ...
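
A minimal sketch of the profile-building pipeline above, with loud assumptions: single query terms stand in for topics instead of a real topic model such as LDA, the input is taken to be already pre-processed, the hour of day is used as the time slot, and the stop-word list and the name `build_profile` are hypothetical. The weights it produces are relative term frequencies per slot and are not meant to reproduce the figures on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real pre-processing step would be richer.
STOP_WORDS = {"near", "me", "the", "a", "of"}


def build_profile(query_log):
    """Build {hour: {term: weight}} from (hour, query) pairs.

    Terms are a stand-in for topics; weights within each hour sum to 1.
    """
    counts = defaultdict(Counter)
    for hour, query in query_log:
        for term in query.lower().split():
            if term not in STOP_WORDS:
                counts[hour][term] += 1

    profile = {}
    for hour, counter in counts.items():
        total = sum(counter.values())
        profile[hour] = {term: n / total for term, n in counter.items()}
    return profile


log = [
    (23, "java"),
    (23, "coffee house"),
    (8, "java"),
    (8, "java program guide"),
]
profile = build_profile(log)
```

With this log, "java" carries more weight in the 08:00 slot, where it co-occurs with "program", than in the 23:00 slot, where it shares weight with "coffee" and "house".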

User Modelling - Evaluation
Step 1 - Prepare AOL query log:
1. AOL Query Log:
   1 - 23:02 - “coffee houses”
   2 - 08:01 - “java programming”
   ...
   ↓ Split
2. Training Set (80%):
   1 - 23:02 - “coffee houses”
   ...
3. Testing Set (20%):
   2 - 08:01 - “java programming”
   ...
   ↓ Topic Modelling
4. Testing Set Topics:
   1 - “Java”, “Programming”
   ...
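
The 80/20 split might look like the sketch below. The slides do not say whether the split is random or chronological; a chronological split is assumed here so that the testing set always lies in the "future" of the training set, and `split_log` is a hypothetical name.

```python
def split_log(entries, train_fraction=0.8):
    """Split (timestamp, query) entries into training and testing sets.

    Entries are ordered by timestamp first (an assumption), so the
    earliest 80% train the profile and the latest 20% test it.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```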

User Modelling - Evaluation
Step 2 - Build Different Profiles:
2. Training Set (80%):
   23:02 - “coffee houses near me”
   ...
   ↓ Build Profiles
● 24 Hour Profile
● 1 Week Profile
● 2 Week Profile

User Modelling - Evaluation
Step 3 - Check prediction ability:
4. Testing Set Topics:
   1 - “Java”, “Programming”
   ...
   ↓ Test Profiles
● 24 Hour Profile: “Should be looking for Java”
● 1 Week Profile: “Should be looking for Cats”
* Sliding Window Approach
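
One way to check prediction ability is sketched below: for each test query, take the profile's top-k topics for that query's time slot and count a hit when any of them matches the query's actual topics. The hit-rate metric, the value of k and the function names are assumptions, and the sliding window mentioned on the slide is not modelled here.

```python
def top_topics(profile, slot, k=3):
    """Return the k highest-weighted topics for a slot (ties by name)."""
    weights = profile.get(slot, {})
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:k]]


def hit_rate(profile, test_set, k=3):
    """Fraction of test queries whose topics overlap the prediction.

    test_set is a list of (slot, topics) pairs.
    """
    if not test_set:
        return 0.0
    hits = 0
    for slot, topics in test_set:
        if set(top_topics(profile, slot, k)) & set(topics):
            hits += 1
    return hits / len(test_set)
```

Running the same test set against the 24-hour, 1-week and 2-week profiles would then give one comparable number per encoding.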

User Modelling - Evaluation
AOL query logs are controversial:
● Poor anonymisation
● Now redacted by AOL
● Terms of Use: for non-commercial research use only
AOL Query Log Snippet: [figure]

Re-ranking Algorithm
A means for improving search result relevance

Re-ranking Algorithm - Related Works
● Teevan et al. [1] suggested an issue in providing more personalised search results to users
  ○ User unlikely to specify their intentions
    ■ Though different users have different intentions
● Solution - use implicit data about the user to improve results
  ○ Re-rank returned results for a query based on this implicit data to improve relevance
● Efficient client-side computation is able to provide improvements in search rankings
  ○ Scalable

Re-ranking Algorithm - Related Works
● Mandl [4] observed that while different personalisation algorithms can improve results, no such algorithm takes the diverging interests of users at different times into account
● Why re-ranking?
  ○ Search engines already provide results that satisfy a wide range of interests
    ■ Re-ranking to make these results more individualised
  ○ Mandl [4] showed that re-ranking is more effective than query modification
    ■ Adapts to users’ interests

Re-ranking Algorithm - Aim
Can a Web-search ranking algorithm that personalizes results on the basis of time-sensitivity return results that are more relevant to a user than an algorithm that does not?
● Currently, no Web-search personalisation methods factor in time as implicit information to determine rankings
Goal: determine if re-ordering search results factoring in time can improve the relevance of results
● Produce a solution to re-rank search results based on a user’s habits over time

Re-ranking Algorithm - Methodology
1. Create 10-12 “Dummy” profiles of ideal users
  ○ Create independence from other stages of the project
  ○ Use user profiles generated in step 2 as input
  ○ Dummy profiles as temporary stand-ins
2. Retrieve top ~20 results for a query
  ○ Using JSoup to extract HTML data

Re-ranking Algorithm - Methodology
3. Analyse snippets of results for topics
  ○ Dictionary analysis
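
Dictionary analysis of a result snippet could be sketched as a lookup of snippet words in per-topic word lists. The dictionary contents, the topic names and the function `snippet_topics` below are invented for illustration; the slides do not describe the actual dictionary.

```python
import re
from collections import Counter

# Hypothetical topic dictionary: each topic maps to indicative words.
TOPIC_DICTIONARY = {
    "programming": {"java", "code", "compiler", "programming", "tutorial"},
    "coffee": {"coffee", "espresso", "roast", "cafe", "java"},
}


def snippet_topics(snippet, dictionary=TOPIC_DICTIONARY):
    """Return {topic: share} for the topics whose words occur in a snippet.

    Shares are dictionary-word hits per topic, normalised to sum to 1.
    An ambiguous word such as "java" counts towards every topic listing it.
    """
    words = re.findall(r"[a-z]+", snippet.lower())
    counts = Counter()
    for topic, vocabulary in dictionary.items():
        counts[topic] = sum(1 for word in words if word in vocabulary)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items() if n}
```

For the snippet "Java tutorial: learn to write code", this sketch attributes three hits to "programming" and one to "coffee", so the ambiguity of "java" is resolved by the surrounding words.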

Re-ranking Algorithm - Methodology
4. Search user history for overlap with topics
5. Re-rank list of returned documents based on each document and the user profile
  ○ With respect to topic and time
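
Steps 4 and 5 can be illustrated with a small scoring sketch. It assumes each result already carries topic shares from the snippet analysis and that the user profile maps a time slot to topic weights; the dot-product score and the name `rerank` are assumptions, not the project's actual algorithm.

```python
def rerank(results, profile, slot):
    """Re-order results by overlap with the user's interests at a time slot.

    results: list of (url, {topic: share}) in the search engine's order.
    profile: {slot: {topic: weight}}.
    Ties keep the original order because Python's sort is stable.
    """
    interests = profile.get(slot, {})

    def score(result):
        _, topics = result
        return sum(share * interests.get(topic, 0.0)
                   for topic, share in topics.items())

    return sorted(results, key=score, reverse=True)


# Hypothetical data: the same "java" query at two times of day.
results = [
    ("coffee.example", {"coffee": 1.0}),
    ("java-docs.example", {"programming": 1.0}),
    ("other.example", {}),
]
profile = {
    8: {"programming": 0.8, "coffee": 0.2},
    23: {"coffee": 0.9, "programming": 0.1},
}
morning = [url for url, _ in rerank(results, profile, 8)]
evening = [url for url, _ in rerank(results, profile, 23)]
```

Because ties preserve the engine's order, a slot with no profile data leaves the original ranking untouched, which fits the aim of staying unobtrusive.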
