USI at the TREC 2015 Contextual Suggestion Track Mohammad - PowerPoint PPT Presentation

USI at the TREC 2015 Contextual Suggestion Track
Mohammad Aliannejadi, Seyed Ali Bahrainian, Fabio Crestani
University of Lugano
November 19, 2015

Outline
• Introduction
• Overview
• Useful Information Gathering
• Profile Modeling
• Profile Enrichment
• Lack of Information
• Ranking
• Results
• Discussion
• Future work

Introduction
• Task: Provide travel suggestions in new cities for visitors, based on their personal interests in venues that they have visited
• Two experiments:
  ◦ Live Experiment
  ◦ Batch Experiment
• Our attempt: Batch experiment
• 211 user profiles
• 60 attractions the user has previously rated
• 30 candidate suggestions to rank

Overview
Our approach to this track consists of four steps:
1. Useful information gathering
2. Profile modeling
3. Profile enrichment
4. Suggestion ranking

Useful Information Gathering
• Analyze the URL collection: almost 9,000 URLs
• Approximately half of the URLs are from known sources of information: Yelp, Foursquare, TripAdvisor
• What to do with the other half?
  ◦ Fetch each URL and use its content to represent the place → not a good idea ✗
  ◦ Locate the place in known sources of information → good idea ✓
• Try to make the information homogeneous: all from Yelp
• Try to combine it with other sources of information: Foursquare and TripAdvisor

Useful Information Gathering (cont.)
Steps for useful information gathering:
1. Fetch all given Yelp URLs
2. Locate Yelp profiles for all other attractions
3. Fetch the located Yelp URLs
4. Use the information on the Yelp profiles to locate Foursquare and TripAdvisor profiles for each attraction
5. Scrape all fetched pages

Data Layout
• Yelp
  ◦ Name
  ◦ Yelp URL
  ◦ Overall rating
  ◦ Categories
  ◦ Subcategories
  ◦ Reviews
    • Rating
    • Comment
    • Date
    • . . .
  ◦ . . .
• Foursquare
  ◦ . . .
  ◦ Tips
  ◦ Visits
  ◦ Visitors
  ◦ . . .
• TripAdvisor
  ◦ . . .
  ◦ Dining options
  ◦ Rating summary
  ◦ Attraction ranking
  ◦ . . .
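One way to hold the scraped fields in memory is a set of small record types. This is only a sketch: the class names, field names and field types below are assumptions made for illustration, and the fields elided on the slide (". . .") are left out rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Review:
    rating: float  # rating given by the reviewer
    comment: str   # free-text body of the review
    date: str      # kept as a string; the date format is an assumption


@dataclass
class YelpProfile:
    name: str
    url: str
    overall_rating: float
    categories: List[str] = field(default_factory=list)
    subcategories: List[str] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)


@dataclass
class FoursquareProfile:
    tips: List[str] = field(default_factory=list)
    visits: int = 0
    visitors: int = 0


@dataclass
class TripAdvisorProfile:
    dining_options: List[str] = field(default_factory=list)
    rating_summary: Dict[str, int] = field(default_factory=dict)
    attraction_ranking: Optional[int] = None


@dataclass
class Attraction:
    # Yelp is the homogeneous base source; the other two are optional extras.
    yelp: YelpProfile
    foursquare: Optional[FoursquareProfile] = None
    tripadvisor: Optional[TripAdvisorProfile] = None
```

Making the Foursquare and TripAdvisor parts optional reflects the gathering steps: every attraction gets a Yelp profile first, and the other sources are located from it afterwards.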

Profile Modeling
• We assume that a user likes what others like about a place, and dislikes what others dislike
• Find reviews with a similar rating:
  ◦ Positive profile: reviews with rating 3 or 4, for places to which the user gave a similar rating
  ◦ Negative profile: reviews with rating 0 or 1, for places to which the user gave a similar rating
• Train a classifier for each user
• Features: tf-idf score of each term
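The review selection and the tf-idf features can be sketched as follows. This is a minimal illustration, not the submitted system: the function names and input shapes are hypothetical, review ratings are assumed to be already mapped to the same 0 to 4 scale as the user ratings, whitespace tokenization is a simplification, and the tf-idf variant used here (relative term frequency times log(N/df)) is just one common choice. Classifier training is omitted.

```python
import math
from collections import Counter

POSITIVE = {3, 4}  # ratings treated as "liked"
NEGATIVE = {0, 1}  # ratings treated as "disliked"


def build_profiles(user_ratings, place_reviews):
    """Collect review texts whose rating agrees with the user's rating.

    user_ratings:  {place_id: rating given by the user}
    place_reviews: {place_id: [(review_rating, review_text), ...]}
    """
    positive, negative = [], []
    for place_id, user_rating in user_ratings.items():
        for review_rating, text in place_reviews.get(place_id, []):
            if user_rating in POSITIVE and review_rating in POSITIVE:
                positive.append(text)
            elif user_rating in NEGATIVE and review_rating in NEGATIVE:
                negative.append(text)
    return positive, negative


def tfidf(documents):
    """Return one {term: tf-idf score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


user_ratings = {"a": 4, "b": 0}
place_reviews = {
    "a": [(4, "great pizza"), (1, "slow service")],
    "b": [(0, "terrible pizza"), (3, "nice view")],
}
positive, negative = build_profiles(user_ratings, place_reviews)
vectors = tfidf(positive + negative)
```

The vectors for the positive and negative reviews would then serve as labeled training examples for the per-user classifier.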

Profile Enrichment
• To get a better idea of the user’s tastes and interests, we need to take into account the categories they like and dislike
• It is not clear exactly which category or subcategory a user likes or dislikes.
• In this example, we see the categories of three attractions a user likes:
  ◦ Pizzeria - Italian - Takeaway - Pizza
  ◦ Restaurant - Pasta - Pizza - Sandwich
  ◦ Restaurant - American - Pizza - Burger
• The user likes Pizza, since it is the only category common to all three
• We introduce a metric to model user interest

Profile Enrichment (cont.)
• To model the user taste, we followed these steps:
1. For each category/subcategory of a place with a positive rating:
2. Add the category/subcategory to the positive taste model
3. Compute its normalized frequency:

$$\mathrm{cf}(\mathit{category}, \mathit{user}) = \frac{\mathrm{count}(\mathit{category}, \mathit{user})}{\sum_{c} \mathrm{count}(c, \mathit{user})}$$

4. Do the same for places with a negative rating to build the negative taste model
• Each category item in the positive or negative taste profile will have a score between 0 and 1
• A category may be in both positive and negative taste profiles
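The normalized frequency can be illustrated on the three liked attractions from the pizza example. This is a small sketch with a hypothetical function name; it assumes each category listed for a place counts once per listing.

```python
from collections import Counter


def taste_model(places):
    """Map each category to its normalized frequency cf(category, user).

    places: list of category lists, one list per rated place.
    """
    counts = Counter(category for categories in places for category in categories)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


liked = [
    ["Pizzeria", "Italian", "Takeaway", "Pizza"],
    ["Restaurant", "Pasta", "Pizza", "Sandwich"],
    ["Restaurant", "American", "Pizza", "Burger"],
]
positive_taste = taste_model(liked)
```

Here Pizza occurs 3 times among 12 category listings, so its score is 3/12 = 0.25, the highest in the model; Restaurant follows with 2/12. Calling the same function on the disliked places would give the negative taste model.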

Lack of Information
• There are some cases in which the system is unable to build a positive or negative user profile → we adapt the scores
• For example: how can we build a negative profile when there is no such review?
• In such cases, we redefine positive and negative places and reviews
• If there are no negative reviews (rating 0 or 1):
  ◦ The positive profile consists of reviews with rating 4
  ◦ The negative profile consists of reviews with rating 3
• In doing so, we still differentiate between places the user liked more and places the user liked less.
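The fallback for a missing negative profile can be written as a small rule. This sketch covers only the case stated on the slide (no reviews rated 0 or 1); the function name is hypothetical, and how other gaps, such as a missing positive profile, are handled is not specified here.

```python
def rating_split(review_ratings):
    """Return (positive_ratings, negative_ratings) used to build the profiles."""
    if any(rating in (0, 1) for rating in review_ratings):
        # Default definition: 3 or 4 is positive, 0 or 1 is negative.
        return {3, 4}, {0, 1}
    # No negative reviews: separate "liked more" (4) from "liked less" (3).
    return {4}, {3}
```

For example, `rating_split([4, 3, 0])` keeps the default split, while `rating_split([4, 3, 3])` falls back to 4 versus 3.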

Ranking
• Our approach: combine scores from the user profile, the user taste profile and other information:
  ◦ UP = Extract all the reviews of the place and classify them using the user profile classifier: Support Vector Machine (SVM) or Naïve Bayes
  ◦ UT = Taste score of the place: the sum of the positive scores of all its categories minus the sum of their negative scores
  ◦ U4 = Score given to the place by the Foursquare tips classifier
  ◦ UTA = Score given to the place by the TripAdvisor taste model
  ◦ $Sc = \omega_1 \, UP + \omega_2 \, UT + \omega_3 \, U4 + \omega_4 \, UTA$
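The taste score and the weighted combination can be sketched as below. The function names and the input format are hypothetical, the component scores UP, U4 and UTA are taken as given numbers, and the default weights are the ones reported on the Results slide.

```python
def taste_score(categories, positive_taste, negative_taste):
    """UT: positive scores of the place's categories minus their negative scores."""
    return sum(
        positive_taste.get(category, 0.0) - negative_taste.get(category, 0.0)
        for category in categories
    )


def combined_score(up, ut, u4, uta, weights=(1.0, 1.0, 0.3, 0.3)):
    """Sc = w1*UP + w2*UT + w3*U4 + w4*UTA."""
    w1, w2, w3, w4 = weights
    return w1 * up + w2 * ut + w3 * u4 + w4 * uta


def rank(candidates):
    """Sort candidate names by combined score, best first.

    candidates: {name: (UP, UT, U4, UTA)}
    """
    return sorted(
        candidates,
        key=lambda name: combined_score(*candidates[name]),
        reverse=True,
    )


ut = taste_score(
    ["Pizza", "Burger"],
    positive_taste={"Pizza": 0.5, "Burger": 0.25},
    negative_taste={"Burger": 0.5},
)
ranking = rank({"place_a": (1.0, ut, 1.0, 0.0), "place_b": (0.0, 0.1, 1.0, 1.0)})
```

A category that appears in both taste models, such as Burger here, contributes the difference of its two scores, which can be negative.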

Results
• We assigned the weights $\omega_1$ to $\omega_4$ by cross-validation on the UDel dataset: $\omega_1 = 1$, $\omega_2 = 1$, $\omega_3 = 0.3$, $\omega_4 = 0.3$
• We submitted two runs: one using the SVM classifier, named 11, and one using the Naïve Bayes classifier, named 22:

Runs          P@5      MRR
11            0.5858   0.7404
22            0.5450   0.6991
TREC Median   0.5090   0.6716
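For readers unfamiliar with the two reported metrics, the sketch below shows their standard definitions for a single user's ranked list. Which ratings count as relevant is decided by the track's evaluation and is not stated on these slides, so relevance is simply passed in as booleans; the function names are illustrative.

```python
def precision_at_k(relevance, k=5):
    """Fraction of the top-k ranked suggestions that are relevant."""
    return sum(relevance[:k]) / k


def reciprocal_rank(relevance):
    """1 / position of the first relevant suggestion, or 0 if there is none."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def mean(values):
    """Average a per-user metric over all users (e.g. to obtain MRR)."""
    return sum(values) / len(values)


# One user's ranked list, best suggestion first.
relevance = [False, True, True, False, True]
p5 = precision_at_k(relevance)  # 3 relevant in the top 5
rr = reciprocal_rank(relevance)  # first relevant item is at position 2
```

P@5 and MRR as reported in the table are these per-user values averaged over all user profiles.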

Discussion
• Parameters are tuned by cross-validation on another dataset
• This is not the optimal parameter set, but it hopefully performs better than a random assignment.
• The user profile (UP) is the richest information source; thus, it has the highest weight ($\omega_1$).
• Because reviews are lacking in some cases, the user taste profile (UT) plays a significant role in achieving a better ranking. Therefore, it is given the same, highest weight ($\omega_2$).
• The other two terms are not as comprehensive as the first two. Therefore, assigning high weights to them may hurt overall performance.

Discussion (cont.)
• The dataset is comprehensive and homogeneous: information plays a significant role.
• The run with the SVM classifier as user profile performed better.
• Why?
  ◦ High dimensions
  ◦ Weighted features
  ◦ Sparse document vectors
  ◦ Text is usually linearly separable
• Lack of reviews is compensated for by profile enrichment.

Discussion (cont.)
[Figure: performance for users who liked fewer than 10 places, compared with the TREC median]
• The plot shows the performance for the users who liked fewer than 10 places.
• These users are considered to be more difficult to model.
• When we are unable to build a user profile, profile enrichment becomes the decision maker.
• The plot shows that in such cases, profile enrichment benefited our system compared to the TREC median.

Future work
• Look into ways to find a relation between the context and the candidate places.
• Try to relate the user tags to the profiles to make the user profile even richer.
• Look more deeply into users with an imbalanced distribution of reviews and try to find a solution for them.
• Retune the weights and add more information sources to the scoring algorithm using the real data.

Questions
Thanks for your attention
Mohammad Aliannejadi
mohammad.alian.nejadi@usi.ch
@maliannejadi
