Enterprise and Desktop Search
Lecture 2: Searching the Enterprise Web

Pavel Dmitriev, Yahoo! Labs, Sunnyvale, CA, USA
Pavel Serdyukov, University of Twente, Netherlands
Sergey Chernov, L3S Research Center, Hannover, Germany
Outline
• Searching the Enterprise Web
 – What works and what doesn't (Fagin 03, Hawking 04)
• User Feedback in Enterprise Web Search
 – Explicit vs Implicit feedback (Joachims 02, Radlinski 05)
 – User Annotations (Dmitriev 06, Poblete 08, Chirita 07)
 – Social Annotations (Millen 06, Bao 07, Xu 07, Xu 08)
 – User Activity (Bilenko 08, Xue 03)
 – Short-term User Context (Shen 05, Buscher 07)
Searching the Enterprise Web
Searching the Workplace Web
Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, David P. Williamson
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120

• How is the Enterprise Web different from the Public Web?
 – Structural differences
• What are the most important features for search?
 – Use Rank Aggregation to experiment with different ranking methods and features
Enterprise Web vs Public Web: Structural Differences

[Figure: structure of the Public Web, from [Broder 00]]
Enterprise Web vs Public Web: Structural Differences

[Figure: structure of the Enterprise Web, from [Fagin 03]]

• Implications:
 – More difficult to crawl
 – The distribution of PageRank values is such that a larger fraction of pages has high PR values, so PR may be less effective at discriminating among regular pages (illustrated in the sketch below)
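To make the implication concrete, one can compute PageRank and inspect the spread of its values. A minimal sketch assuming the networkx library, with an invented toy graph standing in for an intranet; when many pages end up with similarly high scores, PageRank separates them poorly:

```python
import networkx as nx

# Toy directed graph standing in for an intranet link structure
# (the pages and edges here are invented for illustration).
G = nx.DiGraph()
G.add_edges_from([
    ("home", "docs"), ("home", "wiki"), ("docs", "wiki"),
    ("wiki", "docs"), ("team", "docs"), ("team", "wiki"),
])

pr = nx.pagerank(G, alpha=0.85)

# A narrow spread of scores means PageRank offers little
# discriminative power among the pages.
for page, score in sorted(pr.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```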
Rank Aggregation
• Input: several ranked lists of objects
• Output: a single ranked list over the union of all the objects which minimizes the number of "inversions" with respect to the initial lists
• NP-hard to compute for 4 or more lists
• A variety of heuristic approximations exist for computing either the whole ordering or the top k [Dwork 01, Fagin 03-1]; one simple heuristic is sketched below
• Rank Aggregation can also be useful in Enterprise Search for combining rankings from different data sources
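As an illustration, here is a minimal Python sketch of Borda count, one common positional heuristic for rank aggregation (the approximations studied in [Dwork 01, Fagin 03-1] include others as well); the convention of giving unranked objects zero points is an assumption made for partial lists:

```python
from collections import defaultdict

def borda_aggregate(ranked_lists):
    """Aggregate several ranked lists with Borda count.

    Each list may rank a different subset of objects; an object at
    position p in a list of length n scores (n - p) points there,
    and objects absent from a list score 0 for that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for pos, obj in enumerate(ranking):
            scores[obj] += n - pos
    # Higher total score first.
    return sorted(scores, key=lambda o: -scores[o])

# Example: three rankers disagree; aggregation orders the union.
print(borda_aggregate([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["a", "c", "b"],
]))  # -> ['a', 'b', 'c', 'd']
```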
What are the most important features?
• Create 3 indices: Content, Title, Anchortext (aggregated text from the <a> tags pointing to the page)
• Get the results from each index, rank them by tf-idf, and feed them to the ranking heuristics: PageRank, Indegree, Discovery date, Words in URL, URL length, URL depth, Discriminator
• Combine the results using Rank Aggregation

[Diagram: Content Index, Title Index, Anchortext Index, PageRank, Indegree, Discovery date, Words in URL, URL length, URL depth, Discriminator, all feeding into Rank Aggregation, which produces the Result]

• Evaluate all possible subsets of indices and heuristics on very frequent (Q1) and medium frequency (Q2) queries with manually determined correct answers (see the sketch below)
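The experimental loop itself is easy to picture. A minimal sketch under invented names (rankers, queries, correct, and the success@k criterion are stand-ins for illustration, not the setup from [Fagin 03]):

```python
from itertools import combinations

def evaluate_subsets(rankers, queries, correct, aggregate, k=1):
    """For every non-empty subset of ranking methods, aggregate their
    lists per query and report how often a known correct answer
    appears in the top k (success@k).

    rankers:   dict name -> function(query) -> ranked list of pages
    queries:   list of query strings
    correct:   dict query -> set of manually judged correct pages
    aggregate: rank-aggregation function, e.g. borda_aggregate above
    """
    scores = {}
    names = list(rankers)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            hits = 0
            for q in queries:
                merged = aggregate([rankers[name](q) for name in subset])
                if set(merged[:k]) & correct.get(q, set()):
                    hits += 1
            scores[subset] = hits / len(queries)
    return scores
```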
Results

IR_i(α) is the "influence" of the ranking method α.
(Ti = Title, An = Anchortext, Co = Content, Le = URL length, De = URL depth, Wo = Words in URL, Di = Discriminator, PR = PageRank, In = Indegree, Da = Discovery date)

Q1 (very frequent queries):

α     IR1(α)   IR3(α)   IR5(α)   IR10(α)  IR20(α)
Ti     29.2     13.6      5.6      6.2      5.6
An     24.0     47.1     58.3     74.4     87.5
Co      3.3     −6.0     −7.0     −4.4     −2.7
Le      3.3      4.2      1.8      0        0
De     −9.7     −4.0     −3.5     −2.9     −4.0
Wo      3.3      0       −1.8      0        1.4
Di      0       −2.0     −1.8      0        0
PR      0       13.6     11.8      7.9      2.7
In      0       −2.0     −1.8      1.5      0
Da      0        4.2      5.6      4.6      0

Q2 (medium frequency queries):

α     IR1(α)   IR3(α)   IR5(α)   IR10(α)  IR20(α)
Ti      6.7      8.7      3.4      3.0      0
An     23.1     31.6     30.4     21.4     15.2
Co     −6.2     −4.0      3.4      0        5.6
Le      6.7     −4.0      0        0       −5.3
De    −18.8     −8.0    −10.0     −8.8     −7.9
Wo      6.7     −4.0      0        0        0
Di     −6.2     −4.0      0        0        0
PR      6.7      4.2     11.1      6.2      2.7
In     −6.2     −4.0      0        0        0
Da     14.3      4.2      3.4      0        2.7

Observations:
• Anchortext is by far the most influential feature
• Title is very useful, too
• Content is ineffective for Q1, but is useful for Q2
• PR is useful, but does not have a huge impact
Challenges in Enterprise Search
David Hawking, CSIRO ICT Centre, GPO Box 664, Canberra, Australia 2601, David.Hawking@csiro.au

[Figure: P@1 (%) and S@1 (%) of the features description, URL words, anchors, content, subject, and title on four datasets: CSIRO (130 queries; 95,907 documents), Curtin Uni. (332 queries; 79,296 documents), DEST (62 queries; 8,416 documents), unimelb (415 queries)]

• This study confirms most of the findings of [Fagin 03] on 6 different Enterprise Webs (results for 4 datasets are shown)
• Anchortext and title are still the best
• Content is also useful
Summary
• Enterprise Web and Public Web exhibit significant structural differences
• These differences make some features that are very effective for public web search less effective for Enterprise Web Search
 – Anchortext is very useful (but there is much less of it)
 – Title is good
 – Content is questionable
 – PageRank is not as useful
Using User Feedback in Enterprise Web Search
Using User Feedback
• One of the most promising directions in Enterprise Search
 – Can trust the feedback (no spam)
 – Can provide incentives
 – Can design a system to facilitate feedback
 – Can actually implement it
• We will look at several different sources of feedback
 – Clicks (very briefly; a preference-extraction sketch follows below)
 – Explicit Annotations
 – Queries
 – Social Annotations
 – Browsing Traces
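For the click feedback above, a standard trick from (Joachims 02) is to interpret a click as a relative preference: the clicked result is preferred over results ranked above it that were skipped. A minimal sketch of that extraction (the input format here is invented for illustration):

```python
def preferences_from_clicks(ranking, clicked):
    """Extract 'clicked > skipped-above' pairwise preferences
    (the heuristic of Joachims 02) from one result page.

    ranking: list of document ids in the order shown to the user
    clicked: set of document ids the user clicked
    Returns (preferred, over) pairs usable for training a ranker.
    """
    pairs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs

# User saw d1..d4 and clicked only d3: d3 is preferred over d1 and d2.
print(preferences_from_clicks(["d1", "d2", "d3", "d4"], {"d3"}))
# -> [('d3', 'd1'), ('d3', 'd2')]
```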