Thoughts on Federated and Aggregated Search Architectures
Enterprise Search Summit 2010 NY
Avi Rappoport, Search Tools Consulting
www.searchtools.com / consult2@searchtools.com
Avi Rappoport / Enterprise Search Summit NY / May 2010 / consult2@searchtools.com

Information Scatter
• Internal
  • Local and remote file shares
  • Email
  • CMS/DMS
  • Application portals
  • Knowledge bases
  • Multimedia, digital assets
• External
  • Research papers and other gated content
  • Public-facing sites
  • The Web

Federated Search Diagram
[Diagram]

Solution 1: Federate Searching
• aka “MetaSearch”
• Single Search Interface
• Accepts queries and converts to various formats
• Sends queries to multiple external search engines
  • Includes user authentication
• Collects result lists
  • In external search relevance order
• Collates and sorts by relevance
  • Single list or in panels
  • Can be dynamically updated
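
The federated flow on the "Solution 1" slide (send the query to several engines, collect each result list, collate by relevance) can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the two source functions are hypothetical stand-ins for remote search engines, and the per-source min-max score normalization is an assumed way of making scores from different engines comparable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote search engines. Each returns
# (title, score) pairs in its own relevance order and on its own scale.
def search_catalog(query):
    return [("Catalog: " + query + " overview", 12.0),
            ("Catalog: " + query + " specs", 4.0)]

def search_support(query):
    return [("Support: " + query + " setup", 0.9),
            ("Support: " + query + " FAQ", 0.3)]

SOURCES = {"catalog": search_catalog, "support": search_support}

def normalize(results):
    """Rescale one source's scores to 0..1 so sources can be compared."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(title, (s - lo) / span) for title, s in results]

def federated_search(query):
    """Send the query to every source in parallel, then collate by score."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in SOURCES.items()}
        merged = []
        for name, future in futures.items():
            for title, score in normalize(future.result()):
                merged.append((score, name, title))
    # Highest normalized score first; ties are broken by source name.
    merged.sort(key=lambda r: (-r[0], r[1]))
    return merged

results = federated_search("printer")
```

Returning the source name with each hit also supports the "in panels" presentation: the same merged list can be grouped by source instead of shown as a single list.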

Federated Results: Apple Site
Federated Results: Science.gov
[Two screenshots of federated results pages. Callouts: Special sources; Product info; Store Items; Main Results; Support pages; Dynamic facets from results clustering]

Federate at Search Time
[Graphic by AJ Summers, some rights reserved]

Solution 2: Aggregate Indexing
aka “Unified Information Access”
• Gather all possible data
  • Robot crawlers on intranets
  • RSS blog feeds
  • Automated connectors
  • Custom scripts
• Store in a single index
  • Include access control information
• Simple to search all at once
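
Aggregation as described on the "Solution 2" slide (gather everything, store it in a single index, include access control information) can be illustrated with a tiny in-memory inverted index. This is a sketch under assumptions: the documents, group names, and whitespace tokenizer are invented for illustration, and a real engine would add stemming, ranking, and persistent storage.

```python
from collections import defaultdict

class AggregatedIndex:
    """Toy single index: term -> document ids, plus stored ACLs per document."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of doc ids
        self.acl = {}                      # doc id -> set of allowed groups
        self.source = {}                   # doc id -> originating repository

    def add(self, doc_id, text, source, allowed_groups):
        self.acl[doc_id] = set(allowed_groups)
        self.source[doc_id] = source
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, user_groups):
        """Return ids of documents that match every term and that the user may see."""
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        visible = [d for d in matches if self.acl[d] & set(user_groups)]
        return sorted(visible)

# Hypothetical content gathered from three different repositories.
index = AggregatedIndex()
index.add("fs-1", "Quarterly sales report", "file share", ["sales", "execs"])
index.add("cms-7", "Sales training guide", "CMS", ["everyone"])
index.add("crm-3", "Sales pipeline report", "CRM", ["sales"])

sales_hits = index.search("sales report", ["sales"])
public_hits = index.search("sales", ["everyone"])
```

Because the access control lists are stored with the documents, one lookup answers both "what matches" and "what may this user see", without contacting the source systems at query time.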

Aggregate At Index Time
[Graphic by AJ Summers, some rights reserved]

Aggregated Search Diagram
[Diagram]

Aggregate Results: HP.com
[Screenshot]

Sources of Content
Federating
• Lotus Notes
• News feeds and archives
• Legal: Westlaw, Lexis
• Government Documents
• Patents, Census
• Multi-national materials
• Academic journal portals
• Large social networks
• The Web
Aggregating
• Enterprise intranets
• File servers
• Sharepoint
• CMS/DMS
  • usually have awful search
• Data warehouses
• Current CRM
• Legal discovery

Keeping Content Current
Federating
• Source internal updates
  • Near-real-time
• Content may change
  • Between queries
  • No notification
• Minimal bandwidth
Aggregating
• Depend on connectors
  • Frequent polling
  • Automated notification
  • Programmatic triggers
  • Open Archives Initiative
• Re-crawling
• Merging Updates
• Scale issues
• Content may change
  • Between index runs
  • Some notification

Preparation Process
Federating:
• Analyze sources
• Test search connectors
• Store source information
Aggregating:
• Index
  • Match data connectors
  • Open each file or record
  • Tokenize, stem words
  • De-duplicate
  • Taxonomy
  • Store words and documents
• Scale issues
  • Hardware and software
  • Bandwidth requirements
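
For aggregation, the "Keeping Content Current" slide lists frequent polling and re-crawling, and notes that content may change between index runs. One common way to limit re-indexing work is to compare content fingerprints between runs. The sketch below assumes that approach (the slides do not prescribe it), and the snapshot dictionaries are hypothetical stand-ins for what a connector would return.

```python
import hashlib

def fingerprint(text):
    """Stable content hash used to notice changes between index runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def diff_snapshot(previous, current):
    """Compare stored fingerprints with a freshly polled snapshot.

    previous: doc id -> fingerprint saved at the last index run
    current:  doc id -> document text returned by this poll
    Returns (added, changed, deleted) lists of document ids.
    """
    added = sorted(d for d in current if d not in previous)
    deleted = sorted(d for d in previous if d not in current)
    changed = sorted(
        d for d in current
        if d in previous and previous[d] != fingerprint(current[d])
    )
    return added, changed, deleted

# Hypothetical state: what was indexed last time vs. what the poll sees now.
last_run = {"a": fingerprint("old policy"), "b": fingerprint("price list")}
this_poll = {"a": "new policy", "b": "price list", "c": "press release"}
added, changed, deleted = diff_snapshot(last_run, this_poll)
```

Only the added and changed documents need to be re-tokenized and merged into the index; unchanged ones are skipped, which eases the bandwidth and scale issues the slides mention.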

Security & Access Control
Federating
• Send user credentials with query
• Depends on source security
• Automation can be hard
• Always current
Aggregating
• Early-binding
  • Index and store ACL info
  • Update index on changes
• Late-binding
  • For each result
  • Send authorization request
  • OK if item is allowed
  • Repeat until 10 items are allowed

Search and Retrieval Process
Federating
• Convert & send query
  • z39.50, RDW
  • HTTP, Web Services
  • OAI - Open Archives
  • Custom connectors
• Network and source speed
• Collect results
  • Standardized formats, XML
  • Screen-scraping
• Cache frequent results
Aggregating
• Single syntax
• No delay
• Results in standard format
• Cache frequent results
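
The late-binding steps on the "Security & Access Control" slide (for each result, send an authorization request, keep the item if it is allowed, repeat until 10 items are allowed) can be sketched as a filtering loop. The `is_allowed` callback is a hypothetical stand-in for a call to the source system's security service, and the cache is an assumption about how repeated checks might be avoided.

```python
def late_binding_filter(ranked_ids, user, is_allowed, page_size=10, cache=None):
    """Walk the ranked results, keeping only items the user may see.

    Stops as soon as page_size allowed items have been collected, so
    lower-ranked results are never checked unnecessarily.
    """
    cache = {} if cache is None else cache
    page = []
    for doc_id in ranked_ids:
        key = (user, doc_id)
        if key not in cache:
            cache[key] = is_allowed(user, doc_id)   # one authorization request
        if cache[key]:
            page.append(doc_id)
            if len(page) == page_size:
                break
    return page

# Hypothetical permission check standing in for a remote security service.
calls = []
def fake_is_allowed(user, doc_id):
    calls.append(doc_id)
    return doc_id % 2 == 0   # this user may see even-numbered documents only

first_page = late_binding_filter(range(1, 101), "alice", fake_is_allowed)
```

In this example half of the results are hidden from the user, so filling a page of 10 takes 20 authorization requests. The cost grows with the share of results the user cannot see, which is the trade-off against early binding, where the check is a lookup in the index.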

Relevance Ranking
Federating
• Combine results listings
• Duplicate detection here
• Overall source relevance
• May re-rank
• Source ranking is quirky
• Based on metadata
• User activities
Aggregating
• One results listing
• Still may need de-duping
• Get single relevance rank
• Very fast
• IDF: Inverse Document Frequency
  • Rare words in index
• Boost for more matches

Checklist Per Data Source
• Federation
  • Source interaction at search time
  • External content, databases, channels
  • Always-current content and access control
  • Slower response time, tricky relevance
• Aggregation
  • Source interaction at index time
  • Very large index files
  • Content and access control updates trickier
  • Fast response time, straightforward relevance

Be open-minded, analyze the benefits of each approach for each data source.
[Figure label: Aggregator]
Post or blog about your experiences
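
The "Relevance Ranking" slide lists IDF (rare words in the index) and a boost for more matches on the aggregating side. A small worked example, assuming the smoothed form idf(t) = ln((N + 1) / (df(t) + 1)) + 1, which is one of several common variants (the slides do not specify a formula), and an invented three-document corpus:

```python
import math
from collections import Counter

# Hypothetical corpus held in one aggregated index.
docs = {
    "d1": "search engine index",
    "d2": "search results ranking",
    "d3": "search federation",
}

def idf_table(documents):
    """Compute a smoothed IDF for every term from index-wide document counts."""
    n = len(documents)
    df = Counter()
    for text in documents.values():
        df.update(set(text.split()))   # count each term once per document
    return {term: math.log((n + 1) / (count + 1)) + 1 for term, count in df.items()}

def score(query, text, idf):
    """Sum the IDF of each query term present in the document."""
    words = set(text.split())
    return sum(idf.get(t, 0.0) for t in query.split() if t in words)

idf = idf_table(docs)
ranked = sorted(docs, key=lambda d: -score("search ranking", docs[d], idf))
```

Here "search" appears in every document and so carries the lowest weight, while "ranking" appears in only one and lifts that document to the top. Summing over matched terms gives the boost for more matches. A federating layer cannot compute these weights consistently, because each source holds its own document counts.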