Search Engines Issues
Avi Rappoport, Search Tools Consulting

Search Issues
• Enterprise Search Engines
• Corporate and institutional sites
• E-commerce
• Intranets
• P2P, Meta search and distributed search
• CMSs and Search Engines
• Security and Search

P2P Search
• Address the centralized index problem
• Everyone serves their content
• Gnutella and FreeNet (MP3s)
• OpenCOLA
• scientific collaborations
• auctions
• Does not scale
• Problems with completeness
• Privacy issues - what to share?

Meta Search
• Send queries to several sources
• text search engines
• databases
• email
• Extract text from result
• Display all together
• Successful on the Web
• Problems with “screen scraping”
• Problems with relevance ranking
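
The meta search flow on this slide (send one query to several sources, extract the hits, display them together) can be sketched as follows. This is a minimal illustration, not any product's implementation: `search_docs`, `search_mail` and `meta_search` are hypothetical names, and the two sources are in-memory stand-ins for a text search engine and an email store.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A source takes a query string and returns a list of result dicts.
Source = Callable[[str], List[dict]]


def search_docs(query: str) -> List[dict]:
    """Hypothetical stand-in for a text search engine."""
    docs = ["distributed search protocols", "meta search on the web"]
    return [{"title": d} for d in docs if query in d]


def search_mail(query: str) -> List[dict]:
    """Hypothetical stand-in for an email store."""
    mail = ["Re: search budget", "lunch on Friday"]
    return [{"title": m} for m in mail if query in m]


def meta_search(query: str, sources: Dict[str, Source]) -> List[dict]:
    """Send the query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
    merged = []
    for name, future in futures.items():
        for hit in future.result():
            # Tag each hit with its source so the display can group results.
            merged.append({"source": name, **hit})
    return merged


results = meta_search("search", {"docs": search_docs, "mail": search_mail})
```

The merged list is simply grouped by source. Nothing here makes scores from different sources comparable, which is the relevance ranking problem the slide mentions.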

Distributed Search
• Common language for query & response
• Transport mechanism (HTTP)
• Basic query syntax
• Single relevance score range
• Maybe standard algorithm
• Results with XML
• Deal with the “Best Sources” issue
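
To illustrate the "single relevance score range" point, here is a small sketch that rescales each source's raw scores to 0.0-1.0 before merging. Min-max scaling is an assumption chosen for simplicity, not an algorithm the slides prescribe, and the function names and sample scores are hypothetical.

```python
def normalize(hits):
    """Rescale (doc_id, raw_score) pairs from one source to 0.0-1.0."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is identical, treat all hits as equally relevant.
    return [(doc, 1.0 if span == 0 else (score - lo) / span) for doc, score in hits]


def merge(*result_sets):
    """Normalize each source separately, then sort all hits by score."""
    merged = [hit for hits in result_sets for hit in normalize(hits)]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


# Two hypothetical sources that score on very different scales.
source_a = [("a1", 900), ("a2", 300)]
source_b = [("b1", 8), ("b2", 5), ("b3", 2)]
ranked = merge(source_a, source_b)
```

Per-source scaling makes the ranges comparable but not the meaning of the scores: the best hit from a weak source still scores 1.0, which is one face of the "Best Sources" issue.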

Past Implementations
• Z39.50
• Pioneer, for better and worse
• Too complex, never finished
• Limited to speed of slowest server
• Harvest
• Early web system
• Stanford STARTS & LORE

Protocols
• JXTA
• Java distributed system at Sun
• XQuery
• XML equivalent to SQL
• no relevance ranking
• Open Archives Meta data
• export meta data about collections
• address “best source” issues
• Google APIs

Current Projects
• Science.gov
• Access to public databases
• Commercial Products
• Verity Federated Search
• Intelliseek, translates to SQL
• Library Systems
• MuseGlobal

Future
• Centralized search engines will index databases and other silos
• More meta search
• Complex databases
• Integrating library content
• Distributed search protocols
• Libraries are pioneers
• Middleware interpreters
• Sit between search and dbs
• Index and search time

Search & CMS
• CMS: Content Management System
• Related to document management
• Templates
• Workflow
• Editorial accountability
• Publishing

Search & CMS
• Navigation links are not enough
• Labels can be confusing
• Categories often limiting
• Search allows ad-hoc access
• Other ways of finding
• Wide variety in use of language
• Integrate CMS-generated pages with other content
• Avoid becoming data silos

Improve search
• Synchronize indexing & publishing
• Everything is current
• Only unique pages
• Duplicate pages a big problem for robots
• Content only
• No indexing of navigation text
• Actual content modification date
• Web servers often lie
• Require page titles
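
One way to keep "only unique pages" in the index is to fingerprint the content text of each page and skip repeats. The sketch below assumes navigation text has already been stripped and that case and whitespace differences do not matter; `content_fingerprint` and `unique_pages` are hypothetical helper names.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash the page content after collapsing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unique_pages(pages):
    """Keep the first URL seen for each distinct content fingerprint."""
    seen = {}
    for url, text in pages:
        seen.setdefault(content_fingerprint(text), url)
    return list(seen.values())


# Hypothetical crawl: the print view duplicates the main page.
crawled = [
    ("/a", "Hello  World"),
    ("/a?print=1", "hello world"),
    ("/b", "Other content"),
]
to_index = unique_pages(crawled)
```

An exact hash only catches pages whose content is identical after normalization. Near-duplicates with small differences would need a fuzzier comparison.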

Meta Data
• CMSs simplify meta data entry
• Use the Dublin Core
• Automate some meta tags
• Author, department
• Language & character set
• Subject tags
• Use controlled vocabulary
• Category "facets"
• Non-hierarchical attributes
• Based on content
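
As a sketch of automating meta tags, the function below turns a record of CMS fields into HTML `<meta>` tags using the common `DC.` naming convention for Dublin Core elements. The function name and the sample record are hypothetical; a real CMS template would supply the values.

```python
from html import escape


def dublin_core_meta(fields: dict) -> str:
    """Render Dublin Core fields as HTML meta tags, one tag per value."""
    lines = []
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            # Escape the value so quotes and ampersands cannot break the tag.
            content = escape(str(v), quote=True)
            lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record, as a CMS might hold it for one page.
page = {
    "title": "Search & Security",
    "creator": "A. Author",
    "language": "en",
    "subject": ["search", "access control"],
}
tags = dublin_core_meta(page)
```

Repeating the `DC.subject` tag once per term keeps each controlled-vocabulary value separate, which makes it easier for an indexer to treat them as facets.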

CMSs With Search
• Commercial
• Atomz Publish ASP
• divine Eprise
• Microsoft Site Server
• Plumtree
• Vignette
• Open Source
• OpenCMS
• Red Hat CMS
• Zope

External Search
• Integrate CMS content
• Search together with intranet, external content
• Indexing
• Robot crawler
• CMS API for indexing
• Syndication publishing
• RSS 1.0
• ICE
• Two features for one
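
The "two features for one" idea, where a syndication feed doubles as an indexing source, might look like this for RSS 1.0. The sketch uses the RSS 1.0 and RDF namespaces with Python's standard XML parser; the feed content and the `items_to_index` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical feed, as a CMS might publish it.
SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/feed">
    <title>Example CMS</title>
    <link>http://example.com/</link>
    <description>Recently published pages</description>
  </channel>
  <item rdf:about="http://example.com/pages/1">
    <title>First page</title>
    <link>http://example.com/pages/1</link>
    <description>Summary of the first page.</description>
  </item>
</rdf:RDF>
"""


def items_to_index(feed_xml: str):
    """Extract one indexable record per RSS 1.0 item."""
    root = ET.fromstring(feed_xml)
    docs = []
    # In RSS 1.0, items are direct children of the rdf:RDF root element.
    for item in root.findall("rss:item", NS):
        docs.append({
            "url": item.findtext("rss:link", default="", namespaces=NS),
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "description": item.findtext("rss:description", default="", namespaces=NS),
        })
    return docs


docs = items_to_index(SAMPLE_FEED)
```

A feed carries only a summary, so an indexer would typically still fetch each URL for the full text. The feed's value is in announcing what was published.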

Search & security
• Content security
• Private data types
• Access control issues
• Results with teaser content
• Hiding inaccessible results

Types of Private Data
• Personal Records
• Financial, legal, health, academic, employment, etc.
• Special case, very difficult
• Research and analysis
• Business discussions
• Sales proposals
• Licensed content
• Personal files and email

Protect Privacy
• Search should never expose private data to public view
• Use HTTPS encryption in transit
• Indexer client
• Serving search results
• Secure the index file and server against intrusion

Access Control
• Basic Authentication
• User name and password
• Lightweight security
• Indexer can store and issue
• File-based permissions for users and groups
• Windows NT Challenge & Response
• LDAP authorization systems
• Others...
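
For the Basic Authentication case, an indexer that stores a user name and password issues them as an `Authorization` header on each request. The sketch below shows only how that header is built; the function name is hypothetical, and the credentials are the well-known example pair from the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build the HTTP Basic Authorization header for a stored credential."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("Aladdin", "open sesame")
```

The credentials are encoded, not encrypted, which is why this counts as lightweight security and why the indexer should send them only over HTTPS.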

Indexing access
• Search indexer
• Becomes a “user”
• Member of all relevant groups
• Indexer must send passwords or certificates
• Store flag for the protected documents

Results as Teasers
• Show protected documents in search results
• Among public pages
• In a separate section
• Encourage payments or subscriptions
• Encourage registration
• Intranets
• Limited-access databases
• Other departments

Why Restrict?
• Showing in results is vulnerable to reverse engineering
• Example: search for “merger”
• If protected pages are displayed
• Employee or outsider can search for merger candidates
• Gleaning information from the existence of results

Permissions in Index
• Store the access permissions
• Mark for each document in the index
• Search engine checks before displaying
• Very fast at retrieval
• Index must be always current
• Good with CMS integration
• Replicate access control functionality
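
A minimal sketch of the permissions-in-index approach: each indexed document carries the groups allowed to see it, and the search function checks them before returning a hit. The data model (`IndexedDoc`, group names, substring matching) is an assumption for illustration, not a particular engine's design.

```python
from dataclasses import dataclass


@dataclass
class IndexedDoc:
    url: str
    text: str
    # Groups allowed to see the document; an empty set means public.
    allowed_groups: frozenset = frozenset()


def search(index, query, user_groups):
    """Return URLs of matching documents the user is allowed to see."""
    hits = []
    for doc in index:
        if query.lower() not in doc.text.lower():
            continue
        # Check the stored permissions before the hit is ever displayed.
        if doc.allowed_groups and not (doc.allowed_groups & set(user_groups)):
            continue
        hits.append(doc.url)
    return hits


# Hypothetical index with one public and one protected document.
index = [
    IndexedDoc("/news", "Merger rumors denied"),
    IndexedDoc("/board/plan", "Merger candidates", frozenset({"executives"})),
]
staff_view = search(index, "merger", {"staff"})
exec_view = search(index, "merger", {"executives"})
```

The check is a fast in-memory lookup, but it is only as accurate as the last indexing run, which is why the slide stresses that the index must always be current.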

Results-Time Check
• Work with access control system
• Ask about top batch of results
• Send user credentials and document info
• Ask if they’re allowed to see it
• Always current
• Can be a bit slow
• Can perform parallel requests
• Show results as they come back
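
The results-time check can be sketched with parallel requests that yield each permitted result as soon as its answer arrives. Here `may_view` is a hypothetical stand-in for a call to the real access control system, and the small in-memory ACL exists only to make the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def may_view(user: str, url: str) -> bool:
    """Hypothetical stand-in for asking the access control system."""
    acl = {"/board/plan": {"alice"}}  # protected URL -> users allowed
    return url not in acl or user in acl[url]


def visible_results(user, top_hits, check=may_view):
    """Check the top batch of hits in parallel, yielding permitted ones
    in the order the answers come back."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(check, user, url): url for url in top_hits}
        for future in as_completed(futures):
            if future.result():
                yield futures[future]


top_batch = ["/news", "/board/plan"]
bob_sees = set(visible_results("bob", top_batch))
alice_sees = set(visible_results("alice", top_batch))
```

Because results are yielded in completion order, the display order can differ from the relevance order. A real results page would need to decide whether to re-sort or to show hits as they arrive.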

Conclusions
• Meta and distributed search provide access to external content
• Indexing CMS content can be powerful and timely
• Search should never expose private data
• Integrate search with access control

More search info: www.searchtools.com
