eyeShot Multimedia Search Engine


  1. eyeShot Multimedia Search Engine: “Extracting text patterns from the WWW to characterize Multimedia Resources”. A project by Demetris Zeinalipour & Theodoros Folias. Online Demo: http://www.cs.ucr.edu/~csyiazti/eyeshot/

  2. The Problem
     • Many multimedia resources on the WWW are not indexed efficiently by the various WWW search engines. It is estimated that one third of the web consists of images alone.
     • Many proprietary solutions exist:
       1. Specific image engines (WebSeer, GoogleImage, WebSeek, etc.).
       2. Specific streaming-audio search engines (SpeechBot searches an archive of 6,500 hours of online radio-show transcripts).
       3. Specific MP3 engines (SingingFish.com has developed what it claims to be the largest index of MP3 files).
     • These solutions are content specific: searching inside the multimedia file relies on content-processing techniques such as image processing or audio-to-text conversion.
     • But the number of formats targeted at the Web is growing exponentially. Will we need to design a proprietary search engine for each particular resource type? Does that scale?

  3. The Solution
     • We argue that the WWW was designed for text. All other resources are supplementary.
     • This means that someone who publishes a multimedia resource will first have an HTML page from which that resource is linked.
     • Our approach: analyze the text-based pages that link to a particular multimedia resource and use them to characterize it.
     • This solution scales as the number of file formats increases, since it makes no use of resource-specific details.
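The first step of this approach, spotting which links on a text page point at multimedia resources, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the eyeShot implementation: the extension list, the class name `MediaLinkFinder`, and the function `find_media_links` are assumptions made for the example, and the slides do not say which formats or tags eyeShot actually inspects.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical extension list; the slides do not enumerate the formats handled.
MEDIA_EXTENSIONS = {".jpg", ".gif", ".png", ".mp3", ".wav", ".mpg", ".avi"}


class MediaLinkFinder(HTMLParser):
    """Collects absolute URLs of linked or embedded multimedia resources."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            target = attributes.get("href")
        elif tag in ("img", "embed"):
            target = attributes.get("src")
        else:
            target = None
        if not target:
            return
        # Resolve relative links against the page that hosts them.
        url = urljoin(self.base_url, target)
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        # Only the file extension is inspected; the resource itself is never opened.
        if extension in MEDIA_EXTENSIONS:
            self.resources.append(url)


def find_media_links(html, base_url):
    """Return the multimedia URLs referenced by one HTML page, in document order."""
    finder = MediaLinkFinder(base_url)
    finder.feed(html)
    return finder.resources
```

Because the check looks only at the URL, supporting a new format in this sketch means adding one extension to the set, which is the scaling property the slide argues for.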

  4. Design & Implementation Details
     • The title: <title>NBA … Jordan</title>
     • Captions and bold text: <h2><b>Michael Jordan</b></h2>
     • Surrounding text: {#23, Brooklyn, North, Carolina, Washington, Wizards, College…}
     The resource is characterized by the features above. Its overall rank is determined by the importance of the page that hosts it.
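The three feature groups named on this slide (title, captions and bold text, surrounding text) can be separated with a small HTML parser. The sketch below is an assumption-laden illustration, not the authors' page processor: the set of emphasis tags, the tokenizing regular expression, and the names `PageFeatures` and `characterize` are all hypothetical, and the sample page in the usage note is invented.

```python
from html.parser import HTMLParser
import re

# Assumed set of tags treated as "captions and bold text".
EMPHASIS_TAGS = {"h1", "h2", "h3", "b", "strong"}


class PageFeatures(HTMLParser):
    """Splits the words of a page into title, emphasized, and surrounding text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.title = []
        self.emphasized = []
        self.surrounding = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed tags such as <img>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        words = re.findall(r"[A-Za-z0-9#]+", data)
        if not words:
            return
        if "title" in self.open_tags:
            self.title.extend(words)
        elif EMPHASIS_TAGS.intersection(self.open_tags):
            self.emphasized.extend(words)
        else:
            self.surrounding.extend(words)


def characterize(html):
    """Return the keyword groups that describe resources linked from this page."""
    parser = PageFeatures()
    parser.feed(html)
    return {
        "title": parser.title,
        "emphasized": parser.emphasized,
        "surrounding": parser.surrounding,
    }
```

For example, `characterize('<title>Campus tour</title><h2><b>Bell Tower</b></h2><p>Riverside landmark <img src="tower.jpg"></p>')` places `Campus` and `tour` under the title, `Bell` and `Tower` under emphasized text, and `Riverside` and `landmark` under surrounding text. How eyeShot weights these groups, or how it derives page importance for ranking, is not stated in the slides.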

  5. eyeShot Multimedia Search Engine: Architecture [diagram; only its labels survive in this transcript]. (1) WebRACE High Performance Crawler: seed.txt, URL-Queue, URL fetchers, request queue, Object Cache, filter, meta-info store; operations crawl(URL, depth), crawl(URL, depth-1, owners), getState(URL), store(URL). (2) eyeShot Index Server: eyeShot Page Processor, HTML Filtering Processors, Lexicon Generator, Coordinator, index request queue, Object Validation, Lexicon, Index, eyeShot UI, Web Clients; operations add(URL,{keywords}), add(keyword,{urls}).
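The two index operations labeled in the diagram, one taking a URL with its keywords and one mapping a keyword to URLs, suggest an inverted index. The sketch below shows that general technique under stated assumptions: the class name `Lexicon`, the in-memory dictionary, the lowercase normalization, and the AND semantics of `lookup` are choices made for illustration and are not confirmed by the slides.

```python
from collections import defaultdict


class Lexicon:
    """In-memory inverted index: keyword -> set of resource URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, keywords):
        # Invert add(URL, {keywords}) into one keyword -> {urls} posting per keyword.
        for keyword in keywords:
            self.postings[keyword.lower()].add(url)

    def lookup(self, query):
        # Assumed semantics: return resources that match every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        matches = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*matches)
```

A lookup in this structure is a handful of dictionary reads followed by a set intersection, which is why keyword search over a prebuilt index can be fast; the persistent Lexicon and Index stores shown in the diagram are not modeled here.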

  6. Online Demo Demonstration. Target Web: www.cs.ucr.edu. http://www.cs.ucr.edu/~csyiazti/eyeshot/

  7. Online Demo Demonstration. Target Web: www.cs.ucr.edu. Lookup time = 7 milliseconds. http://www.cs.ucr.edu/~csyiazti/eyeshot/
