Web Mining for Knowledge Discovery Using Ontology as Background Knowledge
Searching using Google’s rules • Google searches for all words • Google ignores many common words (stop words) • Google finds results anywhere in a document, not just in its text • Google returns pages ordered by PageRank, a measure that Google uses to gauge a page’s popularity • Proximity matters • Simple Google searches are limited to ten keywords • Google finds its results depending on words that occur in Web pages, not by analyzing your search phrase for its meaning
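A minimal sketch of two of the rules above (stop words are ignored, and a simple search is limited to ten keywords). The stop-word list and the function name `effective_keywords` are assumptions for illustration only; the slides do not give Google's actual list or any implementation detail.

```python
# Hypothetical stop-word list; the real list used by Google is not given here.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is"}

# Keyword limit for simple searches, as stated on the slide.
MAX_KEYWORDS = 10


def effective_keywords(query: str) -> list[str]:
    """Return the query words that would still take part in the search."""
    words = query.lower().split()
    # Drop common words, then keep at most MAX_KEYWORDS of what remains.
    kept = [w for w in words if w not in STOP_WORDS]
    return kept[:MAX_KEYWORDS]


if __name__ == "__main__":
    print(effective_keywords("the history of the web"))
```

The sketch only models which words survive; it says nothing about ranking or proximity.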
Search Engine Optimization • Search Engine Optimization (SEO) is the black magic, craft, or art (depending upon whom you ask) of writing or editing Web pages and sites so that they move up in search engine rankings and are returned at the top of a list of search results. • This is an important subject because if a Web page is not in the top search results, very few people can find it. Webmasters want to know about SEO to improve their rankings and increase traffic to their sites. • As a general rule, people don’t look past the first three pages (or 30 listings) of search results
SEO & Google Recommendations • Determine the most important keywords that are relevant to your content and use them in the titles, URLs, headings, and image tags on each page. • Pages whose content is updated often tend to get more attention than pages that don’t have anything new • Simple site designs are better than busy pages. • Create links from your pages out to relevant, popular Web pages (Outbound Links) • Request that sites that have content related to your pages link to you (Inbound Links)
Unwanted Results • SPAM Pages • Commercial Pages • Error Pages • Login Pages
Occurrence Operators
Synonym Operator • When you place the synonym operator, ~, directly in front of a search term (without any spaces), the search matches Web synonyms as well as the given search term • Google does not use a synonym lookup table or a thesaurus. Instead, synonyms are determined by Web usage of the term. • Accordingly, this method of discovering synonyms sometimes leads to some pretty weird results (try ~patient, ~zebra, and ~cheap)
Interpreting User Query • Part of the Semantic Web vision is to provide web-scale access to semantically described content. • In particular, this implies understanding users’ information needs accurately enough to retrieve a precise answer using semantic technologies. • Currently, however, most web search engines are based on purely statistical techniques.
Interpreting User Query • For restricted domains that can be formalized using ontologies, there is nevertheless hope that semantic technologies can be put to work to allow for more semantics-based search • Users are accustomed to expressing their information needs via simple keyword-based queries. • There is substantial recent work on interpreting full natural language questions semantically w.r.t. an ontology
Available Approaches • Several approaches interpret keyword queries using the background knowledge available in ontologies. • One approach translates a keyword query into a DL conjunctive query which can be evaluated with respect to an underlying knowledge base (KB) • Another line of work translates keywords into XML-based queries, e.g. by interpreting keywords as XQuery queries on XML data.
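To make the idea of translating keywords into a conjunctive query concrete, here is a deliberately naive sketch. The lexicon, the term names (`Publication`, `Person`, `hasAuthor`), and the function `to_conjunctive_query` are all hypothetical; the approaches cited above explore the ontology structure far more carefully than this lookup does.

```python
# Hypothetical toy lexicon: keyword -> (kind, ontology term).
LEXICON = {
    "paper": ("concept", "Publication"),
    "person": ("concept", "Person"),
    "author": ("relation", "hasAuthor"),
}


def to_conjunctive_query(keywords):
    """Map keywords to ontology terms and join the resulting atoms."""
    concepts, relations = [], []
    for kw in keywords:
        entry = LEXICON.get(kw.lower())
        if entry is None:
            continue  # keyword has no counterpart in the toy lexicon
        kind, term = entry
        (concepts if kind == "concept" else relations).append(term)

    # One variable per concept keyword, one unary atom per concept.
    variables = [f"?x{i}" for i in range(len(concepts))]
    atoms = [f"{c}({v})" for c, v in zip(concepts, variables)]

    for rel in relations:
        # Pad with fresh variables so a binary atom always has two arguments.
        while len(variables) < 2:
            variables.append(f"?x{len(variables)}")
        # Naive choice: every relation links the first two variables.
        atoms.append(f"{rel}({variables[0]}, {variables[1]})")

    return " AND ".join(atoms)


if __name__ == "__main__":
    print(to_conjunctive_query(["paper", "author", "person"]))
```

Linking every relation to the first two variables is a simplification chosen to keep the sketch short; it is not how the cited systems pick join variables.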
Available Contributions • There has already been work on translating keywords to semantic queries. One such approach proposes mapping keywords to corresponding WordNet synsets. • SemSearch also aims at answering complex keyword queries by translating them into a logical query.
Available Contributions • My approach has three parts. The first part, like previous approaches, translates user keywords into a formal query using an ontology. Previous approaches use a ready-made ontology. My contribution attempts to build the domain ontology automatically or semi-automatically, using the same search results or results from earlier searches. • The second part uses the built ontology to build a hierarchical structure for the results. This structure is built from the keywords extracted from the results that map to their counterparts in the ontology.
My Approach • Bush -> Afghanistan -> Democracy, Violence • Bush -> Afghanistan -> NATO Secretary General • Bush -> Afghanistan -> Sending Troops – Bush -> Afghanistan -> Sending Troops -> More Troops by 2009 – Bush -> Afghanistan -> Sending Troops -> Casualty of War
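The paths on this slide can be merged into a single hierarchy. The following is a minimal sketch using nested dictionaries; the function names `build_hierarchy` and `render` are assumptions for illustration, and the step that maps result keywords to ontology terms is assumed to have already produced the paths.

```python
def build_hierarchy(paths):
    """Merge label paths into one nested-dict tree (shared prefixes collapse)."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            # Reuse the child if it already exists, otherwise create it.
            node = node.setdefault(label, {})
    return root


def render(tree, indent=0):
    """Return the tree as indented lines, one label per line."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * indent + label)
        lines.extend(render(children, indent + 1))
    return lines


# Paths taken from the slide.
PATHS = [
    ["Bush", "Afghanistan", "Democracy, Violence"],
    ["Bush", "Afghanistan", "NATO Secretary General"],
    ["Bush", "Afghanistan", "Sending Troops", "More Troops by 2009"],
    ["Bush", "Afghanistan", "Sending Troops", "Casualty of War"],
]

if __name__ == "__main__":
    print("\n".join(render(build_hierarchy(PATHS))))
```

Nested dictionaries keep insertion order in Python 3.7+, so the rendered tree follows the order in which the paths were added.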
My Approach • The final part is summarization • Summarization would be performed both at the level of a single document and at the level of a collection of documents
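The slides do not say which summarization method is intended. As a stand-in only, the sketch below uses simple word-frequency sentence scoring to show the difference between the two levels; `summarize` and `summarize_collection` are hypothetical names, and the technique is an assumption, not the author's method.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n=1):
    """Document level: keep the n sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]


def summarize_collection(documents, n=1):
    """Collection level: score sentences against word counts of all documents."""
    return summarize(" ".join(documents), n)


if __name__ == "__main__":
    docs = ["Troops were sent. The weather was mild.", "More troops may follow."]
    print(summarize(docs[0]))
    print(summarize_collection(docs))
```

At the collection level the word frequencies come from all documents together, so a sentence can rank differently than it does within its own document.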