Principles of Knowledge Discovery in Data
Fall 2004
Chapter 9: Web Mining
Dr. Osmar R. Zaïane
University of Alberta
Dr. Osmar R. Zaïane, 1999-2004

Course Content
• Introduction to Data Mining
• Data warehousing and OLAP
• Data cleaning
• Data mining operations
• Data summarization
• Association analysis
• Classification and prediction
• Clustering
• Web Mining
• Multimedia and Spatial Data Mining
• Other topics if time permits

Chapter 9 Objectives
Understand the different knowledge discovery issues in data mining from the World Wide Web.
Distinguish between resource discovery and knowledge discovery from the Internet.

Outline
• Introduction to Web Mining
  – What are the incentives of web mining?
  – What is the taxonomy of web mining?
• Web Content Mining: Getting the Essence From Within Web Pages.
• Web Structure Mining: Are Hyperlinks Information?
• Web Usage Mining: Exploiting Web Access Logs.
• Warehousing the Web
WWW: Facts
• No standards, unstructured and heterogeneous
• Growing and changing very rapidly
  – One new WWW server every 2 hours
  – 5 million documents in 1995
  – 320 million documents in 1998
  – More than 1 billion in 2000
  – How many today?
• Need for better resource discovery and knowledge extraction.
[Figure: Internet growth, number of hosts, Sep-69 to Sep-99]
The Asilomar Report urges the database research community to contribute to deploying new technologies for resource and information retrieval from the World-Wide Web.

WWW: Incentives
• Enormous wealth of information on the web
• The web is a huge, widely distributed collection of:
  – Documents of all sorts (static as well as dynamically generated content and services)
  – Hyper-link information
  – Access and usage information
• Mining interesting nuggets of information leads to a wealth of information and knowledge
• Challenge: Unstructured, huge, dynamic.

WWW and its Problems
• Web: A huge, widely-distributed, highly heterogeneous, semi-structured, interconnected, evolving, hypertext/hypermedia information repository.
• Problems:
  – the "abundance" problem:
    • 99% of info of no interest to 99% of people
  – limited coverage of the Web:
    • hidden Web sources, majority of data in DBMS.
  – limited query interface based on keyword-oriented search
  – limited customization to individual users

Web Mining
• Web mining is the application of data mining techniques and other means of knowledge extraction to integrate information gathered over the World Wide Web in all its forms: content, structure or usage. The integrated information is useful for:
  – Understanding on-line user behaviour;
  – Retrieving/consolidating relevant knowledge/resources;
  – Evaluating the effectiveness of particular web sites or web-based applications.
• Web mining research integrates research from Databases, Data Mining, Information retrieval, Machine learning, Natural language processing, software agent communication, etc.
Challenges for Web Applications
• Finding Relevant Information (high-quality Web documents on a specified topic/concept/issue.)
• Creating knowledge from Information available
• Personalization of the information
• Learning about customers / individual users; understanding user navigational behaviour; understanding on-line purchasing behaviour.
Web Mining can play an important Role!

Web Mining Taxonomy
• Web Mining
  – Web Content Mining
    • Web Page Content Mining
    • Search Result Mining
  – Web Structure Mining
  – Web Usage Mining
    • General Access Pattern Tracking
    • Customized Usage Tracking

Web Mining Taxonomy: Web Content Mining, Web Page Content Mining
Web Page Summarization
• WebLog (Lakshmanan et al. 1996), WebOQL (Mendelzon et al. 1998) …: Web structuring query languages; can identify information within given web pages
• Ahoy! (Etzioni et al. 1997): Uses heuristics to distinguish personal home pages from other web pages
• ShopBot (Etzioni et al. 1997): Looks for product prices within web pages

Web Mining Taxonomy: Web Content Mining, Search Result Mining
Search Engine Result Summarization
• Clustering Search Result (Leouski and Croft, 1996, Zamir and Etzioni, 1997): Categorizes documents using phrases in titles and snippets
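The search-result clustering bullet (categorizing documents using phrases in titles and snippets) can be illustrated with a minimal sketch. This is not the algorithm of the cited systems; it only groups results that share a word bigram. The function names, the bigram length, and the sample results are all hypothetical.

```python
import re
from collections import defaultdict


def phrases(text, n=2):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_by_shared_phrase(results, n=2, min_docs=2):
    """Map each phrase shared by at least min_docs results to their sorted ids.

    results: dict {doc_id: (title, snippet)}; hypothetical input shape.
    """
    index = defaultdict(set)
    for doc_id, (title, snippet) in results.items():
        for phrase in phrases(title, n) | phrases(snippet, n):
            index[phrase].add(doc_id)
    return {p: sorted(ids) for p, ids in index.items() if len(ids) >= min_docs}


# Made-up search results for the ambiguous query "jaguar".
results = {
    1: ("Jaguar cars for sale", "New and used jaguar cars"),
    2: ("Jaguar habitat", "The big cat lives in rain forest"),
    3: ("Used jaguar cars", "Dealers of jaguar cars near you"),
    4: ("Big cat conservation", "Protecting the big cat in the wild"),
}
clusters = group_by_shared_phrase(results)
```

In this toy input, results 1 and 3 end up grouped under "jaguar cars" and results 2 and 4 under "big cat", which is the kind of separation a phrase-based grouping of snippets aims for.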
Web Mining Taxonomy: Web Structure Mining
Using Links
• Hypursuit (Weiss et al. 1996)
• PageRank (Brin et al., 1998)
• CLEVER (Chakrabarti et al., 1998)
Use interconnections between web pages to give weight to pages.
Using Generalization
• MLDB (1994), VWV (1998)
Uses a multi-level database representation of the Web. Counters (popularity) and link lists are used for capturing structure.

Web Mining Taxonomy: Web Usage Mining, General Access Pattern Tracking
• Knowledge from web-page navigation (Shahabi et al., 1997)
• WebLogMining (Zaïane, Xin and Han, 1998)
• SpeedTracer (Wu, Yu, Ballman, 1998)
• Wum (Spiliopoulou, Faulstich, 1998)
• WebSIFT (Cooley, Tan, Srivastava, 1999)
Uses KDD techniques to understand general access patterns and trends. Can shed light on better structure and grouping of resource providers as well as network and caching improvements.

Web Mining Taxonomy: Web Usage Mining, Customized Usage Tracking
• Adaptive Sites (Perkowitz & Etzioni, 1997): Analyzes the access patterns of one user at a time. The web site restructures itself automatically by learning from user access patterns.
• Personalization (SiteHelper: Ngu & Wu, 1997. WebWatcher: Joachims et al., 1997. Mobasher et al., 1999): Provides recommendations to web users.

Outline
• Introduction to Web Mining
  – What are the incentives of web mining?
  – What is the taxonomy of web mining?
• Web Content Mining: Getting the Essence From Within Web Pages.
• Web Structure Mining: Are Hyperlinks Information?
• Web Usage Mining: Exploiting Web Access Logs.
• Warehousing the Web
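The link-based idea listed under Web Structure Mining (using interconnections between web pages to give weight to pages) can be sketched as a simplified PageRank-style power iteration. This is an illustration under stated assumptions, not the exact formulation of any cited system: the damping value 0.85, the fixed iteration count, the handling of pages without outgoing links, and the toy graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Weight pages by their incoming links using power iteration.

    links: dict {page: [pages it links to]}; hypothetical input shape.
    Returns a dict {page: weight} whose weights sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline weight.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its weight evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web of four pages; D links out but nothing links to it.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
weights = pagerank(toy_web)
```

In this toy graph, page C is linked from three pages and receives the largest weight, while page D, which no page links to, keeps only the baseline weight.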