Web Technologies and Applications
Winter 2001
CMPUT 499: Web Mining
Dr. Osmar R. Zaïane
University of Alberta

Dr. Osmar R. Zaïane, 2001 Web Technologies and Applications University of Alberta

Course Content
• Introduction
• Internet and WWW
• Protocols
• HTML and beyond
• Animation & WWW
• Java Script
• Dynamic Pages
• Perl Intro.
• Java Applets
• Databases & WWW
• SGML / XML
• Managing servers
• Search Engines
• Web Mining
• CORBA
• Security Issues
• Selected Topics
• Projects

Outline of Lecture 14
Web Mining
• Introduction to Data Mining
• Introduction to Web Mining
– What are the incentives of web mining?
– What is the taxonomy of web mining?
• Web Content Mining: Getting the Essence From Within Web Pages.
• Web Structure Mining: Are Hyperlinks Information?
• Web Usage Mining: Exploiting Web Access Logs.

Objectives of Lecture 14
Web Mining
• Get an overview of the functionalities and the issues in data mining.
• Understand the different knowledge discovery issues in data mining from the World Wide Web.
• Distinguish between resource discovery and knowledge discovery from the Internet.
• Present some problems and explore cutting-edge solutions.
We Are Data Rich but Information Poor
• Databases are too big
• Data Mining can help discover knowledge
Terrorbytes

What Should We Do?
We are not trying to find the needle in the haystack because DBMSs know how to do that.
We are merely trying to understand the consequences of the presence of the needle, if it exists.

What Led Us To This?
Necessity is the Mother of Invention
• Technology is available to help us collect data
– Bar code, scanners, satellites, cameras, etc.
• Technology is available to help us store data
– Databases, data warehouses, variety of repositories…
• We are starving for knowledge (competitive edge, research, etc.)
We are swamped by data that continuously pours on us.
1. We do not know what to do with this data
2. We need to interpret this data in search for new knowledge

Evolution of Database Technology
• 1950s: First computers, use of computers for census
• 1960s: Data collection, database creation (hierarchical and network models)
• 1970s: Relational data model, relational DBMS implementation.
• 1980s: Ubiquitous RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
• 1990s: Data mining and data warehousing, massive media digitization, multimedia databases, and Web technology.
Notice that storage prices have consistently decreased in the last decades
A Brief History of Data Mining Research
• 1989 IJCAI Workshop on Knowledge Discovery in Databases (Piatetsky-Shapiro)
– Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
• 1991-1994 Workshops on Knowledge Discovery in Databases
– Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
• 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98)
– Journal of Data Mining and Knowledge Discovery (1997)
• 1998-2000 ACM SIGKDD’98-2000 conferences

What Is Our Need?
Extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections.
[Figure labels: Data, Knowledge]

What kind of information are we collecting?
• Business transactions
• Scientific data
• Medical and personal data
• Surveillance video and pictures
• Satellite sensing
• Games

Data Collected (Cont’d)
• Digital media
• CAD and Software engineering
• Virtual worlds
• Text reports and memos
• The World Wide Web
What are Data Mining and Knowledge Discovery?
Knowledge Discovery:
Process of non-trivial extraction of implicit, previously unknown and potentially useful information from large collections of data

Many Steps in KD Process
• Gathering the data together
• Cleanse the data and fit it in together
• Select the necessary data
• Crunch and squeeze the data to extract the essence of it
• Evaluate the output and use it

Data Mining: A KDD Process
– Data mining: the core of knowledge discovery process.
[Figure: the KDD process. Recoverable labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection and Transformation, Task-relevant Data, Pattern Evaluation]

So What Is Data Mining?
• In theory, Data Mining is a step in the knowledge discovery process. It is the extraction of implicit information from a large dataset.
• In practice, data mining and knowledge discovery are becoming synonyms.
• There are other equivalent terms: KDD, knowledge extraction, discovery of regularities, patterns discovery, data archeology, data dredging, business intelligence, information harvesting…
• Notice the misnomer for data mining. Shouldn’t it be knowledge mining?
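The steps listed under "Many Steps in KD Process" can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, field names, and function names below are invented for the example, and the "mining" step is reduced to a plain frequency count rather than any particular algorithm from the lecture.

```python
from collections import Counter

# Hypothetical raw records from two sources (invented for illustration).
SOURCE_A = [
    {"tid": 1, "city": "Edmonton", "item": "bread"},
    {"tid": 2, "city": "edmonton ", "item": "milk"},
    {"tid": 3, "city": None, "item": "bread"},
]
SOURCE_B = [
    {"tid": 4, "city": "Calgary", "item": "bread"},
    {"tid": 5, "city": "Edmonton", "item": "bread"},
]


def gather(*sources):
    """Gather the data together: merge every source into one list."""
    return [record for source in sources for record in source]


def cleanse(records):
    """Cleanse the data: drop incomplete records, normalize city names."""
    cleaned = []
    for record in records:
        if any(value is None for value in record.values()):
            continue
        cleaned.append({**record, "city": record["city"].strip().title()})
    return cleaned


def select(records, city):
    """Select the necessary (task-relevant) data."""
    return [record for record in records if record["city"] == city]


def mine(records):
    """Crunch the data; here simply count how often each item occurs."""
    return Counter(record["item"] for record in records)


def evaluate(counts, min_count):
    """Evaluate the output: keep only items seen at least min_count times."""
    return {item: n for item, n in counts.items() if n >= min_count}


def kdd(city, min_count):
    """Run the steps in the order given on the slide."""
    data = cleanse(gather(SOURCE_A, SOURCE_B))
    return evaluate(mine(select(data, city)), min_count)


if __name__ == "__main__":
    print(kdd("Edmonton", 2))
```

With the toy data above, `kdd("Edmonton", 2)` returns `{'bread': 2}`: one record is dropped during cleansing, one city name is normalized, and only bread reaches the threshold.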
KDD at the Confluence of Many Disciplines
• Database Systems: DBMS, Query processing, Datawarehousing, OLAP, …
• Artificial Intelligence: Machine Learning, Neural Networks, Agents, Knowledge Representation, …
• Information Retrieval: Indexing, Inverted files, …
• Visualization: Computer graphics, Human Computer Interaction, 3D representation, …
• High Performance Computing: Parallel and Distributed Computing, …
• Statistics: Statistical and Mathematical Modeling, …
• Other

Data Mining: On What Kind of Data?
• Flat Files
• Heterogeneous and legacy databases
• Relational databases and other DB: Object-oriented and object-relational databases
• Transactional databases
Transaction(TID, Timestamp, UID, {item1, item2,…})

Data Mining: On What Kind of Data?
• Data warehouses
[Figure: The Data Cube and The Sub-Space Aggregates. Recoverable labels: Group By, Cross Tab, Aggregate, Sum; By Time, By City, By Category, By Time & City, By Time & Category, By Category & City; Q1 to Q4; Drama, Comedy, Horror; Red Deer, Lethbridge, Calgary, Edmonton]
[Figure: Slice on January; Dice on Electronics and Edmonton]
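The data cube and its sub-space aggregates can be illustrated with a few lines of code: one sum per subset of the dimensions (time, city, category), plus slice and dice as filters on fixed dimension values. This is a minimal sketch, not a warehouse implementation. The sales facts and all function names are invented for the example; only the dimension names and values such as January, Edmonton and Electronics come from the slides.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "city", "category")

# Hypothetical sales facts: (time, city, category, amount). Invented data.
FACTS = [
    ("January", "Edmonton", "Electronics", 10),
    ("January", "Calgary", "Electronics", 5),
    ("January", "Edmonton", "Drama", 3),
    ("February", "Edmonton", "Electronics", 7),
]


def aggregate(facts, group_by):
    """Sum the amount for each combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(dim) for dim in group_by]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[pos] for pos in positions)
        totals[key] += fact[-1]
    return dict(totals)


def cube(facts):
    """All sub-space aggregates: one group-by per subset of dimensions."""
    return {
        dims: aggregate(facts, dims)
        for size in range(len(DIMENSIONS) + 1)
        for dims in combinations(DIMENSIONS, size)
    }


def dice(facts, **fixed):
    """Keep the facts that match every fixed dimension value.

    Fixing a single dimension gives a slice (e.g. time="January").
    """
    return [
        fact for fact in facts
        if all(fact[DIMENSIONS.index(dim)] == value
               for dim, value in fixed.items())
    ]


if __name__ == "__main__":
    for dims, totals in cube(FACTS).items():
        print(dims, totals)
    print(aggregate(dice(FACTS, time="January"), ("city",)))
    print(dice(FACTS, category="Electronics", city="Edmonton"))
```

Three dimensions give 2^3 = 8 group-bys, from the grand total (the empty subset) up to the full time, city and category breakdown. Real OLAP engines precompute or index these aggregates instead of rescanning the facts as this sketch does.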
Data Mining: On What Kind of Data?
• Multimedia databases
• Spatial Databases
• Time Series Data and Temporal Data

Data Mining: On What Kind of Data?
• Text Documents
• The World Wide Web
– The content of the Web
– The structure of the Web
– The usage of the Web

What Can Be Discovered?
What can be discovered depends upon the data mining task employed.
• Descriptive DM tasks
Describe general properties
• Predictive DM tasks
Infer on available data

Data Mining Functionality
• Characterization:
Summarization of general features of objects in a target class. (Concept description)
Ex: Characterize grad students in Science
• Discrimination:
Comparison of general features of objects between a target class and a contrasting class. (Concept comparison)
Ex: Compare students in Science and students in Arts
• Association:
Studies the frequency of items occurring together in transactional databases.
Ex: buys(x, bread) → buys(x, milk).
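To make the association example concrete, the sketch below measures how often bread and milk occur together in a small transactional database. The record layout follows the Transaction(TID, Timestamp, UID, {items}) form used earlier in the lecture. The transactions themselves are invented, and support and confidence are the usual textbook measures for a rule, which these slides do not name, so treat both as assumptions of the example.

```python
from typing import FrozenSet, NamedTuple


class Transaction(NamedTuple):
    """One row of a transactional database: TID, Timestamp, UID, items."""
    tid: int
    timestamp: str
    uid: str
    items: FrozenSet[str]


# Hypothetical transactions, invented for illustration.
TRANSACTIONS = [
    Transaction(1, "2001-01-05", "u1", frozenset({"bread", "milk"})),
    Transaction(2, "2001-01-05", "u2", frozenset({"bread"})),
    Transaction(3, "2001-01-06", "u1", frozenset({"bread", "milk", "eggs"})),
    Transaction(4, "2001-01-07", "u3", frozenset({"milk"})),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t.items)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing antecedent, the share that also
    contain consequent. Returns 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


if __name__ == "__main__":
    # Rule: buys(x, bread) -> buys(x, milk)
    print(support(TRANSACTIONS, {"bread", "milk"}))
    print(confidence(TRANSACTIONS, {"bread"}, {"milk"}))
```

On this toy data the rule has support 0.5 (two of the four transactions contain both items) and confidence of about 0.67 (two of the three bread transactions also contain milk).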