Principles of Knowledge Discovery in Databases, Chapter 2: Data Warehousing and OLAP


  1. Principles of Knowledge Discovery in Databases
Fall 1999
Chapter 2: Data Warehousing and OLAP
Dr. Osmar R. Zaïane
University of Alberta
© Dr. Osmar R. Zaïane, 1999. Principles of Knowledge Discovery in Databases, University of Alberta.

Summary of Last Chapter
• What kind of information are we collecting?
• What are Data Mining and Knowledge Discovery?
• What kind of data can be mined?
• What can be discovered?
• Is all that is discovered interesting and useful?
• How do we categorize data mining systems?
• What are the issues in Data Mining?
• Are there application examples?

Course Content
• Introduction to Data Mining
• Data warehousing and OLAP
• Data cleaning
• Data mining operations
• Data summarization
• Association analysis
• Classification and prediction
• Clustering
• Web Mining
• Similarity Search
• Other topics if time permits

Chapter 2 Objectives
• Realize the purpose of data warehousing.
• Comprehend the data structures behind data warehouses and understand the OLAP technology.
• Get an overview of the schemas used for multi-dimensional data.

Data Warehouse and OLAP Outline
• What is a data warehouse and what is it for?
• What is the multi-dimensional data model?
• What is the difference between OLAP and OLTP?
• What is the general architecture of a data warehouse?
• How can we implement a data warehouse?
• Are there issues related to data cube technology?
• Can we mine data warehouses?

Incentive for a Data Warehouse
• Businesses have a lot of data, operational data and facts.
• This data is usually in different databases and in different physical places.
• Data is available (or archived), but in different formats and locations (heterogeneous and distributed).
• Decision makers need to access information (data that has been summarized) virtually on one single site.
• This access needs to be fast regardless of the size of the data, and how old the data is.

  2. What Is a Data Warehouse?
• A data warehouse consolidates different data sources.
• A data warehouse is a database that is different and maintained separately from an operational database.
• A data warehouse combines and merges information in a consistent database (not necessarily up-to-date) to help decision support.
Decision support systems access the data warehouse and do not need to access operational databases, so they do not unnecessarily overload the operational databases.

Evolution of Decision Support Systems
• 1960s: Batch and manual reporting. Difficult and limited queries, highly specific to some distinctive needs.
• 1970s: Terminal-based decision support systems. Inflexible and non-integrated tools.
• 1980s: Desktop data analysis tools. Flexible integrated spreadsheets. Slow access to operational data.
• 1990s: Data warehousing and On-Line Analytical Processing. Integrated tools. Data mining.
Users listed on the slide: statistician, computer scientist, data analyst, executive.

Definitions
A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. (W.H. Inmon)
• Subject-oriented: oriented to the major subject areas of the corporation that have been defined in the data model.
• Integrated: data collected in a data warehouse originates from different heterogeneous data sources.
• Time-variant: the dimension "time" is all-pervading in a data warehouse. The data stored is not the current value, but an evolution of the value in time.
• Non-volatile: update of data does not occur frequently in the data warehouse. The data is loaded and accessed.

Definitions (cont'd)
Data Warehousing is the process of constructing and using data warehouses.
A corporate data warehouse collects data about subjects spanning the whole organization. Data Marts are specialized, single-line-of-business warehouses. They collect data for a department or a specific group of people.

Building a Data Warehouse
• Option 1: Consolidate Data Marts
• Option 2: Build from scratch
[Figure: several data marts and a corporate data warehouse, built over corporate data]
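The properties in Inmon's definition, together with Option 1 (consolidating data marts), can be illustrated with a small sketch. This is only one possible reading, not the course's implementation: the `Fact` and `Warehouse` classes, the two sample data marts, and all values are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One warehouse row (hypothetical layout)."""
    subject: str   # what the row is about, e.g. a product (subject-oriented)
    as_of: date    # when the value held (time-variant)
    source: str    # which data mart the row came from
    amount: float  # measure, normalized to dollars


class Warehouse:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, facts):
        # Non-volatile: loading only appends; existing rows are untouched.
        self._rows.extend(facts)

    def history(self, subject):
        """Return the evolution of one subject's value over time."""
        rows = [f for f in self._rows if f.subject == subject]
        return sorted(rows, key=lambda f: f.as_of)


# Two made-up data marts that store the same kind of data in different formats.
sales_mart = [{"product": "widget", "day": "1999-09-01", "cents": 12500}]
web_mart = [("widget", date(1999, 10, 1), 140.0)]


def from_sales_mart(rows):
    # Integration step: parse ISO date strings and convert cents to dollars.
    return [
        Fact(r["product"], date.fromisoformat(r["day"]), "sales", r["cents"] / 100)
        for r in rows
    ]


def from_web_mart(rows):
    # This mart already uses date objects and dollars.
    return [Fact(name, day, "web", amount) for name, day, amount in rows]


warehouse = Warehouse()
warehouse.load(from_sales_mart(sales_mart))
warehouse.load(from_web_mart(web_mart))

for fact in warehouse.history("widget"):
    print(fact.as_of, fact.source, fact.amount)
```

Each source gets its own small conversion function, so the warehouse itself only ever sees rows in one consistent format, and a query such as `history` returns the value's evolution rather than a single current value.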

  3. Describing the Organization
• Business Manager: "We sell products in various markets, and we measure our performance over time."
• Data Warehouse Designer: "We sell Products in various Markets, and we measure our performance over Time."
[Figure: cube with axes Products, Markets, and Time]

Construction of Data Warehouse Based on Multi-dimensional Model
• Think of it as a cube with labels on each edge of the cube.
• The cube doesn't just have 3 dimensions, but may have many dimensions (N).
• Any point inside the cube is at the intersection of the coordinates defined by the edges of the cube.
• A point in the cube may store values (measurements) relative to the combination of the labeled dimensions.

Concept-Hierarchies
Dimensions are hierarchical by nature: total orders or partial orders.
Example: Location(continent → country → province → city)
Time(year → quarter → (month, week) → day)
Dimensions: Product, Region, week
Hierarchical summarization paths:
• Industry → Category → Product
• Country → Region → City → Office
• Year → Quarter → (Month, Week) → Day

On-Line Transaction Processing
• Database management systems are typically used for on-line transaction processing (OLTP).
• OLTP applications normally automate clerical data processing tasks of an organization, like data entry and enquiry, transaction handling, etc. (access, read, update).
• The database is current, and consistency and recoverability are critical. Records are accessed one at a time.
→ OLTP operations are structured and repetitive.
→ OLTP operations require detailed and up-to-date data.
→ OLTP operations are short, atomic and isolated transactions.
Databases tend to be hundreds of MB to GB.

On-Line Analytical Processing
• On-line analytical processing (OLAP) is essential for decision support.
• OLAP is supported by data warehouses.
• A data warehouse is a consolidation of operational databases.
• The key structure of the data warehouse always contains some element of time.
• Owing to the hierarchical nature of the dimensions, OLAP operations view the data flexibly from different perspectives (different levels of abstraction).
• OLAP operations:
  • roll-up (increase the level of abstraction)
  • drill-down (decrease the level of abstraction)
  • slice and dice (selection and projection)
  • pivot (re-orient the multi-dimensional view)
  • drill-through (links to the raw data)
Data warehouses tend to be in the order of TB.
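The cube model, the concept hierarchies, and the OLAP operations listed above can be tried out on a toy example. The sketch below is an assumption-laden illustration, not how an OLAP server stores data: the cube is a plain dictionary keyed by (product, city, month), and the hierarchies, function names, and sales figures are all invented for the example.

```python
from collections import defaultdict

# Hypothetical concept hierarchies: each maps a fine-grained label to its parent.
CITY_TO_COUNTRY = {"Edmonton": "Canada", "Calgary": "Canada", "Seattle": "USA"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}

# A tiny cube: each cell sits at the intersection of (product, city, month)
# and stores one measure (units sold). The numbers are made up.
cube = {
    ("tv", "Edmonton", "Jan"): 10,
    ("tv", "Calgary", "Feb"): 5,
    ("tv", "Seattle", "Apr"): 7,
    ("radio", "Edmonton", "Jan"): 3,
    ("radio", "Seattle", "Feb"): 4,
}


def roll_up(cells, axis, hierarchy):
    """Raise the abstraction level of one dimension and sum the merged cells."""
    out = defaultdict(int)
    for key, value in cells.items():
        coarse = list(key)
        coarse[axis] = hierarchy[key[axis]]
        out[tuple(coarse)] += value
    return dict(out)


def slice_(cells, axis, label):
    """Select the cells where one dimension equals a single label."""
    return {k: v for k, v in cells.items() if k[axis] == label}


def dice(cells, allowed):
    """Select a sub-cube: `allowed` maps an axis index to a set of labels."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[axis] in labels for axis, labels in allowed.items())
    }


def pivot(cells, order):
    """Re-orient the view by permuting the dimension order of every key."""
    return {tuple(k[i] for i in order): v for k, v in cells.items()}


# Roll up Location from city to country, then Time from month to quarter.
by_country = roll_up(cube, axis=1, hierarchy=CITY_TO_COUNTRY)
by_country_quarter = roll_up(by_country, axis=2, hierarchy=MONTH_TO_QUARTER)

tv_only = slice_(cube, axis=0, label="tv")
edmonton_jan = dice(cube, {1: {"Edmonton"}, 2: {"Jan"}})
month_first = pivot(cube, order=(2, 1, 0))
```

Drill-down is the inverse of `roll_up`. Because aggregation discards detail, drilling down here simply means going back to the finer-grained `cube`, which is one reason a warehouse keeps the detailed data alongside its summaries.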
