
Developing Data Warehouses with Quality in Mind Yannis Vassiliou - PDF document



  1. Slide 1: Developing Data Warehouses with Quality in Mind
     Yannis Vassiliou
     National Technical University of Athens
     Workshop on Data Quality, December 1, 2000

     Slide 2: OUTLINE
     • Introduction – Motivation
     • The Data Warehouse Metadata Framework Developed
       – Architecture, Processes, Quality Models
     • Employing the Framework
     • Conclusions

  2. Slide 3: Foundations of Data Warehouse Quality - DWQ Project
     • National Technical University of Athens (NTUA)
     • Informatik V & Lehr- und Forschungsgebiet Theoretische Informatik (RWTH-Aachen)
     • Institut National de Recherche en Informatique et en Automatique (INRIA)
     • Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
     • University of Rome «La Sapienza» (Uniroma)
     • Istituto per la Ricerca Scientifica e Tecnologica (IRST)
     • University of Manchester (UMan)

     Slide 4: Introduction – Motivation
     • Contribute to the systematic understanding and usage of the interplay between QUALITY FACTORS and DESIGN / EVOLUTION OPTIONS in Data Warehousing (Objective)
     • Develop comprehensive DW Foundations (Framework), Prototype and Evaluate them (Achievement)
     • Enriched metadata management facilities with embedded analysis and optimization techniques (Key Methodology)

  3. Slide 5: Standard DW Architecture
     [Diagram, from clients down to sources: Clients (GIS, OLAP, DSS); Data Marts; Administration Agent; Data Warehouse; Meta DataBase Repository; Mediator; Wrappers / Loaders; External Sources (Text, DB, File data)]
     Examples: Microsoft Repository, Metadata Interchange Specification (MDIS). Control and manage metadata for OLAP databases.

     Slide 6: Standard DW Architecture
     [Same diagram as on slide 5]
     PRACTICAL QUESTIONS not Handled in the Traditional Architecture:
     -- Why is the information from the DW not the same as the information coming from the sources?
     -- What effort is required to get information into the DW that is not currently available?
     -- If I want 100% correct data in my DW, how do I design it? How often do I refresh it?
     -- ….

  4. Slide 7: DWQ DW Architecture
     [Diagram: the standard architecture (Clients: GIS, OLAP, DSS; Data Marts; Administration Agent; Data Warehouse; Mediator; Wrappers / Loaders; External Sources: Text, DB, File data) with the DWQ Repository (ConceptBase) and the added components: query optimiser, subsumption reasoner, quality manager, aggregation reasoner, freshness agent]

     Slide 8: A Small Motivating Example
     • MINISTRY of HEALTH (Greece)
     • Data Warehouse:
       – Sources = COBOL files for all the medical centers in Greece (~2400)
       – Transformation and Cleaning Tasks
     • Quality requirements (Goals):
       «Achieve 100% completeness and consistency of data»

  5. Slide 9: Metadata Framework
     • Introduction – Motivation
     • The Data Warehouse Metadata Framework Developed
       – Architecture, Processes, Quality Models
     • Employing the Framework
     • Conclusions

     Slide 10: Viewpoints of a DW

  6. Slide 11: Architecture Model: Step 1
     Enterprise Version of the Traditional DW
     [Diagram. Recoverable labels: Analyst, Multidimensional Data Mart, OLAP, Mart Quality; Aggregation/Customization; Enterprise, Data Warehouse, DW Quality; Observation, Wrapper/Loader; Operational Department, Information Source, OLTP, Source Quality; one link marked "?"]

     Slide 12: Architecture Model: Step 2
     Enterprise Version (Meta level): Extending the Traditional DW

     Level  | Conceptual Perspective       | Logical Perspective | Physical Perspective
     Client | Client Model                 | Client Schema       | Client Data Store
     DW     | Enterprise Model             | DW Schema           | DW Data Store
     Source | Operational Department Model | Source Schema       | Source Data Store

     [Other labels in the diagram: OLAP, OLTP, Observation, Aggregation/Customization, Wrapper, Transportation Agent (twice), one link marked "?"]

  7. Slide 13: Architecture Model - Instantiation
     [Diagram: a grid with columns Conceptual Perspective, Logical Perspective, Physical Perspective and rows Client Level, DW Level, Source Level, repeated at three layers linked by "in": Meta Model Level, Models/Meta Data Level, Real World]

     Slide 14: Architecture Model: Step 3
     Structure of the Meta Model as implemented in ConceptBase / Telos
     [Diagram. Recoverable labels:
      • Root: Measurable Object
      • Perspective classes: ConceptualObject, LogicalObject, PhysicalObject
      • Classes: Model, Concept, Schema, Type, DW_Component, Agent, DataStore
      • Specializations (isa): Client Model, Enterprise Model, Source Model; Client Schema, DW Schema, Source Schema; Client DataStore, DW DataStore, Source DataStore; Control Agent, Transport Agent
      • Link labels: in, isa, deliversTo, hasStructure, hasConcept, hasType, isViewOn, isSubsumedBy, relatesTo]
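The meta model structure named on slide 14 can be approximated with a few Python classes. This is only a minimal sketch, not the ConceptBase / Telos implementation: the endpoints of the links (for example that `hasStructure` goes from a data store to a schema, or that `isViewOn` goes from a type to a concept) are assumptions, the `level` attribute stands in for the Client / DW / Source specializations, and all instance names (`Patient`, `PATIENT_REC`, `CobolFiles`, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeasurableObject:
    """Root class: anything a quality measurement can be attached to."""
    name: str
    level: str = "dw"  # "client", "dw" or "source" (simplification of the isa hierarchy)


@dataclass
class ConceptualObject(MeasurableObject):
    pass


@dataclass
class LogicalObject(MeasurableObject):
    pass


@dataclass
class PhysicalObject(MeasurableObject):
    pass


@dataclass
class Concept(ConceptualObject):
    pass


@dataclass
class Model(ConceptualObject):
    concepts: List[Concept] = field(default_factory=list)  # hasConcept


@dataclass
class Type(LogicalObject):
    view_on: List[Concept] = field(default_factory=list)  # isViewOn (assumed Type -> Concept)


@dataclass
class Schema(LogicalObject):
    types: List[Type] = field(default_factory=list)  # hasType


@dataclass
class DWComponent(PhysicalObject):
    delivers_to: List["DWComponent"] = field(default_factory=list)  # deliversTo


@dataclass
class Agent(DWComponent):
    pass


@dataclass
class DataStore(DWComponent):
    structure: Optional[Schema] = None  # hasStructure (assumed DataStore -> Schema)


def perspective_of(obj: MeasurableObject) -> str:
    """Return the perspective a repository object belongs to."""
    for cls, label in ((ConceptualObject, "conceptual"),
                       (LogicalObject, "logical"),
                       (PhysicalObject, "physical")):
        if isinstance(obj, cls):
            return label
    raise TypeError(f"{obj!r} is not assigned to a perspective")


# Hypothetical instances, loosely following the motivating example.
patient = Concept("Patient", level="source")
source_model = Model("SourceModel", level="source", concepts=[patient])
patient_rec = Type("PATIENT_REC", level="source", view_on=[patient])
source_schema = Schema("SourceSchema", level="source", types=[patient_rec])
dw_store = DataStore("DWStore", level="dw")
source_store = DataStore("CobolFiles", level="source",
                         structure=source_schema, delivers_to=[dw_store])
```

Following the links from `source_store` through `structure`, `types` and `view_on` leads from a physical data store to the concept it represents, which is the kind of cross-perspective navigation the three-column layout suggests.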

  8. Slide 15: Process Meta Model: Step 1
     Capturing the Dynamic Aspects of the Architecture Model (static)
     [Diagram: the grid of perspectives (Conceptual, Logical, Physical) and levels (Client, DW, Source) from the architecture model, extended with Process Meta Model (Meta Level), Process Model (Models/Meta Data Level) and Processes (Real World); link labels: in, uses]

     Slide 16: DW Process Meta Model
     • Workflow Reference Model (made less abstract to fit the DW case, e.g., capture schedules, relationships with data)
     • Strategic Dependency Model (conceptual)
     • Processes: Cleaning, transformation, transfer, computation
     • ROLE – ACTIVITY – AGENT
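The ROLE – ACTIVITY – AGENT idea from slide 16 can be illustrated with a small data structure. The sketch below is an assumption-laden illustration, not the DWQ process meta model itself: the field names, the `next` link used for sequencing, the cron-style schedule string and all activity names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ProcessKind(Enum):
    """The four process kinds listed on the slide."""
    CLEANING = "cleaning"
    TRANSFORMATION = "transformation"
    TRANSFER = "transfer"
    COMPUTATION = "computation"


@dataclass
class Activity:
    name: str
    kind: ProcessKind
    role: str                      # conceptual perspective: why the activity exists
    agent: str                     # physical perspective: application that executes it
    inputs: List[str] = field(default_factory=list)   # data consumed
    outputs: List[str] = field(default_factory=list)  # data produced
    schedule: Optional[str] = None                    # hypothetical schedule notation
    next: Optional["Activity"] = None                 # following activity, if any


def execution_order(first: Activity) -> List[str]:
    """Follow the `next` links and return the activity names in order."""
    order, seen, current = [], set(), first
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against accidental cycles
        order.append(current.name)
        current = current.next
    return order


# Hypothetical three-step load process, built back to front.
load = Activity("load_dw", ProcessKind.TRANSFER, role="populate warehouse",
                agent="loader", inputs=["clean_records"], outputs=["dw_table"])
clean = Activity("clean_records", ProcessKind.CLEANING, role="ensure consistency",
                 agent="cleaning_tool", inputs=["raw_records"],
                 outputs=["clean_records"], next=load)
extract = Activity("extract_files", ProcessKind.TRANSFER, role="collect source data",
                   agent="wrapper", inputs=["cobol_files"], outputs=["raw_records"],
                   schedule="0 2 * * *", next=clean)

pipeline = execution_order(extract)
```

Keeping role, activity and agent on one record mirrors the three perspectives: the role says why, the activity says what, and the agent says how.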

  9. Slide 17: Process Meta Model: Step 3
     DW Operational Process Meta Model
     [Diagram organized in three columns: Conceptual Perspective (why), Logical Perspective (what), Physical Perspective (how).
      Labels recoverable from the diagram: STAKEHOLDER, PERSON, DW_USER, ROLE (active, passive), responsibility, CONCEPT; SCHEDULE, PROCESS ELEMENT, COMPOSITE ACTIVITY, TRANSITION ELEMENT, ACTIVITY, TYPE; AGENT (application), DW_OBJECT, DATA PACKAGE, DATA STORE.
      Link labels: FOR, NEXT, IN, MAPPED, type, COMPOSED, OPERATES, isa, IS RELATED, EXECUTED BY, INPUT, OUTPUT, INPUT/OUTPUT, RELATES TO, STORED]

     Slide 18: Quality Model
     • Quality in a Data Warehouse (at all perspectives)
       – Quality of Data
       – Quality of Processes
       – Quality of Service
     • Establishment of Quality aspects (dimensions)
       – Scientific vs. Pragmatic (user defined)

  10. Slide 19: Quality Model
      • Concepts:
        – Measurable Object (e.g., logical schema of source)
        – Quality Goal (e.g., improve availability of source A)
        – Quality Query (decide whether a quality goal is achieved)
        – Quality Dimension (e.g., “availability”, “correctness”)
        – Quality Factor (measurement)
        – Stakeholders (decision makers, designers, administrators, programmers)

      Slide 20: Quality Dimensions Example: Data Usage
      Data usage quality
      • accessibility
        – System availability
        – Transactional availability
        – security
      • usefulness
        – interpretability
        – timeliness
          · currency
          · volatility
        – responsiveness
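The quality model concepts on slides 19 and 20 can be tied together in a short sketch. It assumes the dimension tree reads as accessibility (system availability, transactional availability, security) and usefulness (interpretability, timeliness with currency and volatility, responsiveness). The `minimum` threshold, the rule that a goal is achieved when every relevant factor meets its threshold, and all object names and values are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import Dict, List

# Dimension hierarchy for "data usage quality" (assumed reading of the slide).
DIMENSIONS: Dict[str, List[str]] = {
    "data usage quality": ["accessibility", "usefulness"],
    "accessibility": ["system availability", "transactional availability", "security"],
    "usefulness": ["interpretability", "timeliness", "responsiveness"],
    "timeliness": ["currency", "volatility"],
}


def leaf_dimensions(dimension: str) -> List[str]:
    """Return the most specific dimensions below (or equal to) `dimension`."""
    children = DIMENSIONS.get(dimension, [])
    if not children:
        return [dimension]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_dimensions(child))
    return leaves


@dataclass
class QualityFactor:
    """A measurement of one dimension on one measurable object."""
    measurable_object: str
    dimension: str
    value: float
    minimum: float  # hypothetical acceptance threshold


@dataclass
class QualityGoal:
    stakeholder: str
    measurable_object: str
    dimension: str


def quality_query(goal: QualityGoal, factors: List[QualityFactor]) -> bool:
    """Decide whether a goal is achieved: every relevant factor meets its threshold."""
    wanted = set(leaf_dimensions(goal.dimension))
    relevant = [f for f in factors
                if f.measurable_object == goal.measurable_object
                and f.dimension in wanted]
    return bool(relevant) and all(f.value >= f.minimum for f in relevant)


# Hypothetical measurements for "source A".
factors = [
    QualityFactor("source A", "system availability", value=0.97, minimum=0.95),
    QualityFactor("source A", "transactional availability", value=0.80, minimum=0.90),
]
narrow_goal = QualityGoal("administrator", "source A", "system availability")
broad_goal = QualityGoal("decision maker", "source A", "accessibility")
```

With these values, `quality_query(narrow_goal, factors)` is true while `quality_query(broad_goal, factors)` is false, because the broader goal also covers the transactional availability measurement that misses its threshold.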

  11. Slide 21: Quality Factors by Perspective
      Conceptual Perspective
      • Completeness
      • Redundancy
      • Consistency
      • Correctness
      • Traceability of Concepts and Models
      Logical Perspective
      • Usefulness of schemas
      • Correctness of mappings
      • Interpretability of schemas
      Physical Perspective
      • Efficiency
      • Interpretability of schemas
      • Timeliness of stored data
      • Maintainability/Usability of software components

      - Questions and metrics for each quality factor?
      - Predictive models of quality impacts and trade-offs?
      - Can the results be mapped back into data warehouse practice?

      Slide 22: Quality Factors - Metrics
      Schema quality

      Factor       | Methods of measurement | Metrics
      Correctness  | final inspection of data warehouse schema for each entity and its corresponding ones in the sources | number of errors in the mapping of the entities
      Completeness | final inspection of data warehouse schema for useful entities in the sources, not represented in the data warehouse schema | number of useful entities, not present in the data warehouse
      Minimality   | final inspection of data warehouse schema for undesired redundant information | number of undesired entities in the data warehouse
      Traceability | final inspection of data warehouse schema for inability to cover user requirements | number of requirements not covered
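The four schema quality metrics on slide 22 are counts, so they can be sketched as set operations once the inspection results are written down. The function below is a hedged illustration: the slides describe a manual "final inspection", so the inputs (a source-to-warehouse entity mapping, a set of mappings flagged as erroneous, covered requirements) are assumed to come from that inspection, "undesired" is interpreted as "no useful source entity maps to it", and all entity and requirement names are hypothetical.

```python
from typing import Dict, Set, Tuple


def schema_quality_metrics(
    useful_source_entities: Set[str],
    dw_entities: Set[str],
    mapping: Dict[str, str],                # source entity -> warehouse entity
    mapping_errors: Set[Tuple[str, str]],   # mappings found wrong during inspection
    requirements: Set[str],
    covered_requirements: Set[str],
) -> Dict[str, int]:
    """Count the metrics from the table; lower is better for all four."""
    # Source entities that really are represented in the warehouse schema.
    represented = {src for src, dst in mapping.items() if dst in dw_entities}
    # Warehouse entities backed by at least one useful source entity.
    justified = {mapping[src] for src in represented & useful_source_entities}
    return {
        "correctness": len(mapping_errors),
        "completeness": len(useful_source_entities - represented),
        "minimality": len(dw_entities - justified),
        "traceability": len(requirements - covered_requirements),
    }


# Hypothetical inspection results.
metrics = schema_quality_metrics(
    useful_source_entities={"patient", "visit", "doctor"},
    dw_entities={"dim_patient", "fact_visit", "tmp_copy"},
    mapping={"patient": "dim_patient", "visit": "fact_visit"},
    mapping_errors={("visit", "fact_visit")},
    requirements={"R1", "R2", "R3"},
    covered_requirements={"R1"},
)
```

In this example `doctor` is a useful source entity missing from the warehouse (completeness 1), `tmp_copy` has no useful source behind it (minimality 1), one mapping was flagged (correctness 1), and two requirements are uncovered (traceability 2).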
