Overview DW Source Integration, Tools, and Architecture • DW Front End Tools • Source Integration • DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course 2 End User Applications (EUA) EUA Concepts • The business impact of the DW! • Templates • Canned reports � Layout/structure + parameters � End user application templates � Compare sales per product in <area> for <period1> and <period2> � Provide answers to common questions • Parameters - chosen at run-time � Can be used as (quality-assured) building blocks for other reports � Come from any level of the given dimension – drill-down • Two extremes � Time (All time, 2002, 2002 4Q, 2002 Dec, 2002 Dec 1) possible � Ad hoc strategic analysis, power users, DIY query tools � Many different � Fixed operational analysis, report consumers, operational reporting • Identify report candidates • EUA fills the gap � Produce a list of candidates � “Tactical” analysis, push-button knowledge workers • Consolidate candidate list � Categorize candidates by data elements Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4
What Templates to Choose? Overview • “Analytical Cycle” Steps (repeats) • DW Front End Tools How’s business? – current performance 1) What are the trends? – performance over time 2) • Source Integration What’s unusual? – quick identification of exceptions (+/-) 3) What is driving the exceptions? – find causes for exceptions 4) • DW architecture What if …? – play around with parameters and see effect 5) Make a business decision – small as well as big decisions 6) Implement the decision – feed analysis results into op. systems 7) • Prioritize template list Rank or group templates – implement 15 most important at first � Aalborg University 2007 - DWML course 5 Aalborg University 2007 - DWML course 6 Data Integration Research Projects Views on DW Metadata • Focus on source integration and update propagation • Wrapper: convert source data into a standard format • Information Manifold � Sources: databases, SGML docs, unstructured files,… • Most DW projects: DW architecture as a “stepwise flow” of information from � Relational integration data model source to analyst • TSIMMIS • No conceptual domain model used for � Wrapper/mediator integration Some questions cannot be answered � Semi-structured OEM integration data model � • DWQ project: extended metamodel to • Squirrel capture all relevant aspects � Powerful “integration mediator” • WHIPS � Wrapper/mediator � Relational integration data model Aalborg University 2007 - DWML course 7 Aalborg University 2007 - DWML course 8
Using DW Metadata in the Enterprise Analyst: “Why can’t I answer question X?” • Analyst wants to analyze data Gather data from operational • Possible reasons � departments through OLTP (5) � Certain measures not included in fact table Question travels through (1)-(5) � • Traditional DW (previous slide) � Granularity of facts too coarse (4) only describes step (3)-(4) � Particular dimensions not in DW Cannot solve problems like � � Descriptive attributes missing from dimensions “why can’t I answer quest. X?” (3) • Conceptual relationships between (1) � Meaning of attributes/measures deviate from the enterprise model, operational analyst’s expectation (2) models + DW must be captured � …… Everything is a view on the � enterprise model ! (“local as view”) – unlike previous slide Aalborg University 2007 - DWML course 9 Aalborg University 2007 - DWML course 10 DWQ Metadata Source integration practice • Focus on information integration in databases • Three metadata perspectives (schema and data) must be captured Conceptual (enterprise) • Two main approaches � Logical (data model) � � Constructing integrated enterprise model Physical (data flow) � � Focus on mappings between sources and DW • Framework instantiated by • Tools for DW management conceptual, logical, and physical � Schema integration information models • DW quality heavily depends on � Metadata management DW processes rather than � Based on modeling tools schemas • Tools for data integration • A process meta model is needed � Mapping specification to capture process definitions, and � ETL tools – like last lecture the relationships to DW quality Aalborg University 2007 - DWML course 11 Aalborg University 2007 - DWML course 12
Schema Integration Virtual Data Integration • Only data definition is integrated • Producing one global schema (one-shot or incremental) Data only in sources, queries on views, queries shipped to sources � • Pre-integration Not suited for DW? � Analyzing and annotating source schemata � • Carnot Semantic enrichment of schema, often in richer data model � Individual schemata mapped onto rich GCL ontology (1. order logic) � • Schema comparison Articulation axioms specify mappings, queries mapped to GCL � Determine correlations/conflicts among schema concepts � • SIMS Heterogeneity conflicts – different source data models � Creates common class-based domain model to describe sources � Naming conflicts – homonyms and synonyms � Sources are dynamically chosen and integrated at query time � Semantic conflicts –different abstraction levels � Query reformulation, access planning, optimization, execution � Structural – different constructs � • Information Manifold • Schema conforming Relational world view + information source description + correspondences � Conform/align schemas to make them compatible � Metamodel enriched using description logic/Datalog rules � Typically semi-automatic process � Datalog queries, optimized by choosing ”minimal” sources � • Schema merging and restructuring • TSIMMIS “Superimpose” conformed schemas � Wrappers wrap sources using semi-structured OEM model � Quality: completeness, correctness, minimality, understandability � Mediator performs its own integration – no global integration (global as view) � Aalborg University 2007 - DWML course 13 Aalborg University 2007 - DWML course 14 Materialized Data Integration DWQ Source Integration • Current DW tools cannot fully support DW quality � No support for validation of interschema assertions and other • Views on source data are materialized in integrated DB specified relationships, i.e., the DW design process • Squirrel • Conceptual perspective � Integration mediators incrementally maintain materialized views � Domain model = enterprise model + source models Cooperation of sources required � � Consolidated and reconciled description of important concepts • WHIPS ◆ Not all enterprise data captured (at first, incremental approach) � Relational SPJ + aggregation views specified in view tree � Logic-based formalism allows reasoning over metadata � View manager computes view and handles updates � Intermodel assertions capture interdependencies � Integrator ensures view maintainability • Logical perspective � Global query processor queries sources using wrappers/mediators � Source schemata + DW schema in logical data model (relational) • In combination with virtual integration? � Defined as queries over the corresponding conceptual component • Physical perspective � The actual data stores Aalborg University 2007 - DWML course 15 Aalborg University 2007 - DWML course 16
DWQ Source Integration Architecture DWQ Source Integration Methodology • Source-driven integration � Enterprise and source model construction � Source model integration (into the domain model) � Source and DW schema specification (+ mappings) � Data integration and reconciliation � Quality analysis steps in all phases above • Client-driven integration � New client query considered � Reasoning determines whether query can be answered by materialized views already in DW ◆ Query containment reasoning � If DW not sufficient, materialize new concepts in domain model? � Otherwise, new sources must be added using source-driven integr. Note explicit mappings! Aalborg University 2007 - DWML course 17 Aalborg University 2007 - DWML course 18 Overview Lifecycle Overview • DW Front End Tools Technical Product Technical Product • Source Integration Architecture Selection& Architecture Selection& Design Installation Design Installation • DW architecture Data Staging Business Data Staging Business Dimensional Physical Project Dimensional Physical Maintenance Project Maintenance Design & Requirements Design & Deployment Requirements Deployment Modeling Design Planning Modeling Design and Growth Planning and Growth Development Definition Development Definition End-User End-User End-User End-User Application Application Application Application Specification Development Specification Development Project Management Project Management Aalborg University 2007 - DWML course 19 Aalborg University 2007 - DWML course 20
Recommend
More recommend