D IPARTIMENTO DI I NGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE A NTONIO R UBERTI Master Degree Programme in Manage gement ment Engin inee eeri ring Enterprise En erprise In Information formation Sys ystems ems Umbe mberto to Nan anni ni Data Warehousing – Introduction to the Multidimensional Model Umberto Nanni Enterprise Information Systems 1
Multidimensional model U. Nanni, di anni 28, Pippo, di anni 32, Pluto, di anni 77, Paperino, di anni 16, event in the real world U. Nanni, aged 28, acquista acquista acquista acquista buys uno scopettone DB design: brillantante dentifricio lucido da scarpe selection of all and only features a box of dish soap having operational relevance vendita di datum in the operational DB one box of “DS” uno scopettone (rappresenting the event) dish soap is sold DW design: ..., definition of aggregations (with a given granularity) of analytic interest to some users fact in the enterpise DW sold N units of article X (measurement at point od sale Y in an n-dimensional on day Z space) Umberto Nanni Enterprise Information Systems 2 2
Events and Facts • EVENT (in the real world) and FACT (of interest) are terms whose meaning is determined by the granularity (level of detail) of the multidimensional representation of the Data Warehouse Example of FACT: on 02/05/2012 in “Rome 23” PoS (Point of Sale) 278 boxes of soap have been sold, cashing 745 Euro Umberto Nanni Enterprise Information Systems 3
Multidimensional Cube 05/11/2012 06/11/2012 07/11/2012 08/11/2012 11/11/2012 12/11/2012 13/11/2012 broom Roma dish soap brusher Firenze 05/05/2004 toothpaste L’Aquila Torino brusher Roma Venezia Palermo Foggia units: 278 sales: € 745 Umberto Nanni Enterprise Information Systems 4
Dimensions ← time → 05/11/2012 06/11/2012 07/11/2012 08/11/2012 11/11/2012 12/11/2012 13/11/2012 broom Roma dish soap brusher Firenze toothpaste ← place→ L’Aquila Torino Venezia Palermo Foggia Umberto Nanni Enterprise Information Systems 5
A relational schema for the multidimensional cube TABLE(dimens- 1, …, dimens-k, measur- 1, …, measur-h) Example: SALES(product, city, date, numUnitsSold, sales) dimensions measurements (classification attributes) (variables, metrics, indicators, …) IDENTIFICATIVE KEY Functional dependence: dimensions → measurements In the example: product, city, date → numUnitsSold, sales Umberto Nanni Enterprise Information Systems 6
Dimensional hierarchy Each dimension is associated with a hierarchy that groups the values at different levels of aggregation (also orthogonally). Each node of the hierarchy is called DIMENSIONAL ATTRIBUTE Example (time dimension): day Functional Dependencies: month week day → week day → month month → quarter four-month quarter month → four -month-period period quarter → year four-month-period → year year ALL Umberto Nanni Enterprise Information Systems 7
Access to Data Warehouse Reporting • predefined needs (poss., parametrized) • automated extraction of information OLAP • analysis needs that are not identifiable in advance (browsing on dimensions and indicators) • interactive data exploration, searching for information of interest Data Mining • analysis needs that are not identifiable in advance • (semi)automatic data exploration Umberto Nanni Enterprise Information Systems 8
Reporting Report with predefined structure and format • query – typically based on restrictions and / or aggregations • presentation – interleaved with text data – in tabular and / or graphic form – customized with environment variables (date, user, ...) • generation – on user demand – periodic – under specified conditions • distribution – preview/print, e- mail, web, … Umberto Nanni Enterprise Information Systems 9
OLAP • implementation: – analysis session, divided into a series of steps, each dependent on the results obtained previously • tipical users: – domain experts, not necessarily computer experts Umberto Nanni Enterprise Information Systems 10
OLAP: example 05/11/2012 06/11/2012 07/11/2012 08/11/2012 11/11/2012 12/11/2012 dish soap Roma brusher toothpaste Firenze L’Aquila Torino Palermo Umberto Nanni Enterprise Information Systems 11
OLAP – Basic operations in the multidimensional model RESTRICTION (filtering based on values) • selection on specific values of dimensional attributes • in case of equality with a single value, the operator is called SLICING (one dimension disappears) AGGREGATION (level of abstrazion increases) • considered one dimension, group one dimensional attribute to a higher level of granularity • joining up to the highest level (the most generic, consisting of a single value), a dimension disappears Umberto Nanni Enterprise Information Systems 12
OLAP: Roll-Up Operator Aggregation along one dimension → use of an aggregate operator on a measurement attribute Examples: a. sales of each product by city and month b. daily sales per city (total on all products) c. daily sales by product (North / Central / South) ← day → ← month → ← city → ← city→ (a) Umberto Nanni Enterprise Information Systems 13
OLAP: Aggregative Operators DISTRIBUTIVE: computable from: 1. partial aggregates – Examples: SUM, MAX, MIN, … ALGEBRAIC: computable from: 1. partial aggregates 2. finite set of support information – Examples : weighted AVG, standard deviation, … OLISTIC: computable from: 1. original data – Example: rank of an array Umberto Nanni Enterprise Information Systems 14
OLAP: Drill-Down Operator Disaggregation over one dimension → disaggregated data are assumed to be available Examples: a. from “sales of each product by city and month” to “daily sales” b. from “monthly sales by city (total on all products)” to “daily sales by city and product” ← month → ← date → ← city → ← city → (a) Umberto Nanni Enterprise Information Systems 15
OLAP: Cuboids (dimensions) • given a data cube, you can perform different roll-up operations up to the disappearance of a dimension • given a k-dimensional data cube, there exist 2 k possible cuboids all product date city roll-up product, data product, city date, city drill-down product, date, city Umberto Nanni Enterprise Information Systems 16
OLAP: Cuboids (aggregation levels) • given a data cube, you can perform different roll-up operations, increasing the aggregation level (up to the disappearance of a dimension) • given a k-dimensional data cube, resp. with n 1 , n 2 , …, n k agregation levels, there exist n 1 ×n 2 × … ×n k possible cuboids { } brand month roll-up brand, month date article drill-down article, month brand, date article, date Umberto Nanni Enterprise Information Systems 17
OLAP: Slice-and-Dice Operators Selection (up to “ slicing ”) Examples: from “monthly sales of products by city” to: a. monthly sales of products by city in central Italy b. monthly sales by city of dishwasher detergents Sept-12 Aug-12 Oct-12 Sptt-12 Aug-12 Oct-12 rinsing detergents dishwasher detergents dishwasher detergents Roma Roma brooms Firenze Firenze L’Aquila L’Aquila (b) Torino Torino Palermo Palermo Umberto Nanni Enterprise Information Systems 18
OLAP: Pivoting Operators Rotating dimensions on a bidimensional table Example from: sales of (products by month) by (city) to: sales of (products by city) by (month) aug-12 sep-12 oct-12 rinsing det. RM 34 56 76 FI 23 45 34 AQ 12 23 34 RM FI AQ TO PA TO 56 44 55 rinsing det aug-12 34 23 12 56 65 PA 65 67 45 sep-12 56 45 23 44 67 dishw. det. RM 57 79 99 oct-12 76 34 34 55 45 FI 46 68 57 dishw. det. aug-12 57 46 35 79 88 AQ 35 46 57 two sep-12 79 68 46 67 90 TO 79 67 78 oct-12 99 57 57 78 68 operations PA 88 90 68 brooms aug-12 46 35 24 68 77 brooms RM 46 68 88 sep-12 68 57 35 56 79 FI 35 57 46 oct-12 88 46 46 67 57 AQ 24 35 46 TO 68 56 67 PA 77 79 57 Umberto Nanni Enterprise Information Systems 19
OLAP: Drill-Through Operator • access data at a level of aggregation lower than the Data Warehouse • access to reconciled data (three-level architecture) or to the operational DB • a sort of extreme Drill-Down Umberto Nanni Enterprise Information Systems 20
OLAP: Drill-Across Operators • Correlate data from two or more cubes, based on the dimensional values, returning the value in a new cube Example: Starting from sales by product, date, city employers by city calculate: average sales by product, employee Umberto Nanni Enterprise Information Systems 21
A Book on Datawarehouse Matteo Golfarelli, Stefano Rizzi Data Warehouse Design Modern Principles and Methodologies McGraw-Hill, 2009. Matteo Golfarelli, Stefano Rizzi Data Warehouse Teoria e Pratica della Progettazione (2 a ed.) McGraw-Hill, 2006. Umberto Nanni Enterprise Information Systems 22
Recommend
More recommend