Value-driven Approach for Designing Extended Data Warehouses




  1. Value-driven Approach for Designing Extended Data Warehouses
     Nabila BERKANI & Selma KHOURI, ESI, Algiers, Algeria (n_berkani@esi.dz, s_khouri@esi.dz)
     Carlos ORDONEZ, University of Houston, USA (carlos@central.uh.edu)
     Ladjel BELLATRECHE, LIAS (Laboratoire d'Informatique et d'Automatique pour les Systèmes)/ISAE-ENSMA, Poitiers, France (bellatreche@ensma.fr)
     DOLAP'2019, Lisbon, March 26, 2019

  2. Impact of Big Data on DW
     [Figure: timeline of the DOLAP workshop and the DaWaK conference, with the periods 1998-2015, 1999-2014, 2015-present and 2017-present, and "2016: Thinking".]

  3. 30 years of existence: maturity and variety
     1. A well-identified design life-cycle: data sources, requirements, mappings between sources and requirements, DW schema definition, instance extraction, deployment and exploitation, with cross-phase design.
     [Figure: the DW design life-cycle, with a sample schema (Discipline, Origin, Author, Field, Year), an Extract-Transform-Load flow (Extract, Join, Filter, Store, Load to DSA, temporary tables Temp 1 and Temp 2), multidimensional modeling and a relational target.]
     → Augmentation of the DW by the Big Data Vs
     2. Diversity of actors (actors of the design and actors of exploitation): designers, data preparators, data analysts, architects, administrators, "deployers"; the quest of value.

  4. Agenda
     • Value & Variety (2Vs)
     • Augmenting DW by Linked Open Data
     • 2Vs-driven Design Approach
     • Case Study
     • Summary

  5. Value in different places
     • Requirements:
       - FR (functional requirements): value → money [A. G. Sutcliffe'2018]; value → user feedback
       - NfR (non-functional requirements): value → satisfaction of qualities (security, privacy, …)
       - Recent efforts on building value ontologies, e.g. T. P. Sales, F. A. Baião, G. Guizzardi, J. P. A. Almeida, N. Guarino, J. Mylopoulos: The Common Ontology of Value and Risk. ER 2018: 121-135
     • Value → integration of new resources (LOD, …) (N. Konstantinou, N. W. Paton)
     • Value → offered services [D. Bork]
     • Value → new programming paradigms: Spark
     • Value → usage of modern architectures: Teradata
     • Value → interdependencies between the value of the design phases and the value of the operational DW
     [Figure: design phases (requirements, conceptual modeling (CM), integration, deployment, exploitation) and the decision makers' activities (decision, analysis, visualisation, queries, statistics, …).]

  6. Value increases variety
     [Figure: the designer and the Person in Charge (PiC) of value, linking VALUE and VARIETY.]
     • Examples of requirements related to value¹:
       - Media: Has the media coverage changed over time?
       - Politics: Speeches in the EU parliament that contain "human rights", by country
       - Finance: Evolution of debates related to the Greek crisis, by country
     • The measurement of value depends on the studied domain.
     • Sources: internal sources (databases) + external sources (libraries, newspapers)
     → Interaction between designers and the PiC of value: multidisciplinary work in DW
     → Usage of Linked Open Data: traditional management of variety + variety of formalisms (graphs)
     ¹ http://www.talkofeurope.eu/data

  7. Augmenting DW by the 2Vs
     • + High variety of sources || global processing
     [Figure: the DW design life-cycle of slide 3 annotated with variety and value; the ETL now loads into several stores (Store 1 … Store n, including a graph store).]
     • Diversity of actors: designers, data preparators, data analysts, architects, administrators, "PiC of value", "deployers"

  8. Formalisation
     Inputs:
     1. A set of internal sources: $S_{Int} = \{S_{I_1}, S_{I_2}, \dots, S_{I_m}\}$
     2. A set of external resources: $S_{Ext} = \{S_{E_1}, S_{E_2}, \dots, S_{E_n}\}$
     3. Each source $S_i$ (internal or external) has its own physical format $F_i$ and its own conceptual model $CM_i$, and is related to a discipline $D$ (medicine, engineering, etc.).
     4. A set of requirements to be satisfied.
     5. [Optional] An operational DW [Ravat et al. 2017], with its conceptual model $CM_{DW}$ and its format(s) $Format(S_{DW}) = \{f_1, f_2, \dots, f_k\}$ → polystore storage.
     Objective: define all phases of the DW design so as to augment its value.
     Challenges: metrics of value
     $$\mathrm{Value}(DW) = \mathop{\mathrm{Operator}}_{1 \le i \le n+m} \left[ \mathrm{Weight}(S_i, D) \times \mathrm{Value}(S_i) \right], \quad S_i \in S_{Int} \cup S_{Ext}$$
     [Ballou et al.]*
     *Ballou, D. P., & Tayi, G. K. (1999). Enhancing data quality in data warehouse environments. Communications of the ACM, 42(1), 73-78.
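     A minimal sketch of the Value(DW) aggregation, assuming each source's value and weight are already available as numbers. The slides leave "Operator" open, so here it is a parameter. It defaults to the average, which is the operator reported for the case study. The function name `dw_value` and its signature are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


# Hypothetical helper: "Operator" is left open on the slides, so it is a parameter.
def dw_value(
    source_values: Sequence[float],
    weights: Sequence[float],
    operator: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Aggregate Weight(S_i, D) * Value(S_i) over all n + m sources with `operator`."""
    if len(source_values) != len(weights):
        raise ValueError("expected one weight per source")
    if not source_values:
        raise ValueError("at least one source is required")
    weighted = [w * v for w, v in zip(weights, source_values)]
    return operator(weighted)


# Example: three sources with equal weights and Avg as the operator.
print(dw_value([0.25, 0.5, 0.75], [1.0, 1.0, 1.0]))  # 0.5
```

     Other operators, such as `max` or `sum`, can be passed in without changing the weighting step.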

  9. Different scenarios
     Three scenarios:
     (a) Serial design: LOD is seen as a source.
     (b) Parallel design: two parallel ETLs, producing an internal cube and a LOD cube that are merged and synchronised.
     (c) Query-driven design: on-demand ETL; data are extracted from the existing DW and LOD, then potentially loaded into the DW (requirement satisfaction).
     [Figure: the three designs, showing users and their requirements, internal and external ETL flows (Extract, Join, Filter, Store, Load to DSA), OLAP/MD queries over the DW, LOD queries, a pool of results, and LOD graphs that are visualised or materialised.]
     Challenges:
     1. Pivot schema: generic schema vs. LOD schema (graph)
     2. Redefinition of operators (overloading)
     3. Synchronisation of internal and external data: 3 scenarios
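     A minimal sketch of the query-driven scenario (c), under simplifying assumptions. A requirement is answered from the existing DW when possible. Otherwise it is extracted on demand from LOD, and the LOD rows are optionally handed to a `materialize` callback that loads them into the DW. The function `query_driven_design`, the `Rows` format and all three callables are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional

Rows = List[dict]  # hypothetical result format: a list of records


def query_driven_design(
    requirements: List[str],
    query_dw: Callable[[str], Optional[Rows]],
    query_lod: Callable[[str], Optional[Rows]],
    materialize: Optional[Callable[[str, Rows], None]] = None,
) -> Dict[str, Rows]:
    """Build a pool of results, extracting from LOD only for requirements the DW cannot answer."""
    pool: Dict[str, Rows] = {}
    for req in requirements:
        rows = query_dw(req)
        if rows:  # requirement already satisfied by the existing DW
            pool[req] = rows
            continue
        rows = query_lod(req)  # on-demand extraction from Linked Open Data
        if rows:
            if materialize is not None:
                materialize(req, rows)  # optionally load the LOD results into the DW
            pool[req] = rows
    return pool  # unanswered requirements are simply absent


# Usage with in-memory stand-ins for the DW and a LOD endpoint.
dw = {"publications_by_year": [{"year": 2018, "count": 42}]}
lod = {"authors_by_country": [{"country": "FR", "count": 7}]}
loaded: List[str] = []
pool = query_driven_design(
    ["publications_by_year", "authors_by_country", "unknown"],
    dw.get,
    lod.get,
    lambda req, rows: loaded.append(req),
)
print(sorted(pool))  # ['authors_by_country', 'publications_by_year']
print(loaded)  # ['authors_by_country']
```

     Passing the data access and loading steps as callables keeps the sketch independent of any particular DW engine or SPARQL client.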

  10. Value metrics
     $$\mathrm{Value}(DW) = \mathop{\mathrm{Operator}}_{1 \le i \le n+m} \left[ \mathrm{Weight}(S_i, D) \times \mathrm{Value}(S_i) \right], \quad S_i \in S_{Int} \cup S_{Ext}$$
     Three metrics, related to:
     1. Requirement satisfaction
     $$\mathrm{Value}(Req, S_i) = \frac{\text{number of responses of requirements on } S_i}{\text{number of responses of all requirements}}$$
     2. Conceptual modelling (multidimensional concepts)
     $$\mathrm{Value}(Concepts, S_i) = \frac{\text{number of concepts of the DW schema by integrating } S_i}{\text{total number of the DW concepts}}$$
     3. Target DW population
     $$\mathrm{Value}(Instances, S_i) = \frac{\text{number of instances of the DW by integrating } S_i}{\text{total number of instances of the DW}}$$
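     A minimal sketch of the three ratio metrics, assuming the counts have already been measured. The counts are the requirement responses, DW concepts and DW instances obtained by integrating one source $S_i$, and the corresponding totals. The `Counts` class and the function names are hypothetical, and the numbers in the example are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical counts, either obtained by integrating one source S_i or as DW totals."""
    req_responses: int  # responses to requirements
    concepts: int  # multidimensional concepts in the DW schema
    instances: int  # instances populating the DW


def _ratio(part: int, total: int) -> float:
    if total <= 0:
        raise ValueError("denominator must be positive")
    return part / total


def value_metrics(with_source: Counts, totals: Counts) -> dict:
    """The three per-source value ratios: requirements, concepts, instances."""
    return {
        "req": _ratio(with_source.req_responses, totals.req_responses),
        "concepts": _ratio(with_source.concepts, totals.concepts),
        "instances": _ratio(with_source.instances, totals.instances),
    }


print(value_metrics(Counts(3, 10, 250), Counts(4, 20, 1000)))
# {'req': 0.75, 'concepts': 0.5, 'instances': 0.25}
```

     Each ratio lies in [0, 1] when the per-source count cannot exceed the total. The per-source values can then be fed into the weighted Value(DW) aggregation.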

  11. Case Study: University Research Analysis
     • 4 internal sources generated from the LUBM benchmark
     • 15 initial requirements
     • Analysis: 6 requirements are not satisfied by the internal sources (Oracle 12c Release 1)
     → External source: DBpedia

  12. Experiments
     | Sources | Dimensions/Measures | Value(S*) MD | Value(S*) Req. | Value(S*) Instances | Instances | Response time |
     |---|---|---|---|---|---|---|
     | Internal sources | 6/1 | 31% | 6% | 10% | 550K | 1.1 |
     | Serial design | 10/7 | 71% | 80% | 94% | 7.7 × 10^6 | 3.2 |
     | Parallel design | 11/8 | 73% | 84% | 85% | 3.1 × 10^6 | 2.6 |
     | Query-driven design | 12/8 | 74% | 96% | 84% | 2.9 × 10^6 | 1.7 |
     *All sources have the same weight. *Operator: Avg.
     [Figure: augmented schema]

  13. Summary
     ✔ 2Vs for the DW renaissance
     ✔ Value = pool of multidisciplinary expertise
     ✔ DW life-cycle design revisited (new formalization)
     ✔ 3 augmented scenarios
     ☛ Veracity & 2Vs
     ☛ More automation (query rewriting)
     ☛ Value Query Language (thanks, Patrick)
     Special issue on Business Intelligence and Analytics for Value Creation in the Era of Big Data and Linked Open Data: International Journal of Information Management, Elsevier (Q1; IF = 4.810)


