Towards Benchmarking Stream Data Warehouses - Arian Bär, Lukasz Golab


Towards Benchmarking Stream Data Warehouses Arian Bär, Lukasz Golab 02.11.2012
Stream Data Warehouses
- A data warehouse that is (nearly) continuously loaded
- Enables real-time/historical analytics and applications
Stream Data Warehouses
Research Issues
- Goal: ensure data freshness
- Fast/streaming ETL
  - Streaming joins
- Fast data load and propagation
  - Temporal partitioning
  - Incremental view refresh
  - Golab et al., Stream warehousing with Data Depot, SIGMOD 2009
  - View update scheduling
  - Golab et al., Scalable scheduling of updates in stream data warehouses, TKDE 2012
Measuring Freshness
- Use a data stream benchmark?
  - Focus on throughput; no persistent storage
- Use a data warehouse/OLAP benchmark?
  - Focus on query performance + periodic batch updates
- What we need
  - Translate metrics such as throughput and response time to data freshness/staleness
Basic Ingredients
- Define a staleness function with respect to time
  - One per table; add up to get total for the warehouse
  - One implementation: staleness begins to accrue (for the base table and all associated views) when a new batch of data arrives
  - Many other definitions possible (e.g., binary)
- Track over time
  - Get a staleness vs. time plot
- Return
  - Avg staleness per unit time
  - Min/max/variance over time
  - Priority-weighted staleness
  - The plot itself ...
  - ... also query response times
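The ingredients above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes staleness is the age of the oldest batch that has arrived but not yet been applied (one of the "many other definitions possible"), and the names `staleness_at`, `staleness_series`, `summarize` and `weighted_total` are hypothetical.

```python
from statistics import mean, pvariance


def staleness_at(t, batches):
    """Staleness of one table at time t.

    batches: list of (arrival_time, applied_time) pairs.
    Assumption: staleness is the age of the oldest batch that has
    arrived but has not yet been applied; 0 if nothing is pending.
    """
    pending = [arrive for arrive, applied in batches if arrive <= t < applied]
    return t - min(pending) if pending else 0.0


def staleness_series(batches, horizon, step=1.0):
    """Sample the staleness function on [0, horizon] every `step` time units."""
    n = int(horizon / step) + 1
    return [staleness_at(i * step, batches) for i in range(n)]


def summarize(series):
    """Average, min, max and variance of a sampled staleness curve."""
    return {
        "avg": mean(series),
        "min": min(series),
        "max": max(series),
        "variance": pvariance(series),
    }


def weighted_total(series_by_table, priorities):
    """Priority-weighted sum of per-table staleness at each sample point."""
    names = list(series_by_table)
    length = len(series_by_table[names[0]])
    return [
        sum(priorities[name] * series_by_table[name][i] for name in names)
        for i in range(length)
    ]
```

Plotting the list returned by `staleness_series` against time gives the staleness vs. time plot; setting every priority to 1 in `weighted_total` gives the plain warehouse total.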
Staleness Plots
Total Staleness
Factors Influencing Staleness
- ETL, data load, view update times
- Update order
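A toy calculation shows why update order matters. The sketch below rests on assumptions that are not in the slides: all batches arrive at time 0, updates run one after another on a single processor, and each table's staleness grows linearly until its update completes. The function name `total_staleness` is hypothetical.

```python
def total_staleness(order, durations):
    """Area under the summed staleness curves for one serial schedule.

    order: table names in the order their updates are run.
    durations: update time per table.
    Assumption: every batch arrives at time 0 and staleness grows
    linearly (slope 1) until the table's update completes.
    """
    clock = 0.0
    area = 0.0
    for table in order:
        clock += durations[table]
        # Triangle under a curve that rises from 0 to `clock` over `clock` time units.
        area += clock ** 2 / 2
    return area


durations = {"A": 1.0, "B": 5.0}
short_first = total_staleness(["A", "B"], durations)  # 18.5
long_first = total_staleness(["B", "A"], durations)   # 30.5
```

Under these assumptions, running the short update first yields less accumulated staleness, even though both schedules finish at the same time.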
Benchmark Structure
- Data generator sends files to the SDW
- System executes a workload consisting of
  - Base table loads and materialized view updates (including indices) on arrival of new data
  - Ad-hoc queries scheduled randomly
  - (Don't want to wait till the end to test query performance)
- Vary data speed and volume
  - Bursty workload will test overload performance
- Repeat for different view hierarchies
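As a rough sketch of how such a workload might be generated, the following produces a time-ordered list of batch arrivals and randomly scheduled ad-hoc queries, with an optional burst window. The function `generate_workload` and all of its parameters are hypothetical and not part of the proposed benchmark; the uniform query times and fixed batch interval are simplifying assumptions.

```python
import random


def generate_workload(duration, batch_interval, num_queries, burst=None, seed=0):
    """Return a time-ordered list of (time, kind) events.

    kind is "batch" (a new file arrives at the warehouse) or "query"
    (an ad-hoc query is issued).
    burst: optional (start, end, factor); inside [start, end) batches
    arrive `factor` times as often, to provoke overload.
    """
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    events = []
    t = 0.0
    while t < duration:
        events.append((t, "batch"))
        in_burst = burst is not None and burst[0] <= t < burst[1]
        t += batch_interval / burst[2] if in_burst else batch_interval
    # Ad-hoc queries are spread uniformly at random over the whole run,
    # so query performance is tested while loads are in progress.
    for _ in range(num_queries):
        events.append((rng.uniform(0, duration), "query"))
    events.sort()
    return events
```

A driver could replay these events against the system under test and record, for each batch, its arrival and applied times, which is the input a staleness calculation needs.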
Example View Hierarchies
Conclusions and Future/Ongoing Work
- Proposal for a SDW benchmark framework
  - Focus on data freshness over time
  - Interpretable results
- Ongoing work
  - Benchmark implementation
  - Efficient incremental view update
  - Freshness (and completeness) as data quality metric
  - Freshness in a distributed SDW
