Outline Background & Motivation System Overview System Design - PowerPoint PPT Presentation

R-Store : A Scalable Distributed System for Supporting Real-time Analytics Feng Li, M. Tamer Özsu, Gang Chen, Beng Chin Ooi @ICDE 2014 Presented by: Xiao Meng CS848, University of Waterloo

Outline
• Background & Motivation
• System Overview
• System Design
• RTOLAP in R-Store
• Evaluation
• Conclusion
• Q & A

Background & Motivation
• Situation for large-scale data processing
  – Systems are classified into 2 categories: OLTP and OLAP
  – Data is periodically transported to OLAP through ETL
• Demand
  – Time-critical decision making (RTOLAP)
    - Depends on the freshness of OLAP results
    - Fully RTOLAP entails executing queries directly on OLTP data
  – OLAP & OLTP processed by one integrated system

Background & Motivation
• Problems with a simple combination
  – Resource contention
    - OLTP queries blocked by OLAP
  – Inconsistency
    - A long-running OLAP query may access the same data set several times; updates by OLTP in between could lead to incorrect OLAP results
• Solution – R-Store
  – Resource contention
    - Computation resource isolation
  – Inconsistency
    - Multi-versioning storage system

System Overview – A glimpse of R-Store
• OLAP queries read data based on the timestamp of query submission from a multi-versioning storage system
  – Modified HBase as storage
  – MapReduce job for query execution
• Periodically materialize real-time data into a data cube
  – A full HBase scan every time is time-consuming
    • Entire table is scanned & shuffled during MR
  – Streaming MapReduce to maintain the data cube

System Overview – R-Store Architecture
• OLTP submitted to KV store
• OLAP query processed by MapReduce
  – Scan on HBase
• Refresh data cube through streaming MapReduce
• MetaStore to generate query timestamp T_Q & metadata

System Design – A Glimpse of HBase

System Design – Storage Design based on HBase
• Extend Scan to 2 versions
  – FullScan for querying data cube
  – IncrementalScan for querying real-time data
• Infinite versions of data to maintain query consistency
  – Compaction to remove stale versions
  – Global compaction: immediately following data cube refresh
  – Local compaction: compact old versions not accessed by any scan process

System Design – IncrementalScan in detail
• Target: find the changes since the last data cube materialization
• Method
  – Take 2 timestamps as input, T_DC & T_Q, and return the values with the largest timestamps before T_DC & T_Q
• Implementations
  – Naïve: access memstore & storefile in parallel
  – Adaptive: maintain the keys modified since the last materialization, scan the memstore first, then scan or randomly access keys based on cost
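
The IncrementalScan semantics can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary stands in for the multi-version store; `latest_before`, `incremental_scan`, and the data layout are hypothetical names for illustration, not the HBase-R API. Treating the timestamp bounds as inclusive is also an assumption.

```python
from bisect import bisect_right

def latest_before(versions, ts):
    """Return the value with the largest timestamp <= ts, or None.

    `versions` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [t for t, _ in versions]
    idx = bisect_right(timestamps, ts)
    return versions[idx - 1][1] if idx else None

def incremental_scan(store, t_dc, t_q):
    """For each key modified in (t_dc, t_q], yield (key, value at t_dc, value at t_q)."""
    for key, versions in sorted(store.items()):
        if any(t_dc < t <= t_q for t, _ in versions):
            yield key, latest_before(versions, t_dc), latest_before(versions, t_q)

# Toy multi-version store: key -> [(timestamp, value), ...] (made-up data)
store = {
    "k1": [(1, 10), (5, 12)],
    "k2": [(2, 7)],
    "k3": [(6, 3), (9, 4)],
}
result = list(incremental_scan(store, t_dc=4, t_q=8))
```

Here `k2` is skipped because it has not changed since T_DC, and the version of `k3` written after T_Q is invisible to the query.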

System Design – Compaction in detail
• Global compaction
  – Similar to HBase's default; retains only one version of each key
  – Triggered by completion of the data cube's refresh
• Local compaction
  – Compacted data is stored in different files so that running scan processes are not blocked
  – Files can be removed when not accessed by any scan
  – Triggered when #tuple/#key exceeds a threshold
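
A rough Python sketch of the two compaction modes, under stated assumptions: the store is an in-memory dictionary of sorted version lists, global compaction keeps the newest version of each key, and local compaction keeps the newest version plus whatever each active scan can still see. All function names are hypothetical.

```python
def needs_local_compaction(store, threshold):
    """Trigger check: average number of versions per key (#tuple/#key) exceeds threshold."""
    tuples = sum(len(versions) for versions in store.values())
    return bool(store) and tuples / len(store) > threshold

def local_compact(store, active_scan_ts):
    """Return a new store without versions that no active scan can see.

    Writing to a new dictionary mirrors storing compacted data in a different file.
    """
    compacted = {}
    for key, versions in store.items():
        keep = {len(versions) - 1}  # always keep the newest version
        for ts in active_scan_ts:
            visible = [i for i, (t, _) in enumerate(versions) if t <= ts]
            if visible:
                keep.add(visible[-1])  # the version a scan at `ts` reads
        compacted[key] = [versions[i] for i in sorted(keep)]
    return compacted

def global_compact(store):
    """Retain a single version per key (assumed here to be the newest)."""
    return {key: [versions[-1]] for key, versions in store.items() if versions}

store = {
    "a": [(1, "x"), (3, "y"), (5, "z"), (7, "w")],
    "b": [(2, "p")],
}
```

With one scan active at timestamp 4, `local_compact(store, [4])` keeps versions 3 and 7 of key `a` and drops the rest.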

System Design – Data cube
• Define a data cube for “Customer Profiles”
• Dimensions: age, income, buys
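
To make the cube concrete, the following sketch lists the cuboids of the three-dimensional “Customer Profiles” cube, assuming each dimension has a single level so that every subset of dimensions forms one cuboid. The helper name `cuboids` is hypothetical.

```python
from itertools import combinations

def cuboids(dimensions):
    """Yield every cuboid as a tuple of grouped dimensions, base cuboid first, apex last."""
    for size in range(len(dimensions), -1, -1):
        for group in combinations(dimensions, size):
            yield group

lattice = list(cuboids(("age", "income", "buys")))
for group in lattice:
    print(group if group else "(apex: all)")
```

The three dimensions give 2^3 = 8 cuboids, from the base cuboid (age, income, buys) down to the apex that aggregates everything.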

System Design – Data cube maintenance
• Re-computation – first run
  – FullScan on one region, generate a KV pair for each cuboid in the mapper, aggregate & output in the reducer
• Incremental update – subsequent runs
  – Propagation step to compute the changes & update step to update the cube
  – Streaming system updates the cube internally & periodically materializes it into storage

System Design – HStreaming for cube maintenance
• Each mapper is responsible for processing updates within a key range
  – Maintain KVs locally, cache hot keys in memory
  – For updates, emit 2 KV pairs for each cuboid (+, -)
• Reducer caches the output KVs of the mappers, invokes reduce every W_r, and refreshes the cube every W_cube
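
The “2 KV pairs per cuboid” idea can be sketched as follows. This is a simplified illustration, not HStreaming code: it assumes a SUM aggregate over a hypothetical `amount` measure, single-level dimensions, and an in-memory cube; all function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("age", "income", "buys")

def cuboid_keys(row):
    """One cell key per cuboid: keep a subset of dimensions, wildcard the rest."""
    for size in range(len(DIMENSIONS) + 1):
        for kept in combinations(DIMENSIONS, size):
            yield tuple(row[d] if d in kept else "*" for d in DIMENSIONS)

def map_update(old_row, new_row, measure="amount"):
    """Per cuboid, emit a '-' pair for the old version and a '+' pair for the new one."""
    if old_row is not None:
        for key in cuboid_keys(old_row):
            yield key, ("-", old_row[measure])
    for key in cuboid_keys(new_row):
        yield key, ("+", new_row[measure])

def reduce_into_cube(cube, pairs):
    """Fold tagged pairs into a SUM cube held in memory."""
    for key, (tag, value) in pairs:
        cube[key] += value if tag == "+" else -value
    return cube

old = {"age": "young", "income": "high", "buys": "yes", "amount": 10}
new = {"age": "young", "income": "high", "buys": "yes", "amount": 25}

cube = defaultdict(int)
reduce_into_cube(cube, map_update(None, old))  # initial insert
pairs = list(map_update(old, new))             # update of the same row
reduce_into_cube(cube, pairs)
```

Because the old value is subtracted and the new value added in every affected cell, the cube never has to be recomputed from a full scan.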

System Design – Data Flow of R-Store
1. Updates arrive at HBase-R
2. HBase-R streams the updates to an HStreaming mapper
3. Reducer periodically materializes the local data cube to HBase-R & notifies the MetaStore

RTOLAP in R-Store – Query Processing
• Map
  – Tag the values with ‘Q’, ‘+’, ‘-’
• Reduce
  – Do calculation based on the aggregation function & the three values
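
A small sketch of how the three tagged values might be combined, assuming a SUM aggregate and assuming ‘Q’ marks the value read from the data cube, ‘+’ the real-time value as of T_Q, and ‘-’ the value as of the last materialization. That reading of the tags, and the function names, are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(cube_cells, changes):
    """Tag cube values with 'Q' and real-time changes with '-' (old) and '+' (new)."""
    for key, value in cube_cells:
        yield key, ("Q", value)
    for key, old, new in changes:
        if old is not None:
            yield key, ("-", old)
        if new is not None:
            yield key, ("+", new)

def reduce_sum(tagged):
    """For SUM: result = cube value + new real-time values - old values."""
    groups = defaultdict(lambda: {"Q": 0, "+": 0, "-": 0})
    for key, (tag, value) in tagged:
        groups[key][tag] += value
    return {key: g["Q"] + g["+"] - g["-"] for key, g in groups.items()}

cube_cells = [("young", 100), ("old", 40)]         # materialized at T_DC
changes = [("young", 10, 12), ("old", None, 5)]    # (key, old, new) since T_DC
answer = reduce_sum(map_phase(cube_cells, changes))
```

Other aggregation functions would need a different combination rule in the reducer; SUM is used here only because its correction is a simple add and subtract.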

Evaluation
• Cluster of 144 nodes
  – Intel X3430 2.4 GHz processor
  – 8 GB of memory
  – 2x500 GB SATA disks
  – Gigabit Ethernet
• TPC-H data

Evaluation - Performance of Maintaining Data cube
• HStreaming with 10 nodes has higher throughput than 40 HBase-R nodes
• 1.6 billion keys, 1% updated; the update algorithm is fast enough
• Latency equals HBase-R input speed

Evaluation - Performance of RT querying
• Updates over a small key range mean less data scanned in HBase-R and less data processed

Evaluation - Performance of OLTP

Related Work
• Database
  – C-Store (VLDB 05)
• Main-memory database
  – HyPer (ICDE 11), HYRISE (VLDB 10)
• Druid (SIGMOD 14)

Conclusion
• Multi-version concurrency control to support RTOLAP
• Data cube to reduce storage requirement & improve performance
• Streaming system to refresh data cube
• Available at https://github.com/lifeng5042/RStore

Backup – OLAP Cube
• A multi-dimensional generalization of a two- or three-dimensional spreadsheet; a hypercube for datasets with more than three dimensions.
• Dimensions: product, time, cities…
• Cells: each cell of the cube holds a number that represents some measure of the business, e.g. sales, profits…
• Slicer: the dimension held constant for all cells so that multi-dimensional information can be shown in the 2D physical space of a spreadsheet.

Backup – OLAP Cube
• Data cube can be viewed as a lattice of cuboids
• The bottom-most cuboid is the base cuboid
• The top-most cuboid (apex) contains only one cell
• How many cuboids are there in an n-dimensional cube with L levels?
  $T = \prod_{i=1}^{n} (L_i + 1)$, where $L_i$ is the number of levels of dimension $i$
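
As a quick worked check of the cuboid count (the product over all dimensions of the number of levels plus one), the sketch below evaluates it for two cases. The function name and the level counts in the second example are made up for illustration.

```python
from math import prod

def total_cuboids(levels):
    """Number of cuboids: product of (L_i + 1) over all dimensions.

    `levels` holds L_i, the number of hierarchy levels of each dimension;
    the extra 1 accounts for aggregating that dimension away entirely.
    """
    return prod(level + 1 for level in levels)

flat = total_cuboids([1, 1, 1])      # three single-level dimensions: 2 * 2 * 2
layered = total_cuboids([4, 3, 2])   # hypothetical hierarchies: 5 * 4 * 3
```

With one level per dimension the formula reduces to 2^n, which is why the three-dimensional example has 8 cuboids.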
