Dremel: Interactive Analysis of Web-Scale Datasets
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis
Presented by Dipannita Dey
Outline • Problem • Existing technology • Dremel • Basic features • Applications • Infrastructure & details • Experiments • Evaluations 2
Problem: Latency Matters
Real-time trends: interactive tools, spam detection, web dashboards, network optimization 3
Existing Technologies
• Map-Reduce
  - Record-oriented data
  - Does not work with data in situ
  - Suitable for batch processing
• Pig
• Hive
Inherent latency between submitting a query and getting the result 4
Dremel
• Interactive ad hoc query system
• Scales to thousands of nodes
• Fault tolerant; handles stragglers
• SQL-like query language and multi-level execution trees
• Nested data model
• Columnar storage of nested (non-relational) data
• Tree-like architecture, similar to web search
• Interoperability with data
  - Accesses data in situ (e.g., GFS, Bigtable)
  - Works with MapReduce pipelines 5
Widely used inside Google since 2010 • Analysis of crawled web documents • Tracking install data for applications on Android Market • Crash reporting for Google products • Spam analysis • Debugging of map tiles on Google Maps • Tablet migrations in managed Bigtable instances • Results of tests run on Google's distributed build system • Disk I/O statistics for hundreds of thousands of disks • Resource monitoring for jobs run in Google's data centers 6
Columnar data storage format
Advantages: read less, fast access, lossless representation
Challenges: preserve structure, reconstruct from a subset of fields 7
Nested data model

Schema (cardinalities: required = [1,1], optional = [0,1], repeated = [0,*]):

  message Document {
    required int64 DocId;
    optional group Links {
      repeated int64 Backward;
      repeated int64 Forward;
    }
    repeated group Name {
      repeated group Language {
        required string Code;
        optional string Country;
      }
      optional string Url;
    }
  }

Sample record r1:                 Sample record r2:
  DocId: 10                         DocId: 20
  Links                             Links
    Forward: 20                       Backward: 10
    Forward: 40                       Backward: 30
    Forward: 60                       Forward: 80
  Name                              Name
    Language                          Url: 'http://C'
      Code: 'en-us'
      Country: 'us'
    Language
      Code: 'en'
    Url: 'http://A'
  Name
    Url: 'http://B'
  Name
    Language
      Code: 'en-gb'
      Country: 'gb'
8
Repetition and definition levels

Column Name.Language.Code, striped from records r1 and r2:

  value   r  d
  en-us   0  2
  en      2  2
  NULL    1  1
  en-gb   1  2
  NULL    0  1

r: at what repeated field in the field's path the value has repeated (r = 0 starts a new record)
d: how many fields in the path that could be undefined (optional or repeated) are actually present 9
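The striping above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's algorithm: the helper name `shred`, the `rep_level`/`counts_toward_d` parameters, and the encoding of records as dicts/lists are all invented here; the field names and expected (value, r, d) triples come from the slides.

```python
def shred(records, path, rep_level, counts_toward_d):
    """Emit (value, r, d) triples for `path` (a list of field names).
    `rep_level` maps each repeated field on the path to its repetition
    level; `counts_toward_d` holds the optional/repeated path fields
    that increment the definition level when present."""
    out = []

    def walk(node, depth, r, d):
        if depth == len(path):
            out.append((node, r, d))
            return
        field = path[depth]
        child = node.get(field)
        if child is None:                     # field absent: emit NULL
            out.append((None, r, d))
            return
        bump = 1 if field in counts_toward_d else 0
        if not isinstance(child, list):       # optional/required field
            walk(child, depth + 1, r, d + bump)
            return
        for i, item in enumerate(child):      # repeated field: siblings
            # after the first repeat at this field's level
            walk(item, depth + 1,
                 r if i == 0 else rep_level[field], d + bump)

    for rec in records:
        walk(rec, 0, 0, 0)
    return out


# The two sample records r1 and r2 from the previous slide:
docs = [
    {"DocId": 10,
     "Links": {"Forward": [20, 40, 60]},
     "Name": [
         {"Language": [{"Code": "en-us", "Country": "us"},
                       {"Code": "en"}],
          "Url": "http://A"},
         {"Url": "http://B"},
         {"Language": [{"Code": "en-gb", "Country": "gb"}]}]},
    {"DocId": 20,
     "Links": {"Backward": [10, 30], "Forward": [80]},
     "Name": [{"Url": "http://C"}]},
]

column = shred(docs, ["Name", "Language", "Code"],
               rep_level={"Name": 1, "Language": 2},
               counts_toward_d={"Name", "Language"})
# column == [("en-us", 0, 2), ("en", 2, 2), (None, 1, 1),
#            ("en-gb", 1, 2), (None, 0, 1)]
```

Note that Code, being required, never contributes to d: only Name and Language (both repeated) can be absent.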
Column-striped representation

  DocId          Name.Url            Links.Forward    Links.Backward
  value r d      value     r d       value r d        value r d
  10    0 0      http://A  0 2       20    0 2        NULL  0 1
  20    0 0      http://B  1 2       40    1 2        10    0 2
                 NULL      1 1       60    1 2        30    1 2
                 http://C  0 2       80    0 2

  Name.Language.Code    Name.Language.Country
  value  r d            value r d
  en-us  0 2            us    0 3
  en     2 2            NULL  2 2
  NULL   1 1            NULL  1 1
  en-gb  1 2            gb    1 3
  NULL   0 1            NULL  0 1
10
Record assembly FSM

Transitions are labeled with repetition levels of the next value read:

  DocId                  --0-->   Links.Backward
  Links.Backward         --1-->   Links.Backward;       --0--> Links.Forward
  Links.Forward          --1-->   Links.Forward;        --0--> Name.Language.Code
  Name.Language.Code     --0,1,2--> Name.Language.Country
  Name.Language.Country  --2-->   Name.Language.Code;   --0,1--> Name.Url
  Name.Url               --1-->   Name.Language.Code;   --0--> end of record

For record-oriented data processing (e.g., MapReduce) 11
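A much-simplified single-column sketch of what assembly achieves (hypothetical helper, not the paper's multi-column automaton): rebuild partial records containing only Name.Language.Code from that column's (value, r, d) triples. The r level says how far up the nesting to "rewind" before appending; the d level says how deep the value is actually defined.

```python
def assemble(column):
    """Rebuild partial Document records (only Name.Language.Code)
    from the column's (value, r, d) triples. Simplified: handles
    just this one path from the Document schema."""
    records = []
    for value, r, d in column:
        if r == 0:                  # r = 0 marks the start of a new record
            records.append({"Name": []})
        rec = records[-1]
        if r <= 1 and d >= 1:       # open a new Name entry
            rec["Name"].append({"Language": []} if d >= 2 else {})
        if d >= 2:                  # Language (and its required Code) present
            rec["Name"][-1]["Language"].append({"Code": value})
    return records


# The (value, r, d) column for Name.Language.Code from the earlier slide:
column = [("en-us", 0, 2), ("en", 2, 2), (None, 1, 1),
          ("en-gb", 1, 2), (None, 0, 1)]
partial = assemble(column)
# partial == [
#   {"Name": [{"Language": [{"Code": "en-us"}, {"Code": "en"}]},
#             {},
#             {"Language": [{"Code": "en-gb"}]}]},
#   {"Name": [{}]}]
```

The FSM on the slide generalizes this to many columns at once, using the next value's repetition level to decide which column to read from next.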
SQL dialect for nested data

  SELECT DocId AS Id,
         COUNT(Name.Language.Code) WITHIN Name AS Cnt,
         Name.Url + ',' + Name.Language.Code AS Str
  FROM t
  WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Output schema:                      Output record t1:
  message QueryResult {               Id: 10
    required int64 Id;                Name
    repeated group Name {               Cnt: 2
      optional uint64 Cnt;              Language
      repeated group Language {           Str: 'http://A,en-us'
        optional string Str;              Str: 'http://A,en'
      }                               Name
    }                                   Cnt: 0
  }

No record assembly during query processing 12
Serving tree
• Parallelizes scheduling and aggregation
• Fault tolerance
• Stragglers
• Designed for "small" results (<1M records)

Architecture: client → root server → intermediate servers → leaf servers (with local storage) → storage layer (e.g., GFS)
[figure: histogram of response times] 13
Example: count()

Query: SELECT A, COUNT(B) FROM T GROUP BY A, where T = {/gfs/1, /gfs/2, …, /gfs/100000}

Level 0 (root):
  SELECT A, SUM(c) FROM (R1_1 UNION ALL … UNION ALL R1_10) GROUP BY A

Level 1 (intermediate servers), for i = 1…10:
  R1_i = SELECT A, COUNT(B) AS c FROM T1_i GROUP BY A
  T1_1 = {/gfs/1, …, /gfs/10000}, T1_2 = {/gfs/10001, …, /gfs/20000}, …

Level 3 (leaf servers: data access ops):
  SELECT A, COUNT(B) AS c FROM T3_1 GROUP BY A
  T3_1 = {/gfs/1} 14
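The rewrite above can be sketched in Python. This is an illustrative model only: the helper names (`leaf_query`, `root_query`) and the sample tablet rows are invented; the split into per-tablet partial COUNTs summed at the root follows the slide.

```python
from collections import Counter

def leaf_query(tablet):
    """SELECT A, COUNT(B) AS c FROM tablet GROUP BY A, for one tablet."""
    c = Counter()
    for row in tablet:
        if row.get("B") is not None:    # COUNT(B) skips NULLs
            c[row["A"]] += 1
    return c

def root_query(partials):
    """SELECT A, SUM(c) FROM (R1 UNION ALL ... Rn) GROUP BY A."""
    total = Counter()
    for p in partials:
        total.update(p)                 # sums counts per group key
    return dict(total)

# Hypothetical tablet contents:
tablets = [
    [{"A": "x", "B": 1}, {"A": "y", "B": 2}],
    [{"A": "x", "B": 3}, {"A": "x", "B": None}],
]
result = root_query(leaf_query(t) for t in tablets)
# result == {"x": 2, "y": 1}
```

Because COUNT partials compose by addition, intermediate servers can sum partials from their children the same way the root does, to any tree depth.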
Experiments
• 1 PB of real data (uncompressed, non-replicated)
• 100K-800K tablets per table
• Experiments run during business hours

  Table  Number of    Size (unrepl.,  Number     Data    Repl.
  name   records      compressed)     of fields  center  factor
  T1     85 billion   87 TB           270        A       3×
  T2     24 billion   13 TB           530        A       3×
  T3     4 billion    70 TB           1200       A       3×
  T4     1+ trillion  105 TB          50         B       3×
  T5     1+ trillion  20 TB           30         B       2×
15
Read from disk
[figure: time (sec) vs. number of fields; table partition: 375 MB (compressed), 300K rows, 125 columns]
From columns: (a) read + decompress, (b) assemble records, (c) parse as C++ objects
From records: (d) read + decompress, (e) parse as C++ objects 16
MR and Dremel execution
[figure: execution time (sec) on 3000 nodes; data read: 87 TB (MR on records), 0.5 TB (MR on columns), 0.5 TB (Dremel)]

Q1: SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1
(avg # of terms in a specific field of table T1)

MR overheads: launch jobs, schedule 0.5M tasks, assemble records 17
Impact of serving tree depth
[figure: execution time (sec) for Q2 and Q3 at varying serving tree depths]

Q2: SELECT country, SUM(item.amount) FROM T2 GROUP BY country
    (returns 100s of records)
Q3: SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS '.net' GROUP BY domain
    (returns 1M records) 18
Scalability
[figure: execution time (sec) vs. number of leaf servers]
Q5 on a trillion-row table T4: SELECT TOP(aids, 20), COUNT(*) FROM T4 19
Interactive speed
[figure: percentage of queries vs. execution time (sec); monthly query workload of one 3000-node Dremel instance]
Most queries complete under 10 sec 20
Outcome
• Google BigQuery - web service (pay per query)
• Apache Drill - open-source implementation of Dremel 21
Take Away
• Map-Reduce can benefit from columnar storage, like a parallel DBMS
  - Record assembly is expensive
  - Dremel complements MR; together they produce the best results
• Parallel DBMSs can benefit from a serving-tree architecture 22
Thank You 23