Scalable data store and analytics platform for monitoring - PowerPoint PPT Presentation

Scalable data store and analytics platform for monitoring WLCG, a distributed data-intensive scientific infrastructure
Uthay Suthakar
Brunel University
eepguus@brunel.ac.uk

Topics
• Introduction to current architecture
• Proposed architecture
• Lambda architecture
• Review of technologies

Current architecture:
• Robust architecture.
• It does the job!
But
• Expensive.
• Does not scale well.
• Does not support real-time analytics.

Proposed architecture:
• Batch Layer: stores the constantly growing dataset.
• Serving Layer: stores the batch-processed views for interactive querying.
• Real-Time Processing Layer: performs analytics on fresh data.

Lambda Architecture
Three-layer architecture:
• Batch Layer – for batch processing on Big Data and producing queryable views.
• Serving Layer – for ad-hoc queries (ideally from views generated by the batch layer).
• Speed Layer – for real-time views based on incremental algorithms.
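How the three layers cooperate can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical event format with a `site` field and made-up site names; it does not describe the WLCG monitoring code.

```python
from collections import Counter

def build_batch_view(master_dataset):
    """Batch layer: recompute a queryable view from the full, append-only dataset."""
    return Counter(event["site"] for event in master_dataset)

def update_realtime_view(realtime_view, event):
    """Speed layer: incrementally fold one fresh event into the real-time view."""
    realtime_view[event["site"]] += 1

def query(batch_view, realtime_view, site):
    """Serving side: answer a query by merging the batch and real-time views."""
    return batch_view[site] + realtime_view[site]

# Hypothetical events already stored in the master dataset.
master_dataset = [{"site": "site_a"}, {"site": "site_b"}, {"site": "site_a"}]
batch_view = build_batch_view(master_dataset)

# Hypothetical events that arrived after the last batch run.
realtime_view = Counter()
for fresh_event in [{"site": "site_a"}, {"site": "site_c"}]:
    update_realtime_view(realtime_view, fresh_event)

print(query(batch_view, realtime_view, "site_a"))  # 3
```

Once the next batch run has absorbed the fresh events, the real-time view for that period can be discarded, which keeps the incremental part small.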

Batch Layer (i): Hadoop & MapReduce
• Programming model proposed by Google.
• Solves the complex issues (parallel computation, load balancing and fault tolerance).
• Two primitive parallel methods (Map and Reduce).
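The two primitives can be illustrated with a single-process word count. This is a sketch of the programming model only, not of Hadoop's API; the function names and the sample records are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit one (key, value) pair per word in the input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

print(map_reduce(["job done", "job failed"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the framework handles the shuffle, load balancing and retries.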

Batch Layer (ii): Stratosphere
• Stratosphere extends the well-known MapReduce model with new operators.
• All operators start working in memory.
• Supports Java or Scala.
• Scales horizontally.
• Integrates seamlessly with existing Hadoop installations.
• Built-in optimizer.

Serving Layer (i): Apache Drill
• Inspired by Google’s Dremel.
• Drill provides a distributed execution engine for interactive queries.
• Low-latency ad-hoc queries to many different data sources.
• Goal is to scale to 10,000 servers and process petabytes of data within seconds.
• Supports multiple data models:
  - Schema: Protocol Buffers & Apache Avro
  - Schema-less: JSON, BSON, etc.

Serving Layer (ii): Cloudera Impala
• Massively Parallel Processing query engine.
• Low-latency SQL queries.
• Interactive analytics directly on data stored in Hadoop without data movement or predefined schemas.
• Shares workload management, metadata, ODBC driver, SQL syntax and user interface with Apache Hive.
• SQL-92 features of Hive Query Language including SELECT, joins, and aggregate functions.
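The kind of query listed above (SELECT with a join and aggregate functions) looks roughly like the following. The sketch runs the SQL against an in-memory SQLite database purely so that it is executable; it does not connect to Impala, and the `sites` and `transfers` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transfers (site_id INTEGER, bytes INTEGER);
INSERT INTO sites VALUES (1, 'site_a'), (2, 'site_b');
INSERT INTO transfers VALUES (1, 100), (1, 250), (2, 40);
""")

# Join the two tables and aggregate per site.
rows = conn.execute("""
SELECT s.name, COUNT(*) AS n_transfers, SUM(t.bytes) AS total_bytes
FROM transfers t
JOIN sites s ON s.site_id = t.site_id
GROUP BY s.name
ORDER BY s.name
""").fetchall()

print(rows)
```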

Serving Layer (iii): Presto (Facebook)
• Distributed SQL query engine optimized for ad-hoc analysis.
• Supports complex queries, aggregations, joins, and window functions.
• Read-only.

Speed Layer (i): Storm
• Exposes parallel real-time computation model.
• Highly scalable.
• Guarantees that every message will be processed.
• Transactional topologies.
• Stream processing.
• Continuous computation.
• Distributed RPC.
• Stream groupings.
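A stream grouping decides which downstream task receives each tuple. The sketch below imitates two common strategies in plain Python: a fields grouping, where tuples with the same field value always reach the same task, and an even spread across tasks. It is not Storm's API; the function names, the CRC32 hash and the round-robin stand-in for Storm's randomized shuffle grouping are assumptions made for illustration.

```python
import zlib

def fields_grouping(field_value, num_tasks):
    """Route by a stable hash of the field, so equal values share one task."""
    return zlib.crc32(field_value.encode("utf-8")) % num_tasks

def round_robin_grouping(tuple_index, num_tasks):
    """Spread tuples evenly over tasks (a deterministic stand-in for shuffling)."""
    return tuple_index % num_tasks

# Hypothetical stream of tuples keyed by site name.
sites = ["site_a", "site_b", "site_a", "site_c", "site_a"]
assignments = [fields_grouping(site, 4) for site in sites]
print(assignments)
```

Fields grouping is what makes per-key state (for example a running count per site) safe to keep inside a single task.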

Speed Layer (ii): Amazon Kinesis
• Streaming data as a managed service (cloud service).
• Metered pricing (charged by shards and HTTP PUT transactions).
• Stream capacity is configured in shards (throughput capacity).
• Kinesis Client Library – responsible for load balancing, coordination and error handling.
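Because capacity is bought in shards, sizing a stream is simple arithmetic. The sketch below assumes per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads; these constants are assumptions for the example and should be checked against the current Kinesis documentation. The function name and the workload figures are hypothetical.

```python
import math

# Assumed per-shard limits (verify against current AWS documentation).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0

def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )

# Hypothetical workload: 3.5 MB/s in, 2,000 records/s, 4 MB/s out.
print(shards_needed(3.5, 2000, 4.0))  # 4
```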

Speed Layer (iii): Samza
• Three layers: stream layer, execution layer and processing layer.
• Samza is pluggable.
• Streams are partitioned and ordered sequentially.
• A stream is composed of immutable messages of a similar type (Kafka topics).
• State is co-located with each task.
• Checkpointing for failure recovery.
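The interplay of ordered partitions, task-local state and checkpointing can be sketched as follows. This is a simplified model in plain Python, not Samza's API: the `CountingTask` class is hypothetical, and it snapshots the offset and the state together, whereas a real deployment persists them separately.

```python
class CountingTask:
    """Hypothetical task that counts messages from one ordered partition."""

    def __init__(self, partition):
        self.partition = partition  # ordered list of immutable messages
        self.state = {}             # state kept next to the task
        self.offset = 0             # position of the next message to read
        self.checkpoint = (0, {})   # last committed (offset, state) pair

    def process(self, max_messages):
        """Consume up to max_messages messages in partition order."""
        end = min(self.offset + max_messages, len(self.partition))
        for message in self.partition[self.offset:end]:
            self.state[message] = self.state.get(message, 0) + 1
        self.offset = end

    def commit(self):
        """Record a checkpoint of the current offset and state."""
        self.checkpoint = (self.offset, dict(self.state))

    def recover(self):
        """After a failure, resume from the last checkpoint."""
        self.offset, saved_state = self.checkpoint
        self.state = dict(saved_state)

task = CountingTask(["a", "b", "a", "c"])
task.process(2)
task.commit()
task.process(1)   # this work is lost in the simulated crash below
task.recover()
task.process(10)  # replays from the checkpointed offset
print(task.state)
```

Because the partition is immutable and ordered, replaying from the checkpointed offset reproduces the same result as an uninterrupted run.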

Speed Layer (iv): S4
• Distributed stream processing engine inspired by MapReduce.
• Combination of MapReduce and the Actors model.
• Provides a simple programming interface.
• Decentralized and symmetric architecture (managed by ZooKeeper).
• Pluggable architecture.
• Lossy failover is acceptable – processes are moved to standby.
• Several PEs are available for standard tasks such as count, aggregate, join, and so on.
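The combination of keyed events and actor-like processing elements (PEs) can be pictured as one PE instance per key, created on first use. The sketch below is plain Python rather than S4's interface; `CountPE`, `KeyedDispatcher` and the event format are hypothetical names chosen for the example.

```python
class CountPE:
    """Hypothetical processing element that counts events for one key."""

    def __init__(self, key):
        self.key = key
        self.count = 0

    def on_event(self, event):
        self.count += 1

class KeyedDispatcher:
    """Routes each event to the PE instance that owns the event's key."""

    def __init__(self, pe_class, key_of):
        self.pe_class = pe_class
        self.key_of = key_of
        self.instances = {}

    def dispatch(self, event):
        key = self.key_of(event)
        if key not in self.instances:
            # Create the PE lazily the first time its key is seen.
            self.instances[key] = self.pe_class(key)
        self.instances[key].on_event(event)

dispatcher = KeyedDispatcher(CountPE, key_of=lambda event: event["status"])
for event in [{"status": "ok"}, {"status": "failed"}, {"status": "ok"}]:
    dispatcher.dispatch(event)

counts = {key: pe.count for key, pe in dispatcher.instances.items()}
print(counts)
```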

Spark, Shark, Spark Streaming, etc. (i)
• In-memory distributed computing framework.
• Provides a general programming model (operators such as Map, Reduce, Join, Filter, GroupBy, Sort, LeftOuterJoin, RightOuterJoin, Count, Union, Cross, etc.).
• Low-latency computations by caching the working dataset in memory.
• Fault tolerance by lineage or checkpointing.
• Spark extends its engine for stream processing.
• Provides the same Spark APIs for processing streams.
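Lineage-based fault tolerance means a dataset remembers how it was derived, so a lost in-memory copy can be recomputed instead of being restored from a replica. The sketch below models that idea in plain Python; the `LineageDataset` class is hypothetical and is not Spark's API.

```python
class LineageDataset:
    """Hypothetical dataset that records its transformations (its lineage)."""

    def __init__(self, source, transforms=()):
        self.source = source
        self.transforms = tuple(transforms)
        self._cache = None

    def map(self, fn):
        return LineageDataset(self.source, self.transforms + (("map", fn),))

    def filter(self, predicate):
        return LineageDataset(self.source, self.transforms + (("filter", predicate),))

    def _compute(self):
        """Replay every recorded transformation over the source data."""
        data = list(self.source)
        for kind, fn in self.transforms:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self):
        """Return the cached result, recomputing from lineage if it is missing."""
        if self._cache is None:
            self._cache = self._compute()
        return self._cache

    def lose_cache(self):
        """Simulate losing the in-memory copy, for example after a node failure."""
        self._cache = None

squares = LineageDataset(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
first = squares.collect()
squares.lose_cache()
second = squares.collect()  # rebuilt from the recorded lineage
print(first, second)
```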

Summary
• MapReduce to generate reports and answer historical queries.
• Interactive computation for ad-hoc queries.
• Stream for real-time analytics.
Separate technologies == complex to manage and maintain.
