Better TV & Broadband with Kafka & Spark
Phill Radley, Chief Data Architect, British Telecommunications plc
In the beginning (2012)
Hadoop as a Service (HaaS) – Hadoop admin as a service, run by the Admin Group
Early adoption
“Spark will replace MapReduce as the standard execution engine for Hadoop” – Doug Cutting, Sep 2015
HaaS 2.0: denser nodes (doubled #cores, trebled RAM), same node count
Cluster migration
TV Set-Top Box & Broadband Home Hub
TV & BB Data Pipeline Overview (architecture diagram): XML payloads from the gateway Kafka producer land on a big raw Kafka topic; Spark consumers enrich the metrics and publish atomic and aggregate records via a rich producer; Flume delivers them into HDFS / Hive tables on the YARN cluster, served by Impala; CRM enrichment data crosses the firewall from the ESB into HaaS.
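The diagram only names the Spark stages, so here is a minimal sketch of the raw-topic consumer, assuming Spark 1.6's direct Kafka 0.8 API (spark-streaming-kafka). The broker list, topic name ("stb-raw"), output path and parseXmlPayload helper are illustrative placeholders, not BT's actual names.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object RawTopicEnricher {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("tv-bb-enrich"), Seconds(60))

        // Illustrative broker list and topic name.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics      = Set("stb-raw")

        // Direct (receiver-less) stream of (key, xmlPayload) pairs from the raw topic.
        val raw = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Stand-in for the Enrich / Atomic / Aggregate stages named on the slide:
        // parse each XML payload into metric rows and persist them to HDFS per batch.
        raw.map(_._2)
           .flatMap(parseXmlPayload)
           .foreachRDD(rdd => rdd.saveAsTextFile(s"/data/tv_bb/atomic/${System.currentTimeMillis}"))

        ssc.start()
        ssc.awaitTermination()
      }

      // Hypothetical parser; the real enrichment also joins CRM reference data.
      def parseXmlPayload(xml: String): Seq[String] = Seq(xml)
    }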
Data Ingest – Kafka raw topic
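For the ingest side, a minimal sketch of the gateway producer publishing one set-top-box payload onto the raw topic, assuming the standard Java Kafka producer client; broker addresses, topic name and the example payload are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object GatewayProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092,broker2:9092")   // illustrative brokers
        props.put("key.serializer",   "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)

        // Publish the raw XML payload as-is, keyed by device id so one device's
        // records stay on a single partition and arrive in order.
        val deviceId   = "stb-000123"              // illustrative key
        val xmlPayload = "<metrics>...</metrics>"  // illustrative payload
        producer.send(new ProducerRecord("stb-raw", deviceId, xmlPayload))

        producer.close()
      }
    }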
Data Serving – Impala Concurrency
Schema Design … on read … DevOps approach
- Flat (de-normalised) tables, one table per query
- Queried with SELECT * FROM … WHERE …
- Table dimensions (rows & columns)
- Table file formats optimised for the table's query pattern, up to 10x difference (see the sketch after this list):
  1. Avro for tables serving row-oriented queries
  2. Parquet (default) for time series
  3. Parquet with Snappy compression for deep-time queries
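A minimal sketch of that three-format rule in Spark 1.6, writing one enriched dataset out three ways; the paths, table names and the com.databricks:spark-avro package are assumptions, not the production job.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object TableWriter {
      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(new SparkConf().setAppName("tv-bb-tables"))
        val sql = new SQLContext(sc)

        val metrics = sql.read.parquet("/data/tv_bb/atomic")   // enriched atomic records

        // 1. Avro for tables answering row-oriented (whole-record) queries.
        metrics.write.format("com.databricks.spark.avro").save("/data/tv_bb/tables/avro_rows")

        // 2. Parquet with the default codec for the standard time-series tables.
        metrics.write.parquet("/data/tv_bb/tables/timeseries")

        // 3. Parquet + Snappy for deep-time queries scanning many days of history.
        sql.setConf("spark.sql.parquet.compression.codec", "snappy")
        metrics.write.parquet("/data/tv_bb/tables/deep_history")
      }
    }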
Impala Tuning… – there are lots of options, and the defaults will not be good enough (it's not as mature as an Oracle DB ;-)
- Isolate operational tenant loads with their own dedicated Impala resource pool
  - "Dedicated SQL Queue" added to the platform service portfolio
  - Chargeable platform feature (as it's dedicated resource)
- Tune the Impala daemons: query executor & scanner threads for maximum concurrency and the shortest queue
- HDFS caching (sketched below with the resource pool): currently in test, expecting a 2-5x speed-up; more importantly it eliminates unnecessary physical I/O (these are hot tables, keep them in memory)
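A minimal sketch of exercising those two levers from a client session, assuming the common approach of the Hive JDBC driver pointed at an Impala daemon (port 21050); host, pool and table names are illustrative, and the HDFS cache pool must already exist (created with hdfs cacheadmin -addPool).

    import java.sql.DriverManager

    object ImpalaTuning {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://impalad-host:21050/default;auth=noSasl")
        val stmt = conn.createStatement()

        // Route this session's queries to the tenant's dedicated resource pool
        // (the "Dedicated SQL Queue" on the slide).
        stmt.execute("SET REQUEST_POOL=tv_bb_operational")

        // Pin a hot, denormalised table into the HDFS cache so scans avoid physical I/O.
        stmt.execute("ALTER TABLE tv_bb.hub_metrics_daily SET CACHED IN 'tv_bb_cache_pool'")

        // A typical operational query against the flat, cached table.
        val rs = stmt.executeQuery(
          "SELECT * FROM tv_bb.hub_metrics_daily WHERE metric_date = '2016-06-01' LIMIT 10")
        while (rs.next()) println(rs.getString(1))

        conn.close()
      }
    }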
Conclusions after months in production…
- Spark 1.6 is very stable
- Impala requires a lot of tuning & table design to get working well
- High demand to use the data for other customer-experience work
- The solution runs on a multi-tenant cluster alongside hundreds of batch loads and dozens of ad-hoc self-service analytics and data-science users, i.e. the isolation using cgroups seems to work (mostly)
Next Steps
- Another similar data pipeline from the internal network
- Multi-tenant Kafka (Topic as a Service) to serve more clients
- Second data-centre site with dual ingest for high availability
Thank you