Audi's journey to an enterprise big data platform
Strata Data 2018 - London
Matthias Graunitz (AUDI AG, Germany)
Carsten Herbe (Audi Business Innovation GmbH, Germany)
WHO ARE WE?
Audi Group: Audi, Lamborghini, Ducati and Italdesign
Vorsprung is our promise. Strategy 2025
Audi Business Innovation GmbH
...is the development, establishment, sales and operation of innovative concepts, products and services, as well as the holding of shares in the field of future mobility.
» Audi mobility innovations
» Audi balanced technologies
» Audi customer IT solutions
» Audi on demand
» Audi e-gas
About us
Matthias Graunitz (AUDI AG) and Carsten Herbe (Audi Business Innovation GmbH)
» Center of Competence Big Data & BI
» Data Platform & Solution Architecture
» Big Data Architect
» Hadoop since 2013
» 10+ years Data Warehousing & BI (both speakers)
2 YEARS AGO… STARTING BIG DATA AT AUDI
Analytical Capabilities by 2015
AAP – AUDI ANALYTIC PLATFORM
[Capability map, shown in three builds; contents as far as they can be recovered from the flattened diagram.]
» Users: Programs, Projects, Data Scientists
» Data Domains: Production, Purchase, Quality, Finance, Sales, Car Data
» Capability groups: Secure; Embed Analytics; Deliver Information; Manage Data Information; Analyze Data; Store, Distribute and Process Data; Provision Data; Infrastructure & Services; Design & Maintain Solutions; Deliver Service
» Capabilities in the first two builds: Analytical APIs, Dashboarding, Planning & Simulation, Visual Analytics, Complex Event Processing, BI Report & OLAP, Statistical Methods, Analytical Script, Authentication, Encryption, Auditing, Master Data Mgmt, Data Lineage, Lifecycle Mgmt, Data Warehouse, Analytical Databases, ETL Framework, Batch Processing, Data Access / APIs, Hardware, Network, OS, On-Prem Platform, Development Process & Methods, Application Deployment, Monitoring
» Added in the final build: Machine Learning, File Systems (HDFS), Stream Processing, Cloud Platform
Our first Hadoop Cluster 2015

DEV Hadoop      per node    Sum
# data nodes    1           4
RAM             128 GB      0.5 TB
Cores           24          96
HDD*            40 TB       160 TB

* Raw capacity without replication and FS overhead!
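The footnote matters for planning, because the totals above are raw disk. As a rough sketch only: assuming the HDFS default replication factor of 3 and a hypothetical 25% reserve for OS, temporary data and file system overhead (the reserve is an illustrative assumption, not a figure from the talk), usable capacity could be estimated like this:

```python
def cluster_totals(nodes, ram_gb_per_node, cores_per_node, hdd_tb_per_node):
    """Multiply per-node resources by the node count (the 'Sum' column)."""
    return {
        "ram_gb": nodes * ram_gb_per_node,
        "cores": nodes * cores_per_node,
        "hdd_tb_raw": nodes * hdd_tb_per_node,
    }


def usable_hdfs_tb(raw_tb, replication=3, reserve_fraction=0.25):
    """Estimate usable HDFS capacity from raw disk capacity.

    replication=3 is the HDFS default. reserve_fraction is an assumed
    allowance for OS, temp data and FS overhead, not a number from the slides.
    """
    return raw_tb * (1 - reserve_fraction) / replication


# Per-node figures of the 2015 DEV cluster from the table above.
dev = cluster_totals(nodes=4, ram_gb_per_node=128, cores_per_node=24, hdd_tb_per_node=40)
dev_usable_tb = usable_hdfs_tb(dev["hdd_tb_raw"])

print(dev)
print(f"estimated usable HDFS capacity: {dev_usable_tb:.1f} TB")
```

Under these assumptions the 160 TB of raw disk shrinks to roughly a quarter of that for actual data, which is why the slide flags the numbers as raw capacity.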
Our first attempt to walk with Big Data Technologies
» SCREWDRIVER ANALYSIS
» COMPANY CAR ANALYSIS
ENTERPRISE INTEGRATION VS SPEED OF DELIVERY
Securing the cluster as a multi-tenant environment
Step by step by step towards our target architecture …
» User Management: LDAP
» Access Control & Audit: Ranger
» Dedicated network: BI Zone
» Access Control: File Attributes
» Protection from inside: Kerberos
» Protection from outside: Knox
» Authentication: LDAP for Hive
» Basic Security: iptables + ssh tunneling
» Access Control: ACLs
» User Management: Local OS users
Password Hell
[Diagram: which credentials a user needs at each step, across DATA NODE 1 - X, NAME NODE 1 - 2 and EDGE NODE 1 - 2.]
» OS Level: [ Local User ]
» Audi Active Directory: [ AD User ]
» Hadoop KDC: [ Kerberos Principal ]
» User types: Named User, Technical Hive User, Technical Project User, Hadoop User
» Access paths: SSH, Knox, WebHDFS, HDFS/YARN, kinit, EdgeNode, SparkUI, Hive
» Legend: password required, no password required, next step
DATA INGESTION
Data ingestion: technical requirements from projects, security and ops

INGESTION
» Streaming data
» Batch data
» Easy writing to HDFS/DWH

DECOUPLING
» Data sources should not be coupled directly to analytical backend jobs
» This allows adding new analytical jobs without changing the source

HA & BUFFERING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case the backend or a backend job is not available

SECURITY
» Source systems must not connect directly to the data zone (Hadoop, DWH), as required by IT Sec
» Authentication + data-in-motion encryption (multi-tenancy)
» Protocol must be auditable

SCALABILITY
» Some data sources run in the cloud
» Amount of data will increase over time for most projects
» Number of projects will increase
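The decoupling and buffering requirements can be illustrated with a toy model. This is a minimal in-memory sketch, not the platform's implementation: `BufferedTopic` and the group names are hypothetical, and a real message broker would persist the log to disk and replicate it. It shows only the core idea that the source appends to a log while each backend job tracks its own read position.

```python
from collections import defaultdict


class BufferedTopic:
    """Toy append-only log: the source appends, each consumer group keeps its own offset."""

    def __init__(self):
        self._log = []                    # buffered messages (in memory only in this sketch)
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, message):
        """Called by the source system; it knows nothing about the consumers."""
        self._log.append(message)

    def poll(self, group, max_messages=100):
        """Return unread messages for one group and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


topic = BufferedTopic()
topic.publish({"sensor": "a", "value": 1})
topic.publish({"sensor": "a", "value": 2})

# The HDFS writer is running and consumes what has arrived so far.
hdfs_batch = topic.poll("hdfs-writer")

# The source keeps publishing while a new analytical job is added later.
# In this toy model a new group starts at offset 0, so it sees all buffered data.
topic.publish({"sensor": "a", "value": 3})
new_job_batch = topic.poll("new-analytics-job")

# The HDFS writer only receives what it has not read yet.
hdfs_catch_up = topic.poll("hdfs-writer")
```

Adding `new-analytics-job` required no change on the publishing side, and a consumer that was slow or temporarily down simply catches up from its own offset.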
Solution: Kerberized Confluent Kafka Platform
[Diagram: data flows from the data source networks #1 … #n through firewalls (FW) into the AAP Messaging Zone and on to the BI Data Zone.]
» Components shown: Kafka Client (in each data source network), DataProxy, Kafka Broker, Zookeeper, Schema Registry, HDFS Connector, Spark Streaming, HDFS, KDC, Hadoop KDC
» Connection labels: authentication (Kerberos or none), protocol (BIN or HTTP) / direction (push or pull)
» Legend: encrypted (SSL), not encrypted, firewall, pain point
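For orientation, a Kerberos-authenticated Kafka client is typically configured through a handful of standard client properties. The sketch below renders such a properties file. The property names are standard Kafka Java client settings, while the broker address, principal, keytab and truststore paths are hypothetical placeholders, and the service name `kafka` is a common convention rather than something stated on the slide.

```python
def kafka_client_properties(bootstrap_servers, principal, keytab_path,
                            truststore_path=None, encrypted=True):
    """Build Kafka client settings for Kerberos (SASL/GSSAPI) authentication.

    encrypted=True  -> SASL_SSL       (Kerberos over an SSL-encrypted connection)
    encrypted=False -> SASL_PLAINTEXT (Kerberos without transport encryption)
    """
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true "
        f'keyTab="{keytab_path}" principal="{principal}";'
    )
    props = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL" if encrypted else "SASL_PLAINTEXT",
        "sasl.mechanism": "GSSAPI",
        "sasl.kerberos.service.name": "kafka",  # assumed broker service name
        "sasl.jaas.config": jaas,
    }
    if encrypted and truststore_path:
        props["ssl.truststore.location"] = truststore_path
    return props


def render_properties(props):
    """Render the settings in Java .properties format, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical values for a source-side client pushing into the messaging zone.
source_props = kafka_client_properties(
    bootstrap_servers="broker1.example.com:9093",
    principal="src1@EXAMPLE.COM",
    keytab_path="/etc/security/keytabs/src1.keytab",
    truststore_path="/etc/security/truststore.jks",
)
print(render_properties(source_props))
```

Switching `encrypted` to `False` corresponds to the "not encrypted" links in the diagram: the client still authenticates with Kerberos, but the payload travels in clear text.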
Kafka Distributed Connector: unsecured REST API
[Diagram: on the Edge Node, a single Connector Java process runs connectors for several users and is controlled over HTTP.]
» User Bob: Bob's HDFS sink config, Bob's Kafka keytab, Bob's topic, HDFS Sink
» User Eve: Eve's source and sink configs, File Sink, and a connector using Bob's config to reach Bob's data
» Legend: evil connection, good connection
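One reading of this slide: the Connect worker holds the keytabs, and its REST API accepts connector definitions over plain HTTP without identifying the caller. The sketch below only builds such a request and does not send it. The host name, topic and principal are hypothetical, and the connector options follow the Confluent HDFS sink connector's documented settings as I understand them, so treat the exact keys as an assumption.

```python
import json
import urllib.request

# Hypothetical address of a distributed Kafka Connect worker (8083 is the usual REST port).
CONNECT_URL = "http://edge-node.example.com:8083/connectors"


def build_connector_request(name, topic, principal, keytab_path):
    """Build (but do not send) a POST /connectors request for an HDFS sink."""
    payload = {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "hdfs.url": "hdfs://namenode.example.com:8020",  # hypothetical
            "flush.size": "1000",
            "hdfs.authentication.kerberos": "true",
            # The worker process reads this keytab; the caller never proves ownership.
            "connect.hdfs.principal": principal,
            "connect.hdfs.keytab": keytab_path,
        },
    }
    return urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Nothing in the request says who is asking: Eve can reference Bob's principal and keytab.
request = build_connector_request(
    name="eve-reads-bobs-topic",
    topic="bob-topic",
    principal="bob@EXAMPLE.COM",
    keytab_path="/etc/security/keytabs/bob.keytab",
)
print(request.get_method(), request.full_url)
print("carries credentials:", request.has_header("Authorization"))
```

The request contains no credentials at all, so with an unsecured API the worker cannot tell Bob's request from Eve's. Restricting network access to the REST port is one way to limit exposure.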
TODAY: CURRENT STATE
Architecture & Network Zones – Data Ingestion
[Diagram: source systems and a cloud app deliver data via the Messaging Zone into the BI Data Zone.]
» Sources: System A and further systems, Cloud App
» Messaging Zone: Data Proxy, Connector
» BI Data Zone: Spark Streaming, HDFS, Data Warehouse
» S3 Backup
» Legend: encrypted (SSL), not encrypted
Architecture & Network Zones – User & Developer Access
[Diagram: network zones and how users and developers reach the platform.]
» Zones: Audi Office LAN, BI Application Zone, BI Data Zone, Deployment Zone
» Components shown: Audi Laptop, Remote Desktop, Data Mining, Dashboarding, AAP, Data Warehouse, PIPE
» Legend: encrypted (SSL), not encrypted
Hadoop Cluster Sizing Production 2017

Hadoop PROD       per node    Sum
# data nodes      1           12
RAM               512 GB      6 TB
Cores             24          288
HDD*              96 TB       9.216 TB

Kafka PROD        per node    Sum
# broker nodes    1           4
RAM               32 GB       128 GB
Cores             6           24
HDD*              4 TB        16 TB

* Raw capacity without replication and FS overhead!
Organisational Tasks: Current state