Innovation at AWS
Eric Ferreira (ericfe@amazon.com), Principal Database Engineer, Amazon Redshift
The Amazon Flywheel. Focus on things that stay the same: Price, Selection, Delivery.
Applying this at AWS
Amazon Redshift: focus on things that stay the same: Performance, Value, Simplicity.
Adopt a retail mindset
Customers have choice. Delight them and they'll stay. Earn their business one hour at a time.
Start with the Customer. Work Backwards.
What Do Customers Want? • What problems are customers facing? • How will my service alleviate this pain? • Why will this idea delight customers? • Why can I do this better than anyone else?
What we heard from customers about DW • Complicated to install, maintain, operate • Require large upfront payments • Too expensive • Always running out of capacity
Press Release: describe the product in terms of customer value. Why will customers care? Is it newsworthy? How is this differentiated?
FAQ: answer customer questions. How does this help me? How do I get started? How will this work with my ETL/BI tools? When should I use this vs. Hadoop?
2 Pizza Teams
• An individual team should be no larger than can be fed by two pizzas.
• Beyond this size, you define contracts and interfaces with other teams.
• Attention is a scarce resource. Time is a scarce resource.
• Apply attention and time to changing reality, not communicating status.
Build the Product: Assemble a Team → Internal Beta → Private Beta → Build → Launch → Iterate
Iterate
Add features that matter → Raise value → Increase adoption → Get feedback → (repeat)
Redshift pushes a new DB version every two weeks; 120+ features since launch:
• Service Launch (2/14)
• PDX (4/2)
• Temp Credentials (4/11)
• DUB (4/25)
• Distributed Tables, Audit SOC1/2/3 (5/8)
• NRT (6/5)
• Unload logs (7/5)
• Sharing snapshots (7/18)
• CRC32 Builtin, CSV, Restore Progress (8/9)
• PCI (8/22)
• JSON, Regex, Cursors (9/10)
• Split_part, Audit tables (10/3)
• SIN/SYD (10/8)
• HSM Support (11/11)
• Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13)
• Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
• EIP Support for VPC Clusters (12/28)
• New query monitoring system tables and diststyle all (1/13)
• Redshift on DW2 (SSD) Nodes (1/23)
• Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strtol() and query termination (2/13)
• Resize progress indicator & Cluster Version (3/21)
• Regex_Substr, COPY from JSON (3/25)
• 50 slots, COPY from EMR, ECDHE ciphers (4/22)
• 3 new regex features, Unload to single file, FedRAMP (5/6)
• JDBC Fetch Size (6/27)
• Kinesis EMR/HDFS/SSH copy, SHA1 Builtin (7/15)
• 4 byte UTF-8 (7/18)
• Statement Timeout (7/22)
• Timezone, Epoch, Autoformat (7/25)
• WLM Timeout/Wildcards (8/1)
• Resource Level IAM (8/9)
• UTF-8 Substitution (8/29)
• Unload Encrypted Files
The AWS analytics portfolio, from collection to analysis: Collect (Kinesis, AWS IoT, AWS Import/Export Snowball, Direct Connect, AWS Database Migration Service), Store (S3, Glacier, DynamoDB), Analyze (Athena, EMR, Redshift, Machine Learning, Elasticsearch, QuickSight, EC2, Lambda, AWS Glue).
Collection & Storage: Amazon S3
• Store anything; object storage
• Designed for 99.999999999% durability
• Scalable & cost effective; $0.023/GB-month
• Integrated with Amazon Glacier
• Support for multiple encryption methods; integrated with AWS KMS, with support for external HSMs
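As a minimal boto3 sketch of the encryption integration mentioned above: uploading an object with server-side encryption under a KMS key. The bucket name, object key, and key alias are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")

# Upload an object, encrypting it at rest with a customer-managed KMS key.
# "my-analytics-bucket" and "alias/my-data-key" are placeholder names.
s3.put_object(
    Bucket="my-analytics-bucket",
    Key="raw/events/2017-06-01.json",
    Body=open("events.json", "rb"),
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-data-key",
)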
Data Management & ETL: AWS Glue
• Hive Metastore-compatible data catalog with integrated crawlers for schema, data type, and partition inference
• Generates Python code to move data from source to destination
• Edit jobs using your favorite IDE and share snippets via Git
• Runs jobs in Spark containers that auto-scale based on SLA
• Serverless with no infrastructure to manage; pay only for the resources you consume
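A minimal sketch of the kind of PySpark job Glue generates, assuming a catalog database "sales_db" with a crawled table "orders"; the database, table, and S3 path are placeholders.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog and write curated Parquet back to S3.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")
glue_context.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://my-analytics-bucket/curated/orders/"},
    format="parquet",
)
job.commit()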
Amazon RDS for Aurora
• MySQL compatible with up to 5x better performance on the same hardware: 100,000 writes/sec & 500,000 reads/sec
• Scalable with up to 64 TB in a single database, up to 15 read replicas
• Highly available, durable, and fault-tolerant custom SSD storage layer: 6-way replicated across 3 Availability Zones
• Transparent encryption for data at rest using AWS KMS
• Stored procedures in Aurora can invoke AWS Lambda functions
• MySQL & PostgreSQL compatible engines
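Because Aurora is MySQL compatible, standard drivers work unchanged; a minimal PyMySQL sketch, where the cluster endpoint, credentials, database, and table are all placeholders.

import pymysql

# Connect to the Aurora cluster endpoint exactly as you would to MySQL.
conn = pymysql.connect(
    host="mycluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    user="admin",
    password="...",  # prefer IAM auth or a secrets store in practice
    database="appdb",
)
with conn.cursor() as cur:
    cur.execute("SELECT id, total FROM orders WHERE dt = %s", ("2017-06-01",))
    for row in cur.fetchall():
        print(row)
conn.close()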
Structured Data Processing: Amazon Redshift
• Petabyte-scale relational, MPP data warehousing clusters with the ability to join across exabytes of data in S3 using Redshift Spectrum, a serverless scale-out query layer that charges $5/TB scanned
• Fully managed with SSD and HDD platforms
• Built-in end-to-end security, including customer-managed keys
• Fault tolerant: automatically recovers from disk and node failures
• Data automatically backed up to Amazon S3, with cross-region backup capability for global disaster recovery
• $1,000/TB/year; start at $0.25/hour. Provision in minutes; scale from 160 GB to 2 PB of compressed data with just a few clicks
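A sketch of querying S3-resident data through Redshift Spectrum from Python with psycopg2. The cluster endpoint, credentials, and the external schema/table (assumed already defined over the Glue/Athena catalog) are placeholders.

import psycopg2

conn = psycopg2.connect(
    host="mycluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()
# spectrum_schema.clickstream is a hypothetical external table over S3;
# the join-to-exabytes capability comes from scanning S3 at query time.
cur.execute("""
    SELECT loc, COUNT(*)
    FROM spectrum_schema.clickstream
    GROUP BY loc
    ORDER BY 2 DESC
    LIMIT 10;
""")
print(cur.fetchall())
conn.close()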
Semi-structured / Unstructured Data Processing: Amazon EMR
• Hadoop, Hive, Presto, Spark, Tez, Impala, etc.
• Release 5.3: Hadoop 2.7.3, Hive 2.1, Spark 2.1, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink
  – New applications added within 30 days of their open source release
• Fully managed, autoscaling clusters with support for on-demand and spot pricing
• Support for HDFS and S3 filesystems, enabling separated compute and storage; multiple clusters can run against the same data in S3
• HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client-side encryption with customer managed keys and AWS KMS
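A sketch of launching a small release-5.3 cluster with Spark and Hive via boto3; the cluster name, instance sizing, and log bucket are placeholder choices.

import boto3

emr = boto3.client("emr")
response = emr.run_job_flow(
    Name="analytics-cluster",
    ReleaseLabel="emr-5.3.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    LogUri="s3://my-analytics-bucket/emr-logs/",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])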
Serverless Query Processing: Amazon Athena
• Serverless query service for querying data in S3 using standard SQL, with no infrastructure to manage
• No data loading required; query directly from Amazon S3
• Use standard ANSI SQL queries with support for joins, JSON, and window functions
• Support for multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, Parquet
• Pay per query, only when you're running queries, based on data scanned. If you compress your data, you pay less and your queries run faster
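A minimal boto3 sketch of the query-in-place flow: submit SQL, poll until it finishes, then read results. The database, table, and S3 output location are placeholders.

import time
import boto3

athena = boto3.client("athena")
qid = athena.start_query_execution(
    QueryString="SELECT loc, COUNT(*) FROM audience GROUP BY loc",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-analytics-bucket/athena-results/"},
)["QueryExecutionId"]

# Athena runs asynchronously; poll the execution state until terminal.
while True:
    state = athena.get_query_execution(
        QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])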
Serverless Event Processing: AWS Lambda
• Serverless compute service that runs your code in response to events
• Extend AWS services with user-defined custom logic
• Write custom code in Node.js, Python, and Java
• Pay only for the requests served and compute time required; billing in increments of 100 milliseconds
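A sketch of a Python handler reacting to S3 "object created" events; the event wiring (which bucket and prefix trigger the function) is configured outside this code.

import json

def lambda_handler(event, context):
    # Each S3 notification can batch multiple records.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print("New object: s3://{}/{}".format(bucket, key))
    return {"statusCode": 200, "body": json.dumps("ok")}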
Stream Processing: Amazon Kinesis
• Real-time stream processing
• High throughput; elastic
• Highly available; data replicated across multiple Availability Zones with configurable retention
• S3, Redshift, DynamoDB integrations
• Kinesis Streams for custom streaming applications; Kinesis Firehose for easy integration with Amazon S3 and Redshift; Kinesis Analytics for streaming SQL
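A minimal producer sketch for Kinesis Streams with boto3; the stream name is a placeholder. Records sharing a partition key land on the same shard, preserving their relative order.

import json
import boto3

kinesis = boto3.client("kinesis")
event = {"aid": 1, "loc": "SFO", "dt": "2016-09-01"}
# "clickstream" is a hypothetical, pre-created stream.
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event),
    PartitionKey=str(event["aid"]),
)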
Search and Operational Analytics: Amazon Elasticsearch Service
• Distributed search and analytics engine
• Managed service using Elasticsearch and Kibana
• Fully managed; zero admin
• Highly available and reliable
• Tightly integrated with other AWS services
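A sketch of indexing a document into a domain using the elasticsearch and requests-aws4auth packages to sign requests with SigV4; the domain endpoint and index name are placeholders.

import boto3
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

session = boto3.Session()
creds = session.get_credentials()
awsauth = AWS4Auth(creds.access_key, creds.secret_key,
                   session.region_name, "es", session_token=creds.token)

# "search-mydomain-abc123..." is a hypothetical domain endpoint.
es = Elasticsearch(
    hosts=[{"host": "search-mydomain-abc123.us-east-1.es.amazonaws.com",
            "port": 443}],
    http_auth=awsauth, use_ssl=True, verify_certs=True,
    connection_class=RequestsHttpConnection,
)
es.index(index="audience", doc_type="event",
         body={"aid": 1, "loc": "SFO", "dt": "2016-09-01"})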
Predictive Applications: Amazon ML
• Easy-to-use, managed service built for developers; deploy models in seconds
• Robust, powerful technology based on Amazon's internal systems
• Create models using your data already stored in the AWS cloud; deploy models in batch and real-time modes
• Spark on Amazon EMR also available for custom machine learning applications
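A sketch of the real-time mode with boto3: look up a model's endpoint and request a single prediction. The model ID and record fields are placeholders, and a real-time endpoint is assumed to have been created already.

import boto3

ml = boto3.client("machinelearning")
model = ml.get_ml_model(MLModelId="ml-abc123DEF")  # hypothetical model ID
prediction = ml.predict(
    MLModelId="ml-abc123DEF",
    Record={"loc": "SFO", "dt": "2017-04-01"},
    PredictEndpoint=model["EndpointInfo"]["EndpointUrl"],
)
print(prediction["Prediction"])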
Business Intelligence: Amazon QuickSight
• Fast and cloud-powered
• Easy to use; no infrastructure to manage
• Scales to hundreds of thousands of users
• Quick calculations with SPICE
• 1/10th the cost of legacy BI software
Amazon Redshift
Amazon Redshift: OLAP, MPP, columnar, PostgreSQL-based. Built on AWS services: Amazon SWF, Amazon VPC, Amazon EC2, AWS IAM, Amazon S3, Amazon CloudWatch, Amazon Route 53, AWS KMS.
Redshift Cluster Architecture
• Massively parallel, shared nothing
• Leader node (128GB RAM, 16 cores, 16TB disk)
  – SQL endpoint (JDBC/ODBC for SQL clients/BI tools)
  – Stores metadata
  – Coordinates parallel SQL processing
• Compute nodes (128GB RAM, 16 cores, 16TB disk each), interconnected by 10 GigE (HPC)
  – Local, columnar storage
  – Execute queries in parallel
  – Load, backup, restore
• Ingestion and backup/restore via S3 / EMR / DynamoDB / SSH
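A sketch of the ingestion path: issue a COPY through the leader node and the compute nodes load slices from S3 in parallel. The table, bucket path, and IAM role ARN are placeholders.

import psycopg2

conn = psycopg2.connect(
    host="mycluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...")
conn.autocommit = True
cur = conn.cursor()
# COPY is coordinated by the leader node; each compute node pulls its
# share of the files directly from S3.
cur.execute("""
    COPY audience
    FROM 's3://my-analytics-bucket/raw/audience/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV;
""")
conn.close()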
Brute force only takes you so far…
Designed for I/O Reduction
• Columnar storage
• Data compression
• Zone maps

CREATE TABLE audience (
   aid INT        --audience_id
  ,loc CHAR(3)    --location
  ,dt  DATE       --date
);

aid | loc | dt
  1 | SFO | 2016-09-01
  2 | JFK | 2016-09-14
  3 | SFO | 2017-04-01
  4 | JFK | 2017-05-14

• Accessing dt with row storage:
  – Need to read everything
  – Unnecessary I/O
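A sketch of the same table declared with explicit column compression encodings and a sort key, so zone maps can skip blocks on dt-range predicates; the encoding choices, distribution key, and connection details are illustrative assumptions, not the slide's prescription.

import psycopg2

ddl = """
CREATE TABLE audience (
     aid INT      ENCODE delta     -- audience_id
    ,loc CHAR(3)  ENCODE bytedict  -- location
    ,dt  DATE     ENCODE delta32k  -- date
)
DISTKEY (aid)   -- spread rows across compute nodes by audience_id
SORTKEY (dt);   -- zone maps prune blocks for date-range filters
"""
conn = psycopg2.connect(
    host="mycluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(ddl)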