Modernizing Data Estates with Presto
Ken Seier, Chief Architect | Data & AI
ken.seier@insight.com
Insight fast facts
• Global reach: 19 countries, serving clients around the globe
• Deep portfolio & relationships: 3,500+ hardware, software and cloud partners
• Engaged workforce: 11,000+ Insight teammates worldwide
• Founded in 1988: Fortune 500 company with long legacy and knowledge
• Financial stability: $9B+ in revenue in 2018
• Broad expertise: 7,500+ sales and service delivery professionals
Presto today
• Targeted query federation for line-of-business applications or reporting
• Ad hoc analytics enablement
• Tech and Retail verticals, with some FinServ
https://db-engines.com/en/ranking_trend/system/Presto
Federated queries and data aggregation
Presto doing what we know it's good at, and a little more.
Challenge
• Global technical services company
• 500,000+ customers
• 300,000+ events/second
• End-user investigation tool with cumbersome Java query tier
Federated query solution
• Custom insights UX
• Simplified Java/SQL services
• Starburst Presto query fabric
• Detail events in Amazon S3
• Event aggregates in Elasticsearch
Pre-aggregation ETL solution Amazon Elasticsearch
Outcomes
• Rationalized Java query tier to single Presto SQL source
• Implemented pre-aggregation ETL in same AWS/Java/Presto toolset
• Elasticsearch queries through Presto over 1 million documents return in <2 seconds
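As a rough illustration of what a "single Presto SQL source" can look like, the sketch below builds one federated statement that joins pre-aggregated documents in Elasticsearch to detail events in Amazon S3. Presto addresses tables as catalog.schema.table; every catalog, schema, table, and column name here is a hypothetical placeholder, not taken from the engagement.

```python
# Hypothetical names: adjust catalogs, schemas, tables, and columns to your deployment.
DETAIL_TABLE = "hive.events.detail"  # assumed: detail events in Amazon S3
AGGREGATE_TABLE = "elasticsearch.default.event_aggregates"  # assumed: pre-aggregated documents


def build_investigation_query(customer_id: int, limit: int = 100) -> str:
    """Return one Presto SQL statement joining aggregates to detail events."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # int() guards the two interpolated values against non-numeric input.
    return (
        "SELECT a.event_type, a.event_count, d.event_time, d.payload\n"
        f"FROM {AGGREGATE_TABLE} AS a\n"
        f"JOIN {DETAIL_TABLE} AS d\n"
        "  ON a.customer_id = d.customer_id AND a.event_type = d.event_type\n"
        f"WHERE a.customer_id = {int(customer_id)}\n"
        "ORDER BY d.event_time DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_investigation_query(42, limit=10))
```

The point of the sketch is that the service tier issues one SQL statement and lets Presto decide how to reach each store, rather than hand-coding a client per backend.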
Big Data 2.0
Presto is a lighter replacement for aging big SQL tools.
Challenge
• Global software-as-a-service company
• 15,000,000+ customers
• Ad hoc queries over 100 terabytes of cleansed data
• Aging on-prem big-data-SQL implementation challenged to scale
Hive to Presto
• Before: HiveQL queries over the data lake
• After: ANSI SQL queries through Starburst Presto over the same data lake
Outcomes
• Data-in-place replacement for Hive
• Migrate from HiveQL to ANSI SQL
• Many-X concurrency improvement over Hive
• 10X performance over Spark benchmarks
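To give a flavor of the HiveQL to ANSI SQL step, the sketch below shows two well-known syntax differences: HiveQL quotes identifiers with backticks where Presto uses double quotes, and HiveQL flattens arrays with LATERAL VIEW explode where Presto uses CROSS JOIN UNNEST. The table and column names are invented for illustration, and the regex helper is a toy, not a migration tool: it handles identifier quoting only and would also rewrite backticks that appear inside string literals.

```python
import re

# Illustrative pair (hypothetical table and columns): array flattening in each dialect.
HIVEQL_EXAMPLE = "SELECT id, tag FROM posts LATERAL VIEW explode(tags) t AS tag"
PRESTO_EXAMPLE = "SELECT id, tag FROM posts CROSS JOIN UNNEST(tags) AS t (tag)"

_BACKTICK_IDENTIFIER = re.compile(r"`([^`]+)`")


def quote_identifiers_ansi(hiveql: str) -> str:
    """Rewrite `identifier` as "identifier", doubling any embedded double quotes."""
    def _requote(match: "re.Match[str]") -> str:
        return '"' + match.group(1).replace('"', '""') + '"'

    return _BACKTICK_IDENTIFIER.sub(_requote, hiveql)


if __name__ == "__main__":
    print(quote_identifiers_ansi("SELECT `user id` FROM `events`"))
```

In practice most simple SELECT statements carry over unchanged; differences like these are the ones that tend to need a rewrite pass and a regression test per query.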
Unified Query Plane
Using Presto to simplify and de-risk legacy data management
Challenge
• Global manufacturer/retailer
• $20,000,000,000+ globally
• Rich operational ecosystem
• Aggressively working toward comprehensive stack rationalization
Legacy data estate
Presto data fabric
Outcomes
• Many, many Presto sources and consumers
• Supporting data science and line-of-business on isolated clusters
• Presto abstraction over legacy systems enables table-by-table migrations
• Row and column level RBAC enabled in Ranger
• End-to-end automation for registering and managing data definitions: metadata, stats and security
• Query-grain costbacks enabled with log listener
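The query-grain costback idea can be sketched as a simple proportional allocation. The example assumes the log listener emits one record per completed query carrying a team label and CPU seconds; those field names, the team names, and the cluster cost are hypothetical, and a real allocation might weight memory or data scanned as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def allocate_costs(
    events: Iterable[Mapping[str, object]], cluster_cost: float
) -> Dict[str, float]:
    """Split cluster_cost across teams in proportion to CPU seconds used."""
    cpu_by_team: Dict[str, float] = defaultdict(float)
    for event in events:
        # Assumed record shape: {"query_id": ..., "team": ..., "cpu_seconds": ...}
        cpu_by_team[str(event["team"])] += float(event["cpu_seconds"])

    total_cpu = sum(cpu_by_team.values())
    if total_cpu == 0:
        # Nothing consumed CPU, so nothing is charged back.
        return {team: 0.0 for team in cpu_by_team}
    return {team: cluster_cost * cpu / total_cpu for team, cpu in cpu_by_team.items()}


if __name__ == "__main__":
    sample = [
        {"query_id": "q1", "team": "data-science", "cpu_seconds": 30.0},
        {"query_id": "q2", "team": "lob-reporting", "cpu_seconds": 10.0},
        {"query_id": "q3", "team": "data-science", "cpu_seconds": 60.0},
    ]
    print(allocate_costs(sample, cluster_cost=1000.0))
```

Because each record is a single query, the same data supports rollups at any grain: per user, per team, or per source system.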
Trends
Where Presto may be headed
Presto today
• Targeted query federation for applications or reporting
• Ad hoc analytics enablement
• Tech and Retail verticals, with some FinServ
https://db-engines.com/en/ranking_trend/system/Presto
Presto going forward
• Adoption driven by data science value
• Drafting with Kubernetes adoption
• Awareness in new industries
• Data estate rationalization
• Blue/green migration abstraction
• New data tier and estate patterns
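One way to picture the blue/green migration abstraction is a stable Presto view that consumers query while the table behind it is repointed. The sketch below only generates the DDL; it assumes the view lives in a catalog that supports views, and the catalog, schema, and table names are hypothetical.

```python
from typing import List

# Hypothetical names: the legacy ("blue") table, its migrated ("green") copy,
# and the stable view that consumers query throughout the migration.
BLUE_TABLE = "legacy_dw.sales.orders"
GREEN_TABLE = "hive.sales.orders"
STABLE_VIEW = "hive.fabric.orders"


def view_ddl(view_name: str, backing_table: str) -> str:
    """Return Presto DDL that points view_name at backing_table."""
    return f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM {backing_table}"


def cutover_plan(view_name: str, blue: str, green: str) -> List[str]:
    """Two statements: publish the view over blue, then repoint it to green."""
    return [view_ddl(view_name, blue), view_ddl(view_name, green)]


if __name__ == "__main__":
    for statement in cutover_plan(STABLE_VIEW, BLUE_TABLE, GREEN_TABLE):
        print(statement)
```

Consumers keep querying the view name throughout, so each table can move on its own schedule and be rolled back by rerunning the first statement.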
Presto going forward
Core data value cases
• Historical Reporting: line of business reporting for defined historical period using defined metrics and performance indicators (data warehouse, data mart)
• Operational Data Store: line of business reporting for making real-time course corrections in day to day operations (small, disk-bound or in-memory store)
• Analytics Discovery: line of business reporting for making real-time course corrections in day to day operations (data lake, lab environment)
Presto going forward
Conceptual data architecture
• Three fully-decoupled, horizontally scalable, single-tool tiers
• ANSI SQL data & insight
• Insight enrichment
• Data lake of choice
• Direct event and transactional data
• Data staged from source systems
• Data directly from source systems
Questions?
Ken Seier, Chief Architect | Data & AI
ken.seier@insight.com