Pentaho 8.0 and Beyond
Matt Howard, Sr. Director of Product Management, Pentaho, Hitachi Vantara
Safe Harbor Statement
The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur that may affect current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remain at the sole discretion of Pentaho.
Pentaho 8.0 and Beyond
1. Product Vision
2. Pentaho 8.0
3. Product Roadmap
Product Vision
The Power of Three
• Hitachi Data Systems: content platform, storage solutions
• Pentaho: data integration, business analytics
• Hitachi Insight Group: Lumada IoT
Pentaho Business Analytics Platform
Roles: Data Engineer, Data Analyst / Data Scientist, Business Analyst, Consumer
• Interactive Query and Analysis
• Custom and Self-Service Dashboards
• Production Reporting
Pentaho Data Integration: Data Preparation | Integrated Machine Learning
Open and Embeddable
Data sources: Operational Data, Big Data, Data Stream, Public/Private Clouds
Future Vision: A Single Consistent Experience
Data Engineering → Data Prep → Analytics
• Ingestion, Processing, Blending, Data Discovery, Analysis, Data Delivery / Analysis & Dashboards
• Across all stages: Administration, Security, Monitoring, Automation, Lifecycle Management, Data Provenance, Dynamic Data Pipeline
Pentaho 8.0
Introducing Pentaho 8.0
Challenge #1: Data volumes and velocity are growing exponentially.
Pentaho 8.0 broadens connectivity to streaming data sources:
• Connect to Kafka streams
• Stream processing with Spark
• Big data security with Knox
Challenge #2: Processing and storage resources are constrained.
Pentaho 8.0 optimizes processing resources:
• Enhanced Adaptive Execution (AEL)
• Native Avro and Parquet handling
• Worker nodes for “scale-out”
Challenge #3: Shortage of big data talent and lack of productivity.
Pentaho 8.0 boosts team productivity across the pipeline:
• Data Explorer filters
• Improved repository UX
• Extended operations mart
Streaming for Time-Sensitive Insight
Enable use cases that require real-time processing, monitoring, and aggregation:
• Real-time device monitoring
• Log-file aggregation
• Notifications
• And more…
NEW in Pentaho 8.0
• Kafka Producer step
• Kafka Consumer step
• Get Records from Stream step
• Spark streaming via AEL
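As an illustration of the log-file aggregation use case above, here is a minimal sketch that counts events per key in fixed, non-overlapping (tumbling) time windows. It assumes events arrive as (timestamp in seconds, log level) pairs. It is plain Python for illustration only: it does not use Kafka or Spark, and the function name `tumbling_window_counts` is hypothetical, not part of the Kafka Consumer step or AEL.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[int, Counter]:
    """Count keys per fixed, non-overlapping window of `window_seconds`."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, key in events:
        # Events in [0, w) go to window 0, [w, 2w) to window 1, and so on.
        window_id = int(timestamp // window_seconds)
        windows[window_id][key] += 1
    return dict(windows)


if __name__ == "__main__":
    log_events = [(0.5, "ERROR"), (1.2, "INFO"), (4.9, "ERROR"), (5.0, "WARN"), (7.3, "ERROR")]
    for window_id, counts in sorted(tumbling_window_counts(log_events, 5.0).items()):
        print(window_id, dict(counts))
    # 0 {'ERROR': 2, 'INFO': 1}
    # 1 {'WARN': 1, 'ERROR': 1}
```

A streaming engine would emit each window's counts as the window closes rather than holding every window in memory as this sketch does.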
Pentaho 7.1 – Adaptive Execution for Spark
PDI (Pentaho Kettle):
• No coding
• Build once
• Execute on any* engine
*Currently available engines
Enhanced Adaptive Execution
Simplified setup
• Eliminated the “Zookeeper” component
• Reduced the number of setup steps
Hardened deployment
• Fail-over at the edge
• Kerberos impersonation for the client
More flexible
• Support for multiple run configurations
• Customize cluster settings per job type
Architecture diagram (Hadoop cluster): PDI client; AEL-Spark daemon on edge nodes; AEL-Spark engine (Spark driver); Spark executors on Spark/Hadoop processing nodes; Hadoop/Spark-compatible storage cluster (HDFS, Azure, Amazon S3, etc.)
Worker Nodes for Scaling Out
Scale work items across multiple nodes (containers):
• Easily add and remove resources as required
• Monitor and balance changing workloads
• Deploy on-premises, in the cloud, or in hybrid environments
Diagram: distribute and scale across Worker Node (a), Worker Node (b), Worker Node (c…)
NEW in Pentaho 8.0
• Container framework
• Orchestration framework
• Node monitoring
• Enhanced HA implementation
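One common way to balance work items across nodes is to hand each item to the node with the lowest accumulated load. The sketch below does that with a min-heap and places the most expensive items first (the longest-processing-time heuristic). The node names, the per-item cost estimates, and `assign_least_loaded` are hypothetical; the slide does not say which balancing strategy the 8.0 orchestration framework uses.

```python
import heapq
from typing import Dict, List, Tuple


def assign_least_loaded(
    work_items: List[Tuple[str, int]], nodes: List[str]
) -> Dict[str, List[str]]:
    """Assign each (name, estimated_cost) item to the node with the lowest total load so far."""
    if not nodes:
        raise ValueError("at least one worker node is required")
    # Heap entries are (current_load, position, node); position breaks ties deterministically.
    heap = [(0, position, node) for position, node in enumerate(nodes)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    # Longest-processing-time heuristic: place the most expensive items first.
    for name, cost in sorted(work_items, key=lambda item: item[1], reverse=True):
        load, position, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + cost, position, node))
    return assignment


if __name__ == "__main__":
    items = [("job_a", 8), ("transform_b", 5), ("transform_c", 4), ("transform_d", 3)]
    print(assign_least_loaded(items, ["wn1", "wn2"]))
    # {'wn1': ['job_a', 'transform_d'], 'wn2': ['transform_b', 'transform_c']}
```

This is a static, one-shot assignment; an orchestrator that monitors changing workloads would also need to reassign work as nodes are added or removed and as measured load drifts from the estimates.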
Worker Nodes Architecture
• Pentaho clients, Pentaho Server, and the Pentaho repository
• Orchestration framework (scheduler, monitoring, security, etc.), powered by …
• Controller (HA): one working master and two standby masters
• Container framework: worker nodes WN 1, WN 2, … WN n acting as “executors” for work items (e.g. KJB, KTR)
Pentaho 7.0 – Data Explorer
Access visualizations during data prep for inspection and prototyping.
Data Explorer Filters
Enhanced data inspection in PDI:
• Identify data to be cleaned or removed
• Deliver data to the business more quickly
ENHANCED in Pentaho 8.0
• Numeric filters
• String filters
• Include/exclude data points
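To make numeric filters, string filters, and include/exclude concrete, here is a small sketch over rows represented as Python dicts. The function names (`numeric_filter`, `string_filter`, `apply_filter`) and the row shape are assumptions made for this example; they are not the Data Explorer's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def numeric_filter(field: str, low: float, high: float) -> Predicate:
    """Match rows whose value in `field` is a number within [low, high]."""
    def predicate(row: Row) -> bool:
        value = row.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return predicate


def string_filter(field: str, substring: str) -> Predicate:
    """Match rows whose value in `field` contains `substring`, ignoring case."""
    def predicate(row: Row) -> bool:
        return substring.lower() in str(row.get(field, "")).lower()
    return predicate


def apply_filter(rows: List[Row], predicate: Predicate, include: bool = True) -> List[Row]:
    """Keep the matching rows (include=True) or everything except them (include=False)."""
    return [row for row in rows if predicate(row) == include]


if __name__ == "__main__":
    rows = [
        {"city": "Orlando", "sales": 120},
        {"city": "Boston", "sales": -5},
        {"city": "orlando", "sales": 40},
    ]
    print(apply_filter(rows, numeric_filter("sales", 0, 100)))
    # [{'city': 'orlando', 'sales': 40}]
    print(apply_filter(rows, string_filter("city", "orlando"), include=False))
    # [{'city': 'Boston', 'sales': -5}]
```

Representing each filter as a predicate keeps include and exclude symmetric: excluding is just keeping the rows for which the same predicate is false.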
Pentaho 8.0 – Complete Data Integration Enterprise Platform
• Filters in Data Explorer for enhanced data inspection during prep
• Worker Nodes Scale-Out to drive superior agility and TCO for enterprises
• New PDI Repository Dialogs for better usability
• Ruby Theme – new platform branding
• Run Configurations for Jobs for a seamless user experience
Big Data
• Stream Data Processing to simplify near-real-time integration with Kafka
• Enhanced AEL for reliability, performance, and security
• Big Data File Formats to support crucial Hadoop use cases
• Big Data Security with HDP Knox Gateway
• VFS Improvements for named Hadoop clusters
Additional Items
• Ops Mart for Oracle, MySQL, SQL Server
• Big Data Sandbox VM updates
• Platform password security improvements
• PDI Mavenization for infrastructure alignment
• Documentation improvements on help.pentaho.com
Product Roadmap
Roadmap Initiatives
• Visual Data Experience: Data Exploration, Visual Data Prep, Embedded Analytics, Data Catalog
• Big Data Processing: Adaptive Execution, Spark Execution, Stream Processing, Machine Learning
• Enterprise Platform: Scale-out Deployment, Metadata Management, Operations Management, Cloud Deployment
Layers: Pentaho foundational investment areas; emerging trends and technology (Advanced Analytics | Real-time)
Strengthening the Bridge Between Data and Insight
Data Explorer
• Visual data inspection
• Intuitive data prep
• Advanced visualization
Catalog (across Sources 1–5)
• Governed access
• Searchable metadata
• Collaboration
Inline Data Prep – Vision
• Intuitive, Excel-like transformation design
• Integrated inline profiling
• Inline transformation
Mock-up actions: Model, Merge Fields
Field Statistics panel (example): Field Type: Integer; Records: 10,000; Cardinality: 273; Min <count>: 1; Max <count>: 23; Bin Size (%): Quintile
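The kind of statistics shown in the mock-up can be computed along these lines for an integer field. This sketch assumes that “Min <count>” and “Max <count>” mean the frequency of the rarest and the most common value, which the slide does not spell out, and it uses the standard library's `statistics.quantiles` for the quintile bin edges; `profile_integer_field` is a hypothetical name.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, List


def profile_integer_field(values: List[int]) -> Dict[str, object]:
    """Compute simple profile statistics for a column of integers."""
    if len(values) < 2:
        # statistics.quantiles needs at least two data points on older Python versions.
        raise ValueError("need at least two values to profile")
    counts = Counter(values)
    return {
        "field_type": "Integer",
        "records": len(values),
        "cardinality": len(counts),               # number of distinct values
        "min_count": min(counts.values()),        # frequency of the rarest value (assumed meaning)
        "max_count": max(counts.values()),        # frequency of the most common value (assumed meaning)
        "quintile_cuts": quantiles(values, n=5),  # 4 cut points splitting the data into 5 bins
    }


if __name__ == "__main__":
    column = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]
    for name, stat in profile_integer_field(column).items():
        print(f"{name}: {stat}")
```

Counting value frequencies once with `Counter` gives cardinality and both frequency extremes in a single pass; the quintile cut points are computed separately on the raw values.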
Pentaho Machine Learning Orchestration
Roadmap projects that serve the emerging needs of data scientists:
• Notebook Integrations
• Data Explorer
• Catalog
• Adaptive Execution
• Native Algorithms
Pentaho Roadmap
Features and dates are subject to change.
Timeline: Nov 2017 (8.0) | 1H18 (8.1) | Future
Visual Data Experience: Data Explorer Filters; Catalog I; Catalog Search; New User Console; Visual Profiling; Data Prep from DET; Data Science Viz; Layout Manager; Real-time Viz
(Big) Data Processing: Kafka Interface; Streaming II; Advanced Profiling; Thin Kettle (Composer); Spark Streaming; Enhanced JSON/XML/ORC; Rules Validator; Web Designer; Parquet and Avro; AEL – extend distros; Native ML algorithms; Data Operations Mgr.; Enhanced AEL; AEL – Flink; AEL – Next
Enterprise Platform: Scale-out Framework; Unified Monitoring; Enhanced Upgrade; Metadata Manager; Foundry Integration; Harden Metadata Bridges; Enhanced Security; Business Glossary; Vantara Integrations (listed in each period); New Content Lifecycle; Multi-tenancy
Ecosystem: AEL HDP, MapR; Google Cloud Platform; Multi-cloud Orchestration; Mainframe; Cassandra/NoSQL Update; Cloud App Connectors; Enhanced SAP and SFDC
Hitachi Vantara Portfolio
Application Framework: Application Studio, Dashboards, Visualization, Notifications, App Development
• Edge Processing
• Asset Management: asset registry, data catalog, metadata management, modeling and lineage, governance
• Data Integration: data connectors, transformation engines, profiling and quality, data blending, data preparation
• Analytics: business analytics, content analytics, artificial intelligence, batch and stream
Foundry Service Platform (Software Platform): Search, Workflow, Scheduling, Security, Clustering, Repository, Monitoring, Storage
Storage: Flash Storage, Converged Infrastructure, Automated Management, Data Protection
IoT Solutions – from Edge to Outcomes
Smart Data Center | Smart Business | Smart Industry | Smart City
Edge (fog layer) → Core → Insights → Outcomes
• Sources: sensors, things, people
• Edge / fog layer: edge filtering, edge stream queues, asset registry
• Core: ingest, process, stream queues, asset registry, telemetry
• Insights: model, predict, visualize
• Outcomes: notify
Lumada IoT Data Pipeline and IoT Analytic Processor