Roadmap: Operating Pentaho at Scale
Jens Bleuel, Senior Product Manager, Pentaho
Agenda – Worker Nodes
Hear about new and upcoming capabilities for scaling out the Pentaho platform in large enterprise operations. This covers 8.0 and roadmap topics.
• Worker Nodes: Overview and Business Benefits
• How This Differs from AEL / Hadoop MapReduce
• Typical Customer Scenarios
• Architecture & Capabilities, including Monitoring & Logging
• Improvements in Related Areas
• Demonstration
• Availability & Roadmap
Worker Nodes – Overview
• Worker Nodes scale work items across multiple nodes (containers), such as:
  – PDI jobs and transformations (in 8.0)
  – Report executions (not in 8.0)
  – […]
[Diagram: a “Distribute and Scale” component fans work items out to Worker Node (a), (b), (c), …]
• Operates easily and securely across an elastic architecture that adds machine resources as they are required for processing
• Worker Nodes can operate on premise or in the cloud
• Uses popular technologies under the hood, such as Docker (container platform), Chronos (scheduler), and Mesos/Marathon (container orchestration) – a generic example follows below
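For illustration only, here is a minimal sketch of a generic Marathon application definition that runs a Docker image and scales it to several instances. This is not Pentaho's actual internal configuration; the image name, resource sizes, and health-check path are assumptions used purely to show how the Docker/Marathon layer named above works.

```json
{
  "id": "/pentaho/worker-node",
  "instances": 3,
  "cpus": 2,
  "mem": 4096,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/pentaho-worker:8.0",
      "network": "BRIDGE",
      "portMappings": [ { "containerPort": 8080, "hostPort": 0 } ]
    }
  },
  "healthChecks": [
    { "protocol": "HTTP", "path": "/status", "gracePeriodSeconds": 300 }
  ]
}
```

In a setup like this, scaling up or down amounts to changing the `instances` count (or letting the orchestration layer do it), which is what gives a containerized worker architecture its elasticity.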
Worker Nodes – Business Benefits
Large enterprises need the ability to seamlessly and efficiently spin up resources to handle hundreds of work items at different times, with different dependencies and processing requirements. Worker Nodes addresses these needs and delivers:
• Faster time to value and reduced TCO, because customers can deploy their own scale-out processes without requiring additional services
• More efficient management of changing workloads, by spinning resources up and down as needed
• Increased business agility thanks to containerization, which enables portability of processes across on-premise and cloud environments without the need to re-engineer them
  – Even in pure on-premise deployments, Worker Nodes provides elasticity and resource optimization
How Is This Different from AEL / Hadoop MapReduce?
AEL / Hadoop MapReduce (simplified) – SCALE OUT ON DATA:
• Data is distributed across nodes
• The processing takes place at the node level
• Helps to scale out on data volume
Worker Nodes (simplified) – SCALE OUT ON PROCESSES (WORK ITEMS):
• Work items like PDI jobs and transformations are distributed across nodes; this is about processing and orchestration, in contrast to distributing data
• Helps to scale out Pentaho processes
These two architectures can also be combined: within a Worker Node, a PDI transformation can itself scale out with AEL or MapReduce.
Typical Customer Scenarios
Customer Type | Typical Number of Work Items | Scale-Out Need
Small | Up to 10 | No
Medium | 10 through 100 | Sometimes
Enterprise with one department | Around 100 | Yes
Enterprise with multiple departments | Hundreds or thousands | Yes
Typical Customer Examples – SLAs and Time Windows
• Need to meet customer SLAs
  – Data from hundreds of sources needs to be collected and aggregated
  – This is done by hundreds of PDI jobs and transformations
  – All of these jobs and transformations need to finish within a defined time window (for example, between 5am and 7am) so that the data is available and accurate for the target audience
• Worker Nodes provides the technology to run processes in parallel and scale out when needed, for example at peak times (end of month)
Typical Customer Examples – Shared Services
Example from one project:
• 800 daily batches from different departments in an enterprise
• One server with 120 GB of memory and many CPUs
• This machine hosts many VMs in parallel
Issue: when the workload is too high, one machine is not enough.
• Worker Nodes solves this by scaling out across a cluster
Typical Customer Examples – Scalable on Demand
• Need to support growing data volumes and customer requirements
• Worker Nodes provides a flexible and scalable architecture, on-premise or in the cloud, for growing demand
• This is seamless and does not require changing the underlying architecture
[Diagram: at base times, work is distributed and scaled across Worker Nodes 1–3; at peak times, across Worker Nodes 1–5]
Worker Nodes – New in 8.0
• Containerized scale-out
• Pentaho PDI “work items”
[Architecture diagram: Pentaho clients and the Pentaho Server connect to the Pentaho Repository. The Worker Nodes orchestration framework (scheduler, monitoring, security, etc.) consists of a controller with one working master and standby masters. The container framework runs Worker Nodes WN 1 … WN n, each acting as an “executor” for work items such as KJBs (jobs) and KTRs (transformations).]
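To make “work item” concrete: each executor ultimately runs a PDI job (KJB) or transformation (KTR). Below is a minimal sketch of what executing a single KTR amounts to, using the public Kettle (PDI) Java API. The class name and file path are hypothetical, and the real Worker Nodes executor receives its work items from the orchestration framework rather than from a hard-coded path.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

// Minimal sketch: run one KTR work item with the public Kettle (PDI) API.
// The file path below is an assumption for illustration only.
public class RunWorkItem {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();                                   // initialize the PDI environment once per JVM
        TransMeta meta = new TransMeta("/opt/etl/load_sales.ktr");  // load the transformation definition (hypothetical path)
        Trans trans = new Trans(meta);
        trans.execute(null);                                        // start all steps
        trans.waitUntilFinished();                                  // block until the transformation completes
        if (trans.getErrors() > 0) {
            throw new IllegalStateException("Transformation finished with errors");
        }
    }
}
```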
Worker Nodes Capabilities
• Deploy consistently in physical, virtual, and cloud environments
  – Adapts to customer needs (bare metal vs. virtualization vs. cloud), with no need to modify the product when the strategy changes
• Scale and load-balance services
  – Helps deal with peaks and limited time windows by allocating the resources that are needed
• Hybrid deployments can be used to distribute load
  – Even when on-premise resources are not sufficient, scaling out into the cloud is possible to provide more resources
Monitoring and Logging
Monitoring – Overview
Monitoring – Worker Node Example
Improvements in Related Areas – Open and Save Dialogs
Pain Point: Saving a New Job/Transformation
• Whenever you save a new transformation or job into the repository, the default folder is set to the user’s home folder.
• In previous versions, the user had to change the folder every time they saved a new transformation or job.
New Save Dialog in 8.0 – Overview
• Remembers the last opened folder!
• Just enter the filename (and/or change the folder).
• Similar to the Open dialog, with additional functionality (see next slide).
New Open Dialog in 8.0 – Overview
• Search and Recents
• Open shows the last opened folder – a big time saver!
Improvements in Related Areas – Run Configurations
Pain Point: Remote Pentaho Server Execution Before 8.0
• To execute on the Pentaho Server before 8.0, you had to define a slave server and provide the credentials, then execute on the selected server.
Execute on the Pentaho Server
• By selecting the Pentaho Server option, you no longer need to define a slave server when you want to execute remotely.
• Behind the scenes, this option executes the transformation or job via the scheduler – the same as doing a “Schedule Now.”
• This new functionality improves ease of use, also for Worker Nodes.
Run Configurations within Job Entries
• Run configurations can be used in the Run dialog and also in the job entries that execute jobs or transformations remotely and on Worker Nodes.
[Screenshots: 7.1 example vs. 8.0]
Demonstration
Availability and Roadmap
Availability
• Worker Nodes is Enterprise Edition (EE) only
• Initially, 8.0 Worker Nodes will be Limited Availability
  – Fully supported for production deployment
  – Distribution to a limited number of customers
• Requires an additional download and implementation services
Roadmap
• Pentaho Server & Repository as a service, including high availability
  [Diagram: Pentaho Server and Pentaho Repository alongside the container framework running Worker Nodes WN 1 … WN n as “executors” for work items such as KJBs and KTRs]
• Improved monitoring and logging
• Extend to other Pentaho work items, such as reports
• Integration with other Hitachi Vantara services and products
Summary
What we covered today:
• The upcoming capabilities for scaling out the Pentaho platform, and when to use them
• How to use the new way of scaling out work items (Pentaho processes such as PDI jobs and transformations) across multiple nodes
Next Steps
Want to learn more?
• Meet-the-Expert:
  – Pedro Teixera
• Other recommended breakout sessions:
  – Matt Howard: Pentaho 8.0 and Roadmap
  – Rakesh Saha and Jens Bleuel: Roadmap: Processing Big Data
  – Matt Casters: PDI Best Architecture Practices
  – Steve Szabo: PDI Sizing Overview and Case Study
  – Jonathan Jarvis: Understanding Parallelism with PDI and Adaptive Execution with Spark
  – Mark Burnett: Understanding the Big Data Technology Ecosystem