PDI Sizing Overview and Case Study Steve Szabo Pentaho Lead - PowerPoint PPT Presentation

PDI Sizing Overview and Case Study Steve Szabo Pentaho Lead Solution Engineer, Hitachi Vantara

Introduction and Agenda • Introduction – What is PDI Sizing, and Why Do We Need to be Concerned About it? • Agenda – Brief Anatomy of PDI – Example Sizing Problem – Review Test Cases – Example Sizing Solution – Review Major Constraints, Bottlenecks, and Best Practices • Next Steps – Recommendations – Resources

Example Sizing Problem • Retail business has daily data that must be readied for next-day analysis – Data volume fluctuates daily – 8-hour delivery window – 10 TB per day peak, 5 TB per day average • 10 TB per day is roughly 400 GB per hour when spread over 24 hours
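The hourly figure for this problem can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes decimal units (1 TB = 1000 GB). It also shows the rate that would be needed if the full peak volume had to pass through the 8-hour delivery window, which is an interpretation the slide does not state.

```python
def required_rate_gb_per_hour(volume_tb, hours, gb_per_tb=1000):
    """Average throughput needed to move volume_tb within the given hours.

    gb_per_tb=1000 is an assumption (decimal units); pass 1024 for binary units.
    """
    return volume_tb * gb_per_tb / hours


# Peak volume spread over a full day (the slide rounds this to 400 GB per hour).
print(round(required_rate_gb_per_hour(10, 24), 1))

# Peak volume squeezed into the 8-hour delivery window (assumed interpretation).
print(required_rate_gb_per_hour(10, 8))
```

The gap between the two results shows why the time window matters as much as the data volume when sizing.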

Sizing Disclaimers • Past performance is not a guarantee of future results • The best practice is to run throughput tests with fully representative data and transformation profiles on actual equipment • Sizing should accommodate data growth and operational margins • The results here represent throughput with a single transformation type under controlled conditions. – Increasing the variety of transformations may result in lower performance • See Pentaho Best Practices for performance tuning

What is PDI Sizing? Determining the number of nodes and cores needed to process data within time constraints. [Diagram labels: Data Sources, Data Targets]

PDI Sizing Variables and Constraints • Customer Demands • Jobs • Transformations • Inputs • Available Memory • Available CPU (cores) • Available ops/sec • Input bandwidth • Output bandwidth • Number of Nodes • Time

Pentaho Data Integration Sizing Factors • Available Time – Processing time required – Turn-around time requirements • Amount of Data – Interdependencies and lag • Available Resources – Computing power: cores, memory, storage, network – Number of nodes • Complexity of Transformations

PDI Anatomy • Platform – CPU, Cores, Memory – JVM • Jobs – Orchestration • Transformation attributes – Threads – Connections – Steps (Blocking Steps, Expensive Steps) – Rowset buffers – Step copies – Multiple streams

PDI Sizing – Hardware Constraints • Enterprise-grade supported processors (not end-of-life) • 8-core processors • 32 GB or more of available RAM – 24 GB+ per JVM • High-speed network connections (1 Gb/sec to 10 Gb/sec) • Low number of network hops – Co-located nodes on same segment • Cluster Configurations – Carte Clustering – Hadoop MapReduce – AWS Auto Scaling – Spark Clusters

Example 1: High I/O, Moderate CPU use-case** • Single 8-core node with a single type of transformation • Throughput Peak: – 229 GB per Hour – 916 GB per 4-Hours – 5.5 TB per 24-Hours

Summarized Results – High I/O, Moderate CPU use-case**
Number of Concurrent Transformations | Hourly Throughput | Daily Throughput | Notes
1 | 121.9 GB per Hour | 2.9 TB per Day | Medium (15-steps)
2 | 228.6 GB per Hour | 5.4 TB per Day | This represents an 88% increase in throughput
3 | 229.2 GB per Hour | 5.5 TB per Day | This is less than 1% more throughput
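The percentages in the Notes column follow from the hourly throughput figures. A minimal sketch of that arithmetic, using the values from the summarized results; the function names and the 1% saturation threshold are assumptions made for illustration, not part of the presentation:

```python
def throughput_gains(throughputs_gb_per_hour):
    """Percent gain of each concurrency level over the previous level."""
    pairs = zip(throughputs_gb_per_hour, throughputs_gb_per_hour[1:])
    return [(curr - prev) / prev * 100 for prev, curr in pairs]


def saturation_level(throughputs_gb_per_hour, min_gain_pct=1.0):
    """Number of concurrent transformations after which gains drop below min_gain_pct.

    Returns None if every additional transformation still gains at least min_gain_pct.
    """
    for index, gain in enumerate(throughput_gains(throughputs_gb_per_hour)):
        if gain < min_gain_pct:
            return index + 1  # levels are 1-based: 1, 2, 3 concurrent transformations
    return None


# Hourly throughput for 1, 2, and 3 concurrent transformations (from the table).
high_io_moderate_cpu = [121.9, 228.6, 229.2]

print([round(g, 1) for g in throughput_gains(high_io_moderate_cpu)])
print(saturation_level(high_io_moderate_cpu))
```

On these figures the single 8-core node stops scaling after two concurrent transformations, which is why the daily peak of about 5.5 TB is treated as the per-node capacity.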

Example 2: High I/O, High CPU use-case** • Single 8-core node with a single type of transformation • Throughput Peak: – 113 GB per Hour – 452 GB per 4-Hours – 2.7 TB per 24-Hours

Summarized Results – High I/O, High CPU use-case**
Number of Concurrent Transformations | Hourly Throughput | Daily Throughput | Notes
1 | 88.8 GB per Hour | 2.1 TB per Day | Medium (15-steps)
2 | 112.6 GB per Hour | 2.7 TB per Day | This represents a 27% increase in throughput
3 | 113.4 GB per Hour | 2.7 TB per Day | This is less than 1% more throughput

Example Sizing Solution • Capacity Requirement: 10 TB per Day • Solution: two 8-core nodes at 5.5 TB / Day / 8-core each • Contingency: additional nodes at 5.5 TB / Day / 8-core each
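The node count in this solution is the capacity requirement divided by the measured per-node throughput, rounded up. A minimal sketch, with a hypothetical function name, using the per-node peaks from the two test cases:

```python
import math


def nodes_required(demand_tb_per_day, node_capacity_tb_per_day):
    """Smallest whole number of nodes whose combined capacity covers the demand."""
    if node_capacity_tb_per_day <= 0:
        raise ValueError("node capacity must be positive")
    return math.ceil(demand_tb_per_day / node_capacity_tb_per_day)


# High I/O, moderate CPU profile: 5.5 TB per day per 8-core node.
print(nodes_required(10, 5.5))

# High I/O, high CPU profile: 2.7 TB per day per 8-core node.
print(nodes_required(10, 2.7))
```

The same 10 TB requirement needs twice as many nodes under the high-CPU profile, so the transformation profile used in testing should match the real workload. This calculation assumes throughput scales linearly across nodes, which the presentation does not measure.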

PDI Sizing – Use-Case Variation Considerations • Data formats: JSON, XML, CSV, Binary • Transformation sizes: small, medium, large (low-CPU, high-CPU) • Step copies: 1, 2, 4, 8, 16 • Field sizes: 10B, 100B, 1K, 4K, 10K • Row sizes: 1K, 4K, 16K, and Rowset buffer • Step types: Regex, JavaScript • Aggregations: Sort, Join, Analytic (Sum, Standard Deviation)

PDI Sizing – Best Practices • Performance Tuning – Optimize input and output structures (e.g. Pre-sort data) – Identify slowest steps – Use optimal steps, such as Bulk Loading • Scaling up – Number of step copies – Number of instances – Clustering transformations • Perform load test - Determine Peak throughput • Monitoring and alerting

Considerations and Recommendations • Capacity planning – CPU, Storage, Network – Allow for operating margins (20% to 50%) – Allow for System overhead (10%) • Data forecasting – Monthly and Annual growth – Ad-hoc projects – Data Growth – Additional Transformations • System Maintenance – Allow for maintenance, scheduled and MTBF (Backups, Upgrades, Backlog Recovery) – Redundancy, Failover – Ongoing system and performance monitoring • Data Cycles – Near Real-time streaming versus daily analytical batches • System optimization – See Pentaho Best Practices
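One way to apply the operating margin and system overhead figures is to pad the peak demand before dividing by per-node capacity. The sketch below is an assumption about how the percentages combine (added together on top of peak demand); the presentation gives the ranges but not a formula, and the function names are hypothetical.

```python
import math


def padded_demand_tb(peak_tb_per_day, operating_margin, system_overhead):
    """Peak demand increased by operating margin and system overhead.

    Both arguments are fractions (0.2 means 20%). Adding them is an assumed
    convention; a stricter plan could compound them instead.
    """
    return peak_tb_per_day * (1 + operating_margin + system_overhead)


def nodes_with_margin(peak_tb_per_day, node_capacity_tb_per_day,
                      operating_margin, system_overhead):
    """Whole nodes needed to cover the padded demand."""
    padded = padded_demand_tb(peak_tb_per_day, operating_margin, system_overhead)
    return math.ceil(padded / node_capacity_tb_per_day)


# 10 TB per day peak, 5.5 TB per day per node, 10% system overhead.
for margin in (0.2, 0.5):
    padded = padded_demand_tb(10, margin, 0.1)
    nodes = nodes_with_margin(10, 5.5, margin, 0.1)
    print(f"margin {margin:.0%}: plan for {padded:.1f} TB per day, {nodes} nodes")
```

Under these assumptions, either end of the 20% to 50% margin range pushes the example from two nodes to three, before any allowance for data growth or failover.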

PDI Sizing – support.pentaho.com • Best Practices • Product Documentation • Enterprise Support • Pentaho Services
