
Data Explorer: Inspect, Visualize, and Collaborate from Any PDI Step



  1. Data Explorer: Inspect, Visualize, and Collaborate from Any PDI Step. Ben Hopkins, Pentaho Senior Product Manager, Hitachi Vantara

  2. Agenda Today, we will cover: • Background on Data Explorer (DE) and its main use cases • Deeper dive on specific DE features and how to use them • Demonstration of DE in action

  3. Data Explorer Background

  4. What is Data Explorer? PDI module to visually inspect data at virtually any transformation step: • Access analytic views and charts without switching in and out of tools • Rapidly publish data sources to share with business colleagues • Reduce iterative work needed while building data pipelines

  5. Use Case – Accelerating Data Prep Scenario: • Designing data pipelines to cleanse data as it is onboarded to databases and applications DE Benefits: • Identify duplicates, misspellings, missing data, and other discrepancies • Confirm trends and outliers • Inform the PDI user how to adjust transformations to deliver clean

  6. Use Case – BI Prototyping Scenario: • Responding to new business requests for data to visualize and report on DE Benefits: • Publish preliminary data sources and schemas to Pentaho BA for business to validate • No staging required – boosts agility • Faster time to insight due to fewer iterations between IT and business

  7. Deeper Dive on Data Explorer Features

  8. Accessing Data Explorer Just click to inspect data from the selected step in your transformation, but you have 2 choices from the canvas: • Run and Inspect Data: Runs the transformation up to the selected step and then launches DE • Inspect Data: Launches DE directly, but this only works after the transformation has run. If you’ve run the transformation since making your changes, just hit “Inspect” to use the cached data for DE

  9. Default Visual – Flat Table • Table – simple row and column view of data • Fields can be sorted, moved, or removed • Great when you want to see particular rows or max/min for a field • Have choice of 14 other visualizations

  10. Visualizations – Commonly Used • Bar: Good for quick comparisons and finding missing data • Scatter: Good for correlations, outliers, numeric relationships • Geo Map: Good for validating geolocation data

  11. Stream and Model Views A decision to make before you explore your data further. Stream View (for data prep): • No modeling layer used, just SQL • Uses PDI data types and masks • Required for flat table. Model View (for BI publishing): • Uses Measures and Attributes specified in BA model layer • Required for pivot table, geo map, and sunburst charts

  12. Modeling and Annotations • DE generates an ‘auto model’ which guesses at basic measures, dimensions, and geo hierarchies • Use Annotate Stream step to complete the model with more hierarchies, formatting, aggregation • This is necessary for building and validating prototype data sources to publish to BA – Provides full business context

  13. Drill-Down Capabilities • Can click to drill down into hierarchies, e.g. territory to country to city • Drill-down (and hierarchies) only available in model view • Annotate Stream is normally required to create useful hierarchies for drill-down • DE helps to validate these hierarchies before publishing data to PUC
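To make the territory-to-country-to-city drill-down concrete, here is a minimal Python sketch of what drilling into a hierarchy computes: totals at the level just below the members already selected. This is not how DE implements drill-down. The `drill` function, the `sales` measure, and the sample rows are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical hierarchy, ordered from coarsest to finest level.
HIERARCHY = ["territory", "country", "city"]

def drill(rows, path, measure="sales"):
    """Sum `measure` at the level just below `path`.

    `path` lists the members already drilled into, e.g. ["EMEA", "France"].
    """
    if len(path) >= len(HIERARCHY):
        raise ValueError("already at the lowest hierarchy level")
    level = HIERARCHY[len(path)]
    totals = defaultdict(float)
    for row in rows:
        # Keep only rows that match every member chosen so far.
        if all(row[HIERARCHY[i]] == member for i, member in enumerate(path)):
            totals[row[level]] += row[measure]
    return dict(totals)

rows = [
    {"territory": "EMEA", "country": "France", "city": "Paris", "sales": 100.0},
    {"territory": "EMEA", "country": "France", "city": "Lyon", "sales": 50.0},
    {"territory": "EMEA", "country": "Spain", "city": "Madrid", "sales": 70.0},
    {"territory": "APAC", "country": "Japan", "city": "Tokyo", "sales": 30.0},
]

by_territory = drill(rows, [])              # top level
by_country = drill(rows, ["EMEA"])          # one click down
by_city = drill(rows, ["EMEA", "France"])   # two clicks down
```

Checking that each level sums to its parent is the kind of validation worth doing before a hierarchy is published.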

  14. Filtering – New in 8.0 • Apply restrictions to include/exclude data when using charts • Filters can be applied to numeric and non-numeric fields • Examples: ‘Greater than,’ ‘Contains (string),’ ‘is not null’ • Create filters by dragging fields to the Filters panel, then configure

  15. Filtering – New in 8.0 • Filters and other actions can be invoked directly from charts as well • Exclude certain data • Keep only certain data • Drill down (if hierarchy available)
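The filter examples from these two slides (“Greater than,” “Contains (string),” “is not null”) can be sketched as simple row predicates. This is an illustration of the semantics only, not DE's implementation. The function names, field names, and sample rows below are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

def greater_than(field: str, threshold: float) -> Predicate:
    # Null values never satisfy a numeric comparison.
    return lambda row: row.get(field) is not None and row[field] > threshold

def contains(field: str, text: str) -> Predicate:
    # Only string values can contain a substring.
    return lambda row: isinstance(row.get(field), str) and text in row[field]

def is_not_null(field: str) -> Predicate:
    return lambda row: row.get(field) is not None

def apply_filters(rows: Iterable[Row], keep: List[Predicate]) -> List[Row]:
    """Keep only rows that satisfy every predicate (filters are ANDed)."""
    return [row for row in rows if all(pred(row) for pred in keep)]

rows = [
    {"city": "Madrid", "sales": 1200.0},
    {"city": "San Francisco", "sales": 300.0},
    {"city": None, "sales": 950.0},
]

# Numeric and non-numeric filters combined.
result = apply_filters(rows, [greater_than("sales", 500), is_not_null("city")])
```

An “exclude” action is the same idea with the predicate negated, while “keep only” applies it as written.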

  16. Publishing • From any step, you can publish a data source to BA/PUC for business users • PDI must be connected to repository • Published data source includes: – JDBC to data service (virtual table) – Model (business context layer) • Can then create reports with Pentaho BA tools like Analyzer

  17. Role of Pentaho Data Services • For DE, the data service enables rapidly prototyping and publishing BA data sources without staging the data • However, data services are broadly applicable beyond DE: – Data service can be created off any step (separately from DE) – Transformations can be queried by BA as if data were in physical table – Good for when you want to blend and visualize data sets on the fly – Can also be queried by other JDBC-compliant tools like RStudio, DBVisualizer, or SQuirreL
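The point about querying a transformation “as if data were in physical table” can be illustrated with plain SQL. The sketch below uses an in-memory SQLite table purely as a stand-in. A real data service would be reached through a JDBC connection rather than `sqlite3`, and the table name `sales_service` and its columns are hypothetical.

```python
import sqlite3

# Stand-in for a data service: an in-memory table, NOT a Pentaho connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_service (country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales_service VALUES (?, ?)",
    [("France", 100.0), ("France", 50.0), ("Spain", 70.0)],
)

# A client tool would issue ordinary SQL like this against the virtual table.
query = (
    "SELECT country, SUM(sales) AS total "
    "FROM sales_service GROUP BY country ORDER BY country"
)
totals = conn.execute(query).fetchall()
conn.close()
```

From the client's side, the query does not reveal whether the rows come from a stored table or from a transformation step. That is what makes the staging-free prototyping described above possible.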

  18. A Word on Row Limits • By default, DE will only show the first 50,000 rows from the PDI step • Can be changed in Kettle properties: – det.dataservice.dynamic.limit • Keep in mind performance and resource constraints if you scale it up
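As a rough illustration of changing the row limit, the sketch below sets `det.dataservice.dynamic.limit` in the text of a properties file. Only the property key comes from the slide. The `set_property` helper, the sample file content, and the value `100000` are hypothetical, and the parsing is simplified to `key=value` lines.

```python
ROW_LIMIT_KEY = "det.dataservice.dynamic.limit"

def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`, appending it if absent."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments untouched.
        if stripped and not stripped.startswith(("#", "!")):
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                out.append(f"{key}={value}")
                found = True
                continue
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

# Hypothetical existing file content.
original = "# Kettle settings\nFOO=bar\n"
updated = set_property(original, ROW_LIMIT_KEY, "100000")
```

In practice the edited text would be written back to the Kettle properties file. A higher limit means more rows held in memory, so raise it with the performance caveat above in mind.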

  19. Demonstration

  20. Summary What we covered today: • Background on Data Explorer (DE) and its main use cases – Accelerating Data Prep and BI Prototyping • Deeper dive on specific DE features and how to use them – visualizing, modeling, filtering, publishing, and more • Demonstration of DE in action

  21. Next Steps Want to learn more? • For documentation on DE, search “Inspect Data” on help.pentaho.com • To see where DE is headed, check out the session “Pentaho 8.0 and Roadmap”
