Hitachi NEXT 2018
Automating Onboarding Data with Metadata Injection

Contents
Introduction to Metadata Injection
Guided Demonstration Overview: Metadata Injection
Guided Demonstration – Standard Metadata Injection
Guided Demonstration – Push / Pull Metadata Injection
Guided Demonstration – 2-Phase Metadata Injection
Summary of Metadata Architectures
Introduction to Metadata Injection

Metadata is traditionally defined and configured at design time, in a process known as hard-coding, because it does not change at run time. This static ETL approach works well when you are onboarding just one or two data sources and can easily enter the metadata for your transformation manually. However, hard-coding presents some complications, including:
• Time consumption
• Repetitive manual tasks
• Error-prone solutions
• High labour costs of designing, developing, and supporting a fragile solution
• Added risk when predictable outcomes are jeopardized

Metadata injection is the dynamic ETL alternative for scaling robust applications in an agile environment. By building a framework that shifts time and resources to runtime decisions, one transformation can service many needs. This approach dramatically reduces upfront time-to-value and flattens the ongoing investment in maintenance. When you are dealing with many data sources that have varying schemas, try metadata injection to drastically reduce your development time and accelerate your time to value.

HITACHI is a trademark or registered trademark of Hitachi, Ltd.
Data integration is the main domain of metadata injection. Metadata injection is useful in cases that face one or more of the following challenges:
• Many data sources
• Different naming conventions
• Similar content
• Dissimilar structure
• Common destination

The ETL Metadata Injection step can be used in a transformation to inject metadata into another transformation, normally one with input and output steps, for standardizing filenames, naming or renaming fields, removing fields, and adding fields.

Note: Pentaho’s metadata injection helps you accelerate productivity and reduce risk in complex data onboarding projects by dynamically scaling out from one template to many transformations.
Pentaho Data Integration (PDI) now has over 75 steps that can be templated to inject metadata or characteristics that make small or large value changes, allowing each run to differ from the previous one.
https://help.pentaho.com/Documentation/8.1/Products/Data_Integration/Transformation_Step_Reference/ETL_Metadata_Injection/Steps_Supporting_MDI

ETL integration development takes time: gathering requirements, building, testing, documenting, deploying, and monitoring production. Rules, requirements, and the data itself may change over time. If that happens, the current rules may no longer apply, or new rules may need to be added to the existing transformation for it to continue working. We recommend using flexible, data-driven ETL patterns to make your data integration transformations powerful and adaptable to changing business rules without going through a development cycle.

Data Streaming

Since version 5.1, the ETL Metadata Injection step can stream data from one transformation into another. To pass data from your template transformation (after injection, during execution) to your current transformation, specify the Template step to read from. You can also specify the expected output fields, which makes it easier to design the steps that come after the ETL Metadata Injection step. To pass data from a source step into the template transformation (again, after injection), specify the Streaming source step and the Streaming target step in the template transformation.

Metadata injection refers to the dynamic passing of metadata to PDI transformations at run time to control complex data integration logic. The metadata (from the data source, a user-defined file, or an end-user request) can be injected on the fly into a transformation template, providing the “instructions” to generate actual transformations. This enables teams to drive hundreds of data ingestion and preparation processes through just a few actual transformations, heavily accelerating time to data insights and monetization.

In data onboarding use cases, metadata injection reduces the development time and resources required, accelerating time to value. At the same time, the risk of human error is reduced.
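As a rough illustration of the idea, the Python sketch below treats one function as the "template" and a per-source dictionary as the injected metadata. It is an analogy only, not PDI code: the `apply_template` helper, the source names, and the field names are all hypothetical.

```python
def apply_template(rows, metadata):
    """Rename, drop, and add fields according to metadata supplied at run time."""
    rename = metadata["rename"]          # source field name -> standard field name
    constants = metadata.get("add", {})  # extra fields added to every row
    result = []
    for row in rows:
        # Fields that are not listed in `rename` are dropped.
        out = {rename[name]: value for name, value in row.items() if name in rename}
        out.update(constants)
        result.append(out)
    return result

# Two hypothetical sources with similar content but dissimilar structure.
crm_rows = [{"cust_name": "Acme", "amt": 100, "internal_flag": "x"}]
erp_rows = [{"CustomerName": "Globex", "Total": 250}]

crm_meta = {"rename": {"cust_name": "customer", "amt": "amount"}, "add": {"source": "crm"}}
erp_meta = {"rename": {"CustomerName": "customer", "Total": "amount"}, "add": {"source": "erp"}}

# The same template serves both sources; only the metadata differs.
combined = apply_template(crm_rows, crm_meta) + apply_template(erp_rows, erp_meta)
for row in combined:
    print(row)
```

Adding a third source in this sketch means writing one more metadata dictionary, not another copy of the logic, which is the effect the template approach aims for.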
Data integration can be made more flexible and reactive by building rules that can be injected into the transformation before it runs, and by passing the appropriate parameters into ETL jobs. For example:
• Passing in different filenames (paths and filenames can be different for each run)
• Passing different values into a custom SQL statement to allow for different behaviours (for example, different table names, or different WHERE clause field names and values)

ETL Metadata Injection Step

The ETL Metadata Injection step exposes the metadata properties of your ‘template’ steps. This step enables you to map existing metadata properties to new injected metadata properties.
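The run-time parameter idea in the examples above (a different table name or WHERE value for each run) can be sketched in plain Python with the standard `sqlite3` module. This is not how PDI passes parameters; the table names, field names, and the `run_query` helper are hypothetical. Because table and column names cannot be bound as SQL parameters, the sketch checks them against an allow-list and binds only the value.

```python
import sqlite3

# Hypothetical names that a run is allowed to request.
ALLOWED_TABLES = {"sales_eu", "sales_us"}
ALLOWED_FIELDS = {"region", "product"}

def run_query(conn, table, where_field, where_value):
    """Build the statement from run-time parameters and execute it."""
    if table not in ALLOWED_TABLES or where_field not in ALLOWED_FIELDS:
        raise ValueError("unexpected table or field name")
    # Identifiers come from the allow-list; the value is bound with a placeholder.
    sql = f"SELECT product, amount FROM {table} WHERE {where_field} = ? ORDER BY amount"
    return conn.execute(sql, (where_value,)).fetchall()

conn = sqlite3.connect(":memory:")
for name in sorted(ALLOWED_TABLES):
    conn.execute(f"CREATE TABLE {name} (region TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales_eu VALUES (?, ?, ?)",
                 [("north", "bolt", 5), ("south", "nut", 3)])
conn.executemany("INSERT INTO sales_us VALUES (?, ?, ?)", [("west", "bolt", 7)])

# The same statement template behaves differently depending on what is passed in.
eu_rows = run_query(conn, "sales_eu", "region", "north")
us_rows = run_query(conn, "sales_us", "product", "bolt")
print(eu_rows, us_rows)
```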
OPTION: DESCRIPTION

Transformation template: In this section of the dialog, you can specify the transformation to use as a template. When you have specified a transformation, you can use the Validate and Refresh button. The Edit button opens the specified template in a new tab in Spoon.

Template step to read from (optional): If you specify a step from the template here, the output of the ETL Metadata Injection step will be the output of that template step.

Optional target file (KTR after injection): For debugging or transformation generation, you can save the resulting transformation, after metadata injection, to a file. If you want, you can specify a file name, result.ktr for example.

Don't execute resulting transformation: If you prefer not to execute the resulting transformation (after metadata injection), enable this option.

Field mapping: You can select any row in the metadata tree table with your mouse, which opens a source step and field selection dialog.
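To make the "Optional target file" and "Don't execute resulting transformation" options more concrete, here is a small Python sketch of the same workflow: fill in a template, save the result for inspection, and skip execution. The XML layout is a simplified, hypothetical stand-in and not the real KTR schema; the `inject` helper and its `runner` argument are likewise assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Simplified, hypothetical template layout; NOT the real KTR schema.
TEMPLATE_XML = """<transformation>
  <step><name>CSV file input</name><filename/><separator/></step>
</transformation>"""

def inject(template_xml, step_name, properties, target_file=None, runner=None):
    """Fill in step properties, optionally save the result, optionally run it."""
    root = ET.fromstring(template_xml)
    for step in root.iter("step"):
        if step.findtext("name") != step_name:
            continue
        for key, value in properties.items():
            element = step.find(key)
            if element is None:
                raise KeyError(f"template step has no property {key!r}")
            element.text = value
    if target_file is not None:
        # Counterpart of "Optional target file": keep the injected result.
        ET.ElementTree(root).write(target_file, encoding="utf-8")
    if runner is not None:
        # Leaving runner as None mirrors "Don't execute resulting transformation".
        runner(root)
    return root

target = os.path.join(tempfile.mkdtemp(), "result.ktr")
inject(TEMPLATE_XML, "CSV file input",
       {"filename": "sales_data.txt", "separator": ";"},
       target_file=target)

# Inspect the saved file without ever running it.
saved = ET.parse(target).getroot()
saved_separator = saved.findtext("step/separator")
print(saved_separator)
```

Saving the injected result and reading it back is a convenient way to check that every property landed where it was expected before anything is executed.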
Guided Demonstration Overview: Metadata Injection

Introduction

These Guided Demonstrations outline the use case for Metadata Injection. Data onboarding workflows follow repeatable patterns and differ only in their metadata properties.
• Scenario 1 - Hard-coded delimiter
• Scenario 2 - ETL Metadata Injection of metadata

Objectives

Once the repeatable pattern has been defined in a template, the ETL Metadata Injection step exposes the template's metadata properties, which can then be mapped to the corresponding injected source stream fields.
• Outline the workflow for standard data onboarding.
• Configure an ETL Metadata Injection transformation and template.

Scenario 1 – Static ETL

In this scenario, onboarding the files would require a separate CSV file input step for each of the different delimiters.
1. Double-click each CSV file input step to display its metadata properties.

Each data source requires its own transformation.
Scenario 2 – Inject ETL Metadata Properties

Template

For this scenario, the data is onboarded with a template.

Note: The steps in the template define the scope of the metadata injection properties.

1. Double-click each of the steps:
• CSV file input
• Select values
The metadata properties will be injected at run time.
2. Double-click the Table output step.
ETL Metadata Injection Transformation

The main transformation:
• Onboards sales_data.txt into the datastream pipeline
• Injects the required metadata properties from the Data Grid steps
• Maps the template fields to the injected metadata properties
• Executes the template
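The contrast with Scenario 1 can be sketched in Python: instead of one hard-coded reader per delimiter, a single reader takes the delimiter from a metadata row, much as the main transformation takes it from a Data Grid. This is only an analogy; the filenames, file contents, and the `template_read` helper are hypothetical, and the files are in-memory strings so that the sketch runs without touching disk.

```python
import csv
import io

def template_read(text, delimiter):
    """The single reusable 'template': the delimiter is supplied at run time."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

# Stand-in for the Data Grid: one metadata row per incoming file.
data_grid = [
    {"filename": "sales_comma.txt", "delimiter": ","},
    {"filename": "sales_pipe.txt", "delimiter": "|"},
]

# In-memory stand-ins for the files themselves.
files = {
    "sales_comma.txt": "id,amount\n1,10\n",
    "sales_pipe.txt": "id|amount\n2|20\n",
}

# Inject each metadata row into the same template and collect the rows.
loaded = []
for meta in data_grid:
    loaded.extend(template_read(files[meta["filename"]], meta["delimiter"]))

for row in loaded:
    print(row)
```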
ETL Metadata Injection Step

1. Double-click the ETL Metadata Injection step.
The step is mapped to: tr_metadata_inject_template.ktr
Once the template is mapped, the steps and fields in tr_metadata_inject_template.ktr are exposed. The source steps and fields can now be mapped to the corresponding target injection steps.
In this example, the source step ‘Using variable to resolve filename’ and its stream field ‘filename’ are mapped to the template's CSV file input step and its FILENAME field.
2. Examine the other mappings.
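One way to picture the mapping table is as a list of (target step, target property, source step, source field) entries that is resolved against the incoming rows. The Python sketch below models that resolution. It is an illustration, not PDI's internal representation: the `resolve` helper, the file path, and the DELIMITER property name are assumptions, while the step names and the FILENAME property come from the example above.

```python
# Rows arriving from upstream steps, keyed by source step name (values are hypothetical).
source_rows = {
    "Using variable to resolve filename": {"filename": "/tmp/sales_data.txt"},
    "Data Grid": {"delimiter": ";"},
}

# Each mapping: (target step, target property, source step, source field).
mappings = [
    ("CSV file input", "FILENAME", "Using variable to resolve filename", "filename"),
    ("CSV file input", "DELIMITER", "Data Grid", "delimiter"),  # hypothetical property name
]

def resolve(mappings, source_rows):
    """Collect the value each target property should receive at run time."""
    injected = {}
    for target_step, target_prop, source_step, source_field in mappings:
        value = source_rows[source_step][source_field]
        injected.setdefault(target_step, {})[target_prop] = value
    return injected

injected = resolve(mappings, source_rows)
print(injected)
```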