Last Updated: Jun 26, 2021

Challenge

The use of B2B Data Transformation to process documents in any format can be optimized through an advanced technique called chaining. Data Transformation chaining involves the use of more than one Data Transformation component (Parser, Mapper, Serializer, etc.), or of one or more Data Transformation services, to achieve the desired objective.

Description

The Data Transformation chaining philosophy revolves around the use of one or more small logical components, each of which accomplishes a specific task, to achieve a more complex objective. This methodology encourages building reusable components and breaking work down into smaller chunks to enable easier development and maintenance.

As an example, organizations receive data from different sources and store it in warehouses to build a complete picture of a specific business function. Each source system may supply dates in its own format, so dates from the various systems must be converted to a standard format before they are stored in the warehouse. The same issue recurs across departments and projects. Using Data Transformation, an organization can build a reusable date-processing component that accepts a date in any format, analyzes that format, and converts the date into a standard format. Different processes can then use the reusable component in their projects (Parser, Mapper, or Serializer projects) and ensure consistency in the date formats that go into the warehouse. The process of using multiple Data Transformation segments in a project to achieve a complete file transformation is called Data Transformation Chaining.
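
As a conceptual illustration, a reusable date-normalization component might look like the following sketch. This is plain Python rather than Informatica IntelliScript, and all names and formats in it are hypothetical:

    from datetime import datetime

    # Candidate input formats the component understands (illustrative list).
    KNOWN_FORMATS = ["%m/%d/%Y", "%d-%b-%y", "%Y%m%d", "%d.%m.%Y"]

    def normalize_date(raw: str, target: str = "%Y-%m-%d") -> str:
        """Accept a date in any known format; return it in the standard format."""
        for fmt in KNOWN_FORMATS:
            try:
                return datetime.strptime(raw.strip(), fmt).strftime(target)
            except ValueError:
                continue  # not this format, try the next one
        raise ValueError(f"Unrecognized date format: {raw!r}")

    # Any parser, mapper, or serializer project can reuse the same component:
    print(normalize_date("07/04/2020"))  # -> 2020-07-04
    print(normalize_date("04-Jul-20"))   # -> 2020-07-04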

Benefits

The advantages of Data Transformation Chaining are summarized below:

  1. Chaining allows application logic to reside in separate segments (Parser, Mapper, etc.).
  2. Application logic can be broken into logical service objects, allowing reusable components to be built once and used many times.
  3. Performance improves because the number of steps required to convert a document can be reduced significantly.
  4. Maximum processing can be done at the Data Transformation level, so PowerCenter can be used primarily as an execution platform.
  5. Chaining allows a set of reusable components to be bundled and exposed to other applications. An application written in C# or Java can invoke the same service and obtain the same result, enabling sharing across development environments.
  6. Chaining Data Transformation services or components reduces interaction with the invoking application (such as PowerCenter), and code maintenance at the invocation layer is kept to a minimum.
  7. Run-time memory utilization can be reduced significantly, because the output of one Data Transformation service is passed directly as input to the next rather than returned to PowerCenter as a buffer.

There is also no need to worry about intermediate port sizes at the PowerCenter level, as Data Transformation handles any string-length limitations internally; only the input and output port sizes need to be defined.

Process

Chaining in a Data Transformation project is achieved through the use of any of the following actions (a conceptual sketch of how they compose follows the list):

  1. RunParser: Executes another parser within the same project. This allows a completely new source, or the output of an existing parser component, to be parsed, and the result to be used back in the main parser; for example, to create a dynamic lookup object from a completely different input file at run time.
  2. RunMapper: Executes another mapper within the same project. This allows an XML document to be converted from one format to another on the fly, typically when the XML documents have different grouping and ordering requirements.
  3. RunSerializer: Executes another serializer within the same project. This allows files in different formats to be created as part of the main component's processing, while the main component continues further processing.
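
The following sketch models the three actions as plain Python functions. It is a conceptual analogy only, not Informatica IntelliScript; every name and file path in it is hypothetical:

    # Each DT component is modeled as a plain function; the three Run*
    # actions then amount to ordinary calls made from the main component.

    def secondary_parser(lookup_path: str) -> dict:
        """Stand-in for RunParser: parses a completely different input
        file, here building a dynamic lookup table at run time."""
        with open(lookup_path) as f:
            # Expects "key,value" lines in the hypothetical lookup file.
            return dict(line.strip().split(",", 1) for line in f if line.strip())

    def secondary_mapper(doc: dict) -> dict:
        """Stand-in for RunMapper: regroups/reorders an intermediate
        document on the fly."""
        return dict(sorted(doc.items()))

    def secondary_serializer(doc: dict, out_path: str) -> None:
        """Stand-in for RunSerializer: writes a side output while the
        main component keeps running."""
        with open(out_path, "w") as f:
            f.write("\n".join(f"{k}={v}" for k, v in doc.items()))

    def main_parser(source: dict, lookup_path: str) -> dict:
        lookup = secondary_parser(lookup_path)       # RunParser
        enriched = {k: lookup.get(v, v) for k, v in source.items()}
        reshaped = secondary_mapper(enriched)        # RunMapper
        secondary_serializer(reshaped, "audit.txt")  # RunSerializer
        return reshaped                              # main parser continues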

Scope

The scope of data for parsing, mapping, or serialization is the data made available by the main component (parser, mapper, or serializer) to the secondary component being invoked. Scope can be categorized as follows:

  1. Implicit Scope: The secondary component uses the data holders available in the scope of the action. For example, if the action is placed within a Group, it runs on the output of that Group, as sketched below.
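
A conceptual Python analogy of this configuration (not IntelliScript; the names mirror those used in the explanation that follows):

    # Implicit scope: a RunSerializer action placed inside the repeating
    # group "MyRepeatGroup" implicitly receives that group's output (the
    # "$Inputs" data holder) as its source.

    def secondary_serializer(inputs: dict) -> str:
        """Stand-in for the 'Secondary Serializer'."""
        return ";".join(f"{k}={v}" for k, v in inputs.items())

    my_repeat_group = [                 # one entry per occurrence of the group
        {"id": "1", "date": "2020-07-04"},
        {"id": "2", "date": "2020-08-15"},
    ]

    for group_output in my_repeat_group:
        # The source is implicit: the output of the enclosing group.
        print(secondary_serializer(group_output))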

In the sketch above, the input for the “Secondary Serializer” is implicit. The source value in this case is the output of the repeating group “MyRepeatGroup”; hence the value stored in the complex variable “$Inputs” serves as the source. The secondary serializer is invoked repeatedly, once for each occurrence of the repeating group.

The same implicit behavior applies at the project level. If a “Full Serializer” is invoked at the top level of “MainParser” rather than inside a group, its implicit input is the output of “MainParser” itself: the whole XML content produced by the parser is fed as the source document for the serializer, as sketched below.
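
A conceptual Python analogy of this top-level case (again, not IntelliScript; all names are illustrative):

    # Implicit scope at the top level: a RunSerializer placed at the end
    # of "MainParser" implicitly receives the parser's entire output.

    def main_parser(raw: str) -> str:
        """Stand-in for 'MainParser': produces an XML-like document."""
        records = "".join(f"<rec>{r}</rec>" for r in raw.split())
        return f"<root>{records}</root>"

    def full_serializer(document: str) -> str:
        """Stand-in for 'Full Serializer': consumes the whole document."""
        return document.replace("><", ">\n<")

    parser_output = main_parser("a b c")
    print(full_serializer(parser_output))   # source = entire MainParser output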

  2. Explicit Scope: The secondary component provides an explicit reference to a data holder, which serves as the source for the secondary component. This method is generally used for parsing additional input files that are cross-referenced during the different stages of transforming a file from one format to another, as sketched below.
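
A conceptual Python analogy of explicit scope (not IntelliScript; the names mirror those in the explanation that follows):

    # Explicit scope: the RunParser action names an explicit data holder
    # ("v_Temp") whose contents serve as the source for "My_Parser".

    def my_parser(source: str) -> list:
        """Stand-in for 'My_Parser': parses whatever source it is handed."""
        return source.split(",")

    v_Temp = "red,green,blue"      # data holder filled earlier in the main flow

    parsed = my_parser(v_Temp)     # the variable is named explicitly as source
    print(parsed)                  # ['red', 'green', 'blue']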

In the sketch above, the “RunParser” action invokes “My_Parser” and uses the value contained in the variable “v_Temp” as its source.

    Note: A similar approach applies to the “RunMapper” and “RunSerializer” actions as well.

Best Practices for Data Transformation Chaining

  1. Informatica recommends using implicit data sources when invoking a secondary component. This reduces memory usage, because a single in-memory data holder is used. Copying the data to a variable and then referencing it explicitly duplicates the data unnecessarily.
  2. When parsing an additional input port, it is advisable to pass the data by reference to the invoked component and to invoke the secondary parser only at the point where the data from the secondary input port is actually required.
  3. Limit the number of invocations of the secondary component, thereby reducing the number of I/O operations.
  4. Clear variables once they are no longer needed, to reduce in-memory storage.
