Informatica's Data Processor Transformation (DP) product fills a gap in Data Engineering Integration, Data Quality, and PowerCenter: extracting data from unstructured data and transforming it. The need to process or transform unstructured or file-based data is either known at the start of a new project or discovered after a platform or architecture is already in place. This article provides best practices and considerations for configuring Data Processor on a UNIX platform.
When configuring DP on new or existing hardware (either alongside Data Engineering Integration, Data Quality, or PowerCenter, or co-existing with other host applications on the same application server), consider the following questions to determine what type of hardware to use for Data Processor:
Regardless of the hardware vendor chosen, the hardware must be configured and sized appropriately to support the Data Engineering/Data Quality platform and the complex Data Processor transformation requirements. The hardware requirements for the Data Processor environment depend on data volumes, the number of concurrent users, the application server and operating system used, and other factors. For exact sizing recommendations, contact Informatica Professional Services for a Data Processor Sizing and Baseline Architecture engagement.
There are several variations of the hosting environment from which Data Processor services are called, and each one affects how Data Processor is installed and configured. The most common configurations are:
The installation steps may vary depending on which hosting option is chosen.
There are two choices for the location of the service repository: (1) a path on the server's local file system or (2) a shared network drive. The typical justification for a shared network drive is to simplify service deployment by sharing Data Processor transformation objects.
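When it is unclear whether a candidate repository path is local or already sits on a network mount, the mount table can answer that. The following Python is a minimal sketch, assuming a Linux host whose mount table follows the `/proc/mounts` format; other UNIX platforms (AIX, Solaris) expose mounts differently. The function names, the set of filesystem types treated as "shared", and the example paths are hypothetical and are not part of Data Processor.

```python
import os

# Filesystem types this sketch treats as a "shared network drive" (assumed list).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "smb3"}


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text,
    where each line is: device mount_point fs_type options dump pass."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts escapes spaces in paths as \040; undo that.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def mount_for(path, mounts):
    """Return (mount_point, fs_type) of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type in mounts:
        mp = os.path.normpath(mount_point)
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and (best is None or len(mp) > len(best[0])):
            best = (mp, fs_type)
    return best


def classify_repository_path(path, mount_text):
    """Return 'shared' if path lives on a network filesystem, else 'local'."""
    match = mount_for(path, parse_mounts(mount_text))
    if match and match[1] in NETWORK_FS_TYPES:
        return "shared"
    return "local"
```

On a live host, pass `open("/proc/mounts").read()` as `mount_text`. Choosing the longest matching mount point matters because `/` always matches; only the most specific mount reflects where the repository files are actually stored.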
What Are Multi-user Considerations?
For the Model Repository Service that hosts the Data Processor transformation, grant the necessary permissions on the relevant projects and folders. The identity associated with the caller of the Data Transformation services also needs permission to execute the mapplets or exposed web service endpoints that correspond to the Data Processor transformation.
Give special consideration to scenarios (e.g., web services) where the user that runs the Data Processor transformation service is different from the user associated with the calling application.
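When the service user and the calling user differ, it helps to check a shared directory from the point of view of the *other* identity rather than the one running the check. A minimal sketch in Python, assuming plain POSIX permission bits only (ACLs, SELinux, and NFS root squashing are ignored, so treat the result as a first-pass check). The function name and the example uid are hypothetical.

```python
import os
import stat
import tempfile


def can_create_and_delete(path, uid, gids):
    """Return True if uid (with group ids gids) has write and search (x)
    permission on directory path, which POSIX requires to create or delete
    entries in it. Note: a sticky bit on the directory can still prevent
    deleting files owned by other users."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        raise NotADirectoryError(path)
    if uid == 0:
        return True  # root bypasses mode bits on local filesystems
    # POSIX picks exactly one class: owner, else group, else other.
    if st.st_uid == uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (st.st_mode & needed) == needed


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    os.chmod(demo, 0o750)
    # Hypothetical caller uid 54321 in the directory's group: the group
    # class has r-x but not w, so the check reports False.
    print(can_create_and_delete(demo, 54321, [os.stat(demo).st_gid]))
```

Evaluating only one permission class mirrors POSIX: if the caller owns the directory, the owner bits decide even when the group or other bits would be more generous.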
Configure log files and tracing options with appropriate recycling policies. The calling application must have permission to read, write, and delete files in the path set for storing these files.
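The two requirements above, a recycling policy and read/write/delete access for the calling application, can be illustrated together. This Python is a minimal sketch, assuming an age-based policy; the `*.log` pattern, the 7-day retention, and the function names are illustrative placeholders, not Informatica defaults or settings. Run the access check as the calling application's user.

```python
import os
import tempfile
import time
from pathlib import Path


def check_rwd(directory):
    """Create, read back, and delete a probe file in directory.
    Raises OSError (e.g. PermissionError) if any step fails."""
    fd, probe = tempfile.mkstemp(dir=directory, prefix=".dp_probe_")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"ok")
        with open(probe, "rb") as fh:
            if fh.read() != b"ok":
                raise OSError(f"probe content mismatch in {directory}")
    finally:
        os.remove(probe)  # the delete step is part of the check


def prune_logs(directory, pattern="*.log", retention_days=7, now=None):
    """Delete files matching pattern whose modification time is older than
    retention_days. Returns the sorted names of the deleted files."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    deleted = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Scheduling `prune_logs` (for example, from cron) keeps the log path bounded; `check_rwd` is best run once after installation and again after any change to the service or caller accounts.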
The Data Processor transformation has client and server components. Only the server (engine) component is installed on UNIX platforms; the client (Developer tool) is supported only on Windows. Reviewing the environment and recording the information in a detailed checklist simplifies the Data Processor transformation installation.
Verify that the minimum requirements for operating system, disk space, processor speed, and RAM are met, and record them in the checklist.
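Part of that checklist can be filled in automatically. The following Python is a minimal sketch that gathers the operating system, free disk space, CPU count, and RAM, then lists any unmet minimums. The threshold values are hypothetical placeholders; take the real minimums from the installation guide and Product Availability Matrix for your version. The `os.sysconf` names used are available on Linux and most UNIX systems but are not guaranteed everywhere, and this sketch does not measure processor speed, which should still be recorded from the hardware vendor's tools.

```python
import os
import platform
import shutil

# Hypothetical minimums for illustration only.
MINIMUMS = {"disk_free_gb": 50, "cpu_count": 4, "ram_gb": 16}


def collect_host_facts(install_dir="/"):
    """Gather checklist facts about the host and the install filesystem."""
    ram_bytes = os.sysconf("SC_PAGESIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "os": f"{platform.system()} {platform.release()}",
        "disk_free_gb": shutil.disk_usage(install_dir).free / 1024**3,
        "cpu_count": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1024**3,
    }


def check_minimums(facts, minimums=MINIMUMS):
    """Return {item: (observed, required)} for every unmet minimum.
    Missing facts count as 0 so they are reported rather than skipped."""
    return {
        key: (facts.get(key, 0), required)
        for key, required in minimums.items()
        if facts.get(key, 0) < required
    }
```

A typical use is `check_minimums(collect_host_facts("/opt/informatica"))`, where the install path is hypothetical; an empty result means every listed minimum is met, and the full facts dictionary can be pasted into the checklist as-is.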
For new Data Engineering or Data Quality installations, the Data Integration Service and Model Repository Service are bundled. The Data Processor Transformation is available with Data Engineering and Data Quality installations.
For existing Data Engineering or Data Quality installations, enable the licenses needed for Data Processor Transformation.
To integrate with existing or new PowerCenter installations, extract the Data Processor transformation mapplet from Data Engineering or Data Quality.
Ensure the following:
For more information, refer to the Product Availability Matrix.
The Data Processor configuration with DEI or DEQ involves:
Before installing DEI or DEQ and configuring the Data Processor transformation, complete the following steps:
Follow this sequence of steps to configure the Data Processor transformation successfully:
The table below provides a description of each component:
Component | Applicable Platform | Description |
---|---|---|
Domain server platform components | UNIX and Windows | DEI or DEQ platform services. |
Model Repository Service | UNIX and Windows | The Model Repository Service is an application service that manages the Model repository. The Model repository stores metadata created by Informatica clients and application services in a relational database to enable collaboration among the clients and services. |
Data Integration Service | UNIX and Windows | The Data Integration Service is an application service that performs data integration jobs for the Analyst tool, the Developer tool, and external clients. |
Content Management Service | UNIX and Windows | The Content Management Service is an application service that manages reference data. A reference data object contains a set of data values that you can search while performing data quality operations on source data. The Content Management Service also compiles rule specifications into mapplets. A rule specification object describes the data requirements of a business rule in logical terms. |
Monitoring Model Repository Service | UNIX and Windows | The monitoring Model Repository Service is a Model Repository Service that stores statistics for Data Integration Service jobs. You configure the monitoring Model Repository Service in the domain properties. |
PowerCenter Repository Service (existing PowerCenter installations only) | UNIX and Windows | The PowerCenter Repository Service manages the PowerCenter repository. It receives requests from Informatica clients and application services to store or access metadata in the PowerCenter repository. |
PowerCenter Integration Service (existing PowerCenter installations only) | UNIX and Windows | The PowerCenter Integration Service receives requests from PowerCenter client tools to run data integration jobs. It writes results to target databases and writes run-time metadata to the PowerCenter repository. When you create the service, you associate it with a PowerCenter Repository Service. |
Developer tool | Windows only | Client tool required for DEI or DEQ. Use the Developer tool to configure the Data Processor transformation. |
PowerCenter Client (existing PowerCenter installations only) | Windows only | Client tool required for PowerCenter installations. |
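Several components in the table above depend on one another; for example, a Data Integration Service is associated with a Model Repository Service, and a PowerCenter Integration Service with a PowerCenter Repository Service. As a minimal sketch, a topological sort turns such dependencies into one valid setup order. The dependency edges below are an illustrative assumption based on those service associations, not Informatica's documented installation sequence; the code requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# component -> components that must exist first (assumed edges)
DEPENDENCIES = {
    "Domain": [],
    "Model Repository Service": ["Domain"],
    "Data Integration Service": ["Model Repository Service"],
    "Content Management Service": [
        "Data Integration Service",
        "Model Repository Service",
    ],
    "Monitoring Model Repository Service": ["Domain"],
    "Developer tool": ["Model Repository Service", "Data Integration Service"],
    # Existing PowerCenter installations only
    "PowerCenter Repository Service": ["Domain"],
    "PowerCenter Integration Service": ["PowerCenter Repository Service"],
    "PowerCenter Client": ["PowerCenter Repository Service"],
}


def setup_order(dependencies=DEPENDENCIES, include_powercenter=True):
    """Return components in an order where every dependency comes first."""
    graph = {
        name: deps
        for name, deps in dependencies.items()
        if include_powercenter or not name.startswith("PowerCenter")
    }
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

More than one order can satisfy the same dependencies; the sort guarantees only that each component appears after everything it depends on, which is the property that matters when creating services.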