Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
Knowing that all data for the current load cycle has loaded correctly is essential for effective data warehouse management. However, the need for load validation varies depending on the extent of error checking, data validation, and data cleansing functionalities inherent in your mappings. For large data integration projects with thousands of mappings, the task of reporting load statuses becomes overwhelming without a well-planned load validation process.
Methods for validating the load process range from simple to complex. When planning a load validation process, weigh all of the relevant factors to find the correct solution for your project.
Below are descriptions of five possible load validation solutions, presented in order of increasing complexity:
One practical application of the post-session email functionality is the situation in which a key business user waits for a session to complete before running a report. Configure the session to send an email upon successful completion so that the user knows the report can be run. Another practical application is the situation in which a production support analyst needs to be notified immediately of any failures; configure the session to send an email to the analyst upon failure. For round-the-clock support, a pager address that can receive email can be used in place of a regular email address.
Post-session email is configured in the session, under the General tab and ‘Session Commands’.
A number of variables are available to simplify the text of the email:
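For example, a notification body might combine the session-name, status, timing, and row-count variables. The sketch below uses %s (session name), %e (session status), %b (start time), %c (completion time), %l (rows loaded), and %r (rows rejected), which are standard PowerCenter email variables, although availability can vary by version:

    Session %s completed with status: %e
    Started: %b   Completed: %c
    Rows loaded: %l   Rows rejected: %r

For failure notifications, %g can be added to attach the session log so the production support analyst can begin troubleshooting immediately.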
In addition to post-session email messages, the Workflow Manager offers other features to help validate loads. Control, Decision, Event, and Timer tasks can be used to place multiple controls on the behavior of loads. Another solution is to place conditions within links, which connect tasks within a workflow or worklet; use pre-defined or user-defined variables in the link conditions. In the example below, upon the successful completion of both sessions A and B, the PowerCenter Server executes session C.
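For instance, the links from sessions A and B into session C could each carry a condition built on the predefined Status variable (the task names here are illustrative):

    $s_Session_A.Status = SUCCEEDED
    $s_Session_B.Status = SUCCEEDED

Session C runs only when every incoming link condition evaluates to true; if either upstream session fails, its link evaluates to false and session C is skipped.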
PowerCenter Reports (PCR) is a web-based business intelligence (BI) tool, included with every PowerCenter license, that provides visibility into metadata stored in the PowerCenter repository in a manner that is easy to comprehend and distribute. PCR includes more than 130 pre-packaged metadata reports and dashboards delivered through Data Analyzer, Informatica’s BI offering. These pre-packaged reports enable PowerCenter customers to extract extensive business and technical metadata in easy-to-read form, including:
In addition to the 130 pre-packaged reports and dashboards that come standard with PCR, you can develop custom reports and dashboards under the PCR limited-use license, which allows you to source reports from the PowerCenter repository. Examples of custom components that can be created include:
Below is an example of a custom dashboard that gives instant insight into load validation across an entire repository through four custom indicators.
Informatica Metadata Exchange (MX) provides a set of relational views that allow easy SQL access to the PowerCenter repository. The Repository Manager generates these views when you create or upgrade a repository. Almost any query for load-execution metadata can be built against these views. The MX view REP_SESS_LOG is a great place to start, and it is likely to contain all the information you need. The following sample query shows how to extract folder name, session name, session end time, successful rows, and session duration:
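A representative form of that query, assuming an Oracle repository (the date arithmetic converts days to seconds; adjust for other databases) and the standard REP_SESS_LOG columns, is:

    SELECT subject_area                               AS folder_name,
           session_name,
           session_timestamp                          AS session_end_time,
           successful_rows,
           (session_timestamp - actual_start) * 86400 AS duration_in_seconds
    FROM   rep_sess_log
    ORDER  BY session_timestamp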
The output lists one row per session run, showing the folder name, session name, session end time, number of successful rows, and run duration in seconds.
Informatica strongly advises against querying the repository tables directly. Because future versions of PowerCenter are likely to alter the underlying repository tables, PowerCenter supports queries against the unaltered MX views only, not against the repository tables.
A more complex approach, and the most customizable, is to create a PowerCenter mapping that populates a table or flat file with the desired information. You can do this by sourcing the MX view REP_SESS_LOG and then performing lookups against other repository tables or views for additional information.
A sample mapping for this approach selects data from REP_SESS_LOG and performs lookups to retrieve the historical minimum and maximum run times for each session. This enables you to compare the current execution time with those minimum and maximum durations.
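The lookup logic is equivalent to an aggregate query over the same view. As a sketch under the same assumptions as the earlier query, the historical extremes per session could be computed like this:

    SELECT session_name,
           MIN((session_timestamp - actual_start) * 86400) AS min_duration_seconds,
           MAX((session_timestamp - actual_start) * 86400) AS max_duration_seconds
    FROM   rep_sess_log
    GROUP  BY session_name

Comparing the current run against these bounds makes it easy to flag sessions that finish abnormally fast (possibly having loaded little or no data) or abnormally slow.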
Note: Unless you have acquired additional licensing, a customized metadata data mart cannot be a source for a PCR report. However, you can use a business intelligence tool of your choice instead.