Understanding the recovery options that are available for PowerCenter when errors are encountered during the load.
When a task in the workflow fails at any point, one option is to truncate the target and run the workflow again from the beginning. As an alternative, the workflow can be suspended and the error can be fixed, rather than re-processing the portion of the workflow with no errors. This option, "Suspend on Error", results in accurate and complete target data, as if the session completed successfully with one run. There are also recovery options available for workflows and tasks that can be used to handle different failure scenarios.
For consistent recovery, the mapping needs to produce the same result, and in the same order, in the recovery execution as in the failed execution. This can be achieved by sorting the input data using either the sorted ports option in Source Qualifier (or Application Source Qualifier) or by using a sorter transformation with distinct rows option immediately after source qualifier transformation. Additionally, ensure that all the targets received data from transformations that produce repeatable data.
The recovery strategy can be configured on the Properties page of the Session task. Enable the session for recovery by selecting one of the following three Recovery Strategies:
The Suspend on Error option directs the Integration Service to suspend the workflow while the error is being fixed and then it resumes the workflow. The workflow is suspended when any of the following tasks fail:
When a task fails in the workflow, the Integration Service stops running tasks in the path. The Integration Service does not evaluate the output link of the failed task. If no other task is running in the workflow, the Workflow Monitor displays the status of the workflow as "Suspended."
If one or more tasks are still running in the workflow when a task fails, the Integration Service stops running the failed task and continues running tasks in other paths. The Workflow Monitor displays the status of the workflow as "Suspending." When the status of the workflow is "Suspended" or "Suspending," you can fix the error, such as a target database error, and recover the workflow in the Workflow Monitor. When you recover a workflow, the Integration Service restarts the failed tasks and continues evaluating the rest of the tasks in the workflow. The Integration Service does not run any task that already completed successfully.
If the truncate table option is enabled in a recovery-enabled session, the target table is not truncated during recovery process.
The workflow can be configured to send an email when the Integration Service suspends the workflow. When a task fails, the workflow is suspended, and suspension email is sent. The error can be fixed, and the workflow can be resumed subsequently. If another task fails while the Integration Service is suspending the workflow, another suspension email is not sent. The Integration Service only sends out another suspension email if another task fails after the workflow resumes. Check the "Browse Emails" button on the General tab of the Workflow Designer Edit sheet to configure the suspension email.
When the "Suspend On Error" option is enabled for the parent workflow, the Integration Service also suspends the worklet if a task within the worklet fails. When a task in the worklet fails, the Integration Service stops executing the failed task and other tasks in its path. If no other task is running in the worklet, the status of the worklet is "Suspended". If other tasks are still running in the worklet, the status of the worklet is "Suspending". The parent workflow is also suspended when the worklet is "Suspended" or "Suspending".
The recovery process can be started using Workflow Manager or Workflow Monitor. Alternately, the recovery process can be started by using pmcmd in command line mode or by using a script.
When the Integration Service runs a session that has a resume recovery strategy, it writes to recovery tables on the target database system. When the Integration Service recovers the session, it uses information in the recovery tables to determine where to begin loading data to target tables. If you want the Integration Service to create the recovery tables, grant table creation privilege to the database user name that is configured in the target database connection. If you do not want the Integration Service to create the recovery tables, create the recovery tables manually. The Integration Service creates the following recovery tables in the target database:
PM_RECOVERY - Contains target load information for the session run. The Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.
PM_TGT_RUN_ID - Contains information that the Integration Service uses to identify each target on the database. The information remains in the table between session runs. If you manually create this table, you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully.
PM_REC_STATE - When the Integration Service runs a real-time session that uses the recovery table and that has recovery enabled, it creates a recovery table, PM_REC_STATE, on the target database to store message IDs and commit numbers. When the Integration Service recovers the session, it uses information in the recovery tables to determine if it needs to write the message to the target table. The table contains information that the Integration Service uses to determine if it needs to write messages to the target table during recovery for a real-time session.
If you edit or drop the recovery tables before you recover a session, the Integration Service cannot recover the session. If you disable recovery, the Integration Service does not remove the recovery tables from the target database, and you must manually remove them
The following options affect whether the session is incrementally recoverable:
For recovery to be effective, the recovery session must produce the same set of rows, and in the same order. Any change after initial failure (in mapping, session and/or in the Integration Service) that changes the ability to produce repeatable data, results in inconsistent data during the recovery process. The following situations may produce inconsistent data during a recovery session:
Highly-available recovery allows the workflow to resume automatically in the case of Integration Service failover. The following options are available in the properties tab of the workflow:
One of the recovery enhancements for real-time sessions is the ability to write recovery information to a target queue. This is important for maintaining data integrity during recovery. This capability is an enhancement to using relational targets for recovery.
The recovery information is written to a target queue when a real-time session recovery is enabled for a session that reads from a JMS or WebSphere MQ source and writes to a JMS or WebSphere MQ target. The reader state, the commit number and the message ID that was committed to the target all get stored. When recovery for a session takes place, the Integration Service uses the recovery information to determine where the processing stopped. Also, recovery information is written to a recovery ignore list for failed JMS or WebSphere MQ sessions. This recovery ignore list contains message IDs that the Integration Service wrote to the target for the failed session. Recovery information is written to the recovery ignore list in case the source did not receive the acknowledgement. This recovery ignore list is important for preventing duplicates during recovery. Additionally, partitioned real-time sessions that include JMS or WebSphere MQ sources can be recovered. Finally, the resilience feature is available for the real-time option.