• Success
    Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs
    Success
    Accelerate your Purchase to Value engaging with Informatica Architects for Customer Success
  • Communities
    A collaborative platform to connect and grow with like-minded Informaticans across the globe
    Communities
    Connect and collaborate with Informatica experts and champions
    Have a question? Start a Discussion and get immediate answers you are looking for
    Customer-organized groups that meet online and in-person. Join today to network, share ideas, and get tips on how to get the most out of Informatica
  • Knowledge Center
    Troubleshooting documents, product guides, how to videos, best practices, and more
    Knowledge Center
    One-stop self-service portal for solutions, FAQs, Whitepapers, How Tos, Videos, and more
    Video channel for step-by-step instructions to use our products, best practices, troubleshooting tips, and much more
    Information library of the latest product documents
    Best practices and use cases from the Implementation team
  • Learn
    Rich resources to help you leverage full capabilities of our products
    Learn
    Role-based training programs for the best ROI
    Get certified on Informatica products. Free, Foundation, or Professional
    Free and unlimited modules based on your expertise level and journey
    Self-guided, intuitive experience platform for outcome-focused product capabilities and use cases
  • Resources
    Library of content to help you leverage the best of Informatica products
    Resources
    Most popular webinars on product architecture, best practices, and more
    Product Availability Matrix statements of Informatica products
    Monthly support newsletter
    Informatica Support Guide and Statements, Quick Start Guides, and Cloud Product Description Schedule
    End of Life statements of Informatica products
Last Updated Date Jun 09, 2021 |

Challenge

In today’s world, business decisions are dependent on the availability of quality data from a variety of applications and data sources. Data latency is just one aspect of data quality. Data latency becomes especially important in an environment where changes in the data records are captured and replicated to other data sources and applications. In this context, a simple definition of data latency is the amount of time between when the change in the data record was committed and when the changed data record was replicated to the target.

Description

To minimize data latency, organizations will implement Change Data Capture (CDC) in a near real-time fashion and use service level agreements (SLAs) to formally set their expectations related to overall data latency. Determining data latency can be difficult to accomplish because the frequency of data changes within a given table or set of tables is not predictable. When both the source and target are relational, a Heartbeat application can assist in determining data latency.

 

From a high-level, a Heartbeat application is a rather simple application which involves the following:

  1. An IICS or CDI mapping or other process will update a timestamp column in a source Heartbeat table record at a predetermined time interval. For example, every 15 minutes. Once the change is committed, the CDC process will capture the data change record.
  2. The existing IICS or CDI CDC mappings that consume the CDC data records from the same source database will also contain a Heartbeat CDC pipeline to consume the CDC records from the source Heartbeat table. In the Heartbeat CDC pipeline, all of the CDC records will be inserted into the target table after populating some additional columns. These columns will be used to determine overall data latency from the source to the target table.

 

The target Heartbeat table is a historical table so a set of queries or reports can be generated to report the source to target Heartbeat data latency over a specified timeframe. Since the Heartbeat application uses the same source database, IICS or CDI CDC mappings, and target database as the other CDC tables, it’s data latency will be very similar to the data latency experienced by the other CDC tables. In addition, this data latency information can be used to assist in determining realistic SLAs and alerting the operations staff when the CDC process is not performing as expected. An alert example might be notifying the operations staff when the target Heartbeat table has not processed any new CDC records in the past 45 - 60 minutes.

 

To implement a simple Heartbeat application the following activities or tasks need to be completed.

  • Each of the source databases using CDC should have a Heartbeat table that will be used to help determine data latency. Note that the primary key for this table is a composite primary key. This source Heartbeat table will only contain a single record that will be UPDATED at a predefined time interval, for example every 15 minutes, by another process.

Description of Source Heartbeat Table Columns

Column Name

Description

REC_TYPE

This column identifies the record as a Heartbeat record.

SRC_DBNAME

This column contains an acronym that identifies the source database.

LINE_OF_BUSINESS

This column contains an acronym that identifies the line of business associated with the source database.

SRC_TIMESTAMP

This column contains the system timestamp when the process that updates this record was executed.

SRC_DBTIMESTAMP

This column contains the source database timestamp when the source record was updated.


Sample Source Heartbeat Table DDL

CREATE TABLE schema.SRC_HEARTBEAT (
     REC_TYPE VARCHAR2 (9) NOT NULL,
     SRC_DBNAME VARCHAR2 (10) NOT NULL,
     LINE_OF_BUSINESS VARCHAR2 (10) NOT NULL,
     SRC_TIMESTAMP TIMESTAMP NOT NULL,
     SRC_DBTIMESTAMP TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL);
COMMIT;
CREATE INDEX SRC_HEARTBEAT_INDEX
     ON schema.SRC_HEARTBEAT (REC_TYPE, SRC_DBNAME, LINE_OF_BUSINESS,
             SRC_TIMESTAMP);
COMMIT;

  • A target Heartbeat table needs to be created in each of the CDC target databases being used. Note that the primary key for this table is a composite primary key. All of the source Heartbeat change records captured by the CDC process will be INSERTED into this target Heartbeat table. The records in this table will be used to determine the amount of data latency between the source and target databases in addition to other points in the process.

Description of Target Heartbeat Table Columns

Column Name

Description

REC_TYPE

This column identifies the record as a Heartbeat record.

SRC_DBNAME

This column contains an acronym that identifies the source database.

LINE_OF_BUSINESS

This column contains an acronym that identifies the line of business associated with the source database.

SRC_TIMESTAMP

This column contains the system timestamp when the process that updates this record was executed.

SRC_DBTIMESTAMP

This column contains the source database timestamp when the source record was updated.

CAPX_TIMESTAMP

This column contains the database timestamp when the update to the source Heartbeat table was committed.

CDC_MAP_NAME

This column should contain the first 20 characters of the IICS or CDI CDC map name that processed the CDC Heartbeat record. In some environments, there may be more than one IICS or CDI CDC mappings consuming the CDC data records from the same source database. In these situations, there should be a Heartbeat CDC pipeline in each of these mappings and the contents of the CDC_MAP_NAME will contain the IICS or CDI CDC map name.

CDC_MAP_TYPE

This column contains the type of IICS or CDI CDC session.

CDC_ MAP_TIMESTAMP

This column contains the current system timestamp when the CDC record was processed/consumed by the IICS or CDI CDC mapping.

TGT_DBTIMESTAMP

This column will contain the database timestamp when the Heartbeat record was inserted into the target database.


Sample Target Table DDL

CREATE TABLE schema.TGT_HEARTBEAT (
     REC_TYPE VARCHAR2 (9) NOT NULL,
     SRC_DBNAME VARCHAR2 (10) NOT NULL,
     LINE_OF_BUSINESS VARCHAR2 (10) NOT NULL,
     SRC_TIMESTAMP TIMESTAMP NOT NULL,
     SRC_DBTIMESTAMP TIMESTAMP NOT NULL,
     CAPX_TIMESTAMP TIMESTAMP NOT NULL,
     CDC_MAP_NAME VARCHAR2 (20) NOT NULL,
     CDC_MAP_TYPE VARCHAR2 (3) NOT NULL,
     CDC_MAP_TIMESTAMP TIMESTAMP NOT NULL,
     TGT_DBTIMESTAMP TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
CONSTRAINT TGT_HEARTBEAT_PK
     PRIMARY KEY (REC_TYPE, SRC_DBNAME, LINE_OF_BUSINESS , SRC_TIMESTAMP,
          CAPX_TIMESTAMP, CDC_MAP_NAME));
COMMIT;

  • A CDC Change Registration will need to be created for the source Heartbeat table. A CDC Change Registration identifies that all data changes for this table should be captured.
  • A process will need to be created that will update the single record in the source Heartbeat table at a predefined time interval. For example, every 15 minutes. This ensures that at least one data change record will be flowing through the IICS or CDI CDC mappings every 15 minutes.

Below are column values that this process will need to set.

 

Column Name

Description

REC_TYPE

Set the value to “HEARTBEAT” which identifies this as a Heartbeat record.

SRC_DBNAME

Set the value to the acronym that identifies the source database.

LINE_OF_BUSINESS

Set the value to the acronym that identifies the line of business.

SRC_TIMESTAMP

Set this value to the system timestamp when the process that updates this record was executed.

SRC_DBTIMESTAMP

Set this column to null. By default, this column will be set to the current timestamp for the source database.

 

  • A Heartbeat CDC pipeline will need to be added into each of the IICS or CDI CDC mappings that will INSERT the Heartbeat CDC records into the target database. It is important to remember to change the target table Operation property so that all of the CDC records are treated as INSERTs.

Below are column values that this process will need to set.

 

Column Name

Description

REC_TYPE

Send the source content direct to the target without any changes.

SRC_DBNAME

Send the source content direct to the target without any changes.

LINE_OF_BUSINESS

Send the source content direct to the target without any changes.

SRC_TIMESTAMP

Send the source content direct to the target without any changes.

SRC_DBTIMESTAMP

Send the source content direct to the target without any changes.

CAPX_TIMESTAMP

The source of the data for this column is DTL__CAPXTIMESTAMP which contain the timestamp of when the change data records were committed to the source Heartbeat table. The DTL__CAPXTIMESTAMP is in a string format which needs to be converted to a valid timestamp data type format.

CDC_MAP_NAME

This column contains the first 20 characters of the IICS or CDI CDC map name that is processing the source CDC data record.

CDC_MAP_TYPE

Type of IICS or CDI CDC session. Valid values are RT for real-time CDC and BAT for batch CDC.

CDC_ MAP_TIMESTAMP

Current system timestamp when the CDC record entered the IICS or CDI CDC mapping.

TGT_DBTIMESTAMP

Set this column to null. By default, this column will be set to the current timestamp for the target database.

 

  • To facilitate the reporting on the Heartbeat CDC target table, a view should be created that includes the calculation of data latency between the various timestamp columns.

 Description of the Target Heartbeat Reporting View Columns

Column Name

Description

REC_TYPE

This column identifies the record as a Heartbeat record.

SRC_DBNAME

This column contains an acronym that identifies the source database.

LINE_OF_BUSINESS

This column contains an acronym that identifies the line of business associated with the source database.

SRC_TIMESTAMP

This column contains the system timestamp when the process that updates this record was executed.

SRC_DBTIMESTAMP

This column contains the source database timestamp when the source record was updated.

CAPX_TIMESTAMP

This column contains the source database timestamp when the source update record was committed.

CDC_MAP_NAME

This column contains the first 20 characters of the IICS or CDI CDC map name that is processing the source CDC data record.

CDC_MAP_TYPE

This column contains the type of IICS or CDI CDC session.

CDC_ MAP_TIMESTAMP

This column contains the current system timestamp when the CDC record was processed/consumed by the IICS or CDI CDC mapping.

TGT_DBTIMESTAMP

This column will contain the database timestamp when the Heartbeat record was inserted into the target database.

LATENCY_ BTW_CAPX_AND_SRC

Difference between CAPX_TIMESTAMP and SRC_TIMESTAMP. Format should be mmmm.ss.

LATENCY_ BTW_MAP_AND_CAPX

Difference between MAP_TIMESTAMP and CAPX_TIMESTAMP. Format should be mmmm.ss.

LATENCY_ BTW_TGT_AND_MAP

Difference between TGT_TIMESTAMP and MAP_TIMESTAMP. Format should be mmmm.ss.

LATENCY_ BTW_TGT_AND_SRC

Difference between TGT_DBTIMESTAMP and SRC_TIMESTAMP. Format should be mmmm.ss.


Sample Target Table View DDL for Reporting

CREATE VIEW TGT_HEARTBEAT_VIEW (
     REC_TYPE,
     SRC_DBNAME,
     LINE_OF_BUSINESS,
     SRC_TIMESTAMP,
     SRC_DBTIMESTAMP,
     CAPX_TIMESTAMP,
     CDC_MAP_NAME,
     CDC_MAP_TYPE,
     CDC_MAP_TIMESTAMP,
     TGT_DBTIMESTAMP,
     CAPX_TIMESTAMP - SRC_TIMESTAMP,
     MAP_TIMESTAMP - CAPX_TIMESTAMP,
     TGT_DBTIMESTAMP - MAP_TIMESTAMP,
     TGT_DBTIMESTAMP - SRC_TIMESTAMP)
AS SELECT DATA_SOURCE,
     REC_TYPE,
     SRC_DBNAME,
     LINE_OF_BUSINESS,
     SRC_TIMESTAMP,
     SRC_DBTIMESTAMP,
     CAPX_TIMESTAMP,
     CDC_MAP_NAME,
     CDC_MAP_TYPE,
     CDC_MAP_TIMESTAMP,
     TGT_TIMESTAMP,
     LATENCY_ BTW_CAPX_AND_SRC,
     LATENCY_ BTW_MAP_AND_CAPX,
     LATENCY_ BTW_TGT_AND_MAP,
     LATENCY_ BTN_TGT_AND_SRC
FROM schema.TGT_HEARTBEAT
     ORDERED BY (REC_TYPE, SRC_DBNAME, LINE_OF_BUSINESS, SRC_TIMESTAMP);
COMMIT; 

  • A set of queries or reports will need to be created showing the data latency detail records or averages over a specified time period.
  • An IICS or CDI mapping/taskflow or another process will need to be created that will purge obsolete records from the target Heartbeat tables.

 

Table of Contents

Success

Link Copied to Clipboard