Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
In today’s world, business decisions are dependent on the availability of quality data from a variety of applications and data sources. Data latency is just one aspect of quality data. Data latency becomes especially important in an environment where changes in the data records are captured and replicated to other data sources and applications. In this context, a simple definition of data latency is the amount of time between when the change in the data record was committed and when the changed data record was replicated to the target.
To help measure data latency, organizations will implement Cloud Mass Ingression (CMI) or Database Mass Ingestion (DBMI) with the incremental load process and use service level agreements (SLAs) to formally set their expectations related to overall data latency. Determining data latency can be difficult to accomplish because the frequency of data changes within a given table or set of tables is not predictable. When both the source and target are relational, a Heartbeat application can assist in determining data latency.
From a high-level, a Heartbeat application is a rather simple application. A process inserts a record into the source Heartbeat table record at a predetermined time interval. For example, every 15 minutes. Once the insert is committed, the CMI or DBMI incremental load process will replicate the insert data change record into the target Heartbeat table automatically.
The target Heartbeat table is a historical table so a set of queries or reports can be generated to report the source to target Heartbeat data latency over a specified timeframe. Since the Heartbeat application uses the same source database, CMI or DBMI incremental load process and target database as the other tables, it’s data latency will be very similar to the data latency experienced by the tables. In addition, this data latency information can be used to assist in determining realistic SLAs and alerting the operations staff when the replication process is not performing as expected. An alert example might be notifying the operations staff when the target Heartbeat table has not processed any new data change records in the past 45 - 60 minutes.
To implement a simple Heartbeat application the following activities or tasks need to be completed.
Column Name |
Description |
REC_TYPE |
This column identifies the record as a Heartbeat record. |
SRC_DBNAME |
This column contains an acronym that identifies the source database. |
LINE_OF_BUSINESS |
This column contains an acronym that identifies the line of business associated with the source database. |
SRC_TIMESTAMP |
This column contains the system timestamp when the process that inserted this record was executed. |
SRC_DBTIMESTAMP |
This column contains the source database timestamp when the Heartbeat record was inserted into the source database. |
TGT_DBTIMESTAMP |
This column will contain the database timestamp when the Heartbeat record was inserted into the target database. |
CREATE TABLE schema.HEARTBEAT_TABLE (
REC_TYPE VARCHAR2 (9) NOT NULL,
SRC_DBNAME VARCHAR2 (10) NOT NULL,
LINE_OF_BUSINESS VARCHAR2 (10) NOT NULL,
SRC_TIMESTAMP TIMESTAMP NOT NULL,
SRC_DBTIMESTAMP TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
TGT_DBTIMESTAMP TIMESTAMP,
CONSTRAINT HEARTBEAT_PK
PRIMARY KEY (REC_TYPE, SRC_DBNAME, LINE_OF_BUSINESS , SRC_TIMESTAMP));
COMMIT;
ALTER TABLE schema.HEARTBEAT_TABLE
MODIFY TGT_DBTIMESTAMP TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL;
COMMIT;
Below are column values that this process will need to set.
Column Name |
Description |
REC_TYPE |
Set the value to “HEARTBEAT” which identifies this as a Heartbeat record. |
SRC_DBNAME |
Set the value to the acronym that identifies the source database. |
LINE_OF_BUSINESS |
Set the value to the acronym that identifies the line of business. |
SRC_TIMESTAMP |
Set this value to the system timestamp when the process that updates this record was executed. |
SRC_DBTIMESTAMP |
Set this column to null. By default, this column will be set to the current timestamp for the source database. |
TGT_DBTIMESTAMP |
Set this column to null. |
Column Name |
Description |
REC_TYPE |
This column identifies the record as a Heartbeat record. |
SRC_DBNAME |
This column contains an acronym that identifies the source database. |
LINE_OF_BUSINESS |
This column contains an acronym that identifies the line of business associated with the source database. |
SRC_TIMESTAMP |
This column contains the system timestamp when the process that updates this record was executed. |
SRC_DBTIMESTAMP |
This column contains the source database timestamp when the source record was inserted into the source database. |
TGT_DBTIMESTAMP |
This column will contain the database timestamp when the Heartbeat record was inserted into the target database. |
LATENCY_ BTW_TGT_AND_SRC |
Difference between TGT_DBTIMESTAMP and SRC_TIMESTAMP. Format should be mmmm.ss. |
CREATE VIEW HEARTBEAT_VIEW (
REC_TYPE,
SRC_DBNAME,
LINE_OF_BUSINESS,
SRC_TIMESTAMP,
SRC_DBTIMESTAMP,
TGT_DBTIMESTAMP,
TGT_DBTIMESTAMP - SRC_TIMESTAMP)
AS SELECT DATA_SOURCE,
REC_TYPE,
SRC_DBNAME,
LINE_OF_BUSINESS,
SRC_TIMESTAMP,
SRC_DBTIMESTAMP,
TGT_TIMESTAMP,
LATENCY_ BTN_TGT_AND_SRC
FROM schema.HEARTBEAT_TABLE
ORDERED BY (REC_TYPE, SRC_DBNAME, LINE_OF_BUSINESS, SRC_TIMESTAMP);
Success
Link Copied to Clipboard