• Success
    Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs
    Success
    Accelerate your Purchase to Value engaging with Informatica Architects for Customer Success
  • Communities
    A collaborative platform to connect and grow with like-minded Informaticans across the globe
    Communities
    Connect and collaborate with Informatica experts and champions
    Have a question? Start a Discussion and get immediate answers you are looking for
    Customer-organized groups that meet online and in-person. Join today to network, share ideas, and get tips on how to get the most out of Informatica
  • Knowledge Center
    Troubleshooting documents, product guides, how to videos, best practices, and more
    Knowledge Center
    One-stop self-service portal for solutions, FAQs, Whitepapers, How Tos, Videos, and more
    Video channel for step-by-step instructions to use our products, best practices, troubleshooting tips, and much more
    Information library of the latest product documents
    Best practices and use cases from the Implementation team
  • Learn
    Rich resources to help you leverage full capabilities of our products
    Learn
    Role-based training programs for the best ROI
    Get certified on Informatica products. Free, Foundation, or Professional
    Free and unlimited modules based on your expertise level and journey
    Self-guided, intuitive experience platform for outcome-focused product capabilities and use cases
  • Resources
    Library of content to help you leverage the best of Informatica products
    Resources
    Most popular webinars on product architecture, best practices, and more
    Product Availability Matrix statements of Informatica products
    Monthly support newsletter
    Informatica Support Guide and Statements, Quick Start Guides, and Cloud Product Description Schedule
    End of Life statements of Informatica products
Last Updated Date May 26, 2021 |

Challenge

Key management refers to the technique that manages key allocation in a decision support RDBMS to create a single view of reference data from multiple sources. Informatica recommends a concept of key management that ensures loading everything extracted from a source system into the data warehouse.

This Best Practice describes the Informatica recommended approach of key management, an approach that deviates from many traditional data warehouse solutions that apply logical and data warehouse (surrogate) key strategies where errors are loaded and transactions rejected for referential integrity issues.

Description

Key management in a decision support RDBMS comprises three techniques for handling the following common situations: 

  • Key merging/matching
  • Missing keys
  • Unknown keys

All three methods are applicable to a Reference Data Store, whereas only the missing and unknown keys are relevant for an Operational Data Store (ODS). Key management should be handled at the data integration level, thereby making it transparent to the Business Intelligence layer.

Key Merging/Matching

When companies source data from more than one transaction system of a similar type, the same object may have different, non-unique legacy keys. Additionally, a single key may have several descriptions or attributes in each of the source systems. The independence of these systems can result in incongruent coding, which poses a greater problem than records being sourced from multiple systems.

A business can resolve this inconsistency by undertaking a complete code standardization initiative (often as part of a larger metadata management effort) or applying a Universal Reference Data Store (URDS). Standardizing code requires an object to be uniquely represented in the new system. Alternatively, URDS contains universal codes for common reference values. Most companies adopt this pragmatic approach, while embarking on the longer term solution of code standardization.

The bottom line is that nearly every data warehouse project encounters this issue and needs to find a solution in the short term.

Missing Keys

A problem arises when a transaction is sent through without a value in a column where a foreign key should exist (i.e., a reference to a key in a reference table). This normally occurs during the loading of transactional data, although it can also occur when loading reference data into hierarchy structures. In many older data warehouse solutions, this condition would be identified as an error and the transaction row would be rejected. The row would have to be processed through some other mechanism to find the correct code and loaded at a later date. This is often a slow and cumbersome process that leaves the data warehouse incomplete until the issue is resolved. This approach also results in incorrect reconciliation reports comparing fact attributes in the Data Warehouse against transactional Operational data. With this approach, reporting on rejected transactions, analyzing missing transactional value and to pin point the exact reference attribute responsible in causing this would become a challenge. It might be possible that a transactional record contains multiple reference attributes. To reject the complete record due to an issue with just one of the reference attribute is not correct as that transactional record can still be used for valid reporting on remaining reference attributes that have no issues.

A more practical way to resolve this situation is to allocate a special key in place of the missing key, which links it with a dummy 'missing key' row in the related table. This enables the transaction to continue through the loading process and end up in the warehouse without further processing. Furthermore, the row ID of the bad transaction can be recorded in an error log, allowing the addition of the correct key value at a later time.

This approach will resolve all the issues stated earlier caused due by the rejection of the transactional record due to missing referential keys.

Example:

PRODUCT

CUSTOMER

SALES REP

QUANTITY

Audi TT18     

Doe10224         

 

 1

In the transaction above, there is no code in the SALES REP column. As this row is processed, a dummy sales rep key (UNKNOWN) is added to the record to link to a record in the SALES REP table. A data warehouse key (-8888888) is also added to the transaction.

PRODUCT

CUSTOMER

SALES REP

QUANTITY

DWKEY

Audi TT18     

Doe10224    

9999999

 1

-8888888

The related sales rep record may look like this:

REP CODE

REP NAME

REP MANAGER

1234567

David Jones

Mark Smith

7654321

Mark Smith

 

9999999

Missing Rep

 

An error log entry to identify the missing key on this transaction may look like:

ERROR CODE

TABLE NAME 

KEY NAME

KEY

MSGKEY

ORDERS

SALES REP

 -8888888

This type of error reporting is not usually necessary because the transactions with missing keys can be identified using standard end-user reporting tools against the data warehouse.

Unknown Keys

Unknown keys need to be treated much like missing keys except that the load process has to add the unknown key value to the referenced table to maintain integrity rather than explicitly allocating a dummy key to the transaction. The process also needs to make two error log entries. The first, to log the fact that a new and unknown key has been added to the reference table and a second to record the transaction in which the unknown key was found.

Simple example:

The Product reference data record might look like the following:

Product_Key

Product_ID

Product_Name

Product_Category

IS_Active

Unit_Price

For this table, you could have a manually create an entry as below to handle missing Product_IDs.

Product_Key

Product_ID

Product_Name

Product_Category

IS_Active

Unit_Price

0

Unknown

Unknown

Unknown

Unknown

0

When you load a fact table, the lookup will be done against Product_ID field in the Product table. The fact mapping will have the below logic after the lookup transformation (on Product table): IIF(ISNULL(product_id),0,product_key).

This way, Product_Key attribute in the fact table will be stored as ‘0’ for the exceptions that can later be analyzed in an OLAP cube or a report by joining the fact table with the Product table using the Product_Key.  Also, these keys can be populated manually with a value ‘0’ to ensure that the product key doesn’t have any conflicts with keys generated during the normal dimensional load using sequence generators. If we use Positive number key value for Unknown keys, there is a slight possibility that Key value may get overwritten after several years. Some organizations create dimensional/reference records manually that do not exist in operational systems. In these cases, as a best practice, these records are to be populated with a negative key value. 

Dealing With Nulls in the Dimensional Model

Most RDBMS can use null value to represent an absence of data. But, most of the time nulls can confuse both developers and users because the database treats nulls differently from blanks or zeros. This section explains the three major areas where nulls are found in source data and recommends how to handle each situation.

Nulls as Fact Table Foreign Keys

This situation can be found in the source data for several reasons. Either the foreign key value is not known at the time of extract, is (correctly) not applicable to the source measurement, or is incorrectly missing from the source extract. Obviously, referential integrity is violated if a null is placed in a fact table column that is declared as a foreign key to a dimension table, because in a relational database, null is not equal to itself.

In the first case, especially with an accumulating snapshot fact table, we sometimes find columns tracking events which have not yet occurred. For example, in an orders tracking accumulating snapshot, a business might receive an order on the 31st, but not ship until the next month. The fact table’s Ship_Date will not be known when the fact row is first inserted. In this case, Ship_Date is a foreign key to the date dimension table, but will not join as users expect if the value is left as null. That is, any fact reporting from the date table joined on Ship_Date will exclude all orders with a null Ship_Date. To overcome this, Informatica recommends using a surrogate key, which joins to a special record in the date dimension table with a description like Data not yet available.

Similarly, there are cases when the foreign key is simply not applicable to the fact measurement, such as when promotion is a fact table foreign key, but not every fact row has a promotion associated with it. Here also, a special record is needed in the dimension table with a value such as No promotion in effect.

In the case where the foreign key is missing from the source extract when it shouldn’t be, you have a few options. You can assign it to another special record in the appropriate dimension with a meaningful description such as, Missing Key, or assign a specific record such as Missing key for source code #1234, or write the row out to a suspense file. In all cases, the offending row will need to be investigated for a root cause.

Nulls as Facts

In this case, the null value has two potential meanings. Either the value did not exist, or the system failed to capture the value. Either way, generally the value is left as null because most database products will handle nulls properly in aggregate functions including SUM, MAX, MIN, COUNT, and AVG. Substituting a zero instead would improperly skew aggregation calculations.

Nulls as Dimension Attributes

Generally dimension attribute nulls are encountered due to timing or dimension sub-setting. For example, perhaps not all the attributes have been captured yet, so there are some unknown attributes for a period of time. Likewise, there may be certain attributes that only apply to a subset of the dimension members. In either case, the same recommendation applies. Putting a null in these fields can be confusing to the user, as it will appear as a blank on reports and pull-down menus, and require special query syntax to find. Instead, Informatica recommends substituting an appropriately descriptive string for example, Unknown or Not provided.

Handling null/missing data in Sales data (or any financial data) in real-time in business intelligence allows permits users to drill-down to the problematic data and diagnose further to improve the data quality.

Employing the Informatica recommended key management strategy produces the following benefits:

  • All rows can be loaded into the data warehouse
  • All objects are allocated a unique key
  • Referential integrity is maintained
  • Load dependencies are removed

Table of Contents

Success

Link Copied to Clipboard