Key Management in Data Warehousing Solutions

Last Updated Date: May 26, 2021

Challenge

Key management refers to the technique of managing key allocation in a decision support RDBMS to create a single view of reference data from multiple sources. Informatica recommends an approach to key management that ensures everything extracted from a source system is loaded into the data warehouse.

This Best Practice describes the Informatica-recommended approach to key management. It deviates from many traditional data warehouse solutions, which apply logical and data warehouse (surrogate) key strategies in which errors are logged and transactions are rejected because of referential integrity issues.

Description

Key management in a decision support RDBMS comprises three techniques for handling the following common situations:

Key merging/matching
Missing keys
Unknown keys

All three techniques apply to a Reference Data Store, whereas only missing and unknown keys are relevant to an Operational Data Store (ODS). Key management should be handled at the data integration level, thereby making it transparent to the Business Intelligence layer.

Key Merging/Matching

When companies source data from more than one transaction system of a similar type, the same object may have different, non-unique legacy keys. Additionally, a single key may have several descriptions or attributes in each of the source systems. The independence of these systems can result in incongruent coding, which poses a greater problem than records being sourced from multiple systems.

A business can resolve this inconsistency by undertaking a complete code standardization initiative (often as part of a larger metadata management effort) or by applying a Universal Reference Data Store (URDS). Code standardization requires each object to be uniquely represented in the new system. A URDS, by contrast, contains universal codes for common reference values. Most companies adopt the pragmatic URDS approach while embarking on the longer-term solution of code standardization.

The bottom line is that nearly every data warehouse project encounters this issue and needs to find a solution in the short term.
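
As a rough illustration of the URDS idea, the sketch below maps each (source system, legacy key) pair to a universal code. The system names, codes, and the `to_universal_code` helper are hypothetical and chosen only for illustration; they are not part of any Informatica product.

```python
from typing import Optional

# Hypothetical URDS crosswalk: (source system, legacy key) -> universal code.
# All system names and codes below are made up for illustration.
URDS_CROSSWALK = {
    ("ORDERS_EU", "TT18"): "PROD-0001",
    ("ORDERS_US", "AUDI-TT-18"): "PROD-0001",  # same product, different legacy key
    ("ORDERS_US", "TT18"): "PROD-0002",        # same legacy key, different product
}


def to_universal_code(source_system: str, legacy_key: str) -> Optional[str]:
    """Return the universal code for a legacy key, or None if it is unmapped."""
    return URDS_CROSSWALK.get((source_system, legacy_key))


# Two different legacy keys from two systems resolve to one universal code.
eu_code = to_universal_code("ORDERS_EU", "TT18")
us_code = to_universal_code("ORDERS_US", "AUDI-TT-18")
```

Keying the crosswalk on the source system as well as the legacy key is what lets the same legacy key mean different things in different systems.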

Missing Keys

A problem arises when a transaction arrives without a value in a column where a foreign key should exist (i.e., a reference to a key in a reference table). This normally occurs during the loading of transactional data, although it can also occur when loading reference data into hierarchy structures. In many older data warehouse solutions, this condition would be identified as an error and the transaction row would be rejected. The row would then have to be processed through some other mechanism to find the correct code and be loaded at a later date. This is often a slow and cumbersome process that leaves the data warehouse incomplete until the issue is resolved.

Rejecting rows also produces incorrect reconciliation reports when fact attributes in the data warehouse are compared against transactional operational data. It becomes difficult to report on rejected transactions, to analyze the missing transactional value, and to pinpoint the exact reference attribute responsible. Moreover, a transactional record may contain multiple reference attributes. Rejecting the complete record because of an issue with just one of them is wrong, because the record can still support valid reporting on the remaining reference attributes that have no issues.

A more practical way to resolve this situation is to allocate a special key in place of the missing key, which links it with a dummy 'missing key' row in the related table. This enables the transaction to continue through the loading process and end up in the warehouse without further processing. Furthermore, the row ID of the bad transaction can be recorded in an error log, allowing the addition of the correct key value at a later time.

This approach resolves all of the issues described earlier that are caused by rejecting transactional records with missing referential keys.

Example:

PRODUCT	CUSTOMER	SALES REP	QUANTITY
Audi TT18	Doe10224		1

In the transaction above, there is no code in the SALES REP column. As this row is processed, a dummy sales rep key (9999999) is added to the record to link it to the 'Missing Rep' record in the SALES REP table. A data warehouse key (-8888888) is also added to the transaction.

PRODUCT	CUSTOMER	SALES REP	QUANTITY	DWKEY
Audi TT18	Doe10224	9999999	1	-8888888

The related sales rep record may look like this:

REP CODE	REP NAME	REP MANAGER
1234567	David Jones	Mark Smith
7654321	Mark Smith
9999999	Missing Rep

An error log entry to identify the missing key on this transaction may look like:

ERROR CODE	TABLE NAME	KEY NAME	KEY
MSGKEY	ORDERS	SALES REP	-8888888

This type of error reporting is not usually necessary because the transactions with missing keys can be identified using standard end-user reporting tools against the data warehouse.
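
The missing-key handling in the example above could be sketched as follows. This is a minimal illustration rather than an Informatica mapping: the `resolve_missing_sales_rep` function and the dictionary-based rows are assumptions, while the dummy key, data warehouse key, and error log columns are taken from the example tables.

```python
MISSING_REP_CODE = "9999999"  # dummy 'missing key' row in the SALES REP table


def resolve_missing_sales_rep(row: dict, dw_key: int, error_log: list) -> dict:
    """Return a loadable copy of the row, substituting the dummy key if needed."""
    loaded = dict(row, DWKEY=dw_key)  # copy the row and attach the warehouse key
    if not loaded.get("SALES REP"):
        # Link the transaction to the 'Missing Rep' row instead of rejecting it.
        loaded["SALES REP"] = MISSING_REP_CODE
        error_log.append({
            "ERROR CODE": "MSGKEY",
            "TABLE NAME": "ORDERS",
            "KEY NAME": "SALES REP",
            "KEY": dw_key,
        })
    return loaded


error_log = []
order = {"PRODUCT": "Audi TT18", "CUSTOMER": "Doe10224", "SALES REP": "", "QUANTITY": 1}
loaded_order = resolve_missing_sales_rep(order, dw_key=-8888888, error_log=error_log)
```

The row is always returned for loading; the log entry only records where the correct key still needs to be supplied.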

Unknown Keys

Unknown keys need to be treated much like missing keys, except that the load process has to add the unknown key value to the referenced table to maintain integrity rather than explicitly allocating a dummy key to the transaction. The process also needs to make two error log entries: the first to log the fact that a new and unknown key has been added to the reference table, and the second to record the transaction in which the unknown key was found.
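
A minimal sketch of this process, assuming an in-memory reference table and a simple list as the error log, might look like the following. The `resolve_unknown_key` function, the `placeholder` flag, and the event names are hypothetical.

```python
def resolve_unknown_key(key, reference, txn_id, error_log):
    """Keep the transaction loadable by adding unknown keys to the reference table."""
    if key not in reference:
        # Add a placeholder row so the transaction's key has something to join to.
        reference[key] = {"name": "Unknown", "placeholder": True}
        # First log entry: a new, unknown key was added to the reference table.
        error_log.append(("REFKEY_ADDED", key, None))
    if reference[key].get("placeholder"):
        # Second log entry: the transaction in which the unknown key was found.
        error_log.append(("UNKNOWN_KEY", key, txn_id))
    return key


products = {"TT18": {"name": "Audi TT"}}
log = []
resolve_unknown_key("TT18", products, txn_id=1, error_log=log)  # known key, no log
resolve_unknown_key("Q7", products, txn_id=2, error_log=log)    # adds row, two entries
resolve_unknown_key("Q7", products, txn_id=3, error_log=log)    # row exists, one entry
```

Later transactions that reuse the same unknown key are still logged, but the placeholder row is created only once.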

Simple example:

The Product reference data record might look like the following:

Product_Key	Product_ID	Product_Name	Product_Category	IS_Active	Unit_Price

For this table, you could manually create an entry such as the one below to handle missing Product_IDs.

Product_Key	Product_ID	Product_Name	Product_Category	IS_Active	Unit_Price
0	Unknown	Unknown	Unknown	Unknown	0

When you load a fact table, the lookup is done against the Product_ID field in the Product table. The fact mapping has the following logic after the lookup transformation (on the Product table): IIF(ISNULL(product_id),0,product_key).

This way, the Product_Key attribute in the fact table is stored as ‘0’ for the exceptions, which can later be analyzed in an OLAP cube or a report by joining the fact table with the Product table on Product_Key. Populating these keys manually with the value ‘0’ also ensures that the unknown product key does not conflict with keys generated by sequence generators during the normal dimensional load. If a positive number is used as the key value for unknown keys, there is a slight possibility that, after several years, the sequence will reach that value and the key will be overwritten. Some organizations manually create dimensional/reference records that do not exist in operational systems. In these cases, as a best practice, populate these records with negative key values.
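
For readers less familiar with the expression language, the lookup-with-default logic can be approximated in Python as shown below. The `product_dim` contents and the `lookup_product_key` helper are hypothetical; only the reserved key value 0 comes from the example.

```python
UNKNOWN_PRODUCT_KEY = 0  # reserved key of the manually created 'Unknown' row

# Hypothetical Product dimension lookup: Product_ID -> Product_Key.
# Keys from the normal load are assumed to come from a sequence starting at 1.
product_dim = {"TT18": 101, "Q7": 102}


def lookup_product_key(product_id):
    """Mimic IIF(ISNULL(product_id),0,product_key) after a lookup on Product."""
    return product_dim.get(product_id, UNKNOWN_PRODUCT_KEY)


# One matched ID, one null ID, and one ID that is absent from the dimension.
fact_keys = [lookup_product_key(pid) for pid in ["TT18", None, "X99"]]
```

Both the null and the unmatched Product_ID end up on key 0, so every fact row still joins to a row in the Product table.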

Dealing With Nulls in the Dimensional Model

Most RDBMSs can use a null value to represent an absence of data. However, nulls often confuse both developers and users because the database treats nulls differently from blanks or zeros. This section explains the three major areas where nulls are found in source data and recommends how to handle each situation.

Nulls as Fact Table Foreign Keys

Nulls can appear as fact table foreign keys in the source data for several reasons: the foreign key value is not known at the time of extract, is (correctly) not applicable to the source measurement, or is incorrectly missing from the source extract. Placing a null in a fact table column that is declared as a foreign key to a dimension table undermines referential integrity: in a relational database, null is not equal to anything, including itself, so the row cannot match any dimension record.

In the first case, especially with an accumulating snapshot fact table, we sometimes find columns tracking events that have not yet occurred. For example, in an orders tracking accumulating snapshot, a business might receive an order on the 31st, but not ship until the next month. The fact table’s Ship_Date will not be known when the fact row is first inserted. In this case, Ship_Date is a foreign key to the date dimension table, but will not join as users expect if the value is left as null. That is, any fact reporting from the date table joined on Ship_Date will exclude all orders with a null Ship_Date. To overcome this, Informatica recommends using a surrogate key that joins to a special record in the date dimension table with a description like 'Data not yet available'.
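
The effect on joins can be demonstrated with a small in-memory SQLite database. The table names, column names, and the special key value -1 are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE orders_fact (order_id INTEGER, ship_date_key INTEGER);
INSERT INTO date_dim VALUES (20210601, '2021-06-01'), (-1, 'Data not yet available');
INSERT INTO orders_fact VALUES (1, 20210601), (2, NULL);
""")

join_sql = (
    "SELECT COUNT(*) FROM orders_fact f "
    "JOIN date_dim d ON f.ship_date_key = d.date_key"
)

# With a null Ship_Date key, order 2 silently drops out of the join.
rows_with_null = conn.execute(join_sql).fetchone()[0]

# Point unshipped orders at the special 'Data not yet available' record instead.
conn.execute("UPDATE orders_fact SET ship_date_key = -1 WHERE ship_date_key IS NULL")
rows_with_special_key = conn.execute(join_sql).fetchone()[0]
```

After the update, both orders appear in any report joined on the ship date, and the unshipped order is labeled explicitly rather than omitted.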

Similarly, there are cases when the foreign key is simply not applicable to the fact measurement, such as when promotion is a fact table foreign key, but not every fact row has a promotion associated with it. Here also, a special record is needed in the dimension table with a value such as 'No promotion in effect'.

In the case where the foreign key is incorrectly missing from the source extract, you have a few options. You can assign the row to another special record in the appropriate dimension with a meaningful description such as 'Missing Key', assign it to a more specific record such as 'Missing key for source code #1234', or write the row out to a suspense file. In all cases, the offending row will need to be investigated for a root cause.

Nulls as Facts

In this case, the null value has two potential meanings. Either the value did not exist, or the system failed to capture the value. Either way, generally the value is left as null because most database products will handle nulls properly in aggregate functions including SUM, MAX, MIN, COUNT, and AVG. Substituting a zero instead would improperly skew aggregation calculations.
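
The difference between leaving a fact as null and substituting zero can be seen in a short SQLite sketch. The `sales_fact` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?)",
    [(10.0,), (20.0,), (None,)],  # the third fact value was never captured
)

# Aggregates skip nulls: AVG and COUNT(amount) use only the two known values.
avg_with_null, known_values, total_rows = conn.execute(
    "SELECT AVG(amount), COUNT(amount), COUNT(*) FROM sales_fact"
).fetchone()

# Substituting zero pulls the average down as if a real zero sale had occurred.
avg_with_zero = conn.execute(
    "SELECT AVG(COALESCE(amount, 0)) FROM sales_fact"
).fetchone()[0]
```

Comparing COUNT(amount) with COUNT(*) is also a simple way to see how many fact values are missing.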

Nulls as Dimension Attributes

Generally, dimension attribute nulls are encountered due to timing or dimension sub-setting. For example, perhaps not all the attributes have been captured yet, so there are some unknown attributes for a period of time. Likewise, there may be certain attributes that only apply to a subset of the dimension members. In either case, the same recommendation applies. Putting a null in these fields can be confusing to the user, as it will appear as a blank on reports and pull-down menus, and require special query syntax to find. Instead, Informatica recommends substituting an appropriately descriptive string, for example, 'Unknown' or 'Not provided'.
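
One way to apply this recommendation during the dimension load is sketched below. The attribute names, the per-attribute placeholder mapping, and the `fill_dimension_nulls` helper are hypothetical.

```python
# Hypothetical per-attribute placeholder strings; anything unlisted gets 'Unknown'.
PLACEHOLDERS = {"segment": "Unknown", "loyalty_tier": "Not provided"}


def fill_dimension_nulls(row: dict) -> dict:
    """Replace null attributes with descriptive strings; leave real values alone."""
    return {
        col: PLACEHOLDERS.get(col, "Unknown") if val is None else val
        for col, val in row.items()
    }


customer = {"customer_id": "Doe10224", "segment": None, "loyalty_tier": None}
clean_customer = fill_dimension_nulls(customer)
```

Only true nulls are replaced here, so blanks and zeros that carry meaning in the source are passed through unchanged.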

Handling null/missing data in sales data (or any financial data) in real time in business intelligence permits users to drill down to the problematic data and diagnose it further to improve data quality.

Employing the Informatica-recommended key management strategy produces the following benefits:

All rows can be loaded into the data warehouse
All objects are allocated a unique key
Referential integrity is maintained
Load dependencies are removed