• Success
    Manage your Success Plans and Engagements, gain key insights into your implementation journey, and collaborate with your CSMs
    Success
    Accelerate your Purchase to Value engaging with Informatica Architects for Customer Success
  • Communities
    A collaborative platform to connect and grow with like-minded Informaticans across the globe
    Communities
    Connect and collaborate with Informatica experts and champions
    Have a question? Start a Discussion and get immediate answers you are looking for
    Customer-organized groups that meet online and in-person. Join today to network, share ideas, and get tips on how to get the most out of Informatica
  • Knowledge Center
    Troubleshooting documents, product guides, how to videos, best practices, and more
    Knowledge Center
    One-stop self-service portal for solutions, FAQs, Whitepapers, How Tos, Videos, and more
    Video channel for step-by-step instructions to use our products, best practices, troubleshooting tips, and much more
    Information library of the latest product documents
    Best practices and use cases from the Implementation team
  • Learn
    Rich resources to help you leverage full capabilities of our products
    Learn
    Role-based training programs for the best ROI
    Get certified on Informatica products. Free, Foundation, or Professional
    Free and unlimited modules based on your expertise level and journey
    Self-guided, intuitive experience platform for outcome-focused product capabilities and use cases
  • Resources
    Library of content to help you leverage the best of Informatica products
    Resources
    Most popular webinars on product architecture, best practices, and more
    Product Availability Matrix statements of Informatica products
    Monthly support newsletter
    Informatica Support Guide and Statements, Quick Start Guides, and Cloud Product Description Schedule
    End of Life statements of Informatica products
Last Updated Date May 25, 2021 |

Designing the Data Model

This document describes what implementers must be aware of before building the data model for an Informatica MDM Hub implementation project. It is recommended for all implementers and anyone else who must understand the Master Reference Manager data model. To learn more about the data model, see the Informatica MDM Hub Administrator’s Guide.

Note: This document assumes  the user is familiar with conventional data modeling methodologies; it supplements conventional data modeling techniques with MDM Hub-specific recommendations.

About Data Modeling for MDM

Data modeling for MDM is a combination of:

  • Standard logical data modeling theory,
  • Product functionality and performance considerations and guidelines, and
  • Common patterns and practitioner techniques.

Data modelers and design consultants responsible for defining the data model for the MDM Hub require expertise in relational modeling at the conceptual, logical, and physical levels. The following sections introduce the various types of models necessary to develop an Informatica MDM Hub implementation: Data Model Design Deliverables, Conceptual Model, Logical Model, and Physical Model.

Data Model Design Deliverables

The process of designing the data model for consolidated reference data for an Informatica MDM Hub implementation involves a series of deliverables.

  • The design generally starts with a conceptual model, which identifies the main objects to be managed in the MDM Hub. It also identifies which objects will be consolidated, because match criteria ultimately drive modeling decisions for the physical model.
  • The conceptual model is used as the starting point for the logical model, which provides a logical representation of the entities and attributes to be managed in the MDM Hub.
  • The logical model is transformed into a physical model, which is the model that is then defined in the MDM Hub using the Schema Manager in the Hub Console. Transitioning from a logical model to an ideal MDM Hub physical model involves design principles that are described in “Design Principles” later in this document. The physical model is the final output from the data modeling design steps, and it is the model that the business and system owners need to approve.

The following figure shows the increasing level of detail and number of entities in conceptual, logical, and physical models.

mdm-hub-model-design-1.jpg

Conceptual Model

The purpose of the conceptual model is to identify and describe the main objects needed to create a global business view of the data, with little detail. This step is often skipped in typical IT projects, or it might be combined with the logical model. However, for Informatica MDM Hub implementations, it is very important to go through this step because it starts the process of thinking about match requirements, which impact the physical model design.

The conceptual model for an Informatica MDM Hub implementation shows the business entities that will need to be managed in the MDM Hub, along with the relationships among the business entities and some high-level design properties. The conceptual model might look similar to entity relationship diagrams (ERDs). To facilitate logical and physical (or logical to physical) data model design, the Match and Merge and Inter-table Match Parent properties are the most critical properties to identify (to learn more, see the Informatica MDM Hub Administrator’s Guide). One approach is to begin with the worst case match scenario, determine the elements in the token match table, and then trim this down to the tables that would be realistically used for matching.

The following figure shows an example of a conceptual data model.

mdm-hub-model-design-2

The conceptual model must be derived from the system requirements, with inputs from analyses of internal and external business system data sources.

Note: For some projects, a pre-existing logical data model might be available. In such cases, it is still important to create a conceptual data model to ensure that the Match and Merge requirements—that can have a significant impact on the subsequent physical data model—have been identified.

Logical Model

The purpose of building a logical model is to confirm that the application will satisfy the business requirements. A logical model represents the entities, relationships, and attributes that are representative of the business information needs. A logical model should include only the objects that will be stored in the Hub (not the full Enterprise model), and is usually a normalized model. Normalization is the process of determining stable attribute groupings in entities with high interdependency and affinity.

The basic normalization principles that should be considered in developing the logical data model include:

  • 3rd Normal Form
    • No repeating fields
    • Non-key attributes fully dependent on primary key
    • All attributes directly dependent on primary key
  • No transitive dependencies
  • Resolution of many-to-many relationships with intersection tables
  • No fields that can be calculated from others

By defining entities, attributes, and their relationships, the user might discover data model design flaws that could produce anomalies. Data flaws include:

  • Missing entities
  • Multiple entities that represent the same conceptual entity
  • Many-to-many relationships that need additional entities to resolve the many-to-many relationship by creating an intersection table, thus turning the many-to-many relationship into two one-to-many relationships
  • Multivalued and redundant attributes

Example Logical Model with Design Flaws

The following figure shows an example of a logical model that has some design flaws.

mdm-hub-model-design-3

This logical model is based on the previous conceptual model example shown in the figure in Conceptual Model. It has the following design flaws:

  • Affiliation Role probably needs a Lookup table to define the different types of Affiliation Roles (missing entity).
  • Repeating attributes (phone numbers, fax numbers, email addresses) can be normalized into an Electronic Address entity.

Example: Logical Model with Fixed Design Flaws

The following figure shows the logical model after it has been fully normalized and missing entities have been added.

 mdm-hub-model-design-4

The logical model includes the following new entities:

  • An Electronic Address entity has been added to handle the repeating phone and fax number attributes (which have therefore been removed from the Customer Address intersection table).
  • An Electronic Address Type table has been added to provide definitions for the types of electronic address represented in each record.
  • An Affiliation Role lookup table has been added.

Pre-Existing and New Logical Models

Before considering how the logical model will transition to a physical model, it is important to get the logical model right. In some Informatica MDM Hub implementations, such as those for C360 orS360, a pre-defined logical model is available. In such situations, it is necessary to evaluate the logical model to make sure that:

  • it meets the stated business needs
  • it makes logical sense
  • the entities and attributes in the logical model can be populated from the source systems (there is small point in modeling entities or attributes that cannot be populated from the source systems)

In other Informatica MDM Hub implementations, it is important to define the logical model from scratch. In such cases, the logical model can be defined in a way that suits the business needs and is more closely aligned with the models for which the MDM Hub is tuned.

Objects in the Logical Model

When modeling for the MDM Hub, the logical model must focus on the actual entities that will be defined in the MDM Hub as base objects.

Type of Object

Description

Base Objects

Used to describe central business entities, such as Customer, Product, or Employee. In a base object, data from multiple sources can be consolidated or merged. Trust settings are used to determine the most reliable value for each base object cell. In addition, one-to-many relationships (foreign keys) can be defined between base objects.


Do not model history, cross-references, etc., as the MDM Hub automatically creates and manages these structures. In addition, avoid adding landing tables or staging tables to the logical model, because they clutter the model. Model landing tables as part of the physical model.

Remember that the logical model is not an enterprise-wide data model. The logical model is a model for reference data only, and it is usually only for a specific subset of the reference data (such as Customer data). Similarly, do not include transaction data in the logical model, and limit the model to the reference data that is to be managed in the MDM Hub. Finally, be aware that the physical model—not the logical model—is the actual model that will be implemented for the MDM Hub.

Physical Model

The physical model is the actual model being defined using the Schema Manager in the Hub Console (to learn more, see the Informatica MDM Configuration Guide). Thus, it is a subset of the complete, physical schema that will be generated by the MDM Hub. The physical model diagram shows the base objects, dependent objects, and landing tables to be implemented in the MDM Hub.

The rule of thumb for physical model diagrams is to show the user-defined entities and attributes, plus the primary and foreign keys, so that relationships can be modeled correctly. In the physical model, avoid showing MDM Hub-generated entities or attributes other than primary and foreign keys. All supporting tables—such as cross-references, history tables, control tables, and staging tables—will be created by the MDM Hub and are not included in the physical model diagram.

The MDM Hub is flexible enough to implement any logical model as a physical model, but it will work better with some types of models than with others. Performance is the main driver for differences between the logical model and the physical model. Before developing a physical model for an Informatica MDM Hub implementation, carefully review the logical model in light of its performance implications. An ideal physical model for the MDM Hub is a balance between a completely denormalized model (best performance) and highly normalized (best flexibility).

The following figure shows an example physical model based on the logical model described previously.

 mdm-hub-model-design-5

Notice that all of the entities defined in the logical model will be implemented as base objects and that ROWID_OBJECT is used for all primary keys. In addition, notice that the many-to-many relationship between the Customer and Address entities in the logical model has been changed to a one-to-many relationship in the physical model. The reasons for these changes will be explained in the “Design Principles” section later in this chapter.

When designing the physical model, consider the following factors: Required Functionality, Performance and Scalability, Flexibility for Future Use, and Informatica MDM Product Roadmap.

Required Functionality

Required functionality is one of the key factors affecting design decisions in the physical model. An example of functionality requirements would be if a history of changes to attributes of an object must be kept, then define that object as a base object.

Performance and Scalability

The more denormalized the model, the fewer levels of consolidation are available, and the more difficult it can be to add new data sources and new attributes or entities in the future. Find a balance between modeling for performance (denormalizing) and modeling for functionality/flexibility (normalizing). Do not denormalize simply for the sake of denormalizing—there are some areas that are better to denormalize than others, as they yield the most performance benefit with the least functionality/flexibility loss. These issues are discussed in detail in the Design Principles section later in this chapter.

Flexibility for Future Use

When defining the physical model, it is important to keep possible, future requirements in mind but without adding entities or attributes that cannot yet be maintained or that are not yet fully understood. Sometimes building in system flexibility is as simple as naming things flexibly. For example, when building a Customer master for Organization data and the plan is to add Person data to that Customer Master within the next year, then consider using a name other than “Organization” (such as “Business Party”) for the Customer table, because the table may well end up containing both Organization and Customer data.

Be wary of adding physical limitations that might later cause problems. One example of this includes specifying user-defined unique keys on base objects. By defining a unique key on a base object, records cannot be merged in that base object. Although this might not be a problem in the initial implementation of a project, it is not uncommon for new sources that are later added to a system will bring their own values for the base object with the unique key, making it desirable to use match and merge functionality to consolidate the new system’s data with that of the original system’s data.

Informatica MDM Product Roadmap

An optimal physical design for an Informatica MDM Hub implementation takes into account what is known of future requirements, the Informatica MDM product roadmap, and the intersection between them. If there are any questions about how the model relates to the Informatica MDM product roadmap, arrange (through Informatica MDM Support) for a data model review with Informatica MDM product team.

If modeling types of objects (such as households) or types of relationships that are not discussed in this document, then review the data model with the Informatica MDM Solution Architect to make sure that the model does not run contrary to any assumptions in the MDM Hub design, quality assurance, or planned features. This review should be conducted as part of the data model checkpoint review that should already be built into the project plan.

Design Principles

This section describes some underlying design principles for transitioning from a highly normalized logical model to a physical model.

Principle 1: Consider Deep vs. Wide

This design principle refers to the number of direct child tables linked to a parent table. The following figure shows the two different types of designs.

 mdm-hub-model-design-6

This principle applies when merging or unmerging on the parent table.

The design principles mainly affect performance of the merge and unmerge processes.

The more directly-linked child tables that a parent table has, the more those tables must have foreign key references updated when records merge in the parent table. Therefore, the more child tables a parent table has, the slower the merges for the parent table will be.

This principle applies to the unmerge process as well. For unmerges in a deep model, consider how far the unmerge will be allowed to cascade. Which child tables need to have cascade unmerge enabled? How many child tables deep should be chosen to enable the Unmerge on Parent Unmerge flag? The more child tables with merged records and the Unmerge on Parent Unmerge flag enabled, the more work the unmerge needs to do, and therefore the slower the unmerge process.

Principle 2: Match Requirements Drive the Model

Match criteria also drive physical data model decisions with respect to functionality. Inter-table match criteria involves the use of attributes from one table in the match rules of a related table—for example, matching customers using address information from the Address table. For more information, see Address Example.

Another area in which required match functionality can affect the physical model is the way the match rules must be defined. If there is a need to define an AND match rule, it is necessary to denormalize repeating attributes that are to be used in the match rule. Normalizing repeated attributes into a child table allows OR match rules on the normalized attributes, not AND match rules.

For example, by creating an Electronic Address table that contains phone numbers and e-mail addresses, it is possible to use these in a match rule that identifies records as matching if the phone numbers are the same OR if the email addresses are the same. If the user needs a match rule that identifies records as matching if the phone numbers match AND the email addresses match, then it is essential to denormalize these into separate columns.

The following figure shows an example of a normalized Electronic Address table that supports OR match conditions only.

 mdm-hub-model-design-7

This Electronic Address table supports match rules in which phone numbers matched OR e-mail addresses matched. In the example shown, Customer IDs 12345, 45678, and 00001 would all be identified as matches for one another because of their matching phone numbers.

The following figure shows an example of denormalized attributes to support AND match conditions.

 mdm-hub-model-design-8

Logically, this table shows the same data as in the normalized Electronic Address table, but the physical structure has been denormalized to support match conditions that specify AND criteria. In this example, Customer IDs 12345 and 45678 would match because their phone numbers match AND their email addresses match. Customer ID 00001 would not be considered a match for the other two records because it has a different e-mail address. (This could also be addressed by creating a separate table for phone and email). For more information, see Communication Channel Models.

Principle 3: Consolidation Counts

The physical model must take into account the required results after consolidation and, particularly, the desired cardinality of base object to cross-references after consolidation (where cardinality is the ratio of the number of records in the base object to the number of records in the cross-reference). The physical model must also consider the effects of source updates on the surviving record. This section describes several examples to illustrate this principle.

Physician Specialties Example

A physician can have one or more specialties. Pharmaceutical companies are often interested in identifying only the primary specialty for a physician. However, when two physician records are merged from different sources, those sources might provide different values for the physician's primary specialty. If the required cardinality after merging the specialties is one surviving primary specialty, then include Primary Specialty as a column on the Physician base object. However, if the pharmaceutical company wants to keep all of the specialties for the merged physician record, then Physician Specialty must be a child table of the Physician table.

Address Example

Logically, a single address can belong to multiple customers. For example, office addresses can be shared by colleagues at the same location, or group practice addresses can be shared by partners in the same law firm. Of course, a customer can also have multiple addresses. For this reason, logical models usually have customer and address as distinct entities with a many-to-many intersection table between them.

However, in a physical model for consolidated data, this approach is not necessarily practical, especially when trying to reduce duplication in addresses from multiple sources. Consolidating addresses when they are not directly linked to customers means that the user is consolidating addresses across customers. For example, in the following figure, N.E. One and Ann Other both have the same address. If the two address records are merged, then one survived address record will remain and that record will be linked to both N.E. One and Ann Other through the Customer Address intersection table.

 mdm-hub-model-design-9

Avoid consolidating addresses across customers unless there is a real business need for an enterprise-wide unique ID per physical address location. Even if there is a real business need, there are other ways to model this instead. For more information, see Design Patterns.

Consolidating addresses across customers involves limiting address changes to the right customers, performance considerations, and functionality considerations.

Limiting Address Changes to the Right Customers

If one Customer changes their address, make sure that the address change is not automatically applied to the consolidated address record for all customers. For example, in the figure shown in Address Example, if N.E. One moves their office, it does not mean that Ann Other has also moved their office, so the consolidated address that was previously linked to both N.E. One and Ann Other now belongs only to Ann Other.

Performance Considerations

Consolidating addresses across customers usually means that there is a high degree of cardinality between the source addresses and the resultant consolidated addresses.

The higher the number of duplicate records, the more work the merge must do to process them. The cardinality is reduced if Customer ID is one of the match criteria for addresses—that is, if addresses are consolidated only within customer records, not across them. The following figure shows the recommended approach for customer address relationships.

 mdm-hub-model-design-10

Using this approach also reduces the number of tables that must be staged and loaded. This approach does not necessarily yield a large performance gain if the implementation involves only a handful of source systems to process. However, the more source systems that are configured to process, the higher the performance impact each additional target table will have on stage and load batches. For example, a Informatica MDM Hub implementation with five sources for the previous model (shown without consolidated addresses in Business Party and Differentiated Customer Models) requires 15 stage jobs and 15 load jobs. An implementation with ten sources for that same model requires 30 stage and 30 load jobs. For the model with consolidated addresses, five sources require ten stage and ten load jobs, and ten sources require 20 stage and 20 load jobs.

Functionality Considerations

Modeling customer address as a direct (one-to-many) relationship between customer and address means that customer address attributes can be stored directly on the Customer Address base object or as a child base object linked to Customer Address. As long as the attributes are part of a base object, the MDM Hub tracks their history.

This approach also means that Customer can use attributes from child tables of the Customer Address table for matching.

Similarly, keeping Customer Address attributes in base objects means that duplicate or overlapping attribute values from multiple sources can be consolidated to get to “best of breed” values for those attributes.

Principle 4: Pass the Independence Test

Independent base objects are base objects that are not linked to the core consolidated object through a one-to-many or a many-to-one relationship but instead, are linked through many-to-many intersection tables. If a base object is modeled as an independent base object, then its records should be logical alone, without any reference to the core base object. It should make sense to consolidate its records to a distinct set of values.

Steps for Testing Independence

The independence test for a physical model includes the following steps:

  1. Identify the core base object that is being consolidated in the Hub Store—Customer in a Customer Master, Supplier in a Supplier Master, etcetera.
  2. Look for any many-to-many relationships (direct or indirect).
  3. Inspect the base object that is on the other side of the many-to-many relationship and ask the question: “What can the business do with a distinct list of the things in this object without knowing who the Customer is?” If the answer is “Nothing,” then change the many-to-many to a one-to-many relationship.

Example Using a Highly Normalized Model

The following figure shows an example of a highly normalized model.

 mdm-hub-model-design-11

In this model, Specialty, Address, and Electronic Address are all linked to the core object—Customer—through many-to-many relationships. Apply the independence test by asking the following questions:

Question

Answer

What can the business do with a distinct list of Specialties without knowing who the Customer is?

The distinct list of Specialties can be used to provide a pick or lookup list of Specialty values in a capture screen for new Customer information. The business wants to standardize the list of Specialties it uses in reporting by assigning each source specialty to a consolidated enterprise specialty value.

What can the business do with a distinct list of Addresses without knowing who the Customer is?

In most cases, the answer to this question is “Nothing.” Addresses are usually meaningful only in terms of the Customer to whom the Address belongs.

What can the business do with a distinct list of Electronic Addresses (for example, telephone numbers) without knowing who the Customer is?

“Nothing”—a telephone number has no significance in its own right.


Converting relationships from many-to-many to one-to-many for the objects that failed the independence test would result in the model shown in the following figure.

 mdm-hub-model-design-12

A trick of the trade is creating the PKEY_Source from the standardized Address itself (123MAINSTSONOMACA95476). This will:

  • Eliminate duplicates from the source—one unique address per source—which solves the merge problem from high cardinality
  • Allow for Match on Primary Key
  • Prevent changes from one source/customer from affecting addresses from other sources/customers—pointer changes in Relationship Table
  • Still have the issue of deactivating addresses or dealing with address trust across sources

Principle 5: Mix Different Types of Customers Carefully

In Informatica MDM Hub implementations, be careful when mixing different types of customers.

Business Party and Differentiated Customer Models

This principle focuses on the consequences of implementing two different models—a Business Party model versus a Differentiated Customer model, which are shown in the following figure.

 mdm-hub-model-design-13

Model

Description

Business Party Model

All Customer records are loaded into the same Business Party table, and an attribute on that table identifies the type or classification of the Customer records. In this example of a Business Party model, the Class of Customer attribute distinguishes Organizations from Individuals.

Differentiated
Customer Model

The type or classification of the Customer records is implied by where the records are stored. In this example of a Differentiated Customer model, the Organization table holds Customers classified as Organizations, and the Individual table holds Customers classified as Individuals.


Data modelers often prefer the Differentiated Customer model because it reduces null attributes on the Customer table (for example, the Organization Customer does not need to carry any attributes that apply only to an Individual Customer). However, there are definite advantages to using a Business Party model over a Differentiated Customer model, even if it does result in more null attributes on the Business Party table. Such advantages include:

  • The Business Party model easily supports any number of chained relationships between different classes of customers and/or the same classes of customers.
  • The Business Party model allows the user to model networks, not just parent/child hierarchies.
  • The Business Party model provides a single unique identifier for each Customer without any chance of overlap.
  • The Business Party model allows the user to search for Customers in one place without needing to know anything about the type of Customer.
  • The Business Party model allows the user to identify source records that have given Customers incorrect types.

Mixing Models

In the Informatica MDM Hub implementation, the decision to implement a Business Party model in order to get one unique Customer identifier and to model Customer Affiliations flexibly may arise. To avoid too many redundant/null value columns on the Business Party base object, use child tables to carry some of the attributes that are specific to specific sub-types of Customers. However, be very careful about mixing the Business Party and Differentiated Customer models.

The following figure shows a poor mix of these models.

 mdm-hub-model-design-14

The following figure shows a better way to mix these models.

 mdm-hub-model-design-15

If the merge performance is a concern, then consider using a pure Business Party model, as shown in the figure in Business Party and Differentiated Customer Models.

This is a better mix than the figure showing a poor mix of models because it simplifies the relationships between the objects and reduces the number of cross-table joins required to get the match data. The preferred model is still the full business party model shown in Business Party and Differentiated Customer Models, as that reduces the number of child tables to be maintained on merge and unmerge.

The Customer match attributes have been denormalized so that they are attributes of the Business Party base object instead of the Organization and Individual base objects. This reduces the number of cross-joins used in populating the match token.

In the better mix, all relationships have been defined at the Business Party level, making it easier to navigate and maintain the relationships. The poor mix has an uneasy mixture of relationships, with Addresses having nullable foreign keys to either Individual or Organization

Principle 6: Landing and Staging Data

This principle considers the design landing and staging tables in the Informatica MDM Hub implementation.

Landing Table

Although there are no strong design recommendations with respect to landing tables, consider the following issues for the Informatica MDM Hub implementation:

  • Some implementations have used source-specific landing tables (a landing table per source table/source file). This keeps the landing table format closer to the source format and means that the ETL process does not need to transform all sources to a standard layout, which could simplify the process of making changes for one source or adding new sources with different attributes later. However, it usually means a very large number of landing tables, which can be tedious and cumbersome to set up.
  • Other implementations have used one landing table per target table, which means that the ETL needs to transform all sources for the same target to the same standard layout. This approach does allow the ETL to be standardized, making it much faster to develop and test for the first implementation (where typically a large number of sources need to be coded). It is possible that this approach also makes it more costly to maintain after initial deployment, because changes from one source could potentially affect multiple ETL mappings.
  • If using one landing table per target table in the Informatica MDM Hub implementation, then the landing table needs to include a source identifier, which must be used in filtering the data mapped to each staging table. The landing table should also have a range partition specified in Oracle to partition it according to source system, which allows partitions to be truncated before the ETL inserts data from a source, rather than having records deleted from the landing table.

Staging Tables

Staging tables must be based on the columns provided by the source system for the target base object or dependent object for which the staging table is defined, even if the landing tables are shared across multiple source systems. By not making the column on staging tables source-specific, unnecessary trust and validation requirements are created.

Trust is a powerful mechanism, but it carries performance overhead. Use trust where it is appropriate and necessary, but not where the most recent cell value will suffice for the surviving record. For more information, see Using Trust Levels.

By limiting the columns in the staging tables to the columns actually provided by the source systems, it is possible to restrict the trust columns to those that come from two or more staging tables. Use this approach instead of treating every column as if it comes from every source, which would mean needing to add trust for every column and then validation rules to downgrade the trust on null values for all of the sources that do not provide values for the columns.

More trust columns and validation rules obviously affect the load and the merge processes. Also, the more trusted columns, the longer the update statements will be for the control table. Be aware that Oracle and DB2 have a 32K limit on the size of the SQL buffer for SQL statements. For this reason, more than 40 trust columns result in a horizontal split in the update of the control table—the MDM Hub will try to update only 40 columns at a time.

Design Patterns

This section summarizes the following typical physical data model design scenarios and describes options for implementing them.

Households

A Household is a grouping of customer records according to geographic location. For example, all of the people living at one address could be considered a household, or a group of doctors practicing at one hospital could be considered a household.

Create Household as a base object that is the parent of Customer. The easiest type of household is one in which the household has no attributes of its own. It uses inter-table match to match on selected Customer match columns that usually include the Address match columns.

The following figure shows an example of a logical mode for Households.

 mdm-hub-model-design-16

Addresses

The ideal model for addresses involves a one-to-many relationship from Business Party to Address, with Address match rules that include Business Party ID to prevent matches across different Business Parties. However, there are occasionally business cases for consolidating addresses across Business Parties, such as to get a single identifying key for all addresses for the same location, regardless of which Business Parties use that address. If there are business reasons for consolidating Addresses regardless of the Business Parties using the Addresses, then the following consolidated address model is recommended.

 mdm-hub-model-design-17

In this model, the Business Party Address base object consolidates the Addresses per Business Party. The Business_Party_ROWID is one of the match criteria for the Business Party Address base object, and Business Party Addresses should merge only if they have the same Business_Party_ROWID value.

The Business Party Address base object gives the distinct set of addresses for each business party, but it does not give a distinct set of all the addresses with a unique ID for each unique address. To get a unique set of Address identifiers, the Address base object would need to be included in the data model.

At its simplest, the Address base object does not include any attributes of its own, other than a Status Indicator to indicate whether the Address ID is active or inactive. Instead, it uses inter-table match to match using the attributes from the Business Party Address table. This approach assumes that tight matching rules are used for the Address base object, and that survivorship of household-specific attributes is not required. If household-specific attributes need to be survived, then those attributes must be defined and populated for the Address base object, along with the appropriate Trust rules.

Populating the Address Household Object

The Address household object is a standard base object that is populated through landing and staging tables. At the cross-reference level, there is one-to-one cardinality between the Address base object (Address cross-reference) and the Business Party Address base object (Business Party Address cross-reference).

Landing Tables

The Address object should share landing tables with the Business Party Address base object. The Address base object uses the same pkey_src_object values as the Business Party Address.

Staging Tables

The Address object must have its own staging tables. As for any other base object, the Address base object requires a separate staging table for each source system that can populate it. Each Address staging table usually only has pkey_src_object and last_update_date columns, unless there are other, household-specific attributes to be included.

If hard delete detection is being used to deactivate unused address identifiers, then the staging tables may also include a status indicatorcolumn.

Populating Address and Populating Business Party Address

When first loading Address and Business Party Address (before doing any matching and merging on either), there will be the same number of records in Address and in Business Party Address. The cross-reference tables will also have the same number of records. Each Business Party Address record has a foreign key referencing a record in the Address table.

When merging records in the Address table, records in the Business Party Address table will be updated to reflect the surviving Address key. Once all Address records have been merged, merge records in the Business Party Address table.

Each new Business Party Address record will have a corresponding new Address record that will need to be matched against the existing Address records.

Communication Channel Models

Communication channel data refers to electronic and telephonic address information, such as phone numbers, fax numbers, email addresses, URLs, pagers, etcetera.

Two Communication Channel Models

This section describes two communication channel models.

Generic, Normalized Communication Channel Model

In some data models, communication channel data are modeled as a completely generic structure, with a type identifier that determines the type of electronic or telephonic address contained in each row of data, as shown in the following figure.

 mdm-hub-model-design-18

Denormalized, Type-Specific Communication Channel Model

In other data models, communication channel data is stored in a denormalized, type-specific structure in which each communication channel is stored in its own specific column, as shown in the following figure.

 mdm-hub-model-design-19

Comparison Between the Models

This section compares the pros and cons of each of these models.

Generic, Highly-Normalized Communication Channel Model

The following table compares and contrasts the advantages and disadvantages of using a generic, highly-normalized communication channel model.

Pros

Cons

No restriction on the number of phone numbers, e-mail addresses, etc., that can be stored for a customer.

More problematic to use in Customer match rules. Cannot do combinations of matches like “WHERE customer phone numbers match AND customer e-mails match.”

Duplicate values can easily be identified and consolidated.

More difficult to determine new versus updated values because there is no primary key value for each record that does not include the communication channel value.

New communication channel types can easily be added without needing to change anything in the data model.

Does not support a granular level of detail for types. For example, all elements of a phone number are usually stored in one string, rather than being separated into country code, dial code, phone number, and extension.

 

Requires ETL work to normalize the data, or otherwise requires multiple loads in the MDM Hub through multiple staging tables.


Denormalized, Type-Specific Communication Channel Model

The following table compares and contrasts the advantages and disadvantages of using a denormalized, type-specific communication channel model.

Pros

Cons

Can do match combinations like “WHERE customer phone numbers match AND customer emails match.”

Number of communication channels that can be stored per customer is limited by the number of columns available.

Easier to determine new versus updated values. An update to a value can be treated as

a direct update because the record is keyed on customer_id and the update is to a specific column.

Can be more difficult to identify and consolidate duplicate values. For example, if Source A provides phone_1 as

555-123-4576 but Source B has that value

as phone_2 and not phone_1, then the two

values will not be de-duped.

Supports a more granular level of detail for types. For example, instead of storing all elements of a phone number in one string, they can be separated into country code, dial code, phone number, and extension.

New communication channel types require changes to the data model.

Straight mapping from landing to staging can be supported for the relevant communication channels, with the need for normalizing the data in ETL or through multiple stage jobs in the MDM Hub.

 


Proposed Third Type of Data Model for Communication

Channels

This model shown in the following figure is a hybrid of the other two models.

It provides normalized structures for communications channels without a high degree of generalization described in Generic, Normalized Communication Channel Model.

 mdm-hub-model-design-20

In this hybrid model, separate communication channel objects have been created for logically similar communication types. This approach minimizes the drawbacks of the two previous models and provides the best benefits of both.

Practitioners Tip:

Although strictly speaking, a consultant can produce distinct logical and physical models, it is often useful to create a mixed logical/physical model that contains key attributes and notes to clarify the meanings of some of the tables such as the following:

 mdm-hub-model-design-21

Table of Contents

Success

Link Copied to Clipboard