MDM Match Tuning

Last Updated Date: Feb 21, 2025


Challenge

Effective match rules are central to a successful MDM implementation. The challenge is to arrive at the optimal match rules for a given customer implementation. This Best Practice provides guidelines on how to do that.

Description

This document describes the general steps for setting up and tuning match rules. These steps are iterative and can be repeated whenever the match requirements change.

Data Audit

The data audit should overlap with data discovery. The analysis may be performed solely for the match process, but it would normally be part of the overall data analysis.

Get a Reasonably Sized, Representative Sample

It is essential that the sample data be representative of the data the production system will contain. Be wary of data from TEST systems, as it is often developer-created and not representative of real data.

Note: A “reasonably sized” sample should be as large and comprehensive as the user can afford.
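One way to keep a sample representative is to draw it proportionally from each source system, so that small sources are not crowded out by large ones. The following is a minimal sketch, assuming records are Python dicts with a hypothetical `source` field; the field name and the 10% fraction are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=42):
    """Draw roughly `fraction` of the records from each group defined by `key`.

    Sampling per group keeps small source systems represented, which a
    plain random sample of a skewed data set can miss.
    """
    rng = random.Random(seed)  # fixed seed so tuning iterations are repeatable
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sample = []
    for members in groups.values():
        # Take at least one record from every group.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical data: 900 CRM records and 100 ERP records.
records = [{"id": i, "source": "CRM" if i < 900 else "ERP"} for i in range(1000)]
sample = stratified_sample(records, "source", 0.10)
```

Sampling by source is only one stratification choice; stratifying by country or record type may suit other data better.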

Discuss with Users/Business to Determine High-Level Match Requirements/Rules

Find out, from a source as close as possible to the acceptance users, what is considered a match. Consider the following questions:

What is considered a Match?
What is not considered a Match?
Are there any samples from previous systems that can be shared?

Data Investigation

Identify the fields that contribute to the match process, including the match key. Use Informatica Data Profiler, the IIR Edit Rule Wizard, or SQL queries to conduct the analysis. Assess the quality of each field and of combinations of fields.

Group Identification: Run simple group statistics on single fields or combinations of fields to assess the major groupings (e.g., state codes). This should help identify:

Large groups of identical records.
The level of exact-match duplicates (this informs whether to use exact match rules).
Good candidates for filters, which are used as exact match columns to reduce the number of candidates sent to match. A lack of filters may result in large candidate groups, which degrades performance.
The viability of the intended key column.
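The group statistics described above can be sketched in a few lines. This is an illustrative in-memory stand-in for a profiling query, assuming records are dicts with hypothetical field names; it returns the largest groups for the chosen fields and the share of records that are exact duplicates on those fields.

```python
from collections import Counter

def group_stats(records, fields, top=5):
    """Count records per distinct value combination of `fields`.

    Returns the largest groups and the share of records that are exact
    duplicates on those fields (every record after the first in its group).
    """
    counts = Counter(tuple(rec.get(f) for f in fields) for rec in records)
    duplicates = sum(c - 1 for c in counts.values())
    dup_rate = duplicates / len(records) if records else 0.0
    return counts.most_common(top), dup_rate

records = [
    {"first": "ANN", "last": "LEE", "state": "CA"},
    {"first": "ANN", "last": "LEE", "state": "CA"},
    {"first": "BOB", "last": "RAY", "state": "CA"},
    {"first": "CY", "last": "POE", "state": "NY"},
]
top_states, _ = group_stats(records, ["state"])  # major groupings
top_names, name_dup_rate = group_stats(records, ["first", "last", "state"])
```

A single very large group (such as one state code) is a filter candidate only if it still splits the data into manageable pieces; a high duplicate rate on the full field set points toward exact match rules.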

Data Quality

Data Completeness: Determine how complete each data column is, that is, how often the field has a non-null value. Look for data completeness by column and by combination of columns. For example:

Determine the total number of records that have both first and last names valued.
If the postal code field is valued in only 50% of the records, it may not be a good candidate for an exact match column.
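Completeness for a single column or a combination of columns reduces to one ratio. A minimal sketch, assuming records are dicts with hypothetical column names and treating `None` and blank strings as missing:

```python
def completeness(records, columns):
    """Fraction of records in which every column in `columns` is valued.

    A value counts as missing if it is None or an empty/whitespace string.
    """
    if not records:
        return 0.0

    def valued(v):
        return v is not None and str(v).strip() != ""

    complete = sum(1 for rec in records if all(valued(rec.get(c)) for c in columns))
    return complete / len(records)

records = [
    {"first": "ANN", "last": "LEE", "postal": "94105"},
    {"first": "BOB", "last": "", "postal": None},
    {"first": None, "last": "POE", "postal": "10001"},
    {"first": "DEE", "last": "KIM", "postal": None},
]
postal_rate = completeness(records, ["postal"])        # single column
name_rate = completeness(records, ["first", "last"])   # both names valued
```

Whether placeholder values such as "UNKNOWN" should also count as missing is a data-specific decision and is not handled here.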

Data Accuracy: Ensure that the data is accurate. For example, verify that the gender field contains only valid gender values.

Use pattern analysis to assess the quality of the data in a column. Pattern analysis is useful for data that should conform to a particular format or data type (e.g., postal codes). For example, a zip code column might profile as follows:

nnnnn-nnnn - 80% (full zip code with hyphen)
nnnnn - 10% (five-digit zip code)
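A pattern profile like the one above can be approximated by mapping each character to a class. This sketch uses the document's `n` for digits and assumes `a` for letters (an illustrative convention, not necessarily the profiler's); all other characters are kept as-is.

```python
from collections import Counter

def to_pattern(value):
    """Map digits to 'n' and letters to 'a'; keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)

def pattern_profile(values):
    """Return (pattern, percent of values) pairs, most frequent first."""
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    return [(p, 100.0 * c / total) for p, c in counts.most_common()]

zips = ["94105-1234", "10001-0001", "60601-9999", "02139", "ABCDE"]
profile = pattern_profile(zips)
```

Rare patterns (here `aaaaa`) are usually the ones worth inspecting, since they often indicate values in the wrong field or placeholder data.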

Suspect Data: Look for extraneous strings that may need to be removed. These can be detected by browsing the data, using Informatica Data Profiler, or both.

Determine the Match Population

Determine whether the data can be supported by a standard population (e.g., US, International) or whether it needs a customized population. If the data mixes different languages, consider using multiple populations (e.g., USA, Japan, Chinese).

Word Frequency

Use the IIR Edit Rule Wizard to retrieve a frequency list from the sample data. This will:

List all the words and phrases within a data field of the sample data.
Provide frequency counts.
Indicate whether an IIR rule exists for each word or phrase.

These results will help validate the population choice or identify if any tweaking is required.
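For a quick look outside the wizard, a word frequency list can be approximated as below. This is a sketch only: `known_rules` is a hypothetical stand-in for the words covered by the population's rules and is not read from IIR.

```python
import re
from collections import Counter

def word_frequencies(values, known_rules=frozenset()):
    """Count words across a field and flag which ones already have a rule.

    Words are upper-cased runs of letters and digits; `known_rules` is a
    caller-supplied set, not something retrieved from IIR.
    """
    counts = Counter()
    for v in values:
        counts.update(re.findall(r"[A-Z0-9]+", v.upper()))
    return [(w, c, w in known_rules) for w, c in counts.most_common()]

names = ["Acme Medical Group", "ACME MEDICAL", "Estate of J Smith", "Beta Medical Inc"]
freq = word_frequencies(names, known_rules={"INC"})
```

High-frequency words without a rule (such as MEDICAL here) are the ones to examine as possible noise words or population tweaks.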

Data Standardization

Using results from the data audit, set up cleanse functions to standardize the data (e.g., junior becomes JR).

Use an address cleanse tool to standardize and clean the address information.
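The "junior becomes JR" example amounts to a lookup table applied after basic normalization. The sketch below is illustrative only: `SUFFIX_MAP` is a hypothetical table, and in practice this logic belongs in the cleanse functions configured for the hub rather than in external code.

```python
import re

# Hypothetical standardization table for name suffixes.
SUFFIX_MAP = {"JUNIOR": "JR", "JNR": "JR", "SENIOR": "SR", "SNR": "SR"}

def standardize_name(value):
    """Upper-case, replace periods and commas with spaces, map suffix variants."""
    tokens = re.sub(r"[.,]", " ", value.upper()).split()
    # Replace known variants; leave all other tokens unchanged.
    return " ".join(SUFFIX_MAP.get(t, t) for t in tokens)
```

Only periods and commas are stripped so that apostrophes in names such as O'NEIL survive; which punctuation to remove should follow the data audit.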

Define Fuzzy Match Key

Set up the match key based on discussions with the business users and the results of the data audit. The following match keys are recommended:

If data contains organization names or both organization and individual names, use the organization name as the match key.
If data only contains individual names, use the individual’s name as the match key.
If the data only contains addresses, use address part1 as the match key.

Define the Key Width

Consider the following questions when choosing the key width:
How big is the target data set going to be?
How important is missing a match compared with performance?
Is the data quality generally good?
The “wider” the key:
- The higher the chance of finding a match.
- The lower the overall performance.
The options range from widest to narrowest: Extended, Standard, Limited, and Preferred.
Using the sample, adjust this setting and determine what is lost and gained:
Run Generate Match Tokens.
Run group summary statistics on the IIR keys.
Identify suspect IIR keys.
Review the keys with the highest counts relative to the others, examine the actual data, and determine whether they may cause problems when matching. For example:
- Hot spots are large groups of identical keys caused by identical records.
The following are areas to explore:
- Is it necessary to cleanse the data?
- Is it necessary to flag the suspect records for later match?
- Is it necessary to change the key?
- Is it necessary to add more to the current key?
- Is it necessary to change the key width or type?
- Is the strip table too big?
- Would “Match Only Once” help?
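Because every record in a key group can potentially be compared with every other, comparison counts grow roughly as n(n-1)/2 per group, which is why a few large groups can dominate match time. The sketch below summarizes key exports from two key-width settings; the keys are assumed to be SSA_KEY values pulled from the STRP table, the data is hypothetical, and `hot_spot_size` is an illustrative threshold, not a product default. The comparison figure is an upper bound, not what the matcher actually performs.

```python
from collections import Counter

def key_group_report(keys, hot_spot_size=1000):
    """Summarize key groups produced by one key-width setting.

    `keys` holds one entry per generated key (e.g., SSA_KEY values exported
    from the STRP table). `hot_spot_size` is an illustrative threshold.
    """
    sizes = Counter(keys)
    # Upper bound on pairwise comparisons if every pair in a group were compared.
    comparisons = sum(n * (n - 1) // 2 for n in sizes.values())
    hot_spots = {k: n for k, n in sizes.items() if n >= hot_spot_size}
    return {
        "distinct_keys": len(sizes),
        "max_group": max(sizes.values(), default=0),
        "comparisons": comparisons,
        "hot_spots": hot_spots,
    }

# Hypothetical key exports for the same sample at two key widths.
standard = ["K1"] * 4 + ["K2"] * 2 + ["K3"]
extended = ["K1"] * 6 + ["K2"] * 3 + ["K3"] * 2
report_std = key_group_report(standard, hot_spot_size=5)
report_ext = key_group_report(extended, hot_spot_size=5)
```

Comparing the two reports shows the trade-off directly: the wider key produces larger groups and more comparisons in exchange for more chances to find a match.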

Set Up Match Rules

Set up draft match rules based on a discussion with the business users and data audit.
When creating match rules, start with the rules that provide the tightest matches. Usually, these are rules on exact match columns, such as identifiers. Use name and address for added accuracy.
Define the match levels, search levels, and match purposes.
- Start with narrow, followed by typical, and then extreme. Aim for the most conservative matching that the users find acceptable. The “cost” of a false positive is generally greater than that of a false negative.

Name and Address Dry Run

It is advisable to run the name and address fuzzy matches before using match rules that include unique identifiers. This determines the match levels based solely on name and address.

Use all match levels: conservative, typical, and loose. If time allows, do this in the dry run to obtain comparable match counts for each level. Define one rule per match level and run them together, rather than rerunning the job three times and changing the match level with each iteration.

Run the match rules (without merge and BMG).

Review Name and Address Match

Run a query on the MTCH table, grouping by match rule. These results give an initial view of where the matches fall and which match levels to use. For example, if most of the matches come from the conservative match level, use the conservative level.

Run quick statistics on the match results by match rule. These give a rough estimate of how many matches each rule contributes. Consider excluding or revising rules that yield few additional matches.
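A rule's value lies in the matches no other rule finds. The sketch below runs a per-rule summary in an in-memory SQLite database against a simplified, hypothetical stand-in for the MTCH table (one row per matched pair plus the producing rule; the `rowid_match_rule` column and rule names are assumptions). It also assumes each pair is stored in a consistent direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c_customer_mtch (
    rowid_object TEXT, rowid_object_matched TEXT, rowid_match_rule TEXT);
INSERT INTO c_customer_mtch VALUES
    ('1', '2', 'R_CONSERVATIVE'), ('1', '2', 'R_TYPICAL'),
    ('3', '4', 'R_TYPICAL'), ('5', '6', 'R_LOOSE'),
    ('3', '4', 'R_LOOSE');
""")

# Total pairs per rule, plus pairs that no other rule found ("gained" matches).
query = """
SELECT m.rowid_match_rule,
       COUNT(*) AS total_pairs,
       SUM(CASE WHEN NOT EXISTS (
               SELECT 1 FROM c_customer_mtch o
               WHERE o.rowid_object = m.rowid_object
                 AND o.rowid_object_matched = m.rowid_object_matched
                 AND o.rowid_match_rule <> m.rowid_match_rule)
           THEN 1 ELSE 0 END) AS unique_pairs
FROM c_customer_mtch m
GROUP BY m.rowid_match_rule
ORDER BY total_pairs DESC, m.rowid_match_rule
"""
rows = conn.execute(query).fetchall()
```

In this toy data, R_TYPICAL finds two pairs but neither is unique to it, which makes it a candidate for exclusion or revision.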

Review the detailed match results by querying the MTCH table. Joining this query to the base object (BO) makes it easy to compare the matched records.
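Joining the MTCH table to the BO twice, once for each side of the pair, puts both matched records on one row. This is a minimal SQLite sketch; the table layouts and the `full_name` and `address` columns are hypothetical simplifications of a real base object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c_customer (rowid_object TEXT PRIMARY KEY, full_name TEXT, address TEXT);
CREATE TABLE c_customer_mtch (
    rowid_object TEXT, rowid_object_matched TEXT, rowid_match_rule TEXT);
INSERT INTO c_customer VALUES
    ('1', 'ANN LEE', '1 MAIN ST'), ('2', 'ANNE LEE', '1 MAIN ST'),
    ('3', 'BOB RAY', '9 OAK AVE'), ('4', 'ROBERT RAY', '9 OAK AVENUE');
INSERT INTO c_customer_mtch VALUES ('1', '2', 'R_TYPICAL'), ('3', '4', 'R_LOOSE');
""")

# Join the BO once per side of the pair so each match reads side by side.
side_by_side = conn.execute("""
SELECT m.rowid_match_rule,
       a.rowid_object, a.full_name, a.address,
       b.rowid_object, b.full_name, b.address
FROM c_customer_mtch m
JOIN c_customer a ON a.rowid_object = m.rowid_object
JOIN c_customer b ON b.rowid_object = m.rowid_object_matched
ORDER BY m.rowid_match_rule, a.rowid_object
""").fetchall()
```

The same result set can be saved to a table or exported to a spreadsheet for the detailed review.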

Make a copy of the MTCH table so that you can compare these results with subsequent match results that use different match rules.
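Comparing a saved copy of the MTCH table with a later run is a set difference on matched pairs. A minimal sketch, assuming pairs have been exported as (rowid, rowid) tuples; because a pair may be recorded in either direction, each pair is normalized first.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same matched pair."""
    return {tuple(sorted(p)) for p in pairs}

def compare_runs(baseline, candidate):
    """Return pairs lost, gained, and kept by the candidate run."""
    base, cand = normalize(baseline), normalize(candidate)
    return {"lost": base - cand, "gained": cand - base, "kept": base & cand}

# Hypothetical pairs from a saved MTCH copy and from a later run.
run_1 = [("1", "2"), ("3", "4"), ("5", "6")]
run_2 = [("2", "1"), ("5", "6"), ("7", "8")]
diff = compare_runs(run_1, run_2)
```

The "lost" and "gained" sets are the records worth reviewing, because they show exactly what the rule change affected.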

This review should clarify the following aspects:

Over-matching and under-matching.
In reviewing detailed matched records, it is best to save the detailed results to a table or export them to an Excel spreadsheet. This allows a re-review, if required, or a comparison with other runs.
Avoid reviewing matched records that have identical names and addresses. Concentrate on matched records with discrepancies.
It helps to categorize the records under review. For example, one category might be all records whose addresses are the same but whose names have one or more discrepancies.
While reviewing the match results, use the IIR Workbench to analyze them. This tool helps determine why IIR did or did not match records. By tweaking the rules in the IIR Workbench, you can determine what adjustments the match rules need.
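The review categories described above can be assigned mechanically before the manual review starts. A minimal sketch, assuming each matched pair is two dicts with hypothetical, already standardized `name` and `address` fields; exact name-and-address matches are dropped from the queue.

```python
def categorize(pair):
    """Bucket a matched pair by which attributes differ."""
    a, b = pair
    same_name = a["name"] == b["name"]
    same_addr = a["address"] == b["address"]
    if same_name and same_addr:
        return "exact"  # low review value
    if same_addr:
        return "same address, name differs"
    if same_name:
        return "same name, address differs"
    return "name and address differ"

def review_queue(pairs):
    """Group pairs by category, leaving out exact name and address matches."""
    queue = {}
    for pair in pairs:
        cat = categorize(pair)
        if cat != "exact":
            queue.setdefault(cat, []).append(pair)
    return queue

pairs = [
    ({"name": "ANN LEE", "address": "1 MAIN ST"}, {"name": "ANN LEE", "address": "1 MAIN ST"}),
    ({"name": "ANN LEE", "address": "1 MAIN ST"}, {"name": "ANNE LEE", "address": "1 MAIN ST"}),
    ({"name": "BOB RAY", "address": "9 OAK AVE"}, {"name": "BOB RAY", "address": "9 OAK AVENUE"}),
]
queue = review_queue(pairs)
```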

Set Final Match Rules with Exact Columns

Once the composition of the name and address match rules has been decided, set up the final match rules. At this point, add the unique identifiers, such as the IMS number.

If there is no unique identifier, include an exact column in the match rules, such as zip code. Add several otherwise identical rules that differ only in their exact filter.

If there is no reason to differentiate between a tight or loose match, only include the loose match rules in the final set.

Consider options such as “Match Only Once,” “Dynamic Match Analysis,” or “Match only Previous,” especially for the initial load. Toward the end of the review process, turn these options on and compare the MTCH table with earlier iterations to see which matches, if any, are missing. Determine whether these losses are acceptable in exchange for the performance gain.

Review Match Results

Run the complete match set and review the results. Involve the acceptance users as early as possible, ideally reviewing the results step by step with them. If there are issues, repeat the earlier stages.

Review STRP Table

Use the console histogram to check for large key sets. Analyze the STRP table to look for:

IIR key ratio: the average number of keys per “rowid_object”
- Calculate this using the cleanse server tokenization batch summary report. Key ratios of less than ten should be investigated.
Large IIR keys
- These indicate that a particular string occurs very frequently in the data set. Is there extraneous data that needs to be dealt with (e.g., “Estate of” or “Medical”)?
- If an excessive number of keys is being generated, investigate the source. Is there a relatively small number of bad records? Do they come from a particular source?

Sample Server Log Summary

[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Tokenize Summary
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: =============
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Total BO Records read:13156618
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Total Records read :62725732
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Keys generated :310127101
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Average Record Size :115
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Largest Record Size :200863
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Largest Record Rowid :2033307
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Total time used 21.399084 mins
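The key ratio can be read straight from a Tokenize summary like the sample above. This sketch parses the "label :number" lines with a regular expression written for this sample's layout (other versions may format the log differently) and divides keys generated by BO records read. Using BO records as the denominator, one per rowid_object, is an assumption about how the ratio is defined.

```python
import re

# Lines copied from the sample Tokenize summary.
LOG = """\
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Total BO Records read:13156618
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Total Records read :62725732
[2024-09-05 19:50:00,784] [WebContainer : 3] [INFO ] com.siperian.mrm.match.tokenize.Tokenize: Keys generated :310127101
"""

def parse_tokenize_summary(text):
    """Extract 'label :number' pairs that follow 'Tokenize: ' in each line."""
    stats = {}
    for m in re.finditer(r"Tokenize: ([A-Za-z ]+?)\s*:\s*([\d.]+)", text):
        stats[m.group(1)] = float(m.group(2))
    return stats

def key_ratio(stats):
    """Average number of keys generated per BO record (rowid_object)."""
    return stats["Keys generated"] / stats["Total BO Records read"]

stats = parse_tokenize_summary(LOG)
ratio = key_ratio(stats)  # about 23.57 for this sample
```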

The following are sample SQL queries (Oracle syntax) for analyzing the STRP table:

This query shows how many records there are for each of the 5,000 most populous keys in the database:
- select * from (select ssa_key, count(*) c1 from c_customer_strp where data_row = 1 group by ssa_key order by c1 desc) where rownum <= 5000
This query shows the 5,000 records that generate the most keys:
- select * from (select rowid_object, count(*) c1 from c_customer_strp where data_row = 1 group by rowid_object order by c1 desc) where rownum <= 5000
This query shows the 5,000 records with the highest STRP data counts:
- select * from (select rowid_object, data_count from c_customer_strp where data_row = 1 and preferred_key_ind = 1 order by data_count desc) where rownum <= 5000

Review Cleanse Server Log

Review the cleanse server log while a Match batch job is running and after it completes.

Identify “Hot Spots”
- Look for long-running match ranges, that is, ranges that take a long time to process. For example, if statistics keep appearing for the same range, it is a potential hot spot.
Review Match Summary
- Match processing time, number of records processed, and total automatic and manual matches
- Top 10 range comparison counts
  - Large range comparison counts may indicate a data issue

Check Whether Matches Are Running Too Slowly

The following are potential reasons:

Hot spots or a large number of comparisons
Slow DB reads
Low thread count
Slow disk reads/writes
Logging set to Debug mode

The following are potential remedies:

Cleanse or remove bad data
Set bad data aside (i.e., prevent it from matching)
- Use exclude from match
Use a match path filter to reduce the number of child records
Add extraneous data as Noise Words
- Add Noise Words to the IIR population
Improve DB and disk speed
Increase the thread count
Use Distributed Matching
If an exact rule contains the fuzzy key components, convert it to a “filtered rule”
Improve filtering in rules
Use BO-level match settings:
- Dynamic match threshold
- Match only once
- Change match key width

Matching in MDM SaaS

Beyond the practices described above, match tuning in MDM SaaS differs slightly from match tuning in MDM Hub. The key differences lie in two areas: how you review the results of a match run, and how match training works when matching is done by machine learning.

When a Match job in MDM SaaS completes, it generates a match summary. This summary lists up to 150,000 matches and includes information such as the rule on which each record was matched and the source of the record. Review this summary together with the business unit or whoever will define the acceptance criteria.

Review the results both at a high level, to gather summary statistics on the current iteration, and at a more granular level, going through a set of match pairs and determining whether each one is a proper match.
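For the granular review, it helps to sample a fixed number of pairs per rule and then record each reviewer verdict, so that the share of correct matches can be compared across rules and iterations. A minimal sketch, assuming the match summary has been exported as dicts with a hypothetical `rule` field; the export format and the sample size of 50 are assumptions. Note that this measures only false positives; missed matches need a separate check.

```python
import random
from collections import defaultdict

def review_sample(pairs, per_rule=50, seed=7):
    """Pick up to `per_rule` match pairs per rule for granular review."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-reviewed
    by_rule = defaultdict(list)
    for p in pairs:
        by_rule[p["rule"]].append(p)
    return {r: rng.sample(ps, min(per_rule, len(ps))) for r, ps in by_rule.items()}

def precision_by_rule(reviewed):
    """Share of reviewed pairs per rule that reviewers marked as true matches.

    `reviewed` is a list of (rule, is_match) tuples recorded during review.
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for rule, is_match in reviewed:
        totals[rule] += 1
        correct[rule] += bool(is_match)
    return {r: correct[r] / totals[r] for r in totals}

pairs = ([{"rule": "R_TYPICAL", "id": i} for i in range(120)]
         + [{"rule": "R_LOOSE", "id": i} for i in range(10)])
sample = review_sample(pairs)
```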

After an iteration is reviewed, changes can be made directly to the match model and then republished. This saves time and keeps a version of the old model.

Do's and Don'ts of Match and Merge Configuration

Consider the following do's and don'ts when configuring match and merge:

When you ingest data, avoid loading transactional data. Use Cloud Data Integration to filter out transactional data. Highly duplicated transactional data degrades performance by generating an excessive number of match comparisons.
When you use threshold-based merge strategy, understand the rule outcomes for each scenario.
When you use name and address for matching without unique identifiers, ensure that you configure tight thresholds to avoid false positives. For example, multiple family members can share the same address, so such records are not necessarily matches.
When you use exact match rules, configure the appropriate fields for the candidate selection criteria.
If required, regenerate the match keys after updating the candidate selection criteria. For example, if you change the default population, the match criteria values in a declarative rule change. Ensure that you regenerate the match keys for the records.
Use the NOT_READY_FOR_MATCH flag to mark records to exclude from the match process. If a source record does not have enough information to contribute to the master record, you might want to mark it as not ready for match.
During match tuning, perform the match process first. You can rematch the records until you reach an optimal match configuration.
During initial match tuning, use a smaller data set to test. Processing a smaller data set minimizes loss of time in the event of match job failure and allows for multiple tuning iterations. Ensure that you configure match rules to accommodate expected data volumes.
Avoid choosing multiple candidate selection criteria unless your use case requires it.
Avoid manual match rules that use a loose match level, so that you do not flood the data stewards with too many matches.
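Deciding which records to mark with the NOT_READY_FOR_MATCH flag is usually a rule about how much information a record carries. The sketch below shows only that decision logic; `KEY_FIELDS` and `min_valued` are hypothetical and should be agreed with the business. How the resulting boolean is written to the flag depends on your load process and is not shown.

```python
# Hypothetical fields that must be valued for a record to be worth matching.
KEY_FIELDS = ("name", "address", "postal_code", "phone")

def ready_for_match(record, min_valued=2):
    """Return False for records with too little information to match reliably.

    Blank or whitespace-only values count as missing.
    """
    valued = sum(1 for f in KEY_FIELDS if str(record.get(f) or "").strip())
    return valued >= min_valued

records = [
    {"name": "ANN LEE", "address": "1 MAIN ST"},
    {"name": "BOB RAY"},
    {"phone": "   "},
]
flags = [ready_for_match(r) for r in records]
```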

Match and Merge Best Practices Checklist

The following checklist summarizes the best practices for configuring match and merge:

Know your data
Define the candidate selection criteria
Set the maximum candidate limit
Define fuzzy match fields
Define exact match fields
Set the merge threshold limits
Leverage segment matching