Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
The number one mistake when starting a Data Management project is skipping data profiling. Data discovery and analysis can be laborious and complex, involving massive volumes of data with relationships that are difficult to unravel, but bypassing this crucial step (particularly when embarking on a digital transformation) will often result in project delays and re-work. No matter how well you think you know your data, existing documentation, source code, data models and staff experience are often outdated, incorrect or missing.
Data Profiling is a fundamental activity performed in the early phase of a Data Management project (e.g., Master Data Management, Data Integration, Data Quality, Data Governance). This best practice article primarily applies to Master Data Management (MDM) implementations, although it is tool agnostic for this purpose.
As defined by Gartner, “Data profiling is a technology for discovering and investigating data quality issues, such as duplication, lack of consistency and lack of accuracy and completeness. This is accomplished by analyzing one or multiple data sources and collecting metadata that shows the condition of the data and enables the data steward to investigate the origin of data errors. The tools provide data statistics, such as degree of duplication and ratios of attribute values, both in tabular and graphical formats.”
Informatica views data profiling as a process for assessing the quality and structure of data sources, so that you have a complete and accurate picture of your data. Data profiling verifies that data columns are populated with the expected types of data. If a profile reveals problems in the data, those data quality concerns can be addressed to correct and prevent data anomalies.
The primary reasons to undertake a data profiling exercise in any data-related IT initiative are outlined below.
Before any data can be accurately integrated in an MDM project, its content, quality, and structure must be understood. Data profiling sets expectations for the end results that incoming source data quality can support. Early diagnosis of poor data quality gives organizations the opportunity to resolve issues while the data can still be made usable for further applications. Early detection means earlier mitigation and a higher chance of success. It also improves business user engagement and promotes good data governance.
A customer wanted to master and consolidate product data from six enterprise systems into MDM, and expected over 90% of the data to be deduplicated at go-live. However, after profiling the sources, it was discovered that many critical data elements needed for matching were missing, and less than 60% of the data could be deduplicated without additional cleansing and enrichment. Because of these findings, the customer was able to reset expectations.
Simple descriptive statistics (i.e., column-level data profiling applied in data quality assessments) support the definition of data quality metrics and scorecards, data integration requirements, and matching / identity resolution rules. Understanding the data, requirements, and rules leads to correct, elegant, and sustainable technical design specifications and the ability to create comprehensive, relevant test cases.
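To make the idea of column-level descriptive statistics concrete, here is a minimal sketch of a column profile in pure Python. The function name, the null markers, and the sample values are illustrative assumptions, not part of any specific profiling tool:

```python
from collections import Counter

def profile_column(values):
    """Column-level profile sketch: completeness, cardinality, top values.

    'values' is a list of raw column values; None, "" and the literal
    string "NULL" are treated as missing (an assumption for this sketch).
    """
    non_null = [v for v in values if v not in (None, "", "NULL")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "completeness_pct": round(100 * len(non_null) / len(values), 1) if values else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(3),  # value-frequency ratios
    }

# Hypothetical country-code column with two missing entries
profile = profile_column(["US", "US", "CA", "", None, "US"])
# completeness 66.7%, 2 distinct values, most frequent value ("US", 3)
```

Real profiling tools compute far richer statistics (patterns, inferred types, value distributions), but even a simple profile like this is enough to spot an unexpectedly sparse or low-cardinality column early.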
A customer was confident that its party contact information was complete and of high quality because of checks in place in the onboarding applications. After profiling the data, it was discovered that about 20% of the contact information was invalid.
Data profiling identifies problematic “hot spots” early in the project and triggers development of match mitigation strategies. It also identifies any overlaps in match attributes across sources in support of match rule designs and strategies. The perceived trustworthiness of source data can be validated, leading to timely discussions among key project stakeholders to arrive at the correct trust rules. This ensures that each ‘golden record’ attribute comes from the confirmed source.
Another element of the data profiling analysis is to understand the referential integrity within and across sources (along with other relationships between data structures). If this is not addressed, the project team often faces issues with rejected records, which require additional time and effort to analyze, troubleshoot, and rework code. Assistance may even be required from other non-project teams to modify source data extracts or redevelop source data views.
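A basic cross-source referential integrity check simply looks for child records whose foreign key has no match in the parent set (records that would otherwise be rejected at load time). The sketch below uses hypothetical field names (`account_id`, `customer_id`) purely for illustration:

```python
def orphan_records(child_rows, parent_keys, fk_field):
    """Return child records whose foreign key is absent from the parent
    key set. A sketch of a cross-source referential integrity check."""
    parent = set(parent_keys)
    return [row for row in child_rows if row.get(fk_field) not in parent]

# Hypothetical example: accounts referencing customers from another source
accounts = [
    {"account_id": "A1", "customer_id": "C1"},
    {"account_id": "A2", "customer_id": "C9"},  # C9 does not exist upstream
]
orphans = orphan_records(accounts, ["C1", "C2"], "customer_id")
# one orphan found: account A2 would be rejected at load time
```

Running such a check during profiling surfaces the orphan records before they show up as load rejections, when they are far cheaper to resolve.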
The unique source identifier is a key concept in MDM. If it is not confirmed unique within a source before data is loaded, it can cause unnecessary delays while the unexpected duplication is addressed. Additionally, if an identifier is missing, investigating other columns more closely can identify alternative candidates, or a group of columns that can be combined into a compound unique key that ensures source record uniqueness.
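Confirming uniqueness, and evaluating a candidate compound key when no single column is unique, can be sketched as a simple duplicate count over the candidate key fields. The field names below are hypothetical:

```python
from collections import Counter

def key_duplicates(rows, key_fields):
    """Count rows per candidate key; return only the keys that occur more
    than once. An empty result means the candidate key is unique."""
    keys = [tuple(row[f] for f in key_fields) for row in rows]
    return {key: n for key, n in Counter(keys).items() if n > 1}

# Hypothetical extract where src_id repeats across systems
rows = [
    {"src_id": "1", "system": "ERP"},
    {"src_id": "1", "system": "CRM"},
    {"src_id": "2", "system": "ERP"},
]
dups_single = key_duplicates(rows, ["src_id"])            # src_id alone is not unique
dups_compound = key_duplicates(rows, ["src_id", "system"])  # compound key is unique
```

The same check, run against the full source extract, is how profiling confirms (or refutes) the assumed unique source identifier before any data is loaded.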
A customer in financial services expected to consolidate data based on the Tax ID, on the assumption that the Tax ID was unique. This customer had recently merged with another firm, and the new firm used the Tax ID of the money manager instead of the individual investor whenever an individual investor used a third-party money manager. There were over 35 million accounts, and this scenario impacted fewer than half a million of them. Without data profiling, this scenario would have only come to light during UAT, or worse, after the Production go-live, and it would have impacted data ingestion and consolidation rules.
A lack of unique identifiers can also highlight a potential issue for delta definition and jeopardize required processing SLA times. If a source delta extract (inclusive of deleted, updated, and new records) cannot be provided, or if a delta cannot be derived accurately based on data profiling observations, then full data set processing may be required for each load cycle. This, in turn, increases the time it takes to deliver accurate and complete data to its end consumers. Identifying this early presents another opportunity to address these risks and set expectations for delivery timing.
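When a source cannot supply deltas, one common workaround is to derive them by comparing consecutive full extracts: fingerprint each record, then classify keys as new, updated, or deleted. This is a hedged sketch of that idea, with hypothetical record layouts, not a prescription for any particular tool:

```python
import hashlib

def row_hash(row):
    """Stable fingerprint of a record, built from sorted field/value pairs.
    Assumes all values stringify deterministically (a simplification)."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()

def derive_delta(previous, current, key="id"):
    """Compare two full extracts by key and return new, updated, and
    deleted record keys. 'key' must be a confirmed unique identifier."""
    prev = {r[key]: row_hash(r) for r in previous}
    curr = {r[key]: row_hash(r) for r in current}
    return {
        "new": [k for k in curr if k not in prev],
        "updated": [k for k in curr if k in prev and curr[k] != prev[k]],
        "deleted": [k for k in prev if k not in curr],
    }

prev = [{"id": "1", "name": "Ann"}, {"id": "2", "name": "Bob"}]
curr = [{"id": "1", "name": "Anne"}, {"id": "3", "name": "Cy"}]
delta = derive_delta(prev, curr)
```

Note that this approach depends on the very thing the preceding paragraphs stress: a confirmed unique key. Without one, even derived deltas are unreliable, which is why profiling the identifier comes first.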
Business leaders can make strategic and trusted decisions by leveraging data profiling results to understand the quality, shape and characteristics of source data. This mitigates risks early-on that can prevent data-related IT projects from moving towards digital transformation. When planning Data Management projects, it is important to not skip the critical step of data profiling.