Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
When starting an MDM program, data profiling and discovery are essential to its success. With IDMC’s Cloud Data Governance and Catalog (CDGC), profiling and discovery are systematic and comprehensive. Identified data sources can be scanned in full, profiled, documented against enterprise glossary terms, and evaluated with data quality rules to quantify the current state of data quality.
Adding CDGC to the MDM process supports MDM development. Tables are profiled in a common setting, fields can be classified with out-of-the-box sensitivity rules, and custom metadata tags can be added to identify matching fields. Users can also view data lineage for all sources and transformations, both before and after MDM development.
While gathering requirements for the MDM domain, build a glossary hierarchy that corresponds to those requirements. Glossary terms can then be documented and tagged as part of the MDM data model.
Using CDGC workflows, these domains and fields can be socialized with stakeholders, with comments captured in the workflow ticket and approvals or disagreements stored in one location. There is no more searching through emails, meeting minutes, or spreadsheets to find the decisions and the revision history behind the approved definitions and metamodel.
With the data glossary in progress, critical data elements can be tagged. These elements are central to building your MDM ingress, match, and merge rules. CDGC lineage lets you trace where each data element originates, which can inform survivorship rules.
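As a rough illustration of how lineage findings might feed survivorship, the Python sketch below keeps the value from the most trusted source and breaks ties by recency. It is not a CDGC or MDM API: the source names, the `SOURCE_TRUST` ranking, and the `Candidate` and `survive` helpers are hypothetical, and a real ranking would come from the lineage analysis and the approvals captured in CDGC.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical trust ranking taken from a lineage review (lower number = more trusted).
SOURCE_TRUST = {"crm": 1, "salesforce": 2, "legacy_erp": 3}

@dataclass
class Candidate:
    source: str
    value: Optional[str]
    updated: date

def survive(candidates: list[Candidate]) -> Optional[str]:
    """Return the surviving value: most trusted source first, most recent update as tie-break."""
    populated = [c for c in candidates if c.value]  # skip None and empty strings
    if not populated:
        return None
    unknown_rank = len(SOURCE_TRUST) + 1  # sources missing from the ranking sort last
    best = min(
        populated,
        key=lambda c: (SOURCE_TRUST.get(c.source, unknown_rank), -c.updated.toordinal()),
    )
    return best.value

emails = [
    Candidate("salesforce", "j.doe@example.com", date(2024, 3, 1)),
    Candidate("crm", None, date(2024, 4, 1)),
    Candidate("legacy_erp", "jdoe@old.example.com", date(2024, 5, 1)),
]
print(survive(emails))  # j.doe@example.com
```

Here the `crm` record is skipped because its email is empty, so the value from the next most trusted source, `salesforce`, survives.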
Once domain metadata has been scanned in CDGC, the tables that are part of the ingress process can be reviewed. Tables can be certified in CDGC, marking them as reviewed and deemed valid, clean, and critical.
With scanned metadata in CDGC, you can build representations of database systems and tables. Each MDM table becomes a data set built from a selection of the table's fields. These data sets can go through the CDGC workflow approval process to capture decisions and approvals for each ingress table.
Building data sets and systems in CDGC provides a graphical view of the data model and of data lineage. MDM developers get a quick search function to find information about ingress tables and a quick profile view of fields, both of which expedite development.
When MDM domains are established and their metadata is scanned in CDGC, profiling is included in the metadata scan. This profiling provides the following information:
Using the CDGC profile and the identified key data elements, you can narrow the selection of tables for full data profiling. This avoids the extra work of profiling tables that will not be useful in the MDM project.
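A minimal sketch of that narrowing step, in plain Python rather than any CDGC API: the table contents, the column-to-term tags in `TAGS`, and the set of critical terms are all hypothetical. Only tables with at least one column tagged as a critical data element are selected, and each selected column then gets a basic profile (null percentage and distinct non-null count).

```python
from typing import Any

# Hypothetical scanned tables: table name -> list of row dicts.
TABLES: dict[str, list[dict[str, Any]]] = {
    "crm.customer": [
        {"cust_id": 1, "email": "a@example.com", "state": "TX"},
        {"cust_id": 2, "email": None, "state": "Texas"},
        {"cust_id": 3, "email": "c@example.com", "state": "TX"},
    ],
    "crm.audit_log": [{"event": "login"}, {"event": "logout"}],
}

# Hypothetical glossary tags: (table, column) -> glossary term.
TAGS = {("crm.customer", "email"): "Email Address", ("crm.customer", "state"): "State"}
CRITICAL_TERMS = {"Email Address", "State"}

def tables_to_profile() -> list[str]:
    """Keep only tables with at least one column tagged as a critical data element."""
    return sorted({table for (table, _), term in TAGS.items() if term in CRITICAL_TERMS})

def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Return null percentage and distinct non-null value count per column."""
    columns = {col for row in rows for col in row}
    result = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        result[col] = {
            "null_pct": 100 * (len(values) - len(non_null)) / len(values),
            "distinct": len(set(non_null)),
        }
    return result

for table in tables_to_profile():
    print(table, profile(TABLES[table]))
```

In this toy data, `crm.audit_log` carries no critical data elements and is skipped, while the `state` profile shows two distinct spellings of the same state, an early hint that a standardization rule is needed.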
From the data profiling results, data quality rules can be identified and built for reference table checks, standardization, and null-value detection. The rule specifications can be added to the data profile to give a quick view of data quality. These rules also help identify data quality issues that may need to be resolved before match and merge testing begins.
Using the data quality rule results, you can identify reference tables that will help standardize data on ingress.
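The three kinds of rules named above can be pictured as pass/fail checks whose score is the percentage of values that pass. This is a plain-Python sketch, not CDGC rule specification syntax; the helper names, the reference value set, and the phone pattern are hypothetical.

```python
import re
from typing import Callable, Optional

Rule = Callable[[Optional[str]], bool]

def not_null(value: Optional[str]) -> bool:
    """Fail None and blank strings."""
    return value is not None and value.strip() != ""

def in_reference(valid: set[str]) -> Rule:
    """Pass only values found in a reference table (case-sensitive)."""
    return lambda value: value in valid

def matches_pattern(pattern: str) -> Rule:
    """Pass values that fully match a standardization pattern."""
    compiled = re.compile(pattern)
    return lambda value: value is not None and compiled.fullmatch(value) is not None

def score(values: list[Optional[str]], rule: Rule) -> float:
    """Percentage of values that pass the rule."""
    if not values:
        return 100.0
    return 100 * sum(rule(v) for v in values) / len(values)

states = ["TX", "Texas", None, "CA"]
phones = ["512-555-0100", "5125550101", "512-555-0102"]

print(score(states, not_null))                    # 75.0
print(score(states, in_reference({"TX", "CA"})))  # 50.0
print(score(phones, matches_pattern(r"\d{3}-\d{3}-\d{4}")))
```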
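Once a reference table is agreed, it can drive standardization on ingress: known variants map to one canonical code, and anything unmapped is flagged for review instead of passing through silently. A minimal Python sketch, assuming a hypothetical `STATE_REFERENCE` table keyed by upper-cased variants:

```python
from typing import Optional

# Hypothetical reference table: variant (upper-cased) -> canonical code.
STATE_REFERENCE = {"TX": "TX", "TEXAS": "TX", "TEX": "TX", "CA": "CA", "CALIFORNIA": "CA"}

def standardize(value: Optional[str]) -> tuple[Optional[str], bool]:
    """Return (standardized value, matched?) for one incoming value."""
    if value is None:
        return None, False
    canonical = STATE_REFERENCE.get(value.strip().upper())
    return (canonical, True) if canonical else (value, False)

incoming = ["Texas", " tx ", "Calif.", None]
results = [standardize(v) for v in incoming]
# Unmatched non-null values are candidates for new reference table entries.
unmatched = [v for v, (_, ok) in zip(incoming, results) if not ok and v is not None]
print(results)
print("needs review:", unmatched)
```

Returning the original value with a `False` flag, rather than dropping it, keeps unmatched data visible so that the reference table can be extended after review.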
Data quality rules built for the critical tables can be applied to all metadata scanned in CDGC. Each rule is associated with a specific glossary term, and each technical field associated with that term produces a separate data quality occurrence. Each time the database is scanned, the rules run and CDGC captures a history of the scores, which lets you track and verify the success of a data cleanup effort.
Data quality results in CDGC are aggregated both at the table level and for each glossary term. For example, you can review data quality across all fields in the table crm.customer as well as in the table salesforce.client. This comparison can help define survivorship and golden record requirements.
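To make the occurrence-and-history idea concrete: one rule attached to a glossary term yields one occurrence per technical field tagged with that term, and each scan appends a score to that occurrence's history. The sketch below is plain Python, not CDGC's data model; the rule name, field names, scan dates, and scores are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical glossary-term assignments: term -> technical fields tagged with it.
TERM_FIELDS = {"Email Address": ["crm.customer.email", "salesforce.client.email"]}

# Score history per occurrence: (rule, field) -> list of (scan date, score).
history: dict[tuple[str, str], list[tuple[date, float]]] = defaultdict(list)

def record_scan(rule: str, term: str, scan_date: date, scores: dict[str, float]) -> None:
    """Append one scan's score for every field tagged with the rule's glossary term."""
    for field in TERM_FIELDS[term]:
        history[(rule, field)].append((scan_date, scores[field]))

def trend(rule: str, field: str) -> float:
    """Change in score between the first and latest scan (positive = improving)."""
    runs = history.get((rule, field), [])
    if len(runs) < 2:
        return 0.0
    return runs[-1][1] - runs[0][1]

record_scan("email_not_null", "Email Address", date(2024, 1, 1),
            {"crm.customer.email": 80.0, "salesforce.client.email": 92.0})
record_scan("email_not_null", "Email Address", date(2024, 2, 1),
            {"crm.customer.email": 95.0, "salesforce.client.email": 93.0})
print(trend("email_not_null", "crm.customer.email"))  # 15.0
```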
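The two aggregation views can be sketched as grouping the same per-field scores two ways. This is plain Python with hypothetical scores; an unweighted mean stands in for whatever aggregation CDGC actually applies, which this sketch does not claim to reproduce.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical latest scores: (table, column, glossary term, score).
SCORES = [
    ("crm.customer", "email", "Email Address", 95.0),
    ("crm.customer", "phone", "Phone Number", 70.0),
    ("salesforce.client", "email", "Email Address", 85.0),
    ("salesforce.client", "phone", "Phone Number", 90.0),
]

def aggregate(key_index: int) -> dict[str, float]:
    """Average scores grouped by table (key_index=0) or glossary term (key_index=2)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in SCORES:
        groups[row[key_index]].append(row[3])
    return {key: mean(vals) for key, vals in groups.items()}

by_table = aggregate(0)
by_term = aggregate(2)
print(by_table)  # {'crm.customer': 82.5, 'salesforce.client': 87.5}
print(by_term)   # {'Email Address': 90.0, 'Phone Number': 80.0}
```

In this made-up data, salesforce.client scores higher overall, yet crm.customer has the better email values. That kind of per-attribute comparison is what can support a survivorship rule preferring one source for email and another for phone.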
Leveraging Cloud Data Governance and Catalog while building MDM requirements provides guidance on what data is available, where the data comes from, and how the data could be used.