Data governance within an organization is the overall management of the security, integrity, availability, and confidentiality of data.
In essence, data governance policies allow large numbers of users within an organization to use and work on data and datasets, while ensuring privacy and adherence to regulatory requirements and internal compliance rules.
Creating a data governance practice is an often underestimated endeavor, but it is essential for realizing the value of an organization’s data. Data governance also has a strong security component that helps prevent sensitive information from falling into the wrong hands.
Data lakes store large volumes of structured and unstructured data and provide the tools needed to analyze it. The data stored within the lake can come from many sources such as machine data (IoT sensors), product logs, web interactions, operational data, and more. The accumulated data in the data lake serves multiple data processing, reporting, and analytic services that can result in improved customer support and enhanced customer experiences. Within the data lake, the data volume can be huge, the velocity is fast, and the varieties are many. A good governance plan is the difference between an intelligent data lake that accelerates business and a data swamp that keeps you mired in repeating past mistakes.
Implementing a data lake is a serious undertaking, and its success depends on how it is governed. To be successful and realize value, organizations must strike a balance between strict policies that protect data and ones that allow for easy access. Managing access to the data is important, but it should not overly hinder legitimate data access. A poor balance of rules and openness leads to unusable data that is both hard to find and difficult to consume.
A typical data governance effort starts by identifying:
Data governance should be embedded in the overall design of a data lake from the start because adding it later becomes increasingly difficult. Keep in mind that data governance is an ongoing process that continually refines and improves overall data value.
Attention to the quality and availability of data will have a huge impact on a data lake’s success. As users start leveraging the data lake, they will need the assurance that the data is accurate and complete. Metadata plays a huge part in helping users quickly identify the datasets that are the most relevant for their use cases.
Data governance initiatives are judged not only on the quality of the implementation, but also on the benefits they bring to the business. Data is integral to today’s businesses and users need to be able to find it and consume it easily. A data governance strategy makes data available throughout the enterprise and allows users to manage, enhance, and protect business data across functions and units, while finding business insights.
Setting up a data governance strategy starts with determining the goals and requirements of the organization or department. A good concept to guide this process is the CIA triad, which consists of three elements: confidentiality, integrity, and availability. These elements are considered vital components of data security.
Confidentiality deals with the sensitivity of the data, for example privacy-related data, trade secrets, and stock-sensitive data. It is most commonly rated by determining impact (reputation, cost, etc.) and likelihood.
Integrity deals with the importance of the accuracy of the data. A lack of integrity through erroneous modification or deletion could result in severe issues such as incorrect reporting and faulty decisions based on wrong information. Like confidentiality, it is rated by determining impact and likelihood.
Availability deals with the need for the data to be available to the organization’s users and systems. A lack of availability for certain data could lead to issues such as missed opportunities, client dissatisfaction, or a complete shutdown of products or services. Again, this is rated by determining impact and likelihood. Most commonly, each element is rated from 1 to 4.
Using the triad, an organization can rate the data being ingested into the data lake and determine which requirements and policies it needs to set up for storing that data and making it available accordingly.
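As an illustration of how a triad rating could drive policy, here is a minimal Python sketch. The text says only that each element is rated from impact and likelihood on a scale of 1 to 4, so the combination rule in `rate`, the threshold of 3, and the policy names are all assumptions made for this example, not a prescribed method.

```python
import math
from dataclasses import dataclass


def rate(impact: int, likelihood: int) -> int:
    """Combine impact and likelihood (each 1-4) into a single 1-4 rating.

    The formula is an assumption for illustration; organizations often use
    a lookup matrix of their own instead.
    """
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be between 1 and 4, got {value}")
    return math.ceil(impact * likelihood / 4)


@dataclass(frozen=True)
class CiaRating:
    confidentiality: int
    integrity: int
    availability: int


def required_policies(rating: CiaRating) -> list[str]:
    """Map a rating to example (hypothetical) storage and access policies."""
    policies = []
    if rating.confidentiality >= 3:
        policies.append("encrypt at rest and restrict access")
    if rating.integrity >= 3:
        policies.append("enable versioning and validate checksums")
    if rating.availability >= 3:
        policies.append("replicate across zones")
    return policies


# Hypothetical dataset being ingested into the lake.
customer_orders = CiaRating(
    confidentiality=rate(impact=4, likelihood=3),
    integrity=rate(impact=3, likelihood=2),
    availability=rate(impact=4, likelihood=4),
)
print(customer_orders)
print(required_policies(customer_orders))
```

Keeping the rating separate from the policy mapping means the thresholds can be tuned without re-rating every dataset.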
Understanding the many facets of an organization’s data is a valuable outcome of data governance. Organizations can easily recognize what data is stored, the data type (structured or unstructured), where it is stored, its value in large and small terms, and its sensitivity. This process starts with gathering, documenting, and maintaining all the information in the data lake and applying metadata management to list location, type, structure, and content. Accurate metadata management will help business analysts and data scientists search for data, understand its quality, and determine its relationship to other business data.
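To make the metadata idea concrete, the following Python sketch models a tiny in-memory catalog that records location, type, structure, and content for each dataset and supports a simple search. The class names, fields, and storage paths are hypothetical and chosen only for illustration; a real deployment would typically rely on a dedicated catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    location: str      # where the data is stored
    data_type: str     # "structured" or "unstructured"
    structure: str     # file format or schema summary
    description: str   # what the dataset contains
    tags: set[str] = field(default_factory=set)


class Catalog:
    """Minimal in-memory metadata catalog (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetMetadata] = {}

    def register(self, entry: DatasetMetadata) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[DatasetMetadata]:
        """Return entries whose description contains the term or whose tags match it."""
        term = term.lower()
        return [
            entry
            for entry in self._entries.values()
            if term in entry.description.lower()
            or term in {tag.lower() for tag in entry.tags}
        ]


catalog = Catalog()
catalog.register(DatasetMetadata(
    name="sensor_readings",
    location="s3://example-lake/raw/iot/",      # hypothetical path
    data_type="unstructured",
    structure="JSON lines",
    description="Machine data from IoT sensors",
    tags={"iot", "raw"},
))
catalog.register(DatasetMetadata(
    name="web_clicks",
    location="s3://example-lake/curated/web/",  # hypothetical path
    data_type="structured",
    structure="Parquet: user_id, url, ts",
    description="Web interaction events",
    tags={"web", "curated"},
))

for match in catalog.search("iot"):
    print(match.name, match.location)
```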
To understand your data attributes, you will need to identify, categorize, and tag all data based on:
You will need to define and follow rigorous and consistent processes for:
Organizations have multiple goals in implementing a data governance strategy. Informatica has helped organizations reach better efficiency, project alignment, regulatory compliance, and a simpler data landscape through data governance. As businesses approach digital transformation and require more agility in their highly competitive markets, they are bringing data governance to their data lakes and realizing that:
Business analysts and data scientists can easily put business data to work for the business. They will have the needed permissions for easy and automatic access to high-quality data through attribute-based access control (ABAC) and will no longer have to wait for manual data preparation.
Enterprises can avoid data breaches, privacy missteps, and noncompliance penalties. When governance is in place, an organization can determine the impact of a data breach, provide regulatory reporting, comply with requests to be forgotten in a more automated way, respond to data audits such as GDPR requests, and delete data to remain in compliance with regulations.
Democratizing data makes it far more valuable. Good governance ensures that anyone in the business is able to access the data they need easily and quickly and begin realizing data-driven value.
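The attribute-based access control (ABAC) mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not a description of any particular product: the `User` and `Dataset` types, the clearance attribute, and the rule that data rated 3 or higher is limited to its owning department are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    department: str
    clearance: int         # highest confidentiality rating the user may read


@dataclass(frozen=True)
class Dataset:
    name: str
    owning_department: str
    confidentiality: int   # 1 (low sensitivity) to 4 (high sensitivity)


def can_read(user: User, dataset: Dataset) -> bool:
    """Decide access from attributes alone, with no per-user grants.

    Illustrative rule: the user's clearance must cover the dataset's rating,
    and data rated 3 or higher is also limited to the owning department.
    """
    if user.clearance < dataset.confidentiality:
        return False
    if dataset.confidentiality >= 3:
        return user.department == dataset.owning_department
    return True


# Hypothetical users and datasets.
analyst = User(name="dana", department="marketing", clearance=2)
hr_admin = User(name="lee", department="hr", clearance=4)
web_clicks = Dataset(name="web_clicks", owning_department="marketing", confidentiality=2)
payroll = Dataset(name="payroll", owning_department="hr", confidentiality=4)

for user in (analyst, hr_admin):
    for dataset in (web_clicks, payroll):
        print(user.name, dataset.name, can_read(user, dataset))
```

Because the decision depends only on attributes, a newly registered dataset becomes available to every eligible user as soon as it is tagged, without anyone issuing individual grants.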
Informatica offers an enterprise data governance solution that can be used on-premises or in the cloud, with traditional data or modern data storage use cases, to meet the needs of both the business and IT. The end-to-end intelligent data governance solution brings together people, processes, and systems with a holistic, collaborative approach that delivers strategic business outcomes.