Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
A Data Lake is a central repository that allows an organization to store all of its data, structured and unstructured, in volume. Data is typically stored in a raw format (i.e., as is) without first being structured. From there it can be scrubbed and optimized for the purpose at hand, be it dashboards for interactive analytics, downstream machine learning, or analytics applications. Ultimately, the Data Lake enables data teams to work collectively on the same information, which can be curated and secured for the right team or operation.
Well-defined roles and responsibilities are required for the data lake to succeed. They ensure that the data lake is created optimally, that data is ingested into it, and that access to it is secure.
This best practice describes the roles and responsibilities that are required for implementing a Data Lake.
The Data Owner is responsible for both the possession of the information in the data lake and the information itself. Control of the information includes the ability to access, create, modify, package, derive benefit from, or remove the data, and the right to grant these access privileges to others.
The Data Governance Lead coordinates within the organization and sets up the data governance process to ensure that the data in the data lake is "fit-for-purpose" for use in business processes, decision-making, and business models, and that any data issues are resolved.
The Data Steward ensures that the data and metadata in the data lake comply with the data governance process defined by the organization.
The Data Architect assesses the data needs and defines structures and processes to manage, store, and secure the data in the data lake by working with business and technical teams, and other relevant professionals within the organization.
The Cloud Administrator assesses the infrastructure requirements and creates the required infrastructure in the cloud by working with the Data Lake Owner, the Data Architect, and the Database Administrators.
The Database Administrator creates and maintains the required data structures in the data lake by working with the Data Architect. The Database Administrator also optimizes the performance of the data lake and implements its security policies.
The Informatica Lead defines the processes to ingest data into the data lake, to prepare and curate the data, and to consume the data from the data lake. The Informatica Lead has good knowledge of Informatica Cloud (IICS) for data integration, data quality, and streaming; EDC for metadata, lineage, and relationships; and Axon for data governance.
The Informatica Lead can also form a team of developers to develop the required processes.
The Reporting Lead defines the processes to consume the data from the data lake using reporting tools and can also form a team to develop the required reports.
The Operations Lead is responsible for monitoring the jobs scheduled for ingestion and the maintenance of the data lake. The Operations Lead can also form a team of operations analysts based on the skill sets required.
| Role | Data Lake | Data Lake | Data Lake | IICS (CDS/CDQ) | EDC | Axon | Reporting Tool |
| Data Lake Owner | C/M | C/M | R/W/X/M | R/W/X/M | R/W/X/M | R/W/X/M | |
| Data Governance Lead | R/W | R | R/W | R/W | | | |
| Data Steward | R/W | R | R/W | R/W | | | |
| Data Architect | R/W | | | | | | |
| Database Administrator | C/M/R/W/X | | | | | | |
| Cloud (AWS/Azure/GCP) Administrator | C/M/R/W/X | C/M/R/W/X | | | | | |
| Informatica Lead | R/W/X/M | I/R/W/X/M | I/R/W/X/M | | | | |
| Reporting Lead | I/R/W/X/M | | | | | | |
| Operations Lead | R/X/M | R/X/M | R/X/M | R/X/M | | | |
C = Create, M = Maintain/Monitor, R = Read, W = Write, X = Execute, I = Install
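
The permission codes in the legend can be handled programmatically, for example when checking access requests against the matrix. The following is a minimal sketch in Python, not part of any Informatica product: the names `Permission`, `parse_permissions`, and `is_allowed` are hypothetical, and the sample cell strings only illustrate the code format without assuming which component they apply to.

```python
from enum import Enum


class Permission(Enum):
    """Permission codes as defined in the legend."""
    CREATE = "C"
    MAINTAIN = "M"  # Maintain/Monitor
    READ = "R"
    WRITE = "W"
    EXECUTE = "X"
    INSTALL = "I"


def parse_permissions(cell: str) -> frozenset[Permission]:
    """Turn a matrix cell such as 'C/M/R/W/X' into a set of permissions.

    An empty cell yields an empty set, meaning no access.
    Raises ValueError for a code that is not in the legend.
    """
    codes = [code.strip() for code in cell.split("/") if code.strip()]
    return frozenset(Permission(code) for code in codes)


def is_allowed(cell: str, needed: Permission) -> bool:
    """Check whether a matrix cell grants the needed permission."""
    return needed in parse_permissions(cell)


# Illustrative cell values (hypothetical usage, not tied to a specific column).
print(is_allowed("R/X/M", Permission.EXECUTE))  # True
print(is_allowed("R/X/M", Permission.WRITE))    # False
print(is_allowed("", Permission.READ))          # False
```

Using sets rather than raw strings makes the check independent of the order in which codes are written, so "R/W/X/M" and "M/X/W/R" are treated the same.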