Last Updated: Oct 12, 2023

Enterprises are building intelligent data lakes to drive more value from their raw data with next generation analytics. Data lake analytics are proving to be change agents that transform business by uncovering opportunities, detecting problems, accelerating product innovations, and enhancing customer experiences.

The rewards from implementing a next generation analytic data lake are many, but it is not an easy task. To remove some of the confusion and complexity, this roadmap explains the seven steps necessary to implement a next generation analytic data lake within any enterprise.

1. Analyze and Evaluate the Business

Much like developing a business plan, data lake analytics starts with a clear analysis and evaluation of business challenges. During this initial phase, you frame the problem and identify the use cases that will have the biggest positive impact on the business. The best business use cases include input from multiple stakeholders, such as the analytics team, line-of-business leaders, data administrators, and data scientists, so you may want to invite all of these team members to a workshop. Together, the team can develop a business use case with a strong problem statement, a description of the problem behavior, complications, business impact, and problem history with examples. Keep an open mind during this discovery phase: it is likely to generate unexpected results, and those results can lead to real business transformation.

This stage also includes gathering sample data and performing data discovery and analysis. Data scientists and analytic developers will gather and prepare data across multiple segments and from all the possible data sources such as sensors, logs, and enterprise data. The goal is to find trustworthy data that has the trend or event sequence that is the root cause of the problem pattern or behavior.

During data discovery and analysis, the data will need to be both cleansed and validated to ensure accurate results.
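To make the cleansing and validation step concrete, here is a minimal sketch in plain Python. The field names ("sensor_id", "reading") and validity rules are hypothetical examples, not part of any Informatica product.

```python
# Hypothetical raw sample data gathered during discovery; real sources would
# be sensors, logs, or enterprise systems.
raw_records = [
    {"sensor_id": "A1", "reading": 21.5},
    {"sensor_id": "A1", "reading": None},     # missing value -> discard
    {"sensor_id": "",   "reading": 19.8},     # missing key -> discard
    {"sensor_id": "B2", "reading": -9999.0},  # out-of-range sentinel -> discard
    {"sensor_id": "B2", "reading": 22.1},
]

def is_valid(record, lo=-50.0, hi=150.0):
    """Validate that the record has an id and a plausible reading."""
    if not record.get("sensor_id"):
        return False
    reading = record.get("reading")
    if reading is None:
        return False
    return lo <= reading <= hi

# Cleansing: keep only records that pass validation.
cleansed = [r for r in raw_records if is_valid(r)]
print(len(cleansed))  # 2 valid records survive cleansing
```

In practice the validation rules come from the data discovery work itself: profiling the sample data reveals which sentinels, ranges, and required fields define "trustworthy" for each source.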

Informatica’s Axon Data Governance and Enterprise Data Catalog can help gather sample data and perform discovery and analysis. To prepare and organize sample data, consider Informatica Cloud Data Integration and Informatica Data Quality, which can perform both data cleansing and validation.

2. Develop Business Hypothesis

During this phase, business users and data analysts will assemble and consolidate multiple use cases. They will test the hypotheses from the use cases and perform a fit-gap analysis to determine whether the data is appropriate and useful for each use case.

Informatica’s Axon Data Governance and Enterprise Data Catalog can help assemble and consolidate use cases.

3. Develop Analytics Approach

Multiple analytical models exist, such as time-series forecasting, classification, and clustering. Linear programming, for example, is the best analytic approach for finding the most appealing price point for a product, while a statistical model is most appropriate for a random process, such as setting up alerts to detect credit card fraud.
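As a toy illustration of the pricing use case, the sketch below searches for the revenue-maximizing price under a hypothetical linear demand model. A real deployment would use a proper linear-programming or statistical model; this brute-force search is only meant to make the idea of matching a technique to a use case concrete.

```python
def demand(price):
    """Hypothetical linear demand curve: demand falls as price rises."""
    return max(0.0, 1000.0 - 40.0 * price)

# Candidate price points in steps of 0.5, from 0.5 to 24.5.
candidate_prices = [p / 2 for p in range(1, 50)]

# Pick the price that maximizes revenue = price * demand(price).
best_price = max(candidate_prices, key=lambda p: p * demand(p))
print(best_price)  # revenue p * (1000 - 40p) peaks at p = 12.5
```

A clustering or classification use case would call for an entirely different sketch; the point is that the data pattern (deterministic trade-off here, random process for fraud) dictates the technique.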

You will need to evaluate the use case to determine the right analytic technique based on data patterns and behaviors.

Once the correct analytic technique is chosen to address the business problem, you will perform a fit-gap analysis that will identify key data and components that fit the model and gaps that must be filled. Based on the findings from the fit-gap analysis, you can identify the appropriate analytics algorithm and modeling technique for the business use case. Data patterns and segmentation will inform these decisions.

Informatica’s Axon Data Governance and Enterprise Data Catalog can help evaluate and curate use cases. To perform the fit-gap analysis, consider Informatica Cloud Data Integration, Informatica Data Quality, and a modeling tool, which can help decide the appropriate data algorithms needed to solve the use case.

4. Build and Prepare Data Sets

The most significant phase for the analytics team is developing data sets that capture both transactional data and the relationships between data. To build the data sets, the data team must acquire data and understand its characteristics, such as volume, frequency, availability, and complexity. Noise must also be discarded from the data, and the remaining data must be labeled correctly with metadata.

Both data at rest and in motion must be ingested by building pipelines that ingest batch and streaming data, creating sampling techniques, and performing data cleansing and data validation.

The collected data patterns should be complete, rich, random, reliable, and consistent. Randomness assures that the samples represent statistical characteristics of the complete data set. Unstructured and structured data must be integrated together and in a consumable format.
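The randomness requirement above can be sketched with a uniform random sample: drawn without replacement, it preserves the statistical characteristics of the full data set. The synthetic measurements below are hypothetical stand-ins for real lake data.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Synthetic "full data set": 10,000 measurements cycling through 0..99.
full_data = [float(i % 100) for i in range(10_000)]

# Uniform random sample without replacement, as a sampling technique
# for the data-set preparation step.
sample = random.sample(full_data, k=500)

full_mean = sum(full_data) / len(full_data)
sample_mean = sum(sample) / len(sample)
print(round(full_mean, 1))  # 49.5 -- the population mean
# The sample mean tracks the population mean closely, which is exactly
# what "the samples represent statistical characteristics" means here.
```

With 500 uniformly drawn points, the sample mean lands within a fraction of a unit of the population mean; a biased (non-random) sample would not give that guarantee.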

Informatica’s Axon Data Governance, Enterprise Data Catalog, PowerExchange Change Data Capture, Informatica Cloud Data Integration, and Informatica Cloud Data Quality can help build pipelines to ingest batch and NRT data. To perform data cleansing and validation, consider Informatica Cloud Data Quality. Informatica Cloud Data Integration, Axon Data Governance, and Enterprise Data Catalog can help build pipelines to ingest streaming and real-time data, and Informatica Cloud Data Quality can perform both data cleansing and validation for real-time and streaming data.

5. Select and Build the Analytical Model

Using the pre-defined analytic technique, it is time to build the analytical models. For a successful analytical model, the analytics team should select algorithms that use small but meaningful data samples and test them rigorously. Crunching the data sets with rigorous testing will validate outcomes. You will also want to apply data visualization techniques to uncover patterns and trends. Finally, you will need to review results and scores and train the models so they can mature. The data is scored based on validation/comparison, aggregation/summarization, maximization/minimization, rare-event detection, or unusual pattern detection. The data scoring will depend on the chosen analytic approach.
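The "small but meaningful samples, tested rigorously" idea can be sketched as a simple holdout evaluation: fit a deliberately trivial model on part of the data and score it on the held-out rest. The data, the mean-predictor "model", and the error threshold are all illustrative assumptions.

```python
import random

random.seed(0)
# Hypothetical measurements fluctuating around a level of 10.
data = [10.0 + random.gauss(0, 1) for _ in range(200)]

# Holdout split: train on 150 points, score on the remaining 50.
random.shuffle(data)
train, test = data[:150], data[150:]

# Trivial "model": predict the training mean for every test point.
prediction = sum(train) / len(train)

# Score on held-out data with mean absolute error (a validation/comparison
# style of scoring, in the article's terms).
mae = sum(abs(x - prediction) for x in test) / len(test)
print(round(prediction))  # ~10, the underlying level of the series
```

A real model replaces the mean predictor, and the scoring function follows from the chosen analytic approach, but the validate-on-unseen-data loop stays the same.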

6. Build the Scalable, High-Performance, Production-ready System

The analytics and architecture teams need to architect and develop an end-state solution that aligns with current business needs and adapts to future ones. The goals are to operationalize the pipelines and optimize the data lake ecosystem for scalability. Key questions to ask are:

  • How complex are the data sets and algorithms?
  • Does the analytical model require specialized configurations?
  • Can the algorithm perform across big data and IoT dimensions?
  • Will the system scale as more users, data, and models are added?
  • Does the model require in-memory or in-database processing?
  • How flexible is the platform in its ability to inherit new algorithms?
  • Will the outcome of the analysis be a report, an alert, or a simulation model?

Informatica has multiple products to help operationalize the pipelines. They include Informatica Cloud Data Integration, PowerExchange Change Data Capture, Axon Data Governance, Enterprise Data Catalog, and Informatica Cloud Data Quality.

7. Measure and Monitor

The final stage is not really final: it is a continuous process that will always refine and improve the analytic models. It includes measuring effectiveness by correlating business outcomes and insights with model results. Based on those results, you will calibrate the models through continuous discovery and analysis. You can monitor the models with metadata and visualization methods. Finally, establish a feedback loop for continuous improvement.

The most effective data models are repeatable. Following these steps ensures that the business is creating data lake analytical models that solve business problems and deliver real value to the organization.
