Last Updated: Oct 12, 2023

Enterprises are building intelligent data lakes to drive more value from their raw data with next generation analytics. Data lake analytics are proving to be change agents that transform business by uncovering opportunities, detecting problems, accelerating product innovations, and enhancing customer experiences.

The rewards from implementing a next generation analytic data lake are many, but it is not an easy task. To remove some of the confusion and complexity, this roadmap explains the seven steps necessary to implement a next generation analytic data lake within any enterprise.

1. Analyze and Evaluate the Business

Much like developing a business plan, data lake analytics starts with a clear analysis and evaluation of business challenges. During this initial phase, you frame the problem and identify the use cases that will have the biggest positive impact on the business. The best business use cases include input from multiple stakeholders, such as the analytics team, line-of-business leaders, data administrators, and data scientists, so you may want to invite all of these team members to a workshop. Together, the team can develop a business use case with a strong problem statement, a description of the problem behavior, complications, business impact, and problem history with examples. Keep an open mind during this discovery phase: it is likely to generate unexpected results, and those results can lead to real business transformation.

This stage also includes gathering sample data and performing data discovery and analysis. Data scientists and analytic developers will gather and prepare data across multiple segments and from all the possible data sources such as sensors, logs, and enterprise data. The goal is to find trustworthy data that has the trend or event sequence that is the root cause of the problem pattern or behavior.

During data discovery and analysis, the data will need to be both cleansed and validated to ensure accurate results.
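To make the cleansing and validation step concrete, here is a minimal sketch in plain Python. The field names ("sensor_id", "reading") and validity rules are hypothetical examples, not part of any Informatica product.

```python
# Hypothetical raw sample data gathered during discovery; real sources would
# be sensors, logs, or enterprise systems.
raw_records = [
    {"sensor_id": "A1", "reading": 21.5},
    {"sensor_id": "A1", "reading": None},     # missing value -> discard
    {"sensor_id": "",   "reading": 19.8},     # missing key -> discard
    {"sensor_id": "B2", "reading": -9999.0},  # out-of-range sentinel -> discard
    {"sensor_id": "B2", "reading": 22.1},
]

def is_valid(record, lo=-50.0, hi=150.0):
    """Validate that the record has an id and a plausible reading."""
    if not record.get("sensor_id"):
        return False
    reading = record.get("reading")
    if reading is None:
        return False
    return lo <= reading <= hi

# Cleansing: keep only records that pass validation.
cleansed = [r for r in raw_records if is_valid(r)]
print(len(cleansed))  # 2 valid records survive cleansing
```

In practice the validation rules come from the data discovery work itself: profiling the sample data reveals which sentinels, ranges, and required fields define "trustworthy" for each source.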

Informatica’s Axon Data Governance and Enterprise Data Catalog can help gather sample data and perform discovery and analysis. To prepare and organize sample data, consider Informatica Cloud Data Integration and Informatica Data Quality, which can perform both data cleansing and validation.

2. Develop Business Hypothesis

During this phase, business users and data analysts will assemble and consolidate multiple use cases. They will test the hypotheses from the use cases and perform a fit-gap analysis to determine whether the data is appropriate and useful for each use case.

Informatica’s Axon Data Governance and Enterprise Data Catalog can help assemble and consolidate use cases.

3. Develop Analytics Approach

Multiple analytical models exist, such as time-series forecasting, classification, and clustering. Linear programming, for example, is the best analytic approach for finding the most appealing price point for a product, while a statistical model is most appropriate for a random process, such as setting up alerts to detect credit card fraud.
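As a toy illustration of the pricing use case, the sketch below searches for the revenue-maximizing price under a hypothetical linear demand model. A real deployment would use a proper linear-programming or statistical model; this brute-force search is only meant to make the idea of matching a technique to a use case concrete.

```python
def demand(price):
    """Hypothetical linear demand curve: demand falls as price rises."""
    return max(0.0, 1000.0 - 40.0 * price)

# Candidate price points in steps of 0.5, from 0.5 to 24.5.
candidate_prices = [p / 2 for p in range(1, 50)]

# Pick the price that maximizes revenue = price * demand(price).
best_price = max(candidate_prices, key=lambda p: p * demand(p))
print(best_price)  # revenue p * (1000 - 40p) peaks at p = 12.5
```

A clustering or classification use case would call for an entirely different sketch; the point is that the data pattern (deterministic trade-off here, random process for fraud) dictates the technique.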

You will need to evaluate the use case to determine the right analytic technique based on data patterns and behaviors.

Once the correct analytic technique is chosen to address the business problem, you will perform a fit-gap analysis that will identify key data and components that fit the model and gaps that must be filled. Based on the findings from the fit-gap analysis, you can identify the appropriate analytics algorithm and modeling technique for the business use case. Data patterns and segmentation will inform these decisions.

Informatica’s Axon Data Governance and Enterprise Data Catalog can help evaluate and curate use cases. To perform the fit-gap analysis, consider Informatica Cloud Data Integration, Informatica Data Quality, and a modeling tool, which can help decide the appropriate data algorithms needed to solve the use case.

4. Build and Prepare Data Sets

The most significant phase for the analytics team is developing data sets that capture both transactional data and the relationships between data. To build the data sets, the data team must acquire data and understand its characteristics, such as volume, frequency, availability, and complexity. Noise must also be discarded from the data, and the remaining data must be labeled correctly with metadata.

Both data at rest and in motion must be ingested by building pipelines that ingest batch and streaming data, creating sampling techniques, and performing data cleansing and data validation.

The collected data patterns should be complete, rich, random, reliable, and consistent. Randomness assures that the samples represent statistical characteristics of the complete data set. Unstructured and structured data must be integrated together and in a consumable format.
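The randomness requirement above can be sketched with a uniform random sample: drawn without replacement, it preserves the statistical characteristics of the full data set. The synthetic measurements below are hypothetical stand-ins for real lake data.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Synthetic "full data set": 10,000 measurements cycling through 0..99.
full_data = [float(i % 100) for i in range(10_000)]

# Uniform random sample without replacement, as a sampling technique
# for the data-set preparation step.
sample = random.sample(full_data, k=500)

full_mean = sum(full_data) / len(full_data)
sample_mean = sum(sample) / len(sample)
print(round(full_mean, 1))  # 49.5 -- the population mean
# The sample mean tracks the population mean closely, which is exactly
# what "the samples represent statistical characteristics" means here.
```

With 500 uniformly drawn points, the sample mean lands within a fraction of a unit of the population mean; a biased (non-random) sample would not give that guarantee.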

Informatica’s Axon Data Governance, Enterprise Data Catalog, PowerExchange Change Data Capture, Informatica Cloud Data Integration, and Informatica Cloud Data Quality can help build pipelines to ingest batch and NRT data. To perform data cleansing and validation, consider Informatica Cloud Data Quality. Informatica Cloud Data Integration, Axon Data Governance, and Enterprise Data Catalog can help build pipelines to ingest streaming and real-time data, and Informatica Cloud Data Quality can perform both data cleansing and validation for real-time and streaming data.

5. Select and Build the Analytical Model

Using the pre-defined analytic technique, it is time to build the analytical models. For a successful analytical model, the analytics team should select algorithms that use small but meaningful data samples and test them rigorously. Crunching the data sets with rigorous testing will validate outcomes. You will also want to apply data visualization techniques to uncover patterns and trends. Finally, you will need to review results and scores and train the models so they can mature. The data is scored based on validation/comparison, aggregation/summarization, maximization/minimization, rare-event detection, or unusual pattern detection. The data scoring will depend on the chosen analytic approach.
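The "small but meaningful samples, tested rigorously" idea can be sketched as a simple holdout evaluation: fit a deliberately trivial model on part of the data and score it on the held-out rest. The data, the mean-predictor "model", and the error threshold are all illustrative assumptions.

```python
import random

random.seed(0)
# Hypothetical measurements fluctuating around a level of 10.
data = [10.0 + random.gauss(0, 1) for _ in range(200)]

# Holdout split: train on 150 points, score on the remaining 50.
random.shuffle(data)
train, test = data[:150], data[150:]

# Trivial "model": predict the training mean for every test point.
prediction = sum(train) / len(train)

# Score on held-out data with mean absolute error (a validation/comparison
# style of scoring, in the article's terms).
mae = sum(abs(x - prediction) for x in test) / len(test)
print(round(prediction))  # ~10, the underlying level of the series
```

A real model replaces the mean predictor, and the scoring function follows from the chosen analytic approach, but the validate-on-unseen-data loop stays the same.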

6. Build the Scalable, High-Performance, Production-ready System

The analytics and architecture teams need to architect and develop an end-state solution that aligns with current business needs and adapts to future ones. The goals are to operationalize the pipelines and optimize the data lake ecosystem for scalability. Key questions to ask are:

  • How complex are the data sets and algorithms?
  • Does the analytical model require specialized configurations?
  • Can the algorithm perform across big data and IoT dimensions?
  • Will the system scale as more users, data, and models are added?
  • Does the model require in-memory or in-database processing?
  • How flexible is the platform in its ability to inherit new algorithms?
  • Will the outcome of the analysis be a report, an alert, or a simulation model?

Informatica has multiple products to help operationalize the pipelines. They include Informatica Cloud Data Integration, PowerExchange Change Data Capture, Axon Data Governance, Enterprise Data Catalog, and Informatica Cloud Data Quality.

7. Measure and Monitor

The final stage is not really final: it is a continuous process that will always refine and improve the analytic models. It includes measuring effectiveness by correlating business outcomes and insights with model results. Based on those results, you will calibrate the models through continuous discovery and analysis. You can monitor the models with metadata and visualization methods. Finally, establish a feedback loop for continuous improvement.

The most effective data models are repeatable. Following these steps ensures that the business is creating data lake analytical models that solve business problems and deliver real value to the organization.
