Ensuring Success
Advanced analytics tools and projects are evolving rapidly, but it’s never been harder to make analytics projects work. More data, more sources, more structures, more users, and more use cases make modern analytics initiatives incredibly complex, but not impossible.
Success depends on how you re-architect your data landscape and on identifying early projects that can immediately deliver the significant value that comes from a modernized analytics infrastructure. Unlike a wireless network upgrade, which can be a quick win, a modern analytics initiative is a journey that takes considerable time and resources because of its complexity and the sheer volume of company data, which continues to grow daily. At the journey’s end, the business will gain the benefits of digital transformation, and the path to that long-term vision should be punctuated with short-term wins.
How you implement modern analytics will depend on the current state of your data infrastructure and your business goals. If you already have some infrastructure in place for analytics, you can still plan on replacing pieces—without disrupting the analytics program. There’s no need to shut off the existing infrastructure while the new infrastructure is moved into production. Instead, portions of the new architecture can be added, and new capabilities can be turned on with limited downtime.
Based on Informatica’s extensive experience with enterprise customers, we’ve outlined four integral projects to get you started, along with a list of considerations and use cases for each one. From a financial viewpoint, identifying a use case that shows compelling real-world benefits and value will help you secure funding to continue improving your data landscape and move forward with your modern analytics program.
Organizations typically select from four projects to reach their modern analytics goals. Some start by building a data lake or optimizing their data warehouse, others catalog their data first, and still others launch self-service and real-time analytics. There isn’t one correct place to start for everyone. The key is identifying the most appropriate path for your organization based on your current state and the benefits you need most urgently. To move forward on your modernized analytics journey, you will want to understand the benefits you gain from each initiative, the implementation process, and possible use cases. Below is a short primer on each project type as it relates to the modern analytics journey.
Data Lakes
Reasons for starting with a data lake project:
- Your end goal is to store and analyze large volumes of raw structured and unstructured data, such as machine data (IoT sensors), product logs (security activities), or web interactions (ads), in a single repository to serve multiple analytic services.
- Your data is a mix of structured and unstructured, is constantly generated, and accumulates quickly.
- You want to collect and analyze lots of data from lots of different sources (both inside and outside the organization).
- You want flexibility to repurpose your data for future machine learning and BI use cases and services.
- You face compliance and regulatory requirements that demand controls and robust data governance.
- You’re ready to start discovering the data relationships between all your business-critical data.
Blueprint
While implementing a data lake will have a unique set of steps for each organization, below is an overview of what you can expect. You will need to:
- Identify and gain access to raw sources of structured and unstructured data in your enterprise.
- Focus on key areas of data that will be consumed in stages.
- Define the phases for populating the data lake and create areas of value prior to the lake being fully populated.
- Build data ingestion routines to extract and load the raw data into the data lake with appropriate refresh methods.
- Create additional areas in the data lake for applying data quality and data harmonization techniques to the raw data and for identifying logical relationships across multiple sources.
- Expose and export the harmonized and clean data to consuming applications and services, such as self-service analytics and data mining.
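The steps above can be sketched in a few lines of code. This is a minimal illustration only, assuming an in-memory dictionary stands in for the data lake. The zone names ("raw", "harmonized"), the functions ingest and harmonize, and the sample CRM and web records are all hypothetical and are not part of any Informatica product.

```python
from collections import defaultdict

def ingest(lake, source_name, records):
    """Load records into the raw zone unchanged, tagged with their source."""
    for record in records:
        lake["raw"].append({"source": source_name, "payload": dict(record)})

def harmonize(lake, key_field):
    """Apply a simple quality rule and merge raw records that share a key."""
    grouped = defaultdict(dict)
    for entry in lake["raw"]:
        payload = entry["payload"]
        # Normalize the key so " C1 " and "c1" are treated as the same entity.
        key = str(payload.get(key_field, "")).strip().lower()
        if not key:
            continue  # quality rule (assumed): drop records with no usable key
        for name, value in payload.items():
            if name != key_field and value not in (None, ""):
                grouped[key][name] = value
    lake["harmonized"] = dict(grouped)

# Hypothetical sample data from two sources.
lake = {"raw": [], "harmonized": {}}
ingest(lake, "crm", [{"customer_id": " C1 ", "name": "Ada"}])
ingest(lake, "web", [
    {"customer_id": "c1", "last_page": "/pricing"},
    {"customer_id": "", "last_page": "/home"},
])
harmonize(lake, "customer_id")
```

The raw zone keeps every record exactly as it arrived, so the harmonization rules can be changed and rerun later without going back to the source systems.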
Use Cases
- Predictive analytics help operations managers anticipate and diagnose asset failures so that the organization can purchase a replacement or schedule a maintenance crew to repair the asset.
- Analytics and machine learning provide business insights about customer activities and preferences to target the right products to the right customers at the right time.
- Predictive analytics review and analyze staffing data to reduce full-time employment leakage, improve staff retention, and ensure staffing needs are consistently met.
Data Warehouse Modernization
Reasons for starting with a data warehouse modernization project:
- Your end goal is to modernize your existing data warehouse so that you can more easily store and process the large volumes of data that are arriving from different sources to serve multiple analytic services.
- You want to offload data and computational processing.
- Your current data warehouse is not scalable to support analytic goals.
- You want access to real-time data and historical data.
- You want to collect and analyze large quantities of data from multiple sources both inside and outside the organization.
- You want to gain business insight from a broader set of enterprise data.
- You’re ready for a comprehensive solution for data cleansing, extract-transform-load (ETL), and extract-load-transform (ELT).
- You expect to build a data lake.
Blueprint
While modernizing your data warehouse will have a unique set of steps, below is an overview of what you can expect. You will need to:
- Implement a modern data storage technology to provide a landing and work area for your raw data.
- Replace potentially expensive data warehouse extract and transformation routines with raw extracts that will refresh your raw data and store it in volume.
- Redeploy current conversion, data quality, and matching routines as similar, less expensive routines designed for the new platform that leverage its distributed power and speed.
- Replace your data warehouse refreshes with extracts of data that allow for more complex and comprehensive data processing, thanks to the platform's scalability and performance.
- Utilize the data warehouse modernization infrastructure as the beginning of your data lake – steadily adding more sources and more raw information to fill your enterprise data lake.
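As a rough illustration of the extract-load-transform pattern behind these steps, the sketch below lands a raw extract first and runs the cleansing inside the storage engine afterward. It assumes an in-memory SQLite database as a stand-in for the modern storage platform. The table names raw_orders and orders_clean and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: land the raw extract as-is, with every column stored as text.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, region TEXT)")
raw_extract = [("1", "19.99", "emea"), ("2", "5.00", "EMEA "), ("3", "bad", "apac")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_extract)

# Transform: cleanse and type the data inside the platform, after loading.
# The GLOB filter (an assumed quality rule) keeps only amounts shaped like 12.34.
conn.execute("""
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(TRIM(region))       AS region
    FROM raw_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'
""")

# Consume: aggregate the cleansed table for reporting.
totals = dict(conn.execute(
    "SELECT region, ROUND(SUM(amount), 2) FROM orders_clean GROUP BY region"
))
```

Because the raw table is preserved, the rejected row remains available for inspection, and the raw landing area can later grow into the first zone of a data lake.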
Use Cases
- Analyzing a more complete data set leads to more precise market and business insights.
- Online analytics can drive new sales through up-sells and cross-sells based on user activities.
- Analyzing user activities can reduce customer churn and identify potentially fraudulent activities.
Data Catalog
Reasons for starting with cataloging the data:
- Your end goal is to consume data from multiple sources for analytics and BI, but the data needs to be inventoried and cataloged before you start any modern analytics program.
- You are not sure what data is available or where it is located within the organization.
- You want to identify and profile data sources.
- You want to identify what data sets are sensitive.
- You want to eliminate the collection and storing of redundant data.
- You need more data governance and access control.
- You want to identify data in meaningful business terms.
- You want your data analysts and data users to spend more time analyzing data and less time locating and preparing it.
Blueprint
While implementing a data catalog at your organization will have a unique set of steps, below is an overview of what you can expect. You will need to:
- Define your sources, which can include catalog information from storage technologies, data files, data lake and data warehouse instances with their associated integration methods, and reporting technologies, so that the catalog addresses the full infrastructure.
- Connect and ingest your metadata from each of these sources into the catalog technology, capturing the structure, relationships, and content of each source.
- Profile your data sources and use AI technology to interpret and scan the data for patterns to determine key data assets.
- Connect and ingest your data processing metadata to understand how your data moves from point to point and to trace the end-to-end lineage of your data assets.
- Further curate your catalog to provide end users with a business glossary and cataloging information that includes descriptions and the nature of data captured.
- Deploy access to data stewards for data governance purposes, to data scientists for data analytics purposes, and to IT professionals for easy impact analysis of data structure changes and updates.
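To make the metadata and profiling steps more concrete, here is a small sketch of what a catalog entry for one source might contain. It is an illustration under stated assumptions: a simple regular expression stands in for the AI-based pattern scanning described above, and the functions profile_column and build_catalog, the source name crm.customers, and the sample rows are all hypothetical.

```python
import re

# Assumed rule: a column whose non-empty values all look like email
# addresses is flagged as potentially sensitive.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile_column(name, values):
    """Compute basic profile statistics for one column."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "column": name,
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "looks_sensitive": bool(non_null)
        and all(EMAIL.match(str(v)) for v in non_null),
    }

def build_catalog(source_name, rows):
    """Capture structure and profile metadata for one source."""
    columns = sorted({col for row in rows for col in row})
    return {
        "source": source_name,
        "row_count": len(rows),
        "columns": [profile_column(c, [row.get(c) for row in rows]) for c in columns],
    }

# Hypothetical sample rows from a customer table.
rows = [
    {"id": 1, "email": "ada@example.com", "plan": "pro"},
    {"id": 2, "email": "bob@example.com", "plan": None},
    {"id": 3, "email": "", "plan": "pro"},
]
catalog = build_catalog("crm.customers", rows)
```

An entry like this gives data stewards a starting point for governance decisions, such as restricting access to the flagged column, before any business glossary terms are added.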
Use Cases
- Regulated industries gain the resources to track and monitor data lineage for audits.
- New data is onboarded with metadata and tags so it is easily available for business analysts, data scientists, and data stewards for reports, dashboards, heat maps, and key performance indicators.
- Chief data officers, business users, and BI analysts share and reuse trusted data from across the enterprise to find correlations and support data-driven business decisions.
Analytics Modernization
Reasons for starting with analytics modernization:
- Your end goal is to give users the power to find, access, and prepare enterprise data, without waiting for IT.
- You want to bring more people to the analytics table and give them the resources to access and analyze data to discover new insights.
- You expect to share algorithms, insights, visualizations, and data across different groups to meet different needs.
- You want to implement sales operations dashboards, heat map visualizations, and business insights from marketing data lakes.
- You want to leverage AI and machine learning to access data in hybrid and multi-cloud environments.
Blueprint
While implementing analytics modernization at your organization will have a unique set of steps, below is an overview of what you can expect. You will need to:
- Leverage previous data lake and data cataloging efforts by providing an integrated solution with real-time analytics capabilities for the end user community.
- Define end user communities and profiles to determine data access and availability requirements for analytics purposes.
- Configure analytics modernization technology to meet user needs by ensuring proper access to data sets, business glossary, and catalog information for specific business needs.
- Deploy access to users with appropriate training to ensure they can leverage both existing data sets and temporal data sets to find new patterns and opportunities in the data.
- Ensure users are able to store and capture their data analytics efforts for later operationalization by IT to create permanent new data sets that provide ongoing value and analysis.
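One way to picture the step of defining end user communities and their data access is a tag-based check like the sketch below. This is only an assumed model for illustration: the classes DataSet and UserProfile, the function accessible, and the tags and data set names are hypothetical, and a real deployment would rely on the access controls of the chosen analytics platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSet:
    name: str
    tags: frozenset  # catalog tags, e.g. business domain or sensitivity

@dataclass(frozen=True)
class UserProfile:
    community: str
    allowed_tags: frozenset  # tags this community is cleared to see

def accessible(profile, datasets):
    """Return the names of data sets whose tags are all allowed for the profile."""
    return sorted(d.name for d in datasets if d.tags <= profile.allowed_tags)

# Hypothetical data sets and user communities.
datasets = [
    DataSet("sales_pipeline", frozenset({"sales"})),
    DataSet("customer_pii", frozenset({"sales", "sensitive"})),
    DataSet("web_traffic", frozenset({"marketing"})),
]
sales_ops = UserProfile("sales_ops", frozenset({"sales"}))
steward = UserProfile("data_steward", frozenset({"sales", "marketing", "sensitive"}))

sales_view = accessible(sales_ops, datasets)
steward_view = accessible(steward, datasets)
```

Requiring every tag on a data set to be allowed, rather than just one, means that a sensitive data set stays hidden from a community even when it also carries a tag that community is otherwise allowed to see.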
Use Cases
- Business development and marketing teams can identify new services and potential customers in new markets and regions.
- Supply chain managers can run a report and test different scenarios to immediately know the impact when a supplier is unable to deliver a component. Based on the insights, they can make the necessary adjustments to keep the goods reaching customers.
- City planners can effectively re-route traffic to avoid gridlock by running multiple reports on traffic patterns and usage.