Over the past few decades, most organizations have implemented at least one Data Warehouse, with the common purpose of providing enterprise-wide reporting and data analysis built on a single version of the truth. A typical Data Warehouse implementation extracts data from a variety of sources using an on-premise tool or code set in order to cleanse, relate, and ultimately store a copy of key data elements. The storage medium is often a large on-premise database designed with a star schema model for storage and reporting.
Over the years Data Warehouses have provided many benefits for organizations, but there are a number of limitations that have companies seeking ways to improve their Data Warehouse infrastructure:
Storage Enhancements: Data growth and the expansion of unstructured data sources can strain the capability and cost of current storage technologies. More cost-effective and scalable storage solutions have been introduced (often exposed in the cloud) that relieve the IT team of the burden of managing, upgrading, backing up, and expanding the storage technology for new data volumes. Modern storage technology also enhances the ability to manage and process more data of wider varieties.
Analytics: Real-time and Artificial Intelligence capabilities are often key components in modernized data warehouses. These features may require new data structures and tools to provide end users with faster and more advanced data analysis. These new capabilities increase the value end users receive from the data, while reducing the delay and burden on IT to publish new metrics and analytics through the use of self-service data preparation.
Cost Control: Cloud-based technologies can provide a more predictable cost model and potential cost savings in a number of ways. On the hardware side, cost is incurred only as needed by leveraging compute cycles on demand, rather than paying for a large technology footprint that is fully used only on rare heavy-volume occasions. On the software side, upgrade and maintenance costs are reduced, as these services are often provided by the cloud vendor.
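The hardware-side argument can be illustrated with simple arithmetic. The sketch below is not a pricing model: the function names, the node-hour rate, and the usage profile are all hypothetical, and it assumes the same hourly rate for both options, which real vendor pricing may not offer.

```python
def fixed_capacity_cost(peak_nodes, node_hour_rate, hours):
    """Cost of keeping peak-sized capacity running for every hour of the period."""
    return peak_nodes * node_hour_rate * hours


def on_demand_cost(hourly_node_usage, node_hour_rate):
    """Cost when only the nodes actually used in each hour are billed."""
    return sum(hourly_node_usage) * node_hour_rate


# Hypothetical 30-day month: 2 nodes for most hours, 10 nodes for 20 peak hours.
usage_profile = [2] * 700 + [10] * 20
rate = 1.0  # hypothetical cost per node-hour

fixed = fixed_capacity_cost(peak_nodes=max(usage_profile),
                            node_hour_rate=rate,
                            hours=len(usage_profile))
elastic = on_demand_cost(usage_profile, rate)
print(f"fixed footprint: {fixed:.2f}, on demand: {elastic:.2f}")
```

Under these assumed numbers, the footprint sized for peak load is billed for many idle node-hours, which is the gap the on-demand model aims to close.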
While the Data Warehouse continues to be the cornerstone of standardized analytics, dashboards, and reporting that drives key business decisions, today’s advanced technology can provide much more value at potentially a lower overall cost. Organizations can potentially attain a number of tangible business benefits by reviewing their current architecture and determining how to best leverage new technologies and capabilities to renew and modernize their environment. Informatica’s Data Warehouse Modernization solution is a roadmap to help make this transition while a current Data Warehouse is already in place. A well-planned modernization approach will ensure minimal disruptions and loss of value while installing a modernized architecture for the next generation of data analytics.
Let’s explore key steps to implementing a modern Data Warehouse, assuming a move from an on-premise architecture for data management and data storage to a cloud-based solution:
There are a number of cloud-hosted and born-in-the-cloud storage technologies that provide scalable compute and storage cycles and a variety of new data access capabilities for the modern Data Warehouse. These tools generally require less IT support, as the cloud vendor is responsible for uptime SLAs, data backup, and general system maintenance. The Modern Data Warehouse typically involves some or all of the architecture hosted in the cloud. The architecture of the data structures also needs to be evaluated and potentially redesigned or realigned to a structure that best leverages these modern storage technologies. The new modernized architecture will be set up and configured to run in parallel with the existing Data Warehouse through the duration of the
In conjunction with a move to the cloud for storage and access, companies often find this is an optimal time to modernize data integration technology to a cloud-based solution as well. Cloud-based integration removes the need to manage integration-only hardware and servers and provides quicker and more seamless upgrades. For Informatica customers this often means a migration from Informatica PowerCenter to Informatica Intelligent Cloud Services. While it’s possible to continue to leverage a technology like Informatica PowerCenter to load new cloud storage technologies, since testing will occur on the integration loads into the new storage technology, it’s a great time to pivot to Informatica’s Cloud Integration Platform. Modern cloud integration tools provide a host of long-term benefits such as new cloud optimized integration features, reduced infrastructure maintenance, and elastic compute options. If multiple integration technologies are being used for the current Data Warehouse, this is also a golden opportunity to rebuild and migrate any other technology used to load the Data Warehouse onto a single platform. Database scripts, hand code, and other one-off integration technologies that are part of the current Data Warehouse are potential candidates for moving into a single integration platform for better management and control.
Three required components of migration include:
There are going to be inherent changes to loading data into a new modern data storage technology. Even if the schema is not altered, the methods, transports, and capabilities are likely different across technologies. These patterns will need to be migrated into the modern data integration platform.
Informatica provides a conversion tool to assist customers in moving from PowerCenter to Informatica Cloud Data Integration. However, even this process will require some intervention, as these data integrations often contain objects that are tightly coupled to the previous storage tool and will need updating. Informatica also recommends moving non-Informatica data integration components into the Informatica Cloud Platform at this time. The advantage of making this change now is that a working version is available to use as a model, so it can be quickly replicated in the modern technology while validating against the current Data Warehouse processes and results.
Organizations may also look to leverage real-time capabilities to populate data more frequently into the Modern Data Warehouse as part of this migration, or they could wait for a future phase of modernization and simply focus on the standard refresh cycle for now. Implementing real-time capabilities may require additional time in project planning.
There are cases where the new storage technology can be reloaded by simply pulling all historical data from source systems. For many organizations, however, this is impractical because the current Data Warehouse holds more history than the current source systems do, including a history of changes over time that the operational systems do not store. For this reason, the best solution is often to copy the current Data Warehouse data into the new Modern Data Warehouse using a set of one-time load jobs. Informatica Cloud Data Integration can be leveraged to automate the build of these jobs to quickly port the current snapshot of the existing Data Warehouse into the Modern Data Warehouse. This is often a network-intensive, and sometimes time-intensive, one-time task. Informatica recommends executing this step while still running both warehouses in parallel, rather than waiting for the cutover to the new technology. This allows time to resolve any issues with the model and configurations in the modernized storage technology.
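As an illustration of what one such one-time load job does, here is a minimal sketch that copies a table from one database to another in fixed-size batches. It is not how Informatica Cloud Data Integration builds its jobs: it uses two in-memory SQLite databases as stand-ins for the legacy and modern warehouses, and the table name `sales_fact`, its columns, and both function names are hypothetical.

```python
import sqlite3


def copy_table_in_batches(source, target, table, columns, batch_size=1000):
    """Copy all rows of `table` from source to target, committing per batch."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    cursor = source.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        target.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )
        target.commit()  # a failure loses at most the batch in flight
        copied += len(rows)
    return copied


def build_demo_warehouses(row_count=2500):
    """Create stand-in legacy and modern warehouses with the same empty schema."""
    legacy = sqlite3.connect(":memory:")
    modern = sqlite3.connect(":memory:")
    ddl = "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)"
    legacy.execute(ddl)
    modern.execute(ddl)
    # Populate only the legacy side with hypothetical history.
    legacy.executemany(
        "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)",
        [(i, i * 1.5) for i in range(1, row_count + 1)],
    )
    legacy.commit()
    return legacy, modern


legacy, modern = build_demo_warehouses()
copied = copy_table_in_batches(legacy, modern, "sales_fact", ["sale_id", "amount"])
print(f"copied {copied} rows")
```

Batching keeps memory use bounded and limits how much work is repeated if the network-intensive transfer is interrupted. A real job would also need to handle schema differences and restart logic, which this sketch omits.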
After the one-time load into the Modern Data Warehouse is complete, the incremental loads will be turned on to ingest ongoing changes into the Warehouse. These loads will run in parallel with the existing loads for a time, as testing is performed to ensure that the Modern Data Warehouse results match those of the old Data Warehouse. Any data anomalies will be researched and addressed. This parallel run may increase stress on the source systems because data is extracted twice, so its duration should be minimized. It is, however, an important period for validating that the modernized warehouse provides the same or better accuracy than the current Data Warehouse. Informatica provides validation tools to accelerate the validation process.
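The kind of check performed during the parallel run can be sketched as a row count plus a content checksum per table. This is only an illustration and does not represent Informatica's validation tools: the function names are hypothetical, and any DB-API connection that supports `execute` in the way `sqlite3` does is assumed.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column, columns):
    """Return (row_count, sha256 hex digest) over rows ordered by the key."""
    col_list = ", ".join(columns)
    rows = conn.execute(f"SELECT {col_list} FROM {table} ORDER BY {key_column}")
    digest = hashlib.sha256()
    count = 0
    for row in rows:
        # repr() is a simple serialization; it assumes both sides return
        # the same Python types for each column.
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def compare_warehouses(legacy, modern, table, key_column, columns):
    """Compare one table across the two warehouses and report the outcome."""
    legacy_count, legacy_hash = table_fingerprint(legacy, table, key_column, columns)
    modern_count, modern_hash = table_fingerprint(modern, table, key_column, columns)
    return {
        "table": table,
        "legacy_rows": legacy_count,
        "modern_rows": modern_count,
        "match": legacy_count == modern_count and legacy_hash == modern_hash,
    }
```

A mismatch only signals that a table needs investigation. Finding the specific rows that differ would require a key-by-key comparison, and very large tables would typically be checked per partition or date range rather than in one pass.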
The reporting and analytics environment supported by the current Data Warehouse will need to be repointed and rereleased using the Modern Data Warehouse as the new source, potentially with new visualization tools as well. The effort to migrate this infrastructure may be simple and seamless, or it may be more involved, depending on how different the new storage technology is from the previous one and on which new capabilities, such as expanded data preparation and ad hoc analysis, are being utilized. The previous technology cannot be fully decommissioned until the reporting, analytics, and extract infrastructure tied to the current Data Warehouse is repointed and users are migrated to the new tools and warehouse.
The historical Data Warehouse can now be decommissioned, freeing up servers, database licensing, and IT support costs. The Modern Data Warehouse is now the center of the analytics infrastructure, providing the benefits of cloud-based systems such as scalability, ease of upgrades, and reduced support costs. Depending on industry requirements, some organizations may choose to archive the previous warehouse for legal or other reasons.
From here, an organization may expand into the next phase of modernizing the Data Warehouse, such as continuing to evolve batch loads into more real-time loads where possible and expanding the new data analysis and mining capabilities provided by the Modern Data Warehouse technology. Additional data sets that were previously too big or too complex for older storage technology may now be leveraged in the modernized, scalable technology. This may open new areas for insight and end-user value, leading to enhanced business outcomes.