An analytics project is a journey. To accelerate time-to-value and ensure ongoing improvement, it must be properly planned, implemented, monitored, and optimized from vision to execution and beyond. Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
Let’s take a closer look at what’s involved.
Taking an agile approach to the design and rollout of your next-generation analytics project will accelerate time-to-value: pilot technologies and management approaches, then test and refine them until you arrive at the optimal processes for data storage and access.
In the planning phase, you and key business and technology stakeholders will work to determine your business drivers, objectives, and goals.
The use cases for next-generation analytics, often fueled by an underlying data lake, vary across industries. Retailers are looking to data-driven insight to improve customer conversion rates, personalize marketing campaigns to increase revenue, and predict and avoid customer churn. Bringing the many interaction points a shopper may have with the brand, including mobile, social, physical stores, and e-commerce, together in a data lake facilitates decision-making. In healthcare, analytics and data lakes are critical for delivering evidence-based care, providing a foundation for analyzing patient medical records, clinical results, genomic research, and medical device data to discern the most effective treatments. A manufacturer can use analytics to optimize daily production, with a data lake as the real-time source for data from operational machines and sensors, production databases, safety information, and employee records, supporting data-driven decisions that increase organizational output and profitability.
Communication throughout the planning process is key. Ensure that all stakeholders understand the goals and how the project impacts them, and that the business case is solid. Modify the expected outcomes based on feedback.
Data stewardship is also key. It’s critical to fully understand the data regulatory requirements that impact your analytics project. Regulations such as the European Union General Data Protection Regulation (GDPR) can affect your strategy by restricting data retention and speculative use of data. Organizations need a thorough understanding of data usage, planned applications, and governance requirements, which determines the levels of security and access control needed to meet corporate data privacy policies as well as laws.
The next step is to develop the solution architecture, including technical requirements, volume requirements, and configurations. Determine the functional requirements for the analytics project, as well as your metadata strategy. Determine the components for a data lake as well as the data models driving the analytics. Consider how users and applications will consume and analyze information, and the development requirements.
Design and build processes to ensure data quality and to integrate data from source applications. Determine the integration needs for each source application. Assess the quality of the source data, as the success of your project rests on having trusted data. Assess your master data management strategy, identifying any gaps and developing a model and architecture for data management.
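As a simple illustration of the kind of quality assessment this step involves, here is a minimal profiling sketch in Python using pandas; the file name, columns, and 5% null-rate threshold are hypothetical and would come from your own source systems and data quality standards.

```python
import pandas as pd

# Hypothetical source extract; in practice this would come from the
# integration layer feeding the data lake.
orders = pd.read_csv("orders_extract.csv")

# Profile basic quality indicators before trusting the data downstream.
quality_report = {
    "row_count": len(orders),
    "null_rate_per_column": orders.isna().mean().to_dict(),
    "duplicate_rows": int(orders.duplicated().sum()),
}

# Flag columns whose null rate exceeds an agreed threshold (assumed 5% here).
suspect_columns = [
    col for col, rate in quality_report["null_rate_per_column"].items()
    if rate > 0.05
]

print(quality_report)
print("Columns needing remediation:", suspect_columns)
```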
Assess your organization’s resources to understand what skills you will need and where the gaps are. Establish the project roles, such as data scientists, data analysts, developers, systems administrators, enterprise architects, data stewards, and project managers. Identify where you might need new skills or consultant assistance, and develop a detailed project plan.
Following a proven implementation methodology ensures the success of the analytics project and maximizes your available resources. As the build phase begins, start by reviewing the project scope and plan developed during the planning phase. You may implement the solution with your own IT team or with a team of consultants.
Implementing next generation analytics consists of several key steps: data acquisition and storage, building the enterprise data catalog, data organization and governance, data modification and manipulation, and data operationalizing and publication.
Data may be acquired and ingested from enterprise applications, sensors and other devices, external feeds, and other analytic systems, via batch, near-real-time, or real-time streaming. Hadoop is typically used for storage, and it can be implemented in a private data center or in the cloud. An enterprise data catalog automatically classifies and catalogs all data, giving you a complete view of your data so that it can be enriched, searched, and governed. A master data management solution will enable you to link all of your data with a common reference.
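As an illustration of batch acquisition into a Hadoop-based lake, here is a minimal PySpark sketch; the paths, file formats, and the event_date partition column are assumptions rather than prescriptions, and streaming sources would use Spark’s structured streaming APIs instead.

```python
from pyspark.sql import SparkSession

# Assumes a Spark-on-Hadoop environment; paths and schema are illustrative.
spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

# Batch acquisition: read yesterday's extract from a landing zone.
raw = spark.read.json("hdfs:///landing/clickstream/2024-01-15/")

# Persist to the data lake in a columnar format, partitioned for later queries.
# Assumes the source records carry an event_date field.
(raw.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("hdfs:///lake/raw/clickstream/"))
```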
Establishing governance policies is critical to ensuring that data is used in accordance with corporate and regulatory data privacy mandates. The data science team must also build next-generation analytics models. Finally, the solution can be put into operation, with the IT team typically taking over primary responsibility.
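The form governance policies take varies widely. As a hedged sketch, the retention and PII rules below are purely illustrative; in practice they would be driven from the enterprise data catalog and enforced by your governance tooling rather than hard-coded.

```python
from datetime import datetime, timedelta

# Illustrative policy definitions; real policies live in the governance catalog.
RETENTION_DAYS = {"customer_profiles": 730, "web_logs": 90}
PII_COLUMNS = {"customer_profiles": ["email", "date_of_birth"]}

def is_expired(dataset, record_date, now=None):
    """Return True if a record has passed its retention window."""
    now = now or datetime.utcnow()
    return (now - record_date) > timedelta(days=RETENTION_DAYS[dataset])

def masked_view(dataset, columns):
    """Strip PII columns for users without an approved purpose."""
    return [c for c in columns if c not in PII_COLUMNS.get(dataset, [])]

# Example: web log rows older than 90 days should be purged.
print(is_expired("web_logs", datetime(2023, 1, 1)))
# Example: analysts without PII access see a reduced column set.
print(masked_view("customer_profiles", ["customer_id", "email", "segment"]))
```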
Testing and tuning is critical. Key stakeholders should be involved in acceptance testing to ensure that the analytics models and the data lake meet the business objectives. Establish the overall test strategy, including the user acceptance testing plan. Determine the performance benchmarks, and test thoroughly to ensure that they will be reached in a production environment. User training is also critical to facilitate acceptance and adoption.
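One simple way to make a performance benchmark testable is to time a representative workload against the agreed threshold. The sketch below assumes a hypothetical 30-second target and a placeholder query; both would be replaced by your real benchmark definitions.

```python
import time

# Hypothetical benchmark: a representative query must complete within the
# agreed service-level threshold (assumed to be 30 seconds here).
SLA_SECONDS = 30

def run_representative_query():
    # Placeholder for the actual analytics query issued against the lake.
    time.sleep(1)

start = time.perf_counter()
run_representative_query()
elapsed = time.perf_counter() - start

assert elapsed <= SLA_SECONDS, (
    f"Query took {elapsed:.1f}s, exceeding the {SLA_SECONDS}s benchmark"
)
print(f"Benchmark passed: {elapsed:.1f}s")
```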
Once the analytics solution is deployed and delivering results, the project moves into an operations and monitoring phase. The IT operations team generally takes over the day-to-day running of the infrastructure from the data scientists or consulting team.
Your next-generation analytics will quickly become a core part of your business operations, so it’s critical to understand how all the components are operating and performing. It’s essential to be notified when issues occur or operational performance falls below acceptable thresholds.
Monitor system performance to ensure that service-level agreements are met, and track usage and data quality. Monitor the size and variety of your data sets, your data management capabilities, and your analytics models to ensure that they continue to meet the business needs.
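A minimal sketch of threshold-based monitoring, assuming illustrative metric names and SLA values; in practice these checks would feed your existing alerting and monitoring tooling.

```python
# Compare observed metrics against SLA thresholds and emit alerts.
# Metric names and thresholds are illustrative.
SLA_THRESHOLDS = {
    "query_p95_seconds": 30.0,      # upper bound
    "ingest_lag_minutes": 15.0,     # upper bound
    "row_quality_pass_rate": 0.98,  # lower bound
}
LOWER_BOUND_METRICS = {"row_quality_pass_rate"}

def check_slas(observed):
    """Return a list of alert messages for metrics outside their thresholds."""
    alerts = []
    for metric, threshold in SLA_THRESHOLDS.items():
        value = observed.get(metric)
        if value is None:
            alerts.append(f"{metric}: no data reported")
        elif metric in LOWER_BOUND_METRICS and value < threshold:
            alerts.append(f"{metric}: {value} below {threshold}")
        elif metric not in LOWER_BOUND_METRICS and value > threshold:
            alerts.append(f"{metric}: {value} above {threshold}")
    return alerts

print(check_slas({"query_p95_seconds": 42.0,
                  "ingest_lag_minutes": 5.0,
                  "row_quality_pass_rate": 0.99}))
```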
You may continue to augment your staff with consultants who have relevant experience. It’s often difficult to find employees with Hadoop and data lake skills, and consultants can help reduce project risks, shorten time to delivery, and provide a valuable transfer of knowledge to your employees over time.
Work with stakeholders to identify what is working well from their perspective, and to identify new analytics use cases and applications.
Starting small and demonstrating success to the business is an effective way to reap the full value of the data lake investment. Early projects may focus on a small constituency of data analysts and developers. Later phases may include self-service capabilities that address the needs of business users. The business value and ROI of next-generation analytics are not complete until the full range of users is enabled.
You may expand analytics to other functions or business units. You may continue to refine the analytics models or create new ones. Data continues to evolve and grow, so you may add new data sets or sources to the data lake, such as human language text from the call center or real-time sensor data. Line-of-business managers may reengage, or engage for the first time, to proactively deliver outcomes, drive greater use of analytics across the organization, and maximize your analytics investment.
Above all, be diligent about policy-based data governance. A common criticism of data lakes is that they become dumping grounds for data of varying quality and quickly degenerate into data swamps. By curating this raw data, data scientists can create useful models for different business contexts.
Always be on the lookout for ways to minimize operational costs. For instance, data storage can be a significant portion of the cost associated with analytics.
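For example, one common tactic is tiering: keep recent, frequently queried partitions on fast storage and archive older ones to cheaper tiers. The sketch below assumes a hypothetical 90-day hot-retention window and date-based partitions.

```python
from datetime import date, timedelta

# Illustrative cost-control sketch: identify data lake partitions older than a
# hot-retention window so they can be moved to cheaper, colder storage.
HOT_DAYS = 90

def cold_partitions(partition_dates, today=None):
    """Return the partition dates that have aged out of hot storage."""
    today = today or date.today()
    cutoff = today - timedelta(days=HOT_DAYS)
    return [d for d in partition_dates if d < cutoff]

partitions = [date(2024, 1, 5), date(2023, 6, 1), date(2022, 11, 20)]
print("Candidates for archival:",
      cold_partitions(partitions, today=date(2024, 2, 1)))
```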