Following a rigorous methodology is key to delivering customer satisfaction and expanding analytics use cases across the business.
This document explores the role of the business-focused data scientist. The primary function of the data scientist is to acquire, explore and analyze the relationships in vast volumes of data from different sources and systems, with the aim of gaining insights that can improve the operational efficiency of the business. The data scientist presents those findings to management for action in the form of data stories or data products, all while remaining ethically aware. Individuals suitable for this role often demonstrate curiosity and creativity alongside their technical, mathematical and statistical qualifications. The focus of an individual data scientist can vary considerably, as can the tasks that they carry out and the technologies that they use. In some organizations the data scientist role may be fulfilled by a team rather than an individual. There are a number of typical high-level tasks that the data scientist or data science team will engage in, such as:
A data scientist is an important participant in big data projects and data governance programs. While supporting their own core activities, the data scientist is frequently involved in, and needs to be consulted on:
Two key areas for the data scientist that can be considered part of data science operations are repeatability and lineage. The data scientist should advocate for data architectures that support repeatability of data sets: the same set of records must remain easy to access months or even years later, so that results can be revalidated and proven. Repeatability also applies to processes. Data acquisition, preparation and mining processes need to be reusable on different data sets. One facet of repeatability is data science lineage. While data lineage can be thought of as the route the data takes from point A to point B, answering the question “How did we get here?”, data science lineage is the broader matter of tracking, documenting and verifying the experiment or process so that it can be performed by other data scientists in the organization.
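As an illustration of what a data science lineage record might capture, the following minimal Python sketch fingerprints a set of records and stores it alongside the processing steps and parameters of an experiment. The function names (`fingerprint_records`, `lineage_entry`), the field names and the sample data are all hypothetical and chosen only for this example; a real organization would adapt them to its own metadata tooling.

```python
import hashlib
import json


def fingerprint_records(records):
    """Return a SHA-256 digest that is stable for the same set of records."""
    # Serialize each record with sorted keys, then sort the rows, so that
    # neither key order nor row order changes the digest.
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    digest = hashlib.sha256()
    for row in canonical:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def lineage_entry(experiment, records, steps, parameters):
    """Bundle what another data scientist would need to rerun the experiment."""
    return {
        "experiment": experiment,
        "record_count": len(records),
        "data_fingerprint": fingerprint_records(records),
        "steps": list(steps),
        "parameters": dict(parameters),
    }


# Hypothetical sample data, for illustration only.
records = [
    {"customer_id": 1, "region": "north", "spend": 120.5},
    {"customer_id": 2, "region": "south", "spend": 80.0},
]
entry = lineage_entry(
    "churn-baseline",
    records,
    steps=["acquire", "deduplicate", "fit_model"],
    parameters={"seed": 42},
)
print(json.dumps(entry, indent=2))
```

Recomputing the fingerprint months later and comparing it with the stored value is one simple way to confirm that a rerun is working from the same set of records.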
A data scientist will automate, or arrange the automation of, tasks such as initial profiling of new data sets, data quality clean-up and data integration, primarily for the acquisition, preparation and loading of the data feeds that they will use. Once these tasks become a standard activity, they can be handed off to IT technical teams to be operationalized and set up on a production schedule.
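To make the idea of automated initial profiling concrete, here is a small Python sketch that summarizes each column of a new data set: row count, missing values, distinct values and observed value types. It assumes the data has already been loaded as a list of dictionaries; the function names `profile_column` and `profile` are hypothetical, and a production job would typically rely on the organization's own profiling tools.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column of a data set held as a list of dicts."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "column": column,
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(type_counts),
    }


def profile(rows):
    """Profile every column seen in the data, in first-seen order."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [profile_column(rows, name) for name in columns]


# Hypothetical sample feed, for illustration only.
feed = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": ""},
    {"id": 2, "region": None},
]
for summary in profile(feed):
    print(summary)
```

Because the script takes only the rows as input, the same routine can be rerun unchanged on each new data feed, which is what makes it a candidate for handing off to a scheduled production job.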
Data scientists are often aligned with business units, so one organization may have several, each focused on a different area but requiring access to many of the same data sets.
Due to the flexible nature of data science activities, the qualifications and certifications required vary considerably from position to position. Some common ones are listed below to provide a flavor of the skills that are required. Because a wide range is covered, some compromise may be required when searching for individuals to fill this role.
Additional useful technical skills are: