Last Updated Date: May 25, 2021

This document explores the role of the business-focused data scientist. The primary function of the data scientist is to acquire, explore and analyze the relationships in vast volumes of data from different sources and systems, with the aim of gaining insights that can improve the operational efficiency of the business. The data scientist then presents those findings to management for action in the form of data stories or data products, all while remaining ethically aware. Individuals suitable for this role often demonstrate curiosity and creativity alongside their technical, mathematical and statistical qualifications. The focus of an individual data scientist can be quite varied, and the tasks they carry out and the technologies they use can differ considerably. In some organizations the data scientist role may be fulfilled by a team rather than an individual. Typical high-level tasks that the data scientist or data science team will engage in include:

  • Data Preparation
  • Data Acquisition
  • Programming
  • Data Mining
  • Machine Learning
  • Predictive Analytics
  • Design (for example, models, experiments, data stores, segmentation)
  • Data Visualization
  • Data Science Operations
    • Ensuring Repeatability
    • Data Science Lineage

A data scientist is an important participant in big data projects and data governance programs. While supporting their own core activities, the data scientist is frequently involved in, and needs to be consulted on:

  • Data standardization and cleansing efforts
  • Creating, refining and managing data definitions in the business glossary
  • Designing the data architectures used for sourcing and analyzing data

Two key areas for the data scientist that can be considered part of data science operations are repeatability and lineage. The data scientist should advocate for data architectures that support repeatability of data sets, including easy access to the same set of records months or even years later, so that results can be revalidated and proven. Repeatability also applies to processes: data acquisition, preparation and mining processes need to be reusable on different data sets. One facet of repeatability is data science lineage. While data lineage can be thought of as the route the data takes from point A to point B, answering the question “How did we get here?”, data science lineage is the broader matter of tracking, documenting and verifying the experiment or process so that it can be performed by other data scientists in the organization.
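As a rough illustration of repeatability and data science lineage, the sketch below fingerprints a data set before and after each preparation step, so that another data scientist could later confirm they are working from the same records and the same sequence of steps. It is a minimal sketch, not a prescribed method: the function names (`fingerprint`, `run_with_lineage`), the two preparation steps and the sample records are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json


def fingerprint(records):
    """Return a SHA-256 digest of the records, independent of row order."""
    # Serialize each record with sorted keys so identical content always
    # yields identical text, then sort the rows so ordering does not matter.
    rows = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()


def run_with_lineage(records, steps):
    """Apply named steps in order and log the data fingerprint around each one."""
    lineage = []
    for name, func in steps:
        before = fingerprint(records)
        records = func(records)
        lineage.append(
            {"step": name, "input": before, "output": fingerprint(records)}
        )
    return records, lineage


# Hypothetical preparation steps, for illustration only.
def drop_missing_amount(records):
    return [r for r in records if r.get("amount") is not None]


def to_cents(records):
    return [{**r, "amount": round(r["amount"] * 100)} for r in records]


STEPS = [("drop_missing_amount", drop_missing_amount), ("to_cents", to_cents)]

# Hypothetical sample data.
sample = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 3.0},
]

prepared, lineage = run_with_lineage(sample, STEPS)
print(json.dumps(lineage, indent=2))
```

Storing the lineage log alongside the results gives a later reviewer two checks: the input fingerprint shows whether the archived records are unchanged, and rerunning the same named steps should reproduce each output fingerprint.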

A data scientist will automate, or arrange the automation of, tasks such as initial profiling of new data sets, data quality cleanup and data integration, primarily for the acquisition, preparation and loading of the data feeds that they will use. Once these tasks become a standard activity, they can be handed off to IT technical teams to be operationalized and set up on a production schedule.
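The kind of initial profiling mentioned above could be automated along the following lines. This is a minimal sketch under stated assumptions: records arrive as a list of dictionaries holding simple scalar values, empty strings count as missing, and the `profile` function and the sample feed are hypothetical rather than part of any particular product.

```python
from collections import Counter


def profile(records):
    """Summarize each column: row count, missing values, distinct values, types."""
    # Collect every column name that appears in any record.
    columns = sorted({key for r in records for key in r})
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        # Treat None, absent keys and empty strings as missing (an assumption).
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),  # assumes hashable scalar values
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sample feed.
feed = [
    {"customer_id": 1, "country": "US", "revenue": 120.0},
    {"customer_id": 2, "country": "", "revenue": 80.5},
    {"customer_id": 3, "country": "US", "revenue": None},
    {"customer_id": 4, "country": "DE"},
]

for column, stats in profile(feed).items():
    print(column, stats)
```

Because the function takes no feed-specific parameters, the same profiling step can be reused on different data sets and, once it becomes routine, handed to a technical team to schedule.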

It should be noted that data scientists are often aligned with business units, and that there may be several in one organization, each focused on a different area but requiring access to many of the same data sets.

Responsibilities

  • Interprets and translates questions from the business into statistical analyses, hypotheses, models and required data types
  • Locates data sets, then profiles, cleanses and prepares data from multiple sources
  • Studies the data for business-relevant patterns
  • Builds statistical and learning models, e.g., revenue forecasting, customer churn prediction, next best offers, fraud detection, improving patient outcomes
  • Delivers data stories or data products: trends, metrics, models (with training and test datasets)
  • Presents findings to the business in an understandable way so that they can be acted upon

Qualifications/Certifications

Due to the flexible nature of data science activities, the qualifications and certifications required vary considerably from position to position. Some common ones are listed below to provide a flavor of the skills that are required. Because a wide range is covered, some compromise may be needed when searching for individuals to fill this role.

  • Deep industry and organizational knowledge
  • Classical and Bayesian Statistics
  • Linear Algebra
  • Machine Learning Models
  • Programming / Scripting Languages
  • Data Mining
  • Familiarity with Data Quality and Data Integration Processes
  • Strong Business Analysis and problem-solving skills
  • Ability to develop and test additional analytical hypotheses and to identify root causes in data sets beyond the initial result sets
  • Strong presentation and communication skills
  • Demonstrable Curiosity and Creativity

Additional useful technical skills are:

  • Experience with one or more statistical modelling tools (R, SAS, SPSS…)
  • Experience with one or more reporting tools (Tableau, QlikView, MS Excel)
  • Experience with one or more programming or scripting languages (Python, etc.)
