The AI Human Capital Playbook
Companies everywhere are building AI teams, but it’s
still unclear what aspiring machine learning engineers,
data scientists, and software engineers should focus
on when applying for AI jobs. This report walks you
through different types of organizations, different roles
within them, the tasks you’ll work on, and the skills
recruiters are looking for in each role. It is the result
of two large-scale studies of the supply of and
business demand for AI talent. We're here to provide
mentorship and help you find a job that suits your
skills, experience, and aspirations.

1 Workera
CONTENTS
AI organizations
The AI project development lifecycle: Tasks and Skills
  Overview of the AI project development lifecycle
  Data Engineering
  Modeling
  Deployment
  Business Analysis
  AI Infrastructure
The roles of an AI team
  Overview
  Data Scientist
  Data Analyst
  Machine Learning Engineer
  Machine Learning Researcher
  Software Engineer - Machine Learning
  Software Engineer
Conclusion

This report is a work in progress and is being
provided to the public for information purposes.
Because it is a work in progress, there are parts that
are either missing or will be progressively revised as
our team learns more about the supply of and the
demand for AI talent.

We welcome your comments and feedback.


Please send any comments and/or questions to
Kian Katanforoosh (kian@workera.ai).

- The Workera team

PART 1

AI Organizations
We interviewed 100+ data science and machine
learning leaders from companies such as Airbnb,
Amazon, Earnin, Facebook, Google, Landing.ai, Lyft,
Proofpoint, and Upstart to understand the roles, tasks,
and skills that make up a corporate AI organization
(as opposed to an academic setting, where the AI project
development lifecycle often differs).

Data Science vs. Machine Learning Organizations

Based on our research, we’ve identified two types of AI organizations: the data science organization and the machine learning organization.

• The data science organization focuses on making scientific decisions, and its business goal is usually to help businesses run more effectively. The output of the organization is often a set of actionable insights, and its workflow includes (at a high level) collecting data, analyzing it, and suggesting hypotheses/actions.
• The machine learning organization focuses on automating tasks, and its business goal is to decrease operational costs or to scale a product. Automation is often the output, and the workflow includes (at a high level) collecting data, training models, and deploying them.

NOTES
• Companies can have both data science and machine learning organizations. Some companies have hybrid organizations working toward both the data science and machine learning organizations’ business goals.
• AI organizations can be centralized or decentralized. Centralized AI organizations group AI scientists and engineers to support non-AI teams, while decentralized AI organizations are scattered in different business units throughout a company.

PART 2

The AI project development lifecycle: Tasks and Skills
From our research, most AI organizations’ work
divides into five tasks: data engineering, modeling,
deployment, business analysis, and AI infrastructure.
Together, these tasks make up an AI project development
lifecycle. Each task requires specific skills and can be
the focus of multiple roles.

First, we’ll discuss the differences between the machine learning (ML) and data science (DS) project development lifecycles. Then, we’ll answer the following questions:
• What are the goals of each task?
• What skill set is necessary to perform well in a
given task?
• Given individuals with different skill sets, who
should focus on which task?

Overview of the AI Project Development Lifecycle

Here is a quick summary of the ML and DS project development lifecycles.

The ML project development lifecycle

An ML project starts with (i) data on which you (ii) fit models that will later be (iii) deployed into production. A deployed model (iv) needs to be monitored and its performance compared to the business goals. The (v) AI infrastructure is necessary to support all tasks described above (i, ii, iii and iv).
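To make the four ML stages concrete, here is a toy end-to-end sketch in plain Python. It assumes a one-feature linear model fitted by ordinary least squares, made-up sample data, and a hypothetical mean-absolute-error target standing in for "the business goals"; the function names `collect_data`, `fit`, `deploy` and `monitor` are illustrative and not part of any framework.

```python
"""Toy sketch of the ML lifecycle: (i) data, (ii) fit, (iii) deploy, (iv) monitor.
All names and numbers are hypothetical and for illustration only."""
from typing import Callable, List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, label) pairs


def collect_data() -> Dataset:
    # (i) Data: a hard-coded sample standing in for a real data source.
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit(data: Dataset) -> Tuple[float, float]:
    # (ii) Modeling: ordinary least squares for y = w * x + b.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov / var
    return w, mean_y - w * mean_x


def deploy(params: Tuple[float, float]) -> Callable[[float], float]:
    # (iii) Deployment: expose the fitted parameters as a prediction function.
    w, b = params
    return lambda x: w * x + b


def monitor(predict: Callable[[float], float], live: Dataset, target_mae: float) -> bool:
    # (iv) Monitoring: does the live mean absolute error meet the hypothetical target?
    mae = sum(abs(predict(x) - y) for x, y in live) / len(live)
    return mae <= target_mae


if __name__ == "__main__":
    model = deploy(fit(collect_data()))
    print("meets target:", monitor(model, [(5.0, 10.1), (6.0, 11.8)], target_mae=0.5))
```

In a real organization each stage is owned by different tools and often different roles; the sketch only shows how monitoring (iv) closes the loop back to the business goal.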

The DS project development lifecycle

A DS project starts with (i) data on which you can (ii) fit a model. These models and (iii) other data analyses help you make actionable business decisions. The (iv) AI infrastructure is necessary to support all tasks described above (i, ii and iii).

ML and DS projects can be carried out by the same organization. Thus, we summarized both the ML and DS project development lifecycles in a template called the AI project development lifecycle. A visual representation of the AI project development lifecycle is presented in the right column.

Now let's consider the tasks in an AI project one by one. We will illustrate each task with concrete examples, and identify the necessary technical skills to carry them out.

Data Engineering

Data engineering aims to provide the necessary data to achieve the modeling or business analysis task. Most of the time, data engineering is done using database query languages (such as SQL) and object-oriented programming languages (such as Python, C++ or Java). Big data tools (such as Hadoop or Hive) are also commonly used.
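As a small illustration of the query-language side of data engineering, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `events` table, its columns and the sample rows are hypothetical; a real pipeline would more likely query a production database or a big data tool such as Hive, but the SQL would look similar.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical event log stored in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT, action TEXT)")
    rows = [
        (1, "2020-01-01", "click"),
        (2, "2020-01-01", "view"),
        (1, "2020-01-01", "view"),
        (3, "2020-01-02", "click"),
    ]
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def daily_active_users(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # Pull an aggregate from the database: distinct users per day.
    query = """
        SELECT event_date, COUNT(DISTINCT user_id) AS active_users
        FROM events
        GROUP BY event_date
        ORDER BY event_date
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(daily_active_users(build_demo_db()))
```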

Data engineering work includes:

Subtask: Defining the data requirements
• Creating a data model
• Defining the features of high-quality data
• Defining the covariates to be collected to achieve a desired functionality
• Providing feedback regarding the clarity and completeness of data requirements
Technical skills involved: Machine Learning, Business Acumen, Software Engineering

Subtask: Collecting data
• Setting up a Mechanical Turk
• Collecting data by manually taking images of cats
• Coding a JavaScript tracker on a website to collect user data
• Scraping the Web and, if necessary, synchronizing data located in different sources
Technical skills involved: Machine Learning, Software Engineering

Subtask: Labelling data
• Drawing bounding boxes on images
• Building an automated labelling pipeline on a Mechanical Turk
• Writing a labelling tutorial for workers
• Evaluating the labelling performance of workers
• Relabelling mislabelled data
Technical skills involved: Machine Learning

Subtask: Inspecting and cleaning data
• Replacing all non-usable structured data records by NaN using a Python library (e.g., pandas)
• Reformatting a dataset (e.g., converting everything to JPEG and squaring all images)
• Cleaning a text dataset (e.g., removing special characters)
Technical skills involved: Machine Learning, Algorithmic Coding

Subtask: Augmenting data
• Writing a Python script using skimage to rotate, warp, translate, or blur images
• Using test-time augmentation to reduce the variance of an algorithm
• Synthesizing speech by overlaying distinct audio signals
Technical skills involved: Machine Learning, Algorithmic Coding

Subtask: Moving data and building data pipelines
• Writing a script to allow online learning for a model
• Designing an ETL system
• Writing a script to preprocess training data and send it as input to a model automatically
• Writing a script to record model predictions in a database
Technical skills involved: Domain-specific (e.g., Data Query) languages

Subtask: Querying data
• Pulling data from a database
Technical skills involved: Domain-specific (e.g., Data Query) languages

Subtask: Tracing data
• Keeping track of the data sources
• Setting up a data version control system
Technical skills involved: Software Engineering
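Two of the cleaning subtasks above, mapping non-usable numeric records to NaN and stripping special characters from text, can be sketched with the standard library alone. This is an illustrative sketch: which characters count as "special" is an assumption made here, and with pandas the first step is usually done with `pd.to_numeric(..., errors="coerce")`.

```python
"""Sketch of two cleaning subtasks: non-usable numeric records -> NaN,
and removal of special characters from text. Stdlib only."""
import math
import re
from typing import Optional


def to_float_or_nan(raw: Optional[str]) -> float:
    # Anything that cannot be parsed as a finite number becomes NaN.
    if raw is None:
        return math.nan
    try:
        value = float(raw.strip())
    except ValueError:
        return math.nan
    return value if math.isfinite(value) else math.nan


def clean_text(text: str) -> str:
    # Assumption: keep only ASCII letters, digits and whitespace.
    kept = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse the runs of spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", kept).strip()


if __name__ == "__main__":
    print([to_float_or_nan(v) for v in ["3.5", "N/A", "", "7"]])
    print(clean_text("Hello, World!!  AI #1"))
```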

Modeling

Modeling involves prototyping models that exploit patterns found in data to predict outcomes, identify business risks and opportunities, or determine cause-and-effect relationships. Models are usually programmed in Python, R, MATLAB, C++, Java, etc. (though the dominant languages are Python and R). To understand modeling, it helps to have strong foundations in mathematics, data science, and machine learning. Some organizations require deep learning skills, depending on their product focus. Deep learning often powers products that leverage computer vision, natural language processing, or speech recognition.
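For a flavor of what "training a machine learning model" means at the lowest level, here is a minimal logistic regression trained with batch gradient descent using only the standard library. The learning rate, epoch count and toy data are arbitrary assumptions; in practice one would typically use a library such as scikit-learn rather than hand-rolling this.

```python
"""Minimal logistic regression trained with batch gradient descent (stdlib only)."""
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    xs: List[List[float]], ys: Sequence[int], lr: float = 0.1, epochs: int = 1000
) -> Tuple[List[float], float]:
    n_features, m = len(xs[0]), len(xs)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: List[float], b: float, x: List[float], threshold: float = 0.5) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


if __name__ == "__main__":
    xs = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
    ys = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic_regression(xs, ys)
    print([predict(w, b, x) for x in xs])
```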

Modeling work includes:

Subtask: Training machine learning models
• Using one of the following methods: Linear Regression, Logistic Regression, Decision Trees, Random Forest, XGBoost, Support Vector Machines, K-means, K-Nearest Neighbors, Neural Networks, Principal Component Analysis, Naive Bayes Classifier, Lasso/Ridge regression, etc.
Technical skills involved: Machine Learning, Algorithmic Coding, Mathematics, Data Science

Subtask: Fitting probabilistic or statistical models
• Testing hypotheses via data experiments
• Applying a dimensionality reduction on a dataset to facilitate training of a model or gather insights
Technical skills involved: Data Science, Algorithmic Coding, Mathematics

Subtask: Training deep learning models
• Using deep learning for a domain-specific application such as object classification, detection, segmentation, text summarization, machine translation, speech recognition, etc.
• Extensively tuning hyperparameters involved in neural network optimization
Technical skills involved: Deep Learning, Algorithmic Coding, Mathematics, Data Science

Subtask: Accelerating training
• Setting up code to train your model on multiple machines in parallel
Technical skills involved: Domain-specific languages (e.g., CUDA), Algorithmic Coding

Subtask: Defining evaluation metrics (usually also involves a data product manager)
• Choosing F1-score to evaluate a machine learning model’s performance on a classification task
• Implementing evaluation metrics such as accuracy, precision, recall, intersection over union, mean average precision (mAP), etc.
Technical skills involved: Machine Learning, Algorithmic Coding, Mathematics

Subtask: Speeding up prediction time
• Applying techniques such as pruning, quantization or compression to reduce the memory requirements
• Running inference speed vs. accuracy experiments on a model
Technical skills involved: Machine Learning, Algorithmic Coding

Subtask: Iterating over the virtuous cycle of machine learning projects: Idea, Code, Experiment
• Translating a business problem into a machine learning problem. For instance, depending on the quality and quantity of accessible data, a better solution to the problem might come from an end-to-end or a pipeline network
• Experiencing the three-step cycle of ideating with your team, coding to set up experiments, and analyzing results
Technical skills involved: Machine Learning, Business Acumen

Subtask: Searching hyperparameters
• Organizing time effectively to run a maximum number of experiments in the shortest time period
• Setting up hyperparameter search experiments using tools such as AutoML
Technical skills involved: Machine Learning, Algorithmic Coding

Subtask: Keeping up with the state-of-the-art
• Reading research papers
• Watching conference lectures or attending conferences
Technical skills involved: Research, Mathematics, Data Science, Machine Learning
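The classification and detection metrics named above can be written in a few lines. This sketch assumes binary labels with 1 as the positive class and axis-aligned boxes given as (x_min, y_min, x_max, y_max). F1 is the harmonic mean of precision and recall, which is why it is a common choice when classes are imbalanced and accuracy alone can be misleading.

```python
"""Sketch of accuracy, precision, recall, F1, and intersection over union (IoU)."""
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def iou(box_a: Box, box_b: Box) -> float:
    # Width and height of the intersection rectangle (zero if the boxes do not overlap).
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```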
Deployment
Deployment includes all of the activities that make a
model available for use. Given a data stream (from the data
engineering task) and a model (from the modeling task),
individuals in charge of deployment will package and test
models before pushing them to production environments.
Deployment activities require the ability to write production code, including strong back-end engineering skills and an understanding of cloud technologies.
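A minimal sketch of "package and test before pushing to production": serialize a model's parameters to JSON, reload them, and check that the reloaded copy predicts identically. The artifact keys (`version`, `weights`, `bias`) and the linear model are hypothetical conventions chosen for illustration; real deployments usually rely on a framework's own serialization format.

```python
"""Sketch of packaging a model: export parameters to JSON, reload, smoke-test."""
import json
from typing import Any, Dict, List

REQUIRED_KEYS = {"version", "weights", "bias"}  # hypothetical artifact schema


def export_model(weights: List[float], bias: float, version: str) -> str:
    return json.dumps({"version": version, "weights": weights, "bias": bias})


def load_model(artifact: str) -> Dict[str, Any]:
    model = json.loads(artifact)
    # Fail fast on malformed artifacts before they reach production.
    missing = REQUIRED_KEYS - set(model)
    if missing:
        raise ValueError(f"artifact is missing keys: {sorted(missing)}")
    return model


def predict(model: Dict[str, Any], features: List[float]) -> float:
    # Linear model: dot(weights, features) + bias.
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


def smoke_test(original: Dict[str, Any], reloaded: Dict[str, Any], samples: List[List[float]]) -> bool:
    # Pre-release check: both copies must agree on every sample input.
    return all(abs(predict(original, s) - predict(reloaded, s)) < 1e-12 for s in samples)


if __name__ == "__main__":
    model = {"version": "1.0", "weights": [0.5, -1.0], "bias": 2.0}
    reloaded = load_model(export_model(model["weights"], model["bias"], model["version"]))
    print("smoke test passed:", smoke_test(model, reloaded, [[1.0, 2.0], [0.0, 0.0]]))
```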

Deployment work includes:

Subtask: Converting prototyped code into production code
• Refactoring an entire repository’s code
• Minimizing duplicate code
• Writing clean code to improve readability and consistency, for example, by following the PEP8 guidelines in Python
Technical skills involved: Software Engineering

Subtask: Setting up a cloud environment to deploy the model
• Mastering cloud tools and infrastructure provided by Amazon AWS, Microsoft Azure, Google Cloud, etc.
• Preparing files (usually model architecture and parameters) required for deployment
Technical skills involved: Software Engineering

Subtask: Branching
• Designing a branching workflow using development, staging and production branches
• Participating in or leading code reviews
Technical skills involved: Software Engineering

Subtask: Improving response times and saving bandwidth
• Setting up load-balancing requirements with engineers in charge of AI Infrastructure
Technical skills involved: Software Engineering

Subtask: Encrypting files that store model parameters, architecture and data
• Understanding encryption at a high level and leveraging existing functions
Technical skills involved: Software Engineering

Subtask: Building APIs for an application to use a model
• Setting up HTTP RESTful API services to facilitate communications between software components
• Setting up authorization and authentication to access the API
Technical skills involved: Software Engineering

Subtask: Retraining machine learning models (lifelong learning)
• Monitoring changes in data distribution and staging model updates
Technical skills involved: Software Engineering, Machine Learning

Subtask: Fitting models on a resource-constrained device
• Pruning or quantizing a model so it fits memory requirements
• Deploying a model on a mobile device using TensorFlow
Technical skills involved: Software Engineering, Machine Learning
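For the "monitoring changes in data distribution" subtask, one common drift statistic is the population stability index (PSI), which compares how a feature's values fall into bins at training time versus in production. The sketch below assumes equal-width bins over the training range and uses a 0.25 alert threshold, a widely quoted rule of thumb rather than a universal standard; other tests (e.g., Kolmogorov-Smirnov) are equally valid choices.

```python
"""Sketch of data-drift monitoring with the population stability index (PSI)."""
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the training range are clamped into the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    eps = 1e-6  # avoids log(0) for empty bins
    return [max(c / len(values), eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], n_bins: int = 10) -> float:
    # Equal-width bins spanning the training (expected) data.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    e = _bin_fractions(expected, edges)
    a = _bin_fractions(actual, edges)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected).
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def needs_retraining(expected: Sequence[float], actual: Sequence[float], threshold: float = 0.25) -> bool:
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]
    production = [v + 0.5 for v in training]
    print(round(psi(training, training), 4), needs_retraining(training, production))
```

If drift is flagged, the model update would then be staged and validated rather than swapped in directly, in line with the subtask above.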

Business Analysis

Business analysis includes analytics, business activities related to communicating with clients and colleagues, thought leadership, and marketing. Working on business analysis requires a foundational understanding of mathematics and data science for analytics. It also requires strong communication skills and business acumen. Programming languages such as R or Python can be helpful, although many tasks can be carried out in a spreadsheet.
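Much of the day-to-day analytics described here is descriptive statistics. The sketch below computes the mean, sample variance and mode with Python's standard `statistics` module and adds a hand-written Pearson correlation (the building block of a correlation matrix) so it does not depend on `statistics.correlation`, which only exists in Python 3.10+. The `ad_spend` and `revenue` figures are made up.

```python
"""Sketch of descriptive statistics for business analysis (stdlib only)."""
import math
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "mode": statistics.mode(values),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Pearson correlation: covariance divided by the product of standard deviations.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    ad_spend = [10, 20, 20, 40]
    revenue = [15, 25, 30, 50]
    print(summarize(ad_spend))
    print(round(pearson(ad_spend, revenue), 3))
```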

Business analysis work includes:

Subtask: Building data visualizations
• Visualizing high-dimensional data in lower dimensions using methods such as PCA or t-SNE
• Building and presenting graphs produced using Tableau, ggplot or matplotlib
• Building visualizations in JavaScript, HTML and CSS
Technical skills involved: Domain-specific programming languages, Data Science, Mathematics, Business Acumen

Subtask: Building dashboards for Business Intelligence
• Writing a script that periodically notifies business leaders of trends in the data
Technical skills involved: Domain-specific programming

Subtask: Presenting technical work to clients or colleagues
• Preparing presentations (e.g., PowerPoint decks)
• Communicating effectively with team members
• Giving technical talks to present research outcomes
Technical skills involved: Communication, Business Acumen

Subtask: Translating statistics into actionable business insights
• Making marketing decisions based on analysis of various sources
Technical skills involved: Data Science, Business Acumen

Subtask: Analyzing datasets
• Plotting a correlation matrix to analyze covariates
• Computing statistical variables such as mean, variance, mode, etc.
• Segmenting customers into groups
Technical skills involved: Data Science, Algorithmic Coding, Mathematics

Subtask: Running experiments to evaluate deployed models
• Working with the deployment team to evaluate the business performance of a deployed model
• Helping the deployment team make decisions
• Translating model performance into business outcomes (e.g., revenue)
Technical skills involved: Data Science, Algorithmic Coding

Subtask: Running A/B tests
• Optimizing web pages with A/B tests
• Evaluating systems in production
Technical skills involved: Data Science, Algorithmic Coding, Business Acumen
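As an illustration of the statistics behind running A/B tests, here is a two-sided two-proportion z-test on conversion rates, using a pooled standard error and the normal approximation (which assumes reasonably large samples). The conversion counts are invented, and the 0.05 significance level is a conventional assumption, not a recommendation from this report.

```python
"""Sketch of an A/B test evaluation: two-sided two-proportion z-test."""
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
    print(f"z = {z:.2f}, p-value = {p:.4f}, significant at 0.05: {p < 0.05}")
```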

AI Infrastructure
AI infrastructure aims to facilitate data engineering, mod-
eling, and deployment by building and maintaining reli-
able, fast, secure, and scalable software systems. Working
on AI infrastructure requires strong and broad software
engineering skills.

AI infrastructure work includes:

Subtask: Making software design decisions
• Reducing latency by locating a model close to data
Technical skills involved: Software Engineering

Subtask: Building distributed storage and database systems
• Building the databases (SQL, NoSQL, MySQL, Cassandra, etc.) that will store data and facilitating access by other team members
Technical skills involved: Software Engineering, Domain-specific (e.g., Data Query) languages

Subtask: Designing for scale
• Adding GPU compute or storage as needed
Technical skills involved: Software Engineering

Subtask: Maintaining software infrastructure
• Managing software upgrades such as Python 2’s end of life on 01/01/2020, and driving stability through automated monitoring and alerting
Technical skills involved: Software Engineering

Subtask: Networking
• Controlling access to all infrastructure elements
Technical skills involved: Software Engineering

Subtask: Securing data and models
• Building security features allowing for production deployments into regulated organizations, satisfying the needs for privacy and security
Technical skills involved: Software Engineering

Subtask: Writing tests
• Writing unit and functional tests for multiple components across tasks of the AI project lifecycle
Technical skills involved: Software Engineering

Subtask: Carrying out various software tasks
• Building labeling software for a client, or key tools such as A/B testing frameworks or analysis environments
Technical skills involved: Software Engineering
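To illustrate the "Writing tests" subtask, here is a unit-test sketch using Python's built-in `unittest` module. The `normalize` function is a hypothetical min-max scaling step from a data pipeline; the tests cover the output range, a known result, and two edge cases (constant and empty input).

```python
"""Sketch of unit tests for a hypothetical data-pipeline component."""
import unittest
from typing import List


def normalize(values: List[float]) -> List[float]:
    # Min-max scale to [0, 1]; min() raises ValueError on empty input.
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero and return all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_range_is_zero_to_one(self):
        out = normalize([3.0, 7.0, 5.0])
        self.assertEqual(min(out), 0.0)
        self.assertEqual(max(out), 1.0)

    def test_known_result(self):
        self.assertEqual(normalize([3.0, 7.0, 5.0]), [0.0, 1.0, 0.5])

    def test_constant_input(self):
        self.assertEqual(normalize([2.0, 2.0]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalize([])


if __name__ == "__main__":
    unittest.main(argv=["normalize-tests"], exit=False)
```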

PART 3

The roles of an AI team
There is no standard for roles in AI teams. Moreover,
the lack of information about the supply of AI talent
makes it difficult for hiring managers to set
reasonable job requirements that correlate with
on-the-job performance. To bridge this gap, we
assessed the skills of thousands of individuals
aspiring to work in AI organizations and analyzed
hundreds of job descriptions for AI roles.

In Part 2, we defined the tasks carried out by AI teams. In this section, we’ll introduce the different roles of an AI team, their skill sets, and the tasks they focus on. We hope that learning about these roles will help you find a career track and prioritize your learning.

The Six Roles of an AI Team

We identified six technical roles with distinct skill sets and focus areas. Each of these roles contributes to a number of tasks in the AI development cycle.

All roles undertake (to a certain extent) the data engineering task. That’s because data engineering is usually a necessary step to enhance modeling, deployment, and business analysis.

For each role, we list the tasks it may carry out and the skills necessary to achieve those tasks.

Note:
• The dotted line indicates a less significant involvement
with the task at hand. A Software Engineer - ML uses
out-of-the-box methods to carry out the modeling
task while an MLE, MLR or DS is able to customize
models.

Data Scientist
TASKS

SKILL PROFILE

TOOLS DATA SCIENTISTS USE

• Modeling in Python using packages such as numpy, scikit-learn, TensorFlow, PyTorch, etc.
• Data Engineering in Python and/or SQL (or other domain-specific query languages).
• Business Analysis in Python, R, other domain-specific tools such as Tableau or Excel, and presentation software applications such as PowerPoint or Keynote.
• Collaboration and Workflow using a version control system (e.g., Git, Subversion, Mercurial, etc.) along with a Command Line Interface (CLI) (e.g., UNIX) and an Integrated Development Environment (IDE) (e.g., Jupyter Notebook, Sublime, etc.).

• Communication skills are usually required, but the level depends on the team.
• Terminology: Companies may refer to this position as data scientist, data analyst, machine learning engineer, research scientist, statistician, quantitative analyst, full-stack data scientist, and other titles.

Data Analyst
TASKS

SKILL PROFILE

TOOLS DATA ANALYSTS USE

• Data Engineering in Python and/or SQL (or other domain-specific query languages)
• Business Analysis in Python, R, other domain-specific tools such as Tableau or Excel, presentation software applications such as PowerPoint or Keynote, and external software services for A/B testing

• Our definition of a Data Analyst is specific to an AI organization. It is different from what is usually referred to as a Business Analyst. The latter is less quantitative and focuses on creating data pipelines, cleaning data, and analyzing it. Data Analysts are accomplished in query languages such as SQL and commonly use spreadsheet software tools but don’t need Algorithmic Coding skills.
• Communication skills are usually required, but the level depends on the team.
• Terminology: Companies may refer to this position as data scientist, research scientist, business analyst, risk analyst, marketing analyst, and other titles.

Machine Learning Engineer
TASKS

SKILL PROFILE

TOOLS MACHINE LEARNING ENGINEERS USE
• Data Engineering in Python and/or SQL (or other domain-specific query languages)
• Modeling in Python using packages such as numpy, scikit-learn, TensorFlow, PyTorch, etc.
• Deployment using an object-oriented programming language (e.g., Python, Java, C++, etc.) and cloud technologies such as AWS, GCP, Azure, etc.
• Collaboration and Workflow using a version control system (e.g., Git, Subversion, Mercurial, etc.), a Command Line Interface (CLI) (e.g., UNIX), an Integrated Development Environment (IDE) (e.g., Jupyter Notebook, Sublime, etc.) and an issue tracking product (e.g., JIRA)

• There is a variant of the Machine Learning Engineer, called the Deep Learning Engineer, that requires deep learning knowledge in addition to the skills profile above. These engineers focus on applications usually powered by deep learning. Examples include speech recognition, natural language processing and computer vision. Hence, they need skills specific to deep learning projects such as understanding and using various neural network architectures (fully-connected networks, CNNs, RNNs, etc.).
• Although it depends on the team, communication skills and business acumen aren't usually strong requirements.
• Companies may refer to this position as: machine learning engineer, software engineer - machine learning, software engineer, data scientist, algorithm engineer, research scientist, research engineer, full-stack data scientist, and other titles.
Machine Learning Researcher
TASKS

SKILL PROFILE

TOOLS MACHINE LEARNING RESEARCHERS USE
• Data Engineering in Python and/or SQL (or other domain-specific query languages)
• Modeling in Python using packages such as numpy, scikit-learn, TensorFlow, PyTorch, etc.
• Collaboration and Workflow using a version control system (e.g., Git, Subversion, Mercurial, etc.), a Command Line Interface (CLI) (e.g., UNIX), an Integrated Development Environment (IDE) (e.g., Jupyter Notebook, Sublime, etc.) and an issue tracking product (e.g., JIRA)
• Research by following updates via channels such as Twitter, Reddit, word of mouth, arXiv, and various conferences (e.g., NeurIPS, ICLR, ICML, CVPR, ACM, etc.)

• There is a variant of the Machine Learning Researcher, called the Deep Learning Researcher, that requires deep learning knowledge in addition to the skills profile above. These researchers focus on applications usually powered by deep learning. Examples include speech recognition, NLP and computer vision. Hence, they need skills specific to deep learning projects such as understanding and using various neural network architectures (fully-connected networks, CNNs, RNNs, etc.).
• Although not represented on the graph above, some machine learning researchers focus on deployment (e.g., life-long learning, model memory optimization for edge deployment, etc.) or AI infrastructure (e.g., distributed training, scheduling, experiment, and resource management).
• Although it depends on the team, communication skills aren't usually a strong requirement.
• Companies may refer to this position as: machine learning researcher, research scientist, research engineer, data scientist, and other titles.
Software Engineer - Machine Learning
TASKS

SKILL PROFILE

TOOLS SOFTWARE ENGINEERS - MACHINE LEARNING USE
• Modeling in Python using packages such as numpy, scikit-learn, TensorFlow, PyTorch, etc.
• Data Engineering in Python and/or SQL (or other domain-specific query languages).
• Deployment and AI infrastructure using an object-oriented programming language (e.g., Python, Java, C++, etc.) and cloud technologies such as AWS, GCP, Azure, etc.
• Collaboration and Workflow using a version control system (e.g., Git, Subversion, Mercurial, etc.), a Command Line Interface (CLI) (e.g., UNIX), an Integrated Development Environment (IDE) (e.g., Jupyter Notebook, Sublime, etc.) and an issue tracking product (e.g., JIRA).

• There is a variant of the Software Engineer - Machine Learning, called the Software Engineer - Deep Learning, that requires deep learning knowledge in addition to the skills profile above. These engineers focus on applications usually powered by deep learning. Examples include speech recognition, natural language processing and computer vision. Hence, they need skills specific to deep learning projects such as understanding and using various neural network architectures (fully-connected networks, CNNs, RNNs, etc.).
• Although it depends on the team, communication skills and business acumen aren't usually strong requirements.
• Companies may refer to this role as: machine learning engineer, software engineer, full-stack data scientist, and other titles.
Software Engineer
TASKS

SKILL PROFILE

TOOLS SOFTWARE ENGINEERS USE

• Data Engineering in Python and/or SQL (or other domain-specific query languages).
• AI infrastructure using an object-oriented programming language (e.g., Python, Java, C++, etc.) and cloud technologies such as AWS, GCP, Azure, etc.
• Collaboration and Workflow using a version control system (e.g., Git, Subversion, Mercurial, etc.), a Command Line Interface (CLI) (e.g., UNIX), an Integrated Development Environment (IDE) (e.g., Jupyter Notebook, Sublime, etc.) and an issue tracking product (e.g., JIRA).

• Although it depends on the team, communication skills and business acumen aren't usually strong requirements.
• Companies may refer to this role as: data engineer, software engineer, software development engineer, software engineer - AI Infrastructure, software engineer - data.

This report is a work in progress and is being
provided to the public for information purposes.
Because it is a work in progress, there are parts that
are either missing or will be progressively revised as
our team learns more about the supply of and the
demand for AI talent.

We welcome your comments and feedback.


Please send any comments and/or questions to
Kian Katanforoosh (kian@workera.ai).

- The Workera team

Workera, a deeplearning.ai company
