


Received May 20, 2022, accepted June 1, 2022, date of publication June 9, 2022, date of current version June 20, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3181730

MLOps: A Taxonomy and a Methodology


MATTEO TESTI 1,2, MATTEO BALLABIO 2, EMANUELE FRONTONI 3, (Member, IEEE),
GIULIO IANNELLO 4, (Member, IEEE), SARA MOCCIA 5,6, PAOLO SODA 4,7, (Member, IEEE),
AND GENNARO VESSIO 8, (Member, IEEE)
1 Integrated Research Centre, Università Campus Bio-Medico di Roma, 00155 Rome, Italy
2 DeepLearningItalia, 24129 Bergamo, Italy
3 VRAI Laboratory, Department of Political Sciences, Communication and International Relations, Università degli Studi di Macerata, 62100 Macerata, Italy
4 Department of Engineering, Unit of Computer Systems and Bioinformatics, Università Campus Bio-Medico di Roma, 00155 Rome, Italy
5 The BioRobotics Institute, Scuola Superiore Sant'Anna, 56127 Pisa, Italy
6 Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, 56127 Pisa, Italy
7 Department of Radiation Sciences, Radiation Physics, Biomedical Engineering, Umeå University, 90187 Umeå, Sweden
8 Department of Computer Science, Università degli Studi di Bari Aldo Moro, 70121 Bari, Italy

Corresponding author: Matteo Testi (m.testi@deeplearningitalia.com)

ABSTRACT Over the past few decades, the substantial growth in enterprise-data availability and the
advancements in Artificial Intelligence (AI) have allowed companies to solve real-world problems using
Machine Learning (ML). ML Operations (MLOps) represents an effective strategy for bringing ML models
from academic resources to useful tools for solving problems in the corporate world. The current literature
on MLOps is still mostly disconnected and sporadic. In this work, we review the existing scientific literature
and we propose a taxonomy for clustering research papers on MLOps. In addition, we present methodologies
and operations aimed at defining an ML pipeline to simplify the release of ML applications in the industry.
The pipeline is based on ten steps: business problem understanding, data acquisition, ML methodology,
ML training & testing, continuous integration, continuous delivery, continuous training, continuous monitoring,
explainability, and sustainability. The scientific and business interest and the impact of MLOps have
grown significantly over the past years: the definition of a clear and standardized methodology for conducting
MLOps projects is the main contribution of this paper.

INDEX TERMS MLOps, continuous monitoring, continuous integration, continuous delivery, continuous
training, XAI, sustainability.

I. INTRODUCTION

In the last decades, Machine Learning (ML) has emerged as a powerful tool to solve complex real-world problems such as stock prediction [1], biomedical image analysis [2]–[4], autonomous driving [5], and fraud detection [6]. Since data availability has reached levels never seen before, businesses around the world are working to leverage these data and process them automatically, exploiting the generalization power of ML to take actions and decisions [7].

In most real-world applications, data are constantly changing. This implies that ML models need to be retrained or, in the worst-case scenario, that the entire ML pipeline has to be rebuilt to tackle feature drift [8], [9]. A more frequent, faster, and simpler release cycle helps meet any regulatory or business changes. To achieve industrial growth, standardized production methods are required [10]–[12]. To industrialize ML models, a good set of production methods must be applied [13]. One of the key elements in facilitating the development of industry-leading companies is to improve communication between Science, Technology, Engineering, and Math (STEM) professionals and industry leaders or industry professionals by adopting a proven set of steps for industrializing ML solutions [14], [15].

Machine Learning Operations (MLOps) is a candidate to define these standardized production methods [16], [17]. MLOps can be viewed as the iterative process of pushing the latest best ML models to production [18], [19]. In fact, conducting an MLOps project means supporting automation, integration, and monitoring at all stages of building an ML system, including training, integration, testing, release, deployment, and infrastructure management [20], [21].

MLOps was born from different fields: ML, Development and Operations (DevOps), and data engineering (Fig. 1).

The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Alatas.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

FIGURE 1. MLOps develops upon Machine Learning, DevOps, and data engineering.

Of the three fields, DevOps had the biggest impact on the development of MLOps. DevOps is a method of thought and practice that aims to reduce as much as possible the friction between development and operations (implementation and integration), seeing them as a single process [22]. The goal of DevOps is to study ways to improve service quality and features to meet customer needs [23], [24]. The primary links between MLOps and DevOps are the concepts of continuous integration (CI) and continuous delivery (CD), which allow software to be produced in short cycles, ensuring that it can be reliably released at any time.

FIGURE 2. Traditional workflow of ML (top) vs. MLOps workflow (bottom).

When we look at how the current literature describes an ML project life-cycle, a picture like the one illustrated in Fig. 2 (top) is often shown. In many companies, model development and operations are carried out manually and without implementing MLOps. This slows down the industrialization of ML methodologies. As ML models have zero Return on Investment (ROI) until they can be used [25], [26], time to market should be the first metric to look at and optimize in any ML project. The only way to improve the release and continuous use of ML solutions in the industrial environment is to take great care of the part following the development of the model, in particular the interface between the ML solution and the existing Information and Communication Technologies (ICT) system. In fact, the most time-consuming step in releasing an ML solution into production is Operations (Fig. 2 (bottom)). It is worth noting that, although MLOps fosters process automation, its main goal is not to optimize business [27].

II. OBJECTIVES
The main objective of this paper is to provide a literature review on MLOps that highlights the current challenges in building and maintaining an ML system in a production environment [28]. At the same time, we aim to give an overview of why MLOps was introduced to translate ML systems into production [29], [30]. To this end, we selected papers and projects in the field of MLOps and propose a taxonomy to organize the work done so far. We identify key concepts by analyzing the existing literature from 2015 to 2022. Finally, we propose our operational methodology to approach an ML project. As far as we know, this is the first effort to systematize the literature on this topic and provide its operationalization.

The main difference between the operational methodology proposed in this paper and the traditional workflow implemented in many ML projects consists in the full integration of the various project steps to realize an effective, scalable, but above all industrializable solution. In fact, in most ML projects all forces are devoted to the development of an accurate ML model, without giving due importance to the integration and monitoring of the ML solution in the industrial environment.

The main motivation for proposing a methodology is to normalize each step needed to bring ML models from research to production. Due to the growing interest, researchers are trying to figure out each step of MLOps without involving business partners in defining those steps. This results in a misalignment in the definition of the MLOps issues, without a clear vision of the transition from research to production up to the maintenance of the models. By following a clear methodology, teams can have a deeper overview of all processes and organize each part of a project in a better and more systematic way.

The rest of this paper is structured as follows: Section III reviews the related literature; Section IV presents the proposed MLOps workflow; Section V concludes the paper and suggests high-level directions for further research.

III. PROPOSED TAXONOMY
As introduced in Section I, MLOps initiatives aim to establish resilient and efficient workflows by creating robust

pipelines [31], established practices [32], and auxiliary frameworks and tools. Indeed, model development is only a small part of the overall process, and many other processes, configurations, and tools need to be integrated into the system [33]. Bringing the application of DevOps techniques to the context of continuous training (CT), CD, CI, and continuous monitoring (CM) is among the main requirements of an ML project that aims to provide process automation, governance, and agility.

In the literature, several projects have tried to tackle various aspects of the ML production process by expanding existing libraries or by creating new tools that enhance the quality and performance of specific processes or make them more insightful. Up to now, there is no standardized and common pipeline to follow for an end-to-end MLOps project. To cluster the different approaches, we propose the following taxonomy:
1) ML-based software systems, also known as model-centric frameworks. These systems focus on the architecture of ML models with a view to CI/CD [23], [34], [35]. The goal of such systems is twofold: on the one hand, to create and automate ML pipelines; on the other hand, to increase the level of automation in the ML software life-cycle [36].
2) ML use case applications where, for example, papers explain an MLOps workflow to foster collaboration and negotiation between surgeon and patient [37], [38], or an ML pipeline on the Cloud for drug discovery [39].
3) ML automation frameworks such as MLflow [40], Kedro [41] or Amazon SageMaker [42], and benchmarking frameworks such as MLPerf [43], MLModelScope [44] and Deep500 [45]. These are interesting commercial tools that are already being used in daily work practice and represent excellent ML framework automation solutions.
The following subsections review in more detail the works that fall into the three categories.

A. ML-BASED SOFTWARE SYSTEMS
Machine Learning is becoming the primary approach to solving real-world problems. Therefore, many data science teams are studying how to apply DevOps principles to industry. The ML life-cycle involves manual steps for deploying the ML pipeline model. This method can produce unexpected results due to the dependency on data, preprocessing, model training, validation, and testing. The idea is to design an automated pipeline using two DevOps principles, CI and CD. The function of CI is to test and validate data, data schemas, and models; CD is for an ML pipeline that should automatically deploy another ML service [23]. The ML life-cycle has different methodologies to fit different scenarios and data types. The approach most used by data mining experts is the CRoss-Industry Standard Process for Data Mining (CRISP-DM) [46], introduced in 1996 by Daimler Chrysler. Experts can borrow the standard CRISP-DM methodologies and try to apply them to the MLOps pipeline. The process typically involves two teams: ML scientists, responsible for model training and testing, and ML engineers, responsible for production and deployment. MLOps pipeline automation with CI/CD routines is as follows:
• Business problem analysis;
• Dataset features and storage;
• ML analytical methodology;
• Pipeline CI components;
• Pipeline CD components;
• Automated ML triggering;
• Model registry storage;
• Monitoring and performance;
• Production ML service.

One of the points of greatest attention after CI and CD is monitoring, in terms of metrics and Key Performance Indicators (KPIs), and the continuous deployment of models. This part includes model performance, data monitoring, outlier detection, and explanations of historical predictions. Continuous monitoring is a process that allows understanding in real time when validation performance tends to decrease. Outlier detection is key to trusting the model and keeping it healthy. Therefore, the most important function of continuous monitoring is to ensure high model performance and KPIs used to validate models. There are many metrics to test the quality of a model, such as precision, recall, F1, and MSE. However, these metrics evaluate a model in the laboratory, regardless of the real-world context in which the model will be used. When evaluating ML models in the context of real applications, model performance metrics are not enough to establish the robustness of the models. The most basic step towards supporting such KPI-based analytics is to ensure that KPIs and model metrics are stored with a common correlation ID, to identify which model operations contributed to transactions with a particular KPI score [36]. Other important KPIs at the company level for evaluating the performance of the model can be: time-to-market, infrastructure cost, scalability, and profitability indices on sales (ROS) [47]. Unfortunately, ML models often fail to generalize outside the training data distribution [48].

Finally, the trust in an ML project rests on model explanation. Explainability allows users to trust the predictions, and this improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [35]. The terms ''explainability'' and ''interpretability'' are used interchangeably throughout the literature; however, in the case of an AI-based system, explainability is more than interpretability in terms of importance, completeness, and fidelity of predictions or classifications [49]. Explainable Artificial Intelligence (XAI) is a research trend that promotes explainable decision-making. Many real-world ML applications greatly increase the efficiency of industrial production through automated equipment and production processes [50]. However, the use of ''black-boxes'' has not yet been overcome, due to the lack of explainability and transparency still present in models and decisions [51].
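As a concrete illustration of the kind of post-hoc explanation discussed above — a sketch, not a method prescribed by this paper — permutation importance can rank which input features a trained ''black-box'' model actually relies on. The synthetic dataset and generic feature names are illustrative assumptions:

```python
# Post-hoc explainability sketch: permutation feature importance.
# Shuffle each feature in turn and measure the drop in test accuracy;
# a large drop means the model relies heavily on that feature.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
ranking = sorted(enumerate(result.importances_mean),
                 key=lambda p: p[1], reverse=True)
for idx, score in ranking:
    print(f"feature_{idx}: mean accuracy drop {score:.3f}")
```

The per-feature scores give the user a first layer of accountability: predictions can be traced back to the inputs that drive them, without opening the model itself.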

B. ML USE CASE APPLICATIONS

One of the most difficult challenges is using ML in real-world applications, where the focus is on system integration and scaling. The setup of MLOps use cases is continuous training, continuous integration, and continuous deployment [52], where new versions of the ML system can be deployed into running software. In this section, we present a case study to understand what a workflow looks like in an MLOps project.

The use case concerns Oravizio [38], a software product that provides data-driven information on patient-level risks related to hip and knee joint replacement surgery. Oravizio supports the collaboration and negotiation between the surgeon and the patient, so that the decisions taken are informed and there is consent to the operation.

Oravizio provides three dedicated prediction models:
• Risk of infection within one year from surgery;
• Risk of revision within two years from surgery;
• Risk of death within two years from surgery.

In the case of Oravizio, data were collected over the years, including 30,000 medical records from patients who had undergone surgery. Since the number of cases is far too large for a surgeon to process manually during an appointment, these data have been used to create a risk calculation model that predicts the outcome of the surgery. The heterogeneity of the data formats was one of the issues during pre-processing, which had to establish a standard for later analysis [37]. Once the data are standardized, an ML model can be created for each risk to enable validation and ensure regulatory compliance. The models selected for training were logistic regression, random forest, XGBoost, and a Weibull/Cox survival model. According to the results, gradient boosting with XGBoost produced the best performance and was selected for use in production [38].
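A model-selection step of this kind — comparing candidate classifiers before promoting one to production — can be sketched as follows. The synthetic data and the restriction to two scikit-learn model families (standing in for Oravizio's four candidates) are illustrative assumptions, not details from the case study:

```python
# Sketch of comparing candidate risk models with cross-validation and
# promoting the best one, as in the Oravizio case (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validated ROC AUC.
scores = {name: cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
          for name, est in candidates.items()}

# Promote the winner: refit on all data before deployment.
best_name = max(scores, key=scores.get)
production_model = candidates[best_name].fit(X, y)
print(best_name, round(scores[best_name], 3))
```

Cross-validated ranking on a held-out metric, rather than training-set performance, is what makes the promotion decision defensible for regulatory validation.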
FIGURE 3. Workflow of Oravizio.

As shown in Fig. 3, these models are usually re-trained during the life-cycle of an ML product. New data keep arriving, and this entails continuous training to improve accuracy. There is also continuous delivery, in terms of deploying new models, and continuous monitoring, which has two faces: indexes that track accuracy for data science analysis, and KPIs or other indexes from the business or clinical side that help understand the model and whether the approach can improve the business.

Unfortunately, there are no other use cases available in the literature with a clear MLOps pipeline in which the process from problem understanding to model deployment and continuous training, delivery, and monitoring is clearly explained. For example, in the case of the Uffizi Gallery in Florence [34], one of the most visited museums in Italy with over 2 million visitors, the project aims to reduce the queue using ML, but no clear set-up of the MLOps workflow is given. In the article in question, the authors discuss the chosen architecture, the reasons for adopting an ML algorithm, and the run-time continuous training of the algorithm to improve performance, but what is missing is a methodological guideline.

C. ML AUTOMATION FRAMEWORKS
To have a business impact, ML applications need to be deployed in production, which means deploying a model in a way that can be used for inference (e.g., REpresentational State Transfer (REST) services) and deploying scheduled jobs to update the model regularly. This is especially challenging when deployment requires collaboration with another team, such as application engineers who are not ML experts, or when the ML team uses different libraries or

frameworks [40]. ML projects have created new challenges that are not present in traditional software development. One of these is tracking input data, data versions, tuning parameters, and so on, to keep production deployments up to date [53]. In this section, we summarize these challenges and describe some of the most popular ML frameworks, like MLflow, Kubeflow, MLPerf, etc. [54].

MLOps frameworks can be divided into three main areas [55], dealing with:
• Data management;
• Modelling;
• Operationalization.

1) DATA MANAGEMENT
Data labeling tools (Table 1) are used to help the data science team label large datasets of texts, images, etc. [56], [57]. Labeled data are used to train supervised ML algorithms. We provide an overview of some data labeling tools, with their advantages and disadvantages, in Table 2.

TABLE 1. Some popular data labeling tools.
TABLE 2. Pros and cons of some popular data labeling tools.

Data versioning tools (Table 3), on the other hand, are used by data science and data engineering teams to manage different versions of models and datasets [58]. This helps data science teams gain insights, such as identifying how data changes impact model performance and understanding how datasets evolve. An overview of some popular data versioning tools, along with their pros and cons, is shown in Table 4.

TABLE 3. Some popular data versioning tools.
TABLE 4. Pros and cons of some popular data versioning tools.

2) MODELLING
In Tables 5 and 6, we present feature engineering tools that add automation to the process of extracting useful features from raw datasets to create better training data [59]. These tools help speed up feature engineering and extraction and create better training data for ML models.

TABLE 5. Some popular feature engineering tools.
TABLE 6. Pros and cons of some popular feature engineering tools.

Developing ML projects involves running multiple experiments with different models, model parameters, or training data. Experiment tracking tools save all the necessary information about the different experiments [60]. This makes it possible to track the versions of experiment components and results and to compare different experiments. Some examples of experiment tracking tools are shown in Table 7; a summary of their pros and cons is presented in Table 8.

TABLE 7. Some popular experiment tracking tools.
TABLE 8. Pros and cons of some popular experiment tracking tools.

Hyperparameters are key to obtaining better models. These are the parameters of the ML training algorithms, such as the learning rate, the type of regularization applied, and so on. Hyperparameter tuning tools help automate the process of searching for and selecting the optimal hyperparameters [61], [62]. Popular hyperparameter tuning tools are shown in Tables 9 and 10.

TABLE 9. Some popular hyperparameter tuning tools.
TABLE 10. Pros and cons of some popular hyperparameter tuning tools.

3) OPERATIONALIZATION
ML model deployment tools facilitate the integration and deployment of ML models into production [63]. Some tools, with advantages and disadvantages for each, are shown in Tables 11 and 12.

TABLE 11. Some popular model deployment tools.

ML model monitoring is another important part of a successful ML project, because ML model performance tends to decay after deployment due to changes in the input data stream over time [64], [65]. Model monitoring tools detect data drift and anomalies over time and allow setting alerts in case of performance issues. An overview of some popular model monitoring tools, with their advantages and disadvantages, is provided in Tables 13 and 14.

There are also tools that cover the end-to-end ML life-cycle [66]. Some popular platforms are shown in Tables 15 and 16, with their advantages and disadvantages.

IV. PROPOSED MACHINE LEARNING OPERATIONS METHODOLOGIES
In this section, we provide our methodology for an MLOps project, which aims to unify the lessons learned from the literature review into a single framework. The main difference from the other frameworks is that we are trying to create a new standard for ML projects, inspired by CRISP-DM, that helps strengthen the link between research and industry. Below, the different stages of the proposed MLOps process are described. Figure 4 provides a schematic overview.

A. BUSINESS PROBLEM UNDERSTANDING
Establishing a business understanding and the success criteria for solving the problem under study is the first step in an ML project [67]. Business understanding is a non-technical

phase and, for this reason, communication between data scientists and business experts is the main part of identifying the business problem. During this phase, it is essential to map the processes, systems, key data elements, and policy documentation for the key domains expressed in the business problem. This information is often created and maintained by the data governance team within an enterprise data governance program. The initial step is gathering requirements and clearly defining the objectives and key results (OKRs). In this part, data scientists should discuss with business experts to determine whether ML can really help. For each of the OKRs, it is necessary to define one or more KPIs [68]. These KPIs need to be documented for future reference and will be critically useful in ensuring that the project delivers the expected value. The KPIs must match the metrics (MSE, accuracy, etc.) used by the data science team to understand how model improvement impacts the business. The definition and documentation of business problems provide key context for the subsequent phases, helping to distinguish relevant data, defining how data map into the model (both during training and deployment), and identifying which dimensions of model performance should be monitored once the model is in production, and according to what criteria [69].

TABLE 12. Pros and cons of some popular model deployment tools.
TABLE 13. Some popular model monitoring tools.
TABLE 14. Pros and cons of some popular model monitoring tools.
TABLE 15. Some popular ML life-cycle tools.
TABLE 16. Pros and cons of some popular ML life-cycle tools.

B. DATA ACQUISITION
During data acquisition, the goal is simply to collect enough data to train the ML model and get a first solution [70]. The data scientist identifies the information, in terms of features/attributes, relevant to the specific business problem. These aspects should be discussed with a field-expert data engineer to identify potential data sources. Once the dataset is identified, the data engineer builds the pipeline that makes the data available to the data scientist. The data engineer performs the preliminary cleaning and validation steps so that there is a sufficient amount of high-quality data to meet the data scientist's needs.

The tasks for data acquisition can be summarized as follows:
• Data Extraction: select and integrate the data relevant to the ML task.
• Data Analysis: exploratory data analysis to understand the data schema and the characteristics expected by the model.
• Data Preparation: identify the data preparation and feature engineering required for the model. This preparation involves data cleaning and splitting into training, validation, and test sets. Data transformation and feature engineering also apply to the model that solves the target task. The output of this step is data ready in the prepared

format. For example, NULL values are converted to zero, or outliers are excluded from the dataset.

When there is not enough data to train the model, two main methodologies allow bypassing the problem:
• Data Augmentation, a technique that increases the amount of available data by inserting modified copies of the data (e.g., in the case of images, the same image rotated, enlarged, blurred, etc.).
• Transfer Learning, which allows reusing most of the weights of a neural network already trained on a similar problem.

FIGURE 4. Proposed MLOps workflow.

C. ML METHODOLOGY
After data acquisition, selecting the best ML algorithms to solve the problem is a key part of the ML project. Usually, the data science team studies the state-of-the-art for the specific problem and tries a bottom-up approach to solving it. ML is experimental by nature, trying different features, models, parameters, and hyperparameter configurations to find what works best. The bottom-up approach typically consists in trying different models with increasing degrees of complexity until reaching the best one. This methodology helps data scientists start with simple models before trying to implement complex ones.

D. ML TRAINING AND TESTING
The process of training and optimizing a new ML model is an iterative one in which data scientists test several algorithms, features, and hyperparameters. Once the best ML models have been chosen, they are re-trained and tested. The models are evaluated using different validation methods, such as:
• Holdout validation, a type of external validation in which the dataset is split into two randomly sized subgroups.
• Cross-validation, in which the original sample is randomly partitioned into k equal-sized subgroups. Of the k subgroups, one subsample is kept as a testing dataset and the remaining k − 1 are used for training.
• Bootstrap validation, in which we resample the dataset with replacement, producing new datasets with the same number of instances as the initial dataset.

The output of this step is a set of metrics for evaluating the quality of the model. Once this iteration is complete, the weights of the best models are saved and deployed using an API infrastructure. Training and testing an ML system involves integration, data validation, trained model quality evaluation, and model validation. The main goal is to keep track of all experiments and maintain reproducibility while maximizing code reusability [71]. We have seen that there exist different tracking tools which can simplify the process of storing the data, the features selected, and the model parameters, along with the performance metrics. These make it possible to compare differences in performance and aid the reproducibility of the experiments. Without reproducibility, data scientists are unable to deliver the model to DevOps to see if what was created in the lab can be faithfully reproduced in production [72].

E. CONTINUOUS INTEGRATION
Continuous integration is a well-established development practice in the software development industry [52] and is the first step in starting the continuous delivery journey. CI enables companies to have frequent releases and improves software quality and teams' productivity [73]. This practice includes automated software building and testing [74].

In the continuous integration pipeline, we build the source code and run the various trained ML models. The outputs of this stage are components (packages and artifacts) to be deployed in the pre-production/production environment of continuous delivery [75]. The ML code is a small portion of a real ML system, because important components are the infrastructure, the configuration, and the data elaboration. Continuous integration for ML systems relies on having a substantial impact on the end-to-end pipeline to automate the delivery of the ML models with minimal effort. The main steps for continuous integration are [22]:
• Source code management (SCM);

• Push/pull changes to the repository to trigger a continuous delivery build;
• Check out the latest code and the associated data version from the data repository storage;
• Running of the unit tests;
• Building/running of the ML model;
• Testing and validation;
• Packaging of the model and building of the container image;
• Pushing of the container image to the registry.
Several software tools have been used for the deployment of ML models, such as Jenkins [76], Git [77], Docker [78], Helm [79], and Kubernetes [80]. To summarize, the pipeline and its components are built, tested, and packaged when new code is committed or pushed to the source code repository. CI tests and validates code, datasets, data schemas, and models. The validated model is deployed to a target environment to provide predictions. This deployment can be one of the following:
• Microservices with a REST API to provide online predictions;
• A model embedded into an edge or mobile device;
• Part of a batch prediction system.

F. CONTINUOUS DELIVERY
Continuous delivery aims to ensure that an application is always in a production-ready state after successfully passing automated tests and quality checks [81]. The objective of the deployment stage is to enable a seamless roll-out of new models with the lowest possible risk. Best practices in the continuous delivery of software services involve the use of safe deployment techniques, such as A/B tests. In an ML pipeline, CD should automatically deploy model services. CD employs a set of practices, such as CI and deployment automation, to automatically deliver software to production [82]. CD is a push-based approach [83], and this practice reduces deployment risk, lowers costs, and gathers user feedback faster.
In this phase, the artifacts produced by the previous continuous integration stage are deployed to the staging/pre-production/production environment, and test models are obtained. The components of the CD pipeline are summarized as follows:
• Staging environment: deploying the trained ML model first in a staging environment is a standard operation in ICT. The output of this step is a test model that is pushed into the model registry archive.
• Model registry archiving: necessary to define an archiving location where ML models in the staging state and ML models in the production state are loaded.
• Automatic activation: this step is performed automatically, according to a schedule or a response in the production environment. The output of this phase is a test model that is pushed into the staging environment.

G. CONTINUOUS TRAINING
During continuous training, we need to keep storing new data and preparing it in the same way as the data used to train our model. This means detecting outliers to understand when the data distribution diverges from the training data. CT is concerned with automatically retraining and serving models [84]. Continuous training is the part of MLOps that automatically and continuously retrains models before they are redeployed.
To design a continuous training strategy, we should answer the following questions [85]:
• When should a model be retrained?
  – Periodic training.
  – Performance-based trigger.
  – Trigger based on data changes.
  – Retraining on demand.
• How much data is needed for retraining?
  – Fixed window.
  – Dynamic window.
  – Representative subsample selection.
• What should be retrained?
  – Continual learning vs. transfer learning.
  – Offline (batch) vs. online (incremental) learning.
• When to deploy the model after retraining?
  – A/B testing.

H. CONTINUOUS MONITORING
The main objective during the monitoring stage is to manage the risks of the in-production models by checking for performance drift [86] and alerting an operator when model accuracy has dropped. The model predictive performance is monitored to potentially invoke a new iteration in the ML process. Once the model has been deployed to production, it still needs continuous validation or testing, because patterns in the data can change over time. The model may become less accurate because the data used in training are no longer representative of the new data existing in production [71]. Performance monitoring does not concern the quantitative performance metrics only. Therefore, during continuous monitoring, both the technical metrics and the business KPIs must be kept under control.

I. EXPLAINABLE AI
Deep Learning methods [87] now dominate benchmarks on different tasks and achieve superhuman results. This improvement has often been achieved through increased model complexity. Once these models became real applications in production, the community started studying the "explainability" of the models to answer business questions. Explainability can be defined as "the degree to which a human can understand the cause of a decision" [88]. Explainability is mostly connected with the intuition behind the outputs of a model [89]; therefore, an ML system is explainable when it is easier to identify cause-and-effect relationships within the system inputs and outputs. For example, in image recognition tasks, part of the reason that led a system to decide that a


specific object is part of an image (output) could be certain dominant patterns in the image (input). The more explainable a model is, the greater the understanding practitioners get of the internal business procedures that take place while the model is making decisions. An explainable model does not necessarily translate into one whose internal logic or underlying processes humans can understand [90]. The explainability of the model allows the user to build trust in the predictions made by the deployed system and improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [15].

J. SUSTAINABILITY: CARBON FOOTPRINT
The increasingly common use of Deep Learning models in real-world projects has, as the other side of the coin, corresponded to an immense growth in the computation and energy required [91]. If this growing trend continues, Deep Learning could become a significant contributor to climate change. This trend can be mitigated by exploring how to improve the energy efficiency of DL models [92]. Hence, data scientists need to know their energy and carbon footprint, so that they can actively take steps to reduce them whenever possible. Carbon footprint is a measure of the total exclusive amount of carbon dioxide emissions that are directly and indirectly caused by an activity or accumulated during the life stages of a product [93].
Strubell et al. selectively focused on the carbon footprint analysis of AI models for natural language processing [94]. For example, the footprint of training an NLP Transformer model was estimated to be equivalent to that of a commercial flight between San Francisco and New York. The publication of these estimates has had a significant effect on the scientific world. Following the publication of these data, the 2020 White Paper on AI released by the European Commission has called for actions that go beyond the collection of impressive but admittedly anecdotal data about the training of selected AI systems [95]. For this reason, it is necessary to calculate the carbon footprint of each individual AI system and of the AI sector [96].
It is important to emphasize that, during the MLOps lifecycle, the carbon footprint should be taken into account when choosing models. It is better to take a bottom-up approach, trying simple models first without jumping to state-of-the-art, complex, and expensive models. The same applies to calculating the carbon footprint not only during training and testing, but also during continuous integration, continuous delivery, and continuous training.

V. CONCLUSION
In this paper, we have provided an overview of approaches in the literature using MLOps: we have provided a taxonomy of the current literature and proposed a methodology for addressing MLOps projects. The application of DevOps principles to ML and the use of MLOps in the industrial environment are still little-discussed topics at the academic level. Current literature is mostly disconnected and sporadic. This paper is intended as a literature review to systematize and add clarity to the definition and methods of MLOps. The paper aims to define a high-level strategy for dealing with MLOps projects; the goal of future work is to apply our proposed methodology to use cases such as biomedical imaging and finance. Experimental work will be required to test the pipeline defined in this manuscript.
Traditionally, data preparation, model training and testing, and performance comparison are the key points of ML pipelines. In this work, we have stressed the importance of many other, no less important, aspects, such as continuous monitoring and sustainability issues. Following well-defined guidelines is the only way to allow the traceability and reproducibility of the results obtained in an Open Science context. For this reason, it is crucial to use systematic procedures, for greater cohesion in the scientific community, to follow clear and clean pipelines in MLOps. The remaining challenge for the community is to apply an ML methodology to an end-to-end use case, going through each point of this methodology and showing what happens if some phases are skipped. Specific areas, such as biomedicine, finance, cyber-security, and manufacturing [97], can greatly benefit from adopting MLOps, and we believe the pipeline defined in this paper can bring advantages over traditional practices.
According to Fortune Business Insights, the global Machine Learning market is expected to grow from $15.50 billion in 2021 to $152.24 billion in 2028, with a compound annual growth rate of 38.6% over the forecast period. MLOps aims to create long-term ML solutions, reducing maintenance costs, and monitoring and optimizing workflows. Understanding and intercepting new challenges and trends such as the emerging MLOps will provide a strong competitive advantage to companies adopting this solution [98].

ABBREVIATION TERMS
ML        Machine Learning.
MLOps     Machine Learning Operations.
AI        Artificial Intelligence.
XAI       eXplainable AI.
STEM      Science, Technology, Engineering and Mathematics.
DevOps    Development Operations.
DL        Deep Learning.
ROI       Return on Investment.
CI        Continuous Integration.
CD        Continuous Delivery.
CT        Continuous Training.
CRISP-DM  CRoss-Industry Standard Process for Data Mining.
KPI       Key Performance Indicator.
MSE       Mean Squared Error.
ROS       Return on Sales.
REST      REpresentational State Transfer.
OKR       Objective and Key Result.
API       Application Programming Interface.
NLP       Natural Language Processing.
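The performance-based retraining trigger described in the continuous training and continuous monitoring sections can be sketched as follows; the window size and accuracy threshold are hypothetical values, not figures taken from this paper:

```python
from collections import deque

class DriftMonitor:
    """Keep a rolling window of prediction outcomes and signal
    retraining when windowed accuracy falls below a threshold."""

    def __init__(self, window=500, min_accuracy=0.90):
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, correct):
        """Record one labeled outcome; return True when retraining
        should be triggered."""
        self.outcomes.append(bool(correct))
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.min_accuracy
```

In a real pipeline, this trigger would start the retraining job and alert an operator; the threshold itself would be derived from the technical metrics and business KPIs kept under control during continuous monitoring.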

REFERENCES
[1] R. Akita, A. Yoshihara, T. Matsubara, and K. Uehara, "Deep learning for stock prediction using numerical and textual information," in Proc. IEEE/ACIS 15th Int. Conf. Comput. Inf. Sci. (ICIS), Jun. 2016, pp. 1–6.
[2] M. C. Fiorentino, E. Cipolletta, E. Filippucci, W. Grassi, E. Frontoni, and S. Moccia, "A deep-learning framework for metacarpal-head cartilage-thickness estimation in ultrasound rheumatological images," Comput. Biol. Med., vol. 141, Feb. 2022, Art. no. 105117.
[3] M. Jamshidi, A. Lalbakhsh, J. Talla, Z. Peroutka, F. Hadjilooei, P. Lalbakhsh, M. Jamshidi, L. La Spada, M. Mirmozafari, M. Dehghani, A. Sabet, S. Roshani, S. Roshani, N. Bayat-Makou, B. Mohamadzade, Z. Malek, A. Jamshidi, S. Kiani, H. Hashemi-Dezaki, and W. Mohyuddin, "Artificial intelligence and COVID-19: Deep learning approaches for diagnosis and treatment," IEEE Access, vol. 8, pp. 109581–109595, 2020.
[4] M. B. Jamshidi, A. Lalbakhsh, J. Talla, Z. Peroutka, S. Roshani, V. Matousek, S. Roshani, M. Mirmozafari, Z. Malek, L. L. Spada, and A. Sabet, "Deep learning techniques and COVID-19 drug discovery: Fundamentals, state-of-the-art and future directions," in Emerging Technologies During the Era of COVID-19 Pandemic. Cham, Switzerland: Springer, 2021, pp. 9–31.
[5] S. Grigorescu, B. Trasnea, T. Cocias, and G. Macesanu, "A survey of deep learning techniques for autonomous driving," J. Field Robot., vol. 37, no. 3, pp. 362–386, Apr. 2020.
[6] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams, and P. Beling, "Deep learning detecting fraud in credit card transactions," in Proc. Syst. Inf. Eng. Design Symp. (SIEDS), Apr. 2018, pp. 129–134.
[7] J. Frizzo-Barker, P. A. Chow-White, M. Mozafari, and D. Ha, "An empirical study of the rise of big data in business scholarship," Int. J. Inf. Manage., vol. 36, no. 3, pp. 403–413, Jun. 2016.
[8] T. Dybå and T. Dingsøyr, "Empirical studies of agile software development: A systematic review," Inf. Softw. Technol., vol. 50, nos. 9–10, pp. 833–859, 2008.
[9] T. J. Gandomani, Z. Tavakoli, H. Zulzalil, and H. K. Farsani, "The role of project manager in agile software teams: A systematic literature review," IEEE Access, vol. 8, pp. 117109–117121, 2020.
[10] Z. Luqing, "Research on the impact of standardization on economic growth," Management, vol. 9, no. 6, pp. 236–241, 2021.
[11] K. Blind, A. Jungmittag, and A. Mangelsdorf, "The economic benefits of standardization," DIN German Inst. Standardization, Berlin, Germany, Tech. Rep., 2011.
[12] O. Khalaj, M. B. Jamshidi, E. Saebnoori, B. Mašek, C. Štadler, and J. Svoboda, "Hybrid machine learning techniques and computational mechanics: Estimating the dynamic behavior of oxide precipitation hardened steel," IEEE Access, vol. 9, pp. 156930–156946, 2021.
[13] S. Alla and S. K. Adari, "What is MLOps?" in Beginning MLOps With MLFlow. Cham, Switzerland: Springer, 2021, pp. 79–124.
[14] S. Mäkinen, H. Skogström, E. Laaksonen, and T. Mikkonen, "Who needs MLOps: What data scientists seek to accomplish and how can MLOps help?" 2021, arXiv:2103.08942.
[15] U. Bhatt, M. Andrus, A. Weller, and A. Xiang, "Machine learning explainability for external stakeholders," 2020, arXiv:2007.05408.
[16] C. Ebert, G. Gallardo, J. Hernantes, and N. Serrano, "DevOps," IEEE Softw., vol. 33, no. 3, pp. 94–100, Apr. 2016.
[17] S. M. Mohammad, "DevOps automation and agile methodology," in Proc. Int. J. Creative Res. Thoughts (IJCRT), 2017, pp. 2320–2882.
[18] M. Borg, R. Jabangwe, S. Åberg, A. Ekblom, L. Hedlund, and A. Lidfeldt, "Test automation with grad-CAM Heatmaps—A future pipe segment in MLOps for vision AI?" in Proc. IEEE Int. Conf. Softw. Test., Verification Validation Workshops (ICSTW), Apr. 2021, pp. 175–181.
[19] G. Fursin, "The collective knowledge project: Making ML models more portable and reproducible with open APIs, reusable best practices and MLOps," 2020, arXiv:2006.07161.
[20] J. P. Gujjar and V. N. Kumar, "Demystifying mlops for continuous delivery of the product," Asian J. Adv. Res., vol. 18, pp. 19–23, Feb. 2022.
[21] S. Moreschini, F. Lomio, D. Hästbacka, and D. Taibi, "MLOps for evolvable AI intensive software systems," Tech. Rep.
[22] M. Vizard and Kirsch, "DevOps," Tech. Rep., Nov. 2021.
[23] I. Karamitsos, S. Albarhami, and C. Apostolopoulos, "Applying DevOps practices of continuous automation for machine learning," Information, vol. 11, no. 7, p. 363, Jul. 2020.
[24] B. S. Farroha and D. L. Farroha, "A framework for managing mission needs, compliance, and trust in the DevOps environment," in Proc. IEEE Mil. Commun. Conf., Oct. 2014, pp. 288–293.
[25] J.-P. Correa-Baena, K. Hippalgaonkar, J. van Duren, S. Jaffer, V. R. Chandrasekhar, V. Stevanovic, C. Wadia, S. Guha, and T. Buonassisi, "Accelerating materials development via automation, machine learning, and high-performance computing," Joule, vol. 2, no. 8, pp. 1410–1420, Aug. 2018.
[26] J. Mizgajski, A. Szymczak, M. Morzy, Ł. Augustyniak, P. Szymański, and P. Żelasko, "Return on investment in machine learning: Crossing the chasm between academia and business," Found. Comput. Decis. Sci., vol. 45, no. 4, pp. 281–304, Dec. 2020.
[27] W. EckersonGroup, "Eckerson," Tech. Rep., 2022.
[28] Y. Zhao, "Machine learning in production: A literature," Tech. Rep., 2021.
[29] E. D. S. Nascimento, I. Ahmed, E. Oliveira, M. P. Palheta, I. Steinmacher, and T. Conte, "Understanding development process of machine learning systems: Challenges and solutions," in Proc. ACM/IEEE Int. Symp. Empirical Softw. Eng. Meas. (ESEM), Sep. 2019, pp. 1–6.
[30] J. Dalzochio, R. Kunst, E. Pignaton, A. Binotto, S. Sanyal, J. Favilla, and J. Barbosa, "Machine learning and reasoning for predictive maintenance in Industry 4.0: Current status and challenges," Comput. Ind., vol. 123, Dec. 2020, Art. no. 103298.
[31] L. Scotton, "Engineering framework for scalable machine learning operations," Tech. Rep., 2021.
[32] E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley, "The ML test score: A rubric for ML production readiness and technical debt reduction," in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2017, pp. 1123–1132.
[33] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, "Hidden technical debt in machine learning systems," in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 2503–2511.
[34] H. Muccini and K. Vaidhyanathan, "Software architecture for ML-based systems: What exists and what lies ahead," 2021, arXiv:2103.07950.
[35] J. Klaise, A. Van Looveren, C. Cox, G. Vacanti, and A. Coca, "Monitoring and explainability of models in production," 2020, arXiv:2007.06299.
[36] M. Arnold, J. Boston, M. Desmond, E. Duesterwald, B. Elder, A. Murthi, J. Navratil, and D. Reimer, "Towards automating the AI operations lifecycle," 2020, arXiv:2003.12808.
[37] T. Granlund, A. Kopponen, V. Stirbu, L. Myllyaho, and T. Mikkonen, "MLOps challenges in multi-organization setup: Experiences from two real-world cases," 2021, arXiv:2103.08937.
[38] T. Granlund, V. Stirbu, and T. Mikkonen, "Towards regulatory-compliant MLOps: Oravizio's journey from a machine learning experiment to a deployed certified medical product," Social Netw. Comput. Sci., vol. 2, no. 5, pp. 1–14, Sep. 2021.
[39] O. Spjuth, J. Frid, and A. Hellander, "The machine learning life cycle and the cloud: Implications for drug discovery," Expert Opinion Drug Discovery, vol. 16, no. 9, pp. 1–9, 2021.
[40] M. Zaharia, A. Chen, A. Davidson, A. Ghodsi, S. A. Hong, A. Konwinski, S. Murching, T. Nykodym, P. Ogilvie, and M. Parkhe, "Accelerating the machine learning lifecycle with MLflow," IEEE Data Eng. Bull., vol. 41, no. 4, pp. 39–45, Jun. 2018.
[41] K. Org, "Kedro-org/Kedro: A Python framework for creating reproducible, maintainable and modular data science code," Tech. Rep.
[42] D. Hudgeon and R. Nichol, "Machine learning for business: Using Amazon SageMaker and Jupyter," Tech. Rep., 2020.
[43] V. J. Reddi, C. Cheng, D. Kanter, P. Mattson, G. Schmuelling, C.-J. Wu, B. Anderson, M. Breughe, M. Charlebois, and W. Chou, "MLPerf inference benchmark," in Proc. ACM/IEEE 47th Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2020, pp. 446–459.
[44] C. Li, A. Dakkak, J. Xiong, W. Wei, L. Xu, and W.-M. Hwu, "XSP: Across-stack profiling and analysis of machine learning models on GPUs," in Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS), May 2020, pp. 326–327.
[45] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler, "A modular benchmarking infrastructure for high-performance and reproducible deep learning," in Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS), May 2019, pp. 66–77.
[46] C. Shearer, "The CRISP-DM model: The new blueprint for data mining," J. Data Warehousing, vol. 5, no. 4, pp. 13–22, 2000.
[47] H. Ahmed, D. Tahseen, W. Haider, M. Asad, S. Nand, and S. Kamran, "Establishing standard rules for choosing best KPIs for an E-commerce business based on Google analytics and machine learning technique," Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 5, pp. 12–24, 2017.


[48] D. Hendrycks and T. Dietterich, "Benchmarking neural network robustness to common corruptions and perturbations," 2019, arXiv:1903.12261.
[49] W. Samek, T. Wiegand, and K.-R. Müller, "Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models," 2017, arXiv:1708.08296.
[50] Z. Wang, Y. Lai, Z. Liu, and J. Liu, "Explaining the attributes of a deep learning based intrusion detection system for industrial control networks," Sensors, vol. 20, no. 14, p. 3817, Jul. 2020.
[51] S. Rabiul Islam, W. Eberle, S. Khaled Ghafoor, and M. Ahmed, "Explainable artificial intelligence approaches: A survey," 2021, arXiv:2101.09429.
[52] B. Fitzgerald and K.-J. Stol, "Continuous software engineering and beyond: Trends and challenges," in Proc. 1st Int. Workshop Rapid Continuous Softw. Eng. (RCoSE), 2014, pp. 1–9.
[53] G. Symeonidis, E. Nerantzis, A. Kazakis, and G. A. Papakostas, "MLOps–definitions, tools and challenges," 2022, arXiv:2201.00162.
[54] N. Hewage and D. Meedeniya, "Machine learning operations: A survey on MLOps tool support," 2022, arXiv:2202.10169.
[55] AI Usecases & Tools to Grow Your Business, Jan. 2022.
[56] M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio, R. M. Summers, D. L. Rubin, and M. P. Lungren, "Preparing medical imaging data for machine learning," Radiology, vol. 295, no. 1, pp. 4–15, Apr. 2020.
[57] T. Kulesza, S. Amershi, R. Caruana, D. Fisher, and D. Charles, "Structured labeling for facilitating concept evolution in machine learning," in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., Apr. 2014, pp. 3075–3084.
[58] T. van der Weide, D. Papadopoulos, O. Smirnov, M. Zielinski, and T. van Kasteren, "Versioning for end-to-end machine learning pipelines," in Proc. 1st Workshop Data Manage. End-End Mach. Learn., May 2017, pp. 1–9.
[59] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Newton, MA, USA: O'Reilly Media, 2018.
[60] A. W. Long, J. Zhang, S. Granick, and A. L. Ferguson, "Machine learning assembly landscapes from particle tracking data," Soft Matter, vol. 11, no. 41, pp. 8141–8153, 2015.
[61] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 2623–2631.
[62] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley, "Google vizier: A service for black-box optimization," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2017, pp. 1487–1495.
[63] L. Baier, F. Jöhren, and S. Seebacher, "Challenges in the deployment and operation of machine learning in practice," in Proc. ECIS, 2019, pp. 1–15.
[64] O. Boursalie, R. Samavi, and T. E. Doyle, "M4CVD: Mobile machine learning model for monitoring cardiovascular disease," Proc. Comput. Sci., vol. 63, no. 2, pp. 384–391, 2015.
[65] M. Syafrudin, G. Alfian, N. Fitriyani, and J. Rhee, "Performance analysis of IoT-based sensor, big data processing, and machine learning model for real-time monitoring system in automotive manufacturing," Sensors, vol. 18, no. 9, p. 2946, Sep. 2018.
[66] R. Ashmore, R. Calinescu, and C. Paterson, "Assuring the machine learning lifecycle: Desiderata, methods, and challenges," ACM Comput. Surveys, vol. 54, no. 5, pp. 1–39, Jun. 2022.
[67] A. F. V. Maya, "The state of MLOps," Tech. Rep., 2021.
[68] M. Badawy, A. A. A. El-Aziz, A. M. Idress, H. Hefny, and S. Hossam, "A survey on exploring key performance indicators," Future Comput. Informat. J., vol. 1, nos. 1–2, pp. 47–52, Dec. 2016.
[69] S. Agrawal and A. Mittal, "MLOps: 5 steps to operationalize machine-learning models—AI4," Tech. Rep.
[70] Y. Li, X. Yu, and N. Koudas, "Data acquisition for improving machine learning models," 2021, arXiv:2105.14107.
[71] P. Ruf, M. Madan, C. Reich, and D. Ould-Abdeslam, "Demystifying MLOps and presenting a recipe for the selection of open-source tools," Appl. Sci., vol. 11, no. 19, p. 8861, Sep. 2021.
[72] M. Treveil, Introducing MLOps: How to Scale Machine Learning in the Enterprise. Newton, MA, USA: O'Reilly, 2020.
[73] J. Bosch, "Continuous software engineering: An introduction," in Continuous Software Engineering. Cham, Switzerland: Springer, 2014, pp. 3–13.
[74] M. Leppänen, S. Mäkinen, M. Pagels, V. P. Eloranta, J. Itkonen, M. V. Mäntylä, and T. Männistö, "The highways and country roads to continuous deployment," IEEE Softw., vol. 32, no. 2, pp. 64–72, Mar. 2015.
[75] L. E. Lwakatare, I. Crnkovic, E. Rånge, and J. Bosch, "From a data science driven process to a continuous delivery process for machine learning systems," in Proc. Int. Conf. Product-Focused Softw. Process Improvement. Cham, Switzerland: Springer, 2020, pp. 185–201.
[76] J. F. Smart, Jenkins: The Definitive Guide: Continuous Integration for the Masses. Newton, MA, USA: O'Reilly Media, 2011.
[77] S. Chacon and B. Straub, Pro Git. New York, NY, USA: Apress, 2014.
[78] D. Merkel, "Docker: Lightweight Linux containers for consistent development and deployment," Linux J., vol. 2014, no. 239, p. 2, 2014.
[79] PM. Helm, 2018.
[80] (2017). Kubernetes Manual. Accessed: Dec. 4, 2017. [Online]. Available: https://kubernetes.io/
[81] I. Weber, S. Nepal, and L. Zhu, "Developing dependable and secure cloud applications," IEEE Internet Comput., vol. 20, no. 3, pp. 74–79, May 2016.
[82] P. Webteam, "Resources: Puppet," Tech. Rep.
[83] L. Chen, "Continuous delivery: Huge benefits, but challenges too," IEEE Softw., vol. 32, no. 2, pp. 50–54, Mar. 2015.
[84] B. Liu, "Lifelong machine learning: A paradigm for continuous learning," Frontiers Comput. Sci., vol. 11, no. 3, pp. 359–361, 2017.
[85] A. Komolafe, "Retraining model during deployment: Continuous training and continuous testing," Tech. Rep., Dec. 2021.
[86] J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, "A unifying view on dataset shift in classification," Pattern Recognit., vol. 45, no. 1, pp. 521–530, 2012.
[87] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, Nov. 2015.
[88] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artif. Intell., vol. 267, pp. 1–38, Feb. 2019.
[89] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[90] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, "Explainable AI: A review of machine learning interpretability methods," Entropy, vol. 23, no. 1, p. 18, Dec. 2020.
[91] L. F. W. Anthony, B. Kanding, and R. Selvan, "Carbontracker: Tracking and predicting the carbon footprint of training deep learning models," 2020, arXiv:2007.03051.
[92] M. Płoszaj-Mazurek, E. Ryńska, and M. Grochulska-Salak, "Methods to optimize carbon footprint of buildings in regenerative architectural design with the use of machine learning, convolutional neural network, and parametric design," Energies, vol. 13, no. 20, p. 5289, Oct. 2020.
[93] T. Wiedmann and J. Minx, "A definition of 'carbon footprint,'" Ecol. Econ. Res. Trends, vol. 1, pp. 1–11, Mar. 2008.
[94] E. Strubell, A. Ganesh, and A. McCallum, "Energy and policy considerations for deep learning in NLP," 2019, arXiv:1906.02243.
[95] G. Tamburrini, "The AI carbon footprint and responsibilities of AI scientists," Philosophies, vol. 7, no. 1, p. 4, Jan. 2022.
[96] U. European Commission, "White paper on artificial intelligence—European Commission," Tech. Rep.
[97] A. Shafiei, M. Jamshidi, F. Khani, J. Talla, Z. Peroutka, R. Gantassi, M. Baz, O. Cheikhrouhou, and H. Hamam, "A hybrid technique based on a genetic algorithm for fuzzy multiobjective problems in 5G, Internet of Things, and mobile edge computing," Math. Problems Eng., vol. 2021, pp. 1–14, Oct. 2021.
[98] M. B. Jamshidi, N. Alibeigi, N. Rabbani, B. Oryani, and A. Lalbakhsh, "Artificial neural networks: A powerful tool for cognitive science," in Proc. IEEE 9th Annu. Inf. Technol., Electron. Mobile Commun. Conf. (IEMCON), Nov. 2018, pp. 674–679.

MATTEO TESTI is currently pursuing the Ph.D. degree in computer engineering with the Medical Statistic and Molecular Epidemiology Unit, University of Biomedical Campus, Rome, Italy. He is also an Entrepreneur with a strong background in data science with a focus on deep learning. He founded DeepLearningItalia, the biggest e-learning platform in the artificial intelligence area in the Italian language. He was one of the technical writers for the Artificial Intelligence Italian white paper. Since 2019, he has been an Adjunct Professor with the University of Rome Tor Vergata, Rome.


MATTEO BALLABIO was born in Carate Brianza, PAOLO SODA (Member, IEEE) is currently a Full
Monza and Brianza, in January 1999. He received Professor in computer science and computer engi-
the B.Sc. degree in biomedical engineering from neering with the University Campus Bio-Medico
Bergamo University, Italy, in December 2021, di Roma, and he is also a Visiting Professor in
where he is currently pursuing the M.Sc. degree biomedical engineering and AI with the Depart-
in management engineering. He has been a Col- ment of Radiation Sciences, Umeå University,
laborator at DeepLearningItalia, since September Sweden. He is also a Vice-Coordinator of the
2021. He is the Contributor of the dataset ‘‘DBB Health and Life Sciences specialization area of
Distorted Brain Benchmark’’ (BrainLife.io) used the National Ph.D. program in AI. His research
in the publication ‘‘Automatic Tissue Segmenta- interests include AI, machine learning, and big
tion with Deep Learning in Patients with Congenital or Acquired Distortion data analytics, with applications to data, signals, 2D and 3D image, and video
of Brain Anatomy’’ published on Springer. He received the B.Sc. thesis enti- processing and analysis. He was the Team Leader of the Research Groups
tled: ‘‘Creation of an open dataset for the evaluation of the segmentation of that won the two international competitions: ‘‘COVID CXR Hackathon’’
MRI images in the case of patients with severe distortions of brain anatomy’’ (2022 Dubai Expo) and ‘‘All against COVID-19: Screening X-ray Images
in collaboration with the Research Center Fondazione Bruno Kessler (FBK), for COVID-19 Infection’’ (IEEE 2021). He is a member of CVPL and
Trento (IT). SIBIM, and the chairs of the IEEE International Technical Committee for
Computational Life Sciences.

EMANUELE FRONTONI (Member, IEEE) is currently a Full Professor in computer science with the University of Macerata and the Co-Director of the VRAI Laboratory, Marche Polytechnic University. His research interests include computer vision and artificial intelligence with applications in robotics, video analysis, human behavior analysis, and digital humanities. He is involved in several industrial research and development projects in collaboration with ICT and mechatronics companies in the field of artificial intelligence. He is a member of the European Association for Artificial Intelligence, the European AI Alliance, and the International Association for Pattern Recognition.

GIULIO IANNELLO (Member, IEEE) received the Graduate degree in electronic engineering from the Politecnico di Milano, in 1981, and the Ph.D. degree in computer science and computer engineering from the University of Napoli Federico II, in 1987. He is currently a Full Professor in computer science and computer engineering with the University Campus Bio-Medico di Roma, where he is responsible for the Research Group on Computer Systems and Bioinformatics. His current research interests include data, signal, and image processing for biomedical and biological applications, high-performance computing, and the design and analysis of parallel algorithms. He has published over 150 journal and conference papers in these and related areas. He serves on the program committees of several international conferences and as a referee for international journals. He is a member of the ACM, CVPL, and SIBIM.

SARA MOCCIA was born in Bari, in September 1990. She received the B.Sc. degree in 2012, the M.Sc. degree (cum laude) in biomedical engineering from the Politecnico di Milano, Milan, Italy, in December 2014, and the joint Ph.D. (European) degree (cum laude) in bioengineering from the Istituto Italiano di Tecnologia, Genoa, Italy, and the Politecnico di Milano, in May 2018, for which she was awarded by the Italian Group of Bioengineering (Gruppo Nazionale di Bioingegneria). During her Ph.D., she was hosted by the Computer-Assisted Medical Interventions Laboratory, German Cancer Research Center, Heidelberg, Germany. From May 2018 to January 2021, she was a Postdoctoral Researcher with Università Politecnica delle Marche, Ancona, Italy, and an Affiliated Researcher with the Istituto Italiano di Tecnologia. In 2021, she was also a Visiting Researcher with the University of Minho, Braga, Portugal. She is currently an Assistant Professor at Scuola Superiore Sant'Anna, Pisa, Italy, and an Adjunct Professor with Università Politecnica delle Marche.

GENNARO VESSIO (Member, IEEE) received the M.Sc. degree (Hons.) in computer science, in 2013, and the Ph.D. degree in computer science and mathematics from the Department of Computer Science, University of Bari, Italy, in 2017. He is currently an Assistant Professor with the University of Bari. His current research interests include pattern recognition, machine and deep learning, computer vision, and their application to several domains, including e-health, drone vision, and digital humanities. He is a member of the Editorial Boards of the International Journal of Intelligent Systems and of Computational Intelligence and Neuroscience. He is involved in the organization of scientific events, the most recent of which was the Second International Workshop on Fine Art Pattern Extraction and Recognition within ICIAP. He regularly serves as a reviewer for many international journals published by high-level publishers, including Elsevier, IEEE, and Springer, and as a member of the program committees of many international conferences. He is regularly involved in the teaching activities of his department and has supervised dozens of graduate students in computer science. He is currently a member of the IEEE Computer Society, the IEEE Computational Intelligence Society, the INdAM-GNCS Society, the IAPR Technical Committee 19 (Computer Vision for Cultural Heritage Applications), the AIxIA association, the CINI-AIIS Laboratory, the CITEL Telemedicine Research Center, the CEDITH Digital Heritage Research Center, GRIN, MIR Laboratories, and the REPRISE Register of Experts.

63618 VOLUME 10, 2022
