MLOps: A Taxonomy and a Methodology
ABSTRACT Over the past few decades, the substantial growth in enterprise-data availability and the
advancements in Artificial Intelligence (AI) have allowed companies to solve real-world problems using
Machine Learning (ML). ML Operations (MLOps) represents an effective strategy for bringing ML models
from academic resources to useful tools for solving problems in the corporate world. The current literature
on MLOps is still mostly disconnected and sporadic. In this work, we review the existing scientific literature
and propose a taxonomy for clustering research papers on MLOps. In addition, we present methodologies
and operations aimed at defining an ML pipeline to simplify the release of ML applications in the industry.
The pipeline is based on ten steps: business problem understanding, data acquisition, ML methodology,
ML training & testing, continuous integration, continuous delivery, continuous training, continuous moni-
toring, explainability, and sustainability. The scientific and business interest and the impact of MLOps have
grown significantly over the past years: the definition of a clear and standardized methodology for conducting
MLOps projects is the main contribution of this paper.
INDEX TERMS MLOps, continuous monitoring, continuous integration, continuous delivery, continuous
training, XAI, sustainability.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
63606 VOLUME 10, 2022
M. Testi et al.: MLOps: A Taxonomy and a Methodology
pipelines [31], established practices [32], and auxiliary frameworks and tools. Indeed, model development is only a small part of the overall process, and many other processes, configurations, and tools need to be integrated into the system [33]. Bringing DevOps techniques into the context of continuous training (CT), CD, CI, and continuous monitoring (CM) is among the main requirements of an ML project that aims to provide process automation, governance, and agility.

In the literature, several projects have tried to tackle various aspects of the ML production process by expanding existing libraries or by creating new tools to enhance the quality and performance of specific processes or to make them more insightful. Up to now, there is no standardized and common pipeline to follow for an end-to-end MLOps project. To cluster the different approaches, we propose the following taxonomy:
1) ML-based software systems, also known as model-centric frameworks. These systems focus on the architecture of ML models with a view to CI/CD [23], [34], [35]. The goal of such systems is twofold: on the one hand, to create and automate ML pipelines; on the other hand, to increase the level of automation in the ML software life-cycle [36].
2) ML use case applications where, for example, papers explain an MLOps workflow to foster collaboration and negotiation between surgeon and patient [37], [38], or an ML pipeline on the Cloud for drug discovery [39].
3) ML automation frameworks such as MLFlow [40], Kedro [41] or Amazon SageMaker [42], and benchmarking frameworks such as MLPerf [43], MLModelScope [44] and Deep500 [45]. These are commercial tools that are already used in daily work practice and represent excellent ML framework automation solutions.
The following subsections review in more detail the works that fall into the three categories.

A. ML-BASED SOFTWARE SYSTEMS
Machine Learning is becoming the primary approach to solving real-world problems. Therefore, many data science teams are studying how to apply DevOps principles to industry. The ML life-cycle involves manual steps for deploying the ML pipeline model. This method can produce unexpected results due to the dependency on data, preprocessing, model training, validation, and testing. The idea is to design an automated pipeline using two DevOps principles, CI and CD. The function of CI is to test and validate data, data schemas, and models; CD concerns an ML pipeline that should automatically deploy another ML service [23]. The ML life-cycle has different methodologies to fit different scenarios and data types. The approach most used by data mining experts is the CRoss-Industry Standard Process for Data Mining (CRISP-DM) [46], introduced in 1996 by Daimler Chrysler. Experts can borrow the standard CRISP-DM methodologies and try to apply them to the MLOps pipeline. The process typically involves two teams: ML scientists, responsible for model training and testing, and ML engineers, responsible for production and deployment. MLOps pipeline automation with CI/CD routines is as follows:
• Business problem analysis;
• Dataset features and storage;
• ML analytical methodology;
• Pipeline CI components;
• Pipeline CD components;
• Automated ML triggering;
• Model registry storage;
• Monitoring and performance;
• Production ML service.

One of the points of greatest attention after CI and CD is monitoring, in terms of metrics and Key Performance Indicators (KPIs), and the continuous deployment of models. This part includes model performance, data monitoring, outlier detection, and explanations of historical predictions. Continuous monitoring is a process that makes it possible to understand, in real time, when validation performance tends to decrease. Outlier detection is key to trusting the model and keeping it healthy. Therefore, the most important function of continuous monitoring is to ensure high model performance and the KPIs used to validate models. There are many metrics to test the quality of a model, such as precision, recall, F1, and MSE. However, these metrics evaluate a model in the laboratory, regardless of the real-world context in which the model will be used. When evaluating ML models in the context of real applications, model performance metrics are not enough to establish the robustness of the models. The most basic step towards supporting such KPI-based analytics is to ensure that KPIs and model metrics are stored with a common correlation ID to identify which model operations contributed to transactions with a particular KPI score [36]. Other important KPIs at the company level for evaluating the performance of the model can be: time-to-market, infrastructure cost, scalability, and profitability indices on sales (ROS) [47]. Unfortunately, ML models often fail to generalize outside the training data distribution [48].

Finally, the trust in an ML project rests on model explanation. Explainability allows users to trust the predictions, and this improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [35]. The terms ‘‘explainability’’ and ‘‘interpretability’’ are used interchangeably throughout the literature; however, in the case of an AI-based system, explainability is more than interpretability in terms of importance, completeness, and fidelity of predictions or classifications [49]. Explainable Artificial Intelligence (XAI) is a research trend that promotes explainable decision-making. Many real-world ML applications greatly increase the efficiency of industrial production through automated equipment and production processes [50]. However, the use of ‘‘black-boxes’’ has not yet been overcome due to the lack of frameworks [40]. ML projects have created new challenges

TABLE 1. Some popular data labeling tools.
TABLE 3. Some popular data versioning tools.
TABLE 5. Some popular feature engineering tools.
TABLE 6. Pros and cons of some popular feature engineering tools.
TABLE 9. Some popular hyperparameter tuning tools.
TABLE 10. Pros and cons of some popular hyperparameter tuning tools.
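The correlation-ID mechanism discussed above — storing model metrics and business KPIs under a shared identifier so that a KPI score can be traced back to the model operations that contributed to it [36] — can be sketched in a few lines of Python. The record structure and names below are illustrative assumptions, not an API from the cited works.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class MetricStore:
    """Toy in-memory store linking model metrics and business KPIs
    through a shared correlation ID (illustrative only)."""
    records: dict = field(default_factory=dict)

    def log_model_metrics(self, corr_id: str, **metrics) -> None:
        self.records.setdefault(corr_id, {}).update(metrics)

    def log_kpi(self, corr_id: str, name: str, value: float) -> None:
        self.records.setdefault(corr_id, {})[name] = value

    def explain_kpi(self, corr_id: str) -> dict:
        # Retrieve every metric stored under the same correlation ID,
        # so a KPI can be inspected side by side with the model
        # metrics of the transactions that produced it.
        return self.records.get(corr_id, {})

store = MetricStore()
corr_id = str(uuid.uuid4())          # one ID per transaction/batch
store.log_model_metrics(corr_id, precision=0.91, recall=0.87, f1=0.89)
store.log_kpi(corr_id, "time_to_market_days", 14.0)
record = store.explain_kpi(corr_id)  # metrics and KPI under one key
```

In a real deployment the store would be a database or metrics backend rather than a dictionary; the essential point is only that laboratory metrics and company-level KPIs share the correlation key.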
data drift and anomalies over time and allow alerts to be set in case of performance issues. An overview of some popular data monitoring tools is provided in Table 13 and in Table 14, with advantages and disadvantages.
There are also tools that cover the end-to-end ML life-cycle [66]. Some popular platforms are shown in Table 15 and in Table 16, with advantages and disadvantages.
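As a minimal illustration of the drift checks such monitoring tools automate, a Population Stability Index (PSI) can compare a feature's training distribution with live data and raise an alert. The routine, the quantile binning, and the 0.2 threshold (a common rule of thumb) are illustrative assumptions, not code from any of the tools listed.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training)
    sample and a live sample, using quantile bins of the reference."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])  # keep live values in range
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)           # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

def drift_alert(train_feature, live_feature, threshold=0.2):
    # A PSI above roughly 0.2 is often treated as significant drift.
    return psi(train_feature, live_feature) > threshold

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.5, 1.0, 5000)   # mean shift simulating data drift
```

A production monitor would run such a check per feature on a schedule and route the alert to the on-call team; the statistic itself is the simple part.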
TABLE 12. Pros and cons of some popular model deployment tools.
TABLE 14. Pros and cons of some popular model monitoring tools.
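The notion of attributing a prediction to dominant input factors, discussed in this section, can be made concrete with a model-agnostic permutation-importance sketch: shuffle one feature at a time and measure how much the error degrades. The linear toy model, its weights, and the routine are illustrative assumptions, not code from the cited frameworks.

```python
import numpy as np

# Toy "model": a linear scorer whose true drivers are known,
# so the attribution result can be checked by eye.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
weights = np.array([2.0, 0.0, 0.5])  # feature 0 dominates, feature 1 is noise
y = X @ weights

def model_predict(X):
    return X @ weights

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Model-agnostic attribution: shuffle one feature at a time and
    measure the increase in mean squared error."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature/target link
            importances[j] += np.mean((predict(Xp) - y) ** 2) - base_mse
        importances[j] /= n_repeats
    return importances

imp = permutation_importance(model_predict, X, y)
# Feature 0 should receive by far the largest attribution.
```

This is the same user-facing idea described in the text: verifying which factors contributed to certain predictions, independently of the model's internal logic.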
specific object is part of an image (output) could be certain dominant patterns in the image (input). The more explainable a model is, the greater the understanding practitioners gain of the internal business procedures that take place while the model is making decisions. An explainable model does not necessarily translate into one that humans can understand (internal logic or underlying processes) [90]. The explainability of the model allows the user to build trust in the predictions made by the deployed system and improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [15].

J. SUSTAINABILITY: CARBON FOOTPRINT
The increasingly common use of Deep Learning models in real-world projects has, on the other side of the coin, corresponded to an immense growth in the computation and energy required [91]. If this growing trend continues, Deep Learning could become a significant contributor to climate change. This trend can be mitigated by exploring how to improve energy efficiency in DL models [92]. Hence, data scientists need to know their energy and carbon footprint, so that they can actively take steps to reduce it whenever possible. The carbon footprint is a measure of the total exclusive amount of carbon dioxide emissions that are directly and indirectly caused by an activity or accumulated over the life stages of a product [93].

Strubell et al. focused on carbon footprint analysis of AI models for natural language processing [94]. For example, the emissions from training an NLP Transformer model were estimated to be equivalent to those of a commercial flight between San Francisco and New York. The publication of these estimates has had a significant effect on the scientific world. Following the publication of these data, the 2020 White Paper on AI released by the European Commission called for actions that go beyond the collection of impressive but admittedly anecdotal data about the training of selected AI systems [95]. For this reason, it is necessary to calculate the carbon footprint of each individual AI system and of the AI sector as a whole [96].

It is important to emphasize that, during the MLOps life-cycle, the carbon footprint should be taken into account when choosing models. It is better to take a bottom-up approach, trying simple models first without jumping to state-of-the-art, complex, and expensive models. The same approach applies to calculating the carbon footprint not only during training and testing, but also during continuous integration, continuous delivery, and continuous training.

V. CONCLUSION
In this paper, we have provided an overview of approaches in the literature using MLOps: we have provided a taxonomy of the current literature and proposed a methodology for addressing MLOps projects. The application of DevOps principles to ML and the use of MLOps in the industrial environment are still little discussed topics at the academic level. The current literature is mostly disconnected and sporadic. This paper is intended as a literature review to systematize and add clarity to the definition and methods of MLOps. The paper aims to define a high-level strategy for dealing with MLOps projects; the goal of future work is to apply our proposed methodology to use cases such as biomedical imaging and finance. Experimental work will be required to test the pipeline defined in this manuscript.

Traditionally, data preparation, model training and testing, and performance comparison are the key points of pipelines. In this work, we have stressed the importance of many other, no less important aspects, such as continuous monitoring and sustainability issues. Following well-defined guidelines is the only way to allow the traceability and reproducibility of the results obtained in an Open Science context. For this reason, it is crucial to use systematic procedures for greater cohesion in the scientific community and to follow clear and clean pipelines in MLOps. The remaining challenge for the community is to apply an ML methodology to an end-to-end use case, going through each point of this methodology and showing what happens if some phases are omitted. Specific areas, such as biomedicine, finance, cyber-security, and manufacturing [97], can greatly benefit from adopting MLOps, and we believe the pipeline defined in this paper can bring advantages over traditional practices.

According to Fortune Business Insights, the global Machine Learning market is expected to grow from $15.50 billion in 2021 to $152.24 billion in 2028, with a compound annual growth rate of 38.6% over the forecast period. MLOps aims to create long-term ML solutions, reducing maintenance costs and monitoring and optimizing workflows. Understanding and intercepting new challenges and trends such as the emerging MLOps will provide a strong competitive advantage to companies adopting this solution [98].

ABBREVIATION TERMS
ML Machine Learning.
MLOps Machine Learning Operations.
AI Artificial Intelligence.
XAI eXplainable AI.
STEM Science, Technology, Engineering and Mathematics.
DevOps Development Operations.
DL Deep Learning.
ROI Return on Investments.
CI Continuous Integration.
CD Continuous Delivery.
CT Continuous Training.
CRISP-DM CRoss-Industry Standard Process for Data Mining.
KPI Key Performance Indicator.
MSE Mean Squared Error.
ROS Return on Sales.
REST REpresentational State Transfer.
OKR Objective and Key Result.
API Application Programming Interface.
NLP Natural Language Processing.
[48] D. Hendrycks and T. Dietterich, ‘‘Benchmarking neural network robustness to common corruptions and perturbations,’’ 2019, arXiv:1903.12261.
[49] W. Samek, T. Wiegand, and K.-R. Müller, ‘‘Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models,’’ 2017, arXiv:1708.08296.
[50] Z. Wang, Y. Lai, Z. Liu, and J. Liu, ‘‘Explaining the attributes of a deep learning based intrusion detection system for industrial control networks,’’ Sensors, vol. 20, no. 14, p. 3817, Jul. 2020.
[51] S. Rabiul Islam, W. Eberle, S. Khaled Ghafoor, and M. Ahmed, ‘‘Explainable artificial intelligence approaches: A survey,’’ 2021, arXiv:2101.09429.
[52] B. Fitzgerald and K.-J. Stol, ‘‘Continuous software engineering and beyond: Trends and challenges,’’ in Proc. 1st Int. Workshop Rapid Continuous Softw. Eng. (RCoSE), 2014, pp. 1–9.
[53] G. Symeonidis, E. Nerantzis, A. Kazakis, and G. A. Papakostas, ‘‘MLOps–definitions, tools and challenges,’’ 2022, arXiv:2201.00162.
[54] N. Hewage and D. Meedeniya, ‘‘Machine learning operations: A survey on MLOps tool support,’’ 2022, arXiv:2202.10169.
[55] AI Usecases & Tools to Grow Your Business, Jan. 2022.
[56] M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio, R. M. Summers, D. L. Rubin, and M. P. Lungren, ‘‘Preparing medical imaging data for machine learning,’’ Radiology, vol. 295, no. 1, pp. 4–15, Apr. 2020.
[57] T. Kulesza, S. Amershi, R. Caruana, D. Fisher, and D. Charles, ‘‘Structured labeling for facilitating concept evolution in machine learning,’’ in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., Apr. 2014, pp. 3075–3084.
[58] T. van der Weide, D. Papadopoulos, O. Smirnov, M. Zielinski, and T. van Kasteren, ‘‘Versioning for end-to-end machine learning pipelines,’’ in Proc. 1st Workshop Data Manage. End-End Mach. Learn., May 2017, pp. 1–9.
[59] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Newton, MA, USA: O’Reilly Media, 2018.
[60] A. W. Long, J. Zhang, S. Granick, and A. L. Ferguson, ‘‘Machine learning assembly landscapes from particle tracking data,’’ Soft Matter, vol. 11, no. 41, pp. 8141–8153, 2015.
[61] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, ‘‘Optuna: A next-generation hyperparameter optimization framework,’’ in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 2623–2631.
[62] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley, ‘‘Google vizier: A service for black-box optimization,’’ in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2017, pp. 1487–1495.
[63] L. Baier, F. Jöhren, and S. Seebacher, ‘‘Challenges in the deployment and operation of machine learning in practice,’’ in Proc. ECIS, 2019, pp. 1–15.
[64] O. Boursalie, R. Samavi, and T. E. Doyle, ‘‘M4CVD: Mobile machine learning model for monitoring cardiovascular disease,’’ Proc. Comput. Sci., vol. 63, no. 2, pp. 384–391, 2015.
[65] M. Syafrudin, G. Alfian, N. Fitriyani, and J. Rhee, ‘‘Performance analysis of IoT-based sensor, big data processing, and machine learning model for real-time monitoring system in automotive manufacturing,’’ Sensors, vol. 18, no. 9, p. 2946, Sep. 2018.
[66] R. Ashmore, R. Calinescu, and C. Paterson, ‘‘Assuring the machine learning lifecycle: Desiderata, methods, and challenges,’’ ACM Comput. Surveys, vol. 54, no. 5, pp. 1–39, Jun. 2022.
[67] A. F. V. Maya, ‘‘The state of MLOps,’’ Tech. Rep., 2021.
[68] M. Badawy, A. A. A. El-Aziz, A. M. Idress, H. Hefny, and S. Hossam, ‘‘A survey on exploring key performance indicators,’’ Future Comput. Informat. J., vol. 1, nos. 1–2, pp. 47–52, Dec. 2016.
[69] S. Agrawal and A. Mittal, ‘‘MLOps: 5 steps to operationalize machine-learning models—AI4,’’ Tech. Rep.
[70] Y. Li, X. Yu, and N. Koudas, ‘‘Data acquisition for improving machine learning models,’’ 2021, arXiv:2105.14107.
[71] P. Ruf, M. Madan, C. Reich, and D. Ould-Abdeslam, ‘‘Demystifying MLOps and presenting a recipe for the selection of open-source tools,’’ Appl. Sci., vol. 11, no. 19, p. 8861, Sep. 2021.
[72] M. Treveil, Introducing MLOps: How to Scale Machine Learning in the Enterprise. Newton, MA, USA: O’Reilly, 2020.
[73] J. Bosch, ‘‘Continuous software engineering: An introduction,’’ in Continuous Software Engineering. Cham, Switzerland: Springer, 2014, pp. 3–13.
[74] M. Leppänen, S. Mäkinen, M. Pagels, V. P. Eloranta, J. Itkonen, M. V. Mäntylä, and T. Männistö, ‘‘The highways and country roads to continuous deployment,’’ IEEE Softw., vol. 32, no. 2, pp. 64–72, Mar. 2015.
[75] L. E. Lwakatare, I. Crnkovic, E. Rånge, and J. Bosch, ‘‘From a data science driven process to a continuous delivery process for machine learning systems,’’ in Proc. Int. Conf. Product-Focused Softw. Process Improvement. Cham, Switzerland: Springer, 2020, pp. 185–201.
[76] J. F. Smart, Jenkins: The Definitive Guide: Continuous Integration for the Masses. Newton, MA, USA: O’Reilly Media, 2011.
[77] S. Chacon and B. Straub, Pro Git. New York, NY, USA: Apress, 2014.
[78] D. Merkel, ‘‘Docker: Lightweight Linux containers for consistent development and deployment,’’ Linux J., vol. 2014, no. 239, p. 2, 2014.
[79] PM. Helm, 2018.
[80] (2017). Kubernetes Manual. Accessed: Dec. 4, 2017. [Online]. Available: https://kubernetes.io/
[81] I. Weber, S. Nepal, and L. Zhu, ‘‘Developing dependable and secure cloud applications,’’ IEEE Internet Comput., vol. 20, no. 3, pp. 74–79, May 2016.
[82] P. Webteam, ‘‘Resources: Puppet,’’ Tech. Rep.
[83] L. Chen, ‘‘Continuous delivery: Huge benefits, but challenges too,’’ IEEE Softw., vol. 32, no. 2, pp. 50–54, Mar. 2015.
[84] B. Liu, ‘‘Lifelong machine learning: A paradigm for continuous learning,’’ Frontiers Comput. Sci., vol. 11, no. 3, pp. 359–361, 2017.
[85] A. Komolafe, ‘‘Retraining model during deployment: Continuous training and continuous testing,’’ Tech. Rep., Dec. 2021.
[86] J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, ‘‘A unifying view on dataset shift in classification,’’ Pattern Recognit., vol. 45, no. 1, pp. 521–530, 2012.
[87] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[88] T. Miller, ‘‘Explanation in artificial intelligence: Insights from the social sciences,’’ Artif. Intell., vol. 267, pp. 1–38, Feb. 2019.
[89] A. Adadi and M. Berrada, ‘‘Peeking inside the black-box: A survey on explainable artificial intelligence (XAI),’’ IEEE Access, vol. 6, pp. 52138–52160, 2018.
[90] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, ‘‘Explainable AI: A review of machine learning interpretability methods,’’ Entropy, vol. 23, no. 1, p. 18, Dec. 2020.
[91] L. F. W. Anthony, B. Kanding, and R. Selvan, ‘‘Carbontracker: Tracking and predicting the carbon footprint of training deep learning models,’’ 2020, arXiv:2007.03051.
[92] M. Płoszaj-Mazurek, E. Ryńska, and M. Grochulska-Salak, ‘‘Methods to optimize carbon footprint of buildings in regenerative architectural design with the use of machine learning, convolutional neural network, and parametric design,’’ Energies, vol. 13, no. 20, p. 5289, Oct. 2020.
[93] T. Wiedmann and J. Minx, ‘‘A definition of ‘carbon footprint,’’’ Ecol. Econ. Res. Trends, vol. 1, pp. 1–11, Mar. 2008.
[94] E. Strubell, A. Ganesh, and A. McCallum, ‘‘Energy and policy considerations for deep learning in NLP,’’ 2019, arXiv:1906.02243.
[95] G. Tamburrini, ‘‘The AI carbon footprint and responsibilities of AI scientists,’’ Philosophies, vol. 7, no. 1, p. 4, Jan. 2022.
[96] European Commission, ‘‘White paper on artificial intelligence—European Commission,’’ Tech. Rep.
[97] A. Shafiei, M. Jamshidi, F. Khani, J. Talla, Z. Peroutka, R. Gantassi, M. Baz, O. Cheikhrouhou, and H. Hamam, ‘‘A hybrid technique based on a genetic algorithm for fuzzy multiobjective problems in 5G, Internet of Things, and mobile edge computing,’’ Math. Problems Eng., vol. 2021, pp. 1–14, Oct. 2021.
[98] M. B. Jamshidi, N. Alibeigi, N. Rabbani, B. Oryani, and A. Lalbakhsh, ‘‘Artificial neural networks: A powerful tool for cognitive science,’’ in Proc. IEEE 9th Annu. Inf. Technol., Electron. Mobile Commun. Conf. (IEMCON), Nov. 2018, pp. 674–679.

MATTEO TESTI is currently pursuing the Ph.D. degree in computer engineering with the Medical Statistic and Molecular Epidemiology Unit, University of Biomedical Campus, Rome, Italy. He is also an Entrepreneur with a strong background in data science with a focus on deep learning. He founded DeepLearningItalia, the biggest e-learning platform in the artificial intelligence area in the Italian language. He was one of the technical writers for the Artificial Intelligence Italian white paper. Since 2019, he has been an Adjunct Professor with the University of Rome Tor Vergata, Rome.
MATTEO BALLABIO was born in Carate Brianza, Monza and Brianza, in January 1999. He received the B.Sc. degree in biomedical engineering from Bergamo University, Italy, in December 2021, where he is currently pursuing the M.Sc. degree in management engineering. He has been a Collaborator at DeepLearningItalia, since September 2021. He is the Contributor of the dataset ‘‘DBB Distorted Brain Benchmark’’ (BrainLife.io) used in the publication ‘‘Automatic Tissue Segmentation with Deep Learning in Patients with Congenital or Acquired Distortion of Brain Anatomy,’’ published by Springer. His B.Sc. thesis was entitled ‘‘Creation of an open dataset for the evaluation of the segmentation of MRI images in the case of patients with severe distortions of brain anatomy,’’ in collaboration with the Research Center Fondazione Bruno Kessler (FBK), Trento, Italy.

PAOLO SODA (Member, IEEE) is currently a Full Professor in computer science and computer engineering with the University Campus Bio-Medico di Roma, and he is also a Visiting Professor in biomedical engineering and AI with the Department of Radiation Sciences, Umeå University, Sweden. He is also a Vice-Coordinator of the Health and Life Sciences specialization area of the National Ph.D. program in AI. His research interests include AI, machine learning, and big data analytics, with applications to data, signals, 2D and 3D image, and video processing and analysis. He was the Team Leader of the Research Groups that won two international competitions: ‘‘COVID CXR Hackathon’’ (2022 Dubai Expo) and ‘‘All against COVID-19: Screening X-ray Images for COVID-19 Infection’’ (IEEE 2021). He is a member of CVPL and SIBIM, and chairs the IEEE International Technical Committee for Computational Life Sciences.