Market Guide For DSML Engineering Platforms - Gartner 2022

The document discusses data science and machine learning engineering platforms and how they promote code-first development to build and support ML models. It outlines key findings on challenges in moving ML models from prototype to production and gaps in software engineering practices. It also discusses how DSML engineering platforms address these issues and combine open-source ML frameworks with proprietary controls. The market is fragmented with established vendors adding functionality and smaller vendors focusing on specific tasks. Data and analytics leaders are recommended to select platforms by identifying gaps and including requirements like flexibility, openness and composability.

Market Guide for DSML Engineering Platforms
Published 2 May 2022 - ID G00763493 - 16 min read

By Afraz Jaffri, Erick Brethenoux, and 7 more

Data science and machine learning engineering platforms promote
code-first development to build and support ML models used in
critical business applications. Data and analytics leaders scaling
their DSML initiatives must reassess their engineering requirements
and vendor selection criteria.

Overview
Key Findings
• Creating an efficient and maintainable path from prototype to production for
artificial intelligence (AI) and machine learning (ML) models continues to be a
major challenge, fueled by the increasing complexity of data and
infrastructure.

• Data science teams in many organizations have not sufficiently adopted
and strengthened software engineering and DevOps practices for AI and ML
model development and operationalization, leaving significant gaps in model
life cycle execution and management.

• Data science and machine learning (DSML) engineering platforms combine
the flexibility and utility of open-source ML frameworks with proprietary
controls and procedures that support repeatable and reliable delivery of
models into production.

• The DSML engineering platform market is fragmented, with established
vendors adding functionality across their platforms and smaller, specialist
vendors focusing on frameworks for abstracting common DSML tasks such as
experiment tracking and pipeline development.

Recommendations
Data and analytics leaders responsible for delivering data science solutions and the
necessary technology to move prototypes into production should:

• Select a DSML engineering platform by identifying gaps in current model
development practices, paying specific attention to capabilities around model
deployment, management, governance and security. Fill any remaining gaps
with specialist tools.

• Include flexibility, openness and composability as core requirements for
DSML platform selection by assessing API completeness, modularity of
platform architecture and metadata access.

• Match the skill sets of data science and ML engineering teams to DSML
engineering platform capabilities by assessing the level of collaboration and
abstraction provided across the model development and management life
cycle.

• Select a DSML engineering platform only when you have a pipeline of use
cases and a strategic business need for the delivery of machine
learning products with quantifiable scalability and performance requirements.

Market Definition
This document was revised on 3 May 2022. The document you are viewing is the
corrected version. For more information, see the Corrections page on gartner.com.

DSML engineering platforms consist of a core product and supporting portfolio of
integrated products, components, libraries and frameworks (including proprietary,
partner-sourced and open-source) for the development and operations of machine
learning solutions integrated with typically complex, innovative and highly scalable
applications. These solutions are engineered by personas who have deep technical
expertise in data science and machine learning or have other skills in digital
technology, such as data, software or system engineers. The platforms provide a
code-centric user interface, using a variety of programming languages. To boost
productivity, they also facilitate composition and automation through visual interfaces
and through open APIs.
Market Description
DSML engineering platforms have the primary purpose of developing ML models
that can drive critical business systems such as credit approval, predictive
maintenance, medical diagnosis and fraud detection. In order to do this, DSML
engineering platforms have evolved from supporting a core data science audience
with code-driven model development to now also supporting data engineering,
application development and infrastructure roles. DSML engineering platform
development has been driven by the need to enable collaboration between these
roles, including their activities and tasks, to effectively deliver ML systems. The focus
of these platforms has shifted in all areas of model development. Figure 1 illustrates
this shift by summarizing the main characteristics of a DSML engineering platform,
and Table 1 identifies 11 key aspects of platform functionality.

Figure 1: DSML Engineering Platforms

Table 1: Key Aspects of Platform Functionality


Capability | DSML Engineering Platform Focus
Data access and preparation | Data access is provided for streaming data and unstructured data, typically achieved through prebuilt connectors provided with the platform. Data-centric AI is supported through data labeling and synthetic data generation. Metadata generated throughout the development life cycle is stored and can be accessed programmatically. Feature stores are also provided.
Data exploration and visualization | Support for notebooks is the de facto way for exploring and visualizing data. There are also platform-specific functions for a variety of exploratory statistics and geolocation and graph analytics. Integrations are provided for visualizations in external analytics platforms.
User interface modalities | Primarily code-based UI, which includes notebooks and IDEs for the majority of DSML tasks, although visual-based drag-and-drop environments are available for pipeline building and process design. Graphical user interfaces are generally used for administration and monitoring.
Collaboration | Collaboration is supported across all stages in the end-to-end pipeline, especially between development and operationalization. Increasingly, support is provided for implementing custom approval workflows and integration with marketplaces and workplace applications.
Model development | Engineering platforms facilitate the development of novel ML and deep learning techniques, such as graph neural networks, transfer learning, federated learning, ensembles and reinforcement learning, by supporting the most recent open-source frameworks or proprietary implementation.
Advanced analytics | Support for different types of analyses on geospatial, audio, images, video and text. Some platforms also provide composite AI capabilities to combine ML with other techniques such as simulation, rules or optimization.
Infrastructure | Infrastructure can be provisioned, scaled, monitored and spun down as required for data engineering, model training, testing and deployment. Support is provided for mixing and matching workloads across on-premises, hybrid, multicloud and edge environments.
Performance and scalability | Performance is primarily supported by utilizing multinode server clustering, distributed autoscaling, model compression and optionally distributed training. Support for training models with specialist hardware such as GPUs is a standard feature. Scalability is supported by orchestration of workloads to optimize compute resources.
Precanned solutions | Accelerate expert data science development by providing template code specialized for different use cases by industry or domain, and provide accessibility to newly developed model architectures produced by third parties.
Operationalization | DSML engineering platforms support the abstracting of components in the model development process into pipelines. These components may include model catalogs, model validation, performance monitoring, containerization and integration with XOps (DataOps, ModelOps, MLOps and DevOps) processes. Models can be deployed in batch and real-time modes for inferencing and retraining on different endpoints.
Governance and responsible AI | Common components found in engineering platforms include model auditing and policy compliance, risk management and security profiling, privacy protection, bias mitigation, and fairness metrics through a variety of explainable AI techniques.

Source: Gartner (May 2022)

Market Direction
The AI and data science platform market is due to grow to over $10 billion by 2025 at
a 21.6% compound annual growth rate [1]. This growth in the market mirrors the
investments made by organizations in data science and ML initiatives, which are
largely turning from strategy to execution. The DSML engineering market is
representative of this shift in dynamics between business need and technical
implementation.

Buyers of these platforms have typically had success in building and deploying
DSML solutions in pockets within their enterprise and are now looking to formalize
DSML development practices, platforms and architectures to provide sustainable
growth in the use of DSML enterprisewide. DSML engineering platforms will continue
to focus on enterprisewide deployments, which are managed by centralized teams,
often within IT, but also give visibility to lines of business (LOBs) for decision making.
The capabilities that will drive the development of these platforms and have the most
impact for these users are:

• Data access across hybrid and multicloud data sources with provisioning for
on-demand scalable compute for data engineering and model training.

• Experiment tracking and model management concentrating on increasing the
number of metadata points stored that will power catalogs for easy search
and reuse with some augmentation.

• Applying CI/CD practices to model development with a focus on unit
testing and validation of code and model behavior before entering production.

• The merger of model monitoring functionality with AIOps functionality to
include augmented monitoring of models, data and infrastructure, as part of
observability with highlighting of root causes and problem resolution
(see Market Guide for AIOps Platforms).

• Openness and flexibility of platforms via software development kits (SDKs)
and APIs to build a complete DSML stack through integration with other
DSML engineering platforms, data catalogs, DevOps and MLOps tools, and
collaboration platforms. It also enables organizations to build their own
services and integrate with other API-driven services.
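
As an illustration of the CI/CD point above (validating model behavior before it enters production), the following is a minimal sketch of a promotion gate that a pipeline could run as a test step. It is not taken from any vendor's product. The function name `validate_candidate`, the toy model, the holdout examples and the 0.9 accuracy threshold are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Tuple


@dataclass
class ValidationReport:
    accuracy: float
    failures: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The candidate passes only if no check recorded a failure.
        return not self.failures


def validate_candidate(
    predict: Callable[[Sequence[float]], int],
    examples: Sequence[Tuple[Sequence[float], int]],
    min_accuracy: float,
    allowed_labels: Set[int],
) -> ValidationReport:
    """Run behavioral checks on a candidate model before promotion."""
    failures: List[str] = []
    correct = 0
    for features, expected in examples:
        label = predict(features)
        if label not in allowed_labels:
            # Behavioral check: the model must only emit known labels.
            failures.append(f"unexpected label {label!r} for input {features!r}")
        elif label == expected:
            correct += 1
    accuracy = correct / len(examples)
    if accuracy < min_accuracy:
        failures.append(f"accuracy {accuracy:.2f} below threshold {min_accuracy:.2f}")
    return ValidationReport(accuracy=accuracy, failures=failures)


def toy_model(features: Sequence[float]) -> int:
    # Hypothetical stand-in for a trained classifier.
    return 1 if features[0] > 100.0 else 0


# Hypothetical labeled holdout set: (features, expected label).
HOLDOUT = [((250.0,), 1), ((20.0,), 0), ((101.0,), 1), ((99.0,), 0)]

report = validate_candidate(toy_model, HOLDOUT, min_accuracy=0.9, allowed_labels={0, 1})
```

In a CI/CD pipeline, a failing report would typically block the deployment step, in the same way a failing unit test blocks a software release.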

Recent investment by cloud service providers (CSPs) in their DSML products,
together with the commonality of data stored in the cloud and criticality of cloud
compute resources for DSML engineering, means Amazon SageMaker, Google
Vertex AI and Microsoft Azure Machine Learning will further solidify their positions as
dominant players in the market. However, other vendors will continue to flourish as
part of cloud ecosystems through tight integration and focus on different parts of the
DSML stack and specialized solutions. DataRobot’s AI cloud for industries and SAS
industry solutions are examples of this type of integration.

Market Analysis
The DSML engineering platform market is still an emerging and immature market but
has many established vendors that have been adding functionality to their DSML
platforms to ease the frustration organizations face when deploying and running
models in production. These issues include:

• Building a data pipeline that supports the usage of the model in the correct
context (low latency or high throughput, for example)

• Building a model deployment pipeline, which includes code refactoring,
automated testing, monitoring and updating, quality validation, packaging and
deploying into the correct target state (container, service, code)

• Ensuring the observability of models, including mapping model outputs to
model versions and training data and being able to identify, explain and
correct data and concept drift
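
To make the drift point above more concrete, the following is a minimal sketch of one common way to flag data drift on a single numeric feature: the population stability index (PSI), which compares the binned distribution of production values against a reference sample such as the training data. This is an illustrative technique, not one named in this research. It addresses data drift only; concept drift also requires comparing predictions against observed outcomes. The bin count, the floor value and the sample data are assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range go into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion so the logarithm below is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical samples: training-time values and shifted production values.
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that threshold is a convention rather than a standard and should be tuned for each feature.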

Not all barriers are technical in nature, and nontechnical barriers are often resolved by
improving processes and collaboration (see 4 Machine Learning Best Practices to Achieve
Project Success). This demand and gap in the market have also allowed smaller
vendors to create offerings, either as all-purpose platforms or targeted on certain
tasks in the ML model life cycle, referred to as MLOps. Gartner’s social media
analysis of MLOps and related terms shows that the topic of MLOps platforms had
the biggest share of voice in 2021, along with MLOps capabilities such as continuous
monitoring, model governance and continuous delivery, as shown in Figure 2.
Figure 2: Share of Voice for MLOps in Social Media 2021

The emergence of MLOps has fragmented the DSML market into four broad
categories:

Multipersona DSML: A number of platforms fall in the category of being both
DSML multipersona and engineering platforms. They typically provide a low-code
interface for domain experts and citizen data scientists to work on predictive
analytics tasks, but also contain a coding environment where experts can take a
human or autogenerated ML model and deploy it as a service with basic monitoring
capability. Vendors that offer such platforms are expanding their capability and
appeal to all points on the user spectrum.

DSML Engineering: These platforms are focused on serving the needs of expert
data scientists and delivery teams responsible for building and maintaining ML and
AI solutions. They provide an end-to-end platform with developer tools for managing
code, data, experiments, models, model outputs and associated pipelines, often
integrating with DevOps tools and open-source frameworks. They also provide and
manage their own compute servers, while also able to connect to external compute
resources.

MLOps: Platforms in this category are focused on easing the process of
operationalizing models by integrating with existing developer tools. They typically
provide an abstraction layer for managing and tracking experiments and model
training, deployment and monitoring. These processes are abstracted through APIs
and libraries, which data scientists can utilize as they are writing code. The functions
provided integrate to back-end services such as model registries, feature stores,
monitoring services and infrastructure usage logs.
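
The abstraction layer described for the MLOps category can be pictured as a small library that data scientists call from their training code. The sketch below is a hypothetical, in-memory illustration of that idea and does not reproduce the API of any real product. The class `ExperimentTracker`, its methods and the metric values are invented for the example; a real platform would persist runs to a back-end service such as a model registry.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Run:
    run_id: str
    params: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)


class ExperimentTracker:
    """Hypothetical in-memory tracker standing in for a platform service."""

    def __init__(self) -> None:
        self._runs: Dict[str, Run] = {}

    def start_run(self, **params: float) -> Run:
        # Each run gets a unique identifier and a record of its parameters.
        run = Run(run_id=uuid.uuid4().hex, params=dict(params))
        self._runs[run.run_id] = run
        return run

    def log_metric(self, run: Run, name: str, value: float) -> None:
        run.metrics[name] = value

    def best_run(self, metric: str) -> Run:
        # Pick the run with the highest value for the given metric.
        candidates = [r for r in self._runs.values() if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric])


tracker = ExperimentTracker()
for learning_rate in (0.1, 0.01):
    run = tracker.start_run(learning_rate=learning_rate)
    # Placeholder metric values standing in for a real evaluation step.
    tracker.log_metric(run, "accuracy", 0.8 if learning_rate == 0.1 else 0.9)

best = tracker.best_run("accuracy")
```

The point of the design is that training code only calls `start_run` and `log_metric`, while storage, search and comparison are handled behind the interface.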

Specialists: A number of tools and platforms within the MLOps category focus on a
subset of capabilities. This can include explainability, security, deployment,
monitoring and governance. More information on these capabilities and platforms
can be found in Market Guide for AI Trust, Risk and Security Management.

As the market evolves, mergers and acquisitions between vendors that offer a
full-stack engineering platform and those that are MLOps or specialist vendors
will continue. Specialists are also likely to work together to enable
interoperability and build their own ecosystem of tightly integrated tools.

Open Source Remains Central to DSML Engineering Platforms
Open-source libraries have become the standard utility used in the DSML domain.
Therefore, DSML platforms in general de-emphasize the provision of proprietary
libraries, algorithms and techniques in favor of supporting the comprehensive and
continuously advancing open-source set of frameworks. DSML engineering
platforms continue this trend by incorporating many open-source packages, with
their associated runtime environment, within the platform. A small number of
platforms also support libraries for multiple development languages including Python,
R, Go, C++, Scala and Julia.

Notebooks are a key tool in a data scientist’s toolbox for data exploration,
experimentation, collaboration and sharing and remain front and center in DSML
engineering platforms. Recent notebook innovations include real-time
collaboration, autopackaging, and deployment and auditability. Future innovations
will continue to focus on bringing notebook-based experiments into live production
settings.

As the development phase becomes increasingly commoditized by open source,
DSML engineering platforms turn their attention to developing supporting tools and
functions that make development scalable and collaborative, both between different
life cycle phases and between different teams. Many open-source frameworks have
emerged to also support these core needs of DSML teams (such as MLflow,
Metaflow and TensorFlow Extended [TFX]), yet being able to manage each one and
fit them together in a cohesive manner is out of reach for most organizations. DSML
engineering platforms either replicate and enhance the ideas from open source or
build a commercial offering on top of an open-source framework in order to abstract
away many low-level functions that require deep technical expertise to execute.

The Emerging Role of Composability, Metadata and Orchestration


The DSML development process has changed significantly since the introduction of
the Cross-Industry Standard Process for Data Mining (CRISP-DM) life cycle.
Traditionally, each step in the process was performed linearly, with a single output
from one process forming the input to the next. Now, agile development and the
need for greater speed and reproducibility have led to almost every stage of the
process being decomposed into components with associated metadata
(see Table 2).

Table 2: Metadata Generated From the DSML Engineering Life Cycle


Business Understanding | Data | ML Models | Model Outputs
User stories | ETL store | Hyperparameters | Model accuracy
Ownership | Data versioning | Experiments | Infrastructure utilization
Task management | Feature store | Code versioning | Model explainability
Business KPIs | Security and access | Lineage | Execution cycles

Source: Gartner (May 2022)


DSML engineering platforms store some or all of the above metadata types, but
crucially, many also embrace composability by giving API access to each individual
type. This trend brings dynamism to the market as organizations need not be tied to
monolithic platforms but can build their own solutions in a decentralized, composable
environment. As the proliferation continues, it is inevitable that metadata
management and cataloging will become a core part of DSML engineering platform
architecture.
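
One way to picture API access to each individual metadata type is a catalog keyed by the categories and types listed in Table 2. The sketch below is a hypothetical, in-memory illustration and does not describe how any particular platform exposes its metadata. The class `MetadataCatalog`, its method names and the example asset identifier are assumptions.

```python
from typing import Any, Dict, Tuple


class MetadataCatalog:
    """Hypothetical catalog with separate access per metadata type."""

    def __init__(self) -> None:
        # Records are grouped by (category, metadata type), as in Table 2.
        self._records: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def put(self, category: str, kind: str, asset_id: str, value: Any) -> None:
        self._records.setdefault((category, kind), {})[asset_id] = value

    def get(self, category: str, kind: str, asset_id: str) -> Any:
        return self._records[(category, kind)][asset_id]

    def by_category(self, category: str) -> Dict[str, Dict[str, Any]]:
        # Return every metadata type stored under one category.
        return {
            kind: dict(entries)
            for (cat, kind), entries in self._records.items()
            if cat == category
        }


catalog = MetadataCatalog()
# "churn-model:v3" is an invented asset identifier used only for illustration.
catalog.put("ML Models", "Hyperparameters", "churn-model:v3", {"max_depth": 6})
catalog.put("ML Models", "Code versioning", "churn-model:v3", "commit abc123")
catalog.put("Model Outputs", "Model accuracy", "churn-model:v3", 0.91)

model_metadata = catalog.by_category("ML Models")
```

Because each type is addressable on its own, an organization could in principle replace the component that produces one type of metadata without changing how the others are read, which is the composability argument made above.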

Supporting the ecosystem of platforms and tools needed to sustain high-velocity
DSML development are AI orchestration platforms (see Cool Vendors in Enterprise
AI Operationalization and Engineering). These services are a part of the DSML stack
that acts as the glue between different services participating in a modular DSML
architecture and utilizing generated metadata. This orchestration will increasingly
become augmented by DSML engineering platforms themselves and reduce the
need for manual configuration in a trend similar to the emergence of augmentation
for infrastructure monitoring (see Platform Teams and AIOps Will Redefine DevOps
Approaches by 2025).
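
The "glue" role of orchestration can be sketched as running modular steps in dependency order while passing along the metadata each step generates. The example below is a simplified illustration using Python's standard `graphlib` module (Python 3.9 or later), not a description of any orchestration product. The step names, the `run_pipeline` function and the values the steps return are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(
    steps: Dict[str, Step],
    dependencies: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, Any]]:
    """Run each step after its dependencies, sharing generated metadata."""
    metadata: Dict[str, Any] = {}
    # static_order() yields every step after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        metadata[name] = steps[name](metadata)
    return order, metadata


# Hypothetical steps; each returns the metadata it generates.
steps: Dict[str, Step] = {
    "prepare_data": lambda md: {"rows": 1000},
    "train": lambda md: {"trained_on_rows": md["prepare_data"]["rows"]},
    "deploy": lambda md: {"deployed": True, "source": "train"},
}
# Each key maps a step to the set of steps it depends on.
dependencies = {"train": {"prepare_data"}, "deploy": {"train"}}

order, metadata = run_pipeline(steps, dependencies)
```

In a modular architecture, each step would be a call to a separate service, and the shared metadata would live in a catalog rather than an in-process dictionary.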

Further details on these providers and others can be found in the Gartner
research Tool: Vendor Identification for Data Science and Machine Learning
Platforms (the Market Guide is limited to a maximum of 40 vendors [see Note 1
and Note 2]).

Representative Vendors
The vendors listed in this Market Guide do not constitute an exhaustive list. This section
is intended to provide more understanding of the market and its offerings.

Market Introduction
Table 3: Representative Vendors in DSML Engineering Platform Market
Vendor | Product(s)
4Paradigm | 4Paradigm Sage AIOS, Sage Studio, Sage HyperCycle
Activeeon | ProActive Machine Learning
Alibaba Cloud | Machine Learning Platform for AI (PAI)
Altair | Altair Data Analytics, Knowledge Studio, SmartWorks, Panopticon, WPS Analytics, HyperStudy, Monarch
Amazon Web Services (AWS) | Amazon SageMaker
C3 AI | C3 AI Application Platform, C3 AI Studio, C3 AI Ex Machina
Cloudera | Cloudera Data Platform (CDP)
cnvrg.io | cnvrg.io (self-hosted), cnvrg.io Metacloud (SaaS)
Comet | Comet Experiment Management, Comet Model Production Monitoring (MPM)
Databricks | Databricks Lakehouse Platform
Dataiku | Dataiku
DataRobot | DataRobot AI Cloud
DataVision | BeeYard
Deepnote | Deepnote
Domino Data Lab | Domino Enterprise MLOps Platform
Exponential AI | Enso
FICO | FICO Platform
ForePaaS | ForePaaS Platform
Google | Vertex AI, BigQuery
HPE | HPE Ezmeral ML Ops
IBM | IBM Watson Studio on Cloud Pak for Data, IBM SPSS
Iguazio | The Iguazio MLOps Platform
KNIME | KNIME Analytics Platform, KNIME Server
MathWorks | MATLAB
Microsoft | Azure Machine Learning
Neo4j | Neo4j Graph Data Science (includes Neo4j Graph Database, Bloom, Browser), AuraDS
Oracle | OCI Data and AI Platform, Oracle Machine Learning available in Oracle Database, Oracle Analytics Cloud
Palantir Technologies | Foundry
RapidMiner | RapidMiner Platform consists of Studio, AI Hub as a bundle
Red Hat | Red Hat OpenShift, Red Hat OpenShift Data Science
RStudio PBC | RStudio Team (Bundle of RStudio Workbench, RStudio Connect, RStudio Package Manager)
Run:AI | Run:AI Atlas Platform
SAS | SAS Visual Data Science Decisioning
Scale AI | Scale Rapid, Nucleus, Launch, Validate, Document AI, Collect, Image/Video, Mapping, Synthetic
Teradata | Vantage includes in-database analytics, Bring Your Own Model (BYOM), Open Analytics Framework and others
TIBCO Software | TIBCO Data Science, TIBCO Spotfire, TIBCO Streaming, TIBCO Model Ops
TigerGraph | TigerGraph Enterprise Graph Database, TigerGraph Graph Data Science Library, TigerGraph ML Workbench
TruEra | TruEra Diagnostics, TruEra Monitoring
Valohai | Valohai
Verta | Verta Platform

Source: Gartner (May 2022)

Market Recommendations
Data and analytics leaders must capitalize on trends and configure their strategy for
DSML engineering platforms by:

• Assessing the current state of model development practices across data, data
science, machine learning engineering and operations. Assess the DSML
engineering platforms against current process limitations, and consider
specialists for acute needs such as explainability, testing and monitoring.

• Creating upskilling initiatives for the key roles involved in DSML
engineering (see Leading Upskilling Initiatives in Data Science and Machine
Learning). Data scientists should be required to learn foundational DevOps
practices, and operational experts should be upskilled to have a better
understanding of DSML techniques.

• Identifying whether the capabilities of a given DSML engineering platform
match the requirements and ambition of the organization. The level of DSML
maturity and skills, AI product centricity and organizational value need to be
considered when deciding the type of DSML platform required.

Evidence
Approved Methodology: Gartner conducts social listening analysis leveraging third-
party data tools to complement or supplement the other fact bases presented in this
document. Due to its qualitative and organic nature, the results should not be used
separately from the rest of this research. No conclusions should be drawn from this
data alone. Social media data in reference is from 1 January 2019 through 31
December 2021 in all geographies (except China) and recognized languages.

The SMA Team: Mani Ratnam and Talmeez Fahim from the Social Media Analytics
Team contributed to this research.
[1] Forecast Analysis: Artificial Intelligence Software, Worldwide

Note 1: Representative Vendor Selection


The list of vendors is not exhaustive, and it represents vendors that Gartner has
identified under the scope of the emerging data science and machine learning
engineering platforms market.

Note 2: Gartner’s Initial Market Coverage


This Market Guide provides Gartner’s initial coverage of the market and focuses on
the market definition, rationale for the market and market dynamics.


© 2022 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of
Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form
without Gartner's prior written permission. It consists of the opinions of Gartner's research
organization, which should not be construed as statements of fact. While the information
contained in this publication has been obtained from sources believed to be reliable, Gartner
disclaims all warranties as to the accuracy, completeness or adequacy of such information.
Although Gartner research may address legal and financial issues, Gartner does not provide
legal or investment advice and its research should not be construed or used as such. Your
access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself
on its reputation for independence and objectivity. Its research is produced independently by its
research organization without input or influence from any third party. For further information, see
"Guiding Principles on Independence and Objectivity."
