SDK v1
The Azure SDK examples in articles in this section require the azureml-core package, or Python SDK v1, for Azure Machine Learning. The Python SDK v2 is now available in preview.
The v1 and v2 Python SDK packages are incompatible, and the v2 style of coding will not work for articles in this directory. However, machine learning workspaces and all underlying resources can be interacted with from either version, meaning one user can create a workspace with the SDK v1 and another can submit jobs to the same workspace with the SDK v2.
We recommend that you not install both versions of the SDK in the same environment, since doing so can cause clashes and confusion in your code.
CLI v1
The Azure CLI commands in articles in this section require the azure-cli-ml, or v1, extension for Azure Machine Learning. The enhanced v2 CLI using the ml extension is now available and recommended.
The extensions are incompatible, so v2 CLI commands will not work for articles in this directory. However,
machine learning workspaces and all underlying resources can be interacted with from either, meaning one user
can create a workspace with the v1 CLI and another can submit jobs to the same workspace with the v2 CLI.
Next steps
For more information on installing and using the different extensions, see the following articles:
azure-cli-ml - Install, set up, and use the CLI (v1)
ml - Install and set up the CLI (v2)
For more information on installing and using the different SDK versions:
azureml-core - Install the Azure Machine Learning SDK (v1) for Python
azure-ai-ml - Install the Azure Machine Learning SDK (v2) for Python
What is Azure Machine Learning?
5/25/2022 • 6 minutes to read
Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle.
Machine learning professionals, data scientists, and engineers can use it in their day-to-day workflows: Train and
deploy models, and manage MLOps.
You can create a model in Azure Machine Learning or use a model built from an open-source platform, such as
Pytorch, TensorFlow, or scikit-learn. MLOps tools help you monitor, retrain, and redeploy models.
TIP
Free trial! If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning. You get credits to spend on Azure services. After they're used up, you can keep the account and
use free Azure services. Your credit card is never charged unless you explicitly change your settings and ask to be charged.
IMPORTANT
Azure Machine Learning doesn't store or process your data outside of the region where you deploy.
Train models
In Azure Machine Learning, you can run your training script in the cloud or build a model from scratch.
Customers often bring models they've built and trained in open-source frameworks, so they can operationalize
them in the cloud.
Open and interoperable
Data scientists can use models in Azure Machine Learning that they've created in common Python frameworks,
such as:
PyTorch
TensorFlow
scikit-learn
XGBoost
LightGBM
Other languages and frameworks are supported as well, including:
R
.NET
See Open-source integration with Azure Machine Learning.
Automated featurization and algorithm selection (AutoML)
In classical machine learning, data scientists use prior experience and intuition to select the right data featurization and algorithm for training, a repetitive and time-consuming process. Automated ML (AutoML) speeds this process and can be used through the studio UI or the Python SDK.
See What is automated machine learning?
Hyperparameter optimization
Hyperparameter optimization, or hyperparameter tuning, can be a tedious task. Azure Machine Learning can
automate this task for arbitrary parameterized commands with little modification to your job definition. Results
are visualized in the studio.
See How to tune hyperparameters.
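As a rough sketch using the SDK v1 (azureml-core) that most code in this documentation set uses, a sweep wraps an existing ScriptRunConfig; the script, metric, and compute names below are placeholders:

from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.train.hyperdrive import (
    HyperDriveConfig, RandomParameterSampling, PrimaryMetricGoal, uniform)

ws = Workspace.from_config()
src = ScriptRunConfig(source_directory='./src', script='train.py',
                      compute_target='cpu-cluster')  # placeholder names
param_sampling = RandomParameterSampling({'--learning_rate': uniform(0.0001, 0.1)})
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             primary_metric_name='loss',  # must match a metric the script logs
                             primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                             max_total_runs=10)
run = Experiment(ws, 'hyperdrive-demo').submit(hd_config)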
Multinode distributed training
Training efficiency for deep learning jobs, and sometimes for classical machine learning jobs, can be drastically improved via multinode distributed training. Azure Machine Learning compute clusters offer the latest GPU options.
Supported via Azure Arc-attached Kubernetes (preview) and Azure ML compute clusters:
PyTorch
TensorFlow
MPI
The MPI distribution can be used for Horovod or custom multinode logic. Additionally, Apache Spark is
supported via Azure Synapse Analytics Spark clusters (preview).
See Distributed training with Azure Machine Learning.
Embarrassingly parallel training
Scaling a machine learning project may require scaling embarrassingly parallel model training. This pattern is
common for scenarios like forecasting demand, where a model may be trained for many stores.
Deploy models
To bring a model into production, you deploy it. Azure Machine Learning's managed endpoints abstract the required infrastructure for both batch and real-time (online) model scoring (inferencing).
Real-time and batch scoring (inferencing)
Batch scoring, or batch inferencing, involves invoking an endpoint with a reference to data. The batch endpoint
runs jobs asynchronously to process data in parallel on compute clusters and store the data for further analysis.
Real-time scoring, or online inferencing, involves invoking an endpoint with one or more model deployments and receiving a response in near-real-time via HTTPS. Traffic can be split across multiple deployments, allowing you to test new model versions by initially diverting a small amount of traffic and increasing it once confidence in the new model is established.
See:
Deploy a model with a real-time managed endpoint
Use batch endpoints for scoring
Next steps
Start using Azure Machine Learning:
Set up an Azure Machine Learning workspace
Tutorial: Build a first machine learning project
Preview: Run model training jobs with the v2 CLI
What is Azure Machine Learning studio?
5/25/2022 • 4 minutes to read
In this article, you learn about Azure Machine Learning studio, the web portal for data scientist developers in
Azure Machine Learning. The studio combines no-code and code-first experiences for an inclusive data science
platform.
In this article you learn:
How to author machine learning projects in the studio.
How to manage assets and resources in the studio.
The differences between Azure Machine Learning studio and ML Studio (classic).
We recommend that you use the most up-to-date browser that's compatible with your operating system. The
following browsers are supported:
Microsoft Edge (latest version)
Safari (latest version, Mac only)
Chrome (latest version)
Firefox (latest version)
Data labeling
Use Azure Machine Learning data labeling to efficiently coordinate image labeling or text labeling
projects.
Released in 2015, ML Studio (classic) was the first drag-and-drop machine learning model builder in Azure.
ML Studio (classic) is a standalone service that only offers a visual experience. Studio (classic) does not
interoperate with Azure Machine Learning.
Azure Machine Learning is a separate, and modernized, service that delivers a complete data science
platform. It supports both code-first and low-code experiences.
Azure Machine Learning studio is a web portal in Azure Machine Learning that contains low-code and no-
code options for project authoring and asset management.
If you're a new user, choose Azure Machine Learning instead of ML Studio (classic). As a complete ML platform, Azure Machine Learning offers:
Scalable compute clusters for large-scale training.
Enterprise security and governance.
Interoperability with popular open-source tools.
End-to-end MLOps.
Feature comparison
The following table summarizes the key differences between ML Studio (classic) and Azure Machine Learning.
Feature | ML Studio (classic) | Azure Machine Learning
Drag and drop interface | Classic experience | Updated experience - Azure Machine Learning designer
Experiment | Scalable (10-GB training data limit) | Scale with compute target
Training compute targets | Proprietary compute target, CPU support only | Wide range of customizable training compute targets; includes GPU and CPU support
Deployment compute targets | Proprietary web service format, not customizable | Wide range of customizable deployment compute targets; includes GPU and CPU support
Model format | Proprietary format, Studio (classic) only | Multiple supported formats depending on training job type
Automated model training and hyperparameter tuning | Not supported | Supported; code-first and no-code options
Role-Based Access Control (RBAC) | Only contributor and owner role | Flexible role definition and RBAC control
Troubleshooting
Missing user interface items in studio: Azure role-based access control can be used to restrict the actions that you can perform with Azure Machine Learning. These restrictions can prevent user interface items from appearing in the Azure Machine Learning studio. For example, if you're assigned a role that can't create a compute instance, the option to create a compute instance won't appear in the studio. For more information, see Manage users and roles.
Next steps
Visit the studio, or explore the different authoring options with these tutorials:
Start with Quickstart: Get started with Azure Machine Learning. Then use these resources to create your first
experiment with your preferred method:
Run a "Hello world!" Python script (part 1 of 3)
Use a Jupyter notebook to train image classification models
Use automated machine learning to train & deploy models
Use the designer to train & deploy models
Use studio in a secured virtual network
How Azure Machine Learning works: resources and
assets (v2)
5/25/2022 • 6 minutes to read
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
This article applies to the second version of the Azure Machine Learning CLI & Python SDK (v2). For version one
(v1), see How Azure Machine Learning works: Architecture and concepts (v1)
Azure Machine Learning includes several resources and assets to enable you to perform your machine learning
tasks. These resources and assets are needed to run any job.
Resources : setup or infrastructural resources needed to run a machine learning workflow. Resources
include:
Workspace
Compute
Datastore
Assets : created using Azure ML commands or as part of a training/scoring run. Assets are versioned and can
be registered in the Azure ML workspace. They include:
Model
Environment
Data
Component
This document provides a quick overview of these resources and assets.
Workspace
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with
all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all jobs,
including logs, metrics, output, and a snapshot of your scripts. The workspace stores references to resources like
datastores and compute. It also holds all assets like models, environments, components, and data assets.
Create a workspace
CLI
Python SDK
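As a minimal sketch with the SDK v2 preview (azure-ai-ml); the subscription, resource group, workspace name, and region are placeholders:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
)
ws = Workspace(name="my-workspace", location="eastus")  # placeholder name and region
ml_client.workspaces.begin_create(ws)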
Compute
A compute is a designated compute resource where you run your job or host your endpoint. Azure Machine Learning supports the following types of compute:
Compute cluster - a managed-compute infrastructure that allows you to easily create a cluster of CPU or
GPU compute nodes in the cloud.
Compute instance - a fully configured and managed development environment in the cloud. You can use
the instance as a training or inference compute for development and testing. It's similar to a virtual machine
on the cloud.
Inference cluster - used to deploy trained machine learning models to Azure Kubernetes Service. You can
create an Azure Kubernetes Service (AKS) cluster from your Azure ML workspace, or attach an existing AKS
cluster.
Attached compute - You can attach your own compute resources to your workspace and use them for
training and inference.
CLI
Python SDK
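A minimal SDK v2 sketch of creating a compute cluster; the cluster name and VM size are placeholders, and ml_client is an MLClient like the one shown under Workspace:

from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="cpu-cluster",               # placeholder name
    size="STANDARD_DS3_v2",           # placeholder VM size
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,  # seconds
)
ml_client.compute.begin_create_or_update(cluster)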
Datastore
Azure Machine Learning datastores securely keep the connection information to your data storage on Azure, so
you don't have to code it in your scripts. You can register and create a datastore to easily connect to your storage
account, and access the data in your underlying storage service. The CLI v2 and SDK v2 support the following
types of cloud-based storage services:
Azure Blob Container
Azure File Share
Azure Data Lake
Azure Data Lake Gen2
CLI
Python SDK
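A hedged SDK v2 sketch of registering an Azure Blob container as a datastore; the account and container names are placeholders:

from azure.ai.ml.entities import AzureBlobDatastore

blob_store = AzureBlobDatastore(
    name="my_blob_store",             # placeholder datastore name
    account_name="<STORAGE_ACCOUNT>",
    container_name="<CONTAINER>",
)
ml_client.datastores.create_or_update(blob_store)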
Model
Azure Machine Learning models consist of the binary file(s) that represent a machine learning model and any corresponding metadata. Models can be created from a local or remote file or directory. For remote locations, https, wasbs, and azureml locations are supported. The created model is tracked in the workspace under the specified name and version. Azure ML supports three storage formats for models:
custom_model
mlflow_model
triton_model
Creating a model
CLI
Python SDK
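A minimal SDK v2 sketch of registering a custom_model from a local file; the path and name are placeholders:

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    path="./model/model.pkl",   # local path; https, wasbs, and azureml locations also work
    name="my-model",            # placeholder name
    type=AssetTypes.CUSTOM_MODEL,
)
ml_client.models.create_or_update(model)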
Environment
Azure Machine Learning environments are an encapsulation of the environment where your machine learning
task happens. They specify the software packages, environment variables, and software settings around your
training and scoring scripts. The environments are managed and versioned entities within your Machine
Learning workspace. Environments enable reproducible, auditable, and portable machine learning workflows
across a variety of computes.
Types of environment
Azure ML supports two types of environments: curated and custom.
Curated environments are provided by Azure Machine Learning and are available in your workspace by default.
Intended to be used as is, they contain collections of Python packages and settings to help you get started with
various machine learning frameworks. These pre-created environments also allow for faster deployment time.
For a full list, see the curated environments article.
In custom environments, you're responsible for setting up your environment and installing packages or any
other dependencies that your training or scoring script needs on the compute. Azure ML lets you create your own environment using:
A Docker image
A base Docker image with a conda YAML file to customize further
A Docker build context
Create an Azure ML custom environment
CLI
Python SDK
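A rough SDK v2 sketch of the second option above, a base Docker image customized with a conda YAML file; the environment name, image, and file path are placeholders:

from azure.ai.ml.entities import Environment

env = Environment(
    name="my-custom-env",                                        # placeholder name
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",  # base Docker image
    conda_file="./conda.yml",                                    # conda YAML listing your packages
)
ml_client.environments.create_or_update(env)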
Data
Azure Machine Learning allows you to work with different types of data:
URIs (a location in local/cloud storage)
uri_folder
uri_file
Tables (a tabular data abstraction)
mltable
Primitives
string
boolean
number
For most scenarios, you'll use URIs ( uri_folder and uri_file ) - a location in storage that can be easily mapped
to the filesystem of a compute node in a job by either mounting or downloading the storage to the node.
mltable is an abstraction for tabular data that is to be used for AutoML Jobs, Parallel Jobs, and some advanced
scenarios. If you're just starting to use Azure Machine Learning and aren't using AutoML, we strongly encourage
you to begin with URIs.
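As an illustrative SDK v2 sketch, a uri_folder input can be mounted into a command job like this; the datastore path, environment, and compute names are placeholders:

from azure.ai.ml import command, Input
from azure.ai.ml.constants import AssetTypes

job = command(
    code="./src",
    command="python train.py --data ${{inputs.training_data}}",
    inputs={"training_data": Input(
        type=AssetTypes.URI_FOLDER,
        path="azureml://datastores/workspaceblobstore/paths/cifar10/",  # placeholder path
    )},
    environment="my-custom-env@latest",  # placeholder registered environment
    compute="cpu-cluster",               # placeholder compute
)
ml_client.jobs.create_or_update(job)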
Component
An Azure Machine Learning component is a self-contained piece of code that does one step in a machine
learning pipeline. Components are the building blocks of advanced machine learning pipelines. Components can
do tasks such as data processing, model training, model scoring, and so on. A component is analogous to a
function - it has a name, parameters, expects input, and returns output.
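A hedged SDK v2 sketch of a command step with the function-like shape described above; every name here is hypothetical:

from azure.ai.ml import command, Input, Output

prep_step = command(
    name="prep_data",                                      # hypothetical step name
    inputs={"raw_data": Input(type="uri_folder")},         # parameters in
    outputs={"prepared_data": Output(type="uri_folder")},  # results out
    code="./prep_src",
    command="python prep.py --raw ${{inputs.raw_data}} --out ${{outputs.prepared_data}}",
    environment="my-custom-env@latest",                    # hypothetical environment
)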
Next steps
Train models with the CLI (v2)
Train models with the Azure ML Python SDK v2 (preview)
Quickstart: Create workspace resources you need to
get started with Azure Machine Learning
5/25/2022 • 5 minutes to read
In this quickstart, you'll create a workspace and then add compute resources to the workspace. You'll then have
everything you need to get started with Azure Machine Learning.
The workspace is the top-level resource for your machine learning activities, providing a centralized place to
view and manage the artifacts you create when you use Azure Machine Learning. The compute resources
provide a pre-configured cloud-based environment you can use to train, deploy, automate, manage, and track
machine learning models.
Prerequisites
An Azure account with an active subscription. Create an account for free.
Location : Select the location closest to your users and the data resources to create your workspace.
WARNING
It can take several minutes to create your workspace in the cloud.
NOTE
When the cluster is created, it will have 0 nodes provisioned. The cluster does not incur costs until you submit a job. This
cluster will scale down when it has been idle for 2,400 seconds (40 minutes). This will give you time to use it in a few
tutorials if you wish without waiting for it to scale back up.
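The quickstart creates the cluster in the studio UI; for reference, a hedged SDK v1 sketch of an equivalent cluster, where the VM size is a placeholder:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS11_V2",          # placeholder VM size
    min_nodes=0,
    max_nodes=4,
    idle_seconds_before_scaledown=2400,  # matches the 40-minute idle window above
)
cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)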
Clean up resources
If you plan to continue now to the next tutorial, skip to Next steps.
Stop compute instance
If you're not going to use it now, stop the compute instance:
1. In the studio, on the left, select Compute .
2. In the top tabs, select Compute instances
3. Select the compute instance in the list.
4. On the top toolbar, select Stop .
Delete all resources
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
4. Enter the resource group name. Then select Delete .
Next steps
You now have an Azure Machine Learning workspace that contains:
A compute instance to use for your development environment.
A compute cluster to use for submitting training runs.
Use these resources to learn more about Azure Machine Learning and train a model with Python scripts.
Learn more with Python scripts
Tutorial: Get started with a Python script in Azure
Machine Learning (part 1 of 3)
5/25/2022 • 6 minutes to read
Prerequisites
Complete Quickstart: Set up your workspace to get started with Azure Machine Learning to create a
workspace, compute instance, and compute cluster to use in this tutorial series.
6. Name the new folder src . Use the Edit location link if the file location is not correct.
7. To the right of the src folder, use the ... to create a new file in the src folder.
8. Name your file hello.py. Switch the File type to Python (.py).
Copy this code into your file:
# src/hello.py
print("Hello world!")
You'll see the output of the script in the terminal window that opens. Close the tab and select Terminate to close
the session.
# get-started/run-hello.py
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='day1-experiment-hello')
config = ScriptRunConfig(source_directory='./src',
                         script='hello.py',
                         compute_target='cpu-cluster')
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
TIP
If you used a different name when you created your compute cluster, make sure to adjust the name in the code
compute_target='cpu-cluster' as well.
Workspace connects to your Azure Machine Learning workspace, so that you can communicate with your Azure
Machine Learning resources.
experiment = Experiment( ... )
Experiment provides a simple way to organize multiple runs under a single name. Later you can see how
experiments make it easy to compare metrics between dozens of runs.
config = ScriptRunConfig( ... )
ScriptRunConfig wraps your hello.py code and passes it to your workspace. As the name suggests, you can
use this class to configure how you want your script to run in Azure Machine Learning. It also specifies what
compute target the script will run on. In this code, the target is the compute cluster that you created in the setup
tutorial.
run = experiment.submit(config)
Submits your script. This submission is called a run. A run encapsulates a single execution of your code. Use a
run to monitor the script progress, capture the output, analyze the results, visualize metrics, and more.
aml_url = run.get_portal_url()
The run object provides a handle on the execution of your code. Monitor its progress from the Azure Machine
Learning studio with the URL that's printed from the Python script.
NOTE
You may see some warnings starting with Failure while loading azureml_run_type_providers.... You can ignore
these warnings. Use the link at the bottom of these warnings to view your output.
Next steps
In this tutorial, you took a simple "Hello world!" script and ran it on Azure. You saw how to connect to your
Azure Machine Learning workspace, create an experiment, and submit your hello.py code to the cloud.
In the next tutorial, you build on these learnings by running something more interesting than
print("Hello world!") .
NOTE
If you want to finish the tutorial series here and not progress to the next step, remember to clean up your resources.
Tutorial: Train your first machine learning model
(part 2 of 3)
5/25/2022 • 8 minutes to read
Prerequisites
Completion of part 1 of the series.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
2. On the toolbar, select Save to save the file. Close the tab if you wish.
3. Next, define the training script, also in the src subfolder. This script downloads the CIFAR10 dataset by
using PyTorch torchvision.dataset APIs, sets up the network defined in model.py, and trains it for two
epochs by using standard SGD and cross-entropy loss.
Create a train.py script in the src subfolder:
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from model import Net

# download the CIFAR10 training data with torchvision (../data is relative to src/)
trainset = torchvision.datasets.CIFAR10(
    root="../data", train=True, download=True,
    transform=torchvision.transforms.ToTensor(),
)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=4, shuffle=True, num_workers=2
)

if __name__ == "__main__":
    # define convolutional network, loss, and optimizer
    net = Net()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    # train the network for two epochs
    for epoch in range(2):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # unpack the data
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:
                loss = running_loss / 2000
                print(f"epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}")
                running_loss = 0.0
    print("Finished Training")
name: pytorch-env
channels:
- defaults
- pytorch
dependencies:
- python=3.6.2
- pytorch
- torchvision
2. On the toolbar, select Save to save the file. Close the tab if you wish.
# run-pytorch.py
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig

if __name__ == "__main__":
    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name='day1-experiment-train')
    config = ScriptRunConfig(source_directory='./src',
                             script='train.py',
                             compute_target='cpu-cluster')
    # set up the pytorch environment from the conda specification file
    env = Environment.from_conda_specification(
        name='pytorch-env',
        file_path='pytorch-aml-env.yml'
    )
    config.run_config.environment = env

    run = experiment.submit(config)
    aml_url = run.get_portal_url()
    print(aml_url)
TIP
If you used a different name when you created your compute cluster, make sure to adjust the name in the code
compute_target='cpu-cluster' as well.
NOTE
You may see some warnings starting with Failure while loading azureml_run_type_providers.... You can ignore
these warnings. Use the link at the bottom of these warnings to view your output.
If you see an error Your total snapshot size exceeds the limit , the data folder is located in the source_directory value used in ScriptRunConfig .
Select the ... at the end of the folder, then select Move to move data to the get-started folder.
# ADDITIONAL CODE: get the run from the run context (with `from azureml.core import Run` at the top of train.py)
from azureml.core import Run
run = Run.get_context()

if __name__ == "__main__":
# define convolutional network
net = Net()
# set up pytorch loss / optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# train the network
for epoch in range(2):
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
# unpack the data
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999:
loss = running_loss / 2000
# ADDITIONAL CODE: log loss metric to AML
run.log('loss', loss)
print(f'epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}')
running_loss = 0.0
print('Finished Training')
...
# ADDITIONAL CODE: log loss metric to AML
run.log('loss', loss)
name: pytorch-env
channels:
- defaults
- pytorch
dependencies:
- python=3.6.2
- pytorch
- torchvision
- pip
- pip:
  - azureml-sdk
Make sure you save this file before you submit the run.
Submit the run to Azure Machine Learning
Select the tab for the run-pytorch.py script, then select Save and run script in terminal to re-run the run-
pytorch.py script. Make sure you've saved your changes to pytorch-aml-env.yml first.
This time when you visit the studio, go to the Metrics tab where you can now see live updates on the model training loss! It may take 1 to 2 minutes before the training begins.
Next steps
In this session, you upgraded from a basic "Hello world!" script to a more realistic training script that required a
specific Python environment to run. You saw how to use curated Azure Machine Learning environments. Finally,
you saw how in a few lines of code you can log metrics to Azure Machine Learning.
There are other ways to create Azure Machine Learning environments, including from a pip requirements.txt file
or from an existing local Conda environment.
In the next session, you'll see how to work with data in Azure Machine Learning by uploading the CIFAR10
dataset to Azure.
Tutorial: Bring your own data
NOTE
If you want to finish the tutorial series here and not progress to the next step, remember to clean up your resources.
Tutorial: Upload data and train a model (part 3 of 3)
5/25/2022 • 7 minutes to read
Prerequisites
You'll need the data that was downloaded in the previous tutorial. Make sure you have completed these steps:
1. Create the training script.
2. Test locally.
NOTE
The use of argparse parameterizes the script.
import os
import argparse
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from model import Net
from azureml.core import Run
run = Run.get_context()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--data_path',
type=str,
help='Path to the training data'
)
parser.add_argument(
'--learning_rate',
type=float,
default=0.001,
help='Learning rate for SGD'
)
parser.add_argument(
'--momentum',
type=float,
default=0.9,
help='Momentum for SGD'
)
args = parser.parse_args()
print("===== DATA =====")
print("DATA PATH: " + args.data_path)
print("LIST FILES IN DATA PATH...")
print(os.listdir(args.data_path))
print("================")
# prepare DataLoader for CIFAR10 data
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = torchvision.datasets.CIFAR10(
root=args.data_path,
train=True,
download=False,
transform=transform,
)
trainloader = torch.utils.data.DataLoader(
trainset,
batch_size=4,
shuffle=True,
num_workers=2
)
# define convolutional network
net = Net()
# set up pytorch loss / optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(
net.parameters(),
lr=args.learning_rate,
momentum=args.momentum,
)
# train the network
for epoch in range(2):
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
# unpack the data
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999:
loss = running_loss / 2000
run.log('loss', loss) # log loss metric to AML
print(f'epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}')
running_loss = 0.0
print('Finished Training')
Also, the train.py script was adapted to update the optimizer to use the user-defined parameters:
optimizer = optim.SGD(
net.parameters(),
lr=args.learning_rate, # get learning rate from command-line argument
momentum=args.momentum, # get momentum from command-line argument
)
NOTE
Azure Machine Learning allows you to connect other cloud-based datastores that store your data. For more details, see
the datastores documentation.
1. Create a new Python control script in the get-started folder (make sure it is in get-started , not in the /src folder). Name the script upload-data.py and copy this code into the file:
# upload-data.py
from azureml.core import Workspace
from azureml.core import Dataset
from azureml.data.datapath import DataPath
ws = Workspace.from_config()
datastore = ws.get_default_datastore()
Dataset.File.upload_directory(src_dir='data',
target=DataPath(datastore, "datasets/cifar10")
)
The target value specifies the path on the datastore where the CIFAR10 data will be uploaded.
TIP
While you're using Azure Machine Learning to upload the data, you can use Azure Storage Explorer to upload ad
hoc files. If you need an ETL tool, you can use Azure Data Factory to ingest your data into Azure.
2. Select Save and run script in terminal to run the upload-data.py script.
You should see the following standard output:
Uploading ./data\cifar-10-batches-py\data_batch_2
Uploaded ./data\cifar-10-batches-py\data_batch_2, 4 files out of an estimated total of 9
.
.
Uploading ./data\cifar-10-batches-py\data_batch_5
Uploaded ./data\cifar-10-batches-py\data_batch_5, 9 files out of an estimated total of 9
Uploaded 9 files
# run-pytorch-data.py
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.core import Dataset

if __name__ == "__main__":
    ws = Workspace.from_config()
    datastore = ws.get_default_datastore()
    dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

    experiment = Experiment(workspace=ws, name='day1-experiment-data')
    config = ScriptRunConfig(
        source_directory='./src',
        script='train.py',
        compute_target='cpu-cluster',
        arguments=[
            '--data_path', dataset.as_named_input('input').as_mount(),
            '--learning_rate', 0.003,
            '--momentum', 0.92],
    )
    # set up the pytorch environment from the conda specification file
    env = Environment.from_conda_specification(
        name='pytorch-env',
        file_path='pytorch-aml-env.yml'
    )
    config.run_config.environment = env

    run = experiment.submit(config)
    aml_url = run.get_portal_url()
    print("Submitted to compute cluster. Click link below")
    print("")
    print(aml_url)
TIP
If you used a different name when you created your compute cluster, make sure to adjust the name in the code
compute_target='cpu-cluster' as well.
A dataset is used to reference the data you uploaded to Azure Blob Storage. Datasets are an abstraction layer on top of your data, designed to improve reliability and trustworthiness.
config = ScriptRunConfig(...)
ScriptRunConfig is modified to include a list of arguments that will be passed into train.py . The
dataset.as_named_input('input').as_mount() argument means the specified directory will be mounted to the
compute target.
NOTE
You may see some warnings starting with Failure while loading azureml_run_type_providers.... You can ignore these
warnings. Use the link at the bottom of these warnings to view your output.
Notice:
Azure Machine Learning has mounted Blob Storage to the compute cluster automatically for you.
The dataset.as_named_input('input').as_mount() used in the control script resolves to the mount point.
Clean up resources
If you plan to continue now to another tutorial, or to start your own training runs, skip to Next steps.
Stop compute instance
If you're not going to use it now, stop the compute instance:
1. In the studio, on the left, select Compute .
2. In the top tabs, select Compute instances
3. Select the compute instance in the list.
4. On the top toolbar, select Stop .
Delete all resources
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
Next steps
In this tutorial, we saw how to upload data to Azure by using Datastore . The datastore served as cloud storage
for your workspace, giving you a persistent and flexible place to keep your data.
You saw how to modify your training script to accept a data path via the command line. By using Dataset , you
were able to mount a directory to the remote run.
Now that you have a model, learn:
How to deploy models with Azure Machine Learning.
Tutorial: Train and deploy an image classification
model with an example Jupyter Notebook
5/25/2022 • 8 minutes to read
Prerequisites
Complete the Quickstart: Get started with Azure Machine Learning to:
Create a workspace.
Create a cloud-based compute instance to use for your development environment.
NOTE
The video helps you understand the process, but shows opening a different file. For this tutorial, once you've cloned the
tutorials folder, use instructions below to open the cloned notebook.
8. A list of folders shows each user who accesses the workspace. Select your folder to clone the tutorials
folder there.
IMPORTANT
You can view notebooks in the samples folder but you can't run a notebook from there. To run a notebook, make
sure you open the cloned version of the notebook in the User Files section.
IMPORTANT
The rest of this article contains the same content as you see in the notebook.
Switch to the Jupyter Notebook now if you want to run the code while you read along. To run a single code cell in a
notebook, click the code cell and hit Shift+Enter . Or, run the entire notebook by choosing Run all from the top toolbar.
Import data
Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:
Download the MNIST dataset
Display some sample images
You'll use Azure Open Datasets to get the raw MNIST data files. Azure Open Datasets are curated public datasets
that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset
has a corresponding class, MNIST in this case, to retrieve the data in different ways.
import os
from azureml.opendatasets import MNIST

# create a folder for the downloaded data
data_folder = os.path.join(os.getcwd(), "data")
os.makedirs(data_folder, exist_ok=True)

mnist_file_dataset = MNIST.get_file_dataset()
mnist_file_dataset.download(data_folder, overwrite=True)
# load_data is a helper function defined in utils.py, alongside this notebook
# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.
X_train = (
load_data(
glob.glob(
os.path.join(data_folder, "**/train-images-idx3-ubyte.gz"), recursive=True
)[0],
False,
)
/ 255.0
)
X_test = (
load_data(
glob.glob(
os.path.join(data_folder, "**/t10k-images-idx3-ubyte.gz"), recursive=True
)[0],
False,
)
/ 255.0
)
y_train = load_data(
glob.glob(
os.path.join(data_folder, "**/train-labels-idx1-ubyte.gz"), recursive=True
)[0],
True,
).reshape(-1)
y_test = load_data(
glob.glob(
os.path.join(data_folder, "**/t10k-labels-idx1-ubyte.gz"), recursive=True
)[0],
True,
).reshape(-1)
# now let's show some randomly chosen images from the training set.
import numpy as np
import matplotlib.pyplot as plt

count = 0
sample_size = 30
plt.figure(figsize=(16, 6))
for i in np.random.permutation(X_train.shape[0])[:sample_size]:
count = count + 1
plt.subplot(1, sample_size, count)
plt.axhline("")
plt.axvline("")
plt.text(x=10, y=-10, s=y_train[i], fontsize=18)
plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)
plt.show()
The code above displays a random set of images with their labels, similar to this:
View experiment
In the left-hand menu in Azure Machine Learning studio, select Experiments and then select your experiment (azure-ml-in10-mins-tutorial ). An experiment is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that experiment. If the name doesn't exist when you submit an experiment, a new experiment is automatically created. If you select your run, you'll see various tabs containing metrics, logs, explanations, and so on.
Deploy model
This next code cell deploys the model to Azure Container Instance.
NOTE
The deployment takes approximately 3 minutes to complete.
%%time
import uuid
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.model import Model

# (the full notebook defines the environment, inference configuration, and ACI
# deployment configuration here, then creates `service` with Model.deploy)
service.wait_for_deployment(show_output=True)
The scoring script file referenced in the code above can be found in the same folder as this notebook, and has
two functions:
1. An init function that executes once when the service starts - in this function you normally get the model
from the registry and set global variables
2. A run(data) function that executes each time a call is made to the service. In this function, you normally
format the input data, run a prediction, and output the predicted result.
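A minimal sketch of that contract, assuming a scikit-learn model registered under the hypothetical name sklearn_mnist; the actual score.py shipped with the notebook may differ:

# score.py (sketch)
import json
import joblib
import numpy as np
from azureml.core.model import Model

def init():
    # runs once when the service starts: load the registered model
    global model
    model_path = Model.get_model_path("sklearn_mnist")  # hypothetical model name
    model = joblib.load(model_path)

def run(raw_data):
    # runs on every request: parse input, predict, return the result
    data = np.array(json.loads(raw_data)["data"])
    result = model.predict(data)
    return result.tolist()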
View endpoint
Once the model has been successfully deployed, you can view the endpoint by navigating to Endpoints in the
left-hand menu in Azure Machine Learning studio. You can see the state of the endpoint (healthy/unhealthy), its logs, and the Consume tab, which shows how applications can consume the model.
Clean up resources
If you're not going to continue to use this model, delete the Model service using:
# if you want to keep workspace and only delete endpoint (it will incur cost while running)
service.delete()
If you want to control cost further, stop the compute instance by selecting the "Stop compute" button next to the
Compute dropdown. Then start the compute instance again the next time you need it.
Delete everything
Use these steps to delete your Azure Machine Learning workspace and all compute resources.
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
4. Enter the resource group name. Then select Delete .
Next steps
Learn about all of the deployment options for Azure Machine Learning.
Learn how to create clients for the web service.
Make predictions on large quantities of data asynchronously.
Monitor your Azure Machine Learning models with Application Insights.
Try out the automatic algorithm selection tutorial.
Tutorial: Train a regression model with AutoML and
Python
5/25/2022 • 12 minutes to read
You'll write code using the Python SDK in this tutorial. You'll learn the following tasks:
Download, transform, and clean data using Azure Open Datasets
Train an automated machine learning regression model
Calculate model accuracy
For no-code AutoML, try the following tutorials:
Tutorial: Train no-code classification models
Tutorial: Forecast demand with automated machine learning
Prerequisites
If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today.
Complete the Quickstart: Get started with Azure Machine Learning if you don't already have an Azure
Machine Learning workspace or a compute instance.
After you complete the quickstart:
1. Select Notebooks in the studio.
2. Select the Samples tab.
3. Open the tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb notebook.
This tutorial is also available on GitHub if you wish to run it in your own local environment. To get the required packages:
Install the full automl client.
Run pip install azureml-opendatasets azureml-widgets .
Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets
only allows downloading one month of data at a time with certain classes to avoid MemoryError with large
datasets.
To download taxi data, iteratively fetch one month at a time, and before appending it to green_taxi_df
randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data.
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
from azureml.opendatasets import NycTlcGreen

green_taxi_df = pd.DataFrame([])
start = datetime.strptime("1/1/2015", "%m/%d/%Y")
end = datetime.strptime("1/31/2015", "%m/%d/%Y")

# fetch one month at a time and randomly sample 2,000 records from each month
for sample_month in range(12):
    temp_df_green = NycTlcGreen(
        start + relativedelta(months=sample_month),
        end + relativedelta(months=sample_month)) \
        .to_pandas_dataframe()
    green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))

green_taxi_df.head(10)
(Output: a preview of the first 10 rows of green_taxi_df, with columns including vendorID, lpepPickupDatetime, lpepDropoffDatetime, passengerCount, tripDistance, puLocationId, doLocationId, pickupLongitude, pickupLatitude, dropoffLongitude, dropoffLatitude, rateCodeID, storeAndFwdFlag, paymentType, fareAmount, extra, mtaTax, improvementSurcharge, tipAmount, tollsAmount, ehailFee, tripType, and totalAmount.)
Remove some of the columns that you won't need for training or additional feature building. Automated machine learning will automatically handle time-based features such as lpepPickupDatetime .
columns_to_remove = ["lpepDropoffDatetime", "puLocationId", "doLocationId", "extra", "mtaTax",
"improvementSurcharge", "tollsAmount", "ehailFee", "tripType", "rateCodeID",
"storeAndFwdFlag", "paymentType", "fareAmount", "tipAmount"
]
for col in columns_to_remove:
green_taxi_df.pop(col)
green_taxi_df.head(5)
Cleanse data
Run the describe() function on the new dataframe to see summary statistics for each field.
green_taxi_df.describe()
 | vendorID | passengerCount | tripDistance | pickupLongitude | pickupLatitude | dropoffLongitude | dropoffLatitude | totalAmount | month_num | day_of_month
count | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00 | 48000.00
mean | 1.78 | 1.37 | 2.87 | -73.83 | 40.69 | -73.84 | 40.70 | 14.75 | 6.50 | 15.13
std | 0.41 | 1.04 | 2.93 | 2.76 | 1.52 | 2.61 | 1.44 | 12.08 | 3.45 | 8.45
min | 1.00 | 0.00 | 0.00 | -74.66 | 0.00 | -74.66 | 0.00 | -300.00 | 1.00 | 1.00
25% | 2.00 | 1.00 | 1.06 | -73.96 | 40.70 | -73.97 | 40.70 | 7.80 | 3.75 | 8.00
50% | 2.00 | 1.00 | 1.90 | -73.94 | 40.75 | -73.94 | 40.75 | 11.30 | 6.50 | 15.00
75% | 2.00 | 1.00 | 3.60 | -73.92 | 40.80 | -73.91 | 40.79 | 17.80 | 9.25 | 22.00
max | 2.00 | 9.00 | 97.57 | 0.00 | 41.93 | 0.00 | 41.94 | 450.00 | 12.00 | 30.00
From the summary statistics, you see that there are several fields that have outliers or values that will reduce model accuracy. First filter the lat/long fields to be within the bounds of the Manhattan area. This will filter out longer taxi trips or trips that are outliers with respect to their relationship with other features.
Additionally filter the tripDistance field to be greater than zero but less than 31 miles (the haversine distance
between the two lat/long pairs). This eliminates long outlier trips that have inconsistent trip cost.
Lastly, the totalAmount field has negative values for the taxi fares, which don't make sense in the context of our
model, and the passengerCount field has bad data with the minimum values being zero.
Filter out these anomalies using query functions, and then remove the last few columns unnecessary for
training.
final_df = green_taxi_df.query("pickupLatitude>=40.53 and pickupLatitude<=40.88")
final_df = final_df.query("pickupLongitude>=-74.09 and pickupLongitude<=-73.72")
final_df = final_df.query("tripDistance>=0.25 and tripDistance<31")
final_df = final_df.query("passengerCount>0 and totalAmount>0")
Call describe() again on the data to ensure cleansing worked as expected. You now have a prepared and
cleansed set of taxi, holiday, and weather data to use for machine learning model training.
final_df.describe()
Configure workspace
Create a workspace object from the existing workspace. A Workspace is a class that accepts your Azure
subscription and resource information. It also creates a cloud resource to monitor and track your model runs.
Workspace.from_config() reads the file config.json and loads the authentication details into an object named ws , which is used throughout the rest of the code in this tutorial.
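The connection cell is just two lines (assuming config.json has been downloaded to the working directory):

from azureml.core import Workspace

ws = Workspace.from_config()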
The purpose of this step is to have data points to test the finished model that haven't been used to train the
model, in order to measure true accuracy.
In other words, a well-trained model should be able to accurately make predictions from data it hasn't already
seen. You now have data prepared for auto-training a machine learning model.
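As a sketch, the split can be done with scikit-learn's train_test_split; the 80/20 ratio and seed here are illustrative, and the cells that follow assume x_train , x_test , and y_test exist:

from sklearn.model_selection import train_test_split

x_train, x_test = train_test_split(final_df, test_size=0.2, random_state=223)
y_test = x_test.pop("totalAmount")  # hold out the label column for evaluation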
import logging
automl_settings = {
"iteration_timeout_minutes": 10,
"experiment_timeout_hours": 0.3,
"enable_early_stopping": True,
"primary_metric": 'spearman_correlation',
"featurization": 'auto',
"verbosity": logging.INFO,
"n_cross_validations": 5
}
Use your defined training settings as a **kwargs parameter to an AutoMLConfig object. Additionally, specify
your training data and the type of model, which is regression in this case.
automl_config = AutoMLConfig(task='regression',
debug_log='automated_ml_errors.log',
training_data=x_train,
label_column_name="totalAmount",
**automl_settings)
NOTE
Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to
numeric, etc.) become part of the underlying model. When using the model for predictions, the same pre-processing
steps applied during training are applied to your input data automatically.
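Submitting the configuration to an experiment starts the AutoML run; a sketch, where the experiment name is hypothetical:

from azureml.core.experiment import Experiment

experiment = Experiment(ws, "taxi-experiment")  # hypothetical experiment name
local_run = experiment.submit(automl_config, show_output=True)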
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
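When the run completes, retrieve the best run and its fitted pipeline; fitted_model is what the prediction cell below uses:

best_run, fitted_model = local_run.get_output()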
y_predict = fitted_model.predict(x_test)
print(y_predict[:10])
Calculate the root mean squared error of the results. Convert the y_test dataframe to a list to compare to the
predicted values. The function mean_squared_error takes two arrays of values and calculates the average
squared error between them. Taking the square root of the result gives an error in the same units as the y
variable, cost . It indicates roughly how far the taxi fare predictions are from the actual fares.
y_actual = y_test.values.flatten().tolist()
rmse = sqrt(mean_squared_error(y_actual, y_predict))
rmse
Run the following code to calculate mean absolute percent error (MAPE) by using the full y_actual and
y_predict data sets. This metric calculates an absolute difference between each predicted and actual value and
sums all the differences. Then it expresses that sum as a percent of the total of the actual values.
sum_actuals = sum_errors = 0
for actual_val, predict_val in zip(y_actual, y_predict):
    abs_error = abs(actual_val - predict_val)
    sum_errors = sum_errors + abs_error
    sum_actuals = sum_actuals + actual_val

mean_abs_percent_error = sum_errors / sum_actuals
print("Model MAPE:")
print(mean_abs_percent_error)
print()
print("Model Accuracy:")
print(1 - mean_abs_percent_error)
Model MAPE:
0.14353867606052823
Model Accuracy:
0.8564613239394718
From the two prediction accuracy metrics, you see that the model is fairly good at predicting taxi fares from the
data set's features, typically within +- $4.00, and approximately 15% error.
The traditional machine learning model development process is highly resource-intensive, and requires
significant domain knowledge and time investment to run and compare the results of dozens of models. Using
automated machine learning is a great way to rapidly test many different models for your scenario.
Clean up resources
Do not complete this section if you plan on running other Azure Machine Learning tutorials.
Stop the compute instance
If you used a compute instance, stop the VM when you aren't using it to reduce cost.
1. In your workspace, select Compute .
2. From the list, select the name of the compute instance.
3. Select Stop .
4. When you're ready to use the server again, select Start .
Delete everything
If you don't plan to use the resources you created, delete them, so you don't incur any charges.
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group you created.
3. Select Delete resource group .
4. Enter the resource group name. Then select Delete .
You can also keep the resource group but delete a single workspace. Display the workspace properties and
select Delete .
Next steps
In this automated machine learning tutorial, you did the following tasks:
Configured a workspace and prepared data for an experiment.
Trained by using an automated regression model locally with custom parameters.
Explored and reviewed training results.
Tutorial: Train and deploy a model with Azure Machine Learning.
Tutorial: Train an object detection model (preview)
with AutoML and Python
5/25/2022 • 12 minutes to read
IMPORTANT
The features presented in this article are in preview. They should be considered experimental preview features that might
change at any time.
In this tutorial, you learn how to train an object detection model using Azure Machine Learning automated ML
with the Azure Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2 (preview). This
object detection model identifies whether the image contains objects, such as a can, carton, milk bottle, or water
bottle.
Automated ML accepts training data and configuration settings, and automatically iterates through
combinations of different feature normalization/standardization methods, models, and hyperparameter settings
to arrive at the best model.
You'll write code using the Python SDK in this tutorial and learn the following tasks:
Download and transform data
Train an automated machine learning object detection model
Specify hyperparameter values for your model
Perform a hyperparameter sweep
Deploy your model
Visualize detections
Prerequisites
If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid
version of Azure Machine Learning today.
Python 3.6 or 3.7 is supported for this feature
Complete the Quickstart: Get started with Azure Machine Learning if you don't already have an Azure
Machine Learning workspace.
Download and unzip the odFridgeObjects.zip data file. The dataset is annotated in Pascal VOC format,
where each image corresponds to an xml file. Each xml file contains information on where its
corresponding image file is located and also contains information about the bounding boxes and the
object labels. In order to use this data, you first need to convert it to the required JSONL format as seen in
the Convert the downloaded data to JSONL section of the notebook.
CLI v2
Python SDK v2 (preview)
This tutorial is also available in the azureml-examples repository on GitHub. If you wish to run it in your own local environment, set it up using the following instructions:
Install and set up CLI (v2) and make sure you install the ml extension.
CLI v2
Python SDK v2 (preview)
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: gpu-cluster
type: amlcompute
size: Standard_NC24s_v3
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 120
To create the compute, you run the following CLI v2 command with the path to your .yml file, workspace name,
resource group and subscription ID.
The created compute can be provided using compute key in the automl task configuration yaml:
compute: azureml:gpu-cluster
Experiment setup
You can use an Experiment to track your model training runs.
CLI v2
Python SDK v2 (preview)
experiment_name: dpv2-cli-automl-image-object-detection-experiment
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image as pil_image
import numpy as np
import json
import os
# (excerpt from the notebook's plotting helper; `ax`, `topleft_x`, `topleft_y`,
# and `ground_truth_boxes` are defined in the full notebook)
label_to_color_mapping = {}
for gt in ground_truth_boxes:
    label = gt["label"]
    if label in label_to_color_mapping:
        color = label_to_color_mapping[label]
    else:
        # Generate a random color. To use a specific color, use something like "red".
        color = np.random.rand(3)
        label_to_color_mapping[label] = color
# Display label
ax.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)
plt.show()
image_file = "./odFridgeObjects/images/31.jpg"
jsonl_file = "./odFridgeObjects/train_annotations.jsonl"
plot_ground_truth_boxes_jsonl(image_file, jsonl_file)
CLI v2
Python SDK v2 (preview)
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder
To upload the images as a data asset, run the following CLI v2 command with the path to your .yml file, your
workspace name, resource group, and subscription ID.
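az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]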
The next step is to create an MLTable from your data in JSONL format, as shown below. MLTable packages your
data into a consumable object for training.
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
CLI v2
Python SDK v2 (preview)
target_column_name: label
training_data:
  path: data/training-mltable-folder
  type: mltable
validation_data:
  path: data/validation-mltable-folder
  type: mltable
CLI v2
Python SDK v2 (preview)
task: image_object_detection
primary_metric: mean_average_precision
In your AutoML job, you can specify the model algorithms by using the model_name parameter and configure the
settings to perform a hyperparameter sweep over a defined search space to find the optimal model.
In this example, we train an object detection model with yolov5 and fasterrcnn_resnet50_fpn , both of
which are pretrained on COCO, a large-scale object detection, segmentation, and captioning dataset that
contains thousands of labeled images with over 80 label categories.
Hyperparameter sweeping for image tasks
You can perform a hyperparameter sweep over a defined search space to find the optimal model.
The following code defines the search space in preparation for the hyperparameter sweep for each defined
algorithm, yolov5 and fasterrcnn_resnet50_fpn . In the search space, specify the range of values for
learning_rate , optimizer , lr_scheduler , and so on, for AutoML to choose from as it attempts to generate a
model with the optimal primary metric. If hyperparameter values are not specified, default values are used for
each algorithm.
For the tuning settings, use random sampling to pick samples from this parameter space by using the random
sampling_algorithm. Doing so tells automated ML to try a total of 10 trials with these different samples,
running two trials at a time on our compute target, which was set up using four nodes. The more parameters the
search space has, the more trials you need to find optimal models.
The Bandit early termination policy is also used. This policy terminates poorly performing configurations, that is,
configurations that are not within 20% slack of the best performing configuration, which significantly saves
compute resources. A sketch of these settings in the job YAML appears after the search space below.
CLI v2
Python SDK v2 (preview)
search_space:
  - model_name: "yolov5"
    learning_rate: "uniform(0.0001, 0.01)"
    model_size: "choice('small', 'medium')"
  - model_name: "fasterrcnn_resnet50_fpn"
    learning_rate: "uniform(0.0001, 0.001)"
    optimizer: "choice('sgd', 'adam', 'adamw')"
    min_size: "choice(600, 800)"
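For illustration, the tuning settings described above (random sampling, 10 total trials, two concurrent trials,
and Bandit early termination with 20% slack) could be expressed in the job YAML with a sketch like the
following; the keys follow the v2 preview schema, and the evaluation_interval and delay_evaluation values here
are assumptions:

sweep:
  limits:
    max_trials: 10
    max_concurrent_trials: 2
  sampling_algorithm: random
  early_termination:
    type: bandit
    evaluation_interval: 2
    slack_factor: 0.2
    delay_evaluation: 6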
Once the search space and sweep settings are defined, you can then submit the job to train an image model
using your training dataset.
CLI v2
Python SDK v2 (preview)
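For example, assuming the AutoML job configuration above is saved as automl-image-object-detection-job.yml
(the file name is illustrative):

az ml job create --file ./automl-image-object-detection-job.yml --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]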
When doing a hyperparameter sweep, it can be useful to visualize the different configurations that were tried
using the HyperDrive UI. You can navigate to this UI by going to the 'Child runs' tab in the UI of the main
automl_image_run from above, which is the HyperDrive parent run, and then opening that run's own 'Child runs'
tab. Alternatively, you can go directly to the HyperDrive parent run and navigate to its 'Child runs' tab:
sample_image = './test_image.jpg'
Visualize detections
Now that you have scored a test image, you can visualize the bounding boxes for this image. To do so, be sure
you have matplotlib installed.
IMAGE_SIZE = (18, 12)
plt.figure(figsize=IMAGE_SIZE)
img_np = mpimg.imread(sample_image)
img = pil_image.fromarray(img_np.astype('uint8'), 'RGB')
x, y = img.size

fig, ax = plt.subplots(1, figsize=IMAGE_SIZE)
ax.imshow(img_np)

# 'detections' is assumed to hold the parsed response from the scoring endpoint, where
# each entry has a normalized bounding box, a label, and a confidence score.
for detect in detections:
    box = detect["box"]
    topleft_x, topleft_y = x * box["topX"], y * box["topY"]
    width, height = x * (box["bottomX"] - box["topX"]), y * (box["bottomY"] - box["topY"])
    label, color = detect["label"], np.random.rand(3)
    rect = patches.Rectangle((topleft_x, topleft_y), width, height,
                             linewidth=3, edgecolor=color, facecolor="none")
    ax.add_patch(rect)
    plt.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)

plt.show()
Clean up resources
Do not complete this section if you plan on running other Azure Machine Learning tutorials.
If you don't plan to use the resources you created, delete them, so you don't incur any charges.
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group you created.
3. Select Delete resource group .
4. Enter the resource group name. Then select Delete .
You can also keep the resource group but delete a single workspace. Display the workspace properties and
select Delete .
Next steps
In this automated machine learning tutorial, you did the following tasks:
Configured a workspace and prepared data for an experiment.
Trained an automated object detection model
Specified hyperparameter values for your model
Performed a hyperparameter sweep
Deployed your model
Visualized detections
Learn more about computer vision in automated ML (preview).
Learn how to set up AutoML to train computer vision models with Python (preview).
Learn how to configure incremental training on computer vision models.
See what hyperparameters are available for computer vision tasks.
Code examples:
CLI v2
Python SDK v2 (preview)
Review detailed code examples and use cases in the azureml-examples repository for automated
machine learning samples. Please check the folders with 'cli-automl-image-' prefix for samples
specific to building computer vision models.
NOTE
The fridge objects dataset is made available under the MIT License.
Tutorial: Convert ML experiments to production
Python code
5/25/2022 • 11 minutes to read • Edit Online
Prerequisites
Generate the MLOpsPython template and use the experimentation/Diabetes Ridge Regression Training.ipynb
and experimentation/Diabetes Ridge Regression Scoring.ipynb notebooks. These notebooks are used as an
example of converting from experimentation to production. You can find these notebooks at
https://github.com/microsoft/MLOpsPython/tree/master/experimentation.
Install nbconvert . Follow only the installation instructions under the section Installing nbconvert on the
Installation page.
# Load Data
sample_data = load_diabetes()
df = pd.DataFrame(
    data=sample_data.data,
    columns=sample_data.feature_names)
df['Y'] = sample_data.target

# Split Data into Training and Validation Sets
X = df.drop('Y', axis=1).values
y = df['Y'].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

# Train Model on Training Set
args = {
    "alpha": 0.5
}
reg_model = Ridge(**args)
reg_model.fit(data["train"]["X"], data["train"]["y"])

# Validate Model on Validation Set
preds = reg_model.predict(data["test"]["X"])
mse = mean_squared_error(preds, y_test)
metrics = {"mse": mse}
print(metrics)

# Save Model
model_name = "sklearn_regression_model.pkl"
joblib.dump(value=reg_model, filename=model_name)
4. Move the code under the "Save Model" heading into the main function.
The main function should look like the following code:
def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # Split Data into Training and Validation Sets
    data = split_data(df)

    # Train Model on Training Set
    args = {
        "alpha": 0.5
    }
    reg = train_model(data, args)

    # Validate Model on Validation Set
    metrics = get_model_metrics(reg, data)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)
At this stage, there should be no code remaining in the notebook that isn't in a function, other than import
statements in the first cell.
Add a statement that calls the main function.
main()
After refactoring, experimentation/Diabetes Ridge Regression Training.ipynb should look like the following code
without the markdown:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd
import joblib
# The split_data, train_model, and get_model_metrics functions created in the
# earlier refactoring steps are defined here (omitted for brevity).
def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # Split Data into Training and Validation Sets
    data = split_data(df)

    # Train Model on Training Set
    args = {
        "alpha": 0.5
    }
    reg = train_model(data, args)

    # Validate Model on Validation Set
    metrics = get_model_metrics(reg, data)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)
main()
def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)
Once the init function has been created, replace all the code under the heading "Load Model" with a single call
to init as follows:
init()
{"result": result.tolist()}
2. Copy the code under the "Prepare Data" and "Score Data" headings into the run function.
The run function should look like the following code (Remember to remove the statements that set the
variables raw_data and request_headers , which will be used later when the run function is called):
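def run(raw_data, request_headers):
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)

    return {"result": result.tolist()}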
Once the run function has been created, replace all the code under the "Prepare Data" and "Score Data"
headings with the following code:
raw_data = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(raw_data, request_header)
print("Test result: ", prediction)
The previous code sets variables raw_data and request_header , calls the run function with raw_data and
request_header , and prints the predictions.
After refactoring, experimentation/Diabetes Ridge Regression Scoring.ipynb should look like the following code
without the markdown:
import json
import numpy
from azureml.core.model import Model
import joblib
def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)
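def run(raw_data, request_headers):
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)

    return {"result": result.tolist()}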
init()
test_row = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(test_row, request_header)
print("Test result: ", prediction)
jupyter nbconvert "Diabetes Ridge Regression Training.ipynb" --to script --output train
Once the notebook has been converted to train.py , remove any unwanted comments. Replace the call to
main() at the end of the file with a conditional invocation like the following code:
if __name__ == '__main__':
main()
The main function in train.py should now look like the following code (the import statements and the
split_data , train_model , and get_model_metrics functions from the earlier steps appear above it):

def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # Split Data into Training and Validation Sets
    data = split_data(df)

    # Train Model on Training Set
    args = {
        "alpha": 0.5
    }
    reg = train_model(data, args)

    # Validate Model on Validation Set
    metrics = get_model_metrics(reg, data)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)
if __name__ == '__main__':
main()
train.py can now be invoked from a terminal by running python train.py . The functions from train.py can
also be called from other files.
The train_aml.py file found in the diabetes_regression/training directory in the MLOpsPython repository calls
the functions defined in train.py in the context of an Azure Machine Learning experiment run. The functions
can also be called in unit tests, covered later in this guide.
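For example, another script could reuse the refactored functions; a minimal sketch, assuming train.py is
importable from the working directory:

from sklearn.datasets import load_diabetes
import pandas as pd

from train import split_data, train_model, get_model_metrics

# Rebuild the diabetes data frame and run the full train/evaluate cycle.
sample_data = load_diabetes()
df = pd.DataFrame(data=sample_data.data, columns=sample_data.feature_names)
df['Y'] = sample_data.target

data = split_data(df)
reg = train_model(data, {"alpha": 0.5})
print(get_model_metrics(reg, data))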
Create Python file for the Diabetes Ridge Regression Scoring notebook
Convert your notebook to an executable script by running the following statement in a command prompt, which
uses the nbconvert package and the path of experimentation/Diabetes Ridge Regression Scoring.ipynb :
jupyter nbconvert "Diabetes Ridge Regression Scoring.ipynb" --to script --output score
Once the notebook has been converted to score.py , remove any unwanted comments. Your score.py file
should look like the following code:
import json
import numpy
from azureml.core.model import Model
import joblib
def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)
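def run(raw_data, request_headers):
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)

    return {"result": result.tolist()}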
init()
test_row = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(test_row, request_header)
print("Test result: ", prediction)
The model variable needs to be global so that it's visible throughout the script. Add the following statement at
the beginning of the init function:
global model
After adding the previous statement, the init function should look like the following code:
def init():
    global model
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)
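As an example, the following unit test exercises the train_model function: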
import numpy as np
from code.training.train import train_model


def test_train_model():
    # Arrange
    X_train = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
    y_train = np.array([10, 9, 8, 8, 6, 5])
    data = {"train": {"X": X_train, "y": y_train}}

    # Act
    reg_model = train_model(data, {"alpha": 1.2})

    # Assert
    preds = reg_model.predict([[1], [2]])
    np.testing.assert_almost_equal(preds, [9.93939393939394, 9.03030303030303])
Next steps
Now that you understand how to convert from an experiment to production code, see the following links for
more information and next steps:
MLOpsPython: Build a CI/CD pipeline to train, evaluate and deploy your own model using Azure Pipelines
and Azure Machine Learning
Monitor Azure ML experiment runs and metrics
Monitor and collect data from ML web service endpoints
Tutorial: Train a classification model with no-code
AutoML in the Azure Machine Learning studio
5/25/2022 • 12 minutes to read • Edit Online
Learn how to train a classification model with no-code AutoML using Azure Machine Learning automated ML in
the Azure Machine Learning studio. This classification model predicts if a client will subscribe to a fixed term
deposit with a financial institution.
With automated ML, you can automate away time-intensive tasks. Automated machine learning rapidly iterates
over many combinations of algorithms and hyperparameters to help you find the best model based on a
success metric of your choosing.
You won't write any code in this tutorial; you'll use the studio interface to perform training. You'll learn how to do
the following tasks:
Create an Azure Machine Learning workspace.
Run an automated machine learning experiment.
Explore model details.
Deploy the recommended model.
Also try automated machine learning for these other model types:
For a no-code example of forecasting, see Tutorial: Demand forecasting & AutoML.
For a code first example of a regression model, see the Tutorial: Regression model with AutoML.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account.
Download the bankmarketing_train.csv data file. The y column indicates if a customer subscribed to a
fixed term deposit, which is later identified as the target column for predictions in this tutorial.
Create a workspace
An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train,
and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed
object in the service.
There are many ways to create a workspace. In this tutorial, you create a workspace via the Azure portal, a web-
based console for managing your Azure resources.
1. Sign in to the Azure portal by using the credentials for your Azure subscription.
2. In the upper-left corner of the Azure portal, select the three bars, then + Create a resource .
3. Use the search bar to find Machine Learning .
4. Select Machine Learning .
Location : Select the location closest to your users and the data resources to create your workspace.
WARNING
It can take several minutes to create your workspace in the cloud.
IMPORTANT
Take note of your workspace and subscription . You'll need these to ensure you create your experiment in the right
place.
Column headers : Indicates how the headers of the dataset, if any, will be treated. For this tutorial, select All
files have same headers .
h. The Schema form allows for further configuration of your data for this experiment. For this
example, select the toggle switch for day_of_week so that it isn't included. Select Next .
i. On the Confirm details form, verify the information matches what was previously populated on
the Basic info, Datastore and file selection and Settings and preview forms.
j. Select Create to complete the creation of your dataset.
k. Select your dataset once it appears in the list.
l. Review the Data preview to ensure you didn't include day_of_week then, select Close .
m. Select Next .
Configure run
After you load and configure your data, you can set up your experiment. This setup includes experiment design
tasks such as, selecting the size of your compute environment and specifying what column you want to predict.
1. Select the Create new radio button.
2. Populate the Configure Run form as follows:
a. Enter this experiment name: my-1st-automl-experiment
b. Select y as the target column, which is what you want to predict. This column indicates whether the client
subscribed to a term deposit or not.
c. Select compute cluster as your compute type.
d. Select +New to configure your compute target. A compute target is a local or cloud-based resource
environment used to run your training script or host your service deployment. For this experiment,
we use a cloud-based compute.
a. Populate the Select virtual machine form to set up your compute.
Virtual machine type : Select the virtual machine type for your compute. For this tutorial: CPU (Central
Processing Unit).
Idle seconds before scale down : Idle time before the cluster is automatically scaled down to the minimum
node count. For this tutorial: 120 (default).
Additional classification settings : These settings help improve the accuracy of your model. For this tutorial:
Positive class label: None.
Exit criterion : If a criterion is met, the training job is stopped. For this tutorial: Training job time (hours): 1;
Metric score threshold: None.
IMPORTANT
Preparation of the experiment run takes 10-15 minutes. Once running, each iteration takes 2-3 minutes
more.
In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms
on the Models tab as they complete while the others are still running.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric
score as they complete. For this tutorial, the model that scores the highest based on the chosen AUC_weighted
metric is at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to
explore its performance details.
The following example navigates through the Details and the Metrics tabs to view the selected model's
properties, metrics, and performance charts.
Model explanations
While you wait for the models to complete, you can also take a look at model explanations and see which data
features (raw or engineered) influenced a particular model's predictions.
These model explanations can be generated on demand, and are summarized in the model explanations
dashboard that's part of the Explanations (preview) tab.
To generate model explanations:
1. Select Run 1 at the top to navigate back to the Models screen.
2. Select the Models tab.
3. For this tutorial, select the first MaxAbsScaler, LightGBM model.
4. Select the Explain model button at the top. On the right, the Explain model pane appears.
5. Select the automl-compute that you created previously. This compute cluster initiates a child run to
generate the model explanations.
6. Select Create at the bottom. A green success message appears towards the top of your screen.
NOTE
The explainability run takes about 2-5 minutes to complete.
7. Select the Explanations (preview) button. This tab populates once the explainability run completes.
8. On the left-hand side, expand the pane and select the row that says raw under Features .
9. Select the Aggregate feature importance tab on the right. This chart shows which data features
influenced the predictions of the selected model.
In this example, the duration appears to have the most influence on the predictions of this model.
Deploy the best model
The automated machine learning interface allows you to deploy the best model as a web service in a few steps.
Deployment is the integration of the model so it can predict on new data and identify potential areas of
opportunity.
For this experiment, deployment to a web service means that the financial institution now has an iterative and
scalable web solution for identifying potential fixed term deposit customers.
Check to see if your experiment run is complete. To do so, navigate back to the parent run page by selecting Run
1 at the top of your screen. A Completed status is shown on the top left of the screen.
Once the experiment run is complete, the Details page is populated with a Best model summary section. In
this experiment context, VotingEnsemble is considered the best model, based on the AUC_weighted metric.
We deploy this model, but be advised, deployment takes about 20 minutes to complete. The deployment
process entails several steps including registering the model, generating resources, and configuring them for
the web service.
1. Select VotingEnsemble to open the model-specific page.
2. Select the Deploy menu in the top-left and select Deploy to web service .
3. Populate the Deploy a model pane as follows:
Use custom deployments : Disable. Disabling allows for the default driver file (scoring script) and
environment file to be auto-generated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy .
A green success message appears at the top of the Run screen, and in the Model summary pane, a
status message appears under Deploy status . Select Refresh periodically to check the deployment
status.
Now you have an operational web service to generate predictions.
Proceed to the Next steps to learn more about how to consume your new web service, and test your
predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your
workspace and experiment files, delete only the deployment files to minimize costs to your account. Otherwise,
delete the entire resource group if you don't plan to use any of the files.
Delete the deployment instance
Delete just the deployment instance from Azure Machine Learning at https://ml.azure.com/, if you want to keep
the resource group and workspace for other tutorials and exploration.
1. Go to Azure Machine Learning. Navigate to your workspace and on the left under the Assets pane, select
Endpoints .
2. Select the deployment you want to delete and select Delete .
3. Select Proceed .
Delete the resource group
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
Next steps
In this automated machine learning tutorial, you used Azure Machine Learning's automated ML interface to
create and deploy a classification model. See these articles for more information and next steps:
Consume a web service
Learn more about automated machine learning.
For more information on classification metrics and charts, see the Understand automated machine learning
results article.
Learn more about featurization.
Learn more about data profiling.
NOTE
This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in
individual contents of the database are licensed under the Database Contents License and available on Kaggle. This
dataset was originally available within the UCI Machine Learning Database.
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing.
Decision Support Systems, Elsevier, 62:22-31, June 2014.
Tutorial: Forecast demand with no-code automated
machine learning in the Azure Machine Learning
studio
5/25/2022 • 10 minutes to read • Edit Online
Learn how to create a time-series forecasting model without writing a single line of code using automated
machine learning in the Azure Machine Learning studio. This model will predict rental demand for a bike sharing
service.
You won't write any code in this tutorial; you'll use the studio interface to perform training. You'll learn how to do
the following tasks:
Create and load a dataset.
Configure and run an automated ML experiment.
Specify forecasting settings.
Explore the experiment results.
Deploy the best model.
Also try automated machine learning for these other model types:
For a no-code example of a classification model, see Tutorial: Create a classification model with automated
ML in Azure Machine Learning.
For a code first example of a regression model, see the Tutorial: Use automated machine learning to predict
taxi fares.
Prerequisites
An Azure Machine Learning workspace. See Create an Azure Machine Learning workspace.
Download the bike-no.csv data file
Column headers : Indicates how the headers of the dataset, if any, will be treated. For this tutorial, select Only
first file has headers .
h. The Schema form allows for further configuration of your data for this experiment.
a. For this example, choose to ignore the casual and registered columns. These columns are
a breakdown of the cnt column, so we don't include them.
b. Also for this example, leave the defaults for the Properties and Type .
c. Select Next .
i. On the Confirm details form, verify the information matches what was previously populated on
the Basic info and Settings and preview forms.
j. Select Create to complete the creation of your dataset.
k. Select your dataset once it appears in the list.
l. Select Next .
Configure run
After you load and configure your data, set up your remote compute target and select which column in your
data you want to predict.
1. Populate the Configure run form as follows:
a. Enter an experiment name: automl-bikeshare
b. Select cnt as the target column, which is what you want to predict. This column indicates the number of
total bike share rentals.
c. Select compute cluster as your compute type.
d. Select +New to configure your compute target. Automated ML only supports Azure Machine
Learning compute.
a. Populate the Select virtual machine form to set up your compute.
Virtual machine type : Select the virtual machine type for your compute. For this tutorial: CPU (Central
Processing Unit).
Idle seconds before scale down : Idle time before the cluster is automatically scaled down to the minimum
node count. For this tutorial: 120 (default).
Primary metric : Evaluation metric that the machine learning algorithm will be measured by. For this tutorial:
Normalized root mean squared error.
Exit criterion : If a criterion is met, the training job is stopped. For this tutorial: Training job time (hours): 3;
Metric score threshold: None.
Select Save .
7. Select Next .
8. On the [Optional] Validate and test form,
a. Select k-fold cross-validation as your Validation type .
b. Select 5 as your Number of cross validations .
Run experiment
To run your experiment, select Finish . The Run details screen opens with the Run status at the top next to the
run number. This status updates as the experiment progresses. Notifications also appear in the top right corner
of the studio, to inform you of the status of your experiment.
IMPORTANT
Preparation of the experiment run takes 10-15 minutes. Once running, each iteration takes 2-3 minutes
more.
In production, you'd likely walk away for a bit as this process takes time. While you wait, we suggest you start exploring
the tested algorithms on the Models tab as they complete.
Explore models
Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric
score as they complete. For this tutorial, the model that scores the highest based on the chosen Normalized
root mean squared error metric is at the top of the list.
While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to
explore its performance details.
The following example navigates through the Details and the Metrics tabs to view the selected model's
properties, metrics and performance charts.
Use custom deployment assets : Disable. Disabling allows for the default driver file (scoring script) and
environment file to be autogenerated.
For this example, we use the defaults provided in the Advanced menu.
4. Select Deploy .
A green success message appears at the top of the Run screen stating that the deployment was started
successfully. The progress of the deployment can be found in the Model summary pane under Deploy
status .
Once deployment succeeds, you have an operational web service to generate predictions.
Proceed to the Next steps to learn more about how to consume your new web service, and test your
predictions using Power BI's built-in Azure Machine Learning support.
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your
workspace and experiment files, delete only the deployment files to minimize costs to your account. Otherwise,
delete the entire resource group if you don't plan to use any of the files.
Delete the deployment instance
Delete just the deployment instance from the Azure Machine Learning studio, if you want to keep the resource
group and workspace for other tutorials and exploration.
1. Go to the Azure Machine Learning studio. Navigate to your workspace and on the left under the Assets
pane, select Endpoints .
2. Select the deployment you want to delete and select Delete .
3. Select Proceed .
Delete the resource group
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
Next steps
In this tutorial, you used automated ML in the Azure Machine Learning studio to create and deploy a time series
forecasting model that predicts bike share rental demand.
See this article for steps on how to create a Power BI supported schema to facilitate consumption of your newly
deployed web service:
Consume a web service
Learn more about automated machine learning.
For more information on classification metrics and charts, see the Understand automated machine learning
results article.
Learn more about featurization.
Learn more about data profiling.
NOTE
This bike share dataset has been modified for this tutorial. This dataset was made available as part of a Kaggle
competition and was originally available via Capital Bikeshare. It can also be found within the UCI Machine Learning
Database.
Source: Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors and background knowledge,
Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.
Tutorial: Designer - train a no-code regression
model
5/25/2022 • 12 minutes to read • Edit Online
Train a linear regression model that predicts car prices using the Azure Machine Learning designer. This tutorial
is part one of a two-part series.
This tutorial uses the Azure Machine Learning designer. For more information, see What is Azure Machine
Learning designer?
In part one of the tutorial, you learn how to:
Create a new pipeline.
Import data.
Prepare data.
Train a machine learning model.
Evaluate a machine learning model.
In part two of the tutorial, you deploy your model as a real-time inferencing endpoint to predict the price of any
car based on technical specifications you send it.
NOTE
A completed version of this tutorial is available as a sample pipeline.
To find it, go to the designer in your workspace. In the New pipeline section, select Sample 1 - Regression:
Automobile Price Prediction(Basic) .
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
IMPORTANT
Attached compute is not supported; use compute instances or clusters instead.
You can set a Default compute target for the entire pipeline, which will tell every component to use the same
compute target by default. However, you can specify compute targets on a per-module basis.
1. Next to the pipeline name, select the Gear icon at the top of the canvas to open the Settings pane.
2. In the Settings pane to the right of the canvas, select Select compute target .
If you already have an available compute target, you can select it to run this pipeline.
3. Enter a name for the compute resource.
4. Select Save .
NOTE
It takes approximately five minutes to create a compute resource. After the resource is created, you can reuse it
and skip this wait time for future runs.
The compute resource autoscales to zero nodes when it's idle to save cost. When you use it again after a delay,
you might experience approximately five minutes of wait time while it scales back up.
Import data
There are several sample datasets included in the designer for you to experiment with. For this tutorial, use
Automobile price data (Raw) .
1. To the left of the pipeline canvas is a palette of datasets and components. Select Sample datasets to
view the available sample datasets.
2. Select the dataset Automobile price data (Raw) , and drag it onto the canvas.
Prepare data
Datasets typically require some preprocessing before analysis. You might have noticed some missing values
when you inspected the dataset. These missing values must be cleaned so that the model can analyze the data
correctly.
Remove a column
When you train a model, you have to do something about the data that's missing. In this dataset, the
normalized-losses column is missing many values, so you'll exclude that column from the model altogether.
1. In the component palette to the left of the canvas, expand the Data Transformation section and find the
Select Columns in Dataset component.
2. Drag the Select Columns in Dataset component onto the canvas. Drop the component below the
dataset component.
3. Connect the Automobile price data (Raw) dataset to the Select Columns in Dataset component.
Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas,
to the input port of Select Columns in Dataset , which is the small circle at the top of the component.
TIP
You create a flow of data through your pipeline when you connect the output port of one component to an input
port of another.
TIP
Cleaning the missing values from input data is a prerequisite for using most of the components in the designer.
1. In the component palette to the left of the canvas, expand the section Data Transformation , and find the
Clean Missing Data component.
2. Drag the Clean Missing Data component to the pipeline canvas. Connect it to the Select Columns in
Dataset component.
3. Select the Clean Missing Data component.
4. In the component details pane to the right of the canvas, select Edit Column .
5. In the Columns to be cleaned window that appears, expand the drop-down menu next to Include .
Select All columns .
6. Select Save .
7. In the component details pane to the right of the canvas, select Remove entire row under Cleaning
mode .
8. In the component details pane to the right of the canvas, select the Comment box, and enter Remove
missing value rows.
Your pipeline should now look something like this:
Train a machine learning model
Now that you have the components in place to process the data, you can set up the training components.
Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you
use a linear regression model.
Split the data
Splitting data is a common task in machine learning. You'll split your data into two separate datasets. One
dataset will train the model and the other will test how well the model performed.
1. In the component palette, expand the section Data Transformation and find the Split Data component.
2. Drag the Split Data component to the pipeline canvas.
3. Connect the left port of the Clean Missing Data component to the Split Data component.
IMPORTANT
Be sure that the left output port of Clean Missing Data connects to Split Data . The left port contains the
cleaned data. The right port contains the discarded data.
IMPORTANT
Be sure that the left output port of Split Data connects to Train Model. The left port contains the training set.
The right port contains the test set.
IMPORTANT
Make sure you enter the column name exactly. Do not capitalize price .
NOTE
Experiments group similar pipeline runs together. If you run a pipeline multiple times, you can select the same
experiment for successive runs.
After the run completes, you can view the results of the pipeline run. First, look at the predictions generated by
the regression model.
1. Right-click the Score Model component, and select Preview data > Scored dataset to view its output.
Here you can see the predicted prices and the actual prices from the testing data.
Evaluate models
Use the Evaluate Model to see how well the trained model performed on the test dataset.
1. Right-click the Evaluate Model component and select Preview data > Evaluation results to view its
output.
The following statistics are shown for your model:
Mean Absolute Error (MAE) : The average of absolute errors. An error is the difference between the
predicted value and the actual value.
Root Mean Squared Error (RMSE) : The square root of the average of squared errors of predictions made
on the test dataset.
Relative Absolute Error : The average of absolute errors relative to the absolute difference between actual
values and the average of all actual values.
Relative Squared Error : The average of squared errors relative to the squared difference between the
actual values and the average of all actual values.
Coefficient of Determination : Also known as the R squared value, this statistical metric indicates how well
a model fits the data.
For each of the error statistics, smaller is better. A smaller value indicates that the predictions are closer to the
actual values. For the coefficient of determination, the closer its value is to one (1.0), the better the predictions.
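As an illustration only, the same statistics can be computed outside the designer with scikit-learn; this sketch
uses made-up prices rather than the tutorial's data:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([13495.0, 16500.0, 18920.0])  # actual car prices
y_pred = np.array([13280.0, 16900.0, 18450.0])  # model predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)  # coefficient of determination

print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")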
Clean up resources
Skip this section if you want to continue on with part 2 of the tutorial, deploying models.
IMPORTANT
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to
articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any
charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
3. Select Delete resource group .
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then
selecting the Delete button.
The compute target that you created here automatically autoscales to zero nodes when it's not being used. This
action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister .
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually
delete those assets.
Next steps
In part two, you'll learn how to deploy your model as a real-time endpoint.
Continue to deploying models
Tutorial: Designer - deploy a machine learning
model
5/25/2022 • 7 minutes to read • Edit Online
Use the designer to deploy a machine learning model to predict the price of cars. This tutorial is part two of a
two-part series.
In part one of the tutorial you trained a linear regression model on car prices. In part two, you deploy the model
to give others a chance to use it. In this tutorial, you:
Create a real-time inference pipeline.
Create an inferencing cluster.
Deploy the real-time endpoint.
Test the real-time endpoint.
Prerequisites
Complete part one of the tutorial to learn how to train and score a machine learning model in the designer.
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
NOTE
Create inference pipeline only supports training pipelines that contain only designer built-in components and
that have a component like Train Model , which outputs the trained model.
2. Select Submit , and use the same compute target and experiment that you used in part one.
If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default
compute settings have a minimum node size of 0, which means that the designer must allocate resources
after being idle. Repeated pipeline runs will take less time since the compute resources are already
allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
3. Go to the real-time inference pipeline job detail by selecting the Job detail link in the left pane.
4. Select Deploy in the job detail page.
NOTE
It takes approximately 15 minutes to create a new AKS service. You can check the provisioning state on the
Inference Clusters page.
Enable Application Insights diagnostics and data collection : Whether to enable Azure Application Insights
to collect data from the deployed endpoints. By default: false.
Auto scale enabled : Whether to enable autoscaling for the web service. By default: true.
Target utilization : The target utilization (in percent out of 100) that the autoscaler should attempt to maintain
for this web service. By default: 70.
Refresh period : How often (in seconds) the autoscaler attempts to scale this web service. By default: 1.
CPU reserve capacity : The number of CPU cores to allocate for this web service. By default: 0.1.
Memory reserve capacity : The amount of memory (in GB) to allocate for this web service. By default: 0.5.
4. Select Deploy .
A success notification from the notification center appears after deployment finishes. It might take a few
minutes.
TIP
You can also deploy to Azure Container Instance (ACI) if you select Azure Container Instance for Compute type
in the real-time endpoint setting box. Azure Container Instance is used for testing or development. Use ACI for low-scale
CPU-based workloads that require less than 48 GB of RAM.
2. After you submit the modified training pipeline, go to the job detail page.
3. When the job completes, right-click Train Model and select Register data .
Enter a name and select File type.
4. After the dataset registers successfully, open your inference pipeline draft, or clone the previous inference
pipeline job into a new draft. In the inference pipeline draft, replace the previously trained model, shown as
the MD-XXXX node connected to the Score Model component, with the newly registered dataset.
5. If you need to update the data preprocessing part in your training pipeline and would like to update it in
the inference pipeline, the process is similar to the steps above.
Register the transformation output of the transformation component as a dataset.
Then manually replace the TD- component in the inference pipeline with the registered dataset.
6. After modifying your inference pipeline with the newly trained model or transformation, submit it. When
the job is completed, deploy it to the existing online endpoint deployed previously.
Limitations
Due to datastore access limitations, if your inference pipeline contains an Import Data or Export Data
component, those components are automatically removed when the pipeline is deployed to a real-time endpoint.
Clean up resources
IMPORTANT
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to
articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any
charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
3. Select Delete resource group .
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then
selecting the Delete button.
The compute target that you created here automatically autoscales to zero nodes when it's not being used. This
action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister .
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually
delete those assets.
Next steps
In this tutorial, you learned the key steps in how to create, deploy, and consume a machine learning model in the
designer. To learn more about how you can use the designer see the following links:
Designer samples: Learn how to use the designer to solve other types of problems.
Use Azure Machine Learning studio in an Azure virtual network.
Train an image classification TensorFlow model
using the Azure Machine Learning Visual Studio
Code Extension (preview)
5/25/2022 • 5 minutes to read • Edit Online
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of Azure Machine Learning. If
you're using the free subscription, only CPU clusters are supported.
Install Visual Studio Code, a lightweight, cross-platform code editor.
Azure Machine Learning Studio Visual Studio Code extension. For install instructions see the Setup Azure
Machine Learning Visual Studio Code extension guide
CLI (v2). For installation instructions, see Install, set up, and use the CLI (v2)
Clone the community-driven repository
Create a workspace
The first thing you have to do to build an application in Azure Machine Learning is to create a workspace. A
workspace contains the resources to train models as well as the trained models themselves. For more
information, see what is a workspace.
1. Open the azureml-examples/cli/jobs/single-step/tensorflow/mnist directory from the community-driven
repository in Visual Studio Code.
2. On the Visual Studio Code activity bar, select the Azure icon to open the Azure Machine Learning view.
3. In the Azure Machine Learning view, right-click your subscription node and select Create Workspace .
4. A specification file appears. Configure the specification file with the following options.
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: TeamWorkspace
location: WestUS2
friendly_name: team-ml-workspace
description: A workspace for training machine learning models
tags:
  purpose: training
  team: ml-team
The specification file creates a workspace called TeamWorkspace in the WestUS2 region. The rest of the
options defined in the specification file provide friendly naming, descriptions, and tags for the workspace.
5. Right-click the specification file and select Azure ML: Execute YAML . Creating a resource uses the
configuration options defined in the YAML specification file and submits a job using the CLI (v2). At this
point, a request to Azure is made to create a new workspace and dependent resources in your account.
After a few minutes, the new workspace appears in your subscription node.
6. Set TeamWorkspace as your default workspace. Doing so places resources and jobs you create in the
workspace by default. Select the Set Azure ML Workspace button on the Visual Studio Code status bar
and follow the prompts to set TeamWorkspace as your default workspace.
3. A specification file appears. Configure the specification file with the following options.
$schema: https://azuremlschemas.azureedge.net/latest/compute.schema.json
name: gpu-cluster
type: amlcompute
size: Standard_NC12
min_instances: 0
max_instances: 3
idle_time_before_scale_down: 120
The specification file creates a GPU cluster called gpu-cluster with at most 3 Standard_NC12 VM nodes
that automatically scales down to 0 nodes after 120 seconds of inactivity.
For more information on VM sizes, see sizes for Linux virtual machines in Azure.
4. Right-click the specification file and select Azure ML: Execute YAML .
After a few minutes, the new compute target appears in the Compute > Compute clusters node of your
workspace.
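The training job itself is defined in a job.yml specification file in the same directory. The exact file ships with
the azureml-examples repository; a minimal sketch of such a command job, assuming a curated TensorFlow
environment name, looks like the following:

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: tensorflow-mnist-example
compute: azureml:gpu-cluster
code: src
command: python train.py
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu@latest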
This specification file submits a training job called tensorflow-mnist-example to the recently created
gpu-cluster compute target that runs the code in the train.py Python script. The environment used is one of
the curated environments provided by Azure Machine Learning which contains TensorFlow and other software
dependencies required to run the training script. For more information on curated environments, see Azure
Machine Learning curated environments.
To submit the training job:
1. Open the job.yml file.
2. Right-click the file in the text editor and select Azure ML: Execute YAML .
At this point, a request is sent to Azure to run your experiment on the selected compute target in your
workspace. This process takes several minutes. The amount of time to run the training job is impacted by several
factors like the compute type and training data size. To track the progress of your experiment, right-click the
current run node and select View Run in Azure portal .
When the dialog requesting to open an external website appears, select Open .
When the model is done training, the status label next to the run node updates to "Completed".
Next steps
In this tutorial, you learn the following tasks:
Understand the code
Create a workspace
Create a GPU cluster for training
Train a model
For next steps, see:
Create and manage Azure Machine Learning resources using Visual Studio Code.
Connect Visual Studio Code to a compute instance for a full development experience.
For a walkthrough of how to edit, run, and debug code locally, see the Python hello-world tutorial.
Run Jupyter Notebooks in Visual Studio Code using a remote Jupyter server.
For a walkthrough of how to train with Azure Machine Learning outside of Visual Studio Code, see Tutorial:
Train and deploy a model with Azure Machine Learning.
Tutorial: Power BI integration - Create the predictive
model with a Jupyter Notebook (part 1 of 2)
5/25/2022 • 7 minutes to read • Edit Online
Prerequisites
An Azure subscription. If you don't already have a subscription, you can use a free trial.
An Azure Machine Learning workspace. If you don't already have a workspace, see Create and manage Azure
Machine Learning workspaces.
Introductory knowledge of the Python language and machine learning workflows.
NOTE
The compute instance can take 2 to 4 minutes to be provisioned.
After the compute is provisioned, you can use the notebook to run code cells. For example, in the cell you can
type the following code:
import numpy as np
np.sin(3)
Then select Shift + Enter (or select Control + Enter or select the Play button next to the cell). You should see the
following output:
from azureml.opendatasets import Diabetes

diabetes = Diabetes.get_tabular_dataset()
X = diabetes.drop_columns("Y")
y = diabetes.keep_columns("Y")
X_df = X.to_pandas_dataframe()
y_df = y.to_pandas_dataframe()
X_df.info()
The X_df pandas data frame contains 10 baseline input variables. These variables include age, sex, body mass
index, average blood pressure, and six blood serum measurements. The y_df pandas data frame is the target
variable. It contains a quantitative measure of disease progression one year after the baseline. The data frame
contains 442 records.
Train the model
Create a new code cell in your notebook. Then copy the following code and paste it into the cell. This code
snippet constructs a ridge regression model and serializes the model by using the Python pickle format.
import joblib
from sklearn.linear_model import Ridge
model = Ridge().fit(X_df,y_df)
joblib.dump(model, 'sklearn_regression_model.pkl')
import sklearn
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.resource_configuration import ResourceConfiguration

ws = Workspace.from_config()
model = Model.register(workspace=ws,
                       model_name='my-sklearn-model',                # Name of the registered model in your workspace.
                       model_path='./sklearn_regression_model.pkl',  # Local file to upload and register as a model.
                       model_framework=Model.Framework.SCIKITLEARN,  # Framework used to create the model.
                       model_framework_version=sklearn.__version__,  # Version of scikit-learn used to create the model.
                       sample_input_dataset=X,
                       sample_output_dataset=y,
                       resource_configuration=ResourceConfiguration(cpu=2, memory_in_gb=4),
                       description='Ridge regression model to predict diabetes progression.',
                       tags={'area': 'diabetes', 'type': 'regression'})

print('Name:', model.name)
print('Version:', model.version)
You can also view the model in Azure Machine Learning Studio. In the menu on the left, select Models :
NOTE
The Python decorators in the code below define the schema of the input and output data, which is important for
integration into Power BI.
Copy the following code and paste it into a new code cell in your notebook. The following code snippet has cell
magic that writes the code to a file named score.py.
%%writefile score.py

import json
import pickle
import numpy as np
import pandas as pd
import os
import joblib
from azureml.core.model import Model

def init():
    global model
    # Replace filename if needed.
    path = os.getenv('AZUREML_MODEL_DIR')
    model_path = os.path.join(path, 'sklearn_regression_model.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)

input_sample = pd.DataFrame(data=[{
    "AGE": 5,
    "SEX": 2,
    "BMI": 3.1,
    "BP": 3.1,
    "S1": 3.1,
    "S2": 3.1,
    "S3": 3.1,
    "S4": 3.1,
    "S5": 3.1,
    "S6": 3.1
}])

# This is an integer type sample. Use the data type that reflects the expected result.
output_sample = np.array([0])
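# The inference-schema decorators below define the schema of the input and output
# data, which is what enables the Power BI integration. The package is installed
# through the 'inference-schema[numpy-support]' pip dependency declared in the
# environment in the next cell.
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType

@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # You can return any JSON-serializable object.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error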
import sklearn
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig

environment = Environment('my-sklearn-environment')
environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    'azureml-defaults',
    'inference-schema[numpy-support]',
    'joblib',
    'numpy',
    'pandas',
    'scikit-learn=={}'.format(sklearn.__version__)
])
inference_config = InferenceConfig(entry_script='./score.py', environment=environment)
service_name = 'my-diabetes-model'
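Deploy the model as a web service by using Model.deploy with the registered model and the inference
configuration from the previous cells, and wait for the deployment to finish:

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)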
NOTE
The service can take 2 to 4 minutes to deploy.
If the service deploys successfully, you should see the following output:
You can also view the service in Azure Machine Learning Studio. In the menu on the left, select Endpoints :
We recommend that you test the web service to ensure it works as expected. To return your notebook, in Azure
Machine Learning Studio, in the menu on the left, select Notebooks . Then copy the following code and paste it
into a new code cell in your notebook to test the service.
import json

input_payload = json.dumps({
    'data': X_df[0:2].values.tolist()
})

output = service.run(input_payload)
print(output)
The output should look like this JSON structure: {'predict': [[205.59], [68.84]]} .
Next steps
In this tutorial, you saw how to build and deploy a model so that it can be consumed by Power BI. In the next
part, you'll learn how to consume this model in a Power BI report.
Tutorial: Consume a model in Power BI
Tutorial: Power BI integration - Drag and drop to
create the predictive model (part 1 of 2)
5/25/2022 • 5 minutes to read • Edit Online
In part 1 of this tutorial, you train and deploy a predictive machine learning model by using the Azure Machine
Learning designer. The designer is a low-code drag-and-drop user interface. In part 2, you'll use the model to
predict outcomes in Microsoft Power BI.
In this tutorial, you:
Create an Azure Machine Learning compute instance.
Create an Azure Machine Learning inference cluster.
Create a dataset.
Train a regression model.
Deploy the model to a real-time scoring endpoint.
There are three ways to create and deploy the model you'll use in Power BI. This article covers "Option B: Train
and deploy models by using the designer." This option is a low-code authoring experience that uses the designer
interface.
But you could instead use one of the other options:
Option A: Train and deploy models by using Jupyter Notebooks. This code-first authoring experience uses
Jupyter Notebooks that are hosted in Azure Machine Learning Studio.
Option C: Train and deploy models by using automated machine learning. This no-code authoring experience
fully automates data preparation and model training.
Prerequisites
An Azure subscription. If you don't already have a subscription, you can use a free trial.
An Azure Machine Learning workspace. If you don't already have a workspace, see Create and manage Azure
Machine Learning workspaces.
Introductory knowledge of machine learning workflows.
TIP
You can also use the compute instance to create and run notebooks.
Your compute instance Status is now Creating . The machine takes around 4 minutes to provision.
While you wait, on the Compute page, select the Inference clusters tab. Then select New :
On the Create inference cluster page, select a region and a VM size. For this tutorial, select a
Standard_D11_v2 VM. Then select Next .
On the Configure Settings page:
1. Provide a valid compute name.
2. Select Dev-test as the cluster purpose. This option creates a single node to host the deployed model.
3. Select Create .
Your inference cluster Status is now Creating . Your single node cluster takes around 4 minutes to deploy.
Create a dataset
In this tutorial, you use the Diabetes dataset. This dataset is available in Azure Open Datasets.
To create the dataset, in the menu on the left, select Datasets . Then select Create dataset . You see the
following options:
Select From Open Datasets . On the Create dataset from Open Datasets page:
1. Use the search bar to find diabetes.
2. Select Sample: Diabetes .
3. Select Next .
4. Name your dataset diabetes.
5. Select Create .
To explore the data, select the dataset and then select Explore :
The data has 10 baseline input variables, such as age, sex, body mass index, average blood pressure, and six
blood serum measurements. It also has one target variable, named Y . This target variable is a quantitative
measure of diabetes progression one year after the baseline.
On the Settings menu, choose Select compute target . Select the compute instance you created earlier, and
then select Save . Change the Draft name to something more memorable, such as diabetes-model. Finally,
enter a description.
In the list of assets, expand Datasets and locate the diabetes dataset. Drag this component onto the canvas:
Next, drag the following components onto the canvas:
1. Linear Regression (located in Machine Learning Algorithms )
2. Train Model (located in Model Training )
On your canvas, notice the circles at the top and bottom of the components. These circles are ports.
Now wire the components together. Select the port at the bottom of the diabetes dataset. Drag it to the port on
the upper-right side of the Train Model component. Select the port at the bottom of the Linear Regression
component. Drag it to the port on the upper-left side of the Train Model component.
Choose the dataset column to use as the label (target) variable to predict. Select the Train Model component
and then select Edit column .
In the dialog box, select Enter column name > Y :
Select Save . Your machine learning workflow should look like this:
Select Submit . Under Experiment , select Create new . Name the experiment, and then select Submit .
NOTE
Your experiment's first run should take around 5 minutes. Subsequent runs are much quicker because the designer caches
components that have been run to reduce latency.
The pipeline condenses to just the components necessary to score the model. When you score the data, you
won't know the target variable values, so you can remove Y from the dataset.
To remove Y , add a Select Columns in Dataset component to the canvas. Wire the component so the diabetes
dataset is its input and its output feeds into the Score Model component:
On the canvas, select the Select Columns in Dataset component, and then select Edit Columns .
In the Select columns dialog box, choose By name . Then ensure that all the input variables are selected but
the target is not selected:
Select Save .
Finally, select the Score Model component and ensure the Append score columns to output check box is
cleared. To reduce latency, the predictions are sent back without the inputs.
Tutorial: Power BI integration - Create the predictive model by using automated machine learning (part 1 of 2)
In part 1 of this tutorial, you train and deploy a predictive machine learning model. You use automated machine
learning (ML) in Azure Machine Learning Studio. In part 2, you'll use the best-performing model to predict
outcomes in Microsoft Power BI.
In this tutorial, you:
Create an Azure Machine Learning compute cluster.
Create a dataset.
Create an automated machine learning run.
Deploy the best model to a real-time scoring endpoint.
There are three ways to create and deploy the model you'll use in Power BI. This article covers "Option C: Train
and deploy models by using automated machine learning in the studio." This option is a no-code authoring
experience. It fully automates data preparation and model training.
But you could instead use one of the other options:
Option A: Train and deploy models by using Jupyter Notebooks. This code-first authoring experience uses
Jupyter Notebooks that are hosted in Azure Machine Learning Studio.
Option B: Train and deploy models by using the Azure Machine Learning designer. This low-code authoring
experience uses a drag-and-drop user interface.
Prerequisites
An Azure subscription. If you don't already have a subscription, you can use a free trial.
An Azure Machine Learning workspace. If you don't already have a workspace, see Create and manage Azure
Machine Learning workspaces.
NOTE
The new cluster has 0 nodes, so no compute costs are incurred. You incur costs only when the automated machine
learning job runs. The cluster scales back to 0 automatically after 120 seconds of idle time.
Create a dataset
In this tutorial, you use the Diabetes dataset. This dataset is available in Azure Open Datasets.
To create the dataset, in the menu on the left, select Datasets . Then select Create dataset . You see the
following options:
Select From Open Datasets . Then on the Create dataset from Open Datasets page:
1. Use the search bar to find diabetes.
2. Select Sample: Diabetes .
3. Select Next .
4. Name your dataset diabetes.
5. Select Create .
To explore the data, select the dataset and then select Explore :
The data has 10 baseline input variables, such as age, sex, body mass index, average blood pressure, and six
blood serum measurements. It also has one target variable, named Y . This target variable is a quantitative
measure of diabetes progression one year after the baseline.
Next, select the diabetes dataset you created earlier. Then select Next :
On the Configure run page:
1. Under Experiment name , select Create new .
2. Name the experiment.
3. In the Target column field, select Y .
4. In the Select compute cluster field, select the compute cluster you created earlier.
Your completed form should look like this:
Finally, select a machine learning task. In this case, the task is Regression :
Select Finish .
IMPORTANT
Automated machine learning takes around 30 minutes to finish training the 100 models.
Next steps
In this tutorial, you saw how to train and deploy a machine learning model by using automated machine
learning. In the next tutorial, you'll learn how to consume (score) this model in Power BI.
Tutorial: Consume a model in Power BI
How to create a secure workspace
5/25/2022 • 17 minutes to read • Edit Online
In this article, learn how to create and connect to a secure Azure Machine Learning workspace. A secure
workspace uses Azure Virtual Network to create a security boundary around resources used by Azure Machine
Learning.
In this tutorial, you accomplish the following tasks:
Create an Azure Virtual Network (VNet) to secure communications between services in the virtual network.
Create an Azure Storage Account (blob and file) behind the VNet. This service is used as default storage for the workspace.
Create an Azure Key Vault behind the VNet. This service is used to store secrets used by the workspace. For example, the security information needed to access the storage account.
Create an Azure Container Registry (ACR). This service is used as a repository for Docker images. Docker images provide the compute environments needed when training a machine learning model or deploying a trained model as an endpoint.
Create an Azure Machine Learning workspace.
Create a jump box. A jump box is an Azure Virtual Machine that is behind the VNet. Since the VNet restricts access from the public internet, the jump box is used as a way to connect to resources behind the VNet.
Configure Azure Machine Learning studio to work behind a VNet. The studio provides a web interface for Azure Machine Learning.
Create an Azure Machine Learning compute cluster. A compute cluster is used when training machine learning models in the cloud. In configurations where Azure Container Registry is behind the VNet, it is also used to build Docker images.
Connect to the jump box and use the Azure Machine Learning studio.
TIP
If you're looking for a template (Microsoft Bicep or Hashicorp Terraform) that demonstrates how to create a secure
workspace, see Tutorial - Create a secure workspace using a template.
Prerequisites
Familiarity with Azure Virtual Networks and IP networking. If you are not familiar, try the Fundamentals of
computer networking module.
While most of the steps in this article use the Azure portal or the Azure Machine Learning studio, some steps
use the Azure CLI extension for Machine Learning v2.
Limitations
The steps in this article put Azure Container Registry behind the VNet. In this configuration, you can't deploy
models to Azure Container Instances inside the VNet. For more information, see Secure the inference
environment.
TIP
As an alternative to Azure Container Instances, try Azure Machine Learning managed online endpoints. For more
information, see Enable network isolation for managed online endpoints (preview).
2. From the Basics tab, select the Azure subscription to use for this resource and then select or create a
new resource group . Under Instance details , enter a friendly name for your virtual network and
select the region to create it in.
3. Select the IP Addresses tab. The default settings should be similar to the following image:
Use the following steps to configure the IP address and configure a subnet for training and scoring
resources:
TIP
While you can use a single subnet for all Azure ML resources, the steps in this article show how to create two
subnets to separate the training & scoring resources.
The workspace and other dependency services will go into the training subnet. They can still be used by resources
in other subnets, such as the scoring subnet.
a. Look at the default IPv4 address space value. In the screenshot, the value is 172.17.0.0/16 . The
value may be different for you . While you can use a different value, the rest of the steps in this
tutorial are based on the 172.16.0.0/16 value .
IMPORTANT
We do not recommend using an address in the 172.17.0.1/16 range if you plan on using Azure
Kubernetes Services for deployment with this cluster. The Docker bridge in Azure Kubernetes Services uses
172.17.0.1/16 as its default. Other ranges may also conflict depending on what you want to connect to
the virtual network. For example, a conflict occurs if you plan to connect your on-premises network to the
VNet and your on-premises network also uses the 172.16.0.0/16 range. Ultimately, it is up to you to plan
your network infrastructure.
c. To create a subnet to contain the workspace, dependency services, and resources used for training,
select + Add subnet and set the subnet name and address range. The following are the values
used in this tutorial:
Subnet name : Training
Subnet address range : 172.16.0.0/24
TIP
If you plan on using a service endpoint to add your Azure Storage Account, Azure Key Vault, and Azure
Container Registry to the VNet, select the following under Services :
Microsoft.Storage
Microsoft.KeyVault
Microsoft.ContainerRegistry
If you plan on using a private endpoint to add these services to the VNet, you do not need to select these
entries. The steps in this article use a private endpoint for these services, so you do not need to select
them when following these steps.
d. To create a subnet for compute resources used to score your models, select + Add subnet again,
and set the name and address range:
Subnet name : Scoring
Subnet address range : 172.16.1.0/24
TIP
If you plan on using a service endpoint to add your Azure Storage Account, Azure Key Vault, and Azure
Container Registry to the VNet, select the following under Services :
Microsoft.Storage
Microsoft.KeyVault
Microsoft.ContainerRegistry
If you plan on using a private endpoint to add these services to the VNet, you do not need to select these
entries. The steps in this article use a private endpoint for these services, so you do not need to select
them when following these steps.
4. Select Security . For BastionHost , select Enable . Azure Bastion provides a secure way to access the VM
jump box you will create inside the VNet in a later step. Use the following values for the remaining fields:
Bastion name : A unique name for this Bastion instance
AzureBastionSubnet address space : 172.16.2.0/27
Public IP address : Create a new public IP address.
Leave the other fields at the default values.
NOTE
While you created a private endpoint for Blob storage in the previous steps, you must also create one for File
storage.
8. On the Create a private endpoint form, use the same subscription , resource group , and Region
that you have used for previous resources. Enter a unique Name .
9. Select Next : Resource , and then set Target sub-resource to file .
10. Select Next : Configuration , and then use the following values:
Virtual network : The network you created previously
Subnet : Training
Integrate with private DNS zone : Yes
Private DNS zone : privatelink.file.core.windows.net
11. Select Review + Create . Verify that the information is correct, and then select Create .
TIP
If you plan to use ParallelRunStep in your pipeline, you must also configure private endpoints for the queue and
table sub-resources. ParallelRunStep uses queue and table under the hood for task scheduling and dispatching.
5. Select Review + create . Verify that the information is correct, and then select Create .
6. After the container registry has been created, select Go to resource .
7. From the left of the page, select Access keys , and then enable Admin user . This setting is required
when using Azure Container Registry inside a virtual network with Azure Machine Learning.
Create a workspace
1. In the Azure portal, select the portal menu in the upper left corner. From the menu, select + Create a
resource and then enter Machine Learning . Select the Machine Learning entry, and then select
Create .
2. From the Basics tab, select the subscription , resource group , and Region you previously used for the
virtual network. Use the following values for the other fields:
Workspace name : A unique name for your workspace.
Storage account : Select the storage account you created previously.
Key vault : Select the key vault you created previously.
Application insights : Use the default value.
Container registry : Use the container registry you created previously.
3. From the Networking tab, select Private endpoint and then select + add .
4. On the Create private endpoint form, use the following values:
Subscription : The same Azure subscription that contains the previous resources you've created.
Resource group : The same Azure resource group that contains the previous resources you've
created.
Location : The same Azure region that contains the previous resources you've created.
Name : A unique name for this private endpoint.
Target sub-resource : amlworkspace
Virtual network : The virtual network you created earlier.
Subnet : Training (172.16.0.0/24)
Private DNS integration : Yes
Private DNS Zone : Leave the two private DNS zones at the default values of
privatelink.api.azureml.ms and privatelink.notebooks.azure.net .
Select OK to create the private endpoint.
5. Select Review + create . Verify that the information is correct, and then select Create .
6. Once the workspace has been created, select Go to resource .
7. From the Settings section on the left, select Private endpoint connections and then select the link in
the Private endpoint column:
8. Once the private endpoint information appears, select DNS configuration from the left of the page.
Save the IP address and fully qualified domain name (FQDN) information on this page, as it will be used
later.
IMPORTANT
There are still some configuration steps needed before you can fully use the workspace. However, these require you to
connect to the workspace.
Enable studio
Azure Machine Learning studio is a web-based application that lets you easily manage your workspace.
However, it needs some extra configuration before it can be used with resources secured inside a VNet. Use the
following steps to enable studio:
1. When using an Azure Storage Account that has a private endpoint, add the service principal for the
workspace as a Reader for the storage private endpoint(s). From the Azure portal, select your storage
account and then select Networking . Next, select Private endpoint connections .
e. On the Members tab, select User, group, or service principal in the Assign access to area
and then select + Select members . In the Select members dialog, enter the name as your
Azure Machine Learning workspace. Select the service principal for the workspace, and then use
the Select button.
f. On the Review + assign tab, select Review + assign to assign the role.
Connect to the workspace
There are several ways that you can connect to the secured workspace. The steps in this article use a jump box ,
which is a virtual machine in the VNet. You can connect to it using your web browser and Azure Bastion. The
following table lists several other ways that you might connect to the secure workspace:
METHOD DESCRIPTION
Azure VPN gateway Connects on-premises networks to the VNet over a private
connection. Connection is made over the public internet.
IMPORTANT
When using a VPN gateway or ExpressRoute , you will need to plan how name resolution works between your on-
premises resources and those in the VNet. For more information, see Use a custom DNS server.
IMPORTANT
Do not select a Gen2 image.
TIP
If your Azure AD account has access to multiple subscriptions or directories, use the Directory and
Subscription dropdown to select the one that contains the workspace.
6. From the Configure Settings dialog, enter cpu-cluster as the Compute name . Set the Subnet to
Training and then select Create to create the cluster.
TIP
Compute clusters dynamically scale the nodes in the cluster as needed. We recommend leaving the minimum
number of nodes at 0 to reduce costs when the cluster is not in use.
7. From studio, select Compute , Compute instance , and then + New .
8. From the Virtual Machine dialog, enter a unique Computer name and select Next: Advanced
Settings .
9. From the Advanced Settings dialog, set the Subnet to Training , and then select Create .
TIP
When you create a compute cluster or compute instance, Azure Machine Learning dynamically adds a Network Security
Group (NSG). This NSG contains the following rules, which are specific to compute cluster and compute instance:
Allow inbound TCP traffic on ports 29876-29877 from the BatchNodeManagement service tag.
Allow inbound TCP traffic on port 44224 from the AzureMachineLearning service tag.
The following screenshot shows an example of these rules:
For more information on creating a compute cluster and compute instance, including how to do so with Python
and the CLI, see the following articles:
Create a compute cluster
Create a compute instance
az extension add -n ml
3. To update the workspace to use the compute cluster to build Docker images, run the following command.
Replace myresourcegroup with your resource group, myworkspace with your workspace, and
mycomputecluster with the compute cluster to use:
az ml workspace update \
-n myworkspace \
-g myresourcegroup \
-i mycomputecluster
NOTE
You can use the same compute cluster to train models and build Docker images for the workspace.
At this point, you can use studio to interactively work with notebooks on the compute instance and run training
jobs on the compute cluster. For a tutorial on using the compute instance and compute cluster, see run a Python
script.
The compute cluster dynamically scales between the minimum and maximum node count set when you created
it. If you accepted the defaults, the minimum is 0, which effectively turns off the cluster when not in use.
Stop the compute instance
From studio, select Compute , Compute instances , and then select the compute instance. Finally, select Stop
from the top of the page.
Stop the jump box
Once it has been created, select the virtual machine in the Azure portal and then use the Stop button. When you
are ready to use it again, use the Start button to start it.
You can also configure the jump box to automatically shut down at a specific time. To do so, select
Auto-shutdown , Enable , set a time, and then select Save .
Clean up resources
If you plan to continue using the secured workspace and other resources, skip this section.
To delete all resources created in this tutorial, use the following steps:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created in this tutorial.
3. Select Delete resource group .
4. Enter the resource group name, then select Delete .
Next steps
Now that you have created a secure workspace and can access studio, learn how to run a Python script using
Azure Machine Learning.
How to create a secure workspace by using a template
5/25/2022 • 6 minutes to read • Edit Online
Templates provide a convenient way to create reproducible service deployments. The template defines what will
be created, with some information provided by you when you use it, such as a unique name for the Azure
Machine Learning workspace.
In this tutorial, you learn how to use a Microsoft Bicep or Hashicorp Terraform template to create the following
Azure resources:
Azure Virtual Network. The following resources are secured behind this VNet:
Azure Machine Learning workspace
Azure Machine Learning compute instance
Azure Machine Learning compute cluster
Azure Storage Account
Azure Key Vault
Azure Application Insights
Azure Container Registry
Azure Bastion host
Azure Machine Learning Virtual Machine (Data Science Virtual Machine)
The Bicep template also creates an Azure Kubernetes Service cluster, and a separate resource group
for it.
Prerequisites
Before using the steps in this article, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account.
You must also have either a Bash or Azure PowerShell command line.
TIP
When reading this article, use the tabs in each section to select whether to view information on using Bicep or Terraform
templates.
Bicep
Terraform
1. To install the command-line tools, see Set up Bicep development and deployment environments.
2. The Bicep template used in this article is located at https://github.com/Azure/azure-quickstart-templates/blob/master/quickstarts/microsoft.machinelearningservices/machine-learning-end-to-end-secure. Use the following commands to clone the GitHub repo to your development environment:
TIP
If you do not have the git command on your development environment, you can install it from https://git-
scm.com/.
The Bicep template is made up of the main.bicep and the .bicep files in the modules subdirectory. The
following table describes what each file is responsible for:
FILE DESCRIPTION
nsg.bicep Defines the network security group rules for the VNet.
bastion.bicep Defines the Azure Bastion host and subnet. Azure Bastion
allows you to easily access a VM inside the VNet using your
web browser.
machinelearningnetworking.bicep Defines the private endpoints and DNS zones for the Azure
Machine Learning workspace.
To run the Bicep template, use the following commands from the machine-learning-end-to-end-secure directory
where the main.bicep file is located:
1. To create a new Azure Resource Group, use the following command. Replace exampleRG with your
resource group name, and eastus with the Azure region you want to use:
Azure CLI
Azure PowerShell
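The commands themselves were elided in this excerpt. With the Azure CLI, creating the resource group looks like this:

az group create --name exampleRG --location eastus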
Azure CLI
Azure PowerShell
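2. The deployment step was also elided. With the Azure CLI, deploying the Bicep template might look like the following sketch; the prefix and dsvmPassword parameter names are assumptions about the template's inputs:

az deployment group create \
    --resource-group exampleRG \
    --template-file main.bicep \
    --parameters prefix=myprefix dsvmPassword=<your-password>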
3. When prompted, provide the username and password you specified when configuring the template
and then select Connect .
IMPORTANT
The first time you connect to the DSVM desktop, a PowerShell window opens and begins running a script. Allow
this to complete before continuing with the next step.
4. From the DSVM desktop, start Microsoft Edge and enter https://ml.azure.com as the address. Sign in
to your Azure subscription, and then select the workspace created by the template. The studio for your
workspace is displayed.
Next steps
IMPORTANT
The Data Science Virtual Machine (DSVM) and any compute instance resources bill you for every hour that they are
running. To avoid excess charges, you should stop these resources when they are not in use. For more information, see
the following articles:
Create/manage VMs (Linux).
Create/manage VMs (Windows).
Create/manage compute instance.
To continue learning how to use the secured workspace from the DSVM, see Tutorial: Get started with a Python
script in Azure Machine Learning.
To learn more about common secure workspace configurations and input/output requirements, see Azure
Machine Learning secure workspace traffic flow.
Tutorial: Create production ML pipelines with
Python SDK v2 (preview) in a Jupyter notebook
5/25/2022 • 18 minutes to read • Edit Online
IMPORTANT
SDK v2 is currently in public preview. The preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
NOTE
For a tutorial that uses SDK v1 to build a pipeline, see Tutorial: Build an Azure Machine Learning pipeline for image
classification
In this tutorial, you'll use Azure Machine Learning (Azure ML) to create a production-ready machine learning
(ML) project, using the AzureML Python SDK v2 (preview).
You'll learn how to use the AzureML Python SDK v2 to:
Connect to your Azure ML workspace
Create Azure ML data assets
Create reusable Azure ML components
Create, validate and run Azure ML pipelines
Deploy the newly-trained model as an endpoint
Call the Azure ML endpoint for inferencing
Prerequisites
Complete the Quickstart: Get started with Azure Machine Learning to:
Create a workspace.
Create a cloud-based compute instance to use for your development environment.
Create a cloud-based compute cluster to use for training your model.
4. A list of folders shows each user who accesses the workspace. Select your folder, and you'll find that
azureml-samples has been cloned there.
IMPORTANT
The rest of this article contains the same content as you see in the notebook.
Switch to the Jupyter Notebook now if you want to run the code while you read along. To run a single code cell in a
notebook, click the code cell and hit Shift+Enter . Or, run the entire notebook by choosing Run all from the top toolbar.
Introduction
In this tutorial, you'll create an Azure ML pipeline to train a model for credit default prediction. The pipeline
handles the data preparation, training and registering the trained model. You'll then run the pipeline, deploy the
model and use it.
The image below shows the pipeline as you'll see it in the AzureML portal once submitted. It's a rather simple
pipeline we'll use to walk you through the AzureML SDK v2.
The first step is data preparation; the second is training.
Set up the pipeline resources
The Azure ML framework can be used from CLI, Python SDK, or studio interface. In this example, you'll use the
AzureML Python SDK v2 to create a pipeline.
Before creating the pipeline, you'll set up the resources the pipeline will use:
The dataset for training
The software environment to run the pipeline
A compute resource where the job will run
# Authentication package
from azure.identity import DefaultAzureCredential
In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find your
Subscription ID:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
2. At the bottom, select View all properties in Azure portal .
3. Copy the value from Azure portal into the code.
from azure.ai.ml import MLClient

# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)
The result is a handler to the workspace that you'll use to manage other resources and jobs.
IMPORTANT
Creating MLClient will not connect to the workspace. The client initialization is lazy: it will wait until the first time
it needs to make a call (in the notebook below, that will happen during dataset registration).
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# web_path (the URL of the source data) is defined in an earlier cell of the notebook
credit_data = Data(
    name="creditcard_defaults",
    path=web_path,
    type=AssetTypes.URI_FILE,
    description="Dataset for credit card defaults",
    tags={"source_type": "web", "source": "UCI ML Repo"},
    version="1.0.0",
)
This code just created a Data asset, ready to be consumed as an input by the pipeline that you'll define in the
next sections. In addition, you can register the dataset to your workspace so it becomes reusable across
pipelines.
Registering the dataset will enable you to:
Reuse and share the dataset in future pipelines
Use versions to track the modification to the dataset
Use the dataset from Azure ML designer, which is Azure ML's GUI for pipeline authoring
Since this is the first time that you're making a call to the workspace, you may be asked to authenticate. Once the
authentication is complete, you'll then see the dataset registration completion message.
credit_data = ml_client.data.create_or_update(credit_data)
print(
    f"Dataset with name {credit_data.name} was registered to workspace, the dataset version is {credit_data.version}"
)
In the future, you can fetch the same dataset from the workspace using
credit_dataset = ml_client.data.get("<DATA ASSET NAME>", version='<VERSION>') .
import os
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)
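The cell that writes the conda specification file was elided here. A minimal sketch of what conda.yml might contain follows; the exact package pins are illustrative, not prescriptive:

%%writefile {dependencies_dir}/conda.yml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - scikit-learn
  - pandas
  - pip:
    - azureml-defaults
    - azureml-mlflow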
The specification contains some usual packages that you'll use in your pipeline (numpy, pip), together with
some Azure ML-specific packages (azureml-defaults, azureml-mlflow).
The Azure ML packages aren't mandatory to run Azure ML jobs. However, adding these packages will let you
interact with Azure ML for logging metrics and registering models, all inside the Azure ML job. You'll use them in
the training script later in this tutorial.
Use the yaml file to create and register this custom environment in your workspace:
custom_env_name = "aml-scikit-learn"
pipeline_job_env = Environment(
name=custom_env_name,
description="Custom environment for Credit Card Defaults pipeline",
tags={"scikit-learn": "0.24.2", "azureml-defaults": "1.38.0"},
conda_file=os.path.join(dependencies_dir, "conda.yml"),
image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1",
version="1.0.0"
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)
print(
f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is
{pipeline_job_env.version}"
)
import os
data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)
This script performs the simple task of splitting the data into train and test datasets. Azure ML mounts datasets
as folders on the compute, so we create an auxiliary select_first_file function to access the data file inside
the mounted input folder.
MLFlow will be used to log the parameters and metrics during our pipeline run.
%%writefile {data_prep_src_dir}/data_prep.py
import os
import argparse
import pandas as pd
from sklearn.model_selection import train_test_split
import logging
import mlflow


def select_first_file(path):
    """Selects the first file in a folder (inputs are mounted as folders)."""
    files = os.listdir(path)
    return os.path.join(path, files[0])


def main():
    """Main function of the script."""

    # input and output arguments (reconstructed to match the component yaml below)
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    args = parser.parse_args()

    # Start Logging
    mlflow.start_run()

    credit_df = pd.read_csv(select_first_file(args.data))

    mlflow.log_metric("num_samples", credit_df.shape[0])
    mlflow.log_metric("num_features", credit_df.shape[1] - 1)

    # split the data into train and test sets (reconstructed step)
    credit_train_df, credit_test_df = train_test_split(credit_df, test_size=args.test_train_ratio)

    # output paths are mounted as folder, therefore, we are adding a filename to the path
    credit_train_df.to_csv(os.path.join(args.train_data, "data.csv"), index=False)
    credit_test_df.to_csv(os.path.join(args.test_data, "data.csv"), index=False)

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
Now that you have a script that can perform the desired task, create an Azure ML Component from it.
You'll use the general purpose CommandComponent that can run command line actions. This command line
action can directly call system commands or run a script. The inputs/outputs are specified on the command line
via the ${{ ... }} notation.
%%writefile {data_prep_src_dir}/data_prep.yml
# <component>
name: data_prep_credit_defaults
display_name: Data preparation for training
# version: 1 # Not specifying a version will automatically update the version
type: command
inputs:
  data:
    type: uri_folder
  test_train_ratio:
    type: number
outputs:
  train_data:
    type: uri_folder
  test_data:
    type: uri_folder
code: .
environment:
  # for this step, we'll use the custom environment registered earlier
  azureml:aml-scikit-learn:1.0.0
command: >-
  python data_prep.py
  --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}}
  --train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}}
# </component>
Once the yaml file and the script are ready, you can create your component using load_component() .
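That loading cell was elided in this excerpt; a sketch of it might look like this (in some preview builds of the SDK the parameter is named path rather than source):

from azure.ai.ml import load_component

# load the component definition from its yaml file
data_prep_component = load_component(
    source=os.path.join(data_prep_src_dir, "data_prep.yml")
)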
data_prep_component = ml_client.create_or_update(data_prep_component)
print(
    f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
)
import os
train_src_dir = "./components/train"
os.makedirs(train_src_dir, exist_ok=True)
Create the training script in the directory:
%%writefile {train_src_dir}/train.py
import argparse
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from azureml.core.model import Model
from azureml.core import Run
import os
import pandas as pd
import joblib
import mlflow


def select_first_file(path):
    """Selects first file in folder, use under assumption there is only one file in folder
    Args:
        path (str): path to directory or file to choose
    Returns:
        str: full path of selected file
    """
    files = os.listdir(path)
    return os.path.join(path, files[0])


# Start Logging
mlflow.start_run()

# enable autologging
mlflow.sklearn.autolog()

# This line creates a handle to the current run. It is used for model registration
run = Run.get_context()

os.makedirs("./outputs", exist_ok=True)


def main():
    """Main function of the script."""

    # input and output arguments (reconstructed to match the pipeline inputs)
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data", type=str, help="path to train data")
    parser.add_argument("--test_data", type=str, help="path to test data")
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # paths are mounted as folder, therefore, we are selecting the file from folder
    train_df = pd.read_csv(select_first_file(args.train_data))
    # assumption: the label column of the credit defaults dataset
    y_train = train_df.pop("default payment next month")
    X_train = train_df.values

    # paths are mounted as folder, therefore, we are selecting the file from folder
    test_df = pd.read_csv(select_first_file(args.test_data))
    y_test = test_df.pop("default payment next month")
    X_test = test_df.values

    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred))

    # save the trained model, then register it to the workspace (reconstructed step)
    joblib.dump(value=clf, filename="./outputs/model.pkl")
    Model.register(
        workspace=run.experiment.workspace,
        model_path="./outputs/model.pkl",
        model_name=args.registered_model_name,
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
As you can see in this training script, once the model is trained, the model file is saved and registered to the
workspace. Now you can use the registered model in inferencing endpoints.
For the environment of this step, you'll use one of the built-in (curated) Azure ML environments. The azureml
prefix tells the system to look for the name among the curated environments.
IMPORTANT
In the code below, replace <CPU-CLUSTER-NAME> with the name you used when you created a compute cluster in the
Quickstart: Create workspace resources you need to get started with Azure Machine Learning.
# the dsl decorator tells the sdk that we are defining an Azure ML pipeline
from azure.ai.ml import dsl, Input, Output

@dsl.pipeline(
    compute="<CPU-CLUSTER-NAME>",
    description="E2E data_prep-train pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_learning_rate,
    pipeline_job_registered_model_name,
):
    # using data_prep_function like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )
    # (the training step, which consumes data_prep_job's outputs, is defined in the full notebook)
Now use your pipeline definition to instantiate a pipeline with your dataset, split rate of choice and the name
you picked for your model.
registered_model_name = "credit_defaults_model"
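The instantiation cell was elided here. Based on the pipeline definition above, it might look like the following sketch; the ratio and learning-rate values are illustrative:

pipeline = credit_defaults_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=credit_data.path),
    pipeline_job_test_train_ratio=0.2,
    pipeline_job_learning_rate=0.25,
    pipeline_job_registered_model_name=registered_model_name,
)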
import webbrowser

# submit the pipeline job
returned_job = ml_client.jobs.create_or_update(
    pipeline,
    # Project's name
    experiment_name="e2e_registered_components",
)
# open the pipeline in web browser
webbrowser.open(returned_job.services["Studio"].endpoint)
An output of "False" is expected from the above cell. You can track the progress of your pipeline, by using the
link generated in the cell above.
When you select a component, you'll see more information about the results of that component. There
are two important parts to look for at this stage:
Outputs+logs > user_logs > std_log.txt This section shows the script run stdout.
Outputs+logs > Metric This section shows different logged metrics. In this example, mlflow
autologging has automatically logged the training metrics.
Deploy the model as an online endpoint
Now deploy your machine learning model as a web service in the Azure cloud.
To deploy a machine learning service, you'll usually need:
The model assets (file, metadata) that you want to deploy. You've already registered these assets in your
training component.
Some code to run as a service. The code executes the model on a given input request. This entry script
receives data submitted to a deployed web service and passes it to the model, then returns the model's
response to the client. The script is specific to your model. The entry script must understand the data that the
model expects and returns.
In the following implementation, the init() function loads the model, and the run function expects the data in
JSON format with the input data stored under data .
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)
%%writefile {deploy_dir}/score.py
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pkl")
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("Request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
import uuid
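The endpoint definition itself was elided; a sketch, using uuid to generate a unique endpoint name, might look like this:

from azure.ai.ml.entities import ManagedOnlineEndpoint

# create a unique name for the endpoint
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]

# define an online endpoint for the model
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="online endpoint for the credit defaults model",
    auth_mode="key",
)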
endpoint = ml_client.begin_create_or_update(endpoint)
NOTE
Expect this deployment to take approximately 6 to 8 minutes.
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)
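The deployment definition was also elided (latest_model_version likewise comes from an earlier, elided cell). A sketch of a managed online deployment might look like the following; the instance type is illustrative:

from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=pipeline_job_env,  # the custom environment registered earlier
    code_configuration=CodeConfiguration(code=deploy_dir, scoring_script="score.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)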
blue_deployment = ml_client.begin_create_or_update(blue_deployment)
%%writefile {deploy_dir}/sample-request.json
{"data": [
[20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
[10,9,8,7,6,5,4,3,2,1, 10,9,8,7,6,5,4,3,2,1,10,9,8]
]}
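With the sample request file in place, one way to test the deployment is to invoke the endpoint and pass it the file:

# test the blue deployment with the sample request file
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file=os.path.join(deploy_dir, "sample-request.json"),
)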
Clean up resources
If you're not going to use the endpoint, delete it to stop using the resource. Make sure no other deployments are
using an endpoint before you delete it.
NOTE
Expect this step to take approximately 6 to 8 minutes.
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
Next steps
Learn more about Azure ML logging.
Explore Azure Machine Learning with Jupyter
Notebooks
5/25/2022 • 2 minutes to read • Edit Online
The Azure Machine Learning Notebooks repository includes the latest Azure Machine Learning Python SDK
samples. These Jupyter notebooks are designed to help you explore the SDK and serve as models for your own
machine learning projects. In this repository, you'll find tutorial notebooks in the tutorials folder and feature-
specific notebooks in the how-to-use-azureml folder.
Also explore the community-driven repository of AzureML-Examples. This repository includes notebooks and
CLI (v2) examples. For information on the various example types, see the readme.
This article shows you how to access the repositories from the following environments:
Azure Machine Learning compute instance
Bring your own notebook server
Data Science Virtual Machine
jupyter notebook
These instructions install the base SDK packages necessary for the quickstart and tutorial notebooks. Other
sample notebooks may require you to install extra components. For more information, see Install the Azure
Machine Learning SDK for Python.
Create a new workspace using code in the configuration.ipynb notebook in your cloned directory.
3. From the directory where you added the configuration file, clone the Machine Learning Notebooks
repository.
5. Start the notebook server from the directory, which now contains the two clones and the config file.
jupyter notebook
Next steps
Explore the MachineLearningNotebooks and AzureML-Examples repositories to discover what Azure Machine
Learning can do.
For more GitHub sample projects and examples, see these repos:
Microsoft/MLOps
Microsoft/MLOpsPython
Try these tutorials:
Train and deploy an image classification model with MNIST
Prepare data and use automated machine learning to train a regression model with the NYC taxi data set
Example pipelines & datasets for Azure Machine
Learning designer
5/25/2022 • 10 minutes to read • Edit Online
Use the built-in examples in Azure Machine Learning designer to quickly get started building your own machine
learning pipelines. The Azure Machine Learning designer GitHub repository contains detailed documentation to
help you understand some common machine learning scenarios.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account
An Azure Machine Learning workspace
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
Regression
Explore these built-in regression samples.
SAMPLE TITLE DESCRIPTION
Regression - Automobile Price Prediction (Basic) Predict car prices using linear regression.
Regression - Automobile Price Prediction (Advanced) Predict car prices using decision forest and boosted decision
tree regressors. Compare models to find the best algorithm.
Classification
Explore these built-in classification samples. You can learn more about the samples by opening the samples and
viewing the component comments in the designer.
SAMPLE TITLE DESCRIPTION
Binary Classification with Feature Selection - Income Predict income as high or low, using a two-class boosted
Prediction decision tree. Use Pearson correlation to select features.
Binary Classification with custom Python script - Credit Risk Classify credit applications as high or low risk. Use the
Prediction Execute Python Script component to weight your data.
Binary Classification - Customer Relationship Prediction Predict customer churn using two-class boosted decision
trees. Use SMOTE to sample biased data.
Text Classification - Wikipedia SP 500 Dataset Classify company types from Wikipedia articles with
multiclass logistic regression.
Multiclass Classification - Letter Recognition Create an ensemble of binary classifiers to classify written
letters.
Computer vision
Explore these built-in computer vision samples. You can learn more about the samples by opening the samples
and viewing the component comments in the designer.
SAMPLE TITLE DESCRIPTION
Image Classification using DenseNet Use computer vision components to build an image
classification model based on PyTorch DenseNet.
Recommender
Explore these built-in recommender samples. You can learn more about the samples by opening the samples
and viewing the component comments in the designer.
SAMPLE TITLE DESCRIPTION
Wide & Deep based Recommendation - Restaurant Rating Build a restaurant recommender engine from
Prediction restaurant/user features and ratings.
Recommendation - Movie Rating Tweets Build a movie recommender engine from movie/user
features and ratings.
Utility
Learn more about the samples that demonstrate machine learning utilities and features. You can learn more
about the samples by opening the samples and viewing the component comments in the designer.
SAMPLE TITLE DESCRIPTION
Binary Classification using Vowpal Wabbit Model - Adult Vowpal Wabbit is a machine learning system which pushes
Income Prediction the frontier of machine learning with techniques such as
online, hashing, allreduce, reductions, learning2search, active,
and interactive learning. This sample shows how to use
Vowpal Wabbit model to build binary classification model.
Use custom R script - Flight Delay Prediction Use customized R script to predict if a scheduled passenger
flight will be delayed by more than 15 minutes.
Cross Validation for Binary Classification - Adult Income Use cross validation to build a binary classifier for adult
Prediction income.
Tune Parameters for Binary Classification - Adult Income Use Tune Model Hyperparameters to find optimal
Prediction hyperparameters to build a binary classifier.
Datasets
When you create a new pipeline in Azure Machine Learning designer, a number of sample datasets are included
by default. These sample datasets are used by the sample pipelines in the designer homepage.
The sample datasets are available under the Datasets - Samples category. You can find this in the component
palette to the left of the canvas in the designer. You can use any of these datasets in your own pipeline by
dragging it to the canvas.
Adult Census Income Binary Classification dataset A subset of the 1994 Census database, using working adults
over the age of 16 with an adjusted income index of > 100.
Usage : Classify people using demographics to predict
whether a person earns over 50K a year.
Related Research : Kohavi, R., Becker, B., (1996). UCI
Machine Learning Repository. Irvine, CA: University of
California, School of Information and Computer Science
DATASET NAME DATASET DESCRIPTION
Automobile price data (Raw) Information about automobiles by make and model,
including the price, features such as the number of cylinders
and MPG, as well as an insurance risk score.
The risk score is initially associated with auto price. It is then
adjusted for actual risk in a process known to actuaries as
symboling. A value of +3 indicates that the auto is risky, and
a value of -3 that it is probably safe.
Usage : Predict the risk score by features, using regression
or multivariate classification.
Related Research : Schlimmer, J.C. (1987). UCI Machine
Learning Repository. Irvine, CA: University of California,
School of Information and Computer Science.
CRM Appetency Labels Shared Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_small_train_appetency.labels).
CRM Churn Labels Shared Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_small_train_churn.labels).
CRM Dataset Shared This data comes from the KDD Cup 2009 customer
relationship prediction challenge
(orange_small_train.data.zip).
The dataset contains 50K customers from the French
Telecom company Orange. Each customer has 230
anonymized features, 190 of which are numeric and 40 are
categorical. The features are very sparse.
CRM Upselling Labels Shared Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_large_train_upselling.labels).
Flight Delays Data Passenger flight on-time performance data taken from the
TranStats data collection of the U.S. Department of
Transportation (On-Time).
The dataset covers the time period April-October 2013.
Before uploading to the designer, the dataset was processed
as follows:
- The dataset was filtered to cover only the 70 busiest
airports in the continental US
- Canceled flights were labeled as delayed by more than 15
minutes
- Diverted flights were filtered out
- The following columns were selected: Year, Month,
DayofMonth, DayOfWeek, Carrier, OriginAirportID,
DestAirportID, CRSDepTime, DepDelay, DepDel15,
CRSArrTime, ArrDelay, ArrDel15, Canceled
German Credit Card UCI dataset The UCI Statlog (German Credit Card) dataset
(Statlog+German+Credit+Data), using the german.data file.
The dataset classifies people, described by a set of attributes,
as low or high credit risks. Each example represents a person.
There are 20 features, both numerical and categorical, and a
binary label (the credit risk value). High credit risk entries
have label = 2, low credit risk entries have label = 1. The cost
of misclassifying a low risk example as high is 1, whereas the
cost of misclassifying a high risk example as low is 5.
DATASET NAME DATASET DESCRIPTION
IMDB Movie Titles The dataset contains information about movies that were
rated in Twitter tweets: IMDB movie ID, movie name, genre,
and production year. There are 17K movies in the dataset.
The dataset was introduced in the paper "S. Dooms, T. De
Pessemier and L. Martens. MovieTweetings: a Movie Rating
Dataset Collected From Twitter. Workshop on Crowdsourcing
and Human Computation for Recommender Systems,
CrowdRec at RecSys 2013."
Restaurant Feature Data A set of metadata about restaurants and their features, such
as food type, dining style, and location.
Usage : Use this dataset, in combination with the other two
restaurant datasets, to train and test a recommender
system.
Related Research : Bache, K. and Lichman, M. (2013). UCI
Machine Learning Repository. Irvine, CA: University of
California, School of Information and Computer Science.
Clean up resources
IMPORTANT
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to
articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any
charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
3. Select Delete resource group .
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then
selecting the Delete button.
The compute target that you created here automatically scales down to zero nodes when it's not being used. This
action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister .
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually
delete those assets.
Next steps
Learn the fundamentals of predictive analytics and machine learning with Tutorial: Predict automobile price with
the designer
What is Azure Machine Learning CLI & Python SDK
v2?
5/25/2022 • 4 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
Azure Machine Learning CLI v2 and Azure Machine Learning Python SDK v2 (preview) introduce a consistency
of features and terminology across the interfaces. In order to create this consistency, the syntax of commands
differs, in some cases significantly, from the first versions (v1).
Azure ML Python SDK v2 is an updated Python SDK package, which allows users to:
Submit training jobs
Manage data, models, environments
Perform managed inferencing (real time and batch)
Stitch together multiple tasks and production workflows using Azure ML pipelines
The SDK v2 is on par with CLI v2 functionality and is consistent in how assets (nouns) and actions (verbs) are
used between SDK and CLI. For example, to list an asset, the list action can be used in both CLI and SDK. The
same list action can be used to list a compute, model, environment, and so on.
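As an illustration, here's a sketch of the list verb from the SDK; the placeholder values are yours to fill in (the CLI equivalent is az ml model list ):

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)

# the same "list" verb works across asset types (models, environments, compute, ...)
for model in ml_client.models.list():
    print(model.name)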
Use cases for SDK v2
The SDK v2 is useful in the following scenarios:
Use Python functions to build a single step or a complex workflow
SDK v2 allows you to build a single command or a chain of commands like Python functions - the
command has a name and parameters, expects input, and returns output.
Move from simple to complex concepts incrementally
SDK v2 allows you to:
Construct a single command.
Add a hyperparameter sweep on top of that command.
Add the command, along with various others, into a pipeline to run them one after the other.
This construction is useful, given the iterative nature of machine learning.
Reusable components in pipelines
Azure ML introduces components for managing and reusing common logic across pipelines. This
functionality is available only via CLI v2 and SDK v2.
Managed inferencing
Azure ML offers endpoints to streamline model deployments for both real-time and batch inference
deployments. This functionality is available only via CLI v2 and SDK v2.
Next steps
Get started with CLI v2
Install and set up CLI (v2)
Train models with the CLI (v2)
Deploy and score models with managed online endpoint
Get started with SDK v2
Install and set up SDK (v2)
Train models with the Azure ML Python SDK v2 (preview)
Tutorial: Create production ML pipelines with Python SDK v2 (preview) in a Jupyter notebook
Train models with Azure Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
Azure Machine Learning provides several ways to train your models, from code-first solutions using the SDK to
low-code solutions such as automated machine learning and the visual designer. Use the following list to
determine which training method is right for you:
Azure Machine Learning SDK for Python: The Python SDK provides several ways to train models, each
with different capabilities.
TRAINING METHOD DESCRIPTION
Machine learning pipeline Pipelines are not a different training method, but a way
of defining a workflow using modular, reusable
steps that can include training as part of the workflow.
Machine learning pipelines support using automated
machine learning and run configuration to train models.
Since pipelines are not focused specifically on training,
the reasons for using a pipeline are more varied than the
other training methods. Generally, you might use a
pipeline when:
* You want to schedule unattended processes such
as long running training jobs or data preparation.
* Use multiple steps that are coordinated across
heterogeneous compute resources and storage
locations.
* Use the pipeline as a reusable template for specific
scenarios, such as retraining or batch scoring.
* Track and version data sources, inputs, and
outputs for your workflow.
* Your workflow is implemented by different teams
that work on specific steps independently . Steps
can then be joined together in a pipeline to implement
the workflow.
Designer : Azure Machine Learning designer provides an easy entry-point into machine learning for
building proof of concepts, or for users with little coding experience. It allows you to train models using a
drag and drop web-based UI. You can use Python code as part of the design, or train models without
writing any code.
Azure CLI : The machine learning CLI provides commands for common tasks with Azure Machine
Learning, and is often used for scripting and automating tasks . For example, once you've created a
training script or pipeline, you might use the Azure CLI to start a training run on a schedule or when the
data files used for training are updated. For training models, it provides commands that submit training
jobs. It can submit jobs using run configurations or pipelines.
Each of these training methods can use different types of compute resources for training. Collectively, these
resources are referred to as compute targets . A compute target can be a local machine or a cloud resource,
such as an Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine.
Python SDK
The Azure Machine Learning SDK for Python allows you to build and run machine learning workflows with
Azure Machine Learning. You can interact with the service from an interactive Python session, Jupyter
Notebooks, Visual Studio Code, or other IDE.
What is the Azure Machine Learning SDK for Python
Install/update the SDK
Configure a development environment for Azure Machine Learning
Run configuration
A generic training job with Azure Machine Learning can be defined using the ScriptRunConfig. The script run
configuration is then used, along with your training script(s), to train a model on a compute target.
You may start with a run configuration for your local computer, and then switch to one for a cloud-based
compute target as needed. When changing the compute target, you only change the run configuration you use.
A run also logs information about the training job, such as the inputs, outputs, and logs.
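As a sketch, assuming a training script at ./src/train.py and a compute cluster named cpu-cluster, submitting a run with the v1 SDK might look like this:

from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()

# define what to run (the script) and where to run it (the compute target)
src = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    compute_target="cpu-cluster",  # placeholder; omit to run on your local machine
)

# submit the training run and stream its logs
run = Experiment(ws, "train-example").submit(src)
run.wait_for_completion(show_output=True)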
What is a run configuration?
Tutorial: Train your first ML model
Examples: Jupyter Notebook and Python examples of training models
How to: Configure a training run
Automated Machine Learning
Define the iterations, hyperparameter settings, featurization, and other settings. During training, Azure Machine
Learning tries different algorithms and parameters in parallel. Training stops once it hits the exit criteria you
defined.
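For example, a classification experiment might be configured with the v1 SDK along these lines (a sketch; the dataset, label column, and compute target names are assumptions):

from azureml.core import Experiment, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Configure the automated ML experiment and its exit criteria
automl_config = AutoMLConfig(
    task="classification",
    training_data=train_dataset,   # assumption: an existing TabularDataset
    label_column_name="target",    # placeholder label column
    primary_metric="AUC_weighted",
    iterations=20,                 # exit criteria: number of iterations
    experiment_timeout_hours=1,    # exit criteria: time limit
    compute_target="cpu-cluster"   # placeholder compute target
)

run = Experiment(ws, "automl-example").submit(automl_config)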
TIP
In addition to the Python SDK, you can also use Automated ML through Azure Machine Learning studio.
Azure CLI
The machine learning CLI is an extension for the Azure CLI. It provides cross-platform CLI commands for
working with Azure Machine Learning. Typically, you use the CLI to automate tasks, such as training a machine
learning model.
Use the CLI extension for Azure Machine Learning
MLOps on Azure
VS Code
You can use the VS Code extension to run and manage your training jobs. See the VS Code resource
management how-to guide to learn more.
Next steps
Learn how to Configure a training run.
Distributed training with Azure Machine Learning
5/25/2022 • 2 minutes to read • Edit Online
In this article, you learn about distributed training and how Azure Machine Learning supports it for deep
learning models.
In distributed training, the workload to train a model is split up and shared among multiple mini processors,
called worker nodes. These worker nodes work in parallel to speed up model training. Distributed training can
be used for traditional ML models, but is better suited for compute- and time-intensive tasks, like deep learning
tasks such as training deep neural networks.
Data parallelism
Data parallelism is the easiest to implement of the two distributed training approaches, and is sufficient for most
use cases.
In this approach, the data is divided into partitions, where the number of partitions is equal to the total number
of available nodes in the compute cluster. The model is copied to each of these worker nodes, and each worker
operates on its own subset of the data. Keep in mind that each node has to have the capacity to support the
model that's being trained; that is, the model has to entirely fit on each node.
Each node independently computes the errors between its predictions for its training samples and the labeled
outputs. In turn, each node updates its model based on the errors and must communicate all of its changes to
the other nodes to update their corresponding models. This means that the worker nodes need to synchronize
the model parameters, or gradients, at the end of the batch computation to ensure they are training a consistent
model.
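A minimal data-parallel training sketch with PyTorch's DistributedDataParallel, assuming one process per GPU launched with a tool like torchrun, and a placeholder MyModel class:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; the launcher sets the rank environment variables
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Each worker holds a full copy of the model (MyModel is a placeholder)
model = MyModel().cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])
# During backward(), gradients are averaged across all workers so every
# copy of the model stays synchronized.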
Model parallelism
In model parallelism, also known as network parallelism, the model is segmented into different parts that can
run concurrently in different nodes, and each one will run on the same data. The scalability of this method
depends on the degree of task parallelization of the algorithm, and it is more complex to implement than data
parallelism.
In model parallelism, worker nodes only need to synchronize the shared parameters, usually once for each
forward or backward-propagation step. Also, larger models aren't a concern since each node operates on a
subsection of the model on the same training data.
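As a contrast, here is a minimal model-parallel sketch in PyTorch that splits one network across two GPUs; the layer sizes are arbitrary placeholders:

import torch
import torch.nn as nn

class TwoGpuModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Each part of the model lives on a different device
        self.part1 = nn.Linear(1024, 512).to("cuda:0")
        self.part2 = nn.Linear(512, 10).to("cuda:1")

    def forward(self, x):
        # Activations move between devices instead of gradients
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))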
Next steps
Learn how to use compute targets for model training with the Python SDK.
For a technical example, see the reference architecture scenario.
Find tips for MPI, TensorFlow, and PyTorch in the Distributed GPU training guide
Deep learning vs. machine learning in Azure
Machine Learning
5/25/2022 • 9 minutes to read • Edit Online
This article explains deep learning vs. machine learning and how they fit into the broader category of artificial
intelligence. Learn about deep learning solutions you can build on Azure Machine Learning, such as fraud
detection, voice and facial recognition, sentiment analysis, and time series forecasting.
For guidance on choosing algorithms for your solutions, see the Machine Learning Algorithm Cheat Sheet.
Consider the following definitions to understand deep learning vs. machine learning vs. AI:
Deep learning is a subset of machine learning that's based on artificial neural networks. The learning
process is deep because the structure of artificial neural networks consists of multiple input, output, and
hidden layers. Each layer contains units that transform the input data into information that the next layer
can use for a certain predictive task. Thanks to this structure, a machine can learn through its own data
processing.
Machine learning is a subset of artificial intelligence that uses techniques (such as deep learning) that
enable machines to use experience to improve at tasks. The learning process is based on the following
steps:
1. Feed data into an algorithm. (In this step you can provide additional information to the model, for
example, by performing feature extraction.)
2. Use this data to train a model.
3. Test and deploy the model.
4. Consume the deployed model to do an automated predictive task. (In other words, call and use the
deployed model to receive the predictions returned by the model.)
Artificial intelligence (AI) is a technique that enables computers to mimic human intelligence. It
includes machine learning.
By using machine learning and deep learning techniques, you can build computer systems and applications that
do tasks that are commonly associated with human intelligence. These tasks include image recognition, speech
recognition, and language translation.
Techniques of deep learning vs. machine learning
Now that you have the overview of machine learning vs. deep learning, let's compare the two techniques. In
machine learning, the algorithm needs to be told how to make an accurate prediction by consuming more
information (for example, by performing feature extraction). In deep learning, the algorithm can learn how to
make an accurate prediction through its own data processing, thanks to the artificial neural network structure.
The following table compares the two techniques in more detail:
Number of data points : Classical machine learning can use small amounts of data to make predictions. Deep learning needs large amounts of training data to make predictions.
Featurization process : Classical machine learning requires features to be accurately identified and created by users. Deep learning learns high-level features from the data and creates new features by itself.
Learning approach : Classical machine learning divides the learning process into smaller steps, then combines the results from each step into one output. Deep learning moves through the learning process by resolving the problem on an end-to-end basis.
Execution time : Classical machine learning takes comparatively little time to train, ranging from a few seconds to a few hours. Deep learning usually takes a long time to train because a deep learning algorithm involves many layers.
Output : Classical machine learning output is usually a numerical value, like a score or a classification. Deep learning output can have multiple formats, like text, a score, or a sound.
Next steps
The following articles show you more options for using open-source deep learning models in Azure Machine
Learning:
Classify handwritten digits by using a TensorFlow model
Classify handwritten digits by using a TensorFlow estimator and Keras
Classify handwritten digits by using a Chainer model
What is automated machine learning (AutoML)?
5/25/2022 • 15 minutes to read • Edit Online
Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the
time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and
developers to build ML models with high scale, efficiency, and productivity all while sustaining model quality.
Automated ML in Azure Machine Learning is based on a breakthrough from our Microsoft Research division.
Traditional machine learning model development is resource-intensive, requiring significant domain knowledge
and time to produce and compare dozens of models. With automated machine learning, you'll accelerate the
time it takes to get production-ready ML models with great ease and efficiency.
When to use AutoML: classification, regression, forecasting, computer
vision & NLP
Apply automated ML when you want Azure Machine Learning to train and tune a model for you using the target
metric you specify. Automated ML democratizes the machine learning model development process, and
empowers its users, no matter their data science expertise, to identify an end-to-end machine learning pipeline
for any problem.
ML professionals and developers across industries can use automated ML to:
Implement ML solutions without extensive programming knowledge
Save time and resources
Leverage data science best practices
Provide agile problem-solving
Classification
Classification is a common machine learning task. Classification is a type of supervised learning in which
models learn using training data, and apply those learnings to new data. Azure Machine Learning offers
featurizations specifically for these tasks, such as deep neural network text featurizers for classification. Learn
more about featurization options.
The main goal of classification models is to predict which categories new data will fall into based on learnings
from its training data. Common classification examples include fraud detection, handwriting recognition, and
object detection. Learn more and see an example at Create a classification model with automated ML.
See examples of classification and automated machine learning in these Python notebooks: Fraud Detection,
Marketing Prediction, and Newsgroup Data Classification
Regression
Similar to classification, regression tasks are also a common supervised learning task. Azure Machine Learning
offers featurizations specifically for these tasks.
Different from classification where predicted output values are categorical, regression models predict numerical
output values based on independent predictors. In regression, the objective is to help establish the relationship
among those independent predictor variables by estimating how one variable impacts the others. For example,
you might predict automobile price based on features like gas mileage, safety rating, and so on. Learn more and
see an example of regression with automated machine learning.
See examples of regression and automated machine learning for predictions in these Python notebooks: CPU
Performance Prediction.
Time -series forecasting
Building forecasts is an integral part of any business, whether it's revenue, inventory, sales, or customer
demand. You can use automated ML to combine techniques and approaches and get a recommended, high-
quality time-series forecast. Learn more with this how-to: automated machine learning for time series
forecasting.
An automated time-series experiment is treated as a multivariate regression problem. Past time-series values
are "pivoted" to become additional dimensions for the regressor together with other predictors. This approach,
unlike classical time series methods, has an advantage of naturally incorporating multiple contextual variables
and their relationship to one another during training. Automated ML learns a single, but often internally
branched model for all items in the dataset and prediction horizons. More data is thus available to estimate
model parameters and generalization to unseen series becomes possible.
Advanced forecasting configuration includes:
holiday detection and featurization
time-series and DNN learners (Auto-ARIMA, Prophet, ForecastTCN)
many-models support through grouping
rolling-origin cross validation
configurable lags
rolling window aggregate features
See examples of time-series forecasting and automated machine learning in these Python notebooks: Sales
Forecasting, Demand Forecasting, and Forecasting GitHub's Daily Active Users.
Computer vision (preview)
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain
features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of
Use for Microsoft Azure Previews.
Support for computer vision tasks allows you to easily generate models trained on image data for scenarios like
image classification and object detection.
With this capability you can:
Seamlessly integrate with the Azure Machine Learning data labeling capability.
Use labeled data for generating image models.
Optimize model performance by specifying the model algorithm and tuning the hyperparameters.
Download or deploy the resulting model as a web service in Azure Machine Learning.
Operationalize at scale, leveraging Azure Machine Learning MLOps and ML Pipelines capabilities.
Authoring AutoML models for vision tasks is supported via the Azure ML Python SDK. The resulting
experimentation runs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
Learn how to set up AutoML training for computer vision models.
Multi-class image classification : Tasks where an image is classified with only a single label from a set of classes, e.g. each image is classified as either an image of a 'cat' or a 'dog' or a 'duck'.
Multi-label image classification : Tasks where an image could have one or more labels from a set of labels, e.g. an image could be labeled with both 'cat' and 'dog'.
Object detection : Tasks to identify objects in an image and locate each object with a bounding box, e.g. locate all dogs and cats in an image and draw a bounding box around each.
Natural language processing (preview)
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Support for natural language processing (NLP) tasks in automated ML allows you to easily generate models
trained on text data for text classification and named entity recognition scenarios. Authoring automated ML
trained NLP models is supported via the Azure Machine Learning Python SDK. The resulting experimentation
runs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
The NLP capability supports:
End-to-end deep neural network NLP training with the latest pre-trained BERT models
Seamless integration with Azure Machine Learning data labeling
Use labeled data for generating NLP models
Multi-lingual support with 104 languages
Distributed training with Horovod
Learn how to set up AutoML training for NLP models.
You can also inspect the logged run information, which contains metrics gathered during the run. The training
run produces a Python serialized object ( .pkl file) that contains the model and data preprocessing.
While model building is automated, you can also learn how important or relevant features are to the generated
models.
Remote ML compute clusters offer the full set of features: parallelized child runs, large data support, DNN-based
featurization, dynamic scalability of the compute cluster on demand, and a no-code experience (web UI). The
trade-off is start-up time, both for cluster nodes and for each child run.
Feature availability
More features are available when you use remote compute, as shown in the following list:
Continue a run: remote only
Forecasting: remote and local
NLP-Text: remote and local
Block algorithms: remote and local
Cross validation: remote and local
Data guardrails: remote and local
Get guardrails: remote and local
Featurization summary: remote only
IMPORTANT
Testing your models with a test dataset to evaluate generated models is a preview feature. This capability is an
experimental preview feature, and may change at any time.
Learn how to configure AutoML experiments to use test data (preview) with the SDK or with the Azure Machine
Learning studio.
You can also test any existing automated ML model (preview), including models from child runs, by providing
your own test data or by setting aside a portion of your training data.
Feature engineering
Feature engineering is the process of using domain knowledge of the data to create features that help ML
algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate
feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
For automated machine learning experiments, featurization is applied automatically, but can also be customized
based on your data. Learn more about what featurization is included and how AutoML helps prevent over-fitting
and imbalanced data in your models.
NOTE
Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric,
etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied
during training are applied to your input data automatically.
Customize featurization
Additional feature engineering techniques, such as encoding and transforms, are also available.
Enable this setting with:
Azure Machine Learning studio: Enable Automatic featurization in the View additional
configuration section with these steps.
Python SDK: Specify "featurization": 'auto' / 'off' / 'FeaturizationConfig' in your AutoMLConfig
object, as shown in the sketch after this list. Learn more about enabling featurization.
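A minimal sketch of customized featurization with the v1 SDK; the column name, dataset, and label column are placeholder assumptions:

from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig

# Customize how specific columns are featurized
featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose("zip_code", "Categorical")  # placeholder column

automl_config = AutoMLConfig(
    task="regression",
    training_data=train_data,           # assumption: an existing dataset
    label_column_name="price",          # placeholder label column
    featurization=featurization_config  # or "auto" / "off"
)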
Ensemble models
Automated machine learning supports ensemble models, which are enabled by default. Ensemble learning
improves machine learning results and predictive performance by combining multiple models as opposed to
using single models. The ensemble iterations appear as the final iterations of your run. Automated machine
learning uses both voting and stacking ensemble methods for combining models:
Voting : predicts based on the weighted average of predicted class probabilities (for classification tasks) or
predicted regression targets (for regression tasks).
Stacking : stacking combines heterogeneous models and trains a meta-model based on the output from the
individual models. The current default meta-models are LogisticRegression for classification tasks and
ElasticNet for regression/forecasting tasks.
The Caruana ensemble selection algorithm with sorted ensemble initialization is used to decide which models to
use within the ensemble. At a high level, this algorithm initializes the ensemble with up to five models with the
best individual scores, and verifies that these models are within 5% threshold of the best score to avoid a poor
initial ensemble. Then for each ensemble iteration, a new model is added to the existing ensemble and the
resulting score is calculated. If a new model improved the existing ensemble score, the ensemble is updated to
include the new model.
See the how-to for changing default ensemble settings in automated machine learning.
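For example, with the v1 SDK the ensemble iterations can be toggled through configuration flags (a sketch appended to an existing AutoMLConfig setup; the dataset and label column are assumptions):

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_data,      # assumption: an existing dataset
    label_column_name="target",    # placeholder label column
    enable_voting_ensemble=True,   # keep the voting ensemble iteration (default)
    enable_stack_ensemble=False    # disable the stacking ensemble iteration
)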
Next steps
There are multiple resources to get you up and running with AutoML.
Tutorials/ how-tos
Tutorials are end-to-end introductory examples of AutoML scenarios.
For a code first experience , follow the Tutorial: Train a regression model with AutoML and Python.
For a low or no-code experience , see the Tutorial: Train a classification model with no-code AutoML in
Azure Machine Learning studio.
For using AutoML to train computer vision models , see the Tutorial: Train an object detection
model (preview) with AutoML and Python.
How-to articles provide additional detail into what functionality automated ML offers. For example,
Configure the settings for automatic training experiments
Without code in the Azure Machine Learning studio.
With the Python SDK.
Learn how to train forecasting models with time series data.
Learn how to train computer vision models with Python.
Learn how to view the generated code from your automated ML models.
Jupyter notebook samples
Review detailed code examples and use cases in the GitHub notebook repository for automated machine
learning samples.
Python SDK reference
Deepen your expertise of SDK design patterns and class specifications with the AutoML class reference
documentation.
NOTE
Automated machine learning capabilities are also available in other Microsoft solutions such as, ML.NET, HDInsight, Power
BI and SQL Server
Prevent overfitting and imbalanced data with
automated machine learning
5/25/2022 • 7 minutes to read • Edit Online
Overfitting and imbalanced data are common pitfalls when you build machine learning models. By default,
Azure Machine Learning's automated machine learning provides charts and metrics to help you identify these
risks, and implements best practices to help mitigate them.
Identify overfitting
Overfitting in machine learning occurs when a model fits the training data too well, and as a result can't
accurately predict on unseen test data. In other words, the model has simply memorized specific patterns and
noise in the training data, but is not flexible enough to make predictions on real data.
Consider the following trained models and their corresponding train and test accuracies.
Model A: 99.9% training accuracy, 95% test accuracy.
Model B: 87% training accuracy, 87% test accuracy.
Model C: 99.9% training accuracy, 45% test accuracy.
Considering model A , there is a common misconception that if test accuracy on unseen data is lower than
training accuracy, the model is overfitted. However, test accuracy should always be less than training accuracy,
and the distinction for overfit vs. appropriately fit comes down to how much less accurate.
When comparing models A and B , model A is the better model because it has higher test accuracy; although its
test accuracy, at 95%, is slightly lower than its training accuracy, the gap isn't large enough to suggest
overfitting. You wouldn't choose model B simply because its train and test accuracies are closer together.
Model C represents a clear case of overfitting; the training accuracy is very high but the test accuracy isn't
anywhere near as high. This distinction is subjective, but comes from knowledge of your problem and data, and
what magnitudes of error are acceptable.
Prevent overfitting
In the most egregious cases, an overfitted model assumes that the feature value combinations seen during
training will always result in the exact same output for the target.
The best way to prevent overfitting is to follow ML best-practices including:
Using more training data, and eliminating statistical bias
Preventing target leakage
Using fewer features
Regularization and hyperparameter optimization
Model complexity limitations
Cross-validation
In the context of automated ML, the first three items above are best practices you implement. The last three
items are best practices automated ML implements by default to protect against overfitting. In settings other
than automated ML, all six best practices are worth following to avoid overfitting models.
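For example, cross-validation can be requested explicitly in a v1 AutoMLConfig (a sketch; the dataset and label column are assumptions):

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_data,    # assumption: an existing dataset
    label_column_name="target",  # placeholder label column
    n_cross_validations=5        # 5-fold cross-validation to guard against overfitting
)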
Automated ML provides the following charts to help you identify these risks:
Confusion matrix : Evaluates the correctly classified labels against the actual labels of the data.
ROC curve : Evaluates the ratio of correct labels against the ratio of false-positive labels.
Next steps
See examples and learn how to build models using automated machine learning:
Follow the Tutorial: Automatically train a regression model with Azure Machine Learning
Configure the settings for automatic training experiment:
In Azure Machine Learning studio, use these steps.
With the Python SDK, use these steps.
What is Azure Machine Learning designer?
5/25/2022 • 4 minutes to read • Edit Online
Azure Machine Learning designer is a drag-and-drop interface used to train and deploy models in Azure
Machine Learning. This article describes the tasks you can do in the designer.
To get started with the designer, see Tutorial: Train a no-code regression model.
To learn about the components available in the designer, see the Algorithm and component reference.
The designer uses your Azure Machine Learning workspace to organize shared resources such as:
Pipelines
Datasets
Compute resources
Registered models
Published pipelines
Real-time endpoints
Pipeline
A pipeline consists of datasets and analytical components, which you connect. Pipelines have many uses: you
can make a pipeline that trains a single model, or one that trains multiple models. You can create a pipeline that
makes predictions in real time or in batch, or make a pipeline that only cleans data. Pipelines let you reuse your
work and organize your projects.
Pipeline draft
As you edit a pipeline in the designer, your progress is saved as a pipeline draft . You can edit a pipeline draft at
any point by adding or removing components, configuring compute targets, creating parameters, and so on.
A valid pipeline has these characteristics:
Datasets can only connect to components.
Components can only connect to either datasets or other components.
All input ports for components must have some connection to the data flow.
All required parameters for each component must be set.
When you're ready to run your pipeline draft, you submit a pipeline run.
Pipeline run
Each time you run a pipeline, the configuration of the pipeline and its results are stored in your workspace as a
pipeline run . You can go back to any pipeline run to inspect it for troubleshooting or auditing. Clone a pipeline
run to create a new pipeline draft for you to edit.
Pipeline runs are grouped into experiments to organize run history. You can set the experiment for every
pipeline run.
Datasets
A machine learning dataset makes it easy to access and work with your data. Several sample datasets are
included in the designer for you to experiment with. You can register more datasets as you need them.
Component
A component is an algorithm that you can perform on your data. The designer has several components ranging
from data ingress functions to training, scoring, and validation processes.
A component may have a set of parameters that you can use to configure the component's internal algorithms.
When you select a component on the canvas, the component's parameters are displayed in the Properties pane
to the right of the canvas. You can modify the parameters in that pane to tune your model. You can set the
compute resources for individual components in the designer.
For some help navigating through the library of machine learning algorithms available, see Algorithm &
component reference overview. For help with choosing an algorithm, see the Azure Machine Learning
Algorithm Cheat Sheet.
Compute resources
Use compute resources from your workspace to run your pipeline and host your deployed models as online
endpoints or pipeline endpoints (for batch inference). The supported compute targets are:
Azure Machine Learning compute cluster, for training.
Azure Kubernetes Service, for real-time deployment.
Compute targets are attached to your Azure Machine Learning workspace. You manage your compute targets in
your workspace in the Azure Machine Learning studio.
Deploy
To perform real-time inferencing, you must deploy a pipeline as an online endpoint. The online endpoint creates
an interface between an external application and your scoring model. A call to an online endpoint returns
prediction results to the application in real time. To make a call to an online endpoint, you pass the API key that
was created when you deployed the endpoint. The endpoint is based on REST, a popular architecture choice for
web programming projects.
Online endpoints must be deployed to an Azure Kubernetes Service cluster.
To learn how to deploy your model, see Tutorial: Deploy a machine learning model with the designer.
NOTE
Azure Machine Learning Endpoints (preview) provide an improved, simpler deployment experience. Endpoints support
both real-time and batch inference scenarios. Endpoints provide a unified interface to invoke and manage model
deployments across compute types. See What are Azure Machine Learning endpoints (preview)?.
Publish
You can also publish a pipeline to a pipeline endpoint . Similar to an online endpoint, a pipeline endpoint lets
you submit new pipeline runs from external applications using REST calls. However, you cannot send or receive
data in real time using a pipeline endpoint.
Published pipelines are flexible; they can be used to train or retrain models, perform batch inferencing, process
new data, and much more. You can publish multiple pipelines to a single pipeline endpoint and specify which
pipeline version to run.
A published pipeline runs on the compute resources you define in the pipeline draft for each component.
The designer creates the same PublishedPipeline object as the SDK.
Next steps
Learn the fundamentals of predictive analytics and machine learning with Tutorial: Predict automobile price
with the designer
Learn how to modify existing designer samples to adapt them to your needs.
Machine Learning Algorithm Cheat Sheet for Azure
Machine Learning designer
5/25/2022 • 2 minutes to read • Edit Online
The Azure Machine Learning Algorithm Cheat Sheet helps you choose the right algorithm from the
designer for a predictive analytics model.
Azure Machine Learning has a large library of algorithms from the classification , recommender systems ,
clustering , anomaly detection , regression , and text analytics families. Each is designed to address a
different type of machine learning problem.
For more information, see How to select algorithms.
Download and print the Machine Learning Algorithm Cheat Sheet in tabloid size to keep it handy and get help
choosing an algorithm.
Next steps
See more information on How to select algorithms
Learn about studio in Azure Machine Learning and the Azure portal.
Tutorial: Build a prediction model in Azure Machine Learning designer.
Learn about deep learning vs. machine learning.
How to select algorithms for Azure Machine
Learning
5/25/2022 • 7 minutes to read • Edit Online
A common question is “Which machine learning algorithm should I use?” The algorithm you select depends
primarily on two different aspects of your data science scenario:
What do you want to do with your data? Specifically, what is the business question you want to answer
by learning from your past data?
What are the requirements of your data science scenario? Specifically, what is the accuracy,
training time, linearity, number of parameters, and number of features your solution supports?
NOTE
Download the cheat sheet here: Machine Learning Algorithm Cheat Sheet (11x17 in.)
Along with guidance in the Azure Machine Learning Algorithm Cheat Sheet, keep in mind other requirements
when choosing a machine learning algorithm for your solution. Additional factors to consider include accuracy,
training time, linearity, number of parameters, and number of features.
Comparison of machine learning algorithms
Some learning algorithms make particular assumptions about the structure of the data or the desired results. If
you can find one that fits your needs, it can give you more useful results, more accurate predictions, or faster
training times.
The following sections summarize some of the most important characteristics of algorithms from the
classification, regression, and clustering families:
Accuracy
Accuracy in machine learning measures the effectiveness of a model as the proportion of true results to total
cases. In Machine Learning designer, the Evaluate Model component computes a set of industry-standard
evaluation metrics. You can use this component to measure the accuracy of a trained model.
Getting the most accurate answer possible isn’t always necessary. Sometimes an approximation is adequate,
depending on what you want to use it for. If that is the case, you may be able to cut your processing time
dramatically by sticking with more approximate methods. Approximate methods also naturally tend to avoid
overfitting.
There are three ways to use the Evaluate Model component:
Generate scores over your training data in order to evaluate the model
Generate scores on the model, but compare those scores to scores on a reserved testing set
Compare scores for two different but related models, using the same set of data
For a complete list of metrics and approaches you can use to evaluate the accuracy of machine learning models,
see Evaluate Model component.
Training time
In supervised learning, training means using historical data to build a machine learning model that minimizes
errors. The number of minutes or hours necessary to train a model varies a great deal between algorithms.
Training time is often closely tied to accuracy; one typically accompanies the other.
In addition, some algorithms are more sensitive to the number of data points than others. You might choose a
specific algorithm because you have a time limitation, especially when the data set is large.
In Machine Learning designer, creating and using a machine learning model is typically a three-step process:
1. Configure a model, by choosing a particular type of algorithm, and then defining its parameters or
hyperparameters.
2. Provide a dataset that is labeled and has data compatible with the algorithm. Connect both the data and
the model to Train Model component.
3. After training is completed, use the trained model with one of the scoring components to make
predictions on new data.
Linearity
Linearity in statistics and machine learning means that there is a linear relationship between a variable and a
constant in your dataset. For example, linear classification algorithms assume that classes can be separated by a
straight line (or its higher-dimensional analog).
Lots of machine learning algorithms make use of linearity. In Azure Machine Learning designer, they include:
Multiclass logistic regression
Two-class logistic regression
Support vector machines
Linear regression algorithms assume that data trends follow a straight line. This assumption isn't bad for some
problems, but for others it reduces accuracy. Despite their drawbacks, linear algorithms are popular as a first
strategy. They tend to be algorithmically simple and fast to train.
Nonlinear class boundary : Relying on a linear classification algorithm would result in low accuracy.
Data with a nonlinear trend : Using a linear regression method would generate much larger errors than
necessary.
Number of parameters
Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that
affect the algorithm’s behavior, such as error tolerance or number of iterations, or options between variants of
how the algorithm behaves. The training time and accuracy of the algorithm can sometimes be sensitive to
getting just the right settings. Typically, algorithms with large numbers of parameters require the most trial and
error to find a good combination.
Alternatively, there is the Tune Model Hyperparameters component in Machine Learning designer: The goal of
this component is to determine the optimum hyperparameters for a machine learning model. The component
builds and tests multiple models by using different combinations of settings. It compares metrics over all
models to find the best combination of settings.
While this is a great way to make sure you’ve spanned the parameter space, the time required to train a model
increases exponentially with the number of parameters. The upside is that having many parameters typically
indicates that an algorithm has greater flexibility. It can often achieve very good accuracy, provided you can find
the right combination of parameter settings.
Number of features
In machine learning, a feature is a quantifiable variable of the phenomenon you are trying to analyze. For
certain types of data, the number of features can be very large compared to the number of data points. This is
often the case with genetics or textual data.
A large number of features can bog down some learning algorithms, making training time unfeasibly long.
Support vector machines are particularly well suited to scenarios with a high number of features. For this
reason, they have been used in many applications from information retrieval to text and image classification.
Support vector machines can be used for both classification and regression tasks.
Feature selection refers to the process of applying statistical tests to inputs, given a specified output. The goal is
to determine which columns are more predictive of the output. The Filter Based Feature Selection component in
Machine Learning designer provides multiple feature selection algorithms to choose from. The component
includes correlation methods such as Pearson correlation and chi-squared values.
You can also use the Permutation Feature Importance component to compute a set of feature importance scores
for your dataset. You can then leverage these scores to help you determine the best features to use in a model.
Next steps
Learn more about Azure Machine Learning designer
For descriptions of all the machine learning algorithms available in Azure Machine Learning designer, see
Machine Learning designer algorithm and component reference
To explore the relationship between deep learning, machine learning, and AI, see Deep Learning vs. Machine
Learning
What are Azure Machine Learning endpoints?
5/25/2022 • 9 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
Use Azure Machine Learning endpoints to streamline model deployments for both real-time and batch inference
deployments. Endpoints provide a unified interface to invoke and manage model deployments across compute
types.
In this article, you learn about:
Endpoints
Deployments
Managed online endpoints
Kubernetes online endpoints
Batch inference endpoints
TIP
A request can bypass the configured traffic load balancing by including an HTTP header of azureml-model-deployment .
Set the header value to the name of the deployment you want the request to route to.
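For example, a scoring request might pin a specific deployment like this (a sketch; the scoring URI, token, deployment name, and payload are placeholder assumptions):

import requests

headers = {
    "Authorization": f"Bearer {token}",   # assumption: a valid auth token
    "azureml-model-deployment": "blue",   # route this request to the 'blue' deployment
}
response = requests.post(
    scoring_uri,                          # assumption: the endpoint's scoring URI
    json={"data": [[1.0, 2.0, 3.0]]},     # placeholder payload
    headers=headers,
)
print(response.json())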
Traffic to one deployment can also be mirrored (copied) to another deployment. Mirroring is useful when you
want to test for things like response latency or error conditions without impacting live clients. For example, a
blue/green deployment where 100% of the traffic is routed to blue and 10% is mirrored to the green
deployment. With mirroring, the results of the traffic to the green deployment aren't returned to the clients but
metrics and logs are collected. Mirror traffic functionality is a preview feature.
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
You can configure security for inbound scoring requests and outbound communications with the workspace and
other services separately. Inbound communications use the private endpoint of the Azure Machine Learning
workspace. Outbound communications use private endpoints created per deployment.
For more information, see Secure online endpoints.
Recommended users : Managed online endpoints suit users who want a managed model deployment and an
enhanced MLOps experience. Kubernetes online endpoints suit users who prefer Kubernetes and can
self-manage infrastructure requirements.
View costs
Managed online endpoints let you monitor cost at the endpoint and deployment level.
NOTE
Managed online endpoints are based on Azure Machine Learning compute. When using a managed online
endpoint, you pay for the compute and networking charges. There is no additional surcharge.
If you use a virtual network and secure outbound (egress) traffic from the managed online endpoint, there is an
additional cost. For egress, three private endpoints are created per deployment for the managed online endpoint.
These are used to communicate with the default storage account, Azure Container Registry, and workspace.
Additional networking charges may apply. For more information on pricing, see the Azure pricing calculator.
NOTE
If you use an existing V1 FileDataset for a batch endpoint, we recommend migrating it to a V2 data asset and
referring to the data asset directly when invoking batch endpoints. Currently only data assets of type
uri_folder or uri_file are supported. Batch endpoints created with GA CLIv2 (2.4.0 and newer) or GA REST
API (2022-05-01 and newer) will not support V1 Datasets.
You can also extract the URI or path on the datastore from a V1 FileDataset by using the az ml dataset show
command with the --query parameter, and use that information when invoking the batch endpoint.
While batch endpoints created with earlier APIs will continue to support V1 FileDatasets, we will be adding
further V2 data asset support with the latest API versions for even more usability and flexibility. For more
information on V2 data assets, see Work with data using SDK v2 (preview). For more information on the new V2
experience, see What is v2.
For more information on supported input options, see Batch scoring with batch endpoint.
Specify the storage output location to any datastore and path. By default, batch endpoints store their output to
the workspace's default blob store, organized by the Job Name (a system-generated GUID).
Security
Authentication: Azure Active Directory Tokens
SSL: enabled by default for endpoint invocation
VNET support: Batch endpoints support ingress protection. A batch endpoint with ingress protection will
accept scoring requests only from hosts inside a virtual network but not from the public internet. A batch
endpoint that is created in a private-link enabled workspace will have ingress protection. To create a private-
link enabled workspace, see Create a secure workspace.
Next steps
How to deploy online endpoints with the Azure CLI
How to deploy batch endpoints with the Azure CLI
How to use online endpoints with the studio
Deploy models with REST
How to monitor managed online endpoints
How to view managed online endpoint costs
Manage and increase quotas for resources with Azure Machine Learning
ONNX and Azure Machine Learning: Create and
accelerate ML models
5/25/2022 • 4 minutes to read • Edit Online
Learn how using the Open Neural Network Exchange (ONNX) can help optimize the inference of your machine
learning model. Inference, or model scoring, is the phase where the deployed model is used for prediction, most
commonly on production data.
Optimizing machine learning models for inference (or model scoring) is difficult since you need to tune the
model and the inference library to make the most of the hardware capabilities. The problem becomes extremely
hard if you want to get optimal performance on different kinds of platforms (cloud/edge, CPU/GPU, etc.), since
each one has different capabilities and characteristics. The complexity increases if you have models from a
variety of frameworks that need to run on a variety of platforms. It's very time-consuming to optimize all the
different combinations of frameworks and hardware. A solution to train once in your preferred framework and
run anywhere on the cloud or edge is needed. This is where ONNX comes in.
Microsoft and a community of partners created ONNX as an open standard for representing machine learning
models. Models from many frameworks including TensorFlow, PyTorch, SciKit-Learn, Keras, Chainer, MXNet,
MATLAB, and SparkML can be exported or converted to the standard ONNX format. Once the models are in the
ONNX format, they can be run on a variety of platforms and devices.
ONNX Runtime is a high-performance inference engine for deploying ONNX models to production. It's
optimized for both cloud and edge and works on Linux, Windows, and Mac. Written in C++, it also has C,
Python, C#, Java, and JavaScript (Node.js) APIs for usage in a variety of environments. ONNX Runtime supports
both DNN and traditional ML models and integrates with accelerators on different hardware such as TensorRT
on NVidia GPUs, OpenVINO on Intel processors, DirectML on Windows, and more. By using ONNX Runtime, you
can benefit from the extensive production-grade optimizations, testing, and ongoing improvements.
ONNX Runtime is used in high-scale Microsoft services such as Bing, Office, and Azure Cognitive Services.
Performance gains are dependent on a number of factors, but these Microsoft services have seen an average
2x performance gain on CPU . In addition to Azure Machine Learning services, ONNX Runtime also runs in
other products that support Machine Learning workloads, including:
Windows: The runtime is built into Windows as part of Windows Machine Learning and runs on hundreds of
millions of devices.
Azure SQL product family: Run native scoring on data in Azure SQL Edge and Azure SQL Managed Instance.
ML.NET: Run ONNX models in ML.NET.
Get ONNX models
You can obtain ONNX models in several ways:
Train a new ONNX model in Azure Machine Learning (see examples at the bottom of this article) or by using
automated Machine Learning capabilities
Convert existing model from another format to ONNX (see the tutorials)
Get a pre-trained ONNX model from the ONNX Model Zoo
Generate a customized ONNX model from Azure Custom Vision service
Many models including image classification, object detection, and text processing can be represented as ONNX
models. If you run into an issue with a model that cannot be converted successfully, please file an issue in the
GitHub of the respective converter that you used. You can continue using your existing format model until the
issue is addressed.
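As an illustrative sketch, a scikit-learn model might be converted with the skl2onnx converter package; the model object and input shape here are assumptions:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Describe the model's expected input: a float tensor with 4 features
initial_types = [("input", FloatTensorType([None, 4]))]

# sklearn_model is a placeholder for an already-trained scikit-learn estimator
onnx_model = convert_sklearn(sklearn_model, initial_types=initial_types)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())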
Once you have an ONNX model, you can load and score it with ONNX Runtime:

import onnxruntime

session = onnxruntime.InferenceSession("path to model")  # placeholder path to your .onnx file
The documentation accompanying the model usually tells you the inputs and outputs for using the model. You
can also use a visualization tool such as Netron to view the model. ONNX Runtime also lets you query the
model metadata, inputs, and outputs:
session.get_modelmeta()
first_input_name = session.get_inputs()[0].name
first_output_name = session.get_outputs()[0].name
To inference your model, use run and pass in the list of outputs you want returned (leave empty if you want all
of them) and a map of the input values. The result is a list of the outputs.
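A minimal scoring call, assuming input_data is a NumPy array shaped to match the model's input, might look like:

# Returns a list with one entry per requested output
results = session.run([first_output_name], {first_input_name: input_data})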
For the complete Python API reference, see the ONNX Runtime reference docs.
Examples
See how-to-use-azureml/deployment/onnx for example Python notebooks that create and deploy ONNX
models.
Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.
Samples for usage in other languages can be found in the ONNX Runtime GitHub.
More info
Learn more about ONNX or contribute to the project:
ONNX project website
ONNX code on GitHub
Learn more about ONNX Runtime or contribute to the project:
ONNX Runtime project website
ONNX Runtime GitHub Repo
Prebuilt Docker images for inference
5/25/2022 • 2 minutes to read • Edit Online
Prebuilt Docker container images for inference are used when deploying a model with Azure Machine Learning.
The images are prebuilt with popular machine learning frameworks and Python packages. You can also extend
the images to add other packages by using one of the following methods:
Add Python packages.
Use a prebuilt inference image as the base for a new Dockerfile. Using this method, you can install both Python
packages and apt packages.
Prebuilt images are available for the following frameworks. Each image is defined by its framework version,
CPU/GPU support, pre-installed packages, MCR path, and curated environment name:
PyTorch
SciKit-Learn
ONNX Runtime
XGBoost
No framework : version NA, CPU, no pre-installed packages. MCR path:
mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:latest. Curated environment:
AzureML-minimal-ubuntu18.04-py37-cpu-inference.
Next steps
Add Python packages to prebuilt images.
Use a prebuilt package as a base for a new Dockerfile.
MLOps: Model management, deployment, lineage,
and monitoring with Azure Machine Learning
5/25/2022 • 9 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
In this article, learn how to do Machine Learning Operations (MLOps) in Azure Machine Learning to manage
the lifecycle of your models. MLOps improves the quality and consistency of your machine learning solutions.
What is MLOps?
MLOps is based on DevOps principles and practices that increase the efficiency of workflows. Examples include
continuous integration, delivery, and deployment. MLOps applies these principles to the machine learning
process, with the goal of:
Faster experimentation and development of models.
Faster deployment of models into production.
Quality assurance and end-to-end lineage tracking.
TIP
A registered model is a logical container for one or more files that make up your model. For example, if you have a model
that's stored in multiple files, you can register them as a single model in your Machine Learning workspace. After
registration, you can then download or deploy the registered model and receive all the files that were registered.
Registered models are identified by name and version. Each time you register a model with the same name as
an existing one, the registry increments the version. More metadata tags can be provided during registration.
These tags are then used when you search for a model. Machine Learning supports any model that can be
loaded by using Python 3.5.2 or higher.
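A registration sketch with the v1 SDK (the model name, path, and tags here are placeholder assumptions):

from azureml.core import Model

model = Model.register(
    workspace=ws,                     # assumption: an existing Workspace object
    model_name="sklearn-regression",  # placeholder name; re-registering bumps the version
    model_path="outputs/model.pkl",   # placeholder path to the model file or folder
    tags={"area": "regression"}       # metadata tags used later for filtering
)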
TIP
You can also register models trained outside Machine Learning.
You can't delete a registered model that's being used in an active deployment. For more information, see the
"Register model" section of Deploy models.
IMPORTANT
When you use the Filter by Tags option on the Models page of Azure Machine Learning Studio, instead of using
TagName : TagValue, use TagName=TagValue without spaces.
TIP
While some information on models and datasets is automatically captured, you can add more information by using tags.
When you look for registered models and datasets in your workspace, you can use tags as a filter.
Associating a dataset with a registered model is an optional step. For information on how to reference a dataset when
you register a model, see the Model class reference.
Next steps
Learn more by reading and exploring the following resources:
How and where to deploy models with Machine Learning
Tutorial: Train and deploy a model
End-to-end MLOps examples repo
CI/CD of machine learning models with Azure Pipelines
Create clients that consume a deployed model
Machine learning at scale
Azure AI reference architectures and best practices repo
Use open-source machine learning libraries and
platforms with Azure Machine Learning
5/25/2022 • 5 minutes to read • Edit Online
In this article, learn about open-source Python machine learning libraries and platforms you can use with Azure
Machine Learning. Train, deploy, and manage the end-to-end machine learning process using open source
projects you prefer. Use development tools, like Jupyter Notebooks and Visual Studio Code, to leverage your
existing models and scripts in Azure Machine Learning.
Model deployment
Once models are trained and ready for production, you have to choose how to deploy them. Azure Machine
Learning provides various deployment targets. For more information, see the where and how to deploy article.
Standardize model formats with ONNX
After training, the contents of the model such as learned parameters are serialized and saved to a file. Each
framework has its own serialization format. When working with different frameworks and tools, it means you
have to deploy models according to the framework's requirements. To standardize this process, you can use the
Open Neural Network Exchange (ONNX) format. ONNX is an open-source format for artificial intelligence
models. ONNX supports interoperability between frameworks. This means you can train a model in one of the
many popular machine learning frameworks like PyTorch, convert it into ONNX format, and consume the ONNX
model in a different framework like ML.NET.
For more information on ONNX and how to consume ONNX models, see the following articles:
Create and accelerate ML models with ONNX
Use ONNX models in .NET applications
Package and deploy models as containers
Container technologies such as Docker are one way to deploy models as web services. Containers provide a
platform and resource agnostic way to build and orchestrate reproducible software environments. With these
core technologies, you can use preconfigured environments and preconfigured container images, or custom
ones, to deploy your machine learning models to targets such as Kubernetes clusters. For GPU-intensive
workflows, you can use tools like the NVIDIA Triton Inference Server to make predictions using GPUs.
Secure deployments with homomorphic encryption
Securing deployments is an important part of the deployment process. To deploy encrypted inferencing
services, use the encrypted-inference open-source Python library. The encrypted inferencing package provides
bindings based on Microsoft SEAL, a homomorphic encryption library.
Model orchestration (machine learning) : Primary persona: data scientist. Azure offering: Azure Machine
Learning Pipelines. Canonical OSS offering: Kubeflow Pipelines. Pipe: Data -> Model. Strengths: distribution,
caching, code-first, reuse.
Data orchestration (data prep) : Primary persona: data engineer. Azure offering: Azure Data Factory pipelines.
Canonical OSS offering: Apache Airflow. Pipe: Data -> Data. Strengths: strongly typed movement, data-centric
activities.
Code and app orchestration (CI/CD) : Primary persona: app developer / ops. Azure offering: Azure Pipelines.
Canonical OSS offering: Jenkins. Pipe: Code + Model -> App/Service. Strengths: most open and flexible activity
support, approval queues, phases with gating.
Next steps
Azure Machine Learning pipelines are a powerful facility that begins delivering value in the early development
stages.
Define pipelines with the Azure ML CLI v2
Define pipelines with the Azure ML SDK v2
Define pipelines with Designer
Try out CLI v2 pipeline example
Try out Python SDK v2 pipeline example
What is an Azure Machine Learning component?
5/25/2022 • 3 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
An Azure Machine Learning component is a self-contained piece of code that does one step in a machine
learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body.
Components are the building blocks of the Azure Machine Learning pipelines.
A component consists of three parts:
Metadata: name, display_name, version, type, etc.
Interface: input/output specifications (name, type, description, default value, etc.).
Command, Code & Environment: command, code and environment required to run the component.
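As a sketch of how those three parts come together in the v2 SDK, a command component might be declared like this (the folder, script, environment name, and input/output names are assumptions):

from azure.ai.ml import command, Input, Output

# Metadata + interface + command/code/environment in one declaration
train_component = command(
    name="train_model",
    display_name="Train model",
    # Code: a local folder that contains train.py (placeholder)
    code="./src",
    # Command: how the step runs, wired to the interface below
    command="python train.py --data ${{inputs.training_data}} --out ${{outputs.model_output}}",
    # Environment: a curated environment name (assumption)
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    # Interface: typed inputs and outputs
    inputs={"training_data": Input(type="uri_folder")},
    outputs={"model_output": Output(type="uri_folder")},
)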
Next steps
Define component with the Azure ML CLI v2.
Define component with the Azure ML SDK v2.
Define component with Designer.
Component CLI v2 YAML reference.
What is Azure Machine Learning Pipeline?.
Try out CLI v2 component example.
Try out Python SDK v2 component example.
MLflow and Azure Machine Learning
5/25/2022 • 2 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
Azure Machine Learning only uses MLflow Tracking for metric logging and artifact storage for your experiments,
whether you created the experiment via the Azure Machine Learning Python SDK, Azure Machine Learning CLI
or the Azure Machine Learning studio.
NOTE
Unlike the Azure Machine Learning SDK v1, there is no logging functionality in the SDK v2 (preview), and it is
recommended to use MLflow for logging and tracking.
MLflow is an open-source library for managing the lifecycle of your machine learning experiments. MLflow's
tracking URI and logging API, collectively known as MLflow Tracking, form a component of MLflow that logs and
tracks your training run metrics and model artifacts, no matter your experiment's environment: locally on your
computer, on a remote compute target, on a virtual machine, or on an Azure Machine Learning compute instance.
Track experiments
With MLflow Tracking you can connect Azure Machine Learning as the backend of your MLflow experiments. By
doing so, you can
Track and log experiment metrics and artifacts in your Azure Machine Learning workspace. If you already
use MLflow Tracking for your experiments, the workspace provides a centralized, secure, and scalable
location to store training metrics and models. Learn more at Track ML models with MLflow and Azure
Machine Learning CLI v2.
Model management in MLflow or Azure Machine Learning model registry.
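A minimal tracking sketch (assuming ws is an existing v1 Workspace object used only to look up the tracking URI):

import mlflow

# Point MLflow at the Azure Machine Learning workspace
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment("my-experiment")  # placeholder experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)  # placeholder parameter
    mlflow.log_metric("accuracy", 0.92)      # placeholder metric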
You can use MLflow's tracking URI and logging API, collectively known as MLflow Tracking, to submit training
jobs with MLflow Projects and Azure Machine Learning backend support (preview). You can submit jobs locally
with Azure Machine Learning tracking, or migrate your runs to the cloud, for example via Azure Machine
Learning compute.
Learn more at Train ML models with MLflow projects and Azure Machine Learning (preview).
Next steps
Track ML models with MLflow and Azure Machine Learning CLI v2
Convert your custom model to MLflow model format for no code deployments
Deploy MLflow models to an online endpoint
What is responsible AI? (preview)
5/25/2022 • 6 minutes to read • Edit Online
Transparency
When AI systems are used to help inform decisions that have tremendous impacts on people's lives, it's critical
that people understand how those decisions were made. For example, a bank might use an AI system to decide
whether a person is creditworthy, or a company might use an AI system to determine the most qualified
candidates to hire.
A crucial part of transparency is what we refer to as interpretability, or the useful explanation of the behavior of
AI systems and their components. Improving interpretability requires that stakeholders comprehend how and
why AI systems function so that they can identify potential performance issues, safety and privacy concerns,
fairness issues, exclusionary practices, or unintended outcomes.
Transparency in Azure Machine Learning : The Model Interpretability and Counterfactual What-If
components of Azure Machine Learning's Responsible AI dashboard enable data scientists and ML developers
to generate human-understandable descriptions of a model's predictions. The dashboard provides multiple views
into a model's behavior: global explanations (for example, which features affect the overall behavior of a loan
allocation model) and local explanations (for example, why a customer's loan application was approved or
rejected). One can also observe model explanations for a selected cohort as a subgroup of data points. Moreover,
the Counterfactual What-If component makes it possible to understand and debug a machine learning model in
terms of how it reacts to input (feature) changes. Azure Machine Learning also supports a Responsible AI
scorecard, a customizable report that machine learning developers can easily configure, download, and share
with their technical and non-technical stakeholders to educate them about data and model health and
compliance, and to build trust. This scorecard can also be used in audit reviews to inform stakeholders about the
characteristics of machine learning models.
Accountability
The people who design and deploy AI systems must be accountable for how their systems operate.
Organizations should draw upon industry standards to develop accountability norms. These norms can ensure
that AI systems aren't the final authority on any decision that impacts people's lives and that humans maintain
meaningful control over otherwise highly autonomous AI systems.
Accountability in Azure Machine Learning : Azure Machine Learning’s Machine Learning Operations
(MLOps) is based on DevOps principles and practices that increase the efficiency of workflows. It specifically
supports quality assurance and end-to-end lineage tracking to capture the governance data for the end-to-end
ML lifecycle. The logged lineage information can include who is publishing models, why changes were made,
and when models were deployed or used in production.
Azure Machine Learning's Responsible AI scorecard creates accountability by enabling cross-stakeholder
communication and by empowering machine learning developers to easily configure, download, and share their
model health insights with their technical and non-technical stakeholders, educating them about data and
model health and compliance and building trust.
The ML platform also enables decision-making by informing model-driven and data-driven business decisions:
Data-driven insights to further understand heterogeneous treatment effects on an outcome, using historic
data only. For example, "how would a medicine impact a patient's blood pressure?" Such insights are
provided through the Causal Inference component of the Responsible AI dashboard.
Model-driven insights, to answer end users' questions such as "what can I do to get a different outcome
from your AI next time?" so as to inform their actions. Such insights are provided to data scientists through
the Counterfactual What-If component of the Responsible AI dashboard.
Next steps
For more information on how to implement Responsible AI in Azure Machine Learning, see Responsible AI
dashboard.
Learn more about the ABOUT ML set of guidelines for machine learning system documentation.
Assess AI systems and make data-driven decisions
with Azure Machine Learning Responsible AI
dashboard (preview)
5/25/2022 • 10 minutes to read • Edit Online
Responsible AI requires rigorous engineering. Rigorous engineering, however, can be tedious, manual, and time-
consuming without the right tooling and infrastructure. Data scientists need tools to implement responsible AI
in practice effectively and efficiently.
The Responsible AI dashboard provides a single interface that makes responsible machine learning engineering
efficient and interoperable across the larger model development and assessment lifecycle. The tool brings
together several mature Responsible AI tools in the areas of model statistics assessment, data exploration,
machine learning interpretability, unfairness assessment, error analysis, causal inference, and counterfactual
analysis for a holistic assessment and debugging of models and making informed business decisions. With a
single command or simple UI wizard, the dashboard addresses the fragmentation issues of multiple tools and
enables you to:
1. Evaluate and debug your machine learning models by identifying model errors, diagnosing why those errors
are happening, and informing your mitigation steps.
2. Boost your data-driven decision-making abilities by addressing questions such as “what is the minimum
change the end user could apply to their features to get a different outcome from the model?” and/or “what
is the causal effect of reducing red meat consumption on diabetes progression?”
3. Export Responsible AI metadata of your data and models for sharing offline with product and compliance
stakeholders.
| Stage | Component | Description |
| --- | --- | --- |
| Diagnose | Counterfactual Analysis and What-If | The Counterfactual Analysis and What-If component consists of two functionalities for better error diagnosis: generating a set of examples with minimal changes to a given point such that they change the model's prediction (showing the closest datapoints with opposite model predictions), and enabling interactive and custom what-if perturbations for individual data points to understand how the model reacts to feature changes. |
Exploratory data analysis, counterfactual analysis, and causal inference capabilities can assist you in making
informed model-driven and data-driven decisions responsibly.
Below are the components of the Responsible AI dashboard supporting responsible decision making:
Data Explorer
The component could be reused here to understand data distributions and identify over- and
underrepresentation. Data exploration is a critical part of decision making as one can conclude that it
isn't feasible to make informed decisions about a cohort that is underrepresented within data.
Causal Inference
The Causal Inference component estimates how a real-world outcome changes in the presence of an
intervention. It also helps to construct promising interventions by simulating different feature
responses to various interventions and creating rules to determine which population cohorts would
benefit from a particular intervention. Collectively, these functionalities allow you to apply new
policies and effect real-world change.
The capabilities of this component are founded on the EconML package, which estimates heterogeneous
treatment effects from observational data via machine learning.
Counterfactual Analysis
The Counterfactual Analysis component described above could be reused here to help data scientists
generate a set of similar datapoints with opposite prediction outcomes (showing minimum changes
applied to a datapoint's features leading to opposite model predictions). Providing counterfactual
examples to the end users informs their perspective, educating them on how they can take action to get
the desired outcome from the model in the future.
The capabilities of this component are founded on the DiCE package.
| Responsible AI dashboard flow | Use case |
| --- | --- |
| Model Overview -> Error Analysis -> Data Explorer | To identify model errors and diagnose them by understanding the underlying data distribution |
| Model Overview -> Fairness Assessment -> Data Explorer | To identify model fairness issues and diagnose them by understanding the underlying data distribution |
| Model Overview -> Error Analysis -> Counterfactuals Analysis and What-If | To diagnose errors in individual instances with counterfactual analysis (minimum change to lead to a different model prediction) |
| Model Overview -> Data Explorer | To understand the root cause of errors and fairness issues introduced via data imbalances or lack of representation of a particular data cohort |
| Model Overview -> Interpretability | To diagnose model errors through understanding how the model has made its predictions |
| Data Explorer -> Causal Inference | To distinguish between correlations and causations in the data or decide the best treatments to apply to see a positive outcome |
| Interpretability -> Causal Inference | To learn whether the factors the model has used for decision making have any causal effect on the real-world outcome |
| Data Explorer -> Counterfactuals Analysis and What-If | To address customer questions about what they can do next time to get a different outcome from an AI |
Next steps
Learn how to generate the Responsible AI dashboard via CLI v2 and SDK v2 or studio UI.
Learn how to generate a Responsible AI scorecard based on the insights observed in the Responsible AI
dashboard.
Model interpretability (preview)
5/25/2022 • 8 minutes to read • Edit Online
This article describes methods you can use for model interpretability in Azure Machine Learning.
IMPORTANT
With the release of the Responsible AI dashboard, which includes model interpretability, we recommend that users migrate
to the new experience, because the older SDK v1 preview model interpretability dashboard will no longer be actively maintained.
| Interpretability technique | Description | Type |
| --- | --- | --- |
| Mimic Explainer (Global Surrogate) + SHAP tree | Mimic explainer is based on the idea of training global surrogate models to mimic opaque-box models. A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of any opaque-box model as accurately as possible. Data scientists can interpret the surrogate model to draw conclusions about the opaque-box model. The Responsible AI dashboard uses LightGBM (LGBMExplainableModel), paired with the SHAP (SHapley Additive exPlanations) tree explainer, which is a specific explainer for trees and ensembles of trees. The combination of LightGBM and SHAP tree provides model-agnostic global and local explanations of your machine learning models. | Model-agnostic |
| Mimic Explainer (Global Surrogate) | Mimic explainer is based on the idea of training global surrogate models to mimic opaque-box models. A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of any opaque-box model as accurately as possible. Data scientists can interpret the surrogate model to draw conclusions about the opaque-box model. You can use one of the following interpretable models as your surrogate model: LightGBM (LGBMExplainableModel), Linear Regression (LinearExplainableModel), Stochastic Gradient Descent explainable model (SGDExplainableModel), and Decision Tree (DecisionTreeExplainableModel). | Model-agnostic |
Besides the interpretability techniques described above, we support another SHAP-based explainer, called
TabularExplainer . Depending on the model, TabularExplainer uses one of the supported SHAP explainers:
The explanation functions accept both models and pipelines as input. If a model is provided, it must implement
the prediction function predict or predict_proba that conforms to the scikit-learn convention. If your model
doesn't support this, you can wrap it in a function that generates the same outcome as predict or
predict_proba in scikit-learn and use that wrapper function with the selected explainer. If a pipeline is
provided, the explanation function assumes that the running pipeline script returns a prediction. Using this
wrapping technique, azureml.interpret can support models trained via the PyTorch, TensorFlow, and Keras deep
learning frameworks, as well as classic machine learning models.
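As a sketch of that wrapping technique (the custom model's run_inference method is a hypothetical stand-in for whatever inference entry point your model actually exposes):

```python
import numpy as np

class SklearnStyleWrapper:
    """Adapts a model without predict/predict_proba to the scikit-learn
    convention expected by the explanation functions."""

    def __init__(self, custom_model):
        self.custom_model = custom_model

    def predict(self, X):
        # Delegate to the model's own (hypothetical) inference entry point
        # and return a NumPy array, as the scikit-learn convention expects.
        return np.asarray(self.custom_model.run_inference(X))

# wrapped = SklearnStyleWrapper(my_model)
# The wrapped object can then be passed to the selected explainer.
```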
Next steps
See the how-to guide for generating a Responsible AI dashboard with model interpretability via CLI v2 and
SDK v2 or studio UI.
See how to generate a Responsible AI scorecard based on the insights observed in the Responsible AI
dashboard.
See the how-to for enabling interpretability for models training both locally and on Azure Machine Learning
remote compute resources.
Learn how to enable interpretability for automated machine learning models.
See the sample notebooks for additional scenarios.
If you're interested in interpretability for text scenarios, see Interpret-text, an open-source repo related to
Interpret-Community that offers interpretability techniques for NLP. The azureml.interpret package doesn't
currently support these techniques, but you can get started with an example notebook on text classification.
Machine learning fairness (preview)
5/25/2022 • 6 minutes to read • Edit Online
Learn about machine learning fairness and how the Fairlearn open-source Python package can help you assess
and mitigate unfairness issues in machine learning models.
NOTE
Fairness is a socio-technical challenge. Many aspects of fairness, such as justice and due process, are not captured in
quantitative fairness metrics. Also, many quantitative fairness metrics can't all be satisfied simultaneously. The goal with
the Fairlearn open-source package is to enable humans to assess different impact and mitigation strategies. Ultimately, it
is up to the human users building artificial intelligence and machine learning models to make trade-offs that are
appropriate to their scenario.
NOTE
A fairness assessment is not a purely technical exercise. The Fairlearn open-source package can help you assess the
fairness of a model, but it will not perform the assessment for you. The Fairlearn open-source package helps identify
quantitative metrics to assess fairness, but developers must also perform a qualitative analysis to evaluate the fairness of
their own models. The sensitive features noted above are an example of this kind of qualitative analysis.
During the assessment phase, fairness is quantified through disparity metrics. Disparity metrics can evaluate and
compare a model's behavior across different groups, either as ratios or as differences. The Fairlearn open-source
package supports two classes of disparity metrics:
Disparity in model performance: These sets of metrics calculate the disparity (difference) in the values of
the selected performance metric across different subgroups. Some examples include:
disparity in accuracy rate
disparity in error rate
disparity in precision
disparity in recall
disparity in MAE
many others
Disparity in selection rate: This metric contains the difference in selection rate among different
subgroups. An example of this is disparity in loan approval rate. Selection rate means the fraction of
datapoints in each class classified as 1 (in binary classification) or distribution of prediction values (in
regression).
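As a small sketch of computing disparity metrics with Fairlearn's MetricFrame (the labels, predictions, and group memberships below are made up purely for illustration):

```python
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])                 # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])                 # model predictions
group = np.array(["a", "a", "a", "b", "b", "b", "b", "a"])  # sensitive feature

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)      # each metric evaluated per subgroup
print(mf.difference())  # disparity expressed as a difference across groups
print(mf.ratio())       # disparity expressed as a ratio across groups
```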
NOTE
Mitigating unfairness in a model means reducing the unfairness, but this technical mitigation cannot eliminate this
unfairness completely. The unfairness mitigation algorithms in the Fairlearn open-source package can provide suggested
mitigation strategies to help reduce unfairness in a machine learning model, but they are not solutions to eliminate
unfairness completely. There may be other parity constraints or criteria that should be considered for each particular
developer's machine learning model. Developers using Azure Machine Learning must determine for themselves if the
mitigation sufficiently eliminates any unfairness in their intended use and deployment of machine learning models.
The Fairlearn open-source package supports several types of parity constraints, such as demographic parity and
equalized odds, each applicable to particular machine learning tasks.
Mitigation algorithms
The Fairlearn open-source package provides postprocessing and reduction unfairness mitigation algorithms:
Reduction: These algorithms take a standard black-box machine learning estimator (for example, a LightGBM
model) and generate a set of retrained models using a sequence of re-weighted training datasets. For
example, applicants of a certain gender might be up-weighted or down-weighted to retrain models and
reduce disparities across different gender groups. Users can then pick a model that provides the best trade-
off between accuracy (or other performance metric) and disparity, which generally would need to be based
on business rules and cost calculations.
Post-processing: These algorithms take an existing classifier and the sensitive feature as input. Then, they
derive a transformation of the classifier's prediction to enforce the specified fairness constraints. The biggest
advantage of threshold optimization is its simplicity and flexibility as it doesn’t need to retrain the model.
| Algorithm | Description | Machine learning task | Sensitive features | Parity constraints | Algorithm type |
| --- | --- | --- | --- | --- | --- |
| ExponentiatedGradient | Black-box approach to fair classification described in A Reductions Approach to Fair Classification | Binary classification | Categorical | Demographic parity, equalized odds | Reduction |
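As a minimal sketch of the reduction approach with ExponentiatedGradient (the synthetic data here is illustrative only):

```python
import numpy as np
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # features
A = rng.integers(0, 2, size=200)          # binary sensitive feature
y = (X[:, 0] + 0.5 * A + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Retrain a standard estimator under a demographic-parity constraint.
mitigator = ExponentiatedGradient(
    estimator=DecisionTreeClassifier(max_depth=4),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```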
Next steps
Learn how to use the different components by checking out Fairlearn's GitHub repository, user guide,
examples, and sample notebooks.
Learn how to enable fairness assessment of machine learning models in Azure Machine Learning.
See the sample notebooks for additional fairness assessment scenarios in Azure Machine Learning.
Make data-driven policies and influence decision
making (preview)
5/25/2022 • 3 minutes to read • Edit Online
While machine learning models are powerful in identifying patterns in data and making predictions, they offer
little support for estimating how the real-world outcome changes in the presence of an intervention.
Practitioners have become increasingly focused on using historical data to inform their future decisions and
business interventions. For example, how would revenue be affected if a corporation pursues a new pricing
strategy? Would a new medication improve a patient’s condition, all else equal?
The Causal Inference component of the Responsible AI dashboard addresses these questions by estimating the
effect of a feature on an outcome of interest on average, across a population or a cohort and on an individual
level. It also helps to construct promising interventions by simulating different feature responses to various
interventions and creating rules to determine which population cohorts would benefit from a particular
intervention. Collectively, these functionalities allow decision makers to apply new policies and affect real-world
change.
The capabilities of this component are founded on the EconML package, which estimates heterogeneous
treatment effects from observational data via the double machine learning technique.
Use Causal Inference when you need to:
Identify the features that have the most direct effect on your outcome of interest.
Decide what overall treatment policy to take to maximize real-world impact on an outcome of interest.
Understand how individuals with certain feature values would respond to a particular treatment policy.
The causal effects computed based on the treatment features are purely a data property. Hence, a trained
model is optional when computing the causal effects.
Double machine learning is a method for estimating (heterogeneous) treatment effects when all potential
confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected
data and the observed outcome) are observed, but are either too many (high-dimensional) for classical statistical
approaches to be applicable, or their effect on the treatment and outcome can't be satisfactorily modeled by
parametric functions (non-parametric). Both of the latter problems can be addressed via machine learning
techniques (for an example, see Chernozhukov 2016).
The method reduces the problem to first estimating two predictive tasks:
Predicting the outcome from the controls
Predicting the treatment from the controls
Then the method combines these two predictive models in a final stage estimation to create a model of the
heterogeneous treatment effect. The approach allows for arbitrary machine learning algorithms to be used for
the two predictive tasks, while maintaining many favorable statistical properties related to the final model (for
example, small mean squared error, asymptotic normality, construction of confidence intervals).
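As a hedged sketch of this two-stage recipe with EconML's LinearDML (the data is synthetic, with a true treatment effect of 2.0 by construction):

```python
import numpy as np
from econml.dml import LinearDML
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 5))                   # observed controls/confounders
T = W[:, 0] + rng.normal(size=1000)              # treatment depends on the controls
Y = 2.0 * T + W[:, 1] + rng.normal(size=1000)    # outcome; true effect is 2.0

est = LinearDML(
    model_y=RandomForestRegressor(),  # first stage: predict outcome from controls
    model_t=RandomForestRegressor(),  # first stage: predict treatment from controls
)
est.fit(Y, T, X=None, W=W)
print(est.ate())           # estimated average treatment effect, close to 2.0
print(est.ate_interval())  # confidence interval for the estimate
```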
What other tools does Microsoft provide for causal inference?
Project Azua provides a novel framework focusing on end-to-end causal inference. Azua’s technology DECI
(deep end-to-end causal inference) is a single model that can simultaneously do causal discovery and causal
inference. It only requires the user to provide data, and the model can output the causal relationships among all
the different variables. By itself, this can provide insights into the data and enables metrics such as individual
treatment effect (ITE), average treatment effect (ATE) and conditional average treatment effect (CATE) to be
calculated, which can then be used to make optimal decisions. The framework is scalable for large data, both in
terms of the number of variables and the number of data points; it can also handle missing data entries with
mixed statistical types.
EconML (powering the backend of the Responsible AI dashboard) is a Python package that applies the power of
machine learning techniques to estimate individualized causal responses from observational or experimental
data.
data. The suite of estimation methods provided in EconML represents the latest advances in causal machine
learning. By incorporating individual machine learning steps into interpretable causal models, these methods
improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of
users.
DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-
step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as
much as possible. The key feature of DoWhy is its state-of-the-art refutation API that can automatically test
causal assumptions for any estimation method, thus making inference more robust and accessible to non-
experts. DoWhy supports estimation of the average causal effect for backdoor, front-door, instrumental variable
and other identification methods, and estimation of the conditional effect (CATE) through an integration with the
EconML library.
Next steps
Learn how to generate the Responsible AI dashboard via CLI v2 and SDK v2 or studio UI.
Learn how to generate a Responsible AI scorecard based on the insights observed in the Responsible AI
dashboard.
Assess errors in ML models (preview)
5/25/2022 • 2 minutes to read • Edit Online
One of the most apparent challenges with current model debugging practices is using aggregate metrics to
score models on a benchmark. Model accuracy may not be uniform across subgroups of data, and there might
exist input cohorts for which the model fails more often. The direct consequences of these failures are a lack of
reliability and safety, unfairness, and a loss of trust in machine learning altogether.
Error Analysis moves away from aggregate accuracy metrics, exposes the distribution of errors to developers in
a transparent way, and enables them to identify & diagnose errors efficiently.
The Error Analysis component of the Responsible AI dashboard provides machine learning practitioners with a
deeper understanding of model failure distribution and assists them with quickly identifying erroneous cohorts
of data. It contributes to the “identify” stage of the model lifecycle workflow through a decision tree that reveals
cohorts with high error rates and a heatmap that visualizes how a few input features impact the error rate
across cohorts. Discrepancies in error might occur when the system underperforms for specific demographic
groups or infrequently observed input cohorts in the training data.
The capabilities of this component are founded on the Error Analysis open-source capabilities for generating
model error profiles.
Use Error Analysis when you need to:
Gain a deep understanding of how model failures are distributed across a given dataset and across several
input and feature dimensions.
Break down the aggregate performance metrics to automatically discover erroneous cohorts and take
targeted mitigation steps.
Error tree
Often, error patterns may be complex and involve more than one or two features. Therefore, it may be difficult
for developers to explore all possible combinations of features to discover hidden data pockets with critical
failure. To alleviate the burden, the binary tree visualization automatically partitions the benchmark data into
interpretable subgroups, which have unexpectedly high or low error rates. In other words, the tree uses the
input features to maximally separate model error from success. For each node defining a data subgroup, users
can investigate the following information:
Error rate: the portion of instances in the node for which the model is incorrect. This is shown through the
intensity of the red color.
Error coverage: the portion of all errors that fall into the node. This is shown through the fill rate of the node.
Data representation: the number of instances in the node. This is shown through the thickness of the incoming
edge to the node, along with the actual total number of instances in the node.
Error Heatmap
The view slices the data based on a one- or two-dimensional grid of input features. Users can choose the input
features of interest for analysis. The heatmap visualizes cells with higher error with a darker red color to bring
the user's attention to regions with high error discrepancy. This is especially beneficial when the error themes
differ across partitions, which happens frequently in practice. In this error identification view, the
analysis is highly guided by the users and their knowledge or hypotheses of what features might be most
important for understanding failure.
Next steps
Learn how to generate the Responsible AI dashboard via CLI v2 and SDK v2 or studio UI.
Learn how to generate a Responsible AI scorecard based on the insights observed in the Responsible AI
dashboard.
Understand your datasets (preview)
5/25/2022 • 2 minutes to read • Edit Online
Machine learning models "learn" from historical decisions and actions captured in training data. As a result, their
performance in real-world scenarios is heavily influenced by the data they're trained on. When the feature
distribution in a dataset is skewed, a model can incorrectly predict datapoints belonging to an
underrepresented group or be optimized along an inappropriate metric. For example, while training a
housing price prediction model, the training set represented 75% newer houses priced below the median. As a
result, the model was much less successful at identifying more expensive historic houses. The fix was to add
older and expensive houses to the training data and augment the features to include insights about the historic
value of the house. Upon incorporating that data augmentation, results improved.
The Data Explorer component of the Responsible AI dashboard helps visualize datasets based on predicted and
actual outcomes, error groups, and specific features. This enables you to identify issues of over- and
underrepresentation and to see how data is clustered in the dataset. Data visualizations consist of aggregate
plots or individual datapoints.
Next steps
Learn how to generate the Responsible AI dashboard via CLIv2 and SDKv2 or studio UI
Learn how to generate a Responsible AI scorecard based on the insights observed in the Responsible AI
dashboard.
Counterfactuals analysis and what-if (preview)
5/25/2022 • 2 minutes to read • Edit Online
What-if counterfactuals address the question of "what would the model predict if the action input were
changed?" They enable understanding and debugging of a machine learning model in terms of how it reacts to
input (feature) changes. Compared with approximating a machine learning model or ranking features by their
predictive importance (which standard interpretability techniques do), counterfactual analysis "interrogates" a
model to determine what changes to a particular datapoint would flip the model decision. Such an analysis helps
in disentangling the impact of different correlated features in isolation, and in acquiring a more nuanced
understanding of how much of a feature change is needed to see a model decision flip for classification models
and a decision change for regression models.
The Counterfactual Analysis and what-if component of the Responsible AI dashboard consists of two
functionalities:
Generating a set of examples with minimal changes to a given point such that they change the model's
prediction (showing the closest datapoints with opposite model predictions).
Enabling users to generate their own what-if perturbations to understand how the model reacts to feature
changes.
The capabilities of this component are founded on the DiCE package, which implements counterfactual
explanations by showing feature-perturbed versions of the same datapoint that would have received a different
model prediction (for example, Taylor would have received the loan if their income was higher by $10,000). The
counterfactual analysis component enables you to identify which features to vary and their permissible ranges
for valid and logical counterfactual examples.
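As a sketch with the dice-ml package (the loan-style dataset and model here are synthetic and purely illustrative):

```python
import dice_ml
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, 500),
    "age": rng.integers(21, 70, 500).astype(float),
})
df["approved"] = (df["income"] > 55_000).astype(int)

model = RandomForestClassifier().fit(df[["income", "age"]], df["approved"])

data = dice_ml.Data(dataframe=df, continuous_features=["income", "age"],
                    outcome_name="approved")
wrapped = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, wrapped)

# Ask for three minimal feature changes that flip the prediction for one applicant.
query = df[["income", "age"]].iloc[[0]]
counterfactuals = explainer.generate_counterfactuals(
    query, total_CFs=3, desired_class="opposite"
)
counterfactuals.visualize_as_dataframe()
```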
Use What-If Counterfactuals when you need to:
Examine fairness and reliability criteria as a decision evaluator (by perturbing sensitive attributes such as
gender, ethnicity, etc., and observing whether model predictions change).
Debug specific input instances in depth.
Provide solutions to end users and determine what they can do to get a desirable outcome from the model
next time.
Differential privacy in Azure Machine Learning (preview)
In traditional scenarios, raw data is stored in files and databases. When users analyze data, they typically use the
raw data. This is a concern because it might infringe on an individual's privacy. Differential privacy tries to deal
with this problem by adding "noise" or randomness to the data so that users can't identify any individual data
points. At the least, such a system provides plausible deniability. Therefore, the privacy of individuals is
preserved with limited impact on the accuracy of the data.
In differentially private systems, data is shared through requests called queries. When a user submits a query
for data, operations known as privacy mechanisms add noise to the requested data. Privacy mechanisms
return an approximation of the data instead of the raw data. This privacy-preserving result appears in a report.
Reports consist of two parts: the actual data computed and a description of how the data was created.
Reliability of data
Although the preservation of privacy should be the goal, there’s a tradeoff when it comes to usability and
reliability of the data. In data analytics, accuracy can be thought of as a measure of uncertainty introduced by
sampling errors. This uncertainty tends to fall within certain bounds. Accuracy from a differential privacy
perspective instead measures the reliability of the data, which is affected by the uncertainty introduced by the
privacy mechanisms. In short, a higher level of noise or privacy translates to data that has a lower epsilon,
accuracy, and reliability.
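To make the noise/epsilon tradeoff concrete, here's a toy illustration of a privacy mechanism (a didactic sketch, not the SmartNoise API): a count query answered with Laplace noise, where the noise scale is the query's sensitivity divided by epsilon.

```python
import numpy as np

def private_count(records, epsilon):
    """Differentially private count: a count query has sensitivity 1,
    so Laplace noise with scale 1/epsilon masks any individual's presence."""
    return len(records) + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

records = range(1000)
print(private_count(records, epsilon=0.1))   # low epsilon: more noise, more privacy
print(private_count(records, epsilon=10.0))  # high epsilon: less noise, closer to 1000
```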
| Component | Description |
| --- | --- |
| Validator | A Rust library that contains a set of tools for checking and deriving the necessary conditions for an analysis to be differentially private. |
SmartNoise SDK
The system library provides the following tools and services for working with tabular and relational data:
| Component | Description |
| --- | --- |
| Data Access | Library that intercepts and processes SQL queries and produces reports. This library is implemented in Python and supports the following ODBC and DBAPI data sources: PostgreSQL, SQL Server, Spark, Presto, and Pandas. |
Next steps
Learn more about differential privacy in machine learning:
How to build a differentially private system in Azure Machine Learning.
To learn more about the components of SmartNoise, check out the GitHub repositories for SmartNoise
Core, SmartNoise SDK, and SmartNoise samples.
What is an Azure Machine Learning workspace?
5/25/2022 • 6 minutes to read • Edit Online
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with
all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all training
runs, including logs, metrics, output, and a snapshot of your scripts. You use this information to determine which
training run produces the best model.
Once you have a model you like, you register it with the workspace. You then use the registered model and
scoring scripts to deploy to Azure Container Instances, Azure Kubernetes Service, or to a field-programmable
gate array (FPGA) as a REST-based HTTP endpoint.
Taxonomy
A taxonomy of the workspace spans its associated resources, sub resources, and the assets you create, which are
described in the sections that follow. You can interact with your workspace in the following ways:
On the web:
Azure Machine Learning studio
Azure Machine Learning designer
In any Python environment with the Azure Machine Learning SDK for Python.
On the command line using the Azure Machine Learning CLI extension
Azure Machine Learning VS Code Extension
Workspace management
You can also perform the following workspace management tasks:
| Workspace management task | Portal | Studio | Python SDK | Azure CLI | VS Code |
| --- | --- | --- | --- | --- | --- |
| Create a workspace | ✓ | | ✓ | ✓ | ✓ |
| Manage workspace access | ✓ | | | ✓ | |
| Create and manage compute resources | | ✓ | ✓ | ✓ | ✓ |
| Create a Notebook VM | | ✓ | | | |
WARNING
Moving your Azure Machine Learning workspace to a different subscription, or moving the owning subscription to a new
tenant, is not supported. Doing so may cause errors.
Create a workspace
There are multiple ways to create a workspace:
Use the Azure portal for a point-and-click interface to walk you through each step.
Use the Azure Machine Learning SDK for Python to create a workspace on the fly from Python scripts or
Jupyter notebooks
Use an Azure Resource Manager template or the Azure Machine Learning CLI when you need to automate or
customize the creation with corporate security standards.
If you work in Visual Studio Code, use the VS Code extension.
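For example, a minimal sketch of workspace creation with the SDK v1 (the subscription, resource group, and region values are placeholders):

```python
from azureml.core import Workspace

ws = Workspace.create(
    name="my-workspace",
    subscription_id="<subscription-id>",
    resource_group="my-resource-group",
    create_resource_group=True,  # create the resource group if it doesn't exist
    location="eastus2",
)
ws.write_config()  # save config.json so later scripts can call Workspace.from_config()
```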
NOTE
The workspace name is case-insensitive.
Sub resources
These sub resources are the main resources created in the Azure Machine Learning workspace.
VMs: provide computing power for your AML workspace and are an integral part in deploying and training
models.
Load Balancer: a network load balancer is created for each compute instance and compute cluster to manage
traffic even while the compute instance/cluster is stopped.
Virtual Network: these help Azure resources communicate with one another, the internet, and other on-
premises networks.
Bandwidth: encapsulates all outbound data transfers across regions.
Associated resources
When you create a new workspace, it automatically creates several Azure resources that are used by the
workspace:
Azure Storage account: Is used as the default datastore for the workspace. Jupyter notebooks that are
used with your Azure Machine Learning compute instances are stored here as well.
IMPORTANT
By default, the storage account is a general-purpose v1 account. You can upgrade this to general-purpose v2 after
the workspace has been created. Do not enable hierarchical namespace on the storage account after upgrading to
general-purpose v2.
To use an existing Azure Storage account, it cannot be of type BlobStorage or a premium account
(Premium_LRS and Premium_GRS). It also cannot have a hierarchical namespace (used with Azure Data
Lake Storage Gen2). Neither premium storage nor hierarchical namespaces are supported with the
default storage account of the workspace. You can use premium storage or hierarchical namespace with
non-default storage accounts.
Azure Container Registry: Registers docker containers that are used for the following components:
Azure Machine Learning environments when training and deploying models
AutoML when deploying
Data profiling
To minimize costs, ACR is lazy-loaded until images are needed.
NOTE
If your subscription setting requires adding tags to resources under it, Azure Container Registry (ACR) created by
Azure Machine Learning will fail, since we cannot set tags to ACR.
Azure Application Insights: Stores monitoring and diagnostics information. For more information, see
Monitor and collect data from Machine Learning web service endpoints.
NOTE
You can delete the Application Insights instance after cluster creation if you want. Deleting it limits the information
gathered from the workspace, and may make it more difficult to troubleshoot problems. If you delete the
Application Insights instance created by the workspace, you cannot re-create it without deleting and recreating
the workspace.
Azure Key Vault: Stores secrets that are used by compute targets and other sensitive information that's
needed by the workspace.
NOTE
You can instead use existing Azure resource instances when you create the workspace with the Python SDK or the Azure
Machine Learning CLI using an ARM template.
Next steps
To learn more about planning a workspace for your organization's requirements, see Organize and set up Azure
Machine Learning.
To get started with Azure Machine Learning, see:
What is Azure Machine Learning?
Create and manage a workspace
Tutorial: Get started with Azure Machine Learning
Tutorial: Create your first classification model with automated machine learning
Tutorial: Predict automobile price with the designer
What are Azure Machine Learning environments?
5/25/2022 • 6 minutes to read • Edit Online
Azure Machine Learning environments are an encapsulation of the environment where your machine learning
training happens. They specify the Python packages, environment variables, and software settings around your
training and scoring scripts. They also specify run times (Python, Spark, or Docker). The environments are
managed and versioned entities within your Machine Learning workspace that enable reproducible, auditable,
and portable machine learning workflows across a variety of compute targets.
You can use an Environment object on your local compute to:
Develop your training script.
Reuse the same environment on Azure Machine Learning Compute for model training at scale.
Deploy your model with that same environment.
Revisit the environment in which an existing model was trained.
The following diagram illustrates how you can use a single Environment object in both your run configuration
(for training) and your inference and deployment configuration (for web service deployments).
The environment, compute target and training script together form the run configuration: the full specification
of a training run.
Types of environments
Environments can broadly be divided into three categories: curated, user-managed, and system-managed.
Curated environments are provided by Azure Machine Learning and are available in your workspace by default.
Intended to be used as is, they contain collections of Python packages and settings to help you get started with
various machine learning frameworks. These pre-created environments also allow for faster deployment time.
For a full list, see the curated environments article.
In user-managed environments, you're responsible for setting up your environment and installing every
package that your training script needs on the compute target. Also be sure to include any dependencies needed
for model deployment.
You use system-managed environments when you want conda to manage the Python environment for you. A
new conda environment is materialized from your conda specification on top of a base docker image.
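As a brief sketch of a system-managed environment with the SDK v1 (the package choices are illustrative, and an existing Workspace object ws is assumed):

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# System-managed environment: conda materializes this specification
# on top of a base Docker image.
env = Environment(name="my-training-env")
env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=["scikit-learn"],
    pip_packages=["azureml-defaults"],
)
env.register(workspace=ws)  # version the environment for reuse across runs
```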
To determine whether to reuse a cached image or build a new one, AzureML computes a hash value from the
environment definition and compares it to the hashes of existing environments. The hash is based on the
environment definition's:
Base image
Custom docker steps
Python packages
Spark packages
The hash isn't affected by the environment name or version. If you rename your environment or create a new
one with the same settings and packages as another environment, then the hash value remains the same.
However, environment definition changes, like adding or removing a Python package or changing a package
version, cause the hash value to change. Changing the order of dependencies or channels in an environment will
also change the hash and require a new image build. Similarly, any change to a curated environment will result
in the creation of a new "non-curated" environment.
NOTE
You will not be able to submit any local changes to a curated environment without changing the name of the
environment. The prefixes "AzureML-" and "Microsoft" are reserved exclusively for curated environments, and your job
submission will fail if the name starts with either of them.
The environment's computed hash value is compared with those in the Workspace and global ACR, or on the
compute target (local runs only). If there is a match then the cached image is pulled and used, otherwise an
image build is triggered.
The following diagram shows three environment definitions. Two of them have different names and versions
but identical base images and Python packages, which results in the same hash and corresponding cached
image. The third environment has different Python packages and versions, leading to a different hash and
cached image.
Actual cached images in your workspace ACR will have names like
azureml/azureml_e9607b2514b066c851012848913ba19f with the hash appearing at the end.
IMPORTANT
If you create an environment with an unpinned package dependency (for example, numpy), the environment uses
the package version that was available when the environment was created. Any future environment that uses a
matching definition will use the original version.
To update the package, specify a version number to force an image rebuild, for example changing numpy to
numpy==1.18.1. New dependencies, including nested ones, will be installed, and they might break a previously
working scenario.
Using an unpinned base image like mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04 in your
environment definition results in rebuilding the image every time the latest tag is updated. This helps the
image receive the latest patches and system updates.
WARNING
The Environment.build method will rebuild the cached image, with the possible side-effect of updating unpinned
packages and breaking reproducibility for all environment definitions corresponding to that cached image.
Image patching
Microsoft is responsible for patching the base images for known security vulnerabilities. Updates for supported
images are released every two weeks, with a commitment of no unpatched vulnerabilities older than 30 days in
the latest version of the image. Patched images are released with a new immutable tag, and the :latest tag is
updated to the latest version of the patched image.
If you provide your own images, you are responsible for updating them.
For more information on the base images, see the following links:
Azure Machine Learning base images GitHub repository.
Train a model using a custom image.
Deploy a TensorFlow model using a custom container
Next steps
Learn how to create and use environments in Azure Machine Learning.
See the Python SDK reference documentation for the environment class.
What is an Azure Machine Learning compute
instance?
5/25/2022 • 7 minutes to read • Edit Online
An Azure Machine Learning compute instance is a managed cloud-based workstation for data scientists.
Compute instances make it easy to get started with Azure Machine Learning development as well as provide
management and enterprise readiness capabilities for IT administrators.
Use a compute instance as your fully configured and managed development environment in the cloud for
machine learning. They can also be used as a compute target for training and inferencing for development and
testing purposes.
For compute instance Jupyter functionality to work, ensure that web socket communication is not disabled.
Please ensure your network allows websocket connections to *.instances.azureml.net and
*.instances.azureml.ms.
IMPORTANT
Items marked (preview) in this article are currently in public preview. The preview version is provided without a service
level agreement, and it's not recommended for production workloads. Certain features might not be supported or might
have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
| Key benefits | Description |
| --- | --- |
| Managed & secure | Reduce your security footprint and add compliance with enterprise security requirements. Compute instances provide robust management policies and secure networking configurations. |
| Preconfigured for ML | Save time on setup tasks with pre-configured and up-to-date ML packages, deep learning frameworks, and GPU drivers. |
| Fully customizable | Broad support for Azure VM types, including GPUs, and persisted low-level customization such as installing packages and drivers makes advanced scenarios a breeze. You can also use setup scripts to automate customization. |
| General tools & environments | Details |
| --- | --- |
| Drivers | CUDA, cuDNN, NVIDIA, Blob FUSE |
| Azure CLI | |
| Docker | |
| Nginx | |
| NCCL 2.0 | |
| Protobuf | |
| R kernel | |
| Anaconda Python | |
| Azure Machine Learning SDK for Python (from PyPI) | Includes most of the azureml extra packages. To see the full list, open a terminal window on your compute instance and run conda list -n azureml_py36 azureml* |
Python packages are all installed in the Python 3.8 - AzureML environment. Compute instance has Ubuntu
18.04 as the base OS.
Accessing files
Notebooks and R scripts are stored in the default storage account of your workspace in Azure file share. These
files are located under your “User files” directory. This storage makes it easy to share notebooks between
compute instances. The storage account also keeps your notebooks safely preserved when you stop or delete a
compute instance.
The Azure file share account of your workspace is mounted as a drive on the compute instance. This drive is the
default working directory for Jupyter, Jupyter Labs, and RStudio. This means that the notebooks and other files
you create in Jupyter, JupyterLab, or RStudio are automatically stored on the file share and available to use in
other compute instances as well.
The files in the file share are accessible from all compute instances in the same workspace. Any changes to these
files on the compute instance will be reliably persisted back to the file share.
You can also clone the latest Azure Machine Learning samples to your folder under the user files directory in the
workspace file share.
Writing small files can be slower on network drives than writing to the compute instance local disk itself. If you
are writing many small files, try using a directory directly on the compute instance, such as a /tmp directory.
Note these files will not be accessible from other compute instances.
Do not store training data on the notebooks file share. You can use the /tmp directory on the compute instance
for your temporary data. However, do not write very large files of data on the OS disk of the compute instance.
OS disk on compute instance has 128 GB capacity. You can also store temporary training data on temporary
disk mounted on /mnt. Temporary disk size is configurable based on the VM size chosen and can store larger
amounts of data if a higher size VM is chosen. You can also mount datastores and datasets. Any software
packages you install are saved on the OS disk of the compute instance. Note that customer-managed key
encryption is currently not supported for the OS disk; the OS disk for compute instance is encrypted with
Microsoft-managed keys.
Create
As an administrator, you can create a compute instance for others in the workspace (preview) .
You can also use a setup script (preview) for an automated way to customize and configure the compute
instance.
To create a compute instance for yourself, use your workspace in Azure Machine Learning studio, create a new
compute instance from either the Compute section or in the Notebooks section when you are ready to run
one of your notebooks.
You can also create an instance
Directly from the integrated notebooks experience
In Azure portal
From Azure Resource Manager template. For an example template, see the create an Azure Machine Learning
compute instance template.
With Azure Machine Learning SDK
From the CLI extension for Azure Machine Learning
The dedicated cores per region per VM family quota and total regional quota, which applies to compute instance
creation, is unified and shared with Azure Machine Learning training compute cluster quota. Stopping the
compute instance does not release quota to ensure you will be able to restart the compute instance. Please do
not stop the compute instance through the OS terminal by doing a sudo shutdown.
Compute instance comes with P10 OS disk. Temp disk type depends on the VM size chosen. Currently, it is not
possible to change the OS disk type.
Compute target
Compute instances can be used as a training compute target similar to Azure Machine Learning compute
training clusters.
A compute instance:
Has a job queue.
Runs jobs securely in a virtual network environment, without requiring enterprises to open up an SSH port.
The job executes in a containerized environment and packages your model dependencies in a Docker container.
Can run multiple small jobs in parallel (preview). One job per core can run in parallel while the rest of the
jobs are queued.
Supports single-node multi-GPU distributed training jobs
You can use compute instance as a local inferencing deployment target for test/debug scenarios.
TIP
The compute instance has 120GB OS disk. If you run out of disk space and get into an unusable state, please clear at
least 5 GB disk space on OS disk (mounted on /) through the compute instance terminal by removing files/folders and
then do sudo reboot . To access the terminal go to compute list page or compute instance details page and click on
Terminal link. You can check available disk space by running df -h on the terminal. Clear at least 5 GB space before
doing sudo reboot . Please do not stop or restart the compute instance through the Studio until 5 GB disk space has
been cleared.
Next steps
Create and manage a compute instance
Tutorial: Train your first ML model shows how to use a compute instance with an integrated notebook.
What are compute targets in Azure Machine
Learning?
5/25/2022 • 8 minutes to read • Edit Online
A compute target is a designated compute resource or environment where you run your training script or host
your service deployment. This location might be your local machine or a cloud-based compute resource. Using
compute targets makes it easy for you to later change your compute environment without having to change
your code.
In a typical model development lifecycle, you might:
1. Start by developing and experimenting on a small amount of data. At this stage, use your local environment,
such as a local computer or cloud-based virtual machine (VM), as your compute target.
2. Scale up to larger data, or do distributed training by using one of these training compute targets.
3. After your model is ready, deploy it to a web hosting environment with one of these deployment compute
targets.
The compute resources you use for your compute targets are attached to a workspace. Compute resources
other than the local machine are shared by users of the workspace.
Training compute targets vary in their support for automated machine learning, machine learning pipelines, and
the Azure Machine Learning designer.
TIP
The compute instance has 120GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before
you stop or restart the compute instance.
| Compute target | Used for | GPU support | FPGA support | Description |
| --- | --- | --- | --- | --- |
| Azure Kubernetes Service (AKS) | Real-time inference. Recommended for production workloads. | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. Supported in the designer. |
NOTE
Although compute targets like local, and Azure Machine Learning compute clusters support GPU for training and
experimentation, using GPU for inference when deployed as a web service is supported only on AKS.
Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning
compute.
When choosing a cluster SKU, first scale up and then scale out. Start with a machine that has 150% of the RAM your
model requires, profile the result and find a machine that has the performance you need. Once you've learned that,
increase the number of machines to fit your need for concurrent inference.
NOTE
Container instances are suitable only for small models less than 1 GB in size.
Use single-node AKS clusters for dev/test of larger models.
Compute clusters and compute instances differ in several capabilities, such as autoscaling:
NOTE
When a compute cluster is idle, it autoscales to 0 nodes, so you don't pay when it's not in use. A compute instance is
always on and doesn't autoscale. You should stop the compute instance when you aren't using it to avoid extra cost.
While Azure Machine Learning supports these VM series, they might not be available in all Azure regions. To
check whether VM series are available, see Products available by region.
NOTE
Azure Machine Learning doesn't support all VM sizes that Azure Compute supports. To list the available VM sizes, use
one of the following methods:
REST API
Python SDK
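For example, a sketch with the SDK v1 (an existing Workspace object ws is assumed, and the dictionary keys shown are typical of the returned entries but should be verified against your SDK version):

```python
from azureml.core.compute import AmlCompute

# List VM sizes available for Azure Machine Learning compute
# in the workspace's region.
for vm in AmlCompute.supported_vmsizes(workspace=ws):
    print(vm["name"], vm["vCPUs"], vm["memoryGB"], vm["gpus"])
```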
If you're using GPU-enabled compute targets, it's important to ensure that the correct CUDA drivers are installed
in the training environment and that the CUDA version is compatible with your hardware. Also ensure that the
CUDA version is compatible with the version of the machine learning framework you're using:
For PyTorch, you can check the compatibility by visiting Pytorch's previous versions page.
For Tensorflow, you can check the compatibility by visiting Tensorflow's build from source page.
Compute isolation
Azure Machine Learning compute offers VM sizes that are isolated to a specific hardware type and dedicated to
a single customer. Isolated VM sizes are best suited for workloads that require a high degree of isolation from
other customers' workloads for reasons that include meeting compliance and regulatory requirements. Utilizing
an isolated size guarantees that your VM will be the only one running on that specific server instance.
The current isolated VM offerings include:
Standard_M128ms
Standard_F72s_v2
Standard_NC24s_v3
Standard_NC24rs_v3*
*RDMA capable
To learn more about isolation, see Isolation in the Azure public cloud.
Unmanaged compute
An unmanaged compute target is not managed by Azure Machine Learning. You create this type of compute
target outside Azure Machine Learning and then attach it to your workspace. Unmanaged compute resources
can require additional steps for you to maintain or to improve performance for machine learning workloads.
Azure Machine Learning supports the following unmanaged compute types:
Your local computer
Remote virtual machines
Azure HDInsight
Azure Batch
Azure Databricks
Azure Data Lake Analytics
Azure Container Instance
Azure Kubernetes Service & Azure Arc-enabled Kubernetes (preview)
For more information, see Set up compute targets for model training and deployment.
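For example, attaching an existing AKS cluster with the Python SDK v1 might look like the following sketch; the resource group and cluster names are hypothetical:

from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()

# Attach an AKS cluster that was created outside Azure Machine Learning.
attach_config = AksCompute.attach_configuration(resource_group="my-rg",        # hypothetical
                                                cluster_name="my-aks-cluster") # hypothetical
aks_target = ComputeTarget.attach(ws, "my-attached-aks", attach_config)
aks_target.wait_for_completion(show_output=True)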
Next steps
Learn how to:
Use a compute target to train your model
Deploy your model to a compute target
Plan to manage costs for Azure Machine Learning
5/25/2022 • 7 minutes to read • Edit Online
This article describes how to plan and manage costs for Azure Machine Learning. First, you use the Azure pricing
calculator to help plan for costs before you add any resources. Next, as you add the Azure resources, review the
estimated costs.
After you've started using Azure Machine Learning resources, use the cost management features to set budgets
and monitor costs. Also review forecasted costs and spending trends to identify areas where you might want to
act.
Understand that the costs for Azure Machine Learning are only a portion of the monthly costs in your Azure bill.
If you are using other Azure services, you're billed for all the Azure services and resources used in your Azure
subscription, including the third-party services. This article explains how to plan for and manage costs for Azure
Machine Learning. After you're familiar with managing costs for Azure Machine Learning, apply similar methods
to manage costs for all the Azure services used in your subscription.
For more information on optimizing costs, see how to manage and optimize cost in Azure Machine Learning.
Prerequisites
Cost analysis in Cost Management supports most Azure account types, but not all of them. To view the full list of
supported account types, see Understand Cost Management data.
To view cost data, you need at least read access for an Azure account. For information about assigning access to
Azure Cost Management data, see Assign access to data.
When you no longer need a workspace, you can delete it along with its dependent resources by using the SDK v1:

# Deletes the workspace and its dependent resources (storage, key vault, ACR, App Insights).
ws.delete(delete_dependent_resources=True)

If you create Azure Kubernetes Service (AKS) in your workspace, or if you attach any compute resources to your
workspace, you must delete them separately in the Azure portal.
Using Azure Prepayment credit with Azure Machine Learning
You can pay for Azure Machine Learning charges with your Azure Prepayment credit. However, you can't use
Azure Prepayment credit to pay for charges for third party products and services including those from the Azure
Marketplace.
Monitor costs
As you use Azure resources with Azure Machine Learning, you incur costs. Azure resource usage unit costs vary
by time intervals (seconds, minutes, hours, and days) or by unit usage (bytes, megabytes, and so on.) As soon as
Azure Machine Learning use starts, costs are incurred and you can see the costs in cost analysis.
When you use cost analysis, you view Azure Machine Learning costs in graphs and tables for different time
intervals. Some examples are by day, current and prior month, and year. You also view costs against budgets and
forecasted costs. Switching to longer views over time can help you identify spending trends. And you see where
overspending might have occurred. If you've created budgets, you can also easily see where they're exceeded.
To view Azure Machine Learning costs in cost analysis:
1. Sign in to the Azure portal.
2. Open the scope in the Azure portal and select Cost analysis in the menu. For example, go to
Subscriptions , select a subscription from the list, and then select Cost analysis in the menu. Select Scope
to switch to a different scope in cost analysis.
3. By default, costs for services are shown in the first donut chart. Select the area in the chart labeled Azure
Machine Learning.
Actual monthly costs are shown when you initially open cost analysis. Here's an example showing all monthly
usage costs.
To narrow costs for a single service, like Azure Machine Learning, select Add filter and then select Service
name. Then, select virtual machines.
Here's an example showing costs for just Azure Machine Learning.
In the preceding example, you see the current cost for the service. Costs by Azure regions (locations) and Azure
Machine Learning costs by resource group are also shown. From here, you can explore costs on your own.
Create budgets
You can create budgets to manage costs and create alerts that automatically notify stakeholders of spending
anomalies and overspending risks. Alerts are based on spending compared to budget and cost thresholds.
Budgets and alerts are created for Azure subscriptions and resource groups, so they're useful as part of an
overall cost monitoring strategy.
Budgets can be created with filters for specific resources or services in Azure if you want more granularity in
your monitoring. Filters help ensure that you don't accidentally create new resources that cost you additional
money. For more about the filter options available when you create a budget, see Group and filter options.
Other ways to manage and reduce costs for Azure Machine Learning
Use the following tips to help you manage and optimize your compute resource costs.
Configure your training clusters for autoscaling
Set quotas on your subscription and workspaces
Set termination policies on your training run
Use low-priority virtual machines (VM)
Schedule compute instances to shut down and start up automatically
Use an Azure Reserved VM Instance
Train locally
Parallelize training
Set data retention and deletion policies
Deploy resources to the same region
Delete instances and clusters if you do not plan on using them in the near future.
For more information, see manage and optimize costs in Azure Machine Learning.
Next steps
Manage and optimize costs in Azure Machine Learning.
Manage budgets, costs, and quota for Azure Machine Learning at organizational scale
Learn how to optimize your cloud investment with Azure Cost Management.
Learn more about managing costs with cost analysis.
Learn about how to prevent unexpected costs.
Take the Cost Management guided learning course.
Azure Machine Learning datastores
5/25/2022 • 2 minutes to read • Edit Online
Storage URIs use identity-based access, which will prompt you for your Azure Active Directory token for data
access authentication. This approach allows for data access management at the storage level and keeps
credentials confidential.
NOTE
When using Notebooks in Azure Machine Learning Studio, your Azure Active Directory token is automatically passed
through to storage for data access authentication.
Although storage URIs provide a convenient mechanism to access data, there may be cases where using an
Azure Machine Learning Datastore is a better option:
You need credential-based data access (for example: Service Principals, SAS Tokens, Account
Name/Key). Datastores are helpful because they keep the connection information to your data storage
securely in an Azure Key Vault, so you don't have to code it in your scripts.
You want team members to easily discover relevant datastores. Datastores are registered to an
Azure Machine Learning workspace, which makes them easier for your team members to find and discover.
Register and create a datastore to easily connect to your storage account, and access the data in your underlying
storage service.
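For example, registering an Azure Blob container as a datastore with the Python SDK v1 might look like the following sketch; the datastore, container, and account names are hypothetical:

from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register a blob container as a datastore. The credentials are saved to the
# workspace's Key Vault, so scripts can reference the datastore by name.
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="my_blob_datastore",   # hypothetical name
    container_name="my-container",        # hypothetical container
    account_name="mystorageaccount",      # hypothetical account
    account_key="<storage-account-key>")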
Next steps
How to create a datastore
Data in Azure Machine Learning
5/25/2022 • 2 minutes to read • Edit Online
Azure Machine Learning makes it easy to connect to your data in the cloud. It provides an abstraction layer over
the underlying storage service, so you can securely access and work with your data without having to write code
specific to your storage type. Azure Machine Learning also provides the following data capabilities:
Interoperability with Pandas and Spark DataFrames
Versioning and tracking of data lineage
Data labeling (V1 only for now)
You can bring data to Azure Machine Learning:
Directly from your local machine and URLs.
From a cloud-based storage service in Azure, accessed by using your Azure storage account credentials and an
Azure Machine Learning datastore.
SUPPORTED STORAGE SERVICE | CREDENTIAL-BASED AUTHENTICATION | IDENTITY-BASED AUTHENTICATION

mltable | Defines tabular data for use in automated ML and parallel jobs. Schema and subsetting transforms.
In the following example, a uri_folder is expected because, to read the file in, the training script creates a
path that joins the folder with the file name. If you want to pass in just an individual file rather than the
entire folder, you can use the uri_file type.
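As a sketch of that pattern, a training script that receives a uri_folder input might join the folder path with a file name (the argument and file names here are hypothetical):

import argparse
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_folder", type=str)  # receives the uri_folder input
args = parser.parse_args()

# Join the mounted folder with the expected file name to read the data.
df = pd.read_csv(os.path.join(args.input_folder, "my_data.csv"))  # hypothetical file
print(df.head())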
Human data is data collected directly from, or about, people. Human data may include personal data such as
names, age, images, or voice clips and sensitive data such as genetic data, biometric data, gender identity,
religious beliefs, or political affiliations.
Collecting this data can be important to building AI systems that work for all users. But certain practices should
be avoided, especially ones that can cause physical and psychological harm to data contributors.
The best practices in this article will help you conduct manual data collection projects from volunteers where
everyone involved is treated with respect, and potential harms—especially those faced by vulnerable groups—
are anticipated and mitigated. This means that:
People contributing data are not coerced or exploited in any way, and they have control over what personal
data is collected.
People collecting and labeling data have adequate training.
These practices can also help ensure more-balanced and higher-quality datasets and better stewardship of
human data.
These are emerging practices, and we are continually learning. The best practices below are a starting point as
you begin your own responsible human data collections. These best practices are provided for informational
purposes only and should not be treated as legal advice. All human data collections should undergo specific
privacy and legal reviews.
Communicate expectations clearly in the Statement of Work (SOW) (contracts or agreements) with
suppliers.
A contract that lacks requirements for responsible data collection work may result in low-quality or poorly
collected data.
NOTE
This article focuses on recommendations for human data, including personal data and sensitive data such as biometric
data, health data, racial or ethnic data, data collected manually from the general public or company employees, as well as
metadata relating to human characteristics, such as age, ancestry, and gender identity, that may be created via annotation
or labeling.
In some parts of the world, there are laws that criminalize specific gender categories, so it may be dangerous for
data contributors to answer this question honestly. Always give people a way to opt out. And work with regional
experts and attorneys to conduct a careful review of the laws and cultural norms of each place where you plan
to collect data, and if needed, avoid asking this question entirely.
Download the full guidance here.
Next steps
For more information on how to work with your data:
Secure data access in Azure Machine Learning
Data ingestion options for Azure Machine Learning workflows
Optimize data processing with Azure Machine Learning
Use differential privacy in Azure Machine Learning
Follow these how-to guides to work with your data after you've collected it:
Set up image labeling
Label images and text
Enterprise security and governance for Azure
Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
In this article, you'll learn about security and governance features available for Azure Machine Learning. These
features are useful for administrators, DevOps engineers, and MLOps engineers who want to create a secure
configuration that's compliant with your company's policies. With Azure Machine Learning and the Azure platform, you can:
Restrict access to resources and operations by user account or groups
Restrict incoming and outgoing network communications
Encrypt data in transit and at rest
Scan for vulnerabilities
Apply and audit configuration policies
Each workspace has an associated system-assigned managed identity that has the same name as the workspace.
This managed identity is used to securely access resources used by the workspace. It has the following Azure
RBAC permissions on associated resources:
Workspace: Contributor
The system-assigned managed identity is used for internal service-to-service authentication between Azure
Machine Learning and other Azure resources. The identity token is not accessible to users and cannot be used by
them to gain access to these resources. Users can only access the resources through Azure Machine Learning
control and data plane APIs, if they have sufficient RBAC permissions.
The managed identity needs Contributor permissions on the resource group containing the workspace in order
to provision the associated resources, and to deploy Azure Container Instances for web service endpoints.
We don't recommend that admins revoke the access of the managed identity to the resources mentioned in the
preceding table. You can restore access by using the resync keys operation.
NOTE
If your Azure Machine Learning workspace has compute targets (compute cluster, compute instance, Azure Kubernetes
Service, etc.) that were created before May 14th, 2021 , you may also have an additional Azure Active Directory
account. The account name starts with Microsoft-AzureML-Support-App- and has contributor-level access to your
subscription for every workspace region.
If your workspace does not have an Azure Kubernetes Service (AKS) attached, you can safely delete this Azure AD
account.
If your workspace has attached AKS clusters, and they were created before May 14th, 2021, do not delete this Azure
AD account . In this scenario, you must first delete and recreate the AKS cluster before you can delete the Azure AD
account.
You can provision the workspace to use user-assigned managed identity, and grant the managed identity
additional roles, for example to access your own Azure Container Registry for base Docker images. For more
information, see Use managed identities for access control.
You can also configure managed identities for use with Azure Machine Learning compute cluster. This managed
identity is independent of workspace managed identity. With a compute cluster, the managed identity is used to
access resources such as secured datastores that the user running the training job may not have access to. For
more information, see Identity-based data access to storage services on Azure.
TIP
There are some exceptions to the use of Azure AD and Azure RBAC within Azure Machine Learning:
You can optionally enable SSH access to compute resources such as Azure Machine Learning compute instance and
compute cluster. SSH access is based on public/private key pairs, not Azure AD. SSH access is not governed by Azure
RBAC.
You can authenticate to models deployed as web services (inference endpoints) using key- or token-based
authentication. Keys are static strings, while tokens are retrieved using an Azure AD security object. For more
information, see Configure authentication for models deployed as a web service.
Data encryption
Azure Machine Learning uses a variety of compute resources and data stores on the Azure platform. To learn
more about how each of these supports data encryption at rest and in transit, see Data encryption with Azure
Machine Learning.
When deploying models as web services, you can enable transport-layer security (TLS) to encrypt data in transit.
For more information, see Configure a secure web service.
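For example, with the Python SDK v1 you can request a Microsoft-generated TLS certificate when provisioning an AKS cluster. This is a minimal sketch that assumes an existing workspace object ws; the compute name and leaf domain label are hypothetical:

from azureml.core.compute import AksCompute, ComputeTarget

# Provision an AKS cluster with TLS enabled, using a certificate generated for
# an Azure-provided domain derived from the leaf domain label.
prov_config = AksCompute.provisioning_configuration()
prov_config.enable_ssl(leaf_domain_label="myendpoint")  # hypothetical label

aks_target = ComputeTarget.create(workspace=ws,
                                  name="my-aks",
                                  provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)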
Vulnerability scanning
Microsoft Defender for Cloud provides unified security management and advanced threat protection across
hybrid cloud workloads. For Azure machine learning, you should enable scanning of your Azure Container
Registry resource and Azure Kubernetes Service resources. For more information, see Azure Container Registry
image scanning by Defender for Cloud and Azure Kubernetes Services integration with Defender for Cloud.
Next steps
Azure Machine Learning best practices for enterprise security
Secure Azure Machine Learning web services with TLS
Consume a Machine Learning model deployed as a web service
Use Azure Machine Learning with Azure Firewall
Use Azure Machine Learning with Azure Virtual Network
Data encryption at rest and in transit
Build a real-time recommendation API on Azure
Network traffic flow when using a secured
workspace
5/25/2022 • 10 minutes to read • Edit Online
When your Azure Machine Learning workspace and associated resources are secured in an Azure Virtual
Network, it changes the network traffic between resources. Without a virtual network, network traffic flows over
the public internet or within an Azure data center. Once a virtual network (VNet) is introduced, you may also
want to harden network security; for example, by blocking inbound and outbound communications between the
VNet and the public internet. However, Azure Machine Learning requires access to some resources on the public
internet. For example, Azure Resource Management is used for deployments and management operations.
This article lists the required traffic to/from the public internet. It also explains how network traffic flows
between your client development environment and a secured Azure Machine Learning workspace in the
following scenarios:
Using Azure Machine Learning studio to work with:
Your workspace
AutoML
Designer
Datasets and datastores
TIP
Azure Machine Learning studio is a web-based UI that runs partially in your web browser, and makes calls to
Azure services to perform tasks such as training a model, using designer, or viewing datasets. Some of these calls
use a different communication flow than if you are using the SDK, CLI, REST API, or VS Code.
Using Azure Machine Learning studio , SDK , CLI , or REST API to work with:
Compute instances and clusters
Azure Kubernetes Service
Docker images managed by Azure Machine Learning
TIP
If a scenario or task is not listed here, it should work the same with or without a secured workspace.
Assumptions
This article assumes the following configuration:
Azure Machine Learning workspace using a private endpoint to communicate with the VNet.
The Azure Storage Account, Key Vault, and Container Registry used by the workspace also use a private
endpoint to communicate with the VNet.
A VPN gateway or Express Route is used by the client workstations to access the VNet.
SCENARIO | REQUIRED INBOUND | REQUIRED OUTBOUND | ADDITIONAL CONFIGURATION
Use compute instance and compute cluster | Azure Machine Learning service on port 44224; Azure Batch Management service on ports 29876-29877 | Azure Active Directory; Azure Resource Manager; Azure Machine Learning service; Azure Storage Account; Azure Key Vault | If you use a firewall, create user-defined routes. For more information, see Configure inbound and outbound traffic.
Use Azure Kubernetes Service | NA | For information on the outbound configuration for AKS, see How to deploy to Azure Kubernetes Service. | Configure the Internal Load Balancer. For more information, see How to deploy to Azure Kubernetes Service.
When accessing your workspace from studio, the network traffic flows are as follows:
To authenticate to resources, Azure Active Directory is used.
For management and deployment operations, Azure Resource Manager is used.
For Azure Machine Learning specific tasks, the Azure Machine Learning service is used.
For access to Azure Machine Learning studio (https://ml.azure.com), Azure FrontDoor is used.
For most storage operations, traffic flows through the private endpoint of the default storage for your
workspace. Exceptions are discussed in the Use AutoML, designer, dataset, and datastore section.
You also need to configure a DNS solution that allows you to resolve the names of the resources within the
VNet. For more information, see Use your workspace with a custom DNS.
Scenario: Use AutoML, designer, dataset, and datastore from studio
The following features of Azure Machine Learning studio use data profiling:
Dataset: Explore the dataset from studio.
Designer: Visualize module output data.
AutoML: View a data preview/profile and choose a target column.
Labeling
Data profiling depends on the Azure Machine Learning managed service being able to access the default Azure
Storage Account for your workspace. The managed service doesn't exist in your VNet, so it can't directly access
the storage account in the VNet. Instead, the workspace uses a service principal to access storage.
TIP
You can provide a service principal when creating the workspace. If you do not, one is created for you and will have the
same name as your workspace.
To allow access to the storage account, configure the storage account to allow a resource instance for your
workspace or select Allow Azure services on the trusted services list to access this storage account.
This setting allows the managed service to access storage through the Azure data center network.
Next, add the service principal for the workspace to the Reader role on the private endpoint of the storage
account. This role is used to verify the workspace and storage subnet information. If they're the same, access is
allowed. Finally, the service principal also requires Blob data contributor access to the storage account.
For more information, see the Azure Storage Account section of How to secure a workspace in a virtual network.
Scenario: Use compute instance and compute cluster
Azure Machine Learning compute instance and compute cluster are managed services hosted by Microsoft.
They're built on top of the Azure Batch service. While they exist in a Microsoft managed environment, they're
also injected into your VNet.
When you create a compute instance or compute cluster, the following resources are also created in your VNet:
A Network Security Group with the required rules. These rules allow inbound access from the
Azure Machine Learning service (TCP on port 44224) and the Azure Batch service (TCP on ports 29876-29877).
IMPORTANT
If you use a firewall to block internet access into the VNet, you must configure the firewall to allow this traffic. For
example, with Azure Firewall you can create user-defined routes. For more information, see How to use Azure
Machine Learning with a firewall.
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Inbound communication with the scoring URL of the online endpoint can be secured using the
public_network_access flag on the endpoint. Setting the flag to disabled restricts the online endpoint to
receiving traffic only from the virtual network. For secure inbound communications, the Azure Machine
Learning workspace's private endpoint is used.
Outbound communication from a deployment can be secured on a per-deployment basis by using the
egress_public_network_access flag. Outbound communication in this case is from the deployment to Azure
Container Registry, storage blob, and workspace. Setting the flag to disabled restricts communication with these
resources to the virtual network.
NOTE
For secure outbound communication, a private endpoint is created for each deployment where
egress_public_network_access is set to disabled .
Visibility of the endpoint is also governed by the public_network_access flag of the Azure Machine Learning
workspace. If this flag is disabled, then the scoring endpoints can only be accessed from virtual networks that
contain a private endpoint for the workspace. If it's enabled, then the scoring endpoint can be accessed from
the virtual network and public networks.
Supported configurations
CONFIGURATION | INBOUND (ENDPOINT PROPERTY) | OUTBOUND (DEPLOYMENT PROPERTY) | SUPPORTED?
NOTE
The Azure Kubernetes Service load balancer is not the same as the load balancer created by Azure Machine Learning. If
you want to host your model as a secured application, only available on the VNet, use the internal load balancer created
by Azure Machine Learning. If you want to allow public access, use the public load balancer created by Azure Machine
Learning.
If your model requires extra inbound or outbound connectivity, such as to an external data source, use a network
security group or your firewall to allow the traffic.
TIP
If your Azure Container Registry is secured in the VNet, it cannot be used by Azure Machine Learning to build Docker
images. Instead, you must designate an Azure Machine Learning compute cluster to build images. For more information,
see How to secure a workspace in a virtual network.
Next steps
Now that you've learned how network traffic flows in a secured configuration, learn more about securing Azure
ML in a virtual network by reading the Virtual network isolation and privacy overview article.
For information on best practices, see the Azure Machine Learning best practices for enterprise security article.
Azure Policy Regulatory Compliance controls for
Azure Machine Learning
5/25/2022 • 4 minutes to read • Edit Online
Regulatory Compliance in Azure Policy provides Microsoft-created and -managed initiative definitions, known as
built-ins, for the compliance domains and security controls related to different compliance standards. This
page lists the compliance domains and security controls for Azure Machine Learning. You can assign the
built-ins for a security control individually to help make your Azure resources compliant with the specific
standard.
The title of each built-in policy definition links to the policy definition in the Azure portal. Use the link in the
Policy Version column to view the source on the Azure Policy GitHub repo.
IMPORTANT
Each control below is associated with one or more Azure Policy definitions. These policies may help you assess compliance
with the control; however, there often is not a one-to-one or complete match between a control and one or more policies.
As such, Compliant in Azure Policy refers only to the policies themselves; this doesn't ensure you're fully compliant with
all requirements of a control. In addition, the compliance standard includes controls that aren't addressed by any Azure
Policy definitions at this time. Therefore, compliance in Azure Policy is only a partial view of your overall compliance status.
The associations between controls and Azure Policy Regulatory Compliance definitions for these compliance standards
may change over time.
DOMAIN | CONTROL ID | CONTROL TITLE | POLICY (AZURE PORTAL) | POLICY VERSION (GITHUB)
FedRAMP High
To review how the available Azure Policy built-ins for all Azure services map to this compliance standard, see
Azure Policy Regulatory Compliance - FedRAMP High. For more information about this compliance standard,
see FedRAMP High.
DOMAIN | CONTROL ID | CONTROL TITLE | POLICY (AZURE PORTAL) | POLICY VERSION (GITHUB)
FedRAMP Moderate
To review how the available Azure Policy built-ins for all Azure services map to this compliance standard, see
Azure Policy Regulatory Compliance - FedRAMP Moderate. For more information about this compliance
standard, see FedRAMP Moderate.
DOMAIN | CONTROL ID | CONTROL TITLE | POLICY (AZURE PORTAL) | POLICY VERSION (GITHUB)
Next steps
Learn more about Azure Policy Regulatory Compliance.
See the built-ins on the Azure Policy GitHub repo.
Data encryption with Azure Machine Learning
5/25/2022 • 7 minutes to read • Edit Online
Azure Machine Learning uses a variety of Azure data storage services and compute resources when training
models and performing inference. Each of these services has its own story for how it provides encryption for data
at rest and in transit. In this article, learn about each one and which is best for your scenario.
IMPORTANT
For production-grade encryption during training, Microsoft recommends using Azure Machine Learning compute cluster.
For production-grade encryption during inference, Microsoft recommends using Azure Kubernetes Service.
Azure Machine Learning compute instance is a dev/test environment. When using it, we recommend that you store your
files, such as notebooks and scripts, in a file share. Your data should be stored in a datastore.
Encryption at rest
Azure Machine Learning relies on multiple Azure Services, each of which have their own encryption capabilities.
Azure Blob storage
Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account (default storage
account) that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure
Blob storage is encrypted at rest with Microsoft-managed keys.
For information on how to use your own keys for data stored in Azure Blob storage, see Azure Storage
encryption with customer-managed keys in Azure Key Vault.
Training data is typically also stored in Azure Blob storage so that it's accessible to training compute targets. This
storage isn't managed by Azure Machine Learning but mounted to compute targets as a remote file system.
If you need to rotate or revoke your key, you can do so at any time. When rotating a key, the storage account
starts using the new key (latest version) to encrypt data at rest. When revoking (disabling) a key, the storage
account fails requests. It usually takes an hour for the rotation or revocation to take effect.
For information on regenerating the access keys, see Regenerate storage access keys.
Azure Cosmos DB
Azure Machine Learning stores metadata in an Azure Cosmos DB instance. This instance is associated with a
Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is
encrypted at rest with Microsoft-managed keys.
When using your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, a Microsoft
managed Azure Cosmos DB instance is created in your subscription. This instance is created in a Microsoft-
managed resource group, which is different than the resource group for your workspace. For more information,
see Customer-managed keys.
Azure Container Registry
All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically
encrypts an image before storing it and decrypts it when Azure Machine Learning pulls the image.
To use your own (customer-managed) keys to encrypt your Azure Container Registry, you need to create your
own ACR and attach it while provisioning the workspace or encrypt the default instance that gets created at the
time of workspace provisioning.
IMPORTANT
Azure Machine Learning requires the admin account be enabled on your Azure Container Registry. By default, this setting
is disabled when you create a container registry. For information on enabling the admin account, see Admin account.
Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure
Machine Learning workspace.
For an example of creating a workspace using an existing Azure Container Registry, see the following articles:
Create a workspace for Azure Machine Learning with Azure CLI.
Create a workspace with Python SDK.
Use an Azure Resource Manager template to create a workspace for Azure Machine Learning
Azure Container Instance
You may encrypt a deployed Azure Container Instance (ACI) resource using customer-managed keys. The
customer-managed key used for ACI can be stored in the Azure Key Vault for your workspace. For information
on generating a key, see Encrypt data with a customer-managed key.
To use the key when deploying a model to Azure Container Instance, create a new deployment configuration
using AciWebservice.deploy_configuration() . Provide the key information using the following parameters:
cmk_vault_base_url : The URL of the key vault that contains the key.
cmk_key_name : The name of the key.
cmk_key_version : The version of the key.
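A minimal sketch of such a deployment configuration with the SDK v1, using hypothetical key vault and key names:

from azureml.core.webservice import AciWebservice

# Deployment configuration that encrypts the ACI resource with a
# customer-managed key stored in your key vault.
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    cmk_vault_base_url="https://mykeyvault.vault.azure.net/",  # hypothetical vault URL
    cmk_key_name="my-aci-key",                                 # hypothetical key name
    cmk_key_version="<key-version>")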
For more information on creating and using a deployment configuration, see the following articles:
AciWebservice.deploy_configuration() reference
Where and how to deploy
For more information on using a customer-managed key with ACI, see Encrypt data with a customer-managed
key.
Azure Kubernetes Service
You may encrypt a deployed Azure Kubernetes Service resource using customer-managed keys at any time. For
more information, see Bring your own keys with Azure Kubernetes Service.
This process allows you to encrypt both the data and the OS disks of the deployed virtual machines in the
Kubernetes cluster.
IMPORTANT
This process only works with AKS K8s version 1.17 or higher. Azure Machine Learning added support for AKS 1.17 on Jan
13, 2020.
Encryption in transit
Azure Machine Learning uses TLS to secure internal communication between various Azure Machine Learning
microservices. All Azure Storage access also occurs over a secure channel.
To secure external calls made to the scoring endpoint, Azure Machine Learning uses TLS. For more information,
see Use TLS to secure a web service through Azure Machine Learning.
Next steps
Connect to Azure storage
Get data from a datastore
Connect to data
Train with datasets
Customer-managed keys.
Customer-managed keys for Azure Machine
Learning
5/25/2022 • 6 minutes to read • Edit Online
Azure Machine Learning is built on top of multiple Azure services. While the data is stored securely using
encryption keys that Microsoft provides, you can enhance security by also providing your own (customer-
managed) keys. The keys you provide are stored securely using Azure Key Vault.
Customer-managed keys are used with the following services that Azure Machine Learning relies on:
SERVICE | WHAT IT'S USED FOR
Azure Cognitive Search | Stores workspace metadata for Azure Machine Learning
Azure Storage Account | Stores workspace metadata for Azure Machine Learning
TIP
Azure Cosmos DB, Cognitive Search, and Storage Account are secured using the same key. You can use a different key
for Azure Kubernetes Service and Container Instance.
To use a customer-managed key with Azure Cosmos DB, Cognitive Search, and Storage Account, the key is provided
when you create your workspace. The key(s) used with Azure Container Instance and Kubernetes Service are provided
when configuring those resources.
In addition to customer-managed keys, Azure Machine Learning also provides a hbi_workspace flag. Enabling
this flag reduces the amount of data Microsoft collects for diagnostic purposes and enables extra encryption in
Microsoft-managed environments. This flag also enables the following behaviors:
Starts encrypting the local scratch disk in your Azure Machine Learning compute cluster, provided you
haven't created any previous clusters in that subscription. Otherwise, you need to raise a support ticket to enable
encryption of the scratch disk of your compute clusters.
Cleans up your local scratch disk between runs.
Securely passes credentials for your storage account, container registry, and SSH account from the execution
layer to your compute clusters using your key vault.
TIP
The hbi_workspace flag does not impact encryption in transit, only encryption at rest.
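For example, the flag can be set when creating a workspace with the Python SDK v1. This is a sketch with hypothetical names:

from azureml.core import Workspace

# The hbi_workspace flag can only be set at workspace creation time.
ws = Workspace.create(name="my-hbi-workspace",        # hypothetical name
                      subscription_id="your-sub-id",
                      resource_group="your-rg-name",
                      hbi_workspace=True)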
Prerequisites
An Azure subscription.
An Azure Key Vault instance. The key vault contains the key(s) used to encrypt your services.
The key vault instance must enable soft delete and purge protection.
The managed identity for the services secured by a customer-managed key must have the
following permissions in key vault:
wrap key
unwrap key
get
For example, the managed identity for Azure Cosmos DB would need to have those permissions to
the key vault.
Limitations
The customer-managed key for resources the workspace depends on can’t be updated after workspace
creation.
Resources managed by Microsoft in your subscription can’t transfer ownership to you.
You can't delete Microsoft-managed resources used for customer-managed keys without also deleting your
workspace.
SERVICE | HOW IT'S USED
Azure Cognitive Search | Stores indices that are used to help query your machine learning content.
Azure Storage Account | Stores other metadata such as Azure Machine Learning pipelines data.
Your Azure Machine Learning workspace reads and writes data using its managed identity. This identity is
granted access to the resources using a role assignment (Azure role-based access control) on the data resources.
The encryption key you provide is used to encrypt data that is stored on Microsoft-managed resources. It's also
used to create indices for Azure Cognitive Search, which are created at runtime.
Customer-managed keys
When you don't use a customer-managed key, Microsoft creates and manages these resources in a
Microsoft-owned Azure subscription and uses a Microsoft-managed key to encrypt the data.
When you use a customer-managed key, these resources are in your Azure subscription and encrypted with
your key. While they exist in your subscription, these resources are managed by Microsoft. They're
automatically created and configured when you create your Azure Machine Learning workspace.
IMPORTANT
When using a customer-managed key, the costs for your subscription will be higher because these resources are in your
subscription. To estimate the cost, use the Azure pricing calculator.
These Microsoft-managed resources are located in a new Azure resource group that's created in your subscription.
This group is in addition to the resource group for your workspace. This resource group will contain the
Microsoft-managed resources that your key is used with. The resource group will be named using the formula
<Azure Machine Learning workspace resource group name><GUID> .
TIP
The Request Units for the Azure Cosmos DB automatically scale as needed.
If your Azure Machine Learning workspace uses a private endpoint, this resource group will also contain a Microsoft-
managed Azure Virtual Network. This VNet is used to secure communications between the managed services and the
workspace. You cannot provide your own VNet for use with the Microsoft-managed resources. You also
cannot modify the virtual network. For example, you cannot change the IP address range that it uses.
IMPORTANT
If your subscription does not have enough quota for these services, a failure will occur.
WARNING
Don't delete the resource group that contains this Azure Cosmos DB instance, or any of the resources automatically
created in this group. If you need to delete the resource group or Microsoft-managed services in it, you must delete the
Azure Machine Learning workspace that uses it. The resource group resources are deleted when the associated workspace
is deleted.
COMPUTE | ENCRYPTION
Azure Machine Learning compute instance | Local scratch disk is encrypted if the hbi_workspace flag is enabled for the workspace.
Azure Machine Learning compute cluster | OS disk encrypted in Azure Storage with Microsoft-managed keys. Temporary disk is encrypted if the hbi_workspace flag is enabled for the workspace.
Compute cluster: The OS disk for each compute node stored in Azure Storage is encrypted with Microsoft-
managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are
typically scaled down when no runs are queued. The underlying virtual machine is de-provisioned, and the OS
disk is deleted. Azure Disk Encryption isn't supported for the OS disk.
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to
stage training data. If the workspace was created with the hbi_workspace parameter set to TRUE , the temporary
disk is encrypted. This environment is short-lived (only during your run) and encryption support is limited to
system-managed keys only.
Compute instance: The OS disk for compute instance is encrypted with Microsoft-managed keys in Azure
Machine Learning storage accounts. If the workspace was created with the hbi_workspace parameter set to
TRUE, the local temporary disk on compute instance is encrypted with Microsoft-managed keys. Customer-managed
key encryption isn't supported for the OS and temp disks.
hbi_workspace flag
The hbi_workspace flag can only be set when a workspace is created. It can’t be changed for an existing
workspace.
When this flag is set to True, it may increase the difficulty of troubleshooting issues because less telemetry
data is sent to Microsoft. There’s less visibility into success rates or problem types. Microsoft may not be able
to react as proactively when this flag is True.
To enable the hbi_workspace flag when creating an Azure Machine Learning workspace, follow the steps in one
of the following articles:
How to create and manage a workspace.
How to create and manage a workspace using the Azure CLI.
How to create a workspace using Hashicorp Terraform.
How to create a workspace using Azure Resource Manager templates.
Next Steps
How to configure customer-managed keys with Azure Machine Learning.
Vulnerability management for Azure Machine
Learning
5/25/2022 • 7 minutes to read • Edit Online
Vulnerability management involves detecting, assessing, mitigating, and reporting on any security
vulnerabilities that exist in an organization’s systems and software. Vulnerability management is a shared
responsibility between you and Microsoft.
In this article, we discuss these responsibilities and outline the vulnerability management controls provided by
Azure Machine Learning. You'll learn how to keep your service instance and applications up to date with the
latest security updates, and how to minimize the window of opportunity for attackers.
Microsoft-managed VM images
Azure Machine Learning manages host OS VM images for Azure ML compute instance, Azure ML compute
clusters, and Data Science Virtual Machines. The update frequency is monthly and includes the following:
For each new VM image version, the latest updates are sourced from the original publisher of the OS. Using
the latest updates ensures that all OS-related patches that are applicable are picked. For Azure Machine
Learning, the publisher is Canonical for all the Ubuntu 18 images. These images are used for Azure Machine
Learning compute instances, compute clusters, and Data Science Virtual Machines.
VM images are updated monthly.
In addition to patches applied by the original publisher, Azure Machine Learning updates system packages
when updates are available.
Azure Machine Learning checks and validates any machine learning packages that may require an upgrade.
In most circumstances, new VM images contain the latest package versions.
All VM images are built on secure subscriptions that run vulnerability scanning regularly. Any unaddressed
vulnerabilities are flagged and are to be fixed within the next release.
The frequency is on a monthly interval for most images. For compute instance, the image release is aligned
with the Azure ML SDK release cadence as it comes preinstalled in the environment.
In addition to the regular release cadence, hotfixes are applied if vulnerabilities are discovered. Hotfixes get
rolled out within 72 hours for Azure ML compute clusters and within a week for compute instance.
NOTE
The host OS is not the OS version you might specify for an environment when training or deploying a model.
Environments run inside Docker. Docker runs on the host OS.
Compute clusters
Compute clusters automatically upgrade to the latest VM image. If the cluster is configured with min nodes = 0,
it automatically upgrades nodes to the latest VM image version when all jobs are completed and the cluster
reduces to zero nodes.
There are conditions in which cluster nodes don't scale down and, as a result, are unable to get the latest
VM images:
Cluster minimum node count may be set to a value greater than 0.
Jobs may be scheduled continuously on your cluster.
It's your responsibility to scale non-idle cluster nodes down to get the latest OS VM image updates; Azure
Machine Learning doesn't abort running workloads on compute nodes to issue VM updates. Temporarily change
the minimum node count to zero and allow the cluster to reduce to zero nodes.
Managed online endpoints
Managed Online Endpoints automatically receive OS host image updates that include vulnerability fixes. The
update frequency of images is at least once a month.
Compute nodes are automatically upgraded to the latest VM image version once it's released. No action is
required on your part.
Customer managed Kubernetes clusters
Kubernetes compute lets you configure Kubernetes clusters to train models, run inference, and manage models in
Azure Machine Learning.
Because you manage the environment with Kubernetes, both OS VM vulnerability management and container
image vulnerability management are your responsibility.
Azure Machine Learning frequently publishes new versions of AzureML extension container images into
Microsoft Container Registry. It's Microsoft’s responsibility to ensure new image versions are free from
vulnerabilities. Vulnerabilities are fixed with each release.
If your clusters run jobs without interruption, running jobs may use outdated container image versions.
Once you upgrade the amlarc extension on a running cluster, newly submitted jobs will start to use the latest
image version. When upgrading the AMLArc extension to its latest version, clean up the old container image
versions from the clusters as required.
To check whether your Azure Arc cluster is running the latest version of AMLArc, go to the Azure portal.
Under your Arc resource of type 'Kubernetes - Azure Arc', see 'Extensions' to find the version of the AMLArc
extension.
Next steps
Azure Machine Learning Base Images Repository
Data Science Virtual Machine release notes
AzureML Python SDK Release Notes
Machine learning enterprise security
Set up authentication for Azure Machine Learning
resources and workflows
5/25/2022 • 10 minutes to read • Edit Online
Learn how to set up authentication to your Azure Machine Learning workspace. Authentication to your Azure
Machine Learning workspace is based on Azure Active Directory (Azure AD) for most operations. In general, there
are four authentication workflows that you can use when connecting to the workspace:
Interactive: You use your account in Azure Active Directory to either directly authenticate, or to get a
token that is used for authentication. Interactive authentication is used during experimentation and
iterative development. Interactive authentication enables you to control access to resources (such as a
web service) on a per-user basis.
Service principal: You create a service principal account in Azure Active Directory, and use it to
authenticate or get a token. A service principal is used when you need an automated process to
authenticate to the service without requiring user interaction. For example, a continuous integration and
deployment script that trains and tests a model every time the training code changes.
Azure CLI session: You use an active Azure CLI session to authenticate. Azure CLI authentication is used
during experimentation and iterative development, or when you need an automated process to
authenticate to the service using a pre-authenticated session. You can log in to Azure via the Azure CLI on
your local workstation, without storing credentials in Python code or prompting the user to authenticate.
Similarly, you can reuse the same scripts as part of continuous integration and deployment pipelines,
while authenticating the Azure CLI with a service principal identity.
Managed identity: When using the Azure Machine Learning SDK on an Azure Virtual Machine, you can
use a managed identity for Azure. This workflow allows the VM to connect to the workspace using the
managed identity, without storing credentials in Python code or prompting the user to authenticate.
Azure Machine Learning compute clusters can also be configured to use a managed identity to access the
workspace when training models.
Regardless of the authentication workflow used, Azure role-based access control (Azure RBAC) is used to scope
the level of access (authorization) allowed to the resources. For example, an admin or automation process might
have access to create a compute instance, but not use it, while a data scientist could use it, but not delete or
create it. For more information, see Manage access to Azure Machine Learning workspace.
Azure AD Conditional Access can be used to further control or restrict access to the workspace for each
authentication workflow. For example, an admin can allow workspace access from managed devices only.
Prerequisites
IMPORTANT
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension for Azure Machine Learning. We
recommend you select v2 (current) for the enhanced v2 CLI using the ml extension. For more information, see
Machine Learning CLI (v1).
IMPORTANT
When using a service principal, grant it the minimum access required for the task it is used for. For example, you
would not grant a service principal owner or contributor access if all it is used for is reading the access token for a web
deployment.
The reason for granting the least access is that a service principal uses a password to authenticate, and the password may
be stored as part of an automation script. If the password is leaked, having the minimum access required for a specific
task minimizes the malicious use of the SP.
The easiest way to create an SP and grant access to your workspace is by using the Azure CLI. To create a service
principal and grant it access to your workspace, use the following steps:
NOTE
You must be an admin on the subscription to perform all of these steps.
az login
If the CLI can open your default browser, it will do so and load a sign-in page. Otherwise, you need to
open a browser and follow the instructions on the command line. The instructions involve browsing to
https://aka.ms/devicelogin and entering an authorization code.
If you have multiple Azure subscriptions, you can use the az account set -s <subscription name or ID>
command to set the subscription. For more information, see Use multiple Azure subscriptions.
For other methods of authenticating, see Sign in with Azure CLI.
2. Create the service principal. In the following example, an SP named ml-auth is created:
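The command for this step is missing here; based on the JSON output shown below, it was likely the following, where the --sdk-auth flag produces that output format:

az ad sp create-for-rbac --sdk-auth --name ml-auth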
The output will be JSON similar to the following. Take note of the clientId, clientSecret, and
tenantId fields, as you'll need them for other steps in this article.
{
"clientId": "your-client-id",
"clientSecret": "your-client-secret",
"subscriptionId": "your-sub-id",
"tenantId": "your-tenant-id",
"activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
"resourceManagerEndpointUrl": "https://management.azure.com",
"activeDirectoryGraphResourceId": "https://graph.windows.net",
"sqlManagementEndpointUrl": "https://management.core.windows.net:5555",
"galleryEndpointUrl": "https://gallery.azure.com/",
"managementEndpointUrl": "https://management.core.windows.net"
}
3. Retrieve the details for the service principal by using the clientId value returned in the previous step:
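The command for this step is also missing; it was likely the following, substituting the clientId value from the previous output:

az ad sp show --id your-client-id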
The following JSON is a simplified example of the output from the command. Take note of the objectId
field, as you will need its value for the next step.
{
"accountEnabled": "True",
"addIns": [],
"appDisplayName": "ml-auth",
...
...
...
"objectId": "your-sp-object-id",
"objectType": "ServicePrincipal"
}
4. To grant access to the workspace and other resources used by Azure Machine Learning, use the
information in the following articles:
How to assign roles and actions in AzureML
How to assign roles in the CLI
IMPORTANT
Owner access allows the service principal to do virtually any operation in your workspace. It is used in this
document to demonstrate how to grant access; in a production environment Microsoft recommends granting the
service principal the minimum access needed to perform the role you intend it for. For information on creating a
custom role with the access needed for your scenario, see Manage access to Azure Machine Learning workspace.
SETTING | VALUE
Most examples in the documentation and samples use interactive authentication. For example, when using the
SDK there are two function calls that will automatically prompt you with a UI-based authentication flow:
Calling the from_config() function will issue the prompt, as will calling the Workspace constructor directly:
APPLIES TO: Python SDK azureml v1
from azureml.core import Workspace

# Instantiating the Workspace constructor prompts for interactive sign-in.
ws = Workspace(subscription_id="your-sub-id",
               resource_group="your-resource-group-id",
               workspace_name="your-workspace-name"
               )
TIP
If you have access to multiple tenants, you may need to import the class and explicitly define what tenant you are
targeting. Calling the constructor for InteractiveLoginAuthentication will also prompt you to log in, similar to the calls
above, as shown in the sketch below.
APPLIES TO: Python SDK azureml v1
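The code sample for this tip is missing here; a minimal sketch with the SDK v1 would be:

from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication(tenant_id="your-tenant-id")

ws = Workspace(subscription_id="your-sub-id",
               resource_group="your-resource-group-id",
               workspace_name="your-workspace-name",
               auth=interactive_auth)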
When using the Azure CLI, the az login command is used to authenticate the CLI session. For more
information, see Get started with Azure CLI.
TIP
If you are using the SDK from an environment where you have previously authenticated interactively using the Azure CLI,
you can use the AzureCliAuthentication class to authenticate to the workspace using the credentials cached by the
CLI:
APPLIES TO: Python SDK azureml v1
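The code sample referenced by the preceding tip is missing here; a minimal sketch with the SDK v1 would be:

from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()

ws = Workspace(subscription_id="your-sub-id",
               resource_group="your-resource-group-id",
               workspace_name="your-workspace-name",
               auth=cli_auth)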
from azureml.core.authentication import ServicePrincipalAuthentication

sp = ServicePrincipalAuthentication(tenant_id="your-tenant-id", # tenantId
                                    service_principal_id="your-client-id", # clientId
                                    service_principal_password="your-client-secret") # clientSecret
The sp variable now holds an authentication object that you use directly in the SDK. In general, it is a good idea
to store the ids/secrets used above in environment variables as shown in the following code. Storing in
environment variables prevents the information from being accidentally checked into a GitHub repo.
APPLIES TO: Python SDK azureml v1
import os
sp = ServicePrincipalAuthentication(tenant_id=os.environ['AML_TENANT_ID'],
service_principal_id=os.environ['AML_PRINCIPAL_ID'],
service_principal_password=os.environ['AML_PRINCIPAL_PASS'])
For automated workflows that run in Python and use the SDK primarily, you can use this object as-is in most
cases for your authentication. The following code authenticates to your workspace using the auth object you
created.
APPLIES TO: Python SDK azureml v1
ws = Workspace.get(name="ml-example",
auth=sp,
subscription_id="your-sub-id",
resource_group="your-rg-name")
ws.get_details()
IMPORTANT
If you are currently using Azure Active Directory Authentication Library (ADAL) to get credentials, we recommend that
you Migrate to the Microsoft Authentication Library (MSAL). ADAL support is scheduled to end on June 30, 2022.
For information and samples on authenticating with MSAL, see the following articles:
JavaScript - How to migrate a JavaScript app from ADAL.js to MSAL.js.
Node.js - How to migrate a Node.js app from ADAL to MSAL.
Python - ADAL to MSAL migration guide for Python.
from azureml.core import Workspace
from azureml.core.authentication import MsiAuthentication

# Use the managed identity of the Azure VM to authenticate.
msi_auth = MsiAuthentication()

ws = Workspace(subscription_id="your-sub-id",
               resource_group="your-resource-group-id",
               workspace_name="your-workspace-name",
               auth=msi_auth
               )
Next steps
How to use secrets in training.
How to configure authentication for models deployed as a web service.
Consume an Azure Machine Learning model deployed as a web service.
Manage access to an Azure Machine Learning
workspace
5/25/2022 • 12 minutes to read • Edit Online
In this article, you learn how to manage access (authorization) to an Azure Machine Learning workspace. Azure
role-based access control (Azure RBAC) is used to manage access to Azure resources, such as the ability to
create new resources or use existing ones. Users in your Azure Active Directory (Azure AD) are assigned specific
roles, which grant access to resources. Azure provides both built-in roles and the ability to create custom roles.
TIP
While this article focuses on Azure Machine Learning, individual services that Azure ML relies on provide their own RBAC
settings. For example, using the information in this article, you can configure who can submit scoring requests to a model
deployed as a web service on Azure Kubernetes Service. But Azure Kubernetes Service provides its own set of Azure roles.
For service specific RBAC information that may be useful with Azure Machine Learning, see the following links:
Control access to Azure Kubernetes cluster resources
Use Azure RBAC for Kubernetes authorization
Use Azure RBAC for access to blob data
WARNING
Applying some roles may limit UI functionality in Azure Machine Learning studio for other users. For example, if a user's
role does not have the ability to create a compute instance, the option to create a compute instance will not be available
in studio. This behavior is expected, and prevents the user from attempting operations that would return an access denied
error.
Default roles
Azure Machine Learning workspaces have four built-in roles that are available by default. When adding users
to a workspace, they can be assigned one of the built-in roles described below.
ROLE: AzureML Data Scientist
ACCESS LEVEL: Can perform all actions within an Azure Machine Learning workspace, except for creating or deleting compute resources and modifying the workspace itself.
IMPORTANT
Role access can be scoped to multiple levels in Azure. For example, someone with owner access to a workspace may not
have owner access to the resource group that contains the workspace. For more information, see How Azure RBAC works.
NOTE
You must be an owner of the resource at that level to create custom roles within that resource.
To create a custom role, first construct a role definition JSON file that specifies the permission and scope for the
role. The following example defines a custom role named "Data Scientist Custom" scoped at a specific
workspace level:
data_scientist_custom_role.json:

{
    "Name": "Data Scientist Custom",
    "IsCustom": true,
    "Description": "Can run experiment but can't create or delete compute.",
    "Actions": ["*"],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*/write"
    ],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace_name>"
    ]
}
TIP
You can change the AssignableScopes field to set the scope of this custom role at the subscription level, the resource group level, or a specific workspace level. The above custom role is just an example; see some suggested custom roles for the Azure Machine Learning service.
This custom role can do everything in the workspace except for the following actions:
It can't create or update a compute resource.
It can't delete a compute resource.
It can't add, delete, or alter role assignments.
It can't delete the workspace.
To deploy this custom role, use the following Azure CLI command:
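For example, a minimal sketch (assuming the data_scientist_custom_role.json file shown above is in the current directory):

az role definition create --role-definition data_scientist_custom_role.json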
After deployment, this role becomes available in the specified workspace. Now you can add and assign this role
in the Azure portal.
For more information on custom roles, see Azure custom roles.
Azure Machine Learning operations
For more information on the operations (actions and not actions) usable with custom roles, see Resource
provider operations. You can also use the following Azure CLI command to list operations:
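For example (a sketch; the namespace below is the Azure Machine Learning resource provider):

az provider operation show --namespace Microsoft.MachineLearningServices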
To view the role definition for a specific custom role, use the following Azure CLI command. The <role-name>
should be in the same format returned by the command above:
az role definition list -n <role-name> --subscription <sub-id>
You need to have permissions on the entire scope of your new role definition. For example, if this new role has a scope across three subscriptions, you need to have permissions on all three subscriptions.
NOTE
Role updates can take 15 minutes to an hour to apply across all role assignments in that scope.
Common scenarios
The following table is a summary of Azure Machine Learning activities and the permissions required to perform them at the lowest scope. For example, if an activity can be performed with a workspace scope (column 4), then all higher scopes with that permission will also work automatically:
IMPORTANT
All paths in this table that start with / are relative paths to Microsoft.MachineLearningServices/ :
Create new workspace (1)
    Subscription-level scope: Not required
    Resource group-level scope: Owner or contributor
    Workspace-level scope: N/A (becomes Owner or inherits higher scope role after creation)

Create new compute cluster
    Subscription-level scope: Not required
    Resource group-level scope: Not required
    Workspace-level scope: Owner, contributor, or custom role allowing: /workspaces/computes/write

Submitting any type of run
    Subscription-level scope: Not required
    Resource group-level scope: Not required
    Workspace-level scope: Owner, contributor, or custom role allowing:
    "/workspaces/*/read",
    "/workspaces/environments/write",
    "/workspaces/experiments/runs/write",
    "/workspaces/metadata/artifacts/write",
    "/workspaces/metadata/snapshots/write",
    "/workspaces/environments/build/action",
    "/workspaces/experiments/runs/submit/action",
    "/workspaces/environments/readSecrets/action"

Create new custom role
    Subscription-level scope: Owner, contributor, or custom role allowing Microsoft.Authorization/roleDefinitions/write
    Resource group-level scope: Not required
    Workspace-level scope: Owner, contributor, or custom role allowing: /workspaces/computes/write
1: If you receive a failure when trying to create a workspace for the first time, make sure that your role allows
Microsoft.MachineLearningServices/register/action . This action allows you to register the Azure Machine
Learning resource provider with your Azure subscription.
2: When attaching an AKS cluster, you also need the Azure Kubernetes Service Cluster Admin Role on the cluster.
Create a workspace using a customer-managed key
When using a customer-managed key (CMK), an Azure Key Vault is used to store the key. The user or service
principal used to create the workspace must have owner or contributor access to the key vault.
Within the key vault, the user or service principal must have create, get, delete, and purge access to the key
through a key vault access policy. For more information, see Azure Key Vault security.
User-assigned managed identity with Azure ML compute cluster
To assign a user-assigned identity to an Azure Machine Learning compute cluster, you need write permissions to create the compute and the Managed Identity Operator Role. For more information on Azure RBAC with managed identities, read How to manage user assigned identity.
MLflow operations
To perform MLflow operations with your Azure Machine Learning workspace, use the following scopes in your custom role:

MLFLOW OPERATION: Get a run and related data and metadata, get a list of all values for the specified metric for a given run, list artifacts for a run
SCOPE: Microsoft.MachineLearningServices/workspaces/experiments/runs/read

MLFLOW OPERATION: Delete a registered model along with all its versions, delete specific versions of a registered model
SCOPE: Microsoft.MachineLearningServices/workspaces/models/delete
Data scientist
Allows you to run experiments and perform most actions inside a workspace, but not create or delete compute, retrieve access keys, deploy to production AKS or pipeline endpoints, or modify the workspace itself:

{
    "Name": "Data Scientist Custom",
    "IsCustom": true,
    "Description": "Can run experiment but can't create or delete compute or deploy production endpoints.",
    "Actions": [
        "Microsoft.MachineLearningServices/workspaces/*/read",
        "Microsoft.MachineLearningServices/workspaces/*/action",
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/*/write"
    ],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/delete",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*",
        "Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
        "Microsoft.MachineLearningServices/workspaces/listKeys/action",
        "Microsoft.MachineLearningServices/workspaces/services/aks/write",
        "Microsoft.MachineLearningServices/workspaces/services/aks/delete",
        "Microsoft.MachineLearningServices/workspaces/endpoints/pipelines/write"
    ],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>"
    ]
}
MLOps
Allows you to assign a role to a service principal and use that to automate your MLOps pipelines. For example,
to submit runs against an already published pipeline:
mlops_custom_role.json:

{
    "Name": "MLOps Custom",
    "IsCustom": true,
    "Description": "Can run pipelines against a published pipeline endpoint",
    "Actions": [
        "Microsoft.MachineLearningServices/workspaces/read",
        "Microsoft.MachineLearningServices/workspaces/endpoints/pipelines/read",
        "Microsoft.MachineLearningServices/workspaces/metadata/artifacts/read",
        "Microsoft.MachineLearningServices/workspaces/metadata/snapshots/read",
        "Microsoft.MachineLearningServices/workspaces/environments/read",
        "Microsoft.MachineLearningServices/workspaces/metadata/secrets/read",
        "Microsoft.MachineLearningServices/workspaces/modules/read",
        "Microsoft.MachineLearningServices/workspaces/experiments/runs/read",
        "Microsoft.MachineLearningServices/workspaces/datasets/registered/read",
        "Microsoft.MachineLearningServices/workspaces/datastores/read",
        "Microsoft.MachineLearningServices/workspaces/environments/write",
        "Microsoft.MachineLearningServices/workspaces/experiments/runs/write",
        "Microsoft.MachineLearningServices/workspaces/metadata/artifacts/write",
        "Microsoft.MachineLearningServices/workspaces/metadata/snapshots/write",
        "Microsoft.MachineLearningServices/workspaces/environments/build/action",
        "Microsoft.MachineLearningServices/workspaces/experiments/runs/submit/action"
    ],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/computes/write",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/delete",
        "Microsoft.MachineLearningServices/workspaces/delete",
        "Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
        "Microsoft.MachineLearningServices/workspaces/listKeys/action",
        "Microsoft.Authorization/*"
    ],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>"
    ]
}
Workspace Admin
Allows you to perform all operations within the scope of a workspace, except:
Creating a new workspace
Assigning subscription or workspace level quotas
The workspace admin also can't create a new role. It can only assign existing built-in or custom roles within the scope of their workspace:
workspace_admin_custom_role.json:

{
    "Name": "Workspace Admin Custom",
    "IsCustom": true,
    "Description": "Can perform all operations except quota management and upgrades",
    "Actions": [
        "Microsoft.MachineLearningServices/workspaces/*/read",
        "Microsoft.MachineLearningServices/workspaces/*/action",
        "Microsoft.MachineLearningServices/workspaces/*/write",
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.Authorization/roleAssignments/*"
    ],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/write"
    ],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>"
    ]
}
Data labeler
Allows you to define a role scoped only to labeling data:
labeler_custom_role.json:

{
    "Name": "Labeler Custom",
    "IsCustom": true,
    "Description": "Can label data for Labeling",
    "Actions": [
        "Microsoft.MachineLearningServices/workspaces/read",
        "Microsoft.MachineLearningServices/workspaces/labeling/projects/read",
        "Microsoft.MachineLearningServices/workspaces/labeling/projects/summary/read",
        "Microsoft.MachineLearningServices/workspaces/labeling/labels/read",
        "Microsoft.MachineLearningServices/workspaces/labeling/labels/write"
    ],
    "NotActions": [],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>"
    ]
}
You can also define a role for a labeling team lead, who can additionally reject labels but can't modify, delete, or export labeling projects:

{
    "properties": {
        "roleName": "Labeling Team Lead",
        "description": "Team lead for Labeling Projects",
        "assignableScopes": [
            "/subscriptions/<subscription_id>"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.MachineLearningServices/workspaces/read",
                    "Microsoft.MachineLearningServices/workspaces/labeling/labels/read",
                    "Microsoft.MachineLearningServices/workspaces/labeling/labels/write",
                    "Microsoft.MachineLearningServices/workspaces/labeling/labels/reject/action",
                    "Microsoft.MachineLearningServices/workspaces/labeling/projects/read",
                    "Microsoft.MachineLearningServices/workspaces/labeling/projects/summary/read"
                ],
                "notActions": [
                    "Microsoft.MachineLearningServices/workspaces/labeling/projects/write",
                    "Microsoft.MachineLearningServices/workspaces/labeling/projects/delete",
                    "Microsoft.MachineLearningServices/workspaces/labeling/export/action"
                ],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}
Troubleshooting
Here are a few things to be aware of while you use Azure role-based access control (Azure RBAC):
When you create a resource in Azure, such as a workspace, you are not directly the owner of the resource. Your role is inherited from the highest scope role that you are authorized against in that subscription. As an example, if you are a Network Administrator and have the permissions to create a Machine Learning workspace, you would be assigned the Network Administrator role against that workspace, not the Owner role.
To perform quota operations in a workspace, you need subscription level permissions. This means setting
either subscription level quota or workspace level quota for your managed compute resources can only
happen if you have write permissions at the subscription scope.
When there are two role assignments to the same Azure Active Directory user with conflicting sections of Actions/NotActions, your operations listed in NotActions from one role might not take effect if they are also listed as Actions in another role. To learn more about how Azure parses role assignments, read How Azure RBAC determines if a user has access to a resource.
To deploy your compute resources inside a VNet, you need to explicitly have permissions for the
following actions:
Microsoft.Network/virtualNetworks/*/read on the VNet resources.
Microsoft.Network/virtualNetworks/subnets/join/action on the subnet resource.
For more information on Azure RBAC with networking, see the Networking built-in roles.
It can sometimes take up to 1 hour for your new role assignments to take effect over cached permissions
across the stack.
Next steps
Enterprise security overview
Virtual network isolation and privacy overview
Tutorial: Train and deploy a model
Resource provider operations
Use Managed identities with Azure Machine
Learning
5/25/2022 • 9 minutes to read • Edit Online
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning
workspace.
The Azure CLI extension for Machine Learning service
The Azure Machine Learning Python SDK.
To assign roles, the login for your Azure subscription must have the Managed Identity Operator role, or another role that grants the required actions (such as Owner).
You must be familiar with creating and working with Managed Identities.
IMPORTANT
When using Azure Machine Learning for inference on Azure Container Instance (ACI), admin user access on ACR is required. Do not disable it if you plan on deploying models to ACI for inference.
When you create ACR without enabling admin user access, managed identities are used to access the ACR to
build and pull Docker images.
You can bring your own ACR with admin user disabled when you create the workspace. Alternatively, let Azure
Machine Learning create workspace ACR and disable admin user afterwards.
Bring your own ACR
If ACR admin user is disallowed by subscription policy, you should first create ACR without admin user, and then
associate it with the workspace. Also, if you have existing ACR with admin user disabled, you can attach it to the
workspace.
Create the ACR from the Azure CLI without setting the --admin-enabled argument, or from the Azure portal without enabling admin user. Then, when creating the Azure Machine Learning workspace, specify the Azure resource ID of the ACR.
The following example demonstrates creating a new Azure ML workspace that uses an existing ACR:
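A minimal sketch (assuming the v1 azure-cli-ml extension; the names and the registry resource ID are placeholders):

az ml workspace create -w myworkspace -g myresourcegroup --container-registry "/subscriptions/<subscription-id>/resourceGroups/<acr-resource-group>/providers/Microsoft.ContainerRegistry/registries/<acr-name>"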
TIP
To get the value for the --container-registry parameter, use the az acr show command to show information for your
ACR. The id field contains the resource ID for your ACR.
2. Perform an action that requires ACR. For example, the tutorial on training a model.
3. Get the ACR name created by the cluster:
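For example (a sketch using the v1 CLI extension; the query path is an assumption based on the workspace properties):

az ml workspace show -w myworkspace -g myresourcegroup --query containerRegistry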
This command returns a value similar to the following text. You only want the last portion of the text,
which is the ACR instance name:
Create compute with managed identity to access Docker images for training
To access the workspace ACR, create a machine learning compute cluster with system-assigned managed identity enabled. You can enable the identity from the Azure portal or studio when creating compute, or from the Azure CLI. For more information, see using managed identity with compute clusters.
Python
Azure CLI
Portal
When creating a compute cluster with the AmlComputeProvisioningConfiguration, use the identity_type
parameter to set the managed identity type.
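A minimal sketch (assuming SDK v1 and a workspace config file; the cluster name and VM size are placeholders):

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# identity_type can be "SystemAssigned" or "UserAssigned"
# (the latter also requires the identity_id parameter).
compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_V2",
                                                       max_nodes=4,
                                                       identity_type="SystemAssigned")
cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)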
A managed identity is automatically granted ACRPull role on workspace ACR to enable pulling Docker images
for training.
NOTE
If you create compute first, before workspace ACR has been created, you have to assign the ACRPull role manually.
Optionally, you can update the compute cluster to assign a user-assigned managed identity:
APPLIES TO: Azure CLI ml extension v2 (current)
To allow the compute cluster to pull the base images, grant the managed service identity ACRPull role on the
private ACR
APPLIES TO: Azure CLI ml extension v2 (current)
az role assignment create --assignee <principal ID> \
    --role acrpull \
    --scope "/subscriptions/<subscription ID>/resourceGroups/<private ACR resource group>/providers/Microsoft.ContainerRegistry/registries/<private ACR name>"
Finally, when submitting a training run, specify the base image location in the environment definition.
APPLIES TO: Python SDK azureml v1
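A minimal sketch of the environment definition (the environment name and registry URL are placeholders):

from azureml.core import Environment

env = Environment(name="private-acr-env")
env.docker.base_image = "<acr url>/my-repo/my-image:latest"
# Pull the base image as-is instead of building a new image on top of it.
env.python.user_managed_dependencies = True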
IMPORTANT
To ensure that the base image is pulled directly to the compute resource, set user_managed_dependencies = True and
do not specify a Dockerfile. Otherwise Azure Machine Learning service will attempt to build a new Docker image and fail,
because only the compute cluster has access to pull the base image from ACR.
Build Azure Machine Learning managed environment into base image from private ACR for training or
inference
APPLIES TO: Azure CLI ml extension v2 (current)
In this scenario, Azure Machine Learning service builds the training or inference environment on top of a base
image you supply from a private ACR. Because the image build task happens on the workspace ACR using ACR
Tasks, you must perform more steps to allow access.
1. Create user-assigned managed identity and grant the identity ACRPull access to the private ACR .
2. Grant the workspace system-assigned managed identity a Managed Identity Operator role on the
user-assigned managed identity from the previous step. This role allows the workspace to assign the
user-assigned managed identity to ACR Task for building the managed environment.
a. Obtain the principal ID of workspace system-assigned managed identity:
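For example (a sketch using the v1 CLI extension; the query path is an assumption based on the workspace identity properties):

az ml workspace show -w <workspace name> -g <resource group> --query identityPrincipalId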
The user-assigned managed identity resource ID is the Azure resource ID of the user-assigned identity, in the format /subscriptions/<subscription ID>/resourceGroups/<resource group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<user-assigned managed identity name>.
3. Specify the external ACR and client ID of the user-assigned managed identity in workspace
connections by using Workspace.set_connection method:
APPLIES TO: Python SDK azureml v1
workspace.set_connection(
    name="privateAcr",
    category="ACR",
    target="<acr url>",
    authType="RegistryConnection",
    value={"ResourceId": "<user-assigned managed identity resource id>",
           "ClientId": "<user-assigned managed identity client ID>"})
Once the configuration is complete, you can use the base images from private ACR when building environments
for training or inference. The following code snippet demonstrates how to specify the base image ACR and
image name in an environment definition:
APPLIES TO: Python SDK azureml v1
from azureml.core import Environment

env = Environment(name="my-env")
env.docker.base_image = "<acr url>/my-repo/my-image:latest"
Optionally, you can specify the managed identity resource URL and client ID in the environment definition itself
by using RegistryIdentity. If you use registry identity explicitly, it overrides any workspace connections specified
earlier:
APPLIES TO: Python SDK azureml v1
# Import path assumes azureml-core (SDK v1).
from azureml.core.container_registry import RegistryIdentity

identity = RegistryIdentity()
identity.resource_id = "<user-assigned managed identity resource ID>"
identity.client_id = "<user-assigned managed identity client ID>"
env.docker.base_image_registry.registry_identity = identity
env.docker.base_image = "my-acr.azurecr.io/my-repo/my-image:latest"
NOTE
If you bring your own AKS cluster, the cluster must have service principal enabled instead of managed identity.
IMPORTANT
When creating workspace with user-assigned managed identity, you must create the associated resources yourself, and
grant the managed identity roles on those resources. Use the role assignment ARM template to make the assignments.
Use Azure CLI or Python SDK to create the workspace. When using the CLI, specify the ID using the
--primary-user-assigned-identity parameter. When using the SDK, use primary_user_assigned_identity . The
following are examples of using the Azure CLI and Python to create a new workspace using these parameters:
Azure CLI
APPLIES TO: Azure CLI ml extension v2 (current)
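A minimal sketch (assuming the v2 CLI; the identity value is the ARM resource ID of the user-assigned managed identity):

az ml workspace create -g <resource group> -n <workspace name> --primary-user-assigned-identity <ARM resource ID>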
Python
APPLIES TO: Python SDK azureml v1
from azureml.core import Workspace

ws = Workspace.create(name="workspace name",
                      subscription_id="subscription id",
                      resource_group="resource group name",
                      primary_user_assigned_identity="managed identity ARM ID")
You can also use an ARM template to create a workspace with user-assigned managed identity.
For a workspace with customer-managed keys for encryption, you can pass in a user-assigned managed identity to authenticate from storage to Key Vault. Use the argument user-assigned-identity-for-cmk-encryption (CLI) or user_assigned_identity_for_cmk_encryption (SDK) to pass in the managed identity. This managed identity can be the same or different as the workspace primary user-assigned managed identity.
Next steps
Learn more about enterprise security in Azure Machine Learning
Learn about identity-based data access
Learn about managed identities on compute cluster.
Use Azure AD identity with your machine learning
web service in Azure Kubernetes Service
5/25/2022 • 4 minutes to read • Edit Online
In this how-to, you learn how to assign an Azure Active Directory (Azure AD) identity to your deployed machine
learning model in Azure Kubernetes Service. The Azure AD Pod Identity project allows applications to access
cloud resources securely with Azure AD by using a Managed Identity and Kubernetes primitives. This allows
your web service to securely access your Azure resources without having to embed credentials or manage
tokens directly inside your score.py script. This article explains the steps to create and install an Azure Identity
in your Azure Kubernetes Service cluster and assign the identity to your deployed web service.
Prerequisites
The Azure CLI extension for the Machine Learning service, the Azure Machine Learning SDK for Python,
or the Azure Machine Learning Visual Studio Code extension.
Access to your AKS cluster using the kubectl command. For more information, see Connect to the
cluster
An Azure Machine Learning web service deployed to your AKS cluster.
1. Determine whether Kubernetes role-based access control (Kubernetes RBAC) is enabled on your AKS cluster:

az aks show --name <AKS cluster name> --resource-group <resource group name> --subscription <subscription id> --query enableRbac

This command returns a value of true if Kubernetes RBAC is enabled. This value determines the command to use in the next step.
2. Install Azure AD Pod Identity in your AKS cluster.
3. Create an Identity on Azure following the steps shown in Azure AD Pod Identity project page.
4. Deploy AzureIdentity following the steps shown in Azure AD Pod Identity project page.
5. Deploy AzureIdentityBinding following the steps shown in Azure AD Pod Identity project page.
6. If the Azure Identity created in the previous step is not in the same node resource group for your AKS
cluster, follow the Role Assignment steps shown in Azure AD Pod Identity project page.
Add the Azure Identity selector label to your deployment by editing the deployment spec. The selector value
should be the one that you defined in step 5 of Deploy AzureIdentityBinding.
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: <label value to match>
Edit the deployment to add the Azure Identity selector label. Go to the following section under /spec/template/metadata/labels. You should see values such as isazuremlapp: "true". Add the aad-pod-identity label as shown below.
spec:
  template:
    metadata:
      labels:
        aadpodidbinding: "<value of Selector in AzureIdentityBinding>"
        ...
To verify that the label was correctly added, run the following command. You should also see the statuses of the
newly created pods.
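For example (a sketch; assumes kubectl is configured for the cluster):

kubectl get pods --show-labels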
Once the pods are up and running, the web services for this deployment will now be able to access Azure
resources through your Azure Identity without having to embed the credentials in your code.
For example, the following sketch shows how code in your deployed service could read a secret from Azure Key Vault through the identity (assuming the azure-identity and azure-keyvault-secrets packages; the vault and secret names are placeholders):

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

my_vault_name = "yourkeyvaultname"
my_vault_url = "https://{}.vault.azure.net/".format(my_vault_name)
my_secret_name = "sample-secret"

# Authenticate with the pod-assigned managed identity.
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=my_vault_url, credential=credential)
secret = secret_client.get_secret(my_secret_name)
IMPORTANT
This example uses the DefaultAzureCredential. To grant your identity access using a specific access policy, see Assign a Key
Vault access policy using the Azure CLI.
Similarly, the identity can be used to access Azure Blob storage (a sketch, assuming the azure-storage-blob package; the account name is a placeholder):

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

my_storage_account_name = "yourstorageaccountname"
my_storage_account_url = "https://{}.blob.core.windows.net/".format(my_storage_account_name)

blob_service_client = BlobServiceClient(account_url=my_storage_account_url,
                                        credential=DefaultAzureCredential())
Next steps
For more information on how to use the Python Azure Identity client library, see the repository on GitHub.
Secure Azure Machine Learning workspace
resources using virtual networks (VNets)
5/25/2022 • 10 minutes to read • Edit Online
Secure Azure Machine Learning workspace resources and compute environments using virtual networks
(VNets). This article uses an example scenario to show you how to configure a complete virtual network.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
API platform network isolation
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure
workspace using a template.
Prerequisites
This article assumes that you have familiarity with the following topics:
Azure Virtual Networks
IP networking
Azure Machine Learning workspace with private endpoint
Network Security Groups (NSG)
Network firewalls
Example scenario
In this section, you learn how a common network scenario is set up to secure Azure Machine Learning
communication with private IP addresses.
The following table compares how services access different parts of an Azure Machine Learning network with
and without a VNet:
(Table columns: Scenario, Workspace, Associated resources, Training compute environment, Inferencing compute environment.)
Workspace - Create a private endpoint for your workspace. The private endpoint connects the workspace to
the vnet through several private IP addresses.
Public access - You can optionally enable public access for a secured workspace.
Associated resource - Use service endpoints or private endpoints to connect to workspace resources like Azure Storage and Azure Key Vault. For Azure Container Registry, use a private endpoint.
Service endpoints provide the identity of your virtual network to the Azure service. Once you enable service endpoints in your virtual network, you can add a virtual network rule to secure the Azure service resources to your virtual network. Service endpoints use public IP addresses.
Private endpoints are network interfaces that securely connect you to a service powered by Azure Private Link. Private endpoints use a private IP address from your VNet, effectively bringing the service into your VNet.
Training compute access - Access training compute targets like Azure Machine Learning Compute
Instance and Azure Machine Learning Compute Clusters with public or private IP addresses.
Inference compute access - Access Azure Kubernetes Services (AKS) compute clusters with private IP
addresses.
The next sections show you how to secure the network scenario described above. To secure your network, you
must:
1. Secure the workspace and associated resources.
2. Secure the training environment.
3. Secure the inferencing environment (v1 or v2).
4. Optionally: enable studio functionality.
5. Configure firewall settings.
6. Configure DNS name resolution.
SERVICE: Azure Storage Account
ENDPOINT INFORMATION: Service endpoint or private endpoint
ALLOW TRUSTED INFORMATION: Grant access to trusted Azure services
4. In properties for the Azure Storage Account(s) for your workspace, add your client IP address to the
allowed list in firewall settings. For more information, see Configure firewalls and virtual networks.
SERVICE: Azure Storage Account
ENDPOINT INFORMATION: Service endpoint or private endpoint
ALLOW TRUSTED INFORMATION: Grant access from Azure resource instances, or grant access to trusted Azure services
TIP
Compute cluster and compute instance can be created with or without a public IP address. If created with a public
IP address, you get a load balancer with a public IP to accept the inbound access from Azure batch service and
Azure Machine Learning service. You need to configure User Defined Routing (UDR) if you use a firewall. If created
without a public IP, you get a private link service to accept the inbound access from Azure batch service and Azure
Machine Learning service without a public IP.
For detailed instructions on how to complete these steps, see Secure a training environment.
Example training job submission
In this section, you learn how Azure Machine Learning securely communicates between services to submit a
training job. This shows you how all your configurations work together to secure communication.
1. The client uploads training scripts and training data to storage accounts that are secured with a service or
private endpoint.
2. The client submits a training job to the Azure Machine Learning workspace through the private endpoint.
3. Azure Batch service receives the job from the workspace. It then submits the training job to the compute
environment through the public load balancer for the compute resource.
4. The compute resource receives the job and begins training. The compute resource uses information
stored in key vault to access storage accounts to download training files and upload output.
Limitations
Azure Compute Instance and Azure Compute Clusters must be in the same VNet, region, and subscription as
the workspace and its associated resources.
IMPORTANT
Using network isolation for managed online endpoints is a preview feature, and isn't fully supported.
For more information, see Enable network isolation for managed online endpoints.
TIP
As long as it is not the default storage account, the account used by data labeling can be secured behind the virtual
network.
Custom DNS
If you need to use a custom DNS solution for your virtual network, you must add host records for your
workspace.
For more information on the required domain names and IP addresses, see how to use a workspace with a
custom DNS server.
Microsoft Sentinel
Microsoft Sentinel is a security solution that can integrate with Azure Machine Learning, for example by using Jupyter notebooks provided through Azure Machine Learning. For more information, see Use Jupyter notebooks to hunt for security threats.
Public access
Microsoft Sentinel can automatically create a workspace for you if you are OK with a public endpoint. In this
configuration, the security operations center (SOC) analysts and system administrators connect to notebooks in
your workspace through Sentinel.
For information on this process, see Create an Azure ML workspace from Microsoft Sentinel.
Private endpoint
If you want to secure your workspace and associated resources in a VNet, you must create the Azure Machine
Learning workspace first. You must also create a virtual machine 'jump box' in the same VNet as your
workspace, and enable Azure Bastion connectivity to it. Similar to the public configuration, SOC analysts and
administrators can connect using Microsoft Sentinel, but some operations must be performed using Azure
Bastion to connect to the VM.
For more information on this configuration, see Create an Azure ML workspace from Microsoft Sentinel.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
API platform network isolation
Secure an Azure Machine Learning workspace with
virtual networks
5/25/2022 • 14 minutes to read • Edit Online
In this article, you learn how to secure an Azure Machine Learning workspace and its associated resources in a
virtual network.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Virtual network overview
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
API platform network isolation
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure
workspace using a template.
In this article, you learn how to enable the following workspace resources in a virtual network:
Azure Machine Learning workspace
Azure Storage accounts
Azure Machine Learning datastores and datasets
Azure Key Vault
Azure Container Registry
Prerequisites
Read the Network security overview article to understand common virtual network scenarios and overall
virtual network architecture.
Read the Azure Machine Learning best practices for enterprise security article to learn about best
practices.
An existing virtual network and subnet to use with your compute resources.
TIP
If you plan on using Azure Container Instances in the virtual network (to deploy models), then the workspace and
virtual network must be in the same resource group. Otherwise, they can be in different groups.
To deploy resources into a virtual network or subnet, your user account must have permissions to the
following actions in Azure role-based access control (Azure RBAC):
"Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
"Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.
For more information on Azure RBAC with networking, see the Networking built-in roles
Azure Container Registry
Your Azure Container Registry must be Premium version. For more information on upgrading, see
Changing SKUs.
If your Azure Container Registry uses a private endpoint, it must be in the same virtual network as the storage account and compute targets used for training or inference. If it uses a service endpoint, it must be in the same virtual network and subnet as the storage account and compute targets.
Your Azure Machine Learning workspace must contain an Azure Machine Learning compute cluster.
Limitations
Azure Storage Account
If you plan to use Azure Machine Learning studio and the storage account is also in the VNet, there are
extra validation requirements:
If the storage account uses a service endpoint, the workspace private endpoint and storage service endpoint must be in the same subnet of the VNet.
If the storage account uses a private endpoint, the workspace private endpoint and storage private endpoint must be in the same VNet. In this case, they can be in different subnets.
Azure Container Registry
When ACR is behind a virtual network, Azure Machine Learning can’t use it to directly build Docker images.
Instead, the compute cluster is used to build the images.
IMPORTANT
The compute cluster used to build Docker images needs to be able to access the package repositories that are used to
train and deploy your models. You may need to add network security rules that allow access to public repos, use private
Python packages, or use custom Docker images that already include the packages.
WARNING
If your Azure Container Registry uses a private endpoint or service endpoint to communicate with the virtual network,
you cannot use a managed identity with an Azure Machine Learning compute cluster.
Azure Monitor
WARNING
Azure Monitor supports using Azure Private Link to connect to a VNet. However, you must use the open Private Link
mode in Azure Monitor. For more information, see Private Link access modes: Private only vs. Open.
TIP
If you need the IP addresses instead of service tags, use one of the following options:
Download a list from Azure IP Ranges and Service Tags.
Use the Azure CLI az network list-service-tags command.
Use the Azure PowerShell Get-AzNetworkServiceTag command.
The IP addresses may change periodically.
You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft sites for the installation
of packages required by your machine learning project. The following table lists commonly used repositories for
machine learning:
HOSTNAME: pypi.org
PURPOSE: Used to list dependencies from the default index, if any, and the index is not overwritten by user settings. If the index is overwritten, you must also allow *.pythonhosted.org.
When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the following traffic to the AKS
VNet:
General inbound/outbound requirements for AKS as described in the Restrict egress traffic in Azure
Kubernetes Service article.
Outbound to mcr.microsoft.com.
When deploying a model to an AKS cluster, use the guidance in the Deploy ML models to Azure Kubernetes
Service article.
For information on using a firewall solution, see Use a firewall with Azure Machine Learning.
Private endpoint
Service endpoint
TIP
When configuring a storage account that is not the default storage, select the Target subresource type that
corresponds to the storage account you want to add.
3. After creating the private endpoints for the storage resources, select the Firewalls and virtual networks tab under Networking for the storage account.
4. Select Selected networks, and then under Resource instances, select Microsoft.MachineLearningServices/Workspace as the Resource type. Select your workspace using Instance name. For more information, see Trusted access based on system-assigned managed identity.
TIP
Alternatively, you can select Allow Azure services on the trusted services list to access this storage account to more broadly allow access from trusted services. For more information, see Configure Azure Storage firewalls and virtual networks.
TIP
When using a private endpoint, you can also disable public access. For more information, see disallow public read access.
TIP
Regardless of whether you use a private endpoint or service endpoint, the key vault must be in the same network as the
private endpoint of the workspace.
Private endpoint
Service endpoint
For information on using a private endpoint with Azure Key Vault, see Integrate Key Vault with Azure Private
Link.
Azure Container Registry can be configured to use a private endpoint. Use the following steps to configure your
workspace to use ACR when it is in the virtual network:
1. Find the name of the Azure Container Registry for your workspace, using one of the following methods:
Azure CLI
Python SDK
Azure portal
IMPORTANT
The following limitations apply when using a compute cluster for image builds:
Only a CPU SKU is supported.
You can't use a compute cluster configured for no public IP address.
Azure CLI
Python SDK
Azure portal
You can use the az ml workspace update command to set a build compute. The command is the same for
both the v1 and v2 Azure CLI extensions for machine learning. In the following command, replace
myworkspace with your workspace name, myresourcegroup with the resource group that contains the
workspace, and mycomputecluster with the compute cluster name:
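A minimal sketch (parameter names per the az ml workspace update command; values are placeholders):

az ml workspace update --name myworkspace --resource-group myresourcegroup --image-build-compute mycomputecluster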
TIP
When ACR is behind a VNet, you can also disable public access to it.
PostgreSQL datastores also require skipping validation.
NOTE
Azure Data Lake Store Gen1 and Azure Data Lake Store Gen2 skip validation by default, so you don't have to do anything.
The following code sample creates a new Azure Blob datastore and sets skip_validation=True .
from azureml.core import Datastore

blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name=blob_datastore_name,
                                                         container_name=container_name,
                                                         account_name=account_name,
                                                         account_key=account_key,
                                                         skip_validation=True)
Use datasets
The syntax to skip dataset validation is similar for the following dataset types:
Delimited file
JSON
Parquet
SQL
File
The following code creates a new JSON dataset and sets validate=False .
from azureml.core import Dataset

json_ds = Dataset.Tabular.from_json_lines_files(path=datastore_paths,
                                                validate=False)
IMPORTANT
When using a VPN gateway or ExpressRoute, you will need to plan how name resolution works between your on-premises resources and those in the VNet. For more information, see Use a custom DNS server.
Workspace diagnostics
You can run diagnostics on your workspace from Azure Machine Learning studio or the Python SDK. After
diagnostics run, a list of any detected problems is returned. This list includes links to possible solutions. For
more information, see How to use workspace diagnostics.
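A minimal sketch of running diagnostics from the SDK (assuming the diagnose_workspace method in recent azureml-core versions; the empty parameter dict runs the default checks):

from azureml.core import Workspace

ws = Workspace.from_config()
# An empty "value" dict requests the default set of diagnostic checks.
diag_param = {"value": {}}
resp = ws.diagnose_workspace(diag_param)
print(resp)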
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Virtual network overview
Secure the training environment
Secure online endpoints (inference)
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
Tutorial: Create a secure workspace
Tutorial: Create a secure workspace using a template
API platform network isolation
Secure an Azure Machine Learning training
environment with virtual networks
5/25/2022 • 20 minutes to read • Edit Online
In this article, you learn how to secure training environments with a virtual network in Azure Machine Learning.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Virtual network overview
Secure the workspace resources
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure
workspace using a template.
In this article you learn how to secure the following training compute resources in a virtual network:
Azure Machine Learning compute cluster
Azure Machine Learning compute instance
Azure Databricks
Virtual Machine
HDInsight cluster
Prerequisites
Read the Network security overview article to understand common virtual network scenarios and overall
virtual network architecture.
An existing virtual network and subnet to use with your compute resources.
To deploy resources into a virtual network or subnet, your user account must have permissions to the
following actions in Azure role-based access control (Azure RBAC):
"Microsoft.Network/virtualNetworks/*/read" on the virtual network resource. This permission isn't
needed for Azure Resource Manager (ARM) template deployments.
"Microsoft.Network/virtualNetworks/subnet/join/action" on the subnet resource.
For more information on Azure RBAC with networking, see the Networking built-in roles
Azure Machine Learning compute cluster/instance
Compute clusters and instances create the following resources. If they're unable to create these resources
(for example, if there's a resource lock on the resource group) then creation, scale out, or scale in, may fail.
IP address.
Network Security Group (NSG).
Load balancer.
The virtual network must be in the same subscription as the Azure Machine Learning workspace.
The subnet used for the compute instance or cluster must have enough unassigned IP addresses.
A compute cluster can dynamically scale. If there aren't enough unassigned IP addresses, the cluster
will be partially allocated.
A compute instance only requires one IP address.
To create a compute cluster or instance without a public IP address (a preview feature), your workspace
must use a private endpoint to connect to the VNet. For more information, see Configure a private
endpoint for Azure Machine Learning workspace.
If you plan to secure the virtual network by restricting traffic, see the Required public internet access
section.
The subnet used to deploy compute cluster/instance shouldn't be delegated to any other service. For
example, it shouldn't be delegated to ACI.
Azure Databricks
The virtual network must be in the same subscription and region as the Azure Machine Learning workspace.
If the Azure Storage Account(s) for the workspace are also secured in a virtual network, they must be in the
same virtual network as the Azure Databricks cluster.
Limitations
Azure Machine Learning compute cluster/instance
If you put multiple compute instances or clusters in one virtual network, you may need to request a quota increase for one or more of your resources. The Machine Learning compute instance or cluster automatically allocates networking resources in the resource group that contains the virtual network. For each compute instance or cluster, the service allocates the following resources:
One network security group (NSG). This NSG contains the following rules, which are specific to
compute cluster and compute instance:
Allow inbound TCP traffic on ports 29876-29877 from the BatchNodeManagement service tag.
Allow inbound TCP traffic on port 44224 from the AzureMachineLearning service tag.
The following screenshot shows an example of these rules:
TIP
If your compute cluster or instance does not use a public IP address (a preview feature), these inbound
NSG rules are not required.
For compute cluster or instance, it's now possible to remove the public IP address (a preview
feature). If you have Azure Policy assignments prohibiting Public IP creation, then deployment of
the compute cluster or instance will succeed.
One load balancer
For compute clusters, these resources are deleted every time the cluster scales down to 0 nodes and
created when scaling up.
For a compute instance, these resources are kept until the instance is deleted. Stopping the instance
doesn't remove the resources.
IMPORTANT
These resources are limited by the subscription's resource quotas. If the virtual network resource group is locked
then deletion of compute cluster/instance will fail. Load balancer cannot be deleted until the compute
cluster/instance is deleted. Also please ensure there is no Azure Policy assignment which prohibits creation of
network security groups.
If you create a compute instance and plan to use the no public IP address configuration, your Azure
Machine Learning workspace's managed identity must be assigned the Reader role for the virtual
network that contains the workspace. For more information on assigning roles, see Steps to assign an
Azure role.
If you have configured Azure Container Registry for your workspace behind the virtual network, you
must use a compute cluster to build Docker images. You can't use a compute cluster with the no public IP
address configuration. For more information, see Enable Azure Container Registry.
If the Azure Storage Accounts for the workspace are also in the virtual network, use the following
guidance on subnet limitations:
If you plan to use Azure Machine Learning studio to visualize data or use designer, the storage account must be in the same subnet as the compute instance or cluster.
If you plan to use the SDK, the storage account can be in a different subnet.
NOTE
Adding a resource instance for your workspace or selecting the checkbox for "Allow trusted Microsoft services to
access this account" is not sufficient to allow communication from the compute.
When your workspace uses a private endpoint, the compute instance can only be accessed from inside
the virtual network. If you use a custom DNS or hosts file, add an entry for
<instance-name>.<region>.instances.azureml.ms . Map this entry to the private IP address of the workspace
private endpoint. For more information, see the custom DNS article.
Virtual network service endpoint policies don't work for compute cluster/instance system storage
accounts.
If storage and compute instance are in different regions, you may see intermittent timeouts.
If the Azure Container Registry for your workspace uses a private endpoint to connect to the virtual
network, you can’t use a managed identity for the compute instance. To use a managed identity with the
compute instance, don't put the container registry in the VNet.
If you want to use Jupyter Notebooks on a compute instance:
Don't disable websocket communication. Make sure your network allows websocket communication
to *.instances.azureml.net and *.instances.azureml.ms .
Make sure that your notebook is running on a compute resource behind the same virtual network and
subnet as your data. When creating the compute instance, use Advanced settings > Configure
vir tual network to select the network and subnet.
Compute clusters can be created in a different region than your workspace. This functionality is in preview, and is only available for compute clusters, not compute instances. When using a different region for the cluster, the following limitations apply:
If your workspace associated resources, such as storage, are in a different virtual network than the
cluster, set up global virtual network peering between the networks. For more information, see Virtual
network peering.
You may see increased network latency and data transfer costs. The latency and costs can occur when
creating the cluster, and when running jobs on it.
Guidance such as using NSG rules, user-defined routes, and input/output requirements, apply as normal
when using a different region than the workspace.
WARNING
If you are using a private endpoint-enabled workspace, creating the cluster in a different region is not supported.
Azure Databricks
In addition to the databricks-private and databricks-public subnets used by Azure Databricks, the
default subnet created for the virtual network is also required.
Azure Databricks doesn't use a private endpoint to communicate with the virtual network.
For more information on using Azure Databricks in a virtual network, see Deploy Azure Databricks in your
Azure Virtual Network.
Azure HDInsight or virtual machine
Azure Machine Learning supports only virtual machines that are running Ubuntu.
TIP
If you need the IP addresses instead of service tags, use one of the following options:
Download a list from Azure IP Ranges and Service Tags.
Use the Azure CLI az network list-service-tags command.
Use the Azure PowerShell Get-AzNetworkServiceTag command.
The IP addresses may change periodically.
You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft sites for the installation
of packages required by your machine learning project. The following table lists commonly used repositories for
machine learning:
HOSTNAME: pypi.org
PURPOSE: Used to list dependencies from the default index, if any, and the index is not overwritten by user settings. If the index is overwritten, you must also allow *.pythonhosted.org.
When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the following traffic to the AKS
VNet:
General inbound/outbound requirements for AKS as described in the Restrict egress traffic in Azure
Kubernetes Service article.
Outbound to mcr.microsoft.com.
When deploying a model to an AKS cluster, use the guidance in the Deploy ML models to Azure Kubernetes
Service article.
For information on using a firewall solution, see Use a firewall with Azure Machine Learning.
Compute clusters
Use the tabs below to select how you plan to create a compute cluster:
Studio
Python
Use the following steps to create a compute cluster in the Azure Machine Learning studio:
1. Sign in to Azure Machine Learning studio, and then select your subscription and workspace.
2. Select Compute on the left, Compute clusters from the center, and then select + New .
3. In the Create compute cluster dialog, select the VM size and configuration you need and then select
Next .
4. From the Configure Settings section, set the Compute name, Virtual network, and Subnet.
TIP
If your workspace uses a private endpoint to connect to the virtual network, the Virtual network selection field is greyed out.
WARNING
By default, you do not have public internet access from a No Public IP compute cluster. You need to configure User Defined Routing (UDR) to reach a public IP to access the internet. For example, you can use a public IP of your firewall, or you can use Virtual Network NAT with a public IP.
A compute cluster with No public IP enabled has no inbound communication requirements from the public internet. Specifically, neither inbound NSG rule (BatchNodeManagement, AzureMachineLearning) is required. You still need to allow inbound traffic from source VirtualNetwork (any source port) to destination VirtualNetwork on destination ports 29876 and 29877, and from source AzureLoadBalancer (any source port) to destination VirtualNetwork on destination port 44224.
No public IP clusters are dependent on Azure Private Link for the Azure Machine Learning workspace. A compute cluster with No public IP also requires you to disable private endpoint network policies and private link service network policies. These requirements come from Azure Private Link service and private endpoints and aren't Azure Machine Learning specific. Follow the instructions in Disable network policies for Private Link service to set the parameters disable-private-endpoint-network-policies and disable-private-link-service-network-policies on the virtual network subnet.
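For example, a sketch with the Azure CLI (flag names per the az network vnet subnet update command; values are placeholders):

az network vnet subnet update --resource-group myresourcegroup \
    --vnet-name myvnet --name mysubnet \
    --disable-private-endpoint-network-policies true \
    --disable-private-link-service-network-policies true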
For outbound connections to work, you need to set up an egress firewall such as Azure firewall with user
defined routes. For instance, you can use a firewall set up with inbound/outbound configuration and route traffic
there by defining a route table on the subnet in which the compute cluster is deployed. The route table entry can
set up the next hop of the private IP address of the firewall with the address prefix of 0.0.0.0/0.
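For example, a sketch that routes all outbound traffic to a firewall's private IP (commands per az network route-table; values are placeholders):

az network route-table create --resource-group myresourcegroup --name myroutetable
az network route-table route create --resource-group myresourcegroup \
    --route-table-name myroutetable --name to-firewall \
    --address-prefix 0.0.0.0/0 \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address <firewall private IP>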
You can use a service endpoint or private endpoint for your Azure container registry and Azure storage in the
subnet in which cluster is deployed.
To create a no public IP address compute cluster (a preview feature) in studio, select the No public IP checkbox in the virtual network section. You can also create a no public IP compute cluster through an ARM template. In the ARM template, set the enableNodePublicIP parameter to false.
NOTE
Support for compute instances without public IP addresses is currently available and in public preview for the following
regions: France Central, East Asia, West Central US, South Central US, West US 2, East US, East US 2, North Europe, West
Europe, Central US, North Central US, West US, Australia East, Japan East, Japan West.
Support for compute clusters without public IP addresses is currently available and in public preview for the following
regions: France Central, East Asia, West Central US, South Central US, West US 2, East US, North Europe, East US 2,
Central US, West Europe, North Central US, West US, Australia East, Japan East, Japan West.
Troubleshooting
If you get the error message The specified subnet has PrivateLinkServiceNetworkPolicies or PrivateEndpointNetworkEndpoints enabled during creation of a cluster, follow the instructions from Disable network policies for Private Link service and Disable network policies for Private Endpoint.
If job execution fails with connection issues to ACR or Azure Storage, verify that the ACR and Azure Storage service endpoints or private endpoints have been added to the subnet, and that ACR/Azure Storage allows access from the subnet.
To confirm that you've created a no public IP cluster, open the cluster details in studio; the No Public IP property is set to true under resource properties.
Compute instance
For steps on how to create a compute instance deployed in a virtual network, see Create and manage an Azure
Machine Learning compute instance.
No public IP for compute instances (preview)
When you enable No public IP, your compute instance doesn't use a public IP for communication with any
dependencies. Instead, it communicates solely within the virtual network using the Azure Private Link ecosystem
and service/private endpoints, eliminating the need for a public IP entirely. No public IP removes access to, and
discoverability of, the compute instance node from the internet, eliminating a significant threat vector. Compute
instances will also do packet filtering to reject any traffic from outside the virtual network. No public IP
instances are dependent on Azure Private Link for the Azure Machine Learning workspace.
WARNING
By default, you do not have public internet access from a No Public IP compute instance. You need to configure User
Defined Routing (UDR) to reach a public IP to access the internet. For example, you can use the public IP of your
firewall, or you can use Virtual Network NAT with a public IP.
For outbound connections to work, you need to set up an egress firewall such as Azure Firewall with user-defined
routes. For instance, you can use a firewall set up with an inbound/outbound configuration and route traffic
there by defining a route table on the subnet in which the compute instance is deployed. The route table entry
can set the next hop to the private IP address of the firewall, with an address prefix of 0.0.0.0/0.
A compute instance with No public IP enabled has no inbound communication requirements from the public
internet. Specifically, neither inbound NSG rule (BatchNodeManagement, AzureMachineLearning) is required. You
still need to allow inbound traffic from a source of VirtualNetwork and any source port, to a destination of
VirtualNetwork and destination ports 29876, 29877, and 44224.
A compute instance with No public IP also requires you to disable private endpoint network policies and
private link service network policies. These requirements come from Azure Private Link service and private
endpoints and aren't specific to Azure Machine Learning. Follow the instructions in Disable network policies for
Private Link service to set the disable-private-endpoint-network-policies and
disable-private-link-service-network-policies parameters on the virtual network subnet.
To create a no public IP address compute instance (a preview feature) in studio, select the No public IP checkbox
in the virtual network section. You can also create a no public IP compute instance through an ARM template. In
the ARM template, set the enableNodePublicIP parameter to false.
Next steps:
Use custom DNS
Use a firewall
NOTE
Support for compute instances without public IP addresses is currently available and in public preview for the following
regions: France Central, East Asia, West Central US, South Central US, West US 2, East US, East US 2, North Europe, West
Europe, Central US, North Central US, West US, Australia East, Japan East, Japan West.
Support for compute clusters without public IP addresses is currently available and in public preview for the following
regions: France Central, East Asia, West Central US, South Central US, West US 2, East US, North Europe, East US 2,
Central US, West Europe, North Central US, West US, Australia East, Japan East, Japan West.
Inbound traffic
When using an Azure Machine Learning compute instance (with a public IP) or compute cluster, allow inbound
traffic from Azure Batch management and Azure Machine Learning services. A compute instance with no public IP
(preview) does not require this inbound communication. A Network Security Group allowing this traffic is
dynamically created for you; however, you may also need to create user-defined routes (UDR) if you have a
firewall. When creating a UDR for this traffic, you can use either IP addresses or service tags to route the
traffic.
IMPORTANT
Using service tags with user-defined routes is now GA. For more information, see Virtual Network routing.
TIP
While a compute instance without a public IP (a preview feature) does not need a UDR for this inbound traffic, you will still
need these UDRs if you also use a compute cluster or a compute instance with a public IP.
IP Address routes
Service tag routes
For the Azure Machine Learning service, you must add the IP address of both the primary and secondary
regions. To find the secondary region, see Cross-region replication in Azure. For example, if your Azure
Machine Learning service is in East US 2, the secondary region is Central US.
To get a list of IP addresses of the Batch service and Azure Machine Learning service, download the Azure IP
Ranges and Service Tags and search the file for BatchNodeManagement.<region> and
AzureMachineLearning.<region> , where <region> is your Azure region.
IMPORTANT
The IP addresses may change over time.
When creating the UDR, set the Next hop type to Internet. This means the inbound communication from
Azure skips your firewall to access the load balancers with the public IPs of the compute instance and compute
cluster. A UDR is required because the compute instance and compute cluster receive random public IPs at
creation; you cannot know these public IPs before creation, so you cannot register them on your firewall to
allow inbound traffic from Azure to specific IPs. The following image shows an example IP address based UDR in
the Azure portal:
For information on configuring UDR, see Route network traffic with a routing table.
For more information on input and output traffic requirements for Azure Machine Learning, see Use a
workspace behind a firewall.
Azure Databricks
For specific information on using Azure Databricks with a virtual network, see Deploy Azure Databricks in your
Azure Virtual Network.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Virtual network overview
Secure the workspace resources
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Use a firewall
Use Azure Machine Learning studio in an Azure
virtual network
5/25/2022 • 7 minutes to read • Edit Online
In this article, you learn how to use Azure Machine Learning studio in a virtual network. The studio includes
features like AutoML, the designer, and data labeling.
Some of the studio's features are disabled by default in a virtual network. To re-enable these features, you must
enable managed identity for storage accounts you intend to use in the studio.
The following operations are disabled by default in a virtual network:
Preview data in the studio.
Visualize data in the designer.
Deploy a model in the designer.
Submit an AutoML experiment.
Start a labeling project.
The studio supports reading data from the following datastore types in a virtual network:
Azure Storage Account (blob & file)
Azure Data Lake Storage Gen1
Azure Data Lake Storage Gen2
Azure SQL Database
In this article, you learn how to:
Give the studio access to data stored inside of a virtual network.
Access the studio from a resource inside of a virtual network.
Understand how the studio impacts storage security.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Use custom DNS
Use a firewall
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure
workspace using a template.
Prerequisites
Read the Network security overview to understand common virtual network scenarios and architecture.
A pre-existing virtual network and subnet to use.
An existing Azure Machine Learning workspace with a private endpoint.
An existing Azure storage account added to your virtual network.
Limitations
Azure Storage Account
When the storage account is in the VNet, there are extra validation requirements when using studio:
If the storage account uses a service endpoint, the workspace private endpoint and storage service
endpoint must be in the same subnet of the VNet.
If the storage account uses a private endpoint, the workspace private endpoint and storage private
endpoint must be in the same VNet. In this case, they can be in different subnets.
Designer sample pipeline
There's a known issue where users can't run the sample pipeline from the Designer home page. This is because
the sample dataset used in the sample pipeline is an Azure Global dataset, which can't be accessed from every
virtual network environment.
To resolve this issue, use a public workspace to run the sample pipeline and learn how to use the
designer, and then replace the sample dataset with your own dataset in the workspace within the virtual network.
TIP
The first step is not required for the default storage account for the workspace. All other steps are required for any
storage account behind the VNet and used by the workspace, including the default storage account.
1. If the storage account is the default storage for your workspace, skip this step. If it is not the
default, grant the workspace managed identity the 'Storage Blob Data Reader' role for the
Azure storage account so that it can read data from blob storage.
For more information, see the Blob Data Reader built-in role.
2. Grant the workspace managed identity the 'Reader' role for storage private endpoints . If your
storage service uses a private endpoint , grant the workspace's managed identity Reader access to the
private endpoint. The workspace's managed identity in Azure AD has the same name as your Azure
Machine Learning workspace.
TIP
Your storage account may have multiple private endpoints. For example, one storage account may have separate
private endpoint for blob, file, and dfs (Azure Data Lake Storage Gen2). Add the managed identity to all these
endpoints.
3. Enable managed identity authentication for default storage accounts . Each Azure Machine
Learning workspace has two default storage accounts, a default blob storage account and a default file
store account, which are defined when you create your workspace. You can also set new defaults in the
Datastore management page.
The following table describes why managed identity authentication is used for your workspace default
storage accounts.
Storage account | Notes
Workspace default blob storage | Stores model assets from the designer. Enable managed identity authentication on this storage account to deploy models in the designer. If managed identity authentication is disabled, the user's identity is used to access data stored in the blob.
Workspace default file store | Stores AutoML experiment assets. Enable managed identity authentication on this storage account to submit AutoML experiments.
4. Configure datastores to use managed identity authentication . After you add an Azure storage
account to your virtual network with either a service endpoint or private endpoint, you must configure
your datastore to use managed identity authentication. Doing so lets the studio access data in your
storage account.
Azure Machine Learning uses datastores to connect to storage accounts. When creating a new datastore,
use the following steps to configure a datastore to use managed identity authentication:
a. In the studio, select Datastores .
b. To update an existing datastore, select the datastore and select Update credentials .
To create a new datastore, select + New datastore .
c. In the datastore settings, select Yes for Use workspace managed identity for data preview
and profiling in Azure Machine Learning studio .
d. In the Networking settings for the Azure Storage Account , add the
Microsoft.MachineLearningService/workspaces Resource type , and set the Instance name to
the workspace.
These steps add the workspace's managed identity as a Reader to the new storage service using Azure
RBAC. Reader access allows the workspace to view the resource, but not make changes.
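If you prefer to configure the datastore from code instead of the studio, SDK v1 exposes a grant_workspace_access flag that adds the workspace managed identity in the same way. A minimal sketch; the datastore, container, and storage account names are hypothetical placeholders:

from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register a blob datastore that uses the workspace managed identity for data access
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="secure_blob_datastore",
    container_name="my-container",
    account_name="mystorageaccount",
    grant_workspace_access=True,  # adds the workspace identity as Reader on the storage service
)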
Firewall settings
Some storage services, such as Azure Storage Accounts, have firewall settings that apply to the public endpoint
for that specific service instance. Usually, this setting lets you allow or deny access from specific IP
addresses on the public internet. This is not supported when using Azure Machine Learning studio. It is
supported when using the Azure Machine Learning SDK or CLI.
TIP
Azure Machine Learning studio is supported when using the Azure Firewall service. For more information, see Use your
workspace behind a firewall.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Use custom DNS
Use a firewall
Configure a private endpoint for an Azure Machine
Learning workspace
5/25/2022 • 15 minutes to read • Edit Online
In this document, you learn how to configure a private endpoint for your Azure Machine Learning workspace.
For information on creating a virtual network for Azure Machine Learning, see Virtual network isolation and
privacy overview.
Azure Private Link enables you to connect to your workspace using a private endpoint. The private endpoint is a
set of private IP addresses within your virtual network. You can then limit access to your workspace to only
occur over the private IP addresses. A private endpoint helps reduce the risk of data exfiltration. To learn more
about private endpoints, see the Azure Private Link article.
WARNING
Securing a workspace with private endpoints does not ensure end-to-end security by itself. You must secure all of the
individual components of your solution. For example, if you use a private endpoint for the workspace, but your Azure
Storage Account is not behind the VNet, traffic between the workspace and storage does not use the VNet for security.
For more information on securing resources used by Azure Machine Learning, see the following articles:
Virtual network isolation and privacy overview.
Secure workspace resources.
Secure training environments.
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Use Azure Machine Learning studio in a VNet.
API platform network isolation
Prerequisites
You must have an existing virtual network to create the private endpoint in.
Disable network policies for private endpoints before adding the private endpoint.
Limitations
If you enable public access for a workspace secured with private endpoint and use Azure Machine
Learning studio over the public internet, some features such as the designer may fail to access your data.
This problem happens when the data is stored on a service that is secured behind the VNet. For example,
an Azure Storage Account.
You may encounter problems trying to access the private endpoint for your workspace if you are using
Mozilla Firefox. This problem may be related to DNS over HTTPS in Mozilla. We recommend using
Microsoft Edge or Google Chrome as a workaround.
Using a private endpoint does not affect Azure control plane (management operations), such as deleting
the workspace or managing compute resources; for example, creating, updating, or deleting a compute
target. These operations are performed over the public internet as normal. Data plane operations, such as
using Azure Machine Learning studio, APIs (including published pipelines), or the SDK, use the private
endpoint.
When creating a compute instance or compute cluster in a workspace with a private endpoint, the
compute instance and compute cluster must be in the same Azure region as the workspace.
When creating or attaching an Azure Kubernetes Service cluster to a workspace with a private endpoint,
the cluster must be in the same region as the workspace.
When using a workspace with multiple private endpoints, one of the private endpoints must be in the
same VNet as the following dependency services:
Azure Storage Account that provides the default storage for the workspace
Azure Key Vault for the workspace
Azure Container Registry for the workspace.
For example, one VNet ('services' VNet) would contain a private endpoint for the dependency services
and the workspace. This configuration allows the workspace to communicate with the services. Another
VNet ('clients') might only contain a private endpoint for the workspace, and be used only for
communication between client development machines and the workspace.
TIP
If you'd like to create a workspace, private endpoint, and virtual network at the same time, see Use an Azure Resource
Manager template to create a workspace for Azure Machine Learning.
Python
Azure CLI extension 2.0 preview
Azure CLI extension 1.0
Portal
The Azure Machine Learning Python SDK provides the PrivateEndpointConfig class, which can be used with
Workspace.create() to create a workspace with a private endpoint. This class requires an existing virtual network.
APPLIES TO: Python SDK azureml v1
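The example itself isn't reproduced here. A minimal sketch of the pattern, assuming an existing virtual network; the names and subscription ID are placeholders, and the class is spelled PrivateEndPointConfig in azureml-core:

from azureml.core import Workspace, PrivateEndPointConfig

pe = PrivateEndPointConfig(
    name="my-private-endpoint",
    vnet_name="my-vnet",
    vnet_subnet_name="default",
)

# Create a workspace that is reachable through the private endpoint
ws = Workspace.create(
    name="my-workspace",
    subscription_id="<subscription-id>",
    resource_group="my-rg",
    location="eastus",
    private_endpoint_config=pe,
    private_endpoint_auto_approval=True,
)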
Python
Azure CLI extension 2.0 preview
Azure CLI extension 1.0
Portal
For more information on the classes and methods used in this example, see PrivateEndpointConfig and
Workspace.add_private_endpoint.
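The example itself isn't shown above; a minimal sketch of adding a private endpoint to an existing workspace, again with placeholder names:

from azureml.core import Workspace, PrivateEndPointConfig

ws = Workspace.from_config()
pe = PrivateEndPointConfig(
    name="my-second-private-endpoint",
    vnet_name="my-client-vnet",
    vnet_subnet_name="default",
)

# Attach the new private endpoint to the existing workspace
ws.add_private_endpoint(
    private_endpoint_config=pe,
    private_endpoint_auto_approval=True,
)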
WARNING
Removing the private endpoints for a workspace doesn't make it publicly accessible. To make the workspace publicly
accessible, use the steps in the Enable public access section.
Python
Azure CLI extension 2.0 preview
Azure CLI extension 1.0
Portal
from azureml.core import Workspace

ws = Workspace.from_config()
# Get the name of the first private endpoint connection on the workspace
_, _, connection_name = ws.get_details()['privateEndpointConnections'][0]['id'].rpartition('/')
ws.delete_private_endpoint_connection(private_endpoint_connection_name=connection_name)
Enable public access
In some situations, you may want to allow someone to connect to your secured workspace over a public
endpoint, instead of through the VNet. Or you may want to remove the workspace from the VNet and re-enable
public access.
IMPORTANT
Enabling public access doesn't remove any private endpoints that exist. All communications between components behind
the VNet that the private endpoint(s) connect to are still secured. It enables public access only to the workspace, in
addition to the private access through any private endpoints.
WARNING
When connecting over the public endpoint while the workspace uses a private endpoint to communicate with other
resources:
Some features of studio will fail to access your data. This problem happens when the data is stored on a
service that is secured behind the VNet. For example, an Azure Storage Account.
Using Jupyter, JupyterLab, and RStudio on a compute instance, including running notebooks, is not supported.
Python
Azure CLI extension 2.0 preview
Azure CLI extension 1.0
Portal
from azureml.core import Workspace

ws = Workspace.from_config()
ws.update(allow_public_access_when_behind_vnet=True)
IMPORTANT
Synapse's data exfiltration protection is not supported with Azure Machine Learning.
IMPORTANT
Each VNet that contains a private endpoint for the workspace must also be able to access the Azure Storage Account,
Azure Key Vault, and Azure Container Registry used by the workspace. For example, you might create a private endpoint
for the services in each VNet.
Adding multiple private endpoints uses the same steps as described in the Add a private endpoint to a
workspace section.
Scenario: Isolated clients
If you want to isolate the development clients, so they do not have direct access to the compute resources used
by Azure Machine Learning, use the following steps:
NOTE
These steps assume that you have an existing workspace, Azure Storage Account, Azure Key Vault, and Azure Container
Registry. Each of these services has a private endpoint in an existing VNet.
1. Create another VNet for the clients. This VNet might contain Azure Virtual Machines that act as your clients,
or it may contain a VPN Gateway used by on-premises clients to connect to the VNet.
2. Add a new private endpoint for the Azure Storage Account, Azure Key Vault, and Azure Container Registry
used by your workspace. These private endpoints should exist in the client VNet.
3. If you have additional storage that is used by your workspace, add a new private endpoint for that storage.
The private endpoint should exist in the client VNet and have private DNS zone integration enabled.
4. Add a new private endpoint to your workspace. This private endpoint should exist in the client VNet and have
private DNS zone integration enabled.
5. Use the steps in the Use studio in a virtual network article to enable studio to access the storage account(s).
The following diagram illustrates this configuration. The Workload VNet contains computes created by the
workspace for training & deployment. The Client VNet contains clients or client ExpressRoute/VPN connections.
Both VNets contain private endpoints for the workspace, Azure Storage Account, Azure Key Vault, and Azure
Container Registry.
NOTE
These steps assume that you have an existing workspace, Azure Storage Account, Azure Key Vault, and Azure Container
Registry. Each of these services has a private endpoint in an existing VNet.
1. Create an Azure Kubernetes Service instance. During creation, AKS creates a VNet that contains the AKS
cluster.
2. Add a new private endpoint for the Azure Storage Account, Azure Key Vault, and Azure Container Registry
used by your workspace. These private endpoints should exist in the client VNet.
3. If you have other storage that is used by your workspace, add a new private endpoint for that storage. The
private endpoint should exist in the client VNet and have private DNS zone integration enabled.
4. Add a new private endpoint to your workspace. This private endpoint should exist in the client VNet and have
private DNS zone integration enabled.
5. Attach the AKS cluster to the Azure Machine Learning workspace. For more information, see Create and
attach an Azure Kubernetes Service cluster.
Next steps
For more information on securing your Azure Machine Learning workspace, see the Virtual network
isolation and privacy overview article.
If you plan on using a custom DNS solution in your virtual network, see how to use a workspace with a
custom DNS server.
API platform network isolation
How to use your workspace with a custom DNS
server
5/25/2022 • 20 minutes to read • Edit Online
When using an Azure Machine Learning workspace with a private endpoint, there are several ways to handle
DNS name resolution. By default, Azure automatically handles name resolution for your workspace and private
endpoint. If you instead use your own custom DNS server, you must manually create DNS entries or use
conditional forwarders for the workspace.
IMPORTANT
This article covers how to find the fully qualified domain names (FQDN) and IP addresses for these entries if you would
like to manually register DNS records in your DNS solution. Additionally this article provides architecture
recommendations for how to configure your custom DNS solution to automatically resolve FQDNs to the correct IP
addresses. This article does NOT provide information on configuring the DNS records for these items. Consult the
documentation for your DNS software for information on how to add records.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use a firewall
Prerequisites
An Azure Virtual Network that uses your own DNS server.
An Azure Machine Learning workspace with a private endpoint. For more information, see Create an
Azure Machine Learning workspace.
Familiarity with using Network isolation during training & inference.
Familiarity with Azure Private Endpoint DNS zone configuration
Familiarity with Azure Private DNS
Optionally, Azure CLI or Azure PowerShell.
The Fully Qualified Domains resolve to the following Canonical Names (CNAMEs) called the workspace Private
Link FQDNs:
Azure Public regions :
<per-workspace globally-unique identifier>.workspace.<region the workspace was created
in>.privatelink.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.privatelink.notebooks.azure.net
The FQDNs resolve to the IP addresses of the Azure Machine Learning workspace in that region. However,
resolution of the workspace Private Link FQDNs can be overridden by using a custom DNS server hosted in the
virtual network. For an example of this architecture, see the custom DNS server hosted in a vnet example.
<workspace-GUID>.workspace.<region>.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<workspace-guid>.notebooks.azure.net
NOTE
The workspace name for this FQDN may be truncated. Truncation is done to keep
ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63 characters or less.
<instance-name>.<region>.instances.azureml.ms
NOTE
Compute instances can be accessed only from within the virtual network.
The IP address for this FQDN is not the IP of the compute instance. Instead, use the private IP address of the
workspace private endpoint (the IP of the *.api.azureml.ms entries.)
*.<workspace-GUID>.inference.<region>.api.azureml.ms
<workspace-GUID>.workspace.<region>.api.ml.azure.cn
ml-<workspace-name, truncated>-<region>-<workspace-guid>.notebooks.chinacloudapi.cn
NOTE
The workspace name for this FQDN may be truncated. Truncation is done to keep
ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63 characters or less.
<instance-name>.<region>.instances.azureml.cn
The IP address for this FQDN is not the IP of the compute instance. Instead, use the private IP address
of the workspace private endpoint (the IP of the *.api.azureml.ms entries.)
Azure US Government
The following FQDNs are for Azure US Government regions:
<workspace-GUID>.workspace.<region>.cert.api.ml.azure.us
<workspace-GUID>.workspace.<region>.api.ml.azure.us
ml-<workspace-name, truncated>-<region>-<workspace-guid>.notebooks.usgovcloudapi.net
NOTE
The workspace name for this FQDN may be truncated. Truncation is done to keep
ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63 characters or less.
<instance-name>.<region>.instances.azureml.us
The IP address for this FQDN is not the IP of the compute instance. Instead, use the private IP
address of the workspace private endpoint (the IP of the *.api.azureml.ms entries.)
NOTE
The fully qualified domain names and IP addresses will be different based on your configuration. For example, the GUID
value in the domain name will be specific to your workspace.
Azure CLI
Azure PowerShell
Azure portal
1. To get the ID of the private endpoint network interface, use the following command:
2. To get the IP address and FQDN information, use the following command. Replace <resource-id> with
the ID from the previous step:
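The commands themselves aren't reproduced here. A typical Azure CLI sequence for these two steps might look like the following sketch; the endpoint and resource group names are placeholders, and the exact query strings are assumptions to verify against your CLI version:

# Step 1: Get the ID of the private endpoint network interface
az network private-endpoint show \
  --name my-workspace-endpoint --resource-group my-rg \
  --query 'networkInterfaces[*].id' --output table

# Step 2: Get the IP address and FQDN information for that interface
az network nic show --ids <resource-id> \
  --query 'ipConfigurations[*].{IPAddress: privateIpAddress, FQDNs: privateLinkConnectionProperties.fqdns}'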
The information returned from all methods is the same; a list of the FQDN and private IP address for the
resources. The following example is from the Azure Public Cloud:
FQDN | IP address
fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms | 10.1.0.5
fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms | 10.1.0.5
ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.notebooks.azure.net | 10.1.0.6
mymanagedonlineendpoint.fb7e20a0-8891-458b-b969-55ddb3382f51.inference.eastus.api.azureml.ms | 10.1.0.7
The following table shows example IPs from Azure China regions:
FQDN | IP address
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.chinaeast2.api.ml.azure.cn | 10.1.0.5
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.chinaeast2.cert.api.ml.azure.cn | 10.1.0.5
ml-mype-pltest-chinaeast2-52882c08-ead2-44aa-af65-08a75cf094bd.notebooks.chinacloudapi.cn | 10.1.0.6
The following table shows example IPs from Azure US Government regions:
FQDN | IP address
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.usgovvirginia.api.ml.azure.us | 10.1.0.5
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.usgovvirginia.cert.api.ml.azure.us | 10.1.0.5
ml-mype-plt-usgovvirginia-52882c08-ead2-44aa-af65-08a75cf094bd.notebooks.usgovcloudapi.net | 10.1.0.6
Create A records in custom DNS server
Once the list of FQDNs and corresponding IP addresses is gathered, proceed to create A records on the
configured DNS server. Refer to the documentation for your DNS server to determine how to create A records.
Note that it's recommended to create a unique zone for the entire FQDN and to create the A record in the root of
the zone.
IMPORTANT
The private endpoint must have Private DNS integration enabled for this example to function correctly.
3. Create conditional forwarder in DNS Server to forward to Azure DNS:
Next, create a conditional forwarder to the Azure DNS Virtual Server. The conditional forwarder ensures
that the DNS server always queries the Azure DNS Virtual Server IP address for FQDNs related to your
workspace. This means that the DNS Server will return the corresponding record from the Private DNS
Zone.
The zones to conditionally forward are listed below. The Azure DNS Virtual Server IP address is
168.63.129.16:
Azure Public regions :
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
aznbcontent.net
Azure China regions :
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
aznbcontent.net
Azure US Government regions :
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
aznbcontent.net
IMPORTANT
Configuration steps for the DNS Server are not included here, as there are many DNS solutions available that can
be used as a custom DNS Server. Refer to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
The result of each nslookup should return one of the two private IP addresses on the Private Endpoint to
the Azure Machine Learning workspace. If it does not, then there is something misconfigured in the
custom DNS solution.
Possible causes:
The compute resource running the troubleshooting commands is not using DNS Server for DNS
resolution
The Private DNS Zones chosen when creating the Private Endpoint are not linked to the DNS Server
VNet
Conditional forwarders to Azure DNS Virtual Server IP were not configured correctly
NOTE
The DNS Server in the virtual network is separate from the On-premises DNS Server.
A Private DNS Zone overrides name resolution for all names within the scope of the root of the zone. This
override applies to all Virtual Networks the Private DNS Zone is linked to. For example, if a Private DNS
Zone rooted at privatelink.api.azureml.ms is linked to Virtual Network foo, all resources in Virtual
Network foo that attempt to resolve bar.workspace.westus2.privatelink.api.azureml.ms will receive any
record that is listed in the privatelink.api.azureml.ms zone.
However, records listed in Private DNS Zones are only returned to devices resolving domains using the
default Azure DNS Virtual Server IP address. The Azure DNS Virtual Server IP address is only valid within
the context of a Virtual Network. When using an on-premises DNS server, it is not able to query the
Azure DNS Virtual Server IP address to retrieve records.
To get around this behavior, create an intermediary DNS Server in a virtual network. This DNS server can
query the Azure DNS Virtual Server IP address to retrieve records for any Private DNS Zone linked to the
virtual network.
While the On-premises DNS Server will resolve domains for devices spread throughout your network
topology, it will resolve Azure Machine Learning-related domains against the DNS Server. The DNS
Server will resolve those domains from the Azure DNS Virtual Server IP address.
2. Create private endpoint with private DNS integration targeting the Private DNS Zones linked to the
DNS Server Virtual Network:
The next step is to create a Private Endpoint to the Azure Machine Learning workspace. The private
endpoint targets both Private DNS Zones created in step 1. This ensures all communication with the
workspace is done via the Private Endpoint in the Azure Machine Learning Virtual Network.
IMPORTANT
The private endpoint must have Private DNS integration enabled for this example to function correctly.
3. Create conditional forwarder in DNS Server to forward to Azure DNS:
Next, create a conditional forwarder to the Azure DNS Virtual Server. The conditional forwarder ensures
that the DNS server always queries the Azure DNS Virtual Server IP address for FQDNs related to your
workspace. This means that the DNS Server will return the corresponding record from the Private DNS
Zone.
The zones to conditionally forward are listed below. The Azure DNS Virtual Server IP address is
168.63.129.16.
Azure Public regions :
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
aznbcontent.net
Azure China regions :
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
aznbcontent.net
Azure US Government regions :
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
aznbcontent.net
IMPORTANT
Configuration steps for the DNS Server are not included here, as there are many DNS solutions available that can
be used as a custom DNS Server. Refer to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
4. Create conditional forwarder in On-premises DNS Server to forward to DNS Server:
Next, create a conditional forwarder to the DNS Server in the DNS Server Virtual Network. This
forwarder is for the zones listed in step 1. This is similar to step 3, but, instead of forwarding to the Azure
DNS Virtual Server IP address, the On-premises DNS Server will be targeting the IP address of the DNS
Server. As the On-premises DNS Server is not in Azure, it is not able to directly resolve records in Private
DNS Zones. In this case the DNS Server proxies requests from the On-premises DNS Server to the Azure
DNS Virtual Server IP. This allows the On-premises DNS Server to retrieve records in the Private DNS
Zones linked to the DNS Server Virtual Network.
The zones to conditionally forward are listed below. The IP addresses to forward to are the IP addresses
of your DNS Servers:
Azure Public regions :
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
Azure China regions :
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
Azure US Government regions :
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
IMPORTANT
Configuration steps for the DNS Server are not included here, as there are many DNS solutions available that can
be used as a custom DNS Server. Refer to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
IMPORTANT
The hosts file only overrides name resolution for the local computer. If you want to use a hosts file with multiple
computers, you must modify it individually on each computer.
Operating system | Location
Linux | /etc/hosts
macOS | /etc/hosts
Windows | %SystemRoot%\System32\drivers\etc\hosts
TIP
The name of the file is hosts with no extension. When editing the file, use administrator access. For example, on Linux or
macOS you might use sudo vi . On Windows, run notepad as an administrator.
The following is an example of hosts file entries for Azure Machine Learning:
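The entries themselves aren't reproduced here. A sketch built from the sample FQDNs and private IP addresses shown in the tables earlier in this article (your GUIDs, names, and IP addresses will differ):

# Azure Machine Learning workspace private endpoint entries (example values)
10.1.0.5    fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms
10.1.0.5    fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms
10.1.0.6    ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.notebooks.azure.net
10.1.0.7    mymanagedonlineendpoint.fb7e20a0-8891-458b-b969-55ddb3382f51.inference.eastus.api.azureml.ms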
For more information on the services and DNS resolution, see Azure Private Endpoint DNS configuration.
Troubleshooting
If, after running through the above steps, you are unable to access the workspace from a virtual machine, or jobs
fail on compute resources in the virtual network containing the private endpoint to the Azure Machine Learning
workspace, use the following steps to try to identify the cause.
1. Locate the workspace FQDNs on the Private Endpoint :
Navigate to the Azure portal using one of the following links:
Azure Public regions
Azure China regions
Azure US Government regions
Navigate to the Private Endpoint to the Azure Machine Learning workspace. The workspace FQDNs will
be listed on the “Overview” tab.
2. Access compute resource in Vir tual Network topology :
Proceed to access a compute resource in the Azure Virtual Network topology. This will likely require
accessing a Virtual Machine in a Virtual Network that is peered with the Hub Virtual Network.
3. Resolve workspace FQDNs :
Open a command prompt, shell, or PowerShell. Then for each of the workspace FQDNs, run the following
command:
nslookup <workspace FQDN>
The result of each nslookup should yield one of the two private IP addresses on the Private Endpoint to
the Azure Machine Learning workspace. If it does not, then there is something misconfigured in the
custom DNS solution.
Possible causes:
The compute resource running the troubleshooting commands is not using DNS Server for DNS
resolution
The Private DNS Zones chosen when creating the Private Endpoint are not linked to the DNS Server
VNet
Conditional forwarders from DNS Server to Azure DNS Virtual Server IP were not configured
correctly
Conditional forwarders from On-premises DNS Server to DNS Server were not configured correctly
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use a firewall
For information on integrating Private Endpoints into your DNS configuration, see Azure Private Endpoint DNS
configuration.
For information on deploying models with a custom DNS name or TLS security, see Secure web services using
TLS.
Configure inbound and outbound network traffic
5/25/2022 • 17 minutes to read • Edit Online
In this article, learn about the network communication requirements when securing an Azure Machine Learning
workspace in a virtual network (VNet), including how to configure Azure Firewall to control access to your Azure
Machine Learning workspace and the public internet. To learn more about securing Azure Machine Learning, see
Enterprise security for Azure Machine Learning.
NOTE
The information in this article applies to Azure Machine Learning workspace configured with a private endpoint.
TIP
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
Well-known ports
The following are well-known ports used by services listed in this article. If a port range is used in this article
and is not listed in this section, it is specific to the service and may not have published information on what it is
used for:
Port | Description
445 | SMB traffic used to access file shares in Azure File storage
TIP
If you need the IP addresses instead of service tags, use one of the following options:
Download a list from Azure IP Ranges and Service Tags.
Use the Azure CLI az network list-service-tags command.
Use the Azure PowerShell Get-AzNetworkServiceTag command.
The IP addresses may change periodically.
You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft sites for the installation
of packages required by your machine learning project. The following table lists commonly used repositories for
machine learning:
Host name | Purpose
pypi.org | Used to list dependencies from the default index, if any, and the index is not overwritten by user settings. If the index is overwritten, you must also allow *.pythonhosted.org.
When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the following traffic to the AKS
VNet:
General inbound/outbound requirements for AKS as described in the Restrict egress traffic in Azure
Kubernetes Service article.
Outbound to mcr.microsoft.com.
When deploying a model to an AKS cluster, use the guidance in the Deploy ML models to Azure Kubernetes
Service article.
Azure Firewall
IMPORTANT
Azure Firewall provides security for Azure Virtual Network resources. Some Azure Services, such as Azure Storage
Accounts, have their own firewall settings that apply to the public endpoint for that specific service instance. The
information in this document is specific to Azure Firewall.
For information on service instance firewall settings, see Use studio in a virtual network.
For inbound traffic to Azure Machine Learning compute cluster and compute instance, use user-defined
routes (UDRs) to skip the firewall.
For outbound traffic, create network and application rules.
These rule collections are described in more detail in What are some Azure Firewall concepts.
Inbound configuration
When using an Azure Machine Learning compute instance (with a public IP) or compute cluster, allow inbound
traffic from Azure Batch management and Azure Machine Learning services. A compute instance with no public IP
(preview) does not require this inbound communication. A Network Security Group allowing this traffic is
dynamically created for you; however, you may also need to create user-defined routes (UDR) if you have a
firewall. When creating a UDR for this traffic, you can use either IP addresses or service tags to route the
traffic.
IMPORTANT
Using service tags with user-defined routes is now GA. For more information, see Virtual Network routing.
TIP
While a compute instance without a public IP (a preview feature) does not need a UDR for this inbound traffic, you will still
need these UDRs if you also use a compute cluster or a compute instance with a public IP.
IP Address routes
Service tag routes
For the Azure Machine Learning service, you must add the IP address of both the primary and secondary
regions. To find the secondary region, see Cross-region replication in Azure. For example, if your Azure
Machine Learning service is in East US 2, the secondary region is Central US.
To get a list of IP addresses of the Batch service and Azure Machine Learning service, download the Azure IP
Ranges and Service Tags and search the file for BatchNodeManagement.<region> and
AzureMachineLearning.<region> , where <region> is your Azure region.
IMPORTANT
The IP addresses may change over time.
When creating the UDR, set the Next hop type to Internet. This means the inbound communication from
Azure skips your firewall to access the load balancers with the public IPs of the compute instance and compute
cluster. A UDR is required because the compute instance and compute cluster receive random public IPs at
creation; you cannot know these public IPs before creation, so you cannot register them on your firewall to
allow inbound traffic from Azure to specific IPs. The following image shows an example IP address based UDR in
the Azure portal:
For information on configuring UDR, see Route network traffic with a routing table.
Outbound configuration
1. Add Network rules, allowing traffic to and from the following service tags:
Service tag | Protocol | Port
TIP
AzureContainerRegistry.region is only needed for custom Docker images, including small modifications (such as
additional packages) to base images provided by Microsoft.
MicrosoftContainerRegistry.region is only needed if you plan on using the default Docker images provided by
Microsoft and enabling user-managed dependencies.
AzureKeyVault.region is only needed if your workspace was created with the hbi_workspace flag enabled.
For entries that contain region, replace it with the Azure region that you're using. For example,
AzureContainerRegistry.westus.
NOTE
This is not a complete list of the hosts required for all Python resources on the internet, only the most commonly
used. For example, if you need access to a GitHub repository or other host, you must identify and add the
required hosts for that scenario.
Other firewalls
The guidance in this section is generic, as each firewall has its own terminology and specific configurations. If
you have questions, check the documentation for the firewall you are using.
If not configured correctly, the firewall can cause problems using your workspace. There are various host names
that are used by the Azure Machine Learning workspace. The following sections list the hosts that are required
for Azure Machine Learning.
Dependencies API
You can also use the Azure Machine Learning REST API to get a list of hosts and ports that you must allow
outbound traffic to. To use this API, use the following steps:
1. Get an authentication token. The following command demonstrates using the Azure CLI to get an
authentication token and subscription ID:
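The command isn't reproduced here; a sketch using the Azure CLI (the shell variable names are arbitrary):

TOKEN=$(az account get-access-token --query accessToken --output tsv)
SUBSCRIPTION=$(az account show --query id --output tsv)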
2. Call the API. In the following command, replace the following values:
Replace <region> with the Azure region your workspace is in. For example, westus2 .
Replace <resource-group> with the resource group that contains your workspace.
Replace <workspace-name> with the name of your workspace.
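The call itself also isn't shown. The following sketch assumes the regional outboundNetworkDependenciesEndpoints operation and reuses the variables from the previous step; the exact URL and api-version are assumptions to verify against current documentation:

az rest --method GET \
  --url "https://<region>.api.azureml.ms/rp/workspaces/subscriptions/$SUBSCRIPTION/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/outboundNetworkDependenciesEndpoints?api-version=2018-03-01-preview"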
The result of the API call is a JSON document. The following snippet is an excerpt of this document:
{
"value": [
{
"properties": {
"category": "Azure Active Directory",
"endpoints": [
{
"domainName": "login.microsoftonline.com",
"endpointDetails": [
{
"port": 80
},
{
"port": 443
}
]
}
]
}
},
{
"properties": {
"category": "Azure portal",
"endpoints": [
{
"domainName": "management.azure.com",
"endpointDetails": [
{
"port": 443
}
]
}
]
}
},
...
Microsoft hosts
The hosts in the following tables are owned by Microsoft, and provide services required for the proper
functioning of your workspace. The tables list hosts for the Azure public, Azure Government, and Azure China
21Vianet regions.
IMPORTANT
Azure Machine Learning uses Azure Storage Accounts in your subscription and in Microsoft-managed subscriptions.
Where applicable, the following terms are used to differentiate between them in this section:
Your storage: The Azure Storage Account(s) in your subscription, which are used to store your data and artifacts such
as models, training data, training logs, and Python scripts.
Microsoft storage: The Azure Machine Learning compute instance and compute clusters rely on Azure Batch, and
must access storage located in a Microsoft subscription. This storage is used only for the management of the compute
instances. None of your data is stored here.
IMPORTANT
In the following table, replace <storage> with the name of the default storage account for your Azure Machine Learning
workspace.
Also, use the information in the inbound configuration section to add IP addresses for BatchNodeManagement and
AzureMachineLearning .
For information on restricting access to models deployed to AKS, see Restrict egress traffic in Azure Kubernetes
Service.
Monitoring, metrics, and diagnostics
To support logging of metrics and other monitoring information to Azure Monitor and Application Insights,
allow outbound traffic to the following hosts:
NOTE
The information logged to these hosts is also used by Microsoft Support to be able to diagnose any problems you run
into with your workspace.
dc.applicationinsights.azure.com
dc.applicationinsights.microsoft.com
dc.services.visualstudio.com
*.in.applicationinsights.azure.com
For a list of IP addresses for these hosts, see IP addresses used by Azure Monitor.
Python hosts
The hosts in this section are used to install Python packages, and are required during development, training, and
deployment.
NOTE
This is not a complete list of the hosts required for all Python resources on the internet, only the most commonly used.
For example, if you need access to a GitHub repository or other host, you must identify and add the required hosts for
that scenario.
Host name | Purpose
pypi.org | Used to list dependencies from the default index, if any, and the index is not overwritten by user settings. If the index is overwritten, you must also allow *.pythonhosted.org.
R hosts
The hosts in this section are used to install R packages, and are required during development, training, and
deployment.
NOTE
This is not a complete list of the hosts required for all R resources on the internet, only the most commonly used. For
example, if you need access to a GitHub repository or other host, you must identify and add the required hosts for that
scenario.
Outbound endpoint | Port | Description | Training | Inference
NOTE
<region> is the lowercase full spelling of the Azure region, for example, eastus, southeastasia.
NOTE
This is not a complete list of the hosts required for all Visual Studio Code resources on the internet, only the most
commonly used. For example, if you need access to a GitHub repository or other host, you must identify and add the
required hosts for that scenario.
Host name | Purpose
update.code.visualstudio.com, *.vo.msecnd.net | Used to retrieve VS Code server bits that are installed on the compute instance through a setup script.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this
series:
Virtual network overview
Secure the workspace resources
Secure the training environment
For securing inference, see the following documents:
If using CLI v1 or SDK v1 - Secure inference environment
If using CLI v2 or SDK v2 - Network isolation for managed online endpoints
Enable studio functionality
Use custom DNS
For more information on configuring Azure Firewall, see Tutorial: Deploy and configure Azure Firewall using the
Azure portal.
Network Isolation Change with Our New API
Platform on Azure Resource Manager
5/25/2022 • 4 minutes to read • Edit Online
In this article, you'll learn about network isolation changes with our new v2 API platform on Azure Resource
Manager (ARM) and its effect on network isolation.
Prerequisites
The Azure Machine Learning Python SDK or Azure CLI extension for machine learning v1.
IMPORTANT
The v1 extension ( azure-cli-ml ) version must be 1.41.0 or greater. Use the az version command to view
version information.
The v2 API provides a consistent API in one place. You can more easily use Azure role-based access control and
Azure Policy for resources with the v2 API because it's based on Azure Resource Manager.
The Azure Machine Learning CLI v2 uses our new v2 API platform. New features such as managed online
endpoints are only available using the v2 API platform.
TIP
Public ARM operations do not surface data in your storage account on public networks.
Your communication with public ARM is encrypted using TLS 1.2.
If you need time to evaluate the new v2 API before adopting it in your enterprise solutions, or have a company
policy that prohibits sending communication over public networks, you can enable the v1_legacy_mode
parameter. When enabled, this parameter disables the v2 API for your workspace.
IMPORTANT
Enabling v1_legacy_mode may prevent you from using features provided by the v2 API. For example, some features of
Azure Machine Learning studio may be unavailable.
If you don't plan on using a private endpoint with your workspace, you don't need to enable the parameter.
If you're OK with operations communicating with public ARM, you don't need to enable the parameter.
You only need to enable the parameter if you're using a private endpoint with the workspace and don't
want to allow operations with ARM over public networks.
Once we implement the parameter, it will be retroactively applied to existing workspaces using the following
logic:
If you have an existing workspace with a private endpoint , the flag will be true .
If you have an existing workspace without a private endpoint (public workspace), the flag will be
false .
After the parameter has been implemented, the default value of the flag depends on the underlying REST API
version used when you create a workspace (with a private endpoint):
If the API version is older than 2022-05-01 , then the flag is true by default.
If the API version is 2022-05-01 or newer , then the flag is false by default.
IMPORTANT
If you want to use the v2 API with your workspace, you must set the v1_legacy_mode parameter to false .
Python
Azure CLI extension v1
from azureml.core import Workspace

ws = Workspace.from_config()
ws.update(v1_legacy_mode=False)  # the Python boolean literal is False, not false
Next steps
Use a private endpoint with Azure Machine Learning workspace.
Create private link for managing Azure resources.
Failover for business continuity and disaster
recovery
5/25/2022 • 11 minutes to read • Edit Online
To maximize your uptime, plan ahead to maintain business continuity and prepare for disaster recovery with
Azure Machine Learning.
Microsoft strives to ensure that Azure services are always available. However, unplanned service outages may
occur. We recommend having a disaster recovery plan in place for handling regional service outages. In this
article, you'll learn how to:
Plan for a multi-regional deployment of Azure Machine Learning and associated resources.
Design for high availability of your solution.
Initiate a failover to another region.
NOTE
Azure Machine Learning itself does not provide automatic failover or disaster recovery.
In case you have accidentally deleted your workspace or corresponding components, this article also provides
you with currently supported recovery options.
Associated resources
Compute resources
The rest of this article describes the actions you need to take to make each of these services highly available.
TIP
Depending on your business requirements, you may decide to treat different Azure Machine Learning resources
differently. For example, you may want to use hot/hot for deployed models (inference), and hot/cold for experiments
(training).
Azure Machine Learning builds on top of other services. Some services can be configured to replicate to other
regions. Others you must manually create in multiple regions. The following table provides a list of services,
who is responsible for replication, and an overview of the configuration:
Service | Who is responsible | Configuration
Key Vault | Microsoft | Use the same Key Vault instance with the Azure Machine Learning workspace and resources in both regions. Key Vault automatically fails over to a secondary region. For more information, see Azure Key Vault availability and redundancy.
To enable fast recovery and restart in the secondary region, we recommend the following development
practices:
Use Azure Resource Manager templates. Templates are 'infrastructure-as-code', and allow you to quickly
deploy services in both regions.
To avoid drift between the two regions, update your continuous integration and deployment pipelines to
deploy to both regions.
When automating deployments, include the configuration of workspace attached compute resources such as
Azure Kubernetes Service.
Create role assignments for users in both regions.
Create network resources such as Azure Virtual Networks and private endpoints for both regions. Make sure
that users have access to both network environments. For example, VPN and DNS configurations for both
virtual networks.
Compute and data services
Depending on your needs, you may have more compute or data services that are used by Azure Machine
Learning. For example, you may use Azure Kubernetes Services or Azure SQL Database. Use the following
information to learn how to configure these services for high availability.
Compute resources
Azure Kubernetes Service: See Best practices for business continuity and disaster recovery in Azure
Kubernetes Service (AKS) and Create an Azure Kubernetes Service (AKS) cluster that uses availability zones. If
the AKS cluster was created by using the Azure Machine Learning studio, SDK, or CLI, cross-region high
availability is not supported.
Azure Databricks: See Regional disaster recovery for Azure Databricks clusters.
Container Instances: An orchestrator is responsible for failover. See Azure Container Instances and
container orchestrators.
HDInsight: See High availability services supported by Azure HDInsight.
Data services
Azure Blob container / Azure Files / Data Lake Storage Gen2: See Azure Storage redundancy.
Data Lake Storage Gen1: See High availability and disaster recovery guidance for Data Lake Storage Gen1.
SQL Database: See High availability for Azure SQL Database and SQL Managed Instance.
Azure Database for PostgreSQL: See High availability concepts in Azure Database for PostgreSQL - Single
Server.
Azure Database for MySQL: See Understand business continuity in Azure Database for MySQL.
Azure Databricks File System: See Regional disaster recovery for Azure Databricks clusters.
TIP
If you provide your own customer-managed key to deploy an Azure Machine Learning workspace, Azure Cosmos DB is
also provisioned within your subscription. In that case, you're responsible for configuring its high-availability settings. See
High availability with Azure Cosmos DB.
Design for high availability
Deploy critical components to multiple regions
Determine the level of business continuity that you are aiming for. The level may differ between the components
of your solution. For example, you may want to have a hot/hot configuration for production pipelines or model
deployments, and hot/cold for experimentation.
Manage training data on isolated storage
By keeping your data storage isolated from the default storage the workspace uses for logs, you can:
Attach the same storage instances as datastores to the primary and secondary workspaces.
Make use of geo-replication for data storage accounts and maximize your uptime.
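As a sketch with the SDK v1, the same storage account can be attached as a datastore to both workspaces; the config file paths, datastore name, and account values below are placeholders, not values from this article:

from azureml.core import Workspace, Datastore

# Connect to the primary and secondary workspace instances.
primary_ws = Workspace.from_config(path="primary/.azureml/config.json")
secondary_ws = Workspace.from_config(path="secondary/.azureml/config.json")

# Register the same geo-replicated storage account as a datastore in each workspace.
for ws in [primary_ws, secondary_ws]:
    Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name="training_data",
        container_name="<container-name>",
        account_name="<storage-account-name>",
        account_key="<storage-account-key>")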
Manage machine learning artifacts as code
Runs in Azure Machine Learning are defined by a run specification. This specification includes dependencies on
input artifacts that are managed on a workspace-instance level, including environments, datasets, and compute.
For multi-region run submission and deployments, we recommend the following practices:
Manage your code base locally, backed by a Git repository.
Export important notebooks from Azure Machine Learning studio.
Export pipelines authored in studio as code.
NOTE
Pipelines created in studio designer cannot currently be exported as code.
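For example, a run specification managed as code can be submitted to whichever workspace is currently active. The following SDK v1 sketch assumes placeholder script, environment file, compute target, and experiment names:

from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()  # resolves to the primary or failover workspace

# The environment and training code live in the Git repository, not in a single workspace.
env = Environment.from_conda_specification(name="train-env", file_path="environment.yml")
src = ScriptRunConfig(source_directory="./src",
                      script="train.py",
                      compute_target="cpu-cluster",
                      environment=env)

run = Experiment(ws, "my-experiment").submit(src)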
Initiate a failover
Continue work in the failover workspace
When your primary workspace becomes unavailable, you can switch over to the secondary workspace to continue
experimentation and development. Azure Machine Learning does not automatically submit runs to the
secondary workspace if there is an outage. Update your code configuration to point to the new workspace
resource. We recommend avoiding hardcoded workspace references. Instead, use a workspace config file to
minimize manual user steps when changing workspaces. Make sure to also update any automation, such as
continuous integration and deployment pipelines, to point to the new workspace.
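For example, with the SDK v1, keeping one config file per region (the file name below is an assumption) lets code switch workspaces without edits:

from azureml.core import Workspace

# Point at the failover workspace by loading its config file.
ws = Workspace.from_config(path=".azureml/config-secondary.json")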
Azure Machine Learning cannot sync or recover artifacts or metadata between workspace instances. Depending
on your application deployment strategy, you might have to move artifacts or recreate experimentation inputs,
such as dataset objects, in the failover workspace in order to continue run submission. If you have
configured your primary workspace and secondary workspace resources to share associated resources with
geo-replication enabled, some objects might be directly available to the failover workspace; for example, if both
workspaces share the same Docker images, configured datastores, and Azure Key Vault resources. The following
diagram shows a configuration where two workspaces share the same images (1), datastores (2), and Key Vault
(3).
NOTE
Any jobs that are running when a service outage occurs will not automatically transition to the secondary workspace. It is
also unlikely that the jobs will resume and finish successfully in the primary workspace once the outage is resolved.
Instead, these jobs must be resubmitted, either in the secondary workspace or in the primary (once the outage is
resolved).
Artifact: Azure ML pipelines (code-generated). Export: az ml pipeline get --path {PATH} . Import:
az ml pipeline create --name {NAME} -y {PATH}
TIP
Registered datasets cannot be downloaded or moved. This includes datasets generated by Azure ML, such as
intermediate pipeline datasets. However, datasets that refer to a shared file location that both workspaces can access,
or where the underlying data storage is replicated, can be registered on both workspaces. Use the az ml dataset
register command to register a dataset.
Run outputs are stored in the default storage account associated with a workspace. While run outputs might
become inaccessible from the studio UI in the case of a service outage, you can directly access the data through the
storage account. For more information on working with data stored in blobs, see Create, download, and list blobs with
Azure CLI.
Recovery options
Workspace deletion
If you accidentally deleted your workspace, it is currently not possible to recover it. However, you can
retrieve your existing notebooks from the corresponding storage account by following these steps:
In the Azure portal, navigate to the storage account that was linked to the deleted Azure Machine Learning
workspace.
In the Data storage section on the left, select File shares.
Your notebooks are located on the file share whose name contains your workspace ID.
Next steps
To deploy Azure Machine Learning with associated resources with your high-availability settings, use an Azure
Resource Manager template.
Regenerate storage account access keys
5/25/2022 • 5 minutes to read • Edit Online
IMPORTANT
Credentials registered with datastores are saved in your Azure Key Vault associated with the workspace. If you have soft-
delete enabled for your Key Vault, this article provides instructions for updating credentials. If you unregister the
datastore and try to re-register it under the same name, this action will fail. See Turn on Soft Delete for an existing key
vault for how to enable soft delete in this scenario.
Prerequisites
An Azure Machine Learning workspace. For more information, see the Create a workspace article.
The Azure Machine Learning SDK.
The Azure Machine Learning CLI extension v1.
NOTE
The code snippets in this document were tested with version 1.0.83 of the Python SDK.
IMPORTANT
Update the workspace using the Azure CLI, and the datastores using Python, at the same time. Updating only one or the
other is not sufficient, and may cause errors until both are updated.
To discover the storage accounts that are used by your datastores, use the following code:
import azureml.core
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

default_ds = ws.get_default_datastore()
print("Default datastore: " + default_ds.name + ", storage account name: " +
      default_ds.account_name + ", container name: " + default_ds.container_name)

datastores = ws.datastores
for name, ds in datastores.items():
    if ds.datastore_type == "AzureBlob":
        print("Blob store - datastore name: " + name + ", storage account name: " +
              ds.account_name + ", container name: " + ds.container_name)
    if ds.datastore_type == "AzureFile":
        print("File share - datastore name: " + name + ", storage account name: " +
              ds.account_name + ", container name: " + ds.container_name)
This code looks for any registered datastores that use Azure Storage and lists the following information:
Datastore name: The name of the datastore that the storage account is registered under.
Storage account name: The name of the Azure Storage account.
Container: The container in the storage account that is used by this registration.
It also indicates whether the datastore is for an Azure Blob or an Azure File share, as there are different methods
to re-register each type of datastore.
If an entry exists for the storage account that you plan on regenerating access keys for, save the datastore name,
storage account name, and container name.
IMPORTANT
Perform all steps, updating both the workspace using the CLI, and datastores using Python. Updating only one or the
other may cause errors until both are updated.
1. Regenerate the key. For information on regenerating an access key, see Manage storage account access
keys. Save the new key.
2. The Azure Machine Learning workspace will automatically synchronize the new key and begin using it
after an hour. To force the workspace to synch to the new key immediately, use the following steps:
a. Sign in to the Azure subscription that contains your workspace by using the following Azure CLI
command:
az login
TIP
After logging in, you see a list of subscriptions associated with your Azure account. The subscription
information with isDefault: true is the currently activated subscription for Azure CLI commands. This
subscription must be the same one that contains your Azure Machine Learning workspace. You can find
the subscription ID from the Azure portal by visiting the overview page for your workspace. You can also
use the SDK to get the subscription ID from the workspace object. For example,
Workspace.from_config().subscription_id .
To select another subscription, use the az account set -s <subscription name or ID> command and
specify the subscription name or ID to switch to. For more information about subscription selection, see
Use multiple Azure Subscriptions.
b. To update the workspace to use the new key, use the following command. Replace myworkspace
with your Azure Machine Learning workspace name, and replace myresourcegroup with the name
of the Azure resource group that contains the workspace.
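With the CLI v1 ( azure-cli-ml ) extension, the command takes a form similar to the following:
az ml workspace sync-keys -w myworkspace -g myresourcegroup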
TIP
If you get an error message stating that the ml extension isn't installed, use the following command to
install it:
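For the CLI v1 ( azure-cli-ml ) extension, that command is likely:
az extension add -n azure-cli-ml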
The sync-keys command automatically syncs the new keys for the Azure storage account used by the
workspace.
3. You can re-register datastore(s) that use the storage account via the SDK or the Azure Machine Learning
studio.
a. To re-register datastores via the Python SDK , use the values from the What needs to be
updated section and the key from step 1 with the following code.
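A minimal sketch of the blob re-registration follows; all angle-bracket values are placeholders, and file share datastores can be re-registered the same way with Datastore.register_azure_file_share:

from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Re-register the blob datastore with the regenerated key.
ds = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="<datastore-name>",      # value from the earlier listing
    container_name="<container-name>",      # value from the earlier listing
    account_name="<storage-account-name>",  # value from the earlier listing
    account_key="<new-access-key>",         # the key regenerated in step 1
    overwrite=True)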
Since overwrite=True is specified, this code overwrites the existing registration and updates it to
use the new key.
b. To re-register datastores via the studio, select Datastores from the left pane of the studio.
i. Select the datastore that you want to update.
ii. Select the Update credentials button on the top left.
iii. Use your new access key from step 1 to populate the form, and then select Save.
If you are updating credentials for your default datastore, complete this step and repeat
step 2b to resync your new key with the default datastore of the workspace.
Next steps
For more information on registering datastores, see the Datastore class reference.
Monitor Azure Machine Learning
5/25/2022 • 9 minutes to read • Edit Online
When you have critical applications and business processes relying on Azure resources, you want to monitor
those resources for their availability, performance, and operation. This article describes the monitoring data
generated by Azure Machine Learning and how to analyze and alert on this data with Azure Monitor.
TIP
The information in this document is primarily for administrators, as it describes monitoring for the Azure Machine
Learning service and associated Azure services. If you are a data scientist or developer, and want to monitor
information specific to your model training runs, see the following documents:
Start, monitor, and cancel training runs
Log metrics for training runs
Track experiments with MLflow
Visualize runs with TensorBoard
If you want to monitor information generated by models deployed as web services, see Collect model data and Monitor
with Application Insights.
TIP
To understand costs associated with Azure Monitor, see Usage and estimated costs. To understand the time it takes for
your data to appear in Azure Monitor, see Log data ingestion time.
IMPORTANT
Enabling these settings requires additional Azure services (storage account, event hub, or Log Analytics), which may
increase your cost. To calculate an estimated cost, visit the Azure pricing calculator.
You can configure several categories of diagnostic logs for Azure Machine Learning.
NOTE
Effective February 2022, the AmlComputeClusterNodeEvent category will be deprecated. We recommend that you
instead use the AmlComputeClusterEvent category.
NOTE
When you enable metrics in a diagnostic setting, dimension information is not currently included as part of the
information sent to a storage account, event hub, or Log Analytics.
The metrics and logs you can collect are discussed in the following sections.
Analyzing metrics
You can analyze metrics for Azure Machine Learning, along with metrics from other Azure services, by opening
Metrics from the Azure Monitor menu. See Getting started with Azure Metrics Explorer for details on using
this tool.
For a list of the platform metrics collected, see Monitoring Azure Machine Learning data reference metrics.
All metrics for Azure Machine Learning are in the namespace Machine Learning Service Workspace.
For reference, you can see a list of all resource metrics supported in Azure Monitor.
TIP
Azure Monitor metrics data is available for 90 days. However, when creating charts, only 30 days can be visualized at a
time. For example, if you want to visualize a 90-day period, you must break it into three charts of 30 days within the
90-day period.
Analyzing logs
Using Azure Monitor Log Analytics requires you to create a diagnostic configuration and enable Send
information to Log Analytics . For more information, see the Collection and routing section.
Data in Azure Monitor Logs is stored in tables, with each table having its own set of unique properties. Azure
Machine Learning stores data in dedicated tables, such as AmlComputeClusterEvent and AmlComputeJobEvent.
NOTE
Effective February 2022, the AmlComputeClusterNodeEvent table will be deprecated. We recommend that you instead
use the AmlComputeClusterEvent table.
IMPORTANT
When you select Logs from the Azure Machine Learning menu, Log Analytics is opened with the query scope set to the
current workspace. This means that log queries will only include data from that resource. If you want to run a query that
includes data from other databases or data from other Azure services, select Logs from the Azure Monitor menu. See
Log query scope and time range in Azure Monitor Log Analytics for details.
For a detailed reference of the logs and metrics, see Azure Machine Learning monitoring data reference.
Sample Kusto queries
Following are queries that you can use to help you monitor your Azure Machine Learning resources:
Get failed jobs in the last five days:
AmlComputeJobEvent
| where TimeGenerated > ago(5d) and EventType == "JobFailed"
| project TimeGenerated, ClusterId, EventType, ExecutionState, ToolType
Get records for a specific job name:
AmlComputeJobEvent
| where JobName == "automl_a9940991-dedb-4262-9763-2fd08b79d8fb_setup"
| project TimeGenerated, ClusterId, EventType, ExecutionState, ToolType
Get cluster events in the last four days for clusters where the VM size is Standard_D1_V2:
AmlComputeClusterEvent
| where TimeGenerated > ago(4d) and VmSize == "STANDARD_D1_V2"
| project ClusterName, InitialNodeCount, MaximumNodeCount, QuotaAllocated, QuotaUtilized
Get clusters where the target node count exceeds the current node count in the last eight days:
AmlComputeClusterEvent
| where TimeGenerated > ago(8d) and TargetNodeCount > CurrentNodeCount
| project TimeGenerated, ClusterName, CurrentNodeCount, TargetNodeCount
When you connect multiple Azure Machine Learning workspaces to the same Log Analytics workspace, you can
query across all resources.
Get number of running nodes across workspaces and clusters in the last day:
AmlComputeClusterEvent
| where TimeGenerated > ago(1d)
| summarize avgRunningNodes=avg(TargetNodeCount), maxRunningNodes=max(TargetNodeCount)
    by Workspace=tostring(split(_ResourceId, "/")[8]), ClusterName, ClusterType, VmSize, VmPriority
Common and recommended alert rules for Azure Machine Learning include:
Model Deploy Failed (Aggregation type: Total, Operator: Greater than, Threshold value: 0): Fires when one or
more model deployments have failed.
Quota Utilization Percentage (Aggregation type: Average, Operator: Greater than, Threshold value: 90): Fires
when the quota utilization percentage is greater than 90%.
Unusable Nodes (Aggregation type: Total, Operator: Greater than, Threshold value: 0): Fires when there are one
or more unusable nodes.
Next steps
For a reference of the logs and metrics, see Monitoring Azure Machine Learning data reference.
For information on working with quotas related to Azure Machine Learning, see Manage and request quotas
for Azure resources.
For details on monitoring Azure resources, see Monitoring Azure resources with Azure Monitor.
Secure code best practices with Azure Machine
Learning
5/25/2022 • 2 minutes to read • Edit Online
In Azure Machine Learning, you can upload files and content from any source into Azure. Content within Jupyter
notebooks or scripts that you load can potentially read data from your sessions, access data within your
organization in Azure, or run malicious processes on your behalf.
IMPORTANT
Only run notebooks or scripts from trusted sources. For example, where you or your security team have reviewed the
notebook or script.
Potential threats
Development with Azure Machine Learning often involves web-based development environments (Notebooks
& Azure ML studio). When using web-based development environments, the potential threats are:
Cross site scripting (XSS)
DOM injection: This type of attack can modify the UI displayed in the browser. For example, by
changing how the run button behaves in a Jupyter Notebook.
Access token/cookies: XSS attacks can also access local storage and browser cookies. Your Azure
Active Directory (AAD) authentication token is stored in local storage. An XSS attack could use this
token to make API calls on your behalf, and then send the data to an external system or API.
Cross site request forgery (CSRF): This attack may replace the URL of an image or link with the URL of a
malicious script or API. When the image is loaded, or link clicked, a call is made to the URL.
Next steps
Enterprise security for Azure Machine Learning
Audit and manage Azure Machine Learning
5/25/2022 • 7 minutes to read • Edit Online
When teams collaborate on Azure Machine Learning, they may face varying requirements for the configuration
and organization of resources. Machine learning teams may look for flexibility in how to organize workspaces
for collaboration, or size compute clusters to the requirements of their use cases. In these scenarios,
productivity is often highest when the application team can manage their own infrastructure.
As a platform administrator, you can use policies to lay out guardrails for teams to manage their own resources.
Azure Policy helps audit and govern resource state. In this article, you learn about available auditing controls
and governance practices for Azure Machine Learning.
Private endpoint: Configure the Azure Virtual Network subnet where the private endpoint should be created.
Private DNS zone: Configure the private DNS zone to use for the private link.
Disable public network access: Audit or enforce whether workspaces disable access from the public internet.
Disable local authentication: Audit or enforce whether Azure Machine Learning compute resources should have
local authentication methods disabled.
Compute cluster and instance is behind virtual network: Audit whether compute resources are behind a virtual
network.
Policies can be set at different scopes, such as at the subscription or resource group level. For more information,
see the Azure Policy documentation.
Next steps
Azure Policy documentation
Built-in policies for Azure Machine Learning
Working with security policies with Microsoft Defender for Cloud
The Cloud Adoption Framework scenario for data management and analytics outlines considerations in
running data and analytics workloads in the cloud.
Cloud Adoption Framework data landing zones provide a reference implementation for managing data and
analytics workloads in Azure.
Learn how to use policy to integrate Azure Private Link with Azure Private DNS zones, to manage private link
configuration for the workspace and dependent resources.
Manage Azure Machine Learning workspaces in the
portal or with the Python SDK
5/25/2022 • 15 minutes to read • Edit Online
In this article, you create, view, and delete Azure Machine Learning workspaces, using the Azure portal or the
SDK for Python.
As your needs change or your requirements for automation increase, you can also manage workspaces using the CLI,
or via the VS Code extension.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today.
If using the Python SDK, install the SDK.
Limitations
When creating a new workspace, you can either automatically create services needed by the workspace
or use existing services. If you want to use existing services from a different Azure subscription
than the workspace, you must register the Azure Machine Learning namespace in the subscription that
contains those services. For example, if you create a workspace in subscription A that uses a storage account
from subscription B, the Azure Machine Learning namespace must be registered in subscription B before
you can use the storage account with the workspace.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices. For
information on how to see if it is registered and how to register it, see the Azure resource providers and
types article.
IMPORTANT
This only applies to resources provided during workspace creation: Azure Storage Accounts, Azure Container
Registry, Azure Key Vault, and Application Insights.
By default, creating a workspace also creates an Azure Container Registry (ACR). Since ACR does not
currently support unicode characters in resource group names, use a resource group that does not
contain these characters.
Azure Machine Learning does not support hierarchical namespace (Azure Data Lake Storage Gen2
feature) for the workspace's default storage account.
TIP
An Azure Application Insights instance is created when you create the workspace. You can delete the Application Insights
instance after cluster creation if you want. Deleting it limits the information gathered from the workspace, and may make
it more difficult to troubleshoot problems. If you delete the Application Insights instance created by the
workspace, you cannot re-create it without deleting and recreating the workspace .
For more information on using this Application Insights instance, see Monitor and collect data from Machine Learning
web service endpoints.
Create a workspace
Python
Portal
from azureml.core import Workspace

ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2'
                      )
Set create_resource_group to False if you have an existing Azure resource group that you want to use for
the workspace.
Multiple tenants. If you have multiple accounts, add the tenant ID of the Azure Active Directory you
wish to use. Find your tenant ID from the Azure portal under Azure Active Directory, External
Identities.
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication(tenant_id="my-tenant-id")
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2',
                      auth=interactive_auth
                      )
Sovereign cloud . You'll need extra code to authenticate to Azure if you're working in a sovereign cloud.
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core import Workspace
Use existing Azure resources . You can also create a workspace that uses existing Azure resources with
the Azure resource ID format. Find the specific Azure resource IDs in the Azure portal or with the SDK.
This example assumes that the resource group, storage account, key vault, App Insights, and container
registry already exist.
import os
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

service_principal_password = os.environ.get("AZUREML_PASSWORD")

service_principal_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    username="<application-id>",
    password=service_principal_password)

ws = Workspace.create(name='myworkspace',
                      auth=service_principal_auth,
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=False,
                      location='eastus2',
                      friendly_name='My workspace',
                      storage_account='subscriptions/<azure-subscription-id>/resourcegroups/myresourcegroup/providers/microsoft.storage/storageaccounts/mystorageaccount',
                      key_vault='subscriptions/<azure-subscription-id>/resourcegroups/myresourcegroup/providers/microsoft.keyvault/vaults/mykeyvault',
                      app_insights='subscriptions/<azure-subscription-id>/resourcegroups/myresourcegroup/providers/microsoft.insights/components/myappinsights',
                      container_registry='subscriptions/<azure-subscription-id>/resourcegroups/myresourcegroup/providers/microsoft.containerregistry/registries/mycontainerregistry',
                      exist_ok=False)
IMPORTANT
For more information on using a private endpoint and virtual network with your workspace, see Network isolation and
privacy.
Python
Portal
The Azure Machine Learning Python SDK provides the PrivateEndpointConfig class, which can be used with
Workspace.create() to create a workspace with a private endpoint. This class requires an existing virtual network.
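A sketch based on the SDK v1 reference follows; the module path and parameter names here are assumptions to verify against your SDK version, and the network names are placeholders:

from azureml.core import Workspace
# Assumed module path for the private endpoint configuration class.
from azureml.core.private_endpoint import PrivateEndPointConfig

pe = PrivateEndPointConfig(name="myprivateendpoint",
                           vnet_name="myvnet",
                           vnet_subnet_name="default")

ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      location='eastus2',
                      private_endpoint_config=pe,
                      private_endpoint_auto_approval=True)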
Vulnerability scanning
Microsoft Defender for Cloud provides unified security management and advanced threat protection across
hybrid cloud workloads. You should allow Microsoft Defender for Cloud to scan your resources and follow its
recommendations. For more, see Azure Container Registry image scanning by Defender for Cloud and Azure
Kubernetes Services integration with Defender for Cloud.
Advanced
By default, metadata for the workspace is stored in an Azure Cosmos DB instance that Microsoft maintains. This
data is encrypted using Microsoft-managed keys.
To limit the data that Microsoft collects on your workspace, select High business impact workspace in the
portal, or set hbi_workspace=true in Python. For more information on this setting, see Encryption at rest.
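For example, a minimal SDK v1 sketch (the subscription, resource group, and location values are placeholders):

from azureml.core import Workspace

# Create a workspace with the high business impact setting enabled.
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      location='eastus2',
                      hbi_workspace=True)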
IMPORTANT
Selecting high business impact can only be done when creating a workspace. You cannot change this setting after
workspace creation.
IMPORTANT
Before following these steps, you must first perform the following actions:
Follow the steps in Configure customer-managed keys to:
Register the Azure Cosmos DB provider
Create and configure an Azure Key Vault
Generate a key
Python
Portal
Python
Portal
If you plan to use code on your local environment that references this workspace ( ws ), write the configuration
file:
ws.write_config()
Place the file into the directory structure with your Python scripts or Jupyter Notebooks. It can be in the same
directory, a subdirectory named .azureml, or in a parent directory. When you create a compute instance, this file
is added to the correct directory on the VM for you.
Connect to a workspace
APPLIES TO: Python SDK azureml v1
In your Python code, you create a workspace object to connect to your workspace. This code will read the
contents of the configuration file to find your workspace. You will get a prompt to sign in if you are not already
authenticated.
ws = Workspace.from_config()
Multiple tenants. If you have multiple accounts, add the tenant ID of the Azure Active Directory you
wish to use. Find your tenant ID from the Azure portal under Azure Active Directory, External
Identities.
interactive_auth = InteractiveLoginAuthentication(tenant_id="my-tenant-id")
ws = Workspace.from_config(auth=interactive_auth)
Sovereign cloud . You'll need extra code to authenticate to Azure if you're working in a sovereign cloud.
APPLIES TO: Python SDK azureml v1
If you have problems in accessing your subscription, see Set up authentication for Azure Machine Learning
resources and workflows, as well as the Authentication in Azure Machine Learning notebook.
Find a workspace
See a list of all the workspaces you can use.
Python
Portal
Workspace.list('<subscription-id>')
The Workspace.list(..) method does not return the full workspace object. It includes only basic information about
existing workspaces in the subscription. To get a full object for a specific workspace, use Workspace.get(..).
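For example (the names are placeholders):

from azureml.core import Workspace

ws = Workspace.get(name='myworkspace',
                   subscription_id='<subscription-id>',
                   resource_group='myresourcegroup')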
If you've used this feature in a previous update, a search result error may occur. Reselect your preferred
workspaces in the Directory + Subscription + Workspace tab.
IMPORTANT
Search results may be unexpected for multiword terms in other languages (ex. Chinese characters).
Delete a workspace
When you no longer need a workspace, delete it.
WARNING
Once an Azure Machine Learning workspace has been deleted, it cannot be recovered.
If you accidentally deleted your workspace, you may still be able to retrieve your notebooks. For details, see
Failover for business continuity and disaster recovery.
Python
Portal
ws.delete(delete_dependent_resources=False, no_wait=False)
The default action is not to delete resources associated with the workspace, that is, container registry, storage
account, key vault, and application insights. Set delete_dependent_resources to True to delete these resources as
well.
Clean up resources
IMPORTANT
The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to
articles.
If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group .
Troubleshooting
Supported browsers in Azure Machine Learning studio: We recommend that you use the most
up-to-date browser that's compatible with your operating system. The following browsers are supported:
Microsoft Edge (The new Microsoft Edge, latest version. Not Microsoft Edge legacy)
Safari (latest version, Mac only)
Chrome (latest version)
Firefox (latest version)
Azure portal:
If you go directly to your workspace from a share link from the SDK or the Azure portal, you can't view
the standard Overview page that has subscription information in the extension. In this scenario, you
also can't switch to another workspace. To view another workspace, go directly to Azure Machine
Learning studio and search for the workspace name.
All assets (Datasets, Experiments, Computes, and so on) are available only in Azure Machine Learning
studio. They're not available from the Azure portal.
Attempting to export a template for a workspace from the Azure portal may return an error similar to
the following text:
Could not get resource of the type <type>. Resources of this type will not be exported. As a
workaround, use one of the templates provided at https://github.com/Azure/azure-quickstart-
templates/tree/master/quickstarts/microsoft.machinelearningservices as the basis for your template.
Workspace diagnostics
You can run diagnostics on your workspace from Azure Machine Learning studio or the Python SDK. After
diagnostics run, a list of any detected problems is returned. This list includes links to possible solutions. For
more information, see How to use workspace diagnostics.
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an
error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are automatically registered, but not all. If you receive this message, you need to
register the provider mentioned.
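For example, a provider such as Azure Machine Learning's can be registered with the Azure CLI (a minimal example):
az provider register --namespace Microsoft.MachineLearningServices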
The following table contains a list of the resource providers required by Azure Machine Learning:
Microsoft.Storage: Azure Storage Account is used as the default storage for the workspace.
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
Deleting the Azure Container Registry
The Azure Machine Learning workspace uses Azure Container Registry (ACR) for some operations. It will
automatically create an ACR instance when it first needs one.
WARNING
Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure
Machine Learning workspace.
Examples
Examples of creating a workspace:
Use Azure portal to create a workspace and compute instance
Next steps
Once you have a workspace, learn how to Train and deploy a model.
To learn more about planning a workspace for your organization's requirements, see Organize and set up Azure
Machine Learning.
To check for problems with your workspace, see How to use workspace diagnostics.
If you need to move a workspace to another Azure subscription, see How to move a workspace.
Manage Azure Machine Learning workspaces using
Azure CLI
5/25/2022 • 17 minutes to read • Edit Online
In this article, you learn how to create and manage Azure Machine Learning workspaces using the Azure CLI.
The Azure CLI provides commands for managing Azure resources and is designed to get you working quickly
with Azure, with an emphasis on automation. The machine learning extension to the CLI provides commands for
working with Azure Machine Learning resources.
NOTE
Examples in this article refer to both CLI v1 and CLI v2 versions. If no version is specified for a command, it will work with
either the CLI v1 or CLI v2. The machine learning CLI v2 is currently in public preview. This preview version is provided
without a service-level agreement, and it's not recommended for production workloads.
Prerequisites
An Azure subscription . If you do not have one, try the free or paid version of Azure Machine Learning.
To use the CLI commands in this document from your local environment , you need the Azure CLI.
If you use the Azure Cloud Shell, the CLI is accessed through the browser and lives in the cloud.
Limitations
When creating a new workspace, you can either automatically create services needed by the workspace
or use existing services. If you want to use existing services from a different Azure subscription
than the workspace, you must register the Azure Machine Learning namespace in the subscription that
contains those services. For example, if you create a workspace in subscription A that uses a storage account
from subscription B, the Azure Machine Learning namespace must be registered in subscription B before
you can use the storage account with the workspace.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices. For
information on how to see if it is registered and how to register it, see the Azure resource providers and
types article.
IMPORTANT
This only applies to resources provided during workspace creation: Azure Storage Accounts, Azure Container
Registry, Azure Key Vault, and Application Insights.
TIP
An Azure Application Insights instance is created when you create the workspace. You can delete the Application Insights
instance after cluster creation if you want. Deleting it limits the information gathered from the workspace, and may make
it more difficult to troubleshoot problems. If you delete the Application Insights instance created by the
workspace, you cannot re-create it without deleting and recreating the workspace .
For more information on using this Application Insights instance, see Monitor and collect data from Machine Learning
web service endpoints.
With the Azure Machine Learning CLI extension v1 ( azure-cli-ml ), only some of the commands communicate
with Azure Resource Manager: specifically, commands that create, update, delete, list, or show Azure
resources. Operations such as submitting a training job communicate directly with the Azure Machine Learning
workspace. If your workspace is secured with a private endpoint, that is enough to secure the
commands provided by the azure-cli-ml extension.
There are several ways that you can authenticate to your Azure subscription from the CLI. The simplest is to
authenticate interactively using a browser. To authenticate interactively, open a command line or terminal and
use the following command:
az login
If the CLI can open your default browser, it will do so and load a sign-in page. Otherwise, you need to open a
browser and follow the instructions on the command line. The instructions involve browsing to
https://aka.ms/devicelogin and entering an authorization code.
TIP
After logging in, you see a list of subscriptions associated with your Azure account. The subscription information with
isDefault: true is the currently activated subscription for Azure CLI commands. This subscription must be the same
one that contains your Azure Machine Learning workspace. You can find the subscription ID from the Azure portal by
visiting the overview page for your workspace. You can also use the SDK to get the subscription ID from the workspace
object. For example, Workspace.from_config().subscription_id .
To select another subscription, use the az account set -s <subscription name or ID> command and specify the
subscription name or ID to switch to. For more information about subscription selection, see Use multiple Azure
Subscriptions.
For other methods of authenticating, see Sign in with Azure CLI.
NOTE
You should select a region where Azure Machine Learning is available. For information, see Products available by region.
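The resource group can be created with a command similar to the following (the name and location values are placeholders):
az group create --name <resource-group-name> --location <location>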
The response from this command is similar to the following JSON. You can use the output values to locate the
created resources or parse them as input to subsequent CLI steps for automation.
{
"id": "/subscriptions/<subscription-GUID>/resourceGroups/<resourcegroupname>",
"location": "<location>",
"managedBy": null,
"name": "<resource-group-name>",
"properties": {
"provisioningState": "Succeeded"
},
"tags": null,
"type": null
}
Create a workspace
When you deploy an Azure Machine Learning workspace, various other services are required as dependent
associated resources. When you use the CLI to create the workspace, the CLI can either create new associated
resources on your behalf, or you can attach existing resources.
IMPORTANT
When attaching your own storage account, make sure that it meets the following criteria:
The storage account is not a premium account (Premium_LRS and Premium_GRS).
Both Azure Blob and Azure File capabilities are enabled.
Hierarchical Namespace (ADLS Gen 2) is disabled.
These requirements are only for the default storage account used by the workspace.
When attaching an Azure Container Registry, you must have the admin account enabled before it can be used with an Azure
Machine Learning workspace.
IMPORTANT
When attaching existing resources, you don't have to specify all of them. You can specify one or more. For example, you can
specify an existing storage account, and the workspace will create the other resources.
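With the CLI v1 ( azure-cli-ml ) extension, a minimal workspace creation command takes a form similar to the following (the names are placeholders):
az ml workspace create -w <workspace-name> -g <resource-group-name>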
The output of the workspace creation command is similar to the following JSON. You can use the output values
to locate the created resources or parse them as input to subsequent CLI steps.
{
"applicationInsights": "/subscriptions/<service-GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.insights/components/<application-insight-name>",
"containerRegistry": "/subscriptions/<service-GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.containerregistry/registries/<acr-name>",
"creationTime": "2019-08-30T20:24:19.6984254+00:00",
"description": "",
"friendlyName": "<workspace-name>",
"id": "/subscriptions/<service-GUID>/resourceGroups/<resource-group-
name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>",
"identityPrincipalId": "<GUID>",
"identityTenantId": "<GUID>",
"identityType": "SystemAssigned",
"keyVault": "/subscriptions/<service-GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.keyvault/vaults/<key-vault-name>",
"location": "<location>",
"name": "<workspace-name>",
"resourceGroup": "<resource-group-name>",
"storageAccount": "/subscriptions/<service-GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.storage/storageaccounts/<storage-account-name>",
"type": "Microsoft.MachineLearningServices/workspaces",
"workspaceid": "<GUID>"
}
Advanced configurations
Configure workspace for private network connectivity
Depending on your use case and organizational requirements, you can choose to configure Azure Machine
Learning using private network connectivity. You can use the Azure CLI to deploy a workspace and a Private Link
endpoint for the workspace resource. For more information on using a private endpoint and virtual network
(VNet) with your workspace, see Virtual network isolation and privacy overview. For complex resource
configurations, also refer to template-based deployment options, including Azure Resource Manager.
CLI v1
CLI v2 - preview
For more details on how to use these commands, see the CLI reference pages.
Customer-managed key and high business impact workspace
By default, metadata for the workspace is stored in an Azure Cosmos DB instance that Microsoft maintains. This
data is encrypted using Microsoft-managed keys. Instead of using the Microsoft-managed key, you can also
provide your own key. Doing so creates an additional set of resources in your Azure subscription to store your
data.
To learn more about the resources that are created when you bring your own key for encryption, see Data
encryption with Azure Machine Learning.
The following CLI commands provide examples of creating a workspace that uses customer-managed keys for
encryption, using the CLI v1 and CLI v2 versions.
CLI v1
CLI v2 - preview
NOTE
Authorize the Machine Learning App (in Identity and Access Management) with contributor permissions on your
subscription to manage the additional resources used for data encryption.
NOTE
Azure Cosmos DB is not used to store information such as model performance, information logged by experiments, or
information logged from your model deployments. For more information on monitoring these items, see the Monitoring
and logging section of the architecture and concepts article.
IMPORTANT
Selecting high business impact can only be done when creating a workspace. You cannot change this setting after
workspace creation.
For more information on customer-managed keys and high business impact workspace, see Enterprise security
for Azure Machine Learning.
WARNING
Once an Azure Machine Learning workspace has been deleted, it cannot be recovered.
CLI v1
CLI v2 - preview
IMPORTANT
Deleting a workspace does not delete the Application Insights instance, storage account, key vault, or container registry
used by the workspace.
You can also delete the resource group, which deletes the workspace and all other Azure resources in the
resource group. To delete the resource group, use the following command:
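The command takes a form similar to the following (the resource group name is a placeholder):
az group delete --name <resource-group-name>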
Troubleshooting
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an
error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are automatically registered, but not all. If you receive this message, you need to
register the provider mentioned.
The following table contains a list of the resource providers required by Azure Machine Learning:
Microsoft.Storage: Azure Storage Account is used as the default storage for the workspace.
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
Moving the workspace
WARNING
Moving your Azure Machine Learning workspace to a different subscription, or moving the owning subscription to a new
tenant, is not supported. Doing so may cause errors.
WARNING
Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure
Machine Learning workspace.
Next steps
For more information on the Azure CLI extension for machine learning, see the az ml documentation.
To check for problems with your workspace, see How to use workspace diagnostics.
To learn how to move a workspace to a new Azure subscription, see How to move a workspace.
Use an Azure Resource Manager template to create
a workspace for Azure Machine Learning
5/25/2022 • 15 minutes to read • Edit Online
In this article, you learn several ways to create an Azure Machine Learning workspace using Azure Resource
Manager templates. A Resource Manager template makes it easy to create resources as a single, coordinated
operation. A template is a JSON document that defines the resources that are needed for a deployment. It may
also specify deployment parameters. Parameters are used to provide input values when using the template.
For more information, see Deploy an application with Azure Resource Manager template.
Prerequisites
An Azure subscription . If you do not have one, try the free or paid version of Azure Machine Learning.
To use a template from a CLI, you need either Azure PowerShell or the Azure CLI.
Limitations
When creating a new workspace, you can either automatically create services needed by the workspace
or use existing services. If you want to use existing services from a different Azure subscription
than the workspace, you must register the Azure Machine Learning namespace in the subscription that
contains those services. For example, if you create a workspace in subscription A that uses a storage account
from subscription B, the Azure Machine Learning namespace must be registered in subscription B before
you can use the storage account with the workspace.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices. For
information on how to see if it is registered and how to register it, see the Azure resource providers and
types article.
IMPORTANT
This only applies to resources provided during workspace creation: Azure Storage Accounts, Azure Container
Registry, Azure Key Vault, and Application Insights.
NOTE
The workspace name is case-insensitive.
TIP
While the template associated with this document creates a new Azure Container Registry, you can also create a new
workspace without creating a container registry. One will be created when you perform an operation that requires a
container registry. For example, training or deploying a model.
You can also reference an existing container registry or storage account in the Azure Resource Manager template, instead
of creating a new one. When doing so, you must either use a managed identity (preview), or enable the admin account for
the container registry.
WARNING
Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure
Machine Learning workspace.
Deploy template
To deploy your template, you have to create a resource group.
See the Azure portal section if you prefer using the graphical user interface.
Azure CLI
Azure PowerShell
az group create --name "examplegroup" --location "eastus"
Once your resource group is successfully created, deploy the template with the following command:
Azure CLI
Azure PowerShell
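A deployment command takes a form similar to the following; the deployment name, template URI, and parameter values shown here are placeholders:
az deployment group create --name "exampledeployment" --resource-group "examplegroup" --template-uri "<template-uri>" --parameters workspaceName="exampleworkspace" location="eastus"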
By default, all of the resources created as part of the template are new. However, you also have the option of
using existing resources. By providing additional parameters to the template, you can use existing resources. For
example, if you want to use an existing storage account, set the storageAccountOption value to existing and
provide the name of your storage account in the storageAccountName parameter.
IMPORTANT
If you want to use an existing Azure Storage account, it cannot be a premium account (Premium_LRS and Premium_GRS).
It also cannot have a hierarchical namespace (used with Azure Data Lake Storage Gen2). Neither premium storage nor
hierarchical namespaces are supported with the default storage account of the workspace. You can use premium storage
or a hierarchical namespace with non-default storage accounts.
Azure CLI
Azure PowerShell
IMPORTANT
There are some specific requirements your subscription must meet before using this template:
You must have an existing Azure Key Vault that contains an encryption key.
The Azure Key Vault must be in the same region where you plan to create the Azure Machine Learning workspace.
You must specify the ID of the Azure Key Vault and the URI of the encryption key.
For steps on creating the vault and key, see Configure customer-managed keys.
To get the values for the cmk_keyvault (ID of the Key Vault) and the resource_cmk_uri (key URI) parameters
needed by this template, use the following steps:
1. To get the Key Vault ID, use the following command:
Azure CLI
Azure PowerShell
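A likely form of the command (the Key Vault name is a placeholder):
az keyvault show --name <keyvault-name> --query id --output tsv
2. To get the URI of the encryption key, use the following command: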
az keyvault key show --vault-name <keyvault-name> --name <key-name> --query 'key.kid' --output tsv
IMPORTANT
Once a workspace has been created, you cannot change the settings for confidential data, encryption, key vault ID, or key
identifiers. To change these values, you must create a new workspace using the new values.
To enable use of customer-managed keys, set the following parameters when deploying the template:
encryption_status to Enabled.
cmk_keyvault to the cmk_keyvault value obtained in the previous steps.
resource_cmk_uri to the resource_cmk_uri value obtained in the previous steps.
Azure CLI
Azure PowerShell
When using a customer-managed key, Azure Machine Learning creates a secondary resource group which
contains the Cosmos DB instance. For more information, see encryption at rest - Cosmos DB.
An additional configuration you can provide for your data is to set the confidential_data parameter to true.
Doing so does the following:
Starts encrypting the local scratch disk for Azure Machine Learning compute clusters, providing you have
not created any previous clusters in your subscription. If you have previously created a cluster in the
subscription, open a support ticket to have encryption of the scratch disk enabled for your compute
clusters.
Cleans up the local scratch disk between runs.
Securely passes credentials for the storage account, container registry, and SSH account from the
execution layer to your compute clusters by using key vault.
Enables IP filtering to ensure the underlying batch pools cannot be called by any external services other
than AzureMachineLearningService.
IMPORTANT
Once a workspace has been created, you cannot change the settings for confidential data, encryption, key vault
ID, or key identifiers. To change these values, you must create a new workspace using the new values.
IMPORTANT
For the container registry, only the 'Premium' SKU is supported.
IMPORTANT
Application Insights does not support deployment behind a virtual network.
Azure CLI
Azure PowerShell
Azure CLI
Azure PowerShell
Alternatively, you can deploy multiple or all dependent resources behind a virtual network.
Azure CLI
Azure PowerShell
IMPORTANT
The subnet should have the Microsoft.Storage service endpoint.
IMPORTANT
Subnets do not allow the creation of private endpoints. Disable the private endpoint to enable use of the subnet.
Azure CLI
Azure PowerShell
Azure CLI
Azure PowerShell
3. When the template appears, provide the following required information and any other parameters
depending on your deployment scenario.
Subscription: Select the Azure subscription to use for these resources.
Resource group: Select or create a resource group to contain the services.
Region: Select the Azure region where the resources will be created.
Workspace name: The name to use for the Azure Machine Learning workspace that will be created.
The workspace name must be between 3 and 33 characters. It may only contain alphanumeric
characters and '-'.
Location: Select the location where the resources will be created.
4. Select Review + create .
5. In the Review + create screen, agree to the listed terms and conditions and select Create .
For more information, see Deploy resources from custom template.
Troubleshooting
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an
error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are automatically registered, but not all. If you receive this message, you need to
register the provider mentioned.
The following table contains a list of the resource providers required by Azure Machine Learning:
Microsoft.Storage: Azure Storage Account is used as the default storage for the workspace.
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
Azure Key Vault access policy and Azure Resource Manager templates
You may use an Azure Resource Manager template to create the workspace and associated resources
(including Azure Key Vault) multiple times. For example, you might use the template multiple times with the same
parameters as part of a continuous integration and deployment pipeline.
Most resource creation operations through templates are idempotent, but Key Vault clears the access policies
each time the template is used. Clearing the access policies breaks access to the Key Vault for any existing
workspace that is using it. For example, Stop/Create functionalities of Azure Notebooks VM may fail.
To avoid this problem, we recommend one of the following approaches:
Do not deploy the template more than once for the same parameters. Or delete the existing resources
before using the template to recreate them.
Examine the Key Vault access policies and then use these policies to set the accessPolicies property of
the template. To view the access policies, use the following Azure CLI command:
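A form similar to the following can be used (the Key Vault name is a placeholder):
az keyvault show --name <keyvault-name> --query properties.accessPolicies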
For more information on using the accessPolicies section of the template, see the AccessPolicyEntry
object reference.
Check if the Key Vault resource already exists. If it does, do not recreate it through the template. For
example, to use the existing Key Vault instead of creating a new one, make the following changes to the
template:
Add a parameter that accepts the ID of an existing Key Vault resource:
"keyVaultId":{
"type": "string",
"metadata": {
"description": "Specify the existing Key Vault ID."
}
}
Remove the section of the template that creates a new Key Vault resource, similar to the following:
{
"type": "Microsoft.KeyVault/vaults",
"apiVersion": "2018-02-14",
"name": "[variables('keyVaultName')]",
"location": "[parameters('location')]",
"properties": {
"tenantId": "[variables('tenantId')]",
"sku": {
"name": "standard",
"family": "A"
},
"accessPolicies": [
]
}
},
Then update the workspace resource to use the keyVaultId parameter for its keyVault property:
{
"type": "Microsoft.MachineLearningServices/workspaces",
"apiVersion": "2019-11-01",
"name": "[parameters('workspaceName')]",
"location": "[parameters('location')]",
"dependsOn": [
"[resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName'))]",
"[resourceId('Microsoft.Insights/components', variables('applicationInsightsName'))]"
],
"identity": {
"type": "systemAssigned"
},
"sku": {
"tier": "[parameters('sku')]",
"name": "[parameters('sku')]"
},
"properties": {
"friendlyName": "[parameters('workspaceName')]",
"keyVault": "[parameters('keyVaultId')]",
"applicationInsights": "
[resourceId('Microsoft.Insights/components',variables('applicationInsightsName'))]",
"storageAccount": "
[resourceId('Microsoft.Storage/storageAccounts/',variables('storageAccountName'))]"
}
}
After these changes, you can specify the ID of the existing Key Vault resource when running the template.
The template will then reuse the Key Vault by setting the keyVault property of the workspace to its ID.
To get the ID of the Key Vault, you can reference the output of the original template run or use the Azure
CLI. The following command is an example of using the Azure CLI to get the Key Vault resource ID:
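A form similar to the following can be used (the Key Vault name is a placeholder):
az keyvault show --name <keyvault-name> --query id --output tsv
The output of this command is similar to the following value: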
/subscriptions/{subscription-
guid}/resourceGroups/myresourcegroup/providers/Microsoft.KeyVault/vaults/mykeyvault
Next steps
Deploy resources with Resource Manager templates and Resource Manager REST API.
Creating and deploying Azure resource groups through Visual Studio.
For other templates related to Azure Machine Learning, see the Azure Quickstart Templates repository.
How to use workspace diagnostics.
Move an Azure Machine Learning workspace to another subscription.
Manage Azure Machine Learning workspaces using
Terraform
5/25/2022 • 10 minutes to read • Edit Online
In this article, you learn how to create and manage an Azure Machine Learning workspace using Terraform
configuration files. Terraform's template-based configuration files enable you to define, create, and configure
Azure resources in a repeatable and predictable manner. Terraform tracks resource state and is able to clean up
and destroy resources.
A Terraform configuration is a document that defines the resources that are needed for a deployment. It may
also specify deployment variables. Variables are used to provide input values when using the configuration.
Prerequisites
An Azure subscription. If you don't have one, try the free or paid version of Azure Machine Learning.
An installed version of the Azure CLI.
Configure Terraform: follow the directions in this article and the Terraform and configure access to Azure
article.
Limitations
When creating a new workspace, you can either automatically create services needed by the workspace
or use existing services. If you want to use existing services from a different Azure subscription
than the workspace, you must register the Azure Machine Learning namespace in the subscription that
contains those services. For example, if you create a workspace in subscription A that uses a storage
account from subscription B, the Azure Machine Learning namespace must be registered in subscription B
before you can use the storage account with the workspace.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices . For
information on how to see if it is registered and how to register it, see the Azure resource providers and
types article.
IMPORTANT
This only applies to resources provided during workspace creation: Azure Storage Accounts, Azure Container
Registry, Azure Key Vault, and Application Insights.
TIP
An Azure Application Insights instance is created when you create the workspace. You can delete the Application Insights
instance after workspace creation if you want. Deleting it limits the information gathered from the workspace, and may make
it more difficult to troubleshoot problems. If you delete the Application Insights instance created by the
workspace, you cannot re-create it without deleting and recreating the workspace .
For more information on using this Application Insights instance, see Monitor and collect data from Machine Learning
web service endpoints.
Declare the Azure provider in a file named main.tf :
terraform {
required_version = ">=1.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=2.76.0"
}
}
}
provider "azurerm" {
features {}
}
Deploy a workspace
The following Terraform configurations can be used to create an Azure Machine Learning workspace. When you
create an Azure Machine Learning workspace, various other services are required as dependencies. The
template also specifies these associated resources to the workspace. Depending on your needs, you can choose
to use the template that creates resources with either public or private network connectivity.
Some resources in Azure require globally unique names. Before deploying your resources using the following
templates, set the name variable to a value that is unique.
variables.tf :
variable "name" {
type = string
description = "Name of the deployment"
}
variable "environment" {
type = string
description = "Name of the environment"
default = "dev"
}
variable "location" {
type = string
description = "Location of the resources"
default = "East US"
}
workspace.tf :
# Dependent resources for Azure Machine Learning
resource "azurerm_application_insights" "default" {
name = "appi-${var.name}-${var.environment}"
location = azurerm_resource_group.default.location
resource_group_name = azurerm_resource_group.default.name
application_type = "web"
}
# The resource group, key vault, and storage account definitions from the full
# configuration are omitted here; they follow the same pattern as the
# Application Insights resource above.

resource "azurerm_machine_learning_workspace" "default" {
  name                    = "mlw-${var.name}-${var.environment}"
  location                = azurerm_resource_group.default.location
  resource_group_name     = azurerm_resource_group.default.name
  application_insights_id = azurerm_application_insights.default.id
  key_vault_id            = azurerm_key_vault.default.id
  storage_account_id      = azurerm_storage_account.default.id

  identity {
    type = "SystemAssigned"
  }
}
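With the configuration files in place, the standard Terraform workflow deploys the workspace. The name value below is only an example; supply your own globally unique value:

terraform init
terraform plan -var="name=my-ml-demo" -out demo.tfplan
terraform apply demo.tfplan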
Troubleshooting
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an
error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are automatically registered, but not all. If you receive this message, you need to
register the provider mentioned.
The following table contains a list of the resource providers required by Azure Machine Learning:
RESOURCE PROVIDER | WHY IT'S NEEDED
Microsoft.Storage | Azure Storage Account is used as the default storage for the workspace.
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
Next steps
To learn more about Terraform support on Azure, see Terraform on Azure documentation.
For details on the Terraform Azure provider and Machine Learning module, see Terraform Registry Azure
Resource Manager Provider.
To find "quick start" template examples for Terraform, see Azure Terraform QuickStart Templates:
101: Machine learning workspace and compute – the minimal set of resources needed to get started
with Azure ML.
201: Machine learning workspace, compute, and a set of network components for network isolation –
all resources that are needed to create a production-pilot environment for use with HBI data.
202: Similar to 201, but with the option to bring existing network components.
301: Machine Learning workspace (Secure Hub and Spoke with Firewall).
To learn more about network configuration options, see Secure Azure Machine Learning workspace
resources using virtual networks (VNets).
For alternative Azure Resource Manager template-based deployments, see Deploy resources with
Resource Manager templates and Resource Manager REST API.
Create, run, and delete Azure ML resources using REST
5/25/2022 • 10 minutes to read • Edit Online
There are several ways to manage your Azure ML resources. You can use the portal, command-line interface, or
Python SDK. Or, you can choose the REST API. The REST API uses HTTP verbs in a standard way to create,
retrieve, update, and delete resources. The REST API works with any language or tool that can make HTTP
requests. REST's straightforward structure often makes it a good choice in scripting environments and for
MLOps automation.
In this article, you learn how to:
Retrieve an authorization token
Create a properly-formatted REST request using service principal authentication
Use GET requests to retrieve information about Azure ML's hierarchical resources
Use PUT and POST requests to create and modify resources
Use PUT requests to create Azure ML workspaces
Use DELETE requests to clean up resources
Prerequisites
An Azure subscription for which you have administrative rights. If you don't have such a subscription, try
the free or paid personal subscription
An Azure Machine Learning Workspace
Administrative REST requests use service principal authentication. Follow the steps in Set up authentication
for Azure Machine Learning resources and workflows to create a service principal in your workspace
The curl utility. The curl program is available in the Windows Subsystem for Linux or any UNIX distribution.
In PowerShell, curl is an alias for Invoke-WebRequest and curl -d "key=val" -X POST uri becomes
Invoke-WebRequest -Body "key=val" -Method POST -Uri uri .
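To retrieve the token, send a POST request with your service principal credentials to the Azure Active Directory token endpoint. The following call is a sketch; the tenant ID, client ID, and client secret placeholders come from your service principal:

curl -X POST https://login.microsoftonline.com/<YOUR-TENANT-ID>/oauth2/token \
-d "grant_type=client_credentials&client_id=<YOUR-CLIENT-ID>&client_secret=<YOUR-CLIENT-SECRET>&resource=https://management.azure.com/"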
The response should provide an access token good for one hour:
{
"token_type": "Bearer",
"expires_in": "3599",
"ext_expires_in": "3599",
"expires_on": "1578523094",
"not_before": "1578519194",
"resource": "https://management.azure.com/",
"access_token": "YOUR-ACCESS-TOKEN"
}
Make note of the token, as you'll use it to authenticate all additional administrative requests. You'll do so by
setting an Authorization header in all requests:
NOTE
The value starts with the string "Bearer " including a single space before you add the token.
curl https://management.azure.com/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups?api-version=2021-03-01-preview -H "Authorization:Bearer <YOUR-ACCESS-TOKEN>"
Across Azure, many REST APIs are published. Each service provider updates their API on their own cadence, but
does so without breaking existing programs. The service provider uses the api-version argument to ensure
compatibility. The api-version argument varies from service to service. For the Machine Learning Service, for
instance, the current API version is 2021-03-01-preview . For storage accounts, it's 2019-08-01 . For key vaults, it's
2019-09-01 . All REST calls should set the api-version argument to the expected value. You can rely on the
syntax and semantics of the specified version even as the API continues to evolve. If you send a request to a
provider without the api-version argument, the response will contain a human-readable list of supported
values.
The above call will result in a compacted JSON response of the form:
{
"value": [
{
"id": "/subscriptions/12345abc-abbc-1b2b-1234-57ab575a5a5a/resourceGroups/RG1",
"name": "RG1",
"type": "Microsoft.Resources/resourceGroups",
"location": "westus2",
"properties": {
"provisioningState": "Succeeded"
}
},
{
"id": "/subscriptions/12345abc-abbc-1b2b-1234-57ab575a5a5a/resourceGroups/RG2",
"name": "RG2",
"type": "Microsoft.Resources/resourceGroups",
"location": "eastus",
"properties": {
"provisioningState": "Succeeded"
}
}
]
}
To list the workspaces in a resource group, send a similar request:
curl https://management.azure.com/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.MachineLearningServices/workspaces/?api-version=2021-03-01-preview \
-H "Authorization:Bearer <YOUR-ACCESS-TOKEN>"
Again you'll receive a JSON response, this time containing a list in which each item details a workspace:
{
"id": "/subscriptions/12345abc-abbc-1b2b-1234-
57ab575a5a5a/resourceGroups/DeepLearningResourceGroup/providers/Microsoft.MachineLearningServices/workspaces
/my-workspace",
"name": "my-workspace",
"type": "Microsoft.MachineLearningServices/workspaces",
"location": "centralus",
"tags": {},
"etag": null,
"properties": {
"friendlyName": "",
"description": "",
"creationTime": "2020-01-03T19:56:09.7588299+00:00",
"storageAccount": "/subscriptions/12345abc-abbc-1b2b-1234-
57ab575a5a5a/resourcegroups/DeepLearningResourceGroup/providers/microsoft.storage/storageaccounts/myworkspac
e0275623111",
"containerRegistry": null,
"keyVault": "/subscriptions/12345abc-abbc-1b2b-1234-
57ab575a5a5a/resourcegroups/DeepLearningResourceGroup/providers/microsoft.keyvault/vaults/myworkspace2525649
324",
"applicationInsights": "/subscriptions/12345abc-abbc-1b2b-1234-
57ab575a5a5a/resourcegroups/DeepLearningResourceGroup/providers/microsoft.insights/components/myworkspace205
3523719",
"hbiWorkspace": false,
"workspaceId": "cba12345-abab-abab-abab-ababab123456",
"subscriptionState": null,
"subscriptionStatusChangeTimeStampUtc": null,
"discoveryUrl": "https://centralus.experiments.azureml.net/discovery"
},
"identity": {
"type": "SystemAssigned",
"principalId": "abcdef1-abab-1234-1234-abababab123456",
"tenantId": "1fedcba-abab-1234-1234-abababab123456"
},
"sku": {
"name": "Basic",
"tier": "Basic"
}
}
To work with resources within a workspace, you'll switch from the general management.azure.com server to
a REST API server specific to the location of the workspace. Note the value of the discoveryUrl key in the above
JSON response. If you GET that URL, you'll receive a response something like:
{
"api": "https://centralus.api.azureml.ms",
"catalog": "https://catalog.cortanaanalytics.com",
"experimentation": "https://centralus.experiments.azureml.net",
"gallery": "https://gallery.cortanaintelligence.com/project",
"history": "https://centralus.experiments.azureml.net",
"hyperdrive": "https://centralus.experiments.azureml.net",
"labeling": "https://centralus.experiments.azureml.net",
"modelmanagement": "https://centralus.modelmanagement.azureml.net",
"pipelines": "https://centralus.aether.ms",
"studiocoreservices": "https://centralus.studioservice.azureml.com"
}
The value of the api response is the URL of the server that you'll use for more requests. To list experiments, for
instance, send the following command. Replace REGIONAL-API-SERVER with the value of the api response (for
instance, centralus.api.azureml.ms ). Also replace YOUR-SUBSCRIPTION-ID , YOUR-RESOURCE-GROUP ,
YOUR-WORKSPACE-NAME , and YOUR-ACCESS-TOKEN as usual:
curl https://<REGIONAL-API-SERVER>/history/v1.0/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/\
providers/Microsoft.MachineLearningServices/workspaces/<YOUR-WORKSPACE-NAME>/experiments?api-version=2021-03-01-preview \
-H "Authorization:Bearer <YOUR-ACCESS-TOKEN>"
To list registered models, use the modelmanagement operations path:
curl https://<REGIONAL-API-SERVER>/modelmanagement/v1.0/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/\
providers/Microsoft.MachineLearningServices/workspaces/<YOUR-WORKSPACE-NAME>/models?api-version=2021-03-01-preview \
-H "Authorization:Bearer <YOUR-ACCESS-TOKEN>"
Notice that to list experiments the path begins with history/v1.0 while to list models, the path begins with
modelmanagement/v1.0 . The REST API is divided into several operational groups, each with a distinct path.
AREA | PATH
Artifacts | /rest/api/azureml
Models | modelmanagement/v1.0/
You can explore the REST API using the general pattern of:
URL COMPONENT | EXAMPLE
https:// |
REGIONAL-API-SERVER/ | centralus.api.azureml.ms/
operations-path/ | history/v1.0/
subscriptions/YOUR-SUBSCRIPTION-ID/ | subscriptions/abcde123-abab-abab-1234-0123456789abc/
resourceGroups/YOUR-RESOURCE-GROUP/ | resourceGroups/MyResourceGroup/
providers/operation-provider/ | providers/Microsoft.MachineLearningServices/
provider-resource-path/ | workspaces/MyWorkspace/experiments/FirstExperiment/runs/1/
operations-endpoint/ | artifacts/metadata/
For example, to list the compute targets in a workspace, send the following request:
curl https://management.azure.com/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/\
providers/Microsoft.MachineLearningServices/workspaces/<YOUR-WORKSPACE-NAME>/computes?api-version=2021-03-01-preview \
-H "Authorization:Bearer <YOUR-ACCESS-TOKEN>"
To create or overwrite a named compute resource, you'll use a PUT request. In the following, in addition to the
now-familiar replacements of YOUR-SUBSCRIPTION-ID , YOUR-RESOURCE-GROUP , YOUR-WORKSPACE-NAME , and
YOUR-ACCESS-TOKEN , replace YOUR-COMPUTE-NAME , and values for location , vmSize , vmPriority , scaleSettings ,
adminUserName , and adminUserPassword . As specified in the reference at Machine Learning Compute - Create Or
Update SDK Reference, the following command creates a dedicated, single-node Standard_D1 (a basic CPU
compute resource) that will scale down after 30 minutes:
curl -X PUT \
'https://management.azure.com/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<YOUR-WORKSPACE-NAME>/computes/<YOUR-COMPUTE-NAME>?api-version=2021-03-01-preview' \
-H 'Authorization:Bearer <YOUR-ACCESS-TOKEN>' \
-H 'Content-Type: application/json' \
-d '{
    "location": "eastus",
    "properties": {
        "computeType": "AmlCompute",
        "properties": {
            "vmSize": "Standard_D1",
            "vmPriority": "Dedicated",
            "scaleSettings": {
                "maxNodeCount": 1,
                "minNodeCount": 0,
                "nodeIdleTimeBeforeScaleDown": "PT30M"
            },
            "userAccountCredentials": {
                "adminUserName": "<ADMIN_USERNAME>",
                "adminUserPassword": "<ADMIN_PASSWORD>"
            }
        }
    }
}'
NOTE
In Windows terminals you may have to escape the double-quote symbols when sending JSON data. That is, text such as
"location" becomes \"location\" .
A successful request will get a 201 Created response, but note that this response simply means that the
provisioning process has begun. You'll need to poll (or use the portal) to confirm its successful completion.
To create a workspace with a PUT request, provide the IDs of the dependent resources in the request body:
curl -X PUT \
'https://management.azure.com/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<YOUR-NEW-WORKSPACE-NAME>?api-version=2021-03-01-preview' \
-H 'Authorization: Bearer <YOUR-ACCESS-TOKEN>' \
-H 'Content-Type: application/json' \
-d '{
    "location": "<AZURE-LOCATION>",
    "identity" : {
        "type" : "systemAssigned"
    },
    "properties": {
        "friendlyName" : "<YOUR-WORKSPACE-FRIENDLY-NAME>",
        "description" : "<YOUR-WORKSPACE-DESCRIPTION>",
        "containerRegistry" : "/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.ContainerRegistry/registries/<YOUR-REGISTRY-NAME>",
        "keyVault" : "/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.Keyvault/vaults/<YOUR-KEYVAULT-NAME>",
        "applicationInsights" : "/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.insights/components/<YOUR-APPLICATION-INSIGHTS-NAME>",
        "storageAccount" : "/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.Storage/storageAccounts/<YOUR-STORAGE-ACCOUNT-NAME>"
    }
}'
You should receive a 202 Accepted response and, in the returned headers, a Location URI. You can GET this URI
for information on the deployment, including helpful debugging information if there's a problem with one of
your dependent resources (for instance, if you forgot to enable admin access on your container registry).
To delete a resource, use a DELETE request. For example, to delete a registered model:
curl -X DELETE \
'https://<REGIONAL-API-SERVER>/modelmanagement/v1.0/subscriptions/<YOUR-SUBSCRIPTION-ID>/resourceGroups/<YOUR-RESOURCE-GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<YOUR-WORKSPACE-NAME>/models/<YOUR-MODEL-ID>?api-version=2021-03-01-preview' \
-H 'Authorization:Bearer <YOUR-ACCESS-TOKEN>'
Troubleshooting
Resource provider errors
When creating an Azure Machine Learning workspace, or a resource used by the workspace, you may receive an
error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are automatically registered, but not all. If you receive this message, you need to
register the provider mentioned.
The following table contains a list of the resource providers required by Azure Machine Learning:
RESOURCE PROVIDER | WHY IT'S NEEDED
Microsoft.Storage | Azure Storage Account is used as the default storage for the workspace.
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
Moving the workspace
WARNING
Moving your Azure Machine Learning workspace to a different subscription, or moving the owning subscription to a new
tenant, is not supported. Doing so may cause errors.
WARNING
Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure
Machine Learning workspace.
Next steps
Explore the complete AzureML REST API reference.
Learn how to use the designer to Predict automobile price with the designer.
Explore Azure Machine Learning with Jupyter notebooks.
Move Azure Machine Learning workspaces between subscriptions (preview)
5/25/2022 • 3 minutes to read • Edit Online
As the requirements of your machine learning application change, you may need to move your workspace to a
different Azure subscription. For example, you may need to move the workspace in the following situations:
Promote workspace from test subscription to production subscription.
Change the design and architecture of your application.
Move workspace to a subscription with more available quota.
Move workspace to a subscription with different cost center.
Moving the workspace enables you to migrate the workspace and its contents as a single, automated step. The
following table describes the workspace contents that are moved:
ASSET | MOVED WITH THE WORKSPACE
Datasets | Yes
Environments | Yes
Compute resources | No
Endpoints | No
IMPORTANT
Workspace move is currently in public preview. This preview is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Prerequisites
An Azure Machine Learning workspace in the source subscription. For more information, see Create an
Azure Machine Learning workspace.
You must have permissions to manage resources in both source and target subscriptions. For example, the
Contributor or Owner role at the subscription level. For more information on roles, see Azure roles.
The destination subscription must be registered for required resource providers. The following table
contains a list of the resource providers required by Azure Machine Learning:
RESOURCE PROVIDER | WHY IT'S NEEDED
If you plan on using a customer-managed key with Azure Machine Learning, then the following service
providers must be registered:
For information on registering resource providers, see Resolve errors for resource provider registration.
The Azure CLI.
TIP
The move operation does not use the Azure CLI extension for machine learning.
Limitations
Workspace move is not meant for replicating workspaces, or moving individual assets such as models or
datasets from one workspace to another.
Workspace move doesn't support migration across Azure regions or Azure Active Directory tenants.
The workspace mustn't be in use during the move operation. Verify that all experiment runs, data profiling
runs, and labeling projects have completed. Also verify that inference endpoints aren't being invoked.
The workspace will become unavailable during the move.
Before the move, you must delete or detach computes and inference endpoints from the workspace.
2. Verify that the origin workspace isn't being used. Check that any experiment runs, data profiling runs, or
labeling projects have completed. Also verify that inferencing endpoints aren't being invoked.
3. Delete or detach any computes from the workspace, and delete any inferencing endpoints. Moving
computes and endpoints isn't supported. Also note that the workspace will become unavailable during
the move.
4. Create a destination resource group in the new subscription. This resource group will contain the
workspace after the move. The destination must be in the same region as the origin.
5. The following command demonstrates how to validate the move operation for the workspace. You can
include associated resources such as the storage account, container registry, key vault, and application
insights in the move by adding them to the resources list. The validation may take several minutes. In
this command, origin-rg is the origin resource group, while destination-rg is the destination. The
subscription IDs are represented by origin-sub-id and destination-sub-id , while the workspace is
origin-workspace-name :
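A sketch of the validation call using the generic az resource invoke-action command; the placeholder names match those described above:

az resource invoke-action --action validateMoveResources \
    --ids "/subscriptions/<origin-sub-id>/resourceGroups/<origin-rg>" \
    --request-body "{ \"resources\": [\"/subscriptions/<origin-sub-id>/resourceGroups/<origin-rg>/providers/Microsoft.MachineLearningServices/workspaces/<origin-workspace-name>\"], \"targetResourceGroup\": \"/subscriptions/<destination-sub-id>/resourceGroups/<destination-rg>\" }"

6. After validation succeeds, perform the move. The az resource move command takes the same resource IDs:

az resource move \
    --destination-group "<destination-rg>" \
    --destination-subscription-id "<destination-sub-id>" \
    --ids "/subscriptions/<origin-sub-id>/resourceGroups/<origin-rg>/providers/Microsoft.MachineLearningServices/workspaces/<origin-workspace-name>"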
After the move has completed, recreate any computes and redeploy any web service endpoints at the new
location.
Next steps
Learn about resource move
Link Azure Synapse Analytics and Azure Machine Learning workspaces and attach Apache Spark pools (preview)
5/25/2022 • 4 minutes to read • Edit Online
IMPORTANT
The Azure Machine Learning and Azure Synapse integration is in public preview. The functionalities presented from the
azureml-synapse package are experimental preview features, and may change at any time.
Prerequisites
Create an Azure Machine Learning workspace.
Create a Synapse workspace in Azure portal.
Create Apache Spark pool using Azure portal, web tools, or Synapse Studio
Install the Azure Machine Learning Python SDK
Access to the Azure Machine Learning studio.
The following code employs the LinkedService and SynapseWorkspaceLinkedServiceConfiguration classes to:
Link your machine learning workspace ws with your Azure Synapse workspace.
Register your Synapse workspace with Azure Machine Learning as a linked service.
from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration

# Link configuration that identifies the Azure Synapse workspace
synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(
    subscription_id=ws.subscription_id,
    resource_group='your resource group',
    name='mySynapseWorkspaceName')

# Register the Synapse workspace as a linked service of the ML workspace
linked_service = LinkedService.register(
    workspace=ws,
    name='synapselink1',
    linked_service_config=synapse_link_config)
IMPORTANT
A managed identity, system_assigned_identity_principal_id , is created for each linked service. This managed identity
must be granted the Synapse Apache Spark Administrator role of the Synapse workspace before you start your
Synapse session. Assign the Synapse Apache Spark Administrator role to the managed identity in the Synapse Studio.
To find the system_assigned_identity_principal_id of a specific linked service, use
LinkedService.get('<your-mlworkspace-name>', '<linked-service-name>') .
To list all linked services in the workspace:
LinkedService.list(ws)
To unregister a linked service you no longer need:
linked_service.unregister()
Synapse workspace | Select the Synapse workspace you want to link to.
5. Select Next to open the Select Spark pools (optional) form. On this form, you select which Synapse
Spark pool to attach to your workspace
6. Select Next to open the Review form and check your selections.
7. Select Create to complete the linked service creation process.
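After the linked service exists, you can attach a Synapse Spark pool as a compute target. A minimal sketch with the v1 SDK; the pool and compute target names are placeholders, and linked_service is the object registered earlier (or retrieved with LinkedService.get ):

from azureml.core.compute import ComputeTarget, SynapseCompute

attach_config = SynapseCompute.attach_configuration(
    linked_service=linked_service,      # the linked service registered earlier
    type='SynapseSpark',
    pool_name='yourSparkPoolName')      # placeholder: your Synapse Spark pool

synapse_compute = ComputeTarget.attach(
    workspace=ws,
    name='yourComputeTargetName',       # placeholder: compute name in the ML workspace
    attach_configuration=attach_config)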
synapse_compute.wait_for_completion()
Next steps
How to data wrangle with Azure Synapse (preview).
How to use Apache Spark in your machine learning pipeline with Azure Synapse (preview)
Train a model.
How to securely integrate Azure Synapse and Azure Machine Learning workspaces.
How to securely integrate Azure Machine Learning and Azure Synapse
5/25/2022 • 5 minutes to read • Edit Online
In this article, learn how to securely integrate with Azure Machine Learning from Azure Synapse. This integration
enables you to use Azure Machine Learning from notebooks in your Azure Synapse workspace. Communication
between the two workspaces is secured using an Azure Virtual Network.
TIP
You can also perform integration in the opposite direction, using Azure Synapse spark pool from Azure Machine Learning.
For more information, see Link Azure Synapse and Azure Machine Learning.
Prerequisites
An Azure subscription.
An Azure Machine Learning workspace with a private endpoint connection to a virtual network. The
following workspace dependency services must also have a private endpoint connection to the virtual
network:
Azure Storage Account
TIP
For the storage account there are three separate private endpoints; one each for blob, file, and dfs.
WARNING
The Azure Machine Learning integration is not currently supported in Synapse Workspaces with data exfiltration
protection. When configuring your Azure Synapse workspace, do not enable data exfiltration protection. For more
information, see Azure Synapse Analytics Managed Virtual Network.
NOTE
The steps in this article make the following assumptions:
The Azure Synapse workspace is in a different resource group than the Azure Machine Learning workspace.
The Azure Synapse workspace uses a managed virtual network . The managed virtual network secures the
connectivity between Azure Synapse and Azure Machine Learning. It does not restrict access to the Azure
Synapse workspace. You will access the workspace over the public internet.
1. From Azure Synapse Studio, Create a new Azure Machine Learning linked service.
2. After creating and publishing the linked service, select Manage , Managed private endpoints , and then
+ New in Azure Synapse Studio.
3. From the New managed private endpoint page, search for Azure Machine Learning and select the
tile.
4. When prompted to select the Azure Machine Learning workspace, use the Azure subscription and
Azure Machine Learning workspace you added previously as a linked service. Select Create to create
the endpoint.
5. The endpoint will be listed as Provisioning until it has been created. Once created, the Approval
column will list a status of Pending . You'll approve the endpoint in the Configure Azure Machine
Learning section.
NOTE
A managed private endpoint is also created for the Azure Data Lake Storage Gen 2 account
associated with this Synapse workspace. For information on how to create an Azure Data Lake Storage Gen 2 account and
enable a private endpoint for it, see Provision and secure a linked service with Managed VNet.
3. From the left of the page, select Access control (IAM) . Select + Add , and then select Role
assignment .
5. Select User, group, or ser vice principal , and then + Select members . Enter the name of the identity
created earlier, select it, and then use the Select button.
6. Select Review + assign , verify the information, and then select the Review + assign button.
TIP
It may take several minutes for the Azure Machine Learning workspace to update the credentials cache. Until it
has been updated, you may receive errors when trying to access the Azure Machine Learning workspace from
Synapse.
Verify connectivity
1. From Azure Synapse Studio, select Develop , and then + Notebook .
2. In the Attach to field, select the Apache Spark pool for your Azure Synapse workspace, and enter the
following code in the first cell:
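A sketch of the connection code, assuming the Azure Machine Learning linked service is named AzureMLService1 ; the azureML utilities ship with the Synapse notebook environment:

%%pyspark
from notebookutils.mssparkutils import azureML
ws = azureML.getWorkspace("AzureMLService1")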
print(ws.name)
This code snippet connects to the linked workspace, and then prints the workspace info. In the printed
output, the value displayed is the name of the Azure Machine Learning workspace, not the linked service
name that was used in the getWorkspace() call. For more information on using the ws object, see the
Workspace class reference.
Next steps
Quickstart: Create a new Azure Machine Learning linked service in Synapse.
Link Azure Synapse Analytics and Azure Machine Learning workspaces.
How to use workspace diagnostics
5/25/2022 • 2 minutes to read • Edit Online
Azure Machine Learning provides a diagnostic API that can be used to identify problems with your workspace.
Errors returned in the diagnostics report include information on how to resolve the problem.
You can use the workspace diagnostics from the Azure Machine Learning studio or Python SDK.
Prerequisites
An Azure Machine learning workspace. If you don't have one, see Create a workspace.
The Azure Machine Learning SDK for Python.
After diagnostics run, a list of any detected problems is returned. This list includes links to possible solutions.
Diagnostics from Python
The following snippet demonstrates how to use workspace diagnostics from Python:
APPLIES TO: Python SDK azureml v1
from azureml.core import Workspace

ws = Workspace.from_config()
diag_param = {
"value": {
}
}
resp = ws.diagnose_workspace(diag_param)
print(resp)
The response is a JSON document that contains information on any problems detected with the workspace. The
following JSON is an example response:
{
'value': {
'user_defined_route_results': [],
'network_security_rule_results': [],
'resource_lock_results': [],
'dns_resolution_results': [{
    'code': 'CustomDnsInUse',
    'level': 'Warning',
    'message': "It is detected VNet '/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.Network/virtualNetworks/<virtual-network-name>' of private endpoint '/subscriptions/<subscription-id>/resourceGroups/larrygroup0916/providers/Microsoft.Network/privateEndpoints/<workspace-private-endpoint>' is not using Azure default dns. You need to configure your DNS server and check https://docs.microsoft.com/azure/machine-learning/how-to-custom-dns to make sure the custom dns is set up correctly."
}],
'storage_account_results': [],
'key_vault_results': [],
'container_registry_results': [],
'application_insights_results': [],
'other_results': []
}
}
Next steps
Workspace.diagnose_workspace()
How to manage workspaces in portal or SDK
Create and manage an Azure Machine Learning compute instance
5/25/2022 • 15 minutes to read • Edit Online
Learn how to create and manage a compute instance in your Azure Machine Learning workspace.
Use a compute instance as your fully configured and managed development environment in the cloud. For
development and testing, you can also use the instance as a training compute target or an inference target. A
compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a
compute instance can't be shared with other users in your workspace.
In this article, you learn how to:
Create a compute instance
Manage (start, stop, restart, delete) a compute instance
Create a schedule to automatically start and stop the compute instance (preview)
You can also use a setup script (preview) to create the compute instance with your own custom environment.
Compute instances can run jobs securely in a virtual network environment, without requiring enterprises to
open up SSH ports. The job executes in a containerized environment and packages your model dependencies in
a Docker container.
NOTE
This article shows CLI v2 in the sections below. If you are still using CLI v1, see Create an Azure Machine Learning
compute cluster (CLI v1).
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning
workspace.
The Azure CLI extension for Machine Learning service (v2), Azure Machine Learning Python SDK, or the
Azure Machine Learning Visual Studio Code extension.
Create
IMPORTANT
Items marked (preview) below are currently in public preview. The preview version is provided without a service level
agreement, and it's not recommended for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
For more information on the classes, methods, and parameters used in the example below, see the following reference
documents:
ComputeInstance class
ComputeTarget.create
ComputeInstance.wait_for_completion
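For reference, a minimal sketch of creating a compute instance with the v1 SDK; the instance name and VM size here are example values, not requirements:

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
compute_name = "my-instance"  # must be unique within the Azure region

try:
    # Reuse the instance if it already exists
    instance = ComputeInstance(workspace=ws, name=compute_name)
    print("Found existing instance, using it.")
except ComputeTargetException:
    compute_config = ComputeInstance.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",
        ssh_public_access=False)
    instance = ComputeTarget.create(ws, compute_name, compute_config)
    instance.wait_for_completion(show_output=True)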
You can also create a compute instance with an Azure Resource Manager template.
With CLI v2, you can define the compute instance and an automatic stop schedule in a YAML file:
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: schedule-example-i
type: computeinstance
size: STANDARD_DS3_v2
schedules:
compute_start_stop:
- action: stop
trigger:
type: cron
start_time: "2021-03-10T21:21:07"
time_zone: Pacific Standard Time
expression: 0 18 * * *
"schedules": "[parameters('schedules')]"
Then use either cron or LogicApps expressions to define the schedule that starts or stops the instance in your
parameter file:
"schedules": {
"value": {
"computeStartStop": [
{
"triggerType": "Cron",
"cron": {
"timeZone": "UTC",
"expression": "0 18 * * *"
},
"action": "Stop",
"status": "Enabled"
},
{
"triggerType": "Cron",
"cron": {
"timeZone": "UTC",
"expression": "0 8 * * *"
},
"action": "Start",
"status": "Enabled"
},
{
"triggerType":"Recurrence",
"recurrence":{
"frequency":"Day",
"interval":1,
"timeZone":"UTC",
"schedule":{
"hours":[17],
"minutes":[0]
}
},
"action":"Stop",
"status":"Enabled"
}
]
}
}
You can also use Azure Policy to append a default stop schedule to any compute instance that doesn't already define one:
{
"mode": "All",
"policyRule": {
"if": {
"allOf": [
{
"field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
"equals": "ComputeInstance"
},
{
"field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
"exists": "false"
}
]
},
"then": {
"effect": "append",
"details": [
{
"field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
"value": {
"computeStartStop": [
{
"triggerType": "Cron",
"cron": {
"startTime": "2021-03-10T21:21:07",
"timeZone": "Pacific Standard Time",
"expression": "0 22 * * *"
},
"action": "Stop",
"status": "Enabled"
}
]
}
}
]
}
}
}
NOTE
Support for accessing your workspace file store from RStudio is not yet available.
When accessing multiple instances of RStudio, if you see a "400 Bad Request. Request Header Or Cookie Too Large"
error, use a new browser or access from a browser in incognito mode.
Shiny applications are not currently supported on RStudio Workbench.
Setup RStudio open source
To use RStudio open source, set up a custom application as follows:
1. Follow the steps listed above to Add application when creating your compute instance.
2. Select Custom Application on the Application dropdown
3. Configure the Application name you would like to use.
4. Set up the application to run on Target port 8787 - the docker image for RStudio open source listed below
needs to run on this Target port.
5. Set up the application to be accessed on Published port 8787 - you can configure the application to be
accessed on a different Published port if you wish.
6. Point the Docker image to ghcr.io/azure/rocker-rstudio-ml-verse:latest .
7. Select Create to set up RStudio as a custom application on your compute instance.
NOTE
It might take a few minutes after setting up a custom application until you can access it via the links above. The amount
of time taken will depend on the size of the image used for your custom application. If you see a 502 error message when
trying to access the application, wait for some time for the application to be set up and try again.
Manage
Start, stop, restart, and delete a compute instance. A compute instance doesn't automatically scale down, so
make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then
start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll
still be billed for disk, public IP, and standard load balancer.
You can create a schedule for the compute instance to automatically start and stop based on a time and day of
week.
TIP
The compute instance has a 120GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before
you stop or restart the compute instance. Please do not stop the compute instance by issuing sudo shutdown from the
terminal. The temp disk size on the compute instance depends on the VM size chosen and is mounted on /mnt.
Python
Azure CLI
Studio
Stop
instance.stop(wait_for_completion=True, show_output=True)
Start
instance.start(wait_for_completion=True, show_output=True)
Restart
instance.restart(wait_for_completion=True, show_output=True)
Delete
# delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name
instance.delete(wait_for_completion=True, show_output=True)
Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, restart a compute
instance. All users in the workspace contributor and owner role can create, delete, start, stop, and restart
compute instances across the workspace. However, only the creator of a specific compute instance, or the user
assigned if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute
instance. A compute instance is dedicated to a single user who has root access and can use the terminal through
Jupyter/JupyterLab/RStudio. The compute instance has single-user sign-in, and all actions use that user's
identity for Azure RBAC and attribution of experiment runs. SSH access is controlled through a public/private key
mechanism.
These actions can be controlled by Azure RBAC:
Microsoft.MachineLearningServices/workspaces/computes/read
Microsoft.MachineLearningServices/workspaces/computes/write
Microsoft.MachineLearningServices/workspaces/computes/delete
Microsoft.MachineLearningServices/workspaces/computes/start/action
Microsoft.MachineLearningServices/workspaces/computes/stop/action
Microsoft.MachineLearningServices/workspaces/computes/restart/action
Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action
To create a compute instance, you'll need permissions for the following actions:
Microsoft.MachineLearningServices/workspaces/computes/write
Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action
Next steps
Access the compute instance terminal
Create and manage files
Update the compute instance to the latest VM image
Submit a training run
Create an Azure Machine Learning compute cluster
5/25/2022 • 9 minutes to read • Edit Online
Learn how to create and manage a compute cluster in your Azure Machine Learning workspace.
You can use Azure Machine Learning compute cluster to distribute a training or batch inference process across a
cluster of CPU or GPU compute nodes in the cloud. For more information on the VM sizes that include GPUs, see
GPU-optimized virtual machine sizes.
In this article, learn how to:
Create a compute cluster
Lower your compute cluster cost
Set up a managed identity for the cluster
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning
workspace.
The Azure CLI extension for Machine Learning service (v2), Azure Machine Learning Python SDK, or the
Azure Machine Learning Visual Studio Code extension.
If using the Python SDK, set up your development environment with a workspace. Once your
environment is set up, attach to the workspace in your Python script:
APPLIES TO: Python SDK azureml v1
from azureml.core import Workspace

ws = Workspace.from_config()
Limitations
Some of the scenarios listed in this document are marked as preview . Preview functionality is provided
without a service level agreement, and it's not recommended for production workloads. Certain features
might not be supported or might have constrained capabilities. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews.
Compute clusters can be created in a different region than your workspace. This functionality is in
preview , and is only available for compute clusters , not compute instances. This preview is not
available if you are using a private endpoint-enabled workspace.
WARNING
When using a compute cluster in a different region than your workspace or datastores, you may see increased
network latency and data transfer costs. The latency and costs can occur when creating the cluster, and when
running jobs on it.
We currently support only creation (and not updating) of clusters through ARM templates. For updating
compute, we recommend using the SDK, Azure CLI or UX for now.
Azure Machine Learning Compute has default limits, such as the number of cores that can be allocated.
For more information, see Manage and request quotas for Azure resources.
Azure allows you to place locks on resources, so that they cannot be deleted or are read only. Do not
apply resource locks to the resource group that contains your workspace . Applying a lock to
the resource group that contains your workspace will prevent scaling operations for Azure ML compute
clusters. For more information on locking resources, see Lock resources to prevent unexpected changes.
TIP
Clusters can generally scale up to 100 nodes as long as you have enough quota for the number of cores required. By
default, clusters are set up with inter-node communication enabled between the nodes of the cluster, to support MPI jobs
for example. However, you can scale your clusters to thousands of nodes by raising a support ticket and requesting that
your subscription, workspace, or a specific cluster be allow-listed for disabling inter-node communication.
Create
Time estimate : Approximately 5 minutes.
Azure Machine Learning Compute can be reused across runs. The compute can be shared with other users in the
workspace and is retained between runs, automatically scaling nodes up or down based on the number of runs
submitted, and the max_nodes set on your cluster. The min_nodes setting controls the minimum nodes
available.
The dedicated cores per region per VM family quota and total regional quota, which applies to compute cluster
creation, is unified and shared with Azure Machine Learning training compute instance quota.
IMPORTANT
To avoid charges when no jobs are running, set the minimum nodes to 0 . This setting allows Azure Machine Learning
to de-allocate the nodes when they aren't in use. Any value larger than 0 will keep that number of nodes running, even if
they are not in use.
The compute autoscales down to zero nodes when it isn't used. Dedicated VMs are created to run your jobs as
needed.
Python
Azure CLI
Studio
To create a persistent Azure Machine Learning Compute resource in Python, specify the vm_size and
max_nodes properties. Azure Machine Learning then uses smart defaults for the other properties.
vm_size : The VM family of the nodes created by Azure Machine Learning Compute.
max_nodes : The max number of nodes to autoscale up to when you run a job on Azure Machine Learning
Compute.
APPLIES TO: Python SDK azureml v1
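A minimal sketch of the creation code; the cluster name and VM size are example values, and ws is the workspace object from the prerequisites:

from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cpu_cluster_name = "cpucluster"

try:
    # Reuse the cluster if it already exists
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cluster, using it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)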
cpu_cluster.wait_for_completion(show_output=True)
You can also configure several advanced properties when you create Azure Machine Learning Compute. The
properties allow you to create a persistent cluster of fixed size, or within an existing Azure Virtual Network in
your subscription. See the AmlCompute class for details.
WARNING
When setting the location parameter, if it is a different region than your workspace or datastores you may see
increased network latency and data transfer costs. The latency and costs can occur when creating the cluster, and when
running jobs on it.
Python
Azure CLI
Studio
To lower your compute cluster cost, you can use low-priority VMs when provisioning the cluster:
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
vm_priority='lowpriority',
max_nodes=4)
Then create the cluster from the configuration:
cpu_cluster_name = "cpu-cluster"
cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
Once a managed identity is set up on the cluster, code running on it can use the identity to request an access token, for example for Azure Storage:
import os
from azure.identity import ManagedIdentityCredential

client_id = os.environ.get('DEFAULT_IDENTITY_CLIENT_ID')
credential = ManagedIdentityCredential(client_id=client_id)
token = credential.get_token('https://storage.azure.com/')
Troubleshooting
There is a chance that some users who created their Azure Machine Learning workspace from the Azure portal
before the GA release might not be able to create AmlCompute in that workspace. You can either raise a support
request against the service or create a new workspace through the portal or the SDK to unblock yourself
immediately.
Stuck at resizing
If your Azure Machine Learning compute cluster appears stuck at resizing (0 -> 0) for the node state, this may
be caused by Azure resource locks.
Azure allows you to place locks on resources, so that they cannot be deleted or are read only. Locking a
resource can lead to unexpected results. Some operations that don't seem to modify the resource actually
require actions that are blocked by the lock.
With Azure Machine Learning, applying a delete lock to the resource group for your workspace will prevent
scaling operations for Azure ML compute clusters. To work around this problem we recommend removing the
lock from resource group and instead applying it to individual items in the group.
IMPORTANT
Do not apply the lock to the following resources:
These resources are used to communicate with, and perform operations such as scaling on, the compute cluster.
Removing the resource lock from these resources should allow autoscaling for your compute clusters.
For more information on resource locking, see Lock resources to prevent unexpected changes.
Next steps
Use your compute cluster to:
Submit a training run
Run batch inference.
Azure Machine Learning anywhere with Kubernetes (preview)
5/25/2022 • 17 minutes to read • Edit Online
Azure Machine Learning anywhere with Kubernetes (AzureML anywhere) enables customers to build, train, and
deploy models in any infrastructure on-premises and across multi-cloud using Kubernetes. With an AzureML
extension deployment on a Kubernetes cluster, you can instantly onboard teams of ML professionals with
AzureML service capabilities. These services include full machine learning lifecycle and automation with MLOps
in hybrid cloud and multi-cloud.
In this article, you can learn about steps to configure and attach an existing Kubernetes cluster anywhere for
Azure Machine Learning:
Deploy AzureML extension to Kubernetes cluster
Create and use instance types to manage compute resources efficiently
Prerequisites
1. A running Kubernetes cluster - We recommend a minimum of 4 vCPU cores and 8GB memory;
around 2 vCPU cores and 3GB memory will be used by the Azure Arc agent and AzureML
extension components.
2. Connect your Kubernetes cluster to Azure Arc. Follow instructions in connect existing Kubernetes cluster
to Azure Arc.
a. If you have an Azure RedHat OpenShift Service (ARO) cluster or OpenShift Container Platform (OCP)
cluster, follow the additional prerequisite step here before AzureML extension deployment.
3. If you have an AKS cluster in Azure, register the AKS-ExtensionManager feature flag by using the
az feature register --namespace "Microsoft.ContainerService" --name "AKS-ExtensionManager" command.
An Azure Arc connection is not required and not recommended .
4. Install or upgrade Azure CLI to version >=2.16.0
5. Install the Azure CLI extension k8s-extension (version>=1.0.0) by running
az extension add --name k8s-extension
Use Minikube on your desktop for a quick POC, training workload support only
Ensure you have fulfilled the prerequisites. Since the following steps would create an Azure Arc connected cluster, you
need to specify the connectedClusters value for the --cluster-type parameter. Run the following simple Azure CLI
command to deploy the AzureML extension:
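A sketch of the training-only deployment, following the same pattern as the AKS example below (cluster and resource group names are placeholders):

az k8s-extension create --name azureml-extension --extension-type Microsoft.AzureML.Kubernetes --config enableTraining=True --cluster-type connectedClusters --cluster-name <your-connected-cluster-name> --resource-group <your-RG-name> --scope cluster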
Enable an AKS cluster in Azure for production training and inference workload
Ensure you have fulfilled the prerequisites. Assuming your cluster has more than 3 nodes, and you will use an Azure
public load balancer and HTTPS for inference workload support, run the following Azure CLI command to deploy the
AzureML extension:
az k8s-extension create --name azureml-extension --extension-type Microsoft.AzureML.Kubernetes --config
enableTraining=True enableInference=True inferenceRouterServiceType=LoadBalancer --config-protected
sslCertPemFile=<file-path-to-cert-PEM> sslCertKeyFile=<file-path-to-cert-KEY> --cluster-type managedClusters
--cluster-name <your-AKS-cluster-name> --resource-group <your-RG-name> --scope cluster
Enable an Azure Arc connected cluster anywhere for production training and inference workload
Ensure you have fulfilled the prerequisites. Assuming your cluster has more than 3 nodes, and you will use a NodePort
service type and HTTPS for inference workload support, run the following Azure CLI command to deploy the AzureML
extension:
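A sketch mirroring the AKS command above, switched to the connectedClusters cluster type and a NodePort router service (certificate paths and names are placeholders):

az k8s-extension create --name azureml-extension --extension-type Microsoft.AzureML.Kubernetes --config enableTraining=True enableInference=True inferenceRouterServiceType=NodePort --config-protected sslCertPemFile=<file-path-to-cert-PEM> sslCertKeyFile=<file-path-to-cert-KEY> --cluster-type connectedClusters --cluster-name <your-connected-cluster-name> --resource-group <your-RG-name> --scope cluster

Verify AzureML extension deployment
1. Check the extension details with the k8s-extension show command (a sketch; adjust the cluster type for AKS clusters):

az k8s-extension show --name azureml-extension --cluster-type connectedClusters --cluster-name <your-connected-cluster-name> --resource-group <your-RG-name>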
2. In the response, look for "name": "azureml-extension" and "provisioningState": "Succeeded". Note it might
show "provisioningState": "Pending" for the first few minutes.
3. If the provisioningState shows Succeeded, run the following command on your machine with the
kubeconfig file pointed to your cluster to check that all pods under "azureml" namespace are in 'Running'
state:
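kubectl get pods -n azureml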
Azure Relay resources are created under the same Resource Group as the Arc cluster.
Studio
CLI
Attaching an Azure Arc-enabled Kubernetes cluster makes it available to your workspace for training.
1. Navigate to Azure Machine Learning studio.
2. Under Manage , select Compute .
3. Select the Attached computes tab.
4. Select +New > Kubernetes (preview)
5. Enter a compute name and select your Azure Arc-enabled Kubernetes cluster from the dropdown.
(Optional) Enter Kubernetes namespace, which defaults to default . All machine learning
workloads will be sent to the specified Kubernetes namespace in the cluster.
(Optional) Assign system-assigned or user-assigned managed identity. Managed identities
eliminate the need for developers to manage credentials. For more information, see managed
identities overview .
6. Select Attach
In the Attached compute tab, the initial state of your cluster is Creating. When the cluster is successfully
attached, the state changes to Succeeded. Otherwise, the state changes to Failed.
Create instance types for efficient compute resource usage
What are instance types?
Instance types are an Azure Machine Learning concept that allows targeting certain types of compute nodes for
training and inference workloads. For an Azure VM, an example of an instance type is STANDARD_D2_V3 .
In Kubernetes clusters, instance types are represented in a custom resource definition (CRD) that is installed with
the AzureML extension. Instance types are represented by two elements in AzureML extension: nodeSelector and
resources. In short, a nodeSelector lets us specify which node a pod should run on. The node must have a
corresponding label. In the resources section, we can set the compute resources (CPU, memory and Nvidia
GPU) for the pod.
Default instance type
By default, a defaultinstancetype with the following definition is created when you attach a Kubernetes cluster to an
AzureML workspace:
No nodeSelector is applied, meaning the pod can get scheduled on any node.
The workload's pods are assigned default resources with 0.6 cpu cores, 1536Mi memory and 0 GPU:
resources:
requests:
cpu: "0.6"
memory: "1536Mi"
limits:
cpu: "0.6"
memory: "1536Mi"
nvidia.com/gpu: null
NOTE
The default instance type purposefully uses little resources. To ensure all ML workloads run with appropriate resources,
for example GPU resource, it is highly recommended to create custom instance types.
defaultinstancetype will not appear as an InstanceType custom resource in the cluster when running the command
kubectl get instancetype , but it will appear in all clients (UI, CLI, SDK).
defaultinstancetype can be overridden with a custom instance type definition having the same name as
defaultinstancetype (see Create custom instance types section)
Create custom instance types
To create a custom instance type, define it in a file such as my_instance_type.yaml :
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
name: myinstancetypename
spec:
nodeSelector:
mylabel: mylabelvalue
resources:
limits:
cpu: "1"
nvidia.com/gpu: 1
memory: "2Gi"
requests:
cpu: "700m"
memory: "1500Mi"
This definition creates an instance type with the following behavior:
Pods will be scheduled only on nodes with label mylabel: mylabelvalue .
Pods will be assigned resource requests of 700m CPU and 1500Mi memory.
Pods will be assigned resource limits of 1 CPU, 2Gi memory and 1 Nvidia GPU.
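Apply the definition to the cluster with kubectl, assuming your kubeconfig points at the attached cluster:

kubectl apply -f my_instance_type.yaml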
NOTE
Nvidia GPU resources are only specified in the limits section as integer values. For more information, see the
Kubernetes documentation.
CPU and memory resources are string values.
CPU can be specified in millicores, for example 100m , or in full numbers, for example "1" is equivalent to 1000m .
Memory can be specified as a full number + suffix, for example 1024Mi for 1024 MiB.
To create several instance types at once, define them as an InstanceTypeList in a file such as my_instance_type_list.yaml :
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceTypeList
items:
- metadata:
name: cpusmall
spec:
resources:
requests:
cpu: "100m"
memory: "100Mi"
limits:
cpu: "1"
nvidia.com/gpu: 0
memory: "1Gi"
- metadata:
name: defaultinstancetype
spec:
resources:
requests:
cpu: "1"
memory: "1Gi"
limits:
cpu: "1"
nvidia.com/gpu: 0
memory: "1Gi"
The above example creates two instance types: cpusmall and defaultinstancetype . The defaultinstancetype
definition above will override the defaultinstancetype definition created when the Kubernetes cluster was attached to
the AzureML workspace.
If a training or inference workload is submitted without an instance type, it uses the default instance type. To
specify a default instance type for a Kubernetes cluster, create an instance type with name defaultinstancetype .
It will automatically be recognized as the default.
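For reference, a sketch of selecting an instance type in a CLI v2 job specification; the command, image, and placeholder names are illustrative only:

command: python -c "print('Hello world!')"
environment:
  image: library/python:latest
compute: azureml:<compute_target_name>
resources:
  instance_type: <instance_type_name>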
In the above example, replace <compute_target_name> with the name of your Kubernetes compute target and
<instance_type_name> with the name of the instance type you wish to select. If no instance_type
property is specified, the system will use defaultinstancetype to submit the job.
When deploying a model, specify the instance_type property in the deployment specification in the same way. If
no instance_type property is specified, the system will use defaultinstancetype to deploy the model.
Appendix I: AzureML extension components
When the AzureML extension deployment completes, it creates the following resources in Azure:
RESOURCE NAME | RESOURCE TYPE | DESCRIPTION
Azure Service Bus | Azure resource | Used to sync nodes and cluster resource information to Azure Machine Learning services regularly.
When the AzureML extension deployment completes, it creates the following resources in the Kubernetes cluster,
depending on each AzureML extension deployment scenario:
RESOURCE NAME | RESOURCE TYPE | TRAINING | INFERENCE | TRAINING AND INFERENCE | DESCRIPTION | COMMUNICATION WITH CLOUD SERVICE
IMPORTANT
Azure Service Bus and Azure Relay resources are under the same resource group as the Arc cluster resource. These
resources are used to communicate with the Kubernetes cluster and modifying them will break attached compute
targets.
By default, the deployed Kubernetes deployment resources are randomly deployed to 1 or more nodes of the cluster,
and the daemonset resource is deployed to ALL nodes. If you want to restrict the extension deployment to specific
nodes, use the nodeSelector configuration setting described below.
NOTE
{EXTENSION-NAME}: is the extension name specified with az k8s-extension create --name CLI command.
The following configuration settings apply, depending on the deployment scenario (training, inference, or training and inference):
allowInsecureConnections: True or False , default False . This must be set to True for AzureML extension deployment with HTTP endpoints support for inference, when sslCertPemFile and sslKeyPemFile are not provided. (Training: N/A; Inference: Optional; Training and inference: Optional)
inferenceRouterServiceType: loadBalancer or nodePort . Must be set when enableInference=true . (Training: N/A; Inference: Required; Training and inference: Required)
installNvidiaDevicePlugin: True or False , default False . The Nvidia Device Plugin is required for ML workloads on Nvidia GPU hardware. By default, the AzureML extension deployment does not install the Nvidia Device Plugin, regardless of whether the Kubernetes cluster has GPU hardware. You can set this configuration setting to True so that the extension installs the Nvidia Device Plugin, but make sure the prerequisites are ready beforehand. (Training: Optional; Inference: Optional; Training and inference: Optional)
Next steps
Train models with CLI (v2)
Configure and submit training runs
Tune hyperparameters
Train a model using Scikit-learn
Train a TensorFlow model
Train a PyTorch model
Train using Azure Machine Learning pipelines
Train model on-premise with outbound proxy server
Create and attach an Azure Kubernetes Service
cluster
5/25/2022 • 16 minutes to read • Edit Online
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning
workspace.
The Azure CLI extension for Machine Learning service, Azure Machine Learning Python SDK, or the Azure
Machine Learning Visual Studio Code extension.
If you plan on using an Azure Virtual Network to secure communication between your Azure ML
workspace and the AKS cluster, your workspace and its associated resources (storage, key vault, Azure
Container Registry) must have private endpoints or service endpoints in the same virtual network as the
AKS cluster's VNet. Follow the tutorial Create a secure workspace to add those private endpoints or
service endpoints to your VNet.
Limitations
If you need a Standard Load Balancer (SLB) deployed in your cluster instead of a Basic Load
Balancer (BLB), create a cluster in the AKS portal/CLI/SDK and then attach it to the AML workspace.
If you have an Azure Policy that restricts the creation of Public IP addresses, then AKS cluster creation will
fail. AKS requires a Public IP for egress traffic. The egress traffic article also provides guidance to lock
down egress traffic from the cluster through the Public IP, except for a few fully qualified domain names.
There are two ways to enable a Public IP:
The cluster can use the Public IP created by default with the BLB or SLB, Or
The cluster can be created without a Public IP and then a Public IP is configured with a firewall with a
user defined route. For more information, see Customize cluster egress with a user-defined-route.
The AML control plane does not talk to this Public IP. It talks to the AKS control plane for deployments.
To attach an AKS cluster, the service principal/user performing the operation must be assigned the
Owner or Contributor Azure role-based access control (Azure RBAC) role on the Azure resource group
that contains the cluster. The service principal/user must also be assigned Azure Kubernetes Service
Cluster Admin Role on the cluster.
If you attach an AKS cluster, which has an Authorized IP range enabled to access the API server, enable
the AML control plane IP ranges for the AKS cluster. The AML control plane is deployed across paired
regions and deploys inference pods on the AKS cluster. Without access to the API server, the inference
pods cannot be deployed. Use the IP ranges for both the paired regions when enabling the IP ranges in
an AKS cluster.
Authorized IP ranges only works with Standard Load Balancer.
If you want to use a private AKS cluster (using Azure Private Link), you must create the cluster first, and
then attach it to the workspace. For more information, see Create a private Azure Kubernetes Service
cluster.
Using a public fully qualified domain name (FQDN) with a private AKS cluster is not suppor ted with
Azure Machine Learning.
The compute name for the AKS cluster MUST be unique within your Azure ML workspace. It can include
letters, digits and dashes. It must start with a letter, end with a letter or digit, and be between 3 and 24
characters in length.
If you want to deploy models to GPU nodes or FPGA nodes (or any specific SKU), then you must create a
cluster with the specific SKU. There is no support for creating a secondary node pool in an existing cluster
and deploying models in the secondary node pool.
When creating or attaching a cluster, you can select whether to create the cluster for dev-test or
production . If you want to create an AKS cluster for development , validation , and testing instead of
production, set the cluster purpose to dev-test . If you do not specify the cluster purpose, a
production cluster is created.
IMPORTANT
A dev-test cluster is not suitable for production level traffic and may increase inference times. Dev/test clusters
also do not guarantee fault tolerance.
When creating or attaching a cluster, if the cluster will be used for production , then it must contain at
least 3 nodes . For a dev-test cluster, it must contain at least 1 node.
The Azure Machine Learning SDK does not provide support for scaling an AKS cluster. To scale the nodes in
the cluster, use the UI for your AKS cluster in the Azure Machine Learning studio. You can only change the
node count, not the VM size of the cluster. For more information on scaling the nodes in an AKS cluster,
see the following articles:
Manually scale the node count in an AKS cluster
Set up cluster autoscaler in AKS
Do not directly update the cluster by using a YAML configuration . While Azure Kubernetes
Service supports updates via YAML configuration, Azure Machine Learning deployments will override
your changes. The only two YAML fields that will not be overwritten are request limits and cpu and
memor y .
Creating an AKS cluster using the Azure Machine Learning studio UI, SDK, or CLI extension is not
idempotent. Attempting to create the resource again will result in an error that a cluster with the same
name already exists.
Using an Azure Resource Manager template and the
Microsoft.MachineLearningServices/workspaces/computes resource to create an AKS cluster is also
not idempotent. If you attempt to use the template again to update an already existing resource, you
will receive the same error.
IMPORTANT
Azure Kubernetes Service uses the Blobfuse FlexVolume driver for versions <=1.16 and the Blob CSI driver for versions
>=1.17. Therefore, it is important to re-deploy or update the web service after a cluster upgrade so that it uses the
correct blobfuse method for the cluster version.
NOTE
There may be edge cases where you have an older cluster that is no longer supported. In this case, the attach operation
will return an error and list the currently supported versions.
You can attach preview versions. Preview functionality is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
Support for using preview versions may be limited. For more information, see Supplemental Terms of Use for Microsoft
Azure Previews.
KubernetesVersion Upgrades
------------------- ----------------------------------------
1.18.6(preview) None available
1.18.4(preview) 1.18.6(preview)
1.17.9 1.18.4(preview), 1.18.6(preview)
1.17.7 1.17.9, 1.18.4(preview), 1.18.6(preview)
1.16.13 1.17.7, 1.17.9
1.16.10 1.16.13, 1.17.7, 1.17.9
1.15.12 1.16.10, 1.16.13
1.15.11 1.15.12, 1.16.10, 1.16.13
To find the default version that is used when creating a cluster through Azure Machine Learning, you can use
the az aks get-versions command with the --query parameter to select the default version.
If you'd like to programmatically check the available versions , use the Container Service Client - List
Orchestrators REST API. To find the available versions, look at the entries where orchestratorType is Kubernetes
. The associated orchestrationVersion entries contain the available versions that can be attached to your
workspace.
To find the default version that is used when creating a cluster through Azure Machine Learning, find the entry
where orchestratorType is Kubernetes and default is true . The associated orchestratorVersion value is the
default version. The following JSON snippet shows an example entry:
...
{
"orchestratorType": "Kubernetes",
"orchestratorVersion": "1.16.13",
"default": true,
"upgrades": [
{
"orchestratorType": "",
"orchestratorVersion": "1.17.7",
"isPreview": false
}
]
},
...
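If you prefer to script this check, the following is a minimal sketch in Python. It assumes the Container Service List Orchestrators REST API described above, an api-version that may need adjusting for your environment, and that the response nests entries under properties.orchestrators; the subscription ID and location are placeholders.

import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"
location = "eastus"

# Acquire an Azure Resource Manager token
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (f"https://management.azure.com/subscriptions/{subscription_id}"
       f"/providers/Microsoft.ContainerService/locations/{location}"
       f"/orchestrators?api-version=2019-08-01&resource-type=managedClusters")
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# Keep only the Kubernetes entries; their orchestratorVersion values can be attached
for entry in resp.json()["properties"]["orchestrators"]:
    if entry["orchestratorType"] == "Kubernetes":
        print(entry["orchestratorVersion"], "(default)" if entry.get("default") else "")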
Python
Azure CLI
Portal
from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this).
# For example, to create a dev/test cluster, use:
# prov_config = AksCompute.provisioning_configuration(cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
prov_config = AksCompute.provisioning_configuration()

aks_name = 'myaks'
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                  name = aks_name,
                                  provisioning_configuration = prov_config)

# Wait for the creation process to complete
aks_target.wait_for_completion(show_output = True)
For more information on the classes, methods, and parameters used in this example, see the following reference
documents:
AksCompute.ClusterPurpose
AksCompute.provisioning_configuration
ComputeTarget.create
ComputeTarget.wait_for_completion
TIP
The existing AKS cluster can be in an Azure region other than your Azure Machine Learning workspace.
WARNING
Do not create multiple, simultaneous attachments to the same AKS cluster from your workspace. For example, attaching
one AKS cluster to a workspace using two different names. Each new attachment will break the previous existing
attachment(s).
If you want to re-attach an AKS cluster, for example to change TLS or other cluster configuration setting, you must first
remove the existing attachment by using AksCompute.detach().
For more information on creating an AKS cluster using the Azure CLI or portal, see the following articles:
Create an AKS cluster (CLI)
Create an AKS cluster (portal)
Create an AKS cluster (ARM Template on Azure Quickstart templates)
The following example demonstrates how to attach an existing AKS cluster to your workspace:
Python
Azure CLI
Portal
from azureml.core.compute import AksCompute, ComputeTarget

# Attach the cluster to your workspace. If the cluster has less than 12 virtual CPUs, use the following instead:
# attach_config = AksCompute.attach_configuration(resource_group = resource_group,
#                                                 cluster_name = cluster_name,
#                                                 cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
attach_config = AksCompute.attach_configuration(resource_group = resource_group,
                                                cluster_name = cluster_name)
aks_target = ComputeTarget.attach(ws, 'myaks', attach_config)
For more information on the classes, methods, and parameters used in this example, see the following reference
documents:
AksCompute.attach_configuration()
AksCompute.ClusterPurpose
AksCompute.attach
# Enable TLS termination when you create an AKS cluster by using the provisioning_config object's enable_ssl method
# Enable TLS termination when you attach an AKS cluster by using the attach_config object's enable_ssl method
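For example, to use a certificate generated by Microsoft for TLS, you supply a leaf domain label rather than certificate files. A minimal sketch, where "contoso" is a hypothetical domain label:

# When creating an AKS cluster
provisioning_config = AksCompute.provisioning_configuration()
provisioning_config.enable_ssl(leaf_domain_label="contoso")

# When attaching an existing AKS cluster
attach_config = AksCompute.attach_configuration(resource_group=resource_group,
                                                cluster_name=cluster_name)
attach_config.enable_ssl(leaf_domain_label="contoso")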
The following example shows how to enable TLS termination with a custom certificate and custom domain name.
With a custom domain and certificate, you must update your DNS record to point to the IP address of the scoring
endpoint; see Update your DNS.
APPLIES TO: Python SDK azureml v1
# Enable TLS termination with custom certificate and custom domain when creating an AKS cluster
provisioning_config.enable_ssl(ssl_cert_pem_file="cert.pem",
                               ssl_key_pem_file="key.pem",
                               ssl_cname="www.contoso.com")

# Enable TLS termination with custom certificate and custom domain when attaching an AKS cluster
attach_config.enable_ssl(ssl_cert_pem_file="cert.pem",
                         ssl_key_pem_file="key.pem",
                         ssl_cname="www.contoso.com")
NOTE
For more information about how to secure model deployment on an AKS cluster, see Use TLS to secure a web service
through Azure Machine Learning.
Create
Attach
IMPORTANT
If your AKS cluster is configured with an Internal Load Balancer, using a Microsoft provided certificate is not supported
and you must use a custom certificate to enable TLS.
NOTE
For more information about how to secure the inferencing environment, see Secure an Azure Machine Learning
inferencing environment.
WARNING
Using the Azure Machine Learning studio, SDK, or the Azure CLI extension for machine learning to detach an AKS cluster
does not delete the AKS cluster . To delete the cluster, see Use the Azure CLI with AKS.
Python
Azure CLI
Portal
aks_target.detach()
Troubleshooting
Update the cluster
Updates to Azure Machine Learning components installed in an Azure Kubernetes Service cluster must be
manually applied.
You can apply these updates by detaching the cluster from the Azure Machine Learning workspace and
reattaching the cluster to the workspace.
APPLIES TO: Python SDK azureml v1
Before you can re-attach the cluster to your workspace, you first need to delete any azureml-fe related
resources. If there is no active service in the cluster, you can delete these azureml-fe related resources
using kubectl.
If TLS is enabled in the cluster, you will need to supply the TLS/SSL certificate and private key when reattaching
the cluster.
APPLIES TO: Python SDK azureml v1
attach_config = AksCompute.attach_configuration(resource_group=resourceGroup,
                                                cluster_name=kubernetesClusterName)

# If SSL is enabled.
attach_config.enable_ssl(
    ssl_cert_pem_file="cert.pem",
    ssl_key_pem_file="key.pem",
    ssl_cname=sslCname)

attach_config.validate_configuration()
If you no longer have the TLS/SSL certificate and private key, or you are using a certificate generated by Azure
Machine Learning, you can retrieve the files prior to detaching the cluster by connecting to the cluster using
kubectl and retrieving the secret azuremlfessl .
NOTE
Kubernetes stores the secrets in Base64-encoded format. You will need to Base64-decode the cert.pem and key.pem
components of the secrets prior to providing them to attach_config.enable_ssl .
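As an illustration, the following minimal sketch reads the secret with kubectl and decodes its components, assuming the secret's data keys are cert.pem and key.pem (adjust to what your cluster actually stores):

import base64
import json
import subprocess

# Read the azuremlfessl secret as JSON (requires kubectl configured for the cluster)
raw = subprocess.check_output(["kubectl", "get", "secret", "azuremlfessl", "-o", "json"])
secret = json.loads(raw)

# Kubernetes stores secret values Base64-encoded; decode them before use
with open("cert.pem", "wb") as f:
    f.write(base64.b64decode(secret["data"]["cert.pem"]))
with open("key.pem", "wb") as f:
    f.write(base64.b64decode(secret["data"]["key.pem"]))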
Webservice failures
Many webservice failures in AKS can be debugged by connecting to the cluster using kubectl . You can get the
kubeconfig.json for an AKS cluster by running the az aks get-credentials command.
Next steps
Use Azure RBAC for Kubernetes authorization
How and where to deploy a model
Create compute targets for model training and
deployment in Azure Machine Learning studio
5/25/2022 • 10 minutes to read • Edit Online
In this article, learn how to create and manage compute targets in Azure Machine Learning studio. You can also create and
manage compute targets with:
The Azure Machine Learning SDK or CLI extension for Azure Machine Learning
Compute instance
Compute cluster
Azure Kubernetes Service cluster
Other compute resources
The VS Code extension for Azure Machine Learning.
IMPORTANT
Items marked (preview) in this article are currently in public preview. The preview version is provided without a service
level agreement, and it's not recommended for production workloads. Certain features might not be supported or might
have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today
An Azure Machine Learning workspace
3. If you see a list of compute resources, select +New above the list.
Follow the steps in Create and manage an Azure Machine Learning compute instance.
Location The Azure region where the compute cluster will be created.
By default, this is the same location as the workspace.
Setting the location to a different region than the workspace
is in preview , and is only available for compute clusters ,
not compute instances.
When using a different region than your workspace or
datastores, you may see increased network latency and data
transfer costs. The latency and costs can occur when creating
the cluster, and when running jobs on it.
Virtual machine type Choose CPU or GPU. This type cannot be changed after creation.
Virtual machine priority Choose Dedicated or Low priority . Low priority virtual
machines are cheaper but don't guarantee the compute
nodes. Your job may be preempted.
Virtual machine size Supported virtual machine sizes might be restricted in your
region. Check the availability list
Select Next to proceed to Advanced Settings and fill out the form as follows:
Minimum number of nodes Minimum number of nodes that you want to provision. If
you want a dedicated number of nodes, set that count here.
Save money by setting the minimum to 0, so you won't pay
for any nodes when the cluster is idle.
Maximum number of nodes Maximum number of nodes that you want to provision. The
compute will autoscale to a maximum of this node count
when a job is submitted.
Idle seconds before scale down Idle time before scaling the cluster down to the minimum
node count.
Enable SSH access Use the same instructions as Enable SSH access for a
compute instance (above).
import os
from azure.identity import ManagedIdentityCredential

client_id = os.environ.get('DEFAULT_IDENTITY_CLIENT_ID')
credential = ManagedIdentityCredential(client_id=client_id)
token = credential.get_token('https://storage.azure.com/')
Create or attach an Azure Kubernetes Service (AKS) cluster for large scale inferencing. Use the steps above to
create the AKS cluster. Then fill out the form as follows:
Kubernetes Service Select Create New and fill out the rest of the form. Or
select Use existing and then select an existing AKS cluster
from your subscription.
Virtual machine size Supported virtual machine sizes might be restricted in your
region. Check the availability list
Enable SSL configuration Use this to configure an SSL certificate on the compute
4. Select Attach .
NOTE
To create and attach a compute target for training on Azure Arc-enabled Kubernetes cluster, see Configure Azure Arc-
enabled Machine Learning
IMPORTANT
To attach an Azure Kubernetes Services (AKS) or Azure Arc-enabled Kubernetes cluster, you must be subscription owner
or have permission to access AKS cluster resources under the subscription. Otherwise, the cluster list on "attach new
compute" page will be blank.
For a compute cluster , select Nodes at the top, then select the Connection string in the table
for your node.
4. Copy the connection string.
5. For Windows, open PowerShell or a command prompt:
a. Go into the directory or folder where your key is stored
b. Add the -i flag to the connection string to locate the private key and point to where it is stored:
ssh -i <keyname.pem> azureuser@... (rest of connection string)
6. For Linux users, follow the steps from Create and use an SSH key pair for Linux VMs in Azure
7. For SCP use:
scp -i key.pem -P {port} {fileToCopyFromLocal} azureuser@yourComputeInstancePublicIP:~/{destination}
Next steps
After a target is created and attached to your workspace, you use it in your run configuration with a
ComputeTarget object:
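A minimal sketch with the v1 SDK; the compute name, script name, and experiment name are placeholders:

from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.core.compute import ComputeTarget

ws = Workspace.from_config()
compute_target = ComputeTarget(workspace=ws, name="<compute-name>")

# Point the run configuration at the attached compute target
src = ScriptRunConfig(source_directory=".",
                      script="train.py",
                      compute_target=compute_target)
run = Experiment(workspace=ws, name="my-experiment").submit(src)
run.wait_for_completion(show_output=True)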
In the customer-managed keys concepts article, you learned about the encryption capabilities that Azure
Machine Learning provides. Now learn how to use customer-managed keys with Azure Machine Learning.
Customer-managed keys are used with the following services that Azure Machine Learning relies on:
SERVICE: WHAT IT'S USED FOR
Azure Cognitive Search: Stores workspace metadata for Azure Machine Learning
Azure Storage Account: Stores workspace metadata for Azure Machine Learning
TIP
Azure Cosmos DB, Cognitive Search, and Storage Account are secured using the same key. You can use a different key
for Azure Kubernetes Service and Container Instance.
To use a customer-managed key with Azure Cosmos DB, Cognitive Search, and Storage Account, the key is provided
when you create your workspace. The key(s) used with Azure Container Instance and Kubernetes Service are provided
when configuring those resources.
Prerequisites
An Azure subscription.
The following Azure resource providers must be registered:
Microsoft.Storage: the Azure Storage Account is used as the default storage for the workspace.
For information on registering resource providers, see Resolve errors for resource provider registration.
Limitations
The customer-managed key for resources the workspace depends on can’t be updated after workspace
creation.
Resources managed by Microsoft in your subscription can’t transfer ownership to you.
You can't delete Microsoft-managed resources used for customer-managed keys without also deleting your
workspace.
IMPORTANT
When using a customer-managed key, the costs for your subscription will be higher because of the additional resources in
your subscription. To estimate the cost, use the Azure pricing calculator.
TIP
If you have problems creating the key, it may be caused by Azure role-based access controls that have been applied in
your subscription. Make sure that the security principal (user, managed identity, service principal, etc.) you are using to
create the key has been assigned the Contributor role for the key vault instance. You must also configure an Access
policy in key vault that grants the security principal Create , Get , Delete , and Purge authorization.
If you plan to use a user-assigned managed identity for your workspace, the managed identity must also be assigned
these roles and access policies.
For more information, see the following articles:
Provide access to key vault keys, certificates, and secrets
Assign a key vault access policy
Use managed identities with Azure Machine Learning
1. From the Azure portal, select the key vault instance. Then select Keys from the left.
2. Select + Generate/impor t from the top of the page. Use the following values to create a key:
Set Options to Generate .
Enter a Name for the key. The name should be something that identifies what the planned use is. For
example, my-cosmos-key .
Set Key type to RSA .
We recommend selecting at least 3072 for the RSA key size .
Leave Enabled set to yes.
Optionally you can set an activation date, expiration date, and tags.
3. Select Create to create the key.
Allow Azure Cosmos DB to access the key
1. To configure the key vault, select it in the Azure portal and then select Access policies from the left menu.
2. To create permissions for Azure Cosmos DB, select + Create at the top of the page. Under Key
permissions , select Get , Unwrap Key , and Wrap key permissions.
3. Under Principal , search for Azure Cosmos DB and then select it. The principal ID for this entry is
a232010e-820c-4083-83bb-3ace5fc29d0b for all regions other than Azure Government. For Azure Government,
the principal ID is 57506a73-e302-42a9-b869-6f12d9ec29e9 .
4. Select Review + Create , and then select Create .
For examples of creating the workspace with a customer-managed key, see the following articles:
CREATION METHOD: ARTICLE
REST API: Create, run, and delete Azure ML resources with REST
Once the workspace has been created, you'll notice that a new Azure resource group is created in your subscription.
This group is in addition to the resource group for your workspace. This resource group contains the
Microsoft-managed resources that your key is used with. The resource group is named using the formula
<Azure Machine Learning workspace resource group name><GUID> . It contains an Azure Cosmos DB instance,
Azure Storage Account, and Azure Cognitive Search.
TIP
The Request Units for the Azure Cosmos DB instance automatically scale as needed.
If your Azure Machine Learning workspace uses a private endpoint, this resource group will also contain a Microsoft-
managed Azure Virtual Network. This VNet is used to secure communications between the managed services and the
workspace. You cannot provide your own VNet for use with the Microsoft-managed resources . You also
cannot modify the vir tual network . For example, you cannot change the IP address range that it uses.
IMPORTANT
If your subscription does not have enough quota for these services, a failure will occur.
WARNING
Don't delete the resource group that contains this Azure Cosmos DB instance, or any of the resources automatically
created in this group. If you need to delete the resource group or Microsoft-managed services in it, you must delete the
Azure Machine Learning workspace that uses it. The resource group resources are deleted when the associated workspace
is deleted.
For more information on customer-managed keys with Cosmos DB, see Configure customer-managed keys for
your Azure Cosmos DB account.
Azure Container Instance
When deploying a trained model to an Azure Container instance (ACI), you can encrypt the deployed resource
using a customer-managed key. For information on generating a key, see Encrypt data with a customer-
managed key.
To use the key when deploying a model to Azure Container Instance, create a new deployment configuration
using AciWebservice.deploy_configuration() . Provide the key information using the following parameters:
cmk_vault_base_url : The URL of the key vault that contains the key.
cmk_key_name : The name of the key.
cmk_key_version : The version of the key.
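As an illustration, a minimal sketch of a deployment configuration with these parameters; the key vault URL, key name, and key version are placeholders, and the sizing values are arbitrary:

from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    cmk_vault_base_url="https://<my-keyvault>.vault.azure.net/",
    cmk_key_name="<my-key-name>",
    cmk_key_version="<my-key-version>")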
For more information on creating and using a deployment configuration, see the following articles:
AciWebservice.deploy_configuration() reference
Where and how to deploy
Deploy a model to Azure Container Instances
For more information on using a customer-managed key with ACI, see Encrypt data with a customer-managed
key.
Azure Kubernetes Service
You may encrypt a deployed Azure Kubernetes Service resource using customer-managed keys at any time. For
more information, see Bring your own keys with Azure Kubernetes Service.
This process allows you to encrypt both the Data and the OS Disk of the deployed virtual machines in the
Kubernetes cluster.
IMPORTANT
This process only works with AKS K8s version 1.17 or higher.
Next steps
Customer-managed keys with Azure Machine Learning
Create a workspace with Azure CLI
Create and manage a workspace
Create a workspace with a template
Create, run, and delete Azure ML resources with REST
Set up a Python development environment for
Azure Machine Learning
5/25/2022 • 7 minutes to read • Edit Online
Learn how to configure a Python development environment for Azure Machine Learning.
The following table shows each development environment covered in this article, along with pros and cons.
ENVIRONMENT: Local environment
PROS: Full control of your development environment and dependencies. Run with any build tool, environment, or IDE of your choice.
CONS: Takes longer to get started. Necessary SDK packages must be installed, and an environment must also be installed if you don't already have one.

ENVIRONMENT: The Data Science Virtual Machine (DSVM)
PROS: Similar to the cloud-based compute instance (Python and the SDK are pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows.
CONS: A slower getting-started experience compared to the cloud-based compute instance.

ENVIRONMENT: Azure Machine Learning compute instance
PROS: Easiest way to get started. The entire SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run.
CONS: Lack of control over your development environment and dependencies. Additional cost incurred for Linux VM (VM can be stopped when not in use to avoid charges). See pricing details.

ENVIRONMENT: Azure Databricks
PROS: Ideal for running large-scale intensive machine learning workflows on the scalable Apache Spark platform.
CONS: Overkill for experimental machine learning, or smaller-scale experiments and workflows. Additional cost incurred for Azure Databricks. See pricing details.
This article also provides additional usage tips for the following tools:
Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some extras that you should
install.
Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes
extensive language support for Python as well as features to make working with the Azure Machine
Learning much more convenient and productive.
Prerequisites
Azure Machine Learning workspace. If you don't have one, you can create an Azure Machine Learning
workspace through the Azure portal, Azure CLI, and Azure Resource Manager templates.
Local and DSVM only: Create a workspace configuration file
The workspace configuration file is a JSON file that tells the SDK how to communicate with your Azure Machine
Learning workspace. The file is named config.json, and it has the following format:
{
"subscription_id": "<subscription-id>",
"resource_group": "<resource-group>",
"workspace_name": "<workspace-name>"
}
This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can
be in the same directory, a subdirectory named .azureml, or in a parent directory.
To use this file from your code, use the Workspace.from_config method. This code loads the information from the
file and connects to your workspace.
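For example:

from azureml.core import Workspace

# Looks for config.json in the current directory, ./.azureml, or a parent directory
ws = Workspace.from_config()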
Create a workspace configuration file in one of the following methods:
Azure portal
Download the file : In the Azure portal, select Download config.json from the Over view section of
your workspace.
You can also create the configuration file programmatically:

from azureml.core import Workspace

subscription_id = '<subscription-id>'
resource_group = '<resource-group>'
workspace_name = '<workspace-name>'

try:
    ws = Workspace(subscription_id = subscription_id,
                   resource_group = resource_group,
                   workspace_name = workspace_name)
    ws.write_config()
    print('Library configuration succeeded')
except:
    print('Workspace not found')
NOTE
Although not required, it's recommended you use Anaconda or Miniconda to manage Python virtual
environments and install packages.
IMPORTANT
If you're on Linux or macOS and use a shell other than bash (for example, zsh) you might receive errors when you
run some commands. To work around this problem, use the bash command to start a new bash shell and run
the commands there.
2. Create a kernel for your Python virtual environment. Make sure to replace <myenv> with the name of
your Python virtual environment.
TIP
To prevent incurring charges for an unused compute instance, stop the compute instance.
In addition to a Jupyter Notebook server and JupyterLab, you can use compute instances in the integrated
notebook feature inside of Azure Machine Learning studio.
You can also use the Azure Machine Learning Visual Studio Code extension to connect to a remote compute
instance using VS Code.
IMPORTANT
If you plan to use the Data Science VM as a compute target for your training or inferencing jobs, only Ubuntu is
supported.
2. Activate the conda environment containing the Azure Machine Learning SDK.
For Ubuntu Data Science VM:
3. To configure the Data Science VM to use your Azure Machine Learning workspace, create a workspace
configuration file or use an existing one.
Similar to local environments, you can use Visual Studio Code and the Azure Machine Learning Visual Studio
Code extension to interact with Azure Machine Learning.
For more information, see Data Science Virtual Machines.
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference.
Install and set up the CLI (v2)
5/25/2022 • 4 minutes to read • Edit Online
Prerequisites
To use the CLI, you must have an Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine Learning today.
To use the CLI commands in this document from your local environment , you need the Azure CLI.
Installation
The new Machine Learning extension requires Azure CLI version >=2.15.0 . Ensure this requirement is met:
az version
Check which extensions are installed, and ensure no conflicting extension using the ml namespace is installed, including the azure-cli-ml extension:
az extension list
Then install the Machine Learning extension:
az extension add -n ml -y
Run the help command to verify your installation and see available subcommands:
az ml -h
To upgrade the extension to the latest version:
az extension update -n ml
Installation on Linux
If you're using Linux, the fastest way to install the necessary CLI version and the Machine Learning extension is:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az extension add -n ml -y
Set up
Login:
az login
If you have access to multiple Azure subscriptions, you can set your active subscription using the az account set command.
Optionally, setup common variables in your shell for usage in subsequent commands:
GROUP="azureml-examples"
LOCATION="eastus"
WORKSPACE="main"
WARNING
This uses Bash syntax for setting variables -- adjust as needed for your shell. You can also replace the values in commands
below inline rather than using variables.
If it doesn't already exist, you can create the Azure resource group with the az group create command.
Machine learning subcommands require the --workspace/-w and --resource-group/-g parameters. To avoid
typing these repeatedly, configure defaults with the az configure --defaults command.
TIP
Most code examples assume you have set a default workspace and resource group. You can override these on the
command line.
TIP
Combining with --output/-o allows for more readable output formats.
Secure communications
The ml CLI extension (sometimes called 'CLI v2') for Azure Machine Learning sends operational data (YAML
parameters and metadata) over the public internet. All the ml CLI extension commands communicate with the
Azure Resource Manager. This communication is secured using HTTPS/TLS 1.2.
Data in a datastore that is secured in a virtual network is not sent over the public internet. For example, if your
training data is located in the default storage account for the workspace, and the storage account is in a virtual
network, that data is not sent over the public internet.
NOTE
With the previous extension ( azure-cli-ml , sometimes called 'CLI v1'), only some of the commands communicate with
the Azure Resource Manager. Specifically, commands that create, update, delete, list, or show Azure resources. Operations
such as submitting a training job communicate directly with the Azure Machine Learning workspace. If your workspace is
secured with a private endpoint, that is enough to secure commands provided by the azure-cli-ml extension.
Public workspace
Private workspace
If your Azure Machine Learning workspace is public (that is, not behind a virtual network), then there is no
additional configuration required. Communications are secured using HTTPS/TLS 1.2
Next steps
Train models using CLI (v2)
Set up the Visual Studio Code Azure Machine Learning extension
Train an image classification TensorFlow model using the Azure Machine Learning Visual Studio Code
extension
Explore Azure Machine Learning with examples
Manage software environments in Azure Machine
Learning studio
5/25/2022 • 2 minutes to read • Edit Online
In this article, learn how to create and manage Azure Machine Learning environments in the Azure Machine
Learning studio. Use the environments to track and reproduce your projects' software dependencies as they
evolve.
The examples in this article show how to:
Browse curated environments.
Create an environment and specify package dependencies.
Edit an existing environment specification and its properties.
Rebuild an environment and view image build logs.
For a high-level overview of how environments work in Azure Machine Learning, see What are ML
environments? For more information, see How to set up a development environment for Azure Machine Learning.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
An Azure Machine Learning workspace
Create an environment
To create an environment:
1. Open your workspace in Azure Machine Learning studio.
2. On the left side, select Environments .
3. Select the Custom environments tab.
4. Select the Create button.
Create an environment by specifying one of the following:
Create a new docker context
Start from an existing custom or curated environment
Upload existing docker context
Use existing docker image with conda
You can customize the configuration file, add tags and descriptions, and review the properties before creating
the entity.
If a new environment is given the same name as an existing environment in the workspace, a new version of the
existing one will be created.
Rebuild an environment
In the details page, click on the rebuild button to rebuild the environment. Any unpinned package versions in
your configuration files may be updated to the most recent version with this action.
How to create and manage files in your workspace
5/25/2022 • 2 minutes to read • Edit Online
Learn how to create and manage the files in your Azure Machine Learning workspace. These files are stored in
the default workspace storage. Files and folders can be shared with anyone else with read access to the
workspace, and can be used from any compute instances in the workspace.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
A Machine Learning workspace. See Create an Azure Machine Learning workspace.
Create files
To create a new file in your default folder ( Users > yourname ):
1. Open your workspace in Azure Machine Learning studio.
2. On the left side, select Notebooks .
3. Select the + image.
4. Select the Create new file image.
Clone samples
Your workspace contains a Sample notebooks folder with notebooks designed to help you explore the SDK
and serve as examples for your own machine learning projects. Clone these notebooks into your own folder to
run and edit them.
For an example, see Tutorial: Create your first ML experiment.
Share files
Copy and paste the URL to share a file. Only other users of the workspace can access this URL. Learn more
about granting access to your workspace.
Delete a file
You can't delete the Sample notebooks files. These files are part of the studio and are updated each time a
new SDK is published.
You can delete files found in your Files section in any of these ways:
In the studio, select the ... at the end of a folder or file. Make sure to use a supported browser (Microsoft
Edge, Chrome, or Firefox).
Use a terminal from any compute instance in your workspace. The folder ~/cloudfiles is mapped to storage
on your workspace storage account.
In either Jupyter or JupyterLab with their tools.
Run Jupyter notebooks in your workspace
5/25/2022 • 13 minutes to read • Edit Online
Learn how to run your Jupyter notebooks directly in your workspace in Azure Machine Learning studio. While
you can launch Jupyter or JupyterLab, you can also edit and run your notebooks without leaving the workspace.
For information on how to create and manage files, including notebooks, see Create and manage files in your
workspace.
IMPORTANT
Features marked as (preview) are provided without a service level agreement, and it's not recommended for production
workloads. Certain features might not be supported or might have constrained capabilities. For more information, see
Supplemental Terms of Use for Microsoft Azure Previews.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
A Machine Learning workspace. See Create an Azure Machine Learning workspace.
Your user identity must have access to your workspace's default storage account. Whether you can read, edit,
or create notebooks depends on your access level to your workspace. For example, a Contributor can edit the
notebook, while a Reader could only view it.
Edit a notebook
To edit a notebook, open any notebook located in the User files section of your workspace. Click on the cell you
wish to edit. If you don't have any notebooks in this section, see Create and manage files in your workspace.
You can edit the notebook without connecting to a compute instance. When you want to run the cells in the
notebook, select or create a compute instance. If you select a stopped compute instance, it will automatically
start when you run the first cell.
When a compute instance is running, you can also use code completion, powered by Intellisense, in any Python
notebook.
You can also launch Jupyter or JupyterLab from the notebook toolbar. Azure Machine Learning does not provide
updates or fix bugs for Jupyter or JupyterLab, as they are open-source products outside the boundary of
Microsoft Support.
Focus mode
Use focus mode to expand your current view so you can focus on your active tabs. Focus mode hides the
Notebooks file explorer.
1. In the terminal window toolbar, select Focus mode to turn on focus mode. Depending on your window
width, the tool may be located under the ... menu item in your toolbar.
2. While in focus mode, return to the standard view by selecting Standard view .
These same snippets are available when you open your notebook in VS Code. For a complete list of available
snippets, see Azure Machine Learning VS Code Snippets.
You can browse and search the list of snippets by using the notebook toolbar to open the snippet panel.
From the snippets panel, you can also submit a request to add new snippets.
Share a notebook
Your notebooks are stored in your workspace's storage account, and can be shared with others, depending on
their access level to your workspace. They can open and edit the notebook as long as they have the appropriate
access. For example, a Contributor can edit the notebook, while a Reader could only view it.
Other users of your workspace can find your notebook in the Notebooks , User files section of Azure ML
studio. By default, your notebooks are in a folder with your username, and others can access them there.
You can also copy the URL from your browser when you open a notebook, then send to others. As long as they
have appropriate access to your workspace, they can open the notebook.
Since you don't share compute instances, other users who run your notebook will do so on their own compute
instance.
NOTE
Comments are saved into the code cell's metadata.
Every notebook is autosaved every 30 seconds. AutoSave updates only the initial ipynb file, not the checkpoint file.
Select Checkpoints in the notebook menu to create a named checkpoint and to revert the notebook to a saved
checkpoint.
Export a notebook
In the notebook toolbar, select the menu and then Expor t As to export the notebook as any of the supported
types:
Notebook
Python
HTML
LaTeX
The exported file is saved on your computer.
Once you are connected to a compute instance, use the toolbar to run all cells in the notebook, or Control +
Enter to run a single selected cell.
Only you can see and use the compute instances you create. Your User files are stored separately from the VM
and are shared among all compute instances in the workspace.
View logs and output
Use notebook widgets to view the progress of the run and logs. A widget is asynchronous and provides updates
until training finishes. Azure Machine Learning widgets are also supported in Jupyter and JupyterLab.
A C T IO N RESULT
Stop the kernel Stops any running cell. Running a cell will automatically
restart the kernel.
These actions will reset the notebook state and will reset all variables in the notebook.
A C T IO N RESULT
Manage packages
Since your compute instance has multiple kernels, make sure to use the %pip or %conda magic functions, which
install packages into the currently-running kernel. Don't use !pip or !conda , which refer to all packages
(including packages outside the currently-running kernel).
Status indicators
An indicator next to the Compute dropdown shows its status. The status is also shown in the dropdown itself.
The following keystroke shortcuts are available in command mode:
O: Toggle output
II: Interrupt kernel
00: Restart kernel
1: Change to h1
2: Change to h2
3: Change to h3
4: Change to h4
5: Change to h5
6: Change to h6
Using the following keystroke shortcuts, you can more easily navigate and run code in Azure Machine Learning
notebooks when in Edit mode.
Control/Command + ] Indent
Control/Command + [ Dedent
Control/Command + Z Undo
Control/Command + Y Redo
Troubleshooting
Connecting to a notebook : If you can't connect to a notebook, ensure that web socket communication
is not disabled. For compute instance Jupyter functionality to work, web socket communication must be
enabled. Ensure your network allows websocket connections to *.instances.azureml.net and
*.instances.azureml.ms.
Private endpoint : When a compute instance is deployed in a workspace with a private endpoint, it can
only be accessed from within the virtual network. If you are using a custom DNS or hosts file, add an entry
for <instance-name>.<region>.instances.azureml.ms with the private IP address of your workspace
private endpoint. For more information, see the custom DNS article.
Kernel crash : If your kernel crashed and was restarted, you can run the following command to look at
jupyter log and find out more details: sudo journalctl -u jupyter . If kernel issues persist, consider using
a compute instance with more memory.
Kernel not found or Kernel operations were disabled : When using the default Python 3.8 kernel on
a compute instance, you may get an error such as "Kernel not found" or "Kernel operations were
disabled". To fix, use one of the following methods:
Create a new compute instance. This will use a new image where this problem has been resolved.
Use the Py 3.6 kernel on the existing compute instance.
From a terminal in the default py38 environment, run pip install ipykernel==6.6.0 OR
pip install ipykernel==6.0.3
Expired token : If you run into an expired token issue, sign out of your Azure ML studio, sign back in, and
then restart the notebook kernel.
File upload limit : When uploading a file through the notebook's file explorer, you are limited to files
smaller than 5 TB. If you need to upload a file larger than this, we recommend that you use one of the
following methods:
Use the SDK to upload the data to a datastore. For more information, see the Upload the data section
of the tutorial.
Use Azure Data Factory to create a data ingestion pipeline.
Next steps
Run your first experiment
Backup your file storage with snapshots
Working in secure environments
Access a compute instance terminal in your
workspace
5/25/2022 • 3 minutes to read • Edit Online
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
A Machine Learning workspace. See Create an Azure Machine Learning workspace.
Access a terminal
To access the terminal:
1. Open your workspace in Azure Machine Learning studio.
2. On the left side, select Notebooks .
3. Select the Open terminal image.
4. When a compute instance is running, the terminal window for that compute instance appears.
5. When no compute instance is running, use the Compute section on the right to start or create a
compute instance.
In addition to the steps above, you can also access the terminal from:
RStudio (see Create and manage an Azure Machine Learning compute instance to add RStudio): Select the
Terminal tab on the top left.
Jupyter Lab: Select the Terminal tile under the Other heading in the Launcher tab.
Jupyter: Select New > Terminal on the top right in the Files tab.
SSH to the machine, if you enabled SSH access when the compute instance was created.
NOTE
Add your files and folders anywhere under the ~/cloudfiles/code/Users folder so they will be visible in all your Jupyter
environments.
Learn more about cloning Git repositories into your workspace file system.
Install packages
Install packages from a terminal window. Install Python packages into the Python 3.8 - AzureML environment.
Install R packages into the R environment.
Or you can install packages directly in Jupyter Notebook or RStudio:
RStudio (see Create and manage an Azure Machine Learning compute instance to add RStudio): Use the
Packages tab on the bottom right, or the Console tab on the top left.
Python: Add install code and execute in a Jupyter Notebook cell.
NOTE
For package management within a notebook, use %pip or %conda magic functions to automatically install packages into
the currently-running kernel, rather than !pip or !conda which refers to all packages (including packages outside the
currently-running kernel)
3. Install the pip and ipykernel packages in the new environment and create a kernel for that conda env
WARNING
Make sure you close any unused sessions to preserve your compute instance's resources. Idle terminals may impact
performance of compute instances.
Create & use software environments in Azure
Machine Learning
5/25/2022 • 13 minutes to read • Edit Online
Prerequisites
The Azure Machine Learning SDK for Python (>= 1.13.0)
An Azure Machine Learning workspace
Create an environment
The following sections explore the multiple ways that you can create an environment for your experiments.
Instantiate an environment object
To manually create an environment, import the Environment class from the SDK. Then use the following code to
instantiate an environment object.
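A minimal sketch (the environment name is a placeholder):

from azureml.core import Environment

myenv = Environment(name="myenv")

You can also start from a curated environment. Curated environments are retrieved by name, as in the following code: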
ws = Workspace.from_config()
env = Environment.get(workspace=ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu")
You can list the curated environments and their packages by using the following code:
envs = Environment.list(workspace=ws)
WARNING
Don't start your own environment name with the AzureML prefix. This prefix is reserved for curated environments.
Enable Docker
Azure Machine Learning builds a Docker image and creates a Python environment within that container, given
your specifications. The Docker images are cached and reused: the first run in a new environment typically takes
longer as the image is built. For local runs, specify Docker within the RunConfiguration.
By default, the newly built Docker image appears in the container registry that's associated with the workspace.
The repository name has the form azureml/azureml_<uuid>. The unique identifier (uuid) part of the name
corresponds to a hash that's computed from the environment configuration. This correspondence allows the
service to determine whether an image for the given environment already exists for reuse.
Use a prebuilt Docker image
By default, the service automatically uses one of the Ubuntu Linux-based base images, specifically the one
defined by azureml.core.environment.DEFAULT_CPU_IMAGE . It then installs any specified Python packages defined
by the provided Azure ML environment. Other Azure ML CPU and GPU base images are available in the
container repository. It is also possible to use a custom Docker base image.
# Specify custom Docker base image and registry, if you don't want to use the defaults
myenv.docker.base_image="your_base-image"
myenv.docker.base_image_registry="your_registry_location"
IMPORTANT
Azure Machine Learning only supports Docker images that provide the following software:
Ubuntu 18.04 or greater.
Conda 4.7.# or greater.
Python 3.6+.
A POSIX compliant shell available at /bin/sh is required in any container image used for training.
When using custom Docker images, it is recommended that you pin package versions in order to better ensure
reproducibility.
Specify your own Python interpreter
In some situations, your custom base image may already contain a Python environment with packages that you
want to use.
To use your own installed packages and disable Conda, set the parameter
Environment.python.user_managed_dependencies = True . Ensure that the base image contains a Python interpreter,
and has the packages your training script needs.
For example, to run in a base Miniconda environment that has NumPy package installed, first specify a
Dockerfile with a step to install the package. Then set the user-managed dependencies to True .
You can also specify a path to a specific Python interpreter within the image, by setting the
Environment.python.interpreter_path variable.
dockerfile = """
FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210615.v1
RUN conda install numpy
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile
myenv.python.user_managed_dependencies=True
myenv.python.interpreter_path = "/opt/miniconda/bin/python"
WARNING
If you install some Python dependencies in your Docker image and forget to set user_managed_dependencies=True ,
those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a
Conda environment with dependencies you specified, and will execute the run in that environment instead of using any
Python libraries that you installed on the base image.
details = environment.get_image_details(workspace=ws)
To obtain the image details from an environment autosaved from the execution of a run, use the following code:
details = run.get_environment().get_image_details(workspace=ws)
To create an environment from an existing local Conda environment, use the from_existing_conda_environment() method:

myenv = Environment.from_existing_conda_environment(name="myenv",
                                                    conda_environment_name="mycondaenv")
An environment definition can be saved to a directory in an easily editable format with the save_to_directory()
method. Once modified, a new environment can be instantiated by loading files from the directory.
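A minimal sketch (the directory name is a placeholder):

# Save the environment definition to a directory for editing
myenv.save_to_directory(path="myenv_dir", overwrite=True)

# ...edit the saved conda dependencies or Docker files as needed...

# Instantiate a new environment from the (modified) files
modified_env = Environment.load_from_directory(path="myenv_dir")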
To add packages to an environment, start from an Environment object and a CondaDependencies object:

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
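From here, you would typically add packages and attach the dependencies to the environment; a minimal sketch:

# Install numpy 1.17.0 through conda and attach the dependencies to the environment
conda_dep.add_conda_package("numpy==1.17.0")
myenv.python.conda_dependencies = conda_dep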
IMPORTANT
If you use the same environment definition for another run, the Azure Machine Learning service reuses the cached image
of your environment. If you create an environment with an unpinned package dependency, for example numpy , that
environment will keep using the package version installed at the time of environment creation. Also, any future
environment with matching definition will keep using the old version. For more information, see Environment building,
caching, and reuse.
Manage environments
Manage environments so that you can update, track, and reuse them across compute targets and with other
users of the workspace.
Register environments
The environment is automatically registered with your workspace when you submit a run or deploy a web
service. You can also manually register the environment by using the register() method. This operation makes
the environment into an entity that's tracked and versioned in the cloud. The entity can be shared between
workspace users.
The following code registers the myenv environment to the ws workspace.
myenv.register(workspace=ws)
When you use the environment for the first time in training or deployment, it's registered with the workspace.
Then it's built and deployed on the compute target. The service caches the environments. Reusing a cached
environment takes much less time than using a new service or one that has been updated.
Get existing environments
The Environment class offers methods that allow you to retrieve existing environments in your workspace. You
can retrieve environments by name, as a list, or by a specific training run. This information is helpful for
troubleshooting, auditing, and reproducibility.
View a list of environments
View the environments in your workspace by using the Environment.list(workspace="workspace_name") class.
Then select an environment to reuse.
Get an environment by name
You can also get a specific environment by name and version. The following code uses the get() method to
retrieve version 1 of the myenv environment on the ws workspace.
restored_environment = Environment.get(workspace=ws,name="myenv",version="1")
It is useful to first build images locally using the build_local() method. To build a docker image, set the
optional parameter useDocker=True . To push the resulting image into the AzureML workspace container registry,
set pushImageToWorkspaceAcr=True .
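A minimal sketch:

# Build the environment as a Docker image locally and push it to the workspace
# container registry
build = myenv.build_local(workspace=ws, useDocker=True, pushImageToWorkspaceAcr=True)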
WARNING
Changing the order of dependencies or channels in an environment will result in a new environment and will require a
new image build. In addition, calling the build() method for an existing image will update its dependencies if there are
new versions.
# Submit run
run = exp.submit(src)
NOTE
To disable the run history or run snapshots, use the setting under src.run_config.history .
IMPORTANT
Use CPU SKUs for any image build on compute.
If you don't specify the environment in your run configuration, then the service creates a default environment
when you submit your run.
# Define the model, inference, & deployment configuration and web service name and location to deploy
service = Model.deploy(
    workspace = ws,
    name = "my_web_service",
    models = [model],
    inference_config = inference_config,
    deployment_config = deployment_config)
Notebooks
Code examples in this article are also included in the using environments notebook.
To install a Conda environment as a kernel in a notebook, see add a new Jupyter kernel.
Deploy a model using a custom Docker base image demonstrates how to deploy a model using a custom
Docker base image.
This example notebook demonstrates how to deploy a Spark model as a web service.
Next steps
After you have a trained model, learn how and where to deploy models.
View the Environment class SDK reference.
Use private Python packages with Azure Machine
Learning
5/25/2022 • 3 minutes to read • Edit Online
Prerequisites
The Azure Machine Learning SDK for Python
An Azure Machine Learning workspace
Internally, the Azure Machine Learning service replaces the URL with a secure SAS URL, so your wheel file is kept
private and secure.
3. Create an Azure Machine Learning environment and add Python packages from the feed.
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="my-env")
cd = CondaDependencies()
cd.add_pip_package("<my-package>")
cd.set_pip_option("--extra-index-url https://pkgs.dev.azure.com/<MY-ORG>/_packaging/<MY-FEED>/pypi/simple")
env.python.conda_dependencies = cd
The environment is now ready to be used in training runs or web service endpoint deployments. When building
the environment, Azure Machine Learning service uses the PAT to authenticate against the feed with the
matching base URL.
IMPORTANT
You must complete this step to be able to train or deploy models using the private package repository.
After completing these configurations, you can reference the packages in the Azure Machine Learning
environment definition by their full URL in Azure blob storage.
Next steps
Learn more about enterprise security in Azure Machine Learning
Where to save and write files for Azure Machine
Learning experiments
5/25/2022 • 4 minutes to read • Edit Online
In this article, you learn where to save input files, and where to write output files from your experiments to
prevent storage limit errors and experiment latency.
When you launch training runs on a compute target, they're isolated from outside environments. The purpose of
this design is to ensure reproducibility and portability of the experiment. If you run the same script twice, on the
same or another compute target, you receive the same results. With this design, you can treat compute targets
as stateless computation resources that have no affinity to jobs after those jobs finish.
To resolve this error, store your experiment files on a datastore. If you can't use a datastore, the below table
offers possible alternate solutions.
SCENARIO | SOLUTION
Less than 2,000 files & can't use a datastore | Override the snapshot size limit with azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES = 'insert_desired_size'. This may take several minutes, depending on the number and size of files. A sketch of this override follows the table.
Must use specific script directory | To prevent unnecessary files from being included in the snapshot, make an ignore file ( .gitignore or .amlignore ) in the directory. Add the files and directories to exclude to this file. For more information on the syntax to use inside this file, see syntax and patterns for .gitignore . The .amlignore file uses the same syntax. If both files exist, the .amlignore file is used and the .gitignore file is unused.
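A minimal sketch of the snapshot size override, assuming a 1 GB limit is desired (the value is in bytes;
'insert_desired_size' in the table above is a placeholder):

import azureml._restclient.snapshots_client

# Raise the snapshot size limit to 1 GB before submitting the run.
azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES = 1000000000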
IMPORTANT
Two folders, outputs and logs, receive special treatment by Azure Machine Learning. During training, when you write files
to the ./outputs and ./logs folders, the files automatically upload to your run history, so that you have access to
them once your run is finished.
For output such as status messages or scoring results, write files to the ./outputs folder, so they're
persisted as artifacts in run history. Be mindful of the number and size of files written to this folder, as
latency may occur when the contents are uploaded to run history. If latency is a concern, writing files to a
datastore is recommended.
To save written files as logs in run history, write files to the ./logs folder. The logs are uploaded in real
time, so this method is suitable for streaming live updates from a remote run.
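As a minimal sketch, a training script might persist results as artifacts like this (the file name and
content are illustrative):

import os

# Files written under ./outputs are uploaded to run history when the run completes.
os.makedirs('outputs', exist_ok=True)
with open(os.path.join('outputs', 'results.txt'), 'w') as f:
    f.write('accuracy: 0.92')  # hypothetical metric value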
Next steps
Learn more about accessing data from storage.
Learn more about Create compute targets for model training and deployment
Set up the Visual Studio Code Azure Machine
Learning extension (preview)
5/25/2022 • 2 minutes to read • Edit Online
Learn how to set up the Azure Machine Learning Visual Studio Code extension for your machine learning
workflows.
The Azure Machine Learning extension for VS Code provides a user interface to:
Manage Azure Machine Learning resources (experiments, virtual machines, models, deployments, etc.)
Develop locally using remote compute instances
Train machine learning models
Debug machine learning experiments locally
Schema-based language support, autocompletion and diagnostics for specification file authoring
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of Azure Machine Learning.
Visual Studio Code. If you don't have it, install it.
Python
(Optional) To create resources using the extension, you need to install the CLI (v2). For setup instructions, see
Install, set up, and use the CLI (v2).
Clone the community-driven repository
In this article, you'll learn how to connect to an Azure Machine Learning compute instance using Visual Studio
Code.
An Azure Machine Learning compute instance is a fully managed cloud-based workstation for data scientists
and provides management and enterprise readiness capabilities for IT administrators.
There are two ways you can connect to a compute instance from Visual Studio Code:
Remote compute instance. This option provides you with a full-featured development environment for
building your machine learning projects.
Remote Jupyter Notebook server. This option allows you to set a compute instance as a remote Jupyter
Notebook server.
IMPORTANT
To connect to a compute instance behind a firewall, see use workspace behind a Firewall for Azure Machine Learning.
Studio
VS Code
Navigate to ml.azure.com
IMPORTANT
In order to connect to your remote compute instance from Visual Studio Code, make sure that the account you're logged
into in Azure Machine Learning studio is the same one you use in Visual Studio Code.
Compute
1. Select the Compute tab
2. In the Application URI column, select VS Code for the compute instance you want to connect to.
Notebook
1. Select the Notebook tab
2. In the Notebook tab, select the file you want to edit.
3. Select Editors > Edit in VS Code (preview) .
A new window launches for your remote compute instance. When you attempt to make a connection to a remote
compute instance, the following tasks take place:
1. Authorization. Some checks are performed to make sure the user attempting to make a connection is
authorized to use the compute instance.
2. VS Code Remote Server is installed on the compute instance.
3. A WebSocket connection is established for real-time interaction.
Once the connection is established, it's persisted. A token is issued at the start of the session which gets
refreshed automatically to maintain the connection with your compute instance.
After you connect to your remote compute instance, use the editor to:
Author and manage files on your remote compute instance or file share.
Use the VS Code integrated terminal to run commands and applications on your remote compute instance.
Debug your scripts and applications
Use VS Code to manage your Git repositories
IMPORTANT
You MUST run a cell in order to establish the connection.
At this point, you can continue to run cells in your Jupyter Notebook.
TIP
You can also work with Python script files (.py) containing Jupyter-like code cells. For more information, see the Visual
Studio Code Python interactive documentation.
Next steps
Now that you've set up Visual Studio Code Remote, you can use a compute instance as remote compute from
Visual Studio Code to interactively debug your code.
Tutorial: Train your first ML model shows how to use a compute instance with an integrated notebook.
Git integration for Azure Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
Git is a popular version control system that allows you to share and collaborate on your projects.
Azure Machine Learning fully supports Git repositories for tracking work - you can clone repositories directly
onto your shared workspace file system, use Git on your local workstation, or use Git from a CI/CD pipeline.
When submitting a job to Azure Machine Learning, if source files are stored in a local git repository then
information about the repo is tracked as part of the training process.
Since Azure Machine Learning tracks information from a local git repo, it isn't tied to any specific central
repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other git-
compatible service.
TIP
Use Visual Studio Code to interact with Git through a graphical user interface. To connect to an Azure Machine Learning
remote compute instance using Visual Studio Code, see Connect to an Azure Machine Learning compute instance in
Visual Studio Code (preview)
For more information on Visual Studio Code version control features, see Using Version Control in VS Code and Working
with GitHub in VS Code.
TIP
There is a performance difference between cloning to the local file system of the compute instance or cloning to the
mounted filesystem (mounted as the ~/cloudfiles/code directory). In general, cloning to the local filesystem will have
better performance than to the mounted filesystem. However, the local filesystem is lost if you delete and recreate the
compute instance. The mounted filesystem is kept if you delete and recreate the compute instance.
You can clone any Git repository you can authenticate to (GitHub, Azure Repos, BitBucket, etc.)
For more information about cloning, see the guide on how to use Git CLI.
This creates a new SSH key, using the provided email as a label.
3. When you're prompted to "Enter a file in which to save the key" press Enter. This accepts the default file
location.
4. Verify that the default location is '/home/azureuser/.ssh' and press enter. Otherwise specify the location
'/home/azureuser/.ssh'.
TIP
Make sure the SSH key is saved in '/home/azureuser/.ssh'. This file is saved on the compute instance and is accessible
only by the owner of the compute instance.
> Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
5. At the prompt, type a secure passphrase. We recommend that you add a passphrase to your SSH key for added
security.
cat ~/.ssh/id_rsa.pub
TIP
Copy and Paste in Terminal
Windows: Ctrl-Insert to copy and use Ctrl-Shift-v or Shift-Insert to paste.
Mac OS: Cmd-c to copy and Cmd-v to paste.
FireFox/IE may not support clipboard permissions properly.
SSH may display the server's SSH fingerprint and ask you to verify it. You should verify that the displayed
fingerprint matches one of the fingerprints in the SSH public keys page.
SSH displays this fingerprint when it connects to an unknown host to protect you from man-in-the-middle
attacks. Once you accept the host's fingerprint, SSH will not prompt you again unless the fingerprint changes.
3. When you are asked if you want to continue connecting, type yes . Git will clone the repo and set up the
origin remote to connect with SSH for future Git commands.
PROPERTY | GIT COMMAND USED TO RETRIEVE THE VALUE | DESCRIPTION
azureml.git.repository_uri | git ls-remote --get-url | The URI that your repository was cloned from.
mlflow.source.git.repoURL | git ls-remote --get-url | The URI that your repository was cloned from.
azureml.git.branch | git symbolic-ref --short HEAD | The active branch when the run was submitted.
mlflow.source.git.branch | git symbolic-ref --short HEAD | The active branch when the run was submitted.
azureml.git.commit | git rev-parse HEAD | The commit hash of the code that was submitted for the run.
mlflow.source.git.commit | git rev-parse HEAD | The commit hash of the code that was submitted for the run.
This information is sent for runs that use an estimator, machine learning pipeline, or script run.
If your training files are not located in a git repository on your development environment, or the git command
is not available, then no git-related information is tracked.
TIP
To check if the git command is available on your development environment, open a shell session, command prompt,
PowerShell or other command line interface and type the following command:
git --version
If installed, and in the path, you receive a response similar to git version 2.4.1 . For more information on installing git
on your development environment, see the Git website.
"properties": {
"_azureml.ComputeTargetType": "batchai",
"ContentSnapshotId": "5ca66406-cbac-4d7d-bc95-f5a51dd3e57e",
"azureml.git.repository_uri": "git@github.com:azure/machinelearningnotebooks",
"mlflow.source.git.repoURL": "git@github.com:azure/machinelearningnotebooks",
"azureml.git.branch": "master",
"mlflow.source.git.branch": "master",
"azureml.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
"mlflow.source.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
"azureml.git.dirty": "True",
"AzureML.DerivedImageName": "azureml/azureml_9d3568242c6bfef9631879915768deaf",
"ProcessInfoFile": "azureml-logs/process_info.json",
"ProcessStatusFile": "azureml-logs/process_status.json"
}
Python SDK
After submitting a training run, a Run object is returned. The properties attribute of this object contains the
logged git information. For example, the following code retrieves the commit hash:
APPLIES TO: Python SDK azureml v1
run.properties['azureml.git.commit']
Next steps
Use compute targets for model training
Troubleshoot environment image builds
5/25/2022 • 7 minutes to read • Edit Online
Learn how to troubleshoot issues with Docker environment image builds and package installations.
Prerequisites
An Azure subscription. Try the free or paid version of Azure Machine Learning.
The Azure Machine Learning SDK.
The Azure CLI.
The CLI extension for Azure Machine Learning.
To debug locally, you must have a working Docker installation on your local system.
IMPORTANT
Make sure that the platform and interpreter on your local compute cluster match the ones on the remote compute
cluster.
Timeout
The following network issues can cause timeout errors:
Low internet bandwidth
Server issues
Large dependencies that can't be downloaded with the given conda or pip timeout settings
Messages similar to the following examples will indicate the issue:
If you get an error message, try one of the following possible solutions:
Try a different source, such as mirrors, Azure Blob Storage, or other Python feeds, for the dependency.
Update conda or pip. If you're using a custom Docker file, update the timeout settings.
Some pip versions have known issues. Consider adding a specific version of pip to the environment
dependencies.
Package not found
The following errors are most common for image build failures:
Conda package couldn't be found:
ResolvePackageNotFound:
- not-existing-conda-package
ERROR: Could not find a version that satisfies the requirement invalid-pip-package (from versions:
none)
ERROR: No matching distribution found for invalid-pip-package
Check that the package exists on the specified sources. Use pip search to verify pip dependencies:
pip search azureml-core
Installer notes
Make sure that the required distribution exists for the specified platform and Python interpreter version.
For pip dependencies, go to https://pypi.org/project/[PROJECT NAME]/[VERSION]/#files to see if the required
version is available. Go to https://pypi.org/project/azureml-core/1.11.0/#files to see an example.
For conda dependencies, check the package on the channel repository. For channels maintained by Anaconda,
Inc., check the Anaconda Packages page.
Pip package update
During an installation or an update of a pip package, the resolver might need to update an already-installed
package to satisfy the new requirements. Uninstallation can fail for various reasons related to the pip version or
the way the dependency was installed. The most common scenario is that a dependency installed by conda
couldn't be uninstalled by pip. For this scenario, consider uninstalling the dependency by using
conda remove mypackage .
Installer issues
Certain installer versions have issues in the package resolvers that can lead to a build failure.
If you're using a custom base image or Dockerfile, we recommend using conda version 4.5.4 or later.
A pip package is required to install pip dependencies. If a version isn't specified in the environment, the latest
version will be used. We recommend using a known version of pip to avoid transient issues or breaking changes
that the latest version of the tool might cause.
Consider pinning the pip version in your environment if you see the following message:
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one
of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up
in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package
versions, update the hashes as well. Otherwise, examine the package contents carefully; someone may have
tampered with them.
Pip installation can be stuck in an infinite loop if there are unresolvable conflicts in the dependencies. If you're
working locally, downgrade the pip version to < 20.3. In a conda environment created from a YAML file, you'll
see this issue only if conda-forge is the highest-priority channel. To mitigate the issue, explicitly specify pip <
20.3 (!=20.3 or =20.2.4 pin to other version) as a conda dependency in the conda specification file.
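As a sketch, if you author the conda specification through the v1 SDK rather than a YAML file, the same pin can
be expressed like this:

from azureml.core.conda_dependencies import CondaDependencies

# Pin pip as a conda dependency so the resolver avoids the affected versions.
cd = CondaDependencies()
cd.add_conda_package("pip<20.3")
cd.add_pip_package("azureml-defaults")  # illustrative pip dependency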
ModuleNotFoundError: No module named 'distutils.dir_util'
When setting up your environment, sometimes you'll run into the issue ModuleNotFoundError: No module named
'distutils.dir_util'. To fix it, run the following command:
Running this command installs the correct module dependencies to configure your environment.
Build failure when using Spark packages
Configure the environment to not precache the packages.
env.spark.precache_packages = False
Service-side failures
See the following scenarios to troubleshoot possible service-side failures.
You're unable to pull an image from a container registry, or the address couldn't be resolved for a container
registry
Possible issues:
The path name to the container registry might not be resolving correctly. Check that image names use
double slashes and the direction of slashes on Linux versus Windows hosts is correct.
If a container registry behind a virtual network is using a private endpoint in an unsupported region,
configure the container registry by using the service endpoint (public access) from the portal and retry.
After you put the container registry behind a virtual network, run the Azure Resource Manager template so
the workspace can communicate with the container registry instance.
You get a 401 error from a workspace container registry
Resynchronize storage keys by using ws.sync_keys().
The environment keeps throwing a "Waiting for other conda operations to finish…" error
When an image build is ongoing, conda is locked by the SDK client. If the process crashed or was canceled
incorrectly by the user, conda stays in the locked state. To resolve this issue, manually delete the lock file.
Your custom Docker image isn't in the registry
Check if the correct tag is used and that user_managed_dependencies = True .
Environment.python.user_managed_dependencies = True disables conda and uses the user's installed packages.
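A minimal sketch, assuming a hypothetical image name and tag in your registry:

from azureml.core import Environment

# Point the environment at a custom image and rely on the packages installed
# inside that image instead of a conda-managed environment.
env = Environment("my-custom-image-env")
env.docker.base_image = "myregistry.azurecr.io/my-image:my-tag"  # hypothetical image
env.python.user_managed_dependencies = True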
Next steps
Train a machine learning model to categorize flowers
Train a machine learning model by using a custom Docker image
Connect to storage with Azure Machine Learning
datastores
5/25/2022 • 4 minutes to read • Edit Online
In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning.
The Azure Machine Learning SDK for Python.
An Azure Machine Learning workspace.
NOTE
Azure Machine Learning datastores do not create the underlying storage accounts; rather, they register an existing
storage account for use in Azure Machine Learning. It isn't a requirement to use Azure Machine Learning datastores -
you can use storage URIs directly, assuming you have access to the underlying data.
# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add name of your datastore here
type: azure_blob
description: here is a description # add a description of your datastore here
account_name: my_account_name # add storage account name here
container_name: my_container_name # add storage container name here
# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen2.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  account_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX
Next steps
Register and Consume your data
Create Azure Machine Learning data assets
5/25/2022 • 16 minutes to read • Edit Online
Prerequisites
To create and work with Data assets, you need:
An Azure subscription. If you don't have one, create a free account before you begin. Try the free or paid
version of Azure Machine Learning.
An Azure Machine Learning workspace.
The Azure Machine Learning CLI/SDK and the MLTable package installed.
Create an Azure Machine Learning compute instance, which is a fully configured and managed
development environment that includes integrated notebooks and the SDK already installed.
OR
Work on your own Jupyter notebook and install the CLI/SDK and required packages.
IMPORTANT
While the package may work on older versions of Linux distros, we do not recommend using a distro that is out of
mainstream support. Distros that are out of mainstream support may have security vulnerabilities, as they do not receive
the latest updates. We recommend using the latest supported version of your distro that is compatible with .
Data types
Azure Machine Learning allows you to work with different types of data. Your data can be local or in the cloud
(from a registered Azure ML datastore, a common Azure Storage URL, or a public data URL). In this article, you'll
learn about using the Python SDK v2 and CLI v2 to work with URIs and Tables. URIs reference a location either
local to your development environment or in the cloud. Tables are a tabular data abstraction.
For most scenarios, you'll use URIs ( uri_folder and uri_file ). A URI references a location in storage that
can be easily mapped to the filesystem of a compute node when you run a job. The data is accessed by either
mounting or downloading the storage to the node.
For tables, use mltable . It's an abstraction for tabular data that is used for AutoML jobs,
parallel jobs, and some advanced scenarios. If you're just starting to use Azure Machine Learning and aren't
using AutoML, we strongly encourage you to begin with URIs.
If you're creating an Azure ML data asset from an existing datastore:
1. Verify that you have contributor or owner access to the underlying storage service of your registered
Azure Machine Learning datastore. Check your storage account permissions in the Azure portal.
2. Create the data asset by referencing paths in the datastore. You can create a data asset from multiple
paths in multiple datastores. There's no hard limit on the number of files or the data size that you can create a
data asset from.
NOTE
For each data path, a few requests are sent to the storage service to check whether it points to a file or a folder. This
overhead may lead to degraded performance or failure. A data asset referencing one folder with 1,000 files inside is
considered to reference one data path. For optimal performance, we recommend creating data assets that reference
fewer than 100 paths in datastores.
TIP
You can create a data asset with identity-based data access. If you don't provide any credentials, we'll use your identity
by default.
TIP
If you have dataset assets created using the SDK v1, you can still use those with SDK v2. For more information, see the
Consuming V1 Dataset Assets in V2 section.
URIs
The code snippets in this section cover the following scenarios:
Registering data as an asset in Azure Machine Learning
Reading registered data assets from Azure Machine Learning in a job
These snippets use uri_file and uri_folder .
uri_file is a type that refers to a specific file. For example,
'https://<account_name>.blob.core.windows.net/<container_name>/path/file.csv' .
uri_folder is a type that refers to a specific folder. For example,
'https://<account_name>.blob.core.windows.net/<container_name>/path' .
TIP
We recommend using an argument parser to pass folder information into data-plane code. By data-plane code, we mean
your data processing and/or training code that you run in the cloud. The code that runs in your development
environment and submits code to the data-plane is control-plane code.
Data-plane code is typically a Python script, but can be any programming language. Passing the folder as part of job
submission allows you to easily adjust the path from training locally using local data, to training in the cloud. If you
wanted to pass in just an individual file rather than the entire folder you can use the uri_file type.
my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FOLDER,
    description="description here",
    name="a_name",
    version='1'
)
ml_client.data.create_or_update(my_data)

my_job_inputs = {
    "input_data": JobInput(
        type=AssetTypes.URI_FOLDER,
        path=registered_data_asset.id
    )
}

job = CommandJob(
    code="./src",
    command='python read_data_asset.py --input_folder ${{inputs.input_data}}',
    inputs=my_job_inputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
    compute="cpu-cluster"
)

my_data = Data(
    path=my_file_path,
    type=AssetTypes.URI_FILE,
    description="description here",
    name="a_name",
    version='1'
)
ml_client.data.create_or_update(my_data)
MLTable
Register data as MLTable type data assets
You can register an mltable as a data asset in Azure Machine Learning.
In the MLTable file, the path attribute supports any Azure ML supported URI format:
a relative file: "file://foo/bar.csv"
a short form entity URI: "azureml://datastores/foo/paths/bar/baz"
a long form entity URI: "azureml://subscriptions/my-sub-id/resourcegroups/my-
rg/workspaces/myworkspace/datastores/mydatastore/paths/path_to_data/"
a storage URI: "https://", "wasbs://", "abfss://", "adl://"
a public URI: "http://mypublicdata.com/foo.csv"
Below we show an example of versioning the sample data in this repo. The data is uploaded to cloud storage
and registered as an asset.
Python-SDK
CLI
my_data = Data(
    path="./sample_data",
    type=AssetTypes.MLTABLE,
    description="Titanic Data",
    name="titanic-mltable",
    version='1'
)
ml_client.data.create_or_update(my_data)
TIP
Although the above example shows a local file, remember that path supports cloud storage (https, abfss, wasbs
protocols). Therefore, if you want to register data in a cloud location, just specify the path with any of the supported
protocols.
The contents of the MLTable file specify the underlying data location (here a local path) and the transforms
to perform on the underlying data before materializing into a pandas/Spark/Dask data frame. The important
part is that the MLTable artifact doesn't have any absolute paths, making it self-contained. All the
information is stored in one folder, regardless of whether that folder is stored on your local drive or in your cloud
drive or on a public http server.
To consume the data in a job or interactive session, use mltable :
import mltable
tbl = mltable.load("./sample_data")
df = tbl.to_pandas_dataframe()
For a full example of using an MLTable, see the Working with MLTable notebook.
mltable-artifact
Here the files that make up the mltable-artifact are stored on the user's local machine:
.
├── MLTable
└── iris.csv
The contents of the MLTable file specify the underlying data location (here a local path) and also the transforms
to perform on the underlying data before materializing into a pandas/spark/dask data frame:
#source ../configs/dataset/iris/MLTable
$schema: http://azureml/sdk-2-0/MLTable.json
type: mltable
paths:
  - file: ./iris.csv
transformations:
  - read_delimited:
      delimiter: ","
      encoding: ascii
      header: all_files_same_headers
The important part is that the MLTable artifact doesn't have any absolute paths, so it's self-contained:
everything that's needed is stored in that one folder, regardless of whether that folder is stored on your local
drive or in your cloud drive or on a public http server.
This artifact file can be consumed in a command job as follows:
#source ../configs/dataset/01-mltable-CommandJob.yaml
$schema: http://azureml/sdk-2-0/CommandJob.json
inputs:
  my_mltable_artifact:
    type: mltable
    # folder needs to contain an MLTable file
    mltable: file://iris
command: |
  python -c "
  from mltable import load
  # load a table from a folder containing an MLTable file
  tbl = load(${{my_mltable_artifact}})
  tbl.to_pandas_dataframe()
  ...
  "
NOTE
For local files and folders, only relative paths are supported. To be explicit, we will not support absolute paths, as that
would require us to change the MLTable file residing on disk before moving it to cloud storage.
You can put the MLTable file and the underlying data in the same folder in a cloud object store. You can specify
mltable: in your job pointing to a location on a datastore that contains the MLTable file:
#source ../configs/dataset/04-mltable-CommandJob.yaml
$schema: http://azureml/sdk-2-0/CommandJob.json
inputs:
  my_mltable_artifact:
    type: mltable
    mltable: azureml://datastores/some_datastore/paths/data/iris
command: |
  python -c "
  from mltable import load
  # load a table from a folder containing an MLTable file
  tbl = load(${{my_mltable_artifact}})
  tbl.to_pandas_dataframe()
  ...
  "
You can also have an MLTable file stored on the local machine, but no data files; the underlying data is stored in
the cloud. In this case, the MLTable should reference the underlying data with an absolute expression (that is, a
URI):
.
├── MLTable
#source ../configs/dataset/iris-cloud/MLTable
$schema: http://azureml/sdk-2-0/MLTable.json
type: mltable
paths:
  - file: azureml://datastores/mydatastore/paths/data/iris.csv
transformations:
  - read_delimited:
      delimiter: ","
      encoding: ascii
      header: all_files_same_headers
.
└── MLTable
#source ../configs/dataset/multiple-files/MLTable
$schema: http://azureml/sdk-2-0/MLTable.json
type: mltable
As outlined above, MLTable can be created from a URI or a local folder path:
#source ../configs/types/22_input_mldataset_artifacts-PipelineJob.yaml
$schema: http://azureml/sdk-2-0/PipelineJob.json
jobs:
  first:
    description: this job takes a mltable-artifact as input and mounts it.
      Note that the actual data could be in a different location
    inputs:
      mnist:
        type: mltable # redundant but there for clarity
        # needs to point to a folder that contains an MLTable file
        mltable: azureml://datastores/some_datastore/paths/data/public/mnist
        mode: ro_mount # or download
    command: |
      python -c "
      import mltable as mlt
      # load a table from a folder containing an MLTable file
      tbl = mlt.load('${{inputs.mnist}}')
      tbl.list_files()
      ...
      "
  second:
    description: this job loads a table artifact from a local_path.
      Note that the folder needs to contain a well-formed MLTable file
    inputs:
      tbl_access_artifact:
        type: mltable
        mltable: file:./iris
        mode: download
    command: |
      python -c "
      import mltable as mlt
      # load a table from a folder containing an MLTable file
      tbl = mlt.load('${{inputs.tbl_access_artifact}}')
      tbl.list_files()
      ...
      "
MLTable artifacts can yield files that aren't necessarily located in the mltable 's storage, or they can subset or
shuffle the data that resides in the storage, for example by using the take_random_sample transform. That view is
only visible if the MLTable file is evaluated by the engine. The user can do that as described above by using the
MLTable SDK and running mltable.load , but that requires Python and the installation of the SDK.
Support globbing of files
Along with providing a file or folder, the MLTable artifact file also allows customers to specify a pattern for
globbing files:
#source ../configs/dataset/parquet-artifact-search/MLTable
$schema: http://azureml/sdk-2-0/MLTable.json
type: mltable
paths:
  - pattern: parquet_files/*1.parquet # only get files with this specific pattern
transformations:
  - read_parquet:
      include_path_column: false
Delimited text: Transformations
The following transformations are specific to delimited text.
infer_column_types : Boolean to infer column data types. Defaults to True. Type inference requires that the
data source is accessible from the current compute. Currently, type inference pulls only the first 200 rows. If the
data contains multiple types of values, it's better to provide the desired type as an override via the
set_column_types argument.
encoding : Specify the file encoding. Supported encodings are 'utf8', 'iso88591', 'latin1', 'ascii', 'utf16', 'utf32',
'utf8bom' and 'windows1252'. Defaults to utf8.
header: user can choose one of the following options:
no_header
from_first_file
all_files_different_headers
all_files_same_headers (default)
delimiter : The separator used to split columns.
empty_as_string : Specify if empty field values should be loaded as empty strings. The default (False) will read
empty field values as nulls. Passing this as True will read empty field values as empty strings. If the values are
converted to numeric or datetime, then this has no effect as empty values will be converted to nulls.
include_path_column : Boolean to keep path information as a column in the table. Defaults to False. This is
useful when you're reading multiple files and want to know which file a particular record originated from, or to
keep useful information in the file path.
support_multi_line : By default (support_multi_line=False), all line breaks, including those in quoted field
values, are interpreted as a record break. Reading data this way is faster and more optimized for parallel
execution on multiple CPU cores. However, it may result in silently producing more records with misaligned
field values. Set this to True when the delimited files are known to contain quoted line breaks.
Parquet files: Transforms
If the user doesn't define options for the read_parquet transformation, default options are selected (see below).
include_path_column : Boolean to keep path information as a column in the table. Defaults to False. This is
useful when you're reading multiple files and want to know which file a particular record originated from, or to
keep useful information in the file path.
Json lines: Transformations
The following supported transformations are specific to JSON lines:
include_path : Boolean to keep path information as a column in the MLTable. Defaults to False. This is useful
when you're reading multiple files and want to know which file a particular record originated from, or to keep
useful information in the file path.
invalid_lines How to handle lines that are invalid JSON. Supported values are error and drop . Defaults
to error .
encoding Specify the file encoding. Supported encodings are utf8 , iso88591 , latin1 , ascii , utf16 ,
utf32 , utf8bom and windows1252 . Default is utf8 .
Global transforms
MLTable artifacts provide transformations specific to delimited text, Parquet, and Delta. There are other
transforms that mltable artifact files support:
take : Takes the first n records of the table
take_random_sample : Takes a random sample of the table where each record has a probability of being
selected. The user can also include a seed.
skip : This skips the first n records of the table
drop_columns : Drops the specified columns from the table. This transform supports regex so that users can
drop columns matching a particular pattern.
keep_columns : Keeps only the specified columns in the table. This transform supports regex so that users can
keep columns matching a particular pattern.
filter : Filter the data, leaving only the records that match the specified expression.
extract_partition_format_into_columns : Specify the partition format of path. Defaults to None. The partition
information of each path will be extracted into columns based on the specified format. Format part
'{column_name}' creates string column, and '{column_name:yyyy/MM/dd/HH/mm/ss}' creates datetime
column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract year, month, day, hour, minute and
second for the datetime type. The format should start from the position of first partition key until the end of
file path. For example, given the path '../Accounts/2019/01/01/data.csv' where the partition is by department
name and time, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string
column 'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value '2019-
01-01'. Our principle here is to support transforms specific to data delivery, not to get into wider feature
engineering transforms.
Traits
The keen-eyed among you may have spotted that the mltable type supports a traits section. Traits define fixed
characteristics of the table (that is, they are not freeform metadata that users can add). They don't perform
any transformations but can be used by the engine.
index_columns : Set the table index using existing columns. This trait can be used by partition_by in the data
plane to split data by the index.
timestamp_column : Defines the timestamp column of the table. This trait can be used in filter transforms, or in
other data plane operations (SDK) such as drift detection.
Moreover, in the future we can use traits to define RAI aspects of the data, for example:
sensitive_columns : Here the user can define certain columns that contain sensitive information.
Again, this isn't a transform but is informing the system of some extra properties in the data.
Next steps
Install and set up Python SDK v2 (preview)
Install and use the CLI (v2)
Train models with the Python SDK v2 (preview)
Tutorial: Create production ML pipelines with Python SDK v2 (preview)
Learn more about Data in Azure Machine Learning
Read and write data for ML experiments
5/25/2022 • 7 minutes to read • Edit Online
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning.
The Azure Machine Learning SDK for Python v2.
An Azure Machine Learning workspace
IMPORTANT
If the path is local, but your compute is defined to be in the cloud, Azure Machine Learning will automatically upload the
data to cloud storage for you.
Python-SDK
CLI
from azure.ai.ml.entities import Data, UriReference, JobInput, CommandJob
from azure.ai.ml._constants import AssetTypes
my_job_inputs = {
    "input_data": JobInput(
        path='./sample_data',  # change to be your local directory
        type=AssetTypes.URI_FOLDER
    )
}

job = CommandJob(
    code="./src",  # local path where the code is stored
    command='python train.py --input_folder ${{inputs.input_data}}',
    inputs=my_job_inputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
    compute="cpu-cluster"
)
Python-SDK
CLI
The following code shows how to read in uri_folder type data from Azure Data Lake Storage Gen 2 or Blob via
SDK V2.
from azure.ai.ml.entities import Data, UriReference, JobInput, CommandJob
from azure.ai.ml._constants import AssetTypes
my_job_inputs = {
    "input_data": JobInput(
        path='abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>',
        # Blob: 'https://<account_name>.blob.core.windows.net/<container_name>/path'
        type=AssetTypes.URI_FOLDER
    )
}

job = CommandJob(
    code="./src",  # local path where the code is stored
    command='python train.py --input_folder ${{inputs.input_data}}',
    inputs=my_job_inputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
    compute="cpu-cluster"
)
TYPE | INPUT/OUTPUT | UPLOAD | DOWNLOAD | RO_MOUNT | RW_MOUNT | DIRECT | EVAL_DOWNLOAD | EVAL_MOUNT
uri_folder | Input | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌
uri_file | Input | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌
mltable | Input | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅
uri_folder | Output | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌
uri_file | Output | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌
mltable | Output | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌
As you can see from the table, eval_download and eval_mount are unique to mltable . An MLTable artifact can
yield files that aren't necessarily located in the mltable 's storage, or it can subset or shuffle the data that
resides in the storage. That view is only visible if the MLTable file is actually evaluated by the engine. These
modes provide that view of the files; a sketch of requesting one of them follows.
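A minimal sketch, assuming JobInput accepts a mode argument matching the table above (the datastore path is
hypothetical):

from azure.ai.ml.entities import JobInput
from azure.ai.ml._constants import AssetTypes

# Ask for the mltable to be evaluated by the engine and mounted,
# rather than downloading the raw artifact.
my_job_inputs = {
    "input_data": JobInput(
        path="azureml://datastores/my_datastore/paths/data/titanic",  # hypothetical
        type=AssetTypes.MLTABLE,
        mode="eval_mount",
    )
}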
Python-SDK
CLI
from azure.ai.ml.entities import Data, UriReference, JobInput, CommandJob, JobOutput
from azure.ai.ml._constants import AssetTypes
my_job_inputs = {
    "input_data": JobInput(
        path='abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>',
        type=AssetTypes.URI_FOLDER
    )
}

my_job_outputs = {
    "output_folder": JobOutput(
        path='abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>',
        type=AssetTypes.URI_FOLDER
    )
}

job = CommandJob(
    code="./src",  # local path where the code is stored
    command='python pre-process.py --input_folder ${{inputs.input_data}} --output_folder ${{outputs.output_folder}}',
    inputs=my_job_inputs,
    outputs=my_job_outputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
    compute="cpu-cluster"
)
Register data
You can register data as an asset to your workspace. The benefits of registering data are:
Easy to share with other members of the team (no need to remember file locations)
Versioning of the metadata (location, description, etc.)
Lineage tracking
The following example demonstrates versioning of sample data, and shows how to register a local file as a data
asset. The data is uploaded to cloud storage and registered as an asset.
my_data = Data(
    path="./sample_data/titanic.csv",
    type=AssetTypes.URI_FILE,
    description="Titanic Data",
    name="titanic",
    version='1'
)
ml_client.data.create_or_update(my_data)
To register data that is in a cloud location, you can specify the path with any of the supported protocols for the
storage type. The following example shows what the path looks like for data from Azure Data Lake Storage Gen
2.
my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FOLDER,
    description="description here",
    name="a_name",
    version='1'
)
ml_client.data.create_or_update(my_data)

my_job_inputs = {
    "input_data": JobInput(
        type=AssetTypes.URI_FOLDER,
        path=registered_data_asset.id
    )
}

job = CommandJob(
    code="./src",
    command='python read_data_asset.py --input_folder ${{inputs.input_data}}',
    inputs=my_job_inputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
    compute="cpu-cluster"
)
display_name: 3b_pipeline_with_data
description: Pipeline with 3 component jobs with data dependencies
compute: azureml:cpu-cluster
outputs:
  final_pipeline_output:
    mode: rw_mount
jobs:
  component_a:
    type: command
    component: file:./componentA.yml
    inputs:
      component_a_input:
        type: uri_folder
        path: ./data
    outputs:
      component_a_output:
        mode: rw_mount
  component_b:
    type: command
    component: file:./componentB.yml
    inputs:
      component_b_input: ${{parent.jobs.component_a.outputs.component_a_output}}
    outputs:
      component_b_output:
        mode: rw_mount
  component_c:
    type: command
    component: file:./componentC.yml
    inputs:
      component_c_input: ${{parent.jobs.component_b.outputs.component_b_output}}
    outputs:
      component_c_output: ${{parent.outputs.final_pipeline_output}}
      # mode: upload
train_node = keras_train_component(
    input_data=prepare_data_node.outputs.training_data
)
train_node.compute = gpu_compute_target

score_node = keras_score_component(
    input_data=prepare_data_node.outputs.test_data,
    input_model=train_node.outputs.output_model,
)

# create a pipeline
pipeline_job = image_classification_keras_minist_convnet(pipeline_input_data=fashion_ds)
Next steps
Install and set up Python SDK v2 (preview)
Install and use the CLI (v2)
Train models with the Python SDK v2 (preview)
Tutorial: Create production ML pipelines with Python SDK v2 (preview)
Learn more about Data in Azure Machine Learning
How to authenticate data access
5/25/2022 • 5 minutes to read • Edit Online
Learn how to manage data access and how to authenticate in Azure Machine Learning
APPLIES TO : Python SDK azure-ai-ml v2 (preview)
APPLIES TO: Azure CLI ml extension v2 (current)
IMPORTANT
The information in this article is intended for Azure administrators who are creating the infrastructure required for an
Azure Machine Learning solution.
SCENARIO | USE WORKSPACE MANAGED SERVICE IDENTITY (MSI) | IDENTITY TO USE
Data access is complex and it's important to recognize that there are many pieces to it. For example, accessing
data from Azure Machine Learning studio is different than using the SDK. When using the SDK on your local
development environment, you're directly accessing data in the cloud. When using studio, you aren't always
directly accessing the data store from your client. Studio relies on the workspace to access data on your behalf.
TIP
If you need to access data from outside Azure Machine Learning, such as using Azure Storage Explorer, user identity is
probably what is used. Consult the documentation for the tool or service you are using for specific information. For more
information on how Azure Machine Learning works with data, see Identity-based data access to storage services on
Azure.
Next steps
For information on enabling studio in a network, see Use Azure Machine Learning studio in an Azure Virtual
Network.
Create a training job with the job creation UI
(preview)
5/25/2022 • 6 minutes to read • Edit Online
There are many ways to create a training job with Azure Machine Learning. You can use the CLI (see Train
models (create jobs) with the CLI (v2)), the REST API (see Train models with REST (preview)), or you can use the
UI to directly create a training job. In this article, you'll learn how to use your own data and code to train a
machine learning model with the job creation UI in Azure Machine Learning studio.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today.
An Azure Machine Learning workspace. See Create an Azure Machine Learning workspace.
Understanding of what a job is in Azure Machine Learning. See how to train models with the CLI (v2).
Get started
1. Sign in to Azure Machine Learning studio.
2. Select your subscription and workspace.
You may enter the job creation UI from the homepage. Click Create new and select Job .
Or, you may enter the job creation from the left pane. Click +New and select Job .
Or, if you're in the Experiment page, you may go to the All runs tab and click Create job .
These options will all take you to the job creation panel, which has a wizard for configuring and creating a
training job.
COMPUTE TYPE | INTRODUCTION
Attached Kubernetes cluster | Configure and attach Kubernetes cluster anywhere (preview).
If you're using Azure Machine Learning for the first time, you'll see an empty list and a link to create a new
compute.
For more information on creating the various types, see:
COMPUTE TYPE | HOW TO
FIELD | DESCRIPTION
Job name | The job name field is used to uniquely identify your job. It's also used as the display name for your job. Setting this field is optional; Azure will generate a GUID name for the job if you don't enter anything. Note: the job name must be unique.
Experiment name | This helps organize the job in Azure Machine Learning studio. Each job's run record is organized under the corresponding experiment in the studio's "Experiment" tab. By default, Azure puts the job in the Default experiment.
Code | You can upload a code file or a folder from your machine, or upload a code file from the workspace's default blob storage. Azure shows the files to be uploaded after you make the selection.
.
├── job.yml
├── data
└── src
└── main.py
Here, the source code is in the src subdirectory. The command would be python ./src/main.py (plus other
command-line arguments).
Inputs
When you use an input in the command, you need to specify the input name. To indicate an input variable, use
the form ${{inputs.input_name}} . For instance, ${{inputs.wiki}} . You can then refer to it in the command, for
instance, --data ${{inputs.wiki}} .
Review and Create
Once you've configured your job, choose Next to go to the Review page. To modify a setting, choose the pencil
icon and make the change.
You may choose to view the YAML spec to review and download the YAML file generated by this job
configuration. This job YAML file can be used to submit the job from the CLI (v2). (See Train models (create jobs)
with the CLI (v2).)
To launch the job, choose Create . Once the job is created, Azure will show you the run details page, where you
can monitor and manage your training job.
IMPORTANT
SDK v2 is currently in public preview. The preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to configure and submit Azure Machine Learning jobs to train your models.
Snippets of code explain the key parts of configuration and submission of a training job. Then use one of the
example notebooks to find the full end-to-end working examples.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today
The Azure Machine Learning SDK v2 for Python
An Azure Machine Learning workspace
Clone examples repository
To run the training examples, first clone the examples repository and change into the sdk directory:
TIP
Use --depth 1 to clone only the latest commit to the repository, which reduces time to complete the operation.
2. Create compute
You'll create a compute called cpu-cluster for your job, with this code:

from azure.ai.ml.entities import AmlCompute

cpu_compute_target = "cpu-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4
    )
    ml_client.compute.begin_create_or_update(compute)
# We will reuse the command_job created before, calling it as a function to apply inputs.
# We don't apply the 'iris_csv' input again -- we just use what was defined earlier.
command_job_for_sweep = command_job(
    learning_rate=Uniform(min_value=0.01, max_value=0.9),
    boosting=Choice(values=["gbdt", "dart"]),
)
As seen above, the sweep function allows the user to configure the following key aspects (a sketch of applying
them follows the list):
sampling_algorithm - The hyperparameter sampling algorithm to use over the search_space. Allowed values
are random , grid and bayesian .
objective - the objective of the sweep
primary_metric - The name of the primary metric reported by each trial job. The metric must be
logged in the user's training script using mlflow.log_metric() with the same corresponding metric
name.
goal - The optimization goal of the objective.primary_metric. The allowed values are maximize and
minimize .
compute - Name of the compute target to execute the job on.
limits - Limits for the sweep job
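A minimal sketch, assuming the training script logs a metric named test-multi_logloss via MLflow (substitute
your own metric and compute names):

# Convert the parameterized command job into a sweep job.
sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="random",
    primary_metric="test-multi_logloss",  # hypothetical metric name
    goal="Minimize",
)

# Bound the sweep: total trials, concurrency, and an overall timeout in seconds.
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4, timeout=7200)

# Submit the sweep job to the workspace.
returned_sweep_job = ml_client.create_or_update(sweep_job)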
Once this job completes, you can look at the metrics and the job details in the Azure ML Portal. The job details
page will identify the best performing child run.
Distributed training
Azure Machine Learning supports PyTorch, TensorFlow, and MPI-based distributed training. Let's look at how to
configure a command for distribution for the command_job you created earlier; a hedged sketch follows.
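A sketch under the assumption that the distribution entities are importable from the azure.ai.ml package in
your SDK build (verify against your installed preview version):

from azure.ai.ml import MpiDistribution  # assumption: import location

# Request MPI-based distribution with four processes per node.
command_job.distribution = MpiDistribution(process_count_per_instance=4)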
Next steps
Try these next steps to learn how to use the Azure Machine Learning SDK (v2) for Python:
Use pipelines with the Azure ML Python SDK (v2)
Configure and submit training runs
5/25/2022 • 10 minutes to read • Edit Online
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today
The Azure Machine Learning SDK for Python (>= 1.13.0)
An Azure Machine Learning workspace, ws
A compute target, my_compute_target . Create a compute target
Create an experiment
Create an experiment in your workspace. An experiment is a lightweight container that helps organize run
submissions and keep track of code.

from azureml.core import Experiment

experiment_name = 'my_experiment'
experiment = Experiment(workspace=ws, name=experiment_name)
NOTE
Azure Databricks is not supported as a compute target for model training. You can use Azure Databricks for data
preparation and deployment tasks.
NOTE
To create and attach a compute target for training on Azure Arc-enabled Kubernetes cluster, see Configure Azure Arc-
enabled Machine Learning
Create an environment
Azure Machine Learning environments are an encapsulation of the environment where your machine learning
training happens. They specify the Python packages, Docker image, environment variables, and software
settings around your training and scoring scripts. They also specify runtimes (Python, Spark, or Docker).
You can either define your own environment, or use an Azure ML curated environment. Curated environments
are predefined environments that are available in your workspace by default. These environments are backed by
cached Docker images which reduces the run preparation cost. See Azure Machine Learning Curated
Environments for the full list of available curated environments.
For a remote compute target, you can use one of these popular curated environments to start with:
from azureml.core import Environment, Workspace

ws = Workspace.from_config()
myenv = Environment.get(workspace=ws, name="AzureML-Minimal")
For more information and details about environments, see Create & use software environments in Azure
Machine Learning.
Local compute target
If your compute target is your local machine, you are responsible for ensuring that all the necessary packages
are available in the Python environment where the script runs. Use python.user_managed_dependencies to use
your current Python environment (or the Python on the path you specify).
myenv = Environment("user-managed-env")
myenv.python.user_managed_dependencies = True

src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      compute_target=my_compute_target,
                      environment=myenv)
If you do not specify an environment, a default environment will be created for you.
If you have command-line arguments you want to pass to your training script, you can specify them via the
arguments parameter of the ScriptRunConfig constructor, e.g.
arguments=['--arg1', arg1_val, '--arg2', arg2_val] .
If you want to override the default maximum time allowed for the run, you can do so via the
max_run_duration_seconds parameter. The system will attempt to automatically cancel the run if it takes longer
than this value.
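For example, a sketch combining both settings (the script arguments are hypothetical):

from azureml.core import ScriptRunConfig

src = ScriptRunConfig(
    source_directory=project_folder,
    script='train.py',
    arguments=['--learning-rate', 0.01, '--epochs', 10],  # hypothetical arguments
    compute_target=my_compute_target,
    environment=myenv,
    max_run_duration_seconds=3600,  # cancel the run if it exceeds one hour
)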
Specify a distributed job configuration
If you want to run a distributed training job, provide the distributed job-specific config to the
distributed_job_config parameter. Supported config types include MpiConfiguration, TensorflowConfiguration,
and PyTorchConfiguration.
For more information and examples on running distributed Horovod, TensorFlow and PyTorch jobs, see:
Train TensorFlow models
Train PyTorch models
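For instance, a minimal sketch of a distributed PyTorch run (the node and process counts are illustrative):

from azureml.core import ScriptRunConfig
from azureml.core.runconfig import PyTorchConfiguration

# Run four processes in total across two nodes.
distr_config = PyTorchConfiguration(process_count=4, node_count=2)

src = ScriptRunConfig(
    source_directory=project_folder,
    script='train.py',
    compute_target=my_compute_target,
    environment=myenv,
    distributed_job_config=distr_config,
)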
IMPORTANT
Special Folders Two folders, outputs and logs, receive special treatment by Azure Machine Learning. During training,
when you write files to folders named outputs and logs that are relative to the root directory ( ./outputs and ./logs ,
respectively), the files will automatically upload to your run history so that you have access to them once your run is
finished.
To create artifacts during training (such as model files, checkpoints, data files, or plotted images) write these to the
./outputs folder.
Similarly, you can write any logs from your training run to the ./logs folder. To utilize Azure Machine Learning's
TensorBoard integration make sure you write your TensorBoard logs to this folder. While your run is in progress, you will
be able to launch TensorBoard and stream these logs. Later, you will also be able to restore the logs from any of your
previous runs.
For example, to download a file written to the outputs folder to your local machine after your remote training run:
run.download_file(name='outputs/my_output_file', output_file_path='my_destination_path')
Notebook examples
See these notebooks for examples of configuring runs for various training scenarios:
Training on various compute targets
Training with ML frameworks
tutorials/img-classification-part1-training.ipynb
Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.
Troubleshooting
AttributeError : 'RoundTripLoader' object has no attribute 'comment_handling' : This error
comes from the new version (v0.17.5) of ruamel-yaml , an azureml-core dependency, that introduces a
breaking change to azureml-core . In order to fix this error, please uninstall ruamel-yaml by running
pip uninstall ruamel-yaml and installing a different version of ruamel-yaml ; the supported versions are
v0.15.35 to v0.17.4 (inclusive). You can do this by running pip install "ruamel-yaml>=0.15.35,<0.17.5" .
Run fails with jwt.exceptions.DecodeError : Exact error message:
jwt.exceptions.DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
Consider upgrading to the latest version of azureml-core: pip install -U azureml-core .
If you are running into this issue for local runs, check the version of PyJWT installed in the environment
where you are starting runs. The supported versions of PyJWT are < 2.0.0. Uninstall PyJWT from the
environment if the version is >= 2.0.0. You may check the version of PyJWT, and uninstall and install the
right version, as follows:
1. Start a command shell and activate the conda environment where azureml-core is installed.
2. Enter pip freeze and look for PyJWT ; if found, the version listed should be < 2.0.0.
3. If the listed version is not a supported version, run pip uninstall PyJWT in the command shell and enter y
for confirmation.
4. Install using pip install 'PyJWT<2.0.0' .
If you are submitting a user-created environment with your run, consider using the latest version of
azureml-core in that environment. Versions >= 1.18.0 of azureml-core already pin PyJWT < 2.0.0. If you
need to use a version of azureml-core < 1.18.0 in the environment you submit, make sure to specify
PyJWT < 2.0.0 in your pip dependencies.
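A sketch of pinning PyJWT in an environment's pip dependencies (the environment name is illustrative):
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('my-env')  # illustrative name
deps = CondaDependencies()
deps.add_pip_package('azureml-core<1.18.0')
deps.add_pip_package('PyJWT<2.0.0')  # explicit pin required for azureml-core < 1.18.0
env.python.conda_dependencies = deps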
ModuleErrors (No module named) : If you run into ModuleErrors while submitting experiments in
Azure ML, the training script expects a package that isn't installed in the environment. Once you provide
the package name, Azure ML installs the package in the environment used for your training run.
If you are using Estimators to submit experiments, you can specify a package name via the pip_packages or
conda_packages parameter of the estimator, depending on the source from which you want to install the package.
You can also specify a YAML file with all your dependencies using conda_dependencies_file , or list all your
pip requirements in a text file using the pip_requirements_file parameter. If you have your own Azure ML
Environment object that you want to use to override the default image used by the estimator, you can specify
that environment via the environment parameter of the estimator constructor.
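For example, a sketch with an Estimator (the package names are illustrative):
from azureml.train.estimator import Estimator

est = Estimator(source_directory='.',
                entry_script='train.py',
                compute_target=compute_target,
                # install packages your script imports but the base image lacks
                pip_packages=['scikit-learn', 'pandas'])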
Azure ML-maintained Docker images and their contents can be seen in AzureML Containers. Framework-
specific dependencies are listed in the respective framework documentation:
Chainer
PyTorch
TensorFlow
SKLearn
NOTE
If you think a particular package is common enough to be added in Azure ML maintained images and
environments please raise a GitHub issue in AzureML Containers.
NameError (Name not defined), AttributeError (Object has no attribute) : These exceptions
come from your training scripts. You can look at the log files from the Azure portal to get more information
about the specific name not defined or attribute error. From the SDK, you can use run.get_details() to
look at the error message; this also lists all the log files generated for your run. Make sure to fix the error in
your training script before resubmitting your run.
Run or experiment deletion : Experiments can be archived by using the Experiment.archive method, or
from the Experiment tab view in Azure Machine Learning studio client via the "Archive experiment"
button. This action hides the experiment from list queries and views, but does not delete it.
Permanent deletion of individual experiments or runs is not currently supported. For more information
on deleting Workspace assets, see Export or delete your Machine Learning service workspace data.
Metric Document is too large : Azure Machine Learning has internal limits on the size of metric objects
that can be logged at once from a training run. If you encounter a "Metric Document is too large" error
when logging a list-valued metric, try splitting the list into smaller chunks, for example:
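A sketch of chunked logging (the metric name and block size are illustrative):
chunk = 250  # illustrative block size
for i in range(0, len(my_metric_values), chunk):
    run.log_list('my_metric', my_metric_values[i:i + chunk])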
Internally, Azure ML concatenates the blocks with the same metric name into a contiguous list.
Compute target takes a long time to start : The Docker images for compute targets are loaded from
Azure Container Registry (ACR). By default, Azure Machine Learning creates an ACR that uses the basic
service tier. Changing the ACR for your workspace to standard or premium tier may reduce the time it
takes to build and load images. For more information, see Azure Container Registry service tiers.
Next steps
Tutorial: Train and deploy a model uses a managed compute target to train a model.
See how to train models with specific ML frameworks, such as Scikit-learn, TensorFlow, and PyTorch.
Learn how to efficiently tune hyperparameters to build better models.
Once you have a trained model, learn how and where to deploy models.
View the ScriptRunConfig class SDK reference.
Use Azure Machine Learning with Azure Virtual Networks
Hyperparameter tuning a model (v2)
5/25/2022 • 14 minutes to read • Edit Online
command_job_for_sweep = command_job(
batch_size=Choice(values=[16, 32, 64, 128]),
number_of_hidden_layers=Choice(values=range(1,5)),
)
In this case, batch_size takes one of the values [16, 32, 64, 128], and number_of_hidden_layers takes one of the
values [1, 2, 3, 4].
The following advanced discrete hyperparameters can also be specified using a distribution:
QUniform(min_value, max_value, q) - Returns a value like round(Uniform(min_value, max_value) / q) * q
QLogUniform(min_value, max_value, q) - Returns a value like round(exp(Uniform(min_value, max_value)) / q) * q
QNormal(mu, sigma, q) - Returns a value like round(Normal(mu, sigma) / q) * q
QLogNormal(mu, sigma, q) - Returns a value like round(exp(Normal(mu, sigma)) / q) * q
Continuous hyperparameters
The Continuous hyperparameters are specified as a distribution over a continuous range of values:
Uniform(min_value, max_value) - Returns a value uniformly distributed between min_value and max_value
LogUniform(min_value, max_value) - Returns a value drawn according to exp(Uniform(min_value, max_value))
so that the logarithm of the return value is uniformly distributed
Normal(mu, sigma) - Returns a real value that's normally distributed with mean mu and standard deviation
sigma
LogNormal(mu, sigma) - Returns a value drawn according to exp(Normal(mu, sigma)) so that the logarithm of
the return value is normally distributed
An example of a parameter space definition:
command_job_for_sweep = command_job(
learning_rate=Normal(mu=10, sigma=3),
keep_probability=Uniform(min_value=0.05, max_value=0.1),
)
This code defines a search space with two parameters - learning_rate and keep_probability . learning_rate
has a normal distribution with mean value 10 and a standard deviation of 3. keep_probability has a uniform
distribution with a minimum value of 0.05 and a maximum value of 0.1.
For the CLI, you can use the sweep job YAML schema to define the search space in your YAML:
search_space:
conv_size:
type: choice
values: [2, 5, 7]
dropout_rate:
type: uniform
min_value: 0.1
max_value: 0.2
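Random sampling
Random sampling supports discrete and continuous hyperparameters, and supports early termination of low-performance jobs. Hyperparameter values are randomly selected from the defined search space: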
command_job_for_sweep = command_job(
learning_rate=Normal(mu=10, sigma=3),
keep_probability=Uniform(min_value=0.05, max_value=0.1),
batch_size=Choice(values=[16, 32, 64, 128]),
)
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm = "random",
...
)
Sobol
Sobol is a type of random sampling supported by sweep job types. You can use Sobol to reproduce your results
using a seed and to cover the search space distribution more evenly.
To use sobol, use the RandomParameterSampling class to add the seed and rule as shown in the example below.
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm = RandomParameterSampling(seed=123, rule="sobol"),
...
)
Grid sampling
Grid sampling supports discrete hyperparameters. Use grid sampling if your budget allows you to exhaustively
search over the search space. It supports early termination of low-performance jobs.
Grid sampling does a simple grid search over all possible values. Grid sampling can only be used with choice
hyperparameters. For example, the following space has six samples:
command_job_for_sweep = command_job(
batch_size=Choice(values=[16, 32]),
number_of_hidden_layers=Choice(values=[1,2,3]),
)
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm = "grid",
...
)
Bayesian sampling
Bayesian sampling is based on the Bayesian optimization algorithm. It picks samples based on how previous
samples did, so that new samples improve the primary metric.
Bayesian sampling is recommended if you have enough budget to explore the hyperparameter space. For best
results, we recommend a maximum number of jobs greater than or equal to 20 times the number of
hyperparameters being tuned.
The number of concurrent jobs has an impact on the effectiveness of the tuning process. A smaller number of
concurrent jobs may lead to better sampling convergence, since the smaller degree of parallelism increases the
number of jobs that benefit from previously completed jobs.
Bayesian sampling only supports choice , uniform , and quniform distributions over the search space.
command_job_for_sweep = command_job(
learning_rate=Uniform(min_value=0.05, max_value=0.1),
batch_size=Choice(values=[16, 32, 64, 128]),
)
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm = "bayesian",
...
)
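Specify the objective of the sweep
To optimize the sweep, specify the name of the primary metric your training script logs and the direction to optimize it in, via the primary_metric and goal parameters: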
command_job_for_sweep = command_job(
learning_rate=Uniform(min_value=0.05, max_value=0.1),
batch_size=Choice(values=[16, 32, 64, 128]),
)
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm = "bayesian",
primary_metric="accuracy",
goal="Maximize",
)
The training script calculates the val_accuracy and logs it as the primary metric "accuracy". Each time the metric
is logged, it's received by the hyperparameter tuning service. It's up to you to determine the frequency of
reporting.
For more information on logging values for training jobs, see Enable logging in Azure ML training jobs.
NOTE
Bayesian sampling does not support early termination. When using Bayesian sampling, set
early_termination_policy = None .
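Bandit policy
Bandit policy is based on a slack factor/slack amount and an evaluation interval. It ends a job when its primary metric isn't within the specified slack of the best performing job. A minimal sketch matching the example described below (class and parameter names from the v2 SDK):
from azure.ai.ml.sweep import BanditPolicy

# check every interval after a 5-interval delay; stop jobs whose best metric
# falls outside a 10% slack of the best job so far
sweep_job.early_termination = BanditPolicy(slack_factor=0.1,
                                           evaluation_interval=1,
                                           delay_evaluation=5)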
In this example, the early termination policy is applied at every interval when metrics are reported, starting at
evaluation interval 5. Any job whose best metric is less than 1/(1+0.1), or roughly 91%, of the best performing
job's metric will be terminated.
Median stopping policy
Median stopping is an early termination policy based on running averages of primary metrics reported by the
jobs. This policy computes running averages across all training jobs and stops jobs whose primary metric value
is worse than the median of the averages.
This policy takes the following configuration parameters:
evaluation_interval : the frequency for applying the policy (optional parameter).
delay_evaluation : delays the first policy evaluation for a specified number of intervals (optional parameter).
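A minimal sketch using the v2 SDK class:
from azure.ai.ml.sweep import MedianStoppingPolicy

# apply the policy at every interval once five intervals have elapsed
sweep_job.early_termination = MedianStoppingPolicy(delay_evaluation=5,
                                                   evaluation_interval=1)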
In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A job is
stopped at interval 5 if its best primary metric is worse than the median of the running averages over intervals
1:5 across all training jobs.
Truncation selection policy
Truncation selection cancels a percentage of the lowest performing jobs at each evaluation interval. Jobs are
compared using the primary metric.
This policy takes the following configuration parameters:
truncation_percentage : the percentage of lowest performing jobs to terminate at each evaluation interval. An
integer value between 1 and 99.
evaluation_interval : (optional) the frequency for applying the policy
delay_evaluation : (optional) delays the first policy evaluation for a specified number of intervals
exclude_finished_jobs : specifies whether to exclude finished jobs when applying the policy
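A minimal sketch matching the example described below (parameter names from the v2 SDK):
from azure.ai.ml.sweep import TruncationSelectionPolicy

# at each interval after a 5-interval delay, cancel the lowest-performing 20% of jobs
sweep_job.early_termination = TruncationSelectionPolicy(evaluation_interval=1,
                                                        truncation_percentage=20,
                                                        delay_evaluation=5,
                                                        exclude_finished_jobs=True)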
In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A job
terminates at interval 5 if its performance at that interval is in the lowest 20% of all jobs' performance at interval
5; finished jobs are excluded when applying the policy.
No termination policy (default)
If no policy is specified, the hyperparameter tuning service will let all training jobs execute to completion.
sweep_job.early_termination = None
NOTE
If both max_total_trials and max_concurrent_trials are specified, the hyperparameter tuning experiment terminates when
the first of these two thresholds is reached.
NOTE
The number of concurrent trial jobs is gated on the resources available in the specified compute target. Ensure that the
compute target has the available resources for the desired concurrency.
This code configures the hyperparameter tuning experiment to use a maximum of 20 total trial jobs, running
four trial jobs at a time with a timeout of 120 minutes for the entire sweep job.
# Call sweep() on your command job to sweep over your parameter expressions
sweep_job = command_job_for_sweep.sweep(
compute="cpu-cluster",
sampling_algorithm="random",
primary_metric="test-multi_logloss",
goal="Minimize",
)
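The limits themselves are set on the sweep job; a minimal sketch (assuming the timeout is given in seconds):
sweep_job.set_limits(max_total_trials=20,
                     max_concurrent_trials=4,
                     timeout=7200)  # 120 minutes for the entire sweep job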
The command_job is called as a function so we can apply the parameter expressions to the sweep inputs. The
sweep function is then configured with trial , sampling-algorithm , objective , limits , and compute . The
above code snippet is taken from the sample notebook Run hyperparameter sweep on a Command or
CommandComponent. In this sample, the learning_rate and boosting parameters will be tuned. Early
stopping of jobs is determined by a MedianStoppingPolicy , which stops a job whose primary metric value is
worse than the median of the averages across all training jobs (see the MedianStoppingPolicy class reference).
To see how the parameter values are received, parsed, and passed to the training script to be tuned, refer to this
code sample.
IMPORTANT
Every hyperparameter sweep job restarts the training from scratch, including rebuilding the model and all the data
loaders. You can minimize this cost by using an Azure Machine Learning pipeline or manual process to do as much data
preparation as possible prior to your training jobs.
Parallel Coordinates Chart : This visualization shows the correlation between primary metric
performance and individual hyperparameter values. The chart is interactive via movement of axes (click
and drag by the axis label), and by highlighting values across a single axis (click and drag vertically along
a single axis to highlight a range of desired values). The parallel coordinates chart includes an axis on the
right most portion of the chart that plots the best metric value corresponding to the hyperparameters set
for that job instance. This axis is provided in order to project the chart gradient legend onto the data in a
more readable fashion.
2-Dimensional Scatter Chart : This visualization shows the correlation between any two individual
hyperparameters along with their associated primary metric value.
3-Dimensional Scatter Chart : This visualization is the same as 2D but allows for three
hyperparameter dimensions of correlation with the primary metric value. You can also click and drag to
reorient the chart to view different correlations in 3D space.
You can use the CLI to download all default and named outputs of the best trial job and logs of the sweep job.
References
Hyperparameter tuning example
CLI (v2) sweep job YAML schema
Next steps
Track an experiment
Deploy a trained model
Distributed GPU training guide
5/25/2022 • 13 minutes to read • Edit Online
Prerequisites
Review these basic concepts of distributed GPU training such as data parallelism, distributed data parallelism,
and model parallelism.
TIP
If you don't know which type of parallelism to use, more than 90% of the time you should use distributed data
parallelism.
MPI
Azure ML offers an MPI job to launch a given number of processes in each node. You can adopt this approach to
run distributed training using either per-process-launcher or per-node-launcher, depending on whether
process_count_per_node is set to 1 (the default) for per-node-launcher, or equal to the number of devices/GPUs
for per-process-launcher. Azure ML constructs the full MPI launch command ( mpirun ) behind the scenes. You
can't provide your own full head-node-launcher commands like mpirun or DeepSpeed launcher .
TIP
The base Docker image used by an Azure Machine Learning MPI job needs to have an MPI library installed. Open MPI is
included in all the AzureML GPU base images. When you use a custom Docker image, you are responsible for making
sure the image includes an MPI library. Open MPI is recommended, but you can also use a different MPI implementation
such as Intel MPI. Azure ML also provides curated environments for popular frameworks.
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
distr_config = MpiConfiguration(process_count_per_node=4, node_count=2)
run_config = ScriptRunConfig(
source_directory= './src',
script='train.py',
compute_target=compute_target,
environment=pytorch_env,
distributed_job_config=distr_config,
)
Horovod
Use the MPI job configuration when you use Horovod for distributed training with a deep learning framework.
Make sure your code follows these tips:
The training code is instrumented correctly with Horovod before adding the Azure ML parts
Your Azure ML environment contains Horovod and MPI. The PyTorch and TensorFlow curated GPU
environments come pre-configured with Horovod and its dependencies.
Create an MpiConfiguration with your desired distribution.
Horovod example
azureml-examples: TensorFlow distributed training using Horovod
DeepSpeed
Don't use DeepSpeed's custom launcher to run distributed training with the DeepSpeed library on Azure ML.
Instead, configure an MPI job to launch the training job with MPI.
Make sure your code follows these tips:
Your Azure ML environment contains DeepSpeed and its dependencies, Open MPI, and mpi4py.
Create an MpiConfiguration with your distribution.
DeepSpeed example
azureml-examples: Distributed training with DeepSpeed on CIFAR-10
Environment variables from Open MPI
When running MPI jobs with Open MPI images, the following environment variables are set for each process launched:
1. OMPI_COMM_WORLD_RANK - the rank of the process
2. OMPI_COMM_WORLD_SIZE - the world size
3. AZ_BATCH_MASTER_NODE - primary address with port, MASTER_ADDR:MASTER_PORT
4. OMPI_COMM_WORLD_LOCAL_RANK - the local rank of the process on the node
5. OMPI_COMM_WORLD_LOCAL_SIZE - number of processes on the node
TIP
Despite the name, the environment variable OMPI_COMM_WORLD_NODE_RANK does not correspond to the NODE_RANK . To use
per-node-launcher, set process_count_per_node=1 and use OMPI_COMM_WORLD_RANK as the NODE_RANK .
PyTorch
Azure ML supports running distributed jobs using PyTorch's native distributed training capabilities (
torch.distributed ).
TIP
For data parallelism, the official PyTorch guidance is to use DistributedDataParallel (DDP) over DataParallel for both single-
node and multi-node distributed training. PyTorch also recommends using DistributedDataParallel over the
multiprocessing package. Azure Machine Learning documentation and examples will therefore focus on
DistributedDataParallel training.
The most common communication backends used are mpi , nccl , and gloo . For GPU-based training, nccl is
recommended for best performance and should be used whenever possible.
init_method tells each process how to discover the others, and how they initialize and verify the process group
using the communication backend. By default, if init_method is not specified, PyTorch uses the environment
variable initialization method ( env:// ). This is the recommended initialization method to use in your
training code to run distributed PyTorch on Azure ML. With this method, PyTorch looks for the following
environment variables for initialization:
MASTER_ADDR - IP address of the machine that will host the process with rank 0.
MASTER_PORT - A free port on the machine that will host the process with rank 0.
WORLD_SIZE - The total number of processes. Should be equal to the total number of devices (GPU) used for
distributed training.
RANK - The (global) rank of the current process. The possible values are 0 to (world size - 1).
For more information on process group initialization, see the PyTorch documentation.
Beyond these, many applications will also need the following environment variables:
LOCAL_RANK - The local (relative) rank of the process within the node. The possible values are 0 to (# of
processes on the node - 1). This information is useful because many operations, such as data preparation,
should only be performed once per node, usually on local_rank = 0.
NODE_RANK - The rank of the node for multi-node training. The possible values are 0 to (total # of nodes - 1).
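For reference, a minimal sketch of environment-variable initialization inside the training script:
import torch.distributed as dist

# reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK from the environment
dist.init_process_group(backend='nccl', init_method='env://')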
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
distr_config = PyTorchConfiguration(process_count=8, node_count=2)
run_config = ScriptRunConfig(
source_directory='./src',
script='train.py',
arguments=['--epochs', 50],
compute_target=compute_target,
environment=pytorch_env,
distributed_job_config=distr_config,
)
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
distr_config = PyTorchConfiguration(node_count=2)
launch_cmd = "python -m torch.distributed.launch --nproc_per_node 4 --nnodes 2 --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT --use_env train.py --epochs 50".split()
run_config = ScriptRunConfig(
source_directory='./src',
command=launch_cmd,
compute_target=compute_target,
environment=pytorch_env,
distributed_job_config=distr_config,
)
run_config = ScriptRunConfig(
source_directory='./src',
command=launch_cmd,
compute_target=compute_target,
environment=pytorch_env,
)
os.environ["NCCL_SOCKET_IFNAME"] = "^docker0,lo"
try:
os.environ["NODE_RANK"] = os.environ["OMPI_COMM_WORLD_RANK"]
# additional variables
os.environ["MASTER_ADDRESS"] = os.environ["MASTER_ADDR"]
os.environ["LOCAL_RANK"] = os.environ["OMPI_COMM_WORLD_LOCAL_RANK"]
os.environ["WORLD_SIZE"] = os.environ["OMPI_COMM_WORLD_SIZE"]
except:
# fails when used with pytorch configuration instead of mpi
pass
Lightning handles computing the world size from the Trainer flags --gpus and --num_nodes and
manages rank and local rank internally:
nnodes = 2
args = ['--max_epochs', 50, '--gpus', 2, '--accelerator', 'ddp_spawn', '--num_nodes', nnodes]
distr_config = MpiConfiguration(node_count=nnodes)
run_config = ScriptRunConfig(
source_directory='./src',
script='train.py',
arguments=args,
compute_target=compute_target,
environment=pytorch_env,
distributed_job_config=distr_config,
)
run_config = ScriptRunConfig(
source_directory='./src',
command=launch_cmd,
compute_target=compute_target,
environment=pytorch_env,
distributed_job_config=distr_config,
)
You can also use the per-process-launch option to run distributed training without using
torch.distributed.launch . One thing to keep in mind if using this method is that the transformers
TrainingArguments expect the local rank to be passed in as an argument ( --local_rank ).
torch.distributed.launch takes care of this when --use_env=False , but if you are using per-process-launch
you'll need to explicitly pass the local rank in as an argument to the training script --local_rank=$LOCAL_RANK as
Azure ML only sets the LOCAL_RANK environment variable.
TensorFlow
If you're using native distributed TensorFlow in your training code, such as TensorFlow 2.x's
tf.distribute.Strategy API, you can launch the distributed job via Azure ML using the TensorflowConfiguration .
To do so, specify a TensorflowConfiguration object to the distributed_job_config parameter of the
ScriptRunConfig constructor. If you're using tf.distribute.experimental.MultiWorkerMirroredStrategy , specify
the worker_count in the TensorflowConfiguration corresponding to the number of nodes for your training job.
curated_env_name = 'AzureML-TensorFlow-2.3-GPU'
tf_env = Environment.get(workspace=ws, name=curated_env_name)
distr_config = TensorflowConfiguration(worker_count=2, parameter_server_count=0)
run_config = ScriptRunConfig(
source_directory='./src',
script='train.py',
compute_target=compute_target,
environment=tf_env,
distributed_job_config=distr_config,
)
If your training script uses the parameter server strategy for distributed training, such as for legacy TensorFlow
1.x, you'll also need to specify the number of parameter servers to use in the job, for example,
tf_config = TensorflowConfiguration(worker_count=2, parameter_server_count=1) .
TF_CONFIG
In TensorFlow, the TF_CONFIG environment variable is required for training on multiple machines. For
TensorFlow jobs, Azure ML will configure and set the TF_CONFIG variable appropriately for each worker before
executing your training script.
You can access TF_CONFIG from your training script if you need to: os.environ['TF_CONFIG'] .
Example TF_CONFIG set on a chief worker node:
TF_CONFIG='{
"cluster": {
"worker": ["host0:2222", "host1:2222"]
},
"task": {"type": "worker", "index": 0},
"environment": "cloud"
}'
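A small sketch of reading TF_CONFIG inside the training script:
import json
import os

tf_config = json.loads(os.environ['TF_CONFIG'])
num_workers = len(tf_config['cluster']['worker'])
print(tf_config['task']['type'], tf_config['task']['index'], num_workers)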
TensorFlow example
azureml-examples: Distributed TensorFlow training with MultiWorkerMirroredStrategy
WARNING
The older-generation machine SKU Standard_NC24r is RDMA-enabled, but it does not contain SR-IOV hardware
required for InfiniBand.
If you create an AmlCompute cluster of one of these RDMA-capable, InfiniBand-enabled sizes, the OS image will
come with the Mellanox OFED driver required to enable InfiniBand preinstalled and preconfigured.
Next steps
Deploy machine learning models to Azure
Deploy and score a machine learning model by using a managed online endpoint (preview)
Reference architecture for distributed deep learning training in Azure
Train scikit-learn models at scale with Azure
Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
Prerequisites
You can run this code in either an Azure Machine Learning compute instance, or your own Jupyter Notebook:
Azure Machine Learning compute instance
Complete the Quickstart: Get started with Azure Machine Learning to create a compute instance.
Every compute instance includes a dedicated notebook server pre-loaded with the SDK and the
notebooks sample repository.
Select the notebook tab in the Azure Machine Learning studio. In the samples training folder, find a
completed and expanded notebook by navigating to this directory: how-to-use-azureml > ml-frameworks >
scikit-learn > train-hyperparameter-tune-deploy-with-sklearn folder.
You can use the pre-populated code in the sample training folder to complete this tutorial.
Create a Jupyter Notebook server and run the code in the following sections.
Install the Azure Machine Learning SDK (>= 1.13.0).
Create a workspace configuration file.
ws = Workspace.from_config()
Prepare scripts
In this tutorial, the training script train_iris.py is already provided for you. In practice, you should be able to
take any custom training script as is and run it with Azure ML without having to modify your code.
Notes:
The provided training script shows how to log some metrics to your Azure ML run using the Run object
within the script.
The provided training script uses example data from the iris = datasets.load_iris() function. To use and
access your own data, see how to train with datasets to make data available during training.
Define your environment
To define the Azure ML Environment that encapsulates your training script's dependencies, you can either define
a custom environment or use an Azure ML curated environment.
Use a curated environment
Optionally, Azure ML provides prebuilt, curated environments if you don't want to define your own
environment.
If you want to use a curated environment, you can run the following command instead:
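A minimal sketch (the curated environment name is hypothetical; list the curated environments in your workspace to pick a current scikit-learn one):
sklearn_env = Environment.get(workspace=ws, name='AzureML-sklearn-0.24-ubuntu18.04-py37-cpu')  # hypothetical name
Create a custom environment
You can also create your own custom environment. First, define your conda dependencies in a YAML file; in this example the file is named conda_dependencies.yml :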
dependencies:
- python=3.6.2
- scikit-learn
- numpy
- pip:
- azureml-defaults
Create an Azure ML environment from this Conda environment specification. The environment will be packaged
into a Docker container at runtime.
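A sketch, with the environment name illustrative:
sklearn_env = Environment.from_conda_specification(name='sklearn-env',
                                                   file_path='./conda_dependencies.yml')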
For more information on creating and using environments, see Create and use software environments in Azure
Machine Learning.
src = ScriptRunConfig(source_directory='.',
script='train_iris.py',
arguments=['--kernel', 'linear', '--penalty', 1.0],
environment=sklearn_env)
If you want to instead run your job on a remote cluster, you can specify the desired compute target to the
compute_target parameter of ScriptRunConfig.
compute_target = ws.compute_targets['<my-cluster-name>']
src = ScriptRunConfig(source_directory='.',
script='train_iris.py',
arguments=['--kernel', 'linear', '--penalty', 1.0],
compute_target=compute_target,
environment=sklearn_env)
run = Experiment(ws,'Tutorial-TrainIRIS').submit(src)
run.wait_for_completion(show_output=True)
WARNING
Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you
don't want to upload, use a .ignore file or don't include it in the source directory . Instead, access your data using an
Azure ML dataset.
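Within the training script, the trained model is persisted to a file so it can be registered, for example: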
joblib.dump(svm_model_linear, 'model.joblib')
Register the model to your workspace with the following code. By specifying the parameters model_framework ,
model_framework_version , and resource_configuration , no-code model deployment becomes available. No-code
model deployment allows you to directly deploy your model as a web service from the registered model, and
the ResourceConfiguration object defines the compute resource for the web service.
model = run.register_model(model_name='sklearn-iris',
model_path='outputs/model.joblib',
model_framework=Model.Framework.SCIKITLEARN,
model_framework_version='0.19.1',
resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))
Deployment
The model you just registered can be deployed the exact same way as any other registered model in Azure ML.
The deployment how-to contains a section on registering models, but you can skip directly to creating a
compute target for deployment, since you already have a registered model.
(Preview) No-code model deployment
Instead of the traditional deployment route, you can also use the no-code deployment feature (preview) for
scikit-learn. No-code model deployment is supported for all built-in scikit-learn model types. By registering your
model as shown above with the model_framework , model_framework_version , and resource_configuration
parameters, you can simply use the deploy() static function to deploy your model.
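A minimal sketch of no-code deployment (the service name is illustrative):
from azureml.core.model import Model

# no inference configuration is needed because framework, version, and
# resource configuration were supplied when the model was registered
service = Model.deploy(ws, 'sklearn-iris-svc', [model])
service.wait_for_deployment(show_output=True)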
NOTE: These dependencies are included in the pre-built scikit-learn inference container.
- azureml-defaults
- inference-schema[numpy-support]
- scikit-learn
- numpy
The full how-to covers deployment in Azure Machine Learning in greater depth.
Next steps
In this article, you trained and registered a scikit-learn model, and learned about deployment options. See these
other articles to learn more about Azure Machine Learning.
Track run metrics during training
Tune hyperparameters
Train TensorFlow models at scale with Azure
Machine Learning
5/25/2022 • 8 minutes to read • Edit Online
Prerequisites
Run this code on either of these environments:
Azure Machine Learning compute instance - no downloads or installation necessary
Complete the Quickstart: Get started with Azure Machine Learning to create a dedicated notebook
server pre-loaded with the SDK and the sample repository.
In the samples deep learning folder on the notebook server, find a completed and expanded notebook
by navigating to this directory: how-to-use-azureml > ml-frameworks > tensorflow >
train-hyperparameter-tune-deploy-with-tensorflow folder.
Your own Jupyter Notebook server
Install the Azure Machine Learning SDK (>= 1.15.0).
Create a workspace configuration file.
Download the sample script files tf_mnist.py and utils.py
You can also find a completed Jupyter Notebook version of this guide on the GitHub samples page. The
notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment,
and notebook widgets.
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a
centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace
artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
ws = Workspace.from_config()
web_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path = web_paths)
Use the register() method to register the dataset to your workspace so it can be shared with others,
reused across various experiments, and referred to by name in your training script.
dataset = dataset.register(workspace=ws,
name='mnist-dataset',
description='training and test dataset',
create_new_version=True)
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)
    # create the cluster and wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
NOTE
You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
Define your environment
To define the Azure ML Environment that encapsulates your training script's dependencies, you can either define
a custom environment or use an Azure ML curated environment.
Use a curated environment
Azure ML provides prebuilt, curated environments if you don't want to define your own environment. Azure ML
has several CPU and GPU curated environments for TensorFlow corresponding to different versions of
TensorFlow. For more info, see Azure ML Curated Environments.
If you want to use a curated environment, you can run the following command instead:
curated_env_name = 'AzureML-TensorFlow-2.2-GPU'
tf_env = Environment.get(workspace=ws, name=curated_env_name)
To see the packages included in the curated environment, you can write out the conda dependencies to disk:
tf_env.save_to_directory(path=curated_env_name)
Make sure the curated environment includes all the dependencies required by your training script. If not, you'll
have to modify the environment to include the missing dependencies. If the environment is modified, you'll have
to give it a new name, as the 'AzureML' prefix is reserved for curated environments. If you modified the conda
dependencies YAML file, you can create a new environment from it with a new name, for example:
tf_env = Environment.from_conda_specification(name='tensorflow-2.2-gpu',
file_path='./conda_dependencies.yml')
If you had instead modified the curated environment object directly, you can clone that environment with a new
name:
tf_env = tf_env.clone(new_name='tensorflow-2.2-gpu')
Create a custom environment
You can also create your own Azure ML environment that encapsulates your training script's dependencies.
First, define your conda dependencies in a YAML file; in this example the file is named conda_dependencies.yml .
channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
- azureml-defaults
- tensorflow-gpu==2.2.0
Create an Azure ML environment from this conda environment specification. The environment will be packaged
into a Docker container at runtime.
By default if no base image is specified, Azure ML will use a CPU image
azureml.core.environment.DEFAULT_CPU_IMAGE as the base image. Since this example runs training on a GPU
cluster, you'll need to specify a GPU base image that has the necessary GPU drivers and dependencies. Azure ML
maintains a set of base images published on Microsoft Container Registry (MCR) that you can use, see the
Azure/AzureML-Containers GitHub repo for more information.
tf_env = Environment.from_conda_specification(name='tensorflow-2.2-gpu',
file_path='./conda_dependencies.yml')
TIP
Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your
environment from that. For more information, see Train with custom image.
For more information on creating and using environments, see Create and use software environments in Azure
Machine Learning.
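The args passed below were presumably defined earlier in the original article; a hypothetical definition (the flag names are assumptions about what tf_mnist.py expects):
args = ['--data-folder', dataset.as_mount(),
        '--batch-size', 64,
        '--first-layer-neurons', 256,
        '--second-layer-neurons', 128,
        '--learning-rate', 0.01]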
src = ScriptRunConfig(source_directory=script_folder,
script='tf_mnist.py',
arguments=args,
compute_target=compute_target,
environment=tf_env)
WARNING
Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you
don't want to upload, use a .ignore file or don't include it in the source directory . Instead, access your data using an
Azure ML dataset.
For more information on configuring jobs with ScriptRunConfig, see Configure and submit training runs.
WARNING
If you were previously using the TensorFlow estimator to configure your TensorFlow training jobs, please note that
Estimators have been deprecated as of the 1.19.0 SDK release. With Azure ML SDK >= 1.15.0, ScriptRunConfig is the
recommended way to configure training jobs, including those using deep learning frameworks. For common migration
questions, see the Estimator to ScriptRunConfig migration guide.
Submit a run
The Run object provides the interface to the run history while the job is running and after it has completed.
model = run.register_model(model_name='tf-mnist',
model_path='outputs/model',
model_framework=Model.Framework.TENSORFLOW,
model_framework_version='2.0',
resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))
You can also download a local copy of the model by using the Run object. In the training script tf_mnist.py , a
TensorFlow saver object persists the model to a local folder (local to the compute target). You can use the Run
object to download a copy.
Distributed training
Azure Machine Learning also supports multi-node distributed TensorFlow jobs so that you can scale your
training workloads. You can easily run distributed TensorFlow jobs and Azure ML will manage the orchestration
for you.
Azure ML supports running distributed TensorFlow jobs with both Horovod and TensorFlow's built-in distributed
training API.
For more information about distributed training, see the Distributed GPU training guide.
The full how-to covers deployment in Azure Machine Learning in greater depth.
Next steps
In this article, you trained and registered a TensorFlow model, and learned about options for deployment. See
these other articles to learn more about Azure Machine Learning.
Track run metrics during training
Tune hyperparameters
Reference architecture for distributed deep learning training in Azure
Train Keras models at scale with Azure Machine
Learning
5/25/2022 • 7 minutes to read • Edit Online
NOTE
If you are using the Keras API tf.keras built into TensorFlow and not the standalone Keras package, refer instead to Train
TensorFlow models.
Prerequisites
Run this code on either of these environments:
Azure Machine Learning compute instance - no downloads or installation necessary
Complete the Quickstart: Get started with Azure Machine Learning to create a dedicated notebook
server pre-loaded with the SDK and the sample repository.
In the samples folder on the notebook server, find a completed and expanded notebook by navigating
to this directory: how-to-use-azureml > ml-frameworks > keras >
train-hyperparameter-tune-deploy-with-keras folder.
Your own Jupyter Notebook server
Install the Azure Machine Learning SDK (>= 1.15.0).
Create a workspace configuration file.
Download the sample script files keras_mnist.py and utils.py
You can also find a completed Jupyter Notebook version of this guide on the GitHub samples page. The
notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment,
and notebook widgets.
import os
import azureml
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import Workspace, Run
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a
centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace
artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
ws = Workspace.from_config()
web_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path=web_paths)
You can use the register() method to register the dataset to your workspace so it can be shared with
others, reused across various experiments, and referred to by name in your training script.
dataset = dataset.register(workspace=ws,
name='mnist-dataset',
description='training and test dataset',
create_new_version=True)
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)
    # create the cluster and wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
NOTE
You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
Define your environment
Define the Azure ML Environment that encapsulates your training script's dependencies.
First, define your conda dependencies in a YAML file; in this example the file is named conda_dependencies.yml .
channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
- azureml-defaults
- tensorflow-gpu==2.0.0
- keras<=2.3.1
- matplotlib
Create an Azure ML environment from this conda environment specification. The environment will be packaged
into a Docker container at runtime.
By default if no base image is specified, Azure ML will use a CPU image
azureml.core.environment.DEFAULT_CPU_IMAGE as the base image. Since this example runs training on a GPU
cluster, you will need to specify a GPU base image that has the necessary GPU drivers and dependencies. Azure
ML maintains a set of base images published on Microsoft Container Registry (MCR) that you can use, see the
Azure/AzureML-Containers GitHub repo for more information.
For more information on creating and using environments, see Create and use software environments in Azure
Machine Learning.
Create a ScriptRunConfig object to specify the configuration details of your training job, including your training
script, environment to use, and the compute target to run on.
Any arguments to your training script will be passed via command line if specified in the arguments parameter.
The DatasetConsumptionConfig for our FileDataset is passed as an argument to the training script, for the
--data-folder argument. Azure ML will resolve this DatasetConsumptionConfig to the mount-point of the
backing datastore, which can then be accessed from the training script.
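A hypothetical definition of args (the flag names are assumptions about what keras_mnist.py expects):
args = ['--data-folder', dataset.as_mount(),  # resolved to the datastore mount point
        '--batch-size', 50,
        '--first-layer-neurons', 300,
        '--second-layer-neurons', 100,
        '--learning-rate', 0.001]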
src = ScriptRunConfig(source_directory=script_folder,
script='keras_mnist.py',
arguments=args,
compute_target=compute_target,
environment=keras_env)
For more information on configuring jobs with ScriptRunConfig, see Configure and submit training runs.
WARNING
If you were previously using the TensorFlow estimator to configure your Keras training jobs, please note that Estimators
have been deprecated as of the 1.19.0 SDK release. With Azure ML SDK >= 1.15.0, ScriptRunConfig is the recommended
way to configure training jobs, including those using deep learning frameworks. For common migration questions, see the
Estimator to ScriptRunConfig migration guide.
TIP
The deployment how-to contains a section on registering models, but you can skip directly to creating a compute target
for deployment, since you already have a registered model.
You can also download a local copy of the model. This can be useful for doing additional model validation work
locally. In the training script, keras_mnist.py , a TensorFlow saver object persists the model to a local folder (local
to the compute target). You can use the Run object to download a copy from the run history.
for f in run.get_file_names():
if f.startswith('outputs/model'):
output_file_path = os.path.join('./model', f.split('/')[-1])
print('Downloading from {} to {} ...'.format(f, output_file_path))
run.download_file(name=f, output_file_path=output_file_path)
Next steps
In this article, you trained and registered a Keras model on Azure Machine Learning. To learn how to deploy a
model, continue on to our model deployment article.
How and where to deploy models
Track run metrics during training
Tune hyperparameters
Deploy a trained model
Reference architecture for distributed deep learning training in Azure
Train PyTorch models at scale with Azure Machine
Learning
5/25/2022 • 8 minutes to read • Edit Online
Prerequisites
Run this code on either of these environments:
Azure Machine Learning compute instance - no downloads or installation necessary
Complete the Quickstart: Get started with Azure Machine Learning to create a dedicated notebook
server pre-loaded with the SDK and the sample repository.
In the samples deep learning folder on the notebook server, find a completed and expanded notebook
by navigating to this directory: how-to-use-azureml > ml-frameworks > pytorch >
train-hyperparameter-tune-deploy-with-pytorch folder.
Your own Jupyter Notebook server
Install the Azure Machine Learning SDK (>= 1.15.0).
Create a workspace configuration file.
Download the sample script files pytorch_train.py
You can also find a completed Jupyter Notebook version of this guide on the GitHub samples page. The
notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment,
and notebook widgets.
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a
centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace
artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
ws = Workspace.from_config()
project_folder = './pytorch-birds'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('pytorch_train.py', project_folder)
If you instead want to create a CPU cluster, provide a different VM size to the vm_size parameter, such as
STANDARD_D2_V2.
NOTE
You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
Define your environment
To define the Azure ML Environment that encapsulates your training script's dependencies, you can either define
a custom environment or use an Azure ML curated environment.
Use a curated environment
Azure ML provides prebuilt, curated environments if you don't want to define your own environment. There are
several CPU and GPU curated environments for PyTorch corresponding to different versions of PyTorch.
If you want to use a curated environment, you can run the following command instead:
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
To see the packages included in the curated environment, you can write out the conda dependencies to disk:
pytorch_env.save_to_directory(path=curated_env_name)
Make sure the curated environment includes all the dependencies required by your training script. If not, you'll
have to modify the environment to include the missing dependencies. If the environment is modified, you'll have
to give it a new name, as the 'AzureML' prefix is reserved for curated environments. If you modified the conda
dependencies YAML file, you can create a new environment from it with a new name, for example:
pytorch_env = Environment.from_conda_specification(name='pytorch-1.6-gpu',
file_path='./conda_dependencies.yml')
If you had instead modified the curated environment object directly, you can clone that environment with a new
name:
pytorch_env = pytorch_env.clone(new_name='pytorch-1.6-gpu')
channels:
- conda-forge
dependencies:
- python=3.6.2
- pip=21.3.1
- pip:
- azureml-defaults
- torch==1.6.0
- torchvision==0.7.0
- future==0.17.1
- pillow
Create an Azure ML environment from this conda environment specification. The environment will be packaged
into a Docker container at runtime.
By default if no base image is specified, Azure ML will use a CPU image
azureml.core.environment.DEFAULT_CPU_IMAGE as the base image. Since this example runs training on a GPU
cluster, you'll need to specify a GPU base image that has the necessary GPU drivers and dependencies. Azure ML
maintains a set of base images published on Microsoft Container Registry (MCR) that you can use. For more
information, see AzureML-Containers GitHub repo.
pytorch_env = Environment.from_conda_specification(name='pytorch-1.6-gpu',
file_path='./conda_dependencies.yml')
TIP
Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your
environment from that. For more information, see Train with custom image.
For more information on creating and using environments, see Create and use software environments in Azure
Machine Learning.
src = ScriptRunConfig(source_directory=project_folder,
script='pytorch_train.py',
arguments=['--num_epochs', 30, '--output_dir', './outputs'],
compute_target=compute_target,
environment=pytorch_env)
WARNING
Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you
don't want to upload, use a .ignore file or don't include it in the source directory . Instead, access your data using an
Azure ML dataset.
For more information on configuring jobs with ScriptRunConfig, see Configure and submit training runs.
WARNING
If you were previously using the PyTorch estimator to configure your PyTorch training jobs, please note that Estimators
have been deprecated as of the 1.19.0 SDK release. With Azure ML SDK >= 1.15.0, ScriptRunConfig is the recommended
way to configure training jobs, including those using deep learning frameworks. For common migration questions, see the
Estimator to ScriptRunConfig migration guide.
You can also download a local copy of the model by using the Run object. In the training script
pytorch_train.py , a PyTorch save object persists the model to a local folder (local to the compute target). You
can use the Run object to download a copy.
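A sketch (the output path is an assumption about where pytorch_train.py saves the model):
# download the saved model file from the run's outputs
run.download_file(name='outputs/model.pt', output_file_path='model.pt')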
Distributed training
Azure Machine Learning also supports multi-node distributed PyTorch jobs so that you can scale your training
workloads. You can easily run distributed PyTorch jobs and Azure ML will manage the orchestration for you.
Azure ML supports running distributed PyTorch jobs with both Horovod and PyTorch's built-in
DistributedDataParallel module.
For more information about distributed training, see the Distributed GPU training guide.
Export to ONNX
To optimize inference with the ONNX Runtime, convert your trained PyTorch model to the ONNX format.
Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on
production data. For an example, see the Exporting model from PyTorch to ONNX tutorial.
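A minimal sketch of exporting a trained PyTorch model to ONNX (the input shape is illustrative):
import torch

dummy_input = torch.randn(1, 3, 224, 224)  # one example batch matching the model's expected input shape
torch.onnx.export(model, dummy_input, 'model.onnx')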
Next steps
In this article, you trained and registered a deep learning, neural network using PyTorch on Azure Machine
Learning. To learn how to deploy a model, continue on to our model deployment article.
How and where to deploy models
Track run metrics during training
Tune hyperparameters
Deploy a trained model
Reference architecture for distributed deep learning training in Azure
Train a model by using a custom Docker image
5/25/2022 • 4 minutes to read • Edit Online
Prerequisites
Run the code on either of these environments:
Azure Machine Learning compute instance (no downloads or installation necessary):
Complete the Quickstart: Get started with Azure Machine Learning tutorial to create a dedicated
notebook server preloaded with the SDK and the sample repository.
Your own Jupyter Notebook server:
Create a workspace configuration file.
Install the Azure Machine Learning SDK.
Create an Azure container registry or other Docker registry that's available on the internet.
ws = Workspace.from_config()
fastai_env = Environment("fastai2")
The specified base image in the following code supports the fast.ai library, which allows for distributed deep-
learning capabilities. For more information, see the fast.ai Docker Hub repository.
When you're using your custom Docker image, you might already have your Python environment properly set
up. In that case, set the user_managed_dependencies flag to True to use your custom image's built-in Python
environment. By default, Azure Machine Learning builds a Conda environment with dependencies that you
specified. The service runs the script in that environment instead of using any Python libraries that you installed
on the base image.
fastai_env.docker.base_image = "fastdotai/fastai2:latest"
fastai_env.python.user_managed_dependencies = True
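The dockerfile string referenced below isn't shown in this excerpt; a hypothetical definition (the base image and installed packages are illustrative):
dockerfile = r"""
FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
RUN pip install fastai
"""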
# Set the base image to None, because the image is defined by Dockerfile.
fastai_env.docker.base_image = None
fastai_env.docker.base_dockerfile = dockerfile
IMPORTANT
Azure Machine Learning only supports Docker images that provide the following software:
Ubuntu 18.04 or greater.
Conda 4.7.# or greater.
Python 3.6+.
A POSIX compliant shell available at /bin/sh is required in any container image used for training.
For more information about creating and managing Azure Machine Learning environments, see Create and use
software environments.
Create or attach a compute target
You need to create a compute target for training your model. In this tutorial, you create AmlCompute as your
training compute resource.
Creation of AmlCompute takes a few minutes. If the AmlCompute resource is already in your workspace, this code
skips the creation process.
As with other Azure services, there are limits on certain resources (for example, AmlCompute ) associated with the
Azure Machine Learning service. For more information, see Default limits and how to request a higher quota.
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)
    # create the cluster; wait_for_completion blocks until provisioning finishes
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
IMPORTANT
Use CPU SKUs for any image build on compute.
src = ScriptRunConfig(source_directory='fastai-example',
script='train.py',
compute_target=compute_target,
environment=fastai_env)
run = Experiment(ws,'Tutorial-fastai').submit(src)
run.wait_for_completion(show_output=True)
WARNING
Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you
don't want to upload, use an .ignore file or don't include it in the source directory. Instead, access your data by using a
datastore.
Next steps
In this article, you trained a model by using a custom Docker image. See these other articles to learn more about
Azure Machine Learning:
Track run metrics during training.
Deploy a model by using a custom Docker image.
Migrating from Estimators to ScriptRunConfig
5/25/2022 • 3 minutes to read • Edit Online
IMPORTANT
To migrate to ScriptRunConfig from Estimators, make sure you are using >= 1.15.0 of the Python SDK.
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
compute_target = ws.compute_targets['my-cluster']
src = ScriptRunConfig(source_directory='.',
script='train.py',
compute_target=compute_target,
environment=pytorch_env)
If you want to specify environment variables that will get set on the process where the training script is
executed, use the Environment object:
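A minimal sketch (the variable name and value are illustrative):
pytorch_env.environment_variables = {'EXAMPLE_ENV_VAR': 'EXAMPLE_VALUE'}
To make an Azure ML dataset available to your training script, pass it through the arguments parameter: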
src = ScriptRunConfig(source_directory='.',
script='train.py',
arguments=['--data-folder', mnist_ds.as_mount()], # or mnist_ds.as_download() to
download
compute_target=compute_target,
environment=pytorch_env)
DataReference (old)
While we recommend using Azure ML Datasets over the old DataReference way, if you are still using
DataReferences for any reason, you must configure your job as follows:
# if you want to pass a DataReference object, such as the below:
datastore = ws.get_default_datastore()
data_ref = datastore.path('./foo').as_mount()

src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      arguments=['--data-folder', str(data_ref)],  # cast the DataReference object to str
                      compute_target=compute_target,
                      environment=pytorch_env)

# set a dict of the DataReference(s) you want to the `data_references` attribute
# of the ScriptRunConfig's underlying RunConfiguration object
src.run_config.data_references = {data_ref.data_reference_name: data_ref.to_config()}
Distributed training
If you need to configure a distributed job for training, do so by specifying the distributed_job_config
parameter in the ScriptRunConfig constructor. Pass in an MpiConfiguration, PyTorchConfiguration, or
TensorflowConfiguration for distributed jobs of the respective types.
The following example configures a PyTorch training job to use distributed training with MPI/Horovod:
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=pytorch_env,
                      distributed_job_config=MpiConfiguration(node_count=2, process_count_per_node=2))
Miscellaneous
If you need to access the underlying RunConfiguration object for a ScriptRunConfig for any reason, you can do
so as follows:
src.run_config
Next steps
Configure and submit training runs
Reinforcement learning (preview) with Azure
Machine Learning
5/25/2022 • 11 minutes to read • Edit Online
WARNING
Azure Machine Learning reinforcement learning via the azureml.contrib.train.rl package will no longer be
supported after June 2022. We recommend customers use the Ray on Azure Machine Learning library for reinforcement
learning experiments with Azure Machine Learning. For an example, see the notebook Reinforcement Learning in Azure
Machine Learning - Pong problem.
In this article, you learn how to train a reinforcement learning (RL) agent to play the video game Pong. You use
the open-source Python library Ray RLlib with Azure Machine Learning to manage the complexity of distributed
RL.
In this article you learn how to:
Set up an experiment
Define head and worker nodes
Create an RL estimator
Submit an experiment to start a run
View results
This article is based on the RLlib Pong example that can be found in the Azure Machine Learning notebook
GitHub repository.
Prerequisites
Run this code in either of these environments. We recommend you try Azure Machine Learning compute
instance for the fastest start-up experience. You can quickly clone and run the reinforcement sample notebooks
on an Azure Machine Learning compute instance.
Azure Machine Learning compute instance
Learn how to clone sample notebooks in Tutorial: Train and deploy a model.
Clone the how-to-use-azureml folder instead of tutorials
Run the virtual network setup notebook located at
/how-to-use-azureml/reinforcement-learning/setup/devenv_setup.ipynb to open network ports used for
distributed reinforcement learning.
Run the sample notebook
/how-to-use-azureml/reinforcement-learning/atari-on-distributed-compute/pong_rllib.ipynb
Initialize a workspace
Initialize a workspace object from the config.json file created in the prerequisites section. If you are executing
this code in an Azure Machine Learning Compute Instance, the configuration file has already been created for
you.
from azureml.core import Workspace

ws = Workspace.from_config()
experiment_name = 'rllib-pong-multi-node'

from azureml.core.compute import AmlCompute, ComputeTarget

# Virtual network created by the setup notebook in the prerequisites
vnet_name = 'your_vnet'

# Ray head cluster name and scale settings
head_compute_name = 'head-gpu'
head_compute_min_nodes = 0
head_compute_max_nodes = 2

# This example uses GPU VM. For using CPU VM, set SKU to STANDARD_D2_V2
head_vm_size = 'STANDARD_NC6'

if head_compute_name in ws.compute_targets:
    head_compute_target = ws.compute_targets[head_compute_name]
    if head_compute_target and type(head_compute_target) is AmlCompute:
        print(f'found head compute target. just use it {head_compute_name}')
else:
    print('creating a new head compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=head_vm_size,
                                                                min_nodes=head_compute_min_nodes,
                                                                max_nodes=head_compute_max_nodes,
                                                                vnet_resourcegroup_name=ws.resource_group,
                                                                vnet_name=vnet_name,
                                                                subnet_name='default')

    # create the cluster
    head_compute_target = ComputeTarget.create(ws, head_compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it will use the scale settings for the cluster
    head_compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
NOTE
You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.
# Ray worker cluster; provisioning follows the same pattern as the head cluster above
worker_compute_name = 'worker-cpu'

# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
worker_vm_size = 'STANDARD_D2_V2'

# can poll for a minimum number of nodes and for a specific timeout.
# if no min node count is provided it will use the scale settings for the cluster
worker_compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
# Excerpts from the reinforcement learning estimator configuration in the sample notebook:

training_algorithm = "IMPALA"
rl_environment = "PongNoFrameskip-v4"

# Add additional single quotes at both ends of string values that contain spaces;
# the outermost quotes are not passed to the scripts, as they are not actually part of the string.
# This script parameter sets the number of GPUs and the number of Ray workers:
#     "--config": '\'{"num_gpus": 1, "num_workers": 13}\'',
#
# Other fragments from the estimator definition:
#     use_gpu=False,              # GPU off for the worker configuration
#     pip_packages=pip_packages,  # Pip packages
#     use_gpu=True,               # GPU usage on for the head configuration
Entry script
The entry script pong_rllib.py trains a neural network using the OpenAI Gym environment PongNoFrameskip-v4. OpenAI Gyms are standardized interfaces to test reinforcement learning algorithms on classic Atari games.
This example uses a training algorithm known as IMPALA (Importance Weighted Actor-Learner Architecture).
IMPALA parallelizes each individual learning actor to scale across many compute nodes without sacrificing
speed or stability.
Ray Tune orchestrates the IMPALA worker tasks.
import ray
import ray.tune as tune
from ray.rllib import train

import os
import sys

from utils import callbacks  # helper module alongside the entry script (defines on_train_result)

DEFAULT_RAY_ADDRESS = 'localhost:6379'

if __name__ == "__main__":

    # Parse arguments
    train_parser = train.create_parser()

    args = train_parser.parse_args()
    print("Algorithm config:", args.config)

    if args.ray_address is None:
        args.ray_address = DEFAULT_RAY_ADDRESS

    ray.init(address=args.ray_address)

    tune.run(run_or_experiment=args.run,
             config={
                 "env": args.env,
                 "num_gpus": args.config["num_gpus"],
                 "num_workers": args.config["num_workers"],
                 "callbacks": {"on_train_result": callbacks.on_train_result},
                 "sample_batch_size": 50,
                 "train_batch_size": 1000,
                 "num_sgd_iter": 2,
                 "num_data_loader_buffers": 2,
                 "model": {"dim": 42},
             },
             stop=args.stop,
             local_dir='./logs')
# callbacks.py
from azureml.core import Run

def on_train_result(info):
    '''Callback on train result to record metrics returned by trainer.'''
    run = Run.get_context()
    run.log(name='episode_reward_mean',
            value=info["result"]["episode_reward_mean"])
    run.log(name='episodes_total',
            value=info["result"]["episodes_total"])
Submit a run
Run handles the run history of in-progress or complete jobs.
run = exp.submit(config=rl_estimator)
NOTE
The run may take up to 30 to 45 minutes to complete.
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion()
The episode_reward_mean plot shows the mean number of points scored per training epoch. You can see that
the training agent initially performed poorly, losing its matches without scoring a single point (shown by a
reward_mean of -21). Within 100 iterations, the training agent learned to beat the computer opponent by an
average of 18 points.
If you browse logs of the child run, you can see the evaluation results recorded in the driver_log.txt file. You may need to wait several minutes before these metrics become available on the Run page.
In short, you have configured multiple compute resources to train a reinforcement learning agent to play Pong well against a computer opponent.
Next steps
In this article, you learned how to train a reinforcement learning agent using an IMPALA learning agent. To see
additional examples, go to the Azure Machine Learning Reinforcement Learning GitHub repository.
Train ML models with MLflow Projects and Azure
Machine Learning (preview)
5/25/2022 • 5 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as MLflow Tracking,
to submit training jobs with MLflow Projects and Azure Machine Learning backend support. You can submit jobs
locally with Azure Machine Learning tracking, or move your runs to the cloud, for example to an Azure Machine Learning compute cluster.
MLflow Projects allow you to organize and describe your code to let other data scientists (or automated
tools) run it. MLflow Projects with Azure Machine Learning enable you to track and manage your training runs
in your workspace.
MLflow is an open-source library for managing the life cycle of your machine learning experiments. MLFlow
Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter your experiment's environment: locally on your computer, on a remote compute target, on a virtual machine, or in an Azure Databricks cluster.
Learn more about the MLflow and Azure Machine Learning integration.
TIP
The information in this document is primarily for data scientists and developers who want to monitor the model training
process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning,
such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.
Prerequisites
Install the azureml-mlflow package.
This package automatically brings in azureml-core of the Azure Machine Learning Python SDK,
which provides the connectivity for MLflow to access your workspace.
Create an Azure Machine Learning Workspace.
See which access permissions you need to perform your MLflow operations with your workspace.
Import the mlflow and Workspace classes to access MLflow's tracking URI and configure your workspace.
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Set the MLflow experiment name with set_experiment() and start your training run with start_run() . Then,
use log_metric() to activate the MLflow logging API and begin logging your training run metrics.
experiment_name = 'experiment-with-mlflow-projects'
mlflow.set_experiment(experiment_name)
Create the backend configuration object to store necessary information for the integration, such as the compute target and which type of managed environment to use. For a local run, the configuration can be as simple as a small dictionary, as sketched below.
Add the azureml-mlflow package as a pip dependency to your environment configuration file in order to track
metrics and key artifacts in your workspace.
name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
Submit the local run and ensure you set the parameter backend = "azureml" . With this setting, you can submit
runs locally and get the added support of automatic output tracking, log files, snapshots, and printed errors in
your workspace.
View your runs and metrics in the Azure Machine Learning studio.
local_env_run = mlflow.projects.run(uri=".",
                                    parameters={"alpha": 0.3},
                                    backend="azureml",
                                    use_conda=False,
                                    backend_config=backend_config)
Import the mlflow and Workspace classes to access MLflow's tracking URI and configure your workspace.
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Set the MLflow experiment name with set_experiment() and start your training run with start_run() . Then,
use log_metric() to activate the MLflow logging API and begin logging your training run metrics.
experiment_name = 'train-mlflow-project-amlcompute'
mlflow.set_experiment(experiment_name)
Create the backend configuration object to store necessary information for the integration, such as the compute target and which type of managed environment to use.
The integration accepts "COMPUTE" and "USE_CONDA" as parameters, where "COMPUTE" is set to the name of your remote compute cluster and "USE_CONDA" creates a new environment for the project from the environment configuration file. If "COMPUTE" is present in the object, the project is automatically submitted to the remote compute, and "USE_CONDA" is ignored. MLflow accepts a dictionary object or a JSON file.
# dictionary
backend_config = {"COMPUTE": "cpu-cluster", "USE_CONDA": False}
Add the azureml-mlflow package as a pip dependency to your environment configuration file in order to track
metrics and key artifacts in your workspace.
name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
Submit the mlflow project run and ensure you set the parameter backend = "azureml" . With this setting, you can
submit your run to your remote compute and get the added support of automatic output tracking, log files,
snapshots, and printed errors in your workspace.
View your runs and metrics in the Azure Machine Learning studio.
remote_mlflow_run = mlflow.projects.run(uri=".",
                                        parameters={"alpha": 0.3},
                                        backend="azureml",
                                        backend_config=backend_config)
Clean up resources
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to delete them individually
is currently unavailable. Instead, delete the resource group that contains the storage account and workspace, so
you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
Example notebooks
The MLflow with Azure ML notebooks demonstrate and expand upon concepts presented in this article.
Train an MLflow project on a local compute
Train an MLflow project on remote compute.
NOTE
A community-driven repository of examples using mlflow can be found at https://github.com/Azure/azureml-examples.
Next steps
Deploy models with MLflow.
Monitor your production models for data drift.
Track Azure Databricks runs with MLflow.
Manage your models.
Start, monitor, and track run history in studio
5/25/2022 • 4 minutes to read • Edit Online
You can use Azure Machine Learning studio to monitor, organize, and track your runs for training and
experimentation. Your ML run history is an important part of an explainable and repeatable ML development
process.
This article shows how to do the following tasks:
Add run display name.
Create a custom view.
Add a run description.
Tag and find runs.
Run search over your run history.
Cancel or fail runs.
Monitor the run status by email notification.
TIP
If you're looking for information on using the Azure Machine Learning SDK v1 or CLI v1, see How to track, monitor,
and analyze runs (v1).
If you're looking for information on monitoring training runs from the CLI or SDK v2, see Track experiments with
MLflow and CLI v2.
If you're looking for information on monitoring the Azure Machine Learning service and associated Azure services, see
How to monitor Azure Machine Learning.
If you're looking for information on monitoring models deployed as web services, see Collect model data and Monitor
with Application Insights.
Prerequisites
You'll need the following items:
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
You must have an Azure Machine Learning workspace. A workspace is created in Install, set up, and use the
CLI (v2).
Use the search bar to quickly find runs by searching on run metadata like the run status, descriptions, experiment names, and submitter name.
NOTE
The Azure Log Analytics Workspace is a different type of Azure resource than the Azure Machine Learning service workspace. If there are no options in that list, you can create a Log Analytics Workspace.
4. In the Logs tab, add a New alert rule.
5. See how to create and manage log alerts using Azure Monitor.
Example notebooks
The following notebooks demonstrate the concepts in this article:
To learn more about the logging APIs, see the logging API notebook.
For more information about managing runs with the Azure Machine Learning SDK, see the manage runs
notebook.
Next steps
To learn how to log metrics for your experiments, see Log metrics during training runs.
To learn how to monitor resources and logs from Azure Machine Learning, see Monitoring Azure Machine
Learning.
Track ML experiments and models with MLflow or
the Azure Machine Learning CLI (v2)
5/25/2022 • 7 minutes to read • Edit Online
IMPORTANT
When using the Azure Machine Learning SDK v2, no native logging is provided. Instead, use MLflow's tracking capabilities.
For more information, see How to log and view metrics (v2).
TIP
The information in this document is primarily for data scientists and developers who want to monitor the model training
process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning,
such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.
NOTE
You can use the MLflow Skinny client, which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. This is recommended for users who primarily need the tracking and logging capabilities without importing the full suite of MLflow features, including deployments.
Prerequisites
Install the azureml-mlflow package.
This package automatically brings in azureml-core of the Azure Machine Learning Python SDK,
which provides the connectivity for MLflow to access your workspace.
Create an Azure Machine Learning Workspace.
See which access permissions you need to perform your MLflow operations with your workspace.
Install and set up CLI (v2) and make sure you install the ml extension.
Install and set up SDK(v2) for Python
IMPORTANT
Make sure you are logged in to your Azure account on your local machine, otherwise the tracking URI returns an empty string. If you are using any Azure ML compute, the tracking environment and experiment name are already configured.
MLflow SDK
Terminal
The following code uses mlflow and your Azure Machine Learning workspace details to construct the unique
MLFLow tracking URI associated with your workspace. Then the method set_tracking_uri() points the MLflow
tracking URI to that URI.
tracking_uri = ml_client.workspaces.get(name=workspace).mlflow_tracking_uri
mlflow.set_tracking_uri(tracking_uri)
print(tracking_uri)
MLflow SDK
Terminal
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
# imports
import os
import mlflow

from random import random

# define functions
def main():
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system(f"echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")

# run functions
if __name__ == "__main__":
    # run main function
    main()
Use the Azure Machine Learning CLI (v2) to submit a remote run. When using the Azure Machine Learning CLI (v2), the MLflow tracking URI and experiment name are set automatically and direct the logging from MLflow to your workspace. Learn more about logging Azure Machine Learning CLI (v2) experiments with MLflow.
Create a YAML file with your job definition in a job.yml file. This file should be created outside the src
directory. Copy this code into the file:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python hello-mlflow.py
code: src
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
Open your terminal and use the following to submit the job.
# Assumes finished_mlflow_run is a completed run retrieved via MlflowClient().get_run(run_id)
metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params

print(metrics, tags, params)

# List the run's artifacts (client is an MlflowClient instance)
client.list_artifacts(run_id)

# Preview the experiment's runs (runs is the DataFrame returned by mlflow.search_runs)
runs.head(10)
Automatic logging
With Azure Machine Learning and MLflow, users can log metrics, model parameters, and model artifacts automatically when training a model. A variety of popular machine learning libraries are supported.
To enable automatic logging, insert the following code before your training code:
mlflow.autolog()
Manage models
Register and track your models with the Azure Machine Learning model registry, which supports the MLflow
model registry. Azure Machine Learning models are aligned with the MLflow model schema making it easy to
export and import these models across different workflows. The MLflow-related metadata, such as run ID, is also
tracked with the registered model for traceability. Users can submit training runs, register, and deploy models
produced from MLflow runs.
If you want to deploy and register your production ready model in one step, see Deploy and register MLflow
models.
To register and view a model from a run, use the following steps:
1. Once a run is complete, call the register_model() method.
# The model folder produced from a run is registered. This includes the MLmodel file, model.pkl, and the conda.yaml.
model_path = "model"
model_uri = 'runs:/{}/{}'.format(run_id, model_path)
mlflow.register_model(model_uri, "registered_model_name")
2. View the registered model in your workspace with Azure Machine Learning studio.
In the following example, the registered model my-model has MLflow tracking metadata tagged.
3. Select the Artifacts tab to see all the model files that align with the MLflow model schema (conda.yaml, MLmodel, model.pkl).
4. Select MLmodel to see the MLmodel file generated by the run.
Example files
Use MLflow and CLI (v2)
Limitations
The following MLflow methods are not fully supported with Azure Machine Learning.
mlflow.tracking.MlflowClient.create_experiment()
mlflow.tracking.MlflowClient.rename_experiment()
mlflow.tracking.MlflowClient.search_runs()
mlflow.tracking.MlflowClient.download_artifacts()
mlflow.tracking.MlflowClient.rename_registered_model()
Next steps
Deploy MLflow models to managed online endpoint (preview).
Manage your models.
Track Azure Databricks ML experiments with MLflow
and Azure Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
TIP
The information in this document is primarily for data scientists and developers who want to monitor the model training
process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning,
such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.
Prerequisites
Install the azureml-mlflow package.
This package automatically brings in azureml-core of the Azure Machine Learning Python SDK,
which provides the connectivity for MLflow to access your workspace.
An Azure Databricks workspace and cluster.
Create an Azure Machine Learning Workspace.
See which access permissions you need to perform your MLflow operations with your workspace.
Install libraries
To install libraries on your cluster, navigate to the Libraries tab and select Install New
In the Package field, type azureml-mlflow and then select Install. Repeat this step as necessary to install other packages on your cluster for your experiment.
subscription_id = 'subscription_id'
# Azure Machine Learning resource group NOT the managed resource group
resource_group = 'resource_group_name'
NOTE
MLflow Tracking in a private link enabled Azure Machine Learning workspace is not supported.
Set MLflow Tracking to only track in your Azure Machine Learning workspace
If you prefer to manage your tracked experiments in a centralized location, you can set MLflow tracking to only
track in your Azure Machine Learning workspace.
Include the following code in your script:
uri = ws.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(uri)
In your training script, import mlflow to use the MLflow logging APIs, and start logging your run metrics. The following example logs the epoch loss metric.
import mlflow
mlflow.log_metric('epoch_loss', loss.item())
scoreDf = spark.table({table_name}).where({required_conditions})

# Make predictions by applying the model as a Spark UDF.
# pyfunc_udf would typically be created earlier with mlflow.pyfunc.spark_udf(spark, model_uri);
# the input column names below are illustrative.
preds = (scoreDf
         .withColumn('prediction', pyfunc_udf('input_column_1', 'input_column_2'))
        )

display(preds)
Clean up resources
If you wish to keep your Azure Databricks workspace, but no longer need the Azure ML workspace, you can
delete the Azure ML workspace. This action results in unlinking your Azure Databricks workspace and the Azure
ML workspace.
If you don't plan to use the logged metrics and artifacts in your workspace, the ability to delete them individually
is unavailable at this time. Instead, delete the resource group that contains the storage account and workspace,
so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
Example notebooks
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon concepts presented in this
article.
Next steps
Deploy MLflow models as an Azure web service.
Manage your models.
Track experiment runs with MLflow and Azure Machine Learning.
Learn more about Azure Databricks and MLflow.
Log & view metrics and log files
5/25/2022 • 6 minutes to read • Edit Online
Log real-time information using MLflow Tracking. You can log models, metrics, and artifacts with MLflow, which supports portability from local runs to the cloud.
IMPORTANT
Unlike the Azure Machine Learning SDK v1, there is no logging functionality in the SDK v2 preview.
Logs can help you diagnose errors and warnings, or track performance metrics like parameters and model
performance. In this article, you learn how to enable logging in the following scenarios:
Log training run metrics
Interactive training sessions
Python native logging settings
Logging from additional sources
TIP
This article shows you how to monitor the model training process. If you're interested in monitoring resource usage and
events from Azure Machine learning, such as quotas, completed training runs, or completed model deployments, see
Monitoring Azure Machine Learning.
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning.
You must have an Azure Machine Learning workspace. A workspace is created in Install, set up, and use
the CLI (v2).
You must have the azureml-core, mlflow, and azureml-mlflow packages installed. If you don't, use the following command to install them in your development environment:
Data types
The following table describes how to log specific value types:
Logged value | Example code | Notes
TIP
You do not need to set the tracking URI when using a notebook running on an Azure Machine Learning compute
instance.
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()

# Set the tracking URI to the Azure ML backend
# Not needed if running on Azure ML compute instance
# or compute cluster
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Interactive runs
When training interactively, such as in a Jupyter Notebook, use the following pattern:
1. Create or set the active experiment.
2. Start the run.
3. Use logging methods to log metrics and other information.
4. End the run.
For example, the following code snippet demonstrates setting the tracking URI, creating an experiment, and then logging during a run:
For more information on MLflow logging APIs, see the MLflow reference.
Remote runs
For remote training runs, the tracking URI and experiment are set automatically. Otherwise, the options for
logging the run are the same as for interactive logging:
Call mlflow.start_run() , log information, and then call mlflow.end_run() .
Use the context manager paradigm with mlflow.start_run() .
Call a logging API such as mlflow.log_metric() , which will start a run if one doesn't already exist.
Log a model
To save the model from a training run, use the log_model() API for the framework you're working with. For
example, mlflow.sklearn.log_model(). For frameworks that MLflow doesn't support, see Convert custom models
to MLflow.
You can view the metrics, parameters, and tags for the run in the data field of the run object.
metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params
NOTE
The metrics dictionary under mlflow.entities.Run.data.metrics only returns the most recently logged value for a
given metric name. For example, if you log, in order, 1, then 2, then 3, then 4 to a metric called sample_metric , only 4 is
present in the metrics dictionary for sample_metric .
To get all metrics logged for a particular metric name, you can use MlflowClient.get_metric_history().
user_logs folder
This folder contains information about the user generated logs. This folder is open by default, and the
std_log.txt log is selected. The std_log.txt is where your code's logs (for example, print statements) show up.
This file contains the stdout and stderr logs from your control script and training script, one per process. In most cases, you'll monitor the logs here.
system_logs folder
This folder contains the logs generated by Azure Machine Learning and it will be closed by default. The logs
generated by the system are grouped into different folders, based on the stage of the job in the runtime.
Other folders
For jobs training on multi-compute clusters, logs are present for each node IP. The structure for each node is the
same as single node jobs. There's one more logs folder for overall execution, stderr, and stdout logs.
Azure Machine Learning logs information from various sources during training, such as AutoML or the Docker
container that runs the training job. Many of these logs aren't documented. If you encounter problems and
contact Microsoft support, they may be able to use these logs during troubleshooting.
Next steps
Train ML models with MLflow and Azure Machine Learning.
Migrate from SDK v1 logging to MLflow tracking.
Visualize experiment runs and metrics with
TensorBoard and Azure Machine Learning
5/25/2022 • 7 minutes to read • Edit Online
TIP
The information in this document is primarily for data scientists and developers who want to monitor the model training
process. If you are an administrator interested in monitoring resource usage and events from Azure Machine learning,
such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.
Prerequisites
To launch TensorBoard and view your experiment run histories, your experiments need to have previously enabled logging to track their metrics and performance.
The code in this document can be run in either of the following environments:
Azure Machine Learning compute instance - no downloads or installation necessary
Complete the Quickstart: Get started with Azure Machine Learning to create a dedicated
notebook server pre-loaded with the SDK and the sample repository.
In the samples folder on the notebook server, find two completed and expanded notebooks by
navigating to these directories:
how-to-use-azureml > track-and-monitor-experiments > tensorboard >
expor t-run-histor y-to-tensorboard > expor t-run-histor y-to-
tensorboard.ipynb
how-to-use-azureml > track-and-monitor-experiments > tensorboard >
tensorboard > tensorboard.ipynb
Your own Jupyter notebook server
Install the Azure Machine Learning SDK with the tensorboard extra
Create an Azure Machine Learning workspace.
Create a workspace configuration file.
Option 1: Directly view run history in TensorBoard
This option works for experiments that natively output log files consumable by TensorBoard, such as PyTorch, Chainer, and TensorFlow experiments. If that isn't the case for your experiment, use the export_to_tensorboard() method instead.
The following example code uses the MNIST demo experiment from TensorFlow's repository in a remote
compute target, Azure Machine Learning Compute. Next, we will configure and start a run for training the
TensorFlow model, and then start TensorBoard against this TensorFlow experiment.
Set experiment name and create project folder
Here we name the experiment and create its folder.
from os import path, makedirs

experiment_name = 'tensorboard-demo'

# experiment folder
exp_dir = './sample_projects/' + experiment_name

if not path.exists(exp_dir):
    makedirs(exp_dir)
import requests
import os

tf_code = requests.get("https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py")
with open(os.path.join(exp_dir, "mnist_with_summaries.py"), "w") as file:
    file.write(tf_code.text)
Throughout the MNIST code file, mnist_with_summaries.py, notice that there are lines that call
tf.summary.scalar() , tf.summary.histogram() , tf.summary.FileWriter() etc. These methods group, log, and tag
key metrics of your experiments into run history. The tf.summary.FileWriter() is especially important as it
serializes the data from your logged experiment metrics, which allows for TensorBoard to generate
visualizations off of them.
Configure experiment
In the following, we configure our experiment and set up directories for logs and data. These logs will be
uploaded to the run history, which TensorBoard accesses later.
NOTE
For this TensorFlow example, you will need to install TensorFlow on your local machine. Further, the TensorBoard module
(that is, the one included with TensorFlow) must be accessible to this notebook's kernel, as the local machine is what runs
TensorBoard.
import azureml.core
from azureml.core import Workspace
from azureml.core import Experiment

ws = Workspace.from_config()

# Set up directories for logs and data
logs_dir = os.path.join(os.curdir, "logs")
data_dir = os.path.abspath(os.path.join(os.curdir, "mnist_data"))

if not path.exists(data_dir):
    makedirs(data_dir)

os.environ["TEST_TMPDIR"] = data_dir

# Writing logs to ./logs results in their being uploaded to the run history,
# and thus, made accessible to our TensorBoard instance.
args = ["--log_dir", logs_dir]

# Create an experiment
exp = Experiment(ws, experiment_name)
from azureml.core.compute import AmlCompute, ComputeTarget

cluster_name = "cpu-cluster"

cts = ws.compute_targets
found = False
if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    compute_target = cts[cluster_name]
if not found:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count=None)
NOTE
You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.
src = ScriptRunConfig(source_directory=exp_dir,
                      script='mnist_with_summaries.py',
                      arguments=args,
                      compute_target=compute_target,
                      environment=tf_env)  # tf_env: the TensorFlow curated environment retrieved earlier in the original article
run = exp.submit(src)
Launch TensorBoard
You can launch TensorBoard during your run or after it completes. In the following, we create a TensorBoard
object instance, tb , that takes the experiment run history loaded in the run , and then launches TensorBoard
with the start() method.
The TensorBoard constructor takes an array of runs, so be sure and pass it in as a single-element array.
from azureml.tensorboard import Tensorboard

tb = Tensorboard([run])

# If successful, start() returns a string with the URI of the TensorBoard instance.
tb.start()

# After your job completes, be sure to stop() the streaming otherwise it will continue to run.
tb.stop()
NOTE
While this example used TensorFlow, TensorBoard can be used as easily with PyTorch or Chainer. TensorFlow must be
available on the machine running TensorBoard, but is not necessary on the machine doing PyTorch or Chainer
computations.
Here we load the diabetes dataset, a built-in small dataset that comes with scikit-learn, and split it into test and training sets.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
X, y = load_diabetes(return_X_y=True)
columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
data = {
"train":{"x":x_train, "y":y_train},
"test":{"x":x_test, "y":y_test}
}
alpha = 0.1  # regularization strength (example value)

reg = Ridge(alpha=alpha)
reg.fit(data["train"]["x"], data["train"]["y"])

preds = reg.predict(data["test"]["x"])
mse = mean_squared_error(preds, data["test"]["y"])
# End train and eval

# Directory to export the run history to as TensorBoard logs
logdir = 'exportedTBlogs'
log_path = os.path.join(os.getcwd(), logdir)
try:
    os.stat(log_path)
except os.error:
    os.mkdir(log_path)
print(logdir)

# root_run is the parent run created earlier in the original notebook
root_run.complete()
NOTE
You can also export a particular run to TensorBoard by specifying the name of the run:
export_to_tensorboard(run_name, logdir)

# The TensorBoard constructor takes an array of runs, so be sure and pass it in as a single-element array here
tb = Tensorboard([], local_root=logdir, port=6006)
When you're done, make sure to call the stop() method of the TensorBoard object. Otherwise, TensorBoard will
continue to run until you shut down the notebook kernel.
tb.stop()
Next steps
In this how-to, you created two experiments and learned how to launch TensorBoard against their run histories
to identify areas for potential tuning and retraining.
If you are satisfied with your model, head over to our How to deploy a model article.
Learn more about hyperparameter tuning.
Migrate logging from SDK v1 to SDK v2 (preview)
5/25/2022 • 6 minutes to read • Edit Online
The Azure Machine Learning Python SDK v2 does not provide native logging APIs. Instead, we recommend that
you use MLflow Tracking. If you're migrating from SDK v1 to SDK v2 (preview), use the information in this
section to understand the MLflow equivalents of SDK v1 logging APIs.
Setup
To use MLflow tracking, import mlflow and optionally set the tracking URI for your workspace. If you're training
on an Azure Machine Learning compute resource, such as a compute instance or compute cluster, the tracking
URI is set automatically. If you're using a different compute resource, such as your laptop or desktop, you need
to set the tracking URI.
import mlflow

# The rest of this is only needed if you are not using an Azure ML compute
## Construct AzureML MLFLOW TRACKING URI
def get_azureml_mlflow_tracking_uri(region, subscription_id, resource_group, workspace):
    return "azureml://{}.api.azureml.ms/mlflow/v1.0/subscriptions/{}/resourceGroups/{}/providers/Microsoft.MachineLearningServices/workspaces/{}".format(region, subscription_id, resource_group, workspace)
# SDK v1
azureml_run.log("sample_int_metric", 1)
# SDK v2 with MLflow
mlflow.log_metric("sample_int_metric", 1)

# SDK v1
azureml_run.log("sample_boolean_metric", True)
# SDK v2 with MLflow (booleans are logged as 0/1)
mlflow.log_metric("sample_boolean_metric", 1)

# SDK v1
azureml_run.log("sample_string_metric", "a_metric")
# SDK v2 with MLflow (strings are logged as text artifacts)
mlflow.log_text("sample_string_text", "string.txt")
The string will be logged as an artifact, not as a metric. In Azure Machine Learning studio, the value will be
displayed in the Outputs + logs tab.
Log an image to a PNG or JPEG file
SDK v1
azureml_run.log_image("sample_image", path="Azure.png")
mlflow.log_artifact("Azure.png")
The image is logged as an artifact and will appear in the Images tab in Azure Machine Learning Studio.
Log a matplotlib.pyplot
SDK v1
plt.plot([1, 2, 3])
azureml_run.log_image("sample_pyplot", plot=plt)
plt.plot([1, 2, 3])
fig, ax = plt.subplots()
ax.plot([0, 1], [2, 3])
mlflow.log_figure(fig, "sample_pyplot.png")
The image is logged as an artifact and will appear in the Images tab in Azure Machine Learning Studio.
The mlflow.log_figure method is experimental .
Log a list of metrics
SDK v1
list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]
azureml_run.log_list('sample_list', list_to_log)
list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]

from mlflow.entities import Metric
from mlflow.tracking import MlflowClient
import time

# Log the list as a batch of metric values (mlflow_run is the active MLflow run)
metrics = [Metric(key="sample_list", value=val, timestamp=int(time.time() * 1000), step=0)
           for val in list_to_log]
MlflowClient().log_batch(mlflow_run.info.run_id, metrics=metrics)
# SDK v1
table = {
    "col1": [1, 2, 3],
    "col2": [4, 5, 6]
}
azureml_run.log_table("table", table)

# SDK v2 with MLflow: log the table as a JSON artifact
# Using mlflow.log_artifact
import json

with open("table.json", 'w') as f:
    json.dump(table, f)
mlflow.log_artifact("table.json")
azureml_run.log_accuracy_table('v1_accuracy_table', ACCURACY_TABLE)
mlflow.log_dict(ACCURACY_TABLE, 'mlflow_accuracy_table.json')
azureml_run.log_confusion_matrix('v1_confusion_matrix', json.loads(CONF_MATRIX))
mlflow.log_dict(CONF_MATRIX, 'mlflow_confusion_matrix.json')
azureml_run.log_predictions('test_predictions', json.loads(PREDICTIONS))
mlflow.log_dict(PREDICTIONS, 'mlflow_predictions.json')
RESIDUALS = '{"schema_type": "residuals", "schema_version": "v1", "data": {"bin_edges": [100, 200, 300], ' + \
            '"bin_counts": [0.88, 20, 30, 50.99]}}'
azureml_run.log_residuals('test_residuals', json.loads(RESIDUALS))

RESIDUALS = '{"schema_type": "residuals", "schema_version": "v1", "data": {"bin_edges": [100, 200, 300], ' + \
            '"bin_counts": [0.88, 20, 30, 50.99]}}'
mlflow.log_dict(RESIDUALS, 'mlflow_residuals.json')
The following example shows how to view the metrics , tags , and params :
metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params
NOTE
The metrics will only have the most recently logged value for a given metric. For example, if you log in order a value of
1 , then 2 , 3 , and finally 4 to a metric named sample_metric , only 4 will be present in the metrics dictionary.
To get all metrics logged for a specific named metric, use MlflowClient.get_metric_history:
print(client.get_run(multiple_metrics_run.info.run_id).data.metrics)
print(client.get_metric_history(multiple_metrics_run.info.run_id, "sample_metric"))
The info field provides general information about the run, such as start time, run ID, experiment ID, etc.:
run_start_time = finished_mlflow_run.info.start_time
run_experiment_id = finished_mlflow_run.info.experiment_id
run_id = finished_mlflow_run.info.run_id
client.list_artifacts(finished_mlflow_run.info.run_id)
client.download_artifacts(finished_mlflow_run.info.run_id, "Azure.png")
Next steps
Track ML experiments and models with MLflow
Log and view metrics
Use authentication credential secrets in Azure
Machine Learning training runs
5/25/2022 • 2 minutes to read • Edit Online
Set secrets
In Azure Machine Learning, the Keyvault class contains methods for setting secrets. In your local Python session, first obtain a reference to your workspace Key Vault, and then use the set_secret() method to set a secret by name and value. The set_secret method updates the secret value if the name already exists.
import os
from azureml.core import Workspace

ws = Workspace.from_config()
my_secret = os.environ.get("MY_SECRET")
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name="mysecret", value=my_secret)
Don't put the secret value in your Python code, as it's insecure to store it in a file as cleartext. Instead, obtain the secret value from an environment variable, for example an Azure DevOps build secret, or from interactive user input.
You can list secret names using the list_secrets() method. There's also a batch version, set_secrets(), that allows you to set multiple secrets at a time.
IMPORTANT
Using list_secrets() will only list secrets created through set_secret() or set_secrets() using the Azure ML
SDK. It will not list secrets created by something other than the SDK. For example, a secret created using the Azure portal
or Azure PowerShell will not be listed.
You can use get_secret() to get a secret value from the key vault, regardless of how it was created. So you can retrieve
secrets that are not listed by list_secrets() .
Get secrets
In your local code, you can use the get_secret() method to get the secret value by name.
For runs submitted using Experiment.submit, use the get_secret() method with the Run class. Because a submitted run is aware of its workspace, this method shortcuts the Workspace instantiation and returns the secret value directly.
run = Run.get_context()
secret_value = run.get_secret(name="mysecret")
Next steps
View example notebook
Learn about enterprise security with Azure Machine Learning
Train models with the CLI (v2)
5/25/2022 • 21 minutes to read • Edit Online
Prerequisites
To use the CLI (v2), you must have an Azure subscription. If you don't have an Azure subscription, create a
free account before you begin. Try the free or paid version of Azure Machine Learning today.
Install and set up CLI (v2).
TIP
For a full-featured development environment with schema validation and autocompletion for job YAMLs, use Visual
Studio Code and the Azure Machine Learning extension.
Using --depth 1 clones only the latest commit to the repository, which reduces time to complete the operation.
Create compute
You can create an Azure Machine Learning compute cluster from the command line. For instance, the following
commands will create one cluster named cpu-cluster and one named gpu-cluster .
You are not charged for compute at this point as cpu-cluster and gpu-cluster will remain at zero nodes until a
job is submitted. Learn more about how to manage and optimize cost for AmlCompute.
The following example jobs in this article use one of cpu-cluster or gpu-cluster . Adjust these names in the
example jobs throughout this article as needed to the name of your cluster(s). Use az ml compute create -h for
more details on compute create options.
Hello world
For the Azure Machine Learning CLI (v2), jobs are authored in YAML format. A job aggregates:
What to run
How to run it
Where to run it
The "hello world" job has all three:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
compute: azureml:cpu-cluster
WARNING
Python must be installed in the environment used for jobs. Run apt-get update -y && apt-get install python3 -y
in your Dockerfile to install if needed, or derive from a base image with Python installed already.
TIP
The $schema: throughout examples allows for schema validation and autocompletion if authoring YAML files in VSCode
with the Azure Machine Learning extension.
TIP
The --web parameter will attempt to open your job in the Azure Machine Learning studio using your default web
browser. The --stream parameter can be used to stream logs to the console and block further commands.
Job names
Most az ml job commands other than create and list require --name/-n , which is a job's name or "Run ID"
in the studio. You typically shouldn't set a job's name property directly during creation, as it must be unique per workspace. If the name isn't set, Azure Machine Learning generates a random GUID for the job name, which you can obtain from the output of job creation in the CLI or by copying the "Run ID" property in the studio and MLflow APIs.
To automate jobs in scripts and CI/CD flows, you can capture a job's name when it is created by querying and
stripping the output by adding --query name -o tsv . The specifics will vary by shell, but for Bash:
Organize jobs
To organize jobs, you can set a display name, experiment name, description, and tags. Descriptions support
markdown syntax in the studio. These properties are mutable after a job is created. A full example:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
compute: azureml:cpu-cluster
tags:
  hello: world
display_name: hello-world-example
experiment_name: hello-world-example
description: |
  # Azure Machine Learning "hello world" job

  This is a "hello world" job running in the cloud via Azure Machine Learning!

  ## Description

  Markdown is supported in the studio for job descriptions! You can edit the description there or via CLI.
You can run this job, where these properties will be immediately visible in the studio:
Using --set you can update the mutable values after the job is created:
Environment variables
You can set environment variables for use in your job:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo $hello_env_var
environment:
  image: library/python:latest
compute: azureml:cpu-cluster
environment_variables:
  hello_env_var: "hello world"
WARNING
You should use inputs for parameterizing arguments in the command . See inputs and outputs.
TIP
If you're following along and running from the examples repository, you can see the source repository and commit in the
studio on any of the jobs run so far.
You can specify the code field in a job with the value as the path to a source code directory. A snapshot of the
directory is taken and uploaded with the job. The contents of the directory are directly available from the
working directory of the job.
WARNING
The source code should not include large data inputs for model training. Instead, use data inputs. You can use a
.gitignore file in the source code directory to exclude files from the snapshot. The limits for snapshot size are 300 MB
or 2000 files.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python hello-mlflow.py
code: src
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
The Python script is in the local source code directory. The command then invokes python to run the script. The
same pattern can be applied for other programming languages.
WARNING
The "hello" family of jobs shown in this article are for demonstration purposes and do not necessarily follow
recommended best practices. Using && or similar to run many commands in a sequence is not recommended -- instead,
consider writing the commands to a script file in the source code directory and invoking the script in your command .
Installing dependencies in the command , as shown above via pip install , is not recommended -- instead, all job
dependencies should be specified as part of your environment. See how to manage environments with the CLI (v2) for
details.
WARNING
The mlflow and azureml-mlflow packages must be installed in your Python environment for MLflow tracking features.
TIP
The mlflow.autolog() call is supported for many popular frameworks and takes care of the majority of logging for you.
Let's take a look at the Python script invoked in the job above, which uses mlflow to log a parameter, a metric, and an artifact:
# imports
import os
import mlflow

from random import random

# define functions
def main():
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system(f"echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")

# run functions
if __name__ == "__main__":
    # run main function
    main()
You can run this job in the cloud via Azure Machine Learning, where it is tracked and auditable:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo ${{inputs.hello_string}}
  echo ${{inputs.hello_number}}
environment:
  image: library/python:latest
inputs:
  hello_string: "hello world"
  hello_number: 42
compute: azureml:cpu-cluster
Literal inputs to jobs can be converted to search space inputs for hyperparameter sweeps on model training.
Search space inputs
For a sweep job, you can specify a search space for literal inputs to be chosen from. For the full range of options
for search space inputs, see the sweep job YAML syntax reference.
Let's demonstrate the concept with a simple Python script that takes in arguments and logs a random metric:
# imports
import os
import mlflow
import argparse

from random import random

# define functions
def main(args):
    # print inputs
    print(f"A: {args.A}")
    print(f"B: {args.B}")
    print(f"C: {args.C}")

    # log inputs as parameters
    mlflow.log_param("A", args.A)
    mlflow.log_param("B", args.B)
    mlflow.log_param("C", args.C)

    # log a random metric
    mlflow.log_metric("random_metric", random())

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--A", type=float, default=0.5)
    parser.add_argument("--B", type=str, default="hello world")
    parser.add_argument("--C", type=float, default=1.0)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # run main function
    main(args)
Data inputs
Data inputs are resolved to a path on the job compute's local filesystem. Let's demonstrate with the classic Iris
dataset, which is hosted publicly in a blob container at
https://azuremlexamples.blob.core.windows.net/datasets/iris.csv .
You can author a Python script that takes the path to the Iris CSV file as an argument, reads it into a dataframe,
prints the first 5 lines, and saves it to the outputs directory.
# imports
import os
import argparse

import pandas as pd

# define functions
def main(args):
    # read in data
    df = pd.read_csv(args.iris_csv)

    # print first 5 lines
    print(df.head())

    # ensure outputs directory exists
    os.makedirs("outputs", exist_ok=True)

    # save data to outputs
    df.to_csv("outputs/iris.csv", index=False)

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--iris-csv", type=str)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # run main function
    main(args)
Azure storage URI inputs can be specified, which will mount or download data to the local filesystem. You can
specify a single file:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo "--iris-csv: ${{inputs.iris_csv}}"
  python hello-iris.py --iris-csv ${{inputs.iris_csv}}
code: src
inputs:
  iris_csv:
    type: uri_file
    path: https://azuremlexamples.blob.core.windows.net/datasets/iris.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
And run:
Make sure you accurately specify the input type field to either type: uri_file or type: uri_folder
corresponding to whether the data points to a single file or a folder. The default if the type field is omitted is
uri_folder .
Private data
For private data in Azure Blob Storage or Azure Data Lake Storage connected to Azure Machine Learning
through a datastore, you can use Azure Machine Learning URIs of the format
azureml://datastores/<DATASTORE_NAME>/paths/<PATH_TO_DATA> for input data. For instance, if you upload the Iris
CSV to a directory named /example-data/ in the Blob container corresponding to the datastore named
workspaceblobstore you can modify a previous job to use the file in the datastore:
WARNING
Running these jobs will fail for you if you have not copied the Iris CSV to the same location in workspaceblobstore .
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo "--iris-csv: ${{inputs.iris_csv}}"
  python hello-iris.py --iris-csv ${{inputs.iris_csv}}
code: src
inputs:
  iris_csv:
    type: uri_file
    path: azureml://datastores/workspaceblobstore/paths/example-data/iris.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
Default outputs
The ./outputs and ./logs directories receive special treatment by Azure Machine Learning. If you write any
files to these directories during your job, these files will get uploaded to the job so that you can still access them
once the job is complete. The ./outputs folder is uploaded at the end of the job, while the files written to
./logs are uploaded in real time. Use the latter if you want to stream logs during the job, such as TensorBoard
logs.
In addition, any files logged from MLflow via autologging or mlflow.log_* for artifact logging will get
automatically persisted as well. Collectively with the aforementioned ./outputs and ./logs directories, this set
of files and directories will be persisted to a directory that corresponds to that job's default artifact location.
You can modify the "hello world" job to output to a file in the default outputs directory instead of printing to
stdout :
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world" > ./outputs/helloworld.txt
environment:
image: library/python:latest
compute: azureml:cpu-cluster
And download the logs, where helloworld.txt will be present in the <RUN_ID>/outputs/ directory:
Data outputs
You can specify named data outputs. This will create a directory in the default datastore which will be read/write
mounted by default.
You can modify the earlier "hello world" job to write to a named data output:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world" > ${{outputs.hello_output}}/helloworld.txt
outputs:
  hello_output:
environment:
  image: python
compute: azureml:cpu-cluster
Hello pipelines
Pipeline jobs can run multiple jobs in parallel or in sequence. If there are input/output dependencies between
steps in a pipeline, the dependent step will run after the other completes.
You can split a "hello world" job into two jobs:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: hello_pipeline
jobs:
  hello_job:
    command: echo "hello"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    compute: azureml:cpu-cluster
  world_job:
    command: echo "world"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    compute: azureml:cpu-cluster
The "hello" and "world" jobs respectively will run in parallel if the compute target has the available resources to
do so.
To pass data between steps in a pipeline, define a data output in the "hello" job and a corresponding input in the
"world" job, which refers to the prior's output:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: hello_pipeline_io
jobs:
  hello_job:
    command: echo "hello" && echo "world" > ${{outputs.world_output}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    compute: azureml:cpu-cluster
    outputs:
      world_output:
  world_job:
    command: cat ${{inputs.world_input}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:23
    compute: azureml:cpu-cluster
    inputs:
      world_input: ${{parent.jobs.hello_job.outputs.world_output}}
This time, the "world" job will run after the "hello" job completes.
To avoid duplicating common settings across jobs in a pipeline, you can set them outside the jobs:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: hello_pipeline_settings
settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:cpu-cluster
jobs:
  hello_job:
    command: echo 202204190 & echo "hello"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:23
  world_job:
    command: echo 202204190 & echo "world"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:23
The corresponding setting on an individual job will override the common settings for a pipeline job. The
concepts so far can be combined into a three-step pipeline job with jobs "A", "B", and "C". The "C" job has a data
dependency on the "B" job, while the "A" job can run independently. The "A" job will also use an individually set
environment and bind one of its inputs to a top-level pipeline job input:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: hello_pipeline_abc
compute: azureml:cpu-cluster
inputs:
  hello_string_top_level_input: "hello world"
jobs:
  a:
    command: echo hello ${{inputs.hello_string}}
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    inputs:
      hello_string: ${{parent.inputs.hello_string_top_level_input}}
  b:
    command: echo "world" >> ${{outputs.world_output}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    outputs:
      world_output:
  c:
    command: echo ${{inputs.world_input}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    inputs:
      world_input: ${{parent.jobs.b.outputs.world_output}}
Train a model
In Azure Machine Learning, there are two ways to train a model:
1. Use automated ML to train models on your data and let the service find the best model for you. This
approach maximizes productivity by automating the iterative process of tuning hyperparameters and trying out
different algorithms.
2. Train a model with your own custom training script. This approach offers the most control and lets you
customize your training.
Train a model with automated ML
Automated ML is the easiest way to train a model because you don't need to know exactly how the training
algorithms work. You provide your training/validation/test datasets and some basic configuration parameters,
such as the ML task, target column, primary metric, and timeout, and the service trains multiple models,
trying out various algorithms and hyperparameter combinations for you.
When you train with automated ML via the CLI (v2), you create a YAML file with an AutoML configuration and
provide it to the CLI for training job creation and submission.
The following example shows an AutoML configuration file for training a classification model where:
The primary metric is accuracy.
The training times out after 180 minutes.
The data for training is in the folder "./training-mltable-folder". Automated ML jobs only accept data in the
form of an MLTable .
$schema: https://azuremlsdk2.blob.core.windows.net/preview/0.0.1/autoMLJob.schema.json
type: automl
experiment_name: dpv2-cli-automl-classifier-experiment
# name: dpv2-cli-classifier-train-job-basic-01
description: A Classification job using bank marketing
compute: azureml:cpu-cluster
task: classification
primary_metric: accuracy
target_column_name: "y"
training_data:
  path: "./training-mltable-folder"
  type: mltable
limits:
  timeout_minutes: 180
  max_trials: 40
  enable_early_termination: true
featurization:
  mode: auto
The MLTable definition is what points to the training data file, in this case a local .csv file that is
uploaded automatically:
paths:
  - file: ./bank_marketing_train_data.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: 'ascii'
Finally, you can run it (create the AutoML job) with this CLI command:
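A minimal sketch, assuming the AutoML configuration above is saved as ./automl-classification-job.yml:
az ml job create --file ./automl-classification-job.yml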
To investigate additional AutoML model training examples that use other ML tasks, such as regression, time-series
forecasting, image classification, object detection, and NLP text classification, see the complete list of AutoML
CLI examples.
Train a model with a custom script
When training by using your own custom script, the first thing you need is that Python script (.py). The
following script uses scikit-learn with MLflow tracking to train a model on the Iris CSV:
# imports
import os
import mlflow
import argparse

import pandas as pd

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# define functions
def main(args):
    # enable auto logging
    mlflow.autolog()

    # setup parameters
    params = {
        "C": args.C,
        "kernel": args.kernel,
        "degree": args.degree,
        "gamma": args.gamma,
        "coef0": args.coef0,
        "shrinking": args.shrinking,
        "probability": args.probability,
        "tol": args.tol,
        "cache_size": args.cache_size,
        "class_weight": args.class_weight,
        "verbose": args.verbose,
        "max_iter": args.max_iter,
        "decision_function_shape": args.decision_function_shape,
        "break_ties": args.break_ties,
        "random_state": args.random_state,
    }

    # read in data
    df = pd.read_csv(args.iris_csv)

    # process data
    X_train, X_test, y_train, y_test = process_data(df, args.random_state)

    # train model
    model = train_model(params, X_train, X_test, y_train, y_test)


def process_data(df, random_state):
    # split dataframe into X and y ("species" is the label column in the Iris CSV)
    X = df.drop(["species"], axis=1)
    y = df["species"]

    # train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # return split data
    return X_train, X_test, y_train, y_test


def train_model(params, X_train, X_test, y_train, y_test):
    # train an SVM classifier with the given parameters
    model = SVC(**params)
    model = model.fit(X_train, y_train)

    # return model
    return model


def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--iris-csv", type=str)
    parser.add_argument("--C", type=float, default=1.0)
    parser.add_argument("--kernel", type=str, default="rbf")
    parser.add_argument("--degree", type=int, default=3)
    parser.add_argument("--gamma", type=str, default="scale")
    parser.add_argument("--coef0", type=float, default=0)
    parser.add_argument("--shrinking", type=bool, default=False)
    parser.add_argument("--probability", type=bool, default=False)
    parser.add_argument("--tol", type=float, default=1e-3)
    parser.add_argument("--cache_size", type=float, default=1024)
    parser.add_argument("--class_weight", type=dict, default=None)
    parser.add_argument("--verbose", type=bool, default=False)
    parser.add_argument("--max_iter", type=int, default=-1)
    parser.add_argument("--decision_function_shape", type=str, default="ovr")
    parser.add_argument("--break_ties", type=bool, default=False)
    parser.add_argument("--random_state", type=int, default=42)

    # parse args
    args = parser.parse_args()

    # return args
    return args


# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # run main function
    main(args)
The scikit-learn framework is supported by MLflow for autologging, so a single mlflow.autolog() call in the
script will log all model parameters, training metrics, model artifacts, and some extra artifacts (in this case a
confusion matrix image).
To run this script in the cloud, specify it as a job:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python main.py
  --iris-csv ${{inputs.iris_csv}}
  --C ${{inputs.C}}
  --kernel ${{inputs.kernel}}
  --coef0 ${{inputs.coef0}}
inputs:
  iris_csv:
    type: uri_file
    path: wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv
  C: 0.8
  kernel: "rbf"
  coef0: 0.1
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
compute: azureml:cpu-cluster
display_name: sklearn-iris-example
experiment_name: sklearn-iris-example
description: Train a scikit-learn SVM on the Iris dataset.
To register a model, you can upload the model files from the run to the model registry:
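A minimal sketch using the CLI (v2); the model name and runs: path are assumptions, and $run_id holds the
completed job's name:
az ml model create --name sklearn-iris-example --version 1 --path runs:/$run_id/model --type mlflow_model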
For the full set of configurable options for running command jobs, see the command job YAML schema
reference.
Sweep hyperparameters
You can modify the previous job to sweep over hyperparameters:
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  code: src
  command: >-
    python main.py
    --iris-csv ${{inputs.iris_csv}}
    --C ${{search_space.C}}
    --kernel ${{search_space.kernel}}
    --coef0 ${{search_space.coef0}}
  environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
inputs:
  iris_csv:
    type: uri_file
    path: wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv
compute: azureml:cpu-cluster
sampling_algorithm: random
search_space:
  C:
    type: uniform
    min_value: 0.5
    max_value: 0.9
  kernel:
    type: choice
    values: ["rbf", "linear", "poly"]
  coef0:
    type: uniform
    min_value: 0.1
    max_value: 1
objective:
  goal: minimize
  primary_metric: training_f1_score
limits:
  max_total_trials: 20
  max_concurrent_trials: 10
  timeout: 7200
display_name: sklearn-iris-sweep-example
experiment_name: sklearn-iris-sweep-example
description: Sweep hyperparameters for training a scikit-learn SVM on the Iris dataset.
TIP
Check the "Child runs" tab in the studio to monitor progress and view parameter charts..
For the full set of configurable options for sweep jobs, see the sweep job YAML schema reference.
Distributed training
Azure Machine Learning supports PyTorch, TensorFlow, and MPI-based distributed training. See the distributed
section of the command job YAML syntax reference for details.
As an example, you can train a convolutional neural network (CNN) on the CIFAR-10 dataset using distributed
PyTorch. The full script is available in the examples repository.
The CIFAR-10 dataset in torchvision expects as input a directory that contains the cifar-10-batches-py
directory. You can download the zipped source and extract into a local directory:
mkdir data
wget "https://azuremlexamples.blob.core.windows.net/datasets/cifar-10-python.tar.gz"
tar -xvzf cifar-10-python.tar.gz -C data
Then create an Azure Machine Learning data asset from the local directory, which will be uploaded to the default
datastore:
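A sketch using the CLI (v2); the asset name and version match the reference used later in this article:
az ml data create --name cifar-10-example --version 1 --type uri_folder --path ./data
You can then remove the local copies: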
rm cifar-10-python.tar.gz
rm -r data
Registered data assets can be used as inputs to jobs by using the path field of a job input. The format is
azureml:<data_name>:<data_version> , so for the CIFAR-10 data asset just created, it is azureml:cifar-10-example:1 .
You can optionally use the azureml:<data_name>@latest syntax instead if you want to reference the latest version
of the data asset. Azure ML will resolve that reference to the explicit version.
With the data asset in place, you can author a distributed PyTorch job to train the model:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python train.py
  --epochs ${{inputs.epochs}}
  --learning-rate ${{inputs.learning_rate}}
  --data-dir ${{inputs.cifar}}
inputs:
  epochs: 1
  learning_rate: 0.2
  cifar:
    type: uri_folder
    path: azureml:cifar-10-example:1
environment: azureml:AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu@latest
compute: azureml:gpu-cluster
distribution:
  type: pytorch
  process_count_per_instance: 1
resources:
  instance_count: 2
display_name: pytorch-cifar-distributed-example
experiment_name: pytorch-cifar-distributed-example
description: Train a basic convolutional neural network (CNN) with PyTorch on the CIFAR-10 dataset,
distributed via PyTorch.
Next steps
Deploy and score a machine learning model with a managed online endpoint (preview)
Train models with REST (preview)
5/25/2022 • 6 minutes to read • Edit Online
Learn how to use the Azure Machine Learning REST API to create and manage training jobs (preview).
The REST API uses standard HTTP verbs to create, retrieve, update, and delete resources. The REST API works
with any language or tool that can make HTTP requests. REST's straightforward structure makes it a good choice
in scripting environments and for MLOps automation.
In this article, you learn how to use the new REST APIs to:
Create machine learning assets
Create a basic training job
Create a hyperparameter tuning sweep job
Prerequisites
An Azure subscription for which you have administrative rights. If you don't have such a subscription, try
the free or paid personal subscription.
An Azure Machine Learning workspace.
A service principal in your workspace. Administrative REST requests use service principal authentication.
A service principal authentication token. Follow the steps in Retrieve a service principal authentication token
to retrieve this token.
The curl utility. The curl program is available in the Windows Subsystem for Linux or any UNIX distribution.
In PowerShell, curl is an alias for Invoke-WebRequest and curl -d "key=val" -X POST uri becomes
Invoke-WebRequest -Body "key=val" -Method POST -Uri uri .
All requests in this article use the same API version, set once as a shell variable:
API_VERSION="2022-02-01-preview"
Compute
Running machine learning jobs requires compute resources. You can list your workspace's compute resources:
curl "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN"
For this example, we use an existing compute cluster named cpu-cluster . We set the compute name as a
variable for reuse:
COMPUTE_NAME="cpu-cluster"
TIP
You can create or overwrite a named compute resource with a PUT request.
Environment
The LightGBM example needs to run in a LightGBM environment. Create the environment with a PUT request, using
a Docker image from Microsoft Container Registry.
You can configure the Docker image with image and add conda dependencies with condaFile :
ENV_VERSION=$RANDOM
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/environments/lightgbm-environment/versions/$ENV_VERSION?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\":{
        \"condaFile\": \"$CONDA_FILE\",
        \"image\": \"mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04\"
    }
}"
Datastore
The training job needs to run on data, so you need to specify a datastore. In this example, you get the default
datastore and Azure Storage account for your workspace. Query your workspace with a GET request to return a
JSON file with the information.
You can use the tool jq to parse the JSON result and get the required values. You can also use the Azure portal to
find the same information.
response=$(curl --location --request GET "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/datastores?api-version=$API_VERSION&isDefault=true" \
--header "Authorization: Bearer $TOKEN")
Data
Now that you have the datastore, you can create a dataset. For this example, use the common dataset iris.csv .
DATA_VERSION=$RANDOM
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/data/iris-data/versions/$DATA_VERSION?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\": {
        \"description\": \"Iris dataset\",
        \"dataType\": \"UriFile\",
        \"dataUri\": \"https://azuremlexamples.blob.core.windows.net/datasets/iris.csv\"
    }
}"
Code
Now that you have the dataset and datastore, you can upload the training script that will run on the job. Use the
Azure Storage CLI to upload a blob into your default container. You can also use other methods to upload, such
as the Azure portal or Azure Storage Explorer.
Once you upload your code, you can specify your code with a PUT request and reference the url through
codeUri .
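A sketch of that PUT request; the codeUri below assumes the script was uploaded to a train folder in the
default container:
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/codes/train-lightgbm/versions/1?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\": {
        \"codeUri\": \"https://$AZURE_STORAGE_ACCOUNT.blob.core.windows.net/$AZUREML_DEFAULT_CONTAINER/train\"
    }
}"
Submit a training job
With the assets in place, you can submit the LightGBM training job with a PUT request; the job name must be
unique, so a UUID is generated: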
run_id=$(uuidgen)
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/jobs/$run_id?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\": {
        \"jobType\": \"Command\",
        \"codeId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/codes/train-lightgbm/versions/1\",
        \"command\": \"python main.py --iris-csv \$AZURE_ML_INPUT_iris\",
        \"environmentId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/environments/lightgbm-environment/versions/$ENV_VERSION\",
        \"inputDataBindings\": {
            \"iris\": {
                \"jobInputType\": \"UriFile\",
                \"uri\": \"https://azuremlexamples.blob.core.windows.net/datasets/iris.csv\"
            }
        },
        \"experimentName\": \"lightgbm-iris\",
        \"computeId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes/$COMPUTE_NAME\"
    }
}"
To create a hyperparameter tuning sweep job, submit a similar PUT request with jobType Sweep , a search space,
and tuning limits:
run_id=$(uuidgen)
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/jobs/$run_id?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\": {
        \"jobType\": \"Sweep\",
        \"samplingAlgorithm\": {
            \"samplingAlgorithmType\": \"Random\"
        },
        \"trial\": {
            \"codeId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/codes/train-lightgbm/versions/1\",
            \"command\": \"python main.py --iris-csv \$AZURE_ML_INPUT_iris\",
            \"environmentId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/environments/lightgbm-environment/versions/$ENV_VERSION\"
        },
        \"experimentName\": \"lightgbm-iris-sweep\",
        \"computeId\": \"/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes/$COMPUTE_NAME\",
        \"objective\": {
            \"primaryMetric\": \"test-multi_logloss\",
            \"goal\": \"minimize\"
        },
        \"searchSpace\": {
            \"learning_rate\": [\"uniform\", [0.01, 0.9]],
            \"boosting\": [\"choice\", [[\"gbdt\", \"dart\"]]]
        },
        \"limits\": {
            \"jobLimitsType\": \"sweep\",
            \"maxTotalTrials\": 20,
            \"maxConcurrentTrials\": 10
        }
    }
}"
Next steps
Now that you have a trained model, learn how to deploy your model.
Manage Azure Machine Learning environments
with the CLI (v2)
5/25/2022 • 7 minutes to read • Edit Online
Prerequisites
To use the CLI, you must have an Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine Learning today.
Install and set up the Azure CLI extension for Machine Learning
TIP
For a full-featured development environment, use Visual Studio Code and the Azure Machine Learning extension to
manage Azure Machine Learning resources and train machine learning models.
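The examples in this article assume a local clone of the azureml-examples repository; a sketch:
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli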
Note that --depth 1 clones only the latest commit to the repository which reduces time to complete the
operation.
Curated environments
There are two types of environments in Azure ML: curated and custom environments. Curated environments are
predefined environments containing popular ML frameworks and tooling. Custom environments are user-
defined and can be created via az ml environment create .
Curated environments are provided by Azure ML and are available in your workspace by default. Azure ML
routinely updates these environments with the latest framework version releases and maintains them for bug
fixes and security patches. They are backed by cached Docker images, which reduces job preparation cost and
model deployment time.
You can use these curated environments out of the box for training or deployment by referencing a specific
environment using the azureml:<curated-environment-name>:<version> or
azureml:<curated-environment-name>@latest syntax. You can also use them as reference for your own custom
environments by modifying the Dockerfiles that back these curated environments.
You can see the set of available curated environments in the Azure ML studio UI, or by using the CLI (v2) via
az ml environment list .
Create an environment
You can define an environment from a conda specification, Docker image, or Docker build context. Configure the
environment using a YAML specification file and create the environment using the following CLI command:
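A sketch, assuming the environment definition is saved as env.yml:
az ml environment create --file env.yml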
For the YAML reference documentation for Azure ML environments, see CLI (v2) environment YAML schema.
Create an environment from a Docker image
To define an environment from a Docker image, provide the image URI of the image hosted in a registry such as
Docker Hub or Azure Container Registry.
The following example is a YAML specification file for an environment defined from a Docker image. An image
from the official PyTorch repository on Docker Hub is specified via the image property in the YAML file.
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-example
image: pytorch/pytorch:latest
description: Environment created from a Docker image.
TIP
Azure ML maintains a set of CPU and GPU Ubuntu Linux-based base images with common system dependencies. For
example, the GPU images contain Miniconda, OpenMPI, CUDA, cuDNN, and NCCL. You can use these images for your
environments, or use their corresponding Dockerfiles as reference when building your own custom images.
For the set of base images and their corresponding Dockerfiles, see the AzureML-Containers repo.
Create an environment from a Docker build context
You can also define an environment from a Docker build context, specifying the directory that contains the
Dockerfile via the build.path property:
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-context-example
build:
  path: docker-contexts/python-and-pip
To create the environment:
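A sketch, assuming the YAML above is saved as docker-context.yml:
az ml environment create --file docker-context.yml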
Azure ML will start building the image from the build context when the environment is created. You can monitor
the status of the build and view the build logs in the studio UI.
Create an environment from a conda specification
You can define an environment using a standard conda YAML configuration file that includes the dependencies
for the conda environment. See Creating an environment manually for information on this standard format.
You must also specify a base Docker image for this environment. Azure ML builds the conda environment on
top of the Docker image provided. If you install Python dependencies directly in your Docker image, those
packages will not be available in the execution environment, which causes runtime failures. By default, Azure ML
builds a conda environment with the dependencies you specified, and executes the run in that environment
instead of using any Python libraries that you installed on the base image.
The following example is a YAML specification file for an environment defined from a conda specification. Here
the relative path to the conda file from the Azure ML environment YAML file is specified via the conda_file
property. You can alternatively define the conda specification inline using the conda_file property, rather than
defining it in a separate file.
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
conda_file: conda-yamls/pydata.yml
description: Environment created from a Docker image plus Conda environment.
Azure ML will build the final Docker image from this environment specification when the environment is used in
a job or deployment. You can also manually trigger a build of the environment in the studio UI.
Manage environments
The CLI (v2) provides a set of commands under az ml environment for managing the lifecycle of your Azure ML
environment assets.
List
List all the environments in your workspace:
az ml environment list
Show
Get the details of a specific environment:
az ml environment show --name docker-image-example --version 1
Update
Update mutable properties of a specific environment:
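A sketch of updating the description:
az ml environment update --name docker-image-example --version 1 --set description="This is an updated description."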
IMPORTANT
For environments, only description and tags can be updated. All other properties are immutable; if you need to
change any of those properties you should create a new version of the environment.
You can restore an archived environment so that it's no longer hidden from list queries.
If an entire environment container is archived, you can restore that archived container. You can't restore only
a specific environment version if the entire environment container is archived; you'll need to restore the
entire container.
Restore an environment container:
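A sketch; omitting the version is assumed to act on the container:
az ml environment restore --name docker-image-example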
If only individual environment version(s) within an environment container are archived, you can restore those
individual version(s).
Restore a specific environment version:
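az ml environment restore --name docker-image-example --version 1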
Next steps
Train models (create jobs) with the CLI (v2)
Deploy and score a machine learning model by using a managed online endpoint
Environment YAML schema reference
Set up AutoML training with the Azure ML Python
SDK v2 (preview)
5/25/2022 • 15 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this guide, learn how to set up an automated machine learning, AutoML, training job with the Azure Machine
Learning Python SDK v2 (preview). Automated ML picks an algorithm and hyperparameters for you and
generates a model ready for deployment. This guide provides details of the various options that you can use to
configure automated ML experiments.
If you prefer a no-code experience, you can also Set up no-code AutoML training in the Azure Machine Learning
studio.
If you prefer to submit training jobs with the Azure Machine learning CLI v2 extension, see Train models with the
CLI (v2).
Prerequisites
For this article you need:
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
The Azure Machine Learning Python SDK v2 (preview) installed. To install the SDK you can either:
Create a compute instance, which already has the latest AzureML Python SDK installed and is pre-
configured for ML workflows. See Create and manage an Azure Machine Learning compute
instance for more information.
Use the following commands to install the Azure ML Python SDK v2, first uninstalling any previous
preview version:
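A sketch; the preview package may also require a specific package feed, so check the SDK v2 installation
instructions:
pip uninstall azure-ai-ml
pip install azure-ai-ml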
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

credential = DefaultAzureCredential()
ml_client = None
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
paths:
  - file: ./bank_marketing_train_data.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: 'ascii'
Therefore, the MLTable folder would have the MLTable definition file plus the data file (the
bank_marketing_train_data.csv file in this case).
The following shows two ways of creating an MLTable, as sketched below:
A. Providing your training data and MLTable definition file from your local folder; it'll be automatically
uploaded into the cloud (default workspace datastore).
B. Providing an MLTable already registered and uploaded into the cloud.
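A minimal sketch of both options with the SDK v2 Input class; the paths are assumptions:
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import Input

# A. Local MLTable folder; the data is uploaded automatically at submit time
my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./training-mltable-folder"
)

# B. MLTable previously registered in the workspace
# my_training_data_input = Input(
#     type=AssetTypes.MLTABLE, path="azureml:bank-marketing-train-data:1"
# )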
Larger than 20,000 rows: Train/validation data split is applied. The default is to take 10% of the initial
training data set as the validation set. In turn, that validation set is used for metrics calculation.
Smaller than 20,000 rows: Cross-validation approach is applied. The default number of folds depends on the
number of rows. If the dataset is less than 1,000 rows, 10 folds are used. If the rows are between 1,000 and
20,000, then three folds are used.
Large data
Automated ML supports a limited number of algorithms for training on large data that can successfully build
models for big data on small virtual machines. Automated ML heuristics depend on properties such as data size,
virtual machine memory size, experiment timeout and featurization settings to determine if these large data
algorithms should be applied. Learn more about what models are supported in automated ML.
For regression, Online Gradient Descent Regressor and Fast Linear Regressor
For classification, Averaged Perceptron Classifier and Linear SVM Classifier; where the Linear SVM
classifier has both large data and small data versions.
If you want to override these heuristics, apply the following settings:
Block data streaming algorithms: Use the blocked_algorithms parameter in the set_training() function and list
the model(s) you don't want to use. Note that this results in either run failure or long run time.
Use data streaming algorithms (studio UI experiments): Block all models except the big data algorithms you
want to use.
The following example shows the required parameters for a classification task that specifies accuracy as the
primary metric and 5 cross-validation folds.
classification_job = automl.classification(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    target_column_name="y",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)

classification_job.set_limits(
    timeout=600,  # total experiment timeout, in minutes
    trial_timeout=20,  # per-trial timeout, in minutes
    max_trials=max_trials,
    # max_concurrent_trials=4,
    # max_cores_per_trial=-1,
    enable_early_termination=True,
)
Primary metrics for classification scenarios include accuracy , AUC_weighted ,
average_precision_score_weighted , norm_macro_recall , and precision_score_weighted .
You can also see the enums to use in Python in this reference page for ClassificationPrimaryMetrics Enum
Metrics for classification multi-label scenarios
For Text classification multi-label currently 'Accuracy' is the only primary metric supported.
For Image classification multi-label, the primary metrics supported are defined in the
ClassificationMultilabelPrimaryMetrics Enum
Metrics for NLP Text NER (Named Entity Recognition ) scenarios
For NLP Text NER (Named Entity Recognition) currently 'Accuracy' is the only primary metric supported.
Metrics for regression scenarios
r2_score , normalized_mean_absolute_error and normalized_root_mean_squared_error all try to minimize
prediction errors. r2_score and normalized_root_mean_squared_error both minimize average squared
errors, while normalized_mean_absolute_error minimizes the average absolute value of errors. Absolute value
treats errors at all magnitudes alike, and squared errors have a much larger penalty for errors with larger
absolute values. Depending on whether larger errors should be punished more or not, one can choose to
optimize squared error or absolute error.
The main difference between r2_score and normalized_root_mean_squared_error is the way they are normalized
and their meanings. normalized_root_mean_squared_error is root mean squared error normalized by range and
can be interpreted as the average error magnitude for prediction. r2_score is mean squared error normalized
by an estimate of variance of data. It is the proportion of variation that can be captured by the model.
NOTE
r2_score and normalized_root_mean_squared_error also behave similarly as primary metrics. If a fixed validation set
is applied, these two metrics are optimizing the same target, mean squared error, and will be optimized by the same
model. When only a training set is available and cross-validation is applied, they would be slightly different as the
normalizer for normalized_root_mean_squared_error is fixed as the range of training set, but the normalizer for
r2_score would vary for every fold as it's the variance for each fold.
If the rank, instead of the exact value is of interest, spearman_correlation can be a better choice as it measures
the rank correlation between real values and predictions.
However, currently no primary metric for regression addresses relative difference. All of r2_score ,
normalized_mean_absolute_error , and normalized_root_mean_squared_error treat a $20k prediction error the
same for a worker with a $30k salary as for a worker making $20M, if these two data points belong to the same
regression dataset or the same time series specified by the time series identifier. In reality, predicting
only $20k off from a $20M salary is very close (a small 0.1% relative difference), whereas $20k off from $30k is
not close (a large 67% relative difference). To address the issue of relative difference, one can train a model with
available primary metrics, and then select the model with best mean_absolute_percentage_error or
root_mean_squared_log_error .
Primary metrics for regression scenarios include spearman_correlation , normalized_root_mean_squared_error ,
r2_score , and normalized_mean_absolute_error .
You can also see the enums to use in Python in this reference page for RegressionPrimaryMetrics Enum
Metrics for Time Series Forecasting scenarios
The recommendations are similar to those noted for regression scenarios.
Primary metrics for forecasting scenarios include normalized_root_mean_squared_error , r2_score , and
normalized_mean_absolute_error .
You can also see the enums to use in Python in this reference page for ForecastingPrimaryMetrics Enum
Metrics for Image Object Detection scenarios
For Image Object Detection, the primary metrics supported are defined in the
ObjectDetectionPrimaryMetrics Enum
Metrics for Image Instance Segmentation scenarios
For Image Instance Segmentation scenarios, the primary metrics supported are defined in the
InstanceSegmentationPrimaryMetrics Enum
Data featurization
In every automated ML experiment, your data is automatically transformed into numbers and vectors of numbers
(for example, text is converted to numeric values), and it is also scaled and normalized to help algorithms
that are sensitive to features on different scales. This data transformation, scaling, and normalization is
referred to as featurization.
NOTE
Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric,
etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied
during training are applied to your input data automatically.
When configuring your automated ML jobs, you can enable/disable the featurization settings by using the
.set_featurization() setter function.
The following code shows how custom featurization can be provided in this case for a regression job.
from azure.ai.ml.automl import ColumnTransformer  # import path assumed for the SDK v2 preview

transformer_params = {
    "imputer": [
        ColumnTransformer(fields=["CACH"], parameters={"strategy": "most_frequent"}),
        ColumnTransformer(fields=["PRP"], parameters={"strategy": "most_frequent"}),
    ],
}

regression_job.set_featurization(
    mode="custom",
    transformer_params=transformer_params,
    blocked_transformers=["LabelEncoding"],
    column_name_and_types={"CHMIN": "Categorical"},
)
Exit criteria
There are a few options you can define in the set_limits() function to end your experiment prior to job
completion.
trial_timeout: Maximum time in minutes that each trial (child job) can run for before it terminates. If not
specified, a value of 1 month, or 43200 minutes, is used.
enable_early_termination: Whether to end the job if the score is not improving in the short term.
Run experiment
WARNING
If you run an experiment with the same configuration settings and primary metric multiple times, you'll likely see variation
in each experiment's final metrics score and generated models. The algorithms automated ML employs have inherent
randomness that can cause slight variation in the models output by the experiment and the recommended model's final
metrics score, like accuracy. You'll likely also see results with the same model name, but different hyperparameters used.
Submit the experiment to run and generate a model. With the MLClient created in the prerequisites, you can run
the following command in the workspace.
# Submit the AutoML job
returned_job = ml_client.jobs.create_or_update(
    classification_job
)  # submit the job to the backend
TIP
For registered models, one-click deployment is available via the Azure Machine Learning studio. See how to deploy
registered models from the studio.
Next steps
Learn more about how and where to deploy a model.
Set up no-code AutoML training with the studio UI
5/25/2022 • 15 minutes to read • Edit Online
In this article, you learn how to set up AutoML training runs without a single line of code using Azure Machine
Learning automated ML in the Azure Machine Learning studio.
Automated machine learning, AutoML, is a process in which the best machine learning algorithm to use for your
specific data is selected for you. This process enables you to generate machine learning models quickly. Learn
more about how Azure Machine Learning implements automated machine learning.
For an end-to-end example, try the Tutorial: AutoML - train no-code classification models.
For a Python code-based experience, configure your automated machine learning experiments with the Azure
Machine Learning SDK.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today.
An Azure Machine Learning workspace. See Create an Azure Machine Learning workspace.
Get started
1. Sign in to Azure Machine Learning studio.
2. Select your subscription and workspace.
3. Navigate to the left pane. Select Automated ML under the Author section.
If this is your first time doing any experiments, you'll see an empty list and links to documentation.
Otherwise, you'll see a list of your recent automated ML experiments, including those created with the SDK.
IMPORTANT
Requirements for training data:
Data must be in tabular form.
The value you want to predict (target column) must be present in the data.
a. To create a new dataset from a file on your local computer, select +Create dataset and then select
From local file .
b. In the Basic info form, give your dataset a unique name and provide an optional description.
c. Select Next to open the Datastore and file selection form . On this form, you select where to
upload your dataset: the default storage container that's automatically created with your
workspace, or a storage container that you want to use for the experiment.
a. If your data is behind a virtual network, you need to enable the skip validation function
to ensure that the workspace can access your data. For more information, see Use Azure
Machine Learning studio in an Azure virtual network.
d. Select Browse to upload the data file for your dataset.
e. Review the Settings and preview form for accuracy. The form is intelligently populated based on
the file type.
File format: Defines the layout and type of data stored in a file.
Column headers: Indicates how the headers of the dataset, if any, will be treated.
Skip rows: Indicates how many, if any, rows are skipped in the dataset.
Select Next .
f. The Schema form is intelligently populated based on the selections in the Settings and preview
form. Here configure the data type for each column, review the column names, and select which
columns to Not include for your experiment.
Select Next.
g. The Confirm details form is a summary of the information previously populated in the Basic
info and Settings and preview forms. You also have the option to create a data profile for your
dataset using a profiling enabled compute. Learn more about data profiling.
Select Next .
3. Select your newly created dataset once it appears. You are also able to view a preview of the dataset and
sample statistics.
4. On the Configure run form, select Create new and enter Tutorial-automl-deploy for the experiment
name.
5. Select a target column; this is the column that you would like to do predictions on.
6. Select a compute type for the data profiling and training job. You can select a compute cluster or compute
instance.
7. Select a compute from the dropdown list of your existing computes. To create a new compute, follow the
instructions in step 8.
8. Select Create a new compute to configure your compute context for this experiment.
Virtual machine priority: Low priority virtual machines are cheaper but don't guarantee the compute nodes.
Virtual machine type: Select CPU or GPU for virtual machine type.
Virtual machine size: Select the virtual machine size for your compute.
Min / Max nodes: To profile data, you must specify 1 or more nodes. Enter the maximum number of nodes for
your compute. The default is 6 nodes for an AML Compute.
Advanced settings: These settings allow you to configure a user account and existing virtual network for
your experiment.
NOTE
Your compute name will indicate if the compute you select/create is profiling enabled. (See the section data
profiling for more details).
Select Next .
9. On the Task type and settings form, select the task type: classification, regression, or forecasting. See
supported task types for more information.
a. For classification , you can also enable deep learning.
If deep learning is enabled, validation is limited to train_validation split. Learn more about
validation options.
b. For forecasting you can,
a. Enable deep learning.
b. Select time column: This column contains the time data to be used.
c. Select forecast horizon: Indicate how many time units
(minutes/hours/days/weeks/months/years) the model should be able to predict into the future.
The further into the future the model is required to predict, the less accurate it becomes.
Learn more about forecasting and forecast horizon.
10. (Optional) View addition configuration settings: additional settings you can use to better control the
training job. Otherwise, defaults are applied based on experiment selection and data.
Primary metric: Main metric used for scoring your model. Learn more about model metrics.
Blocked algorithm: Select algorithms you want to exclude from the training job.
Exit criterion: When any of these criteria are met, the training job is stopped.
Training job time (hours): How long to allow the training job to run.
Metric score threshold: Minimum metric score for all pipelines. This ensures that if you have a defined
target metric you want to reach, you do not spend more time on the training job than necessary.
11. (Optional) View featurization settings: if you choose to enable Automatic featurization in the
Additional configuration settings form, default featurization techniques are applied. In the View
featurization settings you can change these defaults and customize accordingly. Learn how to
customize featurizations.
12. The [Optional] Validate and test form allows you to do the following.
a. Specify the type of validation to be used for your training job. Learn more about cross validation.
a. Forecasting tasks only support k-fold cross validation.
b. Provide a test dataset (preview) to evaluate the recommended model that automated ML
generates for you at the end of your experiment. When you provide test data, a test run is
automatically triggered at the end of your experiment. This test run is only run on the best model
that was recommended by automated ML. Learn how to get the results of the remote test run.
IMPORTANT
Providing a test dataset to evaluate generated models is a preview feature. This capability is an
experimental preview feature, and may change at any time.
Test data is considered separate from training and validation, so as to not bias the results of
the test run of the recommended model. Learn more about bias during model validation.
You can either provide your own test dataset or opt to use a percentage of your training
dataset. Test data must be in the form of an Azure Machine Learning TabularDataset.
The schema of the test dataset should match the training dataset. The target column is optional,
but if no target column is indicated no test metrics are calculated.
The test dataset should not be the same as the training dataset or the validation dataset.
Forecasting runs do not support train/test split.
Customize featurization
In the Featurization form, you can enable/disable automatic featurization and customize the automatic
featurization settings for your experiment. To open this form, see step 10 in the Create and run experiment
section.
The following table summarizes the customizations currently available via the studio.
Feature type: Change the value type for the selected column.
Impute with: Select what value to impute missing values with in your data.
NOTE
The algorithms automated ML employs have inherent randomness that can cause slight variation in a recommended
model's final metrics score, like accuracy. Automated ML also performs operations on data such as train-test split, train-
validation split, or cross-validation when necessary. So if you run an experiment with the same configuration settings and
primary metric multiple times, you'll likely see variation in each experiment's final metrics score due to these factors.
On the Data transformation tab, you can see a diagram of what data preprocessing, feature engineering, scaling
techniques and the machine learning algorithm that were applied to generate this model.
IMPORTANT
The Data transformation tab is in preview. This capability should be considered experimental and may change at any time.
IMPORTANT
Testing your models with a test dataset to evaluate generated models is a preview feature. This capability is an
experimental preview feature, and may change at any time.
WARNING
This feature is not available for the following automated ML scenarios:
Computer vision tasks (preview)
Many models and hierarchical time series forecasting training (preview)
Forecasting tasks where deep learning neural networks (DNN) are enabled
Automated ML runs from local computes or Azure Databricks clusters
After your experiment completes, you can test the model(s) that automated ML generates for you. If you want to
test a different automated ML generated model, not the recommended model, you can do so with the following
steps.
1. Select an existing automated ML experiment run.
2. Navigate to the Models tab of the run and select the completed model you want to test.
3. On the model Details page, select the Test model (preview) button to open the Test model pane.
4. On the Test model pane, select the compute cluster and a test dataset you want to use for your test run.
5. Select the Test button. The schema of the test dataset should match the training dataset, but the target
column is optional.
6. Upon successful creation of model test run, the Details page displays a success message. Select the Test
results tab to see the progress of the run.
7. To view the results of the test run, open the Details page and follow the steps in the view results of the
remote test run section.
In scenarios where you would like to create a new experiment based on the settings of an existing experiment,
automated ML provides the option to do so with the Edit and submit button in the studio UI.
This functionality is limited to experiments initiated from the studio UI and requires the data schema for the new
experiment to match that of the original experiment.
The Edit and submit button opens the Create a new Automated ML run wizard with the data, compute and
experiment settings pre-populated. You can go through each form and edit selections as needed for your new
experiment.
TIP
If you are looking to deploy a model that was generated via the automl package with the Python SDK, you must
register your model to the workspace.
Once your model is registered, find it in the studio by selecting Models on the left pane. Once you open your model,
you can select the Deploy button at the top of the screen, and then follow the instructions as described in step 2 of the
Deploy your model section.
Automated ML helps you with deploying the model without writing code:
1. You have a couple options for deployment.
Option 1: Deploy the best model, according to the metric criteria you defined.
a. After the experiment is complete, navigate to the parent run page by selecting Run 1 at the top
of the screen.
b. Select the model listed in the Best model summary section.
c. Select Deploy on the top left of the window.
Option 2: To deploy a specific model iteration from this experiment.
a. Select the desired model from the Models tab
b. Select Deploy on the top left of the window.
2. Populate the Deploy model pane.
Compute type: Select the type of endpoint you want to deploy: Azure Kubernetes Service (AKS) or Azure
Container Instance (ACI).
Compute name: Applies to AKS only. Select the name of the AKS cluster you wish to deploy to.
Use custom deployment assets: Enable this feature if you want to upload your own scoring script and
environment file. Otherwise, automated ML provides these assets for you by default. Learn more about
scoring scripts.
IMPORTANT
File names must be under 32 characters and must begin and end with alphanumerics. May include dashes,
underscores, dots, and alphanumerics between. Spaces are not allowed.
The Advanced menu offers default deployment features such as data collection and resource utilization
settings. If you wish to override these defaults do so in this menu.
3. Select Deploy . Deployment can take about 20 minutes to complete. Once deployment begins, the Model
summary tab appears. See the deployment progress under the Deploy status section.
Now you have an operational web service to generate predictions! You can test the predictions by querying the
service from Power BI's built-in Azure Machine Learning support.
Next steps
Learn how to consume a web service.
Understand automated machine learning results.
Learn more about automated machine learning and Azure Machine Learning.
Set up a development environment with Azure
Databricks and AutoML in Azure Machine Learning
5/25/2022 • 4 minutes to read • Edit Online
Learn how to configure a development environment in Azure Machine Learning that uses Azure Databricks and
automated ML.
Azure Databricks is ideal for running large-scale intensive machine learning workflows on the scalable Apache
Spark platform in the Azure cloud. It provides a collaborative Notebook-based environment with a CPU or GPU-
based compute cluster.
For information on other machine learning development environments, see Set up Python development
environment.
Prerequisite
Azure Machine Learning workspace. If you don't have one, you can create an Azure Machine Learning
workspace through the Azure portal, Azure CLI, and Azure Resource Manager templates.
TIP
If you have an old SDK version, deselect it from cluster's installed libraries and move to trash. Install the new SDK
version and restart the cluster. If there is an issue after the restart, detach and reattach your cluster.
2. Choose the following option (no other SDK installations are supported): install the azureml-sdk[databricks] PyPI package on your cluster.
WARNING
No other SDK extras can be installed. Choose only the [ databricks ] option .
Troubleshooting
Databricks cancel an automated machine learning run : When you use automated machine
learning capabilities on Azure Databricks, to cancel a run and start a new experiment run, restart your
Azure Databricks cluster.
Databricks >10 iterations for automated machine learning : In automated machine learning
settings, if you have more than 10 iterations, set show_output to False when you submit the run.
Databricks widget for the Azure Machine Learning SDK and automated machine learning : The
Azure Machine Learning SDK widget isn't supported in a Databricks notebook because the notebooks
can't parse HTML widgets. You can view the widget in the portal by using this Python code in your Azure
Databricks notebook cell:
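A sketch, assuming local_run holds the submitted automated ML run:
displayHTML("<a href={} target='_blank'>Azure Portal: {}</a>".format(local_run.get_portal_url(), local_run.id))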
Alternatively, you can use init scripts if you keep facing install issues with Python libraries. This approach
isn't officially supported. For more information, see Cluster-scoped init scripts.
Import error: cannot import name Timedelta from pandas._libs.tslibs : If you see this error when
you use automated machine learning, run the following two lines in your notebook:
Import error: No module named 'pandas.core.indexes' : If you see this error when you use
automated machine learning:
1. Run this command to install two packages in your Azure Databricks cluster:
scikit-learn==0.19.1
pandas==0.22.0
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference.
Set up AutoML to train a time-series forecasting
model with Python
5/25/2022 • 17 minutes to read • Edit Online
Prerequisites
For this article you need,
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
This article assumes some familiarity with setting up an automated machine learning experiment. Follow
the tutorial or how-to to see the main automated machine learning experiment design patterns.
IMPORTANT
The Python commands in this article require the latest azureml-train-automl package version.
Install the latest azureml-train-automl package to your local environment.
For details on the latest azureml-train-automl package, see the release notes.
For example, when creating a demand forecast, including a feature for current stock price could massively increase training
accuracy. However, if you intend to forecast with a long horizon, you may not be able to accurately predict future stock
values corresponding to future time-series points, and model accuracy could suffer.
You can specify separate training data and validation data directly in the AutoMLConfig object. Learn more about
the AutoMLConfig.
For time series forecasting, only Rolling Origin Cross Validation (ROCV) is used for validation by default.
Pass the training and validation data together, and set the number of cross validation folds with the
n_cross_validations parameter in your AutoMLConfig . ROCV divides the series into training and validation data
using an origin time point. Sliding the origin in time generates the cross-validation folds. This strategy preserves
the time series data integrity and eliminates the risk of data leakage.
You can also bring your own validation data, learn more in Configure data splits and cross-validation in AutoML.
APPLIES TO: Python SDK azureml v1
automl_config = AutoMLConfig(task='forecasting',
                             training_data=training_data,
                             n_cross_validations=3,
                             ...
                             **time_series_settings)
Learn more about how AutoML applies cross validation to prevent over-fitting models.
Configure experiment
The AutoMLConfig object defines the settings and data necessary for an automated machine learning task.
Configuration for a forecasting model is similar to the setup of a standard regression model, but certain models,
configuration options, and featurization steps exist specifically for time-series data.
Supported models
Automated machine learning automatically tries different models and algorithms as part of the model creation
and tuning process. As a user, there is no need for you to specify the algorithm. For forecasting experiments,
both native time-series and deep learning models are part of the recommendation system.
TIP
Traditional regression models are also tested as part of the recommendation system for forecasting experiments. See a
complete list of the supported models in the SDK reference documentation.
Configuration settings
Similar to a regression problem, you define standard training parameters like task type, number of iterations,
training data, and number of cross-validations. Forecasting tasks require the time_column_name and
forecast_horizon parameters to configure your experiment. If the data includes multiple time series, such as
sales data for multiple stores or energy data across different states, automated ML automatically detects this
and sets the time_series_id_column_names parameter (preview) for you. You can also include additional
parameters to better configure your run, see the optional configurations section for more detail on what can be
included.
IMPORTANT
Automatic time series identification is currently in public preview. This preview version is provided without a service-level
agreement. Certain features might not be supported or might have constrained capabilities. For more information, see
Supplemental Terms of Use for Microsoft Azure Previews.
time_column_name: Used to specify the datetime column in the input data used for building the time series
and inferring its frequency.
from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(time_column_name='day_datetime',
                                               forecast_horizon=50,
                                               freq='W')
These forecasting_parameters are then passed into your standard AutoMLConfig object along with the
forecasting task type, primary metric, exit criteria, and training data.
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
import logging

automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_minutes=15,
                             enable_early_stopping=True,
                             training_data=train_data,
                             label_column_name=label,
                             n_cross_validations=5,
                             enable_ensembling=False,
                             verbosity=logging.INFO,
                             forecasting_parameters=forecasting_parameters)
The amount of data required to successfully train a forecasting model with automated ML is influenced by the
forecast_horizon , n_cross_validations , and target_lags or target_rolling_window_size values specified when
you configure your AutoMLConfig .
The following formula calculates the amount of historic data that would be needed to construct time series
features.
Minimum historic data required: (2 x forecast_horizon ) + number of n_cross_validations + max(max( target_lags ),
target_rolling_window_size )
An error is raised for any series in the dataset that doesn't meet the required amount of historic
data for the relevant settings specified.
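As a quick sanity check, here is that formula evaluated in plain Python with hypothetical settings (the values below aren't from any particular experiment):

forecast_horizon = 50
n_cross_validations = 3
target_lags = [1, 7]              # hypothetical lag settings
target_rolling_window_size = 10   # hypothetical window size

minimum_rows = (2 * forecast_horizon) + n_cross_validations + max(
    max(target_lags), target_rolling_window_size)
print(minimum_rows)  # (2 * 50) + 3 + max(7, 10) = 113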
Featurization steps
In every automated machine learning experiment, automatic scaling and normalization techniques are applied
to your data by default. These techniques are types of featurization that help certain algorithms that are
sensitive to features on different scales. Learn more about default featurization steps in Featurization in AutoML.
However, the following steps are performed only for forecasting task types:
Detect time-series sample frequency (for example, hourly, daily, weekly) and create new records for absent
time points to make the series continuous.
Impute missing values in the target (via forward-fill) and feature columns (using median column values)
Create features based on time series identifiers to enable fixed effects across different series
Create time-based features to assist in learning seasonal patterns
Encode categorical variables to numeric quantities
To view the full list of possible engineered features generated from time series data, see TimeIndexFeaturizer
Class.
NOTE
Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric,
etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied
during training are applied to your input data automatically.
Customize featurization
You also have the option to customize your featurization settings to ensure that the data and features that are
used to train your ML model result in relevant predictions.
Supported customizations for forecasting tasks include:
CUSTOMIZATION                  DEFINITION
Column purpose update          Override the auto-detected feature type for the specified column.
Transformer parameter update   Update the parameters for the specified transformer. Currently supports Imputer (fill_value and median).
To customize featurizations with the SDK, specify "featurization": FeaturizationConfig in your AutoMLConfig
object. Learn more about custom featurizations.
NOTE
The drop columns functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data
cleansing, prior to consuming it in your automated ML experiment.
featurization_config = FeaturizationConfig()
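The line above only creates the configuration object. As a sketch of the two supported customizations, using hypothetical column names rather than any from a real dataset:

featurization_config = FeaturizationConfig()
# Column purpose update: override the auto-detected type for a hypothetical column.
featurization_config.add_column_purpose('store_id', 'Categorical')
# Transformer parameter update: set the Imputer strategy for a hypothetical numeric column.
featurization_config.add_transformer_params('Imputer', ['daily_sales'], {"strategy": "median"})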
If you're using the Azure Machine Learning studio for your experiment, see how to customize featurization in the
studio.
Optional configurations
Additional optional configurations are available for forecasting tasks, such as enabling deep learning and
specifying a target rolling window aggregation. A complete list of additional parameters is available in the
ForecastingParameters SDK reference documentation.
Frequency & target data aggregation
Use the frequency parameter, freq , to help avoid failures caused by irregular data, that is, data that doesn't
follow a set cadence, like hourly or daily data.
For highly irregular data or for varying business needs, users can optionally set their desired forecast frequency,
freq , and specify the target_aggregation_function to aggregate the target column of the time series. Using
these two settings in your AutoMLConfig object can help save some time on data preparation.
Supported aggregation operations for target column values include:
FUNCTION   DESCRIPTION
sum        Sum of target values
mean       Mean or average of target values
min        Minimum value of a target
max        Maximum value of a target
Enable deep learning
NOTE
DNN support for forecasting in Automated Machine Learning is in preview and not supported for local runs or runs
initiated in Databricks.
You can also apply deep learning with deep neural networks, DNNs, to improve the scores of your model.
Automated ML's deep learning allows for forecasting univariate and multivariate time series data.
Deep learning models have three intrinsic capabilities:
1. They can learn from arbitrary mappings from inputs to outputs
2. They support multiple inputs and outputs
3. They can automatically extract patterns in input data that span long sequences.
To enable deep learning, set enable_dnn=True in the AutoMLConfig object.
automl_config = AutoMLConfig(task='forecasting',
enable_dnn=True,
...
**forecasting_parameters)
WARNING
When you enable DNN for experiments created with the SDK, best model explanations are disabled.
To enable DNN for an AutoML experiment created in the Azure Machine Learning studio, see the task type
settings in the studio UI how-to.
Target rolling window aggregation
Often the best information a forecaster can have is the recent value of the target. Target rolling window
aggregations allow you to add a rolling aggregation of data values as features. Generating and using these
features as extra contextual data helps with the accuracy of the trained model.
For example, say you want to predict energy demand. You might want to add a rolling window feature of three
days to account for thermal changes of heated spaces. In this example, create this window by setting
target_rolling_window_size= 3 in the AutoMLConfig constructor.
The table shows resulting feature engineering that occurs when window aggregation is applied. Columns for
minimum, maximum, and sum are generated on a sliding window of three based on the defined settings.
Each row has a new calculated feature. For the timestamp September 8, 2017 4:00AM, the
maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00AM
- 3:00AM. This window of three shifts along to populate data for the remaining rows.
View a Python code example applying the target rolling window aggregate feature.
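If it helps to see the mechanics outside of AutoML, the following standalone pandas sketch (with made-up demand values) reproduces the window-of-three behavior described above; shift(1) keeps each row's own value out of its window:

import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2017-09-08 01:00", periods=6, freq="H"),
    "demand": [2139, 1873, 1850, 1798, 1862, 2031],  # hypothetical values
})
window = df["demand"].shift(1).rolling(window=3)
df["demand_min"] = window.min()   # rolling minimum of the prior three values
df["demand_max"] = window.max()   # rolling maximum of the prior three values
df["demand_sum"] = window.sum()   # rolling sum of the prior three values
print(df)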
Short series handling
Automated ML considers a time series a short series if there are not enough data points to conduct the train
and validation phases of model development. The number of data points varies for each experiment, and
depends on the max_horizon, the number of cross validation splits, and the length of the model lookback, that is,
the maximum amount of history that's needed to construct the time-series features.
Automated ML offers short series handling by default with the short_series_handling_configuration parameter
in the ForecastingParameters object.
To enable short series handling, the freq parameter must also be defined. To define an hourly frequency, we
will set freq='H' . View the frequency string options by visiting the pandas Time series page DataOffset objects
section. To change the default behavior ( short_series_handling_configuration='auto' ), update the
short_series_handling_configuration parameter in your ForecastingParameters object.
forecast_parameters = ForecastingParameters(time_column_name='day_datetime',
forecast_horizon=50,
short_series_handling_configuration='auto',
freq = 'H',
target_lags='auto')
If many of the series are short, then you may also see some impact in explainability results.
Run the experiment
When you have your AutoMLConfig object ready, you can submit the experiment to train the model and retrieve the best run:
ws = Workspace.from_config()
experiment = Experiment(ws, "Tutorial-automl-forecasting")
local_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = local_run.get_output()
Forecasting with best model
Use the best model iteration to forecast values for data that wasn't used to train the model.

import numpy as np
import pandas as pd

label_query = test_labels.copy().astype(float)  # np.float is removed in recent NumPy; use the built-in float
label_query.fill(np.nan)
label_fcst, data_trans = fitted_model.forecast_quantiles(
    test_dataset, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))
Often customers want to understand the predictions at a specific quantile of the distribution. For example, when
the forecast is used to control inventory like grocery items or virtual machines for a cloud service. In such cases,
the control point is usually something like "we want the item to be in stock and not run out 99% of the time".
The following demonstrates how to specify which quantiles you'd like to see for your predictions, such as 50th
or 95th percentile. If you don't specify a quantile, like in the aforementioned code example, then only the 50th
percentile predictions are generated.
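A sketch of that call follows; the quantiles attribute shown here is based on the SDK v1 forecast_quantiles interface, so treat the exact attribute name as an assumption:

# Request the 5th, 50th, and 95th percentile forecasts (assumed attribute).
fitted_model.quantiles = [0.05, 0.5, 0.95]
quantile_fcst, data_trans = fitted_model.forecast_quantiles(
    test_dataset, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))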
You can calculate model metrics like root mean squared error (RMSE) or mean absolute percentage error
(MAPE) to help you estimate the model's performance. See the Evaluate section of the Bike share demand
notebook for an example.
After the overall model accuracy has been determined, the most realistic next step is to use the model to
forecast unknown future values.
Supply a data set in the same format as the test set test_dataset but with future datetimes, and the resulting
prediction set is the forecasted values for each time-series step. Assume the last time-series records in the data
set were for 12/31/2018. To forecast demand for the next day (or as many periods as you need to forecast, <=
forecast_horizon ), create a single time series record for each store for 01/01/2019.
day_datetime,store,week_of_year
01/01/2019,A,1
01/01/2019,B,1
Repeat the necessary steps to load this future data to a dataframe and then run
fitted_model.forecast_quantiles(test_dataset) to predict future values.
NOTE
In-sample predictions are not supported for forecasting with automated ML when target_lags and/or
target_rolling_window_size are enabled.
Forecasting at scale
There are scenarios where a single machine learning model is insufficient and multiple machine learning models
are needed. For instance, predicting sales for each individual store for a brand, or tailoring an experience to
individual users. Building a model for each instance can lead to improved results on many machine learning
problems.
Grouping is a concept in time series forecasting that allows time series to be combined to train an individual
model per group. This approach can be particularly helpful if you have time series that require smoothing or
filling, or entities in the group that can benefit from the history or trends of other entities. Many models and
hierarchical time series forecasting are solutions powered by automated machine learning for these large scale
forecasting scenarios.
Many models
The Azure Machine Learning many models solution with automated machine learning allows users to train and
manage millions of models in parallel. The many models solution accelerator leverages Azure Machine Learning
pipelines to train the model. Specifically, a Pipeline object and ParallelRunStep are used and require specific
configuration parameters set through the ParallelRunConfig.
The following diagram shows the workflow for the many models solution.
The following code demonstrates the key parameters users need to set up their many models run. See the Many
Models - Automated ML notebook for a many models forecasting example.
mm_parameters = ManyModelsTrainParameters(automl_settings=automl_settings,
partition_column_names=partition_column_names)
Hierarchical time series forecasting
The hierarchical time series solution is built on top of the many models solution and shares a similar
configuration setup.
The following code demonstrates the key parameters to set up your hierarchical time series forecasting runs.
See the Hierarchical time series- Automated ML notebook, for an end to end example.
model_explainability = True
engineered_explanations = False

# Define your hierarchy. Adjust the settings below based on your dataset.
hierarchy = ["state", "store_id", "product_category", "SKU"]
training_level = "SKU"

# Set your forecast parameters. Adjust the settings below based on your dataset.
time_column_name = "date"
label_column_name = "quantity"
forecast_horizon = 7
hts_parameters = HTSTrainParameters(
automl_settings=automl_settings,
hierarchy_column_names=hierarchy,
training_level=training_level,
enable_engineered_explanations=engineered_explanations
)
Example notebooks
See the forecasting sample notebooks for detailed code examples of advanced forecasting configuration
including:
holiday detection and featurization
rolling-origin cross validation
configurable lags
rolling window aggregate features
Next steps
Learn more about how and where to deploy a model.
Learn about Interpretability: model explanations in automated machine learning (preview).
Follow the Tutorial: Train regression models for an end-to-end example for creating experiments with
automated machine learning.
Prepare data for computer vision tasks with
automated machine learning (preview)
5/25/2022 • 3 minutes to read • Edit Online
IMPORTANT
Support for training computer vision models with automated ML in Azure Machine Learning is an experimental public
preview feature. Certain features might not be supported or might have constrained capabilities. For more information,
see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to prepare image data for training computer vision models with automated
machine learning in Azure Machine Learning.
To generate models for computer vision tasks with automated machine learning, you need to bring labeled
image data as input for model training in the form of an MLTable .
You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a
different format (like Pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and
then create an MLTable . Alternatively, you can use Azure Machine Learning's data labeling tool to manually label
images, and export the labeled data to use for training your AutoML model.
Prerequisites
Familiarize yourself with the accepted schemas for JSONL files for AutoML computer vision experiments.
For example, the following .yml file defines a data asset for the fridge items images:
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder
To upload the images as a data asset, run the following CLI v2 command with the path to your .yml file,
workspace name, resource group, and subscription ID.
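A sketch of that command, with bracketed placeholders for your own values:

az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]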
Next, you will need to get the label annotations in JSONL format. The schema of labeled data depends on the
computer vision task at hand. Refer to schemas for JSONL files for AutoML computer vision experiments to
learn more about the required JSONL schema for each task type.
If your training data is in a different format (like Pascal VOC or COCO), helper scripts to convert the data to
JSONL are available in notebook examples.
Create MLTable
Once you have your labeled data in JSONL format, you can use it to create an MLTable as shown below. The
MLTable packages your data into a consumable object for training.
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
You can then pass in the MLTable as a data input for your AutoML training job.
Next steps
Train computer vision models with automated machine learning.
Train a small object detection model with automated machine learning.
Tutorial: Train an object detection model (preview) with AutoML and Python.
Set up AutoML to train computer vision models
5/25/2022 • 14 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain
features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of
Use for Microsoft Azure Previews.
In this article, you learn how to train computer vision models on image data with automated ML with the Azure
Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2 (preview).
Automated ML supports model training for computer vision tasks like image classification, object detection, and
instance segmentation. Authoring AutoML models for computer vision tasks is currently supported via the
Azure Machine Learning Python SDK. The resulting experimentation runs, models, and outputs are accessible
from the Azure Machine Learning studio UI. Learn more about automated ML for computer vision tasks on
image data.
Prerequisites
CLI v2
Python SDK v2 (preview)
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
Install and set up CLI (v2) and make sure you install the ml extension.
TASK TYPE                          AUTOML JOB SYNTAX
Image classification               image_classification
Image classification multi-label   image_classification_multilabel
Image object detection             image_object_detection
Image instance segmentation        image_instance_segmentation
CLI v2
Python SDK v2 (preview)
APPLIES TO: Azure CLI ml extension v2 (current)
This task type is a required parameter and can be set using the task key.
For example:
task: image_object_detection
NOTE
The training data needs to have at least 10 images in order to be able to submit an AutoML run.
WARNING
Creation of MLTable is only supported using the SDK and CLI to create from data in JSONL format for this capability.
Creating the MLTable via UI is not supported at this time.
{
"image_url": "AmlDatastore://image_data/Image_01.png",
"image_details":
{
"format": "png",
"width": "2230px",
"height": "4356px"
},
"label":
{
"label": "cat",
"topX": "1",
"topY": "0",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "true",
}
}
{
"image_url": "AmlDatastore://image_data/Image_02.png",
"image_details":
{
"format": "jpeg",
"width": "1230px",
"height": "2356px"
},
"label":
{
"label": "dog",
"topX": "0",
"topY": "1",
"bottomX": "0",
"bottomY": "1",
"isCrowd": "false",
}
}
Consume data
Once your data is in JSONL format, you can create training and validation MLTable as shown below.
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
Automated ML doesn't impose any constraints on training or validation data size for computer vision tasks.
Maximum dataset size is only limited by the storage layer behind the dataset (i.e. blob store). There's no
minimum number of images or labels. However, we recommend starting with a minimum of 10-15 samples per
label to ensure the output model is sufficiently trained. The higher the total number of labels/classes, the more
samples you need per label.
CLI v2
Python SDK v2 (preview)
target_column_name: label
training_data:
path: data/training-mltable-folder
type: mltable
validation_data:
path: data/validation-mltable-folder
type: mltable
CLI v2
Python SDK v2 (preview)
compute: azureml:gpu-cluster
Configure model algorithms and hyperparameters
With support for computer vision tasks, you can control the model algorithm and sweep hyperparameters.
These model algorithms and hyperparameters are passed in as the parameter space for the sweep.
The model algorithm is required and is passed in via model_name parameter. You can either specify a single
model_name or choose between multiple.
TASK    MODEL ALGORITHMS    STRING LITERAL SYNTAX (default_model denoted with *)
In addition to controlling the model algorithm, you can also tune hyperparameters used for model training.
While many of the hyperparameters exposed are model-agnostic, there are instances where hyperparameters
are task-specific or model-specific. Learn more about the available hyperparameters for these instances.
Data augmentation
In general, deep learning model performance can often improve with more data. Data augmentation is a
practical technique to amplify the data size and variability of a dataset which helps to prevent overfitting and
improve the model’s generalization ability on unseen data. Automated ML applies different data augmentation
techniques based on the computer vision task, before feeding input images to the model. Currently, there is no
exposed hyperparameter to control data augmentations.
TASK                                                  IMPACTED DATASET    DATA AUGMENTATION TECHNIQUE(S) APPLIED
Image classification (multi-class and multi-label)   Training            Random resize and crop, horizontal flip, color jitter (brightness, contrast, saturation, and hue), normalization using channel-wise ImageNet's mean and standard deviation
Object detection, instance segmentation              Validation & Test   Normalization, resize
Object detection using yolov5                        Validation & Test   Letterbox resizing
CLI v2
Python SDK v2 (preview)
image_model:
model_name: "yolov5"
Once you've built a baseline model, you might want to optimize model performance in order to sweep over the
model algorithm and hyperparameter space. You can use the following sample config to sweep over the
hyperparameters for each algorithm, choosing from a range of values for learning_rate, optimizer, lr_scheduler,
etc., to generate a model with the optimal primary metric. If hyperparameter values are not specified, then
default values are used for the specified algorithm.
Primary metric
The primary metric used for model optimization and hyperparameter tuning depends on the task type. Using
other primary metric values is currently not supported.
accuracy for IMAGE_CLASSIFICATION
iou for IMAGE_CLASSIFICATION_MULTILABEL
mean_average_precision for IMAGE_OBJECT_DETECTION
mean_average_precision for IMAGE_INSTANCE_SEGMENTATION
Experiment budget
You can optionally specify the maximum time budget for your AutoML Vision training job using the timeout
parameter in the limits, which is the amount of time in minutes before the experiment terminates. If none is
specified, the default experiment timeout is seven days (maximum 60 days). For example,
CLI v2
Python SDK v2 (preview)
limits:
timeout: 60
SAMPLING TYPE       AUTOML JOB SYNTAX
Random sampling     random
Grid sampling       grid
Bayesian sampling   bayesian
NOTE
Currently only random sampling supports conditional hyperparameter spaces.
Learn more about how to configure the early termination policy for your hyperparameter sweep.
Resources for the sweep
You can control the resources spent on your hyperparameter sweep by specifying the max_trials and the
max_concurrent_trials for the sweep.
NOTE
For a complete sweep configuration sample, please refer to this tutorial.
PARAMETER    DETAIL
You can configure all the sweep related parameters as shown in the example below.
CLI v2
Python SDK v2 (preview)
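As a rough sketch only, a sweep section in the 2022 preview CLI v2 YAML schema looked roughly like the following; treat the exact keys as assumptions to verify against your installed schema version:

sweep:
  limits:
    max_trials: 10
    max_concurrent_trials: 2
  sampling_algorithm: random
  early_termination:
    type: bandit
    evaluation_interval: 2
    slack_factor: 0.2
    delay_evaluation: 6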
Fixed settings
You can pass fixed settings or parameters that don't change during the parameter space sweep as shown below.
CLI v2
Python SDK v2 (preview)
image_model:
early_stopping: True
evaluation_frequency: 1
CLI v2
Python SDK v2 (preview)
image_model:
checkpoint_run_id : "target_checkpoint_run_id"
TIP
Check how to navigate to the run results from the View run results section.
For definitions and examples of the performance charts and metrics provided for each run, see Evaluate
automated machine learning experiment results
You can configure the model deployment endpoint name and the inferencing cluster to use for your model
deployment in the Deploy a model pane.
Code examples
CLI v2
Python SDK v2 (preview)
Review detailed code examples and use cases in the azureml-examples repository for automated machine
learning samples.
Next steps
Tutorial: Train an object detection model (preview) with AutoML and Python.
Troubleshoot automated ML experiments.
Train a small object detection model with AutoML
(preview)
5/25/2022 • 5 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain
features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of
Use for Microsoft Azure Previews.
In this article, you'll learn how to train an object detection model to detect small objects in high-resolution
images with automated ML in Azure Machine Learning.
Typically, computer vision models for object detection work well for datasets with relatively large objects.
However, due to memory and computational constraints, these models tend to under-perform when tasked to
detect small objects in high-resolution images. Because high-resolution images are typically large, they are
resized before input into the model, which limits their capability to detect smaller objects, relative to the initial
image size.
To help with this problem, automated ML supports tiling as part of the public preview computer vision
capabilities. The tiling capability in automated ML is based on the concepts in The Power of Tiling for Small
Object Detection.
When tiling, each image is divided into a grid of tiles. Adjacent tiles overlap with each other in width and height
dimensions. The tiles are cropped from the original as shown in the following image.
Prerequisites
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
This article assumes some familiarity with how to configure an automated machine learning experiment
for computer vision tasks.
Supported models
Small object detection using tiling is currently supported for the following models:
fasterrcnn_resnet18_fpn
fasterrcnn_resnet50_fpn
fasterrcnn_resnet34_fpn
fasterrcnn_resnet101_fpn
fasterrcnn_resnet152_fpn
retinanet_resnet50_fpn
When tiling is enabled, the entire image and the tiles generated from it are passed through the model. These
images and tiles are resized according to the min_size and max_size parameters before feeding to the model.
The computation time increases proportionally because of processing this extra data.
For example, when the tile_grid_size parameter is (3, 2), the computation time is approximately seven
times that of no tiling, because the six tiles plus the original image are each passed through the model.
You can specify the value for tile_grid_size in your hyperparameter space as a string.
parameter_space = {
'model_name': choice('fasterrcnn_resnet50_fpn'),
'tile_grid_size': choice('(3, 2)'),
...
}
The value for the tile_grid_size parameter depends on the image dimensions and size of objects within the
image. For example, a larger number of tiles is helpful when there are smaller objects in the images.
To choose the optimal value for this parameter for your dataset, you can use hyperparameter search. To do so,
you can specify a choice of values for this parameter in your hyperparameter space.
parameter_space = {
'model_name': choice('fasterrcnn_resnet50_fpn'),
'tile_grid_size': choice('(2, 1)', '(3, 2)', '(5, 3)'),
...
}
You also have the option to enable tiling only during inference without enabling it in training. To do so, set the
tile_grid_size parameter only during inference, not for training.
Doing so may improve performance for some datasets, and won't incur the extra cost that comes with tiling at
training time.
Tiling hyperparameters
The following are the parameters you can use to control the tiling feature.
PARAMETER        DESCRIPTION                                                                                                  DEFAULT
tile_grid_size   The grid size to use for tiling each image. Available for use during training, validation, and inference.   no default value
Example notebooks
See the object detection sample notebook for detailed code examples of setting up and training an object
detection model.
NOTE
All images in this article are made available in accordance with the permitted use section of the MIT licensing agreement.
Copyright © 2020 Roboflow, Inc.
Next steps
Learn more about how and where to deploy a model.
For definitions and examples of the performance charts and metrics provided for each run, see Evaluate
automated machine learning experiment results.
Tutorial: Train an object detection model (preview) with AutoML and Python.
See what hyperparameters are available for computer vision tasks.
Make predictions with ONNX on computer vision models from AutoML.
Set up AutoML to train a natural language
processing model (preview)
5/25/2022 • 12 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to train natural language processing (NLP) models with automated ML in Azure
Machine Learning. You can create NLP models with automated ML via the Azure Machine Learning Python SDK
v2 (preview) or the Azure Machine Learning CLI v2.
Automated ML supports NLP, which allows ML professionals and data scientists to bring their own text data and
build custom models for tasks such as multi-class text classification, multi-label text classification, and named
entity recognition (NER).
You can seamlessly integrate with the Azure Machine Learning data labeling capability to label your text data or
bring your existing labeled data. Automated ML provides the option to use distributed training on multi-GPU
compute clusters for faster model training. The resulting model can be operationalized at scale by leveraging
Azure ML’s MLOps capabilities.
Prerequisites
CLI v2
Python SDK v2 (preview)
WARNING
Support for multilingual models and the use of models with longer max sequence length is necessary for several
NLP use cases, such as non-English datasets and longer range documents. As a result, these scenarios may
require higher GPU memory for model training to succeed, such as the NC_v3 series or the ND series.
The Azure Machine Learning CLI v2 installed. For guidance to update and install the latest version, see the
Install and set up CLI (v2).
This article assumes some familiarity with setting up an automated machine learning experiment. Follow
the how-to to see the main automated machine learning experiment design patterns.
Select your NLP task
Determine what NLP task you want to accomplish. Currently, automated ML supports the following deep neural
network NLP tasks.
Multi-class text classification
  CLI v2: text_classification
  SDK v2 (preview): text_classification()
  There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample.
Multi-label text classification
  CLI v2: text_classification_multilabel
  SDK v2 (preview): text_classification_multilabel()
  There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample.
Named Entity Recognition (NER)
  CLI v2: text_ner
  SDK v2 (preview): text_ner()
  There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence.
Preparing data
For NLP experiments in automated ML, you can bring your data in .csv format for multi-class and multi-label
classification tasks. For NER tasks, two-column .txt files that use a space as the separator and adhere to the
CoNLL format are supported. The following sections provide additional detail for the data format accepted for
each task.
Multi-class
For multi-class classification, the dataset can contain several text columns and exactly one label column. The
following example has only one text column.
text,labels
"I love watching Chicago Bulls games.","NBA"
"Tom Brady is a great player.","NFL"
"There is a game between Yankees and Orioles tonight","MLB"
"Stephen Curry made the most number of 3-Pointers","NBA"
Multi-label
For multi-label classification, the dataset columns would be the same as multi-class, however there are special
format requirements for data in the label column. The two accepted formats and examples are in the following
table.
LABEL COLUMN FORMAT OPTIONS   MULTIPLE LABELS                    ONE LABEL      NO LABELS
Plain text                    "label1, label2, label3"           "label1"       ""
Python list with quotes       "['label1','label2','label3']"     "['label1']"   "[]"
IMPORTANT
Different parsers are used to read labels for these formats. If you are using the plain text format, only use alphabetical,
numerical and '_' in your labels. All other characters are recognized as the separator of labels.
For example, if your label is "cs.AI" , it's read as "cs" and "AI" . Whereas with the Python list format, the label would
be "['cs.AI']" , which is read as "cs.AI" .
text,labels
"I love watching Chicago Bulls games.","basketball"
"The four most popular leagues are NFL, MLB, NBA and NHL","football,baseball,basketball,hockey"
"I like drinking beer.",""
text,labels
"I love watching Chicago Bulls games.","['basketball']"
"The four most popular leagues are NFL, MLB, NBA and NHL","['football','baseball','basketball','hockey']"
"I like drinking beer.","[]"
An example of the CoNLL format for NER:
Hudson B-loc
Square I-loc
is O
a O
famous O
place O
in O
New B-loc
York I-loc
City I-loc

Stephen B-per
Curry I-per
got O
three O
championship O
rings O
Data validation
Before training, automated ML applies data validation checks on the input data to ensure that the data can be
preprocessed correctly. If any of these checks fail, the run fails with the relevant error message. The following are
the requirements to pass data validation checks for each task.
NOTE
Some data validation checks are applicable to both the training and the validation set, whereas others are applicable only
to the training set. If the test dataset doesn't pass data validation, automated ML can't capture it, and model
inference failure or a decline in model performance is possible.
TASK    DATA VALIDATION CHECK
Multi-class and Multi-label: The training data and validation data must have
- The same set of columns
- The same order of columns from left to right
- The same data type for columns with the same name
- At least two unique labels
- Unique column names within each dataset (for example, the training set can't have multiple columns named Age)
NER only:
- The file should not start with an empty line
- Each line must be an empty line, or follow the format {token} {label}, where there is exactly one space between the token and the label and no white space after the label
- All labels must start with I-, B-, or be exactly O. Case sensitive
- Exactly one empty line between two samples
- Exactly one empty line at the end of the file
Configure experiment
Automated ML's NLP capability is triggered through task-specific automl type jobs, which is the same workflow
for submitting automated ML experiments for classification, regression and forecasting tasks. You would set
parameters as you would for those experiments, such as experiment_name , compute_name and data inputs.
However, there are key differences:
You can ignore primary_metric , as it is only for reporting purposes. Currently, automated ML only trains one
model per run for NLP and there is no model selection.
The label_column_name parameter is only required for multi-class and multi-label text classification tasks.
If the majority of the samples in your dataset contain more than 128 words, it's considered long range. For
this scenario, you can enable the long range text option with the enable_long_range_text=True parameter in
your task function. Doing so helps improve model performance but requires longer training times.
If you enable long range text, then a GPU with higher memory is required such as, NCv3 series or ND
series.
The enable_long_range_text parameter is only available for multi-class classification tasks.
CLI v2
Python SDK v2 (preview)
CLI v2
Python SDK v2 (preview)
For example, to set the dataset language to English in the featurization settings:
featurization:
dataset_language: "eng"
Distributed training
You can also run your NLP experiments with distributed training on an Azure ML compute cluster.
CLI v2
Python SDK v2 (preview)
Code examples
CLI v2
Python SDK v2 (preview)
See the following sample YAML files for each NLP task.
Multi-class text classification
Multi-label text classification
Named entity recognition
Next steps
Deploy AutoML models to an online (real-time inference) endpoint
Troubleshoot automated ML experiments
Configure training, validation, cross-validation and
test data in automated machine learning
5/25/2022 • 8 minutes to read • Edit Online
Prerequisites
For this article you need,
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
Familiarity with setting up an automated machine learning experiment with the Azure Machine Learning
SDK. Follow the tutorial or how-to to see the fundamental automated machine learning experiment
design patterns.
An understanding of train/validation data splits and cross-validation as machine learning concepts. For a
high-level explanation,
About training, validation and test data in machine learning
Understand Cross Validation in machine learning
IMPORTANT
The Python commands in this article require the latest azureml-train-automl package version.
Install the latest azureml-train-automl package to your local environment.
For details on the latest azureml-train-automl package, see the release notes.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
Larger than 20,000 rows: Train/validation data split is applied. The default is to take 10% of the
initial training data set as the validation set. In turn, that validation set is used for metrics
calculation.
Smaller than 20,000 rows: Cross-validation approach is applied. The default number of folds depends
on the number of rows.
- If the dataset is less than 1,000 rows, 10 folds are used.
- If the rows are between 1,000 and 20,000, then three folds are used.
NOTE
The validation_data parameter requires the training_data and label_column_name parameters to be set as well.
You can only set one validation parameter; that is, you can only specify either validation_data or
n_cross_validations , not both.
The following code example explicitly defines which portion of the provided data in dataset to use for training
and validation.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
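The AutoMLConfig portion of that example isn't shown above; a minimal sketch, assuming a classification task on this credit card dataset (whose label column is assumed to be named Class):

# Hold out 20% of the rows for validation via a random split.
train_data, validation_data = dataset.random_split(percentage=0.8, seed=1)

automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             validation_data=validation_data,
                             label_column_name='Class')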
NOTE
The validation_size parameter is not supported in forecasting scenarios.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
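A minimal sketch using validation_size to hold out 20% of the training data, under the same assumptions about the dataset:

automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,
                             validation_size=0.2,  # 20% holdout for validation
                             label_column_name='Class')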
K-fold cross-validation
To perform k-fold cross-validation, include the n_cross_validations parameter and set it to a value. This
parameter sets how many cross validations to perform, based on the same number of folds.
NOTE
The n_cross_validations parameter is not supported in classification scenarios that use deep neural networks. For
forecasting scenarios, see how cross validation is applied in Set up AutoML to train a time-series forecasting model.
In the following code, five folds for cross-validation are defined. Hence, five different trainings are run; each
training uses 4/5 of the data, and each validation uses 1/5 of the data with a different holdout fold each time.
As a result, metrics are calculated with the average of the five validation metrics.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
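A minimal sketch of that five-fold configuration, under the same assumptions about the dataset:

automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,
                             n_cross_validations=5,  # five folds, five trainings
                             label_column_name='Class')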
NOTE
The Monte Carlo cross-validation is not supported in forecasting scenarios.
The following code defines 7 folds for cross-validation and that 20% of the training data should be used for validation.
Hence, 7 different trainings are run; each training uses 80% of the data, and each validation uses 20% of the data with a
different holdout fold each time.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
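A minimal sketch of the Monte Carlo configuration described above, with both parameters set together (same dataset assumptions as before):

automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,
                             n_cross_validations=7,  # seven trainings
                             validation_size=0.2,    # 20% holdout per training
                             label_column_name='Class')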
NOTE
The cv_split_column_names parameter is not supported in forecasting scenarios.
The following code snippet contains bank marketing data with two CV split columns 'cv1' and 'cv2'.
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-
data/bankmarketing_with_cv.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
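A minimal sketch of passing those split columns, assuming the label column in this bank marketing dataset is named y:

automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,
                             label_column_name='y',  # assumed label column
                             cv_split_column_names=['cv1', 'cv2'])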
NOTE
To use cv_split_column_names with training_data and label_column_name , please upgrade your Azure Machine
Learning Python SDK version 1.6.0 or later. For previous SDK versions, please refer to using cv_splits_indices , but
note that it is used with X and y dataset input only.
You can also provide test data to evaluate the recommended model that automated ML generates for you upon
completion of the experiment. When you provide test data, it's considered separate from training and
validation, so as not to bias the results of the test run of the recommended model. Learn more about training,
validation and test data in automated ML.
WARNING
This feature is not available for the following automated ML scenarios
Computer vision tasks (preview)
Many models and hierarchical time series forecasting training (preview)
Forecasting tasks where deep learning neural networks (DNN) are enabled
Automated ML runs from local computes or Azure Databricks clusters
Test datasets must be in the form of an Azure Machine Learning TabularDataset. You can specify a test dataset
with the test_data and test_size parameters in your AutoMLConfig object. These parameters are mutually
exclusive and cannot be specified at the same time or with cv_split_column_names or cv_splits_indices .
With the test_data parameter, specify an existing dataset to pass into your AutoMLConfig object.
automl_config = AutoMLConfig(task='forecasting',
...
# Provide an existing test dataset
test_data=test_dataset,
...
forecasting_parameters=forecasting_parameters)
To use a train/test split instead of providing test data directly, use the test_size parameter when creating the
AutoMLConfig . This parameter must be a floating point value between 0.0 and 1.0 exclusive, and specifies the
percentage of the training dataset that should be used for the test dataset.
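A minimal sketch of that train/test split, holding out 20% of the training data for the test run (same dataset assumptions as earlier examples):

automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,
                             label_column_name='Class',
                             test_size=0.2)  # 20% of training data reserved for the test run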
NOTE
For regression tasks, random sampling is used.
For classification tasks, stratified sampling is used, but random sampling is used as a fall back when stratified sampling is
not feasible.
Forecasting does not currently support specifying a test dataset using a train/test split with the test_size parameter.
Passing the test_data or test_size parameters into the AutoMLConfig automatically triggers a remote test
run upon completion of your experiment. This test run uses the provided test data to evaluate the best model
that automated ML recommends. Learn more about how to get the predictions from the test run.
Next steps
Prevent imbalanced data and overfitting.
Tutorial: Use automated machine learning to predict taxi fares - Split data section.
How to Auto-train a time-series forecast model.
Data featurization in automated machine learning
5/25/2022 • 14 minutes to read • Edit Online
Prerequisites
This article assumes that you already know how to configure an automated ML experiment.
IMPORTANT
The Python commands in this article require the latest azureml-train-automl package version.
Install the latest azureml-train-automl package to your local environment.
For details on the latest azureml-train-automl package, see the release notes.
Configure featurization
In every automated machine learning experiment, automatic scaling and normalization techniques are applied
to your data by default. These techniques are types of featurization that help certain algorithms that are
sensitive to features on different scales. You can enable more featurization, such as missing-values imputation,
encoding, and transforms.
NOTE
Steps for automated machine learning featurization (such as feature normalization, handling missing data, or converting
text to numeric) become part of the underlying model. When you use the model for predictions, the same featurization
steps that are applied during training are applied to your input data automatically.
For experiments that you configure with the Python SDK, you can enable or disable the featurization setting and
further specify the featurization steps to be used for your experiment. If you're using the Azure Machine
Learning studio, see the steps to enable featurization.
The following table shows the accepted settings for featurization in the AutoMLConfig class:
Automatic featurization
The following table summarizes techniques that are automatically applied to your data. These techniques are
applied for experiments that are configured by using the SDK or the studio UI. To disable this behavior, set
"featurization": 'off' in your AutoMLConfig object.
NOTE
If you plan to export your AutoML-created models to an ONNX model, only the featurization options indicated with an
asterisk ("*") are supported in the ONNX format. Learn more about converting models to ONNX.
Drop high cardinality or no variance features*: Drop these features from training and validation sets. Applies to
features with all values missing, with the same value across all rows, or with high cardinality (for example,
hashes, IDs, or GUIDs).
Impute missing values*: For numeric features, impute with the average of values in the column.
Generate more features*: For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of
the year, Hour, Minute, Second.
Transform and encode*: Transform numeric features that have few unique values into categorical features.
In every automated machine learning experiment, your data is automatically scaled or normalized to help
algorithms perform well. During model training, one of the following scaling or normalization techniques are
applied to each model.
SparseNormalizer: Each sample (that is, each row of the data matrix) with at least one non-zero component is
rescaled independently of other samples so that its norm (l1 or l2) equals one.
Data guardrails
Data guardrails help you identify potential issues with your data (for example, missing values or class
imbalance). They also help you take corrective actions for improved results.
Data guardrails are applied:
For SDK experiments : When the parameters "featurization": 'auto' or validation=auto are specified in
your AutoMLConfig object.
For studio experiments : When automatic featurization is enabled.
You can review the data guardrails for your experiment:
By setting show_output=True when you submit an experiment by using the SDK.
In the studio, on the Data guardrails tab of your automated ML run.
Data guardrail states
Data guardrails display one of three states:
High cardinality feature handling
- Passed: Your inputs were analyzed, and no high-cardinality features were detected.
- Done: High-cardinality features were detected in your inputs and were handled.
Class balancing detection
- Passed: Your inputs were analyzed, and all classes are balanced in your training data. A dataset is considered to be balanced if each class has good representation in the dataset, as measured by number and ratio of samples.
- Alerted: Imbalanced classes were detected in your inputs. To fix model bias, fix the balancing problem. Learn more about imbalanced data.
Customize featurization
You can customize your featurization settings to ensure that the data and features that are used to train your ML
model result in relevant predictions.
To customize featurizations, specify "featurization": FeaturizationConfig in your AutoMLConfig object. If you're
using the Azure Machine Learning studio for your experiment, see the how-to article. To customize featurization
for forecasting task types, refer to the forecasting how-to.
Supported customizations include:
CUSTOMIZATION                  DEFINITION
Column purpose update          Override the autodetected feature type for the specified column.
Transformer parameter update   Update the parameters for the specified transformer. Currently supports Imputer (mean, most frequent, and median) and HashOneHotEncoder.
NOTE
The drop columns functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data
cleansing, prior to consuming it in your automated ML experiment.
featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ['LabelEncoder']
featurization_config.drop_columns = ['aspiration', 'stroke']
featurization_config.add_column_purpose('engine-size', 'Numeric')
featurization_config.add_column_purpose('body-style', 'CategoricalHash')
# Default strategy is mean; add transformer params for 3 columns
featurization_config.add_transformer_params('Imputer', ['engine-size'], {"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['city-mpg'], {"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['bore'], {"strategy": "most_frequent"})
featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of_bits": 3})
Featurization transparency
Every AutoML model has featurization automatically applied. Featurization includes automated feature
engineering (when "featurization": 'auto' ) and scaling and normalization, which then impacts the selected
algorithm and its hyperparameter values. AutoML supports different methods to ensure you have visibility into
what was applied to your model.
Consider this forecasting example:
There are four input features: A (Numeric), B (Numeric), C (Numeric), D (DateTime).
Numeric feature C is dropped because it is an ID column with all unique values.
Numeric features A and B have missing values and hence are imputed by the mean.
DateTime feature D is featurized into 11 different engineered features.
To get this information, use the fitted_model output from your automated ML experiment run.
automl_config = AutoMLConfig(…)
automl_run = experiment.submit(automl_config …)
best_run, fitted_model = automl_run.get_output()
NOTE
Use 'timeseriestransformer' for task='forecasting', else use 'datatransformer' for 'regression' or 'classification' task.
fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()
['A', 'B', 'A_WASNULL', 'B_WASNULL', 'year', 'half', 'quarter', 'month', 'day', 'hour', 'am_pm', 'hour12',
'wday', 'qday', 'week']
fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()
Output
[{'RawFeatureName': 'A',
'TypeDetected': 'Numeric',
'Dropped': 'No',
'EngineeredFeatureCount': 2,
'Tranformations': ['MeanImputer', 'ImputationMarker']},
{'RawFeatureName': 'B',
'TypeDetected': 'Numeric',
'Dropped': 'No',
'EngineeredFeatureCount': 2,
'Tranformations': ['MeanImputer', 'ImputationMarker']},
{'RawFeatureName': 'C',
'TypeDetected': 'Numeric',
'Dropped': 'Yes',
'EngineeredFeatureCount': 0,
'Tranformations': []},
{'RawFeatureName': 'D',
'TypeDetected': 'DateTime',
'Dropped': 'No',
'EngineeredFeatureCount': 11,
'Tranformations':
['DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime']}]
The following sample output is from running fitted_model.steps for a chosen run:
[('RobustScaler',
RobustScaler(copy=True,
quantile_range=[10, 90],
with_centering=True,
with_scaling=True)),
('LogisticRegression',
LogisticRegression(C=0.18420699693267145, class_weight='balanced',
dual=False,
fit_intercept=True,
intercept_scaling=1,
max_iter=100,
multi_class='multinomial',
n_jobs=1, penalty='l2',
random_state=None,
solver='newton-cg',
tol=0.0001,
verbose=0,
warm_start=False))]
This helper function returns the following output for a particular run using
LogisticRegression with RobustScaler as the specific algorithm.
RobustScaler
{'copy': True,
'quantile_range': [10, 90],
'with_centering': True,
'with_scaling': True}
LogisticRegression
{'C': 0.18420699693267145,
'class_weight': 'balanced',
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'multinomial',
'n_jobs': 1,
'penalty': 'l2',
'random_state': None,
'solver': 'newton-cg',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}
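The helper function itself isn't included in this article. A minimal sketch that produces output in this shape, assuming fitted_model exposes a scikit-learn-style steps list:

from pprint import pprint

def print_model(model, prefix=""):
    # Walk each (name, estimator) pipeline step and print its parameters.
    for name, step in model.steps:
        print(prefix + name)
        pprint(step.get_params())
        print()

print_model(fitted_model)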
If the underlying model does not support the predict_proba() function or the format is incorrect, a model class-
specific exception will be thrown. See the RandomForestClassifier and XGBoost reference docs for examples of
how this function is implemented for different model types.
BERT generally runs longer than other featurizers. For better performance, we recommend using
"STANDARD_NC24r" or "STANDARD_NC24rs_V3" for their RDMA capabilities.
AutoML will distribute BERT training across multiple nodes if they are available (up to a maximum of eight nodes). This
can be done in your AutoMLConfig object by setting the max_concurrent_iterations parameter to higher than 1.
featurization_config = FeaturizationConfig(dataset_language='deu')

automl_settings = {
    "experiment_timeout_minutes": 120,
    "primary_metric": 'accuracy',
    # All other settings you want to use
    "featurization": featurization_config,
}
Next steps
Learn how to set up your automated ML experiments:
For a code-first experience: Configure automated ML experiments by using the Azure Machine
Learning SDK.
For a low-code or no-code experience: Create your automated ML experiments in the Azure Machine
Learning studio.
Learn more about how and where to deploy a model.
Learn more about how to train a regression model by using automated machine learning or how to train
by using automated machine learning on a remote resource.
Evaluate automated machine learning experiment
results
5/25/2022 • 25 minutes to read • Edit Online
In this article, learn how to evaluate and compare models trained by your automated machine learning
(automated ML) experiment. Over the course of an automated ML experiment, many runs are created and each
run creates a model. For each model, automated ML generates evaluation metrics and charts that help you
measure the model's performance.
For example, automated ML generates the following charts based on experiment type.
Lift curve
Calibration curve
Prerequisites
An Azure subscription. (If you don't have an Azure subscription, create a free account before you begin)
An Azure Machine Learning experiment created with either:
The Azure Machine Learning studio (no code required)
The Azure Machine Learning Python SDK
NOTE
Refer to image metrics section for additional details on metrics for image classification models.
average_precision_score_weighted: the arithmetic mean of the average precision score for each class, weighted
by the number of true instances in each class.
average_precision_score_binary: the value of average precision by treating one specific class as the true class
and combining all other classes as the false class.
NOTE
When a binary classification task is detected, we use numpy.unique to find the set of labels and the later label will be
used as the true class. Since there is a sorting procedure in numpy.unique , the choice of true class will be stable.
Note that multiclass classification metrics are intended for multiclass classification. When applied to a binary
dataset, these metrics won't treat any class as the true class, as you might expect. Metrics that are clearly
meant for multiclass are suffixed with micro , macro , or weighted . Examples include average_precision_score ,
f1_score , precision_score , recall_score , and AUC . For example, instead of calculating recall as
tp / (tp + fn) , the multiclass averaged recall ( micro , macro , or weighted ) averages over both classes of a
binary classification dataset. This is equivalent to calculating the recall for the true class and the false class
separately, and then taking the average of the two.
In addition, although automatic detection of binary classification is supported, it is still recommended to always
specify the true class manually to make sure the binary classification metrics are calculated for the correct
class.
To activate metrics for binary classification datasets when the dataset itself is multiclass, users only need to
specify the class to be treated as true class and these metrics will be calculated.
Confusion matrix
Confusion matrices provide a visual for how a machine learning model is making systematic errors in its
predictions for classification models. The word "confusion" in the name comes from a model "confusing" or
mislabeling samples. A cell at row i and column j in a confusion matrix contains the number of samples in
the evaluation dataset that belong to class C_i and were classified by the model as class C_j .
In the studio, a darker cell indicates a higher number of samples. Selecting Normalized view in the dropdown
will normalize over each matrix row to show the percent of class C_i predicted to be class C_j . The benefit of
the default Raw view is that you can see whether imbalance in the distribution of actual classes caused the
model to misclassify samples from the minority class, a common issue in imbalanced datasets.
The confusion matrix of a good model will have most samples along the diagonal.
Confusion matrix for a good model
Confusion matrix for a bad model
ROC curve
The receiver operating characteristic (ROC) curve plots the relationship between true positive rate (TPR) and
false positive rate (FPR) as the decision threshold changes. The ROC curve can be less informative when training
models on datasets with high class imbalance, as the majority class can drown out contributions from minority
classes.
The area under the curve (AUC) can be interpreted as the proportion of correctly classified samples. More
precisely, the AUC is the probability that the classifier ranks a randomly chosen positive sample higher than a
randomly chosen negative sample. The shape of the curve gives an intuition for the relationship between TPR and
FPR as a function of the classification threshold or decision boundary.
A curve that approaches the top-left corner of the chart is approaching a 100% TPR and 0% FPR, the best
possible model. A random model would produce an ROC curve along the y = x line from the bottom-left
corner to the top-right. A worse than random model would have an ROC curve that dips below the y = x line.
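For example, a minimal sketch with scikit-learn computes the points of the ROC curve and the AUC from predicted
positive-class probabilities; the values are illustrative:

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr, tpr)
print(roc_auc_score(y_true, y_scores))  # area under the ROC curve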
TIP
For classification experiments, each of the line charts produced for automated ML models can be used to evaluate the
model per-class or averaged over all classes. You can switch between these different views by clicking on class labels in the
legend to the right of the chart.
Precision-recall curve
The precision-recall curve plots the relationship between precision and recall as the decision threshold changes.
Recall is the ability of a model to detect all positive samples and precision is the ability of a model to avoid
labeling negative samples as positive. Some business problems might require higher recall and some higher
precision depending on the relative importance of avoiding false negatives vs false positives.
TIP
For classification experiments, each of the line charts produced for automated ML models can be used to evaluate the
model per-class or averaged over all classes. You can switch between these different views by clicking on class labels in the
legend to the right of the chart.
Calibration curve
The calibration curve plots a model's confidence in its predictions against the proportion of positive samples at
each confidence level. A well-calibrated model will correctly classify 100% of the predictions to which it assigns
100% confidence, 50% of the predictions it assigns 50% confidence, 20% of the predictions it assigns a 20%
confidence, and so on. A perfectly calibrated model will have a calibration curve following the y = x line where
the model perfectly predicts the probability that samples belong to each class.
An over-confident model will over-predict probabilities close to zero and one, rarely being uncertain about the
class of each sample, and the calibration curve will look similar to a backward "S". An under-confident model will
assign a lower probability on average to the class it predicts, and the associated calibration curve will look
similar to an "S". The calibration curve does not depict a model's ability to classify correctly, but instead its ability
to correctly assign confidence to its predictions. A bad model can still have a good calibration curve if the model
correctly assigns low confidence and high uncertainty.
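For example, a minimal sketch with scikit-learn bins predicted probabilities and compares each bin's mean
predicted value to the observed fraction of positives; the values are illustrative:

from sklearn.calibration import calibration_curve

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # predicted probabilities

# A well-calibrated model produces points that track the y = x line
frac_positives, mean_predicted = calibration_curve(y_true, y_prob, n_bins=4)
print(frac_positives, mean_predicted)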
NOTE
The calibration curve is sensitive to the number of samples, so a small validation set can produce noisy results that can be
hard to interpret. This does not necessarily mean that the model is not well-calibrated.
Regression/forecasting metrics
Automated ML calculates the same performance metrics for each model generated, regardless of whether it is a
regression or forecasting experiment. These metrics also undergo normalization to enable comparison between
models trained on data with different ranges. To learn more, see metric normalization.
The following table summarizes the model performance metrics generated for regression and forecasting
experiments. Like classification metrics, these metrics are also based on the scikit learn implementations. The
appropriate scikit learn documentation is linked accordingly, in the Calculation field.
Each error metric has a normalized variant, the metric divided by the range of the data:

mean_absolute_error and normalized_mean_absolute_error : the mean_absolute_error divided by the range of the data.
median_absolute_error and normalized_median_absolute_error : the median_absolute_error divided by the range of the data.
root_mean_squared_error and normalized_root_mean_squared_error : the root_mean_squared_error divided by the range of the data.
root_mean_squared_log_error and normalized_root_mean_squared_log_error : the root_mean_squared_log_error divided by the range of the data.
Metric normalization
Automated ML normalizes regression and forecasting metrics, which enables comparison between models
trained on data with different ranges. A model trained on data with a larger range has a higher error than the
same model trained on data with a smaller range, unless that error is normalized.
While there is no standard method of normalizing error metrics, automated ML takes the common approach of
dividing the error by the range of the data: normalized_error = error / (y_max - y_min)
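For example, a minimal sketch of this normalization applied to the mean absolute error:

import numpy as np

def normalized_mean_absolute_error(y_true, y_pred):
    # mean_absolute_error divided by the range of the target values
    mae = np.mean(np.abs(np.array(y_true) - np.array(y_pred)))
    return mae / (np.max(y_true) - np.min(y_true))

print(normalized_mean_absolute_error(y_true=[10, 20, 40], y_pred=[12, 18, 41]))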
NOTE
The range of data is not saved with the model. If you do inference with the same model on a holdout test set, y_min
and y_max may change according to the test data and the normalized metrics may not be directly used to compare the
model's performance on training and test sets. You can pass in the value of y_min and y_max from your training set to
make the comparison fair.
When evaluating a forecasting model on time series data, automated ML takes extra steps to ensure that
normalization happens per time series ID (grain), because each time series likely has a different distribution of
target values.
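For example, a minimal sketch of per-series normalization with pandas; the column names here are illustrative
assumptions:

import pandas as pd

def per_series_normalized_mae(df):
    # normalize the error within each time series ID (grain) separately
    def nmae(g):
        mae = (g['y_pred'] - g['y_true']).abs().mean()
        return mae / (g['y_true'].max() - g['y_true'].min())
    return df.groupby('time_series_id').apply(nmae)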
Residuals
The residuals chart is a histogram of the prediction errors (residuals) generated for regression and forecasting
experiments. Residuals are calculated as y_predicted - y_true for all samples and then displayed as a
histogram to show model bias.
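For example, a minimal sketch of computing and plotting residuals with NumPy and Matplotlib; the data is
illustrative:

import numpy as np
import matplotlib.pyplot as plt

y_true = np.array([10.0, 12.0, 15.0, 9.0, 11.0])
y_pred = np.array([9.5, 12.5, 14.0, 9.0, 10.5])

residuals = y_pred - y_true  # y_predicted - y_true
plt.hist(residuals, bins=10)
plt.xlabel('residual')
plt.ylabel('count')
plt.show()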
In this example, note that both models are slightly biased to predict lower than the actual value. This is not
uncommon for a dataset with a skewed distribution of actual targets, but indicates worse model performance. A
good model will have a residuals distribution that peaks at zero with few residuals at the extremes. A worse
model will have a spread out residuals distribution with fewer samples around zero.
Residuals chart for a good model
Residuals chart for a bad model
IMPORTANT
This chart is only available for models generated from training and validation data. We allow up to 20 data points before
and up to 80 data points after the forecast origin. Visuals for models based on test data are not supported at this time.
TIP
The image object detection model evaluation can use coco metrics if the validation_metric_type hyperparameter is
set to be 'coco' as explained in the hyperparameter tuning section.
NOTE
Epoch-level metrics for precision, recall and per_label_metrics are not available when using the 'coco' method.
Model explanations and feature importances
While model evaluation metrics and charts are good for measuring the general quality of a model, inspecting
which dataset features a model used to make its predictions is essential when practicing responsible AI. That's
why automated ML provides a model explanations dashboard to measure and report the relative contributions
of dataset features. See how to view the explanations dashboard in the Azure Machine Learning studio.
For a code first experience, see how to set up model explanations for automated ML experiments with the Azure
Machine Learning Python SDK.
NOTE
Interpretability, best model explanation, is not available for automated ML forecasting experiments that recommend the
following algorithms as the best model or ensemble:
TCNForecaster
AutoArima
ExponentialSmoothing
Prophet
Average
Naive
Seasonal Average
Seasonal Naive
Next steps
Try the automated machine learning model explanation sample notebooks.
For automated ML specific questions, reach out to askautomatedml@microsoft.com.
View automated ML model's training code (preview)
5/25/2022 • 13 minutes to read • Edit Online
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to view the generated training code from any automated machine learning trained
model.
Code generation for automated ML trained models allows you to see the following details that automated ML
uses to train and build the model for a specific run.
Data preprocessing
Algorithm selection
Featurization
Hyperparameters
You can select any automated ML trained model, recommended or child run, and view the generated Python
training code that created that specific model.
With the generated model's training code you can:
Learn what featurization process and hyperparameters the model algorithm uses.
Track/version/audit trained models. Store versioned code to track what specific training code is used with
the model that's to be deployed to production.
Customize the training code by changing hyperparameters or applying your ML and algorithms
skills/experience, and retrain a new model with your customized code.
You can generate the code for automated ML experiments with task types classification, regression, and time-
series forecasting.
WARNING
Computer vision models and natural language processing models in AutoML don't currently support training
code generation.
The following diagram illustrates that you can enable code generation for any AutoML created model from the
Azure Machine Learning studio UI or with the Azure Machine Learning SDK. After you select a model, Azure
Machine Learning copies the code files used to create the model, and displays them into your notebooks shared
folder. From here, you can view and customize the code as needed.
Prerequisites
An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning
workspace.
This article assumes some familiarity with setting up an automated machine learning experiment. Follow
the tutorial or how-to to see the main automated machine learning experiment design patterns.
Automated ML code generation is only available for experiments run on remote Azure ML compute
targets. Code generation isn't supported for local runs.
To enable code generation with the SDK, you have the following options:
You can run your code via a Jupyter notebook in an Azure Machine Learning compute instance,
which contains the latest Azure ML SDK already installed. The compute instance comes with a
ready-to-use Conda environment that is compatible with the automated ML code generation
(preview) capability.
Alternatively, you can create a new Conda environment on your local machine and then
install the latest Azure ML SDK. See how to install the AutoML client SDK in a Conda environment with the
automl package.
You can download the generated training artifacts for the best child run with the SDK:

best_run = remote_run.get_best_child()
best_run.download_file("outputs/generated_code/script.py", "script.py")
best_run.download_file("outputs/generated_code/script_run_notebook.ipynb", "script_run_notebook.ipynb")
You also can view the generated code and prepare it for code customization via the Azure Machine Learning
studio UI.
To do so, navigate to the Models tab of the automated ML experiment parent run page. After you select one of
the trained models, you can select the View generated code (preview) button. This button redirects you to
the Notebooks portal extension, where you can view, edit and run the generated code for that particular
selected model.
Alternatively, you can also access to the model's generated code from the top of the child run's page once you
navigate into that child run's page of a particular model.
script.py
The script.py file contains the core logic needed to train a model with the previously used hyperparameters.
While intended to be executed in the context of an Azure ML script run, with some modifications, the model's
training code can also be run standalone in your own on-premises environment.
The script can roughly be broken down into the following parts: data loading, data preparation, data
featurization, preprocessor/algorithm specification, and training.
Data loading
The function get_training_dataset() loads the previously used dataset. It assumes that the script is run in an
AzureML script run under the same workspace as the original experiment.
def get_training_dataset(dataset_id):
    from azureml.core.dataset import Dataset
    from azureml.core.run import Run

    logger.info("Running get_training_dataset")
    ws = Run.get_context().experiment.workspace
    dataset = Dataset.get_by_id(workspace=ws, id=dataset_id)
    return dataset.to_pandas_dataframe()
When running as part of a script run, Run.get_context().experiment.workspace retrieves the correct workspace.
However, if this script is run inside of a different workspace or run locally without using ScriptRunConfig , you
need to modify the script to explicitly specify the appropriate workspace.
Once the workspace has been retrieved, the original dataset is retrieved by its ID. Another dataset with exactly
the same structure could also be specified by ID or name with the get_by_id() or get_by_name() methods, respectively.
You can find the ID later on in the script, in a similar section as the following code.
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--training_dataset_id', type=str, default='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx',
                        help='Default training dataset id is populated from the parent run')
    args = parser.parse_args()

    main(args.training_dataset_id)
You can also opt to replace this entire function with your own data loading mechanism; the only constraints are
that the return value must be a Pandas dataframe and that the data must have the same shape as in the original
experiment.
Data preparation code
The function prepare_data() cleans the data, splits out the feature and sample weight columns and prepares the
data for use in training. This function can vary depending on the type of dataset and the experiment task type:
classification, regression, or time-series forecasting.
The following example shows that in general, the dataframe from the data loading step is passed in. The label
column and sample weights, if originally specified, are extracted and rows containing NaN are dropped from the
input data.
def prepare_data(dataframe):
    from azureml.training.tabular.preprocessing import data_cleaning

    logger.info("Running prepare_data")
    label_column_name = 'y'

    # Extract the target column and, if originally specified, the sample weights;
    # the generated script then calls a helper from data_cleaning to drop rows
    # containing NaN (that exact call is elided in this extract).
    y = dataframe[label_column_name].values
    X = dataframe.drop([label_column_name], axis=1)
    sample_weights = None

    return X, y, sample_weights
If you want to do any additional data preparation, it can be done in this step by adding your custom data
preparation code.
Data featurization code
The function generate_data_transformation_config() specifies the featurization step in the final scikit-learn
pipeline. The featurizers from the original experiment are reproduced here, along with their parameters.
For example, possible data transformation that can happen in this function can be based on imputers like,
SimpleImputer() and CatImputer() , or transformers such as StringCastTransformer() and
LabelEncoderTransformer() .
The following is a transformer of type StringCastTransformer() that can be used to transform a set of columns.
In this case, the set indicated by column_names .
def get_mapper_c6ba98(column_names):
    # ... Multiple imports to package dependencies, removed for simplicity ...

    definition = gen_features(
        columns=column_names,
        classes=[
            {
                'class': StringCastTransformer,
            },
            {
                'class': CountVectorizer,
                'analyzer': 'word',
                'binary': True,
                'decode_error': 'strict',
                'dtype': numpy.uint8,
                'encoding': 'utf-8',
                'input': 'content',
                'lowercase': True,
                'max_df': 1.0,
                'max_features': None,
                'min_df': 1,
                'ngram_range': (1, 1),
                'preprocessor': None,
                'stop_words': None,
                'strip_accents': None,
                'token_pattern': '(?u)\\b\\w\\w+\\b',
                'tokenizer': wrap_in_lst,
                'vocabulary': None,
            },
        ]
    )
    mapper = DataFrameMapper(features=definition, input_df=True, sparse=True)
    return mapper
Be aware that if you have many columns that need to have the same featurization/transformation applied (for
example, 50 columns in several column groups), these columns are handled by grouping based on type.
In the following example, notice that each group has a unique mapper applied. This mapper is then applied to
each of the columns of that group.
def generate_data_transformation_config():
    from sklearn.pipeline import FeatureUnion

    feature_union = FeatureUnion([
        ('mapper_ab1045', get_mapper_ab1045(column_group_1)),
        ('mapper_c6ba98', get_mapper_c6ba98(column_group_3)),
        ('mapper_9133f9', get_mapper_9133f9(column_group_2)),
    ])
    return feature_union
This approach gives you more streamlined code by not having a transformer code block for each column, which
can be especially cumbersome when you have tens or hundreds of columns in your dataset.
With classification and regression tasks, FeatureUnion is used for featurizers. For time-series forecasting
models, multiple time series-aware featurizers are collected into a scikit-learn pipeline, then wrapped in the
TimeSeriesTransformer . Any user provided featurizations for time series forecasting models happens before the
ones provided by automated ML.
Preprocessor specification code
The function generate_preprocessor_config() , if present, specifies a preprocessing step to be done after
featurization in the final scikit-learn pipeline.
Normally, this preprocessing step only consists of data standardization/normalization that's accomplished with
sklearn.preprocessing .
Automated ML only specifies a preprocessing step for non-ensemble classification and regression models.
Here's an example of a generated preprocessor code:
def generate_preprocessor_config():
    from sklearn.preprocessing import MaxAbsScaler

    preproc = MaxAbsScaler(
        copy=True
    )
    return preproc
Algorithm and hyperparameters specification code
The function generate_algorithm_config() specifies the actual algorithm and its hyperparameters as the last
stage of the final scikit-learn pipeline. The following example trains an XGBoost classifier with specific
hyperparameters:

def generate_algorithm_config():
    from xgboost.sklearn import XGBClassifier

    algorithm = XGBClassifier(
        base_score=0.5,
        booster='gbtree',
        colsample_bylevel=1,
        colsample_bynode=1,
        colsample_bytree=1,
        gamma=0,
        learning_rate=0.1,
        max_delta_step=0,
        max_depth=3,
        min_child_weight=1,
        missing=numpy.nan,
        n_estimators=100,
        n_jobs=-1,
        nthread=None,
        objective='binary:logistic',
        random_state=0,
        reg_alpha=0,
        reg_lambda=1,
        scale_pos_weight=1,
        seed=None,
        silent=None,
        subsample=1,
        verbosity=0,
        tree_method='auto',
        verbose=-10
    )
    return algorithm
The generated code in most cases uses open source software (OSS) packages and classes. There are instances
where intermediate wrapper classes are used to simplify more complex code. For example, the XGBoost classifier
and other commonly used libraries like LightGBM or scikit-learn algorithms can be applied.
As an ML Professional, you are able to customize that algorithm's configuration code by tweaking its
hyperparameters as needed based on your skills and experience for that algorithm and your particular ML
problem.
For ensemble models, generate_preprocessor_config_N() (if needed) and generate_algorithm_config_N() are
defined for each learner in the ensemble model, where N represents the placement of each learner in the
ensemble model's list. For stack ensemble models, the meta learner generate_algorithm_config_meta() is
defined.
End to end training code
Code generation emits build_model_pipeline() and train_model() for defining the scikit-learn pipeline and for
calling fit() on it, respectively.
def build_model_pipeline():
    from sklearn.pipeline import Pipeline

    logger.info("Running build_model_pipeline")
    pipeline = Pipeline(
        steps=[
            ('featurization', generate_data_transformation_config()),
            ('preproc', generate_preprocessor_config()),
            ('model', generate_algorithm_config()),
        ]
    )
    return pipeline
The scikit-learn pipeline includes the featurization step, a preprocessor (if used), and the algorithm or model.
For time-series forecasting models, the scikit-learn pipeline is wrapped in a ForecastingPipelineWrapper , which
has some additional logic needed to properly handle time-series data depending on the applied algorithm. For
all task types, we use PipelineWithYTransformer in cases where the label column needs to be encoded.
Once you have the scikit-learn pipeline, all that's left is to call the fit() method to train the model:

def train_model(X, y, sample_weights):
    logger.info("Running train_model")
    model_pipeline = build_model_pipeline()

    model = model_pipeline.fit(X, y)
    return model
The return value from train_model() is the model fitted/trained on the input data.
The main code that runs all the previous functions is the following:
def main(training_dataset_id=None):
    from azureml.core.run import Run

    # The following code is for when running this code as part of an AzureML script run.
    run = Run.get_context()
    setup_instrumentation(run)

    df = get_training_dataset(training_dataset_id)
    X, y, sample_weights = prepare_data(df)

    split_ratio = 0.1
    try:
        (X_train, y_train, sample_weights_train), (X_valid, y_valid, sample_weights_valid) = split_dataset(
            X, y, sample_weights, split_ratio, should_stratify=True)
    except Exception:
        (X_train, y_train, sample_weights_train), (X_valid, y_valid, sample_weights_valid) = split_dataset(
            X, y, sample_weights, split_ratio, should_stratify=False)

    model = train_model(X_train, y_train, sample_weights_train)

    # The generated script computes validation metrics for the trained model here;
    # that call is elided in this extract, so metrics is shown as a placeholder.
    metrics = {}

    print(metrics)
    for metric in metrics:
        run.log(metric, metrics[metric])
Once you have the trained model, you can use it for making predictions with the predict() method. If your
experiment is for a time series model, use the forecast() method for predictions.
y_pred = model.predict(X)
Finally, the model is serialized and saved as a .pkl file named "model.pkl":
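The save step itself isn't shown in this extract; a minimal sketch with pickle, assuming the fitted pipeline
returned by train_model() , looks like the following:

import pickle

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)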
script_run_notebook.ipynb
The script_run_notebook.ipynb notebook serves as an easy way to execute script.py on an Azure ML
compute. This notebook is similar to the existing automated ML sample notebooks; however, there are a couple
of key differences, as explained in the following sections.
Environment
Typically, the training environment for an automated ML run is automatically set by the SDK. However, when
running a custom script run like the generated code, automated ML is no longer driving the process, so the
environment must be specified for the script run to succeed.
Code generation reuses the environment that was used in the original automated ML experiment, if possible.
Doing so guarantees that the training script run doesn't fail due to missing dependencies, and has a side benefit
of not needing a Docker image rebuild, which saves time and compute resources.
If you make changes to script.py that require additional dependencies, or you would like to use your own
environment, you need to update the Create environment cell in script_run_notebook.ipynb accordingly.
For more information about AzureML environments, see the Environment class documentation.
Submit the experiment
Since the generated code isn’t driven by automated ML anymore, instead of creating an AutoMLConfig and then
passing it to experiment.submit() , you need to create a ScriptRunConfig and provide the generated code
(script.py) to it.
The following example contains the parameters and regular dependencies needed to run ScriptRunConfig , such
as compute, environment, etc. For more information on how to use ScriptRunConfig, see Configure and submit
training runs.
src = ScriptRunConfig(source_directory=project_folder,
                      script='script.py',
                      compute_target=cpu_cluster,
                      environment=myenv,
                      docker_runtime_config=docker_config)

run = experiment.submit(config=src)
import joblib

# A sketch of loading the fitted model locally; the file name is illustrative.
try:
    model = joblib.load('model.pkl')
except ImportError:
    print('Required dependencies are missing; please run pip install azureml-automl-runtime.')
    raise
import os
import numpy as np
import pandas as pd
DATA_DIR = "."
filepath = os.path.join(DATA_DIR, 'porto_seguro_safe_driver_test_dataset.csv')
test_data_df = pd.read_csv(filepath)
print(test_data_df.shape)
test_data_df.head(5)
In an Azure ML compute instance, you have all the automated ML dependencies, so you can load the
model and predict from any notebook in a recently created compute instance.
However, in order to load that model in a notebook in your custom local Conda environment, you need to have
all the dependencies from the environment used during training (the AutoML environment) installed.
Next steps
Learn more about how and where to deploy a model.
See how to enable interpretability features specifically within automated ML experiments.
Make predictions with an AutoML ONNX model in
.NET
5/25/2022 • 7 minutes to read • Edit Online
In this article, you learn how to use an Automated ML (AutoML) Open Neural Network Exchange (ONNX) model
to make predictions in a C# .NET Core console application with ML.NET.
ML.NET is an open-source, cross-platform, machine learning framework for the .NET ecosystem that allows you
to train and consume custom machine learning models using a code-first approach in C# or F# as well as
through low-code tooling like Model Builder and the ML.NET CLI. The framework is also extensible and allows
you to leverage other popular machine learning frameworks like TensorFlow and ONNX.
ONNX is an open-source format for AI models. ONNX supports interoperability between frameworks. This
means you can train a model in one of the many popular machine learning frameworks like PyTorch, convert it
into ONNX format, and consume the ONNX model in a different framework like ML.NET. To learn more, visit the
ONNX website.
Prerequisites
.NET Core SDK 3.1 or greater
Text Editor or IDE (such as Visual Studio or Visual Studio Code)
ONNX model. To learn how to train an AutoML ONNX model, see the following bank marketing classification
notebook.
Netron (optional)
Create a new C# .NET Core console application and change into its project directory:

dotnet new console -o AutoMLONNXConsoleApp
cd AutoMLONNXConsoleApp

1. Install the Microsoft.ML , Microsoft.ML.OnnxRuntime , and Microsoft.ML.OnnxTransformer NuGet packages.
These packages contain the dependencies required to use an ONNX model in a .NET application. ML.NET
provides an API that uses the ONNX runtime for predictions.
2. Open the Program.cs file and add the following using statements at the top to reference the appropriate
packages.
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Onnx;
<ItemGroup>
  <None Include="automl-model.onnx">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
Initialize MLContext
Inside the Main method of your Program class, create a new instance of MLContext .
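var mlContext = new MLContext();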
The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new
ML.NET environment that can be shared across the model lifecycle. It's similar, conceptually, to DbContext in
Entity Framework.
4. Select the last node at the bottom of the graph ( variable_out1 in this case) to display the model's
metadata. The inputs and outputs on the sidebar show you the model's expected inputs, outputs, and data
types. Use this information to define the input and output schema of your model.
Define model input schema
Create a new class called OnnxInput with the following properties inside the Program.cs file.
public class OnnxInput
{
    [ColumnName("vendor_id")]
    public string VendorId { get; set; }

    [ColumnName("rate_code"), OnnxMapType(typeof(Int64), typeof(Single))]
    public Int64 RateCode { get; set; }

    [ColumnName("trip_distance")]
    public float TripDistance { get; set; }

    [ColumnName("payment_type")]
    public string PaymentType { get; set; }
}
Each of the properties maps to a column in the dataset. The properties are further annotated with attributes.
The ColumnName attribute lets you specify how ML.NET should reference the column when operating on the data.
For example, although the TripDistance property follows standard .NET naming conventions, the model only
knows of a column or feature known as trip_distance . To address this naming discrepancy, the ColumnName
attribute maps the TripDistance property to a column or feature by the name trip_distance .
For numerical values, ML.NET operates only on Single value types. However, the original data type of some of
the columns is integer. The OnnxMapType attribute maps types between ONNX and ML.NET.
To learn more about data attributes, see the ML.NET load data guide.
Define model output schema
Once the data is processed, it produces an output of a certain format. Define your data output schema. Create a
new class called OnnxOutput with the following properties inside the Program.cs file.
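public class OnnxOutput
{
    // Maps the model's variable_out1 output to a more descriptive property name
    [ColumnName("variable_out1")]
    public float PredictedFare { get; set; }
}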
Similar to OnnxInput , use the ColumnName attribute to map the variable_out1 output to a more descriptive
name PredictedFare .
1. Create a new method called GetPredictionPipeline that takes the MLContext as a parameter and returns an
ITransformer :

static ITransformer GetPredictionPipeline(MLContext mlContext)
{

}
2. Define the name of the input and output columns. Add the following code inside the
GetPredictionPipeline method.
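// These names mirror the OnnxInput properties and the model output shown in this article.
var inputColumns = new string[]
{
    "vendor_id", "rate_code", "trip_distance", "payment_type"
};

var outputColumns = new string[] { "variable_out1" };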
3. Define your pipeline. An IEstimator provides a blueprint of the operations, input, and output schemas of
your pipeline.
var onnxPredictionPipeline =
    mlContext
        .Transforms
        .ApplyOnnxModel(
            outputColumnNames: outputColumns,
            inputColumnNames: inputColumns,
            ONNX_MODEL_PATH);
In this case, ApplyOnnxModel is the only transform in the pipeline, which takes in the names of the input
and output columns as well as the path to the ONNX model file.
4. An IEstimator only defines the set of operations to apply to your data. What operates on your data is
known as an ITransformer . Use the Fit method to create one from your onnxPredictionPipeline .
// An empty IDataView built from the input schema gives the pipeline the information it needs;
// Enumerable.Empty comes from the System.Linq namespace referenced earlier.
var emptyDv = mlContext.Data.LoadFromEnumerable(Enumerable.Empty<OnnxInput>());

return onnxPredictionPipeline.Fit(emptyDv);
The Fit method expects an IDataView as input to perform the operations on. An IDataView is a way to
represent data in ML.NET using a tabular format. Since in this case the pipeline is only used for
predictions, you can provide an empty IDataView to give the ITransformer the necessary input and
output schema information. The fitted ITransformer is then returned for further use in your application.
TIP
In this sample, the pipeline is defined and used within the same application. However, it is recommended that you
use separate applications to define and use your pipeline to make predictions. In ML.NET your pipelines can be
serialized and saved for further use in other .NET end-user applications. ML.NET supports various deployment
targets such as desktop applications, web services, WebAssembly applications*, and many more. To learn more
about saving pipelines, see the ML.NET save and load trained models guide.
*WebAssembly is only supported in .NET Core 5 or greater
5. Inside the Main method, call the GetPredictionPipeline method with the required parameters.
3. Use the predictionEngine to make predictions based on the new testInput data using the Predict
method.
dotnet run
To learn more about making predictions in ML.NET, see the use a model to make predictions guide.
Next steps
Deploy your model as an ASP.NET Core Web API
Deploy your model as a serverless .NET Azure Function
Make predictions with ONNX on computer vision
models from AutoML
5/25/2022 • 32 minutes to read • Edit Online
Prerequisites
Get an AutoML-trained computer vision model for any of the supported image tasks: classification, object
detection, or instance segmentation. Learn more about AutoML support for computer vision tasks.
Install the onnxruntime package. The methods in this article have been tested with versions 1.3.0 to 1.8.0.
Download the labels.json file, which contains all the classes and labels in the training dataset.
labels_file = 'automl_models/labels.json'
best_child_run.download_file(name='train_artifacts/labels.json', output_file_path=labels_file)
onnx_model_path = 'automl_models/model.onnx'
best_child_run.download_file(name='train_artifacts/model.onnx', output_file_path=onnx_model_path)
Use the following model specific arguments to submit the script. For more details on arguments, refer to model
specific hyperparameters and for supported object detection model names refer to the supported model
algorithm section.
To get the argument values needed to create the batch scoring model, refer to the scoring scripts generated
under the outputs folder of the AutoML training runs. Use the hyperparameter values available in the model
settings variable inside the scoring file for the best child run.
Multi-class image classification
Multi-label image classification
Object detection with Faster R-CNN or RetinaNet
Object detection with YOLO
Instance segmentation
For multi-class image classification, the generated ONNX model for the best child-run supports batch scoring by
default. Therefore, no model specific arguments are needed for this task type and you can skip to the Load the
labels and ONNX model files section.
Download the ONNX_batch_model_generator_automl_for_images.py file from the azureml-examples GitHub repository,
keep it in the current directory, and use ScriptRunConfig to submit the script to generate an ONNX model of a
specific batch size. In the following code, the trained model environment is used to submit this script to
generate and save the ONNX model to the outputs directory.
script_run_config = ScriptRunConfig(source_directory='.',
                                    script='ONNX_batch_model_generator_automl_for_images.py',
                                    arguments=arguments,
                                    compute_target=compute_target,
                                    environment=best_child_run.get_environment())

remote_run = experiment.submit(script_run_config)
remote_run.wait_for_completion(wait_post_processing=True)
Once the batch model is generated, either download it from Outputs+logs > outputs manually, or use the
following method:
batch_size= 8 # use the batch size used to generate the model
onnx_model_path = 'automl_models/model.onnx' # local path to save the model
remote_run.download_file(name='outputs/model_'+str(batch_size)+'.onnx', output_file_path=onnx_model_path)
After the model downloading step, you use the ONNX Runtime Python package to perform inferencing by using
the model.onnx file. For demonstration purposes, this article uses the datasets from How to prepare image
datasets for each vision task.
We've trained the models for all vision tasks with their respective datasets to demonstrate ONNX model
inference.
import json
import onnxruntime

labels_file = "automl_models/labels.json"
with open(labels_file) as f:
    classes = json.load(f)
print(classes)
try:
    session = onnxruntime.InferenceSession(onnx_model_path)
    print("ONNX model loaded...")
except Exception as e:
    print("Error loading ONNX file: ", str(e))
sess_input = session.get_inputs()
sess_output = session.get_outputs()
print(f"No. of inputs : {len(sess_input)}, No. of outputs : {len(sess_output)}")
This example applies the model trained on the fridgeObjects dataset with 134 images and 4 classes/labels to
explain ONNX model inference. For more information on training an image classification task, see the multi-
class image classification notebook.
Input format
The input is a preprocessed image.
INPUT NAME | INPUT SHAPE | INPUT TYPE | DESCRIPTION
Output format
The output is an array of logits for all the classes/labels.
OUTPUT NAME | OUTPUT SHAPE | OUTPUT TYPE | DESCRIPTION
Preprocessing
Multi-class image classification
Multi-label image classification
Object detection with Faster R-CNN or RetinaNet
Object detection with YOLO
Instance segmentation
Perform the following preprocessing steps for the ONNX model inference:
1. Convert the image to RGB.
2. Resize the image to valid_resize_size by valid_resize_size pixels, which corresponds to the value used in
the transformation of the validation dataset during training. The default value for valid_resize_size is 256.
3. Center crop the image to height_onnx_crop_size and width_onnx_crop_size . This corresponds to
valid_crop_size with the default value of 224.
4. Change HxWxC to CxHxW .
5. Convert to float type.
6. Normalize with ImageNet's mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] .
If you chose different values for the hyperparameters valid_resize_size and valid_crop_size during training,
then those values should be used.
Get the input shape needed for the ONNX model.
Without PyTorch
import glob
import numpy as np
from PIL import Image

def preprocess(image, resize_size, crop_size_onnx):
    image = image.convert('RGB')

    # resize
    image = image.resize((resize_size, resize_size))

    # center crop
    left = (resize_size - crop_size_onnx) / 2
    top = (resize_size - crop_size_onnx) / 2
    right = (resize_size + crop_size_onnx) / 2
    bottom = (resize_size + crop_size_onnx) / 2
    image = image.crop((left, top, right, bottom))
    np_image = np.array(image)

    # HWC -> CHW
    np_image = np_image.transpose(2, 0, 1)  # CxHxW

    # normalize the image with ImageNet mean and std
    mean_vec = np.array([0.485, 0.456, 0.406])
    std_vec = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(np_image.shape).astype('float32')
    for i in range(np_image.shape[0]):
        norm_img_data[i, :, :] = (np_image[i, :, :] / 255 - mean_vec[i]) / std_vec[i]

    # add a batch dimension (1xCxHxW) so processed images can be concatenated
    np_image = np.expand_dims(norm_img_data, axis=0)
    return np_image

# following code loads only batch_size number of images for demonstrating ONNX inference
# make sure that the data directory has at least batch_size number of images

image_files = glob.glob(test_images_path)
img_processed_list = []
for i in range(batch_size):
    img = Image.open(image_files[i])
    img_processed_list.append(preprocess(img, resize_size, crop_size_onnx))

if len(img_processed_list) > 1:
    img_data = np.concatenate(img_processed_list)
elif len(img_processed_list) == 1:
    img_data = img_processed_list[0]
else:
    img_data = None
import glob
import torch
import numpy as np
from PIL import Image
from torchvision import transforms

def preprocess(image, resize_size, crop_size_onnx):
    # the same resize/center-crop/normalize steps, expressed with torchvision;
    # the transform composition is a sketch reconstructed from the steps above
    transform = transforms.Compose([
        transforms.Resize(resize_size),
        transforms.CenterCrop(crop_size_onnx),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img_data = transform(image)
    img_data = img_data.numpy()
    img_data = np.expand_dims(img_data, axis=0)
    return img_data

# following code loads only batch_size number of images for demonstrating ONNX inference
# make sure that the data directory has at least batch_size number of images

image_files = glob.glob(test_images_path)
img_processed_list = []
for i in range(batch_size):
    img = Image.open(image_files[i])
    img_processed_list.append(preprocess(img, resize_size, crop_size_onnx))

if len(img_processed_list) > 1:
    img_data = np.concatenate(img_processed_list)
elif len(img_processed_list) == 1:
    img_data = img_processed_list[0]
else:
    img_data = None
def get_predictions_from_ONNX(onnx_session, img_data):
    # a sketch of the scoring wrapper; the function name is illustrative
    sess_input = onnx_session.get_inputs()
    sess_output = onnx_session.get_outputs()
    print(f"No. of inputs : {len(sess_input)}, No. of outputs : {len(sess_output)}")

    # predict with ONNX Runtime
    output_names = [output.name for output in sess_output]
    scores = onnx_session.run(output_names=output_names,
                              input_feed={sess_input[0].name: img_data})
    return scores[0]
Postprocessing
Multi-class image classification
Multi-label image classification
Object detection with Faster R-CNN or RetinaNet
Object detection with YOLO
Instance segmentation
Apply softmax() over predicted values to get classification confidence scores (probabilities) for each class. Then
the prediction will be the class with the highest probability.
Without PyTorch
def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)

conf_scores = softmax(scores)
class_preds = np.argmax(conf_scores, axis=1)
print("predicted classes:", ([(class_idx, classes[class_idx]) for class_idx in class_preds]))
With PyTorch
Visualize predictions
Multi-class image classification
Multi-label image classification
Object detection with Faster R-CNN or RetinaNet
Object detection with YOLO
Instance segmentation
import matplotlib.pyplot as plt
from PIL import Image

sample_image_index = 0  # change this index to visualize a different sample image

label = class_preds[sample_image_index]
if torch.is_tensor(label):
    label = label.item()

conf_score = conf_scores[sample_image_index]
if torch.is_tensor(conf_score):
    conf_score = np.max(conf_score.tolist())
else:
    conf_score = np.max(conf_score)

# display_text composition is a sketch; the sample notebook's exact formatting may differ
display_text = '{} ({})'.format(label, round(conf_score, 3))
color = 'red'

plt.imshow(Image.open(image_files[sample_image_index]))
plt.text(30, 30, display_text, color=color, fontsize=30)
plt.show()
Next steps
Learn more about computer vision tasks in AutoML
Troubleshoot AutoML experiments
Interpretability: Model explainability in automated
ML (preview)
5/25/2022 • 6 minutes to read • Edit Online
Prerequisites
Interpretability features. Run pip install azureml-interpret to get the necessary package.
Knowledge of building automated ML experiments. For more information on how to use the Azure Machine
Learning SDK, complete this regression model tutorial or see how to configure automated ML experiments.
NOTE
Interpretability, model explanation, is not available for the TCNForecaster model recommended by Auto ML forecasting
experiments.
client = ExplanationClient.from_run(best_run)
engineered_explanations = client.download_model_explanation(raw=False)
print(engineered_explanations.get_feature_importance_dict())
client = ExplanationClient.from_run(best_run)
raw_explanations = client.download_model_explanation(raw=True)
print(raw_explanations.get_feature_importance_dict())
The MimicWrapper also takes the automl_run object where the engineered explanations will be uploaded.
Use Mimic Explainer for computing and visualizing engineered feature importance
You can call the explain() method in MimicWrapper with the transformed test samples to get the feature
importance for the generated engineered features. You can also sign in to Azure Machine Learning studio to
view the explanations dashboard visualization of the feature importance values of the generated engineered
features by automated ML featurizers.
For models trained with automated ML, you can get the best model using the get_output() method and
compute explanations locally. You can visualize the explanation results with ExplanationDashboard from the
raiwidgets package.
Use Mimic Explainer for computing and visualizing raw feature importance
You can call the explain() method in MimicWrapper with the transformed test samples to get the feature
importance for the raw features. In the Machine Learning studio, you can view the dashboard visualization of
the feature importance values of the raw features.
# Register trained automl model present in the 'outputs' folder in the artifacts
original_model = automl_run.register_model(model_name='automl_model',
                                           model_path='outputs/model.pkl')

azureml_pip_packages = [
    'azureml-interpret', 'azureml-train-automl', 'azureml-defaults'
]

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())

with open("myenv.yml", "r") as f:
    print(f.read())
def init():
    global automl_model
    global scoring_explainer

    # Retrieve the path to the model file using the model name
    # Assume original model is named automl_model
    automl_model_path = Model.get_model_path('automl_model')
    scoring_explainer_path = Model.get_model_path('scoring_explainer')

    automl_model = joblib.load(automl_model_path)
    scoring_explainer = joblib.load(scoring_explainer_path)

def run(raw_data):
    data = pd.read_json(raw_data, orient='records')

    # Make prediction
    predictions = automl_model.predict(data)

    # Setup for inferencing explanations
    automl_explainer_setup_obj = automl_setup_model_explanations(automl_model,
                                                                 X_test=data, task='classification')

    # Retrieve model explanations for engineered explanations
    engineered_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform)

    # Retrieve model explanations for raw explanations
    raw_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform,
                                                            get_raw=True)

    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist(),
            'engineered_local_importance_values': engineered_local_importance_values,
            'raw_local_importance_values': raw_local_importance_values}
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "Bank Marketing",
                                                     "method": "local_explanation"},
                                               description='Get local explanations for Bank marketing test data')
myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score_local_explain.py", environment=myenv)
if service.state == 'Healthy':
    # Serialize the first row of the test data into json
    X_test_json = X_test[:1].to_json(orient='records')
    print(X_test_json)

    # Call the service to get the predictions and the engineered explanations
    output = service.run(X_test_json)

    # Print the predicted value
    print(output['predictions'])
    # Print the engineered feature importances for the predicted value
    print(output['engineered_local_importance_values'])
    # Print the raw feature importances for the predicted value
    print('raw_local_importance_values:\n{}\n'.format(output['raw_local_importance_values']))
For more information on the explanation dashboard visualizations and specific plots, please refer to the how-to
doc on interpretability.
Next steps
For more information about how you can enable model explanations and feature importance in areas other
than automated ML, see more techniques for model interpretability.
Troubleshoot automated ML experiments in Python
5/25/2022 • 7 minutes to read • Edit Online
Version dependencies
Newer package versions in AutoML dependencies break compatibility . After SDK version 1.13.0, models
aren't loaded in older SDKs due to incompatibility between the older versions pinned in previous AutoML
packages and the newer versions pinned today.
Expect errors such as:
Module not found errors such as, No module named 'sklearn.decomposition._truncated_svd'
Resolutions depend on your AutoML SDK training version:
If your AutoML SDK training version is greater than 1.13.0, upgrade scikit-learn and/or pandas to the
versions pinned by that SDK release.
If your AutoML SDK training version is less than or equal to 1.12.0, you need pandas==0.23.4 and
scikit-learn==0.20.3 . If there is a version mismatch, downgrade scikit-learn and/or pandas to the correct
versions with the following:
pip install --upgrade pandas==0.23.4
pip install --upgrade scikit-learn==0.20.3
Setup
AutoML package changes since version 1.0.76 require the previous version to be uninstalled before updating to
the new version.
ImportError: cannot import name AutoMLConfig
If you encounter this error after upgrading from an SDK version before v1.0.76 to v1.0.76 or later, resolve
the error by running pip uninstall azureml-train-automl and then pip install azureml-train-automl .
The automl_setup.cmd script does this automatically.
automl_setup fails
On Windows, run automl_setup from an Anaconda Prompt. Install Miniconda if conda isn't already
installed.
Ensure that 64-bit conda version 4.4.10 or later is installed, rather than 32-bit. You can check the bit
count with the conda info command; the platform should be win-64 for Windows or osx-64 for Mac. To
check the version, use the command conda -V . If you have a previous version installed, update it by
using the command conda update conda .
Linux - gcc: error trying to exec 'cc1plus'
1. If the gcc: error trying to exec 'cc1plus': execvp: No such file or directory error is
encountered, install the GCC build tools for your Linux distribution. For example, on Ubuntu,
use the command sudo apt-get install build-essential .
2. Pass a new name as the first parameter to automl_setup to create a new conda
environment. View existing conda environments using conda env list and remove them
with conda env remove -n <environmentname> .
automl_setup_linux.sh fails : If automl_setup_linux.sh fails on Ubuntu Linux with the error:
unable to execute 'gcc': No such file or directory
1. Make sure that outbound ports 53 and 80 are enabled. On an Azure virtual machine, you can do this
from the Azure portal by selecting the VM and clicking on Networking .
2. Run the command: sudo apt-get update
3. Run the command: sudo apt-get install build-essential --fix-missing
4. Run automl_setup_linux.sh again
configuration.ipynb fails :
For local conda, first ensure that automl_setup has successfully run.
Ensure that the subscription_id is correct. Find the subscription_id in the Azure portal by selecting All
Service and then Subscriptions. The characters "<" and ">" should not be included in the
subscription_id value. For example, subscription_id = "12345678-90ab-1234-5678-1234567890abcd" has
the valid format.
Ensure Contributor or Owner access to the subscription.
Check that the region is one of the supported regions: eastus2 , eastus , westcentralus ,
southeastasia , westeurope , australiaeast , westus2 , southcentralus .
Ensure access to the region using the Azure portal.
workspace.from_config fails :
If the call ws = Workspace.from_config() fails:
1. Ensure that the configuration.ipynb notebook has run successfully.
2. If the notebook is being run from a folder that is not under the folder where the configuration.ipynb
was run, copy the folder aml_config and the file config.json that it contains to the new folder.
Workspace.from_config reads the config.json for the notebook folder or its parent folder.
3. If a new subscription, resource group, workspace, or region, is being used, make sure that you run the
configuration.ipynb notebook again. Changing config.json directly will only work if the workspace
already exists in the specified resource group under the specified subscription.
4. If you want to change the region, change the workspace, resource group, or subscription.
Workspace.create will not create or update a workspace if it already exists, even if the region specified
is different.
TensorFlow
As of version 1.5.0 of the SDK, automated machine learning does not install TensorFlow models by default. To
install TensorFlow and use it with your automated ML experiments, install tensorflow==1.12.0 via
CondaDependencies .
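For example, a minimal sketch of pinning TensorFlow through CondaDependencies on a run configuration; the
package list is illustrative:

from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration()
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['azureml-sdk[automl]', 'tensorflow==1.12.0'])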
Numpy failures
import numpy fails in Windows : Some Windows environments see an error loading numpy with the
latest Python version 3.6.8. If you see this issue, try with Python version 3.6.7.
import numpy fails : Check the TensorFlow version in the automated ml conda environment. Supported
versions are < 1.13. Uninstall TensorFlow from the environment if version is >= 1.13.
You can check the version of TensorFlow and uninstall as follows:
1. Start a command shell, activate conda environment where automated ml packages are installed.
2. Enter pip freeze and look for tensorflow , if found, the version listed should be < 1.13
3. If the listed version is not a supported version, pip uninstall tensorflow in the command shell and enter y
for confirmation.
jwt.exceptions.DecodeError
Exact error message:
jwt.exceptions.DecodeError: It is required that you pass in a value for the "algorithms" argument when
calling decode()
.
For SDK versions <= 1.17.0, installation might result in an unsupported version of PyJWT. Check that the PyJWT
version in the automated ml conda environment is a supported version. That is PyJWT version < 2.0.0.
You may check the version of PyJWT as follows:
1. Start a command shell and activate conda environment where automated ML packages are installed.
2. Enter pip freeze and look for PyJWT ; if found, the version listed should be < 2.0.0.
3. If the listed version is not a supported version, uninstall PyJWT from the environment and install the right
version as follows:
a. Run pip uninstall PyJWT in the command shell and enter y for confirmation.
b. Install using pip install 'PyJWT<2.0.0' .
Data access
For automated ML runs, you need to ensure the file datastore that connects to your AzureFile storage has the
appropriate authentication credentials. Otherwise, the following message results. Learn how to update your data
access authentication credentials.
Error message:
Could not create a connection to the AzureFileService due to missing credentials. Either an Account Key or
SAS token needs to be linked the default workspace blob store.
Data schema
When you try to create a new automated ML experiment via the Edit and submit button in the Azure Machine
Learning studio, the data schema for the new experiment must match the schema of the data that was used in
the original experiment. Otherwise, an error message similar to the following results. Learn more about how to
edit and submit experiments from the studio UI.
Error message non-vision experiments:
Schema mismatch error: (an) additional column(s): "Column1: String, Column2: String, Column3: String", (a)
missing column(s)
Databricks
See How to configure an automated ML experiment with Databricks.
If this pattern is expected in your time series, you can switch your primary metric to normalized root mean
squared error .
Failed deployment
For versions <= 1.18.0 of the SDK, the base image created for deployment may fail with the following error:
ImportError: cannot import name cached_property from werkzeug .
Experiment throttling
If you have over 100 automated ML experiments, this may cause new automated ML experiments to have long
run times.
Next steps
Learn more about how to train a regression model with Automated machine learning or how to train
using Automated machine learning on a remote resource.
Learn more about how and where to deploy a model.
Deploy and score a machine learning model by
using an online endpoint
5/25/2022 • 16 minutes to read • Edit Online
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning.
Install and configure the Azure CLI and the ml extension to the Azure CLI. For more information, see
Install, set up, and use the CLI (v2).
You must have an Azure resource group, and you (or the service principal you use) must have
Contributor access to it. A resource group is created in Install, set up, and use the CLI (v2).
You must have an Azure Machine Learning workspace. A workspace is created in Install, set up, and use
the CLI (v2).
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the
values for your subscription, workspace, and resource group multiple times, run this code:
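az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group name>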
Azure role-based access controls (Azure RBAC) is used to grant access to operations in Azure Machine
Learning. To perform the steps in this article, your user account must be assigned the owner or
contributor role for the Azure Machine Learning workspace, or a custom role allowing
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* . For more information, see Manage
access to an Azure Machine Learning workspace.
(Optional) To deploy locally, you must install Docker Engine on your local computer. We highly
recommend this option, so it's easier to debug issues.
IMPORTANT
The examples in this document assume that you are using the Bash shell. For example, from a Linux system or Windows
Subsystem for Linux.
To set your endpoint name, choose one of the following commands, depending on your operating system
(replace YOUR_ENDPOINT_NAME with a unique name).
For Unix, run this command:
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
NOTE
Endpoint names must be unique within an Azure region. For example, in the Azure westus2 region, there can be only
one endpoint with the name my-endpoint .
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
NOTE
For a full description of the YAML, see Online endpoint YAML reference.
The reference for the endpoint YAML format is described in the online endpoint YAML reference. To learn how to
specify these attributes, see the YAML example in Prepare your system. For information about limits related to
managed endpoints, see Manage and increase quotas for resources with Azure Machine Learning.
The example contains all the files needed to deploy a model on an online endpoint. To deploy a model, you must
have:
Model files (or the name and version of a model that's already registered in your workspace). In the example,
we have a scikit-learn model that does regression.
The code that's required to score the model. In this case, we have a score.py file.
An environment in which your model runs. As you'll see, the environment might be a Docker image with
Conda dependencies, or it might be a Dockerfile.
Settings to specify the instance type and scaling capacity.
The following snippet shows the endpoints/online/managed/sample/blue-deployment.yml file, with all the
required inputs:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_DS2_v2
instance_count: 1
KEY | DESCRIPTION
instance_type | The VM SKU that will host your deployment instances. For more information, see Managed online
endpoints supported VM SKUs.
During deployment, the local files, such as the Python source for the scoring model, are uploaded from the
development environment.
For more information about the YAML schema, see the online endpoint YAML reference.
NOTE
To use Kubernetes instead of managed endpoints as a compute target:
1. Create and attach your Kubernetes cluster as a compute target to your Azure Machine Learning workspace by using
Azure Machine Learning studio.
2. Use the endpoint YAML to target Kubernetes instead of the managed endpoint YAML. You'll need to edit the YAML to
change the value of target to the name of your registered compute target. You can use this deployment.yaml that
has additional properties applicable to Kubernetes deployment.
All the commands that are used in this article (except the optional SLA monitoring and Azure Log Analytics integration)
can be used either with managed endpoints or with Kubernetes endpoints.
For registration, you can extract the YAML definitions of model and environment into separate YAML files and
use the commands az ml model create and az ml environment create . To learn more about these commands,
run az ml model create -h and az ml environment create -h .
Use different CPU and GPU instance types
The preceding YAML uses a general-purpose type ( Standard_DS2_v2 ) and a non-GPU Docker image (in the
YAML, see the image attribute). For GPU compute, choose a GPU compute type SKU and a GPU Docker image.
For supported general-purpose and GPU instance types, see Managed online endpoints supported VM SKUs.
For a list of Azure Machine Learning CPU and GPU base images, see Azure Machine Learning base images.
Use more than one model
Currently, you can specify only one model per deployment in the YAML. If you have more than one model, when
you register the model, copy all the models as files or subdirectories into a folder that you use for registration. In
your scoring script, use the environment variable AZUREML_MODEL_DIR to get the path to the model root folder.
The underlying directory structure is retained.
As noted earlier, the code_configuration.scoring_script must have an init() function and a run() function.
This example uses the score.py file. The init() function is called when the container is initialized or started.
Initialization typically occurs shortly after the deployment is created or updated. Write logic here for global
initialization operations like caching the model in memory (as we do in this example). The run() function is
called for every invocation of the endpoint and should do the actual scoring and prediction. In the example, we
extract the data from the JSON input, call the scikit-learn model's predict() method, and then return the result.
NOTE
To deploy locally, Docker Engine must be installed.
Docker Engine must be running. Docker Engine typically starts when the computer starts. If it doesn't, you can
troubleshoot Docker Engine.
IMPORTANT
The goal of a local endpoint deployment is to validate and debug your code and configuration before you deploy to
Azure. Local deployment has the following limitations:
Local endpoints do not support traffic rules, authentication, or probe settings.
Local endpoints support only one deployment per endpoint.
The --local flag directs the CLI to deploy the endpoint in the Docker environment.
TIP
Use Visual Studio Code to test and debug your endpoints locally. For more information, see debug online endpoints
locally in Visual Studio Code.
The output should appear similar to the following JSON. Note that the provisioning_state is Succeeded .
{
"auth_mode": "key",
"location": "local",
"name": "docs-endpoint",
"properties": {},
"provisioning_state": "Succeeded",
"scoring_uri": "http://localhost:49158/score",
"tags": {},
"traffic": {}
}
If you want to use a REST client (like curl), you must have the scoring URI. To get the scoring URI, run
az ml online-endpoint show --local -n $ENDPOINT_NAME . In the returned data, find the scoring_uri attribute.
Sample curl-based commands are available later in this doc.
Review the logs for output from the invoke operation
In the example score.py file, the run() method logs some output to the console. You can view this output by
using the get-logs command again:
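For example (the --local flag matches the local deployment used in this section):

az ml online-deployment get-logs --local --endpoint-name $ENDPOINT_NAME --name blue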
To create the deployment named blue under the endpoint, run the following code:
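For example (the YAML path assumes the cloned azureml-examples/cli directory):

az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic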
This deployment might take up to 15 minutes, depending on whether the underlying environment or image is
being built for the first time. Subsequent deployments that use the same environment will finish processing
more quickly.
IMPORTANT
The --all-traffic flag in the above az ml online-deployment create allocates 100% of the endpoint's traffic to the
newly created deployment. Though this is helpful for development and testing purposes, for production, you might want
to route traffic to the new deployment through an explicit command. For example,
az ml online-endpoint update -n $ENDPOINT_NAME --traffic "blue=100"
TIP
If you prefer not to block your CLI console, you may add the flag --no-wait to the command. However, this will
stop the interactive display of the deployment status.
Use Troubleshooting online endpoints deployment to debug errors.
You can list all the endpoints in the workspace in a table format by using the list command:
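For example:

az ml online-endpoint list --output table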
By default, logs are pulled from the inference server container. To see the logs from the storage initializer container
(it mounts assets like model and code to the container), add the --container storage-initializer flag.
Invoke the endpoint to score data by using your model
You can use either the invoke command or a REST client of your choice to invoke the endpoint and score some
data:
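For example, with the invoke command and the sample request file from the repository:

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json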
The following example shows how to get the key used to authenticate to the endpoint:
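A minimal sketch (the --query names are assumptions based on the CLI output shapes shown in this article):

SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)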
TIP
You can control which Azure Active Directory security principals can get the authentication key by assigning them to a
custom role that allows Microsoft.MachineLearningServices/workspaces/onlineEndpoints/token/action and
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/listkeys/action . For more information, see
Manage access to an Azure Machine Learning workspace.
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --data @endpoints/online/model-1/sample-request.json
Notice that we use the show and get-credentials commands to get the authentication credentials. Also notice that
we're using the --query flag to filter attributes to only what we need. To learn more about --query , see Query
Azure CLI command output.
To see the invocation logs, run get-logs again.
For information on authenticating using a token, see Authenticate to online endpoints.
(Optional) Update the deployment
If you want to update the code, model, or environment, update the YAML file, and then run the
az ml online-deployment update command.
NOTE
If you update the instance count along with other model settings (code, model, or environment) in a single update
command, the scaling operation is performed first, and then the other updates are applied. In a production environment,
it's a good practice to perform these operations separately.
NOTE
Updating by using YAML is declarative. That is, changes in the YAML are reflected in the underlying Azure
Resource Manager resources (endpoints and deployments). A declarative approach facilitates GitOps: All changes
to endpoints and deployments (even instance_count ) go through the YAML. You can make updates without
using the YAML by using the --set flag.
5. Because you modified the init() function ( init() runs when the endpoint is created or updated), the
message Updated successfully will be in the logs. Retrieve the logs by running:
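For example:

az ml online-deployment get-logs --endpoint-name $ENDPOINT_NAME --name blue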
The update command also works with local deployments. Use the same az ml online-deployment update
command with the --local flag.
TIP
With the update command, you can use the --set parameter in the Azure CLI to override attributes in your YAML or
to set specific attributes without passing the YAML file. Using --set for single attributes is especially valuable in
development and test scenarios. For example, to scale up the instance_count value for the first deployment, you could
use the --set instance_count=2 flag. However, because the YAML isn't updated, this technique doesn't facilitate
GitOps.
NOTE
The above is an example of an in-place rolling update: that is, the same deployment is updated with the new configuration,
20% of the nodes at a time. If the deployment has 10 nodes, 2 nodes at a time will be updated. For production usage, you
might want to consider blue-green deployment, which offers a safer alternative.
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning today.
You must install and configure the Azure CLI and ML extension. For more information, see Install, set up,
and use the CLI (v2).
You must have an Azure Resource group, in which you (or the service principal you use) need to have
Contributor access. You'll have such a resource group if you configured your ML extension per the above
article.
You must have an Azure Machine Learning workspace. You'll have such a workspace if you configured
your ML extension per the above article.
If you've not already set the defaults for Azure CLI, you should save your default settings. To avoid having
to repeatedly pass in the values, run:
An existing online endpoint and deployment. This article assumes that your deployment is as described in
Deploy and score a machine learning model with an online endpoint.
If you haven't already set the environment variable $ENDPOINT_NAME, do so now:
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
(Recommended) Clone the samples repository and switch to the repository's cli/ directory:
git clone https://github.com/Azure/azureml-examples
cd azureml-examples/cli
The commands in this tutorial are in the file deploy-safe-rollout-online-endpoints.sh and the YAML
configuration files are in the endpoints/online/managed/sample/ subdirectory.
You should see the endpoint identified by $ENDPOINT_NAME and a deployment called blue .
NOTE
Notice that in the above command we use --set to override the deployment configuration. Alternatively, you can
update the YAML file and pass it to the update command using the --file input.
Since we haven't explicitly allocated any traffic to green, it has zero traffic allocated to it. You can verify that
by using the command:
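For example (a sketch using the show command):

az ml online-endpoint show -n $ENDPOINT_NAME --query traffic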
If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the
following HTTP header: azureml-model-deployment: <deployment-name> . The below code snippet uses curl to
invoke the deployment directly. The code snippet should work in Unix/WSL environments:
# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json
Once you've tested your green deployment, you can copy (or 'mirror') a percentage of the live traffic to it.
Mirroring traffic doesn't change the results returned to clients. Requests still flow 100% to the blue deployment. The
mirrored percentage of the traffic is copied and submitted to the green deployment so that you can gather metrics
and logs without impacting your clients. Mirroring is useful when you want to validate a new deployment without
affecting production clients; for example, to check that latency is within acceptable bounds and that there are no
HTTP errors.
WARNING
Mirroring traffic uses your endpoint bandwidth quota (default 5 MBPS). Your endpoint bandwidth will be throttled if you
exceed the allocated quota. For information on monitoring bandwidth throttling, see Monitor managed online endpoints.
The following command mirrors 10% of the traffic to the green deployment:
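A sketch, assuming the --mirror-traffic parameter of az ml online-endpoint update :

az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"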
IMPORTANT
Mirroring has the following limitations:
You can only mirror traffic to one deployment.
A deployment can only be set to live or mirror traffic, not both.
Mirrored traffic is not currently supported with K8s.
The maximum mirrored traffic you can configure is 50%. This limit is to reduce the impact on your endpoint bandwidth
quota.
After testing, you can set the mirror traffic to zero to disable mirroring:
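For example (same assumption as above):

az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"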
Next steps
Deploy models with REST
Create and use online endpoints in the studio
Access Azure resources with an online endpoint and managed identity
Monitor managed online endpoints
Manage and increase quotas for resources with Azure Machine Learning
View costs for an Azure Machine Learning managed online endpoint
Managed online endpoints SKU list
Troubleshooting online endpoints deployment and scoring
Online endpoint YAML reference
Deploy MLflow models to online endpoints
(preview)
5/25/2022 • 5 minutes to read • Edit Online
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning.
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use
the CLI (v2) (preview).
IMPORTANT
The CLI examples in this article assume that you are using the Bash (or compatible) shell. For example, from a
Linux system or Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Install, set up, and use
the CLI (v2) (preview) to create one.
You must have an MLflow model. The examples in this article are based on the models from
https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/mlflow.
If you don't have an MLflow formatted model, you can convert your custom ML model to MLflow
format.
The information in this article is based on code samples contained in the azureml-examples repository. To run
the commands locally without having to copy/paste YAML and other files, clone the repo and then change
directories to the cli directory in the repo:
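For example:

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli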
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values
for your subscription, workspace, and resource group multiple times, use the following commands. Replace the
following parameters with values for your specific configuration:
Replace <subscription> with your Azure subscription ID.
Replace <workspace> with your Azure Machine Learning workspace name.
Replace <resource-group> with the Azure resource group that contains your workspace.
Replace <location> with the Azure region that contains your workspace.
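For example:

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>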
TIP
You can see what your current defaults are by using the az configure -l command.
In the code snippets used in this article, the ENDPOINT_NAME environment variable contains the name of the
endpoint to create and use. To set it, use the following command from the CLI. Replace <YOUR_ENDPOINT_NAME>
with the name of your endpoint:
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
IMPORTANT
For MLflow no-code-deployment, testing via local endpoints is currently not supported.
1. Create a YAML configuration file for your endpoint. The following example configures the name and
authentication mode of the endpoint:
create-endpoint.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
2. To create a new endpoint using the YAML configuration, use the following command:
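For example (the YAML path assumes the MLflow sample folder in the cloned repository):

az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/mlflow/create-endpoint.yaml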
3. Create a YAML configuration file for the deployment. The following example configures a deployment of
the sklearn-diabetes model to the endpoint created in the previous step:
IMPORTANT
For MLflow no-code-deployment (NCD) to work, you must set type to mlflow_model
( type: mlflow_model ). For more information, see CLI (v2) model YAML schema.
sklearn-deployment.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sklearn-deployment
endpoint_name: my-endpoint
model:
name: mir-sample-sklearn-mlflow-model
version: 1
path: sklearn-diabetes/model
type: mlflow_model
instance_type: Standard_DS2_v2
instance_count: 1
4. To create the deployment using the YAML configuration, use the following command:
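For example (same path assumption as above):

az ml online-deployment create --name sklearn-deployment --endpoint-name $ENDPOINT_NAME -f endpoints/online/mlflow/sklearn-deployment.yaml --all-traffic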
sample-request-sklearn.json
{"input_data": {
"columns": [
"age",
"sex",
"bmi",
"bp",
"s1",
"s2",
"s3",
"s4",
"s5",
"s6"
],
"data": [
[ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
[ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
],
"index": [0,1]
}}
[
11633.100167144921,
8522.117402884991
]
Delete endpoint
Once you're done with the endpoint, use the following command to delete it:
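For example:

az ml online-endpoint delete --name $ENDPOINT_NAME --yes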
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: sklearn-diabetes-mlflow
version: 1
path: sklearn-diabetes/model
type: mlflow_model
description: Scikit-learn MLflow model.
2. From studio, select your workspace and then use either the endpoints or models page to create the
endpoint deployment:
Endpoints page
Models page
2. Provide a name and authentication type for the endpoint, and then select Next .
3. When selecting a model, select the MLflow model registered previously. Select Next to continue.
4. When you select a model registered in MLflow format, in the Environment step of the wizard, you
don't need a scoring script or an environment.
NOTE
If you have used mlflow.autolog() in your training script, you will see model artifacts in the job's run history.
Azure Machine Learning integrates with MLflow's tracking functionality. You can use mlflow.autolog() for
several common ML frameworks to log model parameters, performance metrics, model artifacts, and even feature
importance graphs.
For more information, see Train models with CLI. Also see the training job samples in the GitHub repository.
Next steps
To learn more, review these articles:
Deploy models with REST (preview)
Create and use online endpoints (preview) in the studio
Safe rollout for online endpoints (preview)
How to autoscale managed online endpoints
Use batch endpoints (preview) for batch scoring
View costs for an Azure Machine Learning managed online endpoint (preview)
Access Azure resources with an online endpoint and managed identity (preview)
Troubleshoot online endpoint deployment
Deploy a TensorFlow model served with TF Serving
using a custom container in an online endpoint
5/25/2022 • 4 minutes to read • Edit Online
WARNING
Microsoft may not be able to help troubleshoot problems caused by a custom image. If you encounter problems, you
may be asked to use the default image or one of the images Microsoft provides to see if the problem is specific to your
image.
Prerequisites
Install and configure the Azure CLI and ML extension. For more information, see Install, set up, and use
the CLI (v2).
You must have an Azure resource group, in which you (or the service principal you use) need to have
Contributor access. You'll have such a resource group if you configured your ML extension per the above
article.
You must have an Azure Machine Learning workspace. You'll have such a workspace if you configured
your ML extension per the above article.
If you've not already set the defaults for Azure CLI, you should save your default settings. To avoid having
to repeatedly pass in the values, run:
To deploy locally, you must have Docker Engine running locally. This step is highly recommended because it
will help you debug issues.
BASE_PATH=endpoints/online/custom-container
AML_MODEL_NAME=tfserving-mounted
MODEL_NAME=half_plus_two
MODEL_BASE_PATH=/var/azureml-app/azureml-models/$AML_MODEL_NAME/1
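The following is a hedged sketch of running the TF Serving image locally so that you can test it. The container name, port mapping, and mount layout are assumptions; the image and environment variables match the deployment YAML shown later in this article:

docker run --rm -d -p 8501:8501 \
    -v "$PWD/$BASE_PATH:$MODEL_BASE_PATH" \
    -e MODEL_BASE_PATH=$MODEL_BASE_PATH \
    -e MODEL_NAME=$MODEL_NAME \
    --name="tfserving-test" docker.io/tensorflow/serving:latest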
Check that you can send liveness and scoring requests to the image
First, check that the container is "alive," meaning that the process inside the container is still running. You should
get a 200 (OK) response.
curl -v http://localhost:8501/v1/models/$MODEL_NAME
Then, check that you can get predictions about unlabeled data:
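For example (the sample request file name is an assumption; the :predict route is TF Serving's standard REST API):

curl --header "Content-Type: application/json" --request POST --data @$BASE_PATH/sample_request.json http://localhost:8501/v1/models/$MODEL_NAME:predict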
$schema: https://azuremlsdk2.blob.core.windows.net/latest/managedOnlineEndpoint.schema.json
name: tfserving-endpoint
auth_mode: aml_token
tfserving-deployment.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
name: tfserving-mounted
version: 1
path: ./half_plus_two
environment_variables:
MODEL_BASE_PATH: /var/azureml-app/azureml-models/tfserving-mounted/1
MODEL_NAME: half_plus_two
environment:
#name: tfserving
#version: 1
image: docker.io/tensorflow/serving:latest
inference_config:
liveness_route:
port: 8501
path: /v1/models/half_plus_two
readiness_route:
port: 8501
path: /v1/models/half_plus_two
scoring_route:
port: 8501
path: /v1/models/half_plus_two:predict
instance_type: Standard_DS2_v2
instance_count: 1
model:
name: tfserving-mounted
version: 1
path: ./half_plus_two
You can optionally configure model_mount_path . It enables you to change the path where the model is
mounted. For example, you can have a model_mount_path parameter in your tfserving-deployment.yml:
IMPORTANT
The model_mount_path must be a valid absolute path in Linux (the OS of the container image).
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
name: tfserving-mounted
version: 1
path: ./half_plus_two
model_mount_path: /var/tfserving-model-mount
.....
Next steps
Safe rollout for online endpoints
Troubleshooting online endpoints deployment
Torch serve sample
Create and use managed online endpoints in the
studio
5/25/2022 • 4 minutes to read • Edit Online
Learn how to use the studio to create and manage your managed online endpoints in Azure Machine Learning.
Use managed online endpoints to streamline production-scale deployments. For more information on managed
online endpoints, see What are endpoints.
In this article, you learn how to:
Create a managed online endpoint
View managed online endpoints
Add a deployment to a managed online endpoint
Update managed online endpoints
Delete managed online endpoints and deployments
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning
workspace.
The examples repository - Clone the AzureML Example repository. This article uses the assets in
/cli/endpoints/online .
Test
Use the Test tab in the endpoint's details page to test your managed online deployment. Enter sample input and
view the results.
1. Select the Test tab in the endpoint's detail page.
2. Use the dropdown to select the deployment you want to test.
3. Enter sample input.
4. Select Test .
Monitoring
Use the Monitoring tab to see high-level activity monitor graphs for your managed online endpoint.
To use the monitoring tab, you must select Enable Application Insights diagnostics and data collection
when you create your endpoint.
For more information on viewing additional monitors and alerts, see How to monitor managed online
endpoints.
TIP
The Total traffic percentage must sum to either 0% (to disable traffic) or 100% (to enable traffic).
NOTE
You cannot delete a deployment that has allocated traffic. You must first set traffic allocation for the deployment to 0%
before deleting it.
Next steps
In this article, you learned how to use Azure Machine Learning managed online endpoints. See these next steps:
What are endpoints?
How to deploy managed online endpoints with the Azure CLI
Deploy models with REST
How to monitor managed online endpoints
Troubleshooting managed online endpoints deployment and scoring
View costs for an Azure Machine Learning managed online endpoint
Manage and increase quotas for resources with Azure Machine Learning
High-performance serving with Triton Inference
Server (Preview)
5/25/2022 • 6 minutes to read • Edit Online
NOTE
NVIDIA Triton Inference Server is open-source, third-party software that is integrated into Azure Machine Learning.
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning.
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use
the CLI (v2) (preview).
IMPORTANT
The CLI examples in this article assume that you are using the Bash (or compatible) shell. For example, from a
Linux system or Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Install, set up, and use
the CLI (v2) (preview) to create one.
A working Python 3.8 (or higher) environment.
Access to NCv3-series VMs for your Azure subscription.
IMPORTANT
You may need to request a quota increase for your subscription before you can use this series of VMs. For more
information, see NCv3-series.
The information in this article is based on code samples contained in the azureml-examples repository. To run
the commands locally without having to copy/paste YAML and other files, clone the repo and then change
directories to the cli directory in the repo:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples
cd cli
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values
for your subscription, workspace, and resource group multiple times, use the following commands. Replace the
following parameters with values for your specific configuration:
Replace <subscription> with your Azure subscription ID.
Replace <workspace> with your Azure Machine Learning workspace name.
Replace <resource-group> with the Azure resource group that contains your workspace.
Replace <location> with the Azure region that contains your workspace.
TIP
You can see what your current defaults are by using the az configure -l command.
NVIDIA Triton Inference Server requires a specific model repository structure, where there is a directory for each
model and subdirectories for the model version. The contents of each model version subdirectory are determined
by the type of the model and the requirements of the backend that supports the model. To see the full model
repository structure, see
https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#model-files
The information in this document is based on using a model stored in ONNX format, so the directory structure
of the model repository is <model-repository>/<model-name>/1/model.onnx . Specifically, this model performs
image identification.
IMPORTANT
For Triton no-code-deployment, testing via local endpoints is currently not supported.
1. To avoid typing in a path for multiple commands, use the following command to set a BASE_PATH
environment variable. This variable points to the directory where the model and associated YAML
configuration files are located:
BASE_PATH=endpoints/online/triton/single-model
2. Use the following command to set the name of the endpoint that will be created. In this example, a
random name is created for the endpoint:
export ENDPOINT_NAME=triton-single-endpt-`echo $RANDOM`
4. Create a YAML configuration file for your endpoint. The following example configures the name and
authentication mode of the endpoint. The one used in the following commands is located at
/cli/endpoints/online/triton/single-model/create-managed-endpoint.yml in the azureml-examples repo
you cloned earlier:
create-managed-endpoint.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: aml_token
5. To create a new endpoint using the YAML configuration, use the following command:
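For example:

az ml online-endpoint create -n $ENDPOINT_NAME -f $BASE_PATH/create-managed-endpoint.yml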
6. Create a YAML configuration file for the deployment. The following example configures a deployment
named blue to the endpoint created in the previous step. The one used in the following commands is
located at /cli/endpoints/online/triton/single-model/create-managed-deployment.yml in the azureml-
examples repo you cloned earlier:
IMPORTANT
For Triton no-code-deployment (NCD) to work, you must set type to triton_model
( type: triton_model ). For more information, see CLI (v2) model YAML schema.
This deployment uses a Standard_NC6s_v3 VM. You may need to request a quota increase for your subscription
before you can use this VM. For more information, see NCv3-series.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
name: sample-densenet-onnx-model
version: 1
path: ./models
type: triton_model
instance_count: 1
instance_type: Standard_NC6s_v3
7. To create the deployment using the YAML configuration, use the following command:
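For example:

az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f $BASE_PATH/create-managed-deployment.yml --all-traffic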
TIP
The file /cli/endpoints/online/triton/single-model/triton_densenet_scoring.py in the azureml-examples repo is
used for scoring. The image passed to the endpoint needs pre-processing to meet the size, type, and format
requirements, and post-processing to show the predicted label. The triton_densenet_scoring.py script uses the
tritonclient.http library to communicate with the Triton inference server.
3. To score data with the endpoint, use the following command. It submits the image of a peacock
(https://aka.ms/peacock-pic) to the endpoint:
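A hedged sketch of the scoring call; the script's parameter names ( --base_url , --token ) are illustrative assumptions, not confirmed flags:

# look up the scoring URI and an auth token for the endpoint
scoring_uri=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
auth_token=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query accessToken)
# invoke the sample scoring script (parameter names are assumptions)
python $BASE_PATH/triton_densenet_scoring.py --base_url=$scoring_uri --token=$auth_token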
The following screenshot shows how your registered model will look on the Models page of Azure
Machine Learning studio.
2. From studio, select your workspace and then use either the endpoints or models page to create the
endpoint deployment:
Endpoints page
Models page
Learn how to use the Azure Machine Learning REST API to deploy models.
The REST API uses standard HTTP verbs to create, retrieve, update, and delete resources. The REST API works
with any language or tool that can make HTTP requests. REST's straightforward structure makes it a good choice
in scripting environments and for MLOps automation.
In this article, you learn how to use the new REST APIs to:
Create machine learning assets
Create a basic training job
Create a hyperparameter tuning sweep job
Prerequisites
An Azure subscription for which you have administrative rights. If you don't have such a subscription, try
the free or paid personal subscription.
An Azure Machine Learning workspace.
A service principal in your workspace. Administrative REST requests use service principal authentication.
A service principal authentication token. Follow the steps in Retrieve a service principal authentication token
to retrieve this token.
The curl utility. The curl program is available in the Windows Subsystem for Linux or any UNIX distribution.
In PowerShell, curl is an alias for Invoke-WebRequest and curl -d "key=val" -X POST uri becomes
Invoke-WebRequest -Body "key=val" -Method POST -Uri uri .
The service provider uses the api-version argument to ensure compatibility. The api-version argument varies
from service to service. Set the API version as a variable to accommodate future versions:
API_VERSION="2022-05-01"
TIP
You can also use other methods to upload, such as the Azure portal or Azure Storage Explorer.
Once you upload your code, you can specify your code with a PUT request and refer to the datastore with
datastoreId :
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/codes/score-sklearn/versions/1?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
  \"properties\": {
    \"codeUri\": \"https://$AZURE_STORAGE_ACCOUNT.blob.core.windows.net/$AZUREML_DEFAULT_CONTAINER/score\"
  }
}"
\"modelUri\":\"azureml://subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/workspaces/$WORKSPACE
/datastores/$AZUREML_DEFAULT_DATASTORE/paths/model/sklearn_regression_model.pkl\"
}
}"
Create environment
The deployment needs to run in an environment that has the required dependencies. Create the environment
with a PUT request. Use a docker image from Microsoft Container Registry. You can specify the docker image
with image and add conda dependencies with condaFile .
In the following snippet, the contents of a Conda environment (YAML file) have been read into an environment
variable:
ENV_VERSION=$RANDOM
curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/environments/sklearn-env/versions/$ENV_VERSION?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
  \"properties\":{
    \"condaFile\": \"$CONDA_FILE\",
    \"image\": \"mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1\"
  }
}"
Create endpoint
Create the online endpoint:
response=$(curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/onlineEndpoints/$ENDPOINT_NAME?api-version=$API_VERSION" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $TOKEN" \
--data-raw "{
  \"identity\": {
    \"type\": \"systemAssigned\"
  },
  \"properties\": {
    \"authMode\": \"AMLToken\"
  },
  \"location\": \"$LOCATION\"
}")
Create deployment
Create a deployment under the endpoint:
Next steps
Learn how to deploy your model using the Azure CLI.
Learn how to deploy your model using studio.
Learn to Troubleshoot online endpoints deployment and scoring
Learn how to Access Azure resources with an online endpoint and managed identity
Learn how to monitor online endpoints.
Learn Safe rollout for online endpoints.
View costs for an Azure Machine Learning managed online endpoint.
Managed online endpoints SKU list.
Learn about limits on managed online endpoints in Manage and increase quotas for resources with Azure
Machine Learning.
How to deploy an AutoML model to an online
endpoint
5/25/2022 • 4 minutes to read • Edit Online
Prerequisites
An AutoML-trained machine learning model. For more, see Tutorial: Train a classification model with no-code
AutoML in the Azure Machine Learning studio or Tutorial: Forecast demand with automated machine learning.
To deploy using these files, you can use either the studio or the Azure CLI.
Studio
CLI
Next steps
Troubleshooting online endpoints deployment
Safe rollout for online endpoints
Key and token-based authentication for online
endpoints
5/25/2022 • 2 minutes to read • Edit Online
When consuming an online endpoint from a client, you can use either a key or a token. Keys don't expire, but
tokens do.
When deploying using CLI v2, set this value in the online endpoint YAML file. For more information, see How to
deploy an online endpoint.
When deploying using the Python SDK v2 (preview), use the OnlineEndpoint class.
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --data @endpoints/online/model-1/sample-request.json
Next steps
Deploy a machine learning model using an online endpoint
Enable network isolation for managed online endpoints
Use network isolation with managed online
endpoints (preview)
5/25/2022 • 13 minutes to read • Edit Online
When deploying a machine learning model to a managed online endpoint, you can secure communication with
the online endpoint by using private endpoints. Using a private endpoint with online endpoints is currently a
preview feature.
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
You can secure the inbound scoring requests from clients to an online endpoint. You can also secure the
outbound communications between a deployment and the Azure resources used by the deployment. Security
for inbound and outbound communication is configured separately. For more information on endpoints and
deployments, see What are endpoints and deployments.
The following diagram shows how communications flow through private endpoints to the managed online
endpoint. Incoming scoring requests from clients are received through the workspace private endpoint from
your virtual network. Outbound communication with services is handled through private endpoints to those
service instances from the deployment:
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning today.
You must install and configure the Azure CLI and ML extension. For more information, see Install, set up,
and use the CLI (v2).
You must have an Azure Resource Group, in which you (or the service principal you use) need to have
Contributor access. You'll have such a resource group if you configured your ML extension per the above
article.
You must have an Azure Machine Learning workspace, and the workspace must use a private endpoint. If
you don't have one, the steps in this article create an example workspace, VNet, and VM. For more
information, see Configure a private endpoint for Azure Machine Learning workspace.
The Azure Container Registry for your workspace must be configured for Premium tier. For more
information, see Azure Container Registry service tiers.
The Azure Container Registry and Azure Storage Account must be in the same Azure Resource Group as
the workspace.
IMPORTANT
The end-to-end example in this article comes from the files in the azureml-examples GitHub repository. To clone the
samples repository and switch to the repository's cli/ directory, use the following commands:
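git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli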
Limitations
If your Azure Machine Learning workspace has a private endpoint that was created before May 24, 2022,
you must recreate the workspace's private endpoint before configuring your online endpoints to use a
private endpoint. For more information on creating a private endpoint for your workspace, see How to
configure a private endpoint for Azure Machine Learning workspace.
Secure outbound communication creates three private endpoints per deployment: one to Azure Blob
storage, one to Azure Container Registry, and one to your workspace.
Azure Log Analytics and Application Insights aren't supported when using network isolation with a
deployment. To see the logs for the deployment, use the az ml online-deployment get-logs command
instead.
NOTE
Requests to create, update, or retrieve the authentication keys are sent to the Azure Resource Manager over the public
network.
Inbound (scoring)
To secure scoring requests to the online endpoint to your virtual network, set the public_network_access flag for
the endpoint to disabled :
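A sketch, assuming you set the flag at creation time with --set , as in the end-to-end example later in this article (the endpoint.yml file name is a placeholder):

az ml online-endpoint create --name $ENDPOINT_NAME -f endpoint.yml --set public_network_access="disabled"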
When public_network_access is disabled , inbound scoring requests are received using the private endpoint of
the Azure Machine Learning workspace and the endpoint can't be reached from public networks.
Outbound (resource access)
To restrict communication between a deployment and the Azure resources used by the deployment, set the
egress_public_network_access flag to disabled . Use this flag to ensure that the downloads of the model, code,
and images needed by your deployment are secured with a private endpoint.
The following are the resources that the deployment communicates with over the private endpoint:
The Azure Machine Learning workspace.
The Azure Storage blob that is the default storage for the workspace.
The Azure Container Registry for the workspace.
When you configure the egress_public_network_access to disabled , a new private endpoint is created per
deployment, per service. For example, if you set the flag to disabled for three deployments to an online
endpoint, nine private endpoints are created. Each deployment would have three private endpoints that are used
to communicate with the workspace, blob, and container registry.
Scenarios
The following table lists the supported configurations when configuring inbound and outbound
communications for an online endpoint:
Configuration | Inbound (endpoint property) | Outbound (deployment property) | Supported?
End-to-end example
Use the information in this section to create an example configuration that uses private endpoints to secure
online endpoints.
TIP
In this example, an Azure Virtual Machine is created inside the VNet. You connect to the VM using SSH, and run the
deployment from the VM. This configuration is used to simplify the steps in this example, and does not represent a typical
secure configuration. For example, in a production environment you would most likely use a VPN client or Azure
ExpressRoute to directly connect clients to the virtual network.
To create the resources, use the following Azure CLI commands. Replace <UNIQUE_SUFFIX> with a unique suffix
for the resources that are created.
# SUFFIX will be used as resource name suffix in created workspace and related resources
export SUFFIX="<UNIQUE_SUFFIX>"
# create vm
az vm create --name test-vm --vnet-name vnet-$SUFFIX --subnet snet-scoring --image UbuntuLTS --admin-username azureuser --admin-password <your-new-password>
IMPORTANT
The VM created by these commands has a public endpoint that you can connect to over the public network.
The response from this command is similar to the following JSON document:
{
"fqdns": "",
"id": "/subscriptions/<GUID>/resourceGroups/<my-resource-
group>/providers/Microsoft.Compute/virtualMachines/test-vm",
"location": "westus",
"macAddress": "00-0D-3A-ED-D8-E8",
"powerState": "VM running",
"privateIpAddress": "192.168.0.12",
"publicIpAddress": "20.114.122.77",
"resourceGroup": "<my-resource-group>",
"zones": ""
}
Use the following command to connect to the VM using SSH. Replace publicIpAddress with the value of the
public IP address in the response from the previous command:
ssh azureuser@publicIpAddress
When prompted, enter the password you used when creating the VM.
Configure the VM
1. Use the following commands from the SSH session to install the CLI and Docker:
# setup docker
sudo apt-get update -y && sudo apt install docker.io -y && sudo snap install docker && docker --version && sudo usermod -aG docker $USER
# setup az cli and ml extension
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash && az extension add --upgrade -n ml -y
2. To create the environment variables used by this example, run the following commands. Replace
<YOUR_SUBSCRIPTION_ID> with your Azure subscription ID. Replace <YOUR_RESOURCE_GROUP> with the
resource group that contains your workspace. Replace <SUFFIX_USED_IN_SETUP> with the suffix you
provided earlier. Replace <LOCATION> with the location of your Azure workspace. Replace
<YOUR_ENDPOINT_NAME> with the name to use for the endpoint.
TIP
Use the tabs to select whether you want to perform a deployment using an MLflow model or generic ML model.
Generic model
MLflow model
export SUBSCRIPTION="<YOUR_SUBSCRIPTION_ID>"
export RESOURCE_GROUP="<YOUR_RESOURCE_GROUP>"
export LOCATION="<LOCATION>"

# SUFFIX that was used when creating the workspace resources. Alternatively the resource names
# can be looked up from the resource group after the vnet setup script has completed.
export SUFFIX="<SUFFIX_USED_IN_SETUP>"

# SUFFIX used during the initial setup. Alternatively the resource names can be looked up from
# the resource group after the setup script has completed.
export WORKSPACE=mlw-$SUFFIX
export ACR_NAME=cr$SUFFIX

# name of the image that will be built for this sample and pushed into acr - no need to change this
export IMAGE_NAME="img"

# Yaml files that will be used to create endpoint and deployment. These are relative to the
# azureml-examples/cli/ directory. Do not change these
export ENDPOINT_FILE_PATH="endpoints/online/managed/vnet/sample/endpoint.yml"
export DEPLOYMENT_FILE_PATH="endpoints/online/managed/vnet/sample/blue-deployment-vnet.yml"
export SAMPLE_REQUEST_PATH="endpoints/online/managed/vnet/sample/sample-request.json"
export ENV_DIR_PATH="endpoints/online/managed/vnet/sample/environment"
3. To sign in to the Azure CLI in the VM environment, use the following command:
az login
4. To configure the defaults for the CLI, use the following commands:
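For example, using the environment variables defined earlier:

az account set --subscription $SUBSCRIPTION
az configure --defaults group=$RESOURCE_GROUP workspace=$WORKSPACE location=$LOCATION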
5. To clone the example files for the deployment, use the following command:
6. To build a custom docker image to use with the deployment, use the following commands:
TIP
You can test or debug the Docker image locally by using the --local flag when creating the deployment. For
more information, see the Deploy and debug locally article.
# create endpoint
az ml online-endpoint create --name $ENDPOINT_NAME -f $ENDPOINT_FILE_PATH --set public_network_access="disabled"

# create deployment in managed vnet
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f $DEPLOYMENT_FILE_PATH --all-traffic --set environment.image="$ACR_NAME.azurecr.io/repo/$IMAGE_NAME:v1" egress_public_network_access="disabled"
2. To make a scoring request with the endpoint, use the following commands:
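A sketch using the variables defined earlier (the --query names and key-based auth mode are assumptions):

ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --data @$SAMPLE_REQUEST_PATH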
Cleanup
To delete the endpoint, use the following command:
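az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait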
To delete all the resources created in this article, use the following command. Replace <resource-group-name>
with the name of the resource group used in this example:
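az group delete --name <resource-group-name> --yes --no-wait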
The response for this command is similar to the following JSON document:
{
"bypass": "AzureServices",
"defaultAction": "Deny",
"ipRules": [],
"virtualNetworkRules": []
}
If the value of bypass isn't AzureServices , use the guidance in the Configure key vault network settings to set it
to AzureServices .
Online deployments fail with an image download error
1. Check if the egress-public-network-access flag is disabled for the deployment. If this flag is enabled, and
the visibility of the container registry is private, then this failure is expected.
2. Use the following command to check the status of the private endpoint connection. Replace
<registry-name> with the name of the Azure Container Registry for your workspace:
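One way to check, assuming the az acr private-endpoint-connection commands of the Azure CLI:

az acr private-endpoint-connection list --registry-name <registry-name>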
In the response document, verify that the status field is set to Approved . If it isn't approved, use the
following command to approve it. Replace <private-endpoint-name> with the name returned from the
previous command:
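A sketch using the same command group (an assumption):

az acr private-endpoint-connection approve --registry-name <registry-name> --name <private-endpoint-name>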
nslookup endpointname.westcentralus.inference.ml.azure.com
The response contains an address . This address should be in the range provided by the virtual network.
3. If the host name isn't resolved by the nslookup command, check if an A record exists in the private DNS
zone for the virtual network. To check the records, use the following command:
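A sketch; the privatelink.api.azureml.ms zone name is an assumption based on the default Azure Machine Learning private DNS zone:

az network private-dns record-set a list -g <resource-group-name> -z privatelink.api.azureml.ms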
TIP
This step isn't needed if you are using the azureml-model-deployment header in your request to target this
deployment.
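A sketch of checking the traffic allocation for the endpoint:

az ml online-endpoint show -n <endpointname> --query traffic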
The response from this command should list the percentage of traffic assigned to each deployment.
3. If the traffic assignments (or deployment header) are set correctly, use the following command to get the
logs for the endpoint. Replace <endpointname> with the name of the endpoint, and <deploymentname> with
the deployment:
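az ml online-deployment get-logs --endpoint-name <endpointname> --name <deploymentname>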
Look through the logs to see if there's a problem running the scoring code when you submit a request to
the deployment.
Next steps
Safe rollout for online endpoints
How to autoscale managed online endpoints
View costs for an Azure Machine Learning managed online endpoint
Access Azure resources with an online endpoint and managed identity
Troubleshoot online endpoints deployment
Access Azure resources from an online endpoint
with a managed identity
5/25/2022 • 12 minutes to read • Edit Online
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning today.
Install and configure the Azure CLI and ML (v2) extension. For more information, see Install, set up, and
use the 2.0 CLI.
An Azure Resource group, in which you (or the service principal you use) need to have
User Access Administrator and Contributor access. You'll have such a resource group if you configured
your ML extension per the above article.
An Azure Machine Learning workspace. You'll have a workspace if you configured your ML extension per
the above article.
A trained machine learning model ready for scoring and deployment. If you are following along with the
sample, a model is provided.
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the
values for your subscription, workspace, and resource group multiple times, run this code:
Limitations
The identity for an endpoint is immutable. During endpoint creation, you can associate it with a system-
assigned identity (default) or a user-assigned identity. You can't change the identity after the endpoint has
been created.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-sai-endpoint
auth_mode: key
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
model:
path: ../../model-1/model/
code_configuration:
code: ../../model-1/onlinescoring/
scoring_script: score_managedidentity.py
environment:
conda_file: ../../model-1/environment/conda.yml
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_DS2_v2
instance_count: 1
environment_variables:
STORAGE_ACCOUNT_NAME: "storage_place_holder"
STORAGE_CONTAINER_NAME: "container_place_holder"
FILE_NAME: "file_place_holder"
The following code exports these values as environment variables:
export WORKSPACE="<WORKSPACE_NAME>"
export LOCATION="<WORKSPACE_LOCATION>"
export ENDPOINT_NAME="<ENDPOINT_NAME>"
Next, specify what you want to name your blob storage account, blob container, and file. These variable names
are defined here, and are referred to in az storage account create and az storage container create commands
in the next section.
The following code exports those values as environment variables:
export STORAGE_ACCOUNT_NAME="<BLOB_STORAGE_TO_ACCESS>"
export STORAGE_CONTAINER_NAME="<CONTAINER_TO_ACCESS>"
export FILE_NAME="<FILE_TO_ACCESS>"
After these variables are exported, create a text file locally. When the endpoint is deployed, the scoring script will
access this text file using the system-assigned managed identity that's generated upon endpoint creation.
When you create an online endpoint, a system-assigned managed identity is automatically generated for you, so
you don't need to create a separate one.
WARNING
The identity for an endpoint is immutable. During endpoint creation, you can associate it with a system-assigned identity
(default) or a user-assigned identity. You can't change the identity after the endpoint has been created.
When you create an online endpoint, a system-assigned managed identity is created for the endpoint by default.
If you encounter any issues, see Troubleshooting online endpoints deployment and scoring.
You can allow the online endpoint permission to access your storage via its system-assigned managed identity
or give permission to the user-assigned managed identity to access the storage account created in the previous
section.
System-assigned managed identity
User-assigned managed identity
Retrieve the system-assigned managed identity that was created for your endpoint.
From here, you can give the system-assigned managed identity permission to access your storage.
import os
import logging
import json
import numpy
import joblib
import requests


def get_token():
    access_token = None
    msi_endpoint = os.environ.get("MSI_ENDPOINT", None)
    msi_secret = os.environ.get("MSI_SECRET", None)

    # If UAI_CLIENT_ID is provided then assume that endpoint was created with user assigned identity,
    # otherwise system assigned identity deployment.
    client_id = os.environ.get("UAI_CLIENT_ID", None)
    if client_id is not None:
        token_url = (
            msi_endpoint + f"?clientid={client_id}&resource=https://storage.azure.com/"
        )
    else:
        token_url = msi_endpoint + "?resource=https://storage.azure.com/"

    # The original sample elides the token request. The following completion assumes the
    # standard managed identity pattern: a GET to the MSI endpoint with the MSI_SECRET header.
    resp = requests.get(token_url, headers={"secret": msi_secret, "Metadata": "true"})
    resp.raise_for_status()
    access_token = resp.json()["access_token"]
    return access_token


def access_blob_storage():
    logging.info("Trying to access blob storage...")
    storage_account = os.environ.get("STORAGE_ACCOUNT_NAME")
    storage_container = os.environ.get("STORAGE_CONTAINER_NAME")
    file_name = os.environ.get("FILE_NAME")
    logging.info(
        f"storage_account: {storage_account}, container: {storage_container}, filename: {file_name}"
    )
    token = get_token()
    blob_url = f"https://{storage_account}.blob.core.windows.net/{storage_container}/{file_name}?api-version=2019-04-01"
    auth_headers = {
        "Authorization": f"Bearer {token}",
        "x-ms-blob-type": "BlockBlob",
        "x-ms-version": "2019-02-02",
    }
    resp = requests.get(blob_url, headers=auth_headers)
    resp.raise_for_status()
    logging.info(f"Blob contents: {resp.text}")


def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Model loaded")

    # Access Azure resource (Blob storage) using system assigned identity token
    access_blob_storage()

    logging.info("Init complete")
WARNING
This deployment can take approximately 8 to 14 minutes, depending on whether the underlying environment or image is
being built for the first time. Subsequent deployments that use the same environment will complete more quickly.
NOTE
The value of the --name argument may override the name key inside the YAML file.
To refine the above query to only return specific data, see Query Azure CLI command output.
NOTE
The init method in the scoring script reads the file from your storage account using the system assigned managed
identity token.
To check the init method output, see the deployment log with the following code.
# Check deployment logs to confirm blob storage file contents read operation success.
az ml online-deployment get-logs --endpoint-name $ENDPOINT_NAME --name blue
When your deployment completes, the model, the environment, and the endpoint are registered to your Azure
Machine Learning workspace.
Confirm your endpoint deployed successfully
Once your online endpoint is deployed, confirm its operation. Details of inferencing vary from model to model.
For this guide, the JSON query parameters look like:
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
Next steps
Deploy and score a machine learning model by using an online endpoint.
For more on deployment, see Safe rollout for online endpoints.
For more information on using the CLI, see Use the CLI extension for Azure Machine Learning.
To see which compute resources you can use, see Managed online endpoints SKU list.
For more on costs, see View costs for an Azure Machine Learning managed online endpoint.
For information on monitoring endpoints, see Monitor managed online endpoints.
For limitations for managed endpoints, see Manage and increase quotas for resources with Azure Machine
Learning.
Autoscale a managed online endpoint
5/25/2022 • 5 minutes to read • Edit Online
Autoscale automatically runs the right amount of resources to handle the load on your application. Managed
endpoints support autoscaling through integration with the Azure Monitor autoscale feature.
Azure Monitor autoscaling supports a rich set of rules. You can configure metrics-based scaling (for instance,
CPU utilization >70%), schedule-based scaling (for example, scaling rules for peak business hours), or a
combination. For more information, see Overview of autoscale in Microsoft Azure.
Today, you can manage autoscaling by using the Azure CLI, REST, ARM templates, or the browser-based Azure portal.
Other Azure ML SDKs, such as the Python SDK, will add support over time.
Prerequisites
A deployed endpoint. Deploy and score a machine learning model by using a managed online endpoint.
Next, get the Azure Resource Manager ID of the deployment and endpoint:
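A sketch of looking up the Azure Resource Manager IDs with the CLI (this assumes a deployment named blue):

DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show --endpoint-name $ENDPOINT_NAME --name blue -o tsv --query "id")
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show --name $ENDPOINT_NAME -o tsv --query "id")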
NOTE
For more, see the reference page for autoscale
Azure CLI
Portal
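A sketch using the Azure CLI; the metric name CpuUtilizationPercentage is an assumption, and az monitor autoscale rule create relies on the resource group default configured earlier:

# create an autoscale profile attached to the deployment
az monitor autoscale create --name my-scale-settings --resource $DEPLOYMENT_RESOURCE_ID --min-count 2 --max-count 5 --count 2
# add a CPU-based scale-out rule to the profile
az monitor autoscale rule create --autoscale-name my-scale-settings --condition "CpuUtilizationPercentage > 70 avg 5m" --scale out 2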
The rule is part of the my-scale-settings profile ( autoscale-name matches the name of the profile). The value of
its condition argument says the rule should trigger when "The average CPU consumption among the VM
instances exceeds 70% for five minutes." When that condition is satisfied, two more VM instances are allocated.
NOTE
For more information on the CLI syntax, see az monitor autoscale .
Azure CLI
Portal
Delete resources
If you are not going to use your deployments, delete them:
APPLIES TO: Azure CLI ml extension v2 (current)
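For example:

az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait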
Next steps
To learn more about autoscale with Azure Monitor, see the following articles:
Understand autoscale settings
Overview of common autoscale patterns
Best practices for autoscale
Troubleshooting Azure autoscale
Managed online endpoints SKU list (preview)
5/25/2022 • 2 minutes to read • Edit Online
This table shows the VM SKUs that are supported for Azure Machine Learning managed online endpoints
(preview).
The instance_type attribute used for deployment must be specified in the form Standard_F4s_v2. The
table below lists instance names (for example, F2s v2). Put these names in the form Standard_{name}
for Azure CLI or Azure Resource Manager template (ARM template) requests that create and update
deployments.
For more information on configuration details such as CPU and RAM, see Azure Machine Learning
Pricing.
IMPORTANT
If you use a Windows-based image for your deployment, we recommend using a VM SKU that provides a minimum of 4
cores.
SIZE | GENERAL PURPOSE | COMPUTE OPTIMIZED | MEMORY OPTIMIZED | GPU
View costs for an Azure Machine Learning managed online endpoint (preview)
Learn how to view costs for a managed online endpoint (preview). Costs for your endpoints will accrue to the
associated workspace. You can see costs for a specific endpoint using tags.
IMPORTANT
This article only applies to viewing costs for Azure Machine Learning managed online endpoints (preview). Managed
online endpoints are different from other resources since they must use tags to track costs. For more information on
viewing the costs of other Azure resources, see Quickstart: Explore and analyze costs with cost analysis.
Prerequisites
Deploy an Azure Machine Learning managed online endpoint (preview).
Have at least Billing Reader access on the subscription where the endpoint is deployed
View costs
Navigate to the Cost Analysis page for your subscription:
1. In the Azure portal, select Cost Analysis for your subscription.
Create a filter to scope data to your Azure Machine Learning workspace resource:
1. At the top navigation bar, select Add filter .
2. In the first filter dropdown, select Resource for the filter type.
3. In the second filter dropdown, select your Azure Machine Learning workspace.
Create a tag filter to show your managed online endpoint and/or managed online deployment:
1. Select Add filter > Tag > azuremlendpoint : "<your endpoint name>"
2. Select Add filter > Tag > azuremldeployment : "<your deployment name>".
NOTE
Dollar values in this image are fictitious and do not reflect actual costs.
Next steps
What are endpoints?
Learn how to monitor your managed online endpoint.
How to deploy managed online endpoints with the Azure CLI
How to deploy managed online endpoints with the studio
Monitor managed online endpoints
5/25/2022 • 2 minutes to read • Edit Online
In this article, you learn how to monitor Azure Machine Learning managed online endpoints. Use Application
Insights to view metrics and create alerts to stay up to date with your managed online endpoints.
You'll learn how to:
View metrics for your managed online endpoint
Create a dashboard for your metrics
Create a metric alert
Prerequisites
Deploy an Azure Machine Learning managed online endpoint.
You must have at least Reader access on the endpoint.
View metrics
Use the following steps to view metrics for a managed endpoint or deployment:
1. Go to the Azure portal.
2. Navigate to the managed online endpoint or deployment resource.
Managed online endpoints and deployments are Azure Resource Manager (ARM) resources that can be
found by going to their owning resource group. Look for the resource types Machine Learning online
endpoint and Machine Learning online deployment .
3. In the left-hand column, select Metrics .
Available metrics
Depending on the resource that you select, the metrics that you see will be different. Metrics are scoped
differently for managed online endpoints and managed online deployments.
Metrics at endpoint scope
Request Latency
Request Latency P50 (Request latency at the 50th percentile)
Request Latency P90 (Request latency at the 90th percentile)
Request Latency P95 (Request latency at the 95th percentile)
Requests per minute
New connections per second
Active connection count
Network bytes
Split on the following dimensions:
Deployment
Status Code
Status Code Class
Bandwidth throttling
Bandwidth will be throttled if the limits are exceeded (see managed online endpoints section in Manage and
increase quotas for resources with Azure Machine Learning). To determine if requests are throttled:
Monitor the "Network bytes" metric
The response trailers will have the fields: ms-azureml-bandwidth-request-delay-ms and
ms-azureml-bandwidth-response-delay-ms . The values of the fields are the delays, in milliseconds, of the
bandwidth throttling.
Metrics at deployment scope
CPU Utilization Percentage
Deployment Capacity (the number of instances of the requested instance type)
Disk Utilization
GPU Memory Utilization (only applicable to GPU instances)
GPU Utilization (only applicable to GPU instances)
Memory Utilization Percentage
Split on the following dimension:
InstanceId
Create a dashboard
You can create custom dashboards to visualize data from multiple sources in the Azure portal, including the
metrics for your managed online endpoint. For more information, see Create custom KPI dashboards using
Application Insights.
Create an alert
You can also create custom alerts to notify you of important status updates to your managed online endpoint:
1. At the top right of the metrics page, select New alert rule .
Next steps
Learn how to view costs for your deployed endpoint.
Read more about metrics explorer.
Debug online endpoints locally in Visual Studio
Code (preview)
5/25/2022 • 5 minutes to read • Edit Online
Prerequisites
This guide assumes you have the following items installed locally on your PC.
Docker
VS Code
Azure CLI
Azure CLI ml extension (v2)
For more information, see the guide on how to prepare your system to deploy managed online endpoints.
The examples in this article are based on code samples contained in the azureml-examples repository. To run the
commands locally without having to copy/paste YAML and other files, clone the repo and then change
directories to the cli directory in the repo:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples
cd cli
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values
for your subscription, workspace, and resource group multiple times, use the following commands. Replace the
following parameters with values for your specific configuration:
Replace <subscription> with your Azure subscription ID.
Replace <workspace> with your Azure Machine Learning workspace name.
Replace <resource-group> with the Azure resource group that contains your workspace.
Replace <location> with the Azure region that contains your workspace.
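For example, a minimal sketch:
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>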
TIP
You can see what your current defaults are by using the az configure -l command.
IMPORTANT
On Windows Subsystem for Linux (WSL), you'll need to update your PATH environment variable to include the path to the
VS Code executable or use WSL interop. For more information, see Windows interoperability with Linux.
A Docker image is built locally. Any environment configuration or model file errors are surfaced at this stage of
the process.
NOTE
The first time you launch a new or updated dev container it can take several minutes.
Once the image successfully builds, your dev container opens in a VS Code window.
You'll use a few VS Code extensions to debug your deployments in the dev container. Azure Machine Learning
automatically installs these extensions in your dev container.
Inference Debug
Pylance
Jupyter
Python
IMPORTANT
Before starting your debug session, make sure that the VS Code extensions have finished installing in your dev container.
TIP
The score.py script used by the endpoint deployed earlier is located at
azureml-examples/cli/endpoints/online/managed/sample/score.py in the repository you cloned. However, the
steps in this guide work with any scoring script.
In this case, <REQUEST-FILE> is a JSON file that contains input data samples for the model to make predictions
on, similar to the following JSON:
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
TIP
The scoring URI is the address where your endpoint listens for requests. Use the ml extension to get the scoring URI.
{
"auth_mode": "aml_token",
"location": "local",
"name": "my-new-endpoint",
"properties": {},
"provisioning_state": "Succeeded",
"scoring_uri": "http://localhost:5001/score",
"tags": {},
"traffic": {},
"type": "online"
}
At this point, any breakpoints in your run function are caught. Use the debug actions to step through your
code. For more information on debug actions, see the debug actions guide.
NOTE
Since the directory containing your code and endpoint assets is mounted onto the dev container, any changes you make
in the dev container are synced with your local file system.
For more extensive changes involving updates to your environment and endpoint configuration, use the ml
extension update command. Doing so will trigger a full image rebuild with your changes.
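For example, a sketch assuming a deployment YAML file from the cloned examples repo:
az ml online-deployment update --file <deployment-yaml-file> --local --vscode-debug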
Once the updated image is built and your development container launches, use the VS Code debugger to test
and troubleshoot your updated endpoint.
Next steps
Deploy and score a machine learning model by using a managed online endpoint (preview)
Troubleshooting managed online endpoints deployment and scoring (preview)
Troubleshooting online endpoints deployment and
scoring
5/25/2022 • 14 minutes to read • Edit Online
Prerequisites
An Azure subscription . Try the free or paid version of Azure Machine Learning.
The Azure CLI.
Install, set up, and use the CLI (v2).
Deploy locally
Local deployment is deploying a model to a local Docker environment. Local deployment is useful for testing
and debugging before deployment to the cloud.
TIP
Use Visual Studio Code to test and debug your endpoints locally. For more information, see debug online endpoints
locally in Visual Studio Code.
Local deployment supports creation, update, and deletion of a local endpoint. It also allows you to invoke and
get logs from the endpoint. To use local deployment, add --local to the appropriate CLI command:
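For example, a sketch that creates an endpoint locally from a YAML definition (the file path is an assumption based on the examples repo):
az ml online-endpoint create --local --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml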
Conda installation
Generally, issues with mlflow deployment stem from issues with the installation of the user environment
specified in the conda.yaml file.
To debug conda installation problems, try the following:
1. Check the logs for conda installation. If the container crashed or is taking too long to start up, it's likely that
the conda environment update failed to resolve correctly.
2. Install the mlflow conda file locally with the command
conda env create -n userenv -f <CONDA_ENV_FILENAME> .
3. If there are errors locally, try resolving the conda environment and creating a functional one before
redeploying.
4. If the container crashes even if it resolves locally, the SKU size used for deployment may be too small.
a. Conda package installation occurs at runtime, so if the SKU size is too small to accommodate all of the
packages detailed in the conda.yaml environment file, then the container may crash.
b. A Standard_F4s_v2 VM is a good starting SKU size, but larger ones may be needed depending on
which dependencies are specified in the conda file.
Add --resource-group and --workspace-name to the commands above if you haven't already set these
parameters via az configure .
To see information about how to set these parameters, and if current values are already set, run:
az ml online-deployment get-logs -h
By default the logs are pulled from the inference server. Logs include the console log from the inference server,
which contains print/log statements from your score.py code.
NOTE
If you use Python logging, ensure you use the correct logging level order for the messages to be published to logs. For
example, INFO.
You can also get logs from the storage initializer container by passing --container storage-initializer . These
logs contain information on whether code and model data were successfully downloaded to the container.
Add --help and/or --debug to commands to see more information.
Request tracing
There are two supported tracing headers:
x-request-id is reserved for server tracing. We override this header to ensure it's a valid GUID.
NOTE
When you create a support ticket for a failed request, attach the failed request ID to expedite investigation.
x-ms-client-request-id is available for client tracing scenarios. We sanitize this header to remove non-
alphanumeric symbols. This header is truncated to 72 characters.
ERROR: OutOfCapacity
The specified VM Size failed to provision due to a lack of Azure Machine Learning capacity. Retry later or try
deploying to a different region.
ERROR: BadArgument
Below is a list of reasons you might run into this error:
Resource request was greater than limits
Startup task failed due to authorization error
Startup task failed due to incorrect role assignments on resource
Unable to download user container image
Unable to download user model or code artifacts
Resource requests greater than limits
Requests for resources must be less than or equal to limits. If you don't set limits, we set default values when
you attach your compute to an Azure Machine Learning workspace. You can check limits in the Azure portal or
by using the az ml compute show command.
Authorization error
After provisioning the compute resource, during deployment creation, Azure tries to pull the user container
image from the workspace private Azure Container Registry (ACR) and mount the user model and code artifacts
into the user container from the workspace storage account.
First, check if there is a permissions issue accessing ACR.
To pull blobs, Azure uses managed identities to access the storage account.
If you created the associated endpoint with SystemAssigned, Azure role-based access control (RBAC)
permission is automatically granted, and no further permissions are needed.
If you created the associated endpoint with UserAssigned, the user's managed identity must have Storage
blob data reader permission on the workspace storage account.
Unable to download user container image
It is possible that the user container could not be found. Check container logs to get more details.
Make sure container image is available in workspace ACR.
For example, if the image is testacr.azurecr.io/azureml/azureml_92a029f831ce58d2ed011c3c42d35acb:latest , check the
repository with:
az acr repository show-tags -n testacr --repository azureml/azureml_92a029f831ce58d2ed011c3c42d35acb --orderby time_desc --output table
Unable to download user model or code artifacts
It is possible that the user model or code artifacts can't be found. Check container logs to get more details.
Make sure model and code artifacts are registered to the same workspace as the deployment. Use the show
command to show details for a model or code artifact in a workspace.
For example:
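A sketch of checking a registered model (the name and version placeholders are assumptions):
az ml model show --name <model-name> --version <model-version>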
You can also check if the blobs are present in the workspace storage account.
For example, if the blob is
https://foobar.blob.core.windows.net/210212154504-1517266419/WebUpload/210212154504-
1517266419/GaussianNB.pkl
, you can use this command to check if it exists:
az storage blob exists --account-name foobar --container-name 210212154504-1517266419 --name WebUpload/210212154504-1517266419/GaussianNB.pkl --subscription <sub-name>
ERROR: ResourceNotReady
To run the score.py provided as part of the deployment, Azure creates a container that includes all the
resources that the score.py needs, and runs the scoring script on that container. The error in this scenario is
that this container is crashing when running, which means scoring can't happen. This error happens when:
There's an error in score.py . Use get-logs to help diagnose common problems:
A package that was imported but is not in the conda environment.
A syntax error.
A failure in the init() method.
If get-logs isn't producing any logs, it usually means that the container has failed to start. To debug this
issue, try deploying locally instead.
Readiness or liveness probes are not set up correctly.
There's an error in the environment setup of the container, such as a missing dependency.
ERROR: ResourceNotFound
This error occurs when Azure Resource Manager can't find a required resource. For example, you'll receive
this error if a storage account was referred to but can't be found at the specified path. Be sure to
double-check resources that might have been supplied by exact path and the spelling of their names.
For more information, see Resolve resource not found errors.
ERROR: OperationCancelled
Azure operations have a certain priority level and are executed from highest to lowest. This error happens when
your operation happened to be overridden by another operation that has a higher priority. Retrying the
operation might allow it to be performed without cancellation.
ERROR: InternalServerError
Although we do our best to provide a stable and reliable service, sometimes things don't go according to plan. If
you get this error, it means that something isn't right on our side, and we need to fix it. Submit a customer
support ticket with all related information and we'll address the issue.
Autoscaling issues
If you are having trouble with autoscaling, see Troubleshooting Azure autoscale.
STATUS CODE | REASON PHRASE | WHY THIS CODE MIGHT GET RETURNED
429 | Too many pending requests | Your model is getting more requests than it can handle. We allow 2 * max_concurrent_requests_per_instance * instance_count requests at any time. Additional requests are rejected. You can confirm these settings in your model deployment config under request_settings and scale_settings . If you are using auto-scaling, your model is getting requests faster than the system can scale up. With auto-scaling, you can try to resend requests with exponential backoff. Doing so can give the system time to adjust.
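One way to inspect these settings is to query the key vault's network ACLs; this is a sketch with an assumed key vault name:
az keyvault show --name <key-vault-name> --query properties.networkAcls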
The response for this command is similar to the following JSON document:
{
"bypass": "AzureServices",
"defaultAction": "Deny",
"ipRules": [],
"virtualNetworkRules": []
}
If the value of bypass isn't AzureServices , use the guidance in the Configure key vault network settings to set it
to AzureServices .
Online deployments fail with an image download error
1. Check if the egress-public-network-access flag is disabled for the deployment. If this flag is enabled, and
the visibility of the container registry is private, then this failure is expected.
2. Use the following command to check the status of the private endpoint connection. Replace
<registry-name> with the name of the Azure Container Registry for your workspace:
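# Sketch: look up the registry's resource ID, then list its private endpoint connections.
az network private-endpoint-connection list --id $(az acr show --name <registry-name> --query id -o tsv)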
In the response document, verify that the status field is set to Approved . If it isn't approved, use the
following command to approve it. Replace <private-endpoint-name> with the name returned from the
previous command:
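A sketch of the approval call; the resource group and --type arguments are assumptions for a container registry connection:
az network private-endpoint-connection approve --resource-group <resource-group> --resource-name <registry-name> --name <private-endpoint-name> --type Microsoft.ContainerRegistry/registries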
nslookup endpointname.westcentralus.inference.ml.azure.com
The response contains an address . This address should be in the range provided by the virtual network.
3. If the host name isn't resolved by the nslookup command, check if an A record exists in the private DNS
zone for the virtual network. To check the records, use the following command:
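For example, a sketch listing the A records; the zone name privatelink.api.azureml.ms and the resource group are assumptions for a typical workspace private link setup:
az network private-dns record-set a list --zone-name privatelink.api.azureml.ms --resource-group <resource-group>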
TIP
This step isn't needed if you are using the azureml-model-deployment header in your request to target this
deployment.
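First, check how traffic is split across deployments; a sketch:
az ml online-endpoint show --name <endpointname> --query traffic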
The response from this command should list the percentage of traffic assigned to each deployment.
3. If the traffic assignments (or deployment header) are set correctly, use the following command to get the
logs for the endpoint. Replace <endpointname> with the name of the endpoint, and <deploymentname> with
the deployment:
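A sketch of the get-logs call:
az ml online-deployment get-logs --endpoint-name <endpointname> --name <deploymentname> --lines 100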
Look through the logs to see if there's a problem running the scoring code when you submit a request to
the deployment.
Next steps
Deploy and score a machine learning model with a managed online endpoint
Safe rollout for online endpoints
Online endpoint YAML reference
Use batch endpoints for batch scoring
5/25/2022 • 18 minutes to read • Edit Online
Prerequisites
You must have an Azure subscription to use Azure Machine Learning. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of Azure Machine
Learning today.
Install the Azure CLI and the ml extension. Follow the installation steps in Install, set up, and use the CLI
(v2).
Create an Azure resource group if you don't have one, and you (or the service principal you use) must
have Contributor permission. For resource group creation, see Install, set up, and use the CLI (v2).
Create an Azure Machine Learning workspace if you don't have one. For workspace creation, see Install,
set up, and use the CLI (v2).
Configure your default workspace and resource group for the Azure CLI. Machine Learning CLI
commands require the --workspace/-w and --resource-group/-g parameters. Configuring the defaults
avoids passing in the values multiple times. You can override these on the command line. Run the
following code to set up your defaults. For more information, see Install, set up, and use the CLI (v2).
Set your endpoint name. Replace YOUR_ENDPOINT_NAME with a unique name within an Azure region.
For Unix, run this command:
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
For Windows, run this command:
set ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
NOTE
Batch endpoint names need to be unique within an Azure region. For example, there can be only one batch endpoint with
the name mybatchendpoint in westus2.
Create compute
Batch endpoints run only on cloud computing resources, not locally. The cloud computing resource is a reusable
virtual computer cluster. Run the following code to create an Azure Machine Learning compute cluster. The
following examples in this article use the compute created here, named batch-cluster . Adjust as needed and
reference your compute using azureml:<your-compute-name> .
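A sketch of the cluster creation command (the size and instance bounds are illustrative):
az ml compute create --name batch-cluster --type amlcompute --min-instances 0 --max-instances 5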
NOTE
You are not charged for compute at this point as the cluster will remain at 0 nodes until a batch endpoint is invoked and a
batch scoring job is submitted. Learn more about manage and optimize cost for AmlCompute.
TIP
One of the batch deployments will serve as the default deployment for the endpoint. The default deployment will be used
to do the actual batch scoring when the endpoint is invoked. Learn more about batch endpoints and batch deployment.
The following YAML file defines a batch endpoint, which you can include in the CLI command for batch endpoint
creation. In the repository, this file is located at /cli/endpoints/batch/batch-endpoint.yml .
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: mybatchedp
description: my sample batch endpoint
auth_mode: aad_token
The following table describes the key properties of the endpoint YAML. For the full batch endpoint YAML
schema, see CLI (v2) batch endpoint YAML schema.
KEY | DESCRIPTION
$schema | [Optional] The YAML schema. You can view the schema in the above example in a browser to see all available options for a batch endpoint YAML file.
defaults.deployment_name | The name of the deployment that will serve as the default deployment for the endpoint.
For more information about how to reference an Azure ML entity, see Referencing an Azure ML entity.
The example repository contains all the required files. The following YAML file defines a batch deployment with
all the required inputs and optional settings. You can include this file in your CLI command to create your batch
deployment. In the repository, this file is located at /cli/endpoints/batch/nonmlflow-deployment.yml .
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: nonmlflowdp
endpoint_name: mybatchedp
model:
  path: ./mnist/model/
code_configuration:
  code: ./mnist/code/
  scoring_script: digit_identification.py
environment:
  conda_file: ./mnist/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest
compute: azureml:batch-cluster
resources:
  instance_count: 1
max_concurrency_per_instance: 2
mini_batch_size: 10
output_action: append_row
output_file_name: predictions.csv
retry_settings:
  max_retries: 3
  timeout: 30
error_threshold: -1
logging_level: info
For the full batch deployment YAML schema, see CLI (v2) batch deployment YAML schema.
KEY | DESCRIPTION
$schema | [Optional] The YAML schema. You can view the schema in the above example in a browser to see all available options for a batch deployment YAML file.
model | The model to be used for batch scoring. The example defines a model inline using path . Model files will be automatically uploaded and registered with an autogenerated name and version. Follow the Model schema for more options. As a best practice for production scenarios, you should create the model separately and reference it here. To reference an existing model, use the azureml:<model-name>:<model-version> syntax.
code_configuration.code.path | The local directory that contains all the Python source code to score the model.
code_configuration.scoring_script | The Python file in the above directory. This file must have an init() function and a run() function. Use the init() function for any costly or common preparation (for example, load the model in memory). init() will be called only once at the beginning of the process. Use run(mini_batch) to score each entry; the value of mini_batch is a list of file paths. The run() function should return a pandas DataFrame or an array. Each returned element indicates one successful run of an input element in the mini_batch . For more information on how to author a scoring script, see Understanding the scoring script.
compute | The compute to run batch scoring. The example uses the batch-cluster created at the beginning and references it using the azureml:<compute-name> syntax.
output_file_name | [Optional] The name of the batch scoring output file for append_row output_action .
You can also create a batch endpoint using a YAML file. Add the --file parameter to the create command and
specify the YAML file path.
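For example, a sketch assuming the YAML path shown earlier:
az ml batch-endpoint create --name $ENDPOINT_NAME --file endpoints/batch/batch-endpoint.yml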
Create a batch deployment
Run the following code to create a batch deployment named nonmlflowdp under the batch endpoint and set it as
the default deployment.
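A sketch, assuming the deployment YAML path shown earlier:
az ml batch-deployment create --name nonmlflowdp --endpoint-name $ENDPOINT_NAME --file endpoints/batch/nonmlflow-deployment.yml --set-default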
TIP
The --set-default parameter sets the newly created deployment as the default deployment of the endpoint. It's a
convenient way to create a new default deployment of the endpoint, especially for the first deployment creation. As a best
practice for production scenarios, you may want to create a new deployment without setting it as default, verify it, and
update the default deployment later. For more information, see the Deploy a new model section.
To check a batch endpoint, run the following code. As the newly created deployment is set as the default
deployment, you should see nonmlflowdp in defaults.deployment_name from the response.
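A sketch:
az ml batch-endpoint show --name $ENDPOINT_NAME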
For more information about data URI, see Azure Machine Learning data reference URI.
The example uses publicly available data in a folder from
https://pipelinedata.blob.core.windows.net/sampledata/mnist , which contains thousands of hand-written
digits. The name of the batch scoring job is returned in the invoke response. Run the following code
to invoke the batch endpoint using this data. --query name is added to return only the job name from the
invoke response; it's used later to Monitor batch scoring job execution progress and Check
batch scoring results. Remove --query name -o tsv if you want to see the full invoke response. For more
information on the --query parameter, see Query Azure CLI command output.
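A sketch of the invocation; the --input and --input-type parameters are assumptions for this CLI version:
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input https://pipelinedata.blob.core.windows.net/sampledata/mnist --input-type uri_folder --query name -o tsv)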
NOTE
If you are using existing V1 FileDataset for batch endpoint, we recommend migrating them to V2 data assets and refer
to them directly when invoking batch endpoints. Currently only data assets of type uri_folder or uri_file are
supported. Batch endpoints created with GA CLIv2 (2.4.0 and newer) or GA REST API (2022-05-01 and newer) will not
support V1 Dataset.
You can also extract the URI or path on datastore extracted from V1 FileDataset by using az ml dataset show
command with --query parameter and use that information for invoke.
While Batch endpoints created with earlier APIs will continue to support V1 FileDataset, we will be adding further V2
data assets support with the latest API versions for even more usability and flexibility. For more information on V2
data assets, see Work with data using SDK v2 (preview). For more information on the new V2 experience, see What is
v2.
IMPORTANT
You must use a unique output location. If the output file exists, the batch scoring job will fail.
Some settings can be overwritten at invoke time to make the best use of the compute resources and to improve
performance:
Use --instance-count to overwrite instance_count . For example, for larger volume of data inputs, you may
want to use more instances to speed up the end to end batch scoring.
Use --mini-batch-size to overwrite mini_batch_size . The number of mini batches is decided by total input
file counts and mini_batch_size. Smaller mini_batch_size generates more mini batches. Mini batches can be
run in parallel, but there might be extra scheduling and invocation overhead.
Use --set to overwrite other settings including max_retries , timeout , and error_threshold . These settings
might impact the end to end batch scoring time for different workloads.
To specify the output location and overwrite settings at invoke time, run the following code. The example stores
the outputs in a folder with the same name as the endpoint in the workspace's default blob storage, and also
uses a random file name to ensure the output location's uniqueness. The code works in a Unix shell. Replace with
your own unique folder and file name.
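A sketch, with the override flags as assumptions:
export OUTPUT_FILE_NAME=predictions_$RANDOM.csv
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME \
    --input https://pipelinedata.blob.core.windows.net/sampledata/mnist --input-type uri_folder \
    --output-path azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME \
    --set output_file_name=$OUTPUT_FILE_NAME \
    --mini-batch-size 20 --instance-count 5 \
    --query name -o tsv)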
The scoring results in Storage Explorer are similar to the following sample page:
Notice that --set-default isn't used. If you show the batch endpoint again, you should see no change to
defaults.deployment_name .
The example uses a model ( /cli/endpoints/batch/autolog_nyc_taxi ) trained and tracked with MLflow.
scoring_script and environment can be auto-generated from the model's metadata; they don't need to be
specified in the YAML file. For more about MLflow, see Train and track ML models with MLflow and Azure Machine Learning.
Below is the YAML file the example uses to deploy an MLflow model, which only contains the minimum required
properties. The source file in repository is /cli/endpoints/batch/mlflow-deployment.yml .
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: mlflowdp
endpoint_name: mybatchedp
model:
  path: ./autolog_nyc_taxi
compute: azureml:batch-cluster
NOTE
scoring_script and environment auto generation only supports Python Function model flavor and column-based
model signature.
Notice that --deployment-name is used to specify the new deployment name. This parameter lets you invoke a
non-default deployment without updating the default deployment of the batch endpoint.
Update the default batch deployment
To update the default batch deployment of the endpoint, run the following code:
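A sketch:
az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=mlflowdp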
Now, if you show the batch endpoint again, you should see defaults.deployment_name is set to mlflowdp . You
can invoke the batch endpoint directly without the --deployment-name parameter.
(Optional) Update the deployment
If you want to update the deployment (for example, update code, model, environment, or settings), update the
YAML file, and then run az ml batch-deployment update . You can also update without the YAML file by using
--set . Check az ml batch-deployment update -h for more information.
Run the following code to delete the batch endpoint and all the underlying deployments. Batch scoring jobs
won't be deleted.
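A sketch:
az ml batch-endpoint delete --name $ENDPOINT_NAME --yes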
Next steps
Batch endpoints in studio
Deploy models with REST for batch scoring
Troubleshooting batch endpoints
How to use batch endpoints in Azure Machine
Learning studio
5/25/2022 • 3 minutes to read • Edit Online
In this article, you learn how to use batch endpoints to do batch scoring in Azure Machine Learning studio. For
more, see What are Azure Machine Learning endpoints?.
In this article, you learn about:
Create a batch endpoint with a no-code experience for MLflow model
Check batch endpoint details
Start a batch scoring job
Overview of batch endpoint features in Azure Machine Learning studio
Prerequisites
An Azure subscription - If you don't have an Azure subscription, create a free account before you begin.
Try the free or paid version of Azure Machine Learning today.
The example repository - Clone the AzureML Example repository. This article uses the assets in
/cli/endpoints/batch .
A compute target where you can run batch scoring workflows. For more information on creating a
compute target, see Create compute targets in Azure Machine Learning studio.
Register a machine learning model.
OR
From the Models page, select the model you want to deploy and then select Deploy to batch
endpoint .
TIP
If you're using an MLflow model, you can use no-code batch endpoint creation. That is, you don't need to prepare a
scoring script and environment, both can be auto generated. For more, see Train and track ML models with MLflow and
Azure Machine Learning.
Complete all the steps in the wizard to create a batch endpoint and deployment.
Check batch endpoint details
After a batch endpoint is created, select it from the Endpoints page to view the details.
Start a batch scoring job
A batch scoring workload runs as an offline job. By default, batch scoring stores the scoring outputs in blob
storage. You can also configure the outputs location and overwrite some of the settings to get the best
performance.
1. Select + Create job :
2. You can update the default deployment while submitting a job from the drop-down:
Overwrite settings
Some settings can be overwritten when you start a batch scoring job. For example, you might overwrite settings
to make better use of the compute resource, or to improve performance. To override settings, select Override
deployment settings and provide the settings. For more information, see Use batch endpoints.
Start a batch scoring job with different input options
You have two options to specify the data inputs in Azure Machine Learning studio:
Use a registered dataset :
NOTE
During Preview, only FileDataset is supported.
OR
Use a datastore :
You can specify an AML registered datastore, or if your data is publicly available, specify the public path.
Configure the output location
By default, the batch scoring results are stored in the default blob store for the workspace. Results are in a folder
named after the job name (a system-generated GUID).
To change where the results are stored, provide a blob store and output path when you start a job.
IMPORTANT
You must use a unique output location. If the output file exists, the batch scoring job will fail.
OR
From the Models page, select the model you want to deploy. Then select Deploy to batch endpoint
option from the drop-down. In the wizard, on the Endpoint screen, select Existing . Complete the wizard
to add the new deployment.
Update the default deployment
If an endpoint has multiple deployments, one of the deployments is the default. The default deployment receives
100% of the traffic to the endpoint. To change the default deployment, use the following steps:
1. Select the endpoint from the Endpoints page.
2. Select Update default deployment . From the Details tab, select the deployment you want to set as default
and then select Update .
WARNING
Deleting an endpoint also deletes all deployments to that endpoint.
To delete a deployment , select the endpoint from the Endpoints page, select the deployment, and then select
delete.
Next steps
In this article, you learned how to create and call batch endpoints. See these other articles to learn more about
Azure Machine Learning:
Troubleshooting batch endpoints
Deploy and score a machine learning model with a managed online endpoint
Deploy models with REST for batch scoring
5/25/2022 • 11 minutes to read • Edit Online
Prerequisites
An Azure subscription for which you have administrative rights. If you don't have such a subscription, try
the free or paid personal subscription.
An Azure Machine Learning workspace.
A service principal in your workspace. Administrative REST requests use service principal authentication.
A service principal authentication token. Follow the steps in Retrieve a service principal authentication token
to retrieve this token.
The curl utility. The curl program is available in the Windows Subsystem for Linux or any UNIX distribution.
In PowerShell, curl is an alias for Invoke-WebRequest and curl -d "key=val" -X POST uri becomes
Invoke-WebRequest -Body "key=val" -Method POST -Uri uri .
The jq JSON processor.
IMPORTANT
The code snippets in this article assume that you are using the Bash shell.
The code snippets are pulled from the /cli/batch-score-rest.sh file in the AzureML Example repository.
The service provider uses the api-version argument to ensure compatibility. The api-version argument varies
from service to service. Set the API version as a variable to accommodate future versions:
API_VERSION="2022-05-01"
Create compute
Batch scoring runs only on cloud computing resources, not locally. The cloud computing resource is a reusable
virtual computer cluster where you can run batch scoring workflows.
Create a compute cluster:
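A sketch of the REST call, assuming the request body shape for an AmlCompute cluster (the VM size and scale settings are illustrative):
response=$(curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes/batch-cluster?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\":{
        \"computeType\": \"AmlCompute\",
        \"properties\": {
            \"osType\": \"Linux\",
            \"vmSize\": \"STANDARD_D2_V2\",
            \"scaleSettings\": {
                \"maxNodeCount\": 5,
                \"minNodeCount\": 0
            }
        }
    },
    \"location\": \"$LOCATION\"
}")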
TIP
If you want to use an existing compute instead, you must specify the full Azure Resource Manager ID when creating the
batch deployment. The full ID uses the format
/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes/<your-compute-name> .
BLOB_URI_ROOT="$BLOB_URI_PROTOCOL://$AZURE_STORAGE_ACCOUNT.blob.$BLOB_URI_ENDPOINT/$AZUREML_DEFAULT_CONTAINER"
TIP
You can also use other methods to upload, such as the Azure portal or Azure Storage Explorer.
Once you upload your code, you can specify your code with a PUT request:
\"modelUri\":\"azureml://subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/workspaces/$WORKSPACE
/datastores/$AZUREML_DEFAULT_DATASTORE/paths/model\"
}
}")
Create environment
The deployment needs to run in an environment that has the required dependencies. Create the environment
with a PUT request. Use a docker image from Microsoft Container Registry. You can configure the docker image
with image and add conda dependencies with condaFile .
Run the following code to read the condaFile defined in json. The source file is at
/cli/endpoints/batch/mnist/environment/conda.json in the example repository:
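# Sketch: read the conda file and escape quotes so it can be embedded in the JSON payload
# (the relative path is assumed from the text above).
CONDA_FILE=$(cat endpoints/batch/mnist/environment/conda.json | sed 's/"/\\"/g')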
ENV_VERSION=$RANDOM
response=$(curl --location --request PUT "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/environments/mnist-env/versions/$ENV_VERSION?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
    \"properties\":{
        \"condaFile\": $(echo \"$CONDA_FILE\"),
        \"image\": \"mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest\"
    }
}")
Now, let's look at other options for invoking the batch endpoint. When it comes to input data, there are multiple
scenarios you can choose from, depending on the input type (whether you are specifying a folder or a single
file), and the URI type (whether you are using a path on Azure Machine Learning registered datastore, a
reference to Azure Machine Learning registered V2 data asset, or a public URI).
An InputData property has JobInputType and Uri keys. When you are specifying a single file, use
"JobInputType": "UriFile" , and when you are specifying a folder, use "JobInputType": "UriFolder" .
When the file or folder is on an Azure ML registered datastore, the syntax for the Uri is
azureml://datastores/<datastore-name>/paths/<path-on-datastore> for a folder, and
azureml://datastores/<datastore-name>/paths/<path-on-datastore>/<file-name> for a specific file. You can
also use the longer form to represent the same path, such as
azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/workspaces/<workspace-name>/datastores/<datastore-name>/paths/<path-on-datastore>/ .
When the file or folder is registered as a V2 data asset as uri_folder or uri_file , the syntax for the Uri
is azureml://data/<data-name>/versions/<data-version>/ (short form) or
azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/workspaces/<workspace-name>/data/<data-name>/versions/<data-version>/ (long form).
When the file or folder is a publicly accessible path, the syntax for the URI is https://<public-path> for a
folder, https://<public-path>/<file-name> for a specific file.
NOTE
For more information about data URI, see Azure Machine Learning data reference URI.
If you want to manage your data as Azure ML registered V2 data asset as uri_folder , you can follow the
two steps below:
1. Create the V2 data asset:
DATA_NAME="mnist"
DATA_VERSION=$RANDOM
If your data is a single file publicly available from the web, you can use the following snippet:
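A sketch of the InputData fragment (the placeholders are assumptions):
"InputData": {
    "mnistInput": {
        "JobInputType": "UriFile",
        "Uri": "https://<public-path>/<file-name>"
    }
}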
NOTE
We strongly recommend using the latest REST API version for batch scoring.
If you want to use local data, you can upload it to Azure Machine Learning registered datastore and use REST API for
Cloud data.
If you are using existing V1 FileDataset for batch endpoint, we recommend migrating them to V2 data assets and refer
to them directly when invoking batch endpoints. Currently only data assets of type uri_folder or uri_file are
supported. Batch endpoints created with GA CLIv2 (2.4.0 and newer) or GA REST API (2022-05-01 and newer) will not
support V1 Dataset.
You can also extract the URI or path on datastore extracted from V1 FileDataset by using az ml dataset show
command with --query parameter and use that information for invoke.
While Batch endpoints created with earlier APIs will continue to support V1 FileDataset, we will be adding further V2
data assets support with the latest API versions for even more usability and flexibility. For more information on V2
data assets, see Work with data using SDK v2 (preview). For more information on the new V2 experience, see What is
v2.
Following is the example snippet for configuring the output location for the batch scoring results.
response=$(curl --location --request POST $SCORING_URI \
--header "Authorization: Bearer $SCORING_TOKEN" \
--header "Content-Type: application/json" \
--data-raw "{
\"properties\": {
\"InputData\":
{
\"mnistInput\": {
\"JobInputType\" : \"UriFolder\",
\"Uri": \"azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnist\"
}
},
\"OutputData\":
{
\"mnistOutput\": {
\"JobOutputType\": \"UriFile\",
\"Uri\":
\"azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnistOutput/$OUTPUT_FILE_NAME\"
}
}
}
}")
IMPORTANT
You must use a unique output location. If the output file exists, the batch scoring job will fail.
TIP
The example invokes the default deployment of the batch endpoint. To invoke a non-default deployment, use the
azureml-model-deployment HTTP header and set the value to the deployment name. For example, using a parameter of
--header "azureml-model-deployment: $DEPLOYMENT_NAME" with curl.
Next steps
Learn how to deploy your model for batch scoring using the Azure CLI.
Learn how to deploy your model for batch scoring using studio.
Learn to Troubleshoot batch endpoints
Troubleshooting batch endpoints
5/25/2022 • 3 minutes to read • Edit Online
PROBLEM | POSSIBLE SOLUTION
Code configuration or Environment is missing. | Ensure you provide the scoring script and an environment definition if you're using a non-MLflow model. No-code deployment is supported for the MLflow model only. For more, see Track ML models with MLflow and Azure Machine Learning.
Unsupported input data. | Batch endpoints accept input data in three forms: 1) registered data 2) data in the cloud 3) data in local. Ensure you're using the right format. For more, see Use batch endpoints for batch scoring.
Output already exists. | If you configure your own output location, ensure you provide a new output for each endpoint invocation.
1. Open the job in studio using the value returned by the above command.
2. Choose batchscoring
3. Open the Outputs + logs tab
4. Choose the log(s) you wish to review
Understand log structure
There are two top-level log folders, azureml-logs and logs .
The file ~/azureml-logs/70_driver_log.txt contains information from the controller that launches the scoring
script.
Because of the distributed nature of batch scoring jobs, there are logs from several different sources. However,
two combined files are created that provide high-level information:
~/logs/job_progress_overview.txt : This file provides high-level information about the number of mini-
batches (also known as tasks) created so far and the number of mini-batches processed so far. As the
mini-batches end, the log records the results of the job. If the job failed, it will show the error message
and where to start the troubleshooting.
~/logs/sys/master_role.txt : This file provides the principal node (also known as the orchestrator) view of
the running job. This log provides information on task creation, progress monitoring, and the run result.
For a concise understanding of errors in your script there is:
~/logs/user/error.txt : This file will try to summarize the errors in your script.
For more information on errors in your script, there is:
~/logs/user/error/ : This folder contains full stack traces of exceptions thrown while loading and running the
entry script.
When you need a full understanding of how each node executed the score script, look at the individual process
logs for each node. The process logs can be found in the sys/node folder, grouped by worker nodes:
~/logs/sys/node/<ip_address>/<process_name>.txt : This file provides detailed info about each mini-batch
as it's picked up or completed by a worker. For each mini-batch, this file includes:
The IP address and the PID of the worker process.
The total number of items, the number of successfully processed items, and the number of failed
items.
The start time, duration, process time, and run method time.
You can also view the results of periodic checks of the resource usage for each node. The log files and setup files
are in this folder:
~/logs/perf : Set --resource_monitor_interval to change the checking interval in seconds. The default
interval is 600 , that is, 10 minutes. To stop the monitoring, set the value to 0 . Each
<ip_address> folder includes:
os/ : Information about all running processes in the node. One check runs an operating system
command and saves the result to a file. On Linux, the command is ps .
%Y%m%d%H : The sub folder name is the time to hour.
processes_%M : The file ends with the minute of the checking time.
node_disk_usage.csv : Detailed disk usage of the node.
node_resource_usage.csv : Resource usage overview of the node.
processes_resource_usage.csv : Resource usage overview of each process.
# Get logging_level
import argparse  # import added so the snippet runs standalone

arg_parser = argparse.ArgumentParser(description="Argument parser.")
arg_parser.add_argument("--logging_level", type=str, help="logging level")
args, unknown_args = arg_parser.parse_known_args()
print(args.logging_level)
The Azure Machine Learning inference HTTP server (preview) is a Python package that allows you to easily
validate your entry script ( score.py ) in a local development environment. If there's a problem with the scoring
script, the server will return an error. It will also return the location where the error occurred.
The server can also be used when creating validation gates in a continuous integration and deployment pipeline.
For example, start the server with the candidate script and run the test suite against the local endpoint.
Prerequisites
Requires: Python >=3.7
Installation
NOTE
To avoid package conflicts, install the server in a virtual environment.
To install the azureml-inference-server-http package, first create and activate a virtual environment in your cmd/terminal:
mkdir server_quickstart
cd server_quickstart
virtualenv myenv
source myenv/bin/activate
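Then install the package in the activated environment (a minimal sketch):
pip install azureml-inference-server-http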
4. Create your entry script ( score.py ). The following example creates a basic entry script:
echo '
import time
def init():
time.sleep(1)
def run(input_data):
return {"message":"Hello, World!"}
' > score.py
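Start the server with score.py as the entry script; azmlinfsrv is the command the package installs (a sketch, adjust the script path as needed):
azmlinfsrv --entry_script score.py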
NOTE
The server is hosted on 0.0.0.0, which means it will listen to all IP addresses of the hosting machine.
curl -p 127.0.0.1:5001/score
Now you can modify the scoring script and test your changes by running the server again.
Server Routes
The server is listening on port 5001 at these routes.
NAME | ROUTE
Score | 127.0.0.1:5001/score
Server parameters
The following table contains the parameters accepted by the server:
Request flow
The following steps explain how the Azure Machine Learning inference HTTP server handles incoming
requests:
1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
2. A client sends a request to the server.
3. When a request is received, it goes through the WSGI server and is then dispatched to one of the workers.
Gunicorn is used on Linux .
Waitress is used on Windows .
4. The requests are then handled by a Flask app, which loads the entry script & any dependencies.
5. Finally, the request is sent to your entry script. The entry script then makes an inference call to the loaded
model and returns a response.
Which OS is supported?
The Azure Machine Learning inference server runs on Windows & Linux based operating systems.
Next steps
For more information on creating an entry script and deploying models, see How to deploy a model using
Azure Machine Learning.
Learn about Prebuilt docker images for inference
Convert custom ML models to MLflow formatted
models
5/25/2022 • 3 minutes to read • Edit Online
In this article, learn how to convert your custom ML model into MLflow format. MLflow is an open-source
library for managing the lifecycle of your machine learning experiments. In some cases, you might use a
machine learning framework without its built-in MLflow model flavor support. Due to this lack of built-in
MLflow model flavor, you cannot log or register the model with MLflow model fluent APIs. To resolve this, you
can convert your model to an MLflow format where you can leverage the following benefits of Azure Machine
Learning and MLflow models.
With Azure Machine Learning, MLflow models get the added benefits of,
No code deployment
Portability as an open source standard format
Ability to deploy both locally and on cloud
MLflow provides support for a variety of machine learning frameworks (scikit-learn, Keras, Pytorch, and more);
however, it might not cover every use case. For example, you may want to create an MLflow model with a
framework that MLflow does not natively support or you may want to change the way your model does pre-
processing or post-processing when running jobs.
If you didn't train your model with MLflow and want to use Azure Machine Learning's MLflow no-code
deployment offering, you need to convert your custom model to MLflow. Learn more about custom Python
models and MLflow.
Prerequisites
Only the mlflow package is needed to convert your custom models to an MLflow format.
import mlflow.pyfunc
# Missing imports added: version_info and sklearn are used below.
import sklearn
from sys import version_info
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

PYTHON_VERSION = "{major}.{minor}.{micro}".format(major=version_info.major,
                                                  minor=version_info.minor,
                                                  micro=version_info.micro)
artifacts = {
"sklearn_model": sklearn_model_path
}
# create wrapper
# A minimal sketch of the wrapper body, which the listing omits: load the pickled
# sklearn model from the artifacts and delegate predictions to it (assumed implementation).
class SKLearnWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import pickle
        self.sklearn_model = pickle.load(open(context.artifacts["sklearn_model"], "rb"))

    def predict(self, context, model_input):
        return self.sklearn_model.predict(model_input)
import cloudpickle
conda_env = {
'channels': ['defaults'],
'dependencies': [
'python={}'.format(PYTHON_VERSION),
'pip',
{
'pip': [
'mlflow',
'scikit-learn=={}'.format(sklearn.__version__),
'cloudpickle=={}'.format(cloudpickle.__version__),
],
},
],
'name': 'sklearn_env'
}
To ensure your newly saved MLflow formatted model didn't change during the save, you can load your model
and print out a test prediction to compare with your original model.
The following code prints a test prediction from the mlflow formatted model and a test prediction from the
sklearn model that's saved to your disk for comparison.
loaded_model = mlflow.pyfunc.load_model(mlflow_pyfunc_model_path)
mlflow.start_run()
mlflow.pyfunc.log_model(artifact_path=mlflow_pyfunc_model_path,
loader_module=None,
data_path=None,
code_path=None,
python_model=SKLearnWrapper(),
registered_model_name="Custom_mlflow_model",
conda_env=conda_env,
artifacts=artifacts)
mlflow.end_run()
IMPORTANT
In some cases, you might use a machine learning framework without its built-in MLflow model flavor support. For
instance, the vaderSentiment library is a standard natural language processing (NLP) library used for sentiment analysis.
Since it lacks a built-in MLflow model flavor, you cannot log or register the model with MLflow model fluent APIs. See an
example on how to save, log and register a model that doesn't have a supported built-in MLflow model flavor.
Next steps
No-code deployment for Mlflow models
Learn more about MLflow and Azure Machine Learning
Work with Models in Azure Machine Learning
5/25/2022 • 6 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
Azure Machine Learning allows you to work with different types of models. In this article, you'll learn about
using Azure Machine Learning to work with different model types (Custom, MLflow, and Triton), how to register
a model from different locations and how to use the SDK, UI and CLI to manage your models.
TIP
If you have model assets created using the SDK/CLI v1, you can still use those with SDK/CLI v2. For more information, see
the Consuming V1 Model Assets in V2 section.
Prerequisites
An Azure subscription - If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today.
An Azure Machine Learning workspace.
The Azure Machine Learning SDK v2 for Python.
The Azure Machine Learning CLI v2.
# Imports added for completeness; ModelType lives in azure.ai.ml.constants in the SDK v2 preview.
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import ModelType

file_model = Model(
    path="mlflow-model/model.pkl",
    type=ModelType.CUSTOM,
    name="local-file-example",
    description="Model created from local file."
)
ml_client.models.create_or_update(file_model)
Local model
Datastore
Job Output
To upload a model from your computer, select Local and upload the model you want to save in the model
registry.
Next steps
Install and set up Python SDK v2 (preview)
No-code deployment for Mlflow models
Learn more about MLflow and Azure Machine Learning
Use GitHub Actions with Azure Machine Learning
5/25/2022 • 4 minutes to read • Edit Online
Get started with GitHub Actions to train a model on Azure Machine Learning.
NOTE
GitHub Actions for Azure Machine Learning are provided as-is, and are not fully supported by Microsoft. If you encounter
problems with a specific action, open an issue in the repository for the action. For example, if you encounter a problem
with the aml-deploy action, report the problem in the https://github.com/Azure/aml-deploy repo.
Prerequisites
An Azure account with an active subscription. Create an account for free.
A GitHub account. If you don't have one, sign up for free.
SEC T IO N TA SK S
Create repository
Create a new repository from the ML Ops with GitHub Actions and Azure Machine Learning template.
1. Open the template on GitHub.
2. Select Use this template .
3. Create a new repository from the template. Set the repository name to ml-learning or a name of your
choice.
Generate deployment credentials
You can create a service principal with the az ad sp create-for-rbac command in the Azure CLI. Run this
command with Azure Cloud Shell in the Azure portal or by selecting the Try it button.
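A sketch of the command; the app name and scope placeholders are assumptions:
az ad sp create-for-rbac --name "<app-name>" --role contributor --scopes /subscriptions/<subscription-id>/resourceGroups/<group-name> --sdk-auth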
In the example above, replace the placeholders with your subscription ID, resource group name, and app name.
The output is a JSON object with the role assignment credentials that provide access to your resources,
similar to the following. Copy this JSON object for later.
{
"clientId": "<GUID>",
"clientSecret": "<GUID>",
"subscriptionId": "<GUID>",
"tenantId": "<GUID>",
(...)
}
By default, the action expects a workspace.json file. If your JSON file has a different name, you can specify it
with the parameters_file input parameter. If there is not a file, a new one will be created with the repository
name.
The action writes the workspace Azure Resource Manager (ARM) properties to a config file, which will be picked
by all future Azure Machine Learning GitHub Actions. The file is saved to GITHUB_WORKSPACE/aml_arm_config.json .
Complete example
Train your model and deploy to Azure Machine Learning.
# Actions train a model on Azure Machine Learning
name: Azure Machine Learning training and deployment
on:
  push:
    branches:
      - master
    # paths:
    # - 'code/*'
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Check Out Repository
        id: checkout_repository
        uses: actions/checkout@v2
Clean up resources
When your resource group and repository are no longer needed, clean up the resources you deployed by
deleting the resource group and your GitHub repository.
Next steps
Create and run machine learning pipelines with Azure Machine Learning SDK
Trigger applications, processes, or CI/CD workflows
based on Azure Machine Learning events (preview)
5/25/2022 • 8 minutes to read
In this article, you learn how to set up event-driven applications, processes, or CI/CD workflows based on Azure
Machine Learning events, such as failure notification emails or ML pipeline runs, when certain conditions are
detected by Azure Event Grid.
Azure Machine Learning manages the entire lifecycle of the machine learning process, including model training,
model deployment, and monitoring. You can use Event Grid to react to Azure Machine Learning events, such as
the completion of training runs, the registration and deployment of models, and the detection of data drift, by
using modern serverless architectures. You can then subscribe and consume events such as run status changed,
run completion, model registration, model deployment, and data drift detection within a workspace.
When to use Event Grid for event driven actions:
Send emails on run failure and run completion
Use an Azure function after a model is registered
Stream events from Azure Machine Learning to a variety of endpoints
Trigger an ML pipeline when drift is detected
Prerequisites
To use Event Grid, you need contributor or owner access to the Azure Machine Learning workspace you will
create events for.
For more information on event sources and event handlers, see What is Event Grid?
Event types for Azure Machine Learning
Azure Machine Learning provides events at various points in the machine learning lifecycle:
EVENT TYPE | SUBJECT FORMAT | SAMPLE SUBJECT
Microsoft.MachineLearningServices.RunCompleted | experiments/{ExperimentId}/runs/{RunId} | experiments/b1d7966c-f73a-4c68-b846-992ace89551f/runs/my_exp1_1554835758_38dbaa94
Microsoft.MachineLearningServices.ModelRegistered | models/{modelName}:{modelVersion} | models/sklearn_regression_model:3
Microsoft.MachineLearningServices.ModelDeployed | endpoints/{serviceId} | endpoints/my_sklearn_aks
Microsoft.MachineLearningServices.DatasetDriftDetected (raised when a data drift detection job for two datasets is completed) | datadrift/{data.DataDriftId}/run/{data.RunId} | datadrift/4e694bf5-712e-4e40-b06a-d2a2755212d4/run/my_driftrun1_1550564444_fbbcdc0f
Microsoft.MachineLearningServices.RunStatusChanged | experiments/{ExperimentId}/runs/{RunId} | experiments/b1d7966c-f73a-4c68-b846-992ace89551f/runs/my_exp1_1554835758_38dbaa94
Advanced filtering : Azure Event Grid also supports advanced filtering based on the published event
schema. Azure Machine Learning event schema details can be found in Azure Event Grid event schema
for Azure Machine Learning. Some sample advanced filters you can apply include:
For the Microsoft.MachineLearningServices.ModelRegistered event, to filter on a model's tag value:
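For example, a sketch of a CLI filter on a hypothetical tag key key1 (the key and value are placeholders):

az eventgrid event-subscription create --name {eventGridFilterName} \
    --source-resource-id /subscriptions/{subId}/resourceGroups/{RG}/providers/Microsoft.MachineLearningServices/workspaces/{wsName} \
    --endpoint {endpointUrl} \
    --included-event-types Microsoft.MachineLearningServices.ModelRegistered \
    --advanced-filter data.ModelTags.key1 StringIn value1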
To learn more about how to apply filters, see Filter events for Event Grid.
3. Select the event type to consume. For example, the following screenshot has selected Model registered ,
Model deployed , Run completed , and Dataset drift detected :
4. Select the endpoint to publish the event to. In the following screenshot, Event hub is the selected
endpoint:
Once you have confirmed your selection, click Create . After configuration, these events will be pushed to your
endpoint.
Set up with the CLI
You can either install the latest Azure CLI, or use the Azure Cloud Shell that is provided as part of your Azure
subscription.
To install the Event Grid extension, use the following command from the CLI:
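For example (assuming the extension is published under the name eventgrid):

az extension add --name eventgrid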
The following example demonstrates how to select an Azure subscription and create a new event subscription
for Azure Machine Learning:
# Subscribe to the machine learning workspace. This example uses EventHub as a destination.
az eventgrid event-subscription create --name {eventGridFilterName} \
--source-resource-id
/subscriptions/{subId}/resourceGroups/{RG}/providers/Microsoft.MachineLearningServices/workspaces/{wsName} \
--endpoint-type eventhub \
--endpoint
/subscriptions/{SubID}/resourceGroups/TestRG/providers/Microsoft.EventHub/namespaces/n1/eventhubs/EH1 \
--included-event-types Microsoft.MachineLearningServices.ModelRegistered \
--subject-begins-with "models/mymodelname"
Examples
Example: Send email alerts
Use Azure Logic Apps to configure emails for all your events. Customize with conditions and specify recipients
to enable collaboration and awareness across teams working together.
1. In the Azure portal, go to your Azure Machine Learning workspace and select the events tab from the left
bar. From here, select Logic apps .
2. Sign into the Logic App UI and select Machine Learning service as the topic type.
3. Select which event(s) to be notified for. For example, the following screenshot shows RunCompleted .
4. Next, add a step to consume this event and search for email. There are several different mail accounts you
can use to receive events. You can also configure conditions on when to send an email alert.
5. Select Send an email and fill in the parameters. In the subject, you can include the Event Type and
Topic to help filter events. You can also include a link to the workspace page for runs in the message
body.
6. To save this action, select Save As at the top left of the page. From the right bar that appears, confirm
creation of this action.
Example: Data drift triggers retraining
Models go stale over time and may no longer remain useful in the context in which they run. One way to tell
if it's time to retrain a model is to detect data drift.
This example shows how to use Event Grid with an Azure Logic App to trigger retraining. The example triggers
an Azure Data Factory pipeline when data drift occurs between a model's training and serving datasets.
Before you begin, perform the following actions:
Set up a dataset monitor to detect data drift in a workspace
Create a published Azure Data Factory pipeline.
In this example, a simple Data Factory pipeline is used to copy files into a blob store and run a published
Machine Learning pipeline. For more information on this scenario, see how to set up a Machine Learning step in
Azure Data Factory
1. Start with creating the logic app. Go to the Azure portal, search for Logic Apps, and select create.
2. Fill in the requested information. To simplify the experience, use the same subscription and resource
group as your Azure Data Factory Pipeline and Azure Machine Learning workspace.
3. Once you have created the logic app, select When an Event Grid resource event occurs .
4. Log in and fill in the details for the event. Set the Resource Name to the workspace name. Set the Event
Type to DatasetDriftDetected .
5. Add a new step, and search for Azure Data Factory . Select Create a pipeline run .
6. Log in and specify the published Azure Data Factory pipeline to run.
7. Save and create the logic app using the save button on the top left of the page. To view your app, go to
your workspace in the Azure portal and click on Events .
Now the data factory pipeline is triggered when drift occurs. View details on your data drift run and machine
learning pipeline on the new workspace portal.
Next steps
Learn more about Event Grid and give Azure Machine Learning events a try:
About Event Grid
Event schema for Azure Machine Learning
Create and run machine learning pipelines using
components with the Azure Machine Learning CLI
5/25/2022 • 11 minutes to read
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid
version of Azure Machine Learning.
You'll need an Azure Machine Learning workspace for your pipelines and associated resources
Install and set up the Azure CLI extension for Machine Learning
Clone the examples repository:
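A sketch of the clone step; the exact subdirectory layout of the examples repository may have changed since
this article was written:

git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli/jobs/pipelines-with-components/basics

The pipeline examples in this article assume a compute cluster named cpu-cluster. You can list the compute
targets in your workspace to confirm one exists: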
az ml compute list
Now, create a pipeline job defined in the pipeline.yml file with the following command. The compute target will
be referenced in the pipeline.yml file as azureml:cpu-cluster . If your compute target uses a different name,
remember to update it in the pipeline.yml file.
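For example, from the example's directory:

az ml job create --file pipeline.yml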
You should receive a JSON dictionary with information about the pipeline job, including:
KEY | DESCRIPTION
status | The status of the job. This will likely be Preparing at this point.
Open the services.Studio.endpoint URL and you'll see a graph visualization of the pipeline like the one below.
Understand the pipeline definition YAML
Let's take a look at the pipeline definition in the 3b_pipeline_with_data/pipeline.yml file.
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: 3b_pipeline_with_data
description: Pipeline with 3 component jobs with data dependencies
compute: azureml:cpu-cluster
outputs:
final_pipeline_output:
mode: rw_mount
jobs:
component_a:
type: command
component: file:./componentA.yml
inputs:
component_a_input:
type: uri_folder
path: ./data
outputs:
component_a_output:
mode: rw_mount
component_b:
type: command
component: file:./componentB.yml
inputs:
component_b_input: ${{parent.jobs.component_a.outputs.component_a_output}}
outputs:
component_b_output:
mode: rw_mount
component_c:
type: command
component: file:./componentC.yml
inputs:
component_c_input: ${{parent.jobs.component_b.outputs.component_b_output}}
outputs:
component_c_output: ${{parent.outputs.final_pipeline_output}}
# mode: upload
The table below describes the most commonly used fields of the pipeline YAML schema. See the full pipeline
YAML schema here.
KEY | DESCRIPTION
name: component_a
display_name: componentA
version: 1
inputs:
component_a_input:
type: uri_folder
outputs:
component_a_output:
type: uri_folder
code: ./componentA_src
environment:
image: python
command: >-
python hello.py --componentA_input ${{inputs.component_a_input}} --componentA_output
${{outputs.component_a_output}}
The most commonly used fields of the component YAML schema are described in the table below. See the full
component YAML schema here.
KEY | DESCRIPTION
For the example in 3b_pipeline_with_data/componentA.yml, componentA has one data input and one data
output, which can be connected to other steps in the parent pipeline. All the files under the code section in the
component YAML are uploaded to AzureML when the pipeline job is submitted. In this example, files under
./componentA_src are uploaded (line 16 in componentA.yml). You can see the uploaded source code in the
Studio UI: double-select the ComponentA step and navigate to the Snapshot tab, as shown in the screenshot
below. It's a hello-world script that does some simple printing and writes the current datetime to the
componentA_output path. The component takes its input and output through command-line arguments, which
are handled in hello.py using argparse .
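A minimal sketch of what hello.py might look like; the argument names follow the command field in
componentA.yml, but the actual script in the repository may differ:

import argparse
from datetime import datetime
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--componentA_input", type=str)
parser.add_argument("--componentA_output", type=str)
args = parser.parse_args()

print(f"componentA_input: {args.componentA_input}")

# write the current datetime to a file under the output folder
output_dir = Path(args.componentA_output)
output_dir.mkdir(parents=True, exist_ok=True)
(output_dir / "output.txt").write_text(datetime.now().isoformat())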
Object input (of type uri_file , uri_folder , mltable , mlflow_model , custom_model ) can connect to other steps
in the parent pipeline job and hence pass data/model to other steps. In pipeline graph, the object type input will
render as a connection dot.
Literal value inputs ( string , number , integer , boolean ) are the parameters you can pass to the component
at run time. You can add a default value for a literal input under the default field. For number and integer
types, you can also add the minimum and maximum accepted values using the min and max fields. If the input
value falls outside the min and max, the pipeline fails at validation. Validation happens before you submit a
pipeline job, to save you time, and works for the CLI, the Python SDK, and the designer UI. The screenshot below
shows a validation example in the designer UI. Similarly, you can define allowed values in the enum field; an
illustrative sketch follows.
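For example, a hypothetical inputs block illustrating these fields (the input names and values are made up
for illustration):

inputs:
  learning_rate:
    type: number
    default: 0.01
    min: 0.0001
    max: 1.0
  optimizer:
    type: string
    default: adam
    enum: [adam, sgd]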
If you want to add an input to a component, remember to edit three places: 1) the inputs field in the
component YAML, 2) the command field in the component YAML, and 3) the component source code, to handle the
command-line input. These are marked in the green box in the screenshot above.
Environment
Environment defines the environment in which the component executes. It could be an AzureML environment
(curated or custom registered), a docker image, or a conda environment; see the examples after this list.
AzureML registered environment asset. It's referenced in the component following the
azureml:<environment-name>:<environment-version> syntax.
Public docker image.
Conda file. A conda file needs to be used together with a base image.
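For illustration, the three forms might look like the following; the environment name and version are
placeholders:

# 1) AzureML registered environment asset
environment: azureml:my-env:1

# 2) public docker image
environment:
  image: python:3.8

# 3) conda file used together with a base image
environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04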
After you register these components with the CLI (for example, az ml component create --file <component-yaml> for each one), you can see the components in Studio, under Asset -> Components :
Select a component. You'll see detailed information for each version of the component.
Under the Details tab, you'll see basic information about the component, such as name, created by, and version.
You'll see editable fields for Tags and Description . Tags can be used to add quickly searched keywords. The
description field supports Markdown formatting and should be used to describe your component's functionality
and basic use.
Under the Jobs tab, you'll see the history of all jobs that use this component.
Manage components
You can check component details and manage components using the CLI (v2). Use az ml component -h to get
detailed instructions on the component commands. The table below lists all available commands; see more
examples in the Azure CLI reference.
COMMANDS | DESCRIPTION
Next steps
Try out CLI v2 component example
Create and run machine learning pipelines using
components with the Azure Machine Learning SDK
v2 (Preview)
5/25/2022 • 15 minutes to read
Prerequisites
Complete the Quickstart: Get started with Azure Machine Learning if you don't already have an Azure
Machine Learning workspace.
A Python environment in which you've installed Azure Machine Learning Python SDK v2 - install
instructions - check the getting started section. This environment is for defining and controlling your
Azure Machine Learning resources and is separate from the environment used at runtime for training.
Clone examples repository
To run the training examples, first clone the examples repository and change into the sdk directory:
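A sketch of the clone step:

git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/sdk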
from azure.ai.ml import Input  # assumes the v2 (azure-ai-ml) SDK namespace

fashion_ds = Input(
path="wasbs://demo@data4mldemo6150520719.blob.core.windows.net/mnist-fashion/"
)
By defining an Input , you create a reference to the data source location. The data remains in its existing
location, so no extra storage cost is incurred.
import os
from pathlib import Path
from mldesigner import command_component, Input, Output

@command_component(
name="prep_data",
version="1",
display_name="Prep Data",
description="Convert data to CSV file, and split to training and test data",
environment=dict(
conda_file=Path(__file__).parent / "conda.yaml",
image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
),
)
def prepare_data_component(
input_data: Input(type="uri_folder"),
training_data: Output(type="uri_folder"),
test_data: Output(type="uri_folder"),
):
convert(
os.path.join(input_data, "train-images-idx3-ubyte"),
os.path.join(input_data, "train-labels-idx1-ubyte"),
os.path.join(training_data, "mnist_train.csv"),
60000,
)
convert(
os.path.join(input_data, "t10k-images-idx3-ubyte"),
os.path.join(input_data, "t10k-labels-idx1-ubyte"),
os.path.join(test_data, "mnist_test.csv"),
10000,
)
The convert helper called in the component reads the raw MNIST-format files and writes them out as CSV; a
minimal runnable version looks like:

def convert(imgf, labelf, outf, n):
    # read the binary MNIST-format image/label files and write them to one CSV
    f = open(imgf, "rb")
    l = open(labelf, "rb")
    o = open(outf, "w")
    f.read(16)
    l.read(8)
    images = []
    for i in range(n):
        image = [ord(l.read(1))]
        for j in range(28 * 28):
            image.append(ord(f.read(1)))
        images.append(image)
    for image in images:
        o.write(",".join(str(pix) for pix in image) + "\n")
    f.close(); l.close(); o.close()
The code above defines a component with display name Prep Data using the @command_component decorator:
name is the unique identifier of the component.
version is the current version of the component. A component can have multiple versions.
display_name is a friendly display name of the component in the UI, and isn't required to be unique.
description usually describes what task this component can complete.
environment specifies the run-time environment for this component. The environment of this component
specifies a docker image and refers to the conda.yaml file.
The prepare_data_component function defines one input, input_data , and two outputs, training_data and
test_data . input_data is the input data path. training_data and test_data are the output data paths for the
training and test data.
This component converts the data from input_data into a training-data CSV written to training_data and a
test-data CSV written to test_data .
The environment argument of @command_component describes the runtime environment in which the
component runs.
The conda.yaml file contains all packages used for the component, like the following:
name: imagekeras_prep_conda_env
channels:
- defaults
dependencies:
- python=3.7.11
- pip=20.0
- pip:
- mldesigner
Now, you've prepared all source files for the Prep Data component.
Create the train-model component
In this section, you'll create a component for training the image classification model as a Python function, like
the Prep Data component.
The difference is that, since the training logic is more complicated, you'll put the original training code in a
separate Python file.
The source files of this component are under train/ folder in the AzureML Examples repo. This folder contains
three files to construct the component:
train.py : contains the actual logic to train model.
train_component.py : defines the interface of the component and imports the function in train.py .
conda.yaml : defines the run-time environment of the component.
import os
from pathlib import Path
from mldesigner import command_component, Input, Output
@command_component(
name="train_image_classification_keras",
version="1",
display_name="Train Image Classification Keras",
description="train image classification with keras",
environment=dict(
conda_file=Path(__file__).parent / "conda.yaml",
image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
),
)
def keras_train_component(
input_data: Input(type="uri_folder"),
output_model: Output(type="uri_folder"),
epochs=10,
):
    # to avoid dependency issues, the execution logic is in the train() function in train.py
    from train import train

    # assumption: train(input_data, output_model, epochs) matches the signature defined in train.py
    train(input_data, output_model, epochs)
The code above defines a component with display name Train Image Classification Keras using
@command_component :
The keras_train_component function defines one input, input_data , where the training data comes from, one
input, epochs , specifying the number of epochs for training, and one output, output_model , where the model
file is written. The default value of epochs is 10. The execution logic of this component comes from the
train() function in train.py above.
Now, you've prepared all source files for the Train Image Classification Keras component.
Create the score-model component
In this section, unlike the previous components, you'll create a component to score the trained model via a
YAML specification and script.
If you're following along with the example in the AzureML Examples repo, the source files are already available
in score/ folder. This folder contains three files to construct the component:
score.py : contains the source code of the component.
score.yaml : defines the interface and other details of the component.
conda.yaml : defines the run-time environment of the component.
import argparse
from pathlib import Path
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import mlflow
def get_file(f):
f = Path(f)
if f.is_file():
return f
else:
files = list(f.iterdir())
if len(files) == 1:
return files[0]
else:
raise Exception("********This path contains more than one file*******")
def parse_args():
# setup argparse
parser = argparse.ArgumentParser()
# add arguments
parser.add_argument(
"--input_data", type=str, help="path containing data for scoring"
)
parser.add_argument(
"--input_model", type=str, default="./", help="input path for model"
)
parser.add_argument(
"--output_result", type=str, default="./", help="output path for model"
)
# parse args
args = parser.parse_args()
# return args
return args
def score(input_data, input_model, output_result):
    test_file = get_file(input_data)
    data_test = pd.read_csv(test_file, header=None)

    # Load model (assumes the Keras model was saved as an .h5 file)
    from keras.models import load_model
    files = [f for f in os.listdir(input_model) if f.endswith(".h5")]
    model = load_model(input_model + "/" + files[0])

    # ... scoring logic that produces y_result is elided here ...

    # Output result
    np.savetxt(output_result + "/predict_result.csv", y_result, delimiter=",")
def main(args):
    score(args.input_data, args.input_model, args.output_result)

# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()
    # run the main function with the parsed arguments
    main(args)
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: score_image_classification_keras
display_name: Score Image Classification Keras
inputs:
input_data:
type: uri_folder
input_model:
type: uri_folder
outputs:
output_result:
type: uri_folder
code: ./
command: python score.py --input_data ${{inputs.input_data}} --input_model ${{inputs.input_model}} --
output_result ${{outputs.output_result}}
environment:
conda_file: ./conda.yaml
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
name is the unique identifier of the component. Its display name is Score Image Classification Keras .
This component has two inputs and one output.
Its source code path is defined in the code section; when the component is run in the cloud, all files from
that path are uploaded as the snapshot of this component.
The command section specifies the command to execute while running this component.
The environment section contains a docker image and a conda YAML file.
Specify component run-time environment
The score component uses the same image and conda.yaml file as the train component. The source file is in the
sample repository.
Now, you've got all source files for score-model component.
For the score component defined by YAML, you can use the load_component() function to load it, as the sketch below shows.
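A sketch, using the same load_component() call style as the pipeline example below:

from azure.ai.ml import load_component

# load the score component from its YAML specification
keras_score_component = load_component(path="./score.yaml")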
# define a pipeline containing 3 nodes: Prepare data node, train node, and score node
# (assumes the pipeline decorator from the v2 SDK's dsl module)
from azure.ai.ml.dsl import pipeline

@pipeline(
default_compute=cpu_compute_target,
)
def image_classification_keras_minist_convnet(pipeline_input_data):
"""E2E image classification pipeline with keras using python sdk."""
prepare_data_node = prepare_data_component(input_data=pipeline_input_data)
train_node = keras_train_component(
input_data=prepare_data_node.outputs.training_data
)
train_node.compute = gpu_compute_target
score_node = keras_score_component(
input_data=prepare_data_node.outputs.test_data,
input_model=train_node.outputs.output_model,
)
# create a pipeline
pipeline_job = image_classification_keras_minist_convnet(pipeline_input_data=fashion_ds)
The pipeline has a default compute cpu_compute_target , which means if you don't specify compute for a specific
node, that node will run on the default compute.
The pipeline has a pipeline level input pipeline_input_data . You can assign value to pipeline input when you
submit a pipeline job.
The pipeline contains three nodes, prepare_data_node, train_node and score_node.
The input_data of prepare_data_node uses the value of pipeline_input_data .
The input_data of train_node is from the training_data output of the prepare_data_node.
The input_data of score_node is from the test_data output of prepare_data_node, and the input_model
is from the output_model of train_node.
Since train_node will train a CNN model, you can specify its compute as the gpu_compute_target, which
can improve the training performance.
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if the given credential can get a token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential doesn't work
    credential = InteractiveBrowserCredential()
IMPORTANT
This code snippet expects the workspace configuration json file to be saved in the current directory or its parent. For more
information on creating a workspace, see Create and manage Azure Machine Learning workspaces. For more information
on saving the configuration to file, see Create a workspace configuration file.
pipeline_job = ml_client.jobs.create_or_update(
pipeline_job, experiment_name="pipeline_samples"
)
pipeline_job
The code above submits this image classification pipeline job to an experiment called pipeline_samples . The
experiment is created automatically if it doesn't exist. The pipeline_input_data uses fashion_ds .
The call to submit the job completes quickly and produces output similar to:
You can monitor the pipeline run by opening the link, or you can block until it completes by running:
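For example, assuming the ml_client from earlier:

# block until the pipeline job finishes, streaming its logs to stdout
ml_client.jobs.stream(pipeline_job.name)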
IMPORTANT
The first pipeline run takes roughly 15 minutes. All dependencies must be downloaded, a Docker image is created, and the
Python environment is provisioned and created. Running the pipeline again takes significantly less time because those
resources are reused instead of created. However, total run time for the pipeline depends on the workload of your scripts
and the processes that are running in each pipeline step.
You can check the logs and outputs of each component by right-clicking the component, or by selecting the
component to open its detail pane. To learn more about how to debug your pipeline in the UI, see How to use
studio UI to build and debug Azure ML pipelines.
Using ml_client.components.get() , you can get a registered component by name and version. Using
ml_client.components.create_or_update() , you can register a component previously loaded from a Python
function or YAML, as sketched below.
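A sketch of both calls, reusing the prepare_data_component defined earlier:

# register a component previously loaded from a Python function or YAML
registered_component = ml_client.components.create_or_update(prepare_data_component)

# retrieve a registered component by name and version
fetched_component = ml_client.components.get(name="prep_data", version="1")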
Next steps
For more examples of how to build pipelines by using the machine learning SDK, see the example repository.
For how to use studio UI to submit and debug your pipeline, refer to how to create pipelines using
component in the UI.
For how to use Azure Machine Learning CLI to create components and pipelines, refer to how to create
pipelines using component with CLI.
Create and run machine learning pipelines using
components with the Azure Machine Learning
studio (Preview)
5/25/2022 • 3 minutes to read
IMPORTANT
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try the free or paid
version of Azure Machine Learning.
You'll need an Azure Machine Learning workspace for your pipelines and associated resources
Install and set up the Azure CLI extension for Machine Learning
Clone the examples repository:
3. After the component is registered successfully, you can see it in the studio UI.
Select the Gear icon at the top right of the canvas to open the Settings pane. Select the default
compute target for your pipeline.
IMPORTANT
Attached compute is not supported, use compute instances or clusters instead.
3. In the asset library, you can see the Data assets and Components tabs. Switch to the Components tab
to see the components registered in the previous section.
Drag the components and drop them on the canvas. By default the default version of a component is used,
and you can change to a specific version in the right pane of the component if it has multiple
versions.
4. Connect the upstream component output ports to the downstream component input ports.
5. Select a component, and you'll see a right pane where you can configure the component.
For components with primitive-type inputs like number, integer, string, and boolean, you can change the
values of such inputs in the component detail pane.
You can also change the output settings and the compute target where this component runs in the right pane.
NOTE
Currently registered components and the designer built-in components cannot be used together.
Submit pipeline
1. Select Submit , and fill in the required information for your pipeline job.
2. After the submission succeeds, you'll see a job detail page link in the left pane. Select Job detail to go to
the pipeline job detail page to check status and debug.
NOTE
The Submitted jobs list only contains pipeline jobs submitted during an active session. A page reload will clear
out the content.
Next steps
Use these Jupyter notebooks on GitHub to explore machine learning pipelines further
Learn how to use CLI v2 to create pipeline using components.
Learn how to use SDK v2 (preview) to create pipeline using components
How to do hyperparameter tuning in pipeline (V2)
(preview)
5/25/2022 • 6 minutes to read
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
In this article, you'll learn how to do hyperparameter tuning in an Azure Machine Learning pipeline.
Prerequisites
1. Understand what hyperparameter tuning is and how to do hyperparameter tuning in Azure Machine
Learning using SweepJob.
2. Understand what an Azure Machine Learning pipeline is.
3. Build a command component that takes hyperparameters as input.
predict_step:
type: command
inputs:
model: ${{parent.jobs.sweep_step.outputs.model_output}}
test_data: ${{parent.jobs.sweep_step.outputs.test_data}}
outputs:
predict_result:
component: file:./predict.yml
The sweep_step is the step for hyperparameter tuning. Its type needs to be sweep , and trial refers to the
command component defined in train.yaml . From the search space field we can see three hyperparameters (
c_value , kernel , and coef0 ) are added to the search space. After you submit this pipeline job, Azure Machine
Learning will run the trial component multiple times to sweep over hyperparameters, based on the search space
and termination policy you defined in sweep_step . See the sweep job YAML schema for the full schema of a sweep job.
Below is the trial component definition (train.yml file).
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: train_model
display_name: train_model
version: 1
inputs:
data:
type: uri_folder
c_value:
type: number
default: 1.0
kernel:
type: string
default: rbf
degree:
type: integer
default: 3
gamma:
type: string
default: scale
coef0:
type: number
default: 0
shrinking:
type: boolean
default: false
probability:
type: boolean
default: false
tol:
type: number
default: 1e-3
cache_size:
type: number
default: 1024
verbose:
type: boolean
default: false
max_iter:
type: integer
default: -1
decision_function_shape:
type: string
default: ovr
break_ties:
type: boolean
default: false
random_state:
type: integer
default: 42
outputs:
model_output:
type: mlflow_model
test_data:
type: uri_folder
code: ./train-src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
python train.py
--data ${{inputs.data}}
--C ${{inputs.c_value}}
--kernel ${{inputs.kernel}}
--degree ${{inputs.degree}}
--gamma ${{inputs.gamma}}
--coef0 ${{inputs.coef0}}
--shrinking ${{inputs.shrinking}}
--probability ${{inputs.probability}}
--tol ${{inputs.tol}}
--cache_size ${{inputs.cache_size}}
--verbose ${{inputs.verbose}}
--max_iter ${{inputs.max_iter}}
--decision_function_shape ${{inputs.decision_function_shape}}
--break_ties ${{inputs.break_ties}}
--random_state ${{inputs.random_state}}
--model_output ${{outputs.model_output}}
--test_data ${{outputs.test_data}}
The hyperparameters added to the search space in pipeline.yml need to be inputs of the trial component. The
source code of the trial component is under the ./train-src folder. In this example, it's a single train.py file. This
is the code that is executed in every trial of the sweep job. Make sure you've logged the metrics in the trial
component source code with exactly the same name as the primary_metric value in the pipeline.yml file. In this
example, we use mlflow.autolog() , which is the recommended way to track your ML experiments. See more
about MLflow here.
The code snippet below is the source code of the trial component.
# imports
import os
import mlflow
import argparse
import pandas as pd
from pathlib import Path
# define functions
def main(args):
# enable auto logging
mlflow.autolog()
# setup parameters
params = {
"C": args.C,
"kernel": args.kernel,
"degree": args.degree,
"gamma": args.gamma,
"coef0": args.coef0,
"shrinking": args.shrinking,
"probability": args.probability,
"tol": args.tol,
"cache_size": args.cache_size,
"class_weight": args.class_weight,
"verbose": args.verbose,
"max_iter": args.max_iter,
"decision_function_shape": args.decision_function_shape,
"break_ties": args.break_ties,
"random_state": args.random_state,
}
# read in data
df = pd.read_csv(args.data)
# process data
X_train, X_test, y_train, y_test = process_data(df, args.random_state)
# train model
model = train_model(params, X_train, X_test, y_train, y_test)
# Output the model and test data
# write to a local folder first, then copy to the output folder
mlflow.sklearn.save_model(model, "model")
# copy_tree comes from distutils.dir_util; assumes the destination is the model_output argument
from distutils.dir_util import copy_tree
copy_tree("model", args.model_output)
# train/test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=random_state
)
# return model
return model
def parse_args():
# setup arg parser
parser = argparse.ArgumentParser()
# add arguments
parser.add_argument("--data", type=str)
parser.add_argument("--C", type=float, default=1.0)
parser.add_argument("--kernel", type=str, default="rbf")
parser.add_argument("--degree", type=int, default=3)
parser.add_argument("--gamma", type=str, default="scale")
parser.add_argument("--coef0", type=float, default=0)
parser.add_argument("--shrinking", type=bool, default=False)
parser.add_argument("--probability", type=bool, default=False)
parser.add_argument("--tol", type=float, default=1e-3)
parser.add_argument("--cache_size", type=float, default=1024)
parser.add_argument("--class_weight", type=dict, default=None)
parser.add_argument("--verbose", type=bool, default=False)
parser.add_argument("--max_iter", type=int, default=-1)
parser.add_argument("--decision_function_shape", type=str, default="ovr")
parser.add_argument("--break_ties", type=bool, default=False)
parser.add_argument("--random_state", type=int, default=42)
parser.add_argument("--model_output", type=str, help="Path of output model")
parser.add_argument("--test_data", type=str, help="Path of output model")
# parse args
args = parser.parse_args()
# return args
return args
# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()
    # run the main function with the parsed arguments
    main(args)
Python SDK
The Python SDK example can be found in the azureml-examples repo. Navigate to azureml-
examples/sdk/jobs/pipelines/1c_pipeline_with_hyperparameter_sweep to check the example.
In Azure Machine Learning Python SDK v2, you can enable hyperparameter tuning for any command
component by calling the .sweep() method.
The code snippet below shows how to enable sweep for train_model .
train_component_func = load_component(path="./train.yml")
score_component_func = load_component(path="./predict.yml")
# define a pipeline
@pipeline(default_compute="cpu-cluster")
def pipeline_with_hyperparameter_sweep():
"""Tune hyperparameters using sample components."""
train_model = train_component_func(
data=Input(
type="uri_file",
path="wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv",
),
c_value=Uniform(min_value=0.5, max_value=0.9),
kernel=Choice(["rbf", "linear", "poly"]),
coef0=Uniform(min_value=0.1, max_value=1),
degree=3,
gamma="scale",
shrinking=False,
probability=False,
tol=0.001,
cache_size=1024,
verbose=False,
max_iter=-1,
decision_function_shape="ovr",
break_ties=False,
random_state=42,
)
sweep_step = train_model.sweep(
primary_metric="training_f1_score",
goal="minimize",
sampling_algorithm="random",
compute="cpu-cluster",
)
sweep_step.set_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)
score_data = score_component_func(
model=sweep_step.outputs.model_output, test_data=sweep_step.outputs.test_data
)
pipeline_job = pipeline_with_hyperparameter_sweep()
We first load train_component_func defined in the train.yml file. When creating train_model , we add c_value ,
kernel , and coef0 to the search space. The sweep_step definition specifies the primary metric, sampling
algorithm, and so on.
Opening the submitted job takes you to the sweep job page, as seen in the screenshot below. Navigate to the
child run tab; there you can see the metrics of all child runs and the list of all child runs.
If a child run failed, select the name of that child run to open the detail page of that specific child run (see the
screenshot below). The useful debugging information is under Outputs + Logs .
Sample notebooks
Build pipeline with sweep node
Run hyperparameter sweep on a command job
Next steps
Track an experiment
Deploy a trained model
How to use studio UI to build and debug Azure
Machine Learning pipelines
5/25/2022 • 3 minutes to read
Azure Machine Learning studio provides a UI to build and debug your pipeline. You can use components to author
a pipeline in the designer, and you can debug your pipeline on the job detail page.
This article introduces how to use the studio UI to build and debug machine learning pipelines.
Then you can drag and drop either built-in components or custom components onto the canvas. You can construct
your pipeline or configure your components in any order: hide the right pane to construct your pipeline
first, then open the right pane to configure each component.
NOTE
Currently built-in components and custom components cannot be used together.
Submit pipeline
Now you've built your pipeline. Select the Submit button above the canvas, and configure your pipeline job.
After you submit your pipeline job, you'll see a submitted job list in the left pane, which shows all the pipeline
jobs you've created from the current pipeline draft in the same session. There's also a notification popping up
from the notification center. You can select the pipeline job link in the submission list or in the notification to
check pipeline job status and to debug.
NOTE
Pipeline job status and results will not be filled back to the authoring page.
If you want to try a few different parameter values for the same pipeline, you can change values and submit
multiple times, without having to wait for each run to finish.
NOTE
The submission list only contains jobs submitted in the same session. If you refresh the current page, the
previously submitted job list is not preserved.
On the pipeline job detail page, you can check the status of the overall job and each node inside, and logs of
each node.
If you don't see those folders, the compute runtime update hasn't been released to the compute cluster yet;
look at 70_driver_log.txt under the azureml-logs folder first.
You can edit your pipeline and then submit again. After submitting, you can see the lineage between the job you
submit and the original job by selecting Show lineage in the job detail page.
Next steps
In this article, you learned the key features in how to create, explore, and debug a pipeline in UI. To learn more
about how you can use the pipeline, see the following articles:
How to train a model in the designer
How to deploy model to real-time endpoint in the designer
What is machine learning component
Enable logging in Azure Machine Learning designer
pipelines
5/25/2022 • 2 minutes to read
In this article, you learn how to add logging code to designer pipelines. You also learn how to view those logs
using the Azure Machine Learning studio web portal.
For more information on logging metrics using the SDK authoring experience, see Monitor Azure ML
experiment runs and metrics.
2. Paste the following code into the Execute Python Script code editor to log the mean absolute error for
your trained model. You can use a similar pattern to log any other value in the designer:
APPLIES TO: Python SDK azureml v1
# dataframe1 contains the values from Evaluate Model
def azureml_main(dataframe1=None, dataframe2=None):
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    from azureml.core import Run
    run = Run.get_context()

    # Log the mean absolute error to the parent run to see the metric in the run details page.
    # Note: 'run.parent.log()' should not be called multiple times because of performance issues.
    # If repeated calls are necessary, cache 'run.parent' as a local variable and call 'log()' on that variable.
    parent_run = Run.get_context().parent

    # Log left output port result of Evaluate Model. This also works when evaluating only one model.
    parent_run.log(name='Mean_Absolute_Error (left port)', value=dataframe1['Mean_Absolute_Error'][0])

    # Log right output port result of Evaluate Model. Delete the following line if you only
    # connect one Score component to the left port of the Evaluate Model component.
    parent_run.log(name='Mean_Absolute_Error (right port)', value=dataframe1['Mean_Absolute_Error'][1])

    return dataframe1,
This code uses the Azure Machine Learning Python SDK to log values. It uses Run.get_context() to get the context
of the current run. It then logs values to that context with the run.parent.log() method. It uses parent to log
values to the parent pipeline run rather than the component run.
For more information on how to use the Python SDK to log values, see Enable logging in Azure ML training
runs.
View logs
After the pipeline run completes, you can see the Mean_Absolute_Error in the Experiments page.
1. Navigate to the Experiments section.
2. Select your experiment.
3. Select the run in your experiment you want to view.
4. Select Metrics .
Next steps
In this article, you learned how to use logs in the designer. For next steps, see these related articles:
Learn how to troubleshoot designer pipelines, see Debug & troubleshoot ML pipelines.
Learn how to use the Python SDK to log metrics in the SDK authoring experience, see Enable logging in
Azure ML training runs.
Learn how to use Execute Python Script in the designer.
Transform data in Azure Machine Learning designer
5/25/2022 • 6 minutes to read
In this article, you learn how to transform and save datasets in Azure Machine Learning designer so that you can
prepare your own data for machine learning.
You will use the sample Adult Census Income Binary Classification dataset to prepare two datasets: one dataset
that includes adult census information from only the United States and another dataset that includes census
information from non-US adults.
In this article, you learn how to:
1. Transform a dataset to prepare it for training.
2. Export the resulting datasets to a datastore.
3. View results.
This how-to is a prerequisite for the how to retrain designer models article. In that article, you will learn how to
use the transformed datasets to train multiple models with pipeline parameters.
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
Transform a dataset
In this section, you learn how to import the sample dataset and split the data into US and non-US datasets. For
more information on how to import your own data into the designer, see how to import data.
Import data
Use the following steps to import the sample dataset.
1. Sign in to ml.azure.com, and select the workspace you want to work with.
2. Go to the designer. Select Easy-to-use prebuilt components to create a new pipeline.
3. Select a default compute target to run the pipeline.
4. To the left of the pipeline canvas is a palette of datasets and components. Select Datasets . Then view the
Samples section.
5. Drag and drop the Adult Census Income Binary classification dataset onto the canvas.
6. Right-click the Adult Census Income dataset component, and select Visualize > Dataset output
7. Use the data preview window to explore the dataset. Take special note of the "native-country" column
values.
Split the data
In this section, you use the Split Data component to identify and split rows that contain "United-States" in the
"native-country" column.
1. In the component palette to the left of the canvas, expand the Data Transformation section and find the
Split Data component.
2. Drag the Split Data component onto the canvas, and drop the component below the dataset component.
3. Connect the dataset component to the Split Data component.
4. Select the Split Data component.
5. In the component details pane to the right of the canvas, set Splitting mode to Regular Expression .
6. Enter the Regular Expression : \"native-country" United-States .
The Regular expression mode tests a single column for a value. For more information on the Split Data
component, see the related algorithm component reference page.
Your pipeline should look like this:
NOTE
This article assumes that you have access to a datastore registered to the current Azure Machine Learning
workspace. For instructions on how to setup a datastore, see Connect to Azure storage services.
If you don't have a datastore, you can create one now. For example purposes, this article will save the
datasets to the default blob storage account associated with the workspace. It will save the datasets into
the azureml container in a new folder called data .
6. Select the Export Data component connected to the right-most port of the Split Data component.
7. In the component details pane to the right of the canvas, set the following options:
Datastore type : Azure Blob Storage
Datastore : Select the same datastore as above
Path : /data/non-us-income
9. Confirm the Export Data component connected to the right port has the Path /data/non-us-income .
Your pipeline and settings should look like this:
Submit the run
Now that your pipeline is set up to split and export the data, submit a pipeline run.
1. At the top of the canvas, select Submit .
2. In the Set up pipeline run dialog, select Create new to create an experiment.
Experiments logically group together related pipeline runs. If you run this pipeline in the future, you
should use the same experiment for logging and tracking purposes.
3. Provide a descriptive experiment name like "split-census-data".
4. Select Submit .
View results
After the pipeline finishes running, you can view your results by navigating to your blob storage in the Azure
portal. You can also view the intermediary results of the Split Data component to confirm that your data has
been split correctly.
1. Select the Split Data component.
2. In the component details pane to the right of the canvas, select Outputs + logs .
Clean up resources
Skip this section if you want to continue on with part 2 of this how to, Retrain models with Azure Machine
Learning designer.
IMPORTANT
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to
articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any
charges.
1. In the Azure portal, select Resource groups on the left side of the window.
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually
delete those assets.
Next steps
In this article, you learned how to transform a dataset and save it to a registered datastore.
Continue to the next part of this how-to series with Retrain models with Azure Machine Learning designer to
use your transformed datasets and pipeline parameters to train machine learning models.
Use pipeline parameters in the designer to build
versatile pipelines
5/25/2022 • 5 minutes to read
Use pipeline parameters to build flexible pipelines in the designer. Pipeline parameters let you dynamically set
values at runtime to encapsulate pipeline logic and reuse assets.
Pipeline parameters are especially useful when resubmitting a pipeline run, retraining models, or performing
batch predictions.
In this article, you learn how to do the following:
Create pipeline parameters
Delete and manage pipeline parameters
Trigger pipeline runs while adjusting pipeline parameters
Prerequisites
An Azure Machine Learning workspace. See Create an Azure Machine Learning workspace.
For a guided introduction to the designer, complete the designer tutorial.
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
NOTE
Pipeline parameters only support basic data types like int , float , and string .
After you create a pipeline parameter, you must attach it to the component parameter that you want to
dynamically set.
Option 2: Promote a component parameter
The simplest way to create a pipeline parameter for a component value is to promote a component parameter.
Use the following steps to promote a component parameter to a pipeline parameter:
1. Select the component you want to attach a pipeline parameter to.
2. In the component detail pane, mouseover the parameter you want to specify.
3. Select the ellipses (...) that appear.
4. Select Add to pipeline parameter .
5. Enter a parameter name and default value.
6. Select Save
You can now specify new values for this parameter anytime you submit this pipeline.
Option 3: Promote a dataset to a pipeline parameter
If you want to submit your pipeline with variable datasets, you must promote your dataset to a pipeline
parameter:
1. Select the dataset you want to turn into a pipeline parameter.
2. In the detail panel of dataset, check Set as pipeline parameter .
You can now specify a different dataset by using the pipeline parameter the next time you run the pipeline.
NOTE
Deleting a pipeline parameter causes all attached component parameters to be detached; the detached
component parameters keep the current pipeline parameter value.
Next steps
In this article, you learned how to create pipeline parameters in the designer. Next, see how you can use pipeline
parameters to retrain models or perform batch predictions.
You can also learn how to use pipelines programmatically with the SDK.
Use pipeline parameters to retrain models in the
designer
5/25/2022 • 4 minutes to read
In this how-to article, you learn how to use Azure Machine Learning designer to retrain a machine learning
model using pipeline parameters. You will use published pipelines to automate your workflow and set
parameters to train your model on new data. Pipeline parameters let you re-use existing pipelines for different
jobs.
In this article, you learn how to:
Train a machine learning model.
Create a pipeline parameter.
Publish your training pipeline.
Retrain your model with new parameters.
Prerequisites
An Azure Machine Learning workspace
Complete part 1 of this how-to series, Transform data in the designer
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
This article also assumes that you have some knowledge of building pipelines in the designer. For a guided
introduction, complete the tutorial.
Sample pipeline
The pipeline used in this article is an altered version of the Income prediction sample pipeline on the designer
homepage. The pipeline uses the Import Data component instead of the sample dataset to show you how to
train models using your own data.
Create a pipeline parameter
Pipeline parameters are used to build versatile pipelines that can be resubmitted later with varying parameter
values. Some common scenarios are updating datasets or updating hyperparameters for retraining. Create pipeline
parameters to dynamically set variables at runtime.
Pipeline parameters can be added to data source or component parameters in a pipeline. When the pipeline is
resubmitted, the values of these parameters can be specified.
For this example, you will change the training data path from a fixed value to a parameter, so that you can
retrain your model on different data. You can also add other component parameters as pipeline parameters
according to your use case.
1. Select the Import Data component.
NOTE
This example uses the Import Data component to access data in a registered datastore. However, you can follow
similar steps if you use alternative data access patterns.
2. In the component detail pane, to the right of the canvas, select your data source.
3. Enter the path to your data. You can also select Browse path to browse your file tree.
4. Mouseover the Path field, and select the ellipses above the Path field that appear.
5. Select Add to pipeline parameter .
6. Provide a parameter name and a default value.
7. Select Save .
NOTE
You can also detach a component parameter from pipeline parameter in the component detail pane, similar to
adding pipeline parameters.
You can inspect and edit your pipeline parameters by selecting the Settings gear icon next to the title of your
pipeline draft.
After detaching, you can delete the pipeline parameter in the Settings pane.
You can also add a pipeline parameter in the Settings pane, and then apply it on some component parameter.
NOTE
You can publish multiple pipelines to a single endpoint. Each pipeline in a given endpoint is given a version
number, which you can specify when you call the pipeline endpoint.
3. Select Publish .
Retrain your model
Now that you have a published training pipeline, you can use it to retrain your model on new data. You can
submit runs from a pipeline endpoint from the studio workspace or programmatically.
Submit runs by using the studio portal
Use the following steps to submit a parameterized pipeline endpoint run from the studio portal:
1. Go to the Endpoints page in your studio workspace.
2. Select the Pipeline endpoints tab. Then, select your pipeline endpoint.
3. Select the Published pipelines tab. Then, select the pipeline version that you want to run.
4. Select Submit .
5. In the setup dialog box, you can specify the parameter values for the run. For this example, update the data
path to train your model using a non-US dataset.
Next steps
In this article, you learned how to create a parameterized training pipeline endpoint using the designer.
For a complete walkthrough of how you can deploy a model to make predictions, see the designer tutorial to
train and deploy a regression model.
For how to publish and submit a run to pipeline endpoint using SDK, see this article.
Run batch predictions using Azure Machine
Learning designer
5/25/2022 • 5 minutes to read
In this article, you learn how to use the designer to create a batch prediction pipeline. Batch prediction lets you
continuously score large datasets on-demand using a web service that can be triggered from any HTTP library.
In this how-to, you learn to do the following tasks:
Create and publish a batch inference pipeline
Consume a pipeline endpoint
Manage endpoint versions
To learn how to set up batch scoring services using the SDK, see the accompanying tutorial on pipeline batch
scoring.
NOTE
Azure Machine Learning Endpoints (preview) provide an improved, simpler deployment experience. Endpoints support
both real-time and batch inference scenarios. Endpoints provide a unified interface to invoke and manage model
deployments across compute types. See What are Azure Machine Learning endpoints (preview)?.
Prerequisites
This how-to assumes you already have a training pipeline. For a guided introduction to the designer, complete
part one of the designer tutorial.
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
NOTE
Currently, auto-generating an inference pipeline only works for training pipelines built purely from the designer's
built-in components.
It will create a batch inference pipeline draft for you. The batch inference pipeline draft uses the trained
model as MD- node and transformation as TD- node from the training pipeline job.
You can also modify this inference pipeline draft to better handle your input data for batch inference.
Add a pipeline parameter
To create predictions on new data, you can either manually connect a different dataset in this pipeline draft view
or create a parameter for your dataset. Parameters let you change the behavior of the batch inferencing process
at runtime.
In this section, you create a dataset parameter to specify a different dataset to make predictions on.
1. Select the dataset component.
2. A pane will appear to the right of the canvas. At the bottom of the pane, select Set as pipeline
parameter .
Enter a name for the parameter, or accept the default value.
3. Submit the batch inference pipeline and go to job detail page by selecting the job link in the left pane.
Publish your batch inference pipeline
Now you're ready to deploy the inference pipeline. This will deploy the pipeline and make it available for others
to use.
1. Select the Publish button.
2. In the dialog that appears, expand the drop-down for PipelineEndpoint , and select New
PipelineEndpoint .
3. Provide an endpoint name and optional description.
Near the bottom of the dialog, you can see the parameter you configured with a default value of the
dataset ID used during training.
4. Select Publish .
Consume an endpoint
Now, you have a published pipeline with a dataset parameter. The pipeline will use the trained model created in
the training pipeline to score the dataset you provide as a parameter.
Submit a pipeline run
In this section, you'll set up a manual pipeline run and alter the pipeline parameter to score new data. An SDK-based sketch of the same submission follows the steps below.
1. After the deployment is complete, go to the Endpoints section.
2. Select Pipeline endpoints .
3. Select the name of the endpoint you created.
1. Select Published pipelines .
This screen shows all pipelines published under this endpoint.
2. Select the pipeline you published.
The pipeline details page shows you a detailed run history and connection string information for your
pipeline.
3. Select Submit to create a manual run of the pipeline.
4. Change the parameter to use a different dataset.
5. Select Submit to run the pipeline.
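If you'd rather submit the run from code, the SDK v1 exposes published endpoints through the PipelineEndpoint class. A minimal sketch, assuming an endpoint named "mybatchendpoint" and a pipeline parameter named "batch_input" (the exact submit signature may vary by SDK version):

from azureml.core import Workspace
from azureml.pipeline.core import PipelineEndpoint

ws = Workspace.from_config()

endpoint = PipelineEndpoint.get(workspace=ws, name="mybatchendpoint")  # placeholder name
run = endpoint.submit(
    experiment_name="batch-scoring",
    pipeline_parameters={"batch_input": "new-dataset"},  # placeholder parameter
)
run.wait_for_completion(show_output=True)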
Use the REST endpoint
You can find information on how to consume pipeline endpoints and published pipelines in the Endpoints
section.
You can find the REST endpoint of a pipeline endpoint in the run overview panel. By calling the endpoint, you're
consuming its default published pipeline.
You can also consume a published pipeline in the Published pipelines page. Select a published pipeline and
you can find its REST endpoint in the Published pipeline overview panel to the right of the graph.
To make a REST call, you'll need an OAuth 2.0 bearer-type authentication header. See the following tutorial
section for more detail on setting up authentication to your workspace and making a parameterized REST call.
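As a rough sketch of such a call, the SDK v1 interactive authentication can supply the bearer header; the endpoint URL, experiment name, and parameter assignments below are placeholders:

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Acquire an OAuth 2.0 bearer token header for the workspace.
auth_header = InteractiveLoginAuthentication().get_authentication_header()

# Placeholder: copy the real REST endpoint from the run overview panel.
rest_endpoint = "https://<region>.api.azureml.ms/pipelines/v1.0/<...>"

response = requests.post(
    rest_endpoint,
    headers=auth_header,
    json={
        "ExperimentName": "batch-scoring",
        "ParameterAssignments": {"batch_input": "new-dataset"},
    },
)
response.raise_for_status()
print(response.json())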
Versioning endpoints
The designer assigns a version to each subsequent pipeline that you publish to an endpoint. You can specify the
pipeline version that you want to execute as a parameter in your REST call. If you don't specify a version number,
the designer will use the default pipeline.
When you publish a pipeline, you can choose to make it the new default pipeline for that endpoint.
You can also set a new default pipeline in the Published pipelines tab of your endpoint.
3. Find the previous batch inference pipeline draft, or Clone the published pipeline into a new draft.
4. Replace the MD- node in the inference pipeline draft with the model registered in the preceding step.
5. Update the data transformation node TD- in the same way as the trained model.
6. Submit the inference pipeline with the updated model and transformation, and publish again.
Next steps
Follow the designer tutorial to train and deploy a regression model.
To learn how to publish and run a published pipeline using the SDK, see the How to deploy pipelines article.
Run Python code in Azure Machine Learning
designer
5/25/2022 • 2 minutes to read • Edit Online
In this article, you learn how to use the Execute Python Script component to add custom logic to Azure Machine
Learning designer. In the following how-to, you use the Pandas library to do simple feature engineering.
You can use the built-in code editor to quickly add simple Python logic. If you want to add more complex code or
upload additional Python libraries, you should use the zip file method.
The default execution environment uses the Anaconda distribution of Python. For a complete list of pre-
installed packages, see the Execute Python Script component reference page.
IMPORTANT
If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not
have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that
you have been granted the correct level of access. For more information, see Manage users and roles.
3. Take note of which input port you use. The designer assigns the left input port to the variable dataset1
and the middle input port to dataset2 .
Input components are optional since you can generate or import data directly in the Execute Python Script
component.
Write your Python code
The designer provides an initial entry point script for you to edit and enter your own Python code.
In this example, you use Pandas to combine two columns found in the automobile dataset, Price and
Horsepower , to create a new column, Dollars per horsepower . This column represents how much you pay
for each horsepower, which could be a useful feature for deciding whether a car is good value for the money.
1. Select the Execute Python Script component.
2. In the pane that appears to the right of the canvas, select the Python script text box.
3. Copy and paste the following code into the text box.
import pandas as pd
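A fuller version of the entry-point script, assuming the dataset's columns are named Price and Horsepower as described, might look like the following sketch:

import pandas as pd

# The designer passes the left input port as dataframe1 and the middle port as dataframe2.
def azureml_main(dataframe1=None, dataframe2=None):
    # Combine Price and Horsepower into the new feature column.
    dataframe1["Dollars per horsepower"] = dataframe1["Price"] / dataframe1["Horsepower"]
    return dataframe1,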
Next steps
Learn how to import your own data in Azure Machine Learning designer.
Manage and increase quotas for resources with
Azure Machine Learning
5/25/2022 • 9 minutes to read • Edit Online
Azure uses limits and quotas to prevent budget overruns due to fraud, and to honor Azure capacity constraints.
Consider these limits as you scale for production workloads. In this article, you learn about:
Default limits on Azure resources related to Azure Machine Learning.
Creating workspace-level quotas.
Viewing your quotas and limits.
Requesting quota increases.
Along with managing quotas, you can learn how to plan and manage costs for Azure Machine Learning or learn
about the service limits in Azure Machine Learning.
Special considerations
A quota is a credit limit, not a capacity guarantee. If you have large-scale capacity needs, contact Azure
support to increase your quota.
A quota is shared across all the services in your subscriptions, including Azure Machine Learning.
Calculate usage across all services when you're evaluating capacity.
Azure Machine Learning compute is an exception. It has a separate quota from the core compute quota.
Default limits vary by offer category type, such as free trial, pay-as-you-go, and virtual machine (VM)
series (such as Dv2, F, and G).
IMPORTANT
Limits are subject to change. For the latest information, see Service limits in Azure Machine Learning.
RESOURCE | MAXIMUM LIMIT
Datasets | 10 million
Runs | 10 million
Models | 10 million
Artifacts | 10 million
In addition, the maximum run time is 30 days and the maximum number of metrics logged per run is 1
million.
Azure Machine Learning Compute
Azure Machine Learning Compute has a default quota limit on both the number of cores (split by each VM
Family and cumulative total cores) as well as the number of unique compute resources allowed per region in a
subscription. This quota is separate from the VM core quota listed in the previous section as it applies only to
the managed compute resources of Azure Machine Learning.
Request a quota increase to raise the limits for various VM family core quotas, total subscription core quotas,
cluster quota and resources in this section.
Available resources:
Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer
type. You can increase the number of dedicated cores per subscription for each VM family. Specialized VM
families like NCv2, NCv3, or ND series start with a default of zero cores.
Low-priority cores per region have a default limit of 100 to 3,000, depending on your subscription
offer type. The number of low-priority cores per subscription can be increased and is a single value
across VM families.
Clusters per region have a default limit of 200. These are shared between training clusters, compute
instances and MIR endpoint deployments. (A compute instance is considered a single-node cluster for
quota purposes.) Cluster quota can be increased up to a value of 500 per region within a given
subscription.
TIP
To learn more about which VM family to request a quota increase for, check out virtual machine sizes in Azure. For
instance, GPU VM families start with an "N" in their family name (for example, the NCv3 series).
The following table shows additional limits in the platform. Please reach out to the AzureML product team
through a technical support ticket to request an exception.
RESOURCE OR ACTION | MAXIMUM LIMIT
Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a non-communication-enabled pool (that is, it can't run MPI jobs) | 100 nodes, but configurable up to 65,000 nodes
Nodes in a single Parallel Run Step run on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes, but configurable up to 65,000 nodes if your cluster is set up to scale as described above
Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a communication-enabled pool | 300 nodes, but configurable up to 4,000 nodes
Nodes in a single MPI run on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes, but can be increased to 300 nodes
1 Maximum lifetime is the duration between when a run starts and when it finishes. Completed runs persist
indefinitely. Data for runs not completed within the maximum lifetime is not accessible.
2 Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you
implement checkpoints in your job.
Azure Machine Learning managed online endpoints
Azure Machine Learning managed online endpoints have limits described in the following table.
To determine the current usage for an endpoint, view the metrics. To request an exception from the Azure
Machine Learning product team, please open a technical support ticket.
RESOURCE | LIMIT
1 Single dashes, like my-endpoint-name , are accepted in endpoint and deployment names.
2 We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in
a deployment, you must have a quota for 12. Otherwise, you will receive an error.
3 If you request a limit increase, be sure to calculate related limit increases you might need. For
example, if you
request a limit increase for requests per second, you might also want to compute the required connections and
bandwidth limits and include these limit increases in the same request.
Azure Machine Learning pipelines
Azure Machine Learning pipelines have the following limits.
RESOURCE | LIMIT
Virtual machines
Each Azure subscription has a limit on the number of virtual machines across all services. Virtual machine cores
have a regional total limit and a regional limit per size series. Both limits are separately enforced.
For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a
D series core limit of 30. This subscription would be allowed to deploy 30 A1 VMs, or 30 D1 VMs, or a
combination of the two that does not exceed a total of 30 cores.
You can't raise limits for virtual machines above the values shown in the following table.
RESOURCE | LIMIT
1 You can apply up to 50 tags directly to a subscription. However, the subscription can contain an unlimited
number of tags that are applied to resource groups and resources within the subscription. The number of tags
per resource or resource group is limited to 50.
2 Resource Manager returns a list of tag name and values in the subscription only when the number of unique
tags is 80,000 or less. A unique tag is defined by the combination of resource ID, tag name, and tag value. For
example, two resources with the same tag name and value would be calculated as two unique tags. You can still
find a resource by tag when the number exceeds 80,000.
3 Deployments are automatically deleted from the history as you near the limit. For more information, see
Automatic deletions from deployment history.
Container Instances
For more information, see Container Instances limits.
Storage
Azure Storage has a limit of 250 storage accounts per region, per subscription. This limit includes both Standard
and Premium storage accounts.
To increase the limit, make a request through Azure Support. The Azure Storage team will review your case and
can approve up to 250 storage accounts for a region.
Workspace-level quotas
Use workspace-level quotas to manage Azure Machine Learning compute target allocation between multiple
workspaces in the same subscription.
By default, all workspaces share the same quota as the subscription-level quota for VM families. However, you
can set a maximum quota for individual VM families on workspaces in a subscription. This lets you share
capacity and avoid resource contention issues.
1. Go to any workspace in your subscription.
2. In the left pane, select Usages + quotas .
3. Select the Configure quotas tab to view the quotas.
4. Expand a VM family.
5. Set a quota limit on any workspace listed under that VM family.
You can't set a negative value or a value higher than the subscription-level quota.
NOTE
You need subscription-level permissions to set a quota at the workspace level.
NOTE
Free trial subscriptions are not eligible for limit or quota increases. If you have a free trial subscription, you can upgrade to
a pay-as-you-go subscription. For more information, see Upgrade Azure free trial to pay-as-you-go and Azure free
account FAQ.
Next steps
Plan and manage costs for Azure Machine Learning
Service limits in Azure Machine Learning
Troubleshooting managed online endpoints deployment and scoring
Manage and optimize Azure Machine Learning
costs
5/25/2022 • 6 minutes to read • Edit Online
Learn how to manage and optimize costs when training and deploying machine learning models to Azure
Machine Learning.
Use the following tips to help you manage and optimize your compute resource costs.
Configure your training clusters for autoscaling
Set quotas on your subscription and workspaces
Set termination policies on your training run
Use low-priority virtual machines (VM)
Schedule compute instances to shut down and start up automatically
Use an Azure Reserved VM Instance
Train locally
Parallelize training
Set data retention and deletion policies
Deploy resources to the same region
For information on planning and monitoring costs, see the plan to manage costs for Azure Machine Learning
guide.
You can also configure the amount of time the node is idle before scale down. By default, idle time before scale
down is set to 120 seconds.
If you perform less iterative experimentation, reduce this time to save costs.
If you perform highly iterative dev/test experimentation, you might need to increase the time so you aren't
paying for constant scaling up and down after each change to your training script or environment.
AmlCompute clusters can be configured for your changing workload requirements in the Azure portal, with the
AmlCompute SDK class, with the AmlCompute CLI, or with the REST APIs, as sketched below.
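For example, with the SDK v1 the autoscaling bounds and idle time can be set when provisioning the cluster. This is a sketch; the cluster name and VM size are placeholders:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Scale to zero nodes when idle, and shorten the idle window to reduce cost.
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",   # placeholder VM size
    min_nodes=0,
    max_nodes=4,
    idle_seconds_before_scaledown=300,
)
cluster = ComputeTarget.create(ws, "cpu-cluster", config)  # placeholder name
cluster.wait_for_completion(show_output=True)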
Train locally
When prototyping and running training jobs that are small enough to run on your local computer, consider
training locally. Using the Python SDK, setting your compute target to local executes your script locally. For
more information, see Configure and submit training runs.
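A minimal sketch with the SDK v1, assuming a training script named train.py in the current directory:

from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()

# compute_target="local" runs the script on the machine submitting the job.
src = ScriptRunConfig(source_directory=".", script="train.py", compute_target="local")
run = Experiment(ws, "local-prototyping").submit(src)
run.wait_for_completion(show_output=True)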
Visual Studio Code provides a full-featured environment for developing your machine learning applications.
Using the Azure Machine Learning Visual Studio Code extension and Docker, you can run and debug
locally. For more information, see interactive debugging with Visual Studio Code.
Parallelize training
One of the key methods of optimizing cost and performance is to parallelize the workload with the help of a
parallel run step in Azure Machine Learning. This step allows you to use many smaller nodes to execute the task
in parallel, which allows you to scale horizontally. Parallelization has some overhead, and depending on the
workload and the degree of parallelism that can be achieved, it may or may not be a suitable option. For further
information, see the ParallelRunStep documentation.
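A rough sketch of the pattern with the SDK v1 follows; the entry script, environment, compute target, and input/output objects are placeholders assumed to be defined elsewhere:

from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

parallel_run_config = ParallelRunConfig(
    source_directory="scripts",        # placeholder folder
    entry_script="batch_score.py",     # placeholder script
    mini_batch_size="10",
    error_threshold=10,
    output_action="append_row",
    environment=batch_env,             # an azureml.core.Environment defined elsewhere
    compute_target=compute_target,     # an AmlCompute cluster defined elsewhere
    node_count=4,                      # many smaller nodes working in parallel
)

parallel_step = ParallelRunStep(
    name="batch-scoring",
    parallel_run_config=parallel_run_config,
    inputs=[input_dataset],            # a DatasetConsumptionConfig defined elsewhere
    output=output_dir,                 # an OutputFileDatasetConfig defined elsewhere
    allow_reuse=True,
)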
Next steps
Plan to manage costs for Azure Machine Learning
Manage budgets, costs, and quota for Azure Machine Learning at organizational scale
Manage Azure Machine Learning resources with the
VS Code Extension (preview)
5/25/2022 • 11 minutes to read • Edit Online
Learn how to manage Azure Machine Learning resources with the VS Code extension.
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of Azure Machine Learning.
Visual Studio Code. If you don't have it, install it.
Azure Machine Learning extension. Follow the Azure Machine Learning VS Code extension installation guide
to set up the extension.
Create resources
The quickest way to create resources is using the extension's toolbar.
1. Open the Azure Machine Learning view.
2. Select + in the activity bar.
3. Choose your resource from the dropdown list.
4. Configure the specification file. The information required depends on the type of resource you want to create.
5. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, you can create a resource by using the command palette:
1. Open the command palette View > Command Palette
2. Enter > Azure ML: Create <RESOURCE-TYPE> into the text box. Replace RESOURCE-TYPE with the type of resource
you want to create.
3. Configure the specification file.
4. Open the command palette View > Command Palette
5. Enter > Azure ML: Create Resource into the text box.
Version resources
Some resources like environments, datasets, and models allow you to make changes to a resource and store the
different versions.
To version a resource:
1. Use the existing specification file that created the resource or follow the create resources process to create a
new specification file.
2. Increment the version number in the template.
3. Right-click the specification file and select Azure ML: Execute YAML .
As long as the name of the updated resource is the same as the previous version, Azure Machine Learning picks
up the changes and creates a new version.
Workspaces
For more information, see workspaces.
Create a workspace
1. In the Azure Machine Learning view, right-click your subscription node and select Create Workspace .
2. A specification file appears. Configure the specification file.
3. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Workspace command in the command palette.
Remove workspace
1. Expand the subscription node that contains your workspace.
2. Right-click the workspace you want to remove.
3. Select whether you want to remove:
Only the workspace: This option deletes only the workspace Azure resource. The resource group,
storage accounts, and any other resources the workspace was attached to are still in Azure.
With associated resources: This option deletes the workspace and all resources associated with it.
Alternatively, use the > Azure ML: Remove Workspace command in the command palette.
Datastores
The extension currently supports datastores of the following types:
Azure Blob
Azure Data Lake Gen 1
Azure Data Lake Gen 2
Azure File
For more information, see datastores.
Create a datastore
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the datastore under.
3. Right-click the Datastores node and select Create Datastore .
4. Choose the datastore type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Datastore command in the command palette.
Manage a datastore
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datastores node inside your workspace.
4. Right-click the datastore you want to:
Unregister Datastore. Removes datastore from your workspace.
View Datastore. Displays read-only datastore settings.
Alternatively, use the > Azure ML: Unregister Datastore and > Azure ML: View Datastore commands
respectively in the command palette.
Datasets
The extension currently supports the following dataset types:
Tabular: Allows you to materialize data into a DataFrame.
File: A file or collection of files. Allows you to download or mount files to your compute.
For more information, see datasets
Create dataset
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the dataset under.
3. Right-click the Datasets node and select Create Dataset .
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Dataset command in the command palette.
Manage a dataset
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datasets node.
4. Right-click the dataset you want to:
View Dataset Properties . Lets you view metadata associated with a specific dataset. If you have
multiple versions of a dataset, you can choose to only view the dataset properties of a specific version
by expanding the dataset node and performing the same steps described in this section on the version
of interest.
Preview dataset . View your dataset directly in the VS Code Data Viewer. Note that this option is only
available for tabular datasets.
Unregister dataset . Removes a dataset and all versions of it from your workspace.
Alternatively, use the > Azure ML: View Dataset Properties and > Azure ML: Unregister Dataset commands
respectively in the command palette.
Environments
For more information, see environments.
Create environment
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the environment under.
3. Right-click the Environments node and select Create Environment .
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Environment command in the command palette.
View environment configurations
To view the dependencies and configurations for a specific environment in the extension:
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Environments node.
4. Right-click the environment you want to view and select View Environment .
Alternatively, use the > Azure ML: View Environment command in the command palette.
Experiments
For more information, see experiments.
Create job
The quickest way to create a job is by clicking the Create Job icon in the extension's activity bar.
Using the resource nodes in the Azure Machine Learning view:
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Experiments node in your workspace and select Create Job .
4. Choose your job type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Job command in the command palette.
View job
To view your job in Azure Machine Learning studio:
1. Expand the subscription node that contains your workspace.
2. Expand the Experiments node inside your workspace.
3. Right-click the experiment you want to view and select View Experiment in Studio .
4. A prompt appears asking you to open the experiment URL in Azure Machine Learning studio. Select Open .
Alternatively, use the > Azure ML: View Experiment in Studio command in the command palette.
Track run progress
As you're running your job, you may want to see its progress. To track the progress of a run in Azure Machine
Learning studio from the extension:
1. Expand the subscription node that contains your workspace.
2. Expand the Experiments node inside your workspace.
3. Expand the job node you want to track progress for.
4. Right-click the run and select View Run in Studio .
5. A prompt appears asking you to open the run URL in Azure Machine Learning studio. Select Open .
Download run logs & outputs
Once a run is complete, you may want to download the logs and assets such as the model generated as part of a
run.
1. Expand the subscription node that contains your workspace.
2. Expand the Experiments node inside your workspace.
3. Expand the job node you want to download logs and outputs for.
4. Right-click the run:
To download the outputs, select Download outputs .
To download the logs, select Download logs .
Alternatively, use the > Azure ML: Download Outputs and > Azure ML: Download Logs commands respectively in
the command palette.
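Outside the extension, the SDK v1 offers an equivalent. A sketch, with the run ID as a placeholder:

from azureml.core import Run, Workspace

ws = Workspace.from_config()
run = Run.get(ws, run_id="<run-id>")  # placeholder run ID

# Download the files the run produced (logs and outputs) to a local folder.
run.download_files(output_directory="run_artifacts")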
Compute instances
For more information, see compute instances.
Create compute instance
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute node.
4. Right-click the Compute instances node in your workspace and select Create Compute .
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Connect to compute instance
To use a compute instance as a development environment or remote Jupyter server, see Connect to a compute
instance.
Stop or restart compute instance
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute instances node inside your Compute node.
4. Right-click the compute instance you want to stop or restart and select Stop Compute instance or Restart
Compute instance respectively.
Alternatively, use the > Azure ML: Stop Compute instance and > Azure ML: Restart Compute instance commands
respectively in the command palette.
View compute instance configuration
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute instances node inside your Compute node.
4. Right-click the compute instance you want to inspect and select View Compute instance Properties .
Alternatively, use the Azure ML: View Compute instance Properties command in the command palette.
Delete compute instance
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute instances node inside your Compute node.
4. Right-click the compute instance you want to delete and select Delete Compute instance .
Alternatively, use the Azure ML: Delete Compute instance command in the command palette.
Compute clusters
For more information, see training compute targets.
Create compute cluster
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute node.
4. Right-click the Compute clusters node in your workspace and select Create Compute .
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Compute command in the command palette.
View compute configuration
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute clusters node inside your Compute node.
4. Right-click the compute you want to view and select View Compute Properties .
Alternatively, use the > Azure ML: View Compute Properties command in the command palette.
Delete compute cluster
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Compute clusters node inside your Compute node.
4. Right-click the compute you want to delete and select Remove Compute .
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Inference Clusters
For more information, see compute targets for inference.
Manage inference clusters
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Inference clusters node inside your Compute node.
4. Right-click the compute you want to:
View Compute Properties . Displays read-only configuration data about your attached compute.
Detach compute . Detaches the compute from your workspace.
Alternatively, use the > Azure ML: View Compute Properties and > Azure ML: Detach Compute commands
respectively in the command palette.
Delete inference clusters
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Inference clusters node inside your Compute node.
4. Right-click the compute you want to delete and select Remove Compute .
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Attached Compute
For more information, see unmanaged compute.
Manage attached compute
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Attached computes node inside your Compute node.
4. Right-click the compute you want to:
View Compute Properties . Displays read-only configuration data about your attached compute.
Detach compute . Detaches the compute from your workspace.
Alternatively, use the > Azure ML: View Compute Properties and > Azure ML: Detach Compute commands
respectively in the command palette.
Models
For more information, see models
Create model
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Models node in your workspace and select Create Model .
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Model command in the command palette.
View model properties
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model whose properties you want to see and select View Model Properties . A file opens in
the editor containing your model properties.
Alternatively, use the > Azure ML: View Model Properties command in the command palette.
Download model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to download and select Download Model File .
Alternatively, use the > Azure ML: Download Model File command in the command palette.
Delete a model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to delete and select Remove Model .
4. A prompt appears confirming you want to remove the model. Select Ok .
Alternatively, use the > Azure ML: Remove Model command in the command palette.
Endpoints
For more information, see endpoints.
Create endpoint
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Models node in your workspace and select Create Endpoint .
4. Choose your endpoint type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select Azure ML: Execute YAML .
Alternatively, use the > Azure ML: Create Endpoint command in the command palette.
Delete endpoint
1. Expand the subscription node that contains your workspace.
2. Expand the Endpoints node inside your workspace.
3. Right-click the deployment you want to remove and select Remove Service .
4. A prompt appears confirming you want to remove the service. Select Ok .
Alternatively, use the > Azure ML: Remove Service command in the command palette.
View service properties
In addition to creating and deleting deployments, you can view and edit settings associated with the
deployment.
1. Expand the subscription node that contains your workspace.
2. Expand the Endpoints node inside your workspace.
3. Right-click the deployment you want to manage:
To view deployment configuration settings, select View Service Properties .
Alternatively, use the > Azure ML: View Service Properties command in the command palette.
Next steps
Train an image classification model with the VS Code extension.
Generate Responsible AI dashboard with YAML and
Python (preview)
5/25/2022 • 14 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
The Responsible AI (RAI) dashboard can be generated via a pipeline job using RAI components. There are six
core components for creating Responsible AI dashboards, along with a couple of helper components. A sample
experiment graph:
Getting started
To use the Responsible AI components, you must first register them in your Azure Machine Learning workspace.
This section documents the required steps.
Prerequisites
You'll need:
An AzureML workspace
A git installation
A MiniConda installation
An Azure CLI installation
Installation Steps
1. Clone the repository, change into it, sign in with the Azure CLI, and run the quick-setup script:
cd RAI-vNext-Preview
az login
Quick-Setup.ps1
This will prompt for the desired conda environment name and AzureML workspace details. Alternatively,
use the bash script:
This script will echo the supplied parameters, and pause briefly before continuing.
Responsible AI components
The core components for constructing a Responsible AI dashboard in AzureML are:
RAI Insights Dashboard Constructor
The tool components:
Add Explanation to RAI Insights Dashboard
Add Causal to RAI Insights Dashboard
Add Counterfactuals to RAI Insights Dashboard
Add Error Analysis to RAI Insights Dashboard
Gather RAI Insights Dashboard
The RAI Insights Dashboard Constructor and Gather RAI Insights Dashboard components are always required,
plus at least one of the tool components. However, it isn't necessary to use all the tools in every Responsible AI
dashboard.
Below are specifications of the Responsible AI components and examples of code snippets in YAML and Python.
To view the full code, see the sample YAML and Python notebook.
RAI Insights Dashboard Constructor
This component has three input ports:
The machine learning model
The training dataset
The test dataset
Use the train and test dataset that you used when training your model to generate model-debugging insights
with components such as Error analysis and Model explanations. For components like Causal analysis that
don't require a model, the train dataset will be used to train the causal model to generate the causal insights.
The test dataset is used to populate your Responsible AI dashboard visualizations.
The easiest way to supply the model is using our Fetch Registered Model component, which will be discussed
below.
NOTE
Currently, only models in MLflow format with a sklearn flavor are supported.
The two datasets should be file datasets (of type uri_file) in Parquet format. Tabular datasets aren't supported,
but we provide a TabularDataset to Parquet file component to help with conversions. The training and test
datasets provided don't have to be the same datasets used in training the model (although it's permissible for
them to be the same). By default, the test dataset is restricted to 5000 rows for performance reasons of the
visualization UI.
The constructor component also accepts the following parameters:
PARAMETER | DESCRIPTION | TYPE
categorical_column_names | The columns in the datasets that represent categorical data | Optional list of strings (see note below)
classes | The full list of class labels in the training dataset | Optional list of strings (see note below)
NOTE
The lists should be supplied as a single JSON encoded string for categorical_column_names and classes inputs.
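For example, in Python the list can be encoded with the standard json module before being passed to the component:

import json

# Produces a single JSON-encoded string suitable for the component input.
categorical_column_names = json.dumps(
    ["location", "style", "job title", "OS", "Employer", "IDE", "Programming language"]
)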
The constructor component has a single output named rai_insights_dashboard . This is an empty dashboard,
which the individual tool components will operate on, and then all the results will be assembled by the
Gather RAI Insights Dashboard component at the end.
YAML
Python
create_rai_job:
type: command
component: azureml:rai_insights_constructor:1
inputs:
title: From YAML snippet
task_type: regression
model_info_path: ${{parent.jobs.fetch_model_job.outputs.model_info_output_path}}
train_dataset: ${{parent.inputs.my_training_data}}
test_dataset: ${{parent.inputs.my_test_data}}
target_column_name: ${{parent.inputs.target_column_name}}
categorical_column_names: '["location", "style", "job title", "OS", "Employer", "IDE", "Programming language"]'
[
{
"name": "High Yoe",
"cohort_filter_list": [
{
"method": "greater",
"arg": [
5
],
"column": "YOE"
}
]
},
{
"name": "Low Yoe",
"cohort_filter_list": [
{
"method": "less",
"arg": [
6.5
],
"column": "YOE"
}
]
}
]
PARAMETER | DESCRIPTION | TYPE
treatment_features | A list of feature names in the datasets, which are potentially ‘treatable’ to obtain different outcomes. | List of strings (see note below)
heterogeneity_features | A list of feature names in the datasets, which might affect how the ‘treatable’ features behave. By default all features will be considered. | Optional list of strings (see note below)
nuisance_model | The model used to estimate the outcome of changing the treatment features. | Optional string. Must be ‘linear’ or ‘AutoML’, defaulting to ‘linear’.
heterogeneity_model | The model used to estimate the effect of the heterogeneity features on the outcome. | Optional string. Must be ‘linear’ or ‘forest’, defaulting to ‘linear’.
treatment_cost | The cost of the treatments. If 0, then all treatments will have zero cost. If a list is passed, then each element is applied to one of the treatment_features. Each element can be a scalar value to indicate a constant cost of applying that treatment, or an array indicating the cost for each sample. If the treatment is a discrete treatment, then the array for that feature should be two-dimensional, with the first dimension representing samples and the second representing the difference in cost between the non-default values and the default value. | Optional integer or list (see note below)
categories | What categories to use for the categorical columns. If auto , then the categories will be inferred for all categorical columns. Otherwise, this argument should have as many entries as there are categorical columns. Each entry should be either auto to infer the values for that column or the list of values for the column. If explicit values are provided, the first value is treated as the "control" value for that column against which the other values are compared. | Optional. auto or list (see note below)
NOTE
For the list parameters: Several of the parameters accept lists of other types (strings, numbers, even other lists). To
pass these into the component, they must first be JSON-encoded into a single string.
This component has a single output port, which can be connected to one of the insight_[n] input ports of the
Gather RAI Insights Dashboard component.
YAML
Python
causal_01:
type: command
component: azureml:rai_insights_causal:1
inputs:
rai_insights_dashboard: ${{parent.jobs.create_rai_job.outputs.rai_insights_dashboard}}
treatment_features: '["Number of github repos contributed to", "YOE"]'
PARAMETER | DESCRIPTION | TYPE
desired_range | For regression problems, identify the desired range of outcomes. | Optional list of two numbers (see note below)
permitted_range | Dictionary with feature names as keys and permitted range in list as values. Defaults to the range inferred from training data. | Optional string or list (see note below)
features_to_vary | Either a string "all" or a list of feature names to vary. | Optional string or list (see note below)
NOTE
For the non-scalar parameters: Parameters which are lists or dictionaries should be passed as single JSON-encoded
strings.
This component has a single output port, which can be connected to one of the insight_[n] input ports of the
Gather RAI Insights Dashboard component.
YAML
Python
counterfactual_01:
type: command
component: azureml:rai_insights_counterfactual:1
inputs:
rai_insights_dashboard: ${{parent.jobs.create_rai_job.outputs.rai_insights_dashboard}}
total_CFs: 10
desired_range: "[5, 10]"
PARAMETER | DESCRIPTION | TYPE
filter_features | A list of one or two features to use for the matrix filter. | Optional list of two feature names (see note below)
NOTE
filter_features: This list of one or two feature names should be passed as a single JSON-encoded string.
This component has a single output port, which can be connected to one of the insight_[n] input ports of the
Gather RAI Insights Dashboard component.
YAML
Python
error_analysis_01:
type: command
component: azureml:rai_insights_erroranalysis:1
inputs:
rai_insights_dashboard: ${{parent.jobs.create_rai_job.outputs.rai_insights_dashboard}}
filter_features: '["style", "Employer"]'
YAML
Python
explain_01:
type: command
component: azureml:rai_insights_explanation:VERSION_REPLACEMENT_STRING
inputs:
comment: My comment
rai_insights_dashboard: ${{parent.jobs.create_rai_job.outputs.rai_insights_dashboard}}
YAML
Python
gather_01:
type: command
component: azureml:rai_insights_gather:1
inputs:
constructor: ${{parent.jobs.create_rai_job.outputs.rai_insights_dashboard}}
insight_1: ${{parent.jobs.causal_01.outputs.causal}}
insight_2: ${{parent.jobs.counterfactual_01.outputs.counterfactual}}
insight_3: ${{parent.jobs.error_analysis_01.outputs.error_analysis}}
insight_4: ${{parent.jobs.explain_01.outputs.explanation}}
Helper components
We provide two helper components to aid in connecting the Responsible AI components to your existing assets.
Fetch registered model
This component produces information about a registered model, which can be consumed by the
model_info_path input port of the RAI Insights Dashboard Constructor component. It has a single input
parameter – the AzureML ID ( <NAME>:<VERSION> ) of the desired model.
YAML
Python
fetch_model_job:
type: command
component: azureml:fetch_registered_model:1
inputs:
model_id: my_model_name:12
convert_train_job:
type: command
component: azureml:convert_tabular_to_parquet:1
inputs:
tabular_dataset_name: tabular_dataset_name
Input constraints
What model formats and flavors are supported?
The model must be an MLflow model directory with a sklearn flavor available. Furthermore, the model needs to be
loadable in the environment used by the Responsible AI components.
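For instance, a scikit-learn model can be written in the required format with the mlflow library; this sketch uses a placeholder model and output path:

import mlflow.sklearn
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # in practice, a trained sklearn model

# Writes an MLflow model directory with the sklearn flavor.
mlflow.sklearn.save_model(sk_model=model, path="my_model_dir")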
What data formats are supported?
The supplied datasets should be file datasets (uri_file type) in Parquet format. We provide the
TabularDataset to Parquet File component to help convert the data into the required format.
Next steps
Once your Responsible AI dashboard is generated, view how to access and use it in Azure Machine Learning
studio
Summarize and share your Responsible AI insights with the Responsible AI scorecard as a PDF export.
Learn more about the concepts and techniques behind the Responsible AI dashboard.
Learn more about how to collect data responsibly
View sample YAML and Python notebooks to generate a Responsible AI dashboard with YAML or Python.
Generate Responsible AI dashboard in the studio UI
(preview)
5/25/2022 • 5 minutes to read • Edit Online
You can create a Responsible AI dashboard with a no-code experience in the Azure Machine Learning studio UI.
To start the wizard, navigate to the registered model you’d like to create Responsible AI insights for and select
the Details tab. Then select the Create Responsible AI dashboard (preview) button.
The wizard is designed to provide an interface to input all the necessary parameters to instantiate your
Responsible AI dashboard without having to touch code. The experience takes place entirely in the Azure
Machine Learning studio UI, with a guided flow and instructional text to help contextualize the choice of which
Responsible AI components to populate your dashboard with. The wizard is divided into five
steps:
1. Datasets
2. Modeling task
3. Dashboard components
4. Component parameters
5. Experiment configuration
NOTE
Only tabular dataset formats are supported.
1. Select a dataset for training : Select the dropdown to view your registered datasets in Azure Machine
Learning workspace. This dataset will be used to generate Responsible AI insights for components such as
model explanations and error analysis.
2. Create new dataset : If the desired datasets aren't in your Azure Machine Learning workspace, select “New
dataset” to upload your dataset.
3. Select a dataset for testing : Select the dropdown to view your registered datasets in Azure Machine
Learning workspace. This dataset is used to populate your Responsible AI dashboard visualizations.
NOTE
The wizard only supports models in MLflow format with a scikit-learn flavor.
NOTE
Multi-class classification doesn't support the real-life intervention analysis profile. Select the desired profile, then select Next .
Experiment configuration
Finally, configure your experiment to kick off a job to generate your Responsible AI dashboard.
1. Name : Give your dashboard a unique name so that you can differentiate it when you’re viewing the list of
dashboards for a given model.
2. Experiment name : Select an existing experiment to run the job in, or create a new experiment.
3. Existing experiment : Select an existing experiment from drop-down.
4. Select compute type : Specify which compute type you’d like to use to execute your job.
5. Select compute : Select the compute you'd like to use from the drop-down. If there are no existing compute
resources, select the “+” to create a new compute resource and refresh the list.
6. Description : Add a more verbose description for your Responsible AI dashboard.
7. Tags : Add any tags to this Responsible AI dashboard.
After you’ve finished your experiment configuration, select Create to start the generation of your Responsible
AI dashboard. You'll be redirected to the experiment page to track the progress of your job. See the next steps
below for how to view your Responsible AI dashboard.
Next steps
Once your Responsible AI dashboard is generated, view how to access and use it in Azure Machine Learning
studio
Summarize and share your Responsible AI insights with the Responsible AI scorecard as a PDF export.
Learn more about the concepts and techniques behind the Responsible AI dashboard.
Learn more about how to collect data responsibly
How to use the Responsible AI dashboard in studio
(preview)
5/25/2022 • 15 minutes to read • Edit Online
Responsible AI dashboards are linked to your registered models. To view your Responsible AI dashboard, go into
your model registry and select the registered model you've generated a Responsible AI dashboard for. Once you
select into your model, select the Responsible AI (preview) tab to view a list of generated dashboards.
Multiple dashboards can be configured and attached to your registered model. Different combinations of
components (explainers, causal analysis, etc.) can be attached to each Responsible AI dashboard. The list below
only shows whether a component was generated for your dashboard, but different components can be viewed
or hidden within the dashboard itself.
Selecting the name of the dashboard will open it in a full view in your browser. At any time,
select Back to model details to return to your list of dashboards.
Full functionality with integrated compute resource
Some features of the Responsible AI dashboard require dynamic, real-time computation. Without connecting a
compute resource to the dashboard, you may find some functionality missing. Connecting to a compute
resource will enable full functionality of your Responsible AI dashboard for the following components:
Error analysis
Setting your global data cohort to any cohort of interest will update the error tree instead of disabling
it.
Selecting other error or performance metrics is supported.
Selecting any subset of features for training the error tree map is supported.
Changing the minimum number of samples required per leaf node and error tree depth is supported.
Dynamically updating the heatmap for up to two features is supported.
Feature impor tance
An individual conditional expectation (ICE) plot in the individual feature importance tab is supported.
Counterfactual what-if
Generating a new what-if counterfactual datapoint to understand the minimum change required for a
desired outcome is supported.
Causal analysis
Selecting any individual datapoint, perturbing its treatment features, and seeing the expected causal
outcome of causal what-if is supported (only for regression ML scenarios).
The information above can also be found on the Responsible AI dashboard page by selecting the information
icon button:
2. Once the compute is in the “Running” state, your Responsible AI dashboard will start to connect to the compute
instance. To achieve this, a terminal process will be created on the selected compute instance, and a
Responsible AI endpoint will be started in the terminal. Select View terminal outputs to view the current
terminal process.
3. When your Responsible AI dashboard is connected to the compute instance, you'll see a green message
bar, and the dashboard is now fully functional.
4. If it takes a while and your Responsible AI dashboard is still not connected to the compute instance, or a
red error message bar shows up, it means there are issues with starting your Responsible AI endpoint.
Select View terminal outputs and scroll down to the bottom to view the error message.
If you can't figure out how to resolve the failure to connect to the compute instance,
select the “smile” icon in the upper-right corner and submit feedback to let us know what error or
issue you hit. You can include a screenshot and your email address in the feedback form.
Selecting the New cohort button on the top of the dashboard or in the Cohort settings opens a new panel with
options to filter on the following:
1. Index : filters by the position of the datapoint in the full dataset
2. Dataset : filters by the value of a particular feature in the dataset
3. Predicted Y : filters by the prediction made by the model
4. True Y : filters by the actual value of the target feature
5. Error (regression) : filters by error or Classification Outcome (classification): filters by type and accuracy of
classification
6. Categorical Values : filter by a list of values that should be included
7. Numerical Values : filter by a Boolean operation over the values (for example, select datapoints where age
< 64)
You can name your new dataset cohort, select Add filter to add each desired filter, then select Save to save the
new cohort to your cohort list or Save and switch to save and immediately switch the global cohort of the
dashboard to the newly created cohort.
Selecting Dashboard configuration will open a panel with a list of the components you’ve configured in your
dashboard. You can hide components in your dashboard by selecting the ‘trash’ icon.
You can add components back into your dashboard via the blue circular ‘+’ icon in the divider between each
component.
Error analysis
Error tree map
The first tab of the Error analysis component is the Tree map, which illustrates how model failure is distributed
across different cohorts with a tree visualization. Select any node to see the prediction path on your features
where error was found.
ML SCENARIO | METRICS
You can further investigate your model by looking at a comparative analysis of its performance across different
cohorts or subgroups of your dataset, including automatically created “temporary cohorts” based on selected
nodes from the Error analysis component. Select filters along y-value and x-value to cut across different
dimensions.
Data explorer
The Data explorer component allows you to analyze data statistics along axes filters such as predicted outcome,
dataset features and error groups. This component helps you understand over and underrepresentation in your
dataset.
1. Select a dataset cohort to explore : Specify which dataset cohort from your list of cohorts you want to
view data statistics for.
2. X-axis : displays the type of value being plotted horizontally, modify by clicking the button to open a side
panel.
3. Y-axis : displays the type of value being plotted vertically, modify by clicking the button to open a side panel.
4. Chart type : specifies chart type, choose between aggregate plots (bar charts) or individual datapoints
(scatter plot).
Selecting the "Individual datapoints" option under "Chart type" shifts to a disaggregated view of the data with
the availability of a color axis.
Feature importances (model explanations)
The model explanation component allows you to see which features were most important in your model’s
predictions. You can view what features impacted your model’s prediction overall in the Aggregate feature
importance tab or view feature importances for individual datapoints in the Individual feature importance
tab.
Aggregate feature importances (global explanations )
1. Top k features : lists the most important global features for a prediction and allows you to change it through
a slider bar.
2. Aggregate feature importance : visualizes the weight of each feature in influencing model decisions
across all predictions.
3. Sort by : allows you to select which cohort's importances to sort the aggregate feature importance graph by.
4. Chart type : allows you to select between a bar plot view of average importances for each feature and a box
plot of importances for all data.
When you select on one of the features in the bar plot, the below dependence plot will be populated. The
dependence plot shows the relationship of the values of a feature to its corresponding feature importance
values impacting the model prediction.
5. Feature importance of [feature] (regression) or Feature importance of [feature] on [predicted
class] (classification) : plots the importance of a particular feature across the predictions. For regression
scenarios, the importance values are in terms of the output so positive feature importance means it
contributed positively towards the output; vice versa for negative feature importance. For classification
scenarios, positive feature importances mean that feature value is contributing towards the predicted class
denoted in the y-axis title; and negative feature importance means it's contributing against the predicted
class.
6. View dependence plot for : selects the feature whose importances you want to plot.
7. Select a dataset cohort : selects the cohort whose importances you want to plot.
Individual feature importances (local explanations )
This tab explains how features influence the predictions made on specific datapoints. You can choose up to five
datapoints to compare feature importances for.
Point selection table : view your datapoints and select up to five points to display in the feature importance
plot or the ICE plot below the table.
Feature importance plot : bar plot of the importance of each feature for the model's prediction on the
selected datapoint(s)
1. Top k features : allows you to specify the number of features to show importances for through a slider.
2. Sort by : allows you to select the point (of those checked above) whose feature importances are displayed in
descending order on the feature importance plot.
3. View absolute values : Toggle on to sort the bar plot by absolute values; this allows you to see the features
with the highest impact regardless of their positive or negative direction.
4. Bar plot : displays the importance of each feature in the dataset for the model prediction of the selected
datapoints.
Individual conditional expectation (ICE) plot : switches to the ICE plot showing model predictions across a
range of values of a particular feature
Min (numerical features) : specifies the lower bound of the range of predictions in the ICE plot.
Max (numerical features) : specifies the upper bound of the range of predictions in the ICE plot.
Steps (numerical features) : specifies the number of points to show predictions for within the interval.
Feature values (categorical features) : specifies which categorical feature values to show predictions for.
Feature : specifies the feature to make predictions for.
Counterfactual what-if
Counterfactual analysis provides a diverse set of “what-if” examples generated by changing the values of
features minimally to produce the desired prediction class (classification) or range (regression).
1. Point selection : selects the point to create a counterfactual for and display in the top-ranking features
plot below
Top ranked features plot : displays, in descending order in terms of average frequency, the features to
perturb to create a diverse set of counterfactuals of the desired class. You must generate at least 10
diverse counterfactuals per datapoint to enable this chart, because a smaller number of counterfactuals
isn't accurate enough.
2. Selected datapoint : performs the same action as the point selection in the table, except in a dropdown
menu.
3. Desired class for counterfactual(s) : specifies the class or range to generate counterfactuals for.
4. Create what-if counterfactual : opens a panel for counterfactual what-if datapoint creation.
Selecting the Create what-if counterfactual button opens a full window panel.
5. Search features : finds features to observe and change values.
6. Sort counterfactual by ranked features : sorts counterfactual examples in order of perturbation effect
(see above for top ranked features plot).
7. Counterfactual Examples: lists feature values of example counterfactuals with the desired class or range.
The first row is the original reference datapoint. Select "Set value" to copy all the values of a
pre-generated counterfactual example into your own counterfactual datapoint in the bottom row.
8. Predicted value or class: lists the model prediction of a counterfactual's class given those changed
features.
9. Create your own counterfactual: allows you to perturb features to modify the counterfactual. Features
that have been changed from the original value are denoted by a bolded title (for example, Employer and
Programming language). Selecting "See prediction delta" shows you the difference between the new
prediction value and that of the original datapoint.
10. What-if counterfactual name : allows you to name the counterfactual uniquely.
11. Save as new datapoint : saves the counterfactual you've created.
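Diverse counterfactuals like these can also be generated programmatically. As a hedged illustration (using the open-source dice-ml package, and assuming a pandas DataFrame df with an income outcome column plus a fitted scikit-learn classifier clf; this is not necessarily the exact component the dashboard uses), a minimal sketch looks like this:

import dice_ml

# df: pandas DataFrame containing the features and an 'income' outcome column (assumed)
# clf: a scikit-learn classifier fitted on df's feature columns (assumed)
data = dice_ml.Data(dataframe=df,
                    continuous_features=["age", "hours_per_week"],
                    outcome_name="income")
model = dice_ml.Model(model=clf, backend="sklearn")
explainer = dice_ml.Dice(data, model, method="random")

# Generate at least 10 diverse counterfactuals per datapoint, matching the
# dashboard's requirement for the top-ranked features plot.
query = df.drop(columns=["income"]).iloc[[0]]
cfs = explainer.generate_counterfactuals(query, total_CFs=10, desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)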
Causal analysis
Aggregate causal effects
Selecting on the Aggregate causal effects tab of the Causal analysis component shows the average causal
effects for pre-defined treatment features (the features that you want to treat to optimize your outcome).
NOTE
Global cohort functionality is not supported for the causal analysis component.
1. Direct aggregate causal effect table: displays the causal effect of each feature aggregated on the entire
dataset and the associated confidence statistics.
a. Continuous treatments: on average in this sample, increasing this feature by one unit will cause the
probability of the class to increase by X units, where X is the causal effect.
b. Binary treatments: on average in this sample, turning on this feature will cause the probability of
the class to increase by X units, where X is the causal effect.
2. Direct aggregate causal effect whisker plot: visualizes the causal effects and confidence intervals of the
points in the table. A hedged code sketch of this kind of estimate follows this list.
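The causal analysis component is built on the open-source EconML package; the exact estimator configuration below is an assumption, but the following minimal sketch shows how an average causal effect and its confidence interval (the whisker plot analogue) can be estimated for a continuous treatment on synthetic data:

import numpy as np
from econml.dml import LinearDML

rng = np.random.default_rng(0)
n = 1000
W = rng.normal(size=(n, 3))                   # confounders
T = W[:, 0] + rng.normal(size=n)              # continuous treatment feature
Y = 2.0 * T + W[:, 1] + rng.normal(size=n)    # outcome; the true effect of T is 2.0

est = LinearDML(random_state=0)
est.fit(Y, T, X=None, W=W)
# Average causal effect of increasing T by one unit, with its confidence interval
print(est.const_marginal_effect())                       # close to 2.0
print(est.const_marginal_effect_interval(alpha=0.05))    # whisker plot bounds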
Individual causal effects and causal what-if
To get a granular view of causal effects on an individual datapoint, switch to the Individual causal what-if tab.
3. Show top k datapoint samples ordered by causal effects for recommended treatment feature :
selects the number of datapoints to show in the table below.
4. Recommended individual treatment policy table : lists, in descending order of causal effect, the
datapoints whose target features would be most improved by an intervention.
Next steps
Summarize and share your Responsible AI insights with the Responsible AI scorecard as a PDF export.
Learn more about the concepts and techniques behind the Responsible AI dashboard.
View sample YAML and Python notebooks to generate a Responsible AI dashboard with YAML or Python.
Share insights with Responsible AI scorecard
(preview)
5/25/2022 • 7 minutes to read • Edit Online
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)
Azure Machine Learning's Responsible AI dashboard is designed for machine learning professionals and data
scientists to explore and evaluate model insights and inform their data-driven decisions. While it can help
you implement Responsible AI practically in your machine learning lifecycle, some needs are left
unaddressed:
There often exists a gap between the technical Responsible AI tools (designed for machine-learning
professionals) and the ethical, regulatory, and business requirements that define the production
environment.
While an end-to-end machine learning life cycle includes both technical and non-technical stakeholders,
there's very little support for effective multi-stakeholder alignment that helps technical experts get
timely feedback and direction from non-technical stakeholders.
AI regulations make it essential to be able to share model and data insights with auditors and risk officers for
auditability purposes.
One of the biggest benefits of the Azure Machine Learning ecosystem is the archival of model and data insights
in the Azure Machine Learning Run History for quick future reference. As part of that infrastructure, and to
accompany machine learning models and their corresponding Responsible AI dashboards, we introduce the
Responsible AI scorecard: a customizable report that you can easily configure, download, and share with your
technical and non-technical stakeholders to educate them about your data and model health and compliance, and
to build trust. The scorecard can also be used in audit reviews to inform stakeholders about the
characteristics of your model. To generate one, add the scorecard component to the pipeline that creates your
Responsible AI dashboard, as in the following YAML snippet:
type: command
component: azureml:rai_score_card@latest
inputs:
  dashboard: ${{parent.jobs.gather_01.outputs.dashboard}}
  pdf_generation_config:
    type: uri_file
    path: ./pdf_gen.json
    mode: download
  predefined_cohorts_json:
    type: uri_file
    path: ./cohorts.json
    mode: download
Sample JSON for the cohorts definition and scorecard generation config can be found below.
Cohorts definition:
[
  {
    "name": "High Yoe",
    "cohort_filter_list": [
      {
        "method": "greater",
        "arg": [5],
        "column": "YOE"
      }
    ]
  },
  {
    "name": "Low Yoe",
    "cohort_filter_list": [
      {
        "method": "less",
        "arg": [6.5],
        "column": "YOE"
      }
    ]
  }
]
ModelName: Name of the model.
Metrics
Threshold: Desired threshold for the selected metric. Allowed mathematical tokens are >, <, >=, and <= followed by
a real number. For example, >= 0.75 means that the target for the selected metric is greater than or equal to 0.75.
Feature importance
top_n: Number of features to show with a maximum of 10. Positive integers up to 10 are allowed.
Fairness
NOTE
Your choice of fairness_evaluation_kind (selecting 'difference' vs. 'ratio') impacts the scale of your target value.
Be mindful of your selection to choose a meaningful target value.
You can select the metric, paired with the fairness_evaluation_kind, to configure the fairness
assessment component of the scorecard.
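Putting these settings together, a pdf_gen.json could look like the following sketch, written here in Python for illustration. The key names below are assumptions based on the descriptions above; verify them against the scorecard component's current schema before use:

import json

# Hypothetical scorecard config; key names are assumptions, verify against the schema.
pdf_gen_config = {
    "Model": {"ModelName": "my-model", "ModelType": "classification",
              "ModelSummary": "Predicts attrition from HR data."},
    "Metrics": {"accuracy_score": {"threshold": ">=0.75"}},  # tokens: >, <, >=, <=
    "FeatureImportance": {"top_n": 6},                       # at most 10
    "Fairness": {"metric": ["accuracy_score"],
                 "sensitive_features": ["gender"],
                 "fairness_evaluation_kind": "difference"},
}
with open("pdf_gen.json", "w") as f:
    json.dump(pdf_gen_config, f, indent=2)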
Selecting Responsible AI scorecard (preview) will show you a dropdown to view all Responsible AI
scorecards generated for this dashboard.
Select which scorecard you’d like to download from the list and select Download to download the PDF to your
machine.
The model performance segment displays your model’s most important metrics and characteristics of your
predictions and how well they satisfy your desired target values.
Next, you can also view the top performing and worst performing data cohorts and subgroups that are
automatically extracted for you to see the blind spots of your model.
Then you can see the top factors impacting your model predictions, which helps build trust in how your model is
performing its task.
You can further see your model fairness insights summarized and inspect how well your model is satisfying the
fairness target values you had set for your desired sensitive groups.
Finally, you can observe your dataset’s causal insights summarized, figuring out whether your identified
factors/treatments have any causal effect on the real-world outcome.
Next steps
See the how-to guide for generating a Responsible AI dashboard via CLI v2 and SDK v2 or the studio UI.
Learn more about the concepts and techniques behind the Responsible AI dashboard.
View sample YAML and Python notebooks to generate a Responsible AI dashboard with YAML or Python.
Interactive debugging with Visual Studio Code
5/25/2022 • 16 minutes to read • Edit Online
IMPORTANT
The Azure Machine Learning VS Code extension uses the CLI (v2) by default. The instructions in this guide use the
1.0 CLI. To switch to the 1.0 CLI, set the azureML.CLI Compatibility Mode setting in Visual Studio Code to 1.0.
For more information on modifying your settings in Visual Studio Code, see the user and workspace settings
documentation.
Docker
Docker Desktop for Mac and Windows
Docker Engine for Linux.
NOTE
On Windows, make sure to configure Docker to use Linux containers.
TIP
For Windows, although not required, it's highly recommended to use Docker with Windows Subsystem for
Linux (WSL) 2.
Python 3
Debug experiment locally
IMPORTANT
Before running your experiment locally, make sure that:
Docker is running.
The azureML.CLI Compatibility Mode setting in Visual Studio Code is set to 1.0, as specified in the prerequisites.
NOTE
Creating your Docker image for the first time can take several minutes.
11. Once your image is built, a prompt appears to start the debugger. Set your breakpoints in your script and
select Start debugger when you're ready to start debugging. Doing so attaches the VS Code debugger
to the container running your experiment. Alternatively, in the Azure Machine Learning extension, hover
over the node for your current run and select the play icon to start the debugger.
IMPORTANT
You cannot have multiple debug sessions for a single experiment. You can, however, debug two or more
experiments using multiple VS Code instances.
At this point, you should be able to step through and debug your code using VS Code.
If at any point you want to cancel your run, right-click your run node and select Cancel run .
Similar to remote experiment runs, you can expand your run node to inspect the logs and outputs.
TIP
Docker images that use the same dependencies defined in your environment are reused between runs. However, if you
run an experiment using a new or different environment, a new image is created. Since these images are saved to your
local storage, it's recommended to remove old or unused Docker images. To remove images from your system, use the
Docker CLI or the VS Code Docker extension.
TIP
Although you can work with Azure Machine Learning resources that are not behind a virtual network, using a virtual
network is recommended.
How it works
Your ML pipeline steps run Python scripts. These scripts are modified to perform the following actions:
1. Log the IP address of the host that they are running on. You use the IP address to connect the debugger
to the script.
2. Start the debugpy debug component, and wait for a debugger to connect.
3. From your development environment, you monitor the logs created by the training process to find the IP
address where the script is running.
4. You tell VS Code the IP address to connect the debugger to by using a launch.json file.
5. You attach the debugger and interactively step through the script.
Configure Python scripts
To enable debugging, make the following changes to the Python script(s) used by steps in your ML pipeline:
1. Add the following import statements:
import argparse
import os
import debugpy
import socket
from azureml.core import Run
2. Add the following arguments. These arguments allow you to enable the debugger as needed, and set the
timeout for attaching the debugger:
parser.add_argument('--remote_debug', action='store_true')
parser.add_argument('--remote_debug_connection_timeout', type=int,
                    default=300,
                    help=f'Defines how much time the AML compute target '
                         f'will await a connection from a debugger client (VSCODE).')
parser.add_argument('--remote_debug_client_ip', type=str,
                    help=f'Defines IP Address of VS Code client')
parser.add_argument('--remote_debug_port', type=int,
                    default=5678,
                    help=f'Defines Port of VS Code client')
3. Add the following statements. These statements load the current run context so that you can log the IP
address of the node that the code is running on:
global run
run = Run.get_context()
4. Add an if statement that starts debugpy and waits for a debugger to attach. If no debugger attaches
before the timeout, the script continues as normal. Make sure to replace the HOST and PORT values in the
listen function with your own.
if args.remote_debug:
    print(f'Timeout for debug connection: {args.remote_debug_connection_timeout}')
    # Log the IP and port
    try:
        ip = args.remote_debug_client_ip
    except:
        print("Need to supply IP address for VS Code client")
    print(f'ip_address: {ip}')
    debugpy.listen(address=(ip, args.remote_debug_port))
    # Wait for the timeout for debugger to attach
    debugpy.wait_for_client()
    print(f'Debugger attached = {debugpy.is_client_connected()}')
The following Python example shows a simple train.py file that enables debugging:
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
import argparse
import os
import debugpy
import socket
from azureml.core import Run

print("In train.py")
print("As a data scientist, this is where I use my training code.")

parser = argparse.ArgumentParser("train")
# Add the --remote_debug* arguments from step 2 here.

# Get run object, so we can find and log the IP of the host instance
global run
run = Run.get_context()

args = parser.parse_args()
# Start debugpy and wait for a debugger to attach, as shown in step 4.
Configure ML pipeline
To provide the Python packages needed to start debugpy and get the run context, create an environment and set
pip_packages=['debugpy', 'azureml-sdk==<SDK-VERSION>'] . Change the SDK version to match the one you are
using. The following code snippet demonstrates how to create an environment:
# Use a RunConfiguration to specify some additional requirements for this step.
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

run_config = RunConfiguration()
# enable Docker
run_config.environment.docker.enabled = True
# set the Docker base image
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE
# use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_config.environment.python.user_managed_dependencies = False
# install debugpy and the SDK so the step can start the debugger and get the run context
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['debugpy', 'azureml-sdk==<SDK-VERSION>'])
In the Configure Python scripts section, new arguments were added to the scripts used by your ML pipeline
steps. The following code snippet demonstrates how to use these arguments to enable debugging for the
component and set a timeout. It also demonstrates how to use the environment created earlier by setting
runconfig=run_config :
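(The snippet below is a hedged reconstruction; the step name, source directory, and compute_target are placeholders specific to your pipeline.)

from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name="Train step",
    script_name="train.py",
    arguments=[
        "--remote_debug",
        "--remote_debug_connection_timeout", 300,
        "--remote_debug_client_ip", "<VS-CODE-CLIENT-IP>",
        "--remote_debug_port", 5678,
    ],
    source_directory="./code/step1",
    compute_target=compute_target,  # an AmlCompute target defined elsewhere (assumed)
    runconfig=run_config,           # the environment created above
    allow_reuse=False,
)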
When the pipeline runs, each step creates a child run. If debugging is enabled, the modified script logs
information similar to the following text in the 70_driver_log.txt file for the child run:
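Based on the print statements added to the script in the previous section, the log lines will resemble the following (the IP address will differ in your run):

Timeout for debug connection: 300
ip_address: 10.3.0.5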
TIP
You can also find the IP address from the run logs for the child run for this pipeline step. For more information on viewing
this information, see Monitor Azure ML experiment runs and metrics.
For more information on using debugpy with VS Code, see Remote Debugging.
2. To configure VS Code to communicate with the Azure Machine Learning compute that is running the
debugger, create a new debug configuration:
a. From VS Code, select the Debug menu and then select Open configurations . A file named
launch.json opens.
b. In the launch.json file, find the line that contains "configurations": [ , and insert the following
text after it. Change the "host": "<IP-ADDRESS>" entry to the IP address returned in your logs from
the previous section. Change the "localRoot": "${workspaceFolder}/code/step1" entry to a local
directory that contains a copy of the script being debugged:
{
  "name": "Azure Machine Learning Compute: remote debug",
  "type": "python",
  "request": "attach",
  "port": 5678,
  "host": "<IP-ADDRESS>",
  "redirectOutput": true,
  "pathMappings": [
    {
      "localRoot": "${workspaceFolder}/code/step1",
      "remoteRoot": "."
    }
  ]
}
IMPORTANT
If there are already other entries in the configurations section, add a comma (,) after the code that you
inserted.
TIP
The best practice, especially for pipelines, is to keep the resources for each script in a separate
directory so that code stays relevant to its own step. In this example, the localRoot example value
references /code/step1.
If you are debugging multiple scripts in different directories, create a separate configuration section for
each script.
TIP
Save time and catch bugs early by debugging managed online endpoints and deployments locally. For more information,
see Debug managed online endpoints locally in Visual Studio Code (preview).
IMPORTANT
This method of debugging does not work when using Model.deploy() and LocalWebservice.deploy_configuration
to deploy a model locally. Instead, you must create an image using the Model.package() method.
Local web service deployments require a working Docker installation on your local system. For more
information on using Docker, see the Docker Documentation. Note that when working with compute instances,
Docker is already installed.
Configure development environment
1. To install debugpy on your local VS Code development environment, use the following command:
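python -m pip install --upgrade debugpy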
For more information on using debugpy with VS Code, see Remote Debugging.
2. To configure VS Code to communicate with the Docker image, create a new debug configuration:
a. From VS Code, select the Debug menu in the Run extension and then select Open
configurations. A file named launch.json opens.
b. In the launch.json file, find the "configurations" item (the line that contains
"configurations": [ ), and insert the following text after it.
{
  "name": "Azure Machine Learning Deployment: Docker Debug",
  "type": "python",
  "request": "attach",
  "connect": {
    "port": 5678,
    "host": "0.0.0.0"
  },
  "pathMappings": [
    {
      "localRoot": "${workspaceFolder}",
      "remoteRoot": "/var/azureml-app"
    }
  ]
}
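When you're finished, the configurations section of your launch.json file looks similar to the following: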
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal"
    },
    {
      "name": "Azure Machine Learning Deployment: Docker Debug",
      "type": "python",
      "request": "attach",
      "connect": {
        "port": 5678,
        "host": "0.0.0.0"
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "/var/azureml-app"
        }
      ]
    }
  ]
}
IMPORTANT
If there are already other entries in the configurations section, add a comma ( , ) after the code that you
inserted.
with open("myenv.yml","w") as f:
f.write(myenv.serialize_to_string())
2. To start debugpy and wait for a connection when the service starts, add the following to the top of your
score.py file:
import debugpy
# Allows other computers to attach to debugpy on this IP address and port.
debugpy.listen(('0.0.0.0', 5678))
# Wait 30 seconds for a debugger to attach. If none attaches, the script continues as normal.
debugpy.wait_for_client()
print("Debugger attached...")
3. Create an image based on the environment definition and pull the image to the local registry.
NOTE
This example assumes that ws points to your Azure Machine Learning workspace, and that model is the model
being deployed. The myenv.yml file contains the conda dependencies created in step 1.
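A hedged sketch of this step follows; the entry script name (score.py) and the environment name built from myenv.yml are assumptions based on the files created earlier:

from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model

# Build an inference configuration from the conda file written in step 1
# (score.py and the environment name are assumptions for illustration).
myenv = Environment.from_conda_specification(name="debug-env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

# Package the model as a Docker image and pull it to the local registry.
package = Model.package(ws, [model], inference_config)
package.wait_for_creation(show_output=True)  # Or show_output=False to hide the Docker build logs.
package.pull()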
Once the image has been created and downloaded (this process can take more than 10 minutes), the image path
(including repository, name, and tag, which in this case is also its digest) is displayed in a message similar
to the following:
4. To make it easier to work with the image locally, you can use the following command to add a tag for this
image. Replace myimagepath in the following command with the location value from the previous step.
For the rest of the steps, you can refer to the local image as debug:1 instead of the full image path value.
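docker tag myimagepath debug:1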
Debug the service
TIP
If you set a timeout for the debugpy connection in the score.py file, you must connect VS Code to the debug session
before the timeout expires. Start VS Code, open the local copy of score.py , set a breakpoint, and have it ready to go
before using the steps in this section.
For more information on debugging and setting breakpoints, see Debugging.
1. To start a Docker container using the image, use the following command:
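docker run -it --name debug -p 8000:5001 -p 5678:5678 -v <path-to-the-directory-containing-score.py>:/var/azureml-app debug:1 /bin/bash

(This command is a reconstruction: 5678 is the debugpy port used in score.py, and /var/azureml-app is the remoteRoot path used in the launch.json configuration above.)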
This attaches your local score.py to the one in the container. Therefore, any changes made in the editor
are automatically reflected in the container.
2. For a better experience, you can attach a new VS Code window to the container. Select the Docker
extension from the VS Code side bar and find the local container you created; in this documentation it's debug:1.
Right-click this container and select Attach Visual Studio Code; a new VS Code window opens
automatically, showing the inside of the created container.
3. Run the following command in a shell inside the container to start the service under its runit supervisor:
runsvdir /var/runit
Then you can see the following output in the shell inside your container:
4. To attach VS Code to debugpy inside the container, open VS Code and use the F5 key or select Debug.
When prompted, select the Azure Machine Learning Deployment: Docker Debug configuration.
You can also select the Run extension icon from the side bar, select the Azure Machine Learning
Deployment: Docker Debug entry from the Debug dropdown menu, and then use the green arrow to
attach the debugger.
After you select the green arrow and attach the debugger, the VS Code window attached to the container displays
new debugging information. In your main VS Code window, the local score.py that's attached to the container
stops at the breakpoints you set: VS Code connects to debugpy inside the Docker container and pauses execution
at your breakpoint. You can now step through the code as it runs, view variables, and so on.
For more information on using VS Code to debug Python, see Debug your Python code.
Stop the container
To stop the container, use the following command:
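docker stop debug

(debug is the container name given with --name in the docker run command earlier.)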
Workspace
REFERENCE URI
Workspace https://azuremlschemas.azureedge.net/latest/workspace.schema.json
Environment
REFERENCE URI
Environment https://azuremlschemas.azureedge.net/latest/environment.schema.json
Data
REFERENCE URI
Dataset https://azuremlschemas.azureedge.net/latest/data.schema.json
Model
REFERENCE URI
Model https://azuremlschemas.azureedge.net/latest/model.schema.json
Compute
REFERENCE URI
Compute cluster (AmlCompute) https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
Compute instance https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
Attached virtual machine https://azuremlschemas.azureedge.net/latest/vmCompute.schema.json
Job
REFERENCE URI
Command https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
Sweep https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
Pipeline https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
Datastore
REFERENCE URI
Endpoint
REFERENCE URI
Online (managed) https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
Batch https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
Deployment
REFERENCE URI
Online (managed) https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
Batch https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
Component
REFERENCE URI
Command https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
Next steps
Install and use the CLI (v2)
CLI (v2) core YAML syntax
5/25/2022 • 7 minutes to read • Edit Online
azureml:/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/environments/<environment-name>/versions/<environment-version>
azureml:/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/computes/<compute-name>
azureml://datastores/<datastore-name>/paths/<path-on-datastore>/
For example:
azureml://datastores/workspaceblobstore/paths/example-data/
azureml://datastores/workspaceblobstore/paths/example-data/iris.csv
In addition to the Azure ML data reference URI, Azure ML also supports the following direct storage URI
protocols: https , wasbs , abfss , and adl , as well as public http and https URIs.
In the example below for a sweep job YAML file, the ${{search_space.learning_rate}} and
${{search_space.boosting}} references in trial.command will resolve to the actual hyperparameter values
selected for each trial when the trial job is submitted for execution.
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
sampling_algorithm:
  type: random
search_space:
  learning_rate:
    type: uniform
    min_value: 0.01
    max_value: 0.9
  boosting:
    type: choice
    values: ["gbdt", "dart"]
objective:
  goal: minimize
  primary_metric: test-multi_logloss
trial:
  code: ./src
  command: >-
    python train.py
    --training-data ${{inputs.iris}}
    --lr ${{search_space.learning_rate}}
    --boosting ${{search_space.boosting}}
  environment: azureml:AzureML-Minimal@latest
inputs:
  iris:
    type: uri_file
    path: https://azuremlexamples.blob.core.windows.net/datasets/iris.csv
    mode: download
compute: azureml:cpu-cluster
Parameterizing the command with the inputs and outputs contexts of a component
Similar to the command for a job, the command for a component can also be parameterized with references to the
inputs and outputs contexts. In this case the reference is to the component's inputs and outputs. When the
component is run in a job, Azure ML will resolve those references to the job runtime input and output values
specified for the respective component inputs and outputs. Below is an example of using the context syntax for a
command component YAML specification.
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
code: ./src
command: >-
  python train.py --lr ${{inputs.learning_rate}} --training-data ${{inputs.iris}} --model-dir ${{outputs.model_dir}}
environment: azureml:AzureML-Minimal@latest
inputs:
  learning_rate:
    type: number
    default: 0.01
  iris:
    type: uri_file
outputs:
  model_dir:
    type: uri_folder
Next steps
Install and use the CLI (v2)
Train models with the CLI (v2)
CLI (v2) YAML schemas
CLI (v2) workspace YAML schema
5/25/2022 • 4 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
customer_managed_key.key_vault (string): The fully qualified resource ID of the key vault containing the customer-managed key. This key vault can be different than the default workspace key vault specified in key_vault.
customer_managed_key.key_uri (string): The key URI of the customer-managed key to encrypt data at rest. The URI format is https://<keyvault-dns-name>/keys/<key-name>/<key-version>.
Remarks
The az ml workspace command can be used for managing Azure Machine Learning workspaces.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: mlw-basic-prod
location: eastus
display_name: Basic workspace-example
description: This example shows a YML configuration for a basic workspace. In case you use this configuration to deploy a new workspace, since no existing dependent resources are specified, these will be automatically created.
hbi_workspace: false
tags:
  purpose: demonstration
Next steps
Install and use the CLI (v2)
CLI (v2) environment YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
conda_file (string or object): The standard conda YAML configuration file of the dependencies for a conda environment. If specified, image must be specified as well. Azure ML will build the conda environment on top of the Docker image provided.
Remarks
The az ml environment command can be used for managing Azure Machine Learning environments.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) data YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
Remarks
The az ml data commands can be used for managing Azure Machine Learning data assets.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) model YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
Remarks
The az ml model command can be used for managing Azure Machine Learning models.
Examples
Examples are available in the examples GitHub repository.
CLI (v2) compute cluster (AmlCompute) YAML schema
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
idle_time_before_scale_down (integer): Node idle time in seconds before scaling down the cluster. Default: 120.
ssh_public_access_enabled (boolean): Whether to enable public SSH access on the nodes of the cluster. Default: false.
ssh_settings.admin_username (string): The name of the administrator user account that can be used to SSH into nodes.
ssh_settings.admin_password (string): The password of the administrator user account. One of admin_password or ssh_key_value is required.
ssh_settings.ssh_key_value (string): The SSH public key of the administrator user account. One of admin_password or ssh_key_value is required.
network_settings.vnet_name (string): Name of the virtual network (VNet) when creating a new one or referencing an existing one.
identity.user_assigned_identities (array): List of fully qualified resource IDs of the user-assigned identities.
Remarks
The az ml compute commands can be used for managing Azure Machine Learning compute clusters
(AmlCompute).
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: minimal
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: minimal-example
type: amlcompute
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: basic-example
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 2
idle_time_before_scale_down: 120
Next steps
Install and use the CLI (v2)
CLI (v2) compute instance YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
create_on_behalf_of.user_tenant_id (string): The AAD Tenant ID of the assigned user.
create_on_behalf_of.user_object_id (string): The AAD Object ID of the assigned user.
ssh_public_access_enabled (boolean): Whether to enable public SSH access on the compute instance. Default: false.
ssh_settings.ssh_key_value (string): The SSH public key of the administrator user account.
network_settings.vnet_name (string): Name of the virtual network (VNet) when creating a new one or referencing an existing one.
Remarks
The az ml compute command can be used for managing Azure Machine Learning compute instances.
YAML: minimal
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: minimal-example-i
type: computeinstance
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: basic-example-i
type: computeinstance
size: STANDARD_DS3_v2
Next steps
Install and use the CLI (v2)
CLI (v2) attached Virtual Machine YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
ssh_settings.admin_username (string): The name of the administrator user account that can be used to SSH into the virtual machine.
ssh_settings.admin_password (string): The password of the administrator user account. One of admin_password or ssh_private_key_file is required.
ssh_settings.ssh_private_key_file (string): The local path to the SSH private key file of the administrator user account. One of admin_password or ssh_private_key_file is required.
Remarks
The az ml compute command can be used for managing Virtual Machines (VM) attached to an Azure Machine
Learning workspace.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/vmCompute.schema.json
name: vm-example
type: virtualmachine
resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Compute/virtualMachines/<VM_NAME>
ssh_settings:
  admin_username: <admin_username>
  admin_password: <admin_password>
Next steps
Install and use the CLI (v2)
CLI (v2) Attached Azure Arc-enabled Kubernetes
cluster (KubernetesCompute) YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
identity.user_assigned_identities (array): List of fully qualified resource IDs of the user-assigned identities.
Remarks
The az ml compute commands can be used for managing Azure Arc-enabled Kubernetes clusters
(KubernetesCompute) attached to an Azure Machine Learning workspace.
Next steps
Install and use the CLI (v2)
Configure and attach Kubernetes clusters anywhere
CLI (v2) command job YAML schema
5/25/2022 • 8 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
environment: To reference an existing environment, use the azureml:<environment_name>:<environment_version> syntax or azureml:<environment_name>@latest (to reference the latest version of an environment). To define an environment inline, please follow the Environment schema. Exclude the name and version properties as they are not supported for inline environments.
resources.instance_count (integer): The number of nodes to use for the job. Default: 1.
inputs: Inputs can be referenced in the command using the ${{ inputs.<input_name> }} expression.
outputs: Outputs can be referenced in the command using the ${{ outputs.<output_name> }} expression.
Distribution configurations
MpiConfiguration
PyTorchConfiguration
process_count_per_instance (integer): The number of processes per node to launch for the job. Default: 1.
TensorFlowConfiguration
Job inputs
path: The path to the data to use as the input, specified in one of the following ways:
- A URI of a cloud path to the file or folder to use as the input. Supported URI types are azureml, https, wasbs, abfss, adl. See Core yaml syntax for more information on how to use the azureml:// URI format.
- An existing registered Azure ML data asset to use as the input. To reference a registered data asset, use the azureml:<data_name>:<data_version> syntax or azureml:<data_name>@latest (to reference the latest version of that data asset), for example path: azureml:cifar10-data:1 or path: azureml:cifar10-data@latest.
Job outputs
Remarks
The az ml job command can be used for managing Azure Machine Learning jobs.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
This is a "hello world" job running in the cloud via Azure Machine Learning!
## Description
Markdown is supported in the studio for job descriptions! You can edit the description there or via CLI.
Next steps
Install and use the CLI (v2)
CLI (v2) sweep job YAML schema
5/25/2022 • 10 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
search_space: Hyperparameters can be referenced in the trial.command using the ${{ search_space.<hyperparameter> }} expression.
objective.primary_metric (string): Required. The name of the primary metric reported by each trial job. The metric must be logged in the user's training script using mlflow.log_metric() with the same corresponding metric name.
inputs: Inputs can be referenced in the command using the ${{ inputs.<input_name> }} expression.
outputs: Outputs can be referenced in the command using the ${{ outputs.<output_name> }} expression.
Sampling algorithms
RandomSamplingAlgorithm
GridSamplingAlgorithm
BayesianSamplingAlgorithm
Early termination policies
MedianStoppingPolicy
TruncationSelectionPolicy
Parameter expressions
choice
randint
qlognormal, qnormal
qloguniform, quniform
lognormal, normal
loguniform
uniform
trial.environment: To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, please follow the Environment schema. Exclude the name and version properties as they are not supported for inline environments.
Distribution configurations
MpiConfiguration
PyTorchConfiguration
process_count_per_instance (integer): The number of processes per node to launch for the job. Default: 1.
TensorFlowConfiguration
Job inputs
path: The path to the data to use as the input, specified in one of the following ways:
- A URI of a cloud path to the file or folder to use as the input. Supported URI types are azureml, https, wasbs, abfss, adl. See Core yaml syntax for more information on how to use the azureml:// URI format.
- An existing registered Azure ML data asset to use as the input. To reference a registered data asset, use the azureml:<data_name>:<data_version> syntax or azureml:<data_name>@latest (to reference the latest version of that data asset), for example path: azureml:cifar10-data:1 or path: azureml:cifar10-data@latest.
Job outputs
Remarks
The az ml job command can be used for managing Azure Machine Learning jobs.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) pipeline job YAML schema
5/25/2022 • 6 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
outputs: These pipeline outputs can be referenced by the outputs of an individual step job in the pipeline using the ${{ parent.outputs.<output_name> }} expression. For more information on how to bind the inputs of a pipeline step to the inputs of the top-level pipeline job, see the expression syntax for binding inputs and outputs between steps in a pipeline job.
Job inputs
path: The path to the data to use as the input, specified in one of the following ways:
- A URI of a cloud path to the file or folder to use as the input. Supported URI types are azureml, https, wasbs, abfss, adl. See Core yaml syntax for more information on how to use the azureml:// URI format.
- An existing registered Azure ML data asset to use as the input. To reference a registered data asset, use the azureml:<data_name>:<data_version> syntax or azureml:<data_name>@latest (to reference the latest version of that data asset), for example path: azureml:cifar10-data:1 or path: azureml:cifar10-data@latest.
Job outputs
Remarks
The az ml job commands can be used for managing Azure Machine Learning pipeline jobs.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:cpu-cluster
jobs:
  hello_job:
    command: echo 202204190 & echo "hello"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:23
  world_job:
    command: echo 202204190 & echo "hello"
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:23
inputs:
  hello_string_top_level_input: "hello world"
jobs:
  a:
    command: echo hello ${{inputs.hello_string}}
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    inputs:
      hello_string: ${{parent.inputs.hello_string_top_level_input}}
  b:
    command: echo "world" >> ${{outputs.world_output}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    outputs:
      world_output:
  c:
    command: echo ${{inputs.world_input}}/world.txt
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
    inputs:
      world_input: ${{parent.jobs.b.outputs.world_output}}
Next steps
Install and use the CLI (v2)
CLI (v2) Azure Blob datastore YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
Remarks
The az ml datastore command can be used for managing Azure Machine Learning datastores.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) Azure Files datastore YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
Remarks
The az ml datastore command can be used for managing Azure Machine Learning datastores.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) Azure Data Lake Gen1 YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
credentials.client_secret (string): The client secret of the service principal. Required if credentials is specified.
credentials.resource_url (string): The resource URL that determines what operations will be performed on the Azure Data Lake Storage Gen1 account. Default: https://datalake.azure.net/.
credentials.authority_url (string): The authority URL used to authenticate the user. Default: https://login.microsoftonline.com.
Remarks
The az ml datastore command can be used for managing Azure Machine Learning datastores.
Examples
Examples are available in the examples GitHub repository.
CLI (v2) Azure Data Lake Gen2 YAML schema
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
credentials.client_secret (string): The client secret of the service principal. Required if credentials is specified.
credentials.resource_url (string): The resource URL that determines what operations will be performed on the Azure Data Lake Storage Gen2 account. Default: https://storage.azure.com/.
credentials.authority_url (string): The authority URL used to authenticate the user. Default: https://login.microsoftonline.com.
Remarks
The az ml datastore command can be used for managing Azure Machine Learning datastores.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
Next steps
Install and use the CLI (v2)
CLI (v2) online endpoint YAML schema
5/25/2022 • 3 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
NOTE
A fully specified sample YAML for online endpoints is available for reference.
YAML syntax
identity.user_assigned_identities (array): List of fully qualified resource IDs of the user-assigned identities.
Remarks
The az ml online-endpoint commands can be used for managing Azure Machine Learning online endpoints.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
Next steps
Install and use the CLI (v2)
Learn how to deploy a model with a managed online endpoint
Troubleshooting managed online endpoints deployment and scoring (preview)
CLI (v2) batch endpoint YAML schema
5/25/2022 • 2 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
defaults.deployment_name (string): Name of the deployment that will serve as the default deployment for the endpoint.
Remarks
The az ml batch-endpoint commands can be used for managing Azure Machine Learning endpoints.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: mybatchedp
description: my sample batch endpoint
auth_mode: aad_token
Next steps
Install and use the CLI (v2)
CLI (v2) managed online deployment YAML schema
5/25/2022 • 5 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
model: To reference an existing model, use the azureml:<model-name>:<model-version> syntax. To define a model inline, follow the Model schema.
code_configuration.scoring_script (string): Relative path to the scoring file in the source code directory.
environment: To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema.
instance_count: can be updated after deployment creation by using the az ml online-deployment update command. We reserve an extra 20% for performing upgrades. For more information, see managed online endpoint quotas.
egress_public_network_access (string): This flag secures the deployment by restricting communication between the deployment and the Azure resources used by it. Set to disabled to ensure that the download of the model, code, and images needed by your deployment are secured with a private endpoint. This flag is applicable only for managed online endpoints. Allowed values: enabled, disabled. Default: enabled.
RequestSettings
max_concurrent_requests_per_instance (integer): The maximum number of concurrent requests per instance allowed for the deployment. Default: 1.
ProbeSettings
Remarks
The az ml online-deployment commands can be used for managing Azure Machine Learning managed online
deployments.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_DS2_v2
instance_count: 1
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: green
endpoint_name: my-endpoint
model:
  path: ../../model-2/model/
code_configuration:
  code: ../../model-2/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-2/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_DS2_v2
instance_count: 1
Next steps
Install and use the CLI (v2)
CLI (v2) Azure Arc-enabled Kubernetes online
deployment YAML schema
5/25/2022 • 5 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
model: To reference an existing model, use the azureml:<model-name>:<model-version> syntax. To define a model inline, follow the Model schema.
code_configuration.scoring_script (string): Relative path to the scoring file in the source code directory.
environment: To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema.
instance_count: can be updated after deployment creation by using the az ml online-deployment update command.
scale_settings: To configure the target_utilization scale type (scale_settings.type: target_utilization), see TargetUtilizationScaleSettings for the set of configurable properties.
RequestSettings
max_concurrent_requests_per_instance (integer): The maximum number of concurrent requests per instance allowed for the deployment. Default: 1.
ProbeSettings
TargetUtilizationScaleSettings
target_utilization_percentage (integer): The target CPU usage for the autoscaler. Default: 70.
ContainerResourceRequests
ContainerResourceLimits
memory (string): The limit for the memory size for the container.
Remarks
The az ml online-deployment commands can be used for managing Azure Machine Learning Kubernetes online
deployments.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) batch deployment YAML schema
5/25/2022 • 3 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
model: To reference an existing model, use the azureml:<model-name>:<model-version> syntax. To define a model inline, follow the Model schema.
code_configuration.scoring_script (string): The Python file in the above directory. This file must have an init() function and a run() function. Use the init() function for any costly or common preparation (for example, load the model in memory); init() will be called only once at the beginning of the process. Use run(mini_batch) to score each entry; the value of mini_batch is a list of file paths. The run() function should return a pandas DataFrame or an array, where each returned element indicates one successful run of an input element in the mini_batch. For more information on how to author a scoring script, see Understanding the scoring script. (A minimal sketch of such a script follows this table.)
environment: To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema.
resources.instance_count (integer): The number of nodes to use for each batch scoring job. Default: 1.
max_concurrency_per_instance (integer): The maximum number of parallel scoring_script runs per instance. Default: 1.
retry_settings.max_retries (integer): The maximum number of retries for a failed or timed-out mini batch. Default: 3.
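Putting that contract into code, the following is a minimal hedged sketch of such a scoring script; the model file name (model.joblib) is an assumption for illustration, while AZUREML_MODEL_DIR is the environment variable Azure ML sets to the model directory:

import os
import pandas as pd
from joblib import load

model = None

def init():
    # Called once per process; load the model from the registered model directory.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.joblib")  # assumed file name
    model = load(model_path)

def run(mini_batch):
    # mini_batch is a list of file paths; return one element per successfully processed input.
    results = []
    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        predictions = model.predict(data)
        results.append(f"{os.path.basename(file_path)}: {len(predictions)} predictions")
    return pd.DataFrame({"result": results})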
Remarks
The az ml batch-deployment commands can be used for managing Azure Machine Learning batch deployments.
Examples
Examples are available in the examples GitHub repository.
Next steps
Install and use the CLI (v2)
CLI (v2) command component YAML schema
5/25/2022 • 3 minutes to read • Edit Online
NOTE
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2
extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the
schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
environment: To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema. Exclude the name and version properties as they are not supported for inline environments.
resources.instance_count (integer): The number of nodes to use for the job. Default: 1.
inputs: Inputs can be referenced in the command using the ${{ inputs.<input_name> }} expression.
outputs: Outputs can be referenced in the command using the ${{ outputs.<output_name> }} expression.
Distribution configurations
MpiConfiguration
PyTorchConfiguration
process_count_per_instance (integer): The number of processes per node to launch for the job. Default: 1.
TensorFlowConfiguration
Component input
Component output
Remarks
The az ml component commands can be used for managing Azure Machine Learning components.
Examples
Command component examples are available in the examples GitHub repository. Several are shown below.
name: hello_python_world
display_name: Hello_Python_World
version: 1
code: ./src
environment:
  image: python
command: >-
  python hello.py
Next steps
Install and use the CLI (v2)
Data schemas to train computer vision models with
automated machine learning
5/25/2022 • 9 minutes to read • Edit Online
Learn how to format your JSONL files for data consumption in automated ML experiments for computer vision
tasks during training and inference.
Image classification (multi-class)
The following is an example JSONL file for multi-class image classification.
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":"class_name",
}
format: Image type; all the image formats available in the Pillow library are supported. Optional; a string from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
Image classification multi-label
The following is an example JSONL file for multi-label image classification.
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
"class_name_1",
"class_name_2",
"class_name_3",
"...",
"class_name_n"
]
}
format: Image type; all the image formats available in the Pillow library are supported. Optional; a string from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
Object detection
The following is an example JSONL file for object detection.
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
{
"label":"class_name_1",
"topX":"xmin/width",
"topY":"ymin/height",
"bottomX":"xmax/width",
"bottomY":"ymax/height",
"isCrowd":"isCrowd"
},
{
"label":"class_name_2",
"topX":"xmin/width",
"topY":"ymin/height",
"bottomX":"xmax/width",
"bottomY":"ymax/height",
"isCrowd":"isCrowd"
},
"..."
]
}
Here,
xmin = x coordinate of top-left corner of bounding box
ymin = y coordinate of top-left corner of bounding box
xmax = x coordinate of bottom-right corner of bounding box
ymax = y coordinate of bottom-right corner of bounding box
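A small Python sketch of this normalization; the pixel coordinates and image dimensions are illustrative values for the example.

def normalize_box(xmin, ymin, xmax, ymax, width, height):
    # Convert pixel-space corners into the [0, 1] fractions the schema expects.
    return {
        "topX": xmin / width,
        "topY": ymin / height,
        "bottomX": xmax / width,
        "bottomY": ymax / height,
    }

# A box from (100, 50) to (300, 200) in a 640x480 image:
print(normalize_box(100, 50, 300, 200, 640, 480))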
format: Image type; all the image formats available in the Pillow library are supported (but for YOLO, only the image formats allowed by OpenCV are supported). Optional; a string from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
label (outer key): List of bounding boxes, where each box is a dictionary of label, topX, topY, bottomX, bottomY, and isCrowd, with the box's top-left and bottom-right coordinates. Required; a list of dictionaries. Example: [{"label": "cat", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}].
Instance segmentation
The following is an example JSONL file for instance segmentation.
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
{
"label":"class_name",
"isCrowd":"isCrowd",
"polygon":[["x1", "y1", "x2", "y2", "x3", "y3", "...", "xn", "yn"]]
}
]
}
label (outer key): List of masks, where each mask is a dictionary of label, isCrowd, and polygon coordinates. Required; a list of dictionaries. Example: [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689, 0.562, 0.681, 0.559, 0.686]]}].
polygon: Polygon coordinates for the object. Required; a list of lists for multiple segments of the same instance; float values in the range [0,1]. Example: [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686]].
Output format
Predictions made on model endpoints follow a different structure depending on the task type. This section explores the output data formats for multi-class and multi-label image classification, object detection, and instance segmentation tasks.
Image classification
The endpoint for image classification returns all the labels in the dataset and their probability scores for the input image in the following format.
{
"filename":"/tmp/tmppjr4et28",
"probs":[
2.098e-06,
4.783e-08,
0.999,
8.637e-06
],
"labels":[
"can",
"carton",
"milk_bottle",
"water_bottle"
]
}
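To pick the top prediction from such a response, a minimal Python sketch (using the example payload above) could look like this:

import json

response = json.loads(
    '{"probs": [2.098e-06, 4.783e-08, 0.999, 8.637e-06], '
    '"labels": ["can", "carton", "milk_bottle", "water_bottle"]}'
)
# Pair each probability with its label and take the highest-scoring pair.
score, label = max(zip(response["probs"], response["labels"]))
print(f"predicted: {label} (score {score:.3f})")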
Image classification multi-label
For image classification multi-label, the model endpoint returns labels and their probabilities.
{
"filename":"/tmp/tmpsdzxlmlm",
"probs":[
0.997,
0.960,
0.982,
0.025
],
"labels":[
"can",
"carton",
"milk_bottle",
"water_bottle"
]
}
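Because several labels can apply to one image, a common post-processing step is to keep every label whose probability clears a threshold; the 0.5 cutoff in this Python sketch is illustrative, not prescribed by the service.

result = {
    "probs": [0.997, 0.960, 0.982, 0.025],
    "labels": ["can", "carton", "milk_bottle", "water_bottle"],
}
threshold = 0.5  # illustrative cutoff
predicted = [label for prob, label in zip(result["probs"], result["labels"])
             if prob >= threshold]
print(predicted)  # ['can', 'carton', 'milk_bottle']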
Object detection
The object detection model returns multiple boxes with their scaled top-left and bottom-right coordinates, along with the box label and confidence score.
{
"filename":"/tmp/tmpdkg2wkdy",
"boxes":[
{
"box":{
"topX":0.224,
"topY":0.285,
"bottomX":0.399,
"bottomY":0.620
},
"label":"milk_bottle",
"score":0.937
},
{
"box":{
"topX":0.664,
"topY":0.484,
"bottomX":0.959,
"bottomY":0.812
},
"label":"can",
"score":0.891
},
{
"box":{
"topX":0.423,
"topY":0.253,
"bottomX":0.632,
"bottomY":0.725
},
"label":"water_bottle",
"score":0.876
}
]
}
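Because the box coordinates are scaled to [0, 1], they must be multiplied by the original image dimensions before drawing. A minimal Python sketch, assuming an illustrative 640x480 image and an illustrative 0.5 confidence cutoff:

prediction = {
    "boxes": [
        {
            "box": {"topX": 0.224, "topY": 0.285, "bottomX": 0.399, "bottomY": 0.620},
            "label": "milk_bottle",
            "score": 0.937,
        },
    ]
}
width, height = 640, 480  # assumed original image size
for detection in prediction["boxes"]:
    if detection["score"] >= 0.5:  # illustrative confidence cutoff
        box = detection["box"]
        # Scale normalized corners back to pixel coordinates.
        x1, y1 = round(box["topX"] * width), round(box["topY"] * height)
        x2, y2 = round(box["bottomX"] * width), round(box["bottomY"] * height)
        print(detection["label"], detection["score"], (x1, y1, x2, y2))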
Instance segmentation
In instance segmentation, the output consists of multiple boxes with their scaled top-left and bottom-right coordinates, labels, confidence scores, and polygons (not masks). Here, the polygon values are in the same format that we discussed in the schema section.
{
"filename":"/tmp/tmpi8604s0h",
"boxes":[
{
"box":{
"topX":0.679,
"topY":0.491,
"bottomX":0.926,
"bottomY":0.810
},
"label":"can",
"score":0.992,
"polygon":[
[
0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715,
0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520,
0.735, 0.505, 0.745, 0.502, 0.755, 0.493
]
]
},
{
"box":{
"topX":0.220,
"topY":0.298,
"bottomX":0.397,
"bottomY":0.601
},
"label":"milk_bottle",
"score":0.989,
"polygon":[
[
0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25,
0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385,
0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
]
]
},
{
"box":{
"topX":0.433,
"topY":0.280,
"bottomX":0.621,
"bottomY":0.679
},
"label":"water_bottle",
"score":0.988,
"polygon":[
[
0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440,
0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381,
0.468, 0.327, 0.471, 0.318
]
]
}
]
}
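Polygon values alternate x and y in the [0, 1] range, so they can be scaled back to pixel coordinates the same way; the image dimensions in this Python sketch are illustrative.

polygon = [0.576, 0.680, 0.501, 0.680, 0.475, 0.675]
width, height = 640, 480  # assumed original image size
points = [
    (round(x * width), round(y * height))
    for x, y in zip(polygon[0::2], polygon[1::2])  # even indexes are x, odd are y
]
print(points)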
NOTE
The images used in this article are from the Fridge Objects dataset, copyright © Microsoft Corporation and available at
computervision-recipes/01_training_introduction.ipynb under the MIT License.
Next steps
Learn how to Prepare data for training computer vision models with automated ML.
Set up computer vision tasks in AutoML
Tutorial: Train an object detection model (preview) with AutoML and Python.
Hyperparameters for computer vision tasks in
automated machine learning
5/25/2022 • 7 minutes to read • Edit Online
Learn which hyperparameters are available specifically for computer vision tasks in automated ML experiments.
With support for computer vision tasks, you can control the model algorithm and sweep hyperparameters.
These model algorithms and hyperparameters are passed in as the parameter space for the sweep. While many
of the hyperparameters exposed are model-agnostic, there are instances where hyperparameters are model-
specific or task-specific.
Model-specific hyperparameters
This table summarizes hyperparameters specific to the yolov5 algorithm.
This table summarizes hyperparameters specific to the maskrcnn_* algorithms for instance segmentation during inference.
checkpoint_frequency: Frequency to store model checkpoints. Must be a positive integer. Default: checkpoint at the epoch with the best primary metric on validation.
Notes:
seresnext doesn't take an arbitrary size.
ViT-variants should have the same valid_crop_size and train_crop_size.
The training run may hit CUDA OOM if the size is too big.
WARNING
These parameters are not supported with the yolov5 algorithm. See the model-specific hyperparameters section for yolov5-supported hyperparameters.
Azure Machine Learning monitoring data reference
Learn about the data and resources collected by Azure Monitor from your Azure Machine Learning workspace.
See Monitoring Azure Machine Learning for details on collecting and analyzing monitoring data.
Metrics
This section lists the platform metrics collected automatically for Azure Machine Learning. The resource provider for these metrics is Microsoft.MachineLearningServices/workspaces.
Model
Quota
Quota information is for Azure Machine Learning compute only.
Resource
Run
Information on training runs for the workspace.
Metric dimensions
For more information on what metric dimensions are, see Multi-dimensional metrics.
Azure Machine Learning has the following dimensions associated with its metrics.
Cluster Name: The name of the compute cluster resource. Available for all quota metrics.
Vm Family Name: The name of the VM family used by the cluster. Available for quota utilization percentage.
ComputeType: The compute type that the run used. Only available for Completed runs, Failed runs, and Started runs.
PipelineStepType: The type of PipelineStep used in the run. Only available for Completed runs, Failed runs, and Started runs.
RunType: The type of run. Only available for Completed runs, Failed runs, and Started runs.
Activity log
The following table lists the operations related to Azure Machine Learning that may be created in the Activity
log.
Creates or updates the compute resources: A compute resource was created or updated.
Resource logs
This section lists the types of resource logs you can collect for Azure Machine Learning workspace.
Resource Provider and Type: Microsoft.MachineLearningServices/workspaces.
For each category, the display name is identical to the category name:
AmlComputeClusterEvent
AmlComputeCpuGpuUtilization
AmlComputeJobEvent
AmlRunStatusChangedEvent
ModelsChangeEvent
ModelsReadEvent
ModelsActionEvent
DeploymentReadEvent
DeploymentEventACI
DeploymentEventAKS
InferencingOperationAKS
InferencingOperationACI
EnvironmentChangeEvent
EnvironmentReadEvent
DataLabelChangeEvent
DataLabelReadEvent
ComputeInstanceEvent
DataStoreChangeEvent
DataStoreReadEvent
DataSetChangeEvent
DataSetReadEvent
PipelineChangeEvent
PipelineReadEvent
RunEvent
RunReadEvent
Schemas
The following schemas are in use by Azure Machine Learning.
AmlComputeJobEvent table
ExecutionState: State of the job (the Run); for example, Queued, Running, Succeeded, or Failed.
AmlComputeClusterEvent table
AmlComputeClusterNodeEvent table
NOTE
Effective February 2022, the AmlComputeClusterNodeEvent table will be deprecated. We recommend that you instead
use the AmlComputeClusterEvent table.
AmlComputeInstanceEvent table
OperationName: The name of the operation associated with the log entry.
AmlComputeInstanceName: The name of the compute instance associated with the log entry.
AmlDataLabelEvent table
OperationName: The name of the operation associated with the log entry.
AmlLabelNames: The label class names that are created for the project.
AmlDataStoreName: The name of the data store where the project's data is stored.
AmlDataSetEvent table
OperationName: The name of the operation associated with the log entry.
AmlDataStoreEvent table
OperationName: The name of the operation associated with the log entry.
AmlDeploymentEvent table
OperationName: The name of the operation associated with the log entry.
AmlInferencingEvent table
OperationName: The name of the operation associated with the log entry.
AmlModelsEvent table
OperationName: The name of the operation associated with the log entry.
ResultSignature: The HTTP status code of the event. Typical values include 200, 201, and 202.
AmlPipelineEvent table
OperationName: The name of the operation associated with the log entry.
AmlParentPipelineId: The ID of the parent AML pipeline (in the case of cloning).
AmlRunEvent table
OperationName: The name of the operation associated with the log entry.
AmlEnvironmentEvent table
OperationName: The name of the operation associated with the log entry.
See also
See Monitoring Azure Machine Learning for a description of monitoring Azure Machine Learning.
See Monitoring Azure resources with Azure Monitor for details on monitoring Azure resources.
Azure Policy built-in policy definitions for Azure
Machine Learning
5/25/2022 • 4 minutes to read • Edit Online
This page is an index of Azure Policy built-in policy definitions for Azure Machine Learning. Common use cases
for Azure Policy include implementing governance for resource consistency, regulatory compliance, security,
cost, and management. Policy definitions for these common use cases are already available in your Azure
environment as built-ins to help you get started. For additional Azure Policy built-ins for other services, see
Azure Policy built-in definitions.
The name of each built-in policy definition links to the policy definition in the Azure portal. Use the link in the
GitHub column to view the source on the Azure Policy GitHub repo.
Azure Machine Learning workspaces should be encrypted with a customer-managed key (effects: Audit, Deny, Disabled; version 1.0.3). Manage encryption at rest of Azure Machine Learning workspace data with customer-managed keys. By default, customer data is encrypted with service-managed keys, but customer-managed keys are commonly required to meet regulatory compliance standards. Customer-managed keys enable the data to be encrypted with an Azure Key Vault key created and owned by you. You have full control and responsibility for the key lifecycle, including rotation and management. Learn more at https://aka.ms/azureml-workspaces-cmk.
Azure Machine Learning workspaces should disable public network access (effects: Audit, Deny, Disabled; version 1.2.0). Disabling public network access improves security by ensuring that the machine learning workspaces aren't exposed on the public internet. You can limit exposure of your workspaces by creating private endpoints instead. Learn more at: https://aka.ms/privateendpoints.
Azure Machine Learning workspaces should use private link (effects: Audit, Deny, Disabled; version 1.1.0). Azure Private Link lets you connect your virtual network to Azure services without a public IP address at the source or destination. The Private Link platform handles the connectivity between the consumer and services over the Azure backbone network. By mapping private endpoints to Azure Machine Learning workspaces, data leakage risks are reduced. Learn more about private links at: https://docs.microsoft.com/azure/machine-learning/how-to-configure-private-link.
Azure Machine Learning workspaces should use user-assigned managed identity (effects: Audit, Deny, Disabled; version 1.0.0). Manage access to Azure ML workspace and associated resources, Azure Container Registry, KeyVault, Storage, and App Insights using user-assigned managed identity. By default, system-assigned managed identity is used by Azure ML workspace to access the associated resources. User-assigned managed identity allows you to create the identity as an Azure resource and maintain the life cycle of that identity. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-use-managed-identities?tabs=python.
Configure Azure Machine Learning workspace to use private DNS zones (effects: DeployIfNotExists, Disabled; version 1.0.0). Use private DNS zones to override the DNS resolution for a private endpoint. A private DNS zone links to your virtual network to resolve to Azure Machine Learning workspaces. Learn more at: https://docs.microsoft.com/azure/machine-learning/how-to-network-security-overview.
Next steps
See the built-ins on the Azure Policy GitHub repo.
Review the Azure Policy definition structure.
Review Understanding policy effects.
Azure Machine Learning Curated Environments
5/25/2022 • 2 minutes to read • Edit Online
This article lists the curated environments with latest framework versions in Azure Machine Learning. Curated
environments are provided by Azure Machine Learning and are available in your workspace by default. They are
backed by cached Docker images that use the latest version of the Azure Machine Learning SDK, reducing the
run preparation cost and allowing for faster deployment time. Use these environments to quickly get started
with various machine learning frameworks.
NOTE
Use the Python SDK, CLI, or Azure Machine Learning studio to get the full list of environments and their dependencies.
For more information, see the environments article.
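For example, a short SDK v1 sketch that lists the workspace environments and filters for curated ones; the local workspace config file and the AzureML- prefix check are assumptions of this example.

from azureml.core import Environment, Workspace

ws = Workspace.from_config()  # assumes a local workspace config file
environments = Environment.list(workspace=ws)
for name in sorted(environments):
    if name.startswith("AzureML-"):  # curated environments use the AzureML- prefix
        print(name)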
IMPORTANT
To view more information about curated environment packages and versions, visit the Environments tab in the Azure
Machine Learning studio.
Automated ML (AutoML)
Azure ML pipeline training workflows that use AutoML automatically select a curated environment based on the compute type and whether DNN is enabled. AutoML provides the following curated environments:
AzureML-AutoML: compute type CPU, DNN enabled: No.
AzureML-AutoML-GPU: compute type GPU, DNN enabled: No.
For more information on AutoML and Azure ML pipelines, see use automated ML in an Azure Machine Learning
pipeline in Python.
PyTorch
SciKit-Learn
ONNX Runtime
XGBoost
No framework
Framework version: NA. CPU/GPU: CPU. Pre-installed packages: NA. MCR path: mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:latest. Curated environment: AzureML-minimal-ubuntu18.04-py37-cpu-inference.
Support
Version updates for supported environments, including the base images they reference, are released every two
weeks to address vulnerabilities no older than 30 days. Based on usage, some environments may be deprecated
(hidden from the product but usable) to support more common machine learning scenarios.
Azure Machine Learning Python SDK release notes
5/25/2022 • 133 minutes to read • Edit Online
In this article, learn about Azure Machine Learning Python SDK releases. For the full SDK reference content, visit
the Azure Machine Learning's main SDK for Python reference page.
RSS feed: Get notified when this page is updated by copying and pasting the following URL into your feed
reader:
https://docs.microsoft.com/api/search/rss?search=%22Azure+machine+learning+release+notes%22&locale=en-us
2022-04-25
Azure Machine Learning SDK for Python v1.41.0
Breaking change warning
This breaking change comes from the June release of azureml-inference-server-http . In the
azureml-inference-server-http June release (v0.9.0), Python 3.6 support will be dropped. Since
azureml-defaults depends on azureml-inference-server-http , this change will be propagated to
azureml-defaults. If you are not using azureml-defaults for inference, feel free to use azureml-core or any other AzureML SDK packages directly instead of installing azureml-defaults.
azureml-automl-dnn-nlp
Turned on the long range text feature by default.
azureml-automl-dnn-vision
Changing the ObjectAnnotation Class type from object to "dataobject".
azureml-core
This release updates the Keyvault class used by customers to enable them to provide the keyvault
content type when creating a secret using the SDK. This release also updates the SDK to include a new
function that enables customers to retrieve the value of the content type from a specific secret.
azureml-interpret
updated azureml-interpret package to interpret-community 0.25.0
azureml-pipeline-core
Run details are no longer printed if pipeline_run.wait_for_completion is called with show_output=False.
azureml-train-automl-runtime
Fixes a bug that would cause code generation to fail when the azureml-contrib-automl-dnn-
forecasting package is present in the training environment.
Fix error when using a test dataset without a label column with AutoML Model Testing.
2022-03-28
Azure Machine Learning SDK for Python v1.40.0
azureml-automl-dnn-nlp
The Long Range Text feature is now optional and enabled only if customers explicitly opt in for it, using the kwarg "enable_long_range_text".
Added a data validation layer for the multi-class classification scenario, which leverages the same base class as multilabel for common validations, and a derived class for additional task-specific data validation checks.
azureml-automl-dnn-vision
Fixing KeyError while computing class weights.
azureml-contrib-reinforcementlearning
SDK warning message for upcoming deprecation of RL service
azureml-core
Return logs for runs that went through our new runtime when calling any of the get-logs functions on the run object, including run.get_details, run.get_all_logs, etc.
Added experimental method Datastore.register_onpremises_hdfs to allow users to create datastores
pointing to on-premises HDFS resources.
Updating the cli documentation in the help command
azureml-interpret
For azureml-interpret package, remove shap pin with packaging update. Remove numba and numpy
pin after CE env update.
azureml-mlflow
Bugfix for MLflow deployment client run_local failing when config object wasn't provided.
azureml-pipeline-steps
Remove broken link of deprecated pipeline EstimatorStep
azureml-responsibleai
update azureml-responsibleai package to raiwidgets and responsibleai 0.17.0 release
azureml-train-automl-runtime
Code generation for automated ML now supports ForecastTCN models (experimental).
Models created via code generation will now have all metrics calculated by default (except normalized mean absolute error, normalized median absolute error, normalized RMSE, and normalized RMSLE in the case of forecasting models). The list of metrics to be calculated can be changed by editing the return value of get_metrics_names(). Cross validation will now be used by default for forecasting models created via code generation.
azureml-training-tabular
The list of metrics to be calculated can be changed by editing the return value of get_metrics_names() .
Cross validation will now be used by default for forecasting models created via code generation.
Converting decimal type y-test into float to allow for metrics computation to proceed without errors.
2022-02-28
Azure Machine Learning SDK for Python v1.39.0
azureml-automl-core
Fix incorrect form displayed in PBI for integration with AutoML regression models
Added a min-label-classes check for both classification tasks (multi-class and multi-label). It will throw an error for the customer's run if the unique number of classes in the input training dataset is fewer than 2; it is meaningless to run classification on fewer than two classes.
azureml-automl-runtime
Converting decimal type y-test into float to allow for metrics computation to proceed without errors.
Automl training now supports numpy version 1.8.
azureml-contrib-automl-dnn-forecasting
Fixed a bug in the TCNForecaster model where not all training data would be used when cross-
validation settings were provided.
Fixed a bug in the TCNForecaster wrapper's forecast method that was corrupting inference-time predictions. Also fixed an issue where the forecast method would not use the most recent context data in train-valid scenarios.
azureml-interpret
For azureml-interpret package, remove shap pin with packaging update. Remove numba and numpy
pin after CE env update.
azureml-responsibleai
azureml-responsibleai package to raiwidgets and responsibleai 0.17.0 release
azureml-synapse
Fix the issue that magic widget is disappeared.
azureml-train-automl-runtime
Updating AutoML dependencies to support Python 3.8. This change will break compatibility with
models trained with SDK 1.37 or below due to newer Pandas interfaces being saved in the model.
Automl training now supports numpy version 1.19
Fix automl reset index logic for ensemble models in automl_setup_model_explanations API
In automl, use lightgbm surrogate model instead of linear surrogate model for sparse case after latest
lightgbm version upgrade
All internal intermediate artifacts that are produced by AutoML are now stored transparently on the parent run (instead of being sent to the default workspace blob store). Users should be able to see the artifacts that AutoML generates under the outputs/ directory on the parent run.
2022-01-24
Azure Machine Learning SDK for Python v1.38.0
azureml-automl-core
Tabnet Regressor and Tabnet Classifier support in AutoML
Saving data transformer in parent run outputs, which can be reused to produce same featurized
dataset which was used during the experiment run
Supporting getting primary metrics for Forecasting task in get_primary_metrics API.
Renamed second optional parameter in v2 scoring scripts as GlobalParameters
azureml-automl-dnn-vision
Added the scoring metrics in the metrics UI
azureml-automl-runtime
Bug fix for cases where the algorithm name for NimbusML models may show up as empty strings,
either on the ML Studio, or on the console outputs.
azureml-core
Added parameter blobfuse_enabled in
azureml.core.webservice.aks.AksWebservice.deploy_configuration. When this parameter is true,
models and scoring files will be downloaded with blobfuse instead of the blob storage API.
azureml-interpret
Updated azureml-interpret to interpret-community 0.24.0
In azureml-interpret update scoring explainer to support latest version of lightgbm with sparse
TreeExplainer
Update azureml-interpret to interpret-community 0.23.*
azureml-pipeline-core
Add note in pipelinedata, recommend user to use pipeline output dataset instead.
azureml-pipeline-steps
Add environment_variables to ParallelRunConfig, runtime environment variables can be passed by
this parameter and will be set on the process where the user script is executed.
azureml-train-automl-client
Tabnet Regressor and Tabnet Classifier support in AutoML
azureml-train-automl-runtime
Saving data transformer in parent run outputs, which can be reused to produce same featurized
dataset which was used during the experiment run
azureml-train-core
Enable support for early termination for Bayesian Optimization in Hyperdrive
Bayesian and GridParameterSampling objects can now pass on properties
2021-12-13
Azure Machine Learning SDK for Python v1.37.0
Breaking changes
azureml-core
Starting in version 1.37.0, AzureML SDK uses MSAL as the underlying authentication library.
MSAL uses Azure Active Directory (Azure AD) v2.0 authentication flow to provide more
functionality and increases security for token cache. For more details, see Overview of the
Microsoft Authentication Library (MSAL).
Update AML SDK dependencies to the latest version of Azure Resource Management Client
Library for Python (azure-mgmt-resource>=15.0.0,<20.0.0) & adopt track2 SDK.
Starting in version 1.37.0, the azure-cli-ml extension should be compatible with the latest version of Azure CLI >=2.30.0.
When using Azure CLI in a pipeline, such as Azure DevOps, ensure all tasks/stages are using versions of Azure CLI above v2.30.0 for MSAL-based Azure CLI. Azure CLI 2.30.0 is not
backward compatible with prior versions and throws an error when using incompatible
versions. To use Azure CLI credentials with AzureML SDK, Azure CLI should be installed as pip
package.
Bug fixes and improvements
azureml-core
Removed instance types from the attach workflow for Kubernetes compute. Instance types can
now directly be set up in the Kubernetes cluster. For more details, please visit
aka.ms/amlarc/doc.
azureml-interpret
updated azureml-interpret to interpret-community 0.22.*
azureml-pipeline-steps
Fixed a bug where the experiment "placeholder" might be created on submission of a Pipeline
with an AutoMLStep.
azureml-responsibleai
update azureml-responsibleai and compute instance environment to responsibleai and
raiwidgets 0.15.0 release
update azureml-responsibleai package to latest responsibleai 0.14.0.
azureml-tensorboard
You can now use Tensorboard(runs, use_display_name=True) to mount the TensorBoard logs to folders named after the run.display_name/run.id instead of run.id (a short sketch follows this release entry).
azureml-train-automl-client
Fixed a bug where the experiment "placeholder" might be created on submission of a Pipeline
with an AutoMLStep.
Update AutoMLConfig test_data and test_size docs to reflect preview status.
azureml-train-automl-runtime
Added new feature that allows users to pass time series grains with one unique value.
In certain scenarios, an AutoML model can predict NaNs. The rows that correspond to these
NaN predictions will be removed from test datasets and predictions before computing metrics
in test runs.
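A minimal sketch of the Tensorboard usage mentioned above; the experiment name is illustrative, and a local workspace config file is assumed.

from azureml.core import Experiment, Workspace
from azureml.tensorboard import Tensorboard

ws = Workspace.from_config()
runs = list(Experiment(ws, "my-experiment").get_runs())  # illustrative experiment name
tb = Tensorboard(runs, use_display_name=True)
print(tb.start())  # logs are mounted under folders named run.display_name/run.id
tb.stop()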
2021-11-08
Azure Machine Learning SDK for Python v1.36.0
Bug fixes and improvements
azureml-automl-dnn-vision
Cleaned up minor typos on some error messages.
azureml-contrib-reinforcementlearning
Submitting Reinforcement Learning runs that use simulators is no longer supported.
azureml-core
Added support for partitioned premium blob.
Specifying non-public clouds for Managed Identity authentication is no longer supported.
User can migrate AKS web service to online endpoint and deployment which is managed by
CLI (v2).
The instance type for training jobs on Kubernetes compute targets can now be set via a
RunConfiguration property: run_config.kubernetescompute.instance_type.
azureml-defaults
Removed redundant dependencies like gunicorn and werkzeug
azureml-interpret
azureml-interpret package updated to 0.21.* version of interpret-community
azureml-pipeline-steps
Deprecate MpiStep in favor of using CommandStep for running ML training (including
distributed training) in pipelines.
azureml-train-automl-runtime
Update the AutoML model test predictions output format docs.
Added docstring descriptions for Naive, SeasonalNaive, Average, and SeasonalAverage
forecasting model.
Featurization summary is now stored as an artifact on the run (check for a file named
'featurization_summary.json' under the outputs folder)
Enable categorical indicators support for Tabnet Learner.
Add downsample parameter to automl_setup_model_explanations to allow users to get
explanations on all data without downsampling by setting this parameter to be false.
2021-10-11
Azure Machine Learning SDK for Python v1.35.0
Bug fixes and improvements
azureml-automl-core
Enable binary metrics calculation
azureml-contrib-fairness
Improve error message on failed dashboard download
azureml-core
Bug in specifying non-public clouds for Managed Identity authentication has been resolved.
Dataset.File.upload_directory() and Dataset.Tabular.register_pandas_dataframe() experimental
flags are now removed.
Experimental flags are now removed in partition_by() method of TabularDataset class.
azureml-pipeline-steps
Experimental flags are now removed for the partition_keys parameter of the
ParallelRunConfig class.
azureml-interpret
azureml-interpret package updated to interpret-community 0.20.*
azureml-mlflow
Made it possible to log artifacts and images with MLflow using subdirectories
azureml-responsibleai
Improve error message on failed dashboard download
azureml-train-automl-client
Added support for computer vision tasks such as Image Classification, Object Detection and
Instance Segmentation. Detailed documentation can be found at: How to automatically train
image models
Enable binary metrics calculation
azureml-train-automl-runtime
Add TCNForecaster support to model test runs.
Update the model test predictions.csv output format. The output columns now include the original target values and the features that were passed in to the test run. This can be turned off by setting test_include_predictions_only=True in AutoMLConfig or by setting include_predictions_only=True in ModelProxy.test(). If the user has requested to only include predictions, then the output format looks like (forecasting is the same as regression):
Classification => [predicted values] [probabilities]
Regression => [predicted values]
Otherwise (the default):
Classification => [original test data labels] [predicted values] [probabilities] [features]
Regression => [original test data labels] [predicted values] [features]
The [predicted values] column name = [label column name] + "_predicted". The [probabilities] column names = [class name] + "_predicted_proba". If no target column was passed in as input to the test run, then [original test data labels] will not be in the output.
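A minimal pandas sketch for inspecting such a predictions.csv, relying only on the column-name suffixes described above; the file path is illustrative.

import pandas as pd

df = pd.read_csv("predictions.csv")  # illustrative path to a test-run output
predicted_cols = [c for c in df.columns if c.endswith("_predicted")]
proba_cols = [c for c in df.columns if c.endswith("_predicted_proba")]
print("predicted value columns:", predicted_cols)
print("probability columns:", proba_cols)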
2021-09-07
Azure Machine Learning SDK for Python v1.34.0
Bug fixes and improvements
azureml-automl-core
Added support for re-fitting a previously trained forecasting pipeline.
Added ability to get predictions on the training data (in-sample prediction) for forecasting.
azureml-automl-runtime
Add support to return predicted probabilities from a deployed endpoint of an AutoML classifier
model.
Added a forecasting option for users to specify that all predictions should be integers.
Removed the target column name from being part of model explanation feature names for
local experiments with training_data_label_column_name
as dataset inputs.
Added support for re-fitting a previously trained forecasting pipeline.
Added ability to get predictions on the training data (in-sample prediction) for forecasting.
azureml-core
Added support to set stream column type, mount and download stream columns in tabular
dataset.
New optional fields added to Kubernetes.attach_configuration(identity_type=None,
identity_ids=None) which allow attaching KubernetesCompute with either SystemAssigned or
UserAssigned identity. New identity fields will be included when calling print(compute_target)
or compute_target.serialize(): identity_type, identity_id, principal_id, and tenant_id/client_id.
azureml-dataprep
Added support to set stream column type for tabular dataset. added support to mount and
download stream columns in tabular dataset.
azureml-defaults
The dependency azureml-inference-server-http==0.3.1 has been added to azureml-defaults .
azureml-mlflow
Allow pagination of list_experiments API by adding max_results and page_token optional
params. For documentation, see MLflow official docs.
azureml-sdk
Replaced dependency on deprecated package(azureml-train) inside azureml-sdk.
Add azureml-responsibleai to azureml-sdk extras
azureml-train-automl-client
Expose the test_data and test_size parameters in AutoMLConfig. These parameters can be used to automatically start a test run after the model training phase has been completed. The test run will compute predictions using the best model and will generate metrics given these predictions.
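A minimal sketch of the test_data usage described above; the dataset and column names are illustrative, and a local workspace config file is assumed.

from azureml.core import Dataset, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, "train-data")  # illustrative dataset name
test_ds = Dataset.get_by_name(ws, "test-data")    # illustrative dataset name

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_ds,
    test_data=test_ds,           # triggers an automatic test run after training
    label_column_name="target",  # illustrative label column
    primary_metric="accuracy",
)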
2021-08-24
Azure Machine Learning Experimentation User Interface
Run Delete
Run Delete is a new functionality that allows users to delete one or multiple runs from their
workspace.
This functionality can help users reduce storage costs and manage storage capacity by regularly
deleting runs and experiments from the UI directly.
Batch Cancel Run
Batch Cancel Run is new functionality that allows users to select one or multiple runs to cancel from
their run list.
This functionality can help users cancel multiple queued runs and free up space on their cluster.
2021-08-18
Azure Machine Learning Experimentation User Interface
Run Display Name
The Run Display Name is a new, editable and optional display name that can be assigned to a run.
This name can help with more effectively tracking, organizing and discovering the runs.
The Run Display Name is defaulted to an adjective_noun_guid format (Example:
awesome_watch_2i3uns).
This default name can be edited to a more customizable name. This can be edited from the Run details
page in the Azure Machine Learning studio user interface.
2021-08-02
Azure Machine Learning SDK for Python v1.33.0
Bug fixes and improvements
azureml-automl-core
Improved error handling around XGBoost model retrieval.
Added possibility to convert the predictions from float to integers for forecasting and
regression tasks.
Updated default value for enable_early_stopping in AutoMLConfig to True.
azureml-automl-runtime
Added possibility to convert the predictions from float to integers for forecasting and
regression tasks.
Updated default value for enable_early_stopping in AutoMLConfig to True.
azureml-contrib-automl-pipeline-steps
Hierarchical timeseries (HTS) is enabled for forecasting tasks through pipelines.
Add Tabular dataset support for inferencing
Custom path can be specified for the inference data
azureml-contrib-reinforcementlearning
Some properties in azureml.core.environment.DockerSection are deprecated, such as shm_size
property used by Ray workers in reinforcement learning jobs. This property can now be
specified in azureml.contrib.train.rl.WorkerConfiguration instead.
azureml-core
Fixed a hyperlink in ScriptRunConfig.distributed_job_config documentation
Azure Machine Learning compute clusters can now be created in a location different from the
location of the workspace. This is useful for maximizing idle capacity allocation and managing
quota utilization across different locations without having to create more workspaces just to
use quota and create a compute cluster in a particular location. For more information, see
Create an Azure Machine Learning compute cluster.
Added display_name as a mutable name field of Run object.
Dataset from_files now supports skipping of data extensions for large input data
azureml-dataprep
Fixed a bug where to_dask_dataframe would fail because of a race condition.
Dataset from_files now supports skipping of data extensions for large input data
azureml-defaults
We are removing the dependency azureml-model-management-sdk==1.0.1b6.post1 from
azureml-defaults.
azureml-interpret
updated azureml-interpret to interpret-community 0.19.*
azureml-pipeline-core
Hierarchical timeseries (HTS) is enabled for forecasting tasks through pipelines.
azureml-train-automl-client
Switch to using blob store for caching in Automated ML.
Hierarchical timeseries (HTS) is enabled for forecasting tasks through pipelines.
Improved error handling around XGBoost model retrieval.
Updated default value for enable_early_stopping in AutoMLConfig to True.
azureml-train-automl-runtime
Switch to using blob store for caching in Automated ML.
Hierarchical timeseries (HTS) is enabled for forecasting tasks through pipelines.
Updated default value for enable_early_stopping in AutoMLConfig to True.
2021-07-06
Azure Machine Learning SDK for Python v1.32.0
Bug fixes and improvements
azureml-core
Expose diagnose workspace health in SDK/CLI
azureml-defaults
Added opencensus-ext-azure==1.0.8 dependency to azureml-defaults
azureml-pipeline-core
Updated the AutoMLStep to use prebuilt images when the environment for job submission
matches the default environment
azureml-responsibleai
New error analysis client added to upload, download and list error analysis reports
Ensure raiwidgets and responsibleai packages are version synchronized
azureml-train-automl-runtime
Set the time allocated to dynamically search across various featurization strategies to a
maximum of one-fourth of the overall experiment timeout
2021-06-21
Azure Machine Learning SDK for Python v1.31.0
Bug fixes and improvements
azureml-core
Improved documentation for platform property on Environment class
Changed default AML Compute node scale down time from 120 seconds to 1800 seconds
Updated default troubleshooting link displayed on the portal for troubleshooting failed runs to:
https://aka.ms/azureml-run-troubleshooting
azureml-automl-runtime
Data Cleaning: Samples with target values in [None, "", "nan", np.nan] will be dropped prior to
featurization and/or model training
azureml-interpret
Prevent flush task queue error on remote AzureML runs that use ExplanationClient by
increasing timeout
azureml-pipeline-core
Add jar parameter to synapse step
azureml-train-automl-runtime
Fix high cardinality guardrails to be more aligned with docs
2021-06-07
Azure Machine Learning SDK for Python v1.30.0
Bug fixes and improvements
azureml-core
Pin dependency ruamel-yaml to < 0.17.5 as a breaking change was released in 0.17.5.
aml_k8s_config property is being replaced with namespace , default_instance_type , and
instance_types parameters for KubernetesCompute attach.
Workspace sync keys was changed to a long running operation.
azureml-automl-runtime
Fixed problems where runs with big data may fail with Elements of y_test cannot be NaN .
azureml-mlflow
MLFlow deployment plugin bugfix for models with no signature.
azureml-pipeline-steps
ParallelRunConfig: update doc for process_count_per_node.
azureml-train-automl-runtime
Support for custom defined quantiles during MM inference
Support for forecast_quantiles during batch inference.
azureml-contrib-automl-pipeline-steps
Support for custom defined quantiles during MM inference
Support for forecast_quantiles during batch inference.
2021-05-25
Announcing the CLI (v2) (preview) for Azure Machine Learning
The ml extension to the Azure CLI is the next-generation interface for Azure Machine Learning. It enables you to
train and deploy models from the command line, with features that accelerate scaling data science up and out
while tracking the model lifecycle. Install and get started.
Azure Machine Learning SDK for Python v1.29.0
Bug fixes and improvements
Breaking changes
Dropped support for Python 3.5.
azureml-automl-runtime
Fixed a bug where the STLFeaturizer failed if the time-series length was shorter than the
seasonality. This error manifested as an IndexError. The case is handled now without error,
though the seasonal component of the STL will just consist of zeros in this case.
azureml-contrib-automl-dnn-vision
Added a method for batch inferencing with file paths.
azureml-contrib-gbdt
The azureml-contrib-gbdt package has been deprecated and might not receive future updates
and will be removed from the distribution altogether.
azureml-core
Corrected explanation of parameter create_if_not_exists in
Datastore.register_azure_blob_container.
Added sample code to DatasetConsumptionConfig class.
Added support for step as an alternative axis for scalar metric values in run.log() (a short sketch follows this release entry)
azureml-dataprep
Limit partition size accepted in _with_partition_size() to 2GB
azureml-interpret
update azureml-interpret to the latest interpret-core package version
Dropped support for SHAP DenseData, which has been deprecated in SHAP 0.36.0.
Enable ExplanationClient to upload to a user specified datastore.
azureml-mlflow
Move azureml-mlflow to mlflow-skinny to reduce the dependency footprint while maintaining
full plugin support
azureml-pipeline-core
PipelineParameter code sample is updated in the reference doc to use correct parameter.
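A minimal sketch of the run.log() step usage mentioned above; the metric values are illustrative.

from azureml.core import Run

run = Run.get_context()
losses = [0.9, 0.7, 0.55]  # illustrative metric values
for step, loss in enumerate(losses):
    run.log("loss", loss, step=step)  # step supplies the x-axis for the scalar series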
2021-05-10
Azure Machine Learning SDK for Python v1.28.0
Bug fixes and improvements
azureml-automl-runtime
Improved AutoML Scoring script to make it consistent with designer
Patch bug where forecasting with the Prophet model would throw a "missing column" error if
trained on an earlier version of the SDK.
Added the ARIMAX model to the public-facing, forecasting-supported model lists of the
AutoML SDK. Here, ARIMAX is a regression with ARIMA errors and a special case of the transfer
function models developed by Box and Jenkins. For a discussion of how the two approaches are
different, see The ARIMAX model muddle. Unlike the rest of the multivariate models that use
auto-generated, time-dependent features (hour of the day, day of the year, and so on) in
AutoML, this model uses only features that are provided by the user, and it makes interpreting
coefficients easy.
azureml-contrib-dataset
Updated documentation description with indication that libfuse should be installed while using
mount.
azureml-core
Default CPU curated image is now mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04.
Default GPU image is now mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.2-cudnn8-
ubuntu18.04
Run.fail() is now deprecated; use Run.tag() to mark the run as failed or use Run.cancel() to mark the run as canceled.
Updated documentation with a note that libfuse should be installed when mounting a file
dataset.
Add experimental register_dask_dataframe() support to tabular dataset.
Support DatabricksStep with Azure Blob/ADL-S as inputs/outputs, and expose the parameter permit_cluster_restart to let the customer decide whether AML can restart the cluster when the i/o access configuration needs to be added into the cluster.
azureml-dataset-runtime
azureml-dataset-runtime now supports versions of pyarrow < 4.0.0
azureml-mlflow
Added support for deploying to AzureML via our MLFlow plugin.
azureml-pipeline-steps
Support DatabricksStep with Azure Blob/ADL-S as inputs/outputs, and expose the parameter permit_cluster_restart to let the customer decide whether AML can restart the cluster when the i/o access configuration needs to be added into the cluster.
azureml-synapse
Enable audience in msi authentication
azureml-train-automl-client
Added changed link for compute target doc
2021-04-19
Azure Machine Learning SDK for Python v1.27.0
Bug fixes and improvements
azureml-core
Added the ability to override the default timeout value for artifact uploading via the
"AZUREML_ARTIFACTS_DEFAULT_TIMEOUT" environment variable.
Fixed a bug where docker settings in Environment object on ScriptRunConfig are not respected.
Allow partitioning a dataset when copying it to a destination.
Added a custom mode to the OutputDatasetConfig to enable passing created Datasets in
pipelines through a link function. These support enhancements made to enable Tabular
Partitioning for PRS.
Added a new KubernetesCompute compute type to azureml-core.
azureml-pipeline-core
Adding a custom mode to the OutputDatasetConfig and enabling a user to pass through
created Datasets in pipelines through a link function. File path destinations support
placeholders. These support the enhancements made to enable Tabular Partitioning for PRS.
Addition of new KubernetesCompute compute type to azureml-core.
azureml-pipeline-steps
Addition of new KubernetesCompute compute type to azureml-core.
azureml-synapse
Update spark UI url in widget of azureml synapse
azureml-train-automl-client
The STL featurizer for the forecasting task now uses a more robust seasonality detection based
on the frequency of the time series.
azureml-train-core
Fixed bug where docker settings in Environment object are not respected.
Addition of new KubernetesCompute compute type to azureml-core.
2021-04-05
Azure Machine Learning SDK for Python v1.26.0
Bug fixes and improvements
azureml-automl-core
Fixed an issue where Naive models would be recommended in AutoMLStep runs and fail with
lag or rolling window features. These models will not be recommended when target lags or
target rolling window size are set.
Changed console output when submitting an AutoML run to show a portal link to the run.
azureml-core
Added HDFS mode in documentation.
Added support to understand File Dataset partitions based on glob structure.
Added support for update container registry associated with AzureML Workspace.
Deprecated Environment attributes under the DockerSection - "enabled", "shared_volume" and
"arguments" are a part of DockerConfiguration in RunConfiguration now.
Updated Pipeline CLI clone documentation
Updated portal URIs to include tenant for authentication
Removed experiment name from run URIs to avoid redirects
Updated experiment URI to use experiment ID.
Bug fixes for attaching remote compute with AzureML CLI.
azureml-interpret
azureml-interpret updated to use interpret-community 0.17.0
azureml-opendatasets
Input start date and end date type validation and error indication if it's not datetime type.
azureml-parallel-run
[Experimental feature] Add partition_keys parameter to ParallelRunConfig, if specified, the
input dataset(s) would be partitioned into mini-batches by the keys specified by it. It requires all
input datasets to be partitioned dataset.
azureml-pipeline-steps
Bugfix - supporting path_on_compute while passing dataset configuration as download.
Deprecate RScriptStep in favor of using CommandStep for running R scripts in pipelines.
Deprecate EstimatorStep in favor of using CommandStep for running ML training (including
distributed training) in pipelines.
azureml-sdk
Update python_requires to < 3.9 for azureml-sdk
azureml-train-automl-client
Changed console output when submitting an AutoML run to show a portal link to the run.
azureml-train-core
Deprecated DockerSection's 'enabled', 'shared_volume', and 'arguments' attributes in favor of
using DockerConfiguration with ScriptRunConfig.
Use Azure Open Datasets for MNIST dataset
Hyperdrive error messages have been updated.
2021-03-22
Azure Machine Learning SDK for Python v1.25.0
Bug fixes and improvements
azureml-automl-core
Changed console output when submitting an AutoML run to show a portal link to the run.
azureml-core
Starts to support updating container registry for workspace in SDK and CLI
Deprecated DockerSection's 'enabled', 'shared_volume', and 'arguments' attributes in favor of
using DockerConfiguration with ScriptRunConfig.
Updated Pipeline CLI clone documentation
Updated portal URIs to include tenant for authentication
Removed experiment name from run URIs to avoid redirects
Updated experiment URI to use experiment ID.
Bug fixes for attaching remote compute using az CLI
Added support to understand File Dataset partitions based on glob structure.
azureml-interpret
azureml-interpret updated to use interpret-community 0.17.0
azureml-opendatasets
Input start date and end date type validation and error indication if it's not datetime type.
azureml-pipeline-core
Bugfix - supporting path_on_compute while passing dataset configuration as download.
azureml-pipeline-steps
Bugfix - supporting path_on_compute while passing dataset configuration as download.
Deprecate RScriptStep in favor of using CommandStep for running R scripts in pipelines.
Deprecate EstimatorStep in favor of using CommandStep for running ML training (including
distributed training) in pipelines.
azureml-train-automl-runtime
Changed console output when submitting an AutoML run to show a portal link to the run.
azureml-train-core
Deprecated DockerSection's 'enabled', 'shared_volume', and 'arguments' attributes in favor of
using DockerConfiguration with ScriptRunConfig.
Use Azure Open Datasets for MNIST dataset
Hyperdrive error messages have been updated.
2021-03-31
Azure Machine Learning Studio Notebooks Experience (March Update )
New features
Render CSV/TSV. Users will be able to render a TSV/CSV file in a grid format for easier data analysis.
SSO Authentication for Compute Instance. Users can now easily authenticate any new compute
instances directly in the Notebook UI, making it easier to authenticate and use Azure SDKs directly in
AzureML.
Compute Instance Metrics. Users will be able to view compute metrics like CPU usage and memory
via terminal.
File Details. Users can now see file details including the last modified time, and file size by clicking the
3 dots beside a file.
Bug fixes and improvements
Improved page load times.
Improved performance.
Improved speed and kernel reliability.
Gain vertical real estate by permanently moving Notebook file pane up
Links are now clickable in Terminal
Improved Intellisense performance
2021-03-08
Azure Machine Learning SDK for Python v1.24.0
Bug fixes and improvements
azureml-automl-core
Removed backwards compatible imports from azureml.automl.core.shared . Module not found
errors in the azureml.automl.core.shared namespace can be resolved by importing from
azureml.automl.runtime.shared .
azureml-contrib-automl-dnn-vision
Exposed object detection yolo model.
azureml-contrib-dataset
Added functionality to filter Tabular Datasets by column values and File Datasets by metadata.
azureml-contrib-fairness
Include JSON schema in wheel for azureml-contrib-fairness
azureml-contrib-mir
With setting show_output to True when deploy models, inference configuration and
deployment configuration will be replayed before sending the request to server.
azureml-core
Added functionality to filter Tabular Datasets by column values and File Datasets by metadata.
Previously, it was possible for users to create provisioning configurations for ComputeTargets that did not satisfy the password strength requirements for the admin_user_password field (i.e.,
that they must contain at least 3 of the following: 1 lowercase letter, 1 uppercase letter, 1 digit,
and 1 special character from the following set: \`~!@#$%^&*()=+_[]{}|;:./'",<>? ). If the user
created a configuration with a weak password and ran a job using that configuration, the job
would fail at runtime. Now, the call to AmlCompute.provisioning_configuration will throw a
ComputeTargetException with an accompanying error message explaining the password
strength requirements.
Additionally, it was also possible in some cases to specify a configuration with a negative
number of maximum nodes. It is no longer possible to do this. Now,
AmlCompute.provisioning_configuration will throw a ComputeTargetException if the max_nodes
argument is a negative integer.
With setting show_output to True when deploy models, inference configuration and
deployment configuration will be displayed.
With setting show_output to True when wait for the completion of model deployment, the
progress of deployment operation will be displayed.
Allow customer specified AzureML auth config directory through environment variable:
AZUREML_AUTH_CONFIG_DIR
Previously, it was possible to create a provisioning configuration with the minimum node count
less than the maximum node count. The job would run but fail at runtime. This bug has now
been fixed. If you now try to create a provisioning configuration with min_nodes < max_nodes
the SDK will raise a ComputeTargetException .
azureml-interpret
fix explanation dashboard not showing aggregate feature importances for sparse engineered
explanations
optimized memory usage of ExplanationClient in azureml-interpret package
azureml-train-automl-client
Fixed show_output=False to return control to the user when running using spark.
2021-02-28
Azure Machine Learning Studio Notebooks Experience (February Update )
New features
Native Terminal (GA). Users will now have access to an integrated terminal as well as Git operation via
the integrated terminal.
Notebook Snippets (preview). Common Azure ML code excerpts are now available at your fingertips.
Navigate to the code snippets panel, accessible via the toolbar, or activate the in-code snippets menu
using Ctrl + Space.
Keyboard Shortcuts. Full parity with keyboard shortcuts available in Jupyter.
Indicate Cell parameters. Shows users which cells in a notebook are parameter cells and can run
parameterized notebooks via Papermill on the Compute Instance.
Terminal and Kernel session manager: Users will be able to manage all kernels and terminal sessions
running on their compute.
Sharing Button. Users can now share any file in the Notebook file explorer by right-clicking the file and
using the share button.
Bug fixes and improvements
Improved page load times
Improved performance
Improved speed and kernel reliability
Added spinning wheel to show progress for all ongoing Compute Instance operations.
Right click in File Explorer. Right-clicking any file will now open file operations.
2021-02-16
Azure Machine Learning SDK for Python v1.23.0
Bug fixes and improvements
azureml-core
[Experimental feature] Add support to link a Synapse workspace into AML as a linked service
[Experimental feature] Add support to attach a Synapse Spark pool into AML as a compute
[Experimental feature] Add support for identity-based data access. Users can register a datastore
or datasets without providing credentials. In that case, the user's Azure AD token or the managed
identity of the compute target will be used for authentication. To learn more, see Connect to
storage by using identity-based data access.
azureml-pipeline-steps
[Experimental feature] Add support for SynapseSparkStep
azureml-synapse
[Experimental feature] Add support for Spark magic to run an interactive session in a Synapse
Spark pool.
Bug fixes and improvements
azureml-automl-runtime
In this update, we added Holt-Winters exponential smoothing to the forecasting toolbox of the
AutoML SDK. Given a time series, the best model is selected by AICc (corrected Akaike
Information Criterion) and returned.
AutoML will now generate two log files instead of one. Log statements will go to one or the
other depending on which process the log statement was generated in.
Remove unnecessary in-sample prediction during model training with cross-validations. This
may decrease model training time in some cases, especially for time-series forecasting models.
azureml-contrib-fairness
Add a JSON schema for the dashboardDictionary uploads.
azureml-contrib-interpret
The azureml-contrib-interpret README has been updated to reflect that the package will be removed in
the next update, after being deprecated since October; use the azureml-interpret package instead
azureml-core
Previously, it was possible to create a provisioning configuration with the minimum node count
greater than the maximum node count. This has now been fixed. If you now try to create a
provisioning configuration with min_nodes > max_nodes, the SDK will raise a
ComputeTargetException.
Fixed a bug in wait_for_completion in AmlCompute that caused the function to return control
flow before the operation was actually complete
Run.fail() is now deprecated; use Run.tag() to mark a run as failed or use Run.cancel() to mark the
run as canceled.
Show the error message 'Environment name expected str, {} found' when the provided environment
name is not a string.
azureml-train-automl-client
Fixed a bug that prevented AutoML experiments performed on Azure Databricks clusters from
being canceled.
2021-02-09
Azure Machine Learning SDK for Python v1.22.0
Bug fixes and improvements
azureml-automl-core
Fixed bug where an extra pip dependency was added to the conda yml file for vision models.
azureml-automl-runtime
Fixed a bug where classical forecasting models (e.g. AutoArima) could receive training data
wherein rows with imputed target values were not present. This violated the data contract of
these models.
Fixed various bugs with lag-by-occurrence behavior in the time-series lagging operator.
Previously, the lag-by-occurrence operation did not mark all imputed rows correctly and so
would not always generate the correct occurrence lag values. Also fixed some compatibility
issues between the lag operator and the rolling window operator with lag-by-occurrence
behavior. This previously resulted in the rolling window operator dropping some rows from
the training data that it should otherwise use.
azureml-core
Adding support for Token Authentication by audience.
Add process_count to PyTorchConfiguration to support multi-process multi-node PyTorch jobs.
azureml-pipeline-steps
CommandStep is now GA and no longer experimental.
ParallelRunConfig: added arguments allowed_failed_count and allowed_failed_percent to check the
error threshold at the mini batch level. The error threshold now has 3 flavors:
error_threshold - the number of allowed failed mini batch items;
allowed_failed_count - the number of allowed failed mini batches;
allowed_failed_percent - the percent of allowed failed mini batches.
A job will stop if it exceeds any of them. error_threshold is still required, for backward
compatibility. Set the value to -1 to ignore it. (See the sketch after this list.)
Fixed whitespace handling in AutoMLStep name.
ScriptRunConfig is now supported by HyperDriveStep
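A sketch of the three error-threshold flavors used together (the entry script, environment, and compute
target are assumed to exist):

    from azureml.pipeline.steps import ParallelRunConfig

    parallel_run_config = ParallelRunConfig(
        source_directory="scripts",
        entry_script="batch_score.py",  # assumed batch scoring script
        mini_batch_size="5",
        error_threshold=-1,             # still required; -1 ignores the item-level check
        allowed_failed_count=10,        # stop after more than 10 failed mini batches
        allowed_failed_percent=5,       # or after more than 5% of mini batches fail
        output_action="append_row",
        environment=batch_env,          # assumed: an azureml.core.Environment
        compute_target=compute_target,  # assumed: an existing AmlCompute cluster
        node_count=2,
    )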
azureml-train-core
HyperDrive runs invoked from a ScriptRun will now be considered a child run.
Add process_count to PyTorchConfiguration to support multi-process multi-node PyTorch jobs.
azureml-widgets
Add widget ParallelRunStepDetails to visualize status of a ParallelRunStep.
Allows hyperdrive users to see an additional axis on the parallel coordinates chart that shows
the metric value corresponding to each set of hyperparameters for each child run.
2021-01-31
Azure Machine Learning Studio Notebooks Experience (January Update)
New features
Native Markdown Editor in AzureML. Users can now render and edit markdown files natively in
AzureML Studio.
Run Button for Scripts (.py, .R and .sh). Users can now easily run Python, R, and Bash scripts in AzureML
Variable Explorer. Explore the contents of variables and data frames in a pop-up panel. Users can
easily check data type, size, and contents.
Table of Contents. Navigate to sections of your notebook, indicated by Markdown headers.
Export your Notebook as LaTeX/HTML/Py. Create easy-to-share notebook files by exporting to LaTeX,
HTML, or .py
IntelliCode. ML-powered results provide an enhanced intelligent autocompletion experience.
Bug fixes and improvements
Improved page load times
Improved performance
Improved speed and kernel reliability
2021-01-25
Azure Machine Learning SDK for Python v1.21.0
Bug fixes and improvements
azure-cli-ml
Fixed CLI help text when using AmlCompute with UserAssigned Identity
azureml-contrib-automl-dnn-vision
Deploy and download buttons will become visible for AutoML vision runs, and models can be
deployed or downloaded similar to other AutoML runs. There are two new files
(scoring_file_v_1_0_0.py and conda_env_v_1_0_0.yml) which contain a script to run inferencing
and a yml file to recreate the conda environment. The 'model.pth' file has also been renamed to
use the '.pt' extension.
azureml-core
MSI support for azure-cli-ml
User Assigned Managed Identity Support.
With this change, customers can provide a user-assigned identity that will be
used to fetch the key from the customer key vault for encryption at rest.
Fixed row_count=0 for the profile of very large files; fixed an error in double conversion for delimited
values with whitespace padding
Remove experimental flag for Output dataset GA
Update documentation on how to fetch specific version of a Model
Allow updating workspace for mixed mode access in case of private link
Fix to remove additional registration on datastore for resume run feature
Added CLI/SDK support for updating primary user assigned identity of workspace
azureml-interpret
updated azureml-interpret to interpret-community 0.16.0
memory optimizations for explanation client in azureml-interpret
azureml-train-automl-runtime
Enabled streaming for ADB runs
azureml-train-core
Fix to remove additional registration on datastore for resume run feature
azureml-widgets
Customers should see no changes to existing run data visualizations in the widget, and the widget
now supports conditional hyperparameters if customers choose to use them.
The user run widget now includes a detailed explanation for why a run is in the queued state.
2021-01-11
Azure Machine Learning SDK for Python v1.20.0
Bug fixes and improvements
azure-cli-ml
framework_version added in OptimizationConfig. It will be used when a model is registered with
framework MULTI.
azureml-contrib-optimization
framework_version added in OptimizationConfig. It will be used when a model is registered with
framework MULTI.
azureml-pipeline-steps
Introduced CommandStep, which takes a command to process. The command can include
executables, shell commands, scripts, etc.
azureml-core
Workspace creation now supports user-assigned identity. Added UAI support to the SDK/CLI.
Fixed issue on service.reload() to pick up changes on score.py in local deployment.
run.get_details() has an extra field named "submittedBy" which displays the author's name
for this run.
Edited the Model.register method documentation to mention how to register a model from a run
directly
Fixed an IoT Server connection status change handling issue.
2020-12-31
Azure Machine Learning Studio Notebooks Experience (December Update)
New features
User Filename search. Users are now able to search all the files saved in a workspace.
Markdown Side by Side support per Notebook Cell. In a notebook cell, users now have the option
to view rendered markdown and markdown syntax side-by-side.
Cell Status Bar. The status bar indicates what state a code cell is in, whether a cell run was successful,
and how long it took to run.
Bug fixes and improvements
Improved page load times
Improved performance
Improved speed and kernel reliability
2020-12-07
Azure Machine Learning SDK for Python v1.19.0
Bug fixes and improvements
azureml-automl-core
Added experimental support for test data to AutoMLStep.
Added the initial core implementation of the test set ingestion feature.
Moved references to sklearn.externals.joblib to depend directly on joblib.
Introduced a new AutoML task type, "image-instance-segmentation".
azureml-automl-runtime
Added the initial core implementation of the test set ingestion feature.
When all the strings in a text column have a length of exactly 1 character, the TfIdf word-gram
featurizer won't work because its tokenizer ignores strings with fewer than 2 characters.
The current code change allows AutoML to handle this use case.
Introduced a new AutoML task type, "image-instance-segmentation".
azureml-contrib-automl-dnn-nlp
Initial PR for new dnn-nlp package
azureml-contrib-automl-dnn-vision
Introduced a new AutoML task type, "image-instance-segmentation".
azureml-contrib-automl-pipeline-steps
This new package is responsible for creating the steps required for the many models train/inference
scenario. It also moves the train/inference code into the azureml.train.automl.runtime package so
any future fixes will be automatically available through curated environment releases.
azureml-contrib-dataset
Introduced a new AutoML task type, "image-instance-segmentation".
azureml-core
Added the initial core implementation of the test set ingestion feature.
Fixed xref warnings for documentation in the azureml-core package
Docstring fixes for the Command support feature in the SDK
Adding command property to RunConfiguration. The feature enables users to run an actual
command or executables on the compute through the AzureML SDK (see the sketch after this list).
Users can delete an empty experiment given the ID of that experiment.
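A rough sketch of the command feature (the compute target and environment are assumed to exist; the
command itself is illustrative):

    from azureml.core import Experiment, ScriptRunConfig

    # The command replaces the usual script + arguments pair; it can invoke
    # executables, shell commands, or scripts.
    src = ScriptRunConfig(
        source_directory="src",
        command=["python", "train.py", "--epochs", "10"],
        compute_target=compute_target,  # assumed: an existing compute target
        environment=env,                # assumed: an azureml.core.Environment
    )
    run = Experiment(ws, "command-example").submit(src)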
azureml-dataprep
Added dataset support for Spark built with Scala 2.12. This adds to the existing 2.11 support.
azureml-mlflow
AzureML-MLflow adds safeguards in remote scripts to avoid early termination of submitted
runs.
azureml-pipeline-core
Fixed a bug in setting a default pipeline for pipeline endpoint created via UI
azureml-pipeline-steps
Added experimental support for test data to AutoMLStep.
azureml-tensorboard
Fixed xref warnings for documentation in the azureml-core package
azureml-train-automl-client
Added experimental support for test data to AutoMLStep.
Added the initial core implementation of the test set ingestion feature.
Introduced a new AutoML task type, "image-instance-segmentation".
azureml-train-automl-runtime
Added the initial core implementation of the test set ingestion feature.
Fixed the computation of the raw explanations for the best AutoML model if the AutoML models
are trained using the validation_size setting.
Moved references to sklearn.externals.joblib to depend directly on joblib.
azureml-train-core
HyperDriveRun.get_children_sorted_by_primary_metric() should complete faster now
Improved error handling in HyperDrive SDK.
Deprecated all estimator classes in favor of using ScriptRunConfig to configure experiment
runs. Deprecated classes include:
MMLBaseEstimator
Estimator
PyTorch
TensorFlow
Chainer
SKLearn
Deprecated the use of Nccl and Gloo as valid input types for Estimator classes in favor of
using PyTorchConfiguration with ScriptRunConfig.
Deprecated the use of Mpi as a valid input type for Estimator classes in favor of using
MpiConfiguration with ScriptRunConfig.
Adding command property to RunConfiguration. The feature enables users to run an actual
command or executables on the compute through the AzureML SDK.
2020-11-30
Azure Machine Learning Studio Notebooks Experience (November Update)
New features
Native Terminal. Users now have access to an integrated terminal and can perform Git operations via the
integrated terminal.
Duplicate Folder
Costing for Compute Drop Down
Offline Compute Pylance
Bug fixes and improvements
Improved page load times
Improved performance
Improved speed and kernel reliability
Large File Upload. You can now upload files larger than 95 MB
2020-11-09
Azure Machine Learning SDK for Python v1.18.0
Bug fixes and improvements
azureml-automl-core
Improved handling of short time series by allowing them to be padded with Gaussian noise.
azureml-automl-runtime
Throw a ConfigException if a DateTime column has an OutOfBoundsDatetime value
Improved handling of short time series by allowing them to be padded with Gaussian noise.
Ensured that each text column can leverage the char-gram transform with an n-gram range
based on the length of the strings in that text column
Providing raw feature explanations for the best model for AutoML experiments running on the user's
local compute
azureml-core
Pinned the pyjwt package to avoid pulling in breaking versions in upcoming releases.
Creating an experiment will return the active or last archived experiment with that same given
name if such an experiment exists, or create a new experiment otherwise.
Calling get_experiment by name will return the active or last archived experiment with that
given name.
Users cannot rename an experiment while reactivating it.
Improved error message to include potential fixes when a dataset is incorrectly passed to an
experiment (e.g. ScriptRunConfig).
Improved documentation for OutputDatasetConfig.register_on_complete to describe what
happens when the name already exists.
Specifying dataset input and output names that have the potential to collide with common
environment variables will now result in a warning
Repurposed the grant_workspace_access parameter when registering datastores. Set it to True to
access data behind a virtual network from Machine Learning Studio (see the sketch after this list).
Learn more
The linked service API has been refined. Instead of providing a resource ID, three separate
parameters, sub_id, rg, and name, are now defined in the configuration.
To enable customers to self-resolve token corruption issues, workspace token synchronization
has been made a public method.
This change allows an empty string to be used as a value for a script_param
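A minimal sketch of the repurposed parameter (the storage account, container, and datastore names are
placeholders):

    from azureml.core import Datastore

    datastore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name="vnet_blob",
        container_name="data",
        account_name="mystorageaccount",  # placeholder account behind a virtual network
        grant_workspace_access=True,      # lets studio reach data behind the VNet
    )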
azureml-train-automl-client
Improved handling of short time series by allowing them to be padded with Gaussian noise.
azureml-train-automl-runtime
Throw a ConfigException if a DateTime column has an OutOfBoundsDatetime value
Added support for providing raw feature explanations for the best model for AutoML experiments
running on the user's local compute
Improved handling of short time series by allowing them to be padded with Gaussian noise.
azureml-train-core
This change allows an empty string to be used as a value for a script_param
azureml-train-restclients-hyperdrive
README has been changed to offer more context
azureml-widgets
Add string support to charts/parallel-coordinates library for widget.
2020-11-05
Data Labeling for image instance segmentation (polygon annotation) (preview)
The image instance segmentation (polygon annotations) project type in data labeling is now available, so
users can draw and annotate polygons around the contours of objects in images. Users can assign
a class and a polygon to each object of interest within an image.
Learn more about image instance segmentation labeling.
2020-10-26
Azure Machine Learning SDK for Python v1.17.0
New examples
A new community-driven repository of examples is available at https://github.com/Azure/azureml-
examples
Bug fixes and improvements
azureml-automl-core
Fixed an issue where get_output may raise an XGBoostError.
azureml-automl-runtime
Time/calendar based features created by AutoML will now have the prefix.
Fixed an IndexError occurring during training of StackEnsemble for classification datasets with
a large number of classes and subsampling enabled.
Fixed an issue where VotingRegressor predictions may be inaccurate after refitting the model.
azureml-core
Additional detail added about the relationship between AKS deployment configuration and Azure
Kubernetes Service concepts.
Environment client labels support. Users can label Environments and reference them by label.
azureml-dataprep
Better error message when using currently unsupported Spark with Scala 2.12.
azureml-explain-model
The azureml-explain-model package is officially deprecated
azureml-mlflow
Resolved a bug in mlflow.projects.run against azureml backend where Finalizing state was not
handled properly.
azureml-pipeline-core
Add support to create, list, and get pipeline schedules based on a pipeline endpoint.
Improved the documentation of PipelineData.as_dataset with an invalid usage example. Using
PipelineData.as_dataset improperly will now result in a ValueException being thrown
Changed the HyperDriveStep pipelines notebook to register the best model within a
PipelineStep directly after the HyperDriveStep run.
azureml-pipeline-steps
Changed the HyperDriveStep pipelines notebook to register the best model within a
PipelineStep directly after the HyperDriveStep run.
azureml-train-automl-client
Fixed an issue where get_output may raise an XGBoostError.
Azure Machine Learning Studio Notebooks Experience (October Update)
New features
Full virtual network support
Focus Mode
Save notebooks Ctrl-S
Line Numbers
Bug fixes and improvements
Improvement in speed and kernel reliability
Jupyter Widget UI updates
2020-10-12
Azure Machine Learning SDK for Python v1.16.0
Bug fixes and improvements
azure-cli-ml
AKSWebservice and AKSEndpoints now support pod-level CPU and Memory resource limits.
These optional limits can be used by setting --cpu-cores-limit and --memory-gb-limit flags in
applicable CLI calls
azureml-core
Pin major versions of direct dependencies of azureml-core
AKSWebservice and AKSEndpoints now support pod-level CPU and Memory resource limits.
More information on Kubernetes Resources and Limits
Updated run.log_table to allow individual rows to be logged.
Added static method Run.get(workspace, run_id) to retrieve a run using only a workspace
Added instance method Workspace.get_run(run_id) to retrieve a run within the workspace
(see the sketch after this list)
Introduced a command property in run configuration, which enables users to submit a
command instead of a script & arguments.
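Both new retrieval paths in a minimal sketch (the run ID is a placeholder):

    from azureml.core import Run, Workspace

    ws = Workspace.from_config()
    run_id = "example_run_id"      # placeholder run ID

    run = Run.get(ws, run_id)      # new static method
    same_run = ws.get_run(run_id)  # new instance method; resolves the same run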
azureml-interpret
fixed explanation client is_raw flag behavior in azureml-interpret
azureml-sdk
azureml-sdk officially supports Python 3.8.
azureml-train-core
Adding TensorFlow 2.3 curated environment
Introduced a command property in run configuration, which enables users to submit a
command instead of a script & arguments.
azureml-widgets
Redesigned interface for script run widget.
2020-09-28
Azure Machine Learning SDK for Python v1.15.0
Bug fixes and improvements
azureml-contrib-interpret
LIME explainer moved from azureml-contrib-interpret to interpret-community package and
image explainer removed from azureml-contrib-interpret package
The visualization dashboard has been removed from the azureml-contrib-interpret package; the
explanation client has moved to the azureml-interpret package (and is deprecated in
azureml-contrib-interpret), and notebooks have been updated to reflect the improved API
fix pypi package descriptions for azureml-interpret, azureml-explain-model, azureml-contrib-
interpret and azureml-tensorboard
azureml-contrib-notebook
Pin the nbconvert dependency to < 6 so that papermill 1.x continues to work.
azureml-core
Added parameters to the TensorflowConfiguration and MpiConfiguration constructors to enable
a more streamlined initialization of the class attributes without requiring the user to set each
individual attribute. Added a PyTorchConfiguration class for configuring distributed PyTorch
jobs in ScriptRunConfig (see the sketch after this list).
Pin the version of azure-mgmt-resource to fix the authentication error.
Support Triton No Code Deploy
Output directories specified in Run.start_logging() will now be tracked when using the run in
interactive scenarios. The tracked files will be visible in ML Studio upon calling Run.complete()
File encoding can now be specified during dataset creation with
Dataset.Tabular.from_delimited_files and Dataset.Tabular.from_json_lines_files by passing
the encoding argument. The supported encodings are 'utf8', 'iso88591', 'latin1', 'ascii', 'utf16',
'utf32', 'utf8bom' and 'windows1252'.
Fixed a bug when the environment object is not passed to the ScriptRunConfig constructor.
Updated Run.cancel() to allow canceling a local run from another machine.
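A rough sketch of the streamlined configuration used with ScriptRunConfig (the training script, compute
target, and environment are assumptions):

    from azureml.core import ScriptRunConfig
    from azureml.core.runconfig import MpiConfiguration

    # Attributes can now be set in the constructor instead of one by one.
    distributed_config = MpiConfiguration(process_count_per_node=2)

    src = ScriptRunConfig(
        source_directory="src",
        script="train.py",
        compute_target=compute_target,  # assumed: an existing cluster
        environment=env,                # assumed: an azureml.core.Environment
        distributed_job_config=distributed_config,
    )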
azureml-dataprep
Fixed dataset mount timeout issues.
azureml-explain-model
fix pypi package descriptions for azureml-interpret, azureml-explain-model, azureml-contrib-
interpret and azureml-tensorboard
azureml-interpret
The visualization dashboard has been removed from the azureml-contrib-interpret package; the
explanation client has moved to the azureml-interpret package (and is deprecated in
azureml-contrib-interpret), and notebooks have been updated to reflect the improved API
azureml-interpret package updated to depend on interpret-community 0.15.0
fix pypi package descriptions for azureml-interpret, azureml-explain-model, azureml-contrib-
interpret and azureml-tensorboard
azureml-pipeline-core
Fixed pipeline issue with OutputFileDatasetConfig where the system may stop responding
when register_on_complete is called with the name parameter set to a pre-existing dataset
name.
azureml-pipeline-steps
Removed stale databricks notebooks.
azureml-tensorboard
fix pypi package descriptions for azureml-interpret, azureml-explain-model, azureml-contrib-
interpret and azureml-tensorboard
azureml-train-automl-runtime
The visualization dashboard has been removed from the azureml-contrib-interpret package; the
explanation client has moved to the azureml-interpret package (and is deprecated in
azureml-contrib-interpret), and notebooks have been updated to reflect the improved API
azureml-widgets
The visualization dashboard has been removed from the azureml-contrib-interpret package; the
explanation client has moved to the azureml-interpret package (and is deprecated in
azureml-contrib-interpret), and notebooks have been updated to reflect the improved API
2020-09-21
Azure Machine Learning SDK for Python v1.14.0
Bug fixes and improvements
azure-cli-ml
Grid Profiling has been removed from the SDK and is no longer supported.
azureml-accel-models
azureml-accel-models package now supports TensorFlow 2.x
azureml-automl-core
Added error handling in get_output for cases when local versions of pandas/sklearn don't
match the ones used during training
azureml-automl-runtime
Fixed a bug where AutoArima iterations would fail with a PredictionException and the message:
"Silent failure occurred during prediction."
azureml-cli-common
Grid Profiling has been removed from the SDK and is no longer supported.
azureml-contrib-server
Update description of the package for pypi overview page.
azureml-core
Grid Profiling removed from the SDK and is no longer supported.
Reduce number of error messages when workspace retrieval fails.
Don't show warning when fetching metadata fails
New Kusto Step and Kusto Compute Target.
Updated the document for the sku parameter. Removed sku from the workspace update functionality in
the CLI and SDK.
Update description of the package for pypi overview page.
Updated documentation for AzureML Environments.
Expose service managed resources settings for AML workspace in SDK.
azureml-dataprep
Enable execute permission on files for Dataset mount.
azureml-mlflow
Updated AzureML MLflow documentation and notebook samples
New support for MLflow projects with AzureML backend
MLflow model registry support
Added Azure RBAC support for AzureML-MLflow operations
azureml-pipeline-core
Improved the documentation of the PipelineOutputFileDataset.parse_* methods.
New Kusto Step and Kusto Compute Target.
Provided a swagger URL property for the pipeline-endpoint entity, via which users can see the schema
definition for a published pipeline endpoint.
azureml-pipeline-steps
New Kusto Step and Kusto Compute Target.
azureml-telemetry
Update description of the package for pypi overview page.
azureml-train
Update description of the package for pypi overview page.
azureml-train-automl-client
Added error handling in get_output for cases when local versions of pandas/sklearn don't
match the ones used during training
azureml-train-core
Update description of the package for pypi overview page.
2020-08-31
Azure Machine Learning SDK for Python v1.13.0
Preview features
azureml-core With the new output datasets capability, you can write back to cloud storage including
Blob, ADLS Gen 1, ADLS Gen 2, and FileShare. You can configure where to output data, how to output
data (via mount or upload), whether to register the output data for future reuse and sharing, and pass
intermediate data between pipeline steps seamlessly. This enables reproducibility and sharing, prevents
duplication of data, and results in cost efficiency and productivity gains. Learn how to use it
Bug fixes and improvements
azureml-automl-core
Added validated_{platform}_requirements.txt file for pinning all pip dependencies for AutoML.
This release supports models greater than 4 GB.
Upgraded AutoML dependencies: scikit-learn (now 0.22.1), pandas (now 0.25.1), numpy
(now 1.18.2).
azureml-automl-runtime
Set horovod for text DNN to always use fp16 compression.
This release supports models greater than 4 GB.
Fixed issue where AutoML fails with ImportError: cannot import name RollingOriginValidator .
Upgraded AutoML dependencies: scikit-learn (now 0.22.1), pandas (now 0.25.1), numpy
(now 1.18.2).
azureml-contrib-automl-dnn-forecasting
Upgraded AutoML dependencies: scikit-learn (now 0.22.1), pandas (now 0.25.1), numpy
(now 1.18.2).
azureml-contrib-fairness
Provide a short description for azureml-contrib-fairness.
azureml-contrib-pipeline-steps
Added message indicating this package is deprecated and user should use azureml-pipeline-
steps instead.
azureml-core
Added list key command for workspace.
Add tags parameter in Workspace SDK and CLI.
Fixed the bug where submitting a child run with a Dataset would fail due to
TypeError: can't pickle _thread.RLock objects .
Adding page_count default/documentation for Model list().
Modified the CLI & SDK to take an adbworkspace parameter and added a workspace ADB link/unlink runner.
Fix bug in Dataset.update that caused the newest Dataset version to be updated instead of the
version of the Dataset that update was called on.
Fix bug in Dataset.get_by_name that would show the tags for the newest Dataset version even
when a specific older version was retrieved.
azureml-interpret
Added probability outputs to shap scoring explainers in azureml-interpret based on
shap_values_output parameter from original explainer.
azureml-pipeline-core
Improved PipelineOutputAbstractDataset.register's documentation.
azureml-train-automl-client
Upgraded AutoML dependencies: scikit-learn (now 0.22.1), pandas (now 0.25.1), numpy
(now 1.18.2).
azureml-train-automl-runtime
Upgraded AutoML dependencies: scikit-learn (now 0.22.1), pandas (now 0.25.1), numpy
(now 1.18.2).
azureml-train-core
Users must now provide a valid hyperparameter_sampling arg when creating a
HyperDriveConfig. In addition, the documentation for HyperDriveRunConfig has been edited to
inform users of the deprecation of HyperDriveRunConfig.
Reverting PyTorch Default Version to 1.4.
Adding PyTorch 1.6 & TensorFlow 2.2 images and curated environment.
Azure Machine Learning Studio Notebooks Experience (August Update)
New features
New Getting started landing Page
Preview features
Gather feature in Notebooks. With the Gather feature, users can now easily clean up notebooks:
Gather uses automated dependency analysis of your notebook, ensuring the essential code is kept
but removing any irrelevant pieces.
Bug fixes and improvements
Improvement in speed and reliability
Dark mode bugs fixed
Output Scroll Bugs fixed
Sample Search now searches all the content of all the files in the Azure Machine Learning sample
notebooks repo
Multi-line R cells can now run
"I trust contents of this file" is now auto checked after first time
Improved Conflict resolution dialog, with new "Make a copy" option
2020-08-17
Azure Machine Learning SDK for Python v1.12.0
Bug fixes and improvements
azure-cli-ml
Add image_name and image_label parameters to Model.package() to enable renaming the built
package image.
azureml-automl-core
AutoML raises a new error code from dataprep when content is modified while being read.
azureml-automl-runtime
Added alerts for the user when data contains missing values but featurization is turned off.
Fixed child run failures when data contains nan and featurization is turned off.
AutoML raises a new error code from dataprep when content is modified while being read.
Updated normalization for forecasting metrics to occur by grain.
Improved calculation of forecast quantiles when lookback features are disabled.
Fixed bool sparse matrix handling when computing explanations after AutoML.
azureml-core
A new method, run.get_detailed_status(), now shows a detailed explanation of the current run
status. It currently only shows an explanation for the Queued status.
Add image_name and image_label parameters to Model.package() to enable renaming the built
package image.
New method set_pip_requirements() to set the entire pip section in CondaDependencies at
once (see the sketch after this list).
Enable registering credential-less ADLS Gen2 datastore.
Improved error message when trying to download or mount an incorrect dataset type.
Update time series dataset filter sample notebook with more examples of partition_timestamp
that provides filter optimization.
Changed the SDK and CLI to accept subscriptionId, resourceGroup, workspaceName, and
peConnectionName as parameters instead of ArmResourceId when deleting a private endpoint
connection.
Experimental Decorator shows class name for easier identification.
Descriptions for the Assets inside of Models are no longer automatically generated based on a
Run.
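A minimal sketch of the new one-call pip section setter:

    from azureml.core.conda_dependencies import CondaDependencies

    conda_deps = CondaDependencies()
    # Replaces the entire pip section at once, instead of repeated
    # add_pip_package() calls.
    conda_deps.set_pip_requirements([
        "azureml-defaults",
        "scikit-learn==0.22.1",
    ])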
azureml-datadrift
Marked the create_from_model API in DataDriftDetector as deprecated.
azureml-dataprep
Improved error message when trying to download or mount an incorrect dataset type.
azureml-pipeline-core
Fixed bug when deserializing pipeline graph that contains registered datasets.
azureml-pipeline-steps
RScriptStep supports RSection from azureml.core.environment.
Removed the passthru_automl_config parameter from the AutoMLStep public API and
converted it to an internal only parameter.
azureml-train-automl-client
Removed local asynchronous, managed environment runs from AutoML. All local runs will run
in the environment the run was launched from.
Fixed snapshot issues when submitting AutoML runs with no user-provided scripts.
Fixed child run failures when data contains nan and featurization is turned off.
azureml-train-automl-runtime
AutoML raises a new error code from dataprep when content is modified while being read.
Fixed snapshot issues when submitting AutoML runs with no user-provided scripts.
Fixed child run failures when data contains nan and featurization is turned off.
azureml-train-core
Added support for specifying pip options (for example --extra-index-url) in the pip
requirements file passed to an Estimator through pip_requirements_file parameter.
2020-08-03
Azure Machine Learning SDK for Python v1.11.0
Bug fixes and improvements
azure-cli-ml
Fixed the model framework and model framework version not being passed in the run object in the CLI
model registration path
Fix CLI amlcompute identity show command to show tenant ID and principal ID
azureml-train-automl-client
Added get_best_child() to AutoMLRun for fetching the best child run for an AutoML run
without downloading the associated model.
Added a ModelProxy object that allows predict or forecast to be run on a remote training
environment without downloading the model locally.
Unhandled exceptions in AutoML now point to a known issues HTTP page, where more
information about the errors can be found.
azureml-core
Model names can be 255 characters long.
Environment.get_image_details() return object type changed. The DockerImageDetails class
replaced dict; image details are available from the new class properties. Changes are
backward compatible.
Fix bug for Environment.from_pip_requirements() to preserve dependencies structure
Fixed a bug where log_list would fail if an int and double were included in the same list.
While enabling private link on an existing workspace, please note that if there are compute
targets associated with the workspace, those targets will not work if they are not behind the
same virtual network as the workspace private endpoint.
Made as_named_input optional when using datasets in experiments and added as_mount and
as_download to FileDataset. The input name will be automatically generated if as_mount or
as_download is called (see the sketch below).
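A sketch of mounting a FileDataset without as_named_input (the dataset name, script, compute target,
and environment are assumptions):

    from azureml.core import Dataset, ScriptRunConfig

    file_ds = Dataset.get_by_name(ws, name="images")  # assumed registered FileDataset

    src = ScriptRunConfig(
        source_directory="src",
        script="train.py",
        # as_named_input is no longer required; the input name is auto-generated.
        arguments=["--data", file_ds.as_mount()],
        compute_target=compute_target,
        environment=env,
    )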
azureml-automl-core
Unhandled exceptions in AutoML now point to a known issues HTTP page, where more
information about the errors can be found.
Added get_best_child() to AutoMLRun for fetching the best child run for an AutoML run
without downloading the associated model.
Added a ModelProxy object that allows predict or forecast to be run on a remote training
environment without downloading the model locally.
azureml-pipeline-steps
Added enable_default_model_output and enable_default_metrics_output flags to AutoMLStep .
These flags can be used to enable/disable the default outputs.
2020-07-20
Azure Machine Learning SDK for Python v1.10.0
Bug fixes and improvements
azureml-automl-core
When using AutoML, if a path is passed into the AutoMLConfig object and it does not already
exist, it will be automatically created.
Users can now specify a time series frequency for forecasting tasks by using the freq
parameter.
azureml-automl-runtime
When using AutoML, if a path is passed into the AutoMLConfig object and it does not already
exist, it will be automatically created.
Users can now specify a time series frequency for forecasting tasks by using the freq
parameter.
AutoML Forecasting now supports rolling evaluation, which applies to the use case that the
length of a test or validation set is longer than the input horizon, and known y_pred value is
used as forecasting context.
azureml-core
Warning messages will be printed if no files were downloaded from the datastore in a run.
Added documentation for skip_validation to the Datastore.register_azure_sql_database method.
Users are required to upgrade to SDK v1.10.0 or above to create an auto-approved private
endpoint. This includes the Notebook resource that is usable behind the VNet.
Expose NotebookInfo in the response of get workspace.
Changes to have calls to list compute targets and get a compute target succeed on a remote
run. SDK functions to get a compute target and list workspace compute targets will now work in
remote runs.
Add deprecation messages to the class descriptions for azureml.core.image classes.
Throw exception and clean up workspace and dependent resources if workspace private
endpoint creation fails.
Support workspace sku upgrade in workspace update method.
azureml-datadrift
Update matplotlib version from 3.0.2 to 3.2.1 to support Python 3.8.
azureml-dataprep
Added support for web URL data sources with Range or Head requests.
Improved stability for file dataset mount and download.
azureml-train-automl-client
Fixed issues related to removal of RequirementParseError from setuptools.
Use docker instead of conda for local runs submitted using "compute_target='local'"
The iteration duration printed to the console has been corrected. Previously, the iteration
duration was sometimes printed as run end time minus run creation time. It has been corrected
to equal run end time minus run start time.
When using AutoML, if a path is passed into the AutoMLConfig object and it does not already
exist, it will be automatically created.
Users can now specify a time series frequency for forecasting tasks by using the freq
parameter.
azureml-train-automl-runtime
Improved console output when best model explanations fail.
Renamed input parameter to "blocked_models" to remove a sensitive term.
Renamed input parameter to "allowed_models" to remove a sensitive term.
Users can now specify a time series frequency for forecasting tasks by using the freq
parameter.
2020-07-06
Azure Machine Learning SDK for Python v1.9.0
Bug fixes and improvements
azureml-automl-core
Replaced get_model_path() with AZUREML_MODEL_DIR environment variable in AutoML
autogenerated scoring script. Also added telemetry to track failures during init().
Removed the ability to specify enable_cache as part of AutoMLConfig
Fixed a bug where runs may fail with service errors during specific forecasting runs
Improved error handling around specific models during get_output
Fixed call to fitted_model.fit(X, y) for classification with y transformer
Enabled customized forward fill imputer for forecasting tasks
A new ForecastingParameters class will be used instead of forecasting parameters in a dict
format (see the sketch after this list)
Improved target lag autodetection
Added limited availability of multi-node, multi-GPU distributed featurization with BERT
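A sketch of the class-based forecasting parameters (the dataset, column names, horizon, and metric are
assumptions):

    from azureml.automl.core.forecasting_parameters import ForecastingParameters
    from azureml.train.automl import AutoMLConfig

    forecasting_parameters = ForecastingParameters(
        time_column_name="date",    # assumed time column
        forecast_horizon=14,        # assumed horizon
    )

    automl_config = AutoMLConfig(
        task="forecasting",
        training_data=train_ds,     # assumed: a TabularDataset
        label_column_name="sales",  # assumed label column
        primary_metric="normalized_root_mean_squared_error",
        forecasting_parameters=forecasting_parameters,
    )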
azureml-automl-runtime
Prophet now does additive seasonality modeling instead of multiplicative.
Fixed an issue where short grains, having frequencies different from those of the long grains,
would result in failed runs.
azureml-contrib-automl-dnn-vision
Collect system/gpu stats and log averages for training and scoring
azureml-contrib-mir
Added support for enable-app-insights flag in ManagedInferencing
azureml-core
Added a validate parameter to these APIs, allowing validation to be skipped when the data source is
not accessible from the current compute (see the sketch after this list):
TabularDataset.time_before(end_time, include_boundary=True, validate=True)
TabularDataset.time_after(start_time, include_boundary=True, validate=True)
TabularDataset.time_recent(time_delta, include_boundary=True, validate=True)
TabularDataset.time_between(start_time, end_time, include_boundary=True,
validate=True)
Added framework filtering support for model list, and added the NCD AutoML sample back into the
notebook
For Datastore.register_azure_blob_container and Datastore.register_azure_file_share (only
options that support SAS token), we have updated the doc strings for the sas_token field to
include minimum permissions requirements for typical read and write scenarios.
Deprecating _with_auth param in ws.get_mlflow_tracking_uri()
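A minimal sketch of skipping validation on one of these APIs (ts_ds is assumed to be a TabularDataset
with timestamp columns already assigned):

    from datetime import datetime

    june = ts_ds.time_between(
        start_time=datetime(2020, 6, 1),
        end_time=datetime(2020, 6, 30),
        include_boundary=True,
        validate=False,  # skip validation when the compute can't reach the data source
    )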
azureml-mlflow
Add support for deploying local file:// models with AzureML-MLflow
Deprecating _with_auth param in ws.get_mlflow_tracking_uri()
azureml-opendatasets
Recently published Covid-19 tracking datasets are now available with the SDK
azureml-pipeline-core
Log a warning when "azureml-defaults" is not included as part of the pip dependencies
Improve Note rendering.
Added support for quoted line breaks when parsing delimited files to
PipelineOutputFileDataset.
The PipelineDataset class is deprecated. For more information, see https://aka.ms/dataset-
deprecation. Learn how to use dataset with pipeline, see https://aka.ms/pipeline-with-dataset.
azureml-pipeline-steps
Doc updates to azureml-pipeline-steps.
Added support in ParallelRunConfig's load_yaml() for users to define Environments inline with
the rest of the config or in a separate file
azureml-train-automl-client
Removed the ability to specify enable_cache as part of AutoMLConfig
azureml-train-automl-runtime
Added limited availability of multi-node, multi-GPU distributed featurization with BERT.
Added error handling for incompatible packages in ADB based automated machine learning
runs.
azureml-widgets
Doc updates to azureml-widgets.
2020-06-22
Azure Machine Learning SDK for Python v1.8.0
Preview features
azureml-contrib-fairness The azureml-contrib-fairness package provides integration between the
open-source fairness assessment and unfairness mitigation package Fairlearn and Azure Machine
Learning studio. In particular, the package enables model fairness evaluation dashboards to be
uploaded as part of an AzureML Run and appear in Azure Machine Learning studio
Bug fixes and improvements
azure-cli-ml
Support getting logs of init container.
Added new CLI commands to manage ComputeInstance
azureml-automl-core
Users are now able to enable stack ensemble iteration for Time series tasks with a warning that
it could potentially overfit.
Added a new type of user exception that is raised if the cache store contents have been
tampered with
azureml-automl-runtime
Class Balancing Sweeping will no longer be enabled if user disables featurization.
azureml-contrib-notebook
Doc improvements to azureml-contrib-notebook package.
azureml-contrib-pipeline-steps
Doc improvements to azureml-contrib-pipeline-steps package.
azureml-core
Add set_connection, get_connection, list_connections, delete_connection functions for customers
to operate on workspace connection resources
Documentation updates to the azureml-core/azureml.exceptions package.
Documentation updates to azureml-core package.
Doc updates to ComputeInstance class.
Doc improvements to azureml-core/azureml.core.compute package.
Doc improvements for webservice-related classes in azureml-core.
Support user-selected datastore to store profiling data
Added expand and page_count property for model list API
Fixed a bug where removing the overwrite property would cause the submitted run to fail with a
deserialization error.
Fixed inconsistent folder structure when downloading or mounting a FileDataset that references
a single file.
Loading a dataset of parquet files to_spark_dataframe is now faster and supports all parquet
and Spark SQL datatypes.
Support getting logs of init container.
AutoML runs are now marked as child run of Parallel Run Step.
azureml-datadrift
Doc improvements to azureml-datadrift package.
azureml-dataprep
Loading a dataset of parquet files to_spark_dataframe is now faster and supports all parquet
and Spark SQL datatypes.
Better memory handling for OutOfMemory issue for to_pandas_dataframe.
azureml-interpret
Upgraded azureml-interpret to use interpret-community version 0.12.*
azureml-mlflow
Doc improvements to azureml-mlflow.
Adds support for the AML model registry with MLflow.
azureml-opendatasets
Added support for Python 3.8
azureml-pipeline-core
Updated PipelineDataset's documentation to make it clear it is an internal class.
ParallelRunStep updates to accept multiple values for one argument, for example: "--
group_column_names", "Col1", "Col2", "Col3"
Removed the passthru_automl_config requirement for intermediate data usage with
AutoMLStep in Pipelines.
azureml-pipeline-steps
Doc improvements to azureml-pipeline-steps package.
Removed the passthru_automl_config requirement for intermediate data usage with
AutoMLStep in Pipelines.
azureml-telemetry
Doc improvements to azureml-telemetry.
azureml-train-automl-client
Fixed a bug where experiment.submit() called twice on an AutoMLConfig object resulted in
different behavior.
Users are now able to enable stack ensemble iteration for Time series tasks with a warning that
it could potentially overfit.
Changed AutoML run behavior to raise UserErrorException if service throws user error
Fixes a bug that caused azureml_automl.log to not get generated or be missing logs when
performing an AutoML experiment on a remote compute target.
For Classification data sets with imbalanced classes, we will apply Weight Balancing, if the
feature sweeper determines that for subsampled data, Weight Balancing improves the
performance of the classification task by a certain threshold.
AutoML runs are now marked as child run of Parallel Run Step.
azureml-train-automl-runtime
Changed AutoML run behavior to raise UserErrorException if service throws user error
AutoML runs are now marked as child run of Parallel Run Step.
2020-06-08
Azure Machine Learning SDK for Python v1.7.0
Bug fixes and improvements
azure-cli-ml
Completed the removal of model profiling from mir contrib by cleaning up CLI commands and
package dependencies. Model profiling is available in core.
Upgrades the min Azure CLI version to 2.3.0
azureml-automl-core
Better exception message on featurization step fit_transform() due to custom transformer
parameters.
Add support for multiple languages for deep learning transformer models such as BERT in
automated ML.
Remove deprecated lag_length parameter from documentation.
The forecasting parameters documentation was improved. The lag_length parameter was
deprecated.
azureml-automl-runtime
Fixed the error raised when one of the categorical columns is empty at forecast/test time.
Fixed run failures that happened when lookback features were enabled and the data contained
short grains.
Fixed the issue with the duplicated time index error message when lags or rolling windows were
set to 'auto'.
Fixed the issue with Prophet and Arima models on data sets containing lookback features.
Added support for dates before 1677-09-21 or after 2262-04-11 in columns other than the date
time in forecasting tasks. Improved error messages.
The forecasting parameters documentation was improved. The lag_length parameter was
deprecated.
Better exception message on featurization step fit_transform() due to custom transformer
parameters.
Add support for multiple languages for deep learning transformer models such as BERT in
automated ML.
Cache operations that result in some OSErrors will raise user error.
Added checks to ensure training and validation data have the same number and set of columns
Fixed issue with the autogenerated AutoML scoring script when the data contains quotation
marks
Enabling explanations for AutoML Prophet and ensembled models that contain Prophet model.
A recent customer issue revealed a live-site bug wherein we logged messages about Class-
Balancing Sweeping even when the class balancing logic wasn't properly enabled. Those
logs/messages have been removed.
azureml-cli-common
Completed the removal of model profiling from mir contrib by cleaning up CLI commands and
package dependencies. Model profiling is available in core.
azureml-contrib-reinforcementlearning
Load testing tool
azureml-core
Documentation changes on Script_run_config.py
Fixes a bug with printing the output of run submit-pipeline CLI
Documentation improvements to azureml-core/azureml.data
Fixes issue retrieving storage account using hdfs getconf command
Improved register_azure_blob_container and register_azure_file_share documentation
azureml-datadrift
Improved implementation for disabling and enabling dataset drift monitors
azureml-interpret
In explanation client, remove NaNs or Infs prior to json serialization on upload from artifacts
Update to latest version of interpret-community to improve out of memory errors for global
explanations with many features and classes
Add the optional true_ys parameter to explanation upload to enable additional features in the
studio UI (see the sketch after this list)
Improve download_model_explanations() and list_model_explanations() performance
Small tweaks to notebooks, to aid with debugging
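A sketch of uploading an explanation with true labels (run, explanation, and y_test are assumed to
exist; true_ys as shown is the new parameter):

    from azureml.interpret import ExplanationClient

    client = ExplanationClient.from_run(run)  # assumed: an azureml.core.Run
    client.upload_model_explanation(
        explanation,       # assumed: an explanation computed with interpret-community
        comment="global explanation with true labels",
        true_ys=y_test,    # assumed labels; unlocks additional studio UI features
    )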
azureml-opendatasets
azureml-opendatasets needs azureml-dataprep version 1.4.0 or higher. Added a warning if a lower
version is detected
azureml-pipeline-core
This change allows the user to provide an optional runconfig to the moduleVersion when calling
module.Publish_python_script.
Enabled node count to be a pipeline parameter in ParallelRunStep in azureml.pipeline.steps
azureml-pipeline-steps
This change allows the user to provide an optional runconfig to the moduleVersion when calling
module.Publish_python_script.
azureml-train-automl-client
Add support for multiple languages for deep learning transformer models such as BERT in
automated ML.
Remove deprecated lag_length parameter from documentation.
The forecasting parameters documentation was improved. The lag_length parameter was
deprecated.
azureml-train-automl-runtime
Enabling explanations for AutoML Prophet and ensembled models that contain Prophet model.
Documentation updates to azureml-train-automl-* packages.
azureml-train-core
Supporting TensorFlow version 2.1 in the PyTorch Estimator
Improvements to azureml-train-core package.
2020-05-26
Azure Machine Learning SDK for Python v1.6.0
New features
azureml-automl-runtime
AutoML Forecasting now supports forecasting beyond the pre-specified maximum horizon
without retraining the model. When the forecast destination is farther into the future than the
specified maximum horizon, the forecast() function will still make point predictions out to the
later date using a recursive operation mode. For an illustration of the new feature, please see
the "Forecasting farther than the maximum horizon" section of the "forecasting-forecast-function"
notebook.
azureml-pipeline-steps
ParallelRunStep is now released and is part of the azureml-pipeline-steps package. The existing
ParallelRunStep in the azureml-contrib-pipeline-steps package is deprecated. Changes from the
public preview version (see the sketch after this list):
Added the optional run_max_try configurable parameter to control the max calls to the run
method for any given batch; the default value is 3.
No PipelineParameters are autogenerated anymore. The following configurable values can
be set as PipelineParameter explicitly:
mini_batch_size
node_count
process_count_per_node
logging_level
run_invocation_timeout
run_max_try
Default value for process_count_per_node is changed to 1. Users should tune this value
for better performance. Best practice is to set it to the number of GPUs or CPUs the node has.
ParallelRunStep does not inject any packages; users need to include the azureml-core and
azureml-dataprep[pandas, fuse] packages in the environment definition. If a custom
docker image is used with user_managed_dependencies, then conda needs to be installed
on the image.
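A sketch of the GA configuration with one value supplied as a PipelineParameter (the scripts,
environment, compute target, input dataset, and output are assumptions):

    from azureml.pipeline.core import PipelineParameter
    from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

    parallel_run_config = ParallelRunConfig(
        source_directory="scripts",
        entry_script="batch_score.py",
        mini_batch_size=PipelineParameter(name="mini_batch_size", default_value="5"),
        error_threshold=10,
        output_action="append_row",
        environment=batch_env,          # must include azureml-core and
                                        # azureml-dataprep[pandas, fuse]
        compute_target=compute_target,
        process_count_per_node=1,       # new default of 1; tune to GPU/CPU count
        node_count=2,
        run_max_try=3,                  # the new optional retry setting
    )

    step = ParallelRunStep(
        name="batch-scoring",
        parallel_run_config=parallel_run_config,
        inputs=[input_ds.as_named_input("input_ds")],  # assumed FileDataset
        output=output_dir,              # assumed: a PipelineData object
    )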
Breaking changes
azureml-pipeline-steps
Deprecated the use of azureml.dprep.Dataflow as a valid type of input for AutoMLConfig
azureml-train-automl-client
Deprecated the use of azureml.dprep.Dataflow as a valid type of input for AutoMLConfig
Bug fixes and improvements
azureml-automl-core
Fixed the bug where a warning may be printed during get_output asking the user to
downgrade the client.
Updated Mac to rely on cudatoolkit=9.0 as it is not available at version 10 yet.
Removing restrictions on prophet and xgboost models when trained on remote compute.
Improved logging in AutoML
The error handling for custom featurization in forecasting tasks was improved.
Added functionality to allow users to include lagged features to generate forecasts.
Updates to error message to correctly display user error.
Support for cv_split_column_names to be used with training_data
Update logging the exception message and traceback.
azureml-automl-runtime
Enable guardrails for forecasting missing value imputations.
Improved logging in AutoML
Added fine grained error handling for data prep exceptions
Removing restrictions on prophet and xgboost models when trained on remote compute.
azureml-train-automl-runtime and azureml-automl-runtime have updated dependencies for
pytorch, scipy, and cudatoolkit. We now support pytorch==1.4.0, scipy>=1.0.0,<=1.3.1,
and cudatoolkit==10.1.243.
The error handling for custom featurization in forecasting tasks was improved.
The forecasting data set frequency detection mechanism was improved.
Fixed issue with Prophet model training on some data sets.
The auto detection of max horizon during the forecasting was improved.
Added functionality to allow users to include lagged features to generate forecasts.
Adds functionality in the forecast function to enable providing forecasts beyond the trained
horizon without retraining the forecasting model.
Support for cv_split_column_names to be used with training_data
azureml-contrib-automl-dnn-forecasting
Improved logging in AutoML
azureml-contrib-mir
Added support for Windows services in ManagedInferencing
Remove old MIR workflows such as attach MIR compute, SingleModelMirWebservice class -
Clean out model profiling placed in contrib-mir package
azureml-contrib-pipeline-steps
Minor fix for YAML support
ParallelRunStep is released to General Availability - azureml.contrib.pipeline.steps has a
deprecation notice and has moved to azureml.pipeline.steps
azureml-contrib-reinforcementlearning
RL Load testing tool
RL estimator has smart defaults
azureml-core
Remove old MIR workflows such as attach MIR compute, SingleModelMirWebservice class -
Clean out model profiling placed in contrib-mir package
Fixed the information provided to the user in case of profiling failure: included request ID and
reworded the message to be more meaningful. Added new profiling workflow to profiling
runners
Improved error text in case of Dataset execution failures.
Workspace private link CLI support added.
Added an optional parameter invalid_lines to Dataset.Tabular.from_json_lines_files that
allows for specifying how to handle lines that contain invalid JSON (see the sketch after this list).
We will be deprecating the run-based creation of compute in the next release. We recommend
creating an actual Amlcompute cluster as a persistent compute target, and using the cluster
name as the compute target in your run configuration. See example notebook here:
aka.ms/amlcomputenb
Improved error messages in case of Dataset execution failures.
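A minimal sketch of the new parameter (the datastore path is a placeholder, and 'drop' is an assumed
option for skipping bad lines):

    from azureml.core import Dataset

    ds = Dataset.Tabular.from_json_lines_files(
        path=(datastore, "logs/*.jsonl"),  # placeholder datastore path
        invalid_lines="drop",              # assumed option: skip lines with invalid JSON
    )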
azureml-dataprep
Made warning to upgrade pyarrow version more explicit.
Improved error handling and message returned in case of failure to execute dataflow.
azureml-interpret
Documentation updates to azureml-interpret package.
Fixed interpretability packages and notebooks to be compatible with latest sklearn update
azureml-opendatasets
Return None when there is no data returned.
Improve the performance of to_pandas_dataframe.
azureml-pipeline-core
Quick fix for ParallelRunStep where loading from YAML was broken
ParallelRunStep is released to General Availability - azureml.contrib.pipeline.steps has a
deprecation notice and has moved to azureml.pipeline.steps - new features include: 1. Datasets as
PipelineParameter 2. New parameter run_max_retry 3. Configurable append_row output file
name
azureml-pipeline-steps
Deprecated azureml.dprep.Dataflow as a valid type for input data.
Quick fix for ParallelRunStep where loading from YAML was broken
ParallelRunStep is released to General Availability - azureml.contrib.pipeline.steps has a
deprecation notice and has moved to azureml.pipeline.steps - new features include:
Datasets as PipelineParameter
New parameter run_max_retry
Configurable append_row output file name
azureml-telemetry
Update logging the exception message and traceback.
azureml-train-automl-client
Improved logging in AutoML
Updates to error message to correctly display user error.
Support for cv_split_column_names to be used with training_data
Deprecated azureml.dprep.Dataflow as a valid type for input data.
Updated Mac to rely on cudatoolkit=9.0 as it is not available at version 10 yet.
Removing restrictions on prophet and xgboost models when trained on remote compute.
azureml-train-automl-runtime and azureml-automl-runtime have updated dependencies for
pytorch, scipy, and cudatoolkit. We now support pytorch==1.4.0, scipy>=1.0.0,<=1.3.1,
and cudatoolkit==10.1.243.
Added functionality to allow users to include lagged features to generate forecasts.
azureml-train-automl-runtime
Improved logging in AutoML
Added fine grained error handling for data prep exceptions
Removing restrictions on prophet and xgboost models when trained on remote compute.
azureml-train-automl-runtime and azureml-automl-runtime have updated dependencies for
pytorch, scipy, and cudatoolkit. We now support pytorch==1.4.0, scipy>=1.0.0,<=1.3.1,
and cudatoolkit==10.1.243.
Updates to error message to correctly display user error.
Support for cv_split_column_names to be used with training_data
azureml-train-core
Added a new set of HyperDrive specific exceptions. azureml.train.hyperdrive will now throw
detailed exceptions.
azureml-widgets
Fixed AzureML Widgets not displaying in JupyterLab
2020-05-11
Azure Machine Learning SDK for Python v1.5.0
New features
Preview features
azureml-contrib-reinforcementlearning
Azure Machine Learning is releasing preview support for reinforcement learning using
the Ray framework. The ReinforcementLearningEstimator enables training of
reinforcement learning agents across GPU and CPU compute targets in Azure Machine
Learning.
Bug fixes and improvements
azure-cli-ml
Removed a warning log that was accidentally left behind by a previous change; it was used only
for debugging.
Bug fix: inform clients about partial failure during profiling
azureml-automl-core
Sped up the Prophet/AutoArima models in AutoML forecasting by enabling parallel fitting for the
time series when data sets have multiple time series. To benefit from this new feature, we
recommend setting "max_cores_per_iteration = -1" (that is, using all the available CPU cores) in
AutoMLConfig (see the sketch after this list).
Fix KeyError on printing guardrails in console interface
Fixed error message for experimentation_timeout_hours
Deprecated TensorFlow models for AutoML.
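For illustration, a hedged sketch of setting max_cores_per_iteration in AutoMLConfig; the task type, registered dataset, and column names are assumptions:
from azureml.core import Workspace, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, "sales-timeseries")  # hypothetical registered dataset
automl_config = AutoMLConfig(task="forecasting",
    training_data=train_ds,
    label_column_name="quantity",  # hypothetical label column
    time_column_name="date",       # hypothetical time column
    max_cores_per_iteration=-1)    # use all available CPU cores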
azureml-automl-runtime
Fixed error message for experimentation_timeout_hours
Fixed unclassified exception when trying to deserialize from cache store
Speed up Prophet/AutoArima model in AutoML forecasting by enabling parallel fitting for the
time series when data sets have multiple time series.
Fixed forecasting with rolling window enabled on data sets where the test/prediction set does
not contain one of the grains from the training set.
Improved handling of missing data
Fixed an issue with prediction intervals during forecasting on data sets containing time series
that are not aligned in time.
Added better validation of data shape for the forecasting tasks.
Improved the frequency detection.
Created better error message if the cross validation folds for forecasting tasks cannot be
generated.
Fix console interface to print missing value guardrail correctly.
Enforcing datatype checks on cv_split_indices input in AutoMLConfig.
azureml-cli-common
Bug fix: inform clients about partial failure during profiling
azureml-contrib-mir
Adds a class azureml.contrib.mir.RevisionStatus which relays information about the currently
deployed MIR revision and the most recent version specified by the user. This class is included
in the MirWebservice object under 'deployment_status' attribute.
Enables update on Webservices of type MirWebservice and its child class
SingleModelMirWebservice.
azureml-contrib-reinforcementlearning
Added support for Ray 0.8.3
AmlWindowsCompute only supports Azure Files as mounted storage
Renamed health_check_timeout to health_check_timeout_seconds
Fixed some class/method descriptions.
azureml-core
Enabled WASB -> Blob conversions in Azure Government and China clouds.
Fixes bug to allow Reader roles to use az ml run CLI commands to get run information
Removed unnecessary logging during Azure ML Remote Runs with input Datasets.
RCranPackage now supports "version" parameter for the CRAN package version.
Bug fix: inform clients about partial failure during profiling
Added European-style float handling for azureml-core.
Enabled workspace private link features in Azure ml sdk.
When creating a TabularDataset using from_delimited_files, you can specify whether empty
values should be loaded as None or as empty string by setting the boolean argument
empty_as_string (see the sketch after this list).
Added European-style float handling for datasets.
Improved error messages on dataset mount failures.
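For illustration, a minimal sketch of the empty_as_string option; the datastore path is an assumption:
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
# Load empty fields as empty strings rather than None; the path is hypothetical.
ds = Dataset.Tabular.from_delimited_files(path=(datastore, "data/sales.csv"), empty_as_string=True)
df = ds.to_pandas_dataframe()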
azureml-datadrift
Data Drift results query from the SDK had a bug that didn't differentiate the minimum,
maximum, and mean feature metrics, resulting in duplicate values. We have fixed this bug by
prefixing target or baseline to the metric names. Before: duplicate min, max, mean. After:
target_min, target_max, target_mean, baseline_min, baseline_max, baseline_mean.
azureml-dataprep
Improve handling of write restricted Python environments when ensuring .NET Dependencies
required for data delivery.
Fixed Dataflow creation on file with leading empty records.
Added error handling options for to_partition_iterator similar to to_pandas_dataframe .
azureml-interpret
Reduced explanation path length limits to reduce likelihood of going over Windows limit
Bugfix for sparse explanations created with the mimic explainer using a linear surrogate model.
azureml-opendatasets
Fixed an issue where MNIST's columns were parsed as strings when they should be int.
azureml-pipeline-core
Allowing the option to regenerate_outputs when using a module that is embedded in a
ModuleStep.
azureml-train-automl-client
Deprecated TensorFlow models for AutoML.
Fixed an issue with users allowlisting unsupported algorithms in local mode.
Doc fixes to AutoMLConfig.
Enforcing datatype checks on cv_split_indices input in AutoMLConfig.
Fixed issue with AutoML runs failing in show_output
azureml-train-automl-runtime
Fixing a bug in Ensemble iterations that was preventing model download timeout from kicking
in successfully.
azureml-train-core
Fix typo in azureml.train.dnn.Nccl class.
Supporting PyTorch version 1.5 in the PyTorch Estimator
Fix the issue that framework image can't be fetched in Azure Government region when using
training framework estimators
2020-05-04
New Notebook Experience
You can now create, edit, and share machine learning notebooks and files directly inside the studio web
experience of Azure Machine Learning. You can use all the classes and methods available in Azure Machine
Learning Python SDK from inside these notebooks. To get started, visit the Run Jupyter Notebooks in your
workspace article.
New Features Introduced:
Improved editor (Monaco editor) used by VS Code
UI/UX improvements
Cell Toolbar
New Notebook Toolbar and Compute Controls
Notebook Status Bar
Inline Kernel Switching
R Support
Accessibility and Localization improvements
Command Palette
Additional Keyboard Shortcuts
Auto save
Improved performance and reliability
Access the following web-based authoring tools from the studio:
Azure ML Studio Notebooks: first-in-class authoring for notebook files, supporting all
operations available in the Azure ML Python SDK.
2020-04-27
Azure Machine Learning SDK for Python v1.4.0
New features
AmlCompute clusters now support setting up a managed identity on the cluster at the time of
provisioning. Just specify whether you would like to use a system-assigned identity or a user-assigned
identity, and pass an identityId for the latter. You can then set up permissions to access various
resources like Storage or ACR in a way that the identity of the compute gets used to securely access
the data, instead of a token-based approach that AmlCompute employs today. Check out our SDK
reference for more information on the parameters.
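A hedged sketch of the managed identity option, using the identity parameters on AmlCompute.provisioning_configuration; the resource ID below is a placeholder assumption:
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()
# For a system-assigned identity, use identity_type="SystemAssigned" and omit identity_id.
config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2", max_nodes=4,
    identity_type="UserAssigned",
    identity_id=["/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity>"])
cluster = ComputeTarget.create(ws, "msi-cluster", config)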
Breaking changes
AmlCompute clusters supported a Preview feature around run-based creation, that we are planning
on deprecating in two weeks. You can continue to create persistent compute targets as always by
using the Amlcompute class, but the specific approach of specifying the identifier "amlcompute" as the
compute target in run config will not be supported in the near future.
Bug fixes and improvements
azureml-automl-runtime
Enable support for unhashable type when calculating number of unique values in a column.
azureml-core
Improved stability when reading from Azure Blob Storage using a TabularDataset.
Improved documentation for the grant_workspace_msi parameter for
Datastore.register_azure_blob_store .
Fixed bug with datastore.upload to support the src_dir argument ending with a / or \ .
Added actionable error message when trying to upload to an Azure Blob Storage datastore that
does not have an access key or SAS token.
azureml-interpret
Added upper bound to file size for the visualization data on uploaded explanations.
azureml-train-automl-client
Explicitly checking for label_column_name & weight_column_name parameters for
AutoMLConfig to be of type string.
azureml-contrib-pipeline-steps
ParallelRunStep now supports dataset as pipeline parameter. User can construct pipeline with
sample dataset and can change input dataset of the same type (file or tabular) for new pipeline
run.
2020-04-13
Azure Machine Learning SDK for Python v1.3.0
Bug fixes and improvements
azureml-automl-core
Added additional telemetry around post-training operations.
Speeds up automatic ARIMA training by using conditional sum of squares (CSS) training for
series of length longer than 100. The length used is stored as the constant
ARIMA_TRIGGER_CSS_TRAINING_LENGTH within the TimeSeriesInternal class at /src/azureml-
automl-core/azureml/automl/core/shared/constants.py
Improved the user logging of forecasting runs; the log now shows more information about
which phase is currently running.
Disallowed target_rolling_window_size to be set to values less than 2
azureml-automl-runtime
Improved the error message shown when duplicated timestamps are found.
Disallowed target_rolling_window_size to be set to values less than 2.
Fixed the lag imputation failure. The issue was caused by the insufficient number of
observations needed to seasonally decompose a series. The "de-seasonalized" data is used to
compute a partial autocorrelation function (PACF) to determine the lag length.
Enabled column purpose featurization customization for forecasting tasks by featurization
config. Numerical and Categorical as column purpose for forecasting tasks is now supported.
Enabled drop column featurization customization for forecasting tasks by featurization config.
Enabled imputation customization for forecasting tasks by featurization config. Constant value
imputation for target column and mean, median, most_frequent, and constant value imputation
for training data are now supported.
azureml-contrib-pipeline-steps
Accept string compute names to be passed to ParallelRunConfig
azureml-core
Added Environment.clone(new_name) API to create a copy of an Environment object (see the
sketch after this list).
Environment.docker.base_dockerfile accepts a filepath; if the file can be resolved, its content is
read into the base_dockerfile environment property.
Automatically reset mutually exclusive values for base_image and base_dockerfile when the user
manually sets a value in Environment.docker.
Added user_managed flag in RSection that indicates whether the environment is managed by
user or by AzureML.
Dataset: Fixed dataset download failure if the data path contains unicode characters.
Dataset: Improved dataset mount caching mechanism to respect the minimum disk space
requirement in Azure Machine Learning Compute, which avoids making the node unusable and
causing the job to be canceled.
Dataset: We now add an index for the time series column when you access a time series dataset
as a pandas dataframe, which speeds up time series-based data access. Previously, the index was
given the same name as the timestamp column, confusing users about which is the actual
timestamp column and which is the index. We now don't give any specific name to the index
since it should not be used as a column.
Dataset: Fixed dataset authentication issue in sovereign cloud.
Dataset: Fixed Dataset.to_spark_dataframe failure for datasets created from Azure PostgreSQL
datastores.
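A minimal sketch of the Environment additions above; the environment name and Dockerfile path are assumptions:
from azureml.core import Environment

env = Environment(name="my-env")
# A resolvable filepath is read into the base_dockerfile property;
# setting it also resets the mutually exclusive base_image.
env.docker.base_dockerfile = "./Dockerfile"
# Create an independent copy under a new name.
env_copy = env.clone("my-env-copy")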
azureml-interpret
Added global scores to visualization if local importance values are sparse
Updated azureml-interpret to use interpret-community 0.9.*
Fixed issue with downloading explanation that had sparse evaluation data
Added support of sparse format of the explanation object in AutoML
azureml-pipeline-core
Support ComputeInstance as compute target in pipelines
azureml-train-automl-client
Added additional telemetry around post-training operations.
Fixed the regression in early stopping
Deprecated azureml.dprep.Dataflow as a valid type for input data.
Changed the default AutoML experiment timeout to six days.
azureml-train-automl-runtime
Added additional telemetry around post-training operations.
Added sparse AutoML end-to-end support
azureml-opendatasets
Added additional telemetry for service monitor.
Enable front door for blob to increase stability
2020-03-23
Azure Machine Learning SDK for Python v1.2.0
Breaking changes
Drop support for Python 2.7
Bug fixes and improvements
azure-cli-ml
Adds "--subscription-id" to az ml model/computetarget/service commands in the CLI
Adding support for passing customer-managed key(CMK) vault_url, key_name and key_version
for ACI deployment
azureml-automl-core
Enabled customized imputation with constant value for both X and y data forecasting tasks.
Fixed an issue with showing error messages to the user.
azureml-automl-runtime
Fixed an issue with forecasting on data sets containing grains with only one row
Decreased the amount of memory required by the forecasting tasks.
Added better error messages if time column has incorrect format.
Enabled customized imputation with constant value for both X and y data forecasting tasks.
azureml-core
Added support for loading ServicePrincipal from environment variables:
AZUREML_SERVICE_PRINCIPAL_ID, AZUREML_SERVICE_PRINCIPAL_TENANT_ID, and
AZUREML_SERVICE_PRINCIPAL_PASSWORD
Introduced a new parameter support_multi_line to Dataset.Tabular.from_delimited_files : By
default ( support_multi_line=False ), all line breaks, including those in quoted field values, will
be interpreted as a record break. Reading data this way is faster and more optimized for
parallel execution on multiple CPU cores. However, it may result in silently producing more
records with misaligned field values. This should be set to True when the delimited files are
known to contain quoted line breaks (see the sketch after this list).
Added the ability to register ADLS Gen2 in the Azure Machine Learning CLI
Renamed parameter 'fine_grain_timestamp' to 'timestamp' and parameter
'coarse_grain_timestamp' to 'partition_timestamp' for the with_timestamp_columns() method
in TabularDataset to better reflect the usage of the parameters.
Increased max experiment name length to 255.
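A minimal sketch of the support_multi_line option; the file path is an assumption:
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
# Set support_multi_line=True when quoted field values are known to contain line breaks.
ds = Dataset.Tabular.from_delimited_files(path=(datastore, "data/comments.csv"), support_multi_line=True)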
azureml-interpret
Updated azureml-interpret to interpret-community 0.7.*
azureml-sdk
Changed dependencies to compatible-release (tilde) version specifiers to support patching in
pre-release and stable releases.
2020-03-11
Azure Machine Learning SDK for Python v1.1.5
Feature deprecation
Python 2.7
Last version to support Python 2.7
Breaking changes
Semantic Versioning 2.0.0
Starting with version 1.1, the Azure ML Python SDK adopts Semantic Versioning 2.0.0. All
subsequent versions will follow the new numbering scheme and semantic versioning contract.
Bug fixes and improvements
azure-cli-ml
Changed the endpoint CLI command name from 'az ml endpoint aks' to 'az ml endpoint
realtime' for consistency.
Updated CLI installation instructions for stable and experimental branch CLI.
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
azureml-automl-core
Enabled the Batch mode inference (taking multiple rows once) for AutoML ONNX models
Improved the detection of frequency on the data sets, lacking data or containing irregular data
points
Added the ability to remove data points not complying with the dominant frequency.
Changed the input of the constructor to take a list of options to apply the imputation options
for corresponding columns.
The error logging has been improved.
azureml-automl-runtime
Fixed the error thrown when a grain that was not present in the training set appeared in the
test set
Removed the y_query requirement during scoring on forecasting service
Fixed the issue with forecasting when the data set contains short grains with long time gaps.
Fixed the issue when the auto max horizon is turned on and the date column contains dates in
the form of strings. Proper conversion and error messages were added for when conversion to
date is not possible.
Using native NumPy and SciPy for serializing and deserializing intermediate data for
FileCacheStore (used for local AutoML runs)
Fixed a bug where failed child runs could get stuck in Running state.
Increased speed of featurization.
Fixed the frequency check during scoring; forecasting tasks no longer require strict
frequency equivalence between train and test set.
Changed the input of the constructor to take a list of options to apply the imputation options
for corresponding columns.
Fixed errors related to lag type selection.
Fixed the unclassified error raised on the data sets, having grains with the single row
Fixed the issue with frequency detection slowness.
Fixes a bug in AutoML exception handling that caused the real reason for training failure to be
replaced by an AttributeError.
azureml-cli-common
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
azureml-contrib-mir
Adds functionality in the MirWebservice class to retrieve the Access Token
Use token auth for MirWebservice by default during MirWebservice.run() call - Only refresh if
call fails
MIR webservice deployment now requires proper SKUs [Standard_DS2_v2, Standard_F16,
Standard_A2_v2] instead of [Ds2v2, A2v2, and F16] respectively.
azureml-contrib-pipeline-steps
Optional parameter side_inputs added to ParallelRunStep. This parameter can be used to
mount folder on the container. Currently supported types are DataReference and PipelineData.
Parameters passed in ParallelRunConfig can be overwritten by passing pipeline parameters
now. New pipeline parameters supported aml_mini_batch_size, aml_error_threshold,
aml_logging_level, aml_run_invocation_timeout (aml_node_count and
aml_process_count_per_node are already part of earlier release).
azureml-core
Deployed AzureML Webservices will now default to INFO logging. This can be controlled by
setting the AZUREML_LOG_LEVEL environment variable in the deployed service.
Python SDK uses discovery service to use 'api' endpoint instead of 'pipelines'.
Swap to the new routes in all SDK calls.
Changed routing of calls to the ModelManagementService to a new unified structure.
Made workspace update method publicly available.
Added image_build_compute parameter in the workspace update method to allow the user
to update the compute used for image build.
Added deprecation messages to the old profiling workflow. Fixed profiling cpu and memory
limits.
Added RSection as part of Environment to run R jobs.
Added validation to Dataset.mount to raise error when source of the dataset is not accessible
or does not contain any data.
Added --grant-workspace-msi-access as an additional parameter for the Datastore CLI for
registering Azure Blob Container that will allow you to register Blob Container that is behind a
VNet.
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
Fixed the issue in aks.py _deploy.
Validates the integrity of models being uploaded to avoid silent storage failures.
User may now specify a value for the auth key when regenerating keys for webservices.
Fixed bug where uppercase letters cannot be used as dataset's input name.
azureml-defaults
azureml-dataprep will now be installed as part of azureml-defaults. It is no longer required to
install azureml-dataprep[fuse] manually on compute targets to mount datasets.
azureml-interpret
Updated azureml-interpret to interpret-community 0.6.*
Updated azureml-interpret to depend on interpret-community 0.5.0
Added azureml-style exceptions to azureml-interpret
Fixed DeepScoringExplainer serialization for keras models
azureml-mlflow
Add support for sovereign clouds to azureml.mlflow
azureml-pipeline-core
Pipeline batch scoring notebook now uses ParallelRunStep
Fixed a bug where PythonScriptStep results could be incorrectly reused despite changing the
arguments list
Added the ability to set columns' type when calling the parse_* methods on
PipelineOutputFileDataset
azureml-pipeline-steps
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime .
Added documentation example for dataset as PythonScriptStep input
azureml-tensorboard
Updated azureml-tensorboard to support TensorFlow 2.0
Show correct port number when using a custom TensorBoard port on a Compute Instance
azureml-train-automl-client
Fixed an issue where certain packages may be installed at incorrect versions on remote runs.
fixed FeaturizationConfig overriding issue that filters custom featurization config.
azureml-train-automl-runtime
Fixed the issue with frequency detection in the remote runs
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
azureml-train-core
Supporting PyTorch version 1.4 in the PyTorch Estimator
2020-03-02
Azure Machine Learning SDK for Python v1.1.2rc0 (Pre-release)
Bug fixes and improvements
azureml-automl-core
Enabled the Batch mode inference (taking multiple rows once) for AutoML ONNX models
Improved the detection of frequency on the data sets, lacking data or containing irregular data
points
Added the ability to remove data points not complying with the dominant frequency.
azureml-automl-runtime
Fixed the error thrown when a grain that was not present in the training set appeared in the
test set
Removed the y_query requirement during scoring on forecasting service
azureml-contrib-mir
Adds functionality in the MirWebservice class to retrieve the Access Token
azureml-core
Deployed AzureML Webservices will now default to INFO logging. This can be controlled by
setting the AZUREML_LOG_LEVEL environment variable in the deployed service.
Fix iterating on Dataset.get_all to return all datasets registered with the workspace.
Improve error message when invalid type is passed to path argument of dataset creation APIs.
Python SDK uses discovery service to use 'api' endpoint instead of 'pipelines'.
Swap to the new routes in all SDK calls
Changes routing of calls to the ModelManagementService to a new unified structure
Made workspace update method publicly available.
Added image_build_compute parameter in the workspace update method to allow the user
to update the compute used for image build
Added deprecation messages to the old profiling workflow. Fixed profiling cpu and memory
limits
azureml-interpret
update azureml-interpret to interpret-community 0.6.*
azureml-mlflow
Add support for sovereign clouds to azureml.mlflow
azureml-pipeline-steps
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
azureml-train-automl-client
Fixed an issue where certain packages may be installed at incorrect versions on remote runs.
azureml-train-automl-runtime
Fixed the issue with frequency detection in the remote runs
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
azureml-train-core
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
2020-02-18
Azure Machine Learning SDK for Python v1.1.1rc0 (Pre-release)
Bug fixes and improvements
azure-cli-ml
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
azureml-automl-core
The error logging has been improved.
azureml-automl-runtime
Fixed the issue with forecasting when the data set contains short grains with long time gaps.
Fixed the issue when the auto max horizon is turned on and the date column contains dates in
the form of strings. We added proper conversion and a sensible error if conversion to date is
not possible
Using native NumPy and SciPy for serializing and deserializing intermediate data for
FileCacheStore (used for local AutoML runs)
Fixed a bug where failed child runs could get stuck in Running state.
azureml-cli-common
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
azureml-core
Added --grant-workspace-msi-access as an additional parameter for the Datastore CLI for
registering Azure Blob Container that will allow you to register Blob Container that is behind a
VNet
Single instance profiling was fixed to produce a recommendation and was made available in
core sdk.
Fixed the issue in aks.py _deploy
Validates the integrity of models being uploaded to avoid silent storage failures.
azureml-interpret
added azureml-style exceptions to azureml-interpret
fixed DeepScoringExplainer serialization for keras models
azureml-pipeline-core
Pipeline batch scoring notebook now uses ParallelRunStep
azureml-pipeline-steps
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
azureml-contrib-pipeline-steps
Optional parameter side_inputs added to ParallelRunStep. This parameter can be used to
mount folder on the container. Currently supported types are DataReference and PipelineData.
azureml-tensorboard
Updated azureml-tensorboard to support TensorFlow 2.0
azureml-train-automl-client
Fixed FeaturizationConfig overriding issue that filters custom featurization config.
azureml-train-automl-runtime
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated the AutoMLStep
within azureml-train-automl-runtime.
azureml-train-core
Supporting PyTorch version 1.4 in the PyTorch Estimator
2020-02-04
Azure Machine Learning SDK for Python v1.1.0rc0 (Pre-release)
Breaking changes
Semantic Versioning 2.0.0
Starting with version 1.1, the Azure ML Python SDK adopts Semantic Versioning 2.0.0. All
subsequent versions will follow the new numbering scheme and semantic versioning contract.
Bug fixes and improvements
azureml-automl-runtime
Increased speed of featurization.
Fixed the frequency check during scoring; forecasting tasks no longer require strict
frequency equivalence between train and test set.
azureml-core
User may now specify a value for the auth key when regenerating keys for webservices.
azureml-interpret
Updated azureml-interpret to depend on interpret-community 0.5.0
azureml-pipeline-core
Fixed a bug where PythonScriptStep results could be incorrectly reused despite changing the
arguments list
azureml-pipeline-steps
Added documentation example for dataset as PythonScriptStep input
azureml-contrib-pipeline-steps
Parameters passed in ParallelRunConfig can be overwritten by passing pipeline parameters
now. New pipeline parameters supported aml_mini_batch_size, aml_error_threshold,
aml_logging_level, aml_run_invocation_timeout (aml_node_count and
aml_process_count_per_node are already part of earlier release).
2020-01-21
Azure Machine Learning SDK for Python v1.0.85
New features
azureml-core
Get the current core usage and quota limitation for AmlCompute resources in a given
workspace and subscription
azureml-contrib-pipeline-steps
Enabled users to pass a tabular dataset as an intermediate result from a previous step to ParallelRunStep
Bug fixes and improvements
azureml-automl-runtime
Removed the requirement of y_query column in the request to the deployed forecasting
service.
The 'y_query' was removed from the Dominick's Orange Juice notebook service request
section.
Fixed the bug preventing forecasting on deployed models operating on data sets with datetime
columns.
Added Matthews Correlation Coefficient as a classification metric, for both binary and
multiclass classification.
azureml-contrib-interpret
Removed text explainers from azureml-contrib-interpret as text explanation has been moved to
the interpret-text repo that will be released soon.
azureml-core
Dataset: usages for file dataset no longer depend on numpy and pandas to be installed in the
Python env.
Changed LocalWebservice.wait_for_deployment() to check the status of the local Docker
container before trying to ping its health endpoint, greatly reducing the amount of time it takes
to report a failed deployment.
Fixed the initialization of an internal property used in LocalWebservice.reload() when the
service object is created from an existing deployment using the LocalWebservice() constructor.
Edited error message for clarification.
Added a new method called get_access_token() to AksWebservice that will return an
AksServiceAccessToken object, which contains the access token, refresh-after timestamp,
expiry-on timestamp, and token type (see the sketch after this list).
Deprecated the existing get_token() method in AksWebservice as the new method returns all of the
information that method returns.
Modified output of az ml service get-access-token command. Renamed token to accessToken
and refreshBy to refreshAfter. Added expiryOn and tokenType properties.
Fixed get_active_runs
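A hedged sketch of the new get_access_token() call; the service name is an assumption, and the attribute names on the returned object are inferred from the description above:
from azureml.core import Workspace
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()
service = AksWebservice(ws, "my-aks-service")  # hypothetical deployed service
token = service.get_access_token()
# Attribute names assumed from the description: access token, refresh-after
# timestamp, expiry-on timestamp, and token type.
print(token.access_token, token.refresh_after, token.expiry_on, token.token_type)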
azureml-explain-model
updated shap to 0.33.0 and interpret-community to 0.4.*
azureml-interpret
updated shap to 0.33.0 and interpret-community to 0.4.*
azureml-train-automl-runtime
Added Matthews Correlation Coefficient as a classification metric, for both binary and
multiclass classification.
Deprecated the preprocess flag and replaced it with featurization; featurization is on by
default
2020-01-06
Azure Machine Learning SDK for Python v1.0.83
New features
Dataset: Added two options on_error and out_of_range_datetime for to_pandas_dataframe to fail when
data has error values instead of filling them with None (see the sketch after this list).
Workspace: Added the hbi_workspace flag for workspaces with sensitive data that enables further
encryption and disables advanced diagnostics on workspaces. We also added support for bringing
your own keys for the associated Cosmos DB instance, by specifying the cmk_keyvault and
resource_cmk_uri parameters when creating a workspace, which creates a Cosmos DB instance in
your subscription while provisioning your workspace. To learn more, see the Azure Cosmos DB section
of data encryption article.
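A hedged sketch of the new to_pandas_dataframe options; the dataset name and the 'fail' option values are assumptions:
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, "my-tabular-dataset")  # hypothetical registered dataset
# Fail fast on error values and out-of-range datetimes instead of filling them with None.
df = ds.to_pandas_dataframe(on_error="fail", out_of_range_datetime="fail")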
Bug fixes and improvements
azureml-automl-runtime
Fixed a regression that caused a TypeError to be raised when running AutoML on Python
versions below 3.5.4.
azureml-core
Fixed bug in datastore.upload_files where a relative path that didn't start with ./ could not
be used.
Added deprecation messages for all Image class code paths
Fixed Model Management URL construction for Azure China 21Vianet region.
Fixed issue where models using source_dir couldn't be packaged for Azure Functions.
Added an option to Environment.build_local() to push an image into AzureML workspace
container registry
Updated the SDK to use new token library on Azure synapse in a back compatible manner.
azureml-interpret
Fixed bug where None was returned when no explanations were available for download. Now
raises an exception, matching behavior elsewhere.
azureml-pipeline-steps
Disallowed passing DatasetConsumptionConfig s to Estimator 's inputs parameter when the
Estimator will be used in an EstimatorStep .
azureml-sdk
Added AutoML client to azureml-sdk package, enabling remote AutoML runs to be submitted
without installing the full AutoML package.
azureml-train-automl-client
Corrected alignment on console output for AutoML runs
Fixed a bug where incorrect version of pandas may be installed on remote amlcompute.
2019-12-23
Azure Machine Learning SDK for Python v1.0.81
Bug fixes and improvements
azureml-contrib-interpret
defer shap dependency to interpret-community from azureml-interpret
azureml-core
Compute target can now be specified as a parameter to the corresponding deployment config
objects. This is specifically the name of the compute target to deploy to, not the SDK object.
Added CreatedBy information to Model and Service objects; it may be accessed
through .created_by
Fixed ContainerImage.run(), which was not correctly setting up the Docker container's HTTP
port.
Make azureml-dataprep optional for az ml dataset register CLI command
Fixed a bug where TabularDataset.to_pandas_dataframe would incorrectly fall back to an
alternate reader and print out a warning.
azureml-explain-model
defer shap dependency to interpret-community from azureml-interpret
azureml-pipeline-core
Added a new pipeline step, NotebookRunnerStep, to run a local notebook as a step in a pipeline.
Removed deprecated get_all functions for PublishedPipelines, Schedules, and PipelineEndpoints
azureml-train-automl-client
Started deprecation of data_script as an input to AutoML.
2019-12-09
Azure Machine Learning SDK for Python v1.0.79
Bug fixes and improvements
azureml-automl-core
Removed featurizationConfig from being logged
Updated logging to log "auto"/"off"/"customized" only.
azureml-automl-runtime
Added support for pandas.Series and pandas.Categorical for detecting column data type.
Previously only numpy.ndarray was supported
Added related code changes to handle categorical dtype correctly.
The forecast function interface was improved: the y_pred parameter was made optional. The
docstrings were improved.
azureml-contrib-dataset
Fixed a bug where labeled datasets could not be mounted.
azureml-core
Bug fix for Environment.from_existing_conda_environment(name, conda_environment_name). Users
can create an instance of Environment that is an exact replica of the local environment
Changed time series-related Datasets methods to include_boundary=True by default.
azureml-train-automl-client
Fixed issue where validation results are not printed when show output is set to false.
2019-11-25
Azure Machine Learning SDK for Python v1.0.76
Breaking changes
Azureml-Train-AutoML upgrade issues
Upgrading to azureml-train-automl>=1.0.76 from azureml-train-automl<1.0.76 can cause
partial installations, causing some AutoML imports to fail. To resolve this, you can run the setup
script found at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/automl_setup.cmd. Or, if you are using pip directly,
you can run:
"pip install --upgrade azureml-train-automl"
"pip install --ignore-installed azureml-train-automl-client"
Or you can uninstall the old version before upgrading:
"pip uninstall azureml-train-automl"
"pip install azureml-train-automl"
Bug fixes and improvements
azureml-automl-runtime
AutoML will now take into account both true and false classes when calculating averaged scalar
metrics for binary classification tasks.
Moved Machine learning and training code in AzureML-AutoML-Core to a new package
AzureML-AutoML-Runtime.
azureml-contrib-dataset
When calling to_pandas_dataframe on a labeled dataset with the download option, you can now
specify whether to overwrite existing files or not.
When calling keep_columns or drop_columns that results in a time series, label, or image
column being dropped, the corresponding capabilities will be dropped for the dataset as well.
Fixed an issue with pytorch loader for the object detection task.
azureml-contrib-interpret
Removed explanation dashboard widget from azureml-contrib-interpret, changed package to
reference the new one in interpret_community
Updated version of interpret-community to 0.2.0
azureml-core
Improved performance of workspace.datasets.
Added the ability to register Azure SQL Database Datastore using username and password
authentication
Fix for loading RunConfigurations from relative paths.
When calling keep_columns or drop_columns that results in a time series column being
dropped, the corresponding capabilities will be dropped for the dataset as well.
azureml-interpret
updated version of interpret-community to 0.2.0
azureml-pipeline-steps
Documented supported values for runconfig_pipeline_params for Azure machine learning
pipeline steps.
azureml-pipeline-core
Added CLI option to download output in json format for Pipeline commands.
azureml-train-automl
Split AzureML-Train-AutoML into two packages, a client package AzureML-Train-AutoML-Client
and an ML training package AzureML-Train-AutoML-Runtime
azureml-train-automl-client
Added a thin client for submitting AutoML experiments without needing to install any machine
learning dependencies locally.
Fixed logging of automatically detected lags, rolling window sizes and maximal horizons in the
remote runs.
azureml-train-automl-runtime
Added a new AutoML package to isolate machine learning and runtime components from the
client.
azureml-contrib-train-rl
Added reinforcement learning support in SDK.
Added AmlWindowsCompute support in RL SDK.
2019-11-11
Azure Machine Learning SDK for Python v1.0.74
Preview features
azureml-contrib-dataset
After importing azureml-contrib-dataset, you can call Dataset.Labeled.from_json_lines instead
of ._Labeled to create a labeled dataset.
When calling to_pandas_dataframe on a labeled dataset with the download option, you can now
specify whether to overwrite existing files or not.
When calling keep_columns or drop_columns that results in a time series, label, or image
column being dropped, the corresponding capabilities will be dropped for the dataset as well.
Fixed issues with PyTorch loader when calling dataset.to_torchvision() .
Bug fixes and improvements
azure-cli-ml
Added Model Profiling to the preview CLI.
Fixes breaking change in Azure Storage causing AzureML CLI to fail.
Added Load Balancer Type to MLC for AKS types
azureml-automl-core
Fixed the issue with detection of maximal horizon on time series, having missing values and
multiple grains.
Fixed the issue with failures during generation of cross validation splits.
Improved handling of short grains in the forecasting data sets.
Fixed the issue with masking of some user information during logging.
Improved logging of the errors during forecasting runs.
Adding psutil as a conda dependency to the autogenerated yml deployment file.
azureml-contrib-mir
Fixes breaking change in Azure Storage causing AzureML CLI to fail.
azureml-core
Fixes a bug that caused models deployed on Azure Functions to produce 500s.
Fixed an issue where the amlignore file was not applied on snapshots.
Added a new API amlcompute.get_active_runs that returns a generator for running and queued
runs on a given amlcompute.
Added Load Balancer Type to MLC for AKS types.
Added append_prefix bool parameter to download_files in run.py and
download_artifacts_from_prefix in artifacts_client. This flag is used to selectively flatten the
origin filepath so only the file or folder name is added to the output_directory
Fix deserialization issue for run_config.yml with dataset usage.
When calling keep_columns or drop_columns that results in a time series column being
dropped, the corresponding capabilities will be dropped for the dataset as well.
azureml-interpret
Updated interpret-community version to 0.1.0.3
azureml-train-automl
Fixed an issue where automl_step might not print validation issues.
Fixed register_model to succeed even if the model's environment is missing dependencies
locally.
Fixed an issue where some remote runs were not docker enabled.
Add logging of the exception that is causing a local run to fail prematurely.
azureml-train-core
Consider resume_from runs in the calculation of automated hyperparameter tuning best child
runs.
azureml-pipeline-core
Fixed parameter handling in pipeline argument construction.
Added pipeline description and step type yaml parameter.
New yaml format for Pipeline step and added deprecation warning for old format.
2019-11-04
Web experience
The collaborative workspace landing page at https://ml.azure.com has been enhanced and rebranded as the
Azure Machine Learning studio.
From the studio, you can train, test, deploy, and manage Azure Machine Learning assets such as datasets,
pipelines, models, endpoints, and more.
Access the following web-based authoring tools from the studio:
Automated machine learning (preview): no-code experience for automating machine learning model
development
2019-10-31
Azure Machine Learning SDK for Python v1.0.72
New features
Added dataset monitors through the azureml-datadrift package, allowing for monitoring time
series datasets for data drift or other statistical changes over time. Alerts and events can be
triggered if drift is detected or other conditions on the data are met. See our documentation for
details.
Announcing two new editions (also referred to interchangeably as SKUs) in Azure Machine
Learning. With this release, you can now create either a Basic or Enterprise Azure Machine
Learning workspace. All existing workspaces will default to the Basic edition, and you can go
to the Azure portal or to the studio to upgrade the workspace anytime. You can create either a
Basic or Enterprise workspace from the Azure portal. Read our documentation to learn more. From
the SDK, the edition of your workspace can be determined using the "sku" property of your
workspace object.
We have also made enhancements to Azure Machine Learning Compute - you can now view
metrics for your clusters (like total nodes, running nodes, total core quota) in Azure Monitor,
besides viewing Diagnostic logs for debugging. In addition, you can also view currently running or
queued runs on your cluster and details such as the IPs of the various nodes on your cluster. You
can view these either in the portal or by using corresponding functions in the SDK or CLI.
Preview features
We are releasing preview support for disk encryption of your local SSD in Azure Machine
Learning Compute. Raise a technical support ticket to get your subscription allowlisted to use
this feature.
Public Preview of Azure Machine Learning Batch Inference. Azure Machine Learning Batch
Inference targets large inference jobs that are not time-sensitive. Batch Inference provides cost-
effective inference compute scaling, with unparalleled throughput for asynchronous
applications. It is optimized for high-throughput, fire-and-forget inference over large
collections of data.
azureml-contrib-dataset
Enabled functionalities for labeled dataset
import azureml.core
from azureml.core import Workspace, Datastore, Dataset
import azureml.contrib.dataset
from azureml.contrib.dataset import FileHandlingOption, LabeledDatasetTask
2019-10-14
Azure Machine Learning SDK for Python v1.0.69
Bug fixes and improvements
azureml-automl-core
Limiting model explanations to best run rather than computing explanations for every run.
Making this behavior change for local, remote and ADB.
Added support for on-demand model explanations for UI
Added psutil as a dependency of automl and included psutil as a conda dependency in
amlcompute.
Fixed the issue with heuristic lags and rolling window sizes on forecasting data sets, some
series of which can cause linear algebra errors
Added print out for the heuristically determined parameters in the forecasting runs.
azureml-contrib-datadrift
Added protection while creating output metrics if dataset level drift is not in the first section.
azureml-contrib-interpret
azureml-contrib-explain-model package has been renamed to azureml-contrib-interpret
azureml-core
Added API to unregister datasets. dataset.unregister_all_versions().
Added Dataset API to check data changed time. dataset.data_changed_time .
Being able to consume FileDataset and TabularDataset as inputs to PythonScriptStep,
EstimatorStep, and HyperDriveStep in the Azure Machine Learning Pipeline.
Performance of FileDataset.mount() has been improved for folders with a large number of files
Added URL to known error recommendations in run details.
Fixed a bug in run.get_metrics where requests would fail if a run had too many children
Added support for authentication on Arcadia cluster.
Creating an Experiment object gets or creates the experiment in the Azure Machine Learning
workspace for run history tracking. The experiment ID and archived time are populated in the
Experiment object on creation. Example:
experiment = Experiment(workspace, "New Experiment")
experiment_id = experiment.id
archive() and reactivate() are functions that can be called on an experiment to hide and
restore the experiment from being shown in the UX or returned by default in a call to list
experiments. If a new experiment is created with the same name as an archived experiment, you
can rename the archived experiment when reactivating by passing a new name. There can only be
one active experiment with a given name. Example:
experiment1 = Experiment(workspace, "Active Experiment")
experiment1.archive()
# Create a new active experiment with the same name as the archived one.
experiment2 = Experiment(workspace, "Active Experiment")
experiment1.reactivate(new_name="Previous Active Experiment")
The static method list() on Experiment can take a name filter and ViewType filter. ViewType
values are "ACTIVE_ONLY", "ARCHIVED_ONLY", and "ALL". Example:
archived_experiments = Experiment.list(workspace, view_type="ARCHIVED_ONLY")
all_first_experiments = Experiment.list(workspace, name="First Experiment", view_type="ALL")
Support using environment for model deployment, and service update
azureml-datadrift
The show attribute of the DataDriftDetector class won't support the optional argument
'with_details' anymore. The show attribute will only present the data drift coefficient and the
data drift contribution of feature columns.
DataDriftDetector attribute 'get_output' behavior changes (see the sketch after this list):
Input parameters start_time and end_time are optional instead of mandatory;
Passing a specific start_time and/or end_time together with a specific run_id in the same
invocation will result in a value error exception because they are mutually exclusive;
When a specific start_time and/or end_time is passed, only results of scheduled runs will be
returned;
Parameter 'daily_latest_only' is deprecated.
Support retrieving Dataset-based Data Drift outputs.
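A hedged sketch of querying drift results with the now-optional time window; the monitor name is an assumption, and the (results, metrics) unpacking follows the documented pattern:
from datetime import datetime, timedelta
from azureml.core import Workspace
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()
monitor = DataDriftDetector.get_by_name(ws, "my-drift-monitor")  # hypothetical monitor
# Pass either a time window or a run_id, not both; they are mutually exclusive.
results, metrics = monitor.get_output(start_time=datetime.utcnow() - timedelta(days=7))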
azureml-explain-model
Renames AzureML-explain-model package to AzureML-interpret, keeping the old package for
backwards compatibility for now
Fixed an automl bug with raw explanations set to classification task instead of regression by
default on download from ExplanationClient
Add support for ScoringExplainer to be created directly using MimicWrapper
azureml-pipeline-core
Improved performance for large Pipeline creation
azureml-train-core
Added TensorFlow 2.0 support in TensorFlow Estimator
2019-09-30
Azure Machine Learning SDK for Python v1.0.65
New features
Added curated environments. These environments have been pre-configured with libraries for
common machine learning tasks, and have been pre-built and cached as Docker images for faster
execution. They appear by default in the Workspace's list of environments, with the prefix
"AzureML".
azureml-train-automl
Added the ONNX conversion support for the ADB and HDI
Preview features
azureml-train-automl
Supported BERT and BiLSTM as text featurizer (preview only)
Supported featurization customization for column purpose and transformer parameters
(preview only)
Supported raw explanations when user enables model explanation during training (preview
only)
Added Prophet for timeseries forecasting as a trainable pipeline (preview only)
azureml-contrib-datadrift
Packages relocated from azureml-contrib-datadrift to azureml-datadrift; the contrib package
will be removed in a future release
Bug fixes and improvements
azureml-automl-core
Introduced FeaturizationConfig to AutoMLConfig and AutoMLBaseSettings
Override Column Purpose for Featurization with given column and feature type
Override transformer parameters
Added deprecation message for explain_model() and retrieve_model_explanations().
Added Prophet as a trainable pipeline (preview only).
Added support for automatic detection of target lags, rolling window size, and maximal
horizon. If one of target_lags, target_rolling_window_size, or max_horizon is set to 'auto',
heuristics will be applied to estimate the value of the corresponding parameter based on
training data (see the sketch after this list).
Fixed forecasting in the case when the data set contains one grain column, this grain is of a
numeric type, and there is a gap between train and test set.
Fixed the error message about the duplicated index in the remote run in forecasting tasks.
Added a guardrail to check whether a dataset is imbalanced or not. If it is, a guardrail message
would be written to the console.
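A hedged sketch of opting into the 'auto' heuristics; the toy data, column names, and surrounding settings are assumptions:
import pandas as pd
from azureml.train.automl import AutoMLConfig

# Hypothetical toy training data; in practice use your own frame and target.
X_train = pd.DataFrame({"date": pd.date_range("2019-01-01", periods=60, freq="D")})
y_train = pd.Series(range(60))
automl_config = AutoMLConfig(task="forecasting",
    X=X_train, y=y_train,
    time_column_name="date",
    target_lags="auto",                 # heuristic lag detection
    target_rolling_window_size="auto",  # heuristic window size
    max_horizon="auto")                 # heuristic horizon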
azureml-core
Added ability to retrieve SAS URL to model in storage through the model object. Ex:
model.get_sas_url()
Introduce run.get_details()['datasets'] to get datasets associated with the submitted run
Add API Dataset.Tabular.from_json_lines_files to create a TabularDataset from JSON Lines
files (see the sketch after this list). To learn about this tabular data in JSON Lines files on
TabularDataset, visit this article for documentation.
Added additional VM size fields (OS Disk, number of GPUs) to the supported_vmsizes()
function
Added additional fields to the list_nodes() function to show the run, the private and the public
IP, the port, etc.
Ability to specify a new field during cluster provisioning --remotelogin_port_public_access
which can be set to enabled or disabled depending on whether you would like to leave the SSH
port open or closed at the time of creating the cluster. If you do not specify it, the service will
smartly open or close the port depending on whether you are deploying the cluster inside a
VNet.
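A minimal sketch of creating a TabularDataset from JSON Lines files; the datastore path is an assumption:
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
ds = Dataset.Tabular.from_json_lines_files(path=(datastore, "logs/*.jsonl"))  # hypothetical path
df = ds.to_pandas_dataframe()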
azureml-explain-model
Improved documentation for Explanation outputs in the classification scenario.
Added the ability to upload the predicted y values on the explanation for the evaluation
examples. Unlocks more useful visualizations.
Added explainer property to MimicWrapper to enable getting the underlying MimicExplainer.
azureml-pipeline-core
Added notebook to describe Module, ModuleVersion, and ModuleStep
azureml-pipeline-steps
Added RScriptStep to support R script run via AML pipeline.
Fixed metadata parameters parsing in AzureBatchStep that was causing the error message
"assignment for parameter SubscriptionId is not specified."
azureml-train-automl
Supported training_data, validation_data, label_column_name, weight_column_name as data
input format
Added deprecation message for explain_model() and retrieve_model_explanations()
2019-09-16
Azure Machine Learning SDK for Python v1.0.62
New features
Introduced the timeseries trait on TabularDataset. This trait enables easy timestamp filtering on
the data in a TabularDataset, such as taking all data between a range of time or the most recent
data. See https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb for an
example notebook.
Enabled training with TabularDataset and FileDataset.
azureml-train-core
Added Nccl and Gloo support in PyTorch estimator
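A hedged sketch of distributed training with the NCCL backend in the PyTorch estimator; the project folder, script, cluster name, and node count are assumptions:
from azureml.core import Workspace
from azureml.train.dnn import PyTorch, Nccl

ws = Workspace.from_config()
estimator = PyTorch(source_directory="./src",  # hypothetical project folder
    entry_script="train.py",                   # hypothetical training script
    compute_target="gpu-cluster",              # hypothetical AmlCompute cluster
    node_count=2,
    distributed_training=Nccl(),               # Gloo() is the CPU-oriented alternative
    use_gpu=True)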
Bug fixes and improvements
azureml-automl-core
Deprecated the AutoML setting 'lag_length' and the LaggingTransformer.
Fixed validation of input data when specified in a Dataflow format
Modified the fit_pipeline.py to generate the graph json and upload to artifacts.
Rendered the graph under the user run using Cytoscape.
azureml-core
Revisited the exception handling in ADB code and made changes to conform to the new error handling
Added automatic MSI authentication for Notebook VMs.
Fixes bug where corrupt or empty models could be uploaded because of failed retries.
Fixed the bug where DataReference name changes when the DataReference mode changes
(for example, when calling as_upload , as_download , or as_mount ).
Make mount_point and target_path optional for FileDataset.mount and
FileDataset.download .
An exception that the timestamp column cannot be found will be thrown if a time series-related
API is called without a fine timestamp column assigned, or if the assigned timestamp columns
are dropped.
Time series columns should be assigned to a column whose type is Date; otherwise an exception
is expected.
The time series column-assigning API with_timestamp_columns can take a None value as the
fine/coarse timestamp column name, which will clear previously assigned timestamp columns
(see the sketch after this list).
An exception will be thrown when either the coarse-grained or fine-grained timestamp column is
dropped; dropping can be done after either excluding the timestamp column from the drop list
or calling with_timestamp_columns with a None value to release the timestamp columns.
An exception will be thrown when either the coarse-grained or fine-grained timestamp column is
not included in the keep-columns list; keeping can be done after either including the timestamp
column in the keep-columns list or calling with_timestamp_columns with a None value to
release the timestamp columns.
Added logging for the size of a registered model.
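A hedged sketch of assigning and clearing timestamp columns with with_timestamp_columns; the dataset and column names are assumptions, and the parameter name shown is the one used as of this release (it was renamed in a later release):
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, "my-timeseries")  # hypothetical registered dataset
# Assign the fine-grained timestamp column.
ts = ds.with_timestamp_columns(fine_grain_timestamp="date")  # hypothetical Date-typed column
# Passing None clears previously assigned timestamp columns.
ts = ts.with_timestamp_columns(fine_grain_timestamp=None)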
azureml-explain-model
Fixed warning printed to console when "packaging" Python package is not installed: "Using
older than supported version of lightgbm, please upgrade to version greater than 2.2.1"
Fixed download model explanation with sharding for global explanations with many features
Fixed mimic explainer missing initialization examples on output explanation
Fixed immutable error on set properties when uploading with explanation client using two
different types of models
Added a get_raw param to scoring explainer.explain() so one scoring explainer can return both
engineered and raw values.
azureml-train-automl
Introduced public APIs from AutoML for supporting explanations from the automl explain SDK:
a newer way of supporting AutoML explanations by decoupling AutoML featurization and the
explain SDK, and integrated raw explanation support from the azureml explain SDK for AutoML
models.
Removing azureml-defaults from remote training environments.
Changed default cache store location from FileCacheStore based one to AzureFileCacheStore
one for AutoML on Azure Databricks code path.
Fixed validation of input data when specified in a Dataflow format
azureml-train-core
Reverted source_directory_data_store deprecation.
Added ability to override azureml installed package versions.
Added dockerfile support in environment_definition parameter in estimators.
Simplified distributed training parameters in estimators.
2019-09-09
New web experience (preview) for Azure Machine Learning workspaces
The new web experience enables data scientists and data engineers to complete their end-to-end machine
learning lifecycle from prepping and visualizing data to training and deploying models in a single location.
Key features:
Using this new Azure Machine Learning interface, you can now:
Manage your notebooks or link out to Jupyter
Run automated ML experiments
Create datasets from local files, datastores, & web files
Explore & prepare datasets for model creation
Monitor data drift for your models
View recent resources from a dashboard
At the time of this release, the following browsers were supported: Chrome, Firefox, Safari, and Microsoft Edge
Preview.
Known issues:
1. Refresh your browser if you see "Something went wrong! Error loading chunk files" when deployment is
in progress.
2. Can't delete or rename files in Notebooks and Files. During Public Preview, you can use the Jupyter UI or
the terminal in the Notebook VM to perform file update operations. Because it is a mounted network file
system, all changes you make on the Notebook VM are immediately reflected in the Notebook Workspace.
3. To SSH into the Notebook VM:
a. Find the SSH keys that were created during VM setup. Or, find the keys in the Azure Machine Learning
workspace > open Compute tab > locate Notebook VM in the list > open its properties: copy the keys
from the dialog.
b. Import those public and private SSH keys to your local machine.
c. Use them to SSH into the Notebook VM.
2019-09-03
Azure Machine Learning SDK for Python v1.0.60
New features
Introduced FileDataset, which references single or multiple files in your datastores or public URLs. The
files can be of any format. FileDataset provides you with the ability to download or mount the files to
your compute (see the sketch after this list).
Added Pipeline Yaml Support for PythonScript Step, Adla Step, Databricks Step, DataTransferStep, and
AzureBatch Step
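A minimal sketch of working with the new FileDataset (SDK v1); the datastore name, file pattern, and the Dataset.File.from_files factory usage are assumptions:

from azureml.core import Workspace, Datastore, Dataset

workspace = Workspace.from_config()
datastore = Datastore.get(workspace, 'workspaceblobstore')

# Reference one or more files (any format) in a datastore or public URL.
file_dataset = Dataset.File.from_files(path=(datastore, 'images/**/*.jpg'))

# Download the referenced files to the compute, or mount them instead.
downloaded = file_dataset.download(target_path='./data', overwrite=True)
# mount_context = file_dataset.mount()  # alternative to downloading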
Bug fixes and improvements
azureml-automl-core
AutoArima is now a suggestable pipeline for preview only.
Improved error reporting for forecasting.
Improved logging in the forecasting tasks by using custom exceptions instead of generic ones.
Removed the check on max_concurrent_iterations to be less than total number of iterations.
AutoML models now return AutoMLExceptions
This release improves the execution performance of automated machine learning local runs.
azureml-core
Introduce Dataset.get_all(workspace), which returns a dictionary of TabularDataset and
FileDataset objects keyed by their registration name.
workspace = Workspace.from_config()
# Dictionary of TabularDataset and FileDataset objects keyed by registration name.
all_datasets = Dataset.get_all(workspace)
mydata = all_datasets['my-data']
2019-08-19
Azure Machine Learning SDK for Python v1.0.57
New features
Enabled TabularDataset to be consumed by AutomatedML. To learn more about TabularDataset ,
visit https://aka.ms/azureml/howto/createdatasets.
Bug fixes and improvements
azure-cli-ml
You can now update the TLS/SSL certificate for the scoring endpoint deployed on an AKS
cluster, both for Microsoft-generated and customer certificates.
azureml-automl-core
Fixed an issue in AutoML where rows with missing labels were not removed properly.
Improved error logging in AutoML; full error messages will now always be written to the log
file.
AutoML has updated its package pinning to include azureml-defaults , azureml-explain-model ,
and azureml-dataprep . AutoML will no longer warn on package mismatches (except for
azureml-train-automl package).
Fixed an issue in timeseries where CV splits of unequal size caused bin calculation to fail.
Fixed an inconsistency between model weights and the models fed into the voting ensemble,
which occurred when the ensemble iteration for the Cross-Validation training type had trouble
downloading the models trained on the entire dataset.
Fixed the error raised when training and/or validation labels (y and y_valid) are provided as a
pandas dataframe rather than a numpy array.
Fixed the issue with the forecasting tasks when None was encountered in the Boolean columns
of input tables.
Allow AutoML users to drop training series that are not long enough when forecasting.
Allow AutoML users to drop grains from the test set that do not exist in the training set when
forecasting.
azureml-core
Fixed issue with blob_cache_timeout parameter ordering.
Added external fit and transform exception types to system errors.
Added support for Key Vault secrets for remote runs. Added an azureml.core.keyvault.Keyvault
class to add, get, and list secrets from the key vault associated with your workspace (see the
sketch after the lists below). Supported operations are:
azureml.core.workspace.Workspace.get_default_keyvault()
azureml.core.keyvault.Keyvault.set_secret(name, value)
azureml.core.keyvault.Keyvault.set_secrets(secrets_dict)
azureml.core.keyvault.Keyvault.get_secret(name)
azureml.core.keyvault.Keyvault.get_secrets(secrets_list)
azureml.core.keyvault.Keyvault.list_secrets()
Additional methods to obtain default keyvault and get secrets during remote run:
azureml.core.workspace.Workspace.get_default_keyvault()
azureml.core.run.Run.get_secret(name)
azureml.core.run.Run.get_secrets(secrets_list)
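A hedged sketch of the Key Vault workflow these methods enable; the secret name and value are hypothetical:

from azureml.core import Workspace, Run

# From your local environment: store a secret in the workspace key vault.
workspace = Workspace.from_config()
keyvault = workspace.get_default_keyvault()
keyvault.set_secret(name='db-password', value='<secret-value>')  # hypothetical

# From inside a remote run: retrieve the secret without hardcoding it.
run = Run.get_context()
db_password = run.get_secret(name='db-password')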
Added additional override parameters to submit-hyperdrive CLI command.
Improved reliability of API calls by expanding retries to common requests library exceptions.
Add support for submitting runs from a submitted run.
Fixed expiring SAS token issue in FileWatcher, which caused files to stop being uploaded after
their initial token had expired.
Supported importing HTTP csv/tsv files in dataset Python SDK.
Deprecated the Workspace.setup() method. Warning message shown to users suggests using
create() or get()/from_config() instead.
Added Environment.add_private_pip_wheel(), which enables uploading private custom Python
package wheels (.whl) to the workspace and securely using them to build/materialize the
environment (see the sketch after this list).
You can now update the TLS/SSL certificate for the scoring endpoint deployed on an AKS
cluster, both for Microsoft-generated and customer certificates.
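A minimal sketch of Environment.add_private_pip_wheel(), assuming a locally built wheel at a hypothetical path:

from azureml.core import Workspace, Environment
from azureml.core.conda_dependencies import CondaDependencies

workspace = Workspace.from_config()

# Upload the private wheel to workspace storage; returns a pip-installable URL.
whl_url = Environment.add_private_pip_wheel(
    workspace=workspace,
    file_path='./dist/mypackage-0.1.0-py3-none-any.whl',  # hypothetical wheel
    exist_ok=True)

env = Environment(name='private-wheel-env')
env.python.conda_dependencies = CondaDependencies.create(pip_packages=[whl_url])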
azureml-explain-model
Added parameter to add a model ID to explanations on upload.
Added is_raw tagging to explanations in memory and upload.
Added pytorch support and tests for azureml-explain-model package.
azureml-opendatasets
Support detecting and logging auto test environment.
Added classes to get US population by county and zip.
azureml-pipeline-core
Added label property to input and output port definitions.
azureml-telemetry
Fixed an incorrect telemetry configuration.
azureml-train-automl
Fixed the bug where, on setup failure, the error was not logged in the "errors" field for the setup
run and hence was not stored in the parent run's "errors" field.
Fixed an issue in AutoML where rows with missing labels were not removed properly.
Allow AutoML users to drop training series that are not long enough when forecasting.
Allow AutoML users to drop grains from the test set that do not exist in the training set when
forecasting.
Now AutoMLStep passes through automl config to backend to avoid any issues on changes or
additions of new config parameters.
AutoML Data Guardrail is now in public preview. Users will see a Data Guardrail report (for
classification/regression tasks) after training and can also access it through the SDK API.
azureml-train-core
Added torch 1.2 support in PyTorch Estimator.
azureml-widgets
Improved confusion matrix charts for classification training.
Azure Machine Learning Data Prep SDK v1.1.12
New features
Lists of strings can now be passed in as input to read_* methods.
Bug fixes and improvements
The performance of read_parquet has been improved when running in Spark.
Fixed an issue where column_type_builder failed in case of a single column with ambiguous date
formats.
Azure portal
Preview Feature
Log and output file streaming is now available for run details pages. The files will stream updates in
real time when the preview toggle is turned on.
Ability to set quota at a workspace level is released in preview. AmlCompute quotas are allocated at
the subscription level, but we now allow you to distribute that quota between workspaces and allocate
it for fair sharing and governance. Just click on the Usages+Quotas blade in the left navigation bar
of your workspace and select the Configure Quotas tab. You must be a subscription admin to be
able to set quotas at the workspace level since this is a cross-workspace operation.
2019-08-05
Azure Machine Learning SDK for Python v1.0.55
New features
Token-based authentication is now supported for the calls made to the scoring endpoint deployed on
AKS. We will continue to support the current key-based authentication; users can use one of these
authentication mechanisms at a time (see the sketch after this list).
Ability to register a blob storage that is behind the virtual network (VNet) as a datastore.
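A hedged sketch of calling a token-authenticated AKS scoring endpoint, assuming an already-deployed AksWebservice object named service and a JSON payload shape that matches your scoring script:

import requests

token, refresh_by = service.get_token()  # token must be refreshed after refresh_by
headers = {'Authorization': 'Bearer ' + token,
           'Content-Type': 'application/json'}
response = requests.post(service.scoring_uri,
                         data='{"data": [[1.0, 2.0, 3.0]]}', headers=headers)
print(response.json())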
Bug fixes and improvements
azureml-automl-core
Fixed a bug where a small validation size for CV splits resulted in bad predicted vs. true
charts for regression and forecasting.
Improved the logging of forecasting tasks on remote runs; users are now provided with a
comprehensive error message if the run fails.
Fixed failures of time series tasks when the preprocess flag is True.
Made some forecasting data validation error messages more actionable.
Reduced memory consumption of AutoML runs by dropping and/or lazy loading of datasets,
especially between process spawns.
azureml-contrib-explain-model
Added a model_task flag to explainers to allow users to override the default automatic
inference logic for model type.
Widget changes: automatically installs with contrib, no more nbextension install/enable;
supports explanation with global feature importance (for example, Permutative).
Dashboard changes: box plots and violin plots in addition to the beeswarm plot on the summary
page; much faster rerendering of the beeswarm plot on 'Top-k' slider change; a helpful message
explaining how top-k is computed; useful customizable messages in place of charts when data is
not provided.
azureml-core
Added Model.package() method to create Docker images and Dockerfiles that encapsulate
models and their dependencies (see the sketch after this list).
Updated local webservices to accept InferenceConfigs containing Environment objects.
Fixed Model.register() producing invalid models when '.' (for the current directory) is passed as
the model_path parameter.
Add Run.submit_child, the functionality mirrors Experiment.submit while specifying the run as
the parent of the submitted child run.
Support configuration options from Model.register in Run.register_model.
Ability to run JAR jobs on existing cluster.
Now supporting instance_pool_id and cluster_log_dbfs_path parameters.
Added support for using an Environment object when deploying a Model to a Webservice. The
Environment object can now be provided as a part of the InferenceConfig object.
Added AppInsights mapping for new regions: centralus, westus, northcentralus.
Added documentation for all the attributes in all the Datastore classes.
Added blob_cache_timeout parameter to Datastore.register_azure_blob_container .
Added save_to_directory and load_from_directory methods to
azureml.core.environment.Environment.
Added the "az ml environment download" and "az ml environment register" commands to the
CLI.
Added Environment.add_private_pip_wheel method.
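A hedged sketch combining two azureml-core items above: deploying with an Environment object through InferenceConfig, and packaging a model as a Docker image with Model.package(). The model name, entry script, and environment name are assumptions:

from azureml.core import Workspace, Environment, Model
from azureml.core.model import InferenceConfig

workspace = Workspace.from_config()
model = Model(workspace, name='my-model')                         # hypothetical model
environment = Environment.get(workspace, name='AzureML-Minimal')  # assumed environment
inference_config = InferenceConfig(entry_script='score.py',
                                   environment=environment)

# Build a Docker image (and Dockerfile) encapsulating the model and its
# dependencies, without deploying a web service.
package = Model.package(workspace, [model], inference_config)
package.wait_for_creation(show_output=True)
print(package.location)  # image address in the workspace container registry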
azureml-explain-model
Added dataset tracking to Explanations using the Dataset service (preview).
Decreased default batch size when streaming global explanations from 10k to 100.
Added model_task flag to explainers to allow user to override default automatic inference logic
for model type.
azureml-mlflow
Fixed bug in mlflow.azureml.build_image where nested directories are ignored.
azureml-pipeline-steps
Added ability to run JAR jobs on existing Azure Databricks cluster.
Added support instance_pool_id and cluster_log_dbfs_path parameters for DatabricksStep step.
Added support for pipeline parameters in DatabricksStep step.
azureml-train-automl
Added docstrings for the Ensemble related files.
Updated docs to more appropriate language for max_cores_per_iteration and
max_concurrent_iterations
Improved the logging of forecasting tasks on remote runs; users are now provided with a
comprehensive error message if the run fails.
Removed get_data from the pipeline automlstep notebook.
Started supporting dataprep in automlstep.
Azure Machine Learning Data Prep SDK v1.1.10
New features
You can now request to execute specific inspectors (for example, histogram, scatter plot, etc.) on
specific columns.
Added a parallelize argument to append_columns . If True, data will be loaded into memory but
execution will run in parallel; if False, execution will be streaming but single-threaded.
2019-07-23
Azure Machine Learning SDK for Python v1.0.53
New features
Automated Machine Learning now supports training ONNX models on the remote compute target
Azure Machine Learning now provides the ability to resume training from a previous run, checkpoint, or
model files.
Learn how to use estimators to resume training from a previous run
Bug fixes and improvements
azure-cli-ml
CLI commands "model deploy" and "service update" now accept parameters, config files, or a
combination of the two. Parameters have precedence over attributes in files.
Model description can now be updated after registration
azureml-automl-core
Update NimbusML dependency to 1.2.0 version (current latest).
Adding support for NimbusML estimators & pipelines to be used within AutoML estimators.
Fixing a bug in the Ensemble selection procedure that was unnecessarily growing the resulting
ensemble even if the scores remained constant.
Enable reuse of some featurizations across CV Splits for forecasting tasks. This speeds up the
run-time of the setup run by roughly a factor of n_cross_validations for expensive
featurizations like lags and rolling windows.
Addressing an issue if time is out of pandas supported time range. We now raise a
DataException if time is less than pd.Timestamp.min or greater than pd.Timestamp.max
Forecasting now allows different frequencies in train and test sets if they can be aligned. For
example, "quarterly starting in January" and "quarterly starting in October" can be aligned.
The property "parameters" was added to the TimeSeriesTransformer.
Remove old exception classes.
In forecasting tasks, the target_lags parameter now accepts a single integer value or a list of
integers. If the integer was provided, only one lag will be created. If a list is provided, the unique
values of lags will be taken. target_lags=[1, 2, 2, 4] will create lags of one, two and four periods.
Fixed the bug about losing column types after the transformation (bug linked).
In model.forecast(X, y_query) , allow y_query to be an object type containing None(s) at the
beginning (#459519).
Add expected values to automl output
azureml-contrib-datadrift
Improvements to example notebook including switch to azureml-opendatasets instead of
azureml-contrib-opendatasets and performance improvements when enriching data
azureml-contrib-explain-model
Fixed transformations argument for LIME explainer for raw feature importance in azureml-
contrib-explain-model package
Added segmentations to image explanations in image explainer for the AzureML-contrib-
explain-model package
Add scipy sparse support for LimeExplainer
Added batch_size to mimic explainer when include_local=False , for streaming global
explanations in batches to improve execution time of DecisionTreeExplainableModel
azureml-contrib-featureengineering
Fix for calling set_featurizer_timeseries_params(): dict value type change and null check.
Added a notebook for the timeseries featurizer.
Update NimbusML dependency to 1.2.0 version (current latest).
azureml-core
Added the ability to attach DBFS datastores in the AzureML CLI
Fixed the bug with datastore upload where an empty folder is created if target_path started
with /
Fixed deepcopy issue in ServicePrincipalAuthentication.
Added the "az ml environment show" and "az ml environment list" commands to the CLI.
Environments now support specifying a base_dockerfile as an alternative to an already-built
base_image.
The unused RunConfiguration setting auto_prepare_environment has been marked as
deprecated.
Model description can now be updated after registration
Bugfix: Model and Image delete now provides more information about retrieving upstream
objects that depend on them if delete fails due to an upstream dependency.
Fixed bug that printed blank duration for deployments that occur when creating a workspace
for some environments.
Improved failure exceptions for workspace creation so that users don't see "Unable to create
workspace. Unable to find..." as the message and instead see the actual creation failure.
Add support for token authentication in AKS webservices.
Add get_token() method to Webservice objects.
Added CLI support to manage machine learning datasets.
Datastore.register_azure_blob_container now optionally takes a blob_cache_timeout value (in
seconds), which configures blobfuse's mount parameters to enable cache expiration for this
datastore. The default is no timeout: when a blob is read, it stays in the local cache until the job
is finished. Most jobs will prefer this setting, but some jobs need to read more data from a large
dataset than will fit on their nodes; for these jobs, tuning this parameter will help them succeed.
Take care when tuning this parameter: setting the value too low can result in poor performance,
as the data used in an epoch may expire before being used again. All reads would then be done
from blob storage/network rather than the local cache, which negatively impacts training times.
(See the sketch after this list.)
Model description can now properly be updated after registration.
Model and Image deletion now provides more information about the upstream objects that
depend on them and cause the delete to fail.
Improve resource utilization of remote runs using azureml.mlflow.
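A minimal sketch of registering a blob container with a blobfuse cache timeout; the datastore and storage account details are placeholders:

from azureml.core import Workspace, Datastore

workspace = Workspace.from_config()
datastore = Datastore.register_azure_blob_container(
    workspace=workspace,
    datastore_name='large_dataset_store',     # hypothetical datastore name
    container_name='training-data',
    account_name='<storage-account-name>',
    account_key='<storage-account-key>',
    blob_cache_timeout=1800)  # evict cached blobs after 30 minutes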
azureml-explain-model
Fixed transformations argument for LIME explainer for raw feature importance in azureml-
contrib-explain-model package
Added scipy sparse support for LimeExplainer.
Added a SHAP linear explainer wrapper, as well as another level to the tabular explainer for
explaining linear models.
For the mimic explainer in the explain model library, fixed an error when include_local=False
for sparse data input.
Added expected values to AutoML output.
Fixed permutation feature importance when a transformations argument is supplied to get raw
feature importance.
Added batch_size to the mimic explainer when include_local=False , for streaming global
explanations in batches to improve execution time of DecisionTreeExplainableModel.
For the model explainability library, fixed blackbox explainers where a pandas dataframe input
is required for prediction.
Fixed a bug where explanation.expected_values would sometimes return a float rather than a
list with a float in it.
azureml-mlflow
Improve performance of mlflow.set_experiment(experiment_name)
Fix bug in use of InteractiveLoginAuthentication for mlflow tracking_uri
Improve resource utilization of remote runs using azureml.mlflow.
Improve the documentation of the azureml-mlflow package
Patch bug where mlflow.log_artifacts("my_dir") would save artifacts under
my_dir/<artifact-paths> instead of <artifact-paths>
azureml-opendatasets
Pinned pyarrow of opendatasets to old versions (<0.14.0) because of a memory issue newly
introduced there.
Move azureml-contrib-opendatasets to azureml-opendatasets.
Allow open dataset classes to be registered to Azure Machine Learning workspace and leverage
AML Dataset capabilities seamlessly.
Improve NoaaIsdWeather enrich performance in non-SPARK version significantly.
azureml-pipeline-steps
DBFS Datastore is now supported for Inputs and Outputs in DatabricksStep.
Updated documentation for Azure Batch Step with regard to inputs/outputs.
In AzureBatchStep, changed delete_batch_job_after_finish default value to true.
azureml-telemetry
Move azureml-contrib-opendatasets to azureml-opendatasets.
Allow open dataset classes to be registered to Azure Machine Learning workspace and leverage
AML Dataset capabilities seamlessly.
Improve NoaaIsdWeather enrich performance in non-SPARK version significantly.
azureml-train-automl
Updated documentation on get_output to reflect the actual return type and provide additional
notes on retrieving key properties.
Update NimbusML dependency to 1.2.0 version (current latest).
add expected values to automl output
azureml-train-core
Strings are now accepted as compute target for Automated Hyperparameter Tuning
The unused RunConfiguration setting auto_prepare_environment has been marked as
deprecated.
Azure Machine Learning Data Prep SDK v1.1.9
New features
Added support for reading a file directly from an http or https url.
Bug fixes and improvements
Improved error message when attempting to read a Parquet Dataset from a remote source (which is
not currently supported).
Fixed a bug when writing to Parquet file format in ADLS Gen 2, and updating the ADLS Gen 2
container name in the path.
2019-07-09
Visual Interface
Preview features
Added "Execute R script" module in visual interface.
Azure Machine Learning SDK for Python v1.0.48
New features
azureml-opendatasets
azureml-contrib-opendatasets is now available as azureml-opendatasets . The old
package can still work, but we recommend using azureml-opendatasets moving
forward for richer capabilities and improvements.
This new package allows you to register open datasets as Dataset in Azure Machine Learning
workspace, and leverage the functionality that Dataset offers.
It also includes existing capabilities such as consuming open datasets as Pandas/SPARK
dataframes, and location joins for some dataset like weather.
Preview features
HyperDriveConfig can now accept pipeline object as a parameter to support hyperparameter tuning
using a pipeline.
Bug fixes and improvements
azureml-train-automl
Fixed the bug about losing columns types after the transformation.
Fixed the bug to allow y_query to be an object type containing None(s) at the beginning.
Fixed the issue in the Ensemble selection procedure that was unnecessarily growing the
resulting ensemble even if the scores remained constant.
Fixed the issue with the allowlist_models and blocklist_models settings in AutoMLStep.
Fixed the issue that prevented the usage of preprocessing when AutoML would have been used
in the context of Azure ML Pipelines.
azureml-opendatasets
Moved azureml-contrib-opendatasets to azureml-opendatasets.
Allowed open dataset classes to be registered to Azure Machine Learning workspace and
leverage AML Dataset capabilities seamlessly.
Improved NoaaIsdWeather enrich performance in non-SPARK version significantly.
azureml-explain-model
Updated online documentation for interpretability objects.
Added batch_size to mimic explainer when include_local=False , for streaming global
explanations in batches to improve execution time of DecisionTreeExplainableModel for model
explainability library.
Fixed the issue where explanation.expected_values would sometimes return a float rather than
a list with a float in it.
Added expected values to automl output for mimic explainer in explain model library.
Fixed permutation feature importance when transformations argument supplied to get raw
feature importance.
azureml-core
Added the ability to attach DBFS datastores in the AzureML CLI.
Fixed the issue with datastore upload where an empty folder is created if target_path started
with / .
Enabled comparison of two datasets.
Model and Image delete now provides more information about retrieving upstream objects
that depend on them if delete fails due to an upstream dependency.
Deprecated the unused RunConfiguration setting auto_prepare_environment.
azureml-mlflow
Improved resource utilization of remote runs that use azureml.mlflow.
Improved the documentation of the azureml-mlflow package.
Fixed the issue where mlflow.log_artifacts("my_dir") would save artifacts under
"my_dir/artifact-paths" instead of "artifact-paths".
azureml-pipeline-core
The hash_paths parameter for all pipeline steps is deprecated and will be removed in the
future. By default, the contents of the source_directory are hashed (except files listed in
.amlignore or .gitignore ).
Continued improving Module and ModuleStep to support compute type-specific modules, to
prepare for RunConfiguration integration and other changes to unlock compute type-specific
module usage in pipelines.
azureml-pipeline-steps
AzureBatchStep: Improved documentation with regard to inputs/outputs.
AzureBatchStep: Changed delete_batch_job_after_finish default value to true.
azureml-train-core
Strings are now accepted as compute target for Automated Hyperparameter Tuning.
Deprecated the unused RunConfiguration setting auto_prepare_environment.
Deprecated parameters conda_dependencies_file_path and pip_requirements_file_path in favor
of conda_dependencies_file and pip_requirements_file respectively.
azureml-opendatasets
Improve NoaaIsdWeather enrich performance in non-SPARK version significantly.
2019-04-26
Azure Machine Learning SDK for Python v1.0.33 released.
Azure ML Hardware Accelerated Models on FPGAs is generally available.
You can now use the azureml-accel-models package to:
Train the weights of a supported deep neural network (ResNet 50, ResNet 152, DenseNet-121,
VGG-16, and SSD-VGG)
Use transfer learning with the supported DNN
Register the model with Model Management Service and containerize the model
Deploy the model to an Azure VM with an FPGA in an Azure Kubernetes Service (AKS) cluster
Deploy the container to an Azure Data Box Edge server device
Score your data with the gRPC endpoint with this sample
Automated Machine Learning
Feature sweeping to enable dynamically adding featurizers for performance optimization. New
featurizers: word embeddings, weight of evidence, target encodings, text target encoding, cluster distance
Smart CV to handle train/valid splits inside automated ML
A few memory optimization changes and runtime performance improvements
Performance improvement in model explanation
ONNX model conversion for local run
Added Subsampling support
Intelligent Stopping when no exit criteria defined
Stacked ensembles
Time Series Forecasting
New predict forecast function
You can now use rolling-origin cross validation on time series data
New functionality added to configure time series lags
New functionality added to support rolling window aggregate features
New Holiday detection and featurizer when country code is defined in experiment settings
Azure Databricks
Enabled time series forecasting and model explainability/interpretability capability
You can now cancel and resume (continue) automated ML experiments
Added support for multicore processing
MLOps
Local deployment & debugging for scoring containers
You can now deploy an ML model locally and iterate quickly on your scoring file and dependencies to
ensure they behave as expected.
Introduced InferenceConfig & Model.deploy()
Model deployment now supports specifying a source folder with an entry script, the same as a
RunConfig. Additionally, model deployment has been simplified to a single command.
Git reference tracking
Customers have been requesting basic Git integration capabilities for some time as it helps maintain a
complete audit trail. We have implemented tracking across major entities in Azure ML for Git-related
metadata (repo, commit, clean state). This information will be collected automatically by the SDK and CLI.
Model profiling & validation service
Customers frequently complain about the difficulty of properly sizing the compute associated with their
inference service. With our model profiling service, the customer can provide sample inputs and we will
profile across 16 different CPU / memory configurations to determine optimal sizing for deployment.
Bring your own base image for inference
Another common complaint was the difficulty of moving from experimentation to inference with respect
to sharing dependencies. With our new base image sharing capability, you can now reuse your
experimentation base images, dependencies and all, for inference. This should speed up deployments and
reduce the gap from the inner to the outer loop.
Improved Swagger schema generation experience
Our previous swagger generation method was error prone and impossible to automate. We have a new
in-line way of generating swagger schemas from any Python function via decorators. We have
open-sourced this code, and our schema generation protocol is not coupled to the Azure ML platform.
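A hedged sketch of the decorator-based schema generation, using the open-sourced inference-schema package; the sample input and output shapes are assumptions:

import numpy as np
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

@input_schema('data', NumpyParameterType(np.array([[0.1, 0.2, 0.3]])))
@output_schema(NumpyParameterType(np.array([0.5])))
def run(data):
    # Real scoring logic would invoke the model here; the decorators alone
    # are what drive swagger schema generation for the endpoint.
    return data.sum(axis=1)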
Azure ML CLI is generally available (GA)
Models can now be deployed with a single CLI command. We got common customer feedback that no
one deploys an ML model from a Jupyter notebook. The CLI reference documentation has been
updated.
2019-04-22
Azure Machine Learning SDK for Python v1.0.30 released.
The PipelineEndpoint was introduced to add a new version of a published pipeline while maintaining the
same endpoint.
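A hedged sketch of PipelineEndpoint usage (SDK v1), assuming already-constructed Pipeline objects named pipeline and new_pipeline:

from azureml.core import Workspace
from azureml.pipeline.core import PipelineEndpoint

workspace = Workspace.from_config()

# Publish a pipeline behind a fixed, named endpoint.
endpoint = PipelineEndpoint.publish(workspace=workspace,
                                    name='my-pipeline-endpoint',
                                    pipeline=pipeline,
                                    description='Versioned pipeline endpoint')

# Later: add a new pipeline version and make it the endpoint's default.
endpoint = PipelineEndpoint.get(workspace=workspace, name='my-pipeline-endpoint')
endpoint.add_default(pipeline=new_pipeline)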
2019-04-15
Azure portal
You can now resubmit an existing Script run on an existing remote compute cluster.
You can now run a published pipeline with new parameters on the Pipelines tab.
Run details now supports a new Snapshot file viewer. You can view a snapshot of the directory when you
submitted a specific run. You can also download the notebook that was submitted to start the run.
You can now cancel parent runs from the Azure portal.
2019-04-08
Azure Machine Learning SDK for Python v1.0.23
New features
The Azure Machine Learning SDK now supports Python 3.7.
Azure Machine Learning DNN Estimators now provide built-in multi-version support. For example,
the TensorFlow estimator now accepts a framework_version parameter, and users can specify version
'1.10' or '1.12'. For a list of the versions supported by your current SDK release, call
get_supported_versions() on the desired framework class (for example,
TensorFlow.get_supported_versions() ). For a list of the versions supported by the latest SDK release,
see the DNN Estimator documentation.
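A minimal sketch of the multi-version support (SDK v1); the script folder and compute target name are assumptions:

from azureml.train.dnn import TensorFlow

# Versions supported by the installed SDK release, for example ['1.10', '1.12'].
print(TensorFlow.get_supported_versions())

estimator = TensorFlow(source_directory='./scripts',
                       entry_script='train.py',
                       compute_target='cpu-cluster',  # hypothetical compute name
                       framework_version='1.12')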
2019-03-25
Azure Machine Learning SDK for Python v1.0.21
New features
The azureml.core.Run.create_children method allows low-latency creation of multiple child-runs with a
single call.
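A minimal sketch of create_children, assuming the code executes inside a submitted parent run (SDK v1):

from azureml.core import Run

parent_run = Run.get_context()
children = parent_run.create_children(count=5)  # five child runs, one call
for index, child in enumerate(children):
    child.log('index', index)
    child.complete()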
2019-03-11
Azure Machine Learning SDK for Python v1.0.18
Changes
The azureml-tensorboard package replaces azureml-contrib-tensorboard.
With this release, you can set up a user account on your managed compute cluster (AmlCompute)
while creating it, by passing these properties in the provisioning configuration. You can find more
details in the SDK reference documentation.
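A hedged sketch of creating the cluster with a user account; the parameter names follow the SDK v1 AmlCompute.provisioning_configuration reference, and all values are placeholders:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

workspace = Workspace.from_config()
config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_DS3_V2',
    max_nodes=4,
    admin_username='clusteruser',             # user account created with cluster
    admin_user_password='<strong-password>')
cluster = ComputeTarget.create(workspace, 'cpu-cluster', config)
cluster.wait_for_completion(show_output=True)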
Azure Machine Learning Data Prep SDK v1.0.17
New features
Now supports adding two numeric columns to generate a resultant column using the expression
language.
Bug fixes and improvements
Improved the documentation and parameter checking for random_split.
2019-02-27
Azure Machine Learning Data Prep SDK v1.0.16
Bug fix
Fixed a Service Principal authentication issue that was caused by an API change.
2019-02-25
Azure Machine Learning SDK for Python v1.0.17
New features
Azure Machine Learning now provides first-class support for the popular DNN framework Chainer. Using
the Chainer class, users can easily train and deploy Chainer models.
Learn how to run distributed training with ChainerMN
Learn how to run hyperparameter tuning with Chainer using HyperDrive
Azure Machine Learning Pipelines added ability to trigger a Pipeline run based on datastore
modifications. The pipeline schedule notebook is updated to showcase this feature.
Bug fixes and improvements
We have added support in Azure Machine Learning pipelines for setting the
source_directory_data_store property to a desired datastore (such as a blob storage) on
RunConfigurations that are supplied to the PythonScriptStep. By default Steps use Azure File store as
the backing datastore, which may run into throttling issues when a large number of steps are executed
concurrently.
Azure portal
New features
New drag and drop table editor experience for reports. Users can drag a column from the well to the
table area where a preview of the table will be displayed. The columns can be rearranged.
New Logs file viewer
Links to experiment runs, compute, models, images, and deployments from the activities tab
Next steps
Read the overview for Azure Machine Learning.
Azure Machine Learning CLI (v2) release notes
5/25/2022 • 7 minutes to read • Edit Online
2022-05-24
Azure Machine Learning CLI (v2) v2.4.0
The Azure Machine Learning CLI (v2) is now GA.
az ml job
The command group is marked as GA.
Added AutoML job type in public preview.
Added schedules property to pipeline job in public preview.
Added an option to list only archived jobs.
Improved reliability of az ml job download command.
az ml data
The command group is marked as GA.
Added MLTable data type in public preview.
Added an option to list only archived data assets.
az ml environment
Added an option to list only archived environments.
az ml model
The command group is marked as GA.
Allow models to be created from job outputs.
Added an option to list only archived models.
az ml online-deployment
The command group is marked as GA.
Removed timeout waiting for deployment creation.
Improved online deployment list view.
az ml online-endpoint
The command group is marked as GA.
Added mirror_traffic property to online endpoints in public preview.
Improved online endpoint list view.
az ml batch-deployment
The command group is marked as GA.
Added support for uri_file and uri_folder as invocation input.
Fixed a bug in batch deployment update.
Fixed a bug in batch deployment list-jobs output.
az ml batch-endpoint
The command group is marked as GA.
Added support for uri_file and uri_folder as invocation input.
Fixed a bug in batch endpoint update.
Fixed a bug in batch endpoint list-jobs output.
az ml component
The command group is marked as GA.
Added an option to list only archived components.
az ml code
This command group is removed.
2022-03-14
Azure Machine Learning CLI (v2) v2.2.1
az ml job
For all job types, flattened the code section of the YAML schema. Instead of code.local_path to
specify the path to the source code directory, it is now just code
For all job types, changed the schema for defining data inputs to the job in the job YAML. Instead of
specifying the data path using either the file or folder fields, use the path field to specify either a
local path, a URI to a cloud path containing the data, or a reference to an existing registered Azure ML
data asset via path: azureml:<data_name>:<data_version> . Also specify the type field to clarify
whether the data source is a single file ( uri_file ) or a folder ( uri_folder ). If type field is omitted, it
defaults to type: uri_folder . For more information, see the section of any of the job YAML references
that discuss the schema for specifying input data.
In the sweep job YAML schema, changed the sampling_algorithm field from a string to an object in
order to support additional configurations for the random sampling algorithm type
Removed the component job YAML schema. With this release, if you want to run a command job
inside a pipeline that uses a component, just specify the component in the component field of the
command job YAML definition.
For all job types, added support for referencing the latest version of a nested asset in the job YAML
configuration. When referencing a registered environment or data asset to use as input in a job, you
can alias by latest version rather than having to explicitly specify the version. For example:
environment: azureml:AzureML-Minimal@latest
For pipeline jobs, introduced the ${{ parent }} context for binding inputs and outputs between steps
in a pipeline. For more information, see Expression syntax for binding inputs and outputs between
steps in a pipeline job.
Added support for downloading named outputs of job via the --output-name argument for the
az ml job download command
az ml data
Deprecated the az ml dataset subgroup; use az ml data instead
There are two types of data that can now be created, either from a single file source ( type: uri_file )
or a folder ( type: uri_folder ). When creating the data asset, you can either specify the data source
from a local file / folder or from a URI to a cloud path location. See the data YAML schema for the full
schema
az ml environment
In the environment YAML schema, renamed the build.local_path field to build.path
Removed the build.context_uri field, the URI of the uploaded build context location will be accessible
via build.path when the environment is returned
az ml model
In the model YAML schema, model_uri and local_path fields removed and consolidated to one path
field that can take either a local path or a cloud path URI. model_format field renamed to type ; the
default type is custom_model , but you can specify one of the other types ( mlflow_model , triton_model )
to use the model in no-code deployment scenarios
For az ml model create , --model-uri and --local-path arguments removed and consolidated to one
--path argument that can take either a local path or a cloud path URI
Added the az ml model download command to download a model's artifact files
az ml online-deployment
In the online deployment YAML schema, flattened the code section of the code_configuration field.
Instead of code_configuration.code.local_path to specify the path to the source code directory
containing the scoring files, it is now just code_configuration.code
Added an environment_variables field to the online deployment YAML schema to support configuring
environment variables for an online deployment
az ml batch-deployment
In the batch deployment YAML schema, flattened the code section of the code_configuration field.
Instead of code_configuration.code.local_path to specify the path to the source code directory
containing the scoring files, it is now just code_configuration.code
az ml component
Flattened the code section of the command component YAML schema. Instead of code.local_path to
specify the path to the source code directory, it is now just code
Added support for referencing the latest version of a registered environment to use in the component
YAML configuration. When referencing a registered environment, you can alias by latest version rather
than having to explicitly specify the version. For example:
environment: azureml:AzureML-Minimal@latest
Renamed the component input and output type value from path to uri_folder for the type field
when defining a component input or output
Removed the delete commands for assets (model, component, data, environment). The existing delete
functionality is only a soft delete, so the delete commands will be reintroduced in a later release once hard
delete is supported
Added support for archiving and restoring assets (model, component, data, environment) and jobs, e.g.
az ml model archive and az ml model restore . You can now archive assets and jobs, which will hide the
archived entity from list queries (e.g. az ml model list ).
2021-10-04
Azure Machine Learning CLI (v2) v2.0.2
az ml workspace
Updated workspace YAML schema
az ml compute
Updated YAML schemas for AmlCompute and Compute Instance
Removed support for legacy AKS attach via az ml compute attach . Azure Arc-enabled Kubernetes
attach will be supported in the next release
az ml datastore
Updated YAML schemas for Azure blob, Azure file, Azure Data Lake Gen1, and Azure Data Lake Gen2
datastores
Added support for creating Azure Data Lake Storage Gen1 and Gen2 datastores
az ml job
Updated YAML schemas for command job and sweep job
Added support for running pipeline jobs (pipeline job YAML schema)
Added support for job input literals and input data URIs for all job types
Added support for job outputs for all job types
Changed the expression syntax from { <expression> } to ${{ <expression> }} . For more
information, see Expression syntax for configuring Azure ML jobs
az ml environment
Updated environment YAML schema
Added support for creating environments from Docker build context
az ml model
Updated model YAML schema
Added new model_format property to Model for no-code deployment scenarios
az ml dataset
Renamed az ml data subgroup to az ml dataset
Updated dataset YAML schema
az ml component
Added the az ml component commands for managing Azure ML components
Added support for command components (command component YAML schema)
az ml online-endpoint
az ml endpoint subgroup split into two separate groups: az ml online-endpoint and
az ml batch-endpoint
Updated online endpoint YAML schema
Added support for local endpoints for dev/test scenarios
Added interactive VSCode debugging support for local endpoints (added the --vscode-debug flag to
az ml online-endpoint create/update )
az ml online-deployment
az ml deployment subgroup split into two separate groups: az ml online-deployment and
az ml batch-deployment
Updated managed online deployment YAML schema
Added autoscaling support via integration with Azure Monitor Autoscale
Added support for updating multiple online deployment properties in the same update operation
Added support for performing concurrent operations on deployments under the same endpoint
az ml batch-endpoint
az ml endpoint subgroup split into two separate groups: az ml online-endpoint and
az ml batch-endpoint
Updated batch endpoint YAML schema
Removed traffic property; replaced with a configurable default deployment property
Added support for input data URIs for az ml batch-endpoint invoke
Added support for VNet ingress (private link)
az ml batch-deployment
az ml deployment subgroup split into two separate groups: az ml online-deployment and
az ml batch-deployment
Updated batch deployment YAML schema
2021-05-25
Announcing the CLI (v2) (preview) for Azure Machine Learning
The ml extension to the Azure CLI is the next-generation interface for Azure Machine Learning. It enables you to
train and deploy models from the command line, with features that accelerate scaling data science up and out
while tracking the model lifecycle. Install and get started.
Service limits in Azure Machine Learning
5/25/2022 • 2 minutes to read • Edit Online
This section lists basic quotas and throttling thresholds in Azure Machine Learning.
To learn how to increase resource quotas, see "Manage and increase quotas for resources"
IMPORTANT
Azure Machine Learning doesn't store or process your data outside of the region where you deploy.
Workspaces
LIMIT    VALUE
Runs
LIMIT    VALUE
Number of properties 50
Number of tags 50
Metrics
LIMIT    VALUE
NOTE
If you are hitting the limit of metric names per run because you are formatting variables into the metric name,
consider instead using a row metric, where one column is the variable value and the second column is the
metric value (see the sketch below).
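A minimal sketch of the row-metric pattern using SDK v1's Run.log_row; the metric name and values are made up:

from azureml.core import Run

run = Run.get_context()
for epoch, accuracy in enumerate([0.71, 0.78, 0.83]):
    # One metric name; the variable becomes a column instead of part of the name.
    run.log_row('accuracy_by_epoch', epoch=epoch, accuracy=accuracy)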
Artifacts
LIMIT    VALUE
Limit increases
Some limits can be increased for individual workspaces by contacting support.
Next steps
Configure your Azure Machine Learning environment
Learn how to increase resource quotas in "Manage and increase quotas for resources".
Azure Machine Learning feature availability across
cloud regions
5/25/2022 • 8 minutes to read • Edit Online
Learn what Azure Machine Learning features are available in the Azure Government, Azure Germany, and Azure
China 21Vianet regions.
In the list of global Azure regions, there are several regions that serve specific markets in addition to the public
cloud regions. For example, the Azure Government and the Azure China 21Vianet regions. Azure Machine
Learning is deployed into the following regions, in addition to public cloud regions:
Azure Government regions US-Arizona and US-Virginia .
Azure China 21Vianet region China-East-2 .
Azure Machine Learning is still in development for air-gapped regions.
The information in the rest of this document provides information on what features of Azure Machine Learning
are available in these regions, along with region-specific information on using these features.
Azure Government
The availability tables for Azure Government compare each feature's public cloud status with its availability in
the US-Virginia and US-Arizona regions. Feature categories covered: automated machine learning, Machine
Learning pipelines, integrated notebooks, compute instance, SDK support, security, compute, machine learning
lifecycle, Responsible ML, training, inference, and other, plus general Machine Learning service usage scenarios
and their limitations.
Azure Databricks integration, including integration with ML pipelines, is GA in the public cloud but not
available in US-Virginia or US-Arizona.
FPGA-based Hardware Accelerated Models are GA in the public cloud but not available in US-Virginia or
US-Arizona.
Azure China 21Vianet
The availability tables for Azure China 21Vianet compare each feature's public cloud status with its availability
in the CH-East-2 and CH-North-3 regions. Feature categories covered: labeling, Responsible AI, training,
inference, and other.
Next steps
To learn more about the regions that Azure Machine learning is available in, see Products by region.
What happened to Azure Machine Learning
Workbench?
5/25/2022 • 4 minutes to read • Edit Online
The Azure Machine Learning Workbench application and some other early features were deprecated and
replaced in the September 2018 release to make way for an improved architecture.
To improve your experience, the release contains many significant updates prompted by customer feedback. The
core functionality from experiment runs to model deployment hasn't changed. But now, you can use the robust
Python SDK, and the Azure CLI to accomplish your machine learning tasks and pipelines.
Most of the artifacts that were created in the earlier version of Azure Machine Learning are stored in your own
local or cloud storage. These artifacts won't ever disappear.
In this article, you learn about what changed and how it affects your pre-existing work with the Azure Machine
Learning Workbench and its APIs.
WARNING
This article is not for Azure Machine Learning Studio users. It is for Azure Machine Learning customers who have installed
the Workbench (preview) application and/or have experimentation and model management preview accounts.
What changed?
The latest release of Azure Machine Learning includes the following features:
A simplified Azure resources model.
A new portal UI to manage your experiments and compute targets.
A new, more comprehensive Python SDK.
The new expanded Azure CLI extension for machine learning.
The architecture was redesigned for ease of use. Instead of multiple Azure resources and accounts, you only
need an Azure Machine Learning Workspace. You can create workspaces quickly in the Azure portal. By using a
workspace, multiple users can store training and deployment compute targets, model experiments, Docker
images, deployed models, and so on.
Although there are new improved CLI and SDK clients in the current release, the desktop workbench application
itself has been retired. Experiments can be managed in the workspace dashboard in Azure Machine Learning
studio. Use the dashboard to get your experiment history, manage the compute targets attached to your
workspace, manage your models and Docker images, and even deploy web services.
Support timeline
On January 9th, 2019 support for Machine Learning Workbench, Azure Machine Learning Experimentation and
Model Management accounts, and their associated SDK and CLI ended.
All the latest capabilities are available by using this SDK, the CLI, and the portal.
Start training your models and tracking the run histories using the new CLI and SDK. You can learn how with the
Tutorial: train models with Azure Machine Learning.
# SDK v1: wrap script settings in a ScriptRunConfig (azureml.core) and submit.
config = ScriptRunConfig(source_directory=script_folder, script='train.py',
                         run_config=run_config_system_managed)
run = exp.submit(config)
Next steps
Learn about the latest architecture for Azure Machine Learning.
For an overview of the service, read What is Azure Machine Learning?.
Start with Quickstart: Get started with Azure Machine Learning. Then use these resources to create your first
experiment with your preferred method:
Run a "Hello world!" Python script (part 1 of 3)
Use a Jupyter notebook to train image classification models
Use automated machine learning
Use the designer's drag & drop capabilities
Use the ML extension to the CLI
Use a keyboard to use Azure Machine Learning
designer
5/25/2022 • 2 minutes to read • Edit Online
Learn how to use a keyboard and screen reader to use Azure Machine Learning designer. For a list of keyboard
shortcuts that work everywhere in the Azure portal, see Keyboard shortcuts in the Azure portal
This workflow has been tested with Narrator and JAWS, but it should work with other standard screen readers.
Navigation shortcuts
KEYSTROKE    DESCRIPTION
Ctrl + G Move focus to first failed node if the pipeline run failed
Action shortcuts
Use the following shortcuts with the access key. For more information on access keys, see
https://en.wikipedia.org/wiki/Access_key.
KEYSTROKE    ACTION
Next steps
Turn on high contrast or change theme
Accessibility related tools at Microsoft