
Azure OpenAI Service Documentation


Learn how to use Azure OpenAI's powerful language models including the GPT-3, Codex and
Embeddings model series for content generation, summarization, semantic search, and natural
language to code translation.

OVERVIEW: What is Azure OpenAI Service?
QUICKSTART: Quickstarts
HOW-TO GUIDE: Create a resource
TUTORIAL: Embeddings
HOW-TO GUIDE: Completions
TRAINING: Intro to Azure OpenAI training
CONCEPT: Azure OpenAI Models
REFERENCE: Support and help options

Additional resources

Azure OpenAI: Azure OpenAI Studio, Region support, Quotas and limits, Apply for access to Azure OpenAI
Video: Combining OpenAI models with the power of Azure
Reference: REST API, Terms of use, PowerShell
Tools: Azure CLI
What is Azure OpenAI Service?
Article • 06/28/2023

Azure OpenAI Service provides REST API access to OpenAI's powerful language models
including the GPT-3, Codex and Embeddings model series. In addition, the new GPT-4
and ChatGPT (gpt-35-turbo) model series have now reached general availability. These
models can be easily adapted to your specific task including but not limited to content
generation, summarization, semantic search, and natural language to code translation.
Users can access the service through REST APIs, Python SDK, or our web-based interface
in the Azure OpenAI Studio.

Features overview

Models available: NEW GPT-4 series, GPT-3 base series, NEW ChatGPT (gpt-35-turbo), Codex series, Embeddings series. Learn more in our Models page.

Fine-tuning: Ada, Babbage, Curie, Cushman, Davinci. Fine-tuning is currently unavailable to new customers.

Price: Available here.

Virtual network support & private link support: Yes, unless using Azure OpenAI on your data.

Managed Identity: Yes, via Azure Active Directory.

UI experience: Azure portal for account & resource management, Azure OpenAI Service Studio for model exploration and fine tuning.

Model regional availability: Model availability

Content filtering: Prompts and completions are evaluated against our content policy with automated systems. High severity content will be filtered.

Responsible AI
At Microsoft, we're committed to the advancement of AI driven by principles that put
people first. Generative models such as the ones available in Azure OpenAI have
significant potential benefits, but without careful design and thoughtful mitigations,
such models have the potential to generate incorrect or even harmful content. Microsoft
has made significant investments to help guard against abuse and unintended harm,
which includes requiring applicants to show well-defined use cases, incorporating
Microsoft’s principles for responsible AI use , building content filters to support
customers, and providing responsible AI implementation guidance to onboarded
customers.

How do I get access to Azure OpenAI?

Access is currently limited as we navigate high demand, upcoming product improvements, and Microsoft's commitment to responsible AI. For now, we're working with customers with an existing partnership with Microsoft, lower risk use cases, and those committed to incorporating mitigations.

More specific information is included in the application form. We appreciate your patience as we work to responsibly enable broader access to Azure OpenAI.

Apply here for access:

Apply now

Comparing Azure OpenAI and OpenAI


Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-4, GPT-
3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure
OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth
transition from one to the other.

With Azure OpenAI, customers get the security capabilities of Microsoft Azure while
running the same models as OpenAI. Azure OpenAI offers private networking, regional
availability, and responsible AI content filtering.

Key concepts
Prompts & completions
The completions endpoint is the core component of the API service. This API provides
access to the model's text-in, text-out interface. Users simply need to provide an input
prompt containing the English text command, and the model will generate a text
completion.

Here's an example of a simple prompt and completion:

Prompt:
""" count to 5 in a for loop """

Completion:
for i in range(1, 6): print(i)
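
The same call can be made from code. Here's a minimal sketch using the openai Python package (0.x) configured for Azure; the endpoint, API key, and deployment name are placeholders rather than values from this document.

Python

# A minimal sketch of a completions call with the openai Python package (0.x)
# configured for Azure. Endpoint, key, and deployment name are placeholders.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-API-KEY"

response = openai.Completion.create(
    engine="YOUR-DEPLOYMENT-NAME",  # your deployment name, not the model name
    prompt='""" count to 5 in a for loop """',
    max_tokens=50,
    temperature=0,
)
print(response["choices"][0]["text"])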

Tokens
Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or
just chunks of characters. For example, the word “hamburger” gets broken up into the
tokens “ham”, “bur” and “ger”, while a short and common word like “pear” is a single
token. Many tokens start with a whitespace, for example “ hello” and “ bye”.

The total number of tokens processed in a given request depends on the length of your
input, output and request parameters. The quantity of tokens being processed will also
affect your response latency and throughput for the models.
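
If you want to estimate token counts before sending a request, one common approach is the open-source tiktoken library. This is a sketch under the assumption that tiktoken is installed separately; it is not part of the Azure OpenAI service.

Python

# A minimal sketch of counting tokens locally with the open-source tiktoken
# library (installed separately, e.g. pip install tiktoken).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-35-turbo and gpt-4
tokens = encoding.encode("hamburger")
print(len(tokens), tokens)  # number of tokens and their integer IDs
# Older GPT-3 models (ada, babbage, curie, davinci) use earlier encodings such as r50k_base.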

Resources
Azure OpenAI is a new product offering on Azure. You can get started with Azure
OpenAI the same way as any other Azure product where you create a resource, or
instance of the service, in your Azure Subscription. You can read more about Azure's
resource management design.

Deployments
Once you create an Azure OpenAI Resource, you must deploy a model before you can
start making API calls and generating text. This action can be done using the
Deployment APIs. These APIs allow you to specify the model you wish to use.
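
As an illustration, the sketch below creates a deployment through the management REST API with the requests package. It mirrors the curl example shown later in this document; the subscription, resource group, resource, and deployment names are placeholders, and the model name/version are only examples.

Python

# A minimal sketch of creating a model deployment via the management API.
# All identifiers below are placeholders.
import requests

subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "YOUR-RESOURCE-GROUP"
account_name = "YOUR-AZURE-OPENAI-RESOURCE"
deployment_name = "my-text-davinci-003"

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    f"/deployments/{deployment_name}"
)
body = {
    "sku": {"name": "Standard", "capacity": 1},
    "properties": {"model": {"format": "OpenAI", "name": "text-davinci-003", "version": "1"}},
}
response = requests.put(
    url,
    params={"api-version": "2023-05-01"},
    headers={"Authorization": "Bearer YOUR_AUTH_TOKEN"},
    json=body,
)
print(response.status_code, response.json())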

Prompt engineering
GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based
models, the user interacts with the model by entering a text prompt, to which the model
responds with a text completion. This completion is the model’s continuation of the
input text.

While these models are extremely powerful, their behavior is also very sensitive to the
prompt. This makes prompt engineering an important skill to develop.

Prompt construction can be difficult. In practice, the prompt acts to configure the model
weights to complete the desired task, but it's more of an art than a science, often
requiring experience and intuition to craft a successful prompt.

Models
The service provides users access to several different models. Each model provides a
different capability and price point.

GPT-4 models are the latest available models. Due to high demand, access to this model
series is currently only available by request. To request access, existing Azure OpenAI
customers can apply by filling out this form.

The GPT-3 base models are known as Davinci, Curie, Babbage, and Ada in decreasing
order of capability and increasing order of speed.

The Codex series of models is a descendant of GPT-3 and has been trained on both
natural language and code to power natural language to code use cases. Learn more
about each model on our models concept page.

The DALL-E models, currently in preview, generate images from text prompts that the
user provides.

Next steps
Learn more about the underlying models that power Azure OpenAI.
Azure OpenAI Service quotas and limits
Article • 06/21/2023

This article contains a quick reference and a detailed description of the quotas and limits
for Azure OpenAI in Azure Cognitive Services.

Quotas and limits reference


The following sections provide you with a quick guide to the default quotas and limits
that apply to Azure OpenAI:

OpenAI resources per region per Azure subscription: 30
Default quota per model and region (in tokens-per-minute)¹: Text-Davinci-003: 120 K; GPT-4: 20 K; GPT-4-32K: 60 K; all others: 240 K
Default DALL-E quota limits: 2 concurrent requests
Maximum prompt tokens per request: Varies per model. For more information, see Azure OpenAI Service models
Max fine-tuned model deployments: 2
Total number of training jobs per resource: 100
Max simultaneous running training jobs per resource: 1
Max training jobs queued: 20
Max files per resource: 30
Total size of all files per resource: 1 GB
Max training job time (job will fail if exceeded): 720 hours
Max training job size (tokens in training file) x (# of epochs): 2 billion

¹ Default quota limits are subject to change.


General best practices to remain within rate limits
To minimize issues related to rate limits, it's a good idea to use the following techniques:

Implement retry logic in your application (see the sketch after this list).
Avoid sharp changes in the workload. Increase the workload gradually.
Test different load increase patterns.
Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
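
The sketch below shows one way to implement retry logic with exponential backoff when a request is rate limited (HTTP 429). It assumes the openai Python package (0.x) configured for Azure; the endpoint, key, and deployment name are placeholders.

Python

# A minimal sketch of retrying a rate-limited request with exponential backoff.
# Endpoint, key, and deployment name are placeholders.
import time
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-API-KEY"

def complete_with_retry(prompt, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.Completion.create(
                engine="YOUR-DEPLOYMENT-NAME",
                prompt=prompt,
                max_tokens=100,
            )
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)   # wait before trying again
            delay *= 2          # back off exponentially between attempts

print(complete_with_retry("count to 5 in a for loop")["choices"][0]["text"])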

How to request increases to the default quotas and limits


Quota increase requests can be submitted from the Quotas page of Azure OpenAI
Studio. Please note that due to overwhelming demand, we are not currently approving
new quota increase requests. Your request will be queued until it can be filled at a later
time.

For other rate limits, please submit a service request.

Next steps
Explore how to manage quota for your Azure OpenAI deployments.
Learn more about
the underlying models that power Azure OpenAI.
Azure OpenAI Service models
Article • 06/30/2023

Azure OpenAI provides access to many different models, grouped by family and
capability. A model family typically associates models by their intended task. The
following table describes model families currently available in Azure OpenAI. Not all
models are available in all regions currently. Refer to the model capability table in this
article for a full breakdown.

GPT-4: A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code.

GPT-3: A series of models that can understand and generate natural language. This includes the new ChatGPT model.

DALL-E (Preview): A series of models in preview that can generate original images from natural language.

Codex: A series of models that can understand and generate code, including translating natural language to code.

Embeddings: A set of models that can understand and use embeddings. An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Currently, we offer three families of Embeddings models for different functionalities: similarity, text search, and code search.

Model capabilities
Each model family has a series of models that are further distinguished by capability.
These capabilities are typically identified by names, and the alphabetical order of these
names generally signifies the relative capability and cost of that model within a given
model family. For example, GPT-3 models use names such as Ada, Babbage, Curie, and
Davinci to indicate relative capability and cost. Davinci is more capable and more
expensive than Curie, which in turn is more capable and more expensive than Babbage,
and so on.

Note
Any task that can be performed by a less capable model like Ada can be performed
by a more capable model like Curie or Davinci.

Naming convention
Azure OpenAI model names typically correspond to the following standard naming
convention:

{capability}-{family}[-{input-type}]-{identifier}

{capability}: The model capability of the model. For example, GPT-3 models use text, while Codex models use code.

{family}: The relative family of the model. For example, GPT-3 models include ada, babbage, curie, and davinci.

{input-type}: (Embeddings models only) The input type of the embedding supported by the model. For example, text search embedding models support doc and query.

{identifier}: The version identifier of the model.

For example, our most powerful GPT-3 model is called text-davinci-003 , while our
most powerful Codex model is called code-davinci-002 .

The older versions of GPT-3 models named ada , babbage , curie , and davinci that
don't follow the standard naming convention are primarily intended for fine tuning.
For more information, see Learn how to customize a model for your application.

Finding what models are available


You can get a list of models that are available for both inference and fine-tuning by your
Azure OpenAI resource by using the Models List API.
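
As an illustration, the sketch below calls the Models List API with the requests package; the endpoint, key, and API version are placeholders, so treat the exact path and response shape as assumptions to verify against the REST reference.

Python

# A minimal sketch of listing the models available to a resource.
# Endpoint and key are placeholders.
import requests

resource_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com"
response = requests.get(
    f"{resource_endpoint}/openai/models",
    params={"api-version": "2023-05-15"},
    headers={"api-key": "YOUR-API-KEY"},
)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])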

Model updates
Azure OpenAI now supports automatic updates for select model deployments. On
models where automatic update support is available, a model version drop-down will be
visible in Azure OpenAI Studio under Create new deployment and Edit deployment:

Auto update to default


When Auto-update to default is selected, your model deployment will be automatically
updated within two weeks of a new version being released.

If you are still in the early testing phases for completion and chat completion based
models, we recommend deploying models with auto-update to default set whenever it
is available. For embeddings models, while we recommend using the latest model
version, you should choose when you want to upgrade, since embeddings generated
with an earlier model version will not be interchangeable with the new version.

Specific model version


As your use of Azure OpenAI evolves and you start to build and integrate with
applications, you will likely want to manually control model updates so that you can first
test and validate that model performance remains consistent for your use case prior
to upgrading.
When you select a specific model version for a deployment, this version remains
selected until you either choose to manually update it yourself or you reach the
expiration date for the model. When the deprecation/expiration date is reached, the
model will auto-upgrade to the latest available version.

GPT-35-Turbo 0301 and GPT-4 0314 expiration


The original gpt-35-turbo (0301) and both gpt-4 (0314) models will expire no earlier
than October 15th, 2023. Upon expiration, deployments will automatically be upgraded
to the default version. If you would like your deployment to stop accepting completion
requests rather than upgrading, you will be able to set the model upgrade option
to expire through the API. We will publish guidelines on this by September 1.

Viewing deprecation dates


For currently deployed models, from Azure OpenAI Studio, select Deployments.

To view deprecation/expiration dates for all available models in a given region, from
Azure OpenAI Studio, select Models > Column options and then select Deprecation fine tune
and Deprecation inference.


Update & deploy models via the API
HTTP

PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01

Path parameters

accountName (string, required): The name of your Azure OpenAI Resource.

deploymentName (string, required): The deployment name you chose when you deployed an existing model, or the name you would like a new model deployment to have.

resourceGroupName (string, required): The name of the associated resource group for this model deployment.

subscriptionId (string, required): Subscription ID for the associated subscription.

api-version (string, required): The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions

2023-05-01 Swagger spec

Request body

This is only a subset of the available request body parameters. For the full list of the
parameters you can refer to the REST API spec .

versionUpgradeOption (string): Deployment model version upgrade option. One of: OnceNewDefaultVersionAvailable, OnceCurrentVersionExpired, NoAutoUpgrade.

capacity (integer): The amount of quota you are assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM).
Example request

Bash

curl -X PUT "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/text-embedding-ada-002-test-1?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{"sku": {"name": "Standard", "capacity": 1}, "properties": {"model": {"format": "OpenAI", "name": "text-embedding-ada-002", "version": "2"}, "versionUpgradeOption": "OnceCurrentVersionExpired"}}'

Note

There are multiple ways to generate an authorization token. The easiest method for
initial testing is to launch the Cloud Shell from https://portal.azure.com . Then
run az account get-access-token. You can use this token as your temporary
authorization token for API testing.

Example response

JSON

"id": "/subscriptions/{subscription-id}/resourceGroups/resource-group-
temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-
001/deployments/text-embedding-ada-002-test-1",

"type": "Microsoft.CognitiveServices/accounts/deployments",

"name": "text-embedding-ada-002-test-1",

"sku": {

"name": "Standard",
"capacity": 1

},

"properties": {

"model": {

"format": "OpenAI",

"name": "text-embedding-ada-002",

"version": "2"

},

"versionUpgradeOption": "OnceCurrentVersionExpired",
"capabilities": {

"embeddings": "true",

"embeddingsMaxInputs": "1"

},

"provisioningState": "Succeeded",

"ratelimits": [

"key": "request",

"renewalPeriod": 10,

"count": 2

},

"key": "token",

"renewalPeriod": 60,

"count": 1000

},

"systemData": {

"createdBy": "docs@contoso.com",

"createdByType": "User",

"createdAt": "2023-06-13T00:12:38.885937Z",

"lastModifiedBy": "docs@contoso.com",

"lastModifiedByType": "User",

"lastModifiedAt": "2023-06-13T02:41:04.8410965Z"

},

"etag": "\"{GUID}\""

Finding the right model


We recommend starting with the most capable model in a model family to confirm
whether the model capabilities meet your requirements. Then you can stay with that
model or move to a model with lower capability and cost, optimizing around that
model's capabilities.

GPT-4 models
GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous
models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional
completions tasks.

Due to high demand, access to this model series is currently only available by request. To
request access, existing Azure OpenAI customers can apply by filling out this form.

gpt-4

gpt-4-32k

The gpt-4 model supports a maximum of 8,192 request tokens and gpt-4-32k supports up to
32,768 tokens.
GPT-3 models
The GPT-3 models can understand and generate natural language. The service offers
four model capabilities, each with different levels of power and speed suitable for
different tasks. Davinci is the most capable model, while Ada is the fastest. In the order
of greater to lesser capability, the models are:

text-davinci-003
text-curie-001

text-babbage-001

text-ada-001

While Davinci is the most capable, the other models provide significant speed
advantages. We recommend that users start with Davinci while experimenting, because it
produces the best results and lets you validate the value that Azure OpenAI can
provide. Once you have a prototype working, you can then optimize your model choice
for the best latency/performance balance for your application.

Davinci
Davinci is the most capable model and can perform any task the other models can
perform, often with less instruction. For applications requiring deep understanding of
the content, like summarization for a specific audience and creative content generation,
Davinci produces the best results. The increased capabilities provided by Davinci require
more compute resources, so Davinci costs more and isn't as fast as other models.

Another area where Davinci excels is in understanding the intent of text. Davinci is
excellent at solving many kinds of logic problems and explaining the motives of
characters. Davinci has been able to solve some of the most challenging AI problems
involving cause and effect.

Use for: Complex intent, cause and effect, summarization for audience

Curie
Curie is powerful, yet fast. While Davinci is stronger when it comes to analyzing
complicated text, Curie is capable for many nuanced tasks like sentiment classification
and summarization. Curie is also good at answering questions, performing Q&A, and
serving as a general service chatbot.

Use for: Language translation, complex classification, text sentiment, summarization


Babbage
Babbage can perform straightforward tasks like simple classification. It’s also capable
when it comes to semantic search, ranking how well documents match up with search
queries.

Use for: Moderate classification, semantic search classification

Ada
Ada is usually the fastest model and can perform tasks like parsing text, address
correction and certain kinds of classification tasks that don’t require too much nuance.
Ada’s performance can often be improved by providing more context.

Use for: Parsing text, simple classification, address correction, keywords

ChatGPT (gpt-35-turbo)
The ChatGPT model (gpt-35-turbo) is a language model designed for conversational
interfaces and the model behaves differently than previous GPT-3 models. Previous
models were text-in and text-out, meaning they accepted a prompt string and returned
a completion to append to the prompt. However, the ChatGPT model is conversation-in
and message-out. The model expects a prompt string formatted in a specific chat-like
transcript format, and returns a completion that represents a model-written message in
the chat.

To learn more about the ChatGPT model and how to interact with the Chat API check
out our in-depth how-to.
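
To make the chat-style interaction concrete, here's a minimal sketch using the Chat Completion API with the openai Python package (0.x) configured for Azure; the endpoint, key, and deployment name are placeholders.

Python

# A minimal sketch of a Chat Completion call against a gpt-35-turbo deployment.
# Endpoint, key, and deployment name are placeholders.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-API-KEY"

response = openai.ChatCompletion.create(
    engine="YOUR-GPT-35-TURBO-DEPLOYMENT",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 5 in a for loop."},
    ],
)
print(response["choices"][0]["message"]["content"])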

DALL-E models (Preview)


The DALL-E models, currently in preview, generate images from text prompts that the
user provides.

Codex models
The Codex models are descendants of our base GPT-3 models that can understand and
generate code. Their training data contains both natural language and billions of lines of
public code from GitHub.
They’re most capable in Python and proficient in over a dozen languages, including C#,
JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and Shell. In the order of greater
to lesser capability, the Codex models are:

code-davinci-002
code-cushman-001

Davinci
Similar to GPT-3, Davinci is the most capable Codex model and can perform any task the
other models can perform, often with less instruction. For applications requiring deep
understanding of the content, Davinci produces the best results. Greater capabilities
require more compute resources, so Davinci costs more and isn't as fast as other
models.

Cushman
Cushman is powerful, yet fast. While Davinci is stronger when it comes to analyzing
complicated tasks, Cushman is a capable model for many code generation tasks.
Cushman typically runs faster and cheaper than Davinci, as well.

Embeddings models

Important

We strongly recommend using text-embedding-ada-002 (Version 2). This model/version
provides parity with OpenAI's text-embedding-ada-002. To learn more about the
improvements offered by this model, please refer to OpenAI's blog post. Even if you are
currently using Version 1, you should migrate to Version 2 to take advantage of the
latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so
document embedding and document search must be done using the same version of the model.

Currently, we offer three families of Embeddings models for different functionalities:

Similarity
Text search
Code search
Each family includes models across a range of capability. The following list indicates the
length of the numerical vector returned by the service, based on model capability:

Base Model | Model(s) | Dimensions
Ada | models ending in -001 (Version 1) | 1024
Ada | text-embedding-ada-002 (Version 2) | 1536
Babbage | | 2048
Curie | | 4096
Davinci | | 12288

Davinci is the most capable, but is slower and more expensive than the other models.
Ada is the least capable, but is both faster and cheaper.

Similarity embedding
These models are good at capturing semantic similarity between two or more pieces of
text.

Use cases: Clustering, regression, anomaly detection, visualization
Models: text-similarity-ada-001, text-similarity-babbage-001, text-similarity-curie-001, text-similarity-davinci-001

Text search embedding


These models help measure whether long documents are relevant to a short search
query. There are two input types supported by this family: doc , for embedding the
documents to be retrieved, and query , for embedding the search query.

Use cases: Search, context relevance, information retrieval
Models: text-search-ada-doc-001, text-search-ada-query-001, text-search-babbage-doc-001, text-search-babbage-query-001, text-search-curie-doc-001, text-search-curie-query-001, text-search-davinci-doc-001, text-search-davinci-query-001

Code search embedding


Similar to text search embedding models, there are two input types supported by this
family: code , for embedding code snippets to be retrieved, and text , for embedding
natural language search queries.

Use cases: Code search and relevance
Models: code-search-ada-code-001, code-search-ada-text-001, code-search-babbage-code-001, code-search-babbage-text-001
When using our embeddings models, keep in mind their limitations and risks.
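
To show how the embedding vectors are typically used, here's a minimal sketch that embeds two strings with a text-embedding-ada-002 (Version 2) deployment and compares them with cosine similarity. It assumes the openai and numpy Python packages; the endpoint, key, and deployment name are placeholders.

Python

# A minimal sketch of generating embeddings and computing cosine similarity.
# Endpoint, key, and deployment name are placeholders.
import numpy as np
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-API-KEY"

def embed(text):
    # Each request takes a single string input.
    response = openai.Embedding.create(
        engine="YOUR-EMBEDDING-DEPLOYMENT",
        input=text,
    )
    return np.array(response["data"][0]["embedding"])  # 1,536 dimensions for Version 2

a = embed("neutron star")
b = embed("collapsed stellar core")
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)  # closer to 1.0 means more semantically similar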

Model Summary table and region availability

Important

South Central US is temporarily unavailable for creating new resources due to high
demand.

GPT-3 Models
These models can be used with Completion API requests. gpt-35-turbo is the only
model that can be used with both Completion API requests and the Chat Completion
API.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
ada | N/A | N/A | 2,049 | Oct 2019
text-ada-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
babbage | N/A | N/A | 2,049 | Oct 2019
text-babbage-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
curie | N/A | N/A | 2,049 | Oct 2019
text-curie-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
davinci | N/A | N/A | 2,049 | Oct 2019
text-davinci-001 | South Central US, West Europe | N/A | |
text-davinci-002 | East US, South Central US, West Europe | N/A | 4,097 | Jun 2021
text-davinci-003 | East US, West Europe | N/A | 4,097 | Jun 2021
text-davinci-fine-tune-002 | N/A | N/A | |
gpt-35-turbo¹ (0301) | East US, France Central, South Central US, UK South, West Europe | N/A | 4,096 | Sep 2021
gpt-35-turbo (0613) | East US, France Central, UK South | N/A | 4,096 | Sep 2021
gpt-35-turbo-16k (0613) | East US, France Central, UK South | N/A | 16,384 | Sep 2021

¹ Version 0301 of gpt-35-turbo will be deprecated no earlier than October 15th, 2023 in
favor of version 0613.

GPT-4 Models
These models can only be used with the Chat Completion API.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
gpt-4¹,² (0314) | East US, France Central | N/A | 8,192 | September 2021
gpt-4-32k¹,² (0314) | East US, France Central | N/A | 32,768 | September 2021
gpt-4¹ (0613) | East US, France Central | N/A | 8,192 | September 2021
gpt-4-32k¹ (0613) | East US, France Central | N/A | 32,768 | September 2021

¹ The model is only available by request.

² Version 0314 of gpt-4 and gpt-4-32k will be deprecated no earlier than October 15th,
2023 in favor of version 0613.

Dall-E Models

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (characters) | Training Data (up to)
dalle2 | East US | N/A | 1000 | N/A

Codex Models
These models can only be used with Completions API requests.

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
code-cushman-001¹ | South Central US, West Europe | Currently unavailable | 2,048 |
code-davinci-002 | East US, West Europe | N/A | 8,001 | Jun 2021

¹ The model is available for fine-tuning by request only. Currently we aren't accepting
new requests to fine-tune the model.

Embeddings Models
These models can only be used with Embedding API requests.
Note

We strongly recommend using text-embedding-ada-002 (Version 2). This model/version
provides parity with OpenAI's text-embedding-ada-002. To learn more about the
improvements offered by this model, please refer to OpenAI's blog post. Even if you are
currently using Version 1, you should migrate to Version 2 to take advantage of the
latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so
document embedding and document search must be done using the same version of the model.

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
text-embedding-ada-002 (version 2) | East US, South Central US | N/A | 8,191 | Sep 2021
text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 2,046 | Sep 2021
text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-curie-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-davinci-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-ada-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-ada-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-babbage-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-babbage-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-curie-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-curie-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-davinci-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-davinci-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-ada-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-ada-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-babbage-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-babbage-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020

Next steps
Learn more about Azure OpenAI
Learn more about fine-tuning Azure OpenAI models
Azure OpenAI Service legacy models
Article • 07/07/2023

Azure OpenAI Service offers a variety of models for different use cases. The following
models are not available for new deployments beginning July 6, 2023. Deployments
created prior to July 6, 2023 remain available to customers until July 5, 2024. We
recommend customers migrate to the replacement models prior to the July 5, 2024
retirement.

GPT-3.5
The impacted GPT-3.5 models are the following. The replacement for the GPT-3.5
models is GPT-3.5 Turbo Instruct when that model becomes available.

text-davinci-002

text-davinci-003

code-davinci-002

GPT-3
The impacted GPT-3 models are the following. The replacement for the GPT-3 models is
GPT-3.5 Turbo Instruct when that model becomes available.

text-ada-001
text-babbage-001

text-curie-001

text-davinci-001
code-cushman-001

Embedding models
The embedding models below will be retired effective July 5, 2024. Customers should
migrate to text-embedding-ada-002 (version 2).

Similarity
Text search
Code search
Each family includes models across a range of capability. The following list indicates the
length of the numerical vector returned by the service, based on model capability:

Base Model | Dimensions
Ada | 1024
Babbage | 2048
Curie | 4096
Davinci | 12288

Similarity embedding
These models are good at capturing semantic similarity between two or more pieces of
text.

Use cases: Clustering, regression, anomaly detection, visualization
Models: text-similarity-ada-001, text-similarity-babbage-001, text-similarity-curie-001, text-similarity-davinci-001

Text search embedding


These models help measure whether long documents are relevant to a short search
query. There are two input types supported by this family: doc , for embedding the
documents to be retrieved, and query , for embedding the search query.

Use cases: Search, context relevance, information retrieval
Models: text-search-ada-doc-001, text-search-ada-query-001, text-search-babbage-doc-001, text-search-babbage-query-001, text-search-curie-doc-001, text-search-curie-query-001, text-search-davinci-doc-001, text-search-davinci-query-001

Code search embedding


Similar to text search embedding models, there are two input types supported by this
family: code , for embedding code snippets to be retrieved, and text , for embedding
natural language search queries.

Use cases: Code search and relevance
Models: code-search-ada-code-001, code-search-ada-text-001, code-search-babbage-code-001, code-search-babbage-text-001

Model summary table and region availability


Region availability is for customers with deployments of the models prior to July 6, 2023.

GPT-3.5 models

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
text-davinci-002 | East US, South Central US, West Europe | N/A | 4,097 | Jun 2021
text-davinci-003 | East US, West Europe | N/A | 4,097 | Jun 2021
code-davinci-002 | East US, West Europe | N/A | 8,001 | Jun 2021

GPT-3 models

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
ada | N/A | N/A | 2,049 | Oct 2019
text-ada-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
babbage | N/A | N/A | 2,049 | Oct 2019
text-babbage-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
curie | N/A | N/A | 2,049 | Oct 2019
text-curie-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019
davinci | N/A | N/A | 2,049 | Oct 2019
text-davinci-001 | South Central US, West Europe | N/A | |

Codex models

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
code-cushman-001 | South Central US, West Europe | N/A | 2,048 |

Embedding models

Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to)
text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-curie-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020
text-similarity-davinci-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-ada-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-ada-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-babbage-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-babbage-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-curie-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-curie-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-davinci-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
text-search-davinci-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-ada-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-ada-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-babbage-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
code-search-babbage-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020
What's new in Azure OpenAI Service
Article • 06/30/2023

June 2023

Use Azure OpenAI on your own data (preview)


Azure OpenAI on your data is now available in preview, enabling you to chat with
OpenAI models such as ChatGPT and GPT-4 and receive responses based on your
data.

New versions of gpt-35-turbo and gpt-4 models


gpt-35-turbo (version 0613)
gpt-35-turbo-16k (version 0613)
gpt-4 (version 0613)
gpt-4-32k (version 0613)

UK South
Azure OpenAI is now available in the UK South region. Check the models page for
the latest information on model availability in each region.

Content filtering & annotations (Preview)


How to configure content filters with Azure OpenAI Service.
Enable annotations to view content filtering category and severity information as
part of your GPT based Completion and Chat Completion calls.

Quota
Quota provides the flexibility to actively manage the allocation of rate limits across
the deployments within your subscription.

May 2023

Java & JavaScript SDK support


NEW Azure OpenAI preview SDKs offering support for JavaScript and Java.

Azure OpenAI Chat Completion General Availability (GA)


General availability support for:
Chat Completion API version 2023-05-15 .
GPT-35-Turbo models.
GPT-4 model series. Due to high demand, access to this model series is currently
only available by request. To request access, existing Azure OpenAI customers
can apply by filling out this form.

If you are currently using the 2023-03-15-preview API, we recommend migrating to the
GA 2023-05-15 API. If you are currently using API version 2022-12-01 this API remains
GA, but does not include the latest Chat Completion capabilities.

Important

Using the current versions of the GPT-35-Turbo models with the completion
endpoint remains in preview.

France Central
Azure OpenAI is now available in the France Central region. Check the models
page for the latest information on model availability in each region.

April 2023
DALL-E 2 public preview. Azure OpenAI Service now supports image generation
APIs powered by OpenAI's DALL-E 2 model. Get AI-generated images based on the
descriptive text you provide. To learn more, check out the quickstart. To request
access, existing Azure OpenAI customers can apply by filling out this form .

Inactive deployments of customized models will now be deleted after 15 days;


models will remain available for redeployment. If a customized (fine-tuned)
model is deployed for more than fifteen (15) days during which no completions or
chat completions calls are made to it, the deployment will automatically be deleted
(and no further hosting charges will be incurred for that deployment). The
underlying customized model will remain available and can be redeployed at any
time. To learn more check out the how-to-article.
March 2023
GPT-4 series models are now available in preview on Azure OpenAI. To request
access, existing Azure OpenAI customers can apply by filling out this form . These
models are currently available in the East US and South Central US regions.

New Chat Completion API for ChatGPT and GPT-4 models released in preview
on 3/21. To learn more, check out the updated quickstarts and how-to article.

ChatGPT (gpt-35-turbo) preview. To learn more, check out the how-to article.

Increased training limits for fine-tuning: The max training job size (tokens in
training file) x (# of epochs) is 2 Billion tokens for all models. We have also
increased the max training job time from 120 to 720 hours.

Adding additional use cases to your existing access.  Previously, the process for
adding new use cases required customers to reapply to the service. Now, we're
releasing a new process that allows you to quickly add new use cases to your use
of the service. This process follows the established Limited Access process within
Azure Cognitive Services. Existing customers can attest to any and all new use
cases here . Please note that this is required anytime you would like to use the
service for a new use case you did not originally apply for.

February 2023

New Features
.NET SDK (inference) preview release | Samples
Terraform SDK update to support Azure OpenAI management operations.
Inserting text at the end of a completion is now supported with the suffix
parameter (see the sketch after this list).
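
As an illustration, here's a minimal sketch of the suffix parameter with the openai Python package (0.x) configured for Azure: the model generates text that fits between the prompt and the supplied suffix. The deployment name is a placeholder.

Python

# A minimal sketch of using the suffix parameter on a completions call.
# Endpoint, key, and deployment name are placeholders.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = "YOUR-API-KEY"

response = openai.Completion.create(
    engine="YOUR-DEPLOYMENT-NAME",
    prompt="def count_to_five():\n",
    suffix="\n# end of function\n",
    max_tokens=50,
)
print(response["choices"][0]["text"])  # text generated between prompt and suffix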

Updates
Content filtering is on by default.

New articles on:

Monitoring an Azure OpenAI Service


Plan and manage costs for Azure OpenAI

New training course:


Intro to Azure OpenAI

January 2023

New Features
Service GA. Azure OpenAI Service is now generally available.

New models: Addition of the latest text model, text-davinci-003 (East US, West
Europe), text-ada-embeddings-002 (East US, South Central US, West Europe)

December 2022

New features
The latest models from OpenAI. Azure OpenAI provides access to all the latest
models including the GPT-3.5 series.

New API version (2022-12-01). This update includes several requested
enhancements including token usage information in the API response, improved
error messages for files, alignment with OpenAI on fine-tuning creation data
structure, and support for the suffix parameter to allow custom naming of fine-tuned jobs.

Higher request per second limits. 50 for non-Davinci models. 20 for Davinci
models.

Faster fine-tune deployments. Deploy Ada and Curie fine-tuned models in
under 10 minutes.

Higher training limits: 40M training tokens for Ada, Babbage, and Curie. 10M for
Davinci.

Process for requesting modifications to the abuse & misuse data logging &
human review. Today, the service logs request/response data for the purposes of
abuse and misuse detection to ensure that these powerful models aren't abused.
However, many customers have strict data privacy and security requirements that
require greater control over their data. To support these use cases, we're releasing
a new process for customers to modify the content filtering policies or turn off the
abuse logging for low-risk use cases. This process follows the established Limited
Access process within Azure Cognitive Services and existing OpenAI customers can
apply here.

Customer managed key (CMK) encryption. CMK provides customers greater
control over managing their data in Azure OpenAI by providing their own
encryption keys used for storing training data and customized models. Customer-managed
keys (CMK), also known as bring your own key (BYOK), offer greater
flexibility to create, rotate, disable, and revoke access controls. You can also audit
the encryption keys used to protect your data. Learn more from our encryption at
rest documentation.

Lockbox support

SOC-2 compliance

Logging and diagnostics through Azure Resource Health, Cost Analysis, and
Metrics & Diagnostic settings.

Studio improvements. Numerous usability improvements to the Studio workflow,
including Azure AD role support to control who in the team has access to create
and deploy fine-tuned models.

Changes (breaking)
Fine-tuning create API request has been updated to match OpenAI’s schema.

Preview API versions:

JSON

{
    "training_file": "file-XGinujblHPwGLSztz8cPS8XY",
    "hyperparams": {
        "batch_size": 4,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.1
    }
}

API version 2022-12-01:

JSON

{
    "training_file": "file-XGinujblHPwGLSztz8cPS8XY",
    "batch_size": 4,
    "learning_rate_multiplier": 0.1,
    "n_epochs": 4,
    "prompt_loss_weight": 0.1
}

Content filtering is temporarily off by default. Azure content moderation works
differently than OpenAI. Azure OpenAI runs content filters during the generation call to
detect harmful or abusive content and filters them from the response. Learn More

These models will be re-enabled in Q1 2023 and be on by default.

Customer actions

Contact Azure Support if you would like these turned on for your subscription.
Apply for filtering modifications, if you would like to have them remain off. (This
option will be for low-risk use cases only.)

Next steps
Learn more about the underlying models that power Azure OpenAI.
Azure OpenAI Service frequently
asked questions
FAQ

If you can't find answers to your questions in this document and still need help, check
the Cognitive Services support options guide. Azure OpenAI is part of Azure Cognitive
Services.

Data and Privacy


Do you use my company data to train any of the
models?
Azure OpenAI doesn't use customer data to retrain models. For more information, see
the Azure OpenAI data, privacy, and security guide.

General
Does Azure OpenAI support GPT-4?
Azure OpenAI supports the latest GPT-4 models. These models are currently only
available by request. For access, existing Azure OpenAI customers can apply by filling
out this form .

How do the capabilities of Azure OpenAI


compare to OpenAI?
Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3,
Codex, and DALL-E models with the security and enterprise promise of Azure. Azure
OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth
transition from one to the other.

With Azure OpenAI, customers get the security capabilities of Microsoft Azure while
running the same models as OpenAI.
Does Azure OpenAI support VNETs and Private
Endpoints?
Yes, as part of Azure Cognitive Services, Azure OpenAI supports VNETs and Private
Endpoints. To learn more, consult the Cognitive Services virtual networking guidance.

Do the GPT-4 models currently support image


input?
No, GPT-4 is designed by OpenAI to be multimodal, but currently only text input and
output are supported.

How do I apply for new use cases?


Previously, the process for adding new use cases required customers to reapply to the
service. Now, we're releasing a new process that allows you to quickly add new use
cases to your use of the service. This process follows the established Limited Access
process within Azure Cognitive Services. Existing customers can attest to any and all new
use cases here . Please note that this is required anytime you would like to use the
service for a new use case you did not originally apply for.

I am trying to use embeddings and received the


error "InvalidRequestError: Too many inputs. The
max number of inputs is 1." How do I fix this?
This error typically occurs when you try to send a batch of text to embed in a single API
request as an array. Currently Azure OpenAI does not support batching with embedding
requests. Embeddings API calls should consist of a single string input per request. The
string can be up to 8191 tokens in length when using the text-embedding-ada-002
(Version 2) model.
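
If you need embeddings for multiple strings, the workaround is to send one request per string, as in this minimal sketch (endpoint, key, and deployment name are placeholders):

Python

# A minimal sketch of working around the single-input limit by sending one
# embeddings request per string. Placeholders throughout.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-API-KEY"

texts = ["first document", "second document", "third document"]
embeddings = []
for text in texts:
    response = openai.Embedding.create(
        engine="YOUR-EMBEDDING-DEPLOYMENT",
        input=text,  # a single string per request, not a list
    )
    embeddings.append(response["data"][0]["embedding"])

print(len(embeddings), "embeddings generated")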

Where can I read about better ways to use Azure


OpenAI to get the responses I want from the
service?
Check out our introduction to prompt engineering. While these models are extremely
powerful, their behavior is also very sensitive to the prompts they receive from the user.
This makes prompt construction an important skill to develop. After you've mastered
the introduction, check out our article on more advanced prompt engineering
techniques.

Getting access to Azure OpenAI Service


How do I get access to Azure OpenAI?
Access is currently limited as we navigate high demand, upcoming product
improvements, and Microsoft's commitment to responsible AI. For now, we're working
with customers with an existing partnership with Microsoft, lower risk use cases, and
those committed to incorporating mitigations. In addition to applying for initial access,
all solutions using Azure OpenAI are required to go through a use case review before
they can be released for production use. Apply here for initial access or for a production
review: Apply now

After I apply for access, how long will I have to


wait to get approved?
We don't currently provide a timeline for access approval.

Learning more and where to ask


questions
Where can I read about the latest updates to
Azure OpenAI?
For monthly updates, see our what's new page.

Where can I get training to get started learning


and build my skills around Azure OpenAI?
Check out our introduction to Azure OpenAI training course.

Where can I post questions and see answers to


other common questions?
We recommend posting questions on Microsoft Q&A
Alternatively, you can post questions on Stack Overflow

Where do I go for Azure OpenAI customer


support?
Azure OpenAI is part of Azure Cognitive Services. You can learn about all the support
options for Azure Cognitive Services in the support and help options guide.

Models and fine-tuning


What models are available?
Consult the Azure OpenAI model availability guide.

Where can I find out what region a model is


available in?
Consult the Azure OpenAI model availability guide for region availability.

What is the difference between a base model


and a fine-tuned model?
A base model is a model that hasn't been customized or fine-tuned for a specific use
case. Fine-tuned models are customized versions of base models where a model's
weights are trained on a unique set of prompts. Fine-tuned models let you achieve
better results on a wider number of tasks without needing to provide detailed examples
for in-context learning as part of your completion prompt. To learn more, review our
fine-tuning guide.

What is the maximum number of fine-tuned


models I can create?
100

What are the SLAs for API responses in Azure


OpenAI?
We don't have a defined API response time Service Level Agreement (SLA) at this time.
The overall SLA for Azure OpenAI Service is the same as for other Azure Cognitive
Services. For more information, see the Cognitive Services section of the Service Level
Agreements (SLA) for Online Services page .

Why was my fine-tuned model deployment


deleted?
If a customized (fine-tuned) model is deployed for more than fifteen (15) days during
which no completions or chat completions calls are made to it, the deployment will
automatically be deleted (and no further hosting charges will be incurred for that
deployment). The underlying customized model will remain available and can be
redeployed at any time. To learn more check out the how-to-article.

How do I deploy a model with the REST API?


There are currently two different REST APIs that allow model deployment. For the latest
model deployment features such as the ability to specify a model version during
deployment for models like text-embedding-ada-002 Version 2, use the Cognitive
Services Create or Update REST API call.

Can I use quota to increase the max token limit


of a model?
No, quota Tokens-Per-Minute (TPM) allocation is not related to the max input token
limit of a model. Model input token limits are defined in the models table and are not
impacted by changes made to TPM.

Web app
How can I customize my published web app?
You can customize your published web app in the Azure portal. The source code for the
published web app is available on GitHub , where you can find information on
changing the app frontend, as well as instructions for building and deploying the app.

Will my web app be overwritten when I deploy


the app again from the Azure AI Studio?
Your app code will not be overwritten when you update your app. The app will be
updated to use the Azure OpenAI resource, Azure Cognitive Search index (if you're
using Azure OpenAI on your data), and model settings selected in the Azure OpenAI
Studio without any change to the appearance or functionality.

Using your data


What is Azure OpenAI on your data?
Azure OpenAI on your data is a feature of the Azure OpenAI Services that helps
organizations to generate customized insights, content, and searches using their
designated data sources. It works with the capabilities of the OpenAI models in Azure
OpenAI to provide more accurate and relevant responses to user queries in natural
language. Azure OpenAI on your data can be integrated with customer's existing
applications and workflows, offers insights into key performance indicators, and can
interact with users seamlessly.

How can I access Azure OpenAI on your data?


All Azure OpenAI customers can use Azure OpenAI on your data via the Azure AI Studio
and the REST API.

What data sources does Azure OpenAI on your


data support?
Azure OpenAI on your data supports ingestion from Azure Cognitive Search, Azure Blob
Storage, and uploading local files. You can learn more about Azure OpenAI on your data
from the conceptual article and quickstart.

How much does it cost to use Azure OpenAI on


your data?
When using Azure OpenAI on your data, you incur costs when you use Azure Cognitive
Search, Azure Blob Storage, Azure Web App Service, semantic search and OpenAI
models. There is no additional cost for using the "your data" feature in the Azure AI
Studio.
How can I customize or automate the index
creation process?
You can prepare the index yourself using a script provided on GitHub . Using this script
will create an Azure Cognitive Search index with all the information needed to better
leverage your data, with your documents broken down into manageable chunks. Please
see the README file with the data preparation code for details on how to run it.

How can I update my index?


You can upload additional data to your Azure Blob Container and use it as your data
source when you create a new index. The new index will include all of the data in your
container.

What file types does Azure OpenAI on your data


support?
See Using your data for more information on supported file types.

Is responsible AI supported by Azure OpenAI on


your data?
Yes, Azure OpenAI on your data is part of the Azure OpenAI service and works with the
models available in Azure OpenAI. The content filtering and abuse monitoring features
of Azure OpenAI still apply. For more information, see the overview of Responsible AI
practices for Azure OpenAI models and the Transparency Note for Azure OpenAI for
additional guidance on using Azure OpenAI on your data responsibly.

Is there a token limit on the system message?


Yes, the token limit on the system message is 200. If the system message is more than
200 tokens, the rest of the tokens beyond the first 200 will be ignored.

Do the query language and the data source


language need to be the same?
You must send queries in the same language as your data. Your data can be in any of the
languages supported by Azure Cognitive Search.
Next steps
Azure OpenAI quotas and limits
Azure OpenAI what's new
Azure OpenAI quickstarts
Quickstart: Get started generating text
using Azure OpenAI Service
Article • 05/24/2023

Use this article to get started making your first calls to Azure OpenAI.

Prerequisites
An Azure subscription - Create one for free .

Access granted to Azure OpenAI in the desired Azure subscription.

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource with a model deployed. For more information about
model deployment, see the resource deployment guide.


Go to the Azure OpenAI Studio


Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with
credentials that have access to your OpenAI resource. During or after the sign-in
workflow, select the appropriate directory, Azure subscription, and Azure OpenAI
resource.

From the Azure OpenAI Studio landing page navigate further to explore examples for
prompt completion, manage your deployments and models, and find learning resources
such as documentation and community forums.

Go to the Playground for experimentation and fine-tuning workflow.

Playground
Start exploring Azure OpenAI capabilities with a no-code approach through the GPT-3
Playground. It's simply a text box where you can submit a prompt to generate a
completion. From this page, you can quickly iterate and experiment with the capabilities.


You can select a deployment and choose from a few pre-loaded examples to get started.
If your resource doesn't have a deployment, select Create a deployment and follow the
instructions provided by the wizard. For more information about model deployment, see
the resource deployment guide.

You can experiment with the configuration settings such as temperature and pre-
response text to improve the performance of your task. You can read more about each
parameter in the REST API.

Selecting the Generate button will send the entered text to the completions API
and stream the results back to the text box.
Select the Undo button to undo the prior generation call.
Select the Regenerate button to complete an undo and generation call together.

Azure OpenAI also performs content moderation on the prompt inputs and generated
outputs. The prompts or responses may be filtered if harmful content is detected. For
more information, see the content filter article.

In the GPT-3 playground you can also view Python and curl code samples pre-filled
according to your selected settings. Just select View code next to the examples
dropdown. You can write an application to complete the same task with the OpenAI
Python SDK, curl, or other REST API client.
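
For example, the following is a minimal sketch of an equivalent call with the openai Python package; the endpoint, key, and deployment name are placeholders for your own values:

Python

# Note: The openai-python library support for Azure OpenAI is in preview.
# Minimal sketch of a completions call outside the playground; endpoint, key, and
# deployment name are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")  # for example, https://<your-resource>.openai.azure.com/
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

response = openai.Completion.create(
    engine="<your-deployment-name>",  # the deployment name you chose when deploying the model
    prompt="Write a tagline for an ice cream shop.",
    temperature=0.7,
    max_tokens=60,
)

print(response["choices"][0]["text"])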

Try text summarization


To use the Azure OpenAI for text summarization in the GPT-3 Playground, follow these
steps:

1. Sign in to Azure OpenAI Studio .

2. Select the subscription and OpenAI resource to work with.

3. Select GPT-3 Playground at the top of the landing page.

4. Select your deployment from the Deployments dropdown. If your resource doesn't
have a deployment, select Create a deployment and then revisit this step.

5. Select Summarize Text from the Examples dropdown.


6. Select Generate . Azure OpenAI will attempt to capture the context of text and
rephrase it succinctly. You should get a result that resembles the following text:

Tl;dr A neutron star is the collapsed core of a supergiant star. These incredibly dense objects are incredibly fascinating due to their strange properties and their potential for phenomena such as extreme gravitational forces and a strong magnetic field.

The accuracy of the response can vary per model. The Davinci based model in this
example is well-suited to this type of summarization, whereas a Codex based model
wouldn't perform as well at this particular task.


Clean up resources
If you want to clean up and remove an OpenAI resource, you can delete the resource or
resource group. Deleting the resource group also deletes any other resources associated
with it.

Portal
Azure CLI

Next steps
Learn more about how to generate the best completion in our How-to guide on
completions.
For more examples check out the Azure OpenAI Samples GitHub repository .
Quickstart: Get started using ChatGPT
and GPT-4 with Azure OpenAI Service
Article • 05/24/2023

Use this article to get started using Azure OpenAI.

Prerequisites
An Azure subscription - Create one for free .

Access granted to Azure OpenAI in the desired Azure subscription.

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4 1
models deployed. For more information about model deployment, see the
resource deployment guide.

1 GPT-4 models are currently only available by request. To access these models, existing Azure OpenAI customers can apply for access by filling out this form .


Go to Azure OpenAI Studio


Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with
credentials that have access to your OpenAI resource. During or after the sign-in
workflow, select the appropriate directory, Azure subscription, and Azure OpenAI
resource.

From the Azure OpenAI Studio landing page, select Chat playground.

Playground
Start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI
Studio Chat playground. From this page, you can quickly iterate and experiment with the
capabilities.

Assistant setup
You can use the Assistant setup dropdown to select a few pre-loaded System message
examples to get started.

System messages give the model instructions about how it should behave and any
context it should reference when generating a response. You can describe the assistant's
personality, tell it what it should and shouldn't answer, and tell it how to format
responses.

Add few-shot examples allows you to provide conversational examples that are used by
the model for in-context learning.

At any time while using the Chat playground you can select View code to see Python,
curl, and json code samples pre-populated based on your current chat session and
settings selections. You can then take this code and write an application to complete the
same task you're currently performing with the playground.

Chat session
Selecting the Send button sends the entered text to the completions API and the results
are returned back to the text box.

Select the Clear chat button to delete the current conversation history.

Settings

Deployments: Your deployment name that is associated with a specific model. For ChatGPT, you need to use the gpt-35-turbo model.

Temperature: Controls randomness. Lowering the temperature means that the model produces more repetitive and deterministic responses. Increasing the temperature results in more unexpected or creative responses. Try adjusting temperature or Top P but not both.

Max length (tokens): Set a limit on the number of tokens per model response. The API supports a maximum of 4096 tokens shared between the prompt (including system message, examples, message history, and user query) and the model's response. One token is roughly four characters for typical English text.

Top probabilities: Similar to temperature, this controls randomness but uses a different method. Lowering Top P narrows the model's token selection to likelier tokens. Increasing Top P lets the model choose from tokens with both high and low likelihood. Try adjusting temperature or Top P but not both.

Multi-turn conversations: Select the number of past messages to include in each new API request. This helps give the model context for new user queries. Setting this number to 10 results in five user queries and five system responses.

Stop sequences: Stop sequences make the model end its response at a desired point. The model response ends before the specified sequence, so it won't contain the stop sequence text. For ChatGPT, using <|im_end|> ensures that the model response doesn't generate a follow-up user query. You can include as many as four stop sequences.

Show panels
By default there are three panels: assistant setup, chat session, and settings. Show
panels allows you to add, remove, and rearrange the panels. If you ever close a panel
and need to get it back, use Show panels to restore the lost panel.

Start a chat session


1. From the assistant setup drop-down, select Xbox customer support agent

2. You'll be prompted to confirm that you want to update the system message. Select Continue.

3. In the chat session pane, enter the following question: "I'm interested in buying a
new Xbox", and select Send.

4. You'll receive a response similar to:

5. Enter a follow-up question like: "which models support 4K?"


6. Now that you have a basic conversation select View code from under Assistant
setup and you'll have a replay of the code behind the entire conversation so far:

Python

#Note: The openai-python library support for Azure OpenAI is in preview.
import os
import openai
openai.api_type = "azure"
openai.api_base = "https://docs-test-001.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
  engine="gpt-35-turbo", # replace this value with the deployment name you chose when you deployed the associated model.
  messages = [
    {"role":"system","content":"You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."},
    {"role":"user","content":"How much is a PS5?"},
    {"role":"assistant","content":"I apologize, but I do not have information about the prices of other gaming devices such as the PS5. My primary focus is to assist with issues regarding Xbox devices. Is there a specific issue you are having with your Xbox device that I may be able to help with?"},
    {"role":"user","content":"I am interested in buying an Xbox?"},
    {"role":"assistant","content":"Great! There are several Xbox models available in the market, including the Xbox Series X, Xbox Series S, and Xbox One. The Xbox Series X is the latest and most powerful console, while the Xbox Series S is a more affordable option that still offers great performance. The Xbox One is the previous generation console, but it still has a large library of games available to play.\n\nBefore purchasing an Xbox, you may want to consider factors such as your budget, the types of games you want to play, and whether you want to play games in 4K resolution. Once you have decided on a console, you can purchase it from a retailer or directly from the Xbox website. Let me know if you have any other questions or concerns!"},
    {"role":"user","content":"Which models are 4K?"},
    {"role":"assistant","content":"Both the Xbox Series X and Xbox One X support 4K resolution gaming. The Xbox Series S, on the other hand, supports up to 1440p resolution gaming, but it can upscale games to 4K. It's worth noting that in order to take advantage of 4K gaming, you'll need a 4K TV or monitor that supports HDR (High Dynamic Range) and HDMI 2.0 or higher."}
  ],
  temperature=0,
  max_tokens=350,
  top_p=0.95,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)


Understanding the prompt structure


If you examine the sample from View code you'll notice some unique tokens that
weren't part of a typical GPT completion call. ChatGPT was trained to use special tokens
to delineate different parts of the prompt. Content is provided to the model in between
<|im_start|> and <|im_end|> tokens. The prompt begins with a system message that
can be used to prime the model by including context or instructions for the model. After
that, the prompt contains a series of messages between the user and the assistant.

The assistant's response to the prompt will then be returned below the <|im_start|>assistant token and will end with <|im_end|>, denoting that the assistant has finished its response. You can also use the Show raw syntax toggle button to display these tokens within the chat session panel.
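
For illustration, the raw prompt behind a short exchange looks roughly like the following. This is a sketch of the format described above, with an abbreviated system message, not verbatim output from the service:

Python

# Sketch of the raw ChatML-style prompt that the special tokens above delineate.
# The system and user text here are abbreviated placeholders.
raw_prompt = (
    "<|im_start|>system\n"
    "You are an Xbox customer support agent. You are friendly and concise.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "I'm interested in buying a new Xbox\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# The model's reply is generated after the final <|im_start|>assistant token
# and ends with <|im_end|>.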

The ChatGPT how-to guide provides an in-depth introduction into the new prompt
structure and how to use the gpt-35-turbo model effectively.

Deploy a web app


Once you're satisfied with the experience in Azure OpenAI studio, you can deploy a web
app directly from the
Studio by selecting the Deploy to button.

The first time you deploy a web app, you should select Create a new web app. Choose a
name for the app, which will
become part of the app URL. For example,
https://<appname>.azurewebsites.net .

Select your subscription, resource group, location, and pricing plan for the published
app. To
update an existing app, select Publish to an existing web app and choose the
name of your previous
app from the dropdown menu.


Clean up resources
Once you're done testing out the Chat playground, if you want to clean up and remove
an OpenAI resource, you can delete the resource or resource group. Deleting the
resource group also deletes any other resources associated with it.

Portal
Azure CLI

Next steps
Learn more about how to work with ChatGPT and the new gpt-35-turbo model
with the ChatGPT how-to guide.
For more examples check out the Azure OpenAI Samples GitHub repository
Quickstart: Get started generating
images using Azure OpenAI Service
Article • 06/12/2023

Use this guide to get started generating images with Azure OpenAI in your browser.

Note

The image generation API creates an image from a text prompt. It does not edit
existing images or create variations.

Prerequisites
An Azure subscription - Create one for free .
Access granted to DALL-E in the desired Azure subscription.
Currently, access to
this service is granted only by application. You can apply for access to Azure
OpenAI by completing the form at https://aka.ms/oai/access . Existing Azure
OpenAI customers need to re-enter the form to get access to DALL-E. Open an
issue on this repo to contact us if you have an issue.
An Azure OpenAI resource created in the East US region. For more information
about model deployment, see the resource deployment guide.

Go to the Azure OpenAI Studio


Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with the
credentials associated with your OpenAI resource. During or after the sign-in workflow,
select the appropriate directory, Azure subscription, and Azure OpenAI resource.

From the Azure OpenAI Studio landing page, select DALL·E 2 playground to use the
image generation APIs.

Try out image generation


Start exploring Azure OpenAI capabilities with a no-code approach through the DALL-E
2 playground. Enter your image prompt into the text box and select Generate. When the
AI-generated image is ready, it will appear on the page.
Note

The image generation APIs come with a content moderation filter. If the service
recognizes your prompt as harmful content, it won't return a generated image. For
more information, see the content filter guide.

In the DALL-E 2 playground, you can also view Python and cURL code samples, which
are pre-filled according to your settings. Select View code near the top of the page. You
can use this code to write an application that completes the same task.

Clean up resources
If you want to clean up and remove an OpenAI resource, you can delete the resource or
resource group. Deleting the resource group also deletes any other resources associated
with it.

Portal
Azure CLI

Next steps
Azure OpenAI Overview
For more examples check out the Azure OpenAI Samples GitHub repository .
Quickstart: Chat with Azure OpenAI
models using your own data
Article • 06/28/2023

In this quickstart you can use your own data with Azure OpenAI models. Using Azure
OpenAI's models on your data can provide you with a powerful conversational AI
platform that enables faster and more accurate communication.

Prerequisites
An Azure subscription - Create one for free .

Access granted to Azure OpenAI in the desired Azure subscription.

Azure OpenAI requires registration and is currently only available to approved enterprise customers and partners. See Limited access to Azure OpenAI Service for more information. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource with a chat model deployed (for example, GPT-3 or
GPT-4). For more information about model deployment, see the resource
deployment guide.


Add your data using Azure OpenAI Studio


Navigate to Azure OpenAI Studio and sign in with credentials that have access to your
Azure OpenAI resource. During or after the sign-in workflow, select the appropriate
directory, Azure subscription, and Azure OpenAI resource.

1. Select the Chat playground tile.


2. On the Assistant setup tile, select Add your data (preview) > + Add a data
source.

3. In the pane that appears, select Upload files under Select data source. Select
Upload files. Azure OpenAI needs both a storage resource and a search resource
to access and index your data.

Tip

For a list of supported data sources, see Data source options.

For documents and datasets with long text, we recommend using the available data preparation script.

a. For Azure OpenAI to access your storage account, you will need to turn on
Cross-origin resource sharing (CORS) . If CORS isn't already turned on for the
Azure Blob storage resource, select Turn on CORS.
b. Select your Azure Cognitive Search resource, and select the acknowledgment
that connecting it will incur usage on your account. Then select Next.

4. On the Upload files pane, select Browse for a file and select the files you want to
upload. Then select Upload files. Then select Next.

5. Review the details you entered, and select Save and close. You can now chat with
the model and it will use information from your data to construct the response.
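
Outside the Studio, the same grounded chat can be reproduced programmatically. The following is a rough sketch using the requests package; the extensions endpoint path, the 2023-06-01-preview API version, and the payload shape are assumptions to verify against the REST reference, and all resource names, keys, and the index name are placeholders:

Python

# Rough sketch: chat completion grounded on your own data via the preview extensions endpoint.
# The endpoint path, API version, and payload shape are assumptions to verify against the
# REST reference; resource names, keys, and the index name are placeholders.
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]  # https://<your-resource>.openai.azure.com
deployment = "<your-chat-deployment-name>"
url = (
    f"{endpoint}/openai/deployments/{deployment}"
    "/extensions/chat/completions?api-version=2023-06-01-preview"
)

payload = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": os.environ["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
                "key": os.environ["AZURE_COGNITIVE_SEARCH_KEY"],
                "indexName": "<your-index-name>",
            },
        }
    ],
    "messages": [
        {"role": "user", "content": "What does the uploaded handbook say about vacation policy?"}
    ],
}

headers = {"api-key": os.environ["AZURE_OPENAI_KEY"], "Content-Type": "application/json"}
response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])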


Chat playground
Start exploring Azure OpenAI capabilities with a no-code approach through the chat
playground. It's simply a text box where you can submit a prompt to generate a
completion. From this page, you can quickly iterate and experiment with the capabilities.

You can experiment with the configuration settings such as temperature and pre-
response text to improve the performance of your task. You can read more about each
parameter in the REST API.

Selecting the Generate button will send the entered text to the completions API
and stream the results back to the text box.
Select the Undo button to undo the prior generation call.
Select the Regenerate button to complete an undo and generation call together.

Deploy a web app


Once you're satisfied with the experience in Azure OpenAI studio, you can deploy a web
app directly from the
Studio by selecting the Deploy to button.


The first time you deploy a web app, you should select Create a new web app. Choose a
name for the app, which will
become part of the app URL. For example,
https://<appname>.azurewebsites.net .

Select your subscription, resource group, location, and pricing plan for the published
app. To
update an existing app, select Publish to an existing web app and choose the
name of your previous
app from the dropdown menu.


Important considerations
Publishing creates an Azure App Service in your subscription. It may incur costs
depending on the
pricing plan you select. When you're done with your app, you
can delete it from the Azure portal.

By default, the app will only be accessible to you. To add authentication (for
example, restrict access to the app to members of your Azure tenant):

1. Go to the Azure portal and search for the app name you specified during
publishing. Select the web app, and go to the Authentication tab on the left
navigation menu. Then select Add an identity provider.

2. Select Microsoft as the identity provider. The default settings on this page will
restrict the app to your tenant only, so you don't need to change anything
else here. Then select Add.

Now users will be asked to sign in with their Azure Active Directory account to be able to access your app. You can follow a similar process to add another identity provider if you prefer. The app doesn't use the user's login information in any way other than verifying they are a member of your tenant.

Clean up resources
If you want to clean up and remove an OpenAI or Azure Cognitive Search resource, you
can delete the resource or resource group. Deleting the resource group also deletes any
other resources associated with it.

Cognitive Services resources


Azure Cognitive Search resources
Azure app service resources

Next steps
Learn more about using your data in Azure OpenAI Service
Chat app sample code on GitHub .
Abuse Monitoring
Article • 06/19/2023

Azure OpenAI Service detects and mitigates instances of recurring content and/or
behaviors that suggest use of the service in a manner that may violate the Code of
Conduct or other applicable product terms. Details on how data is handled can be
found on the Data, Privacy and Security page.

Components of abuse monitoring


There are several components to abuse monitoring:

Content Classification: Classifier models detect harmful language and/or images in user prompts (inputs) and completions (outputs). The system looks for categories of harms as defined in the Content Requirements, and assigns severity levels as described in more detail on the Content Filtering page.

Abuse Pattern Capture: Azure OpenAI Service’s abuse monitoring looks at customer usage patterns and employs algorithms and heuristics to detect indicators of potential abuse. Detected patterns consider, for example, the frequency and severity at which harmful content is detected in a customer’s prompts and completions.

Human Review and Decision: When prompts and/or completions are flagged
through content classification and abuse pattern capture as described above,
authorized Microsoft employees may assess the flagged content, and either
confirm or correct the classification or determination based on predefined
guidelines and policies. Data can be accessed for human review only by authorized
Microsoft employees via Secure Access Workstations (SAWs) with Just-In-Time (JIT)
request approval granted by team managers. For Azure OpenAI Service resources
deployed in the European Economic Area, the authorized Microsoft employees are
located in the European Economic Area.

Notification and Action: When a threshold of abusive behavior has been confirmed based on the preceding three steps, the customer is informed of the determination by email. Except in cases of severe or recurring abuse, customers typically are given an opportunity to explain or remediate—and implement mechanisms to prevent recurrence of—the abusive behavior. Failure to address the behavior—or recurring or severe abuse—may result in suspension or termination of the customer’s access to Azure OpenAI resources and/or capabilities.
Next steps
Learn more about the underlying models that power Azure OpenAI.
Learn more about understanding and mitigating risks associated with your
application: Overview of Responsible AI practices for Azure OpenAI models.
Learn more about how data is processed in connection with content filtering and
abuse monitoring: Data, privacy, and security for Azure OpenAI Service.
Content filtering
Article • 06/09/2023

Azure OpenAI Service includes a content filtering system that works alongside core
models. This system works by running both the prompt and completion through an
ensemble of classification models aimed at detecting and preventing the output of
harmful content. The content filtering system detects and takes action on specific
categories of potentially harmful content in both input prompts and output
completions. Variations in API configurations and application design may affect
completions and thus filtering behavior. The content filtering system supports the
following languages: English, German, Japanese, Spanish, French, Italian, Portuguese,
and Chinese. It might not be able to detect inappropriate content in languages that it
has not been trained or tested to process.

In addition to the content filtering system, the Azure OpenAI Service performs
monitoring to detect content and/or behaviors that suggest use of the service in a
manner that may violate applicable product terms. For more information about
understanding and mitigating risks associated with your application, see the
Transparency Note for Azure OpenAI. For more information about how data is
processed in connection with content filtering and abuse monitoring, see Data, privacy,
and security for Azure OpenAI Service.

The following sections provide information about the content filtering categories, the
filtering severity levels and their configurability, and API scenarios to be considered in
application design and implementation.

Content filtering categories


The content filtering system integrated in the Azure OpenAI Service contains neural
multi-class classification models aimed at detecting and filtering harmful content; the
models cover four categories (hate, sexual, violence, and self-harm) across four severity
levels (safe, low, medium, and high). Content detected at the 'safe' severity level is
labeled in annotations but is not subject to filtering and is not configurable.

Categories

Hate: The hate category describes language attacks or uses that include pejorative or discriminatory language with reference to a person or identity group on the basis of certain differentiating attributes of these groups including but not limited to race, ethnicity, nationality, gender identity and expression, sexual orientation, religion, immigration status, ability status, personal appearance, and body size.

Sexual: The sexual category describes language related to anatomical organs and genitals, romantic relationships, acts portrayed in erotic or affectionate terms, physical sexual acts, including those portrayed as an assault or a forced sexual violent act against one’s will, prostitution, pornography, and abuse.

Violence: The violence category describes language related to physical actions intended to hurt, injure, damage, or kill someone or something; describes weapons, etc.

Self-harm: The self-harm category describes language related to physical actions intended to purposely hurt, injure, or damage one’s body, or kill oneself.

Severity levels

Safe: Content may be related to violence, self-harm, sexual, or hate categories but the terms are used in general, journalistic, scientific, medical, and similar professional contexts, which are appropriate for most audiences.

Low: Content that expresses prejudiced, judgmental, or opinionated views, includes offensive use of language, stereotyping, use cases exploring a fictional world (for example, gaming, literature) and depictions at low intensity.

Medium: Content that uses offensive, insulting, mocking, intimidating, or demeaning language towards specific identity groups, includes depictions of seeking and executing harmful instructions, fantasies, glorification, promotion of harm at medium intensity.

High: Content that displays explicit and severe harmful instructions, actions, damage, or abuse; includes endorsement, glorification, or promotion of severe harmful acts, extreme or illegal forms of harm, radicalization, or non-consensual power exchange or abuse.

Configurability (preview)
The default content filtering configuration is set to filter at the medium severity
threshold for all four content harm categories for both prompts and completions. That
means that content that is detected at severity level medium or high is filtered, while
content detected at severity level low is not filtered by the content filters. The
configurability feature is available in preview and allows customers to adjust the
settings, separately for prompts and completions, to filter content for each content
category at different severity levels as described in the table below:

Severity filtered: Low, medium, high (configurable for prompts: Yes; configurable for completions: Yes). Strictest filtering configuration. Content detected at severity levels low, medium and high is filtered.

Severity filtered: Medium, high (configurable for prompts: Yes; configurable for completions: Yes). Default setting. Content detected at severity level low is not filtered, content at medium and high is filtered.

Severity filtered: High (configurable for prompts: If approved*; configurable for completions: If approved*). Content detected at severity levels low and medium is not filtered. Only content at severity level high is filtered. Requires approval*.

Severity filtered: No filters (configurable for prompts: If approved*; configurable for completions: If approved*). No content is filtered regardless of severity level detected. Requires approval*.

* Only customers who have been approved for modified content filtering have full content filtering control, including configuring content filters at severity level high only or turning content filters off. Apply for modified content filters via this form: Azure OpenAI Limited Access Review: Modified Content Filters and Abuse Monitoring (microsoft.com)

Content filtering configurations are created within a Resource in Azure AI Studio, and
can be associated with Deployments. Learn more about configurability here.

Scenario details
When the content filtering system detects harmful content, you'll receive either an error
on the API call if the prompt was deemed inappropriate or the finish_reason on the
response will be content_filter to signify that some of the completion was filtered.
When building your application or system, you'll want to account for these scenarios
where the content returned by the Completions API is filtered, which may result in
content that is incomplete. How you act on this information will be application specific.
The behavior can be summarized in the following points:

Prompts that are classified at a filtered category and severity level will return an
HTTP 400 error.
Non-streaming completions calls won't return any content when the content is
filtered. The finish_reason value will be set to content_filter. In rare cases with
longer responses, a partial result can be returned. In these cases, the
finish_reason will be updated.

For streaming completions calls, segments are returned to the user as they're completed. The service will continue streaming until it reaches a stop token or the length limit, or until content that is classified at a filtered category and severity level is detected (a sketch of handling this case follows below).
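
The following is a minimal sketch of consuming a streamed completion and reacting to a content_filter finish reason; the deployment name and prompt are placeholders, and it assumes the same openai Python package setup shown elsewhere in this documentation:

Python

# Minimal sketch: stream a completion and watch each chunk's finish_reason.
# The deployment name and prompt below are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

stream = openai.Completion.create(
    engine="<your-deployment-name>",
    prompt="Write a short story about a lighthouse keeper.",
    max_tokens=200,
    stream=True,
)

for chunk in stream:
    choice = chunk["choices"][0]
    print(choice["text"], end="")
    if choice["finish_reason"] == "content_filter":
        # Streaming stopped because the remaining content was classified at a filtered
        # category and severity level.
        print("\n[The rest of this response was filtered.]")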
Scenario: You send a non-streaming completions call
asking for multiple outputs; no content is classified at a
filtered category and severity level
The table below outlines the various ways content filtering can appear:

HTTP response code: 200

Response behavior: In the cases when all generation passes the filters as configured, no content moderation details are added to the response. The finish_reason for each generation will be either stop or length.

Example request payload:

JSON

{
    "prompt": "Text example",
    "n": 3,
    "stream": false
}

Example response JSON:

JSON

{
    "id": "example-id",
    "object": "text_completion",
    "created": 1653666286,
    "model": "davinci",
    "choices": [
        {
            "text": "Response generated text",
            "index": 0,
            "finish_reason": "stop",
            "logprobs": null
        }
    ]
}

Scenario: Your API call asks for multiple responses (N>1) and at least 1 of the responses is filtered

HTTP response code: 200

Response behavior: The generations that were filtered will have a finish_reason value of content_filter.

Example request payload:

JSON

{
    "prompt": "Text example",
    "n": 3,
    "stream": false
}

Example response JSON:

JSON

{
    "id": "example",
    "object": "text_completion",
    "created": 1653666831,
    "model": "ada",
    "choices": [
        {
            "text": "returned text 1",
            "index": 0,
            "finish_reason": "length",
            "logprobs": null
        },
        {
            "text": "returned text 2",
            "index": 1,
            "finish_reason": "content_filter",
            "logprobs": null
        }
    ]
}

Scenario: An inappropriate input prompt is sent to the completions API (either for streaming or non-streaming)

HTTP response code: 400

Response behavior: The API call will fail when the prompt triggers a content filter as configured. Modify the prompt and try again.

Example request payload:

JSON

{
    "prompt": "Content that triggered the filtering model"
}

Example response JSON:

JSON

{
    "error": {
        "message": "The response was filtered",
        "type": null,
        "param": "prompt",
        "code": "content_filter",
        "status": 400
    }
}

Scenario: You make a streaming completions call; no output content is classified at a filtered category and severity level

HTTP response code: 200

Response behavior: In this case, the call will stream back with the full generation and finish_reason will be either 'length' or 'stop' for each generated response.

Example request payload:

JSON

{
    "prompt": "Text example",
    "n": 3,
    "stream": true
}

Example response JSON:

JSON

{
    "id": "cmpl-example",
    "object": "text_completion",
    "created": 1653670914,
    "model": "ada",
    "choices": [
        {
            "text": "last part of generation",
            "index": 2,
            "finish_reason": "stop",
            "logprobs": null
        }
    ]
}

Scenario: You make a streaming completions call asking for multiple completions and at least a portion of the output content is filtered

HTTP response code: 200

Response behavior: For a given generation index, the last chunk of the generation will include a non-null finish_reason value. The value will be content_filter when the generation was filtered.

Example request payload:

JSON

{
    "prompt": "Text example",
    "n": 3,
    "stream": true
}

Example response JSON:

JSON

{
    "id": "cmpl-example",
    "object": "text_completion",
    "created": 1653670515,
    "model": "ada",
    "choices": [
        {
            "text": "Last part of generated text streamed back",
            "index": 2,
            "finish_reason": "content_filter",
            "logprobs": null
        }
    ]
}

Scenario: Content filtering system doesn't run on the completion

HTTP response code: 200

Response behavior: If the content filtering system is down or otherwise unable to complete the operation in time, your request will still complete without content filtering. You can determine that the filtering wasn't applied by looking for an error message in the content_filter_result object.

Example request payload:

JSON

{
    "prompt": "Text example",
    "n": 1,
    "stream": false
}

Example response JSON:

JSON

{
    "id": "cmpl-example",
    "object": "text_completion",
    "created": 1652294703,
    "model": "ada",
    "choices": [
        {
            "text": "generated text",
            "index": 0,
            "finish_reason": "length",
            "logprobs": null,
            "content_filter_result": {
                "error": {
                    "code": "content_filter_error",
                    "message": "The contents are not filtered"
                }
            }
        }
    ]
}

Annotations (preview)
When annotations are enabled as shown in the code snippet below, the following
information is returned via the API: content filtering category (hate, sexual, violence,
self-harm); within each content filtering category, the severity level (safe, low, medium or
high); filtering status (true or false).

Annotations are currently in preview for Completions and Chat Completions (GPT
models); the following code snippet shows how to use annotations in preview:

Python

# Note: The openai-python library support for Azure OpenAI is in preview.
# os.getenv() for the endpoint and key assumes that you are using environment variables.

import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-06-01-preview" # API version required to test out Annotations preview
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

response = openai.Completion.create(
    engine="text-davinci-003", # engine = "deployment_name".
    prompt="{Example prompt where a severity level of low is detected}"
    # Content that is detected at severity level medium or high is filtered,
    # while content detected at severity level low isn't filtered by the content filters.
)

print(response)

Output

JSON

{
    "choices": [
        {
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "low"
                }
            },
            "finish_reason": "length",
            "index": 0,
            "logprobs": null,
            "text": "Example model response will be returned."
        }
    ],
    "created": 1685727831,
    "id": "cmpl-7N36VZAVBMJtxycrmiHZ12aK76a6v",
    "model": "text-davinci-003",
    "object": "text_completion",
    "prompt_annotations": [
        {
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            },
            "prompt_index": 0
        }
    ],
    "usage": {
        "completion_tokens": 16,
        "prompt_tokens": 5,
        "total_tokens": 21
    }
}

The following code snippet shows how to retrieve annotations when content was
filtered:

Python

# Note: The openai-python library support for Azure OpenAI is in preview.
# os.getenv() for the endpoint and key assumes that you are using environment variables.

import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-06-01-preview" # API version required to test out Annotations preview
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

try:
    response = openai.Completion.create(
        prompt="<PROMPT>",
        engine="<MODEL_DEPLOYMENT_NAME>",
    )
    print(response)

except openai.error.InvalidRequestError as e:
    if e.error.code == "content_filter" and e.error.innererror:
        content_filter_result = e.error.innererror.content_filter_result
        # print the formatted JSON
        print(content_filter_result)

        # or access the individual categories and details
        for category, details in content_filter_result.items():
            print(f"{category}:\n filtered={details['filtered']}\n severity={details['severity']}")

For details on the inference REST API endpoints for Azure OpenAI and how to create
Chat and Completions please follow Azure OpenAI Service REST API reference guidance.
Annotations are returned for all scenarios when using 2023-06-01-preview .

Example scenario: An input prompt containing content that is classified at a filtered category and severity level is sent to the completions API
JSON

{
    "error": {
        "message": "The response was filtered due to the prompt triggering Azure Content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=21298766",
        "type": null,
        "param": "prompt",
        "code": "content_filter",
        "status": 400,
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {
                    "filtered": true,
                    "severity": "high"
                },
                "self-harm": {
                    "filtered": true,
                    "severity": "high"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": true,
                    "severity": "medium"
                }
            }
        }
    }
}
Best practices
As part of your application design, consider the following best practices to deliver a
positive experience with your application while minimizing potential harms:
Decide how you want to handle scenarios where your users send prompts
containing content that is classified at a filtered category and severity level or
otherwise misuse your application.
Check the finish_reason to see if a completion is filtered.
Check that there's no error object in the content_filter_result (indicating that content filters didn't run); a sketch of both checks follows below.
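
For example, the following is a minimal sketch of both checks on a completions response; the deployment name and prompt are placeholders, and the content_filter_result error check assumes the 2023-06-01-preview annotations behavior described above:

Python

# Minimal sketch: inspect finish_reason and look for a content_filter_result error.
# The deployment name and prompt are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-06-01-preview"  # preview version that returns annotations
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

response = openai.Completion.create(
    engine="<your-deployment-name>",
    prompt="Write a product description for a desk lamp.",
    max_tokens=100,
)

choice = response["choices"][0]
if choice["finish_reason"] == "content_filter":
    print("The completion was cut off by the content filtering system.")

# If the filtering system didn't run, an error object appears under content_filter_result.
filter_result = choice.get("content_filter_result", {})
if "error" in filter_result:
    print("Content filtering was not applied:", filter_result["error"]["message"])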

Next steps
Learn more about the underlying models that power Azure OpenAI.
Apply for modified content filters via this form .
Azure OpenAI content filtering is powered by Azure AI Content Safety .
Learn more about understanding and mitigating risks associated with your
application: Overview of Responsible AI practices for Azure OpenAI models.
Learn more about how data is processed in connection with content filtering and
abuse monitoring: Data, privacy, and security for Azure OpenAI Service.
Understanding embeddings in Azure
OpenAI Service
Article • 05/10/2023

An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.

Embedding models
Different Azure OpenAI embedding models are specifically created to be good at a
particular task. Similarity embeddings are good at capturing semantic similarity
between two or more pieces of text. Text search embeddings help measure whether
long documents are relevant to a short query. Code search embeddings are useful for
embedding code snippets and embedding natural language search queries.

Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, we can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.

Cosine similarity
Azure OpenAI embeddings rely on cosine similarity to compute similarity between
documents and a query.

From a mathematical perspective, cosine similarity measures the cosine of the angle
between two vectors projected in a multi-dimensional space. This is beneficial because if
two documents are far apart by Euclidean distance because of size, they could still have
a smaller angle between them and therefore higher cosine similarity. For more
information about cosine similarity equations, see this article on Wikipedia .
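
For example, the following is a minimal sketch that retrieves two embeddings with the openai Python package and compares them with cosine similarity using numpy; the embedding deployment name is a placeholder:

Python

# Minimal sketch: compare a document and a query by the cosine similarity of their embeddings.
# The embedding deployment name is a placeholder.
import os
import numpy as np
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

def get_embedding(text: str) -> np.ndarray:
    response = openai.Embedding.create(engine="<your-embedding-deployment>", input=text)
    return np.array(response["data"][0]["embedding"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

document = get_embedding("A neutron star is the collapsed core of a massive supergiant star.")
query = get_embedding("What is a neutron star?")
print(f"Cosine similarity: {cosine_similarity(document, query):.4f}")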

An alternative method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.

Next steps
Learn more about using Azure OpenAI and embeddings to perform document search
with our embeddings tutorial.
Introduction to red teaming large
language models (LLMs)
Article • 05/19/2023

The term red teaming has historically described systematic adversarial attacks for testing
security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional
cybersecurity and evolved in common usage to describe many kinds of probing, testing,
and attacking of AI systems. With LLMs, both benign and adversarial usage can produce
potentially harmful outputs, which can take many forms, including harmful content such
as hate speech, incitement or glorification of violence, or sexual content.

Red teaming is an essential practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.

Microsoft has conducted red teaming exercises and implemented safety systems
(including content filters and other mitigation strategies) for its Azure OpenAI Service
models (see this Responsible AI Overview). However, the context of your LLM
application will be unique and you also should conduct red teaming to:

Test the LLM base model and determine whether there are gaps in the existing
safety systems, given the context of your application system.
Identify and mitigate shortcomings in the existing default filters or mitigation
strategies.
Provide feedback on failures so we can make improvements.

Here is how you can get started in your process of red teaming LLMs. Advance planning
is critical to a productive red teaming exercise.

Getting started

Managing your red team


Assemble a diverse group of red teamers.

LLM red teamers should be a mix of people with diverse social and professional
backgrounds, demographic groups, and interdisciplinary expertise that fits the
deployment context of your AI system. For example, if you’re designing a chatbot to
help health care providers, medical experts can help identify risks in that domain.
Recruit red teamers with both benign and adversarial mindsets.

Having red teamers with an adversarial mindset and security-testing experience is essential for understanding security risks, but red teamers who are ordinary users of your application system and haven’t been involved in its development can bring valuable perspectives on harms that regular users might encounter.

Remember that handling potentially harmful content can be mentally taxing.

You will need to take care of your red teamers, not only by limiting the amount of time
they spend on an assignment, but also by letting them know they can opt out at any
time. Also, avoid burnout by switching red teamers’ assignments to different focus
areas.

Planning your red teaming

Where to test
Because a system is developed using an LLM base model, you may need to test at several
different layers:

The LLM base model with its safety system in place to identify any gaps that may
need to be addressed in the context of your application system. (Testing is usually
through an API endpoint.)
Your application system. (Testing is usually through a UI.)
Both the LLM base model and your application system before and after mitigations
are in place.

How to test
Consider conducting iterative red teaming in at least two phases:

1. Open-ended red teaming, where red teamers are encouraged to discover a variety
of harms. This can help you develop a taxonomy of harms to guide further testing.
Note that developing a taxonomy of undesired LLM outputs for your application
system is crucial to being able to measure the success of specific mitigation efforts.
2. Guided red teaming, where red teamers are assigned to focus on specific harms
listed in the taxonomy while staying alert for any new harms that may emerge. Red
teamers can also be instructed to focus testing on specific features of a system for
surfacing potential harms.

Be sure to:
Provide your red teamers with clear instructions for what harms or system features
they will be testing.
Give your red teamers a place for recording their findings. For example, this could
be a simple spreadsheet specifying the types of data that red teamers should
provide, including basics such as:
The type of harm that was surfaced.
The input prompt that triggered the output.
An excerpt from the problematic output.
Comments about why the red teamer considered the output problematic.
Maximize the effort of responsible AI red teamers who have expertise for testing
specific types of harms or undesired outputs. For example, have security subject
matter experts focus on jailbreaks, metaprompt extraction, and content related to
aiding cyberattacks.

Reporting red teaming findings


You will want to summarize and report red teaming top findings at regular intervals to
key stakeholders, including teams involved in the measurement and mitigation of LLM
failures so that the findings can inform critical decision making and prioritizations.

Next steps
Learn about other mitigation strategies like prompt engineering
Introduction to prompt engineering
Article • 03/21/2023

GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based
models, the user interacts with the model by entering a text prompt, to which the model
responds with a text completion. This completion is the model’s continuation of the
input text.

While these models are extremely powerful, their behavior is also very sensitive to the
prompt. This makes prompt construction an important skill to develop.

Prompt construction can be difficult. In practice, the prompt acts to configure the model
weights to complete the desired task, but it's more of an art than a science, often
requiring experience and intuition to craft a successful prompt. The goal of this article is
to help get you started with this learning process. It attempts to capture general
concepts and patterns that apply to all GPT models. However it's important to
understand that each model behaves differently, so the learnings may not apply equally
to all models.

Basics
This section covers the basic concepts and elements of GPT prompts.

Text prompts are how users interact with GPT models. As with all generative language
models, GPT models attempt to produce the next series of words that are most likely to
follow from the previous text. It's as if we're saying "What is the first thing that comes to your mind when I say <prompt>?". The examples below demonstrate this behavior. Given
the first words of famous content, the model is able to accurately continue the text.

Prompt: Four score and seven years ago our
Completion: fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. […]

Prompt: "Call me Ishmael."
Completion: "Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world." […]

As you develop more complex prompts, it's helpful to keep this fundamental behavior in
mind. Regardless of the prompt that is provided, the model is simply responding with
what it determines is most likely (given its training data and training targets). If you ask
a question in your prompt, the model isn't following a separate “Q&A” code path, but
rather it appears to answer the question because an answer is the most likely sort of
response for the given question as input.

Prompt components
While the Completion API has no differentiation between different parts of the prompt, it can still be useful for learning and discussion to identify underlying prompt components.
prompt that are sent to the API in the form of an array of dictionaries with associated
roles: system, user, and assistant. This guidance focuses more generally on how to think
about prompt construction rather than providing prescriptive guidance that is specific to
one API over another.

It's also important to understand that while there could be other valid ways to dissect
prompts, the goal of this breakdown is to provide a relatively simple way to think about
prompt construction. With the Completion API, all the components are optional, but at
least one must be present and most prompts include more than one component. There
can be some grey area between components as well. The order presented below
roughly corresponds to how commonly each component is used, from most to least.

Instructions
Instructions are likely the most commonly used prompt component. Instructions are
simply that: instructions to the model on what to do. While simple in concept, they can
be complex in practice. The following table illustrates this by showing both simple and
more complex instructions, in two different forms.

Instruction Type: Simple instructions
Prompt: Write an introduction for a weekly newsletter.
Completion: Welcome to our weekly newsletter. This week we have a special focus on productivity. We'll be sharing tips and tricks on how to be productive in all areas of your life. We hope you enjoy and please share with your friends!

Instruction Type: Complex instructions
Prompt: Write an introduction for a weekly newsletter to my company named Contoso. It should mention the great all hands meeting we had last week and thank the team for all their hard work over the tough past few months.
Completion: Welcome to Contoso's weekly newsletter! We had a great all hands meeting last week and I want to thank the team for all their hard work over the tough past few months. We're looking forward to a bright future and I'm confident that with our team's dedication and commitment, we will achieve great things. Thank you for being a part of our team and I look forward to working with you all in the coming weeks and months. Sincerely, [Your name].

Instruction Type: Complex instructions
Prompt: Write an introduction for a weekly newsletter, including the following:
- My company is named contoso
- An exuberant greeting
- Thanking the team for their hard work over a couple tough months
- Positive outlook for the coming quarter
- Signed by the SLT
Completion: Hello Contoso team! I wanted to take a moment to thank you for all your hard work over the past few months. It's been a tough couple of months, but you've all pulled together and done an amazing job. I'm really proud of you all. Looking ahead to the next quarter, I'm feeling really positive. I think we're going to achieve some great things. Thanks again, The Senior Leadership Team.

Primary Content
Primary content refers to some sort of text that is being processed or transformed by
the model. Primary content is typically used in conjunction with instructions. A simple
example would be language translation. In the example below, the English text would be
considered the primary content while “Translate to French:” is an instruction.

Prompt: Can you please tell me how to get to the museum?
Translate to French:
Completion: Pouvez-vous s'il vous plaît me dire comment aller au musée?

Primary content can also be much longer. In the following example, the primary content
is the introduction section from the Wikipedia entry on time complexity , which is
almost 300 words long. This content has been abbreviated in our table for ease of
display.

Prompt: In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by [...]
Summarize the previous text in a few sentences:
Completion: Time complexity is a measure of how long an algorithm takes to run. It is typically expressed as a function of the size of the input. The most common measure is worst-case time complexity, which is the maximum amount of time required for inputs of a given size.

Prompt: In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by [...]
TL;DR (Too long didn't read)
Completion: The time complexity of an algorithm is the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. The time complexity is generally expressed as a function of the size of the input.

GPT models can also handle primary content that is structured. In the example below, a
TSV about beers in Belgium (abbreviated for display in this article) is passed to the
model as part of the prompt. It is capable of properly interpreting the content and
answering questions about the data.

Prompt: Belgian Beer Brands
Beer name Beer style color ABV Brewery
"Chimay Gold: ""Doree""" trappist pale ale gold 4.80% Chimay Brewery (official trappist)
"Chimay Blue: ""Grande Reserve""" trappist dark ale dark 9.00%
[…]
Referencing the table above, how many beers are less than 6% ABV?
Completion: There are 10 beers in the table that are less than 6% ABV.

Examples
Successful prompts often rely on the practice of “one-shot” or “few-shot” learning. This
refers to the inclusion of one or more examples of the desired behavior of the model,
typically by including input and output pairs. This is not learning in the sense that the
model is permanently changed, but rather that the examples better condition the model
to respond as desired for only the current inference. The use of prompts with no
examples is sometimes referred to as “zero-shot” learning. Note that with the Chat
Completion API few-shot learning examples are typically added to the messages array in
the form of example user/assistant interactions after the initial system message.

Learning Type: Zero-shot
Prompt: Headline: Coach confident injury won't derail Warriors
Topic:
Completion: The coach is confident that the injury won't derail the Warriors' season. The team is still focused on their goals and that they will continue to work hard to achieve them.

Learning Type: Few-shot
Prompt: Headline: Twins' Correa to use opt-out, test free agency
Topic: Baseball
Headline: Qatar World Cup to have zones for sobering up
Topic: Soccer
Headline: Yates: Fantasy football intel for Week 6
Topic: Football
Headline: Coach confident injury won't derail Warriors
Topic:
Completion: Basketball

The example above illustrates the utility of few-shot learning. Without the examples, the
model seems to be guessing at the desired behavior, while the examples cleanly show
the model how to operate. This also demonstrates the power of the model: it can infer the category of label that is wanted, even without a ‘basketball’ label in the examples.
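
With the Chat Completion API, the same headline classifier can be expressed as example user/assistant turns that follow the system message; the following is a minimal sketch, with the deployment name as a placeholder:

Python

# Minimal sketch: few-shot examples expressed as user/assistant turns for the Chat Completion API.
# The deployment name is a placeholder.
import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

response = openai.ChatCompletion.create(
    engine="<your-chat-deployment-name>",
    messages=[
        {"role": "system", "content": "You classify news headlines into a single sport topic."},
        # Few-shot examples: prior turns that condition the model for the current request.
        {"role": "user", "content": "Headline: Twins' Correa to use opt-out, test free agency"},
        {"role": "assistant", "content": "Baseball"},
        {"role": "user", "content": "Headline: Qatar World Cup to have zones for sobering up"},
        {"role": "assistant", "content": "Soccer"},
        # The new headline to classify.
        {"role": "user", "content": "Headline: Coach confident injury won't derail Warriors"},
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])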

Cue
Cues act as the “jumpstart” for the output of the model, helping to direct the model to
the desired output. It is often a prefix that the model can build onto, directing the output to include the desired key elements. Cues are often used along with instructions, but not always. In the examples below, the model is being used to summarize a long announcement email. A cue is used in one case to help focus the output of the model, and in the other to suggest a specific output format (bullet points). In the context of the Chat Completion API, instructions take the form of the system message, and few-shot examples are indicated by providing a series of example user/assistant interactions to help prime the model.

Number Prompt Completion


of Cues

0 Performance The purpose of this email is to inform employees of recent changes


reviews are a to the performance review experience, which is part of Contoso's
core Performance & Development approach. The changes include an
component of improved structure, upgraded in-tool user experience, and focused
our prompts. Additionally, there is now an optional Conversation
Performance & Starters section for employees to flag topics they'd like to discuss
Development with their managers.
approach at
Contoso. […]

Summarize the
above email
message:

1 Performance that Contoso is improving the performance review experience by


reviews are a making it more focused on core priorities, expectations and impact.
core Additionally, they are upgrading the in-tool user experience with
component of tips, resources and guidance. Finally, they are adding guidance on
our how to activate company-wide core priorities.
Performance &
Development
approach at
Contoso. […]

Summarize the
above email
message:

The key
takeaway of the
message is

2 Performance Contoso is introducing changes to the performance review


reviews are a experience, based on feedback from employees.

core • The changes include a new focus on core priorities, expectations,


component of and impact; an upgraded user experience; and new guidance on
our company-wide core priorities.

Performance & • Performance reviews are a time to reflect on past performance and
Development learning as well as to plan for future priorities and growth
approach at opportunities.

Contoso. […]
• Employees and managers should write comments in advance of
Summarize the the performance review meeting.

above email • There should be at least two performance reviews per year.

message:
• The time investment for each review should remain about the
Key Points:
same.

Supporting content
Supporting content is information that the model can use to influence the output in some way. It differs from primary content in that it is not the main target of the task, but it is typically used along with primary content. Common examples include contextual information such as the current date, the name of the user, user preferences, and so on. The example below uses supporting content to help organize a set of planned workshops for the user. Without the supporting content (the important topics), the model simply lists off the workshops (truncated for display); when informed of the important topics, the model is able to accurately group the meetings.

Supporting Prompt Completion


Content

Without Workshop list:


Workshop
Title Description
list:

Planning Workshop Kickoff Please join us to kick off our workshop! Planning
Hear from our leaders about cross group priorities and learnings for Workshop
the year ahead. The invitation to this event has already been sent to Kickoff

everyone in […]
-Planning
Summarize the workshops above, grouped by topic: Workshop
Wrap-up

- Core
MeetUp

- Customer
Driven
Perspectives

-Impact and
Key Focus
Areas

-Best
practices

-
Accelerating
app
adoption
[…]

With Workshop list:


1. Prompt
Title Description
Engineering:
Planning Workshop Kickoff Please join us to kick off our workshop! - New
Hear from our leaders about cross group priorities and learnings for prompt
the year ahead. The invitation to this event has already been sent to engineering
everyone in […]
tips

My Important Topics: prompt engineering, search, GPT models


- The art of
Summarize the workshops above, grouped by my important topics: prompt
engineering

2. Search:

-Intro to
vector
search with
embedding

3. GPT
Models:

- Intro to
GPT-4

- GPT-35-
Turbo in-
depth.
Best practices
Be Specific. Leave as little to interpretation as possible. Restrict the operational
space.
Be Descriptive. Use analogies.
Double Down. Sometimes you may need to repeat yourself to the model. Give
instructions before and after your primary content, use an instruction and a cue,
etc.
Order Matters. The order in which you present information to the model may
impact the output. Whether you put instructions before your content (“summarize
the following…”) or after (“summarize the above…”) can make a difference in
output. Even the order of few-shot examples can matter. This is referred to as
recency bias.
Give the model an “out”. It can sometimes be helpful to give the model an
alternative path if it is unable to complete the assigned task. For example, when
asking a question over a piece of text you might include something like "respond
with ‘not found’ if the answer is not present." This can help the model avoid
generating false responses.

Space efficiency
While the input size increases with each new generation of GPT models, there will
continue to be scenarios that provide more data than the model can handle. GPT
models break words into “tokens”. While common multi-syllable words are often a
single token, less common words are broken into syllables. Tokens can sometimes be
counter-intuitive, as shown by the example below which demonstrates token boundaries
for different date formats. In this case, spelling out the entire month is more space
efficient than a fully numeric date. The current range of token support goes from 2000
tokens with earlier GPT-3 models to up to 32,768 tokens with the 32k version of the
latest GPT-4 model.
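
If you want to see how a given string will be tokenized before sending it, the open-source tiktoken package can count tokens for you. The sketch below is illustrative only: it assumes the cl100k_base encoding used by the newer chat models (older models use different encodings), and the sample strings are examples chosen here rather than taken from this article.

Python

# Illustrative sketch using the open-source tiktoken package (an assumption; it is
# not part of this article). "cl100k_base" is the encoding used by the newer chat
# models; older models use different encodings, so counts will vary.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

# Compare date formats, and single versus consecutive spaces.
for text in ["May 4th, 2023", "5/4/2023", "hello world", "hello   world"]:
    tokens = encoding.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {tokens}")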

Given this limited space, it is important to use it as efficiently as possible.

Tables – As shown in the examples in the previous section, GPT models can
understand tabular formatted data quite easily. This can be a space efficient way to
include data, rather than preceding every field with its name (as with JSON).
White Space – Consecutive whitespaces are treated as separate tokens which can
be an easy way to waste space. Spaces preceding a word, on the other hand, are
typically treated as part of the same token as the word. Carefully watch your usage
of whitespace and don’t use punctuation when a space alone will do.

Next steps
Learn more about Azure OpenAI
Prompt engineering techniques
Article • 04/23/2023

This guide will walk you through some advanced techniques in prompt design and
prompt engineering. If you're new to prompt engineering, we recommend starting with
our introduction to prompt engineering guide.

While the principles of prompt engineering can be generalized across many different
model types, certain models expect a specialized prompt structure. For Azure OpenAI
GPT models, there are currently two distinct APIs where prompt engineering comes into
play:

Chat Completion API.


Completion API.

Each API requires input data to be formatted differently, which in turn impacts overall
prompt design. The Chat Completion API supports the GPT-35-Turbo and GPT-4
models. These models are designed to take input formatted in a specific chat-like
transcript stored inside an array of dictionaries.

The Completion API supports the older GPT-3 models and has much more flexible input
requirements in that it takes a string of text with no specific format rules. Technically the
GPT-35-Turbo models can be used with either API, but we strongly recommend using
the Chat Completion API for these models. To learn more, please consult our in-depth
guide on using these APIs.

The techniques in this guide will teach you strategies for increasing the accuracy and
grounding of responses you generate with a Large Language Model (LLM). It is,
however, important to remember that even when using prompt engineering effectively
you still need to validate the responses the models generate. Just because a carefully
crafted prompt worked well for a particular scenario doesn't necessarily mean it will generalize more broadly to other use cases. Understanding the limitations of LLMs is
just as important as understanding how to leverage their strengths.

This guide does not go in-depth into the mechanics behind the message structure for
Chat Completions. If you aren't familiar with interacting with the ChatGPT and GPT-4
models programmatically we recommend reading our how-to guide on the Chat
Completion API first.

Note
All of the examples in this section of the guide were tested against a base GPT-4
model in English. If you are reading a localized version of this article in another
language, these responses represent a localized translation of the English results. To
learn more about potential limitations depending on what language you are using
to prompt a model, please consult our Responsible AI transparency note.

System message
The system message is included at the beginning of the prompt and is used to prime
the model with context, instructions, or other information relevant to your use case. You
can use the system message to describe the assistant’s personality, define what the
model should and shouldn’t answer, and define the format of model responses.

The example below shows a sample system message and the resulting model response:

System message:
You are an AI assistant that helps people find information and responds in rhyme. If the user asks you a question you don't know the answer to, say so.

User:
What can you tell about me, John Doe?

Assistant:
Dear John, I'm sorry to say,
But I don't have info on you today.
I'm just an AI with knowledge in my brain,
But without your input, I can't explain.
So please tell me more about what you seek,
And I'll do my best to give you an answer unique.

Some other examples of system messages are:

“Assistant is a large language model trained by OpenAI.”


“Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say "I don't know".”
“Assistant is an intelligent chatbot designed to help users answer their tax related
questions.”
“You are an assistant designed to extract entities from text. Users will paste in a
string of text and you will respond with entities you've extracted from the text as a
JSON object. Here's an example of your output format:

JSON

"name": "",

"company": "",

"phone_number": ""

An important detail to understand is that even if you instruct a model in the system
message to answer I don't know when unsure of an answer this does not guarantee that
the request will be honored. A well designed system message can increase the
likelihood of a certain outcome, but it is still possible that an incorrect response could
be generated that contradicts the intent of the instruction in the system message.

Few-shot learning
A common way to adapt language models to new tasks is to use few-shot learning. In
few-shot learning, a set of training examples is provided as part of the prompt to give
additional context to the model.

When using the Chat Completions API, a series of messages between the User and
Assistant (written in the new prompt format), can serve as examples for few-shot
learning. These examples can be used to prime the model to respond in a certain way,
emulate particular behaviors, and seed answers to common questions.

System message:
Assistant is an intelligent chatbot designed to help users answer their tax related questions.
Instructions:
- Only answer questions related to taxes.
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information.

Few-shot examples:
User - "When do I need to file my taxes by?"
Assistant - "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file "

User - "How can I check the status of my tax refund?"
Assistant - "You can check the status of your tax refund by visiting https://www.irs.gov/refunds "
The table above describes the basic prompt structure, but for a detailed explanation of
the exact prompt format you will need to refer to the Chat Completion guide.
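To make this concrete, the sketch below shows one way the tax-assistant example above could be expressed as a messages array with the Python SDK. It is illustrative only: the Azure configuration mirrors the quickstarts, the deployment name is a placeholder, and the final user question is an invented example.

Python

# Illustrative sketch only: assumes the openai Python package (v0.x) and an Azure
# OpenAI resource; the endpoint, key, and "gpt-35-turbo" deployment name are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # placeholder deployment name
    messages=[
        # System message: profile and instructions
        {"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions. Only answer questions related to taxes. If you're unsure of an answer, you can say \"I don't know\" or \"I'm not sure\" and recommend users go to the IRS website for more information."},
        # Few-shot examples expressed as prior user/assistant turns
        {"role": "user", "content": "When do I need to file my taxes by?"},
        {"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th. For more details, see https://www.irs.gov/filing/individuals/when-to-file"},
        {"role": "user", "content": "How can I check the status of my tax refund?"},
        {"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"},
        # The new question the model should actually answer (placeholder)
        {"role": "user", "content": "Are there penalties for filing my taxes late?"},
    ]
)

print(response["choices"][0]["message"]["content"])
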
Non chat scenarios
While the Chat Completion API is optimized to work with multi-turn conversations, it
also can be used for non chat scenarios. For example, for a sentiment analysis scenario,
you might use the following prompt:

System User Assistant


message

You are an hi thank you for calling contoso pharmacy who am i Sentiment rating: 8

assistant speaking with today i understand your frustration mary um The speaker in this
designed to can you tell me what exactly you're trying to accomplish text seems to be
analyze OK uh certainly happy to check that for you one moment polite, helpful, and
sentiment please i see here that you were on a generic form of lipitor patient while
from speech is that right OK uh so i see that your doctor stopped addressing the
data. Users prescribing these drugs in twenty twenty one actually OK concerns of the
will paste in uh yeah i'd be happy to check that for you uh because customer. They take
a string of sometimes there's a gap in the system and it just doesn't the time to check
text and you get reported so let me take a look here just a moment so the information,
will respond i'm seeing here that your doctor had these drugs provide clear
with an prescribed to you from twenty twelve through twenty explanations, and
assessment twenty one OK um yeah according to the latest records uh offer assistance.
of the provided by doctor's office you're now on a different drug However, the rating
response of metformin would you like us to go ahead and fill that is not a 10 because
the speaker. prescription for you for pick up a bit of form in five there is still a hint of
Rate on a hundred milligrams uh you're very welcome mary please let uncertainty in some
scale of 1-10 us know if there's anything else we can do for you today. parts of the
(10 being conversation, which
highest). may not convey
Explain why complete
this rating confidence to the
was given. customer.

Start with clear instructions


The sequence in which information appears in the prompt matters. This is because GPT
style models are built in a certain way that defines how they process the input. Our
research suggests that telling the model the task you want it to do at the beginning of
the prompt, before sharing additional contextual information or examples, can help
produce higher-quality outputs.

Note

Although following this technique is still generally recommended, in contrast to


previous model versions (GPT-3 and prior), our testing showed that the model
response with ChatGPT and GPT-4 models was the same regardless of whether the
technique is utilized. In the case of the below example, we see that adding the
statement “Several sources … eruption” at the beginning or the end of the prompt
did not cause any change in the final model response.

System User Assistant


message

You are an Your task is to verify if the No, the statement is not directly implied or
AI assistant statement "Several sources stated by the snippets. The snippets mention a
that helps mention a chance of another chance of a "megaquake" and a "magnitude 9
people find large eruption" is supported by a Cascadia earthquake" hitting Seattle in the
information. specific quote from the following next 50 years, but do not mention a chance of
set of snippets.
another large eruption.
---

SNIPPETS

[1] 14 percent chance of


megaquake hitting Seattle,
experts say

SEATTLE - There's a 14 percent


chance of a magnitude 9
Cascadia earthquake hitting
Seattle in the next 50 years, the
U.S. Geological Survey estimates.
"Unfortunately, we are unable
to...

[2] Earthquake experts lay out


latest outlook for Seattle's 'Really
Big One’

“We say that there's


approximately a 14% chance of
another approximately
magnitude-9 earthquake
occurring in the next 50 years,”
said a geophysicist at the
University of Washington...

---

Is the statement "Several sources


mention a chance of another
large eruption" directly implied or
stated by the snippets?

Repeat instructions at the end


Models can be susceptible to recency bias, which in this context means that information
at the end of the prompt might have more significant influence over the output than
information at the beginning of the prompt. Therefore, it is worth experimenting with
repeating the instructions at the end of the prompt and evaluating the impact on the
generated response.

Prime the output


This refers to including a few words or phrases at the end of the prompt to obtain a
model response that follows the desired form. For example, using a cue such as “Here’s
a bulleted list of key points:\n- ” can help make sure the output is formatted as a
list of bullet points.

System message:
You are an AI assistant that helps people find information. Answer in as few words as possible.

User:
John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this? ## One possible search query is:

Assistant:
"John Smith married Lucy Smith five kids software engineer Microsoft"

In the above prompt, the text One possible search query is: primes the model to produce
a single output. Without this cue the model produces several search queries as output.

Add clear syntax


Using clear syntax for your prompt—including punctuation, headings, and section
markers—helps communicate intent and often makes outputs easier to parse.

In the example below, separators ( --- in this case) have been added between different
sources of information or steps. This allows the use of --- as a stopping condition for
generation. In addition, section headings or special variables are presented in uppercase
to differentiate them.

System User Assistant


message

You are an AI You will read a paragraph, and then issue queries 1. "John Smith Microsoft"
assistant that to a search engine in order to fact-check it. Also - To check if John Smith
helps people explain the queries.
is indeed employed at
find information. ---
Microsoft.

PARAGRAPH
2. "John Smith Lucy
John Smith is married to Lucy Smith. They have Smith" - To check if John
five kids, and he works as a software engineer at Smith is married to Lucy
Microsoft. What search queries should I do to Smith.

fact-check this?
3. "John Smith children" -
---
To check if John Smith
QUERIES has five children.

If you're not sure what syntax to use, consider using Markdown or XML. The models
have been trained on a lot of web content in XML and Markdown which may provide
better results.

Break the task down


Large language models (LLMs) often perform better if the task is broken down into
smaller steps. For example, in the search query prompt referenced earlier, the prompt
can be restructured so that the model is first instructed to extract relevant facts, and
then instructed to generate search queries that can be used to verify those facts.

System message User Assistant



You are an AI You will read a paragraph, and then issue queries to a 1. John Smith is
assistant that helps search engine in order to fact-check it.
married to Lucy
people find ---
Smith

information. PARAGRAPH
2. They have five
John Smith is married to Lucy Smith. They have five kids, kids

and he works as a software engineer at Microsoft. What 3. He works as a


search queries should I do to fact-check this?
software
---
engineer at
Now you will extract factual claims, and issue queries to Microsoft

fact-check them. When issuing a query, use the function


SEARCH("query")
QUERIES

SEARCH("John
FACTUAL CLAIMS Smith married
Lucy Smith")

SEARCH("John
Smith family
size")

SEARCH("John
Smith Microsoft
software
engineer")

Notice the use of clear syntax to differentiate the sections and prime the output. In this
simple example, breaking the task down from one to two steps is not very dramatic, but
when trying to do this for a larger piece of text with many factual claims, breaking the
task down can make a significant difference.
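
A rough sketch of this two-step approach with the Chat Completion API is shown below. It is illustrative only: it assumes openai has already been configured for Azure as shown earlier, and "gpt-4" is a placeholder deployment name.

Python

# Illustrative sketch: assumes openai is already configured for Azure (api_type,
# api_base, api_version, api_key) as shown earlier; "gpt-4" is a placeholder deployment.
import openai

paragraph = ("John Smith is married to Lucy Smith. They have five kids, and he works "
             "as a software engineer at Microsoft.")

system = {"role": "system", "content": "You are an AI assistant that helps people find information."}

# Step 1: extract the factual claims.
step_1 = openai.ChatCompletion.create(
    engine="gpt-4",
    messages=[
        system,
        {"role": "user", "content": f"You will read a paragraph, and then extract factual claims from it, one per line.\n---\nPARAGRAPH\n{paragraph}\n---\nFACTUAL CLAIMS"},
    ]
)
claims = step_1["choices"][0]["message"]["content"]

# Step 2: turn the extracted claims into search queries.
step_2 = openai.ChatCompletion.create(
    engine="gpt-4",
    messages=[
        system,
        {"role": "user", "content": f"Now you will issue queries to fact-check each claim below. When issuing a query, use the function SEARCH(\"query\").\n---\nFACTUAL CLAIMS\n{claims}\n---\nQUERIES"},
    ]
)
print(step_2["choices"][0]["message"]["content"])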

Use of affordances
Sometimes we can get the model to use an affordance instead of relying on its own
parameters for information and answers. Search, for example, can be an affordance to
help mitigate against fabricated answers, and to get up-to-date information.

One simple way to use an affordance is to stop generation once the affordance calls are
generated by the model, then paste the outcomes back into the prompt. Below is an
example of what the follow-up call could be after doing the SEARCH calls above. Notice
how we pasted the search results into the prompt and replaced the previous SEARCH
calls.

System User Assistant


message

You are an You will read a paragraph, - John Smith is married to Lucy Smith: Confirmed.
AI assistant extract factual claims, and Snippet [1] and [2] both mention John Smith's wife

that helps then use search engine as Lucy.

people find results to fact-check them

information. ---
- John and Lucy have five kids: Confirmed. Snippet
PARAGRAPH
[3] mentions that John was accompanied by his
John Smith is married to Lucy two daughters and three sons.

Smith. They have five kids,


and he works as a software - John works as a software engineer at Microsoft:
engineer at Microsoft. What Partially confirmed. Snippet [4] mentions that John
search queries should I do to spent 10 years at Microsoft, but it does not
fact-check this?
explicitly state that he currently works there.
---
Further research may be needed to confirm his
FACTUAL CLAIMS
current employment status.
- John Smith is married to
Lucy Smith

- John and Lucy have five


kids

- John works as a software


engineer at Microsoft

---

Here are various search


queries issued to research
the claims above:

SEARCH QUERIES

- John Smith married to Lucy


Smith

- John Smith number of


children

- John Smith software


engineer Microsoft

---

Here are snippets of the


search results:

SNIPPETS:

[1] … John Smith’s wedding


was on September 25, 2012

[2] … John Smith was


accompanied by his wife
Lucy to a party

[3]John was accompanied to


the soccer game by his two
daughters and three sons

[4] … After spending 10 years


at Microsoft, Smith founded
his own startup, Tailspin Toys

[5] John M is the town smith,


and he married Fiona. They

have a daughter named Lucy

---

Given the snippets, fact check


each of the factual claims
above:
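
The example above shows the prompt text; a rough sketch of the surrounding loop is below. It is illustrative only: openai is assumed to be configured for Azure as shown earlier, "gpt-4" is a placeholder deployment name, and run_search is a placeholder for whatever search affordance you use.

Python

# Illustrative sketch of the affordance loop: generate SEARCH("...") calls, run the
# searches yourself, then paste the snippets back into a follow-up prompt.
# Assumptions: openai configured for Azure as shown earlier; "gpt-4" is a placeholder
# deployment name; run_search is a placeholder for your own search affordance.
import re
import openai

def run_search(query: str) -> str:
    """Placeholder: call your search engine or retrieval system here."""
    raise NotImplementedError

def ask(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        engine="gpt-4",
        messages=[
            {"role": "system", "content": "You are an AI assistant that helps people find information."},
            {"role": "user", "content": prompt},
        ]
    )
    return response["choices"][0]["message"]["content"]

paragraph = ("John Smith is married to Lucy Smith. They have five kids, and he works "
             "as a software engineer at Microsoft.")

# First call: the model proposes SEARCH("...") queries.
queries_text = ask("You will read a paragraph, extract factual claims, and issue queries to "
                   "fact-check them. When issuing a query, use the function SEARCH(\"query\").\n"
                   f"---\nPARAGRAPH\n{paragraph}\n---\nQUERIES")

# Run the affordance calls ourselves and collect the result snippets.
queries = re.findall(r'SEARCH\("([^"]+)"\)', queries_text)
snippets = "\n".join(f"[{i + 1}] {run_search(q)}" for i, q in enumerate(queries))

# Second call: paste the results back in and ask for the fact check.
print(ask("You will read a paragraph and fact check its claims using search snippets.\n"
          f"---\nPARAGRAPH\n{paragraph}\n---\nSNIPPETS\n{snippets}\n---\n"
          "Given the snippets, fact check each of the factual claims in the paragraph:"))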

Chain of thought prompting


This is a variation on the break the task down technique. Instead of splitting a task into smaller steps, in this approach the model is instructed to respond step-by-step and present all the steps involved. Doing so reduces the possibility of inaccurate outcomes and makes assessing the model response easier.

System User Assistant


message

You are an Who was the most decorated Step 1: Determine the year of the Sydney
AI assistant (maximum medals) individual athlete Olympic games. The Sydney Olympic
that helps in the Olympic games that were held games were held in the year 2000.

people find at Sydney? Take a step-by-step


information. approach in your response, cite Step 2: Find a list of individual athletes
sources and give reasoning before who won multiple medals in the 2000
sharing final answer in the below Sydney Olympics.

format: ANSWER is: <name>


According to the official website of the
Olympic Games, there were several
individual athletes who won multiple
medals in the 2000 Sydney Olympics.
Some of the notable names include:

• Ian Thorpe (Australia) - 3 gold, 2 silver,


0 bronze

• Pieter van den Hoogenband


(Netherlands) - 2 gold, 0 silver, 2 bronze

• Lenny Krayzelburg (USA) - 3 gold, 0


silver, 0 bronze

• Inge de Bruijn (Netherlands) - 3 gold, 1


silver, 0 bronze

Step 3: Determine the athlete with the


most medals. After reviewing the list of
individual athletes who won multiple
medals in the 2000 Sydney Olympics, we
can see that the athlete with the most
medals is Ian Thorpe from Australia, who
won a total of 5 medals (3 gold and 2
silver).

Therefore, the ANSWER is: Ian Thorpe.

Specifying the output structure


Using your prompt to specify the structure of the output can have a significant impact on the nature and quality of the results. Sometimes system message inputs such as “only write true facts” or “do not fabricate information” might not be adequate mitigations. Instead, asking for the model response to also include citations can help reduce the prevalence of incorrect responses.

If you instruct the model to cite the source material when it makes statements, those
statements are much more likely to be grounded. Asking for citations makes it so that
the model must make two errors every time it generates a response: the first error is the
fabricated response, and the second is the bad citation. Note that the closer the citation
is to the text it supports, the shorter the distance ahead the model needs to anticipate
the citation, which suggests that inline citations are better mitigations for false content
generation than citations at the end of the content.

Similarly, if you ask the model to extract factual statements from a paragraph, it may
extract compound statements such as ‘X is doing Y AND Z’ (which may be harder to
verify). This can be avoided by specifying an output structure such as (entity1,
relationship, entity2).

The examples below show the use of citations and directing the model response to fit a
defined structure.

System User Assistant


message

You are an List down the different behaviors exhibited by the Customer 1. Polite Greeting
AI assistant Support agent in the following conversation.
BEHAVIOR("Agent
that helps ---
starts the
people find hi thank you for calling contoso pharmacy who am i speaking conversation with
information with today i understand your frustration mary um can you tell a polite greeting
me what exactly you're trying to accomplish OK uh certainly and asks for the
happy to check that for you one moment please i see here that caller's name")

you were on a generic form of lipitor is that right OK uh so i 2. Empathetic


see that your doctor stopped prescribing these drugs in BEHAVIOR("Agent
twenty twenty one actually OK uh yeah i'd be happy to check acknowledges
that for you uh because sometimes there's a gap in the system Mary's frustration
and it just doesn't get reported so let me take a look here just and tries to
a moment so i'm seeing here that your doctor had these drugs understand her
prescribed to you from twenty twelve through twenty twenty concern")

one OK um yeah according to the latest records uh provided 3. Attentive


by doctor's office you're now on a different drug metformin BEHAVIOR("Agent
would you like us to go ahead and fill that prescription for you listens to the
for pick up a bit of form in five hundred milligrams uh you're customer's
very welcome mary please let us know if there's anything else problem and asks
we can do for you today.
for more details")

---
4. Proactive
How was the response of the Customer Support agent?
BEHAVIOR("Agent
Classify and list down their various behaviors and cite the offers to check
reason behind the classification, using the format the issue for the
BEHAVIOR("reason") customer")

5. Informative
BEHAVIOR("Agent
provides the
relevant
information
about the

customer's
prescription
history")

6. Thorough
BEHAVIOR("Agent
checks for any
possible gaps in
the system and
provides a
complete
response")

7. Confirmatory
BEHAVIOR("Agent
confirms the
current
medication with
the customer")

8. Solution-
oriented
BEHAVIOR("Agent
offers to fill the
new prescription
for the
customer")

9. Courteous
BEHAVIOR("Agent
thanks the
customer and
offers further
assistance if
needed")

System User Assistant


message

You are an AI Cluster the following news headlines into


assistant that topic categories based on patterns seen {

helps people within the text. Also mention reasoning "ARTIFICIAL_INTELLIGENCE": {

find behind how these categories were defined.


"HEADLINES": [

information. Output format:


"From books to presentations in
{
10s with AR + ML",

"TOPIC_NAME": "",
"Demo from 1993 of 32-year-old
"HEADLINES": [],
Yann LeCun showing off the
"REASONING": ""
World's first Convolutional
}
Network for Text Recognition",

"First Order Motion Model


Input news headlines:
applied to animate paintings"

1. "From books to presentations in 10s with ],

AR + ML"
"REASONING": "These headlines
2. "Demo from 1993 of 32-year-old Yann are related to artificial
LeCun showing off the World's first intelligence, machine learning,
Convolutional Network for Text Recognition"
and their applications in various
3. "First Order Motion Model applied to fields."

animate paintings"
},

4. "Robinhood and other brokers literally "FINANCIAL_MARKETS": {

blocking purchase of $GME, $NOK, $BB, "HEADLINES": [

$AMC; allow sells"


"Robinhood and other brokers
5. "United Airlines stock down over 5% literally blocking purchase of
premarket trading"
$GME, $NOK, $BB, $AMC; allow
Output:
sells",

"United Airlines stock down over


5% premarket trading",

"Bitcoin was nearly $20,000 a


year ago today"

],

"REASONING": "These headlines


are related to financial markets,
stocks, cryptocurrencies, and
trading platforms."

Temperature and Top_p parameters


Changing the temperature parameter changes the output of the model. The temperature parameter can be set between 0 and 2. A higher value, for example 0.7, will make the output more random and produce more divergent responses, while a lower value, like 0.2, will make the output more focused and concrete. A fictional story could be generated using a higher temperature, whereas to generate a legal document it is recommended to use a much lower temperature. Top_probability (top_p) is another parameter and is similar to temperature in that it also controls the randomness of the model response, but it does so in a different way. The general recommendation is to alter one of these two parameters at a time, not both.
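
For illustration, here is a minimal sketch of running the same prompt at two temperature values. It assumes openai has been configured for Azure as shown earlier; the deployment name and prompt are placeholders.

Python

# Illustrative sketch: the same prompt at two temperature values. Assumptions:
# openai configured for Azure as shown earlier; "gpt-35-turbo" is a placeholder deployment.
import openai

for temperature in (0.2, 0.7):
    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",
        temperature=temperature,  # alternatively adjust top_p, but not both at once
        messages=[
            {"role": "system", "content": "You are an AI assistant that helps people find information."},
            {"role": "user", "content": "Write a one-sentence opening line for a short story."},
        ]
    )
    print(temperature, response["choices"][0]["message"]["content"])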

Provide grounding context


One of the most effective ways to provide reliable answers is to give the model data to
draw its responses from (grounding data). If your use case relies on up-to-date, reliable
information and is not purely a creative scenario, we strongly recommend providing
grounding data. In general, the closer you can get your source material to the final form
of the answer you want, the less work the model needs to do, which means there is less
opportunity for error.
Here is an example where the system is provided a recent blog
describing the launch of GPT-4 in Azure OpenAI Service , and asked to name some of
its early customers.

System message User Assistant

You are an AI assistant that helps Who are some Here are some of the early customers
people find information. You will of the early of the Azure OpenAI service GPT-4
provide answers exclusively from customers of models:

below text and respond in 100 words the Azure - Epic Healthcare: They plan to use
or less:
OpenAI service GPT-4 to help physicians and nurses
"At Microsoft, we are constantly GPT-4 models? spend less time on data entry and
discovering new ways to unleash investigation.

creativity, unlock productivity, and - Coursera: They are using Azure


uplevel skills so that more people can OpenAI Service with GPT-4 to create a
benefit from using AI. This is allowing new AI-powered learning experience
our customers to …
on their platform.

- Coca-Cola: They are using Azure


… thereby reducing the likelihood that OpenAI to build a knowledge hub and
the model will generate a harmful plan to leverage GPT-4's multimodal
response.”1
capabilities for marketing, advertising,
public relations, and customer
Answer in 100 words or less. Use bullet relations.

lists wherever possible. - Other companies of all sizes are


using Azure AI to improve customer
experiences, summarize long-form
content, help write software, and
reduce risk by predicting the right tax
data.

1
Text from Azure OpenAI GPT-4 launch blog .
Next steps
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
System message framework and
template recommendations for Large
Language Models (LLMs)
Article • 06/01/2023

This article provides a recommended framework and example templates to help write an
effective system message, sometimes referred to as a metaprompt or system prompt
that can be used to guide an AI system’s behavior and improve system performance. If
you're new to prompt engineering, we recommend starting with our introduction to
prompt engineering and prompt engineering techniques guidance.

This guide provides system message recommendations and resources that, along with
other prompt engineering techniques, can help increase the accuracy and grounding of
responses you generate with a Large Language Model (LLM). However, it is important to
remember that even when using these templates and guidance, you still need to
validate the responses the models generate. Just because a carefully crafted system
message worked well for a particular scenario doesn't necessarily mean it will work more
broadly across other scenarios. Understanding the limitations of LLMs and the
mechanisms for evaluating and mitigating those limitations is just as important as
understanding how to leverage their strengths.

The LLM system message framework described here covers four concepts:

Define the model’s profile, capabilities, and limitations for your scenario
Define the model’s output format
Provide example(s) to demonstrate the intended behavior of the model
Provide additional behavioral guardrails

Define the model’s profile, capabilities, and


limitations for your scenario
Define the specific task(s) you would like the model to complete. Describe who
the users of the model will be, what inputs they will provide to the model, and
what you expect the model to do with the inputs.

Define how the model should complete the tasks, including any additional tools
(like APIs, code, plug-ins) the model can use. If it doesn’t use additional tools, it
can rely on its own parametric knowledge.
Define the scope and limitations of the model’s performance. Provide clear
instructions on how the model should respond when faced with any limitations.
For example, define how the model should respond if prompted on subjects or for
uses that are off topic or otherwise outside of what you want the system to do.

Define the posture and tone the model should exhibit in its responses.

Define the model's output format


When using the system message to define the model’s desired output format in your
scenario, consider and include the following types of information:

Define the language and syntax of the output format. If you want the output to
be machine parseable, you may want the output to be in a format like JSON or XML.

Define any styling or formatting preferences for better user or machine


readability. For example, you may want relevant parts of the response to be bolded
or citations to be in a specific format.

Provide example(s) to demonstrate the


intended behavior of the model
When using the system message to demonstrate the intended behavior of the model in
your scenario, it is helpful to provide specific examples. When providing examples,
consider the following:

Describe difficult use cases where the prompt is ambiguous or complicated, to give
the model additional visibility into how to approach such cases.
Show the potential “inner monologue” and chain-of-thought reasoning to better
inform the model on the steps it should take to achieve the desired outcomes.

Define additional behavioral guardrails


When defining additional safety and behavioral guardrails, it’s helpful to first identify
and prioritize the harms you’d like to address. Depending on the application, the
sensitivity and severity of certain harms could be more important than others.

Next steps
Learn more about Azure OpenAI
Learn more about deploying Azure OpenAI responsibly
For more examples, check out the Azure OpenAI Samples GitHub repository
Azure OpenAI on your data (preview)
Article • 07/07/2023

Azure OpenAI on your data enables you to run supported chat models such as GPT-35-
Turbo and GPT-4 on your data without needing to train or fine-tune models. Running
models on your data enables you to chat on top of, and analyze your data with greater
accuracy and speed. By doing so, you can unlock valuable insights that can help you
make better business decisions, identify trends and patterns, and optimize your
operations. One of the key benefits of Azure OpenAI on your data is its ability to tailor
the content of conversational AI.

To get started, connect your data source using Azure OpenAI Studio and start asking
questions and chatting on your data.

Because the model has access to, and can reference specific sources to support its
responses, answers are not only based on its pretrained knowledge but also on the
latest information available in the designated data source. This grounding data also
helps the model avoid generating responses based on outdated or incorrect
information.

Note

To get started, you need to already have been approved for Azure OpenAI access
and have an Azure OpenAI Service resource with either the gpt-35-turbo or the
gpt-4 models deployed.

What is Azure OpenAI on your data


Azure OpenAI on your data works with OpenAI's powerful GPT-35-Turbo and GPT-4
language models, enabling them to provide responses based on your data. You can
access Azure OpenAI on your data using a REST API or the web-based interface in the
Azure OpenAI Studio to create a solution that connects to your data to enable an
enhanced chat experience.

One of the key features of Azure OpenAI on your data is its ability to retrieve and utilize
data in a way that enhances the model's output. Azure OpenAI on your data, together
with Azure Cognitive Search, determines what data to retrieve from the designated data
source based on the user input and provided conversation history. This data is then
augmented and resubmitted as a prompt to the OpenAI model, with retrieved
information being appended to the original prompt. Although retrieved data is being
appended to the prompt, the resulting input is still processed by the model like any
other prompt. Once the data has been retrieved and the prompt has been submitted to
the model, the model uses this information to provide a completion. See the Data,
privacy, and security for Azure OpenAI Service article for more information.
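
As a rough sketch of what a REST call could look like with the Python requests package, see below. The /extensions/chat/completions path and the api-version value shown are assumptions based on the preview API at the time of writing and may change; check the REST reference for the current values. All resource names, environment variables, and the question are placeholders, and the full response JSON is printed rather than assuming a specific response shape.

Python

# Illustrative sketch of calling Azure OpenAI on your data through the REST API.
# Assumptions: the /extensions/chat/completions path and api-version reflect the
# preview API at the time of writing and may change; every name, environment
# variable, and the question below are placeholders.
import os
import requests

endpoint = os.environ["AOAI_ENDPOINT"]      # e.g. https://my-resource.openai.azure.com
deployment = "gpt-35-turbo"                 # placeholder deployment name
url = (f"{endpoint}/openai/deployments/{deployment}/extensions/chat/completions"
       "?api-version=2023-06-01-preview")

body = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": os.environ["SEARCH_ENDPOINT"],
                "key": os.environ["SEARCH_KEY"],
                "indexName": os.environ["SEARCH_INDEX"]
            }
        }
    ],
    "messages": [
        {"role": "user", "content": "What are the differences between Azure Machine Learning and Azure Cognitive Services?"}
    ]
}

response = requests.post(url, json=body, headers={"api-key": os.environ["AOAI_KEY"]})
print(response.json())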

Data source options


Azure OpenAI on your data uses an Azure Cognitive Search index to determine what
data to retrieve based on user inputs and provided conversation history. We
recommend using Azure OpenAI Studio to create your index from a blob storage or
local files. See the quickstart article for more information.

Ingesting your data into Azure Cognitive Search

For documents and datasets with long text, you should use the available data preparation script to ingest the data into Azure Cognitive Search. The script chunks the data so that the responses you get from the service will be more accurate. This script also supports scanned PDF files and images, and ingests the data using Form Recognizer.

Data formats and file types


Azure OpenAI on your data supports the following filetypes:

.txt

.md
.html

Microsoft Word files


Microsoft PowerPoint files
PDF

There are some caveats about document structure and how it might affect the quality of
responses from the model:

The model provides the best citation titles from markdown ( .md ) files.

If a document is a PDF file, the text contents are extracted as a preprocessing step
(unless you're connecting your own Azure Cognitive Search index). If your
document contains images, graphs, or other visual content, the model's response
quality depends on the quality of the text that can be extracted from them.
If you're converting data from an unsupported format into a supported format,
make sure the conversion:
Doesn't lead to significant data loss.
Doesn't add unexpected noise to your data.

This will impact the quality of Azure Cognitive Search and the model response.

Virtual network support & private link support


Azure OpenAI on your data does not currently support private endpoints.

Recommended settings
Use the following sections to help you configure Azure OpenAI on your data for optimal
results.

System message
Give the model instructions about how it should behave and any context it should
reference when generating a response. You can describe the assistant’s personality, what
it should and shouldn’t answer, and how to format responses. There’s no token limit for
the system message, but it will be included with every API call and counted against the
overall token limit. The system message will be truncated if it's greater than 200 tokens.

For example, if you're creating a chatbot where the data consists of transcriptions of
quarterly financial earnings calls, you might use the following system message:

"You are a financial chatbot useful for answering questions from financial reports. You are
given excerpts from the earnings call. Please answer the questions by parsing through all
dialogue."

This system message can help improve the quality of the response by specifying the
domain (in this case finance) and mentioning that the data consists of call transcriptions.
It helps set the necessary context for the model to respond appropriately.

Note

The system message is only guidance. The model might not adhere to every
instruction specified because it has been primed with certain behaviors such as
objectivity, and avoiding controversial statements. Unexpected behavior may occur
if the system message contradicts these behaviors.
Maximum response
Set a limit on the number of tokens per model response. The upper limit for Azure
OpenAI on Your Data is 1500. This is equivalent to setting the max_tokens parameter in
the API.

Limit responses to your data


This option encourages the model to respond using your data only, and is selected by
default. If you unselect this option, the model may more readily apply its internal
knowledge to respond. Determine the correct selection based on your use case and
scenario.

Semantic search

Important

Semantic search is subject to additional pricing


Currently Azure OpenAI on your data supports semantic search for English
data only. Only enable semantic search if both your documents and use case
are in English.

If semantic search is enabled for your Azure Cognitive Search service, you are more
likely to produce better retrieval of your data, which can improve response and citation
quality.

Index field mapping


If you're using your own index, you will be prompted in the Azure OpenAI Studio to
define which fields you want to map for answering questions when you add your data
source. You can provide multiple fields for Content data, and should include all fields
that have text pertaining to your use case.

In this example, the fields mapped to Content data and Title provide information to the
model to answer questions. Title is also used to title citation text. The field mapped to
File name generates the citation names in the response.

Mapping these fields correctly helps ensure the model has better response and citation
quality.

Interacting with the model


Use the following practices for best results when chatting with the model.

Conversation history

Before starting a new conversation (or asking a question that is not related to the
previous ones), clear the chat history.
Getting different responses for the same question between the first conversational
turn and subsequent turns can be expected because the conversation history
changes the current state of the model. If you receive incorrect answers, report it
as a quality bug.

Model response

If you are not satisfied with the model response for a specific question, try either
making the question more specific or more generic to see how the model
responds, and reframe your question accordingly.

Chain-of-thought prompting has been shown to be effective in getting the model


to produce desired outputs for complex questions/tasks.

Question length
Avoid asking long questions and break them down into multiple questions if possible.
The GPT models have limits on the number of tokens they can accept. The token limit counts the user question, the system message, the retrieved search documents (chunks), internal prompts, the conversation history (if any), and the response. If the question exceeds the token limit, it will be truncated.

Multi-lingual support

Azure OpenAI on your data supports queries that are in the same language as the
documents. For example, if your data is in Japanese, then queries need to be in
Japanese too.

Currently Azure OpenAI on your data supports semantic search for English data
only. Don't enable semantic search if your data is in other languages.

We recommend using a system message to inform the model that your data is in
another language. For example:

“You are an AI assistant that helps people find information. You retrieve Japanese
documents, and you should read them carefully in Japanese and answer in
Japanese.”

If you have documents in multiple languages, we recommend building a new index


for each language and connecting them separately to Azure OpenAI.

Using the API


Consider setting the following parameters even if they are optional for using the API.

Parameter Recommendation

fieldsMapping Explicitly set the title and content fields of your index. This impacts the search
retrieval quality of Azure Cognitive Search, which impacts the overall response
and citation quality.

roleInformation Corresponds to the “System Message” in the Azure OpenAI Studio. See the
System message section above for recommendations.
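
For illustration, these two parameters sit alongside the search connection details in the dataSources block. The sub-field names inside fieldsMapping shown here (titleField, contentFields, filepathField) are assumptions based on the preview API and may differ by API version; the field values, roleInformation text, and the question are placeholders.

JSON

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "roleInformation": "You are a financial chatbot useful for answering questions from financial reports.",
                "fieldsMapping": {
                    "titleField": "title",
                    "contentFields": ["content"],
                    "filepathField": "filepath"
                }
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What were the key results discussed on the earnings call?"
        }
    ]
}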

Streaming data

You can send a streaming request using the stream parameter, allowing data to be sent
and received incrementally, without waiting for the entire API response. This can
improve performance and user experience, especially for large or dynamic data.
JSON

"stream": true,

"dataSources": [

"type": "AzureCognitiveSearch",

"parameters": {

"endpoint": "'$SearchEndpoint'",

"key": "'$SearchKey'",

"indexName": "'$SearchIndex'"

],

"messages": [

"role": "user",

"content": "What are the differences between Azure Machine


Learning and Azure Cognitive Services?"

Conversation history for better results

When chatting with a model, providing a history of the chat will help the model return
higher quality results.

JSON

"dataSources": [

"type": "AzureCognitiveSearch",

"parameters": {

"endpoint": "'$SearchEndpoint'",

"key": "'$SearchKey'",

"indexName": "'$SearchIndex'"

],

"messages": [

"role": "user",

"content": "What are the differences between Azure Machine


Learning and Azure Cognitive Services?"

},

"role": "tool",

"content": "{\"citations\": [{\"content\": \"title: Cognitive


Services and Machine Learning\\ntitleSuffix: Azure Cognitive
Services\\ndescription: Learn where Azure Cognitive Services fits in with
other Azure offerings for machine learning.\\nCognitive Services and machine
learning\\nCognitive Services provides machine learning capabilities to
solve general problems such as...\\n \"articles\\\\cognitive-
services\\\\cognitive-services-and-machine-learning.md\", \"url\": null,
\"metadata\": {\"chunking\": \"orignal document size=1018.
Scores=0.32200050354003906 and 1.2880020141601562.Org Highlight
count=115.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"What are the
differences between Azure Machine Learning and Azure Cognitive Services?
\\\"]\"}"

},

"role": "assistant",

"content": " \nAzure Machine Learning is a product and service


tailored for data scientists to build, train, and deploy machine learning
models [doc1]..."

},

"role": "user",

"content": "How do I use Azure machine learning?"

Next steps
Get started using your data with Azure OpenAI
Introduction to prompt engineering
Create a resource and deploy a model
using Azure OpenAI
Article • 05/23/2023

Use this article to get started with Azure OpenAI with step-by-step instructions to create
a resource and deploy a model. While the steps for resource creation and model
deployment can be completed in a few minutes, the actual deployment process itself
can take more than an hour. You can create your resource, start your deployment, and then
check back in on your deployment later rather than actively waiting for the deployment
to complete.

Prerequisites
An Azure subscription - Create one for free

Access granted to Azure OpenAI in the desired Azure subscription

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

Create a resource
Resources in Azure can be created several different ways:

Within the Azure portal


Using the REST APIs, Azure CLI, PowerShell or client libraries
Via ARM templates

This guide walks you through the Azure portal creation experience.

1. Navigate to the create page: Azure OpenAI Service Create Page

2. On the Create page provide the following information:

Subscription - Select the Azure subscription used in your OpenAI onboarding application.

Resource group - The Azure resource group that will contain your OpenAI resource. You can create a new group or add it to a pre-existing group.

Region - The location of your instance. Different locations may introduce latency, but have no impact on the runtime availability of your resource.

Name - A descriptive name for your cognitive services resource. For example, MyOpenAIResource.

Pricing Tier - Only 1 pricing tier is available for the service currently.

Deploy a model
Before you can generate text or run inference, you need to deploy a model. You can select
from one of several available models in Azure OpenAI Studio.

To deploy a model, follow these steps:

1. Sign in to Azure OpenAI Studio .

2. Select the subscription and Azure OpenAI resource to work with.


3. Under Management select Deployments.

4. Select Create new deployment.

Select a model - Model availability varies by region. For a list of available models per region, see the Model Summary table and region availability.

Deployment name - Choose a name carefully. The deployment name will be used in your code to call the model via the client libraries and REST API.

Advanced Options - Content Filter: Assign a content filter to your deployment. Tokens per Minute Rate Limit: Adjust the Tokens per Minute (TPM) to set the effective rate limit for your deployment. You can modify this value at any time via the Quotas menu.
5. Select a model from the drop-down.

6. Enter a deployment name to help you identify the model.

7. For your first deployment leave the Advanced Options set to the defaults.

The deployments table displays a new entry that corresponds to this newly created
model. Your deployment status will move to succeeded when the deployment is
complete and ready for use.

Next steps
Now that you have a resource and your first model deployed get started making
API calls and generating text with our quickstarts.
Learn more about the underlying models that power Azure OpenAI.
Learn how to work with the ChatGPT
and GPT-4 models
Article • 05/15/2023

The ChatGPT and GPT-4 models are language models that are optimized for
conversational interfaces. The models behave differently than the older GPT-3 models.
Previous models were text-in and text-out, meaning they accepted a prompt string and
returned a completion to append to the prompt. However, the ChatGPT and GPT-4
models are conversation-in and message-out. The models expect input formatted in a
specific chat-like transcript format, and return a completion that represents a model-
written message in the chat. While this format was designed specifically for multi-turn
conversations, you'll find it can also work well for non-chat scenarios too.

In Azure OpenAI there are two different options for interacting with these types of models:

Chat Completion API.


Completion API with Chat Markup Language (ChatML).

The Chat Completion API is a new dedicated API for interacting with the ChatGPT and
GPT-4 models. This API is the preferred method for accessing these models. It is also
the only way to access the new GPT-4 models.

ChatML uses the same completion API that you use for other models like text-davinci-002, but it requires a unique token-based prompt format known as Chat Markup Language (ChatML). This provides lower-level access than the dedicated Chat Completion API, but it also requires additional input validation, only supports ChatGPT (gpt-35-turbo) models, and the underlying format is more likely to change over time.

This article walks you through getting started with the new ChatGPT and GPT-4 models.
It's important to use the techniques described here to get the best results. If you try to
interact with the models the same way you did with the older model series, the models
will often be verbose and provide less useful responses.

Working with the ChatGPT and GPT-4 models


The following code snippet shows the most basic way to use the ChatGPT and GPT-4
models with the Chat Completion API. If this is your first time using these models
programmatically, we recommend starting with our ChatGPT & GPT-4 Quickstart.
GPT-4 models are currently only available by request. Existing Azure OpenAI
customers can apply for access by filling out this form .

Python

import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

print(response)

print(response['choices'][0]['message']['content'])

Output

"choices": [

"finish_reason": "stop",

"index": 0,

"message": {

"content": "The founders of Microsoft are Bill Gates and Paul Allen.
They co-founded the company in 1975.",

"role": "assistant"

],

"created": 1679014551,

"id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1",

"model": "gpt-3.5-turbo-0301",

"object": "chat.completion",

"usage": {

"completion_tokens": 86,

"prompt_tokens": 37,

"total_tokens": 123

Note

The following parameters aren't available with the new ChatGPT and GPT-4 models:
logprobs , best_of , and echo . If you set any of these parameters, you'll get an error.

Every response includes a finish_reason . The possible values for finish_reason are:

stop: API returned complete model output.


length: Incomplete model output due to max_tokens parameter or token limit.
content_filter: Omitted content due to a flag from our content filters.
null: API response still in progress or incomplete.

Consider setting max_tokens to a slightly higher value than normal such as 300 or 500.
This ensures that the model doesn't stop generating text before it reaches the end of
the message.
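
As a small illustrative check, the sketch below raises max_tokens and then inspects finish_reason on the response. It builds on the snippet and Azure configuration shown earlier in this article; the deployment name remains a placeholder.

Python

# Illustrative sketch: raise max_tokens slightly and inspect finish_reason.
# Assumes the same Azure configuration and placeholder deployment name as the
# snippet earlier in this article.
import openai

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",
    max_tokens=500,  # slightly higher than normal so answers aren't cut off mid-message
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

choice = response["choices"][0]
if choice["finish_reason"] == "length":
    print("Output was truncated by max_tokens or the model's token limit.")
elif choice["finish_reason"] == "content_filter":
    print("Content was omitted due to a content filter flag.")
else:
    print(choice["message"]["content"])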

Model versioning

Note

gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model as well as the gpt-
4 and gpt-4-32k models will continue to be updated. When creating a deployment of
these models, you'll also need to specify a model version.

Currently, only version 0301 is available for ChatGPT and 0314 for GPT-4 models. We'll
continue to make updated versions available in the future. You can find the model
deprecation times for these models on our models page.

Working with the Chat Completion API


OpenAI trained the ChatGPT and GPT-4 models to accept input formatted as a
conversation. The messages parameter takes an array of dictionaries with a conversation
organized by role.
The format of a basic Chat Completion is as follows:

{"role": "system", "content": "Provide some context and/or instructions to


the model"},

{"role": "user", "content": "The users messages goes here"}

A conversation with one example answer followed by a question would look like:

{"role": "system", "content": "Provide some context and/or instructions to


the model."},

{"role": "user", "content": "Example question goes here."},

{"role": "assistant", "content": "Example answer goes here."},

{"role": "user", "content": "First question/message for the model to


actually respond to."}

System role
The system role also known as the system message is included at the beginning of the
array. This message provides the initial instructions to the model. You can provide
various information in the system role including:

A brief description of the assistant


Personality traits of the assistant
Instructions or rules you would like the assistant to follow
Data or information needed for the model, such as relevant questions from an FAQ

You can customize the system role for your use case or just include basic instructions.
The system role/message is optional, but it's recommended to at least include a basic
one to get the best results.

Messages
After the system role, you can include a series of messages between the user and the
assistant.

{"role": "user", "content": "What is thermodynamics?"}

To trigger a response from the model, you should end with a user message indicating
that it's the assistant's turn to respond. You can also include a series of example
messages between the user and the assistant as a way to do few shot learning.

Message prompt examples


The following section shows examples of different styles of prompts that you could use
with the ChatGPT and GPT-4 models. These examples are just a starting point, and you
can experiment with different prompts to customize the behavior for your own use
cases.

Basic example
If you want the ChatGPT model to behave similarly to chat.openai.com , you can use a
basic system message like "Assistant is a large language model trained by OpenAI."

{"role": "system", "content": "Assistant is a large language model trained


by OpenAI."},

{"role": "user", "content": "Who were the founders of Microsoft?"}

Example with instructions


For some scenarios, you may want to give additional instructions to the model to define
guardrails for what the model is able to do.

{"role": "system", "content": "Assistant is an intelligent chatbot designed


to help users answer their tax related questions.

Instructions:

- Only answer questions related to taxes.

- If you're unsure of an answer, you can say "I don't know" or "I'm not
sure" and recommend users go to the IRS website for more information. "},

{"role": "user", "content": "When are my taxes due?"}

Using data for grounding

You can also include relevant data or information in the system message to give the
model extra context for the conversation. If you only need to include a small amount of
information, you can hard code it in the system message. If you have a large amount of
data that the model should be aware of, you can use embeddings or a product like
Azure Cognitive Search to retrieve the most relevant information at query time.

{"role": "system", "content": "Assistant is an intelligent chatbot designed


to help users answer technical questions about Azure OpenAI Service. Only
answer questions using the context below and if you're not sure of an
answer, you can say 'I don't know'.

Context:

- Azure OpenAI Service provides REST API access to OpenAI's powerful


language models including the GPT-3, Codex and Embeddings model series.

- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-
3, Codex, and DALL-E models with the security and enterprise promise of
Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility
and a smooth transition from one to the other.

- At Microsoft, we're committed to the advancement of AI driven by


principles that put people first. Microsoft has made significant investments
to help guard against abuse and unintended harm, which includes requiring
applicants to show well-defined use cases, incorporating Microsoft’s
principles for responsible AI use."

},

{"role": "user", "content": "What is Azure OpenAI Service?"}

Few shot learning with Chat Completion


You can also give few shot examples to the model. The approach for few shot learning
has changed slightly because of the new prompt format. You can now include a series of
messages between the user and the assistant in the prompt as few shot examples. These
examples can be used to seed answers to common questions to prime the model or
teach particular behaviors to the model.

This is only one example of how you can use few shot learning with ChatGPT and GPT-4.
You can experiment with different approaches to see what works best for your use case.

{"role": "system", "content": "Assistant is an intelligent chatbot designed


to help users answer their tax related questions. "},

{"role": "user", "content": "When do I need to file my taxes by?"},

{"role": "assistant", "content": "In 2023, you will need to file your taxes
by April 18th. The date falls after the usual April 15th deadline because
April 15th falls on a Saturday in 2023. For more details, see
https://www.irs.gov/filing/individuals/when-to-file."},

{"role": "user", "content": "How can I check the status of my tax refund?"},

{"role": "assistant", "content": "You can check the status of your tax
refund by visiting https://www.irs.gov/refunds"}

Using Chat Completion for non-chat scenarios


The Chat Completion API is designed to work with multi-turn conversations, but it also
works well for non-chat scenarios.

For example, for an entity extraction scenario, you might use the following prompt:

{"role": "system", "content": "You are an assistant designed to extract


entities from text. Users will paste in a string of text and you will
respond with entities you've extracted from the text as a JSON object.
Here's an example of your output format:

{
  "name": "",
  "company": "",
  "phone_number": ""
}"},

{"role": "user", "content": "Hello. My name is Robert Smith. I'm calling


from Contoso Insurance, Delaware. My colleague mentioned that you are
interested in learning about our comprehensive benefits policy. Could you
give me a call back at (555) 346-9322 when you get a chance so we can go
over the benefits?"}

Creating a basic conversation loop


The examples so far have shown you the basic mechanics of interacting with the Chat
Completion API. This example shows you how to create a conversation loop that
performs the following actions:

Continuously takes console input, and properly formats it as part of the messages
array as user role content.
Outputs responses that are printed to the console and formatted and added to the
messages array as assistant role content.

This means that every time a new question is asked, a running transcript of the
conversation so far is sent along with the latest question. Since the model has no
memory, you need to send an updated transcript with each new question or the model
will lose context of the previous questions and answers.

Python

import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

conversation = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input()
    conversation.append({"role": "user", "content": user_input})

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")

When you run the code above you will get a blank console window. Enter your first
question in the window and then hit enter. Once the response is returned, you can
repeat the process and keep asking questions.

Managing conversations
The previous example will run until you hit the model's token limit. With each question asked and answer received, the messages array grows in size. The token limit for gpt-35-turbo is 4096 tokens, whereas the token limits for gpt-4 and gpt-4-32k are 8192 and 32768 respectively. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.

It's your responsibility to ensure the prompt and completion falls within the token limit.
This means that for longer conversations, you need to keep track of the token count and
only send the model a prompt that falls within the limit.

7 Note

We strongly recommend staying within the documented input token limit for all
models even if you find you are able to exceed that limit.

The following code sample shows a simple chat loop with a technique for keeping the conversation under a 4096 token limit using OpenAI's tiktoken library.
The code requires tiktoken 0.3.0 . If you have an older version, run pip install tiktoken
--upgrade .

Python

import os

import tiktoken
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096

conversation = []
conversation.append(system_message)

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens

while True:
    user_input = input("")
    conversation.append({"role": "user", "content": user_input})
    conv_history_tokens = num_tokens_from_messages(conversation)

    while (conv_history_tokens + max_response_tokens >= token_limit):
        del conversation[1]
        conv_history_tokens = num_tokens_from_messages(conversation)

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation,
        temperature=0.7,
        max_tokens=max_response_tokens
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")

In this example, once the token limit is reached, the oldest messages in the conversation
transcript are removed. del is used instead of pop() for efficiency, and we start at
index 1 so as to always preserve the system message and only remove user/assistant
messages. Over time, this method of managing the conversation can cause the
conversation quality to degrade as the model gradually loses context of the earlier
portions of the conversation.

An alternative approach is to limit the conversation duration to the max token length or
a certain number of turns. Once the max token limit is reached and the model would
lose context if you were to allow the conversation to continue, you can prompt the user
that they need to begin a new conversation and clear the messages array to start a
brand new conversation with the full token limit available.
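
A minimal sketch of this alternative, reusing the token_limit, max_response_tokens, system_message, and num_tokens_from_messages helper from the previous example, might look like this:

Python

# Sketch: start a new conversation when the next request would exceed the limit,
# instead of trimming older messages. Reuses variables from the previous example.
if num_tokens_from_messages(conversation) + max_response_tokens >= token_limit:
    print("Token limit reached. Please start a new conversation.")
    conversation = [system_message]  # keep only the system message; full token limit available again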

The token counting portion of the code demonstrated previously is a simplified version
of one of OpenAI's cookbook examples .

Next steps
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
Learn how to generate or manipulate text
Article • 02/17/2023

The completions endpoint can be used for a wide variety of tasks. It provides a simple
but powerful text-in, text-out interface to any of our models. You input some text as a
prompt, and the model will generate a text completion that attempts to match whatever
context or pattern you gave it. For example, if you give the API the prompt, "As
Descartes said, I think, therefore", it will return the completion " I am" with high
probability.

The best way to start exploring completions is through our playground in Azure OpenAI
Studio . It's a simple text box where you can submit a prompt to generate a
completion. You can start with a simple example like the following:

write a tagline for an ice cream shop

once you submit, you'll see something like the following generated:

Console

write a tagline for an ice cream shop

we serve up smiles with every scoop!

The actual completion results you see may differ because the API is stochastic by
default. In other words, you might get a slightly different completion every time you call
it, even if your prompt stays the same. You can control this behavior with the
temperature setting.
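
If you're calling the service from code rather than the playground, the temperature can be passed on the request. The following is a minimal sketch under a few assumptions: openai is configured for your Azure resource as in the quickstarts, and "text-davinci-003" is a placeholder for whatever deployment name you chose.

Python

# Sketch: two calls that differ only in temperature. Assumes openai is already
# imported and configured; "text-davinci-003" is a placeholder deployment name.
deterministic = openai.Completion.create(
    engine="text-davinci-003",
    prompt="write a tagline for an ice cream shop",
    temperature=0,    # low temperature: nearly the same completion on every call
    max_tokens=30
)

creative = openai.Completion.create(
    engine="text-davinci-003",
    prompt="write a tagline for an ice cream shop",
    temperature=0.9,  # high temperature: more varied completions
    max_tokens=30
)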

This simple, "text in, text out" interface means you can "program" the model by
providing instructions or just a few examples of what you'd like it to do. Its success
generally depends on the complexity of the task and quality of your prompt. A general
rule is to think about how you would write a word problem for a middle school student
to solve. A well-written prompt provides enough information for the model to know
what you want and how it should respond.

7 Note

Keep in mind that the models' training data cuts off in October 2019, so they may
not have knowledge of current events. We plan to add more continuous training in
the future.
Prompt design

Basics
OpenAI's models can do everything from generating original stories to performing
complex text analysis. Because they can do so many things, you have to be explicit in
showing what you want. Showing, not just telling, is often the secret to a good prompt.

The models try to predict what you want from the prompt. If you send the words "Give
me a list of cat breeds," the model wouldn't automatically assume that you're asking for
a list of cat breeds. You could as easily be asking the model to continue a conversation
where the first words are "Give me a list of cat breeds" and the next ones are "and I'll tell
you which ones I like." If the model only assumed that you wanted a list of cats, it
wouldn't be as good at content creation, classification, or other tasks.

There are three basic guidelines to creating prompts:

Show and tell. Make it clear what you want either through instructions, examples, or a
combination of the two. If you want the model to rank a list of items in alphabetical
order or to classify a paragraph by sentiment, show it that's what you want.

Provide quality data. If you're trying to build a classifier or get the model to follow a
pattern, make sure that there are enough examples. Be sure to proofread your examples
— the model is usually smart enough to see through basic spelling mistakes and give
you a response, but it also might assume that the mistakes are intentional and it can
affect the response.

Check your settings. The temperature and top_p settings control how deterministic the
model is in generating a response. If you're asking it for a response where there's only
one right answer, then you'd want to set these settings to lower values. If you're looking
for a response that's not obvious, then you might want to set them to higher values. The
number one mistake people make with these settings is assuming that they're "cleverness"
or "creativity" controls.

Troubleshooting
If you're having trouble getting the API to perform as expected, follow this checklist:

1. Is it clear what the intended generation should be?
2. Are there enough examples?
3. Did you check your examples for mistakes? (The API won't tell you directly.)
4. Are you using temperature and top_p correctly?

Classification
To create a text classifier with the API, we provide a description of the task and a
few examples. In this demonstration, we show the API how to classify the sentiment of
tweets.

Console

This is a tweet sentiment classifier

Tweet: "I loved the new Batman movie!"

Sentiment: Positive

Tweet: "I hate it when my phone battery dies."

Sentiment: Negative

Tweet: "My day has been 👍"

Sentiment: Positive

Tweet: "This is the link to the article"

Sentiment: Neutral

Tweet: "This new music video blew my mind"

Sentiment:

It's worth paying attention to several features in this example:

1. Use plain language to describe your inputs and outputs

We use plain language for the input "Tweet" and the expected output "Sentiment." For best practices, start with plain language descriptions. While you can often use shorthand or keys to indicate the input and output, when building your prompt it's best to start by being as descriptive as possible and then work backwards, removing extra words as long as the performance of the prompt stays consistent.

2. Show the API how to respond to any case

In this example we provide multiple outcomes: "Positive", "Negative", and "Neutral." A neutral outcome is important because there will be many cases where even a human would have a hard time determining if something is positive or negative, and situations where it's neither.

3. You can use text and emoji

The classifier is a mix of text and emoji 👍. The API reads emoji and can even convert expressions to and from them.

4. You need fewer examples for familiar tasks

For this classifier we only provided a handful of examples. This is because the API already has an understanding of sentiment and the concept of a tweet. If you're building a classifier for something the API might not be familiar with, it might be necessary to provide more examples.

Improving the classifier's efficiency


Now that we have a grasp of how to build a classifier, let's take that example and make
it even more efficient so that we can use it to get multiple results back from one API call.

This is a tweet sentiment classifier

Tweet: "I loved the new Batman movie!"

Sentiment: Positive

Tweet: "I hate it when my phone battery dies"

Sentiment: Negative

Tweet: "My day has been 👍"

Sentiment: Positive

Tweet: "This is the link to the article"

Sentiment: Neutral

Tweet text

1. "I loved the new Batman movie!"

2. "I hate it when my phone battery dies"

3. "My day has been 👍"

4. "This is the link to the article"

5. "This new music video blew my mind"

Tweet sentiment ratings:

1: Positive

2: Negative

3: Positive

4: Neutral

5: Positive

Tweet text

1. "I can't stand homework"

2. "This sucks. I'm bored 😠"

3. "I can't wait for Halloween!!!"

4. "My cat is adorable ❤️❤️"

5. "I hate chocolate"

Tweet sentiment ratings:

1.

After showing the API how tweets are classified by sentiment we then provide it a list of
tweets and then a list of sentiment ratings with the same number index. The API is able
to pick up from the first example how a tweet is supposed to be classified. In the second
example it sees how to apply this to a list of tweets. This allows the API to rate five (and
even more) tweets in just one API call.

It's important to note that when you ask the API to create lists or evaluate text you need
to pay extra attention to your probability settings (Top P or Temperature) to avoid drift.

1. Make sure your probability setting is calibrated correctly by running multiple tests.

2. Don't make your list too long or the API is likely to drift.

Generation
One of the most powerful yet simplest tasks you can accomplish with the API is
generating new ideas or versions of input. You can give the API a list of a few story ideas
and it will try to add to that list. We've seen it create business plans, character
descriptions and marketing slogans just by providing it a handful of examples. In this
demonstration we'll use the API to create more examples for how to use virtual reality in
the classroom:

Ideas involving education and virtual reality

1. Virtual Mars

Students get to explore Mars via virtual reality and go on missions to


collect and catalog what they see.

2.

All we had to do in this example is provide the API with just a description of what the list
is about and one example. We then prompted the API with the number 2. indicating
that it's a continuation of the list.

Although this is a very simple prompt, there are several details worth noting:

1. We explained the intent of the list

Just like with the classifier, we tell the API up front what the list is about. This helps it
focus on completing the list and not trying to guess what the pattern is behind it.

2. Our example sets the pattern for the rest of the list

Because we provided a one-sentence description, the API is going to try to follow that
pattern for the rest of the items it adds to the list. If we want a more verbose response,
we need to set that up from the start.
3. We prompt the API by adding an incomplete entry

When the API sees 2. and the prompt abruptly ends, the first thing it tries to do is
figure out what should come after it. Since we already had an example with number one
and gave the list a title, the most obvious response is to continue adding items to the
list.

Advanced generation techniques

You can improve the quality of the responses by making a longer more diverse list in
your prompt. One way to do that is to start off with one example, let the API generate
more and select the ones that you like best and add them to the list. A few more high-
quality variations can dramatically improve the quality of the responses.

Conversation
The API is extremely adept at carrying on conversations with humans and even with
itself. With just a few lines of instruction, we've seen the API perform as a customer
service chatbot that intelligently answers questions without ever getting flustered or a
wise-cracking conversation partner that makes jokes and puns. The key is to tell the API
how it should behave and then provide a few examples.

Here's an example of the API playing the role of an AI answering questions:

The following is a conversation with an AI assistant. The assistant is


helpful, creative, clever, and very friendly.

Human: Hello, who are you?

AI: I am an AI created by OpenAI. How can I help you today?

Human:

This is all it takes to create a chatbot capable of carrying on a conversation. But underneath its simplicity there are several things going on that are worth paying attention to:

1. We tell the API the intent but we also tell it how to behave

Just like the other prompts, we cue the API into what the example represents, but we also add another key detail: we give it explicit instructions on how to interact with the phrase "The assistant is helpful, creative, clever, and very friendly."

Without that instruction the API might stray, mimic the human it's interacting with, and become sarcastic or exhibit some other behavior we want to avoid.
2. We give the API an identity

At the start we have the API respond as an AI that was created by OpenAI. While the API has no intrinsic identity, this helps it respond in a way that's as close to the truth as possible. You can use identity in other ways to create other kinds of chatbots. If you tell the API to respond as a woman who works as a research scientist in biology, you'll get intelligent and thoughtful comments from the API similar to what you'd expect from someone with that background.

In this example we create a chatbot that is a bit sarcastic and reluctantly answers
questions:

Marv is a chatbot that reluctantly answers questions.

###

User: How many pounds are in a kilogram?

Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of
this.

###

User: What does HTML stand for?

Marv: Was Google too busy? Hypertext Markup Language. The T is for try to
ask better questions in the future.

###

User: When did the first airplane fly?

Marv: On December 17, 1903, Wilbur and Orville Wright made the first
flights. I wish they'd come and take me away.

###

User: Who was the first man in space?

Marv:

To create an amusing and somewhat helpful chatbot, we provide a few examples of questions and answers showing the API how to reply. All it takes is just a few sarcastic responses, and the API is able to pick up the pattern and provide an endless number of snarky responses.

Transformation
The API is a language model that is familiar with a variety of ways that words and
characters can be used to express information. This ranges from natural language text to
code and languages other than English. The API is also able to understand content on a
level that allows it to summarize, convert and express it in different ways.

Translation
In this example we show the API how to convert from English to French:
English: I do not speak French.

French: Je ne parle pas français.


English: See you later!

French: À tout à l'heure!

English: Where is a good restaurant?

French: Où est un bon restaurant?


English: What rooms do you have available?

French: Quelles chambres avez-vous de disponible?

English:

This example works because the API already has a grasp of French, so there's no need to
try to teach it this language. Instead, we just need to provide enough examples that the API
understands it's converting from one language to another.

If you want to translate from English to a language the API is unfamiliar with you'd need
to provide it with more examples and a fine-tuned model to do it fluently.

Conversion
In this example we convert the name of a movie into emoji. This shows the adaptability
of the API to picking up patterns and working with other characters.

Back to Future: 👨👴🚗🕒

Batman: 🤵🦇

Transformers: 🚗🤖

Wonder Woman: 👸🏻👸🏼👸🏽👸🏾👸🏿

Spider-Man: 🕸🕷🕸🕸🕷🕸

Winnie the Pooh: 🐻🐼🐻

The Godfather: 👨👩👧🕵🏻‍♂️👲💥

Game of Thrones: 🏹🗡🗡🏹

Spider-Man:

Summarization
The API is able to grasp the context of text and rephrase it in different ways. In this
example, the API takes a block of text and creates an explanation a child would
understand. This illustrates that the API has a deep grasp of language.

My ten-year-old asked me what this passage means:

"""

A neutron star is the collapsed core of a massive supergiant star, which had
a total mass of between 10 and 25 solar masses, possibly more if the star
was especially metal-rich.[1] Neutron stars are the smallest and densest
stellar objects, excluding black holes and hypothetical white holes, quark
stars, and strange stars.[2] Neutron stars have a radius on the order of 10
kilometres (6.2 mi) and a mass of about 1.4 solar masses.[3] They result
from the supernova explosion of a massive star, combined with gravitational
collapse, that compresses the core past white dwarf star density to that of
atomic nuclei.

"""

I rephrased it for him, in plain language a ten-year-old can understand:

"""

In this example we place whatever we want summarized between the triple quotes. It's
worth noting that we explain both before and after the text to be summarized what our
intent is and who the target audience is for the summary. This is to keep the API from
drifting after it processes a large block of text.

Completion
While all prompts result in completions, it can be helpful to think of text completion as
its own task in instances where you want the API to pick up where you left off. For
example, if given this prompt, the API will continue the train of thought about vertical
farming. You can lower the temperature setting to keep the API more focused on the
intent of the prompt or increase it to let it go off on a tangent.

Vertical farming provides a novel solution for producing food locally,


reducing transportation costs and

This next prompt shows how you can use completion to help write React components.
We send some code to the API, and it's able to continue the rest because it has an
understanding of the React library. We recommend using models from our Codex series
for tasks that involve understanding or generating code. Currently, we support two
Codex models: code-davinci-002 and code-cushman-001 . For more information about
Codex models, see the Codex models section in Models.

import React from 'react';

const HeaderComponent = () => (

Factual responses
The API has a lot of knowledge that it's learned from the data it was trained on. It also
has the ability to provide responses that sound very real but are in fact made up. There
are two ways to limit the likelihood of the API making up an answer.

1. Provide a ground truth for the API

If you provide the API with a body of text to answer questions about (like a Wikipedia entry), it will be less likely to confabulate a response.

2. Use a low probability and show the API how to say "I don't know"

If the API understands that saying "I don't know" or some variation is appropriate when it's less certain about a response, it will be less inclined to make up answers.

In this example we give the API examples of questions and answers it knows and then
examples of things it wouldn't know and provide question marks. We also set the
probability to zero so the API is more likely to respond with a "?" if there's any doubt.

Q: Who is Batman?

A: Batman is a fictional comic book character.

Q: What is torsalplexity?

A: ?

Q: What is Devz9?

A: ?

Q: Who is George Lucas?

A: George Lucas is American film director and producer famous for creating
Star Wars.

Q: What is the capital of California?

A: Sacramento.

Q: What orbits the Earth?

A: The Moon.

Q: Who is Fred Rickerson?

A: ?

Q: What is an atom?

A: An atom is a tiny particle that makes up everything.

Q: Who is Alvan Muntz?

A: ?

Q: What is Kozar-09?

A: ?

Q: How many moons does Mars have?


A: Two, Phobos and Deimos.

Q:

Working with code


The Codex model series is a descendant of OpenAI's base GPT-3 series that's been
trained on both natural language and billions of lines of code. It's most capable in
Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP,
Ruby, Swift, TypeScript, SQL, and even Shell.

Learn more about generating code completions with the working with code guide.

Next steps
Learn how to work with code (Codex).
Learn more about the underlying models that
power Azure OpenAI.
How to switch between OpenAI and
Azure OpenAI endpoints with Python
Article • 07/07/2023

While OpenAI and Azure OpenAI Service rely on a common Python client library , there
are small changes you need to make to your code in order to swap back and forth
between endpoints. This article walks you through the common changes and differences
you'll experience when working across OpenAI and Azure OpenAI.

7 Note

This library is maintained by OpenAI and is currently in preview. Refer to the release
history or the version.py commit history to track the latest updates to the
library.

Authentication
We recommend using environment variables. If you haven't done this before our Python
quickstarts walk you through this configuration.

API key

OpenAI

Python

import openai

openai.api_key = "sk-..."
openai.organization = "..."

Azure OpenAI

Python

import openai

openai.api_type = "azure"
openai.api_key = "..."
openai.api_base = "https://example-endpoint.openai.azure.com"
openai.api_version = "2023-05-15" # subject to change

Azure Active Directory authentication

OpenAI

Python

import openai

openai.api_key = "sk-..."
openai.organization = "..."

Azure OpenAI

Python

import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azuread"
openai.api_key = token.token
openai.api_base = "https://example-endpoint.openai.azure.com"
openai.api_version = "2023-05-15" # subject to change
Keyword argument for model


OpenAI uses the model keyword argument to specify what model to use. Azure OpenAI
has the concept of deployments and uses the deployment_id keyword argument to
describe which model deployment to use. Azure OpenAI also supports the use of
engine interchangeably with deployment_id .

For OpenAI, engine still works in most instances, but it's deprecated and model is preferred.

OpenAI

Python

completion = openai.Completion.create(
    prompt="<prompt>",
    model="text-davinci-003"
)

chat_completion = openai.ChatCompletion.create(
    messages="<messages>",
    model="gpt-4"
)

embedding = openai.Embedding.create(
    input="<input>",
    model="text-embedding-ada-002"
)

Azure OpenAI

Python

completion = openai.Completion.create(
    prompt="<prompt>",
    deployment_id="text-davinci-003"
    #engine="text-davinci-003"
)

chat_completion = openai.ChatCompletion.create(
    messages="<messages>",
    deployment_id="gpt-4"
    #engine="gpt-4"
)

embedding = openai.Embedding.create(
    input="<input>",
    deployment_id="text-embedding-ada-002"
    #engine="text-embedding-ada-002"
)

Azure OpenAI embeddings doesn't support multiple inputs

Many examples show passing multiple inputs into the embeddings API. For Azure OpenAI, currently we must pass a single text input per call.

OpenAI

Python

inputs = ["A", "B", "C"]

embedding = openai.Embedding.create(
    input=inputs,
    model="text-embedding-ada-002"
)

Azure OpenAI

Python

inputs = ["A", "B", "C"]

for text in inputs:
    embedding = openai.Embedding.create(
        input=text,
        deployment_id="text-embedding-ada-002"
        #engine="text-embedding-ada-002"
    )

Next steps
Learn more about how to work with GPT-35-Turbo and the GPT-4 models with our
how-to guide.
For more examples, check out the Azure OpenAI Samples GitHub repository
How to configure content filters with
Azure OpenAI Service
Article • 06/09/2023

7 Note

All customers have the ability to modify the content filters to be stricter (for
example, to filter content at lower severity levels than the default). Approval is
required for full content filtering control, including (i) configuring content filters at
severity level high only, or (ii) turning the content filters off. Managed customers
only may apply for full content filtering control via this form: Azure OpenAI Limited
Access Review: Modified Content Filters and Abuse Monitoring
(microsoft.com) .

The content filtering system integrated into Azure OpenAI Service runs alongside the
core models and uses an ensemble of multi-class classification models to detect four
categories of harmful content (violence, hate, sexual, and self-harm) at four severity
levels respectively (safe, low, medium, and high). The default content filtering
configuration is set to filter at the medium severity threshold for all four content harms
categories for both prompts and completions. That means that content that is detected
at severity level medium or high is filtered, while content detected at severity level low
or safe is not filtered by the content filters. Learn more about content categories,
severity levels, and the behavior of the content filtering system here.

Content filters can be configured at resource level. Once a new configuration is created,
it can be associated with one or more deployments. For more information about model
deployment, see the resource deployment guide.

The configurability feature is available in preview and allows customers to adjust the
settings, separately for prompts and completions, to filter content for each content
category at different severity levels as described in the table below. Content detected at
the 'safe' severity level is labeled in annotations but is not subject to filtering and is not
configurable.

Severity filtered | Configurable for prompts | Configurable for completions | Description

Low, medium, high | Yes | Yes | Strictest filtering configuration. Content detected at severity levels low, medium, and high is filtered.

Medium, high | Yes | Yes | Default setting. Content detected at severity level low is not filtered; content at medium and high is filtered.

High | If approved* | If approved* | Content detected at severity levels low and medium is not filtered. Only content at severity level high is filtered. Requires approval*.

No filters | If approved* | If approved* | No content is filtered regardless of severity level detected. Requires approval*.

* Only approved customers have full content filtering control, including configuring
content filters at severity level high only or turning the content filters off. Managed
customers only can apply for full content filtering control via this form: Azure OpenAI
Limited Access Review: Modified Content Filters and Abuse Monitoring
(microsoft.com)

Configuring content filters via Azure OpenAI Studio (preview)
The following steps show how to set up a customized content filtering configuration for
your resource.

1. Go to Azure OpenAI Studio and navigate to the Content Filters tab (in the bottom
left navigation, as designated by the red box below).

2. Create a new customized content filtering configuration.

This leads to the following configuration view, where you can choose a name for
the custom content filtering configuration.

3. This is the view of the default content filtering configuration, where content is
filtered at medium and high severity levels for all categories. You can modify the
content filtering severity level for both prompts and completions separately
(configuration for prompts is in the left column and configuration for completions
is in the right column, as designated with the blue boxes below) for each of the
four content categories (content categories are listed on the left side of the screen,
as designated with the green box below). There are three severity levels for each
category that are partially or fully configurable: Low, medium, and high (labeled at
the top of each column, as designated with the red box below).

4. If you determine that your application or usage scenario requires stricter filtering
for some or all content categories, you can configure the settings, separately for
prompts and completions, to filter at more severity levels than the default setting.
An example is shown in the image below, where the filtering level for user prompts
is set to the strictest configuration for hate and sexual, with low severity content
filtered along with content classified as medium and high severity (outlined in the
red box below). In the example, the filtering levels for model completions are set at
the strictest configuration for all content categories (blue box below). With this
modified filtering configuration in place, low, medium, and high severity content
will be filtered for the hate and sexual categories in user prompts; medium and
high severity content will be filtered for the self-harm and violence categories in
user prompts; and low, medium, and high severity content will be filtered for all
content categories in model completions.

5. If your use case was approved for modified content filters as outlined above, you
will receive full control over content filtering configurations. With full control, you
can choose to turn filtering off, or filter only at severity level high, while accepting
low and medium severity content. In the image below, filtering for the categories
of self-harm and violence is turned off for user prompts (red box below), while
default configurations are retained for other categories for user prompts. For
model completions, only high severity content is filtered for the category self-harm
(blue box below), and filtering is turned off for violence (green box below), while
default configurations are retained for other categories.

You can create multiple content filtering configurations as per your requirements.

6. Next, to make a custom content filtering configuration operational, assign a configuration to one or more deployments in your resource. To do this, go to the Deployments tab and select Edit deployment (outlined near the top of the screen in a red box below).

7. Go to advanced options (outlined in the blue box below) select the content filter
configuration suitable for that deployment from the Content Filter dropdown
(outlined near the bottom of the dialog box in the red box below).

8. Select Save and close to apply the selected configuration to the deployment.

9. You can also edit and delete a content filter configuration if required. To do this,
navigate to the content filters tab and select the desired action (options outlined
near the top of the screen in the red box below). You can edit/delete only one
filtering configuration at a time.


7 Note

Before deleting a content filtering configuration, you will need to unassign it from any deployment in the Deployments tab.

Best practices
We recommend informing your content filtering configuration decisions through an
iterative identification (for example, red team testing, stress-testing, and analysis) and
measurement process to address the potential harms that are relevant for a specific
model, application, and deployment scenario. After implementing mitigations such as
content filtering, repeat measurement to test effectiveness. Recommendations and best
practices for Responsible AI for Azure OpenAI, grounded in the Microsoft Responsible AI
Standard can be found in the Responsible AI Overview for Azure OpenAI.

Next steps
Learn more about Responsible AI practices for Azure OpenAI: Overview of
Responsible AI practices for Azure OpenAI models.
Read more about content filtering categories and severity levels with Azure
OpenAI Service.
Learn more about red teaming from our: Introduction to red teaming large
language models (LLMs) article.
Codex models and Azure OpenAI
Service
Article • 02/17/2023

The Codex model series is a descendant of our GPT-3 series that's been trained on both
natural language and billions of lines of code. It's most capable in Python and proficient
in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift,
TypeScript, SQL, and even Shell.

You can use Codex for a variety of tasks including:

Turn comments into code


Complete your next line or function in context
Bring knowledge to you, such as finding a useful library or API call for an
application
Add comments
Rewrite code for efficiency

How to use the Codex models


Here are a few examples of using Codex that can be tested in Azure OpenAI Studio's
playground with a deployment of a Codex series model, such as code-davinci-002 .

Saying "Hello" (Python)


Python

"""

Ask the user for their name and say "Hello"

"""

Create random names (Python)


Python

"""

1. Create a list of first names

2. Create a list of last names

3. Combine them randomly into a list of 100 full names

"""

Create a MySQL query (Python)


Python

"""

Table customers, columns = [CustomerId, FirstName, LastName, Company,


Address, City, State, Country, PostalCode, Phone, Fax, Email, SupportRepId]

Create a MySQL query for all customers in Texas named Jane

"""

query =

Explaining code (JavaScript)


JavaScript

// Function 1

var fullNames = [];

for (var i = 0; i < 50; i++) {
    fullNames.push(names[Math.floor(Math.random() * names.length)]
        + " " + lastNames[Math.floor(Math.random() * lastNames.length)]);
}

// What does Function 1 do?

Best practices

Start with a comment, data or code


You can experiment using one of the Codex models in our playground (styling
instructions as comments when needed).

To get Codex to create a useful completion, it's helpful to think about what information
a programmer would need to perform a task. This could simply be a clear comment or
the data needed to write a useful function, like the names of variables or what class a
function handles.

In this example we tell Codex what to call the function and what task it's going to
perform.

Python

# Create a function called 'nameImporter' to add a first and last name to


the database

This approach scales even to the point where you can provide Codex with a comment
and an example of a database schema to get it to write useful query requests for various
databases. Here's an example where we provide the columns and table names for the
query.

Python

# Table albums, columns = [AlbumId, Title, ArtistId]

# Table artists, columns = [ArtistId, Name]

# Table media_types, columns = [MediaTypeId, Name]

# Table playlists, columns = [PlaylistId, Name]

# Table playlist_track, columns = [PlaylistId, TrackId]

# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId,


Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums with more than 10 tracks

When you show Codex the database schema, it's able to make an informed guess about
how to format a query.

Specify the programming language


Codex understands dozens of different programming languages. Many share similar
conventions for comments, functions and other programming syntax. By specifying the
language and what version in a comment, Codex is better able to provide a completion
for what you want. That said, Codex is fairly flexible with style and syntax. Here's an
example for R and Python.

# R language

# Calculate the mean distance between an array of points

Python

# Python 3

# Calculate the mean distance between an array of points

Prompt Codex with what you want it to do


If you want Codex to create a webpage, placing the first line of code in an HTML
document ( <!DOCTYPE html> ) after your comment tells Codex what it should do next.
The same method works for creating a function from a comment (following the
comment with a new line starting with func or def).

HTML

<!-- Create a web page with the title 'Kat Katman attorney at paw' -->

<!DOCTYPE html>

Placing <!DOCTYPE html> after our comment makes it very clear to Codex what we want
it to do.

Or if we want to write a function we could start the prompt as follows and Codex will
understand what it needs to do next.

Python

# Create a function to count to 100

def counter

Specifying libraries will help Codex understand what you


want
Codex is aware of a large number of libraries, APIs and modules. By telling Codex which
ones to use, either from a comment or importing them into your code, Codex will make
suggestions based upon them instead of alternatives.

HTML

<!-- Use A-Frame version 1.2.0 to create a 3D website -->

<!-- https://aframe.io/releases/1.2.0/aframe.min.js -->

By specifying the version, you can make sure Codex uses the most current library.

7 Note

Codex can suggest helpful libraries and APIs, but always be sure to do your own
research to make sure that they're safe for your application.

Comment style can affect code quality


With some languages, the style of comments can improve the quality of the output. For
example, when working with Python, in some cases using doc strings (comments
wrapped in triple quotes) can give higher quality results than using the pound ( # )
symbol.

Python

"""

Create an array of users and email addresses

"""

Comments inside of functions can be helpful


Recommended coding standards usually suggest placing the description of a function
inside the function. Using this format helps Codex more clearly understand what you
want the function to do.

Python

def getUserBalance(id):

"""

Look up the user in the database 'UserData' and return their current
account balance.

"""

Provide examples for more precise results


If you have a particular style or format you need Codex to use, providing examples or
demonstrating it in the first part of the request will help Codex more accurately match
what you need.

Python

"""

Create a list of random animals and species

"""

animals = [ {"name": "Chomper", "species": "Hamster"}, {"name":

Lower temperatures give more precise results


Setting the API temperature to 0, or close to zero (such as 0.1 or 0.2) tends to give
better results in most cases. Unlike GPT-3 models, where a higher temperature can
provide useful creative and random results, higher temperatures with Codex models
may give you really random or erratic responses.

In cases where you need Codex to provide different potential results, start at zero and
then increment upwards by 0.1 until you find suitable variation.

Organize tasks into functions


We can get Codex to write functions by specifying what the function should do in as
precise terms as possible in comment. For example, by writing the following comment,
Codex creates a JavaScript timer function that's triggered when a user presses a button:

A simple JavaScript timer

JavaScript

// Create a timer that creates an alert in 10 seconds

Creating example data


Testing applications often requires using example data. Because Codex is a language
model that understands how to comprehend and write natural language, you can ask
Codex to create data like arrays of made up names, products and other variables. For
example, here we ask Codex to create an array of weather temperatures.

JavaScript

/* Create an array of weather temperatures for San Francisco */

Asking Codex to perform this task will produce a table like this:

JavaScript

var weather = [

{ month: 'January', high: 58, low: 48 },

{ month: 'February', high: 61, low: 50 },

{ month: 'March', high: 64, low: 53 },

{ month: 'April', high: 67, low: 55 },

{ month: 'May', high: 70, low: 58 },

{ month: 'June', high: 73, low: 61 },

{ month: 'July', high: 76, low: 63 },

{ month: 'August', high: 77, low: 64 },

{ month: 'September', high: 76, low: 63 },

{ month: 'October', high: 73, low: 61 },

{ month: 'November', high: 68, low: 57 },

{ month: 'December', high: 64, low: 54 }

];

Compound functions and small applications


We can provide Codex with a comment consisting of a complex request like creating a
random name generator or performing tasks with user input and Codex can generate
the rest provided there are enough tokens.

JavaScript

/*

Create a list of animals

Create a list of cities

Use the lists to generate stories about what I saw at the zoo in each city

*/

Limit completion size for more precise results or lower


latency
Requesting longer completions in Codex can lead to imprecise answers and repetition.
Limit the size of the query by reducing max_tokens and setting stop tokens. For
instance, add \n as a stop sequence to limit completions to one line of code. Smaller
completions also incur less latency.
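
As a rough sketch of that request shape (the deployment name is a placeholder for your own Codex deployment, and openai is assumed to be imported and configured as in the earlier articles):

Python

# Sketch: keep a Codex completion to a single line by limiting max_tokens and
# stopping at the end of the line. "code-davinci-002" is a placeholder deployment name.
response = openai.Completion.create(
    engine="code-davinci-002",
    prompt="# Python 3\n# One-line expression that reverses the string s\nreversed_s = ",
    temperature=0,
    max_tokens=32,   # small completion: lower latency and less repetition
    stop=["\n"]      # cut the completion off at the end of the line
)

print(response['choices'][0]['text'])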

Use streaming to reduce latency


Large Codex queries can take tens of seconds to complete. To build applications that
require lower latency, such as coding assistants that perform autocompletion, consider
using streaming. Responses will be returned before the model finishes generating the
entire completion. Applications that need only part of a completion can reduce latency
by cutting off a completion either programmatically or by using creative values for stop .

Users can combine streaming with duplication to reduce latency by requesting more
than one solution from the API, and using the first response returned. Do this by setting
n > 1 . This approach consumes more token quota, so use carefully (for example, by
using reasonable settings for max_tokens and stop ).
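
A minimal streaming sketch under the same assumptions (placeholder deployment name, openai already imported and configured) might look like this:

Python

# Sketch: stream a Codex completion and print partial text as it arrives,
# rather than waiting for the full response. "code-davinci-002" is a placeholder
# deployment name.
response = openai.Completion.create(
    engine="code-davinci-002",
    prompt="# Python 3\n# Create a function that reverses a string\n",
    temperature=0,
    max_tokens=100,
    stream=True
)

for chunk in response:
    # Each streamed chunk carries a partial piece of the completion text.
    if chunk['choices']:
        print(chunk['choices'][0]['text'], end="", flush=True)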

Use Codex to explain code


Codex's ability to create and understand code allows us to use it to perform tasks like
explaining what the code in a file does. One way to accomplish this is by putting a
comment after a function that starts with "This function" or "This application is." Codex
will usually interpret this as the start of an explanation and complete the rest of the text.

JavaScript

/* Explain what the previous function is doing: It

Explaining an SQL query


In this example, we use Codex to explain in a human readable format what an SQL query
is doing.

SQL

SELECT DISTINCT department.name

FROM department

JOIN employee ON department.id = employee.department_id

JOIN salary_payments ON employee.id = salary_payments.employee_id

WHERE salary_payments.date BETWEEN '2020-06-01' AND '2020-06-30'

GROUP BY department.name

HAVING COUNT(employee.id) > 10;

-- Explanation of the above query in human readable format

--

Writing unit tests


Creating a unit test can be accomplished in Python simply by adding the comment "Unit
test" and starting a function.

Python

# Python 3

def sum_numbers(a, b):

return a + b

# Unit test

def

Checking code for errors


By using examples, you can show Codex how to identify errors in code. In some cases no
examples are required; however, demonstrating the level of detail to provide in a
description can help Codex understand what to look for and how to explain it. (A check
by Codex for errors shouldn't replace careful review by the user.)

JavaScript

/* Explain why the previous function doesn't work. */

Using source data to write database functions


Just as a human programmer would benefit from understanding the database structure
and the column names, Codex can use this data to help you write accurate query
requests. In this example, we insert the schema for a database and tell Codex what to
query the database for.

Python

# Table albums, columns = [AlbumId, Title, ArtistId]

# Table artists, columns = [ArtistId, Name]

# Table media_types, columns = [MediaTypeId, Name]

# Table playlists, columns = [PlaylistId, Name]

# Table playlist_track, columns = [PlaylistId, TrackId]

# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId,


Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums with more than 10 tracks

Converting between languages


You can get Codex to convert from one language to another by following a simple
format where you list the language of the code you want to convert in a comment,
followed by the code and then a comment with the language you want it translated
into.

Python

# Convert this from Python to R

# Python version

[ Python code ]

# End

# R version

Rewriting code for a library or framework


If you want Codex to make a function more efficient, you can provide it with the code to
rewrite followed by an instruction on what format to use.

JavaScript

// Rewrite this as a React component

var input = document.createElement('input');

input.setAttribute('type', 'text');

document.body.appendChild(input);
var button = document.createElement('button');

button.innerHTML = 'Say Hello';

document.body.appendChild(button);

button.onclick = function() {

var name = input.value;

var hello = document.createElement('div');

hello.innerHTML = 'Hello ' + name;

document.body.appendChild(hello);

};

// React version:

Next steps
Learn more about the underlying models that power Azure OpenAI.
Learn how to generate embeddings with
Azure OpenAI
Article • 05/16/2023

An embedding is a special format of data representation that can be easily utilized by


machine learning models and algorithms. The embedding is an information dense
representation of the semantic meaning of a piece of text. Each embedding is a vector
of floating point numbers, such that the distance between two embeddings in the
vector space is correlated with semantic similarity between two inputs in the original
format. For example, if two texts are similar, then their vector representations should
also be similar.

How to get embeddings


To obtain an embedding vector for a piece of text, we make a request to the
embeddings endpoint as shown in the following code snippets:

Console

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2023-05-15 \
  -H 'Content-Type: application/json' \
  -H 'api-key: YOUR_API_KEY' \
  -d '{"input": "Sample Document goes here"}'

Best practices

Verify inputs don't exceed the maximum length


The maximum length of input text for our embedding models is 2048 tokens (equivalent
to around 2-3 pages of text). You should verify that your inputs don't exceed this limit
before making a request.
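
One way to do that, sketched below, is to count tokens with the tiktoken library before calling the API. The choice of encoding here is an assumption and may not match every embedding model.

Python

import tiktoken

# Assumption: cl100k_base is the encoding used by text-embedding-ada-002.
# Older embedding models may use a different encoding.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Sample Document goes here"
token_count = len(encoding.encode(text))

if token_count > 2048:
    print(f"Input is {token_count} tokens; shorten or split it before requesting an embedding.")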

Choose the best model for your task


For the search models, you can obtain embeddings in two ways. The <search_model>-doc model is used for longer pieces of text (to be searched over) and the <search_model>-query model is used for shorter pieces of text, typically queries or class labels in zero shot classification. You can read more about all of the Embeddings models in our Models guide.

Replace newlines with a single space


Unless you're embedding code, we suggest replacing newlines (\n) in your input with a
single space, as we have observed inferior results when newlines are present.
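
For example, a simple preprocessing step might look like this:

Python

# Sketch: collapse newlines to single spaces before requesting an embedding.
raw_text = "First line of the document\nSecond line of the document"
text = raw_text.replace("\n", " ")  # "First line of the document Second line of the document"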

Limitations & risks


Our embedding models may be unreliable or pose social risks in certain cases, and may
cause harm in the absence of mitigations. Review our Responsible AI content for more
information on how to approach their use responsibly.

Next steps
Learn more about using Azure OpenAI and embeddings to perform document
search with our embeddings tutorial.
Learn more about the underlying models that power Azure OpenAI.
Learn how to prepare your dataset for
fine-tuning
Article • 03/13/2023

The first step of customizing your model is to prepare a high quality dataset. To do this
you'll need a set of training examples composed of single input prompts and the
associated desired output ('completion'). This format is notably different than using
models during inference in the following ways:

- Only provide a single prompt vs a few examples.
- You don't need to provide detailed instructions as part of the prompt.
- Each prompt should end with a fixed separator to inform the model when the prompt ends and the completion begins. A simple separator that generally works well is \n\n###\n\n . The separator shouldn't appear elsewhere in any prompt.
- Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
- Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be \n , ### , or any other token that doesn't appear in any completion.
- For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
- The dataset cannot exceed 100 MB in total file size.
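
Putting these conventions together, a single training example might look like the following. The review text and sentiment label are purely illustrative:

JSON

{"prompt": "Great coffee and friendly staff!\n\n###\n\n", "completion": " positive END"}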

Best practices
Customization performs better with high-quality examples and the more you have,
generally the better the model performs. We recommend that you provide at least a few
hundred high-quality examples to achieve a model that performs better than using well-
designed prompts with a base model. From there, performance tends to linearly
increase with every doubling of the number of examples. Increasing the number of
examples is usually the best and most reliable way of improving performance.

If you're fine-tuning on a pre-existing dataset rather than writing prompts from scratch,
be sure to manually review your data for offensive or inaccurate content if possible, or
review as many random samples of the dataset as possible if it's large.
Specific guidelines
Fine-tuning can solve various problems, and the optimal way to use it may depend on
your specific use case. Below, we've listed the most common use cases for fine-tuning
and corresponding guidelines.

Classification
Classifiers are the easiest models to get started with. For classification problems we
suggest using ada, which generally tends to perform only very slightly worse than more
capable models once fine-tuned, while being significantly faster. In classification
problems, each prompt in the dataset should be classified into one of the predefined
classes. For this type of problem, we recommend:

- Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.
- Choose classes that map to a single token. At inference time, specify max_tokens=1 since you only need the first token for classification.
- Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator.
- Aim for at least 100 examples per class.
- To get class log probabilities, you can specify logprobs=5 (for five classes) when using your model.
- Ensure that the dataset used for fine-tuning is very similar in structure and type of task as what the model will be used for.

Case study: Is the model making untrue statements?


Let's say you'd like to ensure that the text of the ads on your website mentions the
correct product and company. In other words, you want to ensure the model isn't
making things up. You may want to fine-tune a classifier which filters out incorrect ads.

The dataset might look something like the following:

JSON

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}

{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

In the example above, we used a structured input containing the name of the company,
the product, and the associated ad. As a separator we used \nSupported: which clearly
separated the prompt from the completion. With a sufficient number of examples, the
separator you choose doesn't make much of a difference (usually less than 0.4%) as long
as it doesn't appear within the prompt or the completion.

For this use case we fine-tuned an ada model since it is faster and cheaper, and the
performance is comparable to larger models because it's a classification task.

Now we can query our model by making a Completion request.

Console

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15 \
  -H 'Content-Type: application/json' \
  -H 'api-key: YOUR_API_KEY' \
  -d '{
        "prompt": "Company: Reliable accountants Ltd\nProduct: Personal Tax help\nAd:Best advice in town!\nSupported:",
        "max_tokens": 1
      }'

Which will return either yes or no .

Case study: Sentiment analysis

Let's say you'd like to get the degree to which a particular tweet is positive or negative.
The dataset might look something like the following:

Console

{"prompt":"Overjoyed with the new iPhone! ->", "completion":" positive"}

{"prompt":"@contoso_basketball disappoint for a third straight night. ->",


"completion":" negative"}

Once the model is fine-tuned, you can get back the log probabilities for the first
completion token by setting logprobs=2 on the completion request. The higher the
probability for positive class, the higher the relative sentiment.

Now we can query our model by making a Completion request.

Console
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15 \

-H 'Content-Type: application/json' \

-H 'api-key: YOUR_API_KEY' \

-d '{

"prompt": "Excited to share my latest blog post! ->",

"max_tokens": 1,

"logprobs": 2

}'

Which will return:

JSON

{
    "object": "text_completion",
    "created": 1589498378,
    "model": "YOUR_FINE_TUNED_MODEL_NAME",
    "choices": [
        {
            "logprobs": {
                "text_offset": [
                    19
                ],
                "token_logprobs": [
                    -0.03597255
                ],
                "tokens": [
                    " positive"
                ],
                "top_logprobs": [
                    {
                        " negative": -4.9785037,
                        " positive": -0.03597255
                    }
                ]
            },
            "text": " positive",
            "index": 0,
            "finish_reason": "length"
        }
    ]
}
Case study: Categorization for Email triage


Let's say you'd like to categorize incoming email into one of a large number of
predefined categories. For classification into a large number of categories, we
recommend you convert those categories into numbers, which will work well with up to
approximately 500 categories. We've observed that adding a space before the number
sometimes slightly helps the performance, due to tokenization. You may want to
structure your training data as follows:

JSON

{"prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:<date>\nContent:<email_body>\n\n###\n\n", "completion":" <numerical_category>"}

For example:

JSON

{"prompt":"Subject: Update my address\nFrom:Joe Doe\nTo:support@ourcompany.com\nDate:2021-06-03\nContent:Hi,\nI would like to update my billing address to match my delivery address.\n\nPlease let me know once done.\n\nThanks,\nJoe\n\n###\n\n", "completion":" 4"}

In the example above we used an incoming email capped at 2043 tokens as input. (This
allows for a four token separator and a one token completion, summing up to 2048.) As
a separator we used \n\n###\n\n and we removed any occurrence of ### within the
email.

Conditional generation
Conditional generation is a problem where the content needs to be generated given
some kind of input. This includes paraphrasing, summarizing, entity extraction, product
description writing given specifications, chatbots and many others. For this type of
problem we recommend:

- Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.
- Use an ending token at the end of the completion, for example, END .
- Remember to add the ending token as a stop sequence during inference, for example, stop=[" END"] .
- Aim for at least ~500 examples.
- Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator.
- Ensure the examples are of high quality and follow the same desired format.
- Ensure that the dataset used for fine-tuning is similar in structure and type of task as what the model will be used for.
- Use a lower learning rate and only 1-2 epochs; this tends to work better for these use cases.

Case study: Write an engaging ad based on a Wikipedia article


This is a generative use case so you would want to ensure that the samples you provide
are of the highest quality, as the fine-tuned model will try to imitate the style (and
mistakes) of the given examples. A good starting point is around 500 examples. A
sample dataset might look like this:

JSON

{"prompt":"<Product Name>\n<Wikipedia description>\n\n###\n\n", "completion":" <engaging ad> END"}

For example:

JSON

{"prompt":"Samsung Galaxy Feel\nThe Samsung Galaxy Feel is an Android smartphone developed by Samsung Electronics exclusively for the Japanese market. The phone was released in June 2017 and was sold by NTT Docomo. It runs on Android 7.0 (Nougat), has a 4.7 inch display, and a 3000 mAh battery.\nSoftware\nSamsung Galaxy Feel runs on Android 7.0 (Nougat), but can be later updated to Android 8.0 (Oreo).\nHardware\nSamsung Galaxy Feel has a 4.7 inch Super AMOLED HD display, 16 MP back facing and 5 MP front facing cameras. It has a 3000 mAh battery, a 1.6 GHz Octa-Core ARM Cortex-A53 CPU, and an ARM Mali-T830 MP1 700 MHz GPU. It comes with 32GB of internal storage, expandable to 256GB via microSD. Aside from its software and hardware specifications, Samsung also introduced a unique hole in the phone's shell to accommodate the Japanese perceived penchant for personalizing their mobile phones. The Galaxy Feel's battery was also touted as a major selling point since the market favors handsets with longer battery life. The device is also waterproof and supports 1seg digital broadcasts using an antenna that is sold separately.\n\n###\n\n", "completion":"Looking for a smartphone that can do it all? Look no further than Samsung Galaxy Feel! With a slim and sleek design, our latest smartphone features high-quality picture and video capabilities, as well as an award winning battery life. END"}

Here we used a multiline separator, as Wikipedia articles contain multiple paragraphs


and headings. We also used a simple end token, to ensure that the model knows when
the completion should finish.

Case study: Entity extraction


This is similar to a language transformation task. To improve the performance, it's best
to either sort different extracted entities alphabetically or in the same order as they
appear in the original text. This helps the model to keep track of all the entities which
need to be generated in order. The dataset could look as follows:

JSON

{"prompt":"<any text, for example news article>\n\n###\n\n", "completion":" <list of entities, separated by a newline> END"}

For example:

JSON

{"prompt":"Portugal will be removed from the UK's green travel list from Tuesday, amid rising coronavirus cases and concern over a \"Nepal mutation of the so-called Indian variant\". It will join the amber list, meaning holidaymakers should not visit and returnees must isolate for 10 days...\n\n###\n\n", "completion":" Portugal\nUK\nNepal mutation\nIndian variant END"}

A multi-line separator works best, as the text will likely contain multiple lines. Ideally
there will be a high diversity of the types of input prompts (news articles, Wikipedia
pages, tweets, legal documents), which reflect the likely texts which will be encountered
when extracting entities.

Case study: Customer support chatbot

A chatbot will normally contain relevant context about the conversation (order details),
summary of the conversation so far, and most recent messages. For this use case the
same past conversation can generate multiple rows in the dataset, each time with a
slightly different context, for every agent generation as a completion. This use case
requires a few thousand examples, as it likely deals with different types of requests, and
customer issues. To ensure the performance is of high quality, we recommend vetting
the conversation samples to ensure the quality of agent messages. The summary can be
generated with a separate text transformation fine tuned model. The dataset could look
as follows:

JSON

{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}

{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}

Here we purposefully separated different types of input information, but maintained the Customer/Agent dialog in the same format between a prompt and a completion. All the
completions should only be by the agent, and we can use \n as a stop sequence when
doing inference.

Case study: Product description based on a technical list of


properties

Here it's important to convert the input data into natural language, which will likely lead to superior performance. For example, the following format:

JSON

{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":"This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}

Won't work as well as:

JSON

{"prompt":"Item is a handbag. Colour is army green. Price is midrange. Size is small.->", "completion":"This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}

For high performance, ensure that the completions were based on the description
provided. If external content is often consulted, then adding such content in an
automated way would improve the performance. If the description is based on images,
it may help to use an algorithm to extract a textual description of the image. Since
completions are only one sentence long, we can use . as the stop sequence during
inference.

Open ended generation


For this type of problem we recommend:

Leave the prompt empty.


No need for any separators.
You'll normally want a large number of examples, at least a few thousand.
Ensure the examples cover the intended domain or the desired tone of voice.

Case study: Maintaining company voice

Many companies have a large amount of high quality content generated in a specific
voice. Ideally all generations from our API should follow that voice for the different use
cases. Here we can use the trick of leaving the prompt empty, and feeding in all the
documents which are good examples of the company voice. A fine-tuned model can be
used to solve many different use cases with similar prompts to the ones used for base
models, but the outputs are going to follow the company voice much more closely than
previously.

JSON

{"prompt":"", "completion":" <company voice textual content>"}

{"prompt":"", "completion":" <company voice textual content2>"}

A similar technique could be used for creating a virtual character with a particular
personality, style of speech and topics the character talks about.

Generative tasks have the potential to leak training data when requesting completions from the model, so extra care needs to be taken to address this appropriately. For example, personal or sensitive company information should be replaced with generic information or not included in the fine-tuning data in the first place.
Next steps
Fine tune your model with our How-to guide
Learn more about the underlying models that power Azure OpenAI Service
Learn how to customize a model for
your application
Article • 04/05/2023

Azure OpenAI Service lets you tailor our models to your personal datasets using a
process known as fine-tuning. This customization step will let you get more out of the
service by providing:

Higher quality results than what you can get just from prompt design
The ability to train on more examples than can fit into a prompt
Lower-latency requests

A customized model improves on the few-shot learning approach by training the


model's weights on your specific prompts and structure. The customized model lets you
achieve better results on a wider number of tasks without needing to provide examples
in your prompt. The result is less text sent and fewer tokens processed on every API call,
saving cost and improving request latency.

7 Note

There is a breaking change in the create fine tunes command in the latest 12-01-
2022 GA API. For the latest command syntax consult the reference documentation

Prerequisites
An Azure subscription - Create one for free

Access granted to Azure OpenAI in the desired Azure subscription

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource

For more information about creating a resource, see Create a resource and deploy
a model using Azure OpenAI.

Fine-tuning workflow
The fine-tuning workflow in Azure OpenAI Studio requires the following steps:

1. Prepare your training and validation data


2. Use the Create customized model wizard in Azure OpenAI Studio to train your
customized model
a. Select a base model
b. Choose your training data
c. Optionally, choose your validation data
d. Optionally, choose advanced options for your fine-tune job
e. Review your choices and train your new customized model
3. Check the status of your customized model
4. Deploy your customized model for use
5. Use your customized model
6. Optionally, analyze your customized model for performance and fit

Prepare your training and validation data


Your training data and validation data sets consist of input & output examples for how
you would like the model to perform.

The training and validation data you use must be formatted as a JSON Lines (JSONL)
document in which each line represents a single prompt-completion pair. The OpenAI
command-line interface (CLI) includes a data preparation tool that validates, gives
suggestions, and reformats your training data into a JSONL file ready for fine-tuning.

Here's an example of the training data format:

JSON

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

In addition to the JSONL format, training and validation data files must be encoded in
UTF-8 and include a byte-order mark (BOM), and the file must be less than 200 MB in
size. For more information about formatting your training data, see Learn how to
prepare your dataset for fine-tuning.

Creating your training and validation datasets


Designing your prompts and completions for fine-tuning is different from designing
your prompts for use with any of our GPT-3 base models. Prompts for completion calls
often use either detailed instructions or few-shot learning techniques, and consist of
multiple examples. For fine-tuning, we recommend that each training example consists
of a single input prompt and its desired completion output. You don't need to give
detailed instructions or multiple completion examples for the same prompt.

The more training examples you have, the better. We recommend having at least 200
training examples. In general, we've found that each doubling of the dataset size leads
to a linear increase in model quality.

For more information about preparing training data for various tasks, see Learn how to
prepare your dataset for fine-tuning.

OpenAI CLI data preparation tool


We recommend using OpenAI's command-line interface (CLI) to assist with many of the
data preparation steps. OpenAI has developed a tool that validates, gives suggestions,
and reformats your data into a JSONL file ready for fine-tuning.

To install the CLI, run the following command:

Console

pip install --upgrade openai

To analyze your training data with the data preparation tool, run the following
command, replacing <LOCAL_FILE> with the full path and file name of the training data
file to be analyzed:

Console

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

This tool accepts files in the following data formats, if they contain a prompt and a
completion column/key:

Comma-separated values (CSV)


Tab-separated values (TSV)
Microsoft Excel workbook (XLSX)
JavaScript Object Notation (JSON)
JSON Lines (JSONL)

The tool reformats your training data and saves output into a JSONL file ready for fine-
tuning, after guiding you through the process of implementing suggested changes.
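
If you prefer to build the JSONL file yourself, the following is a minimal sketch using pandas. It assumes a local file named training_data.csv with prompt and completion columns; the file and column names are placeholders for your own data:

Python

import json

import pandas as pd

df = pd.read_csv("training_data.csv")  # expects "prompt" and "completion" columns

# utf-8-sig writes the UTF-8 byte-order mark (BOM) that the service expects
with open("training_data_prepared.jsonl", "w", encoding="utf-8-sig") as f:
    for _, row in df.iterrows():
        record = {"prompt": row["prompt"], "completion": row["completion"]}
        f.write(json.dumps(record) + "\n")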
Use the Create customized model wizard
Azure OpenAI Studio provides the Create customized model wizard, so you can
interactively create and train a fine-tuned model for your Azure resource.

Go to the Azure OpenAI Studio


Navigate to the Azure OpenAI Studio at https://oai.azure.com/ and sign in with
credentials that have access to your Azure OpenAI resource. During the sign-in
workflow, select the appropriate directory, Azure subscription, and Azure OpenAI
resource.

Landing page
You'll first land on our main page for Azure OpenAI Studio. From here, you can start
fine-tuning a custom model.

Select the Start fine-tuning a custom model button under Manage deployments and
models section of the landing page, highlighted in the following picture, to start fine-
tuning a custom model.

7 Note

If your resource doesn't have a model already deployed in it, a warning is


displayed. You can ignore that warning for the purposes of fine-tuning a model,
because you'll be fine-tuning and deploying a new customized model.

Start the wizard from the Models page


To create a customized model, select the Create customized model button under the
Provided models section on the Models page, highlighted in the following picture, to
start the Create customized model wizard.

Select a base model


The first step in creating a customized model is to choose a base model. The Base
model pane lets you choose a base model to use for your customized model, and the
choice influences both the performance and the cost of your model. You can create a
customized model from one of the following available base models:

ada

babbage

curie

code-cushman-001 *

davinci *

* currently unavailable for new customers.

For more information about our base models that can be fine-tuned, see Models. Select
a base model from the Base model type dropdown, as shown in the following picture,
and then select Next to continue.

Choose your training data


The next step is to either choose existing prepared training data or upload new
prepared training data to use when customizing your model. The Training data pane,
shown in the following picture, displays any existing, previously uploaded datasets and
provides options by which you can upload new training data.

If your training data has already been uploaded to the service, select Choose dataset,
and then select the file from the list shown in the Training data pane. Otherwise, select
either Local file to upload training data from a local file, or Azure blob or other shared
web locations to import training data from Azure Blob or another shared web location.

For large data files, we recommend you import from an Azure Blob store. Large files can
become unstable when uploaded through multipart forms because the requests are
atomic and can't be retried or resumed. For more information about Azure Blob storage,
see What is Azure Blob storage?

7 Note

Training data files must be formatted as JSONL files, encoded in UTF-8 with a byte-
order mark (BOM), and less than 200 MB in size.

To upload training data from a local file


You can upload a new training dataset to the service from a local file by using one of the
following methods:

Drag and drop the file into the client area of the Training data pane, and then
select Upload file
Select Browse for a file from the client area of the Training data pane, choose the
file to upload from the Open dialog, and then select Upload file.

After you've selected and uploaded the training dataset, select Next to optionally
choose your validation data.

To import training data from an Azure Blob store

You can import a training dataset from Azure Blob or another shared web location by
providing the name and location of the file, as shown in the following picture. Enter the
name of the file in File name and the Azure Blob URL, Azure Storage shared access
signature (SAS), or other link to an accessible shared web location that contains the file
in File location, then select Upload file to import the training dataset to the service.

After you've selected and uploaded the training dataset, select Next to optionally
choose your validation data.

Choose your validation data

You can now choose to optionally use validation data in the training process of your
fine-tuned model. If you don't want to use validation data, you can choose Next to
choose advanced options for your model. Otherwise, if you have a validation dataset,
you can either choose existing prepared validation data or upload new prepared
validation data to use when customizing your model. The Validation data pane, shown
in the following picture, displays any existing, previously uploaded training and
validation datasets and provides options by which you can upload new validation data.

If your validation data has already been uploaded to the service, select Choose dataset,
and then select the file from the list shown in the Validation data pane. Otherwise,
select either Local file to upload validation data from a local file, or Azure blob or other
shared web locations to import validation data from Azure Blob or another shared web
location.

For large data files, we recommend you import from an Azure Blob store. Large files can
become unstable when uploaded through multipart forms because the requests are
atomic and can't be retried or resumed.

7 Note

Like training data files, validation data files must be formatted as JSONL files,
encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.

To upload validation data from a local file

You can upload a new validation dataset to the service from a local file by using one of
the following methods:
Drag and drop the file into the client area of the Validation data pane, and then
select Upload file
Select Browse for a file from the client area of the Validation data pane, choose
the file to upload from the Open dialog, and then select Upload file.

After you've uploaded the validation dataset, select Next to optionally choose advanced
options.

To import validation data from an Azure Blob store

You can import a validation dataset from Azure Blob or another shared web location by
providing the name and location of the file, as shown in the following picture. Enter the
name of the file in File name and the Azure Blob URL, Azure Storage shared access
signature (SAS), or other link to an accessible shared web location that contains the file
in File location, then select Upload file to import the validation dataset to the service.

After you've imported the validation dataset, select Next to optionally choose advanced
options.

Choose advanced options


You can either use default values for the hyperparameters of the fine-tune job that the
wizard runs to train your fine-tuned model, or you can adjust those hyperparameters for
your customization needs in the Advanced options pane, shown in the following
picture.

Either select Default to use the default values for the fine-tune job, or select Advanced
to display and edit the hyperparameter values, as shown in the following picture.


The following hyperparameters are available:

Number of epochs: The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

Batch size: The batch size to use for training. The batch size is the number of training examples used to train a single forward and backward pass.

Learning rate multiplier: The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training, multiplied by this value.

Prompt loss weight: The weight to use for loss on the prompt tokens. This value controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0). Increasing this value can add a stabilizing effect to training when completions are short.

For more information about these hyperparameters, see the Create a Fine tune job
section of the REST API documentation.

After you've chosen either default or advanced options, select Next to review your
choices and train your fine-tuned model.

Review your choices and train your model


The Review and train pane of the wizard displays information about the choices you've
made in the Create customized model wizard for your fine-tuned model, as shown in
the following picture.

If you're ready to train your model, select Save and close to start the fine-tune job and
return to the Models page.

Check the status of your customized model


The Models page displays information about your customized model in the Customized
models tab, as shown in the following picture. The tab includes information about the
status and job ID of the fine-tune job for your customized model. When the job is
completed, the file ID of the result file is also displayed.


After you've started a fine-tune job, it may take some time to complete. Your job may be
queued behind other jobs on our system, and training your model can take minutes or
hours depending on the model and dataset size. You can check the status of the fine-
tune job for your customized model in the Status column of the Customized models
tab on the Models page, and you can select Refresh to update the information on that
page.

You can also select the name of the model from the Model name column of the Models
page to display more information about your customized model, including the status of
the fine-tune job, training results, training events, and hyperparameters used in the job.
You can select the Refresh button to refresh the information for your model, as shown in
the following picture.

From the model page, you can also select Download training file to download the
training data you used for the model, or select Download results to download the result
file attached to the fine-tune job for your model and analyze your customized model for
training and validation performance.

Deploy a customized model


When the fine-tune job has succeeded, you can deploy the customized model from the
Models pane. You must deploy your customized model to make it available for use with
completion calls.
) Important

After a customized model is deployed, if at any time the deployment remains


inactive for greater than fifteen (15) days, the deployment will automatically be
deleted. The deployment of a customized model is “inactive” if the model was
deployed more than fifteen (15) days ago and no completions or chat completions
calls were made to it during a continuous 15-day period. The deletion of an inactive
deployment does NOT delete or affect the underlying customized model, and the
customized model can be redeployed at any time. As described in Azure OpenAI
Service pricing , each customized (fine-tuned) model that is deployed incurs an
hourly hosting cost regardless of whether completions or chat completions calls are
being made to the model. To learn more about planning and managing costs with
Azure OpenAI, refer to our cost management guide.

7 Note

Only one deployment is permitted for a customized model. An error message is


displayed if you select an already-deployed customized model.

To deploy your customized model, select the customized model to be deployed and
then select Deploy model, as shown in the following picture.

The Deploy model dialog is presented, in which you can provide a name for the
deployment of your customized model. Enter a name in Deployment name and then
select Create to start the deployment of your customized model.

You can monitor the progress of your deployment from the Deployments pane of Azure
OpenAI Studio.

Use a deployed customized model


Once your customized model has been deployed, you can use it like any other deployed
model. For example, you can use the Playground pane of Azure OpenAI Studio to
experiment with your new deployment, as shown in the following picture. You can
continue to use the same parameters with your customized model, such as temperature
and frequency penalty, as you can with other deployed models.
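
For example, the following is a minimal sketch of a completion call against a deployed customized model using the pre-1.0 openai Python package; the resource name, key, deployment name, and prompt are placeholders:

Python

import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR_RESOURCE_NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    engine="YOUR_CUSTOM_DEPLOYMENT_NAME",  # the deployment name you chose
    prompt="Company: Reliable accountants Ltd\nProduct: Personal Tax help\nAd:Best advice in town!\nSupported:",
    max_tokens=1,
)
print(response["choices"][0]["text"])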

7 Note

As with all applications, we require a review process prior to going live.

Analyze your customized model


Azure OpenAI attaches a result file, named results.csv , to each fine-tune job once it's
completed. You can use the result file to analyze the training and validation performance
of your customized model. The file ID for the result file is listed for each customized
model in the Result file Id column of the Models pane for Azure OpenAI Studio. You can
use the file ID to identify and download the result file from the File Management pane
of Azure OpenAI Studio.

The result file is a CSV file containing a header row and a row for each training step
performed by the fine-tune job. The result file contains the following columns:

step: The number of the training step. A training step represents a single pass, forward and backward, on a batch of training data.

elapsed_tokens: The number of tokens the customized model has seen so far, including repeats.

elapsed_examples: The number of examples the model has seen so far, including repeats. Each example represents one element in that step's batch of training data. For example, if the Batch size parameter is set to 32 in the Advanced options pane, this value increments by 32 in each training step.

training_loss: The loss for the training batch.

training_sequence_accuracy: The percentage of completions in the training batch for which the model's predicted tokens exactly matched the true completion tokens. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]], this value is set to 0.67 (2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]].

training_token_accuracy: The percentage of tokens in the training batch that were correctly predicted by the model. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]], this value is set to 0.83 (5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]].

validation_loss: The loss for the validation batch.

validation_sequence_accuracy: The percentage of completions in the validation batch for which the model's predicted tokens exactly matched the true completion tokens. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]], this value is set to 0.67 (2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]].

validation_token_accuracy: The percentage of tokens in the validation batch that were correctly predicted by the model. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]], this value is set to 0.83 (5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]].
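
For example, one way to inspect these columns is with pandas. This is a minimal sketch that assumes you've downloaded the result file locally as results.csv:

Python

import pandas as pd

results = pd.read_csv("results.csv")

# Look at how loss and token accuracy evolved over the final training steps
print(results[["step", "training_loss", "training_token_accuracy"]].tail())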

Clean up your deployments, customized


models, and training files
When you're done with your customized model, you can delete the deployment and
model. You can also delete the training and validation files you uploaded to the service,
if needed.
Delete your model deployment

) Important

After a customized model is deployed, if at any time the deployment remains


inactive for greater than fifteen (15) days, the deployment will automatically be
deleted. The deployment of a customized model is “inactive” if the model was
deployed more than fifteen (15) days ago and no completions or chat completions
calls were made to it during a continuous 15-day period. The deletion of an inactive
deployment does NOT delete or affect the underlying customized model, and the
customized model can be redeployed at any time. As described in Azure OpenAI
Service pricing , each customized (fine-tuned) model that is deployed incurs an
hourly hosting cost regardless of whether completions or chat completions calls are
being made to the model. To learn more about planning and managing costs with
Azure OpenAI, refer to our cost management guide.

You can delete the deployment for your customized model from the Deployments page
for Azure OpenAI Studio. Select the deployment to delete, and then select Delete to
delete the deployment.

Delete your customized model


You can delete a customized model from the Models page for Azure OpenAI Studio.
Select the customized model to delete from the Customized models tab, and then
select Delete to delete the customized model.

7 Note

You cannot delete a customized model if it has an existing deployment. You must
first delete your model deployment before you can delete your customized model.

Delete your training files


You can optionally delete training and validation files you've uploaded for training, and
result files generated during training, from the File Management page for Azure OpenAI
Studio. Select the file to delete, and then select Delete to delete the file.

Next steps
Explore the full REST API Reference documentation to learn more about all the
fine-tuning capabilities. You can find the full REST documentation here.
Explore more of the Python SDK operations here .
How to configure Azure OpenAI Service
with managed identities
Article • 05/10/2023

More complex security scenarios require Azure role-based access control (Azure RBAC).
This document covers how to authenticate to your OpenAI resource using Azure Active
Directory (Azure AD).

In the following sections, you'll use the Azure CLI to assign roles, and obtain a bearer
token to call the OpenAI resource. If you get stuck, links are provided in each section
with all available options for each command in Azure Cloud Shell/Azure CLI.

Prerequisites
An Azure subscription - Create one for free

Access granted to the Azure OpenAI service in the desired Azure subscription

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

Azure CLI - Installation Guide

The following Python libraries: os, requests, json

Sign into the Azure CLI


To sign-in to the Azure CLI, run the following command and complete the sign-in. You
may need to do it again if your session has been idle for too long.

Azure CLI

az login

Assign yourself to the Cognitive Services User


role
Assigning yourself to the Cognitive Services User role will allow you to use your account
for access to the specific cognitive services resource

1. Get your user information

Azure CLI

export user=$(az account show -o json | jq -r .user.name)

2. Assign yourself to “Cognitive Services User” role.

Azure CLI

export resourceId=$(az group show -g $myResourceGroupName -o json | jq -r .id)

az role assignment create --role "Cognitive Services User" --assignee $user --scope $resourceId

7 Note

Role assignment change will take ~5 mins to become effective.

3. Acquire an Azure AD access token. Access tokens expire in one hour; you'll then need to acquire another one.

Azure CLI

export accessToken=$(az account get-access-token --resource https://cognitiveservices.azure.com -o json | jq -r .accessToken)

4. Make an API call


Use the access token to authorize your API call by setting the
Authorization header value.

Bash

curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $accessToken" \
  -d '{ "prompt": "Once upon a time" }'
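
The same call can be made with the Python libraries listed in the prerequisites (os, requests, json). The following is a minimal sketch; the environment variable names and deployment name are placeholders you would substitute with your own values:

Python

import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"].rstrip("/")  # e.g. https://YOUR_RESOURCE_NAME.openai.azure.com
access_token = os.environ["ACCESS_TOKEN"]                   # token from az account get-access-token
deployment = "YOUR_DEPLOYMENT_NAME"

url = f"{endpoint}/openai/deployments/{deployment}/completions?api-version=2023-05-15"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {access_token}",
}

response = requests.post(url, headers=headers, json={"prompt": "Once upon a time"})
response.raise_for_status()
print(response.json())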

Authorize access to managed identities


OpenAI supports Azure Active Directory (Azure AD) authentication with managed
identities for Azure resources. Managed identities for Azure resources can authorize
access to Cognitive Services resources using Azure AD credentials from applications
running in Azure virtual machines (VMs), function apps, virtual machine scale sets, and
other services. By using managed identities for Azure resources together with Azure AD
authentication, you can avoid storing credentials with your applications that run in the
cloud.

Enable managed identities on a VM


Before you can use managed identities for Azure resources to authorize access to
Cognitive Services resources from your VM, you must enable managed identities for
Azure resources on the VM. To learn how to enable managed identities for Azure
Resources, see:

Azure portal
Azure PowerShell
Azure CLI
Azure Resource Manager template
Azure Resource Manager client libraries

For more information about managed identities, see Managed identities for Azure
resources.
Manage Azure OpenAI Service quota
Article • 06/09/2023

Quota provides the flexibility to actively manage the allocation of rate limits across the
deployments within your subscription. This article walks through the process of
managing your Azure OpenAI quota.

Introduction to quota
Azure OpenAI's quota feature enables assignment of rate limits to your deployments,
up-to a global limit called your “quota.” Quota is assigned to your subscription on a per-
region, per-model basis in units of Tokens-per-Minute (TPM). When you onboard a
subscription to Azure OpenAI, you'll receive default quota for most available models.
Then, you'll assign TPM to each deployment as it is created, and the available quota for
that model will be reduced by that amount. You can continue to create deployments
and assign them TPM until you reach your quota limit. Once that happens, you can only
create new deployments of that model by reducing the TPM assigned to other
deployments of the same model (thus freeing TPM for use), or by requesting and being
approved for a model quota increase in the desired region.

7 Note

With a quota of 240,000 TPM for GPT-35-Turbo in East US, a customer can create a
single deployment of 240K TPM, 2 deployments of 120K TPM each, or any number
of deployments in one or multiple Azure OpenAI resources as long as their TPM
adds up to less than 240K total in that region.

When a deployment is created, the assigned TPM will directly map to the tokens-per-
minute rate limit enforced on its inferencing requests. A Requests-Per-Minute (RPM)
rate limit will also be enforced whose value is set proportionally to the TPM assignment
using the following ratio:

6 RPM per 1000 TPM.
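
For example, a deployment assigned 120,000 TPM would also receive an RPM rate limit of 720 (120 × 6).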

The flexibility to distribute TPM globally within a subscription and region has allowed
Azure OpenAI Service to loosen other restrictions:

The maximum resources per region are increased to 30.


The limit on creating no more than one deployment of the same model in a
resource has been removed.
Assign quota
When you create a model deployment, you have the option to assign Tokens-Per-
Minute (TPM) to that deployment. TPM can be modified in increments of 1,000, and will
map to the TPM and RPM rate limits enforced on your deployment, as discussed above.

To create a new deployment from within the Azure AI Studio under Management select
Deployments > Create new deployment.

The option to set the TPM is under the Advanced options drop-down:

Post deployment you can adjust your TPM allocation by selecting Edit deployment
under Management > Deployments in Azure AI Studio. You can also modify this
selection within the new quota management experience under Management > Quotas.
) Important

Quotas and limits are subject to change; for the most up-to-date information, consult our quotas and limits article.

Model specific settings


Different model deployments, also called model classes, have unique max TPM values
that you're now able to control. This represents the maximum amount of TPM that can
be allocated to that type of model deployment in a given region. While each model
type represents its own unique model class, the max TPM value is currently only
different for certain model classes:

GPT-4
GPT-4-32K
Text-Davinci-003

All other model classes have a common max TPM value.

7 Note

Quota Tokens-Per-Minute (TPM) allocation is not related to the max input token
limit of a model. Model input token limits are defined in the models table and are
not impacted by changes made to TPM.

View and request quota


For an all up view of your quota allocations across deployments in a given region, select
Management > Quota in Azure AI Studio:

Quota Name: There's one quota value per region for each model type. The quota
covers all versions of that model. The quota name can be expanded in the UI to
show the deployments that are using the quota.
Deployment: Model deployments divided by model class.
Usage/Limit: For the quota name, this shows how much quota is used by
deployments and the total quota approved for this subscription and region. This
amount of quota used is also represented in the bar graph.
Request Quota: The icon in this field navigates to a form where requests to
increase quota can be submitted.

Migrating existing deployments


As part of the transition to the new quota system and TPM based allocation, all existing
Azure OpenAI model deployments have been automatically migrated to use quota. In
cases where the existing TPM/RPM allocation exceeds the default values due to previous
custom rate-limit increases, equivalent TPM were assigned to the impacted
deployments.

Understanding rate limits


Assigning TPM to a deployment sets the Tokens-Per-Minute (TPM) and Requests-Per-
Minute (RPM) rate limits for the deployment, as described above. TPM rate limits are
based on the maximum number of tokens that are estimated to be processed by a
request at the time the request is received. It isn't the same as the token count used for
billing, which is computed after all processing is completed.

As each request is received, Azure OpenAI computes an estimated max processed-token


count that includes the following:

Prompt text and count


The max_tokens parameter setting
The best_of parameter setting

As requests come into the deployment endpoint, the estimated max-processed-token


count is added to a running token count of all requests that is reset each minute. If at
any time during that minute, the TPM rate limit value is reached, then further requests
will receive a 429 response code until the counter resets.

RPM rate limits are based on the number of requests received over time. The rate limit
expects that requests be evenly distributed over a one-minute period. If this average
flow isn't maintained, then requests may receive a 429 response even though the limit
isn't met when measured over the course of a minute. To implement this behavior, Azure
OpenAI Service evaluates the rate of incoming requests over a small period of time,
typically 1 or 10 seconds. If the number of requests received during that time exceeds
what would be expected at the set RPM limit, then new requests will receive a 429
response code until the next evaluation period. For example, if Azure OpenAI is
monitoring request rate on 1-second intervals, then rate limiting will occur for a 600-
RPM deployment if more than 10 requests are received during each 1-second period
(600 requests per minute = 10 requests per second).

Rate limit best practices


To minimize issues related to rate limits, it's a good idea to use the following techniques:

- Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don't set a large max-tokens value if you expect your responses to be small.
- Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
- Implement retry logic in your application (see the sketch after this list).
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
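
The following is a minimal sketch of retry logic with exponential backoff for 429 responses, using the requests library. The URL, headers, and payload are whatever you would normally send to your deployment:

Python

import time

import requests

def post_with_retries(url, headers, payload, max_retries=5):
    """POST to an Azure OpenAI endpoint, backing off on 429 (rate limit) responses."""
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            return response
        # Honor the Retry-After header when the service provides one, otherwise back off exponentially
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    return response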
Next steps
To review quota defaults for Azure OpenAI, consult the quotas & limits article
Configure Azure Cognitive Services
virtual networks
Article • 07/04/2023

Azure Cognitive Services provides a layered security model. This model enables you to
secure your Cognitive Services accounts to a specific subset of networks​. When network
rules are configured, only applications requesting data over the specified set of
networks can access the account. You can limit access to your resources with request
filtering. Allowing only requests originating from specified IP addresses, IP ranges or
from a list of subnets in Azure Virtual Networks.

An application that accesses a Cognitive Services resource when network rules are in
effect requires authorization. Authorization is supported with Azure Active Directory
(Azure AD) credentials or with a valid API key.

) Important

Turning on firewall rules for your Cognitive Services account blocks incoming
requests for data by default. In order to allow requests through, one of the
following conditions needs to be met:

The request should originate from a service operating within an Azure Virtual
Network (VNet) on the allowed subnet list of the target Cognitive Services
account. The endpoint in requests originated from VNet needs to be set as
the custom subdomain of your Cognitive Services account.
Or the request should originate from an allowed list of IP addresses.

Requests that are blocked include those from other Azure services, from the Azure
portal, from logging and metrics services, and so on.

7 Note

We recommend that you use the Azure Az PowerShell module to interact with
Azure. See Install Azure PowerShell to get started. To learn how to migrate to the
Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.

Scenarios
To secure your Cognitive Services resource, you should first configure a rule to deny
access to traffic from all networks (including internet traffic) by default. Then, you should
configure rules that grant access to traffic from specific VNets. This configuration
enables you to build a secure network boundary for your applications. You can also
configure rules to grant access to traffic from select public internet IP address ranges,
enabling connections from specific internet or on-premises clients.

Network rules are enforced on all network protocols to Azure Cognitive Services,
including REST and WebSocket. To access data using tools such as the Azure test
consoles, explicit network rules must be configured. You can apply network rules to
existing Cognitive Services resources, or when you create new Cognitive Services
resources. Once network rules are applied, they're enforced for all requests.

Supported regions and service offerings


Virtual networks (VNETs) are supported in regions where Cognitive Services are
available . Cognitive Services supports service tags for network rules configuration. The
services listed below are included in the CognitiveServicesManagement service tag.

- Anomaly Detector
- Azure OpenAI
- Computer Vision
- Content Moderator
- Custom Vision
- Face
- Language Understanding (LUIS)
- Personalizer
- Speech service
- Language service
- QnA Maker
- Translator Text

7 Note

If you're using Azure OpenAI, LUIS, Speech Services, or Language services, the
CognitiveServicesManagement tag only enables you to use the service with the SDK
or REST API. To access and use Azure OpenAI Studio, the LUIS portal, Speech Studio, or
Language Studio from a virtual network, you will need to use the following tags:

AzureActiveDirectory
AzureFrontDoor.Frontend
AzureResourceManager
CognitiveServicesManagement
CognitiveServicesFrontEnd

Change the default network access rule


By default, Cognitive Services resources accept connections from clients on any network.
To limit access to selected networks, you must first change the default action.

2 Warning

Making changes to network rules can impact your applications' ability to connect
to Azure Cognitive Services. Setting the default network rule to deny blocks all
access to the data unless specific network rules that grant access are also applied.
Be sure to grant access to any allowed networks using network rules before you
change the default rule to deny access. If you are allow listing IP addresses for your
on-premises network, be sure to add all possible outgoing public IP addresses from
your on-premises network.

Managing default network access rules


You can manage default network access rules for Cognitive Services resources through
the Azure portal, PowerShell, or the Azure CLI.

Azure portal

1. Go to the Cognitive Services resource you want to secure.

2. Select the RESOURCE MANAGEMENT menu called Virtual network.


3. To deny access by default, choose to allow access from Selected networks.
With the Selected networks setting alone, unaccompanied by configured
Virtual networks or Address ranges - all access is effectively denied. When all
access is denied, requests attempting to consume the Cognitive Services
resource aren't permitted. The Azure portal, Azure PowerShell or, Azure CLI
can still be used to configure the Cognitive Services resource.

4. To allow traffic from all networks, choose to allow access from All networks.

5. Select Save to apply your changes.

Grant access from a virtual network


You can configure Cognitive Services resources to allow access only from specific
subnets. The allowed subnets may belong to a VNet in the same subscription, or in a
different subscription, including subscriptions belonging to a different Azure Active
Directory tenant.

Enable a service endpoint for Azure Cognitive Services within the VNet. The service
endpoint routes traffic from the VNet through an optimal path to the Azure Cognitive
Services service. The identities of the subnet and the virtual network are also transmitted
with each request. Administrators can then configure network rules for the Cognitive
Services resource that allow requests to be received from specific subnets in a VNet.
Clients granted access via these network rules must continue to meet the authorization
requirements of the Cognitive Services resource to access the data.

Each Cognitive Services resource supports up to 100 virtual network rules, which may be
combined with IP network rules.

Required permissions
To apply a virtual network rule to a Cognitive Services resource, the user must have the
appropriate permissions for the subnets being added. The required permission is the
default Contributor role, or the Cognitive Services Contributor role. Required permissions
can also be added to custom role definitions.

Cognitive Services resource and the virtual networks granted access may be in different
subscriptions, including subscriptions that are a part of a different Azure AD tenant.

7 Note

Configuration of rules that grant access to subnets in virtual networks that are a
part of a different Azure Active Directory tenant are currently only supported
through PowerShell, CLI and REST APIs. Such rules cannot be configured through
the Azure portal, though they may be viewed in the portal.

Managing virtual network rules


You can manage virtual network rules for Cognitive Services resources through the
Azure portal, PowerShell, or the Azure CLI.

Azure portal

1. Go to the Cognitive Services resource you want to secure.

2. Select the RESOURCE MANAGEMENT menu called Virtual network.


3. Check that you've selected to allow access from Selected networks.

4. To grant access to a virtual network with an existing network rule, under
Virtual networks, select Add existing virtual network.

5. Select the Virtual networks and Subnets options, and then select Enable.

6. To create a new virtual network and grant it access, select Add new virtual
network.

7. Provide the information necessary to create the new virtual network, and then
select Create.

7 Note

If a service endpoint for Azure Cognitive Services wasn't previously
configured for the selected virtual network and subnets, you can
configure it as part of this operation.

Presently, only virtual networks belonging to the same Azure Active
Directory tenant are shown for selection during rule creation. To grant
access to a subnet in a virtual network belonging to another tenant,
please use PowerShell, CLI or REST APIs.

8. To remove a virtual network or subnet rule, select ... to open the context menu
for the virtual network or subnet, and select Remove.

9. Select Save to apply your changes.

) Important

Be sure to set the default rule to deny, or network rules have no effect.
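
If you prefer the Azure CLI over the portal, a minimal sketch for adding and listing
virtual network rules follows. The resource, resource group, VNet, and subnet names are
placeholders for your own values, and the subnet needs a Microsoft.CognitiveServices
service endpoint as described above.

Azure CLI

# Add a rule for an existing subnet.
az cognitiveservices account network-rule add \
    --resource-group "myResourceGroup" \
    --name "myCognitiveServicesAccount" \
    --vnet-name "myVNet" \
    --subnet "mySubnet"

# List the network rules currently configured on the resource.
az cognitiveservices account network-rule list \
    --resource-group "myResourceGroup" \
    --name "myCognitiveServicesAccount"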

Grant access from an internet IP range


You can configure Cognitive Services resources to allow access from specific public
internet IP address ranges. This configuration grants access to specific services and on-
premises networks, effectively blocking general internet traffic.

Provide allowed internet address ranges using CIDR notation in the form
16.17.18.0/24 or as individual IP addresses like 16.17.18.19 .

 Tip

Small address ranges using "/31" or "/32" prefix sizes are not supported. These
ranges should be configured using individual IP address rules.
IP network rules are only allowed for public internet IP addresses. IP address ranges
reserved for private networks (as defined in RFC 1918 ) aren't allowed in IP rules.
Private networks include addresses that start with 10.* , 172.16.* - 172.31.* , and
192.168.* .

Only IPV4 addresses are supported at this time. Each Cognitive Services resource
supports up to 100 IP network rules, which may be combined with Virtual network rules.

Configuring access from on-premises networks


To grant access from your on-premises networks to your Cognitive Services resource
with an IP network rule, you must identify the internet facing IP addresses used by your
network. Contact your network administrator for help.

If you're using ExpressRoute on-premises for public peering or Microsoft peering, you
need to identify the NAT IP addresses. For public peering, each ExpressRoute circuit by
default uses two NAT IP addresses. Each is applied to Azure service traffic when the
traffic enters the Microsoft Azure network backbone. For Microsoft peering, the NAT IP
addresses that are used are either customer provided or are provided by the service
provider. To allow access to your service resources, you must allow these public IP
addresses in the resource IP firewall setting. To find your public peering ExpressRoute
circuit IP addresses, open a support ticket with ExpressRoute via the Azure portal.
Learn more about NAT for ExpressRoute public and Microsoft peering.

Managing IP network rules


You can manage IP network rules for Cognitive Services resources through the Azure
portal, PowerShell, or the Azure CLI.

Azure portal

1. Go to the Cognitive Services resource you want to secure.

2. Select the RESOURCE MANAGEMENT menu called Virtual network.

3. Check that you've selected to allow access from Selected networks.

4. To grant access to an internet IP range, enter the IP address or address range
(in CIDR format) under Firewall > Address Range. Only valid public IP (non-reserved)
addresses are accepted.

5. To remove an IP network rule, select the trash can icon next to the address
range.

6. Select Save to apply your changes.

) Important

Be sure to set the default rule to deny, or network rules have no effect.
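
The same rules can be managed from the Azure CLI. A minimal sketch, reusing the example
address range above and placeholder resource names, looks like this:

Azure CLI

# Allow a public address range in CIDR notation.
az cognitiveservices account network-rule add \
    --resource-group "myResourceGroup" \
    --name "myCognitiveServicesAccount" \
    --ip-address "16.17.18.0/24"

# Remove the rule when it's no longer needed.
az cognitiveservices account network-rule remove \
    --resource-group "myResourceGroup" \
    --name "myCognitiveServicesAccount" \
    --ip-address "16.17.18.0/24"
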
Use private endpoints
You can use private endpoints for your Cognitive Services resources to allow clients on a
virtual network (VNet) to securely access data over a Private Link. The private endpoint
uses an IP address from the VNet address space for your Cognitive Services resource.
Network traffic between the clients on the VNet and the resource traverses the VNet
and a private link on the Microsoft backbone network, eliminating exposure from the
public internet.

Private endpoints for Cognitive Services resources let you:

Secure your Cognitive Services resource by configuring the firewall to block all
connections on the public endpoint for the Cognitive Services service.
Increase security for the VNet, by enabling you to block exfiltration of data from
the VNet.
Securely connect to Cognitive Services resources from on-premises networks that
connect to the VNet using VPN or ExpressRoutes with private-peering.

Conceptual overview
A private endpoint is a special network interface for an Azure resource in your VNet.
Creating a private endpoint for your Cognitive Services resource provides secure
connectivity between clients in your VNet and your resource. The private endpoint is
assigned an IP address from the IP address range of your VNet. The connection between
the private endpoint and the Cognitive Services service uses a secure private link.

Applications in the VNet can connect to the service over the private endpoint
seamlessly, using the same connection strings and authorization mechanisms that they
would use otherwise. The exception is the Speech Services, which require a separate
endpoint. See the section on Private endpoints with the Speech Services. Private
endpoints can be used with all protocols supported by the Cognitive Services resource,
including REST.

Private endpoints can be created in subnets that use Service Endpoints. Clients in a
subnet can connect to one Cognitive Services resource using private endpoint, while
using service endpoints to access others.

When you create a private endpoint for a Cognitive Services resource in your VNet, a
consent request is sent for approval to the Cognitive Services resource owner. If the user
requesting the creation of the private endpoint is also an owner of the resource, this
consent request is automatically approved.
Cognitive Services resource owners can manage consent requests and the private
endpoints, through the 'Private endpoints' tab for the Cognitive Services resource in the
Azure portal .

Private endpoints
When creating the private endpoint, you must specify the Cognitive Services resource it
connects to. For more information on creating a private endpoint, see:

Create a private endpoint using the Private Link Center in the Azure portal
Create a private endpoint using Azure CLI
Create a private endpoint using Azure PowerShell

Connecting to private endpoints

7 Note

Azure OpenAI Service uses a different private DNS zone and public DNS zone
forwarder than other Azure Cognitive Services. Refer to the Azure services DNS
zone configuration article for the correct zone and forwarder names.

Clients on a VNet using the private endpoint should use the same connection string for
the Cognitive Services resource as clients connecting to the public endpoint. The
exception is the Speech Services, which require a separate endpoint. See the section on
Private endpoints with the Speech Services. We rely upon DNS resolution to
automatically route the connections from the VNet to the Cognitive Services resource
over a private link.

We create a private DNS zone attached to the VNet with the necessary updates for the
private endpoints, by default. However, if you're using your own DNS server, you may
need to make more changes to your DNS configuration. The section on DNS changes
below describes the updates required for private endpoints.

Private endpoints with the Speech Services


See Using Speech Services with private endpoints provided by Azure Private Link.

DNS changes for private endpoints


When you create a private endpoint, the DNS CNAME resource record for the Cognitive
Services resource is updated to an alias in a subdomain with the prefix privatelink . By
default, we also create a private DNS zone, corresponding to the privatelink
subdomain, with the DNS A resource records for the private endpoints.

When you resolve the endpoint URL from outside the VNet with the private endpoint, it
resolves to the public endpoint of the Cognitive Services resource. When resolved from
the VNet hosting the private endpoint, the endpoint URL resolves to the private
endpoint's IP address.

This approach enables access to the Cognitive Services resource using the same
connection string for clients in the VNet hosting the private endpoints and clients
outside the VNet.

If you're using a custom DNS server on your network, clients must be able to resolve the
fully qualified domain name (FQDN) for the Cognitive Services resource endpoint to the
private endpoint IP address. Configure your DNS server to delegate your private link
subdomain to the private DNS zone for the VNet.

 Tip

When using a custom or on-premises DNS server, you should configure your DNS
server to resolve the Cognitive Services resource name in the 'privatelink'
subdomain to the private endpoint IP address. You can do this by delegating the
'privatelink' subdomain to the private DNS zone of the VNet, or configuring the
DNS zone on your DNS server and adding the DNS A records.

For more information on configuring your own DNS server to support private endpoints,
see the following articles:

Name resolution for resources in Azure virtual networks


DNS configuration for private endpoints

Pricing
For pricing details, see Azure Private Link pricing .

Next steps
Explore the various Azure Cognitive Services
Learn more about Azure Virtual Network Service Endpoints
Use Azure OpenAI with large datasets
Article • 02/17/2023

Azure OpenAI can be used to solve a large number of natural language tasks through
prompting the completion API. To make it easier to scale your prompting workflows
from a few examples to large datasets of examples, we have integrated the Azure
OpenAI service with the distributed machine learning library SynapseML . This
integration makes it easy to use the Apache Spark distributed computing framework
to process millions of prompts with the OpenAI service. This tutorial shows how to apply
large language models at a distributed scale using Azure OpenAI and Azure Synapse
Analytics.

Prerequisites
An Azure subscription - Create one for free

Access granted to Azure OpenAI in the desired Azure subscription

Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource – create a resource

An Apache Spark cluster with SynapseML installed - create a serverless Apache
Spark pool here

We recommend creating a Synapse workspace, but an Azure Databricks, HDInsight, or
Spark on Kubernetes cluster, or even a Python environment with the pyspark package,
will also work.

Import this guide as a notebook


The next step is to add this code into your Spark cluster. You can either create a
notebook in your Spark platform and copy the code into this notebook to run the demo,
or download the notebook and import it into Synapse Analytics.

1. Download this demo as a notebook (click Raw, then save the file)
2. Import the notebook into the Synapse Workspace or, if using Databricks, into the
Databricks Workspace
3. Install SynapseML on your cluster. See the installation instructions for Synapse at
the bottom of the SynapseML website . This requires pasting another cell at the
top of the notebook you imported
4. Connect your notebook to a cluster and follow along, editing and running the cells
below.

Fill in your service information


Next, edit the cell in the notebook to point to your service. In particular, set the
resource_name , deployment_name , location , and key variables to the corresponding

values for your Azure OpenAI resource.

) Important

Remember to remove the key from your code when you're done, and never post it
publicly. For production, use a secure way of storing and accessing your credentials
like Azure Key Vault. See the Cognitive Services security article for more
information.

Python

import os

# Replace the following values with your Azure OpenAI resource information
resource_name = "RESOURCE_NAME"      # The name of your Azure OpenAI resource.
deployment_name = "DEPLOYMENT_NAME"  # The name of your Azure OpenAI deployment.
location = "RESOURCE_LOCATION"       # The location or region ID for your resource.
key = "RESOURCE_API_KEY"             # The key for your resource.

assert key is not None and resource_name is not None

Create a dataset of prompts


Next, create a dataframe consisting of a series of rows, with one prompt per row.

You can also load data directly from Azure Data Lake Storage (ADLS) or other databases.
For more information about loading and preparing Spark dataframes, see the Apache
Spark data loading guide .

Python
df = spark.createDataFrame(
    [
        ("Hello my name is",),
        ("The best code is code that's",),
        ("SynapseML is ",),
    ]
).toDF("prompt")

Create the OpenAICompletion Apache Spark


client
To apply the OpenAI Completion service to the dataframe that you just created, create
an OpenAICompletion object that serves as a distributed client. Parameters of the service
can be set either with a single value, or by a column of the dataframe with the
appropriate setters on the OpenAICompletion object. Here, we're setting maxTokens to
200. A token is around four characters, and this limit applies to the sum of the prompt
and the result. We're also setting the promptCol parameter with the name of the prompt
column in the dataframe.

Python

from synapse.ml.cognitive import OpenAICompletion

completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)

Transform the dataframe with the


OpenAICompletion client
Now that you have the dataframe and the completion client, you can transform your
input dataset and add a column called completions with all of the information the
service adds. We'll select out just the text for simplicity.

Python
from pyspark.sql.functions import col

completed_df = completion.transform(df).cache()

display(completed_df.select(

col("prompt"), col("error"),
col("completions.choices.text").getItem(0).alias("text")))

Your output should look something like the following example; note that the completion
text can vary.

| prompt | error | text |
| --- | --- | --- |
| Hello my name is | undefined | Makaveli I'm eighteen years old and I want to be a rapper when I grow up I love writing and making music I'm from Los Angeles, CA |
| The best code is code that's | undefined | understandable This is a subjective statement, and there is no definitive answer. |
| SynapseML is | undefined | A machine learning algorithm that is able to learn how to predict the future outcome of events. |

Other usage examples

Improve throughput with request batching


The example above makes several requests to the service, one for each prompt. To
complete multiple prompts in a single request, use batch mode. First, in the
OpenAICompletion object, instead of setting the Prompt column to "Prompt", specify
"batchPrompt" for the BatchPrompt column. To do so, create a dataframe with a list of
prompts per row.
prompts per row.

7 Note

There is currently a limit of 20 prompts in a single request and a limit of 2048


"tokens", or approximately 1500 words.

Python

batch_df = spark.createDataFrame(
    [
        (["The time has come", "Pleased to", "Today stocks", "Here's to"],),
        (["The only thing", "Ask not what", "Every litter", "I am"],),
    ]
).toDF("batchPrompt")

Next we create the OpenAICompletion object. Rather than setting the prompt column, set
the batchPrompt column if your column is of type Array[String] .

Python

batch_completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setBatchPromptCol("batchPrompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)

In the call to transform, a request will then be made per row. Because there are multiple
prompts in a single row, each request will be sent with all prompts in that row. The
results will contain a row for each row in the request.

Python

completed_batch_df = batch_completion.transform(batch_df).cache()

display(completed_batch_df)


Using an automatic mini-batcher


If your data is in column format, you can transpose it to row format using SynapseML's
FixedMiniBatchTransformer.

Python

from pyspark.sql.types import StringType

from synapse.ml.stages import FixedMiniBatchTransformer

from synapse.ml.core.spark import FluentAPI

completed_autobatch_df = (df
    .coalesce(1)  # Force a single partition so that our little 4-row dataframe makes a batch of size 4; you can remove this step for large datasets
    .mlTransform(FixedMiniBatchTransformer(batchSize=4))
    .withColumnRenamed("prompt", "batchPrompt")
    .mlTransform(batch_completion))

display(completed_autobatch_df)

Prompt engineering for translation


Azure OpenAI can solve many different natural language tasks through prompt
engineering. Here, we show an example of prompting for language translation:

Python

translate_df = spark.createDataFrame(
    [
        ("Japanese: Ookina hako \nEnglish: Big box \nJapanese: Midori tako\nEnglish:",),
        ("French: Quelle heure est-il à Montréal? \nEnglish: What time is it in Montreal? \nFrench: Où est le poulet? \nEnglish:",),
    ]
).toDF("prompt")

display(completion.transform(translate_df))

Prompt for question answering


Here, we prompt the GPT-3 model for general-knowledge question answering:

Python

qa_df = spark.createDataFrame(
    [
        ("Q: Where is the Grand Canyon?\nA: The Grand Canyon is in Arizona.\n\nQ: What is the weight of the Burj Khalifa in kilograms?\nA:",)
    ]
).toDF("prompt")

display(completion.transform(qa_df))

Azure OpenAI Service encryption of


data at rest
Article • 04/21/2023

Azure OpenAI automatically encrypts your data when it's persisted to the cloud. The
encryption protects your data and helps you meet your organizational security and
compliance commitments. This article covers how Azure OpenAI handles encryption of
data at rest, specifically training data and fine-tuned models. For information on how
data provided by you to the service is processed, used, and stored, consult the data,
privacy, and security article.

About Cognitive Services encryption


Azure OpenAI is part of Azure Cognitive Services. Cognitive Services data is encrypted
and decrypted using FIPS 140-2 compliant 256-bit AES encryption. Encryption and
decryption are transparent, meaning encryption and access are managed for you. Your
data is secure by default and you don't need to modify your code or applications to take
advantage of encryption.

About encryption key management


By default, your subscription uses Microsoft-managed encryption keys. There's also the
option to manage your subscription with your own keys called customer-managed keys
(CMK). CMK offers greater flexibility to create, rotate, disable, and revoke access
controls. You can also audit the encryption keys used to protect your data.

Customer-managed keys with Azure Key Vault


Customer-managed keys (CMK), also known as Bring your own key (BYOK), offer greater
flexibility to create, rotate, disable, and revoke access controls. You can also audit the
encryption keys used to protect your data.

You must use Azure Key Vault to store your customer-managed keys. You can either
create your own keys and store them in a key vault, or you can use the Azure Key Vault
APIs to generate keys. The Cognitive Services resource and the key vault must be in the
same region and in the same Azure Active Directory (Azure AD) tenant, but they can be
in different subscriptions. For more information about Azure Key Vault, see What is
Azure Key Vault?.
To request the ability to use customer-managed keys, fill out and submit the Cognitive
Services Customer-Managed Key Request Form . It will take approximately 3-5
business days to hear back on the status of your request.

To enable customer-managed keys, you must also enable both the Soft Delete and Do
Not Purge properties on the key vault.

Only RSA keys of size 2048 are supported with Cognitive Services encryption. For more
information about keys, see Key Vault keys in About Azure Key Vault keys, secrets and
certificates.

Enable customer-managed keys for your


resource
To enable customer-managed keys in the Azure portal, follow these steps:

1. Go to your Cognitive Services resource.


2. On the left, select Encryption.
3. Under Encryption type, select Customer Managed Keys, as shown in the following
screenshot.

Specify a key
After you enable customer-managed keys, you can specify a key to associate with the
Cognitive Services resource.

Specify a key as a URI


To specify a key as a URI, follow these steps:
1. In the Azure portal, go to your key vault.

2. Under Settings, select Keys.

3. Select the desired key, and then select the key to view its versions. Select a key
version to view the settings for that version.

4. Copy the Key Identifier value, which provides the URI.

5. Go back to your Cognitive Services resource, and then select Encryption.

6. Under Encryption key, select Enter key URI.

7. Paste the URI that you copied into the Key URI box.
8. Under Subscription, select the subscription that contains the key vault.

9. Save your changes.

Specify a key from a key vault


To specify a key from a key vault, first make sure that you have a key vault that contains
a key. Then follow these steps:

1. Go to your Cognitive Services resource, and then select Encryption.

2. Under Encryption key, select Select from Key Vault.

3. Select the key vault that contains the key that you want to use.

4. Select the key that you want to use.


5. Save your changes.

Update the key version


When you create a new version of a key, update the Cognitive Services resource to use
the new version. Follow these steps:

1. Go to your Cognitive Services resource, and then select Encryption.


2. Enter the URI for the new key version. Alternately, you can select the key vault and
then select the key again to update the version.
3. Save your changes.

Use a different key


To change the key that you use for encryption, follow these steps:

1. Go to your Cognitive Services resource, and then select Encryption.


2. Enter the URI for the new key. Alternately, you can select the key vault and then
select a new key.
3. Save your changes.

Rotate customer-managed keys


You can rotate a customer-managed key in Key Vault according to your compliance
policies. When the key is rotated, you must update the Cognitive Services resource to
use the new key URI. To learn how to update the resource to use a new version of the
key in the Azure portal, see Update the key version.
Rotating the key doesn't trigger re-encryption of data in the resource. No further action
is required from the user.

Revoke a customer-managed key


You can revoke a customer-managed encryption key by changing the access policy, by
changing the permissions on the key vault, or by deleting the key.

To change the access policy of the managed identity that your Cognitive Services resource
uses, run the az keyvault delete-policy command:

Azure CLI

az keyvault delete-policy \
    --resource-group <resource-group-name> \
    --name <key-vault-name> \
    --object-id $identityPrincipalID

To delete the individual versions of a key, run the az keyvault key delete command. This
operation requires the keys/delete permission.

Azure CLI

az keyvault key delete \
    --vault-name <key-vault-name> \
    --name <key-name>

) Important

Revoking access to an active customer-managed key while CMK is still enabled will
prevent downloading of training data and results files, fine-tuning new models, and
deploying fine-tuned models. However, previously deployed fine-tuned models will
continue to operate and serve traffic until those deployments are deleted.

Delete training, validation, and training results data


The Files API allows customers to upload their training data for the purpose of fine-
tuning a model. This data is stored in Azure Storage, within the same region as the
resource and logically isolated with their Azure subscription and API Credentials.
Uploaded files can be deleted by the user via the DELETE API operation.
Delete fine-tuned models and deployments
The Fine-tunes API allows customers to create their own fine-tuned version of the
OpenAI models based on the training data that you've uploaded to the service via the
Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region,
encrypted at rest (either with Microsoft-managed keys or customer-managed keys) and
logically isolated with their Azure subscription and API credentials. Fine-tuned models
and deployments can be deleted by the user by calling the DELETE API operation.

Disable customer-managed keys


When you disable customer-managed keys, your Cognitive Services resource is then
encrypted with Microsoft-managed keys. To disable customer-managed keys, follow
these steps:

1. Go to your Cognitive Services resource, and then select Encryption.


2. Select Microsoft Managed Keys > Save.

When you previously enabled customer-managed keys, this also enabled a system-assigned
managed identity, a feature of Azure AD. Once the system-assigned managed identity is
enabled, the resource is registered with Azure Active Directory. After being registered,
the managed identity is given access to the key vault selected during customer-managed
key setup. You can learn more about Managed Identities.

) Important

If you disable system-assigned managed identities, access to the key vault will be
removed and any data encrypted with the customer keys will no longer be
accessible. Any features that depend on this data will stop working.

) Important

Managed identities do not currently support cross-directory scenarios. When you


configure customer-managed keys in the Azure portal, a managed identity is
automatically assigned under the covers. If you subsequently move the
subscription, resource group, or resource from one Azure AD directory to another,
the managed identity associated with the resource is not transferred to the new
tenant, so customer-managed keys may no longer work. For more information, see
Transferring a subscription between Azure AD directories in FAQs and known
issues with managed identities for Azure resources.
Next steps
Language service Customer-Managed Key Request Form
Learn more about Azure Key Vault
Business Continuity and Disaster
Recovery (BCDR) considerations with
Azure OpenAI Service
Article • 06/22/2023

Azure OpenAI is available in multiple regions. Since subscription keys are region bound,
when a customer acquires a key, they select the region in which their deployments will
reside and from then on, all operations stay associated with that Azure server region.

It's rare, but not impossible, to encounter a network issue that hits an entire region. If
your service needs to always be available, then you should design it to either fail-over
into another region or split the workload between two or more regions. Both
approaches require at least two Azure OpenAI resources in different regions. This article
provides general recommendations for how to implement Business Continuity and
Disaster Recovery (BCDR) for your Azure OpenAI applications.

Best practices
Today customers will call the endpoint provided during deployment for both
deployments and inference. These operations are stateless, so no data is lost in the case
that a region becomes unavailable.

If a region is non-operational customers must take steps to ensure service continuity.

Business continuity
The following set of instructions applies to both customers using default endpoints and
those using custom endpoints.

Default endpoint recovery


If you're using a default endpoint, you should configure your client code to monitor
errors, and if the errors persist, be prepared to redirect to another region of your choice
where you have an Azure OpenAI subscription.

Follow these steps to configure your client to monitor errors:

1. Use the models page to identify the list of available regions for Azure OpenAI.
2. Select a primary region and one or more secondary/backup regions from the list.

3. Create Azure OpenAI resources for each region selected.

4. For the primary region and any backup regions your code will need to know:

a. Base URI for the resource

b. Regional access key or Azure Active Directory access

5. Configure your code so that you monitor connectivity errors (typically connection
timeouts and service unavailability errors).

a. Because networks can yield transient errors, retry when a connectivity issue
occurs only once or intermittently.

b. For persistent errors, redirect traffic to the backup resource in the region you've
created.

BCDR requires custom code


Recovery from regional failures for this usage type can be performed instantaneously
and at a very low cost. It does, however, require custom development of this
functionality on the client side of your application.
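
The following is a minimal, illustrative sketch of such a failover client using the openai
Python package. The endpoint names, keys, deployment name, and retry counts are
placeholders, not values from this article, and production code would typically add
backoff, logging, and narrower exception handling.

Python

import openai

# Placeholder configuration for a primary and a backup Azure OpenAI resource.
REGIONS = [
    {"endpoint": "https://primary-resource.openai.azure.com/", "key": "PRIMARY_KEY"},
    {"endpoint": "https://backup-resource.openai.azure.com/", "key": "BACKUP_KEY"},
]
DEPLOYMENT_NAME = "my-deployment"   # the same model deployed in both regions
MAX_RETRIES_PER_REGION = 2          # retries for transient errors before failing over

def complete_with_failover(prompt):
    for region in REGIONS:
        # Point the client at the current region.
        openai.api_type = "azure"
        openai.api_base = region["endpoint"]
        openai.api_key = region["key"]
        openai.api_version = "2022-12-01"
        for _ in range(MAX_RETRIES_PER_REGION):
            try:
                return openai.Completion.create(
                    engine=DEPLOYMENT_NAME, prompt=prompt, max_tokens=100
                )
            except Exception:
                # Transient error: retry in the same region; if retries are
                # exhausted, fall through to the next (backup) region.
                continue
    raise RuntimeError("All configured regions failed")

# Example usage:
# response = complete_with_failover("Hello my name is")
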
Monitoring Azure OpenAI Service
Article • 03/15/2023

When you have critical applications and business processes relying on Azure resources, you want to
monitor those resources for their availability, performance, and operation.

This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of
Cognitive Services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor
common to all Azure services that use it, read Monitoring Azure resources with Azure Monitor.

Monitoring data
Azure OpenAI collects the same kinds of monitoring data as other Azure resources that are described in
Monitoring data from Azure resources.

Collection and routing


Platform metrics and the Activity log are collected and stored automatically, but can be routed to other
locations by using a diagnostic setting.

Resource Logs aren't collected and stored until you create a diagnostic setting and route them to one or
more locations.

See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for
creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic
setting, you specify which categories of logs to collect.
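
As an illustration only, a diagnostic setting that routes logs and metrics to a Log
Analytics workspace can be created with the Azure CLI along the following lines. The
resource IDs are placeholders, and the use of the allLogs category group is an
assumption; verify the categories available for your resource first (for example with
az monitor diagnostic-settings categories list) or pick them in the portal.

Azure CLI

az monitor diagnostic-settings create \
    --name "openai-diagnostics" \
    --resource "<azure-openai-resource-id>" \
    --workspace "<log-analytics-workspace-resource-id>" \
    --logs '[{"categoryGroup": "allLogs", "enabled": true}]' \
    --metrics '[{"category": "AllMetrics", "enabled": true}]'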

Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has additional costs
associated with it. To understand more, consult the Azure Monitor cost calculation guide.

The metrics and logs you can collect are discussed in the following sections.

Analyzing metrics
You can analyze metrics for Azure OpenAI by opening Metrics which can be found underneath the
Monitoring section when viewing your Azure OpenAI resource in the Azure portal. See Getting started
with Azure Metrics Explorer for details on using this tool.

Azure OpenAI is a part of Cognitive Services. For a list of all platform metrics collected for Cognitive
Services and Azure OpenAI, see Cognitive Services supported metrics.

For the current subset of metrics available in Azure OpenAI:

Azure OpenAI Metrics


| Metric | Exportable via Diagnostic Settings? | Metric Display Name | Unit | Aggregation Type | Description | Dimensions |
| --- | --- | --- | --- | --- | --- | --- |
| BlockedCalls | Yes | Blocked Calls | Count | Total | Number of calls that exceeded rate or quota limit. | ApiName, OperationName, Region, RatelimitKey |
| ClientErrors | Yes | Client Errors | Count | Total | Number of calls with client side error (HTTP response code 4xx). | ApiName, OperationName, Region, RatelimitKey |
| DataIn | Yes | Data In | Bytes | Total | Size of incoming data in bytes. | ApiName, OperationName, Region |
| DataOut | Yes | Data Out | Bytes | Total | Size of outgoing data in bytes. | ApiName, OperationName, Region |
| FineTunedTrainingHours | Yes | Processed FineTuned Training Hours | Count | Total | Number of Training Hours Processed on an OpenAI FineTuned Model | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| Latency | Yes | Latency | MilliSeconds | Average | Latency in milliseconds. | ApiName, OperationName, Region, RatelimitKey |
| Ratelimit | Yes | Ratelimit | Count | Total | The current ratelimit of the ratelimit key. | Region, RatelimitKey |
| ServerErrors | Yes | Server Errors | Count | Total | Number of calls with service internal error (HTTP response code 5xx). | ApiName, OperationName, Region, RatelimitKey |
| SuccessfulCalls | Yes | Successful Calls | Count | Total | Number of successful calls. | ApiName, OperationName, Region, RatelimitKey |
| TokenTransaction | Yes | Processed Inference Tokens | Count | Total | Number of Inference Tokens Processed on an OpenAI Model | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| TotalCalls | Yes | Total Calls | Count | Total | Total number of calls. | ApiName, OperationName, Region, RatelimitKey |
| TotalErrors | Yes | Total Errors | Count | Total | Total number of calls with error response (HTTP response code 4xx or 5xx). | ApiName, OperationName, Region, RatelimitKey |

Analyzing logs
Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.

All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common
schema is outlined in Azure Monitor resource log schema.

The Activity log is a type of platform log in Azure that provides insight into subscription-level events. You
can view it independently or route it to Azure Monitor Logs, where you can do much more complex
queries using Log Analytics.

For a list of the types of resource logs available for Azure OpenAI and other Cognitive Services, see
Resource provider operations for Cognitive Services

Kusto queries

) Important

When you select Logs from the Azure OpenAI menu, Log Analytics is opened with the query scope
set to the current Azure OpenAI resource. This means that log queries will only include data from that
resource. If you want to run a query that includes data from other resources or data from other Azure
services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure
Monitor Log Analytics for details.
To explore and get a sense of what type of information is available for your Azure OpenAI resource a
useful query to start with once you have deployed a model and sent some completion calls through the
playground is as follows:

Kusto

AzureDiagnostics
| take 100
| project TimeGenerated, _ResourceId, Category, OperationName, DurationMs, ResultSignature, properties_s

Here we return a sample of 100 entries and are displaying a subset of the available columns of data in the
logs. The results are as follows:

If you wish to see all available columns of data, you can remove the scoping that is provided by the |
project line:

Kusto

AzureDiagnostics

| take 100

You can also select the arrow next to the table name to view all available columns and associated data
types.

To examine AzureMetrics run:

Kusto

AzureMetrics

| take 100

| project TimeGenerated, MetricName, Total, Count, TimeGrain, UnitName
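
Beyond raw samples, you can aggregate the same columns. The following is a hedged
example, assuming only the standard AzureDiagnostics columns already shown above; it
summarizes call volume and average duration by operation and result code over the last
day:

Kusto

AzureDiagnostics
| where TimeGenerated > ago(1d)
| summarize calls = count(), avgDurationMs = avg(DurationMs) by OperationName, ResultSignature
| order by calls desc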

Alerts
Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data.
They allow you to identify and address issues in your system before your customers notice them. You can
set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and
drawbacks.

Every organization's alerting needs are going to vary, and will also evolve over time. Generally all alerts
should be actionable, with a specific intended response if the alert occurs. If there's no action for someone
to take, then it might be something you want to capture in a report, but not in an alert. Some use cases
may require alerting anytime certain error conditions exist. But in many environments, it might only be in
cases where errors exceed a certain threshold for a period of time where sending an alert is warranted.

Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor
Logs. As you analyze your log data over time, you may also find that a certain condition not occurring for
a long enough period of time might be valuable to track with alerts. Sometimes the absence of an event in
a log is just as important a signal as an error.

Depending on what type of application you're developing in conjunction with your use of Azure OpenAI,
Azure Monitor Application Insights may offer additional monitoring benefits at the application layer.

Next steps
See Monitoring Azure resources with Azure Monitor for details on monitoring Azure resources.
Read Understand log searches in Azure Monitor logs.
Plan to manage costs for Azure OpenAI
Service
Article • 04/05/2023

This article describes how you plan for and manage costs for Azure OpenAI Service.
Before you deploy the service, you can use the Azure pricing calculator to estimate costs
for Azure OpenAI. Later, as you deploy Azure resources, review the estimated costs.
After you've started using Azure OpenAI resources, use Cost Management features to
set budgets and monitor costs. You can also review forecasted costs and identify
spending trends to identify areas where you might want to act. Costs for Azure OpenAI
Service are only a portion of the monthly costs in your Azure bill. Although this article
explains how to plan for and manage costs for Azure OpenAI, you're billed for all Azure
services and resources used in your Azure subscription, including the third-party
services.

Prerequisites
Cost analysis in Cost Management supports most Azure account types, but not all of
them. To view the full list of supported account types, see Understand Cost
Management data. To view cost data, you need at least read access for an Azure
account. For information about assigning access to Azure Cost Management data, see
Assign access to data.

Estimate costs before using Azure OpenAI


Use the Azure pricing calculator to estimate the costs of using Azure OpenAI.

Understand the full billing model for Azure


OpenAI Service
Azure OpenAI Service runs on Azure infrastructure that accrues costs when you deploy
new resources. It's important to understand that there could be other additional
infrastructure costs that might accrue.

How you're charged for Azure OpenAI Service

Base series and Codex series models


Azure OpenAI base series and Codex series models are charged per 1,000 tokens. Costs
vary depending on which model series you choose: Ada, Babbage, Curie, Davinci, or
Code-Cushman.

Our models understand and process text by breaking it down into tokens. For reference,
each token is roughly four characters for typical English text.

Token costs are for both input and output. For example, suppose you have a 1,000-token
JavaScript code sample that you ask an Azure OpenAI model to convert to Python. You
would be charged approximately 1,000 tokens for the initial input request sent, and
1,000 more tokens for the output received in response, for a total of 2,000 tokens.

In practice, for this type of completion call the token input/output wouldn't be perfectly
1:1. A conversion from one programming language to another could result in a longer
or shorter output depending on many different factors including the value assigned to
the max_tokens parameter.
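
As a back-of-the-envelope illustration of the token math, the snippet below estimates
the cost of the example call above. The per-1,000-token price is a placeholder, not an
actual Azure OpenAI price; always check the pricing page for your model and region.

Python

# Hypothetical cost estimate for a single completion call.
price_per_1k_tokens = 0.002   # placeholder price in USD; check the Azure pricing page
input_tokens = 1000           # tokens in the prompt (the JavaScript sample)
output_tokens = 1000          # tokens in the generated Python output

total_tokens = input_tokens + output_tokens
estimated_cost = (total_tokens / 1000) * price_per_1k_tokens
print(f"{total_tokens} tokens ~= ${estimated_cost:.4f}")   # 2000 tokens ~= $0.0040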

Base Series and Codex series fine-tuned models


Azure OpenAI fine-tuned models are charged based on three factors:

Training hours
Hosting hours
Inference per 1,000 tokens

The hosting hours cost is important to be aware of since once a fine-tuned model is
deployed it continues to incur an hourly cost regardless of whether you're actively using
it. Fine-tuned model costs should be monitored closely.

) Important

After a customized model is deployed, if at any time the deployment remains


inactive for greater than fifteen (15) days, the deployment will automatically be
deleted. The deployment of a customized model is “inactive” if the model was
deployed more than fifteen (15) days ago and no completions or chat completions
calls were made to it during a continuous 15-day period. The deletion of an inactive
deployment does NOT delete or affect the underlying customized model, and the
customized model can be redeployed at any time. As described in Azure OpenAI
Service pricing , each customized (fine-tuned) model that is deployed incurs an
hourly hosting cost regardless of whether completions or chat completions calls are
being made to the model. To learn more about planning and managing costs with
Azure OpenAI, refer to our cost management guide.
Other costs that might accrue with Azure OpenAI Service
Keep in mind that enabling capabilities like sending data to Azure Monitor Logs,
alerting, etc. incurs additional costs for those services. These costs are visible under
those other services and at the subscription level, but aren't visible when scoped just to
your Azure OpenAI resource.

Using Azure Prepayment with Azure OpenAI Service


You can pay for Azure OpenAI Service charges with your Azure Prepayment credit.
However, you can't use Azure Prepayment credit to pay for charges for third party
products and services including those from the Azure Marketplace.

Monitor costs
As you use Azure resources with Azure OpenAI, you incur costs. Azure resource usage
unit costs vary by time intervals (seconds, minutes, hours, and days) or by unit usage
(bytes, megabytes, and so on.) As soon as Azure OpenAI use starts, costs can be
incurred and you can see the costs in cost analysis.

When you use cost analysis, you view Azure OpenAI costs in graphs and tables for
different time intervals. Some examples are by day, current and prior month, and year.
You also view costs against budgets and forecasted costs. Switching to longer views
over time can help you identify spending trends. And you see where overspending
might have occurred. If you've created budgets, you can also easily see where they're
exceeded.

To view Azure OpenAI costs in cost analysis:

1. Sign in to the Azure portal.


2. Select one of your Azure OpenAI resources.
3. Under Resource Management select Cost analysis
4. By default cost analysis is scoped to the individual Azure OpenAI resource.

To understand the breakdown of what makes up that cost, it can help to modify Group
by to Meter and, in this case, to switch the chart type to Line. You can now see that for
this particular resource the source of the costs is three different model series, with
Text-Davinci Tokens representing the bulk of the costs.

It's important to understand scope when evaluating costs associated with Azure OpenAI.
If your resources are part of the same resource group you can scope Cost Analysis at
that level to understand the effect on costs. If your resources are spread across multiple
resource groups you can scope to the subscription level.

However, when scoped at a higher level you often need to add additional filters to be
able to zero in on Azure OpenAI usage. When scoped at the subscription level we see a
number of other resources that we may not care about in the context of Azure OpenAI
cost management. When scoping at the subscription level, we recommend navigating to
the full Cost analysis tool under the Cost Management service. Search for "Cost
Management" in the top Azure search bar to navigate to the full service experience,
which includes more options like creating budgets.

If you try to add a filter by service, you'll find that you can't find Azure OpenAI in the list.
This is because, technically, Azure OpenAI is part of Cognitive Services, so the service-level
filter is Cognitive Services. If you want to see all Azure OpenAI resources across a
subscription without any other type of Cognitive Services resources, you need to instead
scope to Service tier: Azure OpenAI:

Create budgets
You can create budgets to manage costs and create alerts that automatically notify
stakeholders of spending anomalies and overspending risks. Alerts are based on
spending compared to budget and cost thresholds. Budgets and alerts are created for
Azure subscriptions and resource groups, so they're useful as part of an overall cost
monitoring strategy.
Budgets can be created with filters for specific resources or services in Azure if you want
more granularity present in your monitoring. Filters help ensure that you don't
accidentally create new resources that cost you additional money. For more information
about the filter options available when you create a budget, see Group and filter
options.

) Important

While OpenAI has an option for hard limits that will prevent you from going over
your budget, Azure OpenAI does not currently provide this functionality. You are
able to kick off automation from action groups as part of your budget notifications
to take more advanced actions, but this requires additional custom development
on your part.

Export cost data


You can also export your cost data to a storage account. This is helpful when you or
others need to do additional data analysis for costs. For example, a finance team can
analyze the data using Excel or Power BI. You can export your costs on a daily, weekly, or
monthly schedule and set a custom date range. Exporting cost data is the
recommended way to retrieve cost datasets.

Next steps
Learn how to optimize your cloud investment with Azure Cost Management.
Learn more about managing costs with cost analysis.
Learn about how to prevent unexpected costs.
Take the Cost Management guided learning course.
Tutorial: Explore Azure OpenAI Service
embeddings and document search
Article • 06/14/2023

This tutorial will walk you through using the Azure OpenAI embeddings API to perform
document search where you'll query a knowledge base to find the most relevant
document.

In this tutorial, you learn how to:

" Install Azure OpenAI and other dependent Python libraries.


" Download the BillSum dataset and prepare it for analysis.
" Create environment variables for your resources endpoint and API key.
" Use the text-embedding-ada-002 (Version 2) model
" Use cosine similarity to rank search results.

) Important

We strongly recommend using text-embedding-ada-002 (Version 2) . This


model/version provides parity with OpenAI's text-embedding-ada-002 . To learn
more about the improvements offered by this model, please refer to OpenAI's blog
post . Even if you are currently using Version 1 you should migrate to Version 2 to
take advantage of the latest weights/updated token limit. Version 1 and Version 2
are not interchangeable, so document embedding and document search must be
done using the same version of the model.

Prerequisites
An Azure subscription - Create one for free
Access granted to Azure OpenAI in the desired Azure subscription. Currently,
access to this service is granted only by application. You can apply for access to
Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an
issue on this repo to contact us if you have an issue.
Python 3.7.1 or later version
The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-
learn, pandas, tiktoken.
Jupyter Notebooks
An Azure OpenAI resource with the text-embedding-ada-002 (Version 2) model
deployed. This model is currently only available in certain regions. If you don't have
a resource the process of creating one is documented in our resource deployment
guide.

Set up

Python libraries
If you haven't already, you need to install the following libraries:

Windows Command Prompt

pip install openai num2words matplotlib plotly scipy scikit-learn pandas


tiktoken

Download the BillSum dataset


BillSum is a dataset of United States Congressional and California state bills. For
illustration purposes, we'll look only at the US bills. The corpus consists of bills from the
103rd-115th (1993-2018) sessions of Congress. The data was split into 18,949 train bills
and 3,269 test bills. The BillSum corpus focuses on mid-length legislation from 5,000 to
20,000 characters in length. More information on the project and the original academic
paper where this dataset is derived from can be found on the BillSum project's GitHub
repository

This tutorial uses the bill_sum_data.csv file that can be downloaded from our GitHub
sample data .

You can also download the sample data by running the following command on your
local machine:

Windows Command Prompt

curl "https://raw.githubusercontent.com/Azure-Samples/Azure-OpenAI-Docs-
Samples/main/Samples/Tutorials/Embeddings/data/bill_sum_data.csv" --output
bill_sum_data.csv

Retrieve key and endpoint


To successfully make a call against Azure OpenAI, you'll need an endpoint and a key.
| Variable name | Value |
| --- | --- |
| ENDPOINT | This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in Azure OpenAI Studio > Playground > Code View. An example endpoint is: https://docs-test-001.openai.azure.com . |
| API-KEY | This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2 . |

Go to your resource in the Azure portal. The Endpoint and Keys can be found in the
Resource Management section. Copy your endpoint and access key as you'll need both
for authenticating your API calls. You can use either KEY1 or KEY2 . Always having two
keys allows you to securely rotate and regenerate keys without causing a service
disruption.

Create and assign persistent environment variables for your key and endpoint.

Environment variables

Command Line

CMD

setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"

CMD

setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"

After setting the environment variables, you may need to close and reopen Jupyter
notebooks or whatever IDE you're using in order for the environment variables to be
accessible. While we strongly recommend using Jupyter Notebooks, if for some reason
you cannot you'll need to modify any code that is returning a pandas dataframe by
using print(dataframe_name) rather than just calling the dataframe_name directly as is
often done at the end of a code block.

Run the following code in your preferred Python IDE:

Import libraries and list models


Python

import openai

import os

import re

import requests

import sys

from num2words import num2words

import os

import pandas as pd

import numpy as np

from openai.embeddings_utils import get_embedding, cosine_similarity

import tiktoken

API_KEY = os.getenv("AZURE_OPENAI_API_KEY")

RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")

openai.api_type = "azure"

openai.api_key = API_KEY

openai.api_base = RESOURCE_ENDPOINT

openai.api_version = "2022-12-01"

url = openai.api_base + "/openai/deployments?api-version=2022-12-01"

r = requests.get(url, headers={"api-key": API_KEY})

print(r.text)

Output

{
  "data": [
    {
      "scale_settings": {
        "scale_type": "standard"
      },
      "model": "text-embedding-ada-002",
      "owner": "organization-owner",
      "id": "text-embedding-ada-002",
      "status": "succeeded",
      "created_at": 1657572678,
      "updated_at": 1657572678,
      "object": "deployment"
    },
    {
      "scale_settings": {
        "scale_type": "standard"
      },
      "model": "code-cushman-001",
      "owner": "organization-owner",
      "id": "code-cushman-001",
      "status": "succeeded",
      "created_at": 1657572712,
      "updated_at": 1657572712,
      "object": "deployment"
    },
    {
      "scale_settings": {
        "scale_type": "standard"
      },
      "model": "text-search-curie-doc-001",
      "owner": "organization-owner",
      "id": "text-search-curie-doc-001",
      "status": "succeeded",
      "created_at": 1668620345,
      "updated_at": 1668620345,
      "object": "deployment"
    },
    {
      "scale_settings": {
        "scale_type": "standard"
      },
      "model": "text-search-curie-query-001",
      "owner": "organization-owner",
      "id": "text-search-curie-query-001",
      "status": "succeeded",
      "created_at": 1669048765,
      "updated_at": 1669048765,
      "object": "deployment"
    }
  ],
  "object": "list"
}

The output of this command will vary based on the number and type of models you've
deployed. In this case, we need to confirm that we have an entry for text-embedding-
ada-002. If you find that you're missing this model, you'll need to deploy the model to
your resource before proceeding.

Now we need to read our csv file and create a pandas DataFrame. After the initial
DataFrame is created, we can view the contents of the table by running df .

Python

df = pd.read_csv(os.path.join(os.getcwd(), 'bill_sum_data.csv'))  # This assumes that you have placed bill_sum_data.csv in the same directory you are running Jupyter Notebooks

df

Output:

The initial table has more columns than we need. We'll create a new, smaller DataFrame
called df_bills, which will contain only the columns for text , summary , and title .

Python

df_bills = df[['text', 'summary', 'title']]

df_bills

Output:

Next we'll perform some light data cleaning by removing redundant whitespace and
cleaning up the punctuation to prepare the data for tokenization.
Python

pd.options.mode.chained_assignment = None  # https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters

# s is input text
def normalize_text(s, sep_token = " \n "):
    s = re.sub(r'\s+', ' ', s).strip()  # collapse runs of whitespace into single spaces
    s = re.sub(r". ,", "", s)
    s = s.replace("..", ".")
    s = s.replace(". .", ".")
    s = s.replace("\n", "")
    s = s.strip()
    return s

df_bills['text'] = df_bills["text"].apply(lambda x: normalize_text(x))

Now we need to remove any bills that are too long for the token limit (8192 tokens).

Python

tokenizer = tiktoken.get_encoding("cl100k_base")

df_bills['n_tokens'] = df_bills["text"].apply(lambda x:
len(tokenizer.encode(x)))

df_bills = df_bills[df_bills.n_tokens<8192]

len(df_bills)

Output

20

7 Note

In this case all bills are under the embedding model input token limit, but you can
use the technique above to remove entries that would otherwise cause embedding
to fail. When faced with content that exceeds the embedding limit, you can also
chunk the content into smaller pieces and then embed those one at a time.
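
As a rough sketch of that chunking approach (this helper isn't part of the tutorial; it
simply splits a long text into pieces that fit the model's input limit, using the same
cl100k_base tokenizer created above):

Python

def chunk_text(text, max_tokens=8192):
    """Split text into pieces that each fit within the embedding model's token limit."""
    tokens = tokenizer.encode(text)
    return [
        tokenizer.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# Each chunk can then be embedded separately, for example:
# chunks = chunk_text(long_document)
# chunk_embeddings = [get_embedding(c, engine='text-embedding-ada-002') for c in chunks]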

We'll once again examine df_bills.

Python

df_bills

Output:

To understand the n_tokens column a little more, as well as how text is ultimately tokenized,
it can be helpful to run the following code:

Python

sample_encode = tokenizer.encode(df_bills.text[0])

decode = tokenizer.decode_tokens_bytes(sample_encode)

decode

For our docs we're intentionally truncating the output, but running this command in
your environment will return the full text from index zero tokenized into chunks. You can
see that in some cases an entire word is represented with a single token whereas in
others parts of words are split across multiple tokens.

Output

[b'SECTION',

b' ',

b'1',

b'.',

b' SHORT',

b' TITLE',

b'.',

b' This',

b' Act',

b' may',

b' be',

b' cited',

b' as',

b' the',

b' ``',

b'National',

b' Science',

b' Education',

b' Tax',

b' In',

b'cent',

b'ive',

b' for',

b' Businesses',

b' Act',

b' of',

b' ',

b'200',

b'7',

b"''.",

b' SEC',

b'.',

b' ',

b'2',

b'.',

b' C',

b'RED',

b'ITS',

b' FOR',

b' CERT',

b'AIN',

b' CONTRIBUT',

b'IONS',

b' BEN',

b'EF',

b'IT',

b'ING',

b' SC',

If you then check the length of the decode variable, you'll find it matches the first
number in the n_tokens column.

Python

len(decode)

Output

1466

Now that we understand more about how tokenization works, we can move on to
embedding. It's important to note that we haven't actually tokenized the documents
yet. The n_tokens column is simply a way of making sure none of the data we pass to
the model for tokenization and embedding exceeds the input token limit of 8,192. When
we pass the documents to the embeddings model, it will break the documents into
tokens similar (though not necessarily identical) to the examples above and then convert
the tokens to a series of floating point numbers that will be accessible via vector search.
These embeddings can be stored locally or in an Azure Database. As a result, each bill
will have its own corresponding embedding vector in the new ada_v2 column on the
right side of the DataFrame.

Python

df_bills['ada_v2'] = df_bills["text"].apply(
    lambda x: get_embedding(x, engine='text-embedding-ada-002')
)  # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model

Python

df_bills

Output:

As we run the search code block below, we'll embed the search query "Can I get
information on cable company tax revenue?" with the same text-embedding-ada-002
(Version 2) model. Next we'll find the closest bill embedding to the newly embedded
text from our query ranked by cosine similarity.

Python

# search through the bills for the entries most similar to the user query
def search_docs(df, user_query, top_n=3, to_print=True):
    embedding = get_embedding(
        user_query,
        engine="text-embedding-ada-002"  # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
    )
    df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))

    res = (
        df.sort_values("similarities", ascending=False)
        .head(top_n)
    )
    if to_print:
        display(res)
    return res


res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)

Output:

Finally, we'll show the top result from document search based on user query against the
entire knowledge base. This returns the top result of the "Taxpayer's Right to View Act of
1993". This document has a cosine similarity score of 0.76 between the query and the
document:

Python

res["summary"][9]

Output

"Taxpayer's Right to View Act of 1993 - Amends the Communications Act of


1934 to prohibit a cable operator from assessing separate charges for any
video programming of a sporting, theatrical, or other entertainment event if
that event is performed at a facility constructed, renovated, or maintained
with tax revenues or by an organization that receives public financial
support. Authorizes the Federal Communications Commission and local
franchising authorities to make determinations concerning the applicability
of such prohibition. Sets forth conditions under which a facility is
considered to have been constructed, maintained, or renovated with tax
revenues. Considers events performed by nonprofit or public organizations
that receive tax subsidies to be subject to this Act if the event is
sponsored by, or includes the participation of a team that is part of, a tax
exempt organization."

Using this approach, you can use embeddings as a search mechanism across documents
in a knowledge base. The user can then take the top search result and use it for the
downstream task that prompted their initial query.
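
For example, a common downstream pattern is to pass the retrieved text to a chat model so it can answer using that context. The sketch below is illustrative only: it assumes the openai package is already configured for your Azure resource as earlier in this tutorial, and that you have a chat model deployment; the deployment name shown is a placeholder.

Python

# Illustrative sketch (not a required tutorial step): answer the user's question
# using the top search result as grounding context.
top_result = res["summary"].iloc[0]  # text of the best-matching bill

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # placeholder: your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{top_result}\n\nQuestion: Can I get information on cable company tax revenue?"},
    ],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])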

Clean up resources
If you created an OpenAI resource solely for completing this tutorial and want to clean
it up, you'll need to delete your deployed models, and then delete the resource or its
associated resource group if it's dedicated to your test resource. Deleting the resource
group also deletes any other resources associated with it.

Portal
Azure CLI

Next steps
Learn more about Azure OpenAI's models:

Azure OpenAI Service models


Azure OpenAI speech to speech chat
Article • 04/19/2023

Reference documentation | Package (NuGet) | Additional Samples on GitHub

Important

To complete the steps in this guide, access must be granted to Microsoft Azure
OpenAI Service in the desired Azure subscription. Currently, access to this service is
granted only by application. You can apply for access to Azure OpenAI by
completing the form at https://aka.ms/oai/access .

In this how-to guide, you can use Azure Cognitive Services Speech to converse with
Azure OpenAI Service. The text recognized by the Speech service is sent to Azure
OpenAI. The text response from Azure OpenAI is then synthesized by the Speech
service.

Speak into the microphone to start a conversation with Azure OpenAI.

The Speech service recognizes your speech and converts it into text (speech to
text).
Your request as text is sent to Azure OpenAI.
The Speech service text to speech (TTS) feature synthesizes the response from
Azure OpenAI to the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI
doesn't remember the context of your conversation.

Prerequisites
" Azure subscription - Create one for free
" Create a Speech resource in the Azure portal.
" Get the Speech resource key and region. After your Speech resource is deployed,
select Go to resource to view and manage keys. For more information about
Cognitive Services resources, see Get the keys for your resource.

Set up the environment


The Speech SDK is available as a NuGet package and implements .NET Standard 2.0.
You install the Speech SDK later in this guide, but first check the SDK installation guide
for any more requirements.

Set environment variables


This example requires environment variables named OPEN_AI_KEY , OPEN_AI_ENDPOINT ,
SPEECH_KEY , and SPEECH_REGION .

Your application must be authenticated to access Cognitive Services resources. For
production, use a secure way of storing and accessing your credentials. For example,
after you get a key for your Speech resource, write it to a new environment variable
on the local machine running the application.

 Tip

Don't include the key directly in your code, and never post it publicly. See the
Cognitive Services security article for more authentication options like Azure Key
Vault.

To set the environment variables, open a console window, and follow the instructions for
your operating system and development environment.

To set the OPEN_AI_KEY environment variable, replace your-openai-key with one of
the keys for your resource.
To set the OPEN_AI_ENDPOINT environment variable, replace your-openai-endpoint
with the endpoint for your resource.
To set the SPEECH_KEY environment variable, replace your-speech-key with one of
the keys for your resource.
To set the SPEECH_REGION environment variable, replace your-speech-region with
one of the regions for your resource.

Windows

Console

setx OPEN_AI_KEY your-openai-key

setx OPEN_AI_ENDPOINT your-openai-endpoint

setx SPEECH_KEY your-speech-key

setx SPEECH_REGION your-speech-region

Note
If you only need to access the environment variable in the current running
console, you can set the environment variable with set instead of setx.

After you add the environment variables, you may need to restart any running
programs that will need to read the environment variable, including the console
window. For example, if you are using Visual Studio as your editor, restart Visual
Studio before running the example.

Recognize speech from a microphone


Follow these steps to create a new console application.

1. Open a command prompt where you want the new project, and create a console
application with the .NET CLI. The Program.cs file should be created in the project
directory.

.NET CLI

dotnet new console

2. Install the Speech SDK in your new project with the .NET CLI.

.NET CLI

dotnet add package Microsoft.CognitiveServices.Speech

3. Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.

.NET CLI

dotnet add package Azure.AI.OpenAI --prerelease

4. Replace the contents of Program.cs with the following code.

C#

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Azure;
using Azure.AI.OpenAI;
using static System.Environment;

class Program
{
    // This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT"
    // Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
    static string openAIKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
    static string openAIEndpoint = Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT");

    // Enter the deployment name you chose when you deployed the model.
    static string engine = "text-davinci-002";

    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
    static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");

    // Prompts Azure OpenAI with a request and synthesizes the response.
    async static Task AskOpenAI(string prompt)
    {
        // Ask Azure OpenAI
        OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));
        var completionsOptions = new CompletionsOptions()
        {
            Prompts = { prompt },
            MaxTokens = 100,
        };
        Response<Completions> completionsResponse = client.GetCompletions(engine, completionsOptions);
        string text = completionsResponse.Value.Choices[0].Text.Trim();
        Console.WriteLine($"Azure OpenAI response: {text}");

        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        // The language of the voice that speaks.
        speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural";
        var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();

        using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioOutputConfig))
        {
            var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(text).ConfigureAwait(true);

            if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine($"Speech synthesized to speaker for text: [{text}]");
            }
            else if (speechSynthesisResult.Reason == ResultReason.Canceled)
            {
                var cancellationDetails = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
                Console.WriteLine($"Speech synthesis canceled: {cancellationDetails.Reason}");

                if (cancellationDetails.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error details: {cancellationDetails.ErrorDetails}");
                }
            }
        }
    }

    // Continuously listens for speech input to recognize and send as text to Azure OpenAI
    async static Task ChatWithOpenAI()
    {
        // Should be the locale for the speaker's language.
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        var conversationEnded = false;

        while (!conversationEnded)
        {
            Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");

            // Get audio from the microphone and then send it to the TTS service.
            var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

            switch (speechRecognitionResult.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    if (speechRecognitionResult.Text == "Stop.")
                    {
                        Console.WriteLine("Conversation ended.");
                        conversationEnded = true;
                    }
                    else
                    {
                        Console.WriteLine($"Recognized speech: {speechRecognitionResult.Text}");
                        await AskOpenAI(speechRecognitionResult.Text).ConfigureAwait(true);
                    }
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine($"No speech could be recognized: ");
                    break;
                case ResultReason.Canceled:
                    var cancellationDetails = CancellationDetails.FromResult(speechRecognitionResult);
                    Console.WriteLine($"Speech Recognition canceled: {cancellationDetails.Reason}");

                    if (cancellationDetails.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Error details={cancellationDetails.ErrorDetails}");
                    }
                    break;
            }
        }
    }

    async static Task Main(string[] args)
    {
        try
        {
            await ChatWithOpenAI().ConfigureAwait(true);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}

5. To increase or decrease the number of tokens returned by Azure OpenAI, change
the MaxTokens property in the CompletionsOptions class instance. For more
information about tokens and cost implications, see Azure OpenAI tokens and Azure
OpenAI pricing .

Run your new console application to start speech recognition from a microphone:

Console

dotnet run

Important

Make sure that you set the OPEN_AI_KEY , OPEN_AI_ENDPOINT , SPEECH_KEY , and
SPEECH_REGION environment variables as described previously. If you don't set
these variables, the sample will fail with an error message.

Speak into your microphone when prompted. The console output includes the prompt
for you to begin speaking, then your request as text, and then the response from Azure
OpenAI as text. The response from Azure OpenAI should be converted from text to
speech and then output to the default speaker.

Console

PS C:\dev\openai\csharp> dotnet run

Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech:Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\csharp>

Remarks
Now that you've completed the quickstart, here are some more considerations:

To change the speech recognition language, replace en-US with another supported
language. For example, es-ES for Spanish (Spain). The default language is en-US if
you don't specify a language. For details about how to identify one of multiple
languages that might be spoken, see language identification.
To change the voice that you hear, replace en-US-JennyMultilingualNeural with
another supported voice. If the voice doesn't speak the language of the text
returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
To use a different model, replace text-davinci-002 with the ID of another
deployment. Keep in mind that the deployment ID isn't necessarily the same as the
model name. You named your deployment when you created it in Azure OpenAI
Studio .
Azure OpenAI also performs content moderation on the prompt inputs and
generated outputs. The prompts or responses may be filtered if harmful content is
detected. For more information, see the content filtering article.

Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the
Speech resource you created.

Next steps
Learn more about Speech
Learn more about Azure OpenAI
Overview of Responsible AI practices for
Azure OpenAI models
Article • 05/19/2023

Many of the Azure OpenAI models are generative AI models that have demonstrated
improvements in advanced capabilities such as content and code generation,
summarization, and search. With many of these improvements also come increased
responsible AI challenges related to harmful content, manipulation, human-like
behavior, privacy, and more. For more information about the capabilities, limitations and
appropriate use cases for these models, please review the Transparency Note.

In addition to the Transparency Note, we have created technical recommendations and


resources to help customers design, develop, deploy, and use AI systems that
implement the Azure OpenAI models responsibly. Our recommendations are grounded
in the Microsoft Responsible AI Standard , which sets policy requirements that our own
engineering teams follow. Much of the content of the Standard follows a pattern, asking
teams to Identify, Measure, and Mitigate potential harms, and plan for how to Operate
the AI system as well. In alignment with those practices, these recommendations are
organized into four stages:

1. Identify : Identify and prioritize potential harms that could result from your AI
system through iterative red-teaming, stress-testing, and analysis.
2. Measure : Measure the frequency and severity of those harms by establishing clear
metrics, creating measurement test sets, and completing iterative, systematic
testing (both manual and automated).
3. Mitigate : Mitigate harms by implementing tools and strategies such as prompt
engineering and using our content filters. Repeat measurement to test
effectiveness after implementing mitigations.
4. Operate : Define and execute a deployment and operational readiness plan.

In addition to their correspondence to the Microsoft Responsible AI Standard, these


stages correspond closely to the functions in the NIST AI Risk Management
Framework .

Identify
Identifying potential harms that could occur in or be caused by an AI system is the first
stage of the Responsible AI lifecycle. The earlier you begin to identify potential harms,
the more effective you can be at mitigating the harms. When assessing potential harms,
it is important to develop an understanding of the types of harms that could result from
using the Azure OpenAI Service in your specific context(s). In this section, we provide
recommendations and resources you can use to identify harms through an impact
assessment, iterative red team testing, stress-testing, and analysis. Red teaming and
stress-testing are approaches where a group of testers come together and intentionally
probe a system to identify its limitations, risk surface, and vulnerabilities.

These steps have the goal of producing a prioritized list of potential harms for each
specific scenario.

1. Identify harms that are relevant for your specific model, application, and
deployment scenario.
a. Identify potential harms associated with the model and model capabilities (for
example, GPT-3 model vs GPT-4 model) that you're using in your system. This is
important to consider because each model has different capabilities, limitations,
and risks, as described more fully in the sections above.
b. Identify any other harms or increased scope of harm presented by the intended
use of the system you're developing. Consider using a Responsible AI Impact
Assessment to identify potential harms.
i. For example, let's consider an AI system that summarizes text. Some uses of
text generation are lower risk than others. For example, if the system is to be
used in a healthcare domain for summarizing doctor's notes, the risk of harm
arising from inaccuracies is higher than if the system is summarizing online
articles.
2. Prioritize harms based on elements of risk such as frequency and severity. Assess
the level of risk for each harm and the likelihood of each risk occurring in order to
prioritize the list of harms you've identified. Consider working with subject matter
experts and risk managers within your organization and with relevant external
stakeholders when appropriate.
3. Conduct red team testing and stress testing starting with the highest priority
harms, to develop a better understanding of whether and how the identified harms
are actually occurring in your scenario, as well as to identify new harms you didn't
initially anticipate.
4. Share this information with relevant stakeholders using your organization's
internal compliance processes.

At the end of this Identify stage, you should have a documented, prioritized list of
harms. When new harms and new instances of harms emerge through further testing
and use of the system, you can update and improve this list by following the above
process again.

Measure
Once a list of prioritized harms has been identified, the next stage involves developing
an approach for systematic measurement of each harm and conducting evaluations of
the AI system. There are manual and automated approaches to measurement. We
recommend you do both, starting with manual measurement.

Manual measurement is useful for:

1. Measuring progress on a small set of priority issues. When mitigating specific
harms, it's often most productive to keep manually checking progress against a
small dataset until the harm is no longer observed before moving to automated
measurement.
2. Defining and reporting metrics until automated measurement is reliable enough to
use alone.
3. Spot-checking periodically to measure the quality of automatic measurement.

Automated measurement is useful for:

1. Measuring at a large scale with increased coverage to provide more
comprehensive results.
2. Ongoing measurement to monitor for any regression as the system, usage, and
mitigations evolve.

Below, we provide specific recommendations to measure your AI system for potential
harms. We recommend you first complete this process manually and then develop a
plan to automate the process:

1. Create inputs that are likely to produce each prioritized harm: Create
measurement set(s) by generating many diverse examples of targeted inputs that
are likely to produce each prioritized harm.

2. Generate System Outputs: Pass in the examples from the measurement sets as
inputs to the system to generate system outputs. Document the outputs.

3. Evaluate System Outputs and Report Results to Relevant Stakeholders
a. Define clear metric(s). For each intended use of your system, establish metrics
that measure the frequency and degree of severity of each potentially harmful
output. Create clear definitions to classify outputs that will be considered
harmful or problematic in the context of your system and scenario, for each
type of prioritized harm you identified.
b. Assess the outputs against the clear metric definitions and record and quantify
the occurrences of harmful outputs. Repeat the measurements periodically, to
assess mitigations and monitor for any regression.
c. Share this information with relevant stakeholders using your organization's
internal compliance processes.

At the end of this measurement stage, you should have a defined measurement
approach to benchmark how your system performs for each potential harm as well as an
initial set of documented results. As you continue implementing and testing mitigations,
the metrics and measurement sets should continue to be refined (for example, to add
metrics for new harms that were initially unanticipated) and the results updated.
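
To make the automated side of this measurement stage concrete, a harness can be as simple as the following sketch. Everything here is an assumption for illustration: generate_response stands in for a call to your AI system, and is_harmful stands in for whatever classifier or labeling process implements your metric definitions.

Python

# Illustrative measurement harness, not an official tool: run a measurement set
# through the system and report how often outputs are judged harmful.
def measure_harm(measurement_set, generate_response, is_harmful):
    results = []
    for prompt in measurement_set:
        output = generate_response(prompt)          # call your AI system
        results.append({
            "prompt": prompt,
            "output": output,
            "harmful": is_harmful(prompt, output),  # apply your metric definition
        })
    frequency = sum(r["harmful"] for r in results) / max(len(results), 1)
    return frequency, results

# Example usage with placeholder functions you supply yourself:
# rate, details = measure_harm(my_prompts, my_system_call, my_harm_classifier)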

Mitigate
Mitigating harms presented by large language models such as the Azure OpenAI
models requires an iterative, layered approach that includes experimentation and
continual measurement. We recommend developing a mitigation plan that
encompasses four layers of mitigations for the harms identified in the earlier stages of
this process:

1. At the model level, it's important to understand the model(s) you'll be using and
what fine-tuning steps may have been taken by the model developers to align the
model towards its intended uses and to reduce the risk of potentially harmful uses
and outcomes.
a. For example, for GPT-4, model developers have been able to use reinforcement
learning methods as a responsible AI tool to better align the model towards the
designers' intended goals.
2. At the safety system level, you should understand the platform level mitigations
that have been implemented, such as the Azure OpenAI content filters, which help
to block the output of harmful content.
3. At the application level, application developers can implement metaprompt and
user-centered design and user experience mitigations. Metaprompts are
instructions provided to the model to guide its behavior; their use can make a
critical difference in guiding the system to behave in accordance with your
expectations. User-centered design and user experience (UX) interventions are also
key mitigation tools to prevent misuse and overreliance on AI.
4. At the positioning level, there are many ways to educate the people who will use
or be affected by your system about its capabilities and limitations.

Below, we provide specific recommendations to implement mitigations at the different
layers. Not all of these mitigations are appropriate for every scenario, and conversely,
these mitigations may be insufficient for some scenarios. Give careful consideration to
your scenario and the prioritized harms you identified, and as you implement
mitigations, develop a process to measure and document their effectiveness for your
system and scenario.

1. Model level Mitigations: Review and identify which Azure OpenAI base model is
best suited for the system you're building and educate yourself about its
capabilities, limitations, and any measures taken to reduce the risk of the potential
harms you've identified. For example, if you're using GPT-4, in addition to reading
this Transparency Note, you can review OpenAI's GPT-4 System Card explaining
the safety challenges presented by the model and the safety processes that
OpenAI adopted to prepare GPT-4 for deployment. It may be worth experimenting
with different versions of the model(s) (including through red teaming and
measuring) to see how the harms present differently.

2. Safety System Level Mitigations: Identify and evaluate the effectiveness of
platform level solutions such as the Azure OpenAI content filters to help mitigate
the potential harms that you have identified.

3. Application Level Mitigations: Prompt engineering, including metaprompt tuning,
can be an effective mitigation for many different types of harm. Review and
implement metaprompt (also called the "system message" or "system prompt")
guidance and best practices documented here. A minimal sketch of a metaprompt-style
system message appears after this list.

We recommend implementing the following user-centered design and user


experience (UX) interventions, guidance, and best practices to guide users to use
the system as intended and to prevent overreliance on the AI system:
a. Review and edit interventions: Design the user experience (UX) to encourage
people who use the system to review and edit the AI-generated outputs before
accepting them (see HAX G9 : Support efficient correction).
b. Highlight potential inaccuracies in the AI-generated outputs (see HAX G2 :
Make clear how well the system can do what it can do), both when users first
start using the system and at appropriate times during ongoing use. In the first
run experience (FRE), notify users that AI-generated outputs may contain
inaccuracies and that they should verify information. Throughout the
experience, include reminders to check AI-generated output for potential
inaccuracies, both overall and in relation to specific types of content the system
may generate incorrectly. For example, if your measurement process has
determined that your system has lower accuracy with numbers, mark numbers
in generated outputs to alert the user and encourage them to check the
numbers or seek external sources for verification.
c. User responsibility. Remind people that they are accountable for the final
content when they're reviewing AI-generated content. For example, when
offering code suggestions, remind the developer to review and test suggestions
before accepting.
d. Disclose AI's role in the interaction. Make people aware that they are
interacting with an AI system (as opposed to another human). Where
appropriate, inform content consumers that content has been partly or fully
generated by an AI model; such notices may be required by law or applicable
best practices, and can reduce inappropriate reliance on AI-generated outputs
and can help consumers use their own judgment about how to interpret and act
on such content.
e. Prevent the system from anthropomorphizing. AI models may output content
containing opinions, emotive statements, or other formulations that could imply
that they're human-like, that could be mistaken for a human identity, or that
could mislead people to think that a system has certain capabilities when it
doesn't. Implement mechanisms that reduce the risk of such outputs or
incorporate disclosures to help prevent misinterpretation of outputs.
f. Cite references and information sources. If your system generates content
based on references sent to the model, clearly citing information sources helps
people understand where the AI-generated content is coming from.
g. Limit the length of inputs and outputs, where appropriate. Restricting input
and output length can reduce the likelihood of producing undesirable content,
misuse of the system beyond its intended uses, or other harmful or unintended
uses.
h. Structure inputs and/or system outputs. Use prompt engineering techniques
within your application to structure inputs to the system to prevent open-ended
responses. You can also limit outputs to be structured in certain formats or
patterns. For example, if your system generates dialog for a fictional character in
response to queries, limit the inputs so that people can only query for a
predetermined set of concepts.
i. Prepare pre-determined responses. There are certain queries to which a model
may generate offensive, inappropriate, or otherwise harmful responses. When
harmful or offensive queries or responses are detected, you can design your
system to deliver a predetermined response to the user. Predetermined
responses should be crafted thoughtfully. For example, the application can
provide prewritten answers to questions such as "who/what are you?" to avoid
having the system respond with anthropomorphized responses. You can also
use predetermined responses for questions like, "What are your terms of use?"
to direct people to the correct policy.
j. Restrict automatic posting on social media. Limit how people can automate
your product or service. For example, you may choose to prohibit automated
posting of AI-generated content to external sites (including social media), or to
prohibit the automated execution of generated code.
k. Bot detection. Devise and implement a mechanism to prohibit users from
building an API on top of your product.

4. Positioning Level Mitigations:
a. Be appropriately transparent. It's important to provide the right level of
transparency to people who use the system, so that they can make informed
decisions around the use of the system.
b. Provide system documentation. Produce and provide educational materials for
your system, including explanations of its capabilities and limitations. For
example, this could be in the form of a "learn more" page accessible via the
system.
c. Publish user guidelines and best practices. Help users and stakeholders use the
system appropriately by publishing best practices, for example on prompt
crafting, reviewing generations before accepting them, etc. Such guidelines can
help people understand how the system works. When possible, incorporate the
guidelines and best practices directly into the UX.
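
As a minimal illustration of the metaprompt (system message) mitigation referenced in item 3 above, the sketch below sends an instruction block ahead of the user's input using the pre-1.0 openai Python package. The endpoint, API version, deployment name, and the wording of the system message are placeholders, not recommended values.

Python

import openai

# Placeholder configuration; replace with your own resource values.
openai.api_type = "azure"
openai.api_base = "https://YOUR_RESOURCE_NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_API_KEY"

# The metaprompt (system message) scopes the assistant and defines a refusal.
system_message = (
    "You are a support assistant for Contoso's internal documentation. "
    "Only answer questions about that documentation. If a request is out of "
    "scope, reply: 'I can only help with Contoso documentation questions.'"
)

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # placeholder: your chat model deployment name
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Summarize the vacation policy."},
    ],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])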

As you implement mitigations to address potential identified harms, it's important to
develop a process for ongoing measurement of the effectiveness of such mitigations, to
document measurement results, and to review those measurement results to continually
improve the system.

Operate
Once measurement and mitigation systems are in place, we recommend that you define
and execute a deployment and operational readiness plan. This stage includes
completing appropriate reviews of your system and mitigation plans with relevant
stakeholders, establishing pipelines to collect telemetry and feedback, and developing
an incident response and rollback plan.

Some recommendations for how to deploy and operate a system that uses the Azure
OpenAI service with appropriate, targeted harms mitigations include:

1. Work with compliance teams within your organization to understand what types of
reviews are required for your system and when they are required (for example,
legal review, privacy review, security review, accessibility review, etc.).

2. Develop and implement the following:


a. Develop a phased delivery plan. We recommend you launch systems using the
Azure OpenAI service gradually using a "phased delivery" approach. This gives a
limited set of people the opportunity to try the system, provide feedback, report
issues and concerns, and suggest improvements before the system is released
more widely. It also helps to manage the risk of unanticipated failure modes,
unexpected system behaviors, and unexpected concerns being reported.
b. Develop an incident response plan. Develop an incident response plan and
evaluate the time needed to respond to an incident.
c. Develop a rollback plan Ensure you can roll back the system quickly and
efficiently in case an unanticipated incident occurs.
d. Prepare for immediate action for unanticipated harms. Build the necessary
features and processes to block problematic prompts and responses as they're
discovered and as close to real-time as possible. When unanticipated harms do
occur, block the problematic prompts and responses as quickly as possible,
develop and deploy appropriate mitigations, investigate the incident, and
implement a long-term solution.
e. Develop a mechanism to block people who are misusing your system.
Develop a mechanism to identify users who violate your content policies (for
example, by generating hate speech) or are otherwise using your system for
unintended or harmful purposes, and take action against further abuse. For
example, if a user frequently uses your system to generate content that is
blocked or flagged by content safety systems, consider blocking them from
further use of your system. Implement an appeal mechanism where appropriate.
f. Build effective user feedback channels. Implement feedback channels through
which stakeholders (and the general public, if applicable) can submit feedback
or report issues with generated content or that otherwise arise during their use
of the system. Document how such feedback is processed, considered, and
addressed. Evaluate the feedback and work to improve the system based on
user feedback. One approach could be to include buttons with generated
content that would allow users to identify content as "inaccurate," "harmful" or
"incomplete." This could provide a more widely used, structured and feedback
signal for analysis.
g. Telemetry data. Identify and record (consistent with applicable privacy laws,
policies, and commitments) signals that indicate user satisfaction or their ability
to use the system as intended. Use telemetry data to identify gaps and improve
the system.

This document is not intended to be, and should not be construed as providing, legal
advice. The jurisdiction in which you're operating may have various regulatory or legal
requirements that apply to your AI system. Consult a legal specialist if you are uncertain
about laws or regulations that might apply to your system, especially if you think those
might impact these recommendations. Be aware that not all of these recommendations
and resources are appropriate for every scenario, and conversely, these
recommendations and resources may be insufficient for some scenarios.

Learn more about responsible AI


Microsoft AI principles
Microsoft responsible AI resources
Microsoft Azure Learning courses on responsible AI

Learn more about Azure OpenAI


Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft
Learn
Code of Conduct for the Azure OpenAI Service | Microsoft Learn
Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services |
Microsoft Learn
Transparency Note for Azure OpenAI
Service
Article • 05/19/2023

What is a Transparency Note?


An AI system includes not only the technology, but also the people who use it, the
people who are affected by it, and the environment in which it's deployed. Creating a
system that is fit for its intended purpose requires an understanding of how the
technology works, what its capabilities and limitations are, and how to achieve the best
performance. Microsoft's Transparency Notes are intended to help you understand how
our AI technology works, the choices system owners can make that influence system
performance and behavior, and the importance of thinking about the whole system,
including the technology, the people, and the environment. You can use Transparency
Notes when developing or deploying your own system, or share them with the people
who will use or be affected by your system.

Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI
Principles into practice. To find out more, see the Microsoft's AI principles .

The basics of the Azure OpenAI Models


Azure OpenAI provides customers with a fully managed AI service that lets developers
and data scientists apply OpenAI's powerful models including models that can generate
natural language, code, and images. Within the Azure OpenAI Service, the OpenAI
models are integrated with Microsoft-developed content filtering and abuse detection
models. Learn more about content filtering here and abuse detection here.

Select the tabs to see content for the relevant model type.

Introduction

Text and code models

As part of the fully managed Azure OpenAI Service, the GPT-3 models analyze and
generate natural language, Codex models analyze and generate code and plain text
code commentary, and the GPT-4 models can understand and generate natural
language and code. These models use an autoregressive architecture, meaning they
use data from prior observations to predict the most probable next word. This
process is then repeated by appending the newly generated content to the original
text to produce the complete generated response. Because the response is
conditioned on the input text, these models can be applied to various tasks simply
by changing the input text.

The GPT-3 series of models are pretrained on a wide body of publicly available free
text data. This data is sourced from a combination of web crawling (specifically, a
filtered version of Common Crawl , which includes a broad range of text from the
internet and comprises 60 percent of the weighted pretraining dataset) and higher-
quality datasets, including an expanded version of the WebText dataset, two
internet-based books corpora and English-language Wikipedia. The GPT-4 base
model was trained using publicly available data (such as internet data) and data
that was licensed by OpenAI. The model was fine-tuned using reinforcement
learning with human feedback (RLHF).

Learn more about the training and modeling techniques in OpenAI's GPT-3 , GPT-
4 , and Codex research papers. The guidance below is also drawn from OpenAI's
safety best practices .

Key terms

Term Definition

Prompt The text you send to the service in the API call. This text is then input into the
model. For example, one might input the following prompt:

Convert the questions to a command:

Q: Ask Constance if we need some bread

A: send-msg 'find constance' Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for


Wednesday.

A:

Completion or Generation The text Azure OpenAI outputs in response. For example, the
service may respond with the following answer to the above prompt: send-msg 'find greg'
figure out if things are ready for Wednesday.

Token Azure OpenAI processes text by breaking it down into tokens. Tokens can be
words or just chunks of characters. For example, the word hamburger gets
broken up into the tokens ham , bur and ger , while a short and common word
like pear is a single token. Many tokens start with a whitespace, for example
" hello" and " bye".
Capabilities
Text and code models

The GPT-4, GPT-3, and Codex Azure OpenAI Service models use natural language
instructions and examples in the prompt to identify the task. The model then
completes the task by predicting the most probable next text. This technique is
known as "in-context" learning. These models are not retrained during this step but
instead give predictions based on the context you include in the prompt.

There are three main approaches for in-context learning. These approaches vary
based on the amount of task-specific data that is given to the model:

Few-shot : In this case, a user includes several examples in the prompt that
demonstrate the expected answer format and content. The following example
shows a few-shot prompt providing multiple examples:

Convert the questions to a command:

Q: Ask Constance if we need some bread

A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for Wednesday.

A: send-msg `find greg` Is everything ready for Wednesday?

Q: Ask Ilya if we're still having our meeting this evening

A: send-msg `find ilya` Are we still having a meeting this evening?

Q: Contact the ski store and figure out if I can get my skis fixed before I leave on Thursday

A: send-msg `find ski store` Would it be possible to get my skis fixed before I leave on Thursday?

Q: Thank Nicolas for lunch

A: send-msg `find nicolas` Thank you for lunch!

Q: Tell Constance that I won't be home before 19:30 tonight — unmovable meeting.

A: send-msg `find constance` I won't be home before 19:30 tonight. I have a meeting I can't move.

Q: Tell John that I need to book an appointment at 10:30

A:

The number of examples typically ranges from 0 to 100 depending on how many
can fit in the maximum input length for a single prompt. Few-shot learning enables
a major reduction in the amount of task-specific data required for accurate
predictions.

One-shot : This case is the same as the few-shot approach except only one example
is provided. The following example shows a one-shot prompt:
Convert the questions to a command:

Q: Ask Constance if we need some bread

A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for Wednesday.

A:

Zero-shot: In this case, no examples are provided to the model and only the task
request is provided. The following example shows a zero-shot prompt:

Convert the question to a command:

Q: Ask Constance if we need some bread

A:
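
To make the connection to the service concrete, the sketch below shows how a few-shot prompt like the ones above could be sent from the pre-1.0 openai Python package, assuming the client is already configured for your Azure OpenAI resource (api_type, api_base, api_version, api_key). The deployment name is a placeholder, and this is not an official sample.

Python

import openai  # assumes api_type/api_base/api_version/api_key are already set for Azure

few_shot_prompt = (
    "Convert the questions to a command:\n"
    "Q: Ask Constance if we need some bread\n"
    "A: send-msg `find constance` Do we need some bread?\n"
    "Q: Send a message to Greg to figure out if things are ready for Wednesday.\n"
    "A:"
)

# The model continues the prompt; the in-context examples steer the output format.
response = openai.Completion.create(
    engine="YOUR_COMPLETIONS_DEPLOYMENT",  # placeholder deployment name
    prompt=few_shot_prompt,
    max_tokens=50,
    temperature=0,
)
print(response["choices"][0]["text"])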

Use cases
Text and code models

Intended uses
The GPT-4, GPT-3, and Codex models in the Azure OpenAI service can be used in
multiple scenarios. The following list isn't comprehensive, but it illustrates the
diversity of tasks that can be supported with appropriate mitigations:

Chat and conversation interaction: Users can interact with a conversational
agent that responds with responses drawn from trusted documents such as
internal company documentation or tech support documentation.
Conversations must be limited to answering scoped questions.
Chat and conversation creation : Users can create a conversational agent that
responds with responses drawn from trusted documents such as internal
company documentation or tech support documentation. Conversations must
be limited to answering scoped questions.
Code generation or transformation scenarios : For example, converting one
programming language to another, generating docstrings for functions,
converting natural language to SQL.
Journalistic content : For use to create new journalistic content or to rewrite
journalistic content submitted by the user as a writing aid for predefined
topics. Users cannot use the application as a general content creation tool for
all topics. May not be used to generate content for political campaigns.
Question-answering : Users can ask questions and receive answers from
trusted source documents such as internal company documentation. The
application does not generate answers ungrounded in trusted source
documentation.
Reason over structured and unstructured data : Users can analyze inputs
using classification, sentiment analysis of text, or entity extraction. Examples
include analyzing product feedback sentiment, analyzing support calls and
transcripts, and refining text-based search with embeddings.
Search : Users can search trusted source documents such as internal company
documentation. The application does not generate results ungrounded in
trusted source documentation.
Summarization : Users can submit content to be summarized for predefined
topics built into the application and cannot use the application as an open-
ended summarizer. Examples include summarization of internal company
documentation, call center transcripts, technical reports, and product reviews.
Writing assistance on specific topics : Users can create new content or rewrite
content submitted by the user as a writing aid for business content or pre-
defined topics. Users can only rewrite or create content for specific business
purposes or predefined topics and cannot use the application as a general
content creation tool for all topics. Examples of business content include
proposals and reports. For journalistic use, see above Journalistic content use
case.

Considerations when choosing a use case


We encourage customers to use the Azure OpenAI GPT-4, GPT-3, and Codex
models in their innovative solutions or applications as approved in their Limited
Access registration form. However, here are some considerations when choosing a
use case:

Not suitable for open-ended, unconstrained content generation. Scenarios


where users can generate content on any topic are more likely to produce
offensive or harmful text. The same is true of longer generations.
Not suitable for scenarios where up-to-date, factually accurate information
is crucial unless you have human reviewers or are using the models to search
your own documents and have verified suitability for your scenario. The
service does not have information about events that occur after its training
date, likely has missing knowledge about some topics, and may not always
produce factually accurate information.
Avoid scenarios where use or misuse of the system could result in significant
physical or psychological injury to an individual. For example, scenarios that
diagnose patients or prescribe medications have the potential to cause
significant harm.
Avoid scenarios where use or misuse of the system could have a
consequential impact on life opportunities or legal status. Examples include
scenarios where the AI system could affect an individual's legal status, legal
rights, or their access to credit, education, employment, healthcare, housing,
insurance, social welfare benefits, services, opportunities, or the terms on
which they're provided.
Avoid high stakes scenarios that could lead to harm. The models hosted by
Azure OpenAI service reflect certain societal views, biases, and other
undesirable content present in the training data or the examples provided in
the prompt. As a result, we caution against using the models in high-stakes
scenarios where unfair, unreliable, or offensive behavior might be extremely
costly or lead to harm.
Carefully consider use cases in high stakes domains or industry: Examples
include but are not limited to healthcare, medicine, finance, or legal.
Carefully consider well-scoped chatbot scenarios. Limiting the use of the
service in chatbots to a narrow domain reduces the risk of generating
unintended or undesirable responses.
Carefully consider all generative use cases. Content generation scenarios may
be more likely to produce unintended outputs and these scenarios require
careful consideration and mitigations.

Limitations
When it comes to large-scale natural language models and image models, there are
particular fairness and responsible AI issues to consider. People use language and
images to describe the world and to express their beliefs, assumptions, attitudes, and
values. As a result, publicly available text and image data typically used to train large-
scale natural language processing and image generation models contains societal biases
relating to race, gender, religion, age, and other groups of people, as well as other
undesirable content. These societal biases are reflected in the distributions of words,
phrases, and syntactic structures.

Technical limitations, operational factors and ranges


Caution

Please be advised that this section contains illustrative examples which include
terms and language that some individuals may find offensive.

Large-scale natural language and image models trained with such data can potentially
behave in ways that are unfair, unreliable, or offensive, in turn causing harms. Some of
the ways are listed here. We emphasize that these types of harms aren't mutually
exclusive. A single model can exhibit more than one type of harm, potentially relating to
multiple different groups of people. For example:

Allocation: These models can be used in ways that lead to unfair allocation of
resources or opportunities. For example, automated resume screening systems can
withhold employment opportunities from one gender if they're trained on resume
data that reflects the existing gender imbalance in a particular industry. Or the
DALL·E 2 model could be used to create imagery in the style of a known artist,
which could affect the value of the artist's work or the artist's life opportunities.
Quality of service: The Azure OpenAI models are trained primarily on English text
and images with English text descriptions. Languages other than English will
experience worse performance. English language varieties with less representation
in the training data might experience worse performance. The publicly available
images used to train the DALL·E models might reinforce public bias and other
undesirable content. The models are also unable to generate consistent
photorealistic images and comprehensive text at this time.
Stereotyping: These models can reinforce stereotypes. For example, when
translating "He is a nurse" and "She is a doctor" into a genderless language such as
Turkish and then back into English, many machine translation systems yield the
stereotypical (and incorrect) results of "She is a nurse" and "He is a doctor." With
DALL·E 2, when generating an image based on the prompt "Fatherless children,"
the model could generate images of Black children only, reinforcing harmful
stereotypes that may exist in publicly available images.
Demeaning: These models can demean people. For example, an open-ended
content generation system with inappropriate or insufficient mitigations might
produce content that is offensive or demeaning to a particular group of people.
Overrepresentation and underrepresentation: These models can over- or under-
represent groups of people, or even erase their representation entirely. For
example, if text prompts that contain the word "gay" are detected as potentially
harmful or offensive, this could lead to the underrepresentation or even erasure of
legitimate image generations by or about the LGBTQIA+ community.
Inappropriate or offensive content: These models can produce other types of
inappropriate or offensive content. Examples include the ability to create images
that potentially contain harmful artifacts such as hate symbols; images that elicit
harmful connotations; images that relate to contested, controversial, or
ideologically polarizing topics; images that are manipulative; images that contain
sexually charged content that isn't caught by sexual-related content filters; and
images that relate to sensitive or emotionally charged topics. For example, a well-
intentioned text prompt aimed to create an image of the New York skyline with
clouds and airplanes flying over it might unintentionally generate images that elicit
sentiments related to the events surrounding 9/11.
Disinformation and misinformation about sensitive topics: Because DALL·E 2 is a
powerful image generation model, it can be used to produce disinformation and
misinformation that can be extremely harmful. For example, the model might
generate an image of a political leader engaging in activity of a violent or sexual
(or simply inaccurate) nature that might lead to defamation and other
consequential harms, including but not limited to public protests, political change,
or fake news.
Information reliability: Language model responses can fabricate content that may
sound reasonable but is nonsensical or inaccurate with respect to external
validation sources. Even when drawing responses from trusted source information,
responses may misrepresent that content.
False information: Azure OpenAI doesn't fact-check or verify content that is
provided by customers or users. Depending on how you've developed your
application, it might produce false information unless you've built in mitigations
(see Best practices for improving system performance below).

System performance
In many AI systems, performance is often defined in relation to accuracy—that is, how
often the AI system offers a correct prediction or output. With large-scale natural
language models and image models, two different users might look at the same output
and have different opinions of how useful or relevant it is, which means that
performance for these systems must be defined more flexibly. Here, we broadly consider
performance to mean that the application performs as you and your users expect,
including not generating harmful outputs.

Azure OpenAI service can support a wide range of applications like search, classification,
code generation, and image generation, each with different performance metrics and
mitigation strategies. There are several steps you can take to mitigate some of the
concerns listed under "Limitations" and to improve performance. Other important
mitigation techniques are outlined in the section Evaluating and integrating Azure
OpenAI for your use below.

Best practices for improving system performance


Show and tell when designing prompts.  With text and code models, make it clear
to the model what kind of outputs you expect through instructions, examples, or a
combination of the two. If you want the model to rank a list of items in
alphabetical order or to classify a paragraph by sentiment, show it that's what you
want.
Keep your application on topic. Carefully structure prompts and image inputs to
reduce the chance of producing undesired content, even if a user tries to use it for
this purpose. For instance, you might indicate in your prompt that a chatbot only
engages in conversations about mathematics and otherwise responds "I'm sorry.
I'm afraid I can't answer that." Adding adjectives like "polite" and examples in your
desired tone to your prompt can also help steer outputs. With image models, you
might indicate in your prompt or image input that your application generates only
conceptual images. It might otherwise generate a pop-up notification that explains
that the application is not for photorealistic use or to portray reality. Consider
nudging users toward acceptable queries and image inputs, either by listing such
examples up front or by offering them as suggestions upon receiving an off-topic
request. Consider training a classifier to determine whether an input (prompt or
image) is on topic or off topic.
Provide quality data. With text and code models, if you're trying to build a
classifier or get the model to follow a pattern, make sure that there are enough
examples. Be sure to proofread your examples—the model is usually smart enough
to see through basic spelling mistakes and give you a response, but it also might
assume this is intentional and it could affect the response. Providing quality data
also includes giving your model reliable data to draw responses from in chat and
question answering systems.
Measure model quality.  As part of general model quality, consider measuring and
improving fairness-related metrics and other metrics related to responsible AI in
addition to traditional accuracy measures for your scenario. Consider resources
like this checklist when you measure the fairness of the system. These
measurements come with limitations, which you should acknowledge and
communicate to stakeholders along with evaluation results.
Limit the length, structure, and rate of inputs and outputs. Restricting the length
or structure of inputs and outputs can increase the likelihood that the application
will stay on task and mitigate, at least in part, any potentially unfair, unreliable, or
offensive behavior. Other options to reduce the risk of misuse include (i) restricting
the source of inputs (for example, limiting inputs to a particular domain or to
authenticated users rather than being open to anyone on the internet) and (ii)
implementing usage rate limits. A minimal sketch of input length and topic checks appears after this list.
Encourage human review of outputs prior to publication or dissemination. With
generative AI, there is potential for generating content that might be offensive or
not related to the task at hand, even with mitigations in place. To ensure that the
generated output meets the task of the customer, consider building ways to
remind customers to review their outputs for quality prior to sharing widely. This
can reduce many different harms, including offensive material, disinformation, and
more.
Implement additional scenario-specific mitigations. Refer to the mitigations
outlined in Evaluating and integrating Azure OpenAI for your use including content
moderation strategies. These do not represent every mitigation that might be
required for your application, but they point to the general minimum baseline we
check for when approving use cases for Azure OpenAI Service.
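
As referenced in the list above, simple guardrails on input length and topic can run before any call to the model. The sketch below is illustrative only: the token ceiling, the allow-list, and the function name are assumptions, and a keyword check is a crude stand-in for the trained on-topic classifier the guidance recommends.

Python

import tiktoken

MAX_INPUT_TOKENS = 500                              # illustrative ceiling
ALLOWED_TOPICS = ("math", "algebra", "geometry")    # placeholder allow-list

tokenizer = tiktoken.get_encoding("cl100k_base")

def validate_input(user_text):
    """Return (ok, message) before the text is ever sent to the model."""
    if len(tokenizer.encode(user_text)) > MAX_INPUT_TOKENS:
        return False, "Input is too long. Please shorten your request."
    if not any(topic in user_text.lower() for topic in ALLOWED_TOPICS):
        return False, "I'm sorry. I'm afraid I can't answer that."
    return True, ""

ok, message = validate_input("Can you explain the quadratic formula in algebra?")
print(ok, message)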

Evaluating and integrating Azure OpenAI for


your use
For additional information on how to evaluate and integrate these models responsibly,
please see the RAI Overview document.

Learn more about responsible AI


Microsoft AI principles
Microsoft responsible AI resources
Microsoft Azure Learning courses on responsible AI

Learn more about Azure OpenAI


Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Code of Conduct for the Azure OpenAI Service | Microsoft Learn
Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Limited access to Azure OpenAI Service
Article • 04/28/2023

As part of Microsoft's commitment to responsible AI, we are designing and releasing Azure OpenAI Service with the intention of protecting the rights of individuals and society and fostering transparent human-computer interaction. For this reason, we currently limit the access and use of Azure OpenAI, including limiting access to the ability to modify content filters and/or abuse monitoring.

Registration process
Azure OpenAI requires registration and is currently only available to approved enterprise
customers and partners. Customers who wish to use Azure OpenAI are required to
submit a registration form .

Customers must attest to any and all use cases for which they will use the service (the
use cases from which customers may select will populate in the form after selection of
the desired model(s) in Question 22 in the initial registration form). Customers who wish
to add additional use cases after initial onboarding must submit the additional use cases
using this form . The use of Azure OpenAI is limited to use cases that have been
selected in a registration form. Microsoft may require customers to re-verify this
information. Read more about example use cases and use cases to avoid here.

Customers who wish to modify content filters and modify abuse monitoring after they
have onboarded to the service are subject to additional eligibility criteria and scenario
restrictions. At this time, modified content filters and/or modified abuse monitoring for
Azure OpenAI Service are only available to managed customers and partners working
with Microsoft account teams and have additional use case restrictions. Customers
meeting these requirements can register here .

Access to the Azure OpenAI Service is subject to Microsoft's sole discretion based on
eligibility criteria and a vetting process, and customers must acknowledge that they
have read and understand the Azure terms of service for Azure OpenAI Service.

Azure OpenAI Service is made available to customers under the terms governing their
subscription to Microsoft Azure Services, including the Azure OpenAI section of the
Microsoft Product Terms . Please review these terms carefully as they contain
important conditions and obligations governing your use of Azure OpenAI Service.

Important links
Register to use Azure OpenAI
Add additional use cases (if needed)
Register to modify content filters and abuse monitoring (if needed)

Help and support


FAQ about Limited Access can be found here.
If you need help with Azure OpenAI, find support here.
Report abuse of Azure OpenAI here.

Report problematic content to cscraireport@microsoft.com.

See also
Code of conduct for Azure OpenAI Service integrations
Transparency note for Azure OpenAI Service
Characteristics and limitations for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Code of conduct for Azure OpenAI Service
Article • 03/13/2023

The following Code of Conduct defines the requirements that all Azure OpenAI Service
implementations must adhere to in good faith. This code of conduct is in addition to the
Acceptable Use Policy in the Microsoft Online Services Terms .

Access requirements
Azure OpenAI Service is a Limited Access service that requires registration and is only
available to approved enterprise customers and partners. Customers who wish to use
this service are required to register through this form . To learn more, see Limited
Access to Azure OpenAI Service.

Responsible AI mitigation requirements

Integrations with Azure OpenAI Service must:

Implement meaningful human oversight
Implement strong technical limits on inputs and outputs to reduce the likelihood of misuse beyond the application's intended purpose
Test applications thoroughly to find and mitigate undesirable behaviors
Establish feedback channels
Implement additional scenario-specific mitigations

To learn more, see the Azure OpenAI transparency note.

Integrations with Azure OpenAI Service must not:
be used in any way that violates Microsoft’s Acceptable Use Policy , including but
not limited to any use prohibited by law, regulation, government order, or decree,
or any use that violates the rights of others;
be used in any way that is inconsistent with this code of conduct, including the
Limited Access requirements, the Responsible AI mitigation requirements, and the
Content requirements;
exceed the use case(s) you identified to Microsoft in connection with your request
to use the service;
interact with individuals under the age of consent in any way that could result in
exploitation or manipulation or is otherwise prohibited by law or regulation;
generate or interact with content prohibited in this Code of Conduct;
be presented alongside or monetize content prohibited in this Code of Conduct;
make decisions without appropriate human oversight if your application may have
a consequential impact on any individual’s legal position, financial position, life
opportunities, employment opportunities, human rights, or result in physical or
psychological injury to an individual;
infer sensitive information about people without their explicit consent unless if
used in a lawful manner by a law enforcement entity, court, or government official
subject to judicial oversight in a jurisdiction that maintains a fair and independent
judiciary; or
be used for chatbots that (i) are erotic, romantic, or used for companionship
purposes, or which are otherwise prohibited by this Code of Conduct; (ii) are
personas of specific people without their explicit consent; (iii) claim to have special
wisdom/insight/knowledge, unless very clearly labeled as being for entertainment
purposes only; or (iv) enable end users to create their own chatbots without
oversight.

Content requirements
We prohibit the use of our service for generating content that can inflict harm on
individuals or society. Our content policies are intended to improve the safety of our
platform.

These content requirements apply to the output of all models developed by OpenAI and
hosted in Azure OpenAI, such as GPT-3, GPT-4, Codex models, and DALL·E 2, and
includes content provided as input to the service and content generated as output from
the service.

Exploitation and Abuse

Child sexual exploitation and abuse

Azure OpenAI Service prohibits content that describes, features, or promotes child
sexual exploitation or abuse, whether or not prohibited by law. This includes sexual
content involving a child or that sexualizes a child.
Grooming
Azure OpenAI Service prohibits content that describes or is used for purposes of
grooming of children. Grooming is the act of an adult building a relationship with a
child for the purposes of exploitation, especially sexual exploitation. This includes
communicating with a child for the purpose of sexual exploitation, trafficking, or other
forms of exploitation.

Non-consensual intimate content

Azure OpenAI Service prohibits content that describes, features, or promotes non-
consensual intimate activity.

Sexual solicitation

Azure OpenAI Service prohibits content that describes, features, or promotes, or is used
for, purposes of solicitation of commercial sexual activity and sexual services. This
includes encouragement and coordination of real sexual activity.

Trafficking

Azure OpenAI Service prohibits content describing or used for purposes of human
trafficking. This includes the recruitment of individuals, facilitation of transport, and
payment for, and the promotion of, exploitation of people such as forced labor,
domestic servitude, sexual slavery, forced marriages, and forced medical procedures.

Suicide and Self-Injury


Azure OpenAI Service prohibits content that describes, praises, supports, promotes,
glorifies, encourages and/or instructs individual(s) on self-injury or to take their life.

Violent Content and Conduct

Graphic violence and gore

Azure OpenAI Service prohibits content that describes, features, or promotes graphic
violence or gore.

Terrorism and Violent Extremism


Azure OpenAI Service prohibits content that depicts an act of terrorism; praises, or
supports a terrorist organization, terrorist actor, or violent terrorist ideology; encourages
terrorist activities; offers aid to terrorist organizations or terrorist causes; or aids in
recruitment to a terrorist organization.

Violent Threats, Incitement, and Glorification of Violence

Azure OpenAI Service prohibits content advocating or promoting violence toward others through violent threats or incitement.

Harmful Content

Hate speech and discrimination


Azure OpenAI Service prohibits content that attacks, denigrates, intimidates, degrades,
targets, or excludes individuals or groups on the basis of traits such as actual or
perceived race, ethnicity, national origin, gender, gender identity, sexual orientation,
religious affiliation, age, disability status, caste, or any other characteristic that is
associated with systemic prejudice or marginalization.

Bullying and harassment

Azure OpenAI Service prohibits content that targets individual(s) or group(s) with
threats, intimidation, insults, degrading or demeaning language or images, promotion of
physical harm, or other abusive behavior such as stalking.

Deception, disinformation, and inauthentic activity


Azure OpenAI Service prohibits content that is intentionally deceptive and likely to
adversely affect the public interest, including deceptive or untrue content relating to
health, safety, election integrity, or civic participation. Azure OpenAI Service also
prohibits inauthentic interactions, such as fake accounts, automated inauthentic activity,
impersonation to gain unauthorized information or privileges, and claims to be from any
person, company, government body, or entity without explicit permission to make that
representation.

Active malware or exploits

Azure OpenAI Service prohibits content that directly supports unlawful active attacks or malware campaigns that cause technical harms, such as delivering malicious executables, organizing denial of service attacks, or managing command and control servers.

Additional content policies


We prohibit the use of our Azure OpenAI Service for scenarios in which the system is
likely to generate undesired content due to limitations in the models or scenarios in
which the system cannot be applied in a way that properly manages potential negative
consequences to people and society. Without limiting the foregoing restriction,
Microsoft reserves the right to revise and expand the above Content requirements to
address specific harms to people and society.

This includes prohibiting content that is sexually graphic, including consensual pornographic content and intimate descriptions of sexual acts, as well as content that may influence the political process, such as an election, passage of legislation, and content for campaigning purposes.

We may at times limit our service's ability to respond to particular topics, such as
probing for personal information or seeking opinions on sensitive topics or current
events.

We prohibit the use of Azure OpenAI Service for activities that significantly harm other
individuals, organizations, or society, including but not limited to use of the service for
purposes in conflict with the applicable Azure Legal Terms and the Microsoft Product
Terms .

Report abuse
If you suspect that Azure OpenAI Service is being used in a manner that is abusive or
illegal, infringes on your rights or the rights of other people, or violates these policies,
you can report it at the Report Abuse Portal .

Report problematic content


If Azure OpenAI Service outputs problematic content that you believe should have been
filtered, report it at cscraireport@microsoft.com.

See also
Limited access to Azure OpenAI Service
Transparency note for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Article • 06/23/2023

This article provides details regarding how data provided by you to the Azure OpenAI
service is processed, used, and stored. Azure OpenAI stores and processes data to
provide the service and to monitor for uses that violate the applicable product terms.
Please also see the Microsoft Products and Services Data Protection Addendum ,
which governs data processing by the Azure OpenAI Service except as otherwise
provided in the applicable Product Terms .

Important

Your prompts (inputs) and completions (outputs), your embeddings, and your
training data:

are NOT available to other customers.


are NOT available to OpenAI.
are NOT used to improve OpenAI models.
are NOT used to improve any Microsoft or 3rd party products or services.
are NOT used for automatically improving Azure OpenAI models for your use
in your resource (The models are stateless, unless you explicitly fine-tune
models with your training data).
Your fine-tuned Azure OpenAI models are available exclusively for your use.

The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the
OpenAI models in Microsoft’s Azure environment and the Service does NOT
interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).

What data does the Azure OpenAI Service process?
Azure OpenAI processes the following types of data:

Prompts and generated content. Prompts are submitted by the user, and content
is generated by the service, via the completions, chat completions, images and
embeddings operations.
Augmented data included with prompts. When using the "on your data" feature,
the service retrieves relevant data from a configured data store and augments the
prompt to produce generations that are grounded with your data.
Training & validation data. You can provide your own training data consisting of
prompt-completion pairs for the purposes of fine-tuning an OpenAI model.

How does the Azure OpenAI Service process data?
The diagram below illustrates how your data is processed. This diagram covers three
different types of processing:

1. How the Azure OpenAI Service processes your prompts to generate content
(including when additional data from a connected data source is added to a
prompt using Azure OpenAI on your data).
2. How the Azure OpenAI Service creates a fine-tuned (custom) model with your
training data.
3. How the Azure OpenAI Service and Microsoft personnel analyze prompts,
completions and images for harmful content and for patterns suggesting the use
of the service in a manner that violates the Code of Conduct or other applicable
product terms

As depicted in the diagram above, managed customers may apply to modify abuse
monitoring .

Generating completions, images or embeddings


Models (base or fine-tuned) deployed in your resource process your input prompts and
generate responses with text, images or embeddings. The service is configured to
synchronously evaluate the prompt and completion data in real time to check for
harmful content types and stops generating content that exceeds the configured
thresholds. Learn more at Azure OpenAI Service content filtering.

The models are stateless: no prompts or generations are stored in the model.
Additionally, prompts and generations are not used to train, retrain, or improve the
base models.

Augmenting prompts with data retrieved from your data sources to "ground" the generated results

The Azure OpenAI "on your data" feature lets you connect data sources to ground the
generated results with your data. The data remains stored in the data source and
location you designate. No data is copied into the Azure OpenAI service. When a user
prompt is received, the service retrieves relevant data from the connected data source
and augments the prompt. The model processes this augmented prompt and the
generated content is returned as described above.

As depicted in the diagram above, managed customers may apply to modify abuse
monitoring .

Creating a customized (fine-tuned) model with your data:


Customers can upload their training data to the service to fine tune a model. Uploaded
training data is stored in the Azure OpenAI resource in the customer’s Azure tenant.
Training data and fine-tuned models:

Are available exclusively for use by the customer.
Are stored within the same region as the Azure OpenAI resource.
Can be double encrypted at rest (by default with Microsoft's AES-256 encryption and optionally with a customer managed key).
Can be deleted by the customer at any time.

Training data uploaded for fine-tuning is not used to train, retrain, or improve any
Microsoft or 3rd party base models.

Preventing abuse and harmful content generation


To reduce the risk of harmful use of the Azure OpenAI Service, the Azure OpenAI Service
includes both content filtering and abuse monitoring features. To learn more about
content filtering, see Azure OpenAI Service content filtering. To learn more about abuse
monitoring, see abuse monitoring.

Content filtering occurs synchronously as the service processes prompts to generate content as described above and here. No prompts or generated results are stored in the content classifier models, and prompts and results are not used to train, retrain, or improve the classifier models.

Azure OpenAI abuse monitoring detects and mitigates instances of recurring content
and/or behaviors that suggest use of the service in a manner that may violate the code
of conduct or other applicable product terms. To detect and mitigate abuse, Azure
OpenAI stores all prompts and generated content securely for up to thirty (30) days. (No
prompts or completions are stored if the customer is approved for and elects to
configure abuse monitoring off, as described below.)

The data store where prompts and completions are stored is logically separated by
customer resource (each request includes the resource ID of the customer’s Azure
OpenAI resource). A separate data store is located in each region in which the Azure
OpenAI Service is available, and a customer’s prompts and generated content are stored
in the Azure region where the customer’s Azure OpenAI service resource is deployed,
within the Azure OpenAI service boundary. Human reviewers assessing potential abuse
can access prompts and completions data only when that data has been flagged by the
abuse monitoring system. The human reviewers are authorized Microsoft employees
who access the data via point wise queries using request IDs, Secure Access
Workstations (SAWs), and Just-In-Time (JIT) request approval granted by team
managers. For Azure OpenAI Service deployed in the European Economic Area, the
authorized Microsoft employees are located in the European Economic Area.
How can customers get an exemption from abuse monitoring and human review?
Some customers may want to use the Azure OpenAI Service for a use case that involves
the processing of sensitive, highly confidential, or legally-regulated input data but where
the likelihood of harmful outputs and/or misuse is low. These customers may conclude
that they do not want or do not have the right to permit Microsoft to process such data
for abuse detection, as described above, due to their internal policies or applicable legal
regulations. To address these concerns, Microsoft allows customers who meet additional
Limited Access eligibility criteria and attest to specific use cases to apply to modify the
Azure OpenAI content management features by completing this form .

If Microsoft approves a customer's request to modify abuse monitoring, then Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured off. In this case, because no prompts and completions are stored at rest in the Service Results Store, the human review process is not possible and is not performed. See Abuse monitoring for more information.

How can a customer verify if data storage for abuse monitoring is off?
There are two ways for customers, once approved to turn off abuse monitoring, to verify
that data storage for abuse monitoring has been turned off in their approved Azure
subscription:

Using the Azure portal, or
Azure CLI (or any management API).

Note

The value of "false" for the "ContentLogging" attribute appears only if data storage
for abuse monitoring is turned off. Otherwise, this property will not appear in either
Azure portal or Azure CLI's output.

Prerequisites
1. Sign into Azure
2. Select the Azure Subscription which hosts the Azure OpenAI Service resource.
3. Navigate to the Overview page of the Azure OpenAI Service resource.
Logging status verification using the Azure portal:

1. Go to the resource Overview page

2. Click on the JSON view link in the top right corner of the Overview page.

There will be a value in the Capabilities list called "ContentLogging" which will appear
and be set to FALSE when logging for abuse monitoring is off.

JSON

"name":"ContentLogging",

"value":"false"

Logging status verification using the Azure CLI (or other management API):

Execute the following command in Azure CLI to see the same JSON data as shown in the
Azure portal above.

Azure CLI

az cognitiveservices account show -n resource_name -g resource_group

To learn more about Microsoft's privacy and security commitments see the Microsoft
Trust Center .

Change log
Date: 23 June 2023
Changes: Added information about data processing for the new Azure OpenAI on your data feature; removed information about abuse monitoring, which is now available at Azure OpenAI Service abuse monitoring. Added summary note. Updated and streamlined content and updated diagrams for additional clarity. Added change log.

See also
Limited access to Azure OpenAI Service
Code of conduct for Azure OpenAI Service integrations
Transparency note and use cases for Azure OpenAI Service
Characteristics and limitations for Azure OpenAI Service
Report abuse of Azure OpenAI Service through the Report Abuse Portal
Report problematic content to cscraireport@microsoft.com
Microsoft.CognitiveServices accounts
Article • 06/18/2023

Bicep resource definition


The accounts resource type can be deployed with operations that target:

Resource groups - See resource group deployment commands

For a list of changed properties in each API version, see change log.

Resource format
To create a Microsoft.CognitiveServices/accounts resource, add the following Bicep to your template.

Bicep

resource symbolicname 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: 'string'
  location: 'string'
  tags: {
    tagName1: 'tagValue1'
    tagName2: 'tagValue2'
  }
  sku: {
    capacity: int
    family: 'string'
    name: 'string'
    size: 'string'
    tier: 'string'
  }
  kind: 'string'
  identity: {
    type: 'string'
    userAssignedIdentities: {}
  }
  properties: {
    allowedFqdnList: [
      'string'
    ]
    apiProperties: {
      aadClientId: 'string'
      aadTenantId: 'string'
      eventHubConnectionString: 'string'
      qnaAzureSearchEndpointId: 'string'
      qnaAzureSearchEndpointKey: 'string'
      qnaRuntimeEndpoint: 'string'
      statisticsEnabled: bool
      storageAccountConnectionString: 'string'
      superUser: 'string'
      websiteName: 'string'
    }
    customSubDomainName: 'string'
    disableLocalAuth: bool
    dynamicThrottlingEnabled: bool
    encryption: {
      keySource: 'string'
      keyVaultProperties: {
        identityClientId: 'string'
        keyName: 'string'
        keyVaultUri: 'string'
        keyVersion: 'string'
      }
    }
    locations: {
      regions: [
        {
          customsubdomain: 'string'
          name: 'string'
          value: int
        }
      ]
      routingMethod: 'string'
    }
    migrationToken: 'string'
    networkAcls: {
      defaultAction: 'string'
      ipRules: [
        {
          value: 'string'
        }
      ]
      virtualNetworkRules: [
        {
          id: 'string'
          ignoreMissingVnetServiceEndpoint: bool
          state: 'string'
        }
      ]
    }
    publicNetworkAccess: 'string'
    restore: bool
    restrictOutboundNetworkAccess: bool
    userOwnedStorage: [
      {
        identityClientId: 'string'
        resourceId: 'string'
      }
    ]
  }
}
Property values

accounts

Name Description Value

name The resource name string (required)

Character limit: 2-64

Valid characters:

Alphanumerics and hyphens.

Start and end with alphanumeric.

location The geo-location where the resource lives string

tags Resource tags. Dictionary of tag names and values. See Tags in templates

sku The resource model definition representing SKU Sku

kind The Kind of the resource. string

identity Identity for the resource. Identity

properties Properties of Cognitive Services account. AccountProperties

Identity

Name Description

type The identity type.

userAssignedIdentities The list of user assigned identities associated with the resource. The user identity dictionary key references will be ARM resource ids in the for
'/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identity

AccountProperties

Name Description Value

allowedFqdnList string[]

apiProperties The api properties for special APIs. ApiProperties

customSubDomainName Optional subdomain name used for token-based authentication. string

disableLocalAuth bool

dynamicThrottlingEnabled The flag to enable dynamic throttling. bool

encryption The encryption properties for this resource. Encryption

locations The multiregion settings of Cognitive Services account. MultiRegionSettings

migrationToken Resource migration token. string

networkAcls A collection of rules governing the accessibility from specific network locations. NetworkRuleSet

publicNetworkAccess Whether or not public endpoint access is allowed for this account. 'Disabled'

'Enabled'

restore bool

restrictOutboundNetworkAccess bool

userOwnedStorage The storage accounts for this resource. UserOwnedStorage[]

ApiProperties

Name Description Value

aadClientId (Metrics Advisor Only) The Azure AD Client Id (Application Id). string

aadTenantId (Metrics Advisor Only) The Azure AD Tenant Id. string

eventHubConnectionString (Personalization Only) The flag to enable statistics of Bing Search. string

qnaAzureSearchEndpointId (QnAMaker Only) The Azure Search endpoint id of QnAMaker. string

qnaAzureSearchEndpointKey (QnAMaker Only) The Azure Search endpoint key of QnAMaker. string

qnaRuntimeEndpoint (QnAMaker Only) The runtime endpoint of QnAMaker. string

statisticsEnabled (Bing Search Only) The flag to enable statistics of Bing Search. bool

storageAccountConnectionString (Personalization Only) The storage account connection string. string

superUser (Metrics Advisor Only) The super user of Metrics Advisor. string

websiteName (Metrics Advisor Only) The website name of Metrics Advisor. string

Encryption

Name Description Value

keySource Enumerates the possible value of keySource for Encryption 'Microsoft.CognitiveServices'

'Microsoft.KeyVault'

keyVaultProperties Properties of KeyVault KeyVaultProperties

KeyVaultProperties

Name Description Value

identityClientId string

keyName Name of the Key from KeyVault string

keyVaultUri Uri of KeyVault string

keyVersion Version of the Key from KeyVault string

MultiRegionSettings
Name Description Value

regions RegionSetting[]

routingMethod Multiregion routing methods. 'Performance'

'Priority'

'Weighted'

RegionSetting

Name Description Value

customsubdomain Maps the region to the regional custom subdomain. string

name Name of the region. string

value A value for priority or weighted routing methods. int

NetworkRuleSet

Name Description Value

defaultAction The default action when no rule from ipRules and from virtualNetworkRules match. This is only used after the 'Allow'

bypass property has been evaluated. 'Deny'

ipRules The list of IP address rules. IpRule[]

virtualNetworkRules The list of virtual network rules. VirtualNetworkRule[]

IpRule

Name Description Value

value An IPv4 address range in CIDR notation, such as '124.56.78.91' (simple IP address) or '124.56.78.0/24' (all addresses that start with string
124.56.78). (required)

VirtualNetworkRule

Name Description Value

id Full resource id of a vnet subnet, such as string


'/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.Network/virtualNetworks/test- (required)
vnet/subnets/subnet1'.

ignoreMissingVnetServiceEndpoint Ignore missing vnet service endpoint or not. bool

state Gets the state of virtual network rule. string

UserOwnedStorage

Name Description Value

identityClientId string

resourceId Full resource id of a Microsoft.Storage resource. string

Sku

Name Description Value

capacity If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this int
may be omitted.

family If the service has different generations of hardware, for the same SKU, then that can be captured here. string

name The name of the SKU. Ex - P3. It is typically a letter+number code string
(required)

size The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. string

tier This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a 'Basic'

PUT. 'Enterprise'

'Free'

'Premium'

'Standard'

Quickstart templates
The following quickstart templates deploy this resource type.

Template Description

Deploy a Cognitive Services Computer Vision API


A template for creating a new Cognitive Services Computer Vision API

Deploy a Cognitive Services Translate API


This template deploys a Cognitive Services Translate API. Microsoft Translator
API is a neural machine translation service that developers can easily
integrate into their applications websites, tools, or any solution requiring
multi-language support such as website localization, e-commerce, customer
support, messaging applications, internal communication, and more.

Deploy a Cognitive Service Universal key


This template deploys all the Cognitive Services APIs. Cognitive Services bring AI within reach of every developer, without requiring machine-learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps.
Azure OpenAI Service REST API
reference
Article • 06/19/2023

This article provides details on the inference REST API endpoints for Azure OpenAI.

Authentication
Azure OpenAI provides two methods for authentication. You can use either API Keys or Azure Active Directory.

API Key authentication: For this type of authentication, all API requests must
include the API Key in the api-key HTTP header. The Quickstart provides guidance
for how to make calls with this type of authentication.

Azure Active Directory authentication: You can authenticate an API call using an
Azure Active Directory token. Authentication tokens are included in a request as
the Authorization header. The token provided must be preceded by Bearer , for
example Bearer YOUR_AUTH_TOKEN . You can read our how-to guide on
authenticating with Azure Active Directory.
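As a rough illustration (not an official sample), the same request can be sent from Python with either header style; the resource name, deployment name, key, and token below are placeholders.

Python

# Rough illustration of the two authentication header styles described above.
# Endpoint pieces are placeholders; only one of the two headers is required per request.
import requests

url = ("https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
       "YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15")
body = {"prompt": "Hello", "max_tokens": 5}

# Option 1: API key authentication via the api-key header
r1 = requests.post(url, headers={"api-key": "YOUR_API_KEY"}, json=body, timeout=30)

# Option 2: Azure Active Directory authentication via a Bearer token
r2 = requests.post(url,
                   headers={"Authorization": "Bearer YOUR_AUTH_TOKEN"},
                   json=body, timeout=30)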

REST API versioning


The service APIs are versioned using the api-version query parameter. All versions
follow the YYYY-MM-DD date structure. For example:

HTTP

POST
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/completions?api-version=2023-05-15

Completions
With the Completions operation, the model will generate one or more predicted
completions based on a provided prompt. The service can also return the probabilities
of alternative tokens at each position.

Create a completion
HTTP

POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-
version={api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

deployment-id string Required The deployment name you chose when you deployed the model.

api-version string Required The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions

2023-03-15-preview Swagger spec


2022-12-01 Swagger spec

2023-05-15 Swagger spec

2023-06-01-preview Swagger spec

Request body

Parameter Type Required? Default Description

prompt string Optional <\|endoftext\|> The prompt(s) to generate


or array completions for, encoded as a
string, or array of strings. Note
that <\|endoftext\|> is the
document separator that the
model sees during training, so if
a prompt isn't specified the
model will generate as if from
the beginning of a new
document.

max_tokens integer Optional 16 The maximum number of tokens


to generate in the completion.
The token count of your prompt
plus max_tokens can't exceed the
model's context length. Most
models have a context length of
2048 tokens (except for the
newest models, which support
4096).

temperature number Optional 1 What sampling temperature to


use, between 0 and 2. Higher
values means the model will take
more risks. Try 0.9 for more
creative applications, and 0
( argmax sampling ) for ones with
a well-defined answer. We
generally recommend altering
this or top_p but not both.

top_p number Optional 1 An alternative to sampling with


temperature, called nucleus
sampling, where the model
considers the results of the
tokens with top_p probability
mass. So 0.1 means only the
tokens comprising the top 10%
probability mass are considered.
We generally recommend
altering this or temperature but
not both.

logit_bias map Optional null Modify the likelihood of


specified tokens appearing in the
completion. Accepts a json
object that maps tokens
(specified by their token ID in the
GPT tokenizer) to an associated
bias value from -100 to 100. You
can use this tokenizer tool (which
works for both GPT-2 and GPT-3)
to convert text to token IDs.
Mathematically, the bias is added
to the logits generated by the
model prior to sampling. The
exact effect will vary per model,
but values between -1 and 1
should decrease or increase
likelihood of selection; values like
-100 or 100 should result in a
ban or exclusive selection of the
relevant token. As an example,
you can pass {"50256": -100} to
prevent the <|endoftext|> token
from being generated.

user string Optional A unique identifier representing


your end-user, which can help
monitoring and detecting abuse

n integer Optional 1 How many completions to


generate for each prompt. Note:
Because this parameter
generates many completions, it
can quickly consume your token
quota. Use carefully and ensure
that you have reasonable
settings for max_tokens and
stop.

stream boolean Optional False Whether to stream back partial


progress. If set, tokens will be
sent as data-only server-sent
events as they become available,
with the stream terminated by a
data: [DONE] message.

logprobs integer Optional null Include the log probabilities on


the logprobs most likely tokens,
as well the chosen tokens. For
example, if logprobs is 10, the
API will return a list of the 10
most likely tokens. the API will
always return the logprob of the
sampled token, so there may be
up to logprobs+1 elements in
the response. This parameter
cannot be used with gpt-35-
turbo .

suffix string Optional null The suffix that comes after a


completion of inserted text.

echo boolean Optional False Echo back the prompt in


addition to the completion. This
parameter cannot be used with
gpt-35-turbo .

stop string Optional null Up to four sequences where the


or array API will stop generating further
tokens. The returned text won't
contain the stop sequence.

presence_penalty number Optional 0 Number between -2.0 and 2.0.


Positive values penalize new
tokens based on whether they
appear in the text so far,
increasing the model's likelihood
to talk about new topics.

frequency_penalty number Optional 0 Number between -2.0 and 2.0.


Positive values penalize new
tokens based on their existing
frequency in the text so far,
decreasing the model's
likelihood to repeat the same
line verbatim.

best_of integer Optional 1 Generates best_of completions


server-side and returns the
"best" (the one with the lowest
log probability per token).
Results can't be streamed. When
used with n, best_of controls the
number of candidate
completions and n specifies how
many to return – best_of must be
greater than n. Note: Because
this parameter generates many
completions, it can quickly
consume your token quota. Use
carefully and ensure that you
have reasonable settings for
max_tokens and stop. This
parameter cannot be used with
gpt-35-turbo .

Example request

Console

curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/completions?api-version=2023-05-15\

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d "{

\"prompt\": \"Once upon a time\",

\"max_tokens\": 5

}"

Example response

JSON

"id": "cmpl-4kGh7iXtjW4lc9eGhff6Hp8C7btdQ",

"object": "text_completion",

"created": 1646932609,

"model": "ada",

"choices": [

"text": ", a dark line crossed",

"index": 0,
"logprobs": null,

"finish_reason": "length"

In the example response, finish_reason equals stop. If finish_reason equals content_filter, consult our content filtering guide to understand why this is occurring.
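For reference, the following unofficial Python sketch issues the same request with the requests library; the resource name, deployment name, and API key are placeholders.

Python

# Unofficial Python equivalent of the curl example above, using the requests library.
# Resource name, deployment name, and API key are placeholders.
import requests

url = ("https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
       "YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15")
response = requests.post(
    url,
    headers={"api-key": "YOUR_API_KEY"},
    json={"prompt": "Once upon a time", "max_tokens": 5},
    timeout=30,
)
response.raise_for_status()
choice = response.json()["choices"][0]
print(choice["text"], choice["finish_reason"])  # check finish_reason for content_filter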

Embeddings
Get a vector representation of a given input that can be easily consumed by machine
learning models and other algorithms.

Note

We currently do not support batching of embeddings into a single API call. If you receive the error InvalidRequestError: Too many inputs. The max number of inputs is 1. We hope to increase the number of inputs per request soon., it typically means that an array of inputs was passed as a batch rather than a single string. The string can be up to 8191 tokens in length when using the text-embedding-ada-002 (Version 2) model.

Create an embedding

HTTP

POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/embeddings?api-
version={api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

deployment-id string Required The name of your model deployment. You're required to first deploy a model before you can make calls.

api-version string Required The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions

2023-03-15-preview Swagger spec


2022-12-01 Swagger spec

2023-05-15 Swagger spec

Request body

Parameter Type Required? Default Description

input string Yes N/A Input text to get embeddings for, encoded as a string. The number of input tokens varies depending on what model you are using. Unless you're embedding code, we suggest replacing newlines (\n) in your input with a single space, as we have observed inferior results when newlines are present.

user string No Null A unique identifier representing your end-user. This will help Azure OpenAI monitor and detect abuse. Do not pass PII identifiers; instead use pseudoanonymized values such as GUIDs.

Example request

Console

curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/embeddings?api-version=2023-05-15 \

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d "{\"input\": \"The food was delicious and the waiter...\"}"

Example response

JSON

"object": "list",

"data": [

"object": "embedding",

"embedding": [

0.018990106880664825,

-0.0073809814639389515,

.... (1024 floats total for ada)

0.021276434883475304,

],

"index": 0

],

"model": "text-similarity-babbage:001"

Chat completions
Create completions for chat messages with the ChatGPT and GPT-4 models.

Create chat completions

HTTP

POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?
api-version={api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

deployment-id string Required The name of your model deployment. You're required to first deploy a model before you can make calls.

api-version string Required The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions

2023-03-15-preview Swagger spec


2023-05-15 Swagger spec

2023-06-01-preview Swagger spec


Example request

Console

curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/chat/completions?api-version=2023-05-15 \

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d '{"messages":[{"role": "system", "content": "You are a helpful


assistant."},{"role": "user", "content": "Does Azure OpenAI support customer
managed keys?"},{"role": "assistant", "content": "Yes, customer managed keys
are supported by Azure OpenAI."},{"role": "user", "content": "Do other Azure
Cognitive Services support this too?"}]}'

Example response

Console

{"id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9",

"object":"chat.completion","created":1679072642,

"model":"gpt-35-turbo",

"usage":{"prompt_tokens":58,

"completion_tokens":68,

"total_tokens":126},

"choices":[{"message":{"role":"assistant",

"content":"Yes, other Azure Cognitive Services also support customer managed


keys. Azure Cognitive Services offer multiple options for customers to
manage keys, such as using Azure Key Vault, customer-managed keys in Azure
Key Vault or customer-managed keys through Azure Storage service. This helps
customers ensure that their data is secure and access to their services is
controlled."},"finish_reason":"stop","index":0}]}

In the example response, finish_reason equals stop. If finish_reason equals content_filter, consult our content filtering guide to understand why this is occurring.

Output formatting is adjusted for ease of reading; the actual output is a single block of text without line breaks.
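The following unofficial Python sketch sends the same kind of request and reads the assistant reply from choices[0].message.content; the resource name, deployment name, and API key are placeholders.

Python

# Unofficial sketch of the chat completions call shown above. Placeholders throughout.
import requests

url = ("https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
       "YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2023-05-15")
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
    ],
    "temperature": 0.7,
    "max_tokens": 200,
}
response = requests.post(url, headers={"api-key": "YOUR_API_KEY"}, json=payload, timeout=30)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])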

Parameter Type Required? Default Description

messages array Required The messages to generate chat


completions for, in the chat format.

temperature number Optional 1 What sampling temperature to use,


between 0 and 2. Higher values like 0.8
will make the output more random, while
lower values like 0.2 will make it more
focused and deterministic.\nWe generally
recommend altering this or top_p but not
both.

n integer Optional 1 How many chat completion choices to


generate for each input message.

stream boolean Optional false If set, partial message deltas will be sent,
like in ChatGPT. Tokens will be sent as
data-only server-sent events as they
become available, with the stream
terminated by a data: [DONE] message."

stop string Optional null Up to 4 sequences where the API will stop
or array generating further tokens.

max_tokens integer Optional inf The maximum number of tokens allowed


for the generated answer. By default, the
number of tokens the model can return
will be (4096 - prompt tokens).

presence_penalty number Optional 0 Number between -2.0 and 2.0. Positive


values penalize new tokens based on
whether they appear in the text so far,
increasing the model's likelihood to talk
about new topics.

frequency_penalty number Optional 0 Number between -2.0 and 2.0. Positive


values penalize new tokens based on
their existing frequency in the text so far,
decreasing the model's likelihood to
repeat the same line verbatim.

logit_bias object Optional null Modify the likelihood of specified tokens


appearing in the completion. Accepts a
json object that maps tokens (specified by
their token ID in the tokenizer) to an
associated bias value from -100 to 100.
Mathematically, the bias is added to the
logits generated by the model prior to
sampling. The exact effect will vary per
model, but values between -1 and 1
should decrease or increase likelihood of
selection; values like -100 or 100 should
result in a ban or exclusive selection of
the relevant token.

user string Optional A unique identifier representing your


end-user, which can help Azure OpenAI
to monitor and detect abuse.

Completions extensions
Extensions for chat completions, for example Azure OpenAI on your data.

Use chat completions extensions

HTTP

POST {your-resource-name}/openai/deployments/{deployment-
id}/extensions/chat/completions?api-version={api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

deployment-id string Required The name of your model deployment. You're required to first deploy a model before you can make calls.

api-version string Required The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions
2023-06-01-preview Swagger spec

Example request

Console

curl -i -X POST
YOUR_RESOURCE_NAME/openai/deployments/YOUR_DEPLOYMENT_NAME/extensions/chat/c
ompletions?api-version=2023-06-01-preview \

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-H "chatgpt_url: YOUR_RESOURCE_URL" \

-H "chatgpt_key: YOUR_API_KEY" \

-d \
'
{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'YOUR_AZURE_COGNITIVE_SEARCH_ENDPOINT'",
                "key": "'YOUR_AZURE_COGNITIVE_SEARCH_KEY'",
                "indexName": "'YOUR_AZURE_COGNITIVE_SEARCH_INDEX_NAME'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the differences between Azure Machine Learning and Azure Cognitive Services?"
        }
    ]
}
'

Example response

JSON

"id": "12345678-1a2b-3c4e5f-a123-12345678abcd",

"model": "",

"created": 1684304924,

"object": "chat.completion",

"choices": [

"index": 0,
"messages": [

"role": "tool",

"content": "{\"citations\": [{\"content\":


\"\\nCognitive Services are cloud-based artificial intelligence (AI)
services...\", \"id\": null, \"title\": \"What is Cognitive Services\",
\"filepath\": null, \"url\": null, \"metadata\": {\"chunking\": \"orignal
document size=250. Scores=0.4314117431640625 and 1.72564697265625.Org
Highlight count=4.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"Learn about
Azure Cognitive Services.\\\"]\"}",

"end_turn": false

},

"role": "assistant",

"content": " \nAzure Cognitive Services are cloud-based


artificial intelligence (AI) services that help developers build cognitive
intelligence into applications without having direct AI or data science
skills or knowledge. [doc1]. Azure Machine Learning is a cloud service for
accelerating and managing the machine learning project lifecycle. [doc1].",

"end_turn": true

Parameters Type Required? Default Description

messages array Required null The messages to generate chat completions for,
in the chat format.

dataSources array Required The data sources to be used for the Azure
OpenAI on your data feature.

temperature number Optional 0 What sampling temperature to use, between 0


and 2. Higher values like 0.8 will make the
output more random, while lower values like 0.2
will make it more focused and deterministic. We
generally recommend altering this or top_p but
not both.

top_p number Optional 1 An alternative to sampling with temperature,


called nucleus sampling, where the model
considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens
comprising the top 10% probability mass are
considered. We generally recommend altering
this or temperature but not both.

stream boolean Optional false If set, partial message deltas will be sent, like in
ChatGPT. Tokens will be sent as data-only
server-sent events as they become available,
with the stream terminated by a message
"messages": [{"delta": {"content": "[DONE]"},
"index": 2, "end_turn": true}]

stop string Optional null Up to 2 sequences where the API will stop
or array generating further tokens.

max_tokens integer Optional 1000 The maximum number of tokens allowed for the
generated answer. By default, the number of
tokens the model can return is 4096 -
prompt_tokens .

The following parameters can be used inside of the parameters field inside of
dataSources .

Parameters Type Required? Default Description

type string Required null The data source to be used for the
Azure OpenAI on your data feature.
For Azure Cognitive search the
value is AzureCognitiveSearch .

endpoint string Required null The data source endpoint.

key string Required null One of the Azure Cognitive Search


admin keys for your service.

indexName string Required null The search index to be used.

fieldsMapping dictionary Optional null Index data column mapping.

inScope boolean Optional true If set, this value will limit responses
specific to the grounding data
content.

topNDocuments number Optional 5 Number of documents that need to


be fetched for document
augmentation.

queryType string Optional simple Indicates which query option will be


used for Azure Cognitive Search.

semanticConfiguration string Optional null The semantic search configuration.


Only available when queryType is
set to semantic .

roleInformation string Optional null Gives the model instructions about


how it should behave and the
context it should reference when
generating a response.
Corresponds to the “System
Message” in Azure OpenAI Studio.
There’s a 100 token limit, which
counts towards the overall token
limit.
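The unofficial Python sketch below mirrors the curl example, including its chatgpt_url and chatgpt_key headers, and assumes the same https://{your-resource-name}.openai.azure.com base URL used by the other operations; all endpoint, key, and index values are placeholders.

Python

# Unofficial sketch of a chat completions extensions ("on your data") request,
# mirroring the curl example above. All endpoint, key, and index values are placeholders.
import requests

url = ("https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
       "YOUR_DEPLOYMENT_NAME/extensions/chat/completions?api-version=2023-06-01-preview")
headers = {
    "api-key": "YOUR_API_KEY",
    "chatgpt_url": "YOUR_RESOURCE_URL",
    "chatgpt_key": "YOUR_API_KEY",
}
payload = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "YOUR_AZURE_COGNITIVE_SEARCH_ENDPOINT",
                "key": "YOUR_AZURE_COGNITIVE_SEARCH_KEY",
                "indexName": "YOUR_AZURE_COGNITIVE_SEARCH_INDEX_NAME",
            },
        }
    ],
    "messages": [
        {"role": "user",
         "content": "What are the differences between Azure Machine Learning and Azure Cognitive Services?"}
    ],
}
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
# The last message in the choice is the grounded assistant answer; earlier
# "tool" messages carry the citations retrieved from the search index.
print(response.json()["choices"][0]["messages"][-1]["content"])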

Image generation

Request a generated image


Generate a batch of images from a text caption. Image generation is currently only
available with api-version=2023-06-01-preview .

HTTP

POST https://{your-resource-
name}.openai.azure.com/openai/images/generations:submit?api-version={api-
version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

api-version string Required The API version to use for this operation. This follows the YYYY-MM-DD format.

Supported versions

2023-06-01-preview

Request body
Parameter Type Required? Default Description

prompt string Required A text description of the desired image(s). The maximum length is 1000 characters.

n integer Optional 1 The number of images to generate. Must be between 1 and 5.

size string Optional 1024x1024 The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024.

Example request

Console

curl -X POST
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/images/generations:submit
?api-version=2023-06-01-preview \
-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d '{

"prompt": "An avocado chair",

"size": "512x512",

"n": 3

}'

Example response
The operation returns a 202 status code and a GenerateImagesResponse JSON object containing the ID and status of the operation.

JSON

"id": "f508bcf2-e651-4b4b-85a7-58ad77981ffa",

"status": "notRunning"

Get a generated image result


Use this API to retrieve the results of an image generation operation. Image generation
is currently only available with api-version=2023-06-01-preview .

HTTP
GET https://{your-resource-
name}.openai.azure.com/openai/operations/images/{operation-id}?api-version=
{api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

operation-id string Required The GUID that identifies the original image generation request.

Supported versions

2023-06-01-preview

Example request

Console

curl -X GET "https://{your-resource-


name}.openai.azure.com/openai/operations/images/{operation-id}?api-
version=2023-06-01-preview"

-H "Content-Type: application/json"

-H "Api-Key: {api key}"

Example response

Upon success, the operation returns a 200 status code and an OperationResponse JSON object. The status field can be "notRunning" (task is queued but hasn't started yet), "running", "succeeded", "canceled" (task has timed out), "failed", or "deleted". A succeeded status indicates that the generated image is available for download at the given URL. If multiple images were generated, their URLs are all returned in the result.data field.

JSON

"created": 1685064331,

"expires": 1685150737,

"id": "4b755937-3173-4b49-bf3f-da6702a3971a",

"result": {

"data": [

"url": "<URL_TO_IMAGE>"

},

"url": "<URL_TO_NEXT_IMAGE>"

},

...

},

"status": "succeeded"

Delete a generated image from the server


You can use the operation ID returned by the request to delete the corresponding
image from the Azure server. Generated images are automatically deleted after 24 hours
by default, but you can trigger the deletion earlier if you want to.

HTTP

DELETE https://{your-resource-
name}.openai.azure.com/openai/operations/images/{operation-id}?api-version=
{api-version}

Path parameters

Parameter Type Required? Description

your-resource-name string Required The name of your Azure OpenAI Resource.

operation-id string Required The GUID that identifies the original image generation request.

Supported versions

2023-06-01-preview

Example request

Console

curl -X DELETE "https://{your-resource-


name}.openai.azure.com/openai/operations/images/{operation-id}?api-
version=2023-06-01-preview"

-H "Content-Type: application/json"

-H "Api-Key: {api key}"

Response
The operation returns a 204 status code if successful. This API only succeeds if the
operation is in an end state (not running ).

Management APIs
Azure OpenAI is deployed as a part of the Azure Cognitive Services. All Cognitive
Services rely on the same set of management APIs for creation, update and delete
operations. The management APIs are also used for deploying models within an OpenAI
resource.

Management APIs reference documentation

Next steps
Learn about managing deployments, models, and fine-tuning with the REST API.
Learn more about the underlying models that power Azure OpenAI.
Fine Tunes
Reference
Service: Cognitive Services
API Version: 2022-12-01

Operations
Cancel Cancels the processing of the fine-tune job specified by the
given fine-tune-id.

Create Creates a job that fine-tunes a specified model from a given


training file.
Response includes details of the enqueued job
including job status and hyper paramete...

Delete Deletes the fine-tune job specified by the given fine-tune-id.

Get Gets details for a single fine-tune job specified by the given fine-
tune-id.
The details contain the base model, training and
validation files, hyper parameters,...

Get Events Gets the events for the fine-tune job specified by the given fine-
tune-id.
Events are created when the job status changes, e.g.
running or complete, and when res...

List Gets a list of all fine-tune jobs owned by the Azure OpenAI
resource.
The details that are returned for each fine-tune job
contain besides its identifier
the base...
Deployments - Create Or Update
Reference
Service: Cognitive Services
API Version: 2021-10-01

Update the state of specified deployments associated with the Cognitive Services account.

HTTP

PUT
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Cognitiv
eServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2021-10-01

URI Parameters
Name In Required Type Description

accountName path True string The name of Cognitive Services account.

Regex pattern: ^[a-zA-Z0-9][a-zA-Z0-9_.-]*$

deploymentName path True string The name of the deployment associated with the Cognitive Services Account

resourceGroupName path True string The name of the resource group. The name is case insensitive.

subscriptionId path True string The ID of the target subscription.

api-version query True string The API version to use for this operation.

Request Body
Name Type Description

properties DeploymentProperties Properties of Cognitive Services account deployment.

Responses
Name Type Description

200 OK Deployment Create/Update the deployment successfully.

201 Created Deployment Create the deployment successfully.

Other Status ErrorResponse Error response describing why the operation failed.
Codes

Examples

PutDeployment

Sample Request

HTTP

PUT
https://management.azure.com/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.Cognitiv
eServices/accounts/accountName/deployments/deploymentName?api-version=2021-10-01

"properties": {

"model": {

"format": "OpenAI",

"name": "ada",

"version": "1"

},

"scaleSettings": {

"scaleType": "Manual",

"capacity": 1

Sample Response
Status code: 200

JSON

{
  "id": "/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/deployments/deploymentName",
  "name": "deploymentName",
  "type": "Microsoft.CognitiveServices/accounts/deployments",
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "ada",
      "version": "1"
    },
    "scaleSettings": {
      "scaleType": "Manual",
      "capacity": 1
    },
    "provisioningState": "Succeeded"
  }
}

Status code: 201

JSON

{
  "id": "/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/deployments/deploymentName",
  "name": "deploymentName",
  "type": "Microsoft.CognitiveServices/accounts/deployments",
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "ada",
      "version": "1"
    },
    "scaleSettings": {
      "scaleType": "Manual",
      "capacity": 1
    },
    "provisioningState": "Accepted"
  }
}

Definitions
Name Description
createdByType The type of identity that created the resource.

Deployment Cognitive Services account deployment.

DeploymentModel Properties of Cognitive Services account deployment model.

DeploymentProperties Properties of Cognitive Services account deployment.

DeploymentProvisioningState Gets the status of the resource at the time the operation was called.

DeploymentScaleSettings Properties of Cognitive Services account deployment model.

DeploymentScaleType Deployment scale type.

ErrorAdditionalInfo The resource management error additional info.

ErrorDetail The error detail.

ErrorResponse Error response

systemData Metadata pertaining to creation and last modification of the resource.

createdByType
The type of identity that created the resource.

Name Type Description

Application string

Key string

ManagedIdentity string

User string

Deployment
Cognitive Services account deployment.

Name Type Description

etag string Resource Etag.

id string Fully qualified resource ID for the resource. Ex - /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/

name string The name of the resource

properties DeploymentProperties Properties of Cognitive Services account deployment.

systemData systemData Metadata pertaining to creation and last modification of the resource.

type string The type of the resource. E.g. "Microsoft.Compute/virtualMachines" or "Microsoft.Storage/storageAccounts"


DeploymentModel
Properties of Cognitive Services account deployment model.

Name Type Description

format string Deployment model format.

name string Deployment model name.

version string Deployment model version.

DeploymentProperties
Properties of Cognitive Services account deployment.

Name Type Description

model DeploymentModel Properties of Cognitive Services account deployment model.

provisioningState DeploymentProvisioningState Gets the status of the resource at the time the operation was called.

scaleSettings DeploymentScaleSettings Properties of Cognitive Services account deployment model.

DeploymentProvisioningState
Gets the status of the resource at the time the operation was called.

Name Type Description

Accepted string

Creating string

Deleting string

Failed string

Moving string

Succeeded string

DeploymentScaleSettings
Properties of Cognitive Services account deployment model.

Name Type Description

capacity integer Deployment capacity.

scaleType DeploymentScaleType Deployment scale type.

DeploymentScaleType
Deployment scale type.

Name Type Description

Manual string

Standard string

ErrorAdditionalInfo
The resource management error additional info.

Name Type Description

info object The additional info.

type string The additional info type.

ErrorDetail
The error detail.

Name Type Description

additionalInfo ErrorAdditionalInfo[] The error additional info.

code string The error code.

details ErrorDetail[] The error details.

message string The error message.

target string The error target.

ErrorResponse
Error response

Name Type Description

error ErrorDetail The error object.

systemData
Metadata pertaining to creation and last modification of the resource.

Name Type Description

createdAt string The timestamp of resource creation (UTC).

createdBy string The identity that created the resource.

createdByType createdByType The type of identity that created the resource.

lastModifiedAt string The timestamp of resource last modification (UTC)


lastModifiedBy string The identity that last modified the resource.

lastModifiedByType createdByType The type of identity that last modified the resource.
Azure.AI.OpenAI Namespace
Reference

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Classes
ChatChoice The representation of a single completions result for a chat completions request.

ChatCompletions A representation of a full response to a chat completions request.

ChatCompletionsOptions The configuration information used for a chat completions request.

ChatMessage A single, role-attributed message within a chat completion interaction.

Choice Choice model within completion response.

Completions Expected response schema to completion request.

CompletionsLogProbability LogProbs model within completion choice.

CompletionsOptions Post body schema to create a prompt completion from a deployment.

CompletionsUsage Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

EmbeddingItem Expected response schema to embeddings object list item request.

Embeddings Expected response schema to embeddings request.

EmbeddingsOptions Schema to create a prompt completion from a deployment.

EmbeddingsUsage Measurement of the amount of tokens used in this request and response.

OpenAIClient Azure OpenAI APIs for completions and search.

OpenAIClientOptions Client options for OpenAIClient.

StreamingChatChoice

StreamingChatCompletions

StreamingChoice

StreamingCompletions

Structs
ChatRole A description of the intended purpose of a message within a
chat completions interaction.

Enums
OpenAIClientOptions.ServiceVersion The version of the service to use.
com.azure.ai.openai
Reference
Package: com.azure.ai.openai
Maven Artifact: com.azure:azure-ai-openai:1.0.0-beta.2

Package containing the classes for OpenAI. Azure OpenAI APIs for completions and
search.

Classes
OpenAIAsyncClient Initializes a new instance of the asynchronous OpenAIClient type.

OpenAIClient Initializes a new instance of the synchronous OpenAIClient type.

OpenAIClientBuilder A builder for creating a new instance of the OpenAIClient type.

Enums
OpenAIServiceVersion Service version of OpenAIClient.
@azure/openai package
Reference

Classes
AzureKeyCredential A static-key-based credential that supports updating the underlying key value.

OpenAIClient A client for interacting with Azure OpenAI.

The client needs the endpoint of an OpenAI resource and an authentication method such as an API key or token. The API key and endpoint can be found in the OpenAI resource page. They will be located in the resource's Keys and Endpoint page.

Examples for authentication:

API Key

JavaScript

import { OpenAIClient } from "@azure/openai";
import { AzureKeyCredential } from "@azure/core-auth";

const endpoint = "<azure endpoint>";
const credential = new AzureKeyCredential("<api key>");
const client = new OpenAIClient(endpoint, credential);

Azure Active Directory

JavaScript

import { OpenAIClient } from "@azure/openai";
import { DefaultAzureCredential } from "@azure/identity";

const endpoint = "<azure endpoint>";
const credential = new DefaultAzureCredential();
const client = new OpenAIClient(endpoint, credential);

OpenAIKeyCredential The OpenAIKeyCredential class represents an OpenAI API key and is used to authenticate into an OpenAI client for an OpenAI endpoint.

Interfaces
ChatChoice The representation of a single prompt completion as part of an overall chat completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

ChatCompletions Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

ChatMessage A single, role-attributed message within a chat completion interaction.

Choice The representation of a single prompt completion as part of an overall completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

Completions Representation of the response data from a completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

CompletionsLogProbabilityModel Representation of a log probabilities model for a completions generation.

CompletionsUsage Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

EmbeddingItem Representation of a single embeddings relatedness comparison.

Embeddings Representation of the response data from an embeddings request. Embeddings measure the relatedness of text strings and are commonly used for search, clustering, recommendations, and other similar scenarios.

EmbeddingsUsage Measurement of the amount of tokens used in this request and response.

GetChatCompletionsOptions

GetCompletionsOptions

GetEmbeddingsOptions

OpenAIClientOptions

RequestOptions
Type Aliases
ChatRole "system", "assistant", "user"

CompletionsFinishReason "stopped", "tokenLimitReached", "contentFiltered"


Azure Cognitive Services support and
help options
Article • 07/22/2022

Are you just starting to explore the functionality of Azure Cognitive Services? Perhaps
you are implementing a new feature in your application. Or after using the service, do
you have suggestions on how to improve it? Here are options for where you can get
support, stay up-to-date, give feedback, and report bugs for Cognitive Services.

Create an Azure support request

Explore the range of Azure support options and choose the plan that best fits,
whether you're a developer just starting your cloud journey or a large organization
deploying business-critical, strategic applications. Azure customers can create and
manage support requests in the Azure portal.

Azure portal
Azure portal for the United States government

Post a question on Microsoft Q&A


For quick and reliable answers on your technical product questions from Microsoft
Engineers, Azure Most Valuable Professionals (MVPs), or our expert community, engage
with us on Microsoft Q&A, Azure's preferred destination for community support.

If you can't find an answer to your problem using search, submit a new question to
Microsoft Q&A. Use one of the following tags when you ask your question:

Cognitive Services

Vision

Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer

Language
Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator

Speech

Speech service

Decision

Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer

Azure OpenAI

Azure OpenAI

Post a question to Stack Overflow

For answers on your developer questions from the largest community developer
ecosystem, ask your question on Stack Overflow.

If you do submit a new question to Stack Overflow, please use one or more of the
following tags when you create the question:

Cognitive Services

Vision

Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer

Language

Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator

Speech

Speech service

Decision

Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer

Azure OpenAI

Azure OpenAI

Submit feedback
To request new features, post them on https://feedback.azure.com. Share your ideas
for making Cognitive Services and its APIs work better for the applications you develop.

Cognitive Services

Vision

Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer

Language

Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator

Speech

Speech service
Decision

Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer

Stay informed
Staying informed about features in a new release or news on the Azure blog can help
you find the difference between a programming error, a service bug, or a feature not yet
available in Cognitive Services.

Learn more about product updates, roadmap, and announcements in Azure Updates.
News about Cognitive Services is shared in the Azure blog.
Join the conversation on Reddit about Cognitive Services.

Next steps
What are Azure Cognitive Services?
