Azure Cognitive Services Openai
Azure Cognitive Services Openai
OVERVIEW QUICKSTART
What is Azure OpenAI Service? Quickstarts
CONCEPT REFERENCE
Azure OpenAI Models Support and help options
Additional resources
Reference Tools
REST API Azure CLI
Terms of use
PowerShell
What is Azure OpenAI Service?
Article • 06/28/2023
Azure OpenAI Service provides REST API access to OpenAI's powerful language models
including the GPT-3, Codex and Embeddings model series. In addition, the new GPT-4
and ChatGPT (gpt-35-turbo) model series have now reached general availability. These
models can be easily adapted to your specific task including but not limited to content
generation, summarization, semantic search, and natural language to code translation.
Users can access the service through REST APIs, Python SDK, or our web-based interface
in the Azure OpenAI Studio.
Features overview
Codex series
Embeddings series
Fine-tuning Ada
Babbage
Curie
Cushman
Davinci
Virtual network support Yes, unless using Azure OpenAI on your data.
& private link support
Azure OpenAI Service Studio for model exploration and fine tuning
Content filtering Prompts and completions are evaluated against our content policy
with automated systems. High severity content will be filtered.
Responsible AI
At Microsoft, we're committed to the advancement of AI driven by principles that put
people first. Generative models such as the ones available in Azure OpenAI have
significant potential benefits, but without careful design and thoughtful mitigations,
such models have the potential to generate incorrect or even harmful content. Microsoft
has made significant investments to help guard against abuse and unintended harm,
which includes requiring applicants to show well-defined use cases, incorporating
Microsoft’s principles for responsible AI use , building content filters to support
customers, and providing responsible AI implementation guidance to onboarded
customers.
Apply now
With Azure OpenAI, customers get the security capabilities of Microsoft Azure while
running the same models as OpenAI. Azure OpenAI offers private networking, regional
availability, and responsible AI content filtering.
Key concepts
Prompts & completions
The completions endpoint is the core component of the API service. This API provides
access to the model's text-in, text-out interface. Users simply need to provide an input
prompt containing the English text command, and the model will generate a text
completion.
Prompt:
""" count to 5 in a for loop """
Completion:
for i in range(1, 6): print(i)
Tokens
Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or
just chunks of characters. For example, the word “hamburger” gets broken up into the
tokens “ham”, “bur” and “ger”, while a short and common word like “pear” is a single
token. Many tokens start with a whitespace, for example “ hello” and “ bye”.
The total number of tokens processed in a given request depends on the length of your
input, output and request parameters. The quantity of tokens being processed will also
affect your response latency and throughput for the models.
Resources
Azure OpenAI is a new product offering on Azure. You can get started with Azure
OpenAI the same way as any other Azure product where you create a resource, or
instance of the service, in your Azure Subscription. You can read more about Azure's
resource management design.
Deployments
Once you create an Azure OpenAI Resource, you must deploy a model before you can
start making API calls and generating text. This action can be done using the
Deployment APIs. These APIs allow you to specify the model you wish to use.
Prompt engineering
GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based
models, the user interacts with the model by entering a text prompt, to which the model
responds with a text completion. This completion is the model’s continuation of the
input text.
While these models are extremely powerful, their behavior is also very sensitive to the
prompt. This makes prompt engineering an important skill to develop.
Prompt construction can be difficult. In practice, the prompt acts to configure the model
weights to complete the desired task, but it's more of an art than a science, often
requiring experience and intuition to craft a successful prompt.
Models
The service provides users access to several different models. Each model provides a
different capability and price point.
GPT-4 models are the latest available models. Due to high demand access to this model
series is currently only available by request. To request access, existing Azure OpenAI
customers can apply by filling out this form
The GPT-3 base models are known as Davinci, Curie, Babbage, and Ada in decreasing
order of capability and increasing order of speed.
The Codex series of models is a descendant of GPT-3 and has been trained on both
natural language and code to power natural language to code use cases. Learn more
about each model on our models concept page.
The DALL-E models, currently in preview, generate images from text prompts that the
user provides.
Next steps
Learn more about the underlying models that power Azure OpenAI.
Azure OpenAI Service quotas and limits
Article • 06/21/2023
This article contains a quick reference and a detailed description of the quotas and limits
for Azure OpenAI in Azure Cognitive Services.
tokens-per-minute)1 GPT-4: 20 K
GPT-4-32K: 60 K
All others: 240 K
Maximum prompt tokens per request Varies per model. For more information, see Azure
OpenAI Service models
Next steps
Explore how to manage quota for your Azure OpenAI deployments.
Learn more about
the underlying models that power Azure OpenAI.
Azure OpenAI Service models
Article • 06/30/2023
Azure OpenAI provides access to many different models, grouped by family and
capability. A model family typically associates models by their intended task. The
following table describes model families currently available in Azure OpenAI. Not all
models are available in all regions currently. Refer to the model capability table in this
article for a full breakdown.
Model Description
family
GPT-4 A set of models that improve on GPT-3.5 and can understand as well as generate
natural language and code.
GPT-3 A series of models that can understand and generate natural language. This
includes the new ChatGPT model.
DALL-E A series of models in preview that can generate original images from natural
(Preview) language.
Codex A series of models that can understand and generate code, including translating
natural language to code.
Embeddings A set of models that can understand and use embeddings. An embedding is a
special format of data representation that can be easily utilized by machine
learning models and algorithms. The embedding is an information dense
representation of the semantic meaning of a piece of text. Currently, we offer three
families of Embeddings models for different functionalities: similarity, text search,
and code search.
Model capabilities
Each model family has a series of models that are further distinguished by capability.
These capabilities are typically identified by names, and the alphabetical order of these
names generally signifies the relative capability and cost of that model within a given
model family. For example, GPT-3 models use names such as Ada, Babbage, Curie, and
Davinci to indicate relative capability and cost. Davinci is more capable and more
expensive than Curie, which in turn is more capable and more expensive than Babbage,
and so on.
7 Note
Any task that can be performed by a less capable model like Ada can be performed
by a more capable model like Curie or Davinci.
Naming convention
Azure OpenAI model names typically correspond to the following standard naming
convention:
{capability}-{family}[-{input-type}]-{identifier}
Element Description
{capability} The model capability of the model. For example, GPT-3 models uses text , while
Codex models use code .
{family} The relative family of the model. For example, GPT-3 models include ada ,
babbage , curie , and davinci .
{input-type} (Embeddings models only) The input type of the embedding supported by the
model. For example, text search embedding models support doc and query .
For example, our most powerful GPT-3 model is called text-davinci-003 , while our
most powerful Codex model is called code-davinci-002 .
The older versions of GPT-3 models named ada , babbage , curie , and davinci that
don't follow the standard naming convention are primarily intended for fine tuning.
For more information, see Learn how to customize a model for your application.
Model updates
Azure OpenAI now supports automatic updates for select model deployments. On
models where automatic update support is available, a model version drop-down will be
visible in Azure OpenAI Studio under Create new deployment and Edit deployment:
If you are still in the early testing phases for completion and chat completion based
models we recommend deploying models with auto-update to default set whenever it
is available. For embeddings models while we recommend using the latest model
version, you should choose when you want to upgrade since embeddings generated
with an earlier model version will not be interchangeable with the new version.
To view deprecation/expiration dates for all available models in a given region from
Azure OpenAI Studio select Models > Column options > Select Deprecation fine tune
and Deprecation inference:
Update & deploy models via the API
HTTP
PUT
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{
resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountNa
me}/deployments/{deploymentName}?api-version=2023-05-01
Path parameters
deploymentName string Required The deployment name you chose when you deployed
an existing model or the name you would like a new
model deployment to have.
resourceGroupName string Required The name of the associated resource group for this
model deployment.
api-version string Required The API version to use for this operation. This follows
the YYYY-MM-DD format.
Supported versions
Request body
This is only a subset of the available request body parameters. For the full list of the
parameters you can refer to the REST API spec .
OnceNewDefaultVersionAvailable
OnceCurrentVersionExpired
NoAutoUpgrade
capacity integer This represents the amount of quota you are assigning to this
deployment. A value of 1 equals 1,000 Tokens per Minute
(TPM)
Example request
Bash
-H "Content-Type: application/json" \
-d '{"sku":{"name":"Standard","capacity":1},"properties": {"model":
{"format": "OpenAI","name": "text-embedding-ada-002","version":
"2"},"versionUpgradeOption":"OnceCurrentVersionExpired"}}'
7 Note
There are multiple ways to generate an authorization token. The easiest method for
initial testing is to launch the Cloud Shell from https://portal.azure.com . Then
run az account get-access-token. You can use this token as your temporary
authorization token for API testing.
Example response
JSON
"id": "/subscriptions/{subscription-id}/resourceGroups/resource-group-
temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-
001/deployments/text-embedding-ada-002-test-1",
"type": "Microsoft.CognitiveServices/accounts/deployments",
"name": "text-embedding-ada-002-test-1",
"sku": {
"name": "Standard",
"capacity": 1
},
"properties": {
"model": {
"format": "OpenAI",
"name": "text-embedding-ada-002",
"version": "2"
},
"versionUpgradeOption": "OnceCurrentVersionExpired",
"capabilities": {
"embeddings": "true",
"embeddingsMaxInputs": "1"
},
"provisioningState": "Succeeded",
"ratelimits": [
"key": "request",
"renewalPeriod": 10,
"count": 2
},
"key": "token",
"renewalPeriod": 60,
"count": 1000
},
"systemData": {
"createdBy": "docs@contoso.com",
"createdByType": "User",
"createdAt": "2023-06-13T00:12:38.885937Z",
"lastModifiedBy": "docs@contoso.com",
"lastModifiedByType": "User",
"lastModifiedAt": "2023-06-13T02:41:04.8410965Z"
},
"etag": "\"{GUID}\""
GPT-4 models
GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous
models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional
completions tasks.
Due to high demand access to this model series is currently only available by request. To
request access, existing Azure OpenAI customers can apply by filling out this form
gpt-4
gpt-4-32k
The gpt-4 supports 8192 max input tokens and the gpt-4-32k supports up to 32,768
tokens.
GPT-3 models
The GPT-3 models can understand and generate natural language. The service offers
four model capabilities, each with different levels of power and speed suitable for
different tasks. Davinci is the most capable model, while Ada is the fastest. In the order
of greater to lesser capability, the models are:
text-davinci-003
text-curie-001
text-babbage-001
text-ada-001
While Davinci is the most capable, the other models provide significant speed
advantages. Our recommendation is for users to start with Davinci while experimenting,
because it produces the best results and validate the value that Azure OpenAI can
provide. Once you have a prototype working, you can then optimize your model choice
with the best latency/performance balance for your application.
Davinci
Davinci is the most capable model and can perform any task the other models can
perform, often with less instruction. For applications requiring deep understanding of
the content, like summarization for a specific audience and creative content generation,
Davinci produces the best results. The increased capabilities provided by Davinci require
more compute resources, so Davinci costs more and isn't as fast as other models.
Another area where Davinci excels is in understanding the intent of text. Davinci is
excellent at solving many kinds of logic problems and explaining the motives of
characters. Davinci has been able to solve some of the most challenging AI problems
involving cause and effect.
Use for: Complex intent, cause and effect, summarization for audience
Curie
Curie is powerful, yet fast. While Davinci is stronger when it comes to analyzing
complicated text, Curie is capable for many nuanced tasks like sentiment classification
and summarization. Curie is also good at answering questions and performing Q&A and
as a general service chatbot.
Ada
Ada is usually the fastest model and can perform tasks like parsing text, address
correction and certain kinds of classification tasks that don’t require too much nuance.
Ada’s performance can often be improved by providing more context.
ChatGPT (gpt-35-turbo)
The ChatGPT model (gpt-35-turbo) is a language model designed for conversational
interfaces and the model behaves differently than previous GPT-3 models. Previous
models were text-in and text-out, meaning they accepted a prompt string and returned
a completion to append to the prompt. However, the ChatGPT model is conversation-in
and message-out. The model expects a prompt string formatted in a specific chat-like
transcript format, and returns a completion that represents a model-written message in
the chat.
To learn more about the ChatGPT model and how to interact with the Chat API check
out our in-depth how-to.
Codex models
The Codex models are descendants of our base GPT-3 models that can understand and
generate code. Their training data contains both natural language and billions of lines of
public code from GitHub.
They’re most capable in Python and proficient in over a dozen languages, including C#,
JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and Shell. In the order of greater
to lesser capability, the Codex models are:
code-davinci-002
code-cushman-001
Davinci
Similar to GPT-3, Davinci is the most capable Codex model and can perform any task the
other models can perform, often with less instruction. For applications requiring deep
understanding of the content, Davinci produces the best results. Greater capabilities
require more compute resources, so Davinci costs more and isn't as fast as other
models.
Cushman
Cushman is powerful, yet fast. While Davinci is stronger when it comes to analyzing
complicated tasks, Cushman is a capable model for many code generation tasks.
Cushman typically runs faster and cheaper than Davinci, as well.
Embeddings models
) Important
Similarity
Text search
Code search
Each family includes models across a range of capability. The following list indicates the
length of the numerical vector returned by the service, based on model capability:
Babbage 2048
Curie 4096
Davinci 12288
Davinci is the most capable, but is slower and more expensive than the other models.
Ada is the least capable, but is both faster and cheaper.
Similarity embedding
These models are good at capturing semantic similarity between two or more pieces of
text.
text-similarity-babbage-001
text-similarity-curie-001
text-similarity-davinci-001
text-search-ada-query-001
text-search-babbage-doc-001
text-search-babbage-query-001
text-search-curie-doc-001
text-search-curie-query-001
text-search-davinci-doc-001
text-search-davinci-query-001
code-search-ada-text-001
code-search-babbage-code-001
code-search-babbage-text-001
When using our embeddings models, keep in mind their limitations and risks.
) Important
South Central US is temporarily unavailable for creating new resources due to high
demand.
GPT-3 Models
These models can be used with Completion API requests. gpt-35-turbo is the only
model that can be used with both Completion API requests and the Chat Completion
API.
Model ID Base model Regions Fine- Max Training
Tuning Request Data (up
Regions (tokens) to)
text-ada-001 East US, South Central US, West N/A 2,049 Oct 2019
Europe
text- East US, South Central US, West N/A 2,049 Oct 2019
babbage-001 Europe
text-curie-001 East US, South Central US, West N/A 2,049 Oct 2019
Europe
text-davinci- East US, South Central US, West N/A 4,097 Jun 2021
002 Europe
gpt-35-turbo1 East US, France Central, South N/A 4,096 Sep 2021
(0301) Central US, UK South, West Europe
gpt-35-turbo East US, France Central, UK South N/A 4,096 Sep 2021
(0613)
gpt-35-turbo- East US, France Central, UK South N/A 16,384 Sep 2021
16k (0613)
1
Version 0301 of gpt-35-turbo will be deprecated no earlier than October 15th, 2023 in
favor of version 0613 .
GPT-4 Models
These models can only be used with the Chat Completion API.
Model ID Base model Fine-Tuning Max Request Training Data
Regions Regions (tokens) (up to)
1
The model is only available by request .
2 Version 0314 of gpt-4 and gpt-4-32k will be deprecated no earlier than October 15th,
2023 in favor of version 0613 .
Dall-E Models
Codex Models
These models can only be used with Completions API requests.
1
The model is available for fine-tuning by request only. Currently we aren't accepting
new requests to fine-tune the model.
Embeddings Models
These models can only be used with Embedding API requests.
7 Note
Next steps
Learn more about Azure OpenAI
Learn more about fine-tuning Azure OpenAI models
Azure OpenAI Service legacy models
Article • 07/07/2023
Azure OpenAI Service offers a variety of models for different use cases. The following
models are not available for new deployments beginning July 6, 2023. Deployments
created prior to July 6, 2023 remain available to customers until July 5, 2024. We
recommend customers migrate to the replacement models prior to the July 5, 2024
retirement.
GPT-3.5
The impacted GPT-3.5 models are the following. The replacement for the GPT-3.5
models is GPT-3.5 Turbo Instruct when that model becomes available.
text-davinci-002
text-davinci-003
code-davinci-002
GPT-3
The impacted GPT-3 models are the following. The replacement for the GPT-3 models is
GPT-3.5 Turbo Instruct when that model becomes available.
text-ada-001
text-babbage-001
text-curie-001
text-davinci-001
code-cushman-001
Embedding models
The embedding models below will be retired effective July 5, 2024. Customers should
migrate to text-embedding-ada-002 (version 2).
Similarity
Text search
Code search
Each family includes models across a range of capability. The following list indicates the
length of the numerical vector returned by the service, based on model capability:
Ada 1024
Babbage 2048
Curie 4096
Davinci 12288
Similarity embedding
These models are good at capturing semantic similarity between two or more pieces of
text.
text-similarity-babbage-001
text-similarity-curie-001
text-similarity-davinci-001
text-search-ada-query-001
text-search-babbage-doc-001
text-search-babbage-query-001
text-search-curie-doc-001
text-search-curie-query-001
text-search-davinci-doc-001
text-search-davinci-query-001
code-search-ada-text-001
code-search-babbage-code-001
code-search-babbage-text-001
GPT-3.5 models
text- East US, South Central US, N/A 4,097 Jun 2021
davinci-002 West Europe
GPT-3 models
text-ada- East US, South Central US, N/A 2,049 Oct 2019
001 West Europe
text- East US, South Central US, N/A 2,049 Oct 2019
babbage- West Europe
001
Model ID Base model Regions Fine-Tuning Max Request Training Data
Regions (tokens) (up to)
text-curie- East US, South Central US, N/A 2,049 Oct 2019
001 West Europe
Codex models
Embedding models
June 2023
UK South
Azure OpenAI is now available in the UK South region. Check the models page, for
the latest information on model availability in each region.
Quota
Quota provides the flexibility to actively manage the allocation of rate limits across
the deployments within your subscription.
May 2023
If you are currently using the 2023-03-15-preview API, we recommend migrating to the
GA 2023-05-15 API. If you are currently using API version 2022-12-01 this API remains
GA, but does not include the latest Chat Completion capabilities.
) Important
Using the current versions of the GPT-35-Turbo models with the completion
endpoint remains in preview.
France Central
Azure OpenAI is now available in the France Central region. Check the models
page, for the latest information on model availability in each region.
April 2023
DALL-E 2 public preview. Azure OpenAI Service now supports image generation
APIs powered by OpenAI's DALL-E 2 model. Get AI-generated images based on the
descriptive text you provide. To learn more, check out the quickstart. To request
access, existing Azure OpenAI customers can apply by filling out this form .
New Chat Completion API for ChatGPT and GPT-4 models released in preview
on 3/21. To learn more checkout the updated quickstarts and how-to article.
Increased training limits for fine-tuning: The max training job size (tokens in
training file) x (# of epochs) is 2 Billion tokens for all models. We have also
increased the max training job from 120 to 720 hours.
Adding additional use cases to your existing access. Previously, the process for
adding new use cases required customers to reapply to the service. Now, we're
releasing a new process that allows you to quickly add new use cases to your use
of the service. This process follows the established Limited Access process within
Azure Cognitive Services. Existing customers can attest to any and all new use
cases here . Please note that this is required anytime you would like to use the
service for a new use case you did not originally apply for.
February 2023
New Features
.NET SDK(inference) preview release | Samples
Terraform SDK update to support Azure OpenAI management operations.
Inserting text at the end of a completion is now supported with the suffix
parameter.
Updates
Content filtering is on by default.
January 2023
New Features
Service GA. Azure OpenAI Service is now generally available.
New models: Addition of the latest text model, text-davinci-003 (East US, West
Europe), text-ada-embeddings-002 (East US, South Central US, West Europe)
December 2022
New features
The latest models from OpenAI. Azure OpenAI provides access to all the latest
models including the GPT-3.5 series.
Higher request per second limits. 50 for non-Davinci models. 20 for Davinci
models.
Higher training limits: 40M training tokens for Ada, Babbage, and Curie. 10M for
Davinci.
Process for requesting modifications to the abuse & miss-use data logging &
human review. Today, the service logs request/response data for the purposes of
abuse and misuse detection to ensure that these powerful models aren't abused.
However, many customers have strict data privacy and security requirements that
require greater control over their data. To support these use cases, we're releasing
a new process for customers to modify the content filtering policies or turn off the
abuse logging for low-risk use cases. This process follows the established Limited
Access process within Azure Cognitive Services and existing OpenAI customers can
apply here .
Lockbox support
SOC-2 compliance
Logging and diagnostics through Azure Resource Health, Cost Analysis, and
Metrics & Diagnostic settings.
Changes (breaking)
Fine-tuning create API request has been updated to match OpenAI’s schema.
JSON
{
"training_file": "file-XGinujblHPwGLSztz8cPS8XY",
"hyperparams": {
"batch_size": 4,
"learning_rate_multiplier": 0.1,
"n_epochs": 4,
"prompt_loss_weight": 0.1,
}
JSON
{
"training_file": "file-XGinujblHPwGLSztz8cPS8XY",
"batch_size": 4,
"learning_rate_multiplier": 0.1,
"n_epochs": 4,
"prompt_loss_weight": 0.1,
Customer actions
Contact Azure Support if you would like these turned on for your subscription.
Apply for filtering modifications , if you would like to have them remain off. (This
option will be for low-risk use cases only.)
Next steps
Learn more about the underlying models that power Azure OpenAI.
Azure OpenAI Service frequently
asked questions
FAQ
If you can't find answers to your questions in this document, and still need help check
the Cognitive Services support options guide. Azure OpenAI is part of Azure Cognitive
Services.
General
Does Azure OpenAI support GPT-4?
Azure OpenAI supports the latest GPT-4 models. These models are currently only
available by request. For access, existing Azure OpenAI customers can apply by filling
out this form .
With Azure OpenAI, customers get the security capabilities of Microsoft Azure while
running the same models as OpenAI.
Does Azure OpenAI support VNETs and Private
Endpoints?
Yes, as part of Azure Cognitive Services, Azure OpenAI supports VNETs and Private
Endpoints. To learn more consult the Cognitive Services virtual networking guidance
Web app
How can I customize my published web app?
You can customize your published web app in the Azure portal. The source code for the
published web app is available on GitHub , where you can find information on
changing the app frontend, as well as instructions for building and deploying the app.
Use this article to get started making your first calls to Azure OpenAI.
Prerequisites
An Azure subscription - Create one for free .
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
An Azure OpenAI resource with a model deployed. For more information about
model deployment, see the resource deployment guide.
From the Azure OpenAI Studio landing page navigate further to explore examples for
prompt completion, manage your deployments and models, and find learning resources
such as documentation and community forums.
Playground
Start exploring Azure OpenAI capabilities with a no-code approach through the GPT-3
Playground. It's simply a text box where you can submit a prompt to generate a
completion. From this page, you can quickly iterate and experiment with the capabilities.
You can select a deployment and choose from a few pre-loaded examples to get started.
If your resource doesn't have a deployment, select Create a deployment and follow the
instructions provided by the wizard. For more information about model deployment, see
the resource deployment guide.
You can experiment with the configuration settings such as temperature and pre-
response text to improve the performance of your task. You can read more about each
parameter in the REST API.
Selecting the Generate button will send the entered text to the completions API
and stream the results back to the text box.
Select the Undo button to undo the prior generation call.
Select the Regenerate button to complete an undo and generation call together.
Azure OpenAI also performs content moderation on the prompt inputs and generated
outputs. The prompts or responses may be filtered if harmful content is detected. For
more information, see the content filter article.
In the GPT-3 playground you can also view Python and curl code samples pre-filled
according to your selected settings. Just select View code next to the examples
dropdown. You can write an application to complete the same task with the OpenAI
Python SDK, curl, or other REST API client.
4. Select your deployment from the Deployments dropdown. If your resource doesn't
have a deployment, select Create a deployment and then revisit this step.
6. Select Generate . Azure OpenAI will attempt to capture the context of text and
rephrase it succinctly. You should get a result that resembles the following text:
The accuracy of the response can vary per model. The Davinci based model in this
example is well-suited to this type of summarization, whereas a Codex based model
wouldn't perform as well at this particular task.
Clean up resources
If you want to clean up and remove an OpenAI resource, you can delete the resource or
resource group. Deleting the resource group also deletes any other resources associated
with it.
Portal
Azure CLI
Next steps
Learn more about how to generate the best completion in our How-to guide on
completions.
For more examples check out the Azure OpenAI Samples GitHub repository .
Quickstart: Get started using ChatGPT
and GPT-4 with Azure OpenAI Service
Article • 05/24/2023
Prerequisites
An Azure subscription - Create one for free .
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
An Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4 1
models deployed. For more information about model deployment, see the
resource deployment guide.
1
GPT-4 models are currently only available by request. To access these models,
existing Azure OpenAI customers can apply for access by filling out this form .
From the Azure OpenAI Studio landing page, select Chat playground.
Playground
Start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI
Studio Chat playground. From this page, you can quickly iterate and experiment with the
capabilities.
Assistant setup
You can use the Assistant setup dropdown to select a few pre-loaded System message
examples to get started.
System messages give the model instructions about how it should behave and any
context it should reference when generating a response. You can describe the assistant's
personality, tell it what it should and shouldn't answer, and tell it how to format
responses.
Add few-shot examples allows you to provide conversational examples that are used by
the model for in-context learning.
At any time while using the Chat playground you can select View code to see Python,
curl, and json code samples pre-populated based on your current chat session and
settings selections. You can then take this code and write an application to complete the
same task you're currently performing with the playground.
Chat session
Selecting the Send button sends the entered text to the completions API and the results
are returned back to the text box.
Select the Clear chat button to delete the current conversation history.
Settings
Name Description
Deployments Your deployment name that is associated with a specific model. For ChatGPT, you
need to use the gpt-35-turbo model.
Temperature Controls randomness. Lowering the temperature means that the model produces
more repetitive and deterministic responses. Increasing the temperature results in
more unexpected or creative responses. Try adjusting temperature or Top P but
not both.
Max length Set a limit on the number of tokens per model response. The API supports a
(tokens) maximum of 4096 tokens shared between the prompt (including system message,
examples, message history, and user query) and the model's response. One token
is roughly four characters for typical English text.
Top Similar to temperature, this controls randomness but uses a different method.
probabilities Lowering Top P narrows the model’s token selection to likelier tokens. Increasing
Top P lets the model choose from tokens with both high and low likelihood. Try
adjusting temperature or Top P but not both.
Name Description
Multi-turn Select the number of past messages to include in each new API request. This
conversations helps give the model context for new user queries. Setting this number to 10
results in five user queries and five system responses.
Stop Stop sequence make the model end its response at a desired point. The model
sequences response ends before the specified sequence, so it won't contain the stop
sequence text. For ChatGPT, using <|im_end|> ensures that the model response
doesn't generate a follow-up user query. You can include as many as four stop
sequences.
Show panels
By default there are three panels: assistant setup, chat session, and settings. Show
panels allows you to add, remove, and rearrange the panels. If you ever close a panel
and need to get it back, use Show panels to restore the lost panel.
2. You'll be prompted asking if you want to update the system message, select
Continue.
3. In the chat session pane, enter the following question: "I'm interested in buying a
new Xbox", and select Send.
6. Now that you have a basic conversation select View code from under Assistant
setup and you'll have a replay of the code behind the entire conversation so far:
Python
import os
import openai
openai.api_type = "azure"
openai.api_base = "https://docs-test-001.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.ChatCompletion.create(
max_tokens=350,
top_p=0.95,
frequency_penalty=0,
presence_penalty=0,
stop=None)
The assistant's response to the prompt will then be returned below the
<|im_start|>assistant token and will end with <|im_end|> denoting that the assistant
has finished its response. You can also use the Show raw syntax toggle button to
display these tokens within the chat session panel.
The ChatGPT how-to guide provides an in-depth introduction into the new prompt
structure and how to use the gpt-35-turbo model effectively.
The first time you deploy a web app, you should select Create a new web app. Choose a
name for the app, which will
become part of the app URL. For example,
https://<appname>.azurewebsites.net .
Select your subscription, resource group, location, and pricing plan for the published
app. To
update an existing app, select Publish to an existing web app and choose the
name of your previous
app from the dropdown menu.
Clean up resources
Once you're done testing out the Chat playground, if you want to clean up and remove
an OpenAI resource, you can delete the resource or resource group. Deleting the
resource group also deletes any other resources associated with it.
Portal
Azure CLI
Next steps
Learn more about how to work with ChatGPT and the new gpt-35-turbo model
with the ChatGPT how-to guide.
For more examples check out the Azure OpenAI Samples GitHub repository
Quickstart: Get started generating
images using Azure OpenAI Service
Article • 06/12/2023
Use this guide to get started generating images with Azure OpenAI in your browser.
7 Note
The image generation API creates an image from a text prompt. It does not edit
existing images or create variations.
Prerequisites
An Azure subscription - Create one for free .
Access granted to DALL-E in the desired Azure subscription.
Currently, access to
this service is granted only by application. You can apply for access to Azure
OpenAI by completing the form at https://aka.ms/oai/access . Existing Azure
OpenAI customers need to re-enter the form to get access to DALL-E. Open an
issue on this repo to contact us if you have an issue.
An Azure OpenAI resource created in the East US region. For more information
about model deployment, see the resource deployment guide.
From the Azure OpenAI Studio landing page, select DALL·E 2 playground to use the
image generation APIs.
The image generation APIs come with a content moderation filter. If the service
recognizes your prompt as harmful content, it won't return a generated image. For
more information, see the content filter guide.
In the DALL-E 2 playground, you can also view Python and cURL code samples, which
are pre-filled according to your settings. Select View code near the top of the page. You
can use this code to write an application that completes the same task.
Clean up resources
If you want to clean up and remove an OpenAI resource, you can delete the resource or
resource group. Deleting the resource group also deletes any other resources associated
with it.
Portal
Azure CLI
Next steps
Azure OpenAI Overview
For more examples check out the Azure OpenAI Samples GitHub repository .
Quickstart: Chat with Azure OpenAI
models using your own data
Article • 06/28/2023
In this quickstart you can use your own data with Azure OpenAI models. Using Azure
OpenAI's models on your data can provide you with a powerful conversational AI
platform that enables faster and more accurate communication.
Prerequisites
An Azure subscription - Create one for free .
An Azure OpenAI resource with a chat model deployed (for example, GPT-3 or
GPT-4). For more information about model deployment, see the resource
deployment guide.
2. On the Assistant setup tile, select Add your data (preview) > + Add a data
source.
3. In the pane that appears, select Upload files under Select data source. Select
Upload files. Azure OpenAI needs both a storage resource and a search resource
to access and index your data.
Tip
a. For Azure OpenAI to access your storage account, you will need to turn on
Cross-origin resource sharing (CORS) . If CORS isn't already turned on for the
Azure Blob storage resource, select Turn on CORS.
b. Select your Azure Cognitive Search resource, and select the acknowledgment
that connecting it will incur usage on your account. Then select Next.
4. On the Upload files pane, select Browse for a file and select the files you want to
upload. Then select Upload files. Then select Next.
5. Review the details you entered, and select Save and close. You can now chat with
the model and it will use information from your data to construct the response.
Chat playground
Start exploring Azure OpenAI capabilities with a no-code approach through the chat
playground. It's simply a text box where you can submit a prompt to generate a
completion. From this page, you can quickly iterate and experiment with the capabilities.
You can experiment with the configuration settings such as temperature and pre-
response text to improve the performance of your task. You can read more about each
parameter in the REST API.
Selecting the Generate button will send the entered text to the completions API
and stream the results back to the text box.
Select the Undo button to undo the prior generation call.
Select the Regenerate button to complete an undo and generation call together.
The first time you deploy a web app, you should select Create a new web app. Choose a
name for the app, which will
become part of the app URL. For example,
https://<appname>.azurewebsites.net .
Select your subscription, resource group, location, and pricing plan for the published
app. To
update an existing app, select Publish to an existing web app and choose the
name of your previous
app from the dropdown menu.
Important considerations
Publishing creates an Azure App Service in your subscription. It may incur costs
depending on the
pricing plan you select. When you're done with your app, you
can delete it from the Azure portal.
By default, the app will only be accessible to you. To add authentication (for
example, restrict access to the app to members of your Azure tenant):
1. Go to the Azure portal and search for the app name you specified during
publishing. Select the web app, and go to the Authentication tab on the left
navigation menu. Then select Add an identity provider.
2. Select Microsoft as the identity provider. The default settings on this page will
restrict the app to your tenant only, so you don't need to change anything
else here. Then select Add
Now users will be asked to sign in with their Azure Active Directory account to be
able to access your app. You can follow a similar process to add another identity
provider if you prefer. The app doesn't use the user's login information in any
other way other than verifying they are a member of your tenant.
Clean up resources
If you want to clean up and remove an OpenAI or Azure Cognitive Search resource, you
can delete the resource or resource group. Deleting the resource group also deletes any
other resources associated with it.
Next steps
Learn more about using your data in Azure OpenAI Service
Chat app sample code on GitHub .
Abuse Monitoring
Article • 06/19/2023
Azure OpenAI Service detects and mitigates instances of recurring content and/or
behaviors that suggest use of the service in a manner that may violate the Code of
Conduct or other applicable product terms. Details on how data is handled can be
found on the Data, Privacy and Security page.
Human Review and Decision: When prompts and/or completions are flagged
through content classification and abuse pattern capture as described above,
authorized Microsoft employees may assess the flagged content, and either
confirm or correct the classification or determination based on predefined
guidelines and policies. Data can be accessed for human review only by authorized
Microsoft employees via Secure Access Workstations (SAWs) with Just-In-Time (JIT)
request approval granted by team managers. For Azure OpenAI Service resources
deployed in the European Economic Area, the authorized Microsoft employees are
located in the European Economic Area.
Azure OpenAI Service includes a content filtering system that works alongside core
models. This system works by running both the prompt and completion through an
ensemble of classification models aimed at detecting and preventing the output of
harmful content. The content filtering system detects and takes action on specific
categories of potentially harmful content in both input prompts and output
completions. Variations in API configurations and application design may affect
completions and thus filtering behavior. The content filtering system supports the
following languages: English, German, Japanese, Spanish, French, Italian, Portuguese,
and Chinese. It might not be able to detect inappropriate content in languages that it
has not been trained or tested to process.
In addition to the content filtering system, the Azure OpenAI Service performs
monitoring to detect content and/or behaviors that suggest use of the service in a
manner that may violate applicable product terms. For more information about
understanding and mitigating risks associated with your application, see the
Transparency Note for Azure OpenAI. For more information about how data is
processed in connection with content filtering and abuse monitoring, see Data, privacy,
and security for Azure OpenAI Service.
The following sections provide information about the content filtering categories, the
filtering severity levels and their configurability, and API scenarios to be considered in
application design and implementation.
Categories
Category Description
Category Description
Hate The hate category describes language attacks or uses that include pejorative or
discriminatory language with reference to a person or identity group on the basis of
certain differentiating attributes of these groups including but not limited to race,
ethnicity, nationality, gender identity and expression, sexual orientation, religion,
immigration status, ability status, personal appearance, and body size.
Sexual The sexual category describes language related to anatomical organs and genitals,
romantic relationships, acts portrayed in erotic or affectionate terms, physical sexual
acts, including those portrayed as an assault or a forced sexual violent act against
one’s will, prostitution, pornography, and abuse.
Violence The violence category describes language related to physical actions intended to
hurt, injure, damage, or kill someone or something; describes weapons, etc.
Self- The self-harm category describes language related to physical actions intended to
Harm purposely hurt, injure, or damage one’s body, or kill oneself.
Severity levels
Category Description
Safe Content may be related to violence, self-harm, sexual, or hate categories but the
terms are used in general, journalistic, scientific, medical, and similar professional
contexts, which are appropriate for most audiences.
Medium Content that uses offensive, insulting, mocking, intimidating, or demeaning language
towards specific identity groups, includes depictions of seeking and executing
harmful instructions, fantasies, glorification, promotion of harm at medium intensity.
High Content that displays explicit and severe harmful instructions, actions, damage, or
abuse; includes endorsement, glorification, or promotion of severe harmful acts,
extreme or illegal forms of harm, radicalization, or non-consensual power exchange
or abuse.
Configurability (preview)
The default content filtering configuration is set to filter at the medium severity
threshold for all four content harm categories for both prompts and completions. That
means that content that is detected at severity level medium or high is filtered, while
content detected at severity level low is not filtered by the content filters. The
configurability feature is available in preview and allows customers to adjust the
settings, separately for prompts and completions, to filter content for each content
category at different severity levels as described in the table below:
Medium, Yes Yes Default setting. Content detected at severity level low
high is not filtered, content at medium and high is filtered.
High If approved* If approved* Content detected at severity levels low and medium
is not filtered. Only content at severity level high is
filtered. Requires approval*.
*
Only customers who have been approved for modified content filtering have full
content filtering control, including configuring content filters at severity level high only
or turning content filters off. Apply for modified content filters via this form: Azure
OpenAI Limited Access Review: Modified Content Filters and Abuse Monitoring
(microsoft.com)
Content filtering configurations are created within a Resource in Azure AI Studio, and
can be associated with Deployments. Learn more about configurability here.
Scenario details
When the content filtering system detects harmful content, you'll receive either an error
on the API call if the prompt was deemed inappropriate or the finish_reason on the
response will be content_filter to signify that some of the completion was filtered.
When building your application or system, you'll want to account for these scenarios
where the content returned by the Completions API is filtered, which may result in
content that is incomplete. How you act on this information will be application specific.
The behavior can be summarized in the following points:
Prompts that are classified at a filtered category and severity level will return an
HTTP 400 error.
Non-streaming completions calls won't return any content when the content is
filtered. The finish_reason value will be set to content_filter. In rare cases with
longer responses, a partial result can be returned. In these cases, the
finish_reason will be updated.
For streaming completions calls, segments will be returned back to the user as
they're completed. The service will continue streaming until either reaching a stop
token, length, or when content that is classified at a filtered category and severity
level is detected.
Scenario: You send a non-streaming completions call
asking for multiple outputs; no content is classified at a
filtered category and severity level
The table below outlines the various ways content filtering can appear:
200 In the cases when all generation passes the filters as configured, no content
moderation details are added to the response. The finish_reason for each generation
will be either stop or length.
JSON
"prompt":"Text example",
"n": 3,
"stream": false
JSON
"id": "example-id",
"object": "text_completion",
"created": 1653666286,
"model": "davinci",
"choices": [
"index": 0,
"finish_reason": "stop",
"logprobs": null
200 The generations that were filtered will have a finish_reason value of
content_filter .
JSON
"prompt":"Text example",
"n": 3,
"stream": false
JSON
"id": "example",
"object": "text_completion",
"created": 1653666831,
"model": "ada",
"choices": [
"index": 0,
"finish_reason": "length",
"logprobs": null
},
"index": 1,
"finish_reason": "content_filter",
"logprobs": null
400 The API call will fail when the prompt triggers a content filter as configured.
Modify the prompt and try again.
JSON
JSON
"error": {
"type": null,
"param": "prompt",
"code": "content_filter",
"status": 400
200 In this case, the call will stream back with the full generation and finish_reason will
be either 'length' or 'stop' for each generated response.
JSON
"prompt":"Text example",
"n": 3,
"stream": true
JSON
"id": "cmpl-example",
"object": "text_completion",
"created": 1653670914,
"model": "ada",
"choices": [
"index": 2,
"finish_reason": "stop",
"logprobs": null
200 For a given generation index, the last chunk of the generation will include a non-null
finish_reason value. The value will be content_filter when the generation was
filtered.
JSON
"prompt":"Text example",
"n": 3,
"stream": true
"id": "cmpl-example",
"object": "text_completion",
"created": 1653670515,
"model": "ada",
"choices": [
"index": 2,
"finish_reason": "content_filter",
"logprobs": null
200 If the content filtering system is down or otherwise unable to complete the operation
in time, your request will still complete without content filtering. You can determine
that the filtering wasn't applied by looking for an error message in the
content_filter_result object.
JSON
"prompt":"Text example",
"n": 1,
"stream": false
JSON
"id": "cmpl-example",
"object": "text_completion",
"created": 1652294703,
"model": "ada",
"choices": [
"index": 0,
"finish_reason": "length",
"logprobs": null,
"content_filter_result": {
"error": {
"code": "content_filter_error",
Annotations (preview)
When annotations are enabled as shown in the code snippet below, the following
information is returned via the API: content filtering category (hate, sexual, violence,
self-harm); within each content filtering category, the severity level (safe, low, medium or
high); filtering status (true or false).
Annotations are currently in preview for Completions and Chat Completions (GPT
models); the following code snippet shows how to use annotations in preview:
Python
# os.getenv() for the endpoint and key assumes that you are using
environment variables.
import os
import openai
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
response = openai.Completion.create(
print(response)
Output
JSON
"choices": [
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "low"
},
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": {"\")(\"Example model response will be returned\").}"
],
"created": 1685727831,
"id": "cmpl-7N36VZAVBMJtxycrmiHZ12aK76a6v",
"model": "text-davinci-003",
"object": "text_completion",
"prompt_annotations": [
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
},
"prompt_index": 0
}
],
"usage": {
"completion_tokens": 16,
"prompt_tokens": 5,
"total_tokens": 21
The following code snippet shows how to retrieve annotations when content was
filtered:
Python
# os.getenv() for the endpoint and key assumes that you are using
environment variables.
import os
import openai
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
try:
response = openai.Completion.create(
prompt="<PROMPT>",
engine="<MODEL_DEPLOYMENT_NAME>",
print(response)
except openai.error.InvalidRequestError as e:
content_filter_result = e.error.innererror.content_filter_result
print(content_filter_result)
For details on the inference REST API endpoints for Azure OpenAI and how to create
Chat and Completions please follow Azure OpenAI Service REST API reference guidance.
Annotations are returned for all scenarios when using 2023-06-01-preview .
"error": {
"type": null,
"param": "prompt",
"code": "content_filter",
"status": 400,
"innererror": {
"code": "ResponsibleAIPolicyViolation",
"content_filter_result": {
"hate": {
"filtered": true,
"severity": "high"
},
"self-harm": {
"filtered": true,
"severity": "high"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered":true,
"severity": "medium"
Best practices
As part of your application design, consider the following best practices to deliver a
positive experience with your application while minimizing potential harms:
Decide how you want to handle scenarios where your users send prompts
containing content that is classified at a filtered category and severity level or
otherwise misuse your application.
Check the finish_reason to see if a completion is filtered.
Check that there's no error object in the content_filter_result (indicating that
content filters didn't run).
Next steps
Learn more about the underlying models that power Azure OpenAI.
Apply for modified content filters via this form .
Azure OpenAI content filtering is powered by Azure AI Content Safety .
Learn more about understanding and mitigating risks associated with your
application: Overview of Responsible AI practices for Azure OpenAI models.
Learn more about how data is processed in connection with content filtering and
abuse monitoring: Data, privacy, and security for Azure OpenAI Service.
Understanding embeddings in Azure
OpenAI Service
Article • 05/10/2023
Embedding models
Different Azure OpenAI embedding models are specifically created to be good at a
particular task. Similarity embeddings are good at capturing semantic similarity
between two or more pieces of text. Text search embeddings help measure whether
long documents are relevant to a short query. Code search embeddings are useful for
embedding code snippets and embedding natural language search queries.
Cosine similarity
Azure OpenAI embeddings rely on cosine similarity to compute similarity between
documents and a query.
From a mathematic perspective, cosine similarity measures the cosine of the angle
between two vectors projected in a multi-dimensional space. This is beneficial because if
two documents are far apart by Euclidean distance because of size, they could still have
a smaller angle between them and therefore higher cosine similarity. For more
information about cosine similarity equations, see this article on Wikipedia .
Next steps
Learn more about using Azure OpenAI and embeddings to perform document search
with our embeddings tutorial.
Introduction to red teaming large
language models (LLMs)
Article • 05/19/2023
The term red teaming has historically described systematic adversarial attacks for testing
security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional
cybersecurity and evolved in common usage to describe many kinds of probing, testing,
and attacking of AI systems. With LLMs, both benign and adversarial usage can produce
potentially harmful outputs, which can take many forms, including harmful content such
as hate speech, incitement or glorification of violence, or sexual content.
Microsoft has conducted red teaming exercises and implemented safety systems
(including content filters and other mitigation strategies) for its Azure OpenAI Service
models (see this Responsible AI Overview). However, the context of your LLM
application will be unique and you also should conduct red teaming to:
Test the LLM base model and determine whether there are gaps in the existing
safety systems, given the context of your application system.
Identify and mitigate shortcomings in the existing default filters or mitigation
strategies.
Provide feedback on failures so we can make improvements.
Here is how you can get started in your process of red teaming LLMs. Advance planning
is critical to a productive red teaming exercise.
Getting started
LLM red teamers should be a mix of people with diverse social and professional
backgrounds, demographic groups, and interdisciplinary expertise that fits the
deployment context of your AI system. For example, if you’re designing a chatbot to
help health care providers, medical experts can help identify risks in that domain.
Recruit red teamers with both benign and adversarial mindsets.
You will need to take care of your red teamers, not only by limiting the amount of time
they spend on an assignment, but also by letting them know they can opt out at any
time. Also, avoid burnout by switching red teamers’ assignments to different focus
areas.
Where to test
Because a system is developed using a LLM base model, you may need to test at several
different layers:
The LLM base model with its safety system in place to identify any gaps that may
need to be addressed in the context of your application system. (Testing is usually
through an API endpoint.)
Your application system. (Testing is usually through a UI.)
Both the LLM base model and your application system before and after mitigations
are in place.
How to test
Consider conducting iterative red teaming in at least two phases:
1. Open-ended red teaming, where red teamers are encouraged to discover a variety
of harms. This can help you develop a taxonomy of harms to guide further testing.
Note that developing a taxonomy of undesired LLM outputs for your application
system is crucial to being able to measure the success of specific mitigation efforts.
2. Guided red teaming, where red teamers are assigned to focus on specific harms
listed in the taxonomy while staying alert for any new harms that may emerge. Red
teamers can also be instructed to focus testing on specific features of a system for
surfacing potential harms.
Be sure to:
Provide your red teamers with clear instructions for what harms or system features
they will be testing.
Give your red teamers a place for recording their findings. For example, this could
be a simple spreadsheet specifying the types of data that red teamers should
provide, including basics such as:
The type of harm that was surfaced.
The input prompt that triggered the output.
An excerpt from the problematic output.
Comments about why the red teamer considered the output problematic.
Maximize the effort of responsible AI red teamers who have expertise for testing
specific types of harms or undesired outputs. For example, have security subject
matter experts focus on jailbreaks, metaprompt extraction, and content related to
aiding cyberattacks.
Next steps
Learn about other mitigation strategies like prompt engineering
Introduction to prompt engineering
Article • 03/21/2023
GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based
models, the user interacts with the model by entering a text prompt, to which the model
responds with a text completion. This completion is the model’s continuation of the
input text.
While these models are extremely powerful, their behavior is also very sensitive to the
prompt. This makes prompt construction an important skill to develop.
Prompt construction can be difficult. In practice, the prompt acts to configure the model
weights to complete the desired task, but it's more of an art than a science, often
requiring experience and intuition to craft a successful prompt. The goal of this article is
to help get you started with this learning process. It attempts to capture general
concepts and patterns that apply to all GPT models. However it's important to
understand that each model behaves differently, so the learnings may not apply equally
to all models.
Basics
This section covers the basic concepts and elements of GPT prompts.
Text prompts are how users interact with GPT models. As with all generative language
models, GPT models attempt to produce the next series of words that are most likely to
follow from the previous text. It's as if we're saying What is the first thing that comes to
your mind when I say <prompt> ?”. The examples below demonstrate this behavior. Given
the first words of famous content, the model is able to accurately continue the text.
Prompt Completion
Four score fathers brought forth on this continent, a new nation, conceived in Liberty, and
and seven dedicated to the proposition that all men are created equal. […]
years ago
our
"Call me "Some years ago—never mind how long precisely—having little o no money in my
Ishmael." purse, and nothing particular to interest me on shore, I thought I would sail about
a little and see the watery part of the world." […]
As you develop more complex prompts, it's helpful to keep this fundamental behavior in
mind. Regardless of the prompt that is provided, the model is simply responding with
what it determines is most likely (given its training data and training targets). If you ask
a question in your prompt, the model isn't following a separate “Q&A” code path, but
rather it appears to answer the question because an answer is the most likely sort of
response for the given question as input.
Prompt components
When using the Completion API while there's no differentiation between different parts
of the prompt, it can still be useful for learning and discussion to identify underlying
prompt components. With the Chat Completion API there are distinct sections of the
prompt that are sent to the API in the form of an array of dictionaries with associated
roles: system, user, and assistant. This guidance focuses more generally on how to think
about prompt construction rather than providing prescriptive guidance that is specific to
one API over another.
It's also important to understand that while there could be other valid ways to dissect
prompts, the goal of this breakdown is to provide a relatively simple way to think about
prompt construction. With the Completion API, all the components are optional, but at
least one must be present and most prompts include more than one component. There
can be some grey area between components as well. The order presented below
roughly corresponds to how commonly each component is used, from most to least.
Instructions
Instructions are likely the most commonly used prompt component. Instructions are
simply that: instructions to the model on what to do. While simple in concept, they can
be complex in practice. The following table illustrates this by showing both simple and
more complex instructions, in two different forms.
Simple Write an introduction for Welcome to our weekly newsletter. This week we have
instructions a weekly newsletter. a special focus on productivity. We'll be sharing tips
and tricks on how to be productive in all areas of your
life. We hope you enjoy and please share with your
friends!
Instruction Prompt Completion
Type
Complex Write an introduction for Hello Contoso team! I wanted to take a moment to
Instructions a weekly newsletter, thank you for all your hard work over the past few
including the following:
months. It's been a tough couple of months, but you've
- My company is named all pulled together and done an amazing job. I'm really
contoso
proud of you all. Looking ahead to the next quarter, I'm
- An exuberant greeting
feeling really positive. I think we're going to achieve
- Thanking the team for some great things. Thanks again, The Senior Leadership
their hard work over a Team.
couple tough months
Primary Content
Primary content refers to some sort of text that is being processed or transformed by
the model. Primary content is typically used in conjunction with instructions. A simple
example would be language translation. In the example below, the English text would be
considered the primary content while “Translate to French:” is an instruction.
Prompt Completion
Can you please tell me how to get to the Pouvez-vous s'il vous plaît me dire comment aller
museum?
au musée?
Translate to French:
Primary content can also be much longer. In the following example, the primary content
is the introduction section from the Wikipedia entry on time complexity , which is
almost 300 words long. This content has been abbreviated in our table for ease of
display.
Prompt Completion
Prompt Completion
In computer science, the time Time complexity is a measure of how long an algorithm takes
complexity is the to run. It is typically expressed as a function of the size of the
computational complexity that input. The most common measure is worst-case time
describes the amount of complexity, which is the maximum amount of time required for
computer time it takes to run inputs of a given size.
an algorithm. Time complexity
is commonly estimated by [...]
In computer science, the time The time complexity of an algorithm is the amount of time it
complexity is the takes to run an algorithm. Time complexity is commonly
computational complexity that estimated by counting the number of elementary operations
describes the amount of performed by the algorithm, supposing that each elementary
computer time it takes to run operation takes a fixed amount of time to perform. The time
an algorithm. Time complexity complexity is generally expressed as a function of the size of
is commonly estimated by [...]
the input.
TL;DR (Too long didn't read)
GPT models can also handle primary content that is structured. In the example below, a
TSV about beers in Belgium (abbreviated for display in this article) is passed to the
model as part of the prompt. It is capable of properly interpreting the content and
answering questions about the data.
Prompt Completion
[…]
Referencing the table above, how many beers are less than
6% ABV?
Examples
Successful prompts often rely on the practice of “one-shot” or “few-shot” learning. This
refers to the inclusion of one or more examples of the desired behavior of the model,
typically by including input and output pairs. This is not learning in the sense that the
model is permanently changed, but rather that the examples better condition the model
to respond as desired for only the current inference. The use of prompts with no
examples is sometimes referred to as “zero-shot” learning. Note that with the Chat
Completion API few-shot learning examples are typically added to the messages array in
the form of example user/assistant interactions after the initial system message.
Zero- Headline: Coach The coach is confident that the injury won't derail the Warriors'
shot confident injury season. The team is still focused on their goals and that they
won't derail will continue to work hard to achieve them.
Warriors
Topic:
Topic: Baseball
Headline: Qatar
World Cup to have
zones for sobering
up
Topic: Soccer
Headline: Yates:
Fantasy football
intel for Week 6
Topic: Football
Headline: Coach
confident injury
won't derail
Warriors
Topic:
The example above illustrates the utility of few-shot learning. Without the examples, the
model seems to be guessing at the desired behavior, while the examples cleanly show
the model how to operate. This also demonstrates the power of the model, it can infer
the category of label that is wanted, even without a ‘basketball’ label in the examples.
Cue
Cues act as the “jumpstart” for the output of the model, helping to direct the model to
the desired output. It is often a prefix that the model can build onto. In the example
below, we show the cue directing the story to include key elements. Cues are often used
along with instructions, but not always. In the examples below, the model is being used
to summarize a long announcement email. A cue is used in one case to help focus the
output of the model and to suggest a specific output format (bullet points) in the other.
In the context of the Chat Completion API instructions take the form of the system
message, and few-shot examples are indicated by providing a series of example
user/assistant interactions to help prime the model.
Summarize the
above email
message:
Summarize the
above email
message:
The key
takeaway of the
message is
Number Prompt Completion
of Cues
Performance & • Performance reviews are a time to reflect on past performance and
Development learning as well as to plan for future priorities and growth
approach at opportunities.
Contoso. […]
• Employees and managers should write comments in advance of
Summarize the the performance review meeting.
above email • There should be at least two performance reviews per year.
message:
• The time investment for each review should remain about the
Key Points:
same.
•
Supporting content
Supporting content is information that the model can utilize to influence the output in
some way. It differs from primary content in that it is not the main target of task, but it is
typically used along with primary content. Common examples include contextual
information such as the current date, the name of the user, user preferences, and so on.
The example below uses supporting content to help organize a set of planned
workshops for the user. Without the supporting (important topics) the model simply lists
off the workshops (truncated for display), when informed of my important topics, the
model is able to accurately group the meetings.
Planning Workshop Kickoff Please join us to kick off our workshop! Planning
Hear from our leaders about cross group priorities and learnings for Workshop
the year ahead. The invitation to this event has already been sent to Kickoff
everyone in […]
-Planning
Summarize the workshops above, grouped by topic: Workshop
Wrap-up
- Core
MeetUp
- Customer
Driven
Perspectives
-Impact and
Key Focus
Areas
-Best
practices
-
Accelerating
app
adoption
[…]
2. Search:
-Intro to
vector
search with
embedding
3. GPT
Models:
- Intro to
GPT-4
- GPT-35-
Turbo in-
depth.
Best practices
Be Specific. Leave as little to interpretation as possible. Restrict the operational
space.
Be Descriptive. Use analogies.
Double Down. Sometimes you may need to repeat yourself to the model. Give
instructions before and after your primary content, use an instruction and a cue,
etc.
Order Matters. The order in which you present information to the model may
impact the output. Whether you put instructions before your content (“summarize
the following…”) or after (“summarize the above…”) can make a difference in
output. Even the order of few-shot examples can matter. This is referred to as
recency bias.
Give the model an “out”. It can sometimes be helpful to give the model an
alternative path if it is unable to complete the assigned task. For example, when
asking a question over a piece of text you might include something like "respond
with ‘not found’ if the answer is not present." This can help the model avoid
generating false responses.
Space efficiency
While the input size increases with each new generation of GPT models, there will
continue to be scenarios that provide more data than the model can handle. GPT
models break words into “tokens”. While common multi-syllable words are often a
single token, less common words are broken in syllables. Tokens can sometimes be
counter-intuitive, as shown by the example below which demonstrates token boundaries
for different date formats. In this case, spelling out the entire month is more space
efficient than a fully numeric date. The current range of token support goes from 2000
tokens with earlier GPT-3 models to up to 32,768 tokens with the 32k version of the
latest GPT-4 model.
Tables – As shown in the examples in the previous section, GPT models can
understand tabular formatted data quite easily. This can be a space efficient way to
include data, rather than preceding every field with name (such as with JSON).
White Space – Consecutive whitespaces are treated as separate tokens which can
be an easy way to waste space. Spaces preceding a word, on the other hand, are
typically treated as part of the same token as the word. Carefully watch your usage
of whitespace and don’t use punctuation when a space alone will do.
Next steps
Learn more about Azure OpenAI
Prompt engineering techniques
Article • 04/23/2023
This guide will walk you through some advanced techniques in prompt design and
prompt engineering. If you're new to prompt engineering, we recommend starting with
our introduction to prompt engineering guide.
While the principles of prompt engineering can be generalized across many different
model types, certain models expect a specialized prompt structure. For Azure OpenAI
GPT models, there are currently two distinct APIs where prompt engineering comes into
play:
Each API requires input data to be formatted differently, which in turn impacts overall
prompt design. The Chat Completion API supports the GPT-35-Turbo and GPT-4
models. These models are designed to take input formatted in a specific chat-like
transcript stored inside an array of dictionaries.
The Completion API supports the older GPT-3 models and has much more flexible input
requirements in that it takes a string of text with no specific format rules. Technically the
GPT-35-Turbo models can be used with either APIs, but we strongly recommend using
the Chat Completion API for these models. To learn more, please consult our in-depth
guide on using these APIs.
The techniques in this guide will teach you strategies for increasing the accuracy and
grounding of responses you generate with a Large Language Model (LLM). It is,
however, important to remember that even when using prompt engineering effectively
you still need to validate the responses the models generate. Just because a carefully
crafted prompt worked well for a particular scenario doesn't necessarily mean it will
generalize more broadly to certain use cases. Understanding the limitations of LLMs, is
just as important as understanding how to leverage their strengths.
This guide does not go in-depth into the mechanics behind the message structure for
Chat Completions. If you aren't familiar with interacting with the ChatGPT and GPT-4
models programmatically we recommend reading our how-to guide on the Chat
Completion API first.
7 Note
All of the examples in this section of the guide were tested against a base GPT-4
model in English. If you are reading a localized version of this article in another
language, these responses represent a localized translation of the English results. To
learn more about potential limitations depending on what language you are using
to prompt a model, please consult our Responsible AI transparency note.
System message
The system message is included at the beginning of the prompt and is used to prime
the model with context, instructions, or other information relevant to your use case. You
can use the system message to describe the assistant’s personality, define what the
model should and shouldn’t answer, and define the format of model responses.
The example below, shows a sample system message and the resulting model response:
You are an AI assistant that helps people find information What can you Dear John, I'm
and responds in rhyme. If the user asks you a question you tell about me, sorry to say,
don't know the answer to, say so. John Doe? But I don't have
info on you today.
So please tell me
more about what
you seek,
JSON
"name": "",
"company": "",
"phone_number": ""
An important detail to understand is that even if you instruct a model in the system
message to answer I don't know when unsure of an answer this does not guarantee that
the request will be honored. A well designed system message can increase the
likelihood of a certain outcome, but it is still possible that an incorrect response could
be generated that contradicts the intent of the instruction in the system message.
Few-shot learning
A common way to adapt language models to new tasks is to use few-shot learning. In
few-shot learning, a set of training examples is provided as part of the prompt to give
additional context to the model.
When using the Chat Completions API, a series of messages between the User and
Assistant (written in the new prompt format), can serve as examples for few-shot
learning. These examples can be used to prime the model to respond in a certain way,
emulate particular behaviors, and seed answers to common questions.
designed to help users answer Assistant - "In 2023, you will need to file your taxes by April
their tax related questions.
18th. The date falls after the usual April 15th deadline
Instructions
because April 15th falls on a Saturday in 2023. For more
- Only answer questions related to details, see https://www.irs.gov/filing/individuals/when-to-
taxes.
file "
sure" and recommend users go to Assistant - "You can check the status of your tax refund by
the IRS website for more visiting https://www.irs.gov/refunds "
information.
The table above describes the basic prompt structure, but for a detailed explanation of
the exact prompt format you will need to refer to the Chat Completion guide.
Non chat scenarios
While the Chat Completion API is optimized to work with multi-turn conversations, it
also can be used for non chat scenarios. For example, for a sentiment analysis scenario,
you might use the following prompt:
You are an hi thank you for calling contoso pharmacy who am i Sentiment rating: 8
assistant speaking with today i understand your frustration mary um The speaker in this
designed to can you tell me what exactly you're trying to accomplish text seems to be
analyze OK uh certainly happy to check that for you one moment polite, helpful, and
sentiment please i see here that you were on a generic form of lipitor patient while
from speech is that right OK uh so i see that your doctor stopped addressing the
data. Users prescribing these drugs in twenty twenty one actually OK concerns of the
will paste in uh yeah i'd be happy to check that for you uh because customer. They take
a string of sometimes there's a gap in the system and it just doesn't the time to check
text and you get reported so let me take a look here just a moment so the information,
will respond i'm seeing here that your doctor had these drugs provide clear
with an prescribed to you from twenty twelve through twenty explanations, and
assessment twenty one OK um yeah according to the latest records uh offer assistance.
of the provided by doctor's office you're now on a different drug However, the rating
response of metformin would you like us to go ahead and fill that is not a 10 because
the speaker. prescription for you for pick up a bit of form in five there is still a hint of
Rate on a hundred milligrams uh you're very welcome mary please let uncertainty in some
scale of 1-10 us know if there's anything else we can do for you today. parts of the
(10 being conversation, which
highest). may not convey
Explain why complete
this rating confidence to the
was given. customer.
7 Note
You are an Your task is to verify if the No, the statement is not directly implied or
AI assistant statement "Several sources stated by the snippets. The snippets mention a
that helps mention a chance of another chance of a "megaquake" and a "magnitude 9
people find large eruption" is supported by a Cascadia earthquake" hitting Seattle in the
information. specific quote from the following next 50 years, but do not mention a chance of
set of snippets.
another large eruption.
---
SNIPPETS
---
You are an AI assistant John Smith is married to Lucy Smith. They have five "John Smith
that helps people find kids, and he works as a software engineer at married Lucy
information. Answer in Microsoft. What search queries should I do to fact- Smith five kids
as few words as check this? ## One possible search query is: software
possible. engineer
Microsoft"
In the above prompt, the text One possible search query is: primes the model to produce
a single output. Without this cue the model produces several search queries as output.
In the example below, separators ( --- in this case) have been added between different
sources of information or steps. This allows the use of --- as a stopping condition for
generation. In addition, section headings or special variables are presented in uppercase
to differentiate them.
You are an AI You will read a paragraph, and then issue queries 1. "John Smith Microsoft"
assistant that to a search engine in order to fact-check it. Also - To check if John Smith
helps people explain the queries.
is indeed employed at
find information. ---
Microsoft.
PARAGRAPH
2. "John Smith Lucy
John Smith is married to Lucy Smith. They have Smith" - To check if John
five kids, and he works as a software engineer at Smith is married to Lucy
Microsoft. What search queries should I do to Smith.
fact-check this?
3. "John Smith children" -
---
To check if John Smith
QUERIES has five children.
If you're not sure what syntax to use, consider using Markdown or XML. The models
have been trained on a lot of web content in XML and Markdown which may provide
better results.
You are an AI You will read a paragraph, and then issue queries to a 1. John Smith is
assistant that helps search engine in order to fact-check it.
married to Lucy
people find ---
Smith
information. PARAGRAPH
2. They have five
John Smith is married to Lucy Smith. They have five kids, kids
SEARCH("John
FACTUAL CLAIMS Smith married
Lucy Smith")
SEARCH("John
Smith family
size")
SEARCH("John
Smith Microsoft
software
engineer")
Notice the use of clear syntax to differentiate the sections and prime the output. In this
simple example, breaking the task down from one to two steps is not very dramatic, but
when trying to do this for a larger piece of text with many factual claims, breaking the
task down can make a significant difference.
Use of affordances
Sometimes we can get the model to use an affordance instead of relying on its own
parameters for information and answers. Search, for example, can be an affordance to
help mitigate against fabricated answers, and to get up-to-date information.
One simple way to use an affordance is to stop generation once the affordance calls are
generated by the model, then paste the outcomes back into the prompt. Below is an
example of what the follow-up call could be after doing the SEARCH calls above. Notice
how we pasted the search results into the prompt and replaced the previous SEARCH
calls.
You are an You will read a paragraph, - John Smith is married to Lucy Smith: Confirmed.
AI assistant extract factual claims, and Snippet [1] and [2] both mention John Smith's wife
System User Assistant
message
information. ---
- John and Lucy have five kids: Confirmed. Snippet
PARAGRAPH
[3] mentions that John was accompanied by his
John Smith is married to Lucy two daughters and three sons.
---
SEARCH QUERIES
---
SNIPPETS:
---
You are an Who was the most decorated Step 1: Determine the year of the Sydney
AI assistant (maximum medals) individual athlete Olympic games. The Sydney Olympic
that helps in the Olympic games that were held games were held in the year 2000.
If you instruct the model to cite the source material when it makes statements, those
statements are much more likely to be grounded. Asking for citations makes it so that
the model must make two errors every time it generates a response: the first error is the
fabricated response, and the second is the bad citation. Note that the closer the citation
is to the text it supports, the shorter the distance ahead the model needs to anticipate
the citation, which suggests that inline citations are better mitigations for false content
generation than citations at the end of the content.
Similarly, if you ask the model to extract factual statements from a paragraph, it may
extract compound statements such as ‘X is doing Y AND Z’ (which may be harder to
verify). This can be avoided by specifying an output structure such as (entity1,
relationship, entity2).
The examples below show the use of citations and directing the model response to fit a
defined structure.
You are an List down the different behaviors exhibited by the Customer 1. Polite Greeting
AI assistant Support agent in the following conversation.
BEHAVIOR("Agent
that helps ---
starts the
people find hi thank you for calling contoso pharmacy who am i speaking conversation with
information with today i understand your frustration mary um can you tell a polite greeting
me what exactly you're trying to accomplish OK uh certainly and asks for the
happy to check that for you one moment please i see here that caller's name")
---
4. Proactive
How was the response of the Customer Support agent?
BEHAVIOR("Agent
Classify and list down their various behaviors and cite the offers to check
reason behind the classification, using the format the issue for the
BEHAVIOR("reason") customer")
5. Informative
BEHAVIOR("Agent
provides the
relevant
information
about the
System User Assistant
message
customer's
prescription
history")
6. Thorough
BEHAVIOR("Agent
checks for any
possible gaps in
the system and
provides a
complete
response")
7. Confirmatory
BEHAVIOR("Agent
confirms the
current
medication with
the customer")
8. Solution-
oriented
BEHAVIOR("Agent
offers to fill the
new prescription
for the
customer")
9. Courteous
BEHAVIOR("Agent
thanks the
customer and
offers further
assistance if
needed")
"TOPIC_NAME": "",
"Demo from 1993 of 32-year-old
"HEADLINES": [],
Yann LeCun showing off the
"REASONING": ""
World's first Convolutional
}
Network for Text Recognition",
AR + ML"
"REASONING": "These headlines
2. "Demo from 1993 of 32-year-old Yann are related to artificial
LeCun showing off the World's first intelligence, machine learning,
Convolutional Network for Text Recognition"
and their applications in various
3. "First Order Motion Model applied to fields."
animate paintings"
},
],
You are an AI assistant that helps Who are some Here are some of the early customers
people find information. You will of the early of the Azure OpenAI service GPT-4
provide answers exclusively from customers of models:
below text and respond in 100 words the Azure - Epic Healthcare: They plan to use
or less:
OpenAI service GPT-4 to help physicians and nurses
"At Microsoft, we are constantly GPT-4 models? spend less time on data entry and
discovering new ways to unleash investigation.
1
Text from Azure OpenAI GPT-4 launch blog .
Next steps
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
System message framework and
template recommendations for Large
Language Models (LLMs)
Article • 06/01/2023
This article provides a recommended framework and example templates to help write an
effective system message, sometimes referred to as a metaprompt or system prompt
that can be used to guide an AI system’s behavior and improve system performance. If
you're new to prompt engineering, we recommend starting with our introduction to
prompt engineering and prompt engineering techniques guidance.
This guide provides system message recommendations and resources that, along with
other prompt engineering techniques, can help increase the accuracy and grounding of
responses you generate with a Large Language Model (LLM). However, it is important to
remember that even when using these templates and guidance, you still need to
validate the responses the models generate. Just because a carefully crafted system
message worked well for a particular scenario doesn't necessarily mean it will work more
broadly across other scenarios. Understanding the limitations of LLMs and the
mechanisms for evaluating and mitigating those limitations is just as important as
understanding how to leverage their strengths.
The LLM system message framework described here covers four concepts:
Define the model’s profile, capabilities, and limitations for your scenario
Define the model’s output format
Provide example(s) to demonstrate the intended behavior of the model
Provide additional behavioral guardrails
Define how the model should complete the tasks, including any additional tools
(like APIs, code, plug-ins) the model can use. If it doesn’t use additional tools, it
can rely on its own parametric knowledge.
Define the scope and limitations of the model’s performance. Provide clear
instructions on how the model should respond when faced with any limitations.
For example, define how the model should respond if prompted on subjects or for
uses that are off topic or otherwise outside of what you want the system to do.
Define the posture and tone the model should exhibit in its responses.
Define the language and syntax of the output format. If you want the output to
be machine parse-able, you may want the output to be in formats like JSON, XSON
or XML.
Describe difficult use cases where the prompt is ambiguous or complicated, to give
the model additional visibility into how to approach such cases.
Show the potential “inner monologue” and chain-of-thought reasoning to better
inform the model on the steps it should take to achieve the desired outcomes.
Next steps
Learn more about Azure OpenAI
Learn more about deploying Azure OpenAI responsibly
For more examples, check out the Azure OpenAI Samples GitHub repository
Azure OpenAI on your data (preview)
Article • 07/07/2023
Azure OpenAI on your data enables you to run supported chat models such as GPT-35-
Turbo and GPT-4 on your data without needing to train or fine-tune models. Running
models on your data enables you to chat on top of, and analyze your data with greater
accuracy and speed. By doing so, you can unlock valuable insights that can help you
make better business decisions, identify trends and patterns, and optimize your
operations. One of the key benefits of Azure OpenAI on your data is its ability to tailor
the content of conversational AI.
To get started, connect your data source using Azure OpenAI Studio and start asking
questions and chatting on your data.
Because the model has access to, and can reference specific sources to support its
responses, answers are not only based on its pretrained knowledge but also on the
latest information available in the designated data source. This grounding data also
helps the model avoid generating responses based on outdated or incorrect
information.
7 Note
To get started, you need to already have been approved for Azure OpenAI access
and have an Azure OpenAI Service resource with either the gpt-35-turbo or the
gpt-4 models deployed.
One of the key features of Azure OpenAI on your data is its ability to retrieve and utilize
data in a way that enhances the model's output. Azure OpenAI on your data, together
with Azure Cognitive Search, determines what data to retrieve from the designated data
source based on the user input and provided conversation history. This data is then
augmented and resubmitted as a prompt to the OpenAI model, with retrieved
information being appended to the original prompt. Although retrieved data is being
appended to the prompt, the resulting input is still processed by the model like any
other prompt. Once the data has been retrieved and the prompt has been submitted to
the model, the model uses this information to provide a completion. See the Data,
privacy, and security for Azure OpenAI Service article for more information.
.txt
.md
.html
There are some caveats about document structure and how it might affect the quality of
responses from the model:
The model provides the best citation titles from markdown ( .md ) files.
If a document is a PDF file, the text contents are extracted as a preprocessing step
(unless you're connecting your own Azure Cognitive Search index). If your
document contains images, graphs, or other visual content, the model's response
quality depends on the quality of the text that can be extracted from them.
If you're converting data from an unsupported format into a supported format,
make sure the conversion:
Doesn't lead to significant data loss.
Doesn't add unexpected noise to your data.
This will impact the quality of Azure Cognitive Search and the model response.
Recommended settings
Use the following sections to help you configure Azure OpenAI on your data for optimal
results.
System message
Give the model instructions about how it should behave and any context it should
reference when generating a response. You can describe the assistant’s personality, what
it should and shouldn’t answer, and how to format responses. There’s no token limit for
the system message, but will be included with every API call and counted against the
overall token limit. The system message will be truncated if it's greater than 200 tokens.
For example, if you're creating a chatbot where the data consists of transcriptions of
quarterly financial earnings calls, you might use the following system message:
"You are a financial chatbot useful for answering questions from financial reports. You are
given excerpts from the earnings call. Please answer the questions by parsing through all
dialogue."
This system message can help improve the quality of the response by specifying the
domain (in this case finance) and mentioning that the data consists of call transcriptions.
It helps set the necessary context for the model to respond appropriately.
7 Note
The system message is only guidance. The model might not adhere to every
instruction specified because it has been primed with certain behaviors such as
objectivity, and avoiding controversial statements. Unexpected behavior may occur
if the system message contradicts with these behaviors.
Maximum response
Set a limit on the number of tokens per model response. The upper limit for Azure
OpenAI on Your Data is 1500. This is equivalent to setting the max_tokens parameter in
the API.
Semantic search
) Important
If semantic search is enabled for your Azure Cognitive Search service, you are more
likely to produce better retrieval of your data, which can improve response and citation
quality.
In this example, the fields mapped to Content data and Title provide information to the
model to answer questions. Title is also used to title citation text. The field mapped to
File name generates the citation names in the response.
Mapping these fields correctly helps ensure the model has better response and citation
quality.
Conversation history
Before starting a new conversation (or asking a question that is not related to the
previous ones), clear the chat history.
Getting different responses for the same question between the first conversational
turn and subsequent turns can be expected because the conversation history
changes the current state of the model. If you receive incorrect answers, report it
as a quality bug.
Model response
If you are not satisfied with the model response for a specific question, try either
making the question more specific or more generic to see how the model
responds, and reframe your question accordingly.
Question length
Avoid asking long questions and break them down into multiple questions if possible.
The GPT models have limits on the number of tokens they can accept. Token limits are
counted toward: the user question, the system message, the retrieved search documents
(chunks), internal prompts, the conversation history (if any), and the response. If the
question exceeds the token limit, it will be truncated.
Multi-lingual support
Azure OpenAI on your data supports queries that are in the same language as the
documents. For example, if your data is in Japanese, then queries need to be in
Japanese too.
Currently Azure OpenAI on your data supports semantic search for English data
only. Don't enable semantic search if your data is in other languages.
We recommend using a system message to inform the model that your data is in
another language. For example:
“You are an AI assistant that helps people find information. You retrieve Japanese
documents, and you should read them carefully in Japanese and answer in
Japanese.”
Parameter Recommendation
fieldsMapping Explicitly set the title and content fields of your index. This impacts the search
retrieval quality of Azure Cognitive Search, which impacts the overall response
and citation quality.
roleInformation Corresponds to the “System Message” in the Azure OpenAI Studio. See the
System message section above for recommendations.
Streaming data
You can send a streaming request using the stream parameter, allowing data to be sent
and received incrementally, without waiting for the entire API response. This can
improve performance and user experience, especially for large or dynamic data.
JSON
"stream": true,
"dataSources": [
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": "'$SearchEndpoint'",
"key": "'$SearchKey'",
"indexName": "'$SearchIndex'"
],
"messages": [
"role": "user",
When chatting with a model, providing a history of the chat will help the model return
higher quality results.
JSON
"dataSources": [
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": "'$SearchEndpoint'",
"key": "'$SearchKey'",
"indexName": "'$SearchIndex'"
],
"messages": [
"role": "user",
},
"role": "tool",
},
"role": "assistant",
},
"role": "user",
Next steps
Get started using your data with Azure OpenAI
Introduction to prompt engineering
Create a resource and deploy a model
using Azure OpenAI
Article • 05/23/2023
Use this article to get started with Azure OpenAI with step-by-step instructions to create
a resource and deploy a model. While the steps for resource creation and model
deployment can be completed in a few minutes, the actual deployment process itself
can take more than hour. You can create your resource, start your deployment, and then
check back in on your deployment later rather than actively waiting for the deployment
to complete.
Prerequisites
An Azure subscription - Create one for free
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
Create a resource
Resources in Azure can be created several different ways:
This guide walks you through the Azure portal creation experience.
Field Description
Subscription Select the Azure subscription used in your OpenAI onboarding application
Resource The Azure resource group that will contain your OpenAI resource. You can
group create a new group or add it to a pre-existing group.
Field Description
Region The location of your instance. Different locations may introduce latency, but
have no impact on the runtime availability of your resource.
Name A descriptive name for your cognitive services resource. For example,
MyOpenAIResource.
Pricing Tier Only 1 pricing tier is available for the service currently
Deploy a model
Before you can generate text or inference, you need to deploy a model. You can select
from one of several available models in Azure OpenAI Studio.
Field Description
Select a Model availability varies by region.For a list of available models per region,
model see Model Summary table and region availability.
Deployment Choose a name carefully. The deployment name will be used in your code to
name call the model via the client libraries and REST API
Options Tokens per Minute Rate Limit - Adjust the Tokens per Minute (TPM) to set
the effective rate limit for your deployment. You can modify this value at any
time via the Quotas menu
7. For your first deployment leave the Advanced Options set to the defaults.
The deployments table displays a new entry that corresponds to this newly created
model. Your deployment status will move to succeeded when the deployment is
complete and ready for use.
Next steps
Now that you have a resource and your first model deployed get started making
API calls and generating text with our quickstarts.
Learn more about the underlying models that power Azure OpenAI.
Learn how to work with the ChatGPT
and GPT-4 models
Article • 05/15/2023
The ChatGPT and GPT-4 models are language models that are optimized for
conversational interfaces. The models behave differently than the older GPT-3 models.
Previous models were text-in and text-out, meaning they accepted a prompt string and
returned a completion to append to the prompt. However, the ChatGPT and GPT-4
models are conversation-in and message-out. The models expect input formatted in a
specific chat-like transcript format, and return a completion that represents a model-
written message in the chat. While this format was designed specifically for multi-turn
conversations, you'll find it can also work well for non-chat scenarios too.
In Azure OpenAI there are two different options for interacting with these type of
models:
The Chat Completion API is a new dedicated API for interacting with the ChatGPT and
GPT-4 models. This API is the preferred method for accessing these models. It is also
the only way to access the new GPT-4 models.
ChatML uses the same completion API that you use for other models like text-davinci-
002, it requires a unique token based prompt format known as Chat Markup Language
(ChatML). This provides lower level access than the dedicated Chat Completion API, but
also requires additional input validation, only supports ChatGPT (gpt-35-turbo) models,
and the underlying format is more likely to change over time.
This article walks you through getting started with the new ChatGPT and GPT-4 models.
It's important to use the techniques described here to get the best results. If you try to
interact with the models the same way you did with the older model series, the models
will often be verbose and provide less useful responses.
Python
import os
import openai
openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.ChatCompletion.create(
messages=[
print(response)
print(response['choices'][0]['message']['content'])
Output
"choices": [
"finish_reason": "stop",
"index": 0,
"message": {
"content": "The founders of Microsoft are Bill Gates and Paul Allen.
They co-founded the company in 1975.",
"role": "assistant"
],
"created": 1679014551,
"id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1",
"model": "gpt-3.5-turbo-0301",
"object": "chat.completion",
"usage": {
"completion_tokens": 86,
"prompt_tokens": 37,
"total_tokens": 123
7 Note
The following parameters aren't available with the new ChatGPT and GPT-4 models:
logprobs , best_of , and echo . If you set any of these parameters, you'll get an error.
Every response includes a finish_reason . The possible values for finish_reason are:
Consider setting max_tokens to a slightly higher value than normal such as 300 or 500.
This ensures that the model doesn't stop generating text before it reaches the end of
the message.
Model versioning
7 Note
Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model as well as the gpt-
4 and gpt-4-32k models will continue to be updated. When creating a deployment of
these models, you'll also need to specify a model version.
Currently, only version 0301 is available for ChatGPT and 0314 for GPT-4 models. We'll
continue to make updated versions available in the future. You can find the model
deprecation times for these models on our models page.
A conversation with one example answer followed by a question would look like:
System role
The system role also known as the system message is included at the beginning of the
array. This message provides the initial instructions to the model. You can provide
various information in the system role including:
You can customize the system role for your use case or just include basic instructions.
The system role/message is optional, but it's recommended to at least include a basic
one to get the best results.
Messages
After the system role, you can include a series of messages between the user and the
assistant.
To trigger a response from the model, you should end with a user message indicating
that it's the assistant's turn to respond. You can also include a series of example
messages between the user and the assistant as a way to do few shot learning.
Basic example
If you want the ChatGPT model to behave similarly to chat.openai.com , you can use a
basic system message like "Assistant is a large language model trained by OpenAI."
Instructions:
- If you're unsure of an answer, you can say "I don't know" or "I'm not
sure" and recommend users go to the IRS website for more information. "},
You can also include relevant data or information in the system message to give the
model extra context for the conversation. If you only need to include a small amount of
information, you can hard code it in the system message. If you have a large amount of
data that the model should be aware of, you can use embeddings or a product like
Azure Cognitive Search to retrieve the most relevant information at query time.
Context:
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-
3, Codex, and DALL-E models with the security and enterprise promise of
Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility
and a smooth transition from one to the other.
},
This is only one example of how you can use few shot learning with ChatGPT and GPT-4.
You can experiment with different approaches to see what works best for your use case.
{"role": "assistant", "content": "In 2023, you will need to file your taxes
by April 18th. The date falls after the usual April 15th deadline because
April 15th falls on a Saturday in 2023. For more details, see
https://www.irs.gov/filing/individuals/when-to-file."},
{"role": "user", "content": "How can I check the status of my tax refund?"},
{"role": "assistant", "content": "You can check the status of your tax
refund by visiting https://www.irs.gov/refunds"}
For example, for an entity extraction scenario, you might use the following prompt:
"name": "",
"company": "",
"phone_number": ""
}"},
Continuously takes console input, and properly formats it as part of the messages
array as user role content.
Outputs responses that are printed to the console and formatted and added to the
messages array as assistant role content.
This means that every time a new question is asked, a running transcript of the
conversation so far is sent along with the latest question. Since the model has no
memory, you need to send an updated transcript with each new question or the model
will lose context of the previous questions and answers.
Python
import os
import openai
openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")
while(True):
user_input = input()
response = openai.ChatCompletion.create(
messages = conversation
When you run the code above you will get a blank console window. Enter your first
question in the window and then hit enter. Once the response is returned, you can
repeat the process and keep asking questions.
Managing conversations
The previous example will run until you hit the model's token limit. With each question
asked, and answer received, the messages array grows in size. The token limit for gpt-
35-turbo is 4096 tokens, whereas the token limits for gpt-4 and gpt-4-32k are 8192 and
32768 respectively. These limits include the token count from both the message array
sent and the model response. The number of tokens in the messages array combined
with the value of the max_tokens parameter must stay under these limits or you'll receive
an error.
It's your responsibility to ensure the prompt and completion falls within the token limit.
This means that for longer conversations, you need to keep track of the token count and
only send the model a prompt that falls within the limit.
7 Note
We strongly recommend staying within the documented input token limit for all
models even if you find you are able to exceed that limit.
The following code sample shows a simple chat loop example with a technique for
handling a 4096 token count using OpenAI's tiktoken library.
The code requires tiktoken 0.3.0 . If you have an older version run pip install tiktoken
--upgrade .
Python
import tiktoken
import openai
import os
openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")
max_response_tokens = 250
token_limit= 4096
conversation=[]
conversation.append(system_message)
encoding = tiktoken.encoding_for_model(model)
num_tokens = 0
num_tokens += len(encoding.encode(value))
return num_tokens
while(True):
user_input = input("")
conv_history_tokens = num_tokens_from_messages(conversation)
del conversation[1]
conv_history_tokens = num_tokens_from_messages(conversation)
response = openai.ChatCompletion.create(
messages = conversation,
temperature=.7,
max_tokens=max_response_tokens,
In this example once the token count is reached the oldest messages in the conversation
transcript will be removed. del is used instead of pop() for efficiency, and we start at
index 1 so as to always preserve the system message and only remove user/assistant
messages. Over time, this method of managing the conversation can cause the
conversation quality to degrade as the model will gradually lose context of the earlier
portions of the conversation.
An alternative approach is to limit the conversation duration to the max token length or
a certain number of turns. Once the max token limit is reached and the model would
lose context if you were to allow the conversation to continue, you can prompt the user
that they need to begin a new conversation and clear the messages array to start a
brand new conversation with the full token limit available.
The token counting portion of the code demonstrated previously, is a simplified version
of one of OpenAI's cookbook examples .
Next steps
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
Learn how to generate or manipulate
text
Article • 02/17/2023
The completions endpoint can be used for a wide variety of tasks. It provides a simple
but powerful text-in, text-out interface to any of our models. You input some text as a
prompt, and the model will generate a text completion that attempts to match whatever
context or pattern you gave it. For example, if you give the API the prompt, "As
Descartes said, I think, therefore", it will return the completion " I am" with high
probability.
The best way to start exploring completions is through our playground in Azure OpenAI
Studio . It's a simple text box where you can submit a prompt to generate a
completion. You can start with a simple example like the following:
once you submit, you'll see something like the following generated:
Console
The actual completion results you see may differ because the API is stochastic by
default. In other words, you might get a slightly different completion every time you call
it, even if your prompt stays the same. You can control this behavior with the
temperature setting.
This simple, "text in, text out" interface means you can "program" the model by
providing instructions or just a few examples of what you'd like it to do. Its success
generally depends on the complexity of the task and quality of your prompt. A general
rule is to think about how you would write a word problem for a middle school student
to solve. A well-written prompt provides enough information for the model to know
what you want and how it should respond.
7 Note
Keep in mind that the models' training data cuts off in October 2019, so they may
not have knowledge of current events. We plan to add more continuous training in
the future.
Prompt design
Basics
OpenAI's models can do everything from generating original stories to performing
complex text analysis. Because they can do so many things, you have to be explicit in
showing what you want. Showing, not just telling, is often the secret to a good prompt.
The models try to predict what you want from the prompt. If you send the words "Give
me a list of cat breeds," the model wouldn't automatically assume that you're asking for
a list of cat breeds. You could as easily be asking the model to continue a conversation
where the first words are "Give me a list of cat breeds" and the next ones are "and I'll tell
you which ones I like." If the model only assumed that you wanted a list of cats, it
wouldn't be as good at content creation, classification, or other tasks.
Show and tell. Make it clear what you want either through instructions, examples, or a
combination of the two. If you want the model to rank a list of items in alphabetical
order or to classify a paragraph by sentiment, show it that's what you want.
Provide quality data. If you're trying to build a classifier or get the model to follow a
pattern, make sure that there are enough examples. Be sure to proofread your examples
— the model is usually smart enough to see through basic spelling mistakes and give
you a response, but it also might assume that the mistakes are intentional and it can
affect the response.
Check your settings. The temperature and top_p settings control how deterministic the
model is in generating a response. If you're asking it for a response where there's only
one right answer, then you'd want to set these settings to lower values. If you're looking
for a response that's not obvious, then you might want to set them to higher values. The
number one mistake people use with these settings is assuming that they're "cleverness"
or "creativity" controls.
Troubleshooting
If you're having trouble getting the API to perform as expected, follow this checklist:
Classification
To create a text classifier with the API we provide a description of the task and provide a
few examples. In this demonstration we show the API how to classify the sentiment of
Tweets.
Console
Sentiment: Positive
Sentiment: Negative
Sentiment: Positive
Sentiment: Neutral
Sentiment:
Sentiment: Positive
Sentiment: Negative
Sentiment: Positive
Sentiment: Neutral
Tweet text
1: Positive
2: Negative
3: Positive
4: Neutral
5: Positive
Tweet text
1.
After showing the API how tweets are classified by sentiment we then provide it a list of
tweets and then a list of sentiment ratings with the same number index. The API is able
to pick up from the first example how a tweet is supposed to be classified. In the second
example it sees how to apply this to a list of tweets. This allows the API to rate five (and
even more) tweets in just one API call.
It's important to note that when you ask the API to create lists or evaluate text you need
to pay extra attention to your probability settings (Top P or Temperature) to avoid drift.
1. Make sure your probability setting is calibrated correctly by running multiple tests.
2. Don't make your list too long or the API is likely to drift.
Generation
One of the most powerful yet simplest tasks you can accomplish with the API is
generating new ideas or versions of input. You can give the API a list of a few story ideas
and it will try to add to that list. We've seen it create business plans, character
descriptions and marketing slogans just by providing it a handful of examples. In this
demonstration we'll use the API to create more examples for how to use virtual reality in
the classroom:
1. Virtual Mars
2.
All we had to do in this example is provide the API with just a description of what the list
is about and one example. We then prompted the API with the number 2. indicating
that it's a continuation of the list.
Although this is a very simple prompt, there are several details worth noting:
Just like with the classifier, we tell the API up front what the list is about. This helps it
focus on completing the list and not trying to guess what the pattern is behind it.
2. Our example sets the pattern for the rest of the list
Because we provided a one-sentence description, the API is going to try to follow that
pattern for the rest of the items it adds to the list. If we want a more verbose response,
we need to set that up from the start.
3. We prompt the API by adding an incomplete entry
When the API sees 2. and the prompt abruptly ends, the first thing it tries to do is
figure out what should come after it. Since we already had an example with number one
and gave the list a title, the most obvious response is to continue adding items to the
list.
You can improve the quality of the responses by making a longer more diverse list in
your prompt. One way to do that is to start off with one example, let the API generate
more and select the ones that you like best and add them to the list. A few more high-
quality variations can dramatically improve the quality of the responses.
Conversation
The API is extremely adept at carrying on conversations with humans and even with
itself. With just a few lines of instruction, we've seen the API perform as a customer
service chatbot that intelligently answers questions without ever getting flustered or a
wise-cracking conversation partner that makes jokes and puns. The key is to tell the API
how it should behave and then provide a few examples.
Human:
1. We tell the API the intent but we also tell it how to behave
Just like the other
prompts, we cue the API into what the example represents, but we also add another key
detail: we give it explicit instructions on how to interact with the phrase "The assistant is
helpful, creative, clever, and very friendly."
Without that instruction the API might stray and mimic the human it's interacting with
and become sarcastic or some other behavior we want to avoid.
2. We give the API an identity
At the start we have the API respond as an AI that was
created by OpenAI. While the API has no intrinsic identity, this helps it respond in a way
that's as close to the truth as possible. You can use identity in other ways to create other
kinds of chatbots. If you tell the API to respond as a woman who works as a research
scientist in biology, you'll get intelligent and thoughtful comments from the API similar
to what you'd expect from someone with that background.
In this example we create a chatbot that is a bit sarcastic and reluctantly answers
questions:
###
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of
this.
###
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to
ask better questions in the future.
###
Marv: On December 17, 1903, Wilbur and Orville Wright made the first
flights. I wish they'd come and take me away.
###
Marv:
Transformation
The API is a language model that is familiar with a variety of ways that words and
characters can be used to express information. This ranges from natural language text to
code and languages other than English. The API is also able to understand content on a
level that allows it to summarize, convert and express it in different ways.
Translation
In this example we show the API how to convert from English to French:
English: I do not speak French.
English:
This example works because the API already has a grasp of French, so there's no need to
try to teach it this language. Instead, we just need to provide enough examples that API
understands that it's converting from one language to another.
If you want to translate from English to a language the API is unfamiliar with you'd need
to provide it with more examples and a fine-tuned model to do it fluently.
Conversion
In this example we convert the name of a movie into emoji. This shows the adaptability
of the API to picking up patterns and working with other characters.
Batman: 🤵🦇
Transformers: 🚗🤖
Spider-Man: 🕸🕷🕸🕸🕷🕸
Spider-Man:
Summarization
The API is able to grasp the context of text and rephrase it in different ways. In this
example, the API takes a block of text and creates an explanation a child would
understand. This illustrates that the API has a deep grasp of language.
"""
A neutron star is the collapsed core of a massive supergiant star, which had
a total mass of between 10 and 25 solar masses, possibly more if the star
was especially metal-rich.[1] Neutron stars are the smallest and densest
stellar objects, excluding black holes and hypothetical white holes, quark
stars, and strange stars.[2] Neutron stars have a radius on the order of 10
kilometres (6.2 mi) and a mass of about 1.4 solar masses.[3] They result
from the supernova explosion of a massive star, combined with gravitational
collapse, that compresses the core past white dwarf star density to that of
atomic nuclei.
"""
"""
In this example we place whatever we want summarized between the triple quotes. It's
worth noting that we explain both before and after the text to be summarized what our
intent is and who the target audience is for the summary. This is to keep the API from
drifting after it processes a large block of text.
Completion
While all prompts result in completions, it can be helpful to think of text completion as
its own task in instances where you want the API to pick up where you left off. For
example, if given this prompt, the API will continue the train of thought about vertical
farming. You can lower the temperature setting to keep the API more focused on the
intent of the prompt or increase it to let it go off on a tangent.
This next prompt shows how you can use completion to help write React components.
We send some code to the API, and it's able to continue the rest because it has an
understanding of the React library. We recommend using models from our Codex series
for tasks that involve understanding or generating code. Currently, we support two
Codex models: code-davinci-002 and code-cushman-001 . For more information about
Codex models, see the Codex models section in Models.
Factual responses
The API has a lot of knowledge that it's learned from the data it was trained on. It also
has the ability to provide responses that sound very real but are in fact made up. There
are two ways to limit the likelihood of the API making up an answer.
2. Use a low probability and show the API how to say "I don't know"
If the API
understands that in cases where it's less certain about a response that saying "I don't
know" or some variation is appropriate, it will be less inclined to make up answers.
In this example we give the API examples of questions and answers it knows and then
examples of things it wouldn't know and provide question marks. We also set the
probability to zero so the API is more likely to respond with a "?" if there's any doubt.
Q: Who is Batman?
Q: What is torsalplexity?
A: ?
Q: What is Devz9?
A: ?
A: George Lucas is American film director and producer famous for creating
Star Wars.
A: Sacramento.
A: The Moon.
A: ?
Q: What is an atom?
A: ?
Q: What is Kozar-09?
A: ?
Q:
Learn more about generating code completions, with the working with code guide
Next steps
Learn how to work with code (Codex).
Learn more about the underlying models that
power Azure OpenAI.
How to switch between OpenAI and
Azure OpenAI endpoints with Python
Article • 07/07/2023
While OpenAI and Azure OpenAI Service rely on a common Python client library , there
are small changes you need to make to your code in order to swap back and forth
between endpoints. This article walks you through the common changes and differences
you'll experience when working across OpenAI and Azure OpenAI.
7 Note
This library is maintained by OpenAI and is currently in preview. Refer to the release
history or the version.py commit history to track the latest updates to the
library.
Authentication
We recommend using environment variables. If you haven't done this before our Python
quickstarts walk you through this configuration.
API key
Python Python
import openai
import openai
"sk-..."
openai.api_key = "..."
import openai
import openai
openai.api_key =
"sk-..."
credential = DefaultAzureCredential()
openai.organizat token =
ion = "..."
credential.get_token("https://cognitiveservices.az
ure.com/.default")
openai.api_type = "azuread"
openai.api_key = token.token
openai.api_base = "https://example-
endpoint.openai.azure.com"
For OpenAI engine still works in most instances, but it's deprecated and model is
preferred.
Python Python
completion = completion =
openai.Completion.create(
openai.Completion.create(
prompt="<prompt>",
prompt="<prompt>",
model="text-davinci-003"
deployment_id="text-davinci-
)
003"
#engine="text-davinci-003"
chat_completion = )
openai.ChatCompletion.create(
messages="<messages>",
chat_completion =
model="gpt-4"
openai.ChatCompletion.create(
)
messages="<messages>",
deployment_id="gpt-4"
embedding = #engine="gpt-4"
openai.Embedding.create(
input="<input>",
)
model="text-embedding-ada-002"
)
embedding =
openai.Embedding.create(
input="<input>",
deployment_id="text-embedding-
ada-002"
#engine="text-embedding-ada-
002"
Python Python
openai.Embedding.create(
embedding =
input=inputs,
openai.Embedding.create(
model="text-embedding-ada- input=text,
002"
deployment_id="text-
)
embedding-ada-002"
#engine="text-embedding-ada-
002"
Next steps
Learn more about how to work with GPT-35-Turbo and the GPT-4 models with our
how-to guide.
For more examples, check out the Azure OpenAI Samples GitHub repository
How to configure content filters with
Azure OpenAI Service
Article • 06/09/2023
7 Note
All customers have the ability to modify the content filters to be stricter (for
example, to filter content at lower severity levels than the default). Approval is
required for full content filtering control, including (i) configuring content filters at
severity level high only (ii) or turning the content filters off. Managed customers
only may apply for full content filtering control via this form: Azure OpenAI Limited
Access Review: Modified Content Filters and Abuse Monitoring
(microsoft.com) .
The content filtering system integrated into Azure OpenAI Service runs alongside the
core models and uses an ensemble of multi-class classification models to detect four
categories of harmful content (violence, hate, sexual, and self-harm) at four severity
levels respectively (safe, low, medium, and high). The default content filtering
configuration is set to filter at the medium severity threshold for all four content harms
categories for both prompts and completions. That means that content that is detected
at severity level medium or high is filtered, while content detected at severity level low
or safe is not filtered by the content filters. Learn more about content categories,
severity levels, and the behavior of the content filtering system here.
Content filters can be configured at resource level. Once a new configuration is created,
it can be associated with one or more deployments. For more information about model
deployment, see the resource deployment guide.
The configurability feature is available in preview and allows customers to adjust the
settings, separately for prompts and completions, to filter content for each content
category at different severity levels as described in the table below. Content detected at
the 'safe' severity level is labeled in annotations but is not subject to filtering and is not
configurable.
Medium, Yes Yes Default setting. Content detected at severity level low
high is not filtered, content at medium and high is filtered.
High If approved* If approved* Content detected at severity levels low and medium
is not filtered. Only content at severity level high is
filtered. Requires approval*.
* Only approved customers have full content filtering control, including configuring
content filters at severity level high only or turning the content filters off. Managed
customers only can apply for full content filtering control via this form: Azure OpenAI
Limited Access Review: Modified Content Filters and Abuse Monitoring
(microsoft.com)
1. Go to Azure OpenAI Studio and navigate to the Content Filters tab (in the bottom
left navigation, as designated by the red box below).
This leads to the following configuration view, where you can choose a name for
the custom content filtering configuration.
3. This is the view of the default content filtering configuration, where content is
filtered at medium and high severity levels for all categories. You can modify the
content filtering severity level for both prompts and completions separately
(configuration for prompts is in the left column and configuration for completions
is in the right column, as designated with the blue boxes below) for each of the
four content categories (content categories are listed on the left side of the screen,
as designated with the green box below). There are three severity levels for each
category that are partially or fully configurable: Low, medium, and high (labeled at
the top of each column, as designated with the red box below).
4. If you determine that your application or usage scenario requires stricter filtering
for some or all content categories, you can configure the settings, separately for
prompts and completions, to filter at more severity levels than the default setting.
An example is shown in the image below, where the filtering level for user prompts
is set to the strictest configuration for hate and sexual, with low severity content
filtered along with content classified as medium and high severity (outlined in the
red box below). In the example, the filtering levels for model completions are set at
the strictest configuration for all content categories (blue box below). With this
modified filtering configuration in place, low, medium, and high severity content
will be filtered for the hate and sexual categories in user prompts; medium and
high severity content will be filtered for the self-harm and violence categories in
user prompts; and low, medium, and high severity content will be filtered for all
content categories in model completions.
5. If your use case was approved for modified content filters as outlined above, you
will receive full control over content filtering configurations. With full control, you
can choose to turn filtering off, or filter only at severity level high, while accepting
low and medium severity content. In the image below, filtering for the categories
of self-harm and violence is turned off for user prompts (red box below), while
default configurations are retained for other categories for user prompts. For
model completions, only high severity content is filtered for the category self-harm
(blue box below), and filtering is turned off for violence (green box below), while
default configurations are retained for other categories.
You can create multiple content filtering configurations as per your requirements.
7. Go to advanced options (outlined in the blue box below) select the content filter
configuration suitable for that deployment from the Content Filter dropdown
(outlined near the bottom of the dialog box in the red box below).
8. Select Save and close to apply the selected configuration to the deployment.
9. You can also edit and delete a content filter configuration if required. To do this,
navigate to the content filters tab and select the desired action (options outlined
near the top of the screen in the red box below). You can edit/delete only one
filtering configuration at a time.
7 Note
Best practices
We recommend informing your content filtering configuration decisions through an
iterative identification (for example, red team testing, stress-testing, and analysis) and
measurement process to address the potential harms that are relevant for a specific
model, application, and deployment scenario. After implementing mitigations such as
content filtering, repeat measurement to test effectiveness. Recommendations and best
practices for Responsible AI for Azure OpenAI, grounded in the Microsoft Responsible AI
Standard can be found in the Responsible AI Overview for Azure OpenAI.
Next steps
Learn more about Responsible AI practices for Azure OpenAI: Overview of
Responsible AI practices for Azure OpenAI models.
Read more about content filtering categories and severity levels with Azure
OpenAI Service.
Learn more about red teaming from our: Introduction to red teaming large
language models (LLMs) article.
Codex models and Azure OpenAI
Service
Article • 02/17/2023
The Codex model series is a descendant of our GPT-3 series that's been trained on both
natural language and billions of lines of code. It's most capable in Python and proficient
in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift,
TypeScript, SQL, and even Shell.
"""
"""
"""
"""
"""
"""
query =
// Function 1
fullNames.push(names[Math.floor(Math.random() * names.length)]
Best practices
To get Codex to create a useful completion, it's helpful to think about what information
a programmer would need to perform a task. This could simply be a clear comment or
the data needed to write a useful function, like the names of variables or what class a
function handles.
In this example we tell Codex what to call the function and what task it's going to
perform.
Python
This approach scales even to the point where you can provide Codex with a comment
and an example of a database schema to get it to write useful query requests for various
databases. Here's an example where we provide the columns and table names for the
query.
Python
When you show Codex the database schema, it's able to make an informed guess about
how to format a query.
# R language
Python
# Python 3
HTML
<!-- Create a web page with the title 'Kat Katman attorney at paw' -->
<!DOCTYPE html>
Placing <!DOCTYPE html> after our comment makes it very clear to Codex what we want
it to do.
Or if we want to write a function we could start the prompt as follows and Codex will
understand what it needs to do next.
Python
def counter
HTML
By specifying the version, you can make sure Codex uses the most current library.
7 Note
Codex can suggest helpful libraries and APIs, but always be sure to do your own
research to make sure that they're safe for your application.
Python
"""
"""
Python
def getUserBalance(id):
"""
Look up the user in the database ‘UserData' and return their current
account balance.
"""
Python
"""
"""
In cases where you need Codex to provide different potential results, start at zero and
then increment upwards by 0.1 until you find suitable variation.
JavaScript
JavaScript
Asking Codex to perform this task will produce a table like this:
JavaScript
var weather = [
];
JavaScript
/*
Use the lists to generate stories about what I saw at the zoo in each city
*/
Users can combine streaming with duplication to reduce latency by requesting more
than one solution from the API, and using the first response returned. Do this by setting
n > 1 . This approach consumes more token quota, so use carefully (for example, by
using reasonable settings for max_tokens and stop ).
JavaScript
SQL
FROM department
GROUP BY department.name
--
Python
# Python 3
return a + b
# Unit test
def
JavaScript
Python
Python
# Python version
[ Python code ]
# End
# R version
JavaScript
input.setAttribute('type', 'text');
document.body.appendChild(input);
var button = document.createElement('button');
document.body.appendChild(button);
button.onclick = function() {
document.body.appendChild(hello);
};
// React version:
Next steps
Learn more about the underlying models that power Azure OpenAI.
Learn how to generate embeddings with
Azure OpenAI
Article • 05/16/2023
console
Console
curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPL
OYMENT_NAME/embeddings?api-version=2023-05-15\
-H 'Content-Type: application/json' \
-H 'api-key: YOUR_API_KEY' \
Best practices
labels in zero shot classification. You can read more about all of the Embeddings models
in our Models guide.
Next steps
Learn more about using Azure OpenAI and embeddings to perform document
search with our embeddings tutorial.
Learn more about the underlying models that power Azure OpenAI.
Learn how to prepare your dataset for
fine-tuning
Article • 03/13/2023
The first step of customizing your model is to prepare a high quality dataset. To do this
you'll need a set of training examples composed of single input prompts and the
associated desired output ('completion'). This format is notably different than using
models during inference in the following ways:
Best practices
Customization performs better with high-quality examples and the more you have,
generally the better the model performs. We recommend that you provide at least a few
hundred high-quality examples to achieve a model that performs better than using well-
designed prompts with a base model. From there, performance tends to linearly
increase with every doubling of the number of examples. Increasing the number of
examples is usually the best and most reliable way of improving performance.
If you're fine-tuning on a pre-existing dataset rather than writing prompts from scratch,
be sure to manually review your data for offensive or inaccurate content if possible, or
review as many random samples of the dataset as possible if it's large.
Specific guidelines
Fine-tuning can solve various problems, and the optimal way to use it may depend on
your specific use case. Below, we've listed the most common use cases for fine-tuning
and corresponding guidelines.
Classification
Classifiers are the easiest models to get started with. For classification problems we
suggest using ada, which generally tends to perform only very slightly worse than more
capable models once fine-tuned, while being significantly faster. In classification
problems, each prompt in the dataset should be classified into one of the predefined
classes. For this type of problem, we recommend:
Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to
also append this separator when you eventually make requests to your model.
Choose classes that map to a single token. At inference time, specify
max_tokens=1 since you only need the first token for classification.
Ensure that the prompt + completion doesn't exceed 2048 tokens, including the
separator
Aim for at least 100 examples per class
To get class log probabilities, you can specify logprobs=5 (for five classes) when
using your model
Ensure that the dataset used for fine-tuning is very similar in structure and type of
task as what the model will be used for
JSON
In the example above, we used a structured input containing the name of the company,
the product, and the associated ad. As a separator we used \nSupported: which clearly
separated the prompt from the completion. With a sufficient number of examples, the
separator you choose doesn't make much of a difference (usually less than 0.4%) as long
as it doesn't appear within the prompt or the completion.
For this use case we fine-tuned an ada model since it is faster and cheaper, and the
performance is comparable to larger models because it's a classification task.
Console
curl
https://YOUR_RESOURCE_NAME.openaiazure.com/openai/deployments/YOUR_DEPLOYMEN
T_NAME/completions?api-version=2023-05-15\ \
-H 'Content-Type: application/json' \
-H 'api-key: YOUR_API_KEY' \
-d '{
"max_tokens": 1
}'
Let's say you'd like to get a degree to which a particular tweet is positive or negative.
The dataset might look something like the following:
Console
Once the model is fine-tuned, you can get back the log probabilities for the first
completion token by setting logprobs=2 on the completion request. The higher the
probability for positive class, the higher the relative sentiment.
Console
curl
https://YOUR_RESOURCE_NAME.openaiazure.com/openai/deployments/YOUR_DEPLOYMEN
T_NAME/completions?api-version=2023-05-15\ \
-H 'Content-Type: application/json' \
-H 'api-key: YOUR_API_KEY' \
-d '{
"max_tokens": 1,
"logprobs": 2
}'
JSON
"object": "text_completion",
"created": 1589498378,
"model": "YOUR_FINE_TUNED_MODEL_NAME",
"choices": [
"logprobs": {
"text_offset": [
19
],
"token_logprobs": [
-0.03597255
],
"tokens": [
" positive"
],
"top_logprobs": [
},
"index": 0,
"finish_reason": "length"
JSON
"prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:
<date>\nContent:<email_body>\n\n###\n\n", "completion":"
<numerical_category>"
For example:
JSON
"completion":" 4"
In the example above we used an incoming email capped at 2043 tokens as input. (This
allows for a four token separator and a one token completion, summing up to 2048.) As
a separator we used \n\n###\n\n and we removed any occurrence of ### within the
email.
Conditional generation
Conditional generation is a problem where the content needs to be generated given
some kind of input. This includes paraphrasing, summarizing, entity extraction, product
description writing given specifications, chatbots and many others. For this type of
problem we recommend:
Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to
also append this separator when you eventually make requests to your model.
Use an ending token at the end of the completion, for example, END .
Remember to add the ending token as a stop sequence during inference, for
example, stop=[" END"] .
Aim for at least ~500 examples.
Ensure that the prompt + completion doesn't exceed 2048 tokens, including the
separator.
Ensure the examples are of high quality and follow the same desired format.
Ensure that the dataset used for fine-tuning is similar in structure and type of task
as what the model will be used for.
Using Lower learning rate and only 1-2 epochs tends to work better for these use
cases.
JSON
For example:
JSON
JSON
For example:
JSON
"prompt":"Portugal will be removed from the UK's green travel list from
Tuesday, amid rising coronavirus cases and concern over a \"Nepal mutation
of the so-called Indian variant\". It will join the amber list, meaning
holidaymakers should not visit and returnees must isolate for 10
days...\n\n###\n\n",
A multi-line separator works best, as the text will likely contain multiple lines. Ideally
there will be a high diversity of the types of input prompts (news articles, Wikipedia
pages, tweets, legal documents), which reflect the likely texts which will be encountered
when extracting entities.
A chatbot will normally contain relevant context about the conversation (order details),
summary of the conversation so far, and most recent messages. For this use case the
same past conversation can generate multiple rows in the dataset, each time with a
slightly different context, for every agent generation as a completion. This use case
requires a few thousand examples, as it likely deals with different types of requests, and
customer issues. To ensure the performance is of high quality, we recommend vetting
the conversation samples to ensure the quality of agent messages. The summary can be
generated with a separate text transformation fine tuned model. The dataset could look
as follows:
JSON
Here it's important to convert the input data into a natural language, which will likely
lead to superior performance. For example, the following format:
JSON
JSON
For high performance, ensure that the completions were based on the description
provided. If external content is often consulted, then adding such content in an
automated way would improve the performance. If the description is based on images,
it may help to use an algorithm to extract a textual description of the image. Since
completions are only one sentence long, we can use . as the stop sequence during
inference.
Many companies have a large amount of high quality content generated in a specific
voice. Ideally all generations from our API should follow that voice for the different use
cases. Here we can use the trick of leaving the prompt empty, and feeding in all the
documents which are good examples of the company voice. A fine-tuned model can be
used to solve many different use cases with similar prompts to the ones used for base
models, but the outputs are going to follow the company voice much more closely than
previously.
JSON
A similar technique could be used for creating a virtual character with a particular
personality, style of speech and topics the character talks about.
Generative tasks have a potential to leak training data when requesting completions
from the model, so extra care needs to be taken that this is addressed appropriately. For
example personal or sensitive company information should be replaced by generic
information or not be included into fine-tuning in the first place.
Next steps
Fine tune your model with our How-to guide
Learn more about the underlying models that power Azure OpenAI Service
Learn how to customize a model for
your application
Article • 04/05/2023
Azure OpenAI Service lets you tailor our models to your personal datasets using a
process known as fine-tuning. This customization step will let you get more out of the
service by providing:
Higher quality results than what you can get just from prompt design
The ability to train on more examples than can fit into a prompt
Lower-latency requests
7 Note
There is a breaking change in the create fine tunes command in the latest 12-01-
2022 GA API. For the latest command syntax consult the reference documentation
Prerequisites
An Azure subscription - Create one for free
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
For more information about creating a resource, see Create a resource and deploy
a model using Azure OpenAI.
Fine-tuning workflow
The fine-tuning workflow in Azure OpenAI Studio requires the following steps:
The training and validation data you use must be formatted as a JSON Lines (JSONL)
document in which each line represents a single prompt-completion pair. The OpenAI
command-line interface (CLI) includes a data preparation tool that validates, gives
suggestions, and reformats your training data into a JSONL file ready for fine-tuning.
JSON
In addition to the JSONL format, training and validation data files must be encoded in
UTF-8 and include a byte-order mark (BOM), and the file must be less than 200 MB in
size. For more information about formatting your training data, see Learn how to
prepare your dataset for fine-tuning.
The more training examples you have, the better. We recommend having at least 200
training examples. In general, we've found that each doubling of the dataset size leads
to a linear increase in model quality.
For more information about preparing training data for various tasks, see Learn how to
prepare your dataset for fine-tuning.
Console
To analyze your training data with the data preparation tool, run the following Python
command, replacing <LOCAL_FILE> with the full path and file name of the training data
file to be analyzed:
Console
This tool accepts files in the following data formats, if they contain a prompt and a
completion column/key:
The tool reformats your training data and saves output into a JSONL file ready for fine-
tuning, after guiding you through the process of implementing suggested changes.
Use the Create customized model wizard
Azure OpenAI Studio provides the Create customized model wizard, so you can
interactively create and train a fine-tuned model for your Azure resource.
Landing page
You'll first land on our main page for Azure OpenAI Studio. From here, you can start
fine-tuning a custom model.
Select the Start fine-tuning a custom model button under Manage deployments and
models section of the landing page, highlighted in the following picture, to start fine-
tuning a custom model.
7 Note
ada
babbage
curie
code-cushman-001 *
davinci *
For more information about our base models that can be fine-tuned, see Models. Select
a base model from the Base model type dropdown, as shown in the following picture,
and then select Next to continue.
If your training data has already been uploaded to the service, select Choose dataset,
and then select the file from the list shown in the Training data pane. Otherwise, select
either Local file to upload training data from a local file, or Azure blob or other shared
web locations to import training data from Azure Blob or another shared web location.
For large data files, we recommend you import from an Azure Blob store. Large files can
become unstable when uploaded through multipart forms because the requests are
atomic and can't be retried or resumed. For more information about Azure Blob storage,
see What is Azure Blob storage?
7 Note
Training data files must be formatted as JSONL files, encoded in UTF-8 with a byte-
order mark (BOM), and less than 200 MB in size.
Drag and drop the file into the client area of the Training data pane, and then
select Upload file
Select Browse for a file from the client area of the Training data pane, choose the
file to upload from the Open dialog, and then select Upload file.
After you've selected and uploaded the training dataset, select Next to optionally
choose your validation data.
You can import a training dataset from Azure Blob or another shared web location by
providing the name and location of the file, as shown in the following picture. Enter the
name of the file in File name and the Azure Blob URL, Azure Storage shared access
signature (SAS), or other link to an accessible shared web location that contains the file
in File location, then select Upload file to import the training dataset to the service.
After you've selected and uploaded the training dataset, select Next to optionally
choose your validation data.
You can now choose to optionally use validation data in the training process of your
fine-tuned model. If you don't want to use validation data, you can choose Next to
choose advanced options for your model. Otherwise, if you have a validation dataset,
you can either choose existing prepared validation data or upload new prepared
validation data to use when customizing your model. The Validation data pane, shown
in the following picture, displays any existing, previously uploaded training and
validation datasets and provides options by which you can upload new validation data.
If your validation data has already been uploaded to the service, select Choose dataset,
and then select the file from the list shown in the Validation data pane. Otherwise,
select either Local file to upload validation data from a local file, or Azure blob or other
shared web locations to import validation data from Azure Blob or another shared web
location.
For large data files, we recommend you import from an Azure Blob store. Large files can
become unstable when uploaded through multipart forms because the requests are
atomic and can't be retried or resumed.
7 Note
Like training data files, validation data files must be formatted as JSONL files,
encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.
You can upload a new validation dataset to the service from a local file by using one of
the following methods:
Drag and drop the file into the client area of the Validation data pane, and then
select Upload file
Select Browse for a file from the client area of the Validation data pane, choose
the file to upload from the Open dialog, and then select Upload file.
After you've uploaded the validation dataset, select Next to optionally choose advanced
options.
You can import a validation dataset from Azure Blob or another shared web location by
providing the name and location of the file, as shown in the following picture. Enter the
name of the file in File name and the Azure Blob URL, Azure Storage shared access
signature (SAS), or other link to an accessible shared web location that contains the file
in File location, then select Upload file to import the validation dataset to the service.
After you've imported the validation dataset, select Next to optionally choose advanced
options.
Either select Default to use the default values for the fine-tune job, or select Advanced
to display and edit the hyperparameter values, as shown in the following picture.
The following hyperparameters are available:
Parameter Description
name
Number The number of epochs to train the model for. An epoch refers to one full cycle
of epochs through the training dataset.
Batch size The batch size to use for training. The batch size is the number of training examples
used to train a single forward and backward pass.
Learning The learning rate multiplier to use for training. The fine-tuning learning rate is the
rate original learning rate used for pre-training, multiplied by this value.
multiplier
Prompt The weight to use for loss on the prompt tokens. This value controls how much the
loss model tries to learn to generate the prompt (as compared to the completion, which
weight always has a weight of 1.0.) Increasing this value can add a stabilizing effect to
training when completions are short.
For more information about these hyperparameters, see the Create a Fine tune job
section of the REST API documentation.
After you've chosen either default or advanced options, select Next to review your
choices and train your fine-tuned model.
If you're ready to train your model, select Save and close to start the fine-tune job and
return to the Models page.
After you've started a fine-tune job, it may take some time to complete. Your job may be
queued behind other jobs on our system, and training your model can take minutes or
hours depending on the model and dataset size. You can check the status of the fine-
tune job for your customized model in the Status column of the Customized models
tab on the Models page, and you can select Refresh to update the information on that
page.
You can also select the name of the model from the Model name column of the Models
page to display more information about your customized model, including the status of
the fine-tune job, training results, training events, and hyperparameters used in the job.
You can select the Refresh button to refresh the information for your model, as shown in
the following picture.
From the model page, you can also select Download training file to download the
training data you used for the model, or select Download results to download the result
file attached to the fine-tune job for your model and analyze your customized model for
training and validation performance.
7 Note
To deploy your customized model, select the customized model to be deployed and
then select Deploy model, as shown in the following picture.
The Deploy model dialog is presented, in which you can provide a name for the
deployment of your customized model. Enter a name in Deployment name and then
select Create to start the deployment of your customized model.
You can monitor the progress of your deployment from the Deployments pane of Azure
OpenAI Studio.
7 Note
The result file is a CSV file containing a header row and a row for each training step
performed by the fine-tune job. The result file contains the following columns:
elapsed_tokens The number of tokens the customized model has seen so far,
including repeats.
Column name Description
elapsed_examples The number of examples the model has seen so far, including
repeats.
For example, if the batch size is set to 3 and your data contains
completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67
(2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] .
For example, if the batch size is set to 3 and your data contains
completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83
(5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] .
For example, if the batch size is set to 3 and your data contains
completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67
(2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] .
For example, if the batch size is set to 3 and your data contains
completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83
(5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] .
) Important
You can delete the deployment for your customized model from the Deployments page
for Azure OpenAI Studio. Select the deployment to delete, and then select Delete to
delete the deployment.
7 Note
You cannot delete a customized model if it has an existing deployment. You must
first delete your model deployment before you can delete your customized model.
Next steps
Explore the full REST API Reference documentation to learn more about all the
fine-tuning capabilities. You can find the full REST documentation here.
Explore more of the Python SDK operations here .
How to configure Azure OpenAI Service
with managed identities
Article • 05/10/2023
More complex security scenarios require Azure role-based access control (Azure RBAC).
This document covers how to authenticate to your OpenAI resource using Azure Active
Directory (Azure AD).
In the following sections, you'll use the Azure CLI to assign roles, and obtain a bearer
token to call the OpenAI resource. If you get stuck, links are provided in each section
with all available options for each command in Azure Cloud Shell/Azure CLI.
Prerequisites
An Azure subscription - Create one for free
Access granted to the Azure OpenAI service in the desired Azure subscription
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
Azure CLI
az login
Azure CLI
Azure CLI
7 Note
3. Acquire an Azure AD access token. Access tokens expire in one hour. you'll then
need to acquire another one.
Azure CLI
Bash
curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?
api-version=2023-05-15 \
-H "Content-Type: application/json" \
Azure portal
Azure PowerShell
Azure CLI
Azure Resource Manager template
Azure Resource Manager client libraries
For more information about managed identities, see Managed identities for Azure
resources.
Manage Azure OpenAI Service quota
Article • 06/09/2023
Quota provides the flexibility to actively manage the allocation of rate limits across the
deployments within your subscription. This article walks through the process of
managing your Azure OpenAI quota.
Introduction to quota
Azure OpenAI's quota feature enables assignment of rate limits to your deployments,
up-to a global limit called your “quota.” Quota is assigned to your subscription on a per-
region, per-model basis in units of Tokens-per-Minute (TPM). When you onboard a
subscription to Azure OpenAI, you'll receive default quota for most available models.
Then, you'll assign TPM to each deployment as it is created, and the available quota for
that model will be reduced by that amount. You can continue to create deployments
and assign them TPM until you reach your quota limit. Once that happens, you can only
create new deployments of that model by reducing the TPM assigned to other
deployments of the same model (thus freeing TPM for use), or by requesting and being
approved for a model quota increase in the desired region.
7 Note
With a quota of 240,000 TPM for GPT-35-Turbo in East US, a customer can create a
single deployment of 240K TPM, 2 deployments of 120K TPM each, or any number
of deployments in one or multiple Azure OpenAI resources as long as their TPM
adds up to less than 240K total in that region.
When a deployment is created, the assigned TPM will directly map to the tokens-per-
minute rate limit enforced on its inferencing requests. A Requests-Per-Minute (RPM)
rate limit will also be enforced whose value is set proportionally to the TPM assignment
using the following ratio:
The flexibility to distribute TPM globally within a subscription and region has allowed
Azure OpenAI Service to loosen other restrictions:
To create a new deployment from within the Azure AI Studio under Management select
Deployments > Create new deployment.
The option to set the TPM is under the Advanced options drop-down:
Post deployment you can adjust your TPM allocation by selecting Edit deployment
under Management > Deployments in Azure AI Studio. You can also modify this
selection within the new quota management experience under Management > Quotas.
) Important
Quotas and limits are subject to change, for the most up-date-information consult
our quotas and limits article.
GPT-4
GPT-4-32K
Text-Davinci-003
7 Note
Quota Tokens-Per-Minute (TPM) allocation is not related to the max input token
limit of a model. Model input token limits are defined in the models table and are
not impacted by changes made to TPM.
Quota Name: There's one quota value per region for each model type. The quota
covers all versions of that model. The quota name can be expanded in the UI to
show the deployments that are using the quota.
Deployment: Model deployments divided by model class.
Usage/Limit: For the quota name, this shows how much quota is used by
deployments and the total quota approved for this subscription and region. This
amount of quota used is also represented in the bar graph.
Request Quota: The icon in this field navigates to a form where requests to
increase quota can be submitted.
RPM rate limits are based on the number of requests received over time. The rate limit
expects that requests be evenly distributed over a one-minute period. If this average
flow isn't maintained, then requests may receive a 429 response even though the limit
isn't met when measured over the course of a minute. To implement this behavior, Azure
OpenAI Service evaluates the rate of incoming requests over a small period of time,
typically 1 or 10 seconds. If the number of requests received during that time exceeds
what would be expected at the set RPM limit, then new requests will receive a 429
response code until the next evaluation period. For example, if Azure OpenAI is
monitoring request rate on 1-second intervals, then rate limiting will occur for a 600-
RPM deployment if more than 10 requests are received during each 1-second period
(600 requests per minute = 10 requests per second).
Set max_tokens and best_of to the minimum values that serve the needs of your
scenario. For example, don’t set a large max-tokens value if you expect your
responses to be small.
Use quota management to increase TPM on deployments with high traffic, and to
reduce TPM on deployments with limited needs.
Implement retry logic in your application.
Avoid sharp changes in the workload. Increase the workload gradually.
Test different load increase patterns.
Next steps
To review quota defaults for Azure OpenAI, consult the quotas & limits article
Configure Azure Cognitive Services
virtual networks
Article • 07/04/2023
Azure Cognitive Services provides a layered security model. This model enables you to
secure your Cognitive Services accounts to a specific subset of networks. When network
rules are configured, only applications requesting data over the specified set of
networks can access the account. You can limit access to your resources with request
filtering. Allowing only requests originating from specified IP addresses, IP ranges or
from a list of subnets in Azure Virtual Networks.
An application that accesses a Cognitive Services resource when network rules are in
effect requires authorization. Authorization is supported with Azure Active Directory
(Azure AD) credentials or with a valid API key.
) Important
Turning on firewall rules for your Cognitive Services account blocks incoming
requests for data by default. In order to allow requests through, one of the
following conditions needs to be met:
The request should originate from a service operating within an Azure Virtual
Network (VNet) on the allowed subnet list of the target Cognitive Services
account. The endpoint in requests originated from VNet needs to be set as
the custom subdomain of your Cognitive Services account.
Or the request should originate from an allowed list of IP addresses.
Requests that are blocked include those from other Azure services, from the Azure
portal, from logging and metrics services, and so on.
7 Note
We recommend that you use the Azure Az PowerShell module to interact with
Azure. See Install Azure PowerShell to get started. To learn how to migrate to the
Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
Scenarios
To secure your Cognitive Services resource, you should first configure a rule to deny
access to traffic from all networks (including internet traffic) by default. Then, you should
configure rules that grant access to traffic from specific VNets. This configuration
enables you to build a secure network boundary for your applications. You can also
configure rules to grant access to traffic from select public internet IP address ranges,
enabling connections from specific internet or on-premises clients.
Network rules are enforced on all network protocols to Azure Cognitive Services,
including REST and WebSocket. To access data using tools such as the Azure test
consoles, explicit network rules must be configured. You can apply network rules to
existing Cognitive Services resources, or when you create new Cognitive Services
resources. Once network rules are applied, they're enforced for all requests.
" Anomaly Detector
" Azure OpenAI
" Computer Vision
" Content Moderator
" Custom Vision
" Face
" Language Understanding (LUIS)
" Personalizer
" Speech service
" Language service
" QnA Maker
" Translator Text
7 Note
If you're using, Azure OpenAI, LUIS, Speech Services, or Language services, the
CognitiveServicesManagement tag only enables you use the service using the SDK
or REST API. To access and use Azure OpenAI Studio, LUIS portal , Speech Studio or
Language Studio from a virtual network, you will need to use the following tags:
AzureActiveDirectory
AzureFrontDoor.Frontend
AzureResourceManager
CognitiveServicesManagement
CognitiveServicesFrontEnd
2 Warning
Making changes to network rules can impact your applications' ability to connect
to Azure Cognitive Services. Setting the default network rule to deny blocks all
access to the data unless specific network rules that grant access are also applied.
Be sure to grant access to any allowed networks using network rules before you
change the default rule to deny access. If you are allow listing IP addresses for your
on-premises network, be sure to add all possible outgoing public IP addresses from
your on-premises network.
Azure portal
4. To allow traffic from all networks, choose to allow access from All networks.
Enable a service endpoint for Azure Cognitive Services within the VNet. The service
endpoint routes traffic from the VNet through an optimal path to the Azure Cognitive
Services service. The identities of the subnet and the virtual network are also transmitted
with each request. Administrators can then configure network rules for the Cognitive
Services resource that allow requests to be received from specific subnets in a VNet.
Clients granted access via these network rules must continue to meet the authorization
requirements of the Cognitive Services resource to access the data.
Each Cognitive Services resource supports up to 100 virtual network rules, which may be
combined with IP network rules.
Required permissions
To apply a virtual network rule to a Cognitive Services resource, the user must have the
appropriate permissions for the subnets being added. The required permission is the
default Contributor role, or the Cognitive Services Contributor role. Required permissions
can also be added to custom role definitions.
Cognitive Services resource and the virtual networks granted access may be in different
subscriptions, including subscriptions that are a part of a different Azure AD tenant.
7 Note
Configuration of rules that grant access to subnets in virtual networks that are a
part of a different Azure Active Directory tenant are currently only supported
through PowerShell, CLI and REST APIs. Such rules cannot be configured through
the Azure portal, though they may be viewed in the portal.
Azure portal
5. Select the Virtual networks and Subnets options, and then select Enable.
6. To create a new virtual network and grant it access, select Add new virtual
network.
7. Provide the information necessary to create the new virtual network, and then
select Create.
7 Note
) Important
Be sure to set the default rule to deny, or network rules have no effect.
Provide allowed internet address ranges using CIDR notation in the form
16.17.18.0/24 or as individual IP addresses like 16.17.18.19 .
Tip
Small address ranges using "/31" or "/32" prefix sizes are not supported. These
ranges should be configured using individual IP address rules.
IP network rules are only allowed for public internet IP addresses. IP address ranges
reserved for private networks (as defined in RFC 1918 ) aren't allowed in IP rules.
Private networks include addresses that start with 10.* , 172.16.* - 172.31.* , and
192.168.* .
Only IPV4 addresses are supported at this time. Each Cognitive Services resource
supports up to 100 IP network rules, which may be combined with Virtual network rules.
If you're using ExpressRoute on-premises for public peering or Microsoft peering, you
need to identify the NAT IP addresses. For public peering, each ExpressRoute circuit by
default uses two NAT IP addresses. Each is applied to Azure service traffic when the
traffic enters the Microsoft Azure network backbone. For Microsoft peering, the NAT IP
addresses that are used are either customer provided or are provided by the service
provider. To allow access to your service resources, you must allow these public IP
addresses in the resource IP firewall setting. To find your public peering ExpressRoute
circuit IP addresses, open a support ticket with ExpressRoute via the Azure portal.
Learn more about NAT for ExpressRoute public and Microsoft peering.
Azure portal
) Important
Be sure to set the default rule to deny, or network rules have no effect.
Use private endpoints
You can use private endpoints for your Cognitive Services resources to allow clients on a
virtual network (VNet) to securely access data over a Private Link. The private endpoint
uses an IP address from the VNet address space for your Cognitive Services resource.
Network traffic between the clients on the VNet and the resource traverses the VNet
and a private link on the Microsoft backbone network, eliminating exposure from the
public internet.
Secure your Cognitive Services resource by configuring the firewall to block all
connections on the public endpoint for the Cognitive Services service.
Increase security for the VNet, by enabling you to block exfiltration of data from
the VNet.
Securely connect to Cognitive Services resources from on-premises networks that
connect to the VNet using VPN or ExpressRoutes with private-peering.
Conceptual overview
A private endpoint is a special network interface for an Azure resource in your VNet.
Creating a private endpoint for your Cognitive Services resource provides secure
connectivity between clients in your VNet and your resource. The private endpoint is
assigned an IP address from the IP address range of your VNet. The connection between
the private endpoint and the Cognitive Services service uses a secure private link.
Applications in the VNet can connect to the service over the private endpoint
seamlessly, using the same connection strings and authorization mechanisms that they
would use otherwise. The exception is the Speech Services, which require a separate
endpoint. See the section on Private endpoints with the Speech Services. Private
endpoints can be used with all protocols supported by the Cognitive Services resource,
including REST.
Private endpoints can be created in subnets that use Service Endpoints. Clients in a
subnet can connect to one Cognitive Services resource using private endpoint, while
using service endpoints to access others.
When you create a private endpoint for a Cognitive Services resource in your VNet, a
consent request is sent for approval to the Cognitive Services resource owner. If the user
requesting the creation of the private endpoint is also an owner of the resource, this
consent request is automatically approved.
Cognitive Services resource owners can manage consent requests and the private
endpoints, through the 'Private endpoints' tab for the Cognitive Services resource in the
Azure portal .
Private endpoints
When creating the private endpoint, you must specify the Cognitive Services resource it
connects to. For more information on creating a private endpoint, see:
Create a private endpoint using the Private Link Center in the Azure portal
Create a private endpoint using Azure CLI
Create a private endpoint using Azure PowerShell
7 Note
Azure OpenAI Service uses a different private DNS zone and public DNS zone
forwarder than other Azure Cognitive Services. Refer to the Azure services DNS
zone configuration article for the correct zone and forwader names.
Clients on a VNet using the private endpoint should use the same connection string for
the Cognitive Services resource as clients connecting to the public endpoint. The
exception is the Speech Services, which require a separate endpoint. See the section on
Private endpoints with the Speech Services. We rely upon DNS resolution to
automatically route the connections from the VNet to the Cognitive Services resource
over a private link.
We create a private DNS zone attached to the VNet with the necessary updates for the
private endpoints, by default. However, if you're using your own DNS server, you may
need to make more changes to your DNS configuration. The section on DNS changes
below describes the updates required for private endpoints.
When you resolve the endpoint URL from outside the VNet with the private endpoint, it
resolves to the public endpoint of the Cognitive Services resource. When resolved from
the VNet hosting the private endpoint, the endpoint URL resolves to the private
endpoint's IP address.
This approach enables access to the Cognitive Services resource using the same
connection string for clients in the VNet hosting the private endpoints and clients
outside the VNet.
If you're using a custom DNS server on your network, clients must be able to resolve the
fully qualified domain name (FQDN) for the Cognitive Services resource endpoint to the
private endpoint IP address. Configure your DNS server to delegate your private link
subdomain to the private DNS zone for the VNet.
Tip
When using a custom or on-premises DNS server, you should configure your DNS
server to resolve the Cognitive Services resource name in the 'privatelink'
subdomain to the private endpoint IP address. You can do this by delegating the
'privatelink' subdomain to the private DNS zone of the VNet, or configuring the
DNS zone on your DNS server and adding the DNS A records.
For more information on configuring your own DNS server to support private endpoints,
see the following articles:
Pricing
For pricing details, see Azure Private Link pricing .
Next steps
Explore the various Azure Cognitive Services
Learn more about Azure Virtual Network Service Endpoints
Use Azure OpenAI with large datasets
Article • 02/17/2023
Azure OpenAI can be used to solve a large number of natural language tasks through
prompting the completion API. To make it easier to scale your prompting workflows
from a few examples to large datasets of examples, we have integrated the Azure
OpenAI service with the distributed machine learning library SynapseML . This
integration makes it easy to use the Apache Spark distributed computing framework
to process millions of prompts with the OpenAI service. This tutorial shows how to apply
large language models at a distributed scale using Azure Open AI and Azure Synapse
Analytics.
Prerequisites
An Azure subscription - Create one for free
Currently, access to this service is granted only by application. You can apply for
access to Azure OpenAI by completing the form at https://aka.ms/oai/access .
Open an issue on this repo to contact us if you have an issue.
1. Download this demo as a notebook (click Raw, then save the file)
2. Import the notebook into the Synapse Workspace or, if using Databricks, into the
Databricks Workspace
3. Install SynapseML on your cluster. See the installation instructions for Synapse at
the bottom of the SynapseML website . This requires pasting another cell at the
top of the notebook you imported
4. Connect your notebook to a cluster and follow along, editing and running the cells
below.
) Important
Remember to remove the key from your code when you're done, and never post it
publicly. For production, use a secure way of storing and accessing your credentials
like Azure Key Vault. See the Cognitive Services security article for more
information.
Python
import os
# Replace the following values with your Azure OpenAI resource information
You can also load data directly from Azure Data Lake Storage (ADLS) or other databases.
For more information about loading and preparing Spark dataframes, see the Apache
Spark data loading guide .
Python
df = spark.createDataFrame(
("SynapseML is ",),
).toDF("prompt")
Python
completion = (
OpenAICompletion()
.setSubscriptionKey(key)
.setDeploymentName(deployment_name)
.setUrl("https://{}.openai.azure.com/".format(resource_name))
.setMaxTokens(200)
.setPromptCol("prompt")
.setErrorCol("error")
.setOutputCol("completions")
Python
from pyspark.sql.functions import col
completed_df = completion.transform(df).cache()
display(completed_df.select(
col("prompt"), col("error"),
col("completions.choices.text").getItem(0).alias("text")))
Your output should look something like the following example; note that the completion
text can vary.
Hello my name is undefined Makaveli I'm eighteen years old and I want to
Angeles, CA
7 Note
Python
batch_df = spark.createDataFrame(
[
(["The time has come", "Pleased to", "Today stocks", "Here's to"],),
(["The only thing", "Ask not what", "Every litter", "I am"],),
).toDF("batchPrompt")
Next we create the OpenAICompletion object. Rather than setting the prompt column, set
the batchPrompt column if your column is of type Array[String] .
Python
batch_completion = (
OpenAICompletion()
.setSubscriptionKey(key)
.setDeploymentName(deployment_name)
.setUrl("https://{}.openai.azure.com/".format(resource_name))
.setMaxTokens(200)
.setBatchPromptCol("batchPrompt")
.setErrorCol("error")
.setOutputCol("completions")
In the call to transform, a request will then be made per row. Because there are multiple
prompts in a single row, each request will be sent with all prompts in that row. The
results will contain a row for each row in the request.
Python
completed_batch_df = batch_completion.transform(batch_df).cache()
display(completed_batch_df)
7 Note
Python
completed_autobatch_df = (df
.mlTransform(FixedMiniBatchTransformer(batchSize=4))
.withColumnRenamed("prompt", "batchPrompt")
.mlTransform(batch_completion))
display(completed_autobatch_df)
Python
translate_df = spark.createDataFrame(
).toDF("prompt")
display(completion.transform(translate_df))
Python
qa_df = spark.createDataFrame(
).toDF("prompt")
display(completion.transform(qa_df))
Azure OpenAI automatically encrypts your data when it's persisted to the cloud. The
encryption protects your data and helps you meet your organizational security and
compliance commitments. This article covers how Azure OpenAI handles encryption of
data at rest, specifically training data and fine-tuned models. For information on how
data provided by you to the service is processed, used, and stored, consult the data,
privacy, and security article.
You must use Azure Key Vault to store your customer-managed keys. You can either
create your own keys and store them in a key vault, or you can use the Azure Key Vault
APIs to generate keys. The Cognitive Services resource and the key vault must be in the
same region and in the same Azure Active Directory (Azure AD) tenant, but they can be
in different subscriptions. For more information about Azure Key Vault, see What is
Azure Key Vault?.
To request the ability to use customer-managed keys, fill out and submit the Cognitive
Services Customer-Managed Key Request Form . It will take approximately 3-5
business days to hear back on the status of your request.
To enable customer-managed keys, you must also enable both the Soft Delete and Do
Not Purge properties on the key vault.
Only RSA keys of size 2048 are supported with Cognitive Services encryption. For more
information about keys, see Key Vault keys in About Azure Key Vault keys, secrets and
certificates.
Specify a key
After you enable customer-managed keys, you can specify a key to associate with the
Cognitive Services resource.
3. Select the desired key, and then select the key to view its versions. Select a key
version to view the settings for that version.
7. Paste the URI that you copied into the Key URI box.
8. Under Subscription, select the subscription that contains the key vault.
3. Select the key vault that contains the key that you want to use.
To change the access policy of the managed identity that your registry uses, run the az-
keyvault-delete-policy command:
Azure CLI
az keyvault delete-policy \
--resource-group <resource-group-name> \
--name <key-vault-name> \
--key_id <key-vault-key-id>
To delete the individual versions of a key, run the az-keyvault-key-delete command. This
operation requires the keys/delete permission.
Azure CLI
--name <key-vault-name> \
--object-id $identityPrincipalID \
) Important
Revoking access to an active customer-managed key while CMK is still enabled will
prevent downloading of training data and results files, fine-tuning new models, and
deploying fine-tuned models. However, previously deployed fine-tuned models will
continue to operate and serve traffic until those deployments are deleted.
When you previously enabled customer managed keys this also enabled a system
assigned managed identity, a feature of Azure AD. Once the system assigned managed
identity is enabled, this resource will be registered with Azure Active Directory. After
being registered, the managed identity will be given access to the Key Vault selected
during customer managed key setup. You can learn more about Managed Identities.
) Important
If you disable system assigned managed identities, access to the key vault will be
removed and any data encrypted with the customer keys will no longer be
accessible. Any features depended on this data will stop working.
) Important
Azure OpenAI is available in multiple regions. Since subscription keys are region bound,
when a customer acquires a key, they select the region in which their deployments will
reside and from then on, all operations stay associated with that Azure server region.
It's rare, but not impossible, to encounter a network issue that hits an entire region. If
your service needs to always be available, then you should design it to either fail-over
into another region or split the workload between two or more regions. Both
approaches require at least two Azure OpenAI resources in different regions. This article
provides general recommendations for how to implement Business Continuity and
Disaster Recovery (BCDR) for your Azure OpenAI applications.
Best practices
Today customers will call the endpoint provided during deployment for both
deployments and inference. These operations are stateless, so no data is lost in the case
that a region becomes unavailable.
Business continuity
The following set of instructions applies both customers using default endpoints and
those using custom endpoints.
1. Use the models page to identify the list of available regions for Azure OpenAI.
2. Select a primary and one secondary/backup regions from the list.
4. For the primary region and any backup regions your code will need to know:
5. Configure your code so that you monitor connectivity errors (typically connection
timeouts and service unavailability errors).
a. Given that networks yield transient errors, for single connectivity issue
occurrences, the suggestion is to retry.
b. For persistence redirect traffic to the backup resource in the region you've
created.
When you have critical applications and business processes relying on Azure resources, you want to
monitor those resources for their availability, performance, and operation.
This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of
Cognitive Services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor
common to all Azure services that use it, read Monitoring Azure resources with Azure Monitor.
Monitoring data
Azure OpenAI collects the same kinds of monitoring data as other Azure resources that are described in
Monitoring data from Azure resources.
Resource Logs aren't collected and stored until you create a diagnostic setting and route them to one or
more locations.
See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for
creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic
setting, you specify which categories of logs to collect.
Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has additional costs
associated with it. To understand more, consult the Azure Monitor cost calculation guide.
The metrics and logs you can collect are discussed in the following sections.
Analyzing metrics
You can analyze metrics for Azure OpenAI by opening Metrics which can be found underneath the
Monitoring section when viewing your Azure OpenAI resource in the Azure portal. See Getting started
with Azure Metrics Explorer for details on using this tool.
Azure OpenAI is a part of Cognitive Services. For a list of all platform metrics collected for Cognitive
Services and Azure OpenAI, see Cognitive Services supported metrics.
Analyzing logs
Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.
All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common
schema is outlined in Azure Monitor resource log schema.
The Activity log is a type of platform log in Azure that provides insight into subscription-level events. You
can view it independently or route it to Azure Monitor Logs, where you can do much more complex
queries using Log Analytics.
For a list of the types of resource logs available for Azure OpenAI and other Cognitive Services, see
Resource provider operations for Cognitive Services
Kusto queries
) Important
When you select Logs from the Azure OpenAI menu, Log Analytics is opened with the query scope
set to the current Azure OpenAI resource. This means that log queries will only include data from that
resource. If you want to run a query that includes data from other resources or data from other Azure
services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure
Monitor Log Analytics for details.
To explore and get a sense of what type of information is available for your Azure OpenAI resource a
useful query to start with once you have deployed a model and sent some completion calls through the
playground is as follows:
Kusto
AzureDiagnostics
| take 100
Here we return a sample of 100 entries and are displaying a subset of the available columns of data in the
logs. The results are as follows:
If you wish to see all available columns of data, you can remove the scoping that is provided by the |
project line:
Kusto
AzureDiagnostics
| take 100
You can also select the arrow next to the table name to view all available columns and associated data
types.
Kusto
AzureMetrics
| take 100
Alerts
Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data.
They allow you to identify and address issues in your system before your customers notice them. You can
set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and
drawbacks.
Every organization's alerting needs are going to vary, and will also evolve over time. Generally all alerts
should be actionable, with a specific intended response if the alert occurs. If there's no action for someone
to take, then it might be something you want to capture in a report, but not in an alert. Some use cases
may require alerting anytime certain error conditions exist. But in many environments, it might only be in
cases where errors exceed a certain threshold for a period of time where sending an alert is warranted.
Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor
Logs. As you analyze your log data over time, you may also find that a certain condition not occurring for
a long enough period of time might be valuable to track with alerts. Sometimes the absence of an event in
a log is just as important a signal as an error.
Depending on what type of application you're developing in conjunction with your use of Azure OpenAI,
Azure Monitor Application Insights may offer additional monitoring benefits at the application layer.
Next steps
See Monitoring Azure resources with Azure Monitor for details on monitoring Azure resources.
Read Understand log searches in Azure Monitor logs.
Plan to manage costs for Azure OpenAI
Service
Article • 04/05/2023
This article describes how you plan for and manage costs for Azure OpenAI Service.
Before you deploy the service, you can use the Azure pricing calculator to estimate costs
for Azure OpenAI. Later, as you deploy Azure resources, review the estimated costs.
After you've started using Azure OpenAI resources, use Cost Management features to
set budgets and monitor costs. You can also review forecasted costs and identify
spending trends to identify areas where you might want to act. Costs for Azure OpenAI
Service are only a portion of the monthly costs in your Azure bill. Although this article
explains how to plan for and manage costs for Azure OpenAI, you're billed for all Azure
services and resources used in your Azure subscription, including the third-party
services.
Prerequisites
Cost analysis in Cost Management supports most Azure account types, but not all of
them. To view the full list of supported account types, see Understand Cost
Management data. To view cost data, you need at least read access for an Azure
account. For information about assigning access to Azure Cost Management data, see
Assign access to data.
Our models understand and process text by breaking it down into tokens. For reference,
each token is roughly four characters for typical English text.
Token costs are for both input and output. For example, if you have a 1,000 token
JavaScript code sample that you ask an Azure OpenAI model to convert to Python. You
would be charged approximately 1,000 tokens for the initial input request sent, and
1,000 more tokens for the output that is received in response for a total of 2,000 tokens.
In practice, for this type of completion call the token input/output wouldn't be perfectly
1:1. A conversion from one programming language to another could result in a longer
or shorter output depending on many different factors including the value assigned to
the max_tokens parameter.
Training hours
Hosting hours
Inference per 1,000 tokens
The hosting hours cost is important to be aware of since once a fine-tuned model is
deployed it continues to incur an hourly cost regardless of whether you're actively using
it. Fine-tuned model costs should be monitored closely.
) Important
Monitor costs
As you use Azure resources with Azure OpenAI, you incur costs. Azure resource usage
unit costs vary by time intervals (seconds, minutes, hours, and days) or by unit usage
(bytes, megabytes, and so on.) As soon as Azure OpenAI use starts, costs can be
incurred and you can see the costs in cost analysis.
When you use cost analysis, you view Azure OpenAI costs in graphs and tables for
different time intervals. Some examples are by day, current and prior month, and year.
You also view costs against budgets and forecasted costs. Switching to longer views
over time can help you identify spending trends. And you see where overspending
might have occurred. If you've created budgets, you can also easily see where they're
exceeded.
To understand the breakdown of what makes up that cost, it can help to modify Group
by to Meter and in this case switching the chart type to Line. You can now see that for
this particular resource the source of the costs is from three different model series with
Text-Davinci Tokens representing the bulk of the costs.
It's important to understand scope when evaluating costs associated with Azure OpenAI.
If your resources are part of the same resource group you can scope Cost Analysis at
that level to understand the effect on costs. If your resources are spread across multiple
resource groups you can scope to the subscription level.
However, when scoped at a higher level you often need to add additional filters to be
able to zero in on Azure OpenAI usage. When scoped at the subscription level we see a
number of other resources that we may not care about in the context of Azure OpenAI
cost management. When scoping at the subscription level, we recommend navigating to
the full Cost analysis tool under the Cost Management service. Search for "Cost
Management" in the top Azure search bar to navigate to the full service experience,
which includes more options like creating budgets.
If you try to add a filter by service, you'll find that you can't find Azure OpenAI in the list.
This is because technically Azure OpenAI is part of Cognitive Services so the service level
filter is Cognitive Services, but if you want to see all Azure OpenAI resources across a
subscription without any other type of Cognitive Services resources you need to instead
scope to Service tier: Azure OpenAI:
Create budgets
You can create budgets to manage costs and create alerts that automatically notify
stakeholders of spending anomalies and overspending risks. Alerts are based on
spending compared to budget and cost thresholds. Budgets and alerts are created for
Azure subscriptions and resource groups, so they're useful as part of an overall cost
monitoring strategy.
Budgets can be created with filters for specific resources or services in Azure if you want
more granularity present in your monitoring. Filters help ensure that you don't
accidentally create new resources that cost you additional money. For more information
about the filter options available when you create a budget, see Group and filter
options.
) Important
While OpenAI has an option for hard limits that will prevent you from going over
your budget, Azure OpenAI does not currently provide this functionality. You are
able to kick off automation from action groups as part of your budget notifications
to take more advanced actions, but this requires additional custom development
on your part.
Next steps
Learn how to optimize your cloud investment with Azure Cost Management.
Learn more about managing costs with cost analysis.
Learn about how to prevent unexpected costs.
Take the Cost Management guided learning course.
Tutorial: Explore Azure OpenAI Service
embeddings and document search
Article • 06/14/2023
This tutorial will walk you through using the Azure OpenAI embeddings API to perform
document search where you'll query a knowledge base to find the most relevant
document.
) Important
Prerequisites
An Azure subscription - Create one for free
Access granted to Azure OpenAI in the desired Azure subscription
Currently,
access to this service is granted only by application. You can apply for access to
Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an
issue on this repo to contact us if you have an issue.
Python 3.7.1 or later version
The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-
learn, pandas, tiktoken.
Jupyter Notebooks
An Azure OpenAI resource with the text-embedding-ada-002 (Version 2) model
deployed. This model is currently only available in certain regions. If you don't have
a resource the process of creating one is documented in our resource deployment
guide.
Set up
Python libraries
If you haven't already, you need to install the following libraries:
This tutorial uses the bill_sum_data.csv file that can be downloaded from our GitHub
sample data .
You can also download the sample data by running the following command on your
local machine:
curl "https://raw.githubusercontent.com/Azure-Samples/Azure-OpenAI-Docs-
Samples/main/Samples/Tutorials/Embeddings/data/bill_sum_data.csv" --output
bill_sum_data.csv
ENDPOINT This value can be found in the Keys & Endpoint section when examining your
resource from the Azure portal. Alternatively, you can find the value in Azure OpenAI
Studio > Playground > Code View. An example endpoint is: https://docs-test-
001.openai.azure.com .
API-KEY This value can be found in the Keys & Endpoint section when examining your
resource from the Azure portal. You can use either KEY1 or KEY2 .
Go to your resource in the Azure portal. The Endpoint and Keys can be found in the
Resource Management section. Copy your endpoint and access key as you'll need both
for authenticating your API calls. You can use either KEY1 or KEY2 . Always having two
keys allows you to securely rotate and regenerate keys without causing a service
disruption.
Create and assign persistent environment variables for your key and endpoint.
Environment variables
Command Line
CMD
CMD
After setting the environment variables, you may need to close and reopen Jupyter
notebooks or whatever IDE you're using in order for the environment variables to be
accessible. While we strongly recommend using Jupyter Notebooks, if for some reason
you cannot you'll need to modify any code that is returning a pandas dataframe by
using print(dataframe_name) rather than just calling the dataframe_name directly as is
often done at the end of a code block.
import openai
import os
import re
import requests
import sys
import os
import pandas as pd
import numpy as np
import tiktoken
API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_type = "azure"
openai.api_key = API_KEY
openai.api_base = RESOURCE_ENDPOINT
openai.api_version = "2022-12-01"
print(r.text)
Output
"data": [
"scale_settings": {
"scale_type": "standard"
},
"model": "text-embedding-ada-002",
"owner": "organization-owner",
"id": "text-embedding-ada-002",
"status": "succeeded",
"created_at": 1657572678,
"updated_at": 1657572678,
"object": "deployment"
},
"scale_settings": {
"scale_type": "standard"
},
"model": "code-cushman-001",
"owner": "organization-owner",
"id": "code-cushman-001",
"status": "succeeded",
"created_at": 1657572712,
"updated_at": 1657572712,
"object": "deployment"
},
"scale_settings": {
"scale_type": "standard"
},
"model": "text-search-curie-doc-001",
"owner": "organization-owner",
"id": "text-search-curie-doc-001",
"status": "succeeded",
"created_at": 1668620345,
"updated_at": 1668620345,
"object": "deployment"
},
"scale_settings": {
"scale_type": "standard"
},
"model": "text-search-curie-query-001",
"owner": "organization-owner",
"id": "text-search-curie-query-001",
"status": "succeeded",
"created_at": 1669048765,
"updated_at": 1669048765,
"object": "deployment"
],
"object": "list"
The output of this command will vary based on the number and type of models you've
deployed. In this case, we need to confirm that we have an entry for text-embedding-
ada-002. If you find that you're missing this model, you'll need to deploy the model to
your resource before proceeding.
Now we need to read our csv file and create a pandas DataFrame. After the initial
DataFrame is created, we can view the contents of the table by running df .
Python
df
Output:
The initial table has more columns than we need we'll create a new smaller DataFrame
called df_bills which will contain only the columns for text , summary , and title .
Python
df_bills
Output:
Next we'll perform some light data cleaning by removing redundant whitespace and
cleaning up the punctuation to prepare the data for tokenization.
Python
# s is input text
s = re.sub(r". ,","",s)
s = s.replace("..",".")
s = s.replace(". .",".")
s = s.replace("\n", "")
s = s.strip()
return s
Now we need to remove any bills that are too long for the token limit (8192 tokens).
Python
tokenizer = tiktoken.get_encoding("cl100k_base")
df_bills['n_tokens'] = df_bills["text"].apply(lambda x:
len(tokenizer.encode(x)))
df_bills = df_bills[df_bills.n_tokens<8192]
len(df_bills)
Output
20
7 Note
In this case all bills are under the embedding model input token limit, but you can
use the technique above to remove entries that would otherwise cause embedding
to fail. When faced with content that exceeds the embedding limit, you can also
chunk the content into smaller pieces and then embed those one at a time.
Python
df_bills
Output:
To understand the n_tokens column a little more as well how text ultimately is tokenized,
it can be helpful to run the following code:
Python
sample_encode = tokenizer.encode(df_bills.text[0])
decode = tokenizer.decode_tokens_bytes(sample_encode)
decode
For our docs we're intentionally truncating the output, but running this command in
your environment will return the full text from index zero tokenized into chunks. You can
see that in some cases an entire word is represented with a single token whereas in
others parts of words are split across multiple tokens.
Output
[b'SECTION',
b' ',
b'1',
b'.',
b' SHORT',
b' TITLE',
b'.',
b' This',
b' Act',
b' may',
b' be',
b' cited',
b' as',
b' the',
b' ``',
b'National',
b' Science',
b' Education',
b' Tax',
b' In',
b'cent',
b'ive',
b' for',
b' Businesses',
b' Act',
b' of',
b' ',
b'200',
b'7',
b"''.",
b' SEC',
b'.',
b' ',
b'2',
b'.',
b' C',
b'RED',
b'ITS',
b' FOR',
b' CERT',
b'AIN',
b' CONTRIBUT',
b'IONS',
b' BEN',
b'EF',
b'IT',
b'ING',
b' SC',
If you then check the length of the decode variable, you'll find it matches the first
number in the n_tokens column.
Python
len(decode)
Output
1466
Now that we understand more about how tokenization works we can move on to
embedding. It is important to note, that we haven't actually tokenized the documents
yet. The n_tokens column is simply a way of making sure none of the data we pass to
the model for tokenization and embedding exceeds the input token limit of 8,192. When
we pass the documents to the embeddings model, it will break the documents into
tokens similar (though not necessarily identical) to the examples above and then convert
the tokens to a series of floating point numbers that will be accessible via vector search.
These embeddings can be stored locally or in an Azure Database. As a result, each bill
will have its own corresponding embedding vector in the new ada_v2 column on the
right side of the DataFrame.
Python
Python
df_bills
Output:
As we run the search code block below, we'll embed the search query "Can I get
information on cable company tax revenue?" with the same text-embedding-ada-002
(Version 2) model. Next we'll find the closest bill embedding to the newly embedded
text from our query ranked by cosine similarity.
Python
embedding = get_embedding(
user_query,
res = (
df.sort_values("similarities", ascending=False)
.head(top_n)
if to_print:
display(res)
return res
Output:
Finally, we'll show the top result from document search based on user query against the
entire knowledge base. This returns the top result of the "Taxpayer's Right to View Act of
1993". This document has a cosine similarity score of 0.76 between the query and the
document:
Python
res["summary"][9]
Output
Using this approach, you can use embeddings as a search mechanism across documents
in a knowledge base. The user can then take the top search result and use it for their
downstream task, which prompted their initial query.
Clean up resources
If you created an OpenAI resource solely for completing this tutorial and want to clean
up and remove an OpenAI resource, you'll need to delete your deployed models, and
then delete the resource or associated resource group if it's dedicated to your test
resource. Deleting the resource group also deletes any other resources associated with
it.
Portal
Azure CLI
Next steps
Learn more about Azure OpenAI's models:
) Important
To complete the steps in this guide, access must be granted to Microsoft Azure
OpenAI Service in the desired Azure subscription. Currently, access to this service is
granted only by application. You can apply for access to Azure OpenAI by
completing the form at https://aka.ms/oai/access .
In this how-to guide, you can use Azure Cognitive Services Speech to converse with
Azure OpenAI Service. The text recognized by the Speech service is sent to Azure
OpenAI. The text response from Azure OpenAI is then synthesized by the Speech
service.
The Speech service recognizes your speech and converts it into text (speech to
text).
Your request as text is sent to Azure OpenAI.
The Speech service text to speech (TTS) feature synthesizes the response from
Azure OpenAI to the default speaker.
Prerequisites
" Azure subscription - Create one for free
" Create a Speech resource in the Azure portal.
" Get the Speech resource key and region. After your Speech resource is deployed,
select Go to resource to view and manage keys. For more information about
Cognitive Services resources, see Get the keys for your resource.
Tip
Don't include the key directly in your code, and never post it publicly. See the
Cognitive Services security article for more authentication options like Azure Key
Vault.
To set the environment variables, open a console window, and follow the instructions for
your operating system and development environment.
Windows
Console
7 Note
If you only need to access the environment variable in the current running
console, you can set the environment variable with set instead of setx .
After you add the environment variables, you may need to restart any running
programs that will need to read the environment variable, including the console
window. For example, if you are using Visual Studio as your editor, restart Visual
Studio before running the example.
1. Open a command prompt where you want the new project, and create a console
application with the .NET CLI. The Program.cs file should be created in the project
directory.
.NET CLI
2. Install the Speech SDK in your new project with the .NET CLI.
.NET CLI
3. Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.
.NET CLI
C#
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Azure;
using Azure.AI.OpenAI;
class Program
// Enter the deployment name you chose when you deployed the model.
Prompts = { prompt },
MaxTokens = 100,
};
Response<Completions> completionsResponse =
client.GetCompletions(engine, completionsOptions);
speechConfig.SpeechSynthesisVoiceName = "en-US-
JennyMultilingualNeural";
if (speechSynthesisResult.Reason ==
ResultReason.SynthesizingAudioCompleted)
else if (speechSynthesisResult.Reason ==
ResultReason.Canceled)
var cancellationDetails =
SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
if (cancellationDetails.Reason ==
CancellationReason.Error)
Console.WriteLine($"Error details:
{cancellationDetails.ErrorDetails}");
speechConfig.SpeechRecognitionLanguage = "en-US";
while(!conversationEnded)
{
switch (speechRecognitionResult.Reason)
case ResultReason.RecognizedSpeech:
if (speechRecognitionResult.Text == "Stop.")
Console.WriteLine("Conversation ended.");
conversationEnded = true;
else
Console.WriteLine($"Recognized speech:
{speechRecognitionResult.Text}");
await
AskOpenAI(speechRecognitionResult.Text).ConfigureAwait(true);
break;
case ResultReason.NoMatch:
break;
case ResultReason.Canceled:
var cancellationDetails =
CancellationDetails.FromResult(speechRecognitionResult);
if (cancellationDetails.Reason ==
CancellationReason.Error)
Console.WriteLine($"Error details=
{cancellationDetails.ErrorDetails}");
break;
try
await ChatWithOpenAI().ConfigureAwait(true);
Console.WriteLine(ex.Message);
Run your new console application to start speech recognition from a microphone:
Console
dotnet run
) Important
Make sure that you set the OPEN_AI_KEY , OPEN_AI_ENDPOINT , SPEECH__KEY and
SPEECH__REGION environment variables as described previously. If you don't set
Speak into your microphone when prompted. The console output includes the prompt
for you to begin speaking, then your request as text, and then the response from Azure
OpenAI as text. The response from Azure OpenAI should be converted from text to
speech and then output to the default speaker.
Console
Conversation ended.
PS C:\dev\openai\csharp>
Remarks
Now that you've completed the quickstart, here are some more considerations:
To change the speech recognition language, replace en-US with another supported
language. For example, es-ES for Spanish (Spain). The default language is en-US if
you don't specify a language. For details about how to identify one of multiple
languages that might be spoken, see language identification.
To change the voice that you hear, replace en-US-JennyMultilingualNeural with
another supported voice. If the voice doesn't speak the language of the text
returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
To use a different model, replace text-davinci-002 with the ID of another
deployment. Keep in mind that the deployment ID isn't necessarily the same as the
model name. You named your deployment when you created it in Azure OpenAI
Studio .
Azure OpenAI also performs content moderation on the prompt inputs and
generated outputs. The prompts or responses may be filtered if harmful content is
detected. For more information, see the content filtering article.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the
Speech resource you created.
Next steps
Learn more about Speech
Learn more about Azure OpenAI
Overview of Responsible AI practices for
Azure OpenAI models
Article • 05/19/2023
Many of the Azure OpenAI models are generative AI models that have demonstrated
improvements in advanced capabilities such as content and code generation,
summarization, and search. With many of these improvements also come increased
responsible AI challenges related to harmful content, manipulation, human-like
behavior, privacy, and more. For more information about the capabilities, limitations and
appropriate use cases for these models, please review the Transparency Note.
1. Identify : Identify and prioritize potential harms that could result from your AI
system through iterative red-teaming, stress-testing, and analysis.
2. Measure : Measure the frequency and severity of those harms by establishing clear
metrics, creating measurement test sets, and completing iterative, systematic
testing (both manual and automated).
3. Mitigate : Mitigate harms by implementing tools and strategies such as prompt
engineering and using our content filters. Repeat measurement to test
effectiveness after implementing mitigations.
4. Operate : Define and execute a deployment and operational readiness plan.
Identify
Identifying potential harms that could occur in or be caused by an AI system is the first
stage of the Responsible AI lifecycle. The earlier you begin to identify potential harms,
the more effective you can be at mitigating the harms. When assessing potential harms,
it is important to develop an understanding of the types of harms that could result from
using the Azure OpenAI Service in your specific context(s). In this section, we provide
recommendations and resources you can use to identify harms through an impact
assessment, iterative red team testing, stress-testing, and analysis. Red teaming and
stress-testing are approaches where a group of testers come together and intentionally
probe a system to identify its limitations, risk surface, and vulnerabilities.
These steps have the goal of producing a prioritized list of potential harms for each
specific scenario.
1. Identify harms that are relevant for your specific model, application, and
deployment scenario.
a. Identify potential harms associated with the model and model capabilities (for
example, GPT-3 model vs GPT-4 model) that you're using in your system. This is
important to consider because each model has different capabilities, limitations,
and risks, as described more fully in the sections above.
b. Identify any other harms or increased scope of harm presented by the intended
use of the system you're developing. Consider using a Responsible AI Impact
Assessment to identify potential harms.
i. For example, let's consider an AI system that summarizes text. Some uses of
text generation are lower risk than others. For example, if the system is to be
used in a healthcare domain for summarizing doctor's notes, the risk of harm
arising from inaccuracies is higher than if the system is summarizing online
articles.
2. Prioritize harms based on elements of risk such as frequency and severity. Assess
the level of risk for each harm and the likelihood of each risk occurring in order to
prioritize the list of harms you've identified. Consider working with subject matter
experts and risk managers within your organization and with relevant external
stakeholders when appropriate.
3. Conduct red team testing and stress testing starting with the highest priority
harms, to develop a better understanding of whether and how the identified harms
are actually occurring in your scenario, as well as to identify new harms you didn't
initially anticipate.
4. Share this information with relevant stakeholders using your organization's
internal compliance processes.
At the end of this Identify stage, you should have a documented, prioritized list of
harms. When new harms and new instances of harms emerge through further testing
and use of the system, you can update and improve this list by following the above
process again.
Measure
Once a list of prioritized harms has been identified, the next stage involves developing
an approach for systematic measurement of each harm and conducting evaluations of
the AI system. There are manual and automated approaches to measurement. We
recommend you do both, starting with manual measurement.
1. Create inputs that are likely to produce each prioritized harm: Create
measurement set(s) by generating many diverse examples of targeted inputs that
are likely to produce each prioritized harm.
2. Generate System Outputs: Pass in the examples from the measurement sets as
inputs to the system to generate system outputs. Document the outputs.
At the end of this measurement stage, you should have a defined measurement
approach to benchmark how your system performs for each potential harm as well as an
initial set of documented results. As you continue implementing and testing mitigations,
the metrics and measurement sets should continue to be refined (for example, to add
metrics for new harms that were initially unanticipated) and the results updated.
Mitigate
Mitigating harms presented by large language models such as the Azure OpenAI
models requires an iterative, layered approach that includes experimentation and
continual measurement. We recommend developing a mitigation plan that
encompasses four layers of mitigations for the harms identified in the earlier stages of
this process:
1. At the model level, it's important to understand the model(s) you'll be using and
what fine-tuning steps may have been taken by the model developers to align the
model towards its intended uses and to reduce the risk of potentially harmful uses
and outcomes.
a. For example, for GPT-4, model developers have been able to use reinforcement
learning methods as a responsible AI tool to better align the model towards the
designers' intended goals.
2. At the safety system level, you should understand the platform level mitigations
that have been implemented. such as the Azure OpenAI content filters which help
to block the output of harmful content.
3. At the application level, application developers can implement metaprompt and
user-centered design and user experience mitigations. Metaprompts are
instructions provided to the model to guide its behavior; their use can make a
critical difference in guiding the system to behave in accordance with your
expectations. User-centered design and user experience (UX) interventions are also
key mitigation tools to prevent misuse and overreliance on AI.
4. At the positioning level, there are many ways to educate the people who will use
or be affected by your system about its capabilities and limitations.
1. Model level Mitigations: Review and identify which Azure OpenAI base model is
best suited for the system you're building and educate yourself about its
capabilities, limitations, and any measures taken to reduce the risk of the potential
harms you've identified. For example, if you're using GPT-4, in addition to reading
this Transparency Note, you can review OpenAI's GPT-4 System Card explaining
the safety challenges presented by the model and the safety processes that
OpenAI adopted to prepare GPT-4 for deployment. It may be worth experimenting
with different versions of the model(s) (including through red teaming and
measuring) to see how the harms present differently.
Operate
Once measurement and mitigation systems are in place, we recommend that you define
and execute a deployment and operational readiness plan. This stage includes
completing appropriate reviews of your system and mitigation plans with relevant
stakeholders, establishing pipelines to collect telemetry and feedback, and developing
an incident response and rollback plan.
Some recommendations for how to deploy and operate a system that uses the Azure
OpenAI service with appropriate, targeted harms mitigations include:
1. Work with compliance teams within your organization to understand what types of
reviews are required for your system and when they are required (for example,
legal review, privacy review, security review, accessibility review, etc.).
This document is not intended to be, and should not be construed as providing, legal
advice. The jurisdiction in which you're operating may have various regulatory or legal
requirements that apply to your AI system. Consult a legal specialist if you are uncertain
about laws or regulations that might apply to your system, especially if you think those
might impact these recommendations. Be aware that not all of these recommendations
and resources are appropriate for every scenario, and conversely, these
recommendations and resources may be insufficient for some scenarios.
Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI
Principles into practice. To find out more, see the Microsoft's AI principles .
Select the tabs to see content for the relevant model type.
Introduction
As part of the fully managed Azure OpenAI Service, the GPT-3 models analyze and
generate natural language, Codex models analyze and generate code and plain text
code commentary, and the GPT-4 models can understand and generate natural
language and code. These models use an autoregressive architecture, meaning they
use data from prior observations to predict the most probable next word. This
process is then repeated by appending the newly generated content to the original
text to produce the complete generated response. Because the response is
conditioned on the input text, these models can be applied to various tasks simply
by changing the input text.
The GPT-3 series of models are pretrained on a wide body of publicly available free
text data. This data is sourced from a combination of web crawling (specifically, a
filtered version of Common Crawl , which includes a broad range of text from the
internet and comprises 60 percent of the weighted pretraining dataset) and higher-
quality datasets, including an expanded version of the WebText dataset, two
internet-based books corpora and English-language Wikipedia. The GPT-4 base
model was trained using publicly available data (such as internet data) and data
that was licensed by OpenAI. The model was fine-tuned using reinforcement
learning with human feedback (RLHF).
Learn more about the training and modeling techniques in OpenAI's GPT-3 , GPT-
4 , and Codex research papers. The guidance below is also drawn from OpenAI's
safety best practices .
Key terms
Term Definition
Prompt The text you send to the service in the API call. This text is then input into the
model. For example, one might input the following prompt:
A:
Completion The text Azure OpenAI outputs in response. For example, the service may
or respond with the following answer to the above prompt: send-msg 'find
Generation greg' figure out if things are ready for Wednesday.
Token Azure OpenAI processes text by breaking it down into tokens. Tokens can be
words or just chunks of characters. For example, the word hamburger gets
broken up into the tokens ham , bur and ger , while a short and common word
like pear is a single token. Many tokens start with a whitespace, for example
hello and bye .
Capabilities
Text and code models
The GPT-4, GPT-3, and Codex Azure OpenAI Service models use natural language
instructions and examples in the prompt to identify the task. The model then
completes the task by predicting the most probable next text. This technique is
known as "in-context" learning. These models are not retrained during this step but
instead give predictions based on the context you include in the prompt.
There are three main approaches for in-context learning. These approaches vary
based on the amount of task-specific data that is given to the model:
Few-shot : In this case, a user includes several examples in the prompt that
demonstrate the expected answer format and content. The following example
shows a few-shot prompt providing multiple examples:
Q: Contact the ski store and figure out if I can getmy skis fixed before
I leave on Thursday
A:
The number of examples typically ranges from 0 to 100 depending on how many
can fit in the maximum input length for a single prompt. Few-shot learning enables
a major reduction in the amount of task-specific data required for accurate
predictions.
One-shot : This case is the same as the few-shot approach except only one example
is provided. The following example shows a one-shot prompt:
Convert the questions to a command:
A:
Zero-shot: In this case, no examples are provided to the model and only the task
request is provided. The following example shows a zero-shot prompt:
A:
Use cases
Text and code models
Intended uses
The GPT-4, GPT-3, and Codex models in the Azure OpenAI service can be used in
multiple scenarios. The following list isn't comprehensive, but it illustrates the
diversity of tasks that can be supported with appropriate mitigations:
Limitations
When it comes to large-scale natural language models and image models, there are
particular fairness and responsible AI issues to consider. People use language and
images to describe the world and to express their beliefs, assumptions, attitudes, and
values. As a result, publicly available text and image data typically used to train large-
scale natural language processing and image generation models contains societal biases
relating to race, gender, religion, age, and other groups of people, as well as other
undesirable content. These societal biases are reflected in the distributions of words,
phrases, and syntactic structures.
Please be advised that this section contains illustrative examples which include
terms and language that some individuals may find offensive.
Large-scale natural language and image models trained with such data can potentially
behave in ways that are unfair, unreliable, or offensive, in turn causing harms. Some of
the ways are listed here. We emphasize that these types of harms aren't mutually
exclusive. A single model can exhibit more than one type of harm, potentially relating to
multiple different groups of people. For example:
Allocation: These models can be used in ways that lead to unfair allocation of
resources or opportunities. For example, automated resume screening systems can
withhold employment opportunities from one gender if they're trained on resume
data that reflects the existing gender imbalance in a particular industry. Or the
DALL·E 2 model could be used to create imagery in the style of a known artist,
which could affect the value of the artist's work or the artist's life opportunities.
Quality of service: The Azure OpenAI models are trained primarily on English text
and images with English text descriptions. Languages other than English will
experience worse performance. English language varieties with less representation
in the training data might experience worse performance. The publicly available
images used to train the DALL·E models might reinforce public bias and other
undesirable content. The models are also unable to generate consistent
photorealistic images and comprehensive text at this time.
Stereotyping: These models can reinforce stereotypes. For example, when
translating "He is a nurse" and "She is a doctor" into a genderless language such as
Turkish and then back into English, many machine translation systems yield the
stereotypical (and incorrect) results of "She is a nurse" and "He is a doctor." With
DALL·E 2, when generating an image based on the prompt "Fatherless children,"
the model could generate images of Black children only, reinforcing harmful
stereotypes that may exist in publicly available images.
Demeaning: These models can demean people. For example, an open-ended
content generation system with inappropriate or insufficient mitigations might
produce offensive or demeaning to a particular group of people.
Overrepresentation and underrepresentation: These models can over- or under-
represent groups of people, or even erase their representation entirely. For
example, if text prompts that contain the word "gay" are detected as potentially
harmful or offensive, this could to the underrepresentation or even erasure of
legitimate image generations by or about the LGBTQIA+ community.
Inappropriate or offensive content: These models can produce other types of
inappropriate or offensive content. Examples include the ability to create images
that potentially contain harmful artifacts such as hate symbols; images that illicit
harmful connotations; images that relate to contested, controversial, or
ideologically polarizing topics; images that are manipulative; images that contain
sexually charged content that isn't caught by sexual-related content filters; and
images that relate to sensitive or emotionally charged topics. For example, a well-
intentioned text prompt aimed to create an image of the New York skyline with
clouds and airplanes flying over it might unintentionally generate images that illicit
sentiments related to the events surrounding 9/11.
Disinformation and misinformation about sensitive topics: Because DALL·E 2 is a
powerful image generation model, it can be used to produce disinformation and
misinformation that can be extremely harmful. For example, the model might
generate an image of a political leader engaging in activity of a violent or sexual
(or simply inaccurate) nature that might lead to defamation and other
consequential harms, including but not limited to public protests, political change,
or fake news.
Information reliability: Language model responses can fabricate content that may
sound reasonable but is nonsensical or inaccurate with respect to external
validation sources. Even when drawing responses from trusted source information,
responses may misrepresent that content.
False information: Azure OpenAI doesn't fact-check or verify content that is
provided by customers or users. Depending on how you've developed your
application, it might produce false information unless you've built in mitigations
(see Best practices for improving system performance below).
System performance
In many AI systems, performance is often defined in relation to accuracy—that is, how
often the AI system offers a correct prediction or output. With large-scale natural
language models and image models, two different users might look at the same output
and have different opinions of how useful or relevant it is, which means that
performance for these systems must be defined more flexibly. Here, we broadly consider
performance to mean that the application performs as you and your users expect,
including not generating harmful outputs.
Azure OpenAI service can support a wide range of applications like search, classification,
code generation, and image generation, each with different performance metrics and
mitigation strategies. There are several steps you can take to mitigate some of the
concerns listed under "Limitations" and to improve performance. Other important
mitigation techniques are outlined in the section Evaluating and integrating Azure
OpenAI for your use below.
Registration process
Azure OpenAI requires registration and is currently only available to approved enterprise
customers and partners. Customers who wish to use Azure OpenAI are required to
submit a registration form .
Customers must attest to any and all use cases for which they will use the service (the
use cases from which customers may select will populate in the form after selection of
the desired model(s) in Question 22 in the initial registration form). Customers who wish
to add additional use cases after initial onboarding must submit the additional use cases
using this form . The use of Azure OpenAI is limited to use cases that have been
selected in a registration form. Microsoft may require customers to re-verify this
information. Read more about example use cases and use cases to avoid here.
Customers who wish to modify content filters and modify abuse monitoring after they
have onboarded to the service are subject to additional eligibility criteria and scenario
restrictions. At this time, modified content filters and/or modified abuse monitoring for
Azure OpenAI Service are only available to managed customers and partners working
with Microsoft account teams and have additional use case restrictions. Customers
meeting these requirements can register here .
Access to the Azure OpenAI Service is subject to Microsoft's sole discretion based on
eligibility criteria and a vetting process, and customers must acknowledge that they
have read and understand the Azure terms of service for Azure OpenAI Service.
Azure OpenAI Service is made available to customers under the terms governing their
subscription to Microsoft Azure Services, including the Azure OpenAI section of the
Microsoft Product Terms . Please review these terms carefully as they contain
important conditions and obligations governing your use of Azure OpenAI Service.
Important links
Register to use Azure OpenAI
Add additional use cases (if needed)
Register to modify content filters and abuse monitoring (if needed)
See also
Code of conduct for Azure OpenAI Service integrations
Transparency note for Azure OpenAI Service
Characteristics and limitations for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Code of conduct for Azure OpenAI
Service
Article • 03/13/2023
The following Code of Conduct defines the requirements that all Azure OpenAI Service
implementations must adhere to in good faith. This code of conduct is in addition to the
Acceptable Use Policy in the Microsoft Online Services Terms .
Access requirements
Azure OpenAI Service is a Limited Access service that requires registration and is only
available to approved enterprise customers and partners. Customers who wish to use
this service are required to register through this form . To learn more, see Limited
Access to Azure OpenAI Service.
Content requirements
We prohibit the use of our service for generating content that can inflict harm on
individuals or society. Our content policies are intended to improve the safety of our
platform.
These content requirements apply to the output of all models developed by OpenAI and
hosted in Azure OpenAI, such as GPT-3, GPT-4, Codex models, and DALL·E 2, and
includes content provided as input to the service and content generated as output from
the service.
Azure OpenAI Service prohibits content that describes, features, or promotes child
sexual exploitation or abuse, whether or not prohibited by law. This includes sexual
content involving a child or that sexualizes a child.
Grooming
Azure OpenAI Service prohibits content that describes or is used for purposes of
grooming of children. Grooming is the act of an adult building a relationship with a
child for the purposes of exploitation, especially sexual exploitation. This includes
communicating with a child for the purpose of sexual exploitation, trafficking, or other
forms of exploitation.
Azure OpenAI Service prohibits content that describes, features, or promotes non-
consensual intimate activity.
Sexual solicitation
Azure OpenAI Service prohibits content that describes, features, or promotes, or is used
for, purposes of solicitation of commercial sexual activity and sexual services. This
includes encouragement and coordination of real sexual activity.
Trafficking
Azure OpenAI Service prohibits content describing or used for purposes of human
trafficking. This includes the recruitment of individuals, facilitation of transport, and
payment for, and the promotion of, exploitation of people such as forced labor,
domestic servitude, sexual slavery, forced marriages, and forced medical procedures.
Azure OpenAI Service prohibits content that describes, features, or promotes graphic
violence or gore.
Harmful Content
Azure OpenAI Service prohibits content that targets individual(s) or group(s) with
threats, intimidation, insults, degrading or demeaning language or images, promotion of
physical harm, or other abusive behavior such as stalking.
Content that directly supports unlawful active attacks or malware campaigns that cause
technical harms, such as delivering malicious executables, organizing denial of service
attacks, or managing command and control servers.
We may at times limit our service's ability to respond to particular topics, such as
probing for personal information or seeking opinions on sensitive topics or current
events.
We prohibit the use of Azure OpenAI Service for activities that significantly harm other
individuals, organizations, or society, including but not limited to use of the service for
purposes in conflict with the applicable Azure Legal Terms and the Microsoft Product
Terms .
Report abuse
If you suspect that Azure OpenAI Service is being used in a manner that is abusive or
illegal, infringes on your rights or the rights of other people, or violates these policies,
you can report it at the Report Abuse Portal .
See also
Limited access to Azure OpenAI Service
Transparency note for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Data, privacy, and security for Azure
OpenAI Service
Article • 06/23/2023
This article provides details regarding how data provided by you to the Azure OpenAI
service is processed, used, and stored. Azure OpenAI stores and processes data to
provide the service and to monitor for uses that violate the applicable product terms.
Please also see the Microsoft Products and Services Data Protection Addendum ,
which governs data processing by the Azure OpenAI Service except as otherwise
provided in the applicable Product Terms .
) Important
Your prompts (inputs) and completions (outputs), your embeddings, and your
training data:
The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the
OpenAI models in Microsoft’s Azure environment and the Service does NOT
interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).
Prompts and generated content. Prompts are submitted by the user, and content
is generated by the service, via the completions, chat completions, images and
embeddings operations.
Augmented data included with prompts. When using the "on your data" feature,
the service retrieves relevant data from a configured data store and augments the
prompt to produce generations that are grounded with your data.
Training & validation data. You can provide your own training data consisting of
prompt-completion pairs for the purposes of fine-tuning an OpenAI model.
1. How the Azure OpenAI Service processes your prompts to generate content
(including when additional data from a connected data source is added to a
prompt using Azure OpenAI on your data).
2. How the Azure OpenAI Service creates a fine-tuned (custom) model with your
training data.
3. How the Azure OpenAI Service and Microsoft personnel analyze prompts,
completions and images for harmful content and for patterns suggesting the use
of the service in a manner that violates the Code of Conduct or other applicable
product terms
As depicted in the diagram above, managed customers may apply to modify abuse
monitoring .
The models are stateless: no prompts or generations are stored in the model.
Additionally, prompts and generations are not used to train, retrain, or improve the
base models.
The Azure OpenAI "on your data" feature lets you connect data sources to ground the
generated results with your data. The data remains stored in the data source and
location you designate. No data is copied into the Azure OpenAI service. When a user
prompt is received, the service retrieves relevant data from the connected data source
and augments the prompt. The model processes this augmented prompt and the
generated content is returned as described above.
As depicted in the diagram above, managed customers may apply to modify abuse
monitoring .
Training data uploaded for fine-tuning is not used to train, retrain, or improve any
Microsoft or 3rd party base models.
Azure OpenAI abuse monitoring detects and mitigates instances of recurring content
and/or behaviors that suggest use of the service in a manner that may violate the code
of conduct or other applicable product terms. To detect and mitigate abuse, Azure
OpenAI stores all prompts and generated content securely for up to thirty (30) days. (No
prompts or completions are stored if the customer is approved for and elects to
configure abuse monitoring off, as described below.)
The data store where prompts and completions are stored is logically separated by
customer resource (each request includes the resource ID of the customer’s Azure
OpenAI resource). A separate data store is located in each region in which the Azure
OpenAI Service is available, and a customer’s prompts and generated content are stored
in the Azure region where the customer’s Azure OpenAI service resource is deployed,
within the Azure OpenAI service boundary. Human reviewers assessing potential abuse
can access prompts and completions data only when that data has been flagged by the
abuse monitoring system. The human reviewers are authorized Microsoft employees
who access the data via point wise queries using request IDs, Secure Access
Workstations (SAWs), and Just-In-Time (JIT) request approval granted by team
managers. For Azure OpenAI Service deployed in the European Economic Area, the
authorized Microsoft employees are located in the European Economic Area.
How can customers get an exemption from
abuse monitoring and human review?
Some customers may want to use the Azure OpenAI Service for a use case that involves
the processing of sensitive, highly confidential, or legally-regulated input data but where
the likelihood of harmful outputs and/or misuse is low. These customers may conclude
that they do not want or do not have the right to permit Microsoft to process such data
for abuse detection, as described above, due to their internal policies or applicable legal
regulations. To address these concerns, Microsoft allows customers who meet additional
Limited Access eligibility criteria and attest to specific use cases to apply to modify the
Azure OpenAI content management features by completing this form .
7 Note
The value of "false" for the "ContentLogging" attribute appears only if data storage
for abuse monitoring is turned off. Otherwise, this property will not appear in either
Azure portal or Azure CLI's output.
Prerequisites
1. Sign into Azure
2. Select the Azure Subscription which hosts the Azure OpenAI Service resource.
3. Navigate to the Overview page of the Azure OpenAI Service resource.
Logging status verification using the Azure portal:
2. Click on the JSON view link on the top right corner as shown in the image below.
There will be a value in the Capabilities list called "ContentLogging" which will appear
and be set to FALSE when logging for abuse monitoring is off.
JSON
"name":"ContentLogging",
"value":"false"
Logging status verification using the Azure CLI (or other management API):
Execute the following command in Azure CLI to see the same JSON data as shown in the
Azure portal above.
Azure CLI
To learn more about Microsoft's privacy and security commitments see the Microsoft
Trust Center .
Change log
Date Changes
23 Added information about data processing for new Azure on your data feature; removed
June information about abuse monitoring which is now available at Azure OpenAI Service
2023 abuse monitoring. Added summary note. Updated and streamlined content and updated
diagrams for additional clarity. added change log
See also
Limited access to Azure OpenAI Service
Code of conduct for Azure OpenAI Service integrations
Transparency note and use cases for Azure OpenAI Service
Characteristics and limitations for Azure OpenAI Service
Report abuse of Azure OpenAI Service through the Report Abuse Portal
Report problematic content to cscraireport@microsoft.com
Microsoft.CognitiveServices accounts
Article • 06/18/2023
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.CognitiveServices/accounts resource, add the following Bicep to your template.
Bicep
name: 'string'
location: 'string'
tags: {
tagName1: 'tagValue1'
tagName2: 'tagValue2'
sku: {
capacity: int
family: 'string'
name: 'string'
size: 'string'
tier: 'string'
kind: 'string'
identity: {
type: 'string'
userAssignedIdentities: {}
properties: {
allowedFqdnList: [
'string'
apiProperties: {
aadClientId: 'string'
aadTenantId: 'string'
eventHubConnectionString: 'string'
qnaAzureSearchEndpointId: 'string'
qnaAzureSearchEndpointKey: 'string'
qnaRuntimeEndpoint: 'string'
statisticsEnabled: bool
storageAccountConnectionString: 'string'
superUser: 'string'
websiteName: 'string'
customSubDomainName: 'string'
disableLocalAuth: bool
dynamicThrottlingEnabled: bool
encryption: {
keySource: 'string'
keyVaultProperties: {
identityClientId: 'string'
keyName: 'string'
keyVaultUri: 'string'
keyVersion: 'string'
locations: {
regions: [
customsubdomain: 'string'
name: 'string'
value: int
routingMethod: 'string'
migrationToken: 'string'
networkAcls: {
defaultAction: 'string'
ipRules: [
value: 'string'
virtualNetworkRules: [
id: 'string'
ignoreMissingVnetServiceEndpoint: bool
state: 'string'
publicNetworkAccess: 'string'
restore: bool
restrictOutboundNetworkAccess: bool
userOwnedStorage: [
identityClientId: 'string'
resourceId: 'string'
Property values
accounts
Valid characters:
tags Resource tags. Dictionary of tag names and values. See Tags in templates
Identity
Name Description
userAssignedIdentities The list of user assigned identities associated with the resource. The user identity dictionary key references will be ARM resource ids in the for
'/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identity
AccountProperties
allowedFqdnList string[]
Name Description Value
disableLocalAuth bool
networkAcls A collection of rules governing the accessibility from specific network locations. NetworkRuleSet
publicNetworkAccess Whether or not public endpoint access is allowed for this account. 'Disabled'
'Enabled'
restore bool
restrictOutboundNetworkAccess bool
ApiProperties
aadClientId (Metrics Advisor Only) The Azure AD Client Id (Application Id). string
eventHubConnectionString (Personalization Only) The flag to enable statistics of Bing Search. string
qnaAzureSearchEndpointKey (QnAMaker Only) The Azure Search endpoint key of QnAMaker. string
statisticsEnabled (Bing Search Only) The flag to enable statistics of Bing Search. bool
superUser (Metrics Advisor Only) The super user of Metrics Advisor. string
websiteName (Metrics Advisor Only) The website name of Metrics Advisor. string
Encryption
'Microsoft.KeyVault'
KeyVaultProperties
identityClientId string
MultiRegionSettings
Name Description Value
regions RegionSetting[]
'Priority'
'Weighted'
RegionSetting
NetworkRuleSet
defaultAction The default action when no rule from ipRules and from virtualNetworkRules match. This is only used after the 'Allow'
IpRule
value An IPv4 address range in CIDR notation, such as '124.56.78.91' (simple IP address) or '124.56.78.0/24' (all addresses that start with string
124.56.78). (required)
VirtualNetworkRule
UserOwnedStorage
identityClientId string
Sku
capacity If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this int
may be omitted.
family If the service has different generations of hardware, for the same SKU, then that can be captured here. string
name The name of the SKU. Ex - P3. It is typically a letter+number code string
(required)
Name Description Value
size The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. string
tier This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a 'Basic'
PUT. 'Enterprise'
'Free'
'Premium'
'Standard'
Quickstart templates
The following quickstart templates deploy this resource type.
Template Description
This article provides details on the inference REST API endpoints for Azure OpenAI.
Authentication
Azure OpenAI provides two methods for authentication. you can use either API Keys or
Azure Active Directory.
API Key authentication: For this type of authentication, all API requests must
include the API Key in the api-key HTTP header. The Quickstart provides guidance
for how to make calls with this type of authentication.
Azure Active Directory authentication: You can authenticate an API call using an
Azure Active Directory token. Authentication tokens are included in a request as
the Authorization header. The token provided must be preceded by Bearer , for
example Bearer YOUR_AUTH_TOKEN . You can read our how-to guide on
authenticating with Azure Active Directory.
HTTP
POST
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/completions?api-version=2023-05-15
Completions
With the Completions operation, the model will generate one or more predicted
completions based on a provided prompt. The service can also return the probabilities
of alternative tokens at each position.
Create a completion
HTTP
POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-
version={api-version}
Path parameters
deployment-id string Required The deployment name you chose when you deployed the
model.
api-version string Required The API version to use for this operation. This follows the
YYYY-MM-DD format.
Supported versions
Request body
Example request
Console
curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/completions?api-version=2023-05-15\
-H "Content-Type: application/json" \
-H "api-key: YOUR_API_KEY" \
-d "{
\"max_tokens\": 5
}"
Example response
JSON
"id": "cmpl-4kGh7iXtjW4lc9eGhff6Hp8C7btdQ",
"object": "text_completion",
"created": 1646932609,
"model": "ada",
"choices": [
"index": 0,
"logprobs": null,
"finish_reason": "length"
Embeddings
Get a vector representation of a given input that can be easily consumed by machine
learning models and other algorithms.
7 Note
We currently do not support batching of embeddings into a single API call. If you
receive the error InvalidRequestError: Too many inputs. The max number of inputs
is 1. We hope to increase the number of inputs per request soon. , this typically
occurs when an array of embeddings is attempted to be passed as a batch rather
than a single string. The string can be up to 8191 tokens in length when using the
text-embedding-ada-002 (Version 2) model.
Create an embedding
HTTP
POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/embeddings?api-
version={api-version}
Path parameters
deployment- string Required The name of your model deployment. You're required to first
id deploy a model before you can make calls
Parameter Type Required? Description
api- string Required The API version to use for this operation. This follows the
version YYYY-MM-DD format.
Supported versions
Request body
input string Yes N/A Input text to get embeddings for, encoded as a
string. The number of input tokens varies
depending on what model you are using.
Example request
Console
curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/embeddings?api-version=2023-05-15 \
-H "Content-Type: application/json" \
-H "api-key: YOUR_API_KEY" \
Example response
JSON
"object": "list",
"data": [
"object": "embedding",
"embedding": [
0.018990106880664825,
-0.0073809814639389515,
0.021276434883475304,
],
"index": 0
],
"model": "text-similarity-babbage:001"
Chat completions
Create completions for chat messages with the ChatGPT and GPT-4 models.
HTTP
POST https://{your-resource-
name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?
api-version={api-version}
Path parameters
deployment- string Required The name of your model deployment. You're required to first
id deploy a model before you can make calls
api- string Required The API version to use for this operation. This follows the
version YYYY-MM-DD format.
Supported versions
Console
curl
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME
NT_NAME/chat/completions?api-version=2023-05-15 \
-H "Content-Type: application/json" \
-H "api-key: YOUR_API_KEY" \
Example response
Console
{"id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9",
"object":"chat.completion","created":1679072642,
"model":"gpt-35-turbo",
"usage":{"prompt_tokens":58,
"completion_tokens":68,
"total_tokens":126},
"choices":[{"message":{"role":"assistant",
Output formatting adjusted for ease of reading, actual output is a single block of text
without line breaks.
stream boolean Optional false If set, partial message deltas will be sent,
like in ChatGPT. Tokens will be sent as
data-only server-sent events as they
become available, with the stream
terminated by a data: [DONE] message."
stop string Optional null Up to 4 sequences where the API will stop
or array generating further tokens.
Completions extensions
Extensions for chat completions, for example Azure OpenAI on your data.
HTTP
POST {your-resource-name}/openai/deployments/{deployment-
id}/extensions/chat/completions?api-version={api-version}
Path parameters
deployment- string Required The name of your model deployment. You're required to first
id deploy a model before you can make calls
api- string Required The API version to use for this operation. This follows the
version YYYY-MM-DD format.
Supported versions
2023-06-01-preview Swagger spec
Example request
Console
curl -i -X POST
YOUR_RESOURCE_NAME/openai/deployments/YOUR_DEPLOYMENT_NAME/extensions/chat/c
ompletions?api-version=2023-06-01-preview \
-H "Content-Type: application/json" \
-H "api-key: YOUR_API_KEY" \
-H "chatgpt_url: YOUR_RESOURCE_URL" \
-H "chatgpt_key: YOUR_API_KEY" \
-d \
'
"dataSources": [
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": "'YOUR_AZURE_COGNITIVE_SEARCH_ENDPOINT'",
"key": "'YOUR_AZURE_COGNITIVE_SEARCH_KEY'",
"indexName": "'YOUR_AZURE_COGNITIVE_SEARCH_INDEX_NAME'"
],
"messages": [
"role": "user",
'
Example response
JSON
"id": "12345678-1a2b-3c4e5f-a123-12345678abcd",
"model": "",
"created": 1684304924,
"object": "chat.completion",
"choices": [
"index": 0,
"messages": [
"role": "tool",
"end_turn": false
},
"role": "assistant",
"end_turn": true
messages array Required null The messages to generate chat completions for,
in the chat format.
dataSources array Required The data sources to be used for the Azure
OpenAI on your data feature.
stream boolean Optional false If set, partial message deltas will be sent, like in
ChatGPT. Tokens will be sent as data-only
server-sent events as they become available,
with the stream terminated by a message
"messages": [{"delta": {"content": "[DONE]"},
"index": 2, "end_turn": true}]
stop string Optional null Up to 2 sequences where the API will stop
or array generating further tokens.
max_tokens integer Optional 1000 The maximum number of tokens allowed for the
generated answer. By default, the number of
tokens the model can return is 4096 -
prompt_tokens .
The following parameters can be used inside of the parameters field inside of
dataSources .
type string Required null The data source to be used for the
Azure OpenAI on your data feature.
For Azure Cognitive search the
value is AzureCognitiveSearch .
inScope boolean Optional true If set, this value will limit responses
specific to the grounding data
content.
Image generation
HTTP
POST https://{your-resource-
name}.openai.azure.com/openai/images/generations:submit?api-version={api-
version}
Path parameters
api-version string Required The API version to use for this operation. This follows the
YYYY-MM-DD format.
Supported versions
2023-06-01-preview
Request body
Parameter Type Required? Default Description
size string Optional 1024x1024 The size of the generated images. Must be one
of 256x256 , 512x512 , or 1024x1024 .
Example request
Console
curl -X POST
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/images/generations:submit
?api-version=2023-06-01-preview \
-H "Content-Type: application/json" \
-H "api-key: YOUR_API_KEY" \
-d '{
"size": "512x512",
"n": 3
}'
Example response
The operation returns a 202 status code and an GenerateImagesResponse JSON object
containing the ID and status of the operation.
JSON
"id": "f508bcf2-e651-4b4b-85a7-58ad77981ffa",
"status": "notRunning"
HTTP
GET https://{your-resource-
name}.openai.azure.com/openai/operations/images/{operation-id}?api-version=
{api-version}
Path parameters
operation-id string Required The GUID that identifies the original image generation
request.
Supported versions
2023-06-01-preview
Example request
Console
-H "Content-Type: application/json"
Example response
Upon success the operation returns a 200 status code and an OperationResponse JSON
object. The status field can be "notRunning" (task is queued but hasn't started yet),
"running" , "succeeded" , "canceled" (task has timed out), "failed" , or "deleted" . A
succeeded status indicates that the generated image is available for download at the
given URL. If multiple images were generated, their URLs are all returned in the
result.data field.
JSON
"created": 1685064331,
"expires": 1685150737,
"id": "4b755937-3173-4b49-bf3f-da6702a3971a",
"result": {
"data": [
"url": "<URL_TO_IMAGE>"
},
"url": "<URL_TO_NEXT_IMAGE>"
},
...
},
"status": "succeeded"
HTTP
DELETE https://{your-resource-
name}.openai.azure.com/openai/operations/images/{operation-id}?api-version=
{api-version}
Path parameters
operation-id string Required The GUID that identifies the original image generation
request.
Supported versions
2023-06-01-preview
Example request
Console
-H "Content-Type: application/json"
Response
The operation returns a 204 status code if successful. This API only succeeds if the
operation is in an end state (not running ).
Management APIs
Azure OpenAI is deployed as a part of the Azure Cognitive Services. All Cognitive
Services rely on the same set of management APIs for creation, update and delete
operations. The management APIs are also used for deploying models within an OpenAI
resource.
Next steps
Learn about managing deployments, models, and fine-tuning with the REST API.
Learn
more about the underlying models that power Azure OpenAI.
Fine Tunes
Reference
Service: Cognitive Services
API Version: 2022-12-01
Operations
Cancel Cancels the processing of the fine-tune job specified by the
given fine-tune-id.
Get Gets details for a single fine-tune job specified by the given fine-
tune-id.
The details contain the base model, training and
validation files, hyper parameters,...
Get Events Gets the events for the fine-tune job specified by the given fine-
tune-id.
Events are created when the job status changes, e.g.
running or complete, and when res...
List Gets a list of all fine-tune jobs owned by the Azure OpenAI
resource.
The details that are returned for each fine-tune job
contain besides its identifier
the base...
Deployments - Create Or Update
Reference
Service: Cognitive Services
API Version: 2021-10-01
Update the state of specified deployments associated with the Cognitive Services account.
HTTP
PUT
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Cognitiv
eServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2021-10-01
URI Parameters
Name In Required Type Description
deploymentName path True string The name of the deployment associated with the Cognitive Services Account
resourceGroupName path True string The name of the resource group. The name is case insensitive.
api-version query True string The API version to use for this operation.
Request Body
Name Type Description
Responses
Name Type Description
Other Status ErrorResponse Error response describing why the operation failed.
Codes
Examples
PutDeployment
Sample Request
HTTP
HTTP
PUT
https://management.azure.com/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.Cognitiv
eServices/accounts/accountName/deployments/deploymentName?api-version=2021-10-01
"properties": {
"model": {
"format": "OpenAI",
"name": "ada",
"version": "1"
},
"scaleSettings": {
"scaleType": "Manual",
"capacity": 1
Sample Response
Status code: 200
JSON
"id":
"/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/
deployments/deploymentName",
"name": "deploymentName",
"type": "Microsoft.CognitiveServices/accounts/deployments",
"properties": {
"model": {
"format": "OpenAI",
"name": "ada",
"version": "1"
},
"scaleSettings": {
"scaleType": "Manual",
"capacity": 1
},
"provisioningState": "Succeeded"
JSON
"id":
"/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/
deployments/deploymentName",
"name": "deploymentName",
"type": "Microsoft.CognitiveServices/accounts/deployments",
"properties": {
"model": {
"format": "OpenAI",
"name": "ada",
"version": "1"
},
"scaleSettings": {
"scaleType": "Manual",
"capacity": 1
},
"provisioningState": "Accepted"
Definitions
Name Description
createdByType The type of identity that created the resource.
DeploymentProvisioningState Gets the status of the resource at the time the operation was called.
createdByType
The type of identity that created the resource.
Application string
Key string
ManagedIdentity string
User string
Deployment
Cognitive Services account deployment.
systemData systemData Metadata pertaining to creation and last modification of the resource.
DeploymentProperties
Properties of Cognitive Services account deployment.
provisioningState Deployment Gets the status of the resource at the time the operation was called.
ProvisioningState
DeploymentProvisioningState
Gets the status of the resource at the time the operation was called.
Accepted string
Creating string
Deleting string
Failed string
Moving string
Succeeded string
DeploymentScaleSettings
Properties of Cognitive Services account deployment model.
DeploymentScaleType
Deployment scale type.
Manual string
Standard string
ErrorAdditionalInfo
The resource management error additional info.
ErrorDetail
The error detail.
ErrorResponse
Error response
systemData
Metadata pertaining to creation and last modification of the resource.
lastModifiedByType createdByType The type of identity that last modified the resource.
Azure.AI.OpenAI Namespace
Reference
) Important
Classes
ChatChoice The representation of a single completions result for a chat
completions request.
StreamingChatChoice
StreamingChatCompletions
StreamingChoice
StreamingCompletions
Structs
ChatRole A description of the intended purpose of a message within a
chat completions interaction.
Enums
OpenAIClientOptions.Service The version of the service to use.
Version
com.azure.ai.openai
Reference
Package: com.azure.ai.openai
Maven Artifact: com.azure:azure-ai-openai:1.0.0-beta.2
Package containing the classes for OpenAI. Azure OpenAI APIs for completions and
search.
Classes
OpenAIAsyncClient Initializes a new instance of the asynchronous OpenAIClient type.
Enums
OpenAIServiceVersion Service version of OpenAIClient.
@azure/openai package
Reference
Classes
AzureKeyCredential A static-key-based credential that supports updating
the
underlying key value.
API Key
JavaScript
Interfaces
ChatChoice The representation of a single prompt completion as part of an
overall chat completions request.
Generally, n choices are
generated per provided prompt with a default value of 1.
Token
limits and other settings may limit the number of choices
generated.
GetCompletionsOptions
GetEmbeddingsOptions
OpenAIClientOptions
RequestOptions
Type Aliases
ChatRole "system", "assistant", "user"
Are you just starting to explore the functionality of Azure Cognitive Services? Perhaps
you are implementing a new feature in your application. Or after using the service, do
you have suggestions on how to improve it? Here are options for where you can get
support, stay up-to-date, give feedback, and report bugs for Cognitive Services.
Explore the range of Azure support options and choose the plan that best fits,
whether you're a developer just starting your cloud journey or a large organization
deploying business-critical, strategic applications. Azure customers can create and
manage support requests in the Azure portal.
Azure portal
Azure portal for the United States government
If you can't find an answer to your problem using search, submit a new question to
Microsoft Q&A. Use one of the following tags when you ask your question:
Cognitive Services
Vision
Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer
Language
Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator
Speech
Speech service
Decision
Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer
Azure OpenAI
Azure OpenAI
For answers on your developer questions from the largest community developer
ecosystem, ask your question on Stack Overflow.
If you do submit a new question to Stack Overflow, please use one or more of the
following tags when you create the question:
Cognitive Services
Vision
Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer
Language
Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator
Speech
Speech service
Decision
Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer
Azure OpenAI
Azure OpenAI
Submit feedback
To request new features, post them on https://feedback.azure.com . Share your ideas
for making Cognitive Services and its APIs work better for the applications you develop.
Cognitive Services
Vision
Computer Vision
Custom Vision
Face
Form Recognizer
Video Indexer
Language
Immersive Reader
Language Understanding (LUIS)
QnA Maker
Language service
Translator
Speech
Speech service
Decision
Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer
Stay informed
Staying informed about features in a new release or news on the Azure blog can help
you find the difference between a programming error, a service bug, or a feature not yet
available in Cognitive Services.
Next steps
What are Azure Cognitive Services?