Metadata-Version: 2.1
Name: azure-ai-inference
Version: 1.0.0b9
Summary: Microsoft Azure AI Inference Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference
Author: Microsoft Corporation
Author-email: azpysdkhelp@microsoft.com
License: MIT License
Keywords: azure,azure sdk
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: isodate >=0.6.1
Requires-Dist: azure-core >=1.30.0
Requires-Dist: typing-extensions >=4.6.0
Provides-Extra: opentelemetry
Requires-Dist: azure-core-tracing-opentelemetry ; extra == 'opentelemetry'
Provides-Extra: prompts
Requires-Dist: pyyaml ; extra == 'prompts'
# Azure AI Inference client library for Python
Use the Inference client library (in preview) to:
* Authenticate against the service
* Get information about the AI model
* Do chat completions
* Get text embeddings
* Get image embeddings
The Inference client library supports AI models deployed to the following services:
* [GitHub Models](https://github.com/marketplace/models) - Free-tier endpoint for AI models from different providers
* Serverless API endpoints and Managed Compute endpoints - AI models from different providers deployed from [Azure AI Foundry](https://ai.azure.com). See [Overview: Deploy models, flows, and web apps with Azure AI Foundry](https://learn.microsoft.com/azure/ai-studio/concepts/deployments-overview).
* Azure OpenAI Service - OpenAI models deployed from [Azure AI Foundry](https://oai.azure.com/). See [What is Azure OpenAI Service?](https://learn.microsoft.com/azure/ai-services/openai/overview). Although we recommend you use the official [OpenAI client library](https://pypi.org/project/openai/) in your production code for this service, you can use the Azure AI Inference client library to easily compare the performance of OpenAI models to other models, using the same client library and Python code.
The Inference client library makes service calls using REST API version `2024-05-01-preview`, as documented in [Azure AI Model Inference API](https://aka.ms/azureai/modelinference).
[Product documentation](https://aka.ms/aiservices/inference)
| [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples)
| [API reference documentation](https://aka.ms/azsdk/azure-ai-inference/python/reference)
| [Package (Pypi)](https://aka.ms/azsdk/azure-ai-inference/python/package)
| [SDK source code](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/azure/ai/inference)
## Reporting issues
To report an issue with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues). Mention the package name "azure-ai-inference" in the title or content.
## Getting started
### Prerequisites
* [Python 3.8](https://www.python.org/) or later installed, including [pip](https://pip.pypa.io/en/stable/).
* For GitHub models
  * The AI model name, such as "gpt-4o" or "mistral-large"
  * A GitHub personal access token. [Create one here](https://github.com/settings/tokens). You do not need to give any permissions to the token. The token is a string that starts with `github_pat_`.
* For Serverless API endpoints or Managed Compute endpoints
  * An [Azure subscription](https://azure.microsoft.com/free).
  * An [AI Model from the catalog](https://ai.azure.com/explore/models) deployed through Azure AI Foundry.
  * The endpoint URL of your model, in the form `https://<your-host-name>.<your-azure-region>.models.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (e.g. `eastus2`).
  * Depending on your authentication preference, you either need an API key to authenticate against the service, or Entra ID credentials.
* For Azure OpenAI (AOAI) service
  * An [Azure subscription](https://azure.microsoft.com/free).
  * An [OpenAI Model from the catalog](https://oai.azure.com/resource/models) deployed through Azure AI Foundry.
  * The endpoint URL of your model, in the form `https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>`, where `your-resource-name` is your globally unique AOAI resource name, and `your-deployment-name` is your AI Model deployment name.
  * Depending on your authentication preference, you either need an API key to authenticate against the service, or Entra ID credentials.
  * An api-version. Use the latest preview or GA version listed in the `Data plane - inference` row in [the API Specs table](https://aka.ms/azsdk/azure-ai-inference/azure-openai-api-versions). At the time of writing, the latest GA version was "2024-06-01".
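
Because each service expects a differently shaped endpoint URL, it can help to check the URL before constructing a client. The following is a minimal sketch, not part of the library: `classify_endpoint` and its return labels are hypothetical, and the host-name patterns are taken only from the URL forms listed above plus the GitHub Models host used in this README's client examples. Endpoints with other host names are reported as `"unknown"`.

```python
from urllib.parse import urlparse


def classify_endpoint(endpoint: str) -> str:
    """Guess which service an endpoint URL targets, based only on its host name.

    Hypothetical helper for illustration; the labels are not library constants.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"Expected an https:// URL, got: {endpoint!r}")
    host = parsed.hostname.lower()
    if host == "models.inference.ai.azure.com":
        return "github-models"
    if host.endswith(".openai.azure.com"):
        return "azure-openai"
    if host.endswith(".models.ai.azure.com"):
        return "serverless-or-managed-compute"
    return "unknown"


print(classify_endpoint("https://my-host.eastus2.models.ai.azure.com"))
```

A check like this only inspects the host name; it does not confirm that a model is deployed or that the credentials are valid.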
### Install the package
To install the Azure AI Inferencing package use the following command:
```bash
pip install azure-ai-inference
```
To update an existing installation of the package, use:
```bash
pip install --upgrade azure-ai-inference
```
If you want to install the Azure AI Inference package with support for OpenTelemetry-based tracing, use the following command:
```bash
pip install azure-ai-inference[opentelemetry]
```
## Key concepts
### Create and authenticate a client directly, using API key or GitHub token
The package includes three clients: `ChatCompletionsClient`, `EmbeddingsClient` and `ImageEmbeddingsClient`. All can be created in a similar manner. For example, assuming `endpoint`, `key` and `github_token` are strings holding your endpoint URL, API key and GitHub token respectively, this Python code will create and authenticate a synchronous `ChatCompletionsClient`:
```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
# For GitHub models
client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(github_token),
    model="mistral-large"  # Update as needed. Alternatively, you can include this in the `complete` call.
)

# For Serverless API or Managed Compute endpoints
client = ChatCompletionsClient(
    endpoint=endpoint,  # Of the form https://<your-host-name>.<your-azure-region>.models.ai.azure.com
    credential=AzureKeyCredential(key)
)

# For Azure OpenAI endpoint
client = ChatCompletionsClient(
    endpoint=endpoint,  # Of the form https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>
    credential=AzureKeyCredential(key),
    api_version="2024-06-01",  # Azure OpenAI api-version. See https://aka.ms/azsdk/azure-ai-inference/azure-openai-api-versions
)
```
A synchronous client supports synchronous inference methods, meaning they will block until the service responds with inference results. For simplicity the code snippets below all use synchronous methods. The client offers equivalent asynchronous methods which are more commonly used in production.
To create an asynchronous client, install the additional package [aiohttp](https://pypi.org/project/aiohttp/):
```bash
pip install aiohttp
```
and update the code above to import `asyncio`, and import `ChatCompletionsClient` from the `azure.ai.inference.aio` namespace instead of `azure.ai.inference`. For example:
```python
import asyncio
from azure.ai.inference.aio import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
# For Serverless API or Managed Compute endpoints
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)
```
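The snippet above only constructs the asynchronous client. As a rough sketch of how a call might then look, the hypothetical helper `ask` below awaits `complete` and uses `async with` so the client is closed when the block exits. The imports are done inside the function only so the sketch can be loaded without the packages installed; `endpoint` and `key` are assumed to be defined as elsewhere in this document.

```python
import asyncio


async def ask(endpoint: str, key: str, question: str) -> str:
    """Send one question through the asynchronous client and return the reply text."""
    # Lazy imports: requires `azure-ai-inference` and `aiohttp` at call time.
    from azure.ai.inference.aio import ChatCompletionsClient
    from azure.ai.inference.models import UserMessage
    from azure.core.credentials import AzureKeyCredential

    # `async with` closes the client when the block exits.
    async with ChatCompletionsClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    ) as client:
        response = await client.complete(messages=[UserMessage(question)])
        return response.choices[0].message.content


# Usage (needs a real endpoint and key):
# print(asyncio.run(ask(endpoint, key, "How many feet are in a mile?")))
```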
### Create and authenticate a client directly, using Entra ID
_Note: At the time of writing, only Managed Compute endpoints and Azure OpenAI endpoints support Entra ID authentication._
To use an Entra ID token credential, first install the [azure-identity](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity) package:
```bash
pip install azure-identity
```
You will need to provide the desired credential type obtained from that package. A common selection is [DefaultAzureCredential](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#defaultazurecredential) and it can be used as follows:
```python
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential
# For Managed Compute endpoints
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(exclude_interactive_browser_credential=False)
)

# For Azure OpenAI endpoint
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(exclude_interactive_browser_credential=False),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
    api_version="2024-06-01",  # Azure OpenAI api-version. See https://aka.ms/azsdk/azure-ai-inference/azure-openai-api-versions
)
```
During application development, you would typically set up the environment for authentication using Entra ID by first [Installing the Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli), running `az login` in your console window, then entering your credentials in the browser window that was opened. The call to `DefaultAzureCredential()` will then succeed. Setting `exclude_interactive_browser_credential=False` in that call will enable launching a browser window if the user isn't already logged in.
### Defining default settings while creating the clients
You can define default chat completions or embeddings configurations while constructing the relevant client. These configurations will be applied to all future service calls.
For example, here we create a `ChatCompletionsClient` using API key authentication, and apply two settings, `temperature` and `max_tokens`:
```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
# For Serverless API or Managed Compute endpoints
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    temperature=0.5,
    max_tokens=1000
)
```
Default settings can be overridden in individual service calls.
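As a small illustration of such an override, the hypothetical helper below passes `temperature` directly to `complete`. It assumes `client` was constructed as in the snippet above, with `temperature=0.5` and `max_tokens=1000`.

```python
def complete_with_higher_temperature(client, messages):
    """Call `complete` with a per-call `temperature` instead of the client default.

    `client` is assumed to be a ChatCompletionsClient created with
    temperature=0.5 and max_tokens=1000.
    """
    # Only `temperature` is passed here; `max_tokens` is left out, so the
    # client-level default is expected to apply to this call.
    return client.complete(messages=messages, temperature=0.9)
```

Later calls that omit `temperature` would again use the value given to the constructor.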
### Create and authenticate clients using `load_client`
If you are using Serverless API or Managed Compute endpoints, there is an alternative to creating a specific client directly. You can instead use the function `load_client` to return the relevant client (of types `ChatCompletionsClient` or `EmbeddingsClient`) based on the provided endpoint:
```python
from azure.ai.inference import load_client
from azure.core.credentials import AzureKeyCredential
# For Serverless API or Managed Compute endpoints only.
# This will not work on GitHub Models endpoint or Azure OpenAI endpoint.
client = load_client(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

print(f"Created client of type `{type(client).__name__}`.")
```
To load an asynchronous client, import the `load_client` function from `azure.ai.inference.aio` instead.
Entra ID authentication is also supported by the `load_client` function. Replace the key authentication above with `credential=DefaultAzureCredential(exclude_interactive_browser_credential=False)` for example.
### Get AI model information
If you are using Serverless API or Managed Compute endpoints, you can call the client method `get_model_info` to retrieve AI model information. This makes a REST call to the `/info` route on the provided endpoint, as documented in [the REST API reference](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-info). This call will not work for GitHub Models or Azure OpenAI endpoints.
<!-- SNIPPET:sample_get_model_info.get_model_info -->
```python
model_info = client.get_model_info()
print(f"Model name: {model_info.model_name}")
print(f"Model provider name: {model_info.model_provider_name}")
print(f"Model type: {model_info.model_type}")
```
<!-- END SNIPPET -->
AI model information is cached in the client, and further calls to `get_model_info` will access the cached value and will not result in a REST API call. Note that if you created the client using the `load_client` function, model information will already be cached in the client.
AI model information is displayed (if available) when you `print(client)`.
### Chat Completions
The `ChatCompletionsClient` has a method named `complete`. The method makes a REST API call to the `/chat/completions` route on the provided endpoint, as documented in [the REST API reference](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-chat-completions).
See simple chat completion examples below. More can be found in the [samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder.
### Text Embeddings
The `EmbeddingsClient` has a method named `embed`. The method makes a REST API call to the `/embeddings` route on the provided endpoint, as documented in [the REST API reference](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-embeddings).
See the simple text embedding example below. More can be found in the [samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder.
### Image Embeddings
The `ImageEmbeddingsClient` has a method named `embed`. The method makes a REST API call to the `/images/embeddings` route on the provided endpoint, as documented in [the REST API reference](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-images-embeddings).
See the simple image embedding example below. More can be found in the [samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder.
## Examples
In the following sections you will find simple examples of:
* [Chat completions](#chat-completions-example)
* [Streaming chat completions](#streaming-chat-completions-example)
* [Adding model-specific parameters](#adding-model-specific-parameters)
* [Adding HTTP request headers](#adding-http-request-headers)
* [Text Embeddings](#text-embeddings-example)
* [Image Embeddings](#image-embeddings-example)
The examples create a synchronous client assuming a Serverless API or Managed Compute endpoint. Modify the client
construction code as described in [Key concepts](#key-concepts) to have it work with a GitHub Models endpoint or an Azure OpenAI
endpoint. Only mandatory input settings are shown for simplicity.
See the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder for full working samples for synchronous and asynchronous clients.
### Chat completions example
This example demonstrates how to generate a single chat completion, for a Serverless API or Managed Compute endpoint, with key authentication, assuming `endpoint` and `key` are already defined. For Entra ID authentication, a GitHub Models endpoint or an Azure OpenAI endpoint, modify the code to create the client as specified in the above sections.
<!-- SNIPPET:sample_chat_completions.chat_completions -->
```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

response = client.complete(
    messages=[
        SystemMessage("You are a helpful assistant."),
        UserMessage("How many feet are in a mile?"),
    ],
)

print(response.choices[0].message.content)
print(f"\nToken usage: {response.usage}")
```
<!-- END SNIPPET -->
The following types of messages are supported: `SystemMessage`, `UserMessage`, `AssistantMessage`, `ToolMessage`, `DeveloperMessage`. See also samples:
* [sample_chat_completions_with_tools.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_tools.py) for usage of `ToolMessage`.
* [sample_chat_completions_with_image_url.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_image_url.py) for usage of `UserMessage` that
includes sending an image URL.
* [sample_chat_completions_with_image_data.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_image_data.py) for usage of `UserMessage` that
includes sending image data read from a local file.
* [sample_chat_completions_with_audio_data.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_audio_data.py) for usage of `UserMessage` that includes sending audio data read from a local file.
* [sample_chat_completions_with_structured_output.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_structured_output.py) and [sample_chat_completions_with_structured_output_pydantic.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_structured_output_pydantic.py) for configuring the service to respond with a JSON-formatted string, adhering to your schema.
Alternatively, you can provide the full request body as a Python dictionary (`dict` object) instead of using the strongly typed classes like `SystemMessage` and `UserMessage`:
<!-- SNIPPET:sample_chat_completions_from_input_dict.chat_completions_full_request_as_dict -->
```python
response = client.complete(
    {
        "messages": [
            {
                "role": "system",
                "content": "You are an AI assistant that helps people find information. Your replies are short, no more than two sentences.",
            },
            {
                "role": "user",
                "content": "What year was construction of the International Space Station mostly done?",
            },
            {
                "role": "assistant",
                "content": "The main construction of the International Space Station (ISS) was completed between 1998 and 2011. During this period, more than 30 flights by US space shuttles and 40 by Russian rockets were conducted to transport components and modules to the station.",
            },
            {"role": "user", "content": "And what was the estimated cost to build it?"},
        ]
    }
)
```
<!-- END SNIPPET -->
Or you can provide just the `messages` input argument as a list of Python `dict`:
<!-- SNIPPET:sample_chat_completions_from_input_dict.chat_completions_messages_as_dict -->
```python
response = client.complete(
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant that helps people find information.",
        },
        {
            "role": "user",
            "content": "How many feet are in a mile?",
        },
    ]
)
```
<!-- END SNIPPET -->
To generate completions for additional messages, simply call `client.complete` multiple times using the same `client`.
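When a follow-up question depends on earlier turns, the earlier messages are sent again along with the new one, as in the dictionary example above that includes an `assistant` message. The sketch below shows one way to build such a list with plain `dict` messages. `extend_conversation` is a hypothetical helper, and a placeholder string stands in for the real reply text so the sketch runs without a service call.

```python
def extend_conversation(history, assistant_reply, next_user_message):
    """Return a new message list with the assistant reply and the next user turn appended."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are an AI assistant that helps people find information."},
    {"role": "user", "content": "How many feet are in a mile?"},
]

# After `response = client.complete(messages=history)`, the reply text would be
# `response.choices[0].message.content`; a placeholder is used here instead.
history = extend_conversation(history, "<assistant reply text>", "And how many yards?")

print([message["role"] for message in history])  # ['system', 'user', 'assistant', 'user']
```

The extended `history` can then be passed as `messages` in the next `client.complete` call.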
### Streaming chat completions example
This example demonstrates how to generate a single chat completion with a streaming response, for a Serverless API or Managed Compute endpoint, with key authentication, assuming `endpoint` and `key` are already defined. You simply need to add `stream=True` to the `complete` call to enable streaming.
For Entra ID authentication, GitHub models endpoint or Azure OpenAI endpoint, modify the code to create the client as specified in the above sections.
<!-- SNIPPET:sample_chat_completions_streaming.chat_completions_streaming -->
```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

response = client.complete(
    stream=True,
    messages=[
        SystemMessage("You are a helpful assistant."),
        UserMessage("Give me 5 good reasons why I should exercise every day."),
    ],
)

for update in response:
    if update.choices and update.choices[0].delta:
        print(update.choices[0].delta.content or "", end="", flush=True)
    if update.usage:
        print(f"\n\nToken usage: {update.usage}")

client.close()
```
<!-- END SNIPPET -->
In the above `for` loop that prints the results you should see the answer progressively get longer as updates get streamed to the client.
To generate completions for additional messages, simply call `client.complete` multiple times using the same `client`.
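If you need the full reply as one string rather than printing fragments as they arrive, the same loop can collect the pieces. The sketch below is illustrative: `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the streamed updates used in the loop above, so the code runs without calling a service.

```python
from types import SimpleNamespace


def collect_stream(updates) -> str:
    """Concatenate the `delta.content` fragments of a streamed response into one string."""
    parts = []
    for update in updates:
        # An update may carry no choices, or a delta with no content.
        if update.choices and update.choices[0].delta:
            parts.append(update.choices[0].delta.content or "")
    return "".join(parts)


# Stand-in updates (assumption: real updates expose the same attributes).
fake_updates = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=", world"))]),
]

print(collect_stream(fake_updates))  # Hello, world
```

With a real client, you would pass the object returned by `client.complete(stream=True, ...)` instead of `fake_updates`.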
### Adding model-specific parameters
In this example, extra JSON elements are inserted at the root of the request body by setting `model_extras` when calling the `complete` method of the `ChatCompletionsClient`. These are intended for AI models that require additional model-specific parameters beyond what is defined in the REST API [Request Body table](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-chat-completions#request-body).
<!-- SNIPPET:sample_chat_completions_with_model_extras.model_extras -->
```python
response = client.complete(
    messages=[
        SystemMessage("You are a helpful assistant."),
        UserMessage("How many feet are in a mile?"),
    ],
    model_extras={"key1": "value1", "key2": "value2"},  # Optional. Additional parameters to pass to the model.
)
```
<!-- END SNIPPET -->
In the above example, this will be the JSON payload in the HTTP request:
```json
{
    "messages":
    [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"How many feet are in a mile?"}
    ],
    "key1": "value1",
    "key2": "value2"
}
```
Note that by default, the service will reject any request payload that includes extra parameters. In order to change the default service behaviour, when the `complete` method includes `model_extras`, the client library will automatically add the HTTP request header `"extra-parameters": "pass-through"`.
Use the same method to add additional parameters in the request of other clients in this package.
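To picture how the extra elements end up at the root of the request body, the sketch below merges a `model_extras` dictionary into a payload with plain Python. This is an illustration only: `build_payload` is a hypothetical function, not the client library's implementation.

```python
import json


def build_payload(messages, model_extras=None):
    """Illustrative only: place extra parameters at the root of the request body."""
    payload = {"messages": messages}
    # Extra keys sit next to "messages", not nested under a separate element.
    payload.update(model_extras or {})
    return payload


payload = build_payload(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many feet are in a mile?"},
    ],
    model_extras={"key1": "value1", "key2": "value2"},
)
print(json.dumps(payload, indent=2))
```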
### Adding HTTP request headers
To add your own HTTP request headers, include a `headers` keyword in the client constructor, and specify a `dict` with your
header names and values. For example:
```python
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    headers={"header1": "value1", "header2": "value2"}
)
```
And similarly for the other clients in this package.
### Text Embeddings example
This example demonstrates how to get text embeddings, for a Serverless API or Managed Compute endpoint, with key authentication, assuming `endpoint` and `key` are already defined. For Entra ID authentication, GitHub models endpoint or Azure OpenAI endpoint, modify the code to create the client as specified in the above sections.
<!-- SNIPPET:sample_embeddings.embeddings -->
```python
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

response = client.embed(input=["first phrase", "second phrase", "third phrase"])

for item in response.data:
    length = len(item.embedding)
    print(
        f"data[{item.index}]: length={length}, [{item.embedding[0]}, {item.embedding[1]}, "
        f"..., {item.embedding[length-2]}, {item.embedding[length-1]}]"
    )
```
<!-- END SNIPPET -->
The length of the embedding vector depends on the model, but you should see something like this:
```text
data[0]: length=1024, [0.0013399124, -0.01576233, ..., 0.007843018, 0.000238657]
data[1]: length=1024, [0.036590576, -0.0059547424, ..., 0.011405945, 0.004863739]
data[2]: length=1024, [0.04196167, 0.029083252, ..., -0.0027484894, 0.0073127747]
```
To generate embeddings for additional phrases, simply call `client.embed` multiple times using the same `client`.
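A common next step with embedding vectors is to compare them. The sketch below computes cosine similarity between two vectors using only the standard library. It is not part of the client library, and `cosine_similarity` is a hypothetical helper; with a real response you might pass `response.data[0].embedding` and `response.data[1].embedding`.

```python
import math


def cosine_similarity(a, b) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Small made-up vectors; real embeddings are much longer.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Values near 1 indicate vectors pointing in a similar direction, and values near 0 indicate little similarity.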
### Image Embeddings example
This example demonstrates how to get image embeddings, for a Serverless API or Managed Compute endpoint, with key authentication, assuming `endpoint` and `key` are already defined. For Entra ID authentication, GitHub models endpoint or Azure OpenAI endpoint, modify the code to create the client as specified in the above sections.
<!-- SNIPPET:sample_image_embeddings.image_embeddings -->
```python
from azure.ai.inference import ImageEmbeddingsClient
from azure.ai.inference.models import ImageEmbeddingInput
from azure.core.credentials import AzureKeyCredential

client = ImageEmbeddingsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

response = client.embed(input=[ImageEmbeddingInput.load(image_file="sample1.png", image_format="png")])

for item in response.data:
    length = len(item.embedding)
    print(
        f"data[{item.index}]: length={length}, [{item.embedding[0]}, {item.embedding[1]}, "
        f"..., {item.embedding[length-2]}, {item.embedding[length-1]}]"
    )
```
<!-- END SNIPPET -->
The length of the embedding vector depends on the model, but you should see something like this:
```text
data[0]: length=1024, [0.0103302, -0.04425049, ..., -0.011543274, -0.0009088516]
```
To generate image embeddings for additional images, simply call `client.embed` multiple times using the same `client`.
## Troubleshooting
### Exceptions
The `complete`, `embed` and `get_model_info` methods on the clients raise an [HttpResponseError](https://learn.microsoft.com/python/api/azure-core/azure.core.exceptions.httpresponseerror) exception for a non-success HTTP status code response from the service. The exception's `status_code` will hold the HTTP response status code (with `reason` showing the friendly name). The exception's `error.message` contains a detailed message that may be helpful in diagnosing the issue:
```python
from azure.core.exceptions import HttpResponseError

...

try:
    result = client.complete( ... )
except HttpResponseError as e:
    print(f"Status code: {e.status_code} ({e.reason})")
    print(e.message)
```
For example, when you provide a wrong authentication key:
```text
Status code: 401 (Unauthorized)
Operation returned an invalid status 'Unauthorized'
```
Or when you create an `EmbeddingsClient` and call `embed` on the client, but the endpoint does not
support the `/embeddings` route:
```text
Status code: 405 (Method Not Allowed)
Operation returned an invalid status 'Method Not Allowed'
```
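The two cases above can be turned into more actionable messages. The sketch below maps the status codes shown in this section to short hints. `describe_failure` and the hint texts are hypothetical and cover only the cases described above; other status codes fall back to a generic message.

```python
# Hints for the status codes discussed in this section (illustrative wording).
HINTS = {
    401: "Check that the API key or token is correct for this endpoint.",
    405: "The endpoint may not support the route used by this client.",
}


def describe_failure(status_code: int, reason: str) -> str:
    """Build a one-line description of a failed call, with a hint when one is known."""
    hint = HINTS.get(status_code, "See the exception message for details.")
    return f"Status code: {status_code} ({reason}). {hint}"


print(describe_failure(401, "Unauthorized"))
```

Inside an `except HttpResponseError as e:` block, this could be called as `describe_failure(e.status_code, e.reason)`.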
### Logging
The client uses the standard [Python logging library](https://docs.python.org/3/library/logging.html). The SDK logs HTTP request and response details, which may be useful in troubleshooting. To log to stdout, add the following:
```python
import sys
import logging
# Acquire the logger for this client library. Use 'azure' to affect both
# 'azure.core' and 'azure.ai.inference' libraries.
logger = logging.getLogger("azure")
# Set the desired logging level. logging.INFO or logging.DEBUG are good options.
logger.setLevel(logging.DEBUG)
# Direct logging output to stdout:
handler = logging.StreamHandler(stream=sys.stdout)
# Or direct logging output to a file:
# handler = logging.FileHandler(filename="sample.log")
logger.addHandler(handler)
# Optional: change the default logging format. Here we add a timestamp.
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
handler.setFormatter(formatter)
```
By default logs redact the values of URL query strings, the values of some HTTP request and response headers (including `Authorization` which holds the key or token), and the request and response payloads. To create logs without redaction, do these two things:
1. Set the method argument `logging_enable=True` when you construct the client, or when you call the client's `complete` or `embed` methods.
   ```python
   client = ChatCompletionsClient(
       endpoint=endpoint,
       credential=AzureKeyCredential(key),
       logging_enable=True
   )
   ```
1. Set the log level to `logging.DEBUG`. Logs will be redacted with any other log level.

Be sure to protect non-redacted logs to avoid compromising security.
For more information, see [Configure logging in the Azure libraries for Python](https://aka.ms/azsdk/python/logging).
### Reporting issues
To report an issue with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues). Mention "azure-ai-inference" in the title or content.
## Observability With OpenTelemetry
The Azure AI Inference client library provides experimental support for tracing with OpenTelemetry.
You can capture prompt and completion contents by setting the `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED` environment variable to `true` (case insensitive).
By default, prompts, completions, function names, parameters and outputs are not recorded.
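If you prefer to opt in from code rather than from the shell, the variable can be set through `os.environ`. The sketch below assumes it runs early in the program, before tracing is set up, which is the cautious choice since the point at which the library reads the variable is not stated here. `content_recording_enabled` is a hypothetical helper that only mirrors the documented rule; it is not the library's own check.

```python
import os

# Opt in to recording prompt and completion contents in traces.
os.environ["AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"] = "true"


def content_recording_enabled() -> bool:
    """Mirror the documented rule: enabled only when the value is "true" in any letter case."""
    value = os.environ.get("AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED", "")
    return value.lower() == "true"


print(content_recording_enabled())
```

Recorded prompts and completions may contain sensitive data, so enable this only where the trace backend is suitably protected.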
### Setup with Azure Monitor
When using the Azure AI Inference library with [Azure Monitor OpenTelemetry Distro](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable?tabs=python),
distributed tracing for Azure AI Inference calls is enabled by default when using the latest version of the distro.
### Setup with OpenTelemetry
Check out your observability vendor documentation on how to configure OpenTelemetry or refer to the [official OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/).
#### Installation
Make sure to install OpenTelemetry and the Azure SDK tracing plugin via
```bash
pip install opentelemetry-sdk
pip install azure-core-tracing-opentelemetry
```
You will also need an exporter to send telemetry to your observability backend. You can print traces to the console or use a local viewer such as [Aspire Dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash).
To connect to Aspire Dashboard or another OpenTelemetry-compatible backend, install the OTLP exporter:
```bash
pip install opentelemetry-exporter-otlp
```
#### Configuration
To enable Azure SDK tracing, set the `AZURE_SDK_TRACING_IMPLEMENTATION` environment variable to `opentelemetry`.
Alternatively, configure it in code with the following snippet:
<!-- SNIPPET:sample_chat_completions_with_tracing.trace_setting -->
```python
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"
```
<!-- END SNIPPET -->
Please refer to [azure-core-tracing-documentation](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme) for more information.
The final step is to enable Azure AI Inference instrumentation with the following code snippet:
<!-- SNIPPET:sample_chat_completions_with_tracing.instrument_inferencing -->
```python
from azure.ai.inference.tracing import AIInferenceInstrumentor
# Instrument AI Inference API
AIInferenceInstrumentor().instrument()
```
<!-- END SNIPPET -->
You can also uninstrument the Azure AI Inference API by calling `uninstrument`. After this call, the Azure AI Inference API emits no traces until `instrument` is called again.
<!-- SNIPPET:sample_chat_completions_with_tracing.uninstrument_inferencing -->
```python
AIInferenceInstrumentor().uninstrument()
```
<!-- END SNIPPET -->
### Tracing Your Own Functions
The `@tracer.start_as_current_span` decorator can be used to trace your own functions. This will trace the function parameters and their values. You can also add further attributes to the span in the function implementation, as demonstrated below. Note that you will have to set up the tracer in your code before using the decorator. More information is available [here](https://opentelemetry.io/docs/languages/python/).
<!-- SNIPPET:sample_chat_completions_with_tracing.trace_function -->
```python
from opentelemetry import trace  # needed for trace.get_current_span() below
from opentelemetry.trace import get_tracer

tracer = get_tracer(__name__)


# The tracer.start_as_current_span decorator will trace the function call and enable adding additional attributes
# to the span in the function implementation. Note that this will trace the function parameters and their values.
@tracer.start_as_current_span("get_temperature")  # type: ignore
def get_temperature(city: str) -> str:
    # Adding attributes to the current span
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)

    if city == "Seattle":
        return "75"
    elif city == "New York City":
        return "80"
    else:
        return "Unavailable"
```
<!-- END SNIPPET -->
## Next steps
* Have a look at the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder, containing fully runnable Python code for doing inference using synchronous and asynchronous clients.
## Contributing
This project welcomes contributions and suggestions. Most contributions require
you to agree to a Contributor License Agreement (CLA) declaring that you have
the right to, and actually do, grant us the rights to use your contribution.
For details, visit [https://cla.microsoft.com](https://cla.microsoft.com).
When you submit a pull request, a CLA-bot will automatically determine whether
you need to provide a CLA and decorate the PR appropriately (e.g., label,
comment). Simply follow the instructions provided by the bot. You will only
need to do this once across all repos using our CLA.
This project has adopted the
[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct). For more information,
see the Code of Conduct FAQ or contact opencode@microsoft.com with any
additional questions or comments.
<!-- Note: I did not use LINKS section here with a list of `[link-label](link-url)` because these
links don't work in the Sphinx generated documentation. The index.html page of these docs
include this README, but with broken links.-->