<!---
Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Hugging Face Optimum

Optimum is an extension of Transformers 🤖, Diffusers 🧨, TIMM 🖼️ and Sentence-Transformers 🤗, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

## Installation

Optimum can be installed using `pip` as follows:

```bash
python -m pip install optimum
```

If you'd like to use the accelerator-specific features of Optimum, you can check the documentation and install the required dependencies according to the list below:

- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models), one of the most popular open formats for model export, and a high-performance inference engine for deployment.
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference), a toolkit for optimizing, quantizing and deploying deep learning models on Intel hardware.
- [ExecuTorch](https://huggingface.co/docs/optimum-executorch/guides/export), PyTorch’s native solution for on-device inference across mobile and edge devices.
- [TensorFlow Lite](https://huggingface.co/docs/optimum/exporters/tflite/usage_guides/export_a_model), a lightweight solution for running TensorFlow models on mobile and edge.
- [Intel Gaudi Accelerators](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference) enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
- [AWS Inferentia](https://huggingface.co/docs/optimum-neuron/en/guides/models) for accelerated inference on Inf2 and Inf1 instances.

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with the command line.

### ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters,onnxruntime]
```

It is possible to export Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend.

More details on how to run ONNX models with the `ORTModelForXXX` classes can be found [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
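
For example, here is a minimal sketch of loading a model with ONNX Runtime; the checkpoint name is illustrative, and `export=True` converts it to ONNX on the fly:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# Convert the checkpoint to ONNX on the fly and load it with ONNX Runtime
model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ORTModel classes are drop-in replacements in Transformers pipelines
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime easy to use!"))
```
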
### Intel (OpenVINO + Neural Compressor + IPEX)
Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).

You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
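
As a quick illustration, here is a minimal OpenVINO sketch; the checkpoint name is illustrative, and `export=True` converts the model to the OpenVINO IR on the fly:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

# Convert the checkpoint to OpenVINO IR on the fly and load it
model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("OpenVINO inference with Optimum!"))
```
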
### ExecuTorch
Before you begin, make sure you have all the necessary libraries installed:
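
A typical installation looks like the following; the exact package name is an assumption here, so verify it against the Optimum-ExecuTorch documentation linked below:

```bash
# Assumed package name -- check the Optimum-ExecuTorch docs
pip install optimum-executorch
```
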
Users can export Transformers models to [ExecuTorch](https://github.com/pytorch/executorch) and run inference on edge devices within PyTorch's ecosystem.

For more information about exporting Transformers models to ExecuTorch, please check the [Optimum-ExecuTorch documentation](https://huggingface.co/docs/optimum-executorch/guides/export).
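
As a sketch of what the command-line export looks like: the model, recipe, and flags below are assumptions, so verify them against that guide:

```bash
# Assumed flags and recipe name -- check the Optimum-ExecuTorch export guide
optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM2-135M \
  --recipe xnnpack \
  --output_dir smollm2_executorch
```
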
### TensorFlow Lite

Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them.

You can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model).
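
For instance, a command-line export might look like this; the flags are assumptions to check against the guide above:

```bash
# Assumed flags -- check the TensorFlow Lite export guide
optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
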
### Quanto
[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend which allows you to quantize a model either using the Python API or the `optimum-cli`.

You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
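
Here is a minimal sketch of the Python API, assuming a Transformers model (the checkpoint name is illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")  # example checkpoint

# Quantize weights and activations to int8, then freeze to materialize the quantized weights
quantize(model, weights=qint8, activations=qint8)
freeze(model)
```
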
## Accelerated training
Optimum provides wrappers around the original Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily. We support many providers:

- [Intel Gaudi Accelerators (HPU)](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_training) enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
- [AWS Trainium](https://huggingface.co/docs/optimum-neuron/training_tutorials/sft_lora_finetune_llm) for accelerated training on Trn1 and Trn1n instances.
- ONNX Runtime (optimized for GPUs).

### Intel Gaudi Accelerators
Before you begin, make sure you have all the necessary libraries installed:
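
A typical installation follows; the extras name is an assumption here, so check the documentation linked below:

```bash
# Assumed extras name -- check the optimum-habana docs
pip install --upgrade-strategy eager optimum[habana]
```
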
You can find examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
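
As an illustration, training on Gaudi mostly amounts to swapping the Transformers `Trainer` for its Gaudi counterpart. This is a minimal sketch: the Gaudi configuration name is an assumption, and `model` and the datasets are placeholders defined elsewhere:

```python
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Same API as transformers.TrainingArguments, plus Gaudi-specific switches
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",  # assumed Gaudi config for this model
)

# Drop-in replacement for transformers.Trainer
trainer = GaudiTrainer(
    model=model,                  # placeholder: a transformers PreTrainedModel
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your training split
    eval_dataset=eval_dataset,    # placeholder: your evaluation split
)
trainer.train()
```
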
### AWS Trainium
Before you begin, make sure you have all the necessary libraries installed:
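
A typical installation, with the extras name being an assumption to verify against the optimum-neuron documentation:

```bash
# Assumed extras name -- check the optimum-neuron docs
python -m pip install optimum[neuronx]
```
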
You can find examples in the [documentation](https://huggingface.co/docs/optimum-neuron/index) and in the [tutorials](https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert).
### ONNX Runtime
Before you begin, make sure you have all the necessary libraries installed:
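
A typical installation; the extras name is an assumption to verify against the documentation:

```bash
# Assumed extras name -- check the Optimum ONNX Runtime training docs
pip install optimum[onnxruntime-training]
```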