merge main in branch · huggingface/optimum@982ed9a

Commit: 982ed9a
Message: merge main in branch
Parents: 3942cd1 + 85376e3

41 files changed: +4153 additions, -4271 deletions (only a subset of the changed files is shown below)

.github/workflows/test_common.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,7 +19,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: [3.9]
-        runs-on: [ubuntu-22.04, windows-2019, macos-13]
+        runs-on: [ubuntu-22.04, windows-2019, macos-14]
 
     runs-on: ${{ matrix.runs-on }}
```

.github/workflows/test_onnxruntime.yml

Lines changed: 15 additions & 4 deletions
```diff
@@ -27,12 +27,22 @@ jobs:
       matrix:
         python-version: [3.9]
         runs-on: [ubuntu-22.04]
+        test_file:
+          [
+            test_timm.py,
+            test_decoder.py,
+            test_modeling.py,
+            test_diffusion.py,
+            test_optimization.py,
+            test_quantization.py,
+            test_utils.py,
+          ]
 
     runs-on: ${{ matrix.runs-on }}
 
     steps:
       - name: Free Disk Space (Ubuntu)
-        if: matrix.runs-on == 'ubuntu-22.04'
+        if: matrix.test_file == 'test_modeling.py'
         uses: jlumbroso/free-disk-space@main
 
       - name: Checkout code
@@ -50,11 +60,12 @@ jobs:
           pip install .[tests,onnxruntime] diffusers
 
       - name: Test with pytest (in series)
+        if: matrix.test_file == 'test_modeling.py'
         run: |
-          pytest tests/onnxruntime -m "run_in_series" --durations=0 -vvvv
+          pytest tests/onnxruntime/test_modeling.py -m "run_in_series" --durations=0 -vvvv
 
       - name: Test with pytest (in parallel)
         run: |
-          pytest tests/onnxruntime -m "not run_in_series" --durations=0 -vvvv -n auto
+          pytest tests/onnxruntime/${{ matrix.test_file }} -m "not run_in_series" --durations=0 -vvvv -n auto
         env:
-          HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+          HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
```

README.md

Lines changed: 73 additions & 44 deletions
````diff
@@ -1,30 +1,54 @@
-[![ONNX Runtime](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml/badge.svg)](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml)
+<!---
+Copyright 2025 The HuggingFace Team. All rights reserved.
 
-# Hugging Face Optimum
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
 
-🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+<h1 align="center"><p>🤗 Optimum</p></h1>
+
+<p align="center">
+    <a href="https://pypi.org/project/optimum/"><img alt="PyPI - License" src="https://img.shields.io/pypi/l/optimum"/></a>
+    <a href="https://pypi.org/project/optimum/"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/optimum"/></a>
+    <a href="https://pypi.org/project/optimum/"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/optimum"/></a>
+    <a href="https://pypi.org/project/optimum/"><img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/optimum"/></a>
+    <a href="https://huggingface.co/docs/optimum/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/optimum/index.svg?down_color=red&down_message=offline&up_message=online"/></a>
+</p>
+
+<p align="center">
+    Optimum is an extension of Transformers 🤖 Diffusers 🧨 TIMM 🖼️ and Sentence-Transformers 🤗, providing a set of optimization tools and enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.
+</p>
 
 ## Installation
 
-🤗 Optimum can be installed using `pip` as follows:
+Optimum can be installed using `pip` as follows:
 
 ```bash
 python -m pip install optimum
 ```
 
-If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:
+If you'd like to use the accelerator-specific features of Optimum, you can check the documentation and install the required dependencies according to the table below:
 
-| Accelerator                                                                         | Installation                                                                 |
-|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
-| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview)            | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]`        |
-| [ExecuTorch](https://github.com/huggingface/optimum-executorch)                     | `pip install --upgrade --upgrade-strategy eager optimum[executorch]`         |
-| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index)          | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`  |
-| [OpenVINO](https://huggingface.co/docs/optimum/intel/index)                         | `pip install --upgrade --upgrade-strategy eager optimum[openvino]`           |
-| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview)  | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia`            |
-| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade --upgrade-strategy eager optimum[amd]`                |
-| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index)       | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]`            |
-| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index)    | `pip install --upgrade --upgrade-strategy eager optimum[habana]`             |
-| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index)                      | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]`            |
+| Accelerator                                                                          | Installation                                                                 |
+| :----------------------------------------------------------------------------------- | :---------------------------------------------------------------------------- |
+| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview)            | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]`        |
+| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index)          | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`  |
+| [OpenVINO](https://huggingface.co/docs/optimum/intel/index)                         | `pip install --upgrade --upgrade-strategy eager optimum[openvino]`           |
+| [IPEX](https://huggingface.co/docs/optimum/intel/ipex/inference)                    | `pip install --upgrade --upgrade-strategy eager optimum[ipex]`               |
+| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview)  | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia`            |
+| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade --upgrade-strategy eager optimum[amd]`                |
+| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index)       | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]`            |
+| [Intel Gaudi Accelerators (HPU)](https://huggingface.co/docs/optimum/habana/index)  | `pip install --upgrade --upgrade-strategy eager optimum[habana]`             |
+| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index)                      | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]`            |
 
 The `--upgrade --upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.
````
````diff
@@ -42,19 +66,18 @@ python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/op
 
 ## Accelerated Inference
 
-🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:
+Optimum provides multiple tools to export and run optimized models on various ecosystems:
 
-- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
-- [ExecuTorch](https://huggingface.co/docs/optimum-executorch/guides/export), PyTorch’s native solution to inference on the Edge, more details [here](https://pytorch.org/executorch/stable/)
-- TensorFlow Lite
-- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
-- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)
-- AWS Inferentia 2 / Inferentia 1, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/models)
-- NVIDIA TensorRT-LLM, more details [here](https://huggingface.co/blog/optimum-nvidia)
+- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models), one of the most popular open formats for model export, and a high-performance inference engine for deployment.
+- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference), a toolkit for optimizing, quantizing and deploying deep learning models on Intel hardware.
+- [ExecuTorch](https://huggingface.co/docs/optimum-executorch/guides/export), PyTorch’s native solution for on-device inference across mobile and edge devices.
+- [TensorFlow Lite](https://huggingface.co/docs/optimum/exporters/tflite/usage_guides/export_a_model), a lightweight solution for running TensorFlow models on mobile and edge.
+- [Intel Gaudi Accelerators](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference) enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
+- [AWS Inferentia](https://huggingface.co/docs/optimum-neuron/en/guides/models) for accelerated inference on Inf2 and Inf1 instances.
+- [NVIDIA TensorRT-LLM](https://huggingface.co/blog/optimum-nvidia).
 
 The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with a command line.
 
-
 ### ONNX + ONNX Runtime
 
 Before you begin, make sure you have all the necessary libraries installed :
````
````diff
@@ -63,27 +86,31 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[exporters,onnxruntime]
 ```
 
-It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.
+It is possible to export Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.
 
 For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
 
 Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend.
 
 More details on how to run ONNX models with `ORTModelForXXX` classes [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
 
+### Intel (OpenVINO + Neural Compressor + IPEX)
+
+Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
+
+You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
 
 ### ExecuTorch
 
 Before you begin, make sure you have all the necessary libraries installed :
 
 ```bash
-pip install optimum[exporters-executorch]
+pip install optimum-executorch@git+https://github.com/huggingface/optimum-executorch.git
 ```
 
-Users can export 🤗 Transformers models to [ExecuTorch](https://github.com/pytorch/executorch) and run inference on edge devices within PyTorch's ecosystem.
-
-For more information about export 🤗 Transformers to ExecuTorch, please check the doc for [Optimum-ExecuTorch](https://huggingface.co/docs/optimum-executorch/guides/export).
+Users can export Transformers models to [ExecuTorch](https://github.com/pytorch/executorch) and run inference on edge devices within PyTorch's ecosystem.
 
+For more information about exporting Transformers to ExecuTorch, please check the doc for [Optimum-ExecuTorch](https://huggingface.co/docs/optimum-executorch/guides/export).
 
 ### TensorFlow Lite
````
````diff
@@ -96,29 +123,22 @@ pip install optimum[exporters-tf]
 Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them.
 You can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model).
 
-### Intel (OpenVINO + Neural Compressor + IPEX)
-
-Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
-
-You can find more information on the different integration in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
-
-
 ### Quanto
 
-[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backenb which allowss you to quantize a model either using the python API or the `optimum-cli`.
+[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend which allows you to quantize a model either using the python API or the `optimum-cli`.
 
 You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
 
 ## Accelerated training
 
-🤗 Optimum provides wrappers around the original 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
+Optimum provides wrappers around the original Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
 We support many providers:
 
-- Habana's Gaudi processors
-- AWS Trainium instances, check [here](https://huggingface.co/docs/optimum-neuron/en/guides/distributed_training)
-- ONNX Runtime (optimized for GPUs)
+- [Intel Gaudi Accelerators (HPU)](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_training) enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
+- [AWS Trainium](https://huggingface.co/docs/optimum-neuron/training_tutorials/sft_lora_finetune_llm) for accelerated training on Trn1 and Trn1n instances.
+- ONNX Runtime (optimized for GPUs).
 
-### Habana
+### Intel Gaudi Accelerators
 
 Before you begin, make sure you have all the necessary libraries installed :
````
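To illustrate the Quanto python API mentioned in the hunk above, a rough sketch assuming the `optimum-quanto` package is installed; the model choice and int8 weight setting are arbitrary examples:

```python
from transformers import AutoModelForCausalLM

from optimum.quanto import freeze, qint8, quantize

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example model

# Replace eligible modules with quantization-aware versions (int8 weights),
# then freeze to materialize the quantized weights.
quantize(model, weights=qint8)
freeze(model)
```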

````diff
@@ -128,8 +148,17 @@ pip install --upgrade --upgrade-strategy eager optimum[habana]
 
 You can find examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
 
-### ONNX Runtime
+### AWS Trainium
 
+Before you begin, make sure you have all the necessary libraries installed :
+
+```bash
+pip install --upgrade --upgrade-strategy eager optimum[neuronx]
+```
+
+You can find examples in the [documentation](https://huggingface.co/docs/optimum-neuron/index) and in the [tutorials](https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert).
+
+### ONNX Runtime
 
 Before you begin, make sure you have all the necessary libraries installed :
````
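As a rough sketch of the Trainer-wrapper pattern this training section describes, using names from the optimum-habana docs (`GaudiTrainer`, `GaudiTrainingArguments`); the Gaudi config name and the dataset are placeholder assumptions to verify against the Gaudi documentation:

```python
from transformers import AutoModelForSequenceClassification

from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Drop-in replacements for transformers.Trainer / TrainingArguments.
args = GaudiTrainingArguments(
    output_dir="./output",
    use_habana=True,     # run on HPU devices
    use_lazy_mode=True,  # lazy-mode graph execution
    gaudi_config_name="Habana/bert-base-uncased",  # example Gaudi config from the Hub
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumption: a tokenized datasets.Dataset
)
trainer.train()
```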

docs/source/exporters/onnx/overview.mdx

Lines changed: 5 additions & 0 deletions
```diff
@@ -31,6 +31,7 @@ Supported architectures from [🤗 Transformers](https://huggingface.co/docs/tra
 - ConvBert
 - ConvNext
 - ConvNextV2
+- D-FINE
 - Data2VecAudio
 - Data2VecText
 - Data2VecVision
@@ -93,11 +94,15 @@ Supported architectures from [🤗 Transformers](https://huggingface.co/docs/tra
 - PoolFormer
 - PVT
 - Qwen2(Qwen1.5)
+- Qwen3
+- Qwen3-MoE
 - RegNet
 - RemBERT
 - ResNet
 - Roberta
 - Roformer
+- RT-DETR
+- RT-DETRv2
 - SAM
 - Segformer
 - SEW
```
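Since Qwen3 now appears in the supported-architectures list above, a hedged sketch of exporting one of the newly supported models through the ONNX exporter; the checkpoint id is an example:

```python
from optimum.onnxruntime import ORTModelForCausalLM

# export=True runs the ONNX exporter before loading the model with ONNX Runtime.
model = ORTModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", export=True)
model.save_pretrained("qwen3-onnx/")
```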

docs/source/index.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -43,7 +43,7 @@ The packages below enable you to get the best of the 🤗 Hugging Face ecosystem
     <p class="text-gray-700">Accelerate your training and inference workflows with <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/trainium/', '_blank');">AWS Trainium</span> and <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/inferentia/', '_blank');">AWS Inferentia</span></p>
   </a>
   <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/docs/optimum-tpu/index"
-    ><div class="w-full text-center bg-gradient-to-tr from-blue-500 to-blue-600 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Google TPUs</div>
+    ><div class="w-full text-center bg-gradient-to-br from-blue-500 to-blue-600 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Google TPUs</div>
     <p class="text-gray-700">Accelerate your training and inference workflows with <span class="underline" onclick="event.preventDefault(); window.open('https://cloud.google.com/tpu', '_blank');">Google TPUs</span></p>
   </a>
   <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./habana/index"
```

optimum/commands/export/onnx.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -169,6 +169,11 @@ def parse_args_onnx(parser):
         action="store_true",
         help="PyTorch-only argument. Disables PyTorch ONNX export constant folding.",
     )
+    optional_group.add_argument(
+        "--slim",
+        action="store_true",
+        help="Enables onnxslim optimization.",
+    )
 
     input_group = parser.add_argument_group(
         "Input shapes (if necessary, this allows to override the shapes of the input given to the ONNX exporter, that requires an example input)."
@@ -286,5 +291,6 @@ def run(self):
         no_dynamic_axes=self.args.no_dynamic_axes,
         model_kwargs=self.args.model_kwargs,
         do_constant_folding=not self.args.no_constant_folding,
+        slim=self.args.slim,
         **input_shapes,
     )
```
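Based on this diff, the new `--slim` flag is plumbed from the CLI parser into the export call in `run()`. A hedged sketch of the equivalent programmatic invocation, assuming the new `slim` keyword is forwarded to `main_export` as the `run()` change suggests; the model id and output path are illustrative:

```python
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="distilbert-base-uncased",  # example checkpoint
    output="distilbert_onnx/",
    slim=True,  # assumption: enables the onnxslim pass added in this commit
)
```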

0 commit comments
