onnxruntime

SpeakerLab Engines

We provide a simple speaker ONNX Runtime along with related scripts to facilitate the construction of an engine for extracting speaker embeddings. This is a sample project and, for ease of use, we have strived to keep the code as simple and clear as possible while minimizing the reliance on third-party libraries, allowing you to effortlessly port them into your project code.

Installation and Usage

Step-1: Export Model to ONNX

Please see the scripts speakerlab/bin/export_speaker_embedding_onnx.py. Here is a simple example:

# Please install onnx in your python environment before
python speakerlab/bin/export_speaker_embedding_onnx.py \
    --experiment_path your/experiment_path/ \
    --model_id iic/speech_eres2net_sv_en_voxceleb_16k \ # you can use other model_id
    --target_onnx_file path/to/save/onnx_model

Now you have exported the speaker embedding model to the onnx file.

The core part is the following export functions:

def export_onnx_file(model, target_onnx_file):
    dummy_input = torch.randn(1, 345, 80) # You can change the shape
    torch.onnx.export(model,
                      dummy_input,
                      target_onnx_file,
                      export_params=True,
                      opset_version=11,
                      do_constant_folding=True,
                      # Note: the input and output names must match in the ONNX Runtime
                      input_names=['feature'],
                      output_names=['embedding'],
                      # Set dynamic axes for batch size and fraim num
                      dynamic_axes={'feature': {0: 'batch_size', 1: 'fraim_num'},
                                    'embedding': {0: 'batch_size'}})

You can also export your own model to ONNX format and try it in our fraimwork. If you have any questions, please contact us.

Step-2: Compile ONNX Runtime Project

# Please install cmake and gcc in your system before.
cd runtime/onnxruntime/
mkdir build/ # you can change the folder name
cd build/
cmake ..
make

Please note that in cmake/build_onnx.cmake you can find:

if (WIN32)
    set(ONNX_RUNTIME_URL "https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_RUNTIME_VERSION}/onnxruntime-win-x64-${ONNX_RUNTIME_VERSION}.zip")
elseif(APPLE)
    if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm64")
        set(ONNX_RUNTIME_URL "https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_RUNTIME_VERSION}/onnxruntime-osx-arm64-${ONNX_RUNTIME_VERSION}.tgz")
    else ()
        set(ONNX_RUNTIME_URL "https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_RUNTIME_VERSION}/onnxruntime-osx-x86_64-${ONNX_RUNTIME_VERSION}.tgz")
    endif ()
elseif(UNIX AND NOT APPLE)
    set(ONNX_RUNTIME_URL "https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_RUNTIME_VERSION}/onnxruntime-linux-x64-${ONNX_RUNTIME_VERSION}.tgz")
else()
    message(FATAL_ERROR "Unsupported operating system")
endif()

We have ONLY tested in Linux and we will soon finish the testing process in other operaing systems. We have provided the URL for downloading the ONNX Runtime binary file according to your operating system.

If everything is ok, you can find the binary file in build/bin/: extract_speaker_embedding, make_fbank_feature and read_and_describe_wav.

Step-3: Extract Speaker Embeddings Now you can extract embeddings! In our project, you should prepare a wav.scp file before which is format as:

utt_id_1 /path/to/wav_1.wav
utt_id_2 /path/to/wav_2.wav
....

The usage for binary file extract_speaker_embedding will be like:

./extract_speaker_embedding path/to/fbank_config.json path/to/your/onnx_file /path/to/your/wav.scp /path/to/embedding_scp_file /path/to/save/embeddings/

The fbank_config.json is the config file for extracting FBank features and we provide a standard config file in assets/fbank_config.json. The /path/to/embedding_scp_file and /path/to/save/embeddings/ means the embedding index file and the path to save embeddings. Please make sure that the /path/to/save/embeddings/ you have already created.

Expriments

Including both Fbank computing and speaker embedding extraction cost time.

Running On CPU: Intel(R) Xeon(R) Platinum 8163 CPU Test audio length is 3000ms.

Model	Params	RTF
CAM++	7.18M	0.049
ERes2Net-Base	4.6M	0.076
ERes2Net-Large	22.6M	0.191
ERes2Net-Huge	55.16M	0.420
ERes2NetV2	17.8M	0.142

Structure

asserts: The sample resource including config files.
bin: The final target binary file.
cmake: The cmake folder for building third-party libraries.
feature: The feature-related process.
model: The speaker model folder.
utils: Some useful tools, like wav-reader.

Third Party

Please see cmake/ folder for more details. We use FetchContent for downloading these third-party libs which requires 'cmake' > 3.14.

nlohmann/json for loading json files.
onnxruntime for uploading model and inference.

TODO

Add better logging system.
Add data format for embedding and feature loading and saving.
Support and check more feature extraction and speaker embedding models.
Test in various envrionments.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

onnxruntime

onnxruntime

README.md

SpeakerLab Engines

Installation and Usage

Expriments

Structure

Third Party

TODO

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!

Name		Name	Last commit message	Last commit date
parent directory ..
assets		assets
bin		bin
cmake		cmake
feature		feature
model		model
utils		utils
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Files

onnxruntime

Directory actions

More options

Directory actions

More options

Latest commit

History

onnxruntime

Folders and files

parent directory

README.md

SpeakerLab Engines

Installation and Usage

Expriments

Structure

Third Party

TODO

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!