System Information
Environment: Windows 10
OpenCV: 4.11.0
OpenVINO: 2024.4.0
Model: ViT model (opset 17) exported to ONNX
Input shape: [1, 3, 112, 112]
Detailed description
I'm testing a ViT model exported to ONNX. It runs fine using OpenVINO Runtime (in both Python and C++), and also works in OpenCV when using the DNN_BACKEND_OPENCV backend.
It also works with DNN_BACKEND_INFERENCE_ENGINE if I first convert the ONNX model to OpenVINO IR format (.xml and .bin) using the Model Optimizer.
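For reference, this is roughly how I load the converted IR files when the Inference Engine backend works (a minimal sketch with a dummy input and no error handling; it uses the same modelXml/modelBin paths as the repro code below):

#include <opencv2/dnn.hpp>
#include <opencv2/core.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Load the OpenVINO IR produced by the Model Optimizer instead of the ONNX file.
    cv::dnn::Net net = cv::dnn::readNet("C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.xml",
                                        "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.bin");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // Dummy NCHW float blob of shape [1, 3, 112, 112].
    cv::Mat img(112, 112, CV_8UC3, cv::Scalar(0, 0, 0));
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(), cv::Scalar(), false, false, CV_32F);
    net.setInput(blob);

    std::vector<cv::Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());
    std::cout << "Output count: " << outputs.size() << std::endl;  // runs without the assertion
    return 0;
}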
The issue happens when I use the ONNX file directly with cv::dnn::readNetFromONNX() and set the backend to DNN_BACKEND_INFERENCE_ENGINE. In that case, I get this error:
[ INFO:0@11.880] global op_inf_engine.cpp:133 cv::dnn::detectArmPlugin_ CPU plugin: 13th Gen Intel(R) Core(TM) i7-1370P
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in cv::dnn::InfEngineNgraphNet::init, file C:\Users\cesar.gouveia\Projects\OpenCV-Package\opencv_mirror\modules\dnn\src\ie_ngraph.cpp, line 256
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in cv::dnn::InfEngineNgraphNet::init, file C:\Users\cesar.gouveia\Projects\OpenCV-Package\opencv_mirror\modules\dnn\src\ie_ngraph.cpp, line 256
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in InfEngineNgraphNet::init
Tested with OpenCV 4.11.0 and OpenVINO 2024.4. The model uses opset 17 and input shape [1, 3, 112, 112]. I made sure the input dimensions are fixed as [1, 3, 112, 112] and not [batch_size, 3, 112, 112], because I know OpenVINO has limitations with dynamic (symbolic) dimensions.
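This is roughly how I checked that the ONNX input shape is static, using the OpenVINO Runtime C++ API directly (a minimal sketch; the exact verification code is not part of the repro below):

#include <openvino/openvino.hpp>
#include <iostream>

int main()
{
    ov::Core core;
    // Read the same ONNX file that fails through OpenCV's Inference Engine backend.
    std::shared_ptr<ov::Model> model = core.read_model("C:/Users/cesar.gouveia/Downloads/vit_model.onnx");

    for (const ov::Output<ov::Node>& input : model->inputs())
    {
        // Prints e.g. "input [1,3,112,112] static: true" when the shape is fully static.
        std::cout << input.get_any_name() << " " << input.get_partial_shape()
                  << " static: " << std::boolalpha << input.get_partial_shape().is_static() << std::endl;
    }
    return 0;
}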
I need the model to remain in ONNX format; converting to IR is not a viable long-term solution. The failure only occurs with the combination of OpenCV + Inference Engine backend + ONNX. Please fix this or confirm whether it is a known limitation.
You can find the model here: https://we.tl/t-u7vryBKkOw
You can find reproducible code below.
Thank you!
Steps to reproduce
#include "pch.h"
#include <iostream>
#include <chrono>
#include <fstream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/core.hpp>
std::string imageFilename = "C:/Users/cesar.gouveia/Downloads/1708014478513.jpeg";
std::string modelFilename = "C:/Users/cesar.gouveia/Downloads/vit_model.onnx";
std::string modelXml = "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.xml";
std::string modelBin = "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.bin";
cv::dnn::Backend targetBackend = cv::dnn::DNN_BACKEND_INFERENCE_ENGINE;
cv::dnn::Target targetDevice = cv::dnn::DNN_TARGET_CPU;
unsigned int numInferences = 100;
cv::Size modelInputSize = cv::Size(112, 112);
cv::ImreadModes imReadMode = cv::IMREAD_COLOR;
std::vector<std::string> inputLayerNames = {"input"};
bool swapRBChannels = false;
unsigned int numChannels = 3;
int main()
{
cv::dnn::Net net = cv::dnn::readNetFromONNX(modelFilename);
net.setPreferableBackend(targetBackend);
net.setPreferableTarget(targetDevice);
cv::Mat img = cv::imread(imageFilename, imReadMode);
cv::Mat imgResized;
cv::resize(img, imgResized, modelInputSize);
std::vector<cv::Mat> imgBatch = { imgResized };
cv::Mat blob = cv::dnn::blobFromImages(imgBatch, 1.0, cv::Size(), cv::Scalar(), swapRBChannels, false, CV_32F);
std::cout << "Blob size: " << blob.size[0] << "x" << blob.size[1] << "x" << blob.size[2] << "x" << blob.size[3] << std::endl;
net.setInput(blob);
//for (auto inputLayerName : inputLayerNames)
// net.setInput(blob, inputLayerName);
std::vector<cv::String> unconnectedOutLayerNames = net.getUnconnectedOutLayersNames();
std::vector<cv::Mat> outputs;
outputs.clear();
std::chrono::high_resolution_clock::time_point timeLoadModelPlusInference1 = std::chrono::high_resolution_clock::now();
net.forward(outputs, unconnectedOutLayerNames);
std::chrono::high_resolution_clock::time_point timeLoadModelPlusInference2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> ms_doubleTimeLoadModelPlusInference = timeLoadModelPlusInference2 - timeLoadModelPlusInference1;
std::cout << "Execution time (load model + inference): " << ms_doubleTimeLoadModelPlusInference.count() << std::endl; // in ms
std::chrono::high_resolution_clock::time_point time1 = std::chrono::high_resolution_clock::now();
try
{
for (size_t i = 0; i < numInferences; i++)
net.forward(outputs, unconnectedOutLayerNames);
}
catch (std::exception& ex)
{
std::cout << ex.what() << std::endl;
}
std::chrono::high_resolution_clock::time_point time2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> ms_double = time2 - time1;
std::cout << "Execution time inference only: " << ms_double.count() / numInferences << std::endl; // in ms
}
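With this exact program, switching targetBackend to cv::dnn::DNN_BACKEND_OPENCV, or loading modelXml/modelBin via cv::dnn::readNet() instead of the ONNX file, runs without the assertion; only the ONNX + DNN_BACKEND_INFERENCE_ENGINE combination fails.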