System Information
Environment: Windows 10
OpenCV: 4.11.0
OpenVINO: 2024.4.0
Model: ViT model (opset 17) exported to ONNX
Input shape: [1, 3, 112, 112]
Detailed description
I'm testing a ViT model exported to ONNX. It runs fine using OpenVINO Runtime (in both Python and C++), and also works in OpenCV when using the DNN_BACKEND_OPENCV backend.
It also works with DNN_BACKEND_INFERENCE_ENGINE if I first convert the ONNX model to OpenVINO IR format (.xml and .bin) using the Model Optimizer.
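For reference, this is roughly how I load the converted IR files when the Inference Engine backend works (a minimal sketch with a dummy input and no error handling; it uses the same modelXml/modelBin paths as the repro code below):

#include <opencv2/dnn.hpp>
#include <opencv2/core.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Load the OpenVINO IR produced by the Model Optimizer instead of the ONNX file.
    cv::dnn::Net net = cv::dnn::readNet("C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.xml",
                                        "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.bin");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // Dummy NCHW float blob of shape [1, 3, 112, 112].
    cv::Mat img(112, 112, CV_8UC3, cv::Scalar(0, 0, 0));
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(), cv::Scalar(), false, false, CV_32F);
    net.setInput(blob);

    std::vector<cv::Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());
    std::cout << "Output count: " << outputs.size() << std::endl;  // runs without the assertion
    return 0;
}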
The issue happens when I use the ONNX file directly with cv::dnn::readNetFromONNX() and set the backend to DNN_BACKEND_INFERENCE_ENGINE. In that case, I get this error:
[ INFO:0@11.880] global op_inf_engine.cpp:133 cv::dnn::detectArmPlugin_ CPU plugin: 13th Gen Intel(R) Core(TM) i7-1370P
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in cv::dnn::InfEngineNgraphNet::init, file C:\Users\cesar.gouveia\Projects\OpenCV-Package\opencv_mirror\modules\dnn\src\ie_ngraph.cpp, line 256
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in cv::dnn::InfEngineNgraphNet::init, file C:\Users\cesar.gouveia\Projects\OpenCV-Package\opencv_mirror\modules\dnn\src\ie_ngraph.cpp, line 256
OpenCV(4.12.0-dev) Error: Assertion failed (sz == src.get_size()) in InfEngineNgraphNet::init
Tested with OpenCV 4.11.0 and OpenVINO 2024.4. The model uses opset 17 and input shape [1, 3, 112, 112]. I made sure the input dimensions are fixed as [1, 3, 112, 112] and not [batch_size, 3, 112, 112], because I know OpenVINO has limitations with dynamic (symbolic) dimensions.
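This is roughly how I checked that the ONNX input shape is static, using the OpenVINO Runtime C++ API directly (a minimal sketch; the exact verification code is not part of the repro below):

#include <openvino/openvino.hpp>
#include <iostream>

int main()
{
    ov::Core core;
    // Read the same ONNX file that fails through OpenCV's Inference Engine backend.
    std::shared_ptr<ov::Model> model = core.read_model("C:/Users/cesar.gouveia/Downloads/vit_model.onnx");

    for (const ov::Output<ov::Node>& input : model->inputs())
    {
        // Prints e.g. "input [1,3,112,112] static: true" when the shape is fully static.
        std::cout << input.get_any_name() << " " << input.get_partial_shape()
                  << " static: " << std::boolalpha << input.get_partial_shape().is_static() << std::endl;
    }
    return 0;
}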
I need the model to remain in ONNX format; converting to IR is not a viable long-term solution. The failure only occurs with the combination of OpenCV + Inference Engine backend + ONNX. Please fix this or confirm whether it is a known limitation.
You can find the model here: https://we.tl/t-u7vryBKkOw
You can find reproducible code below.
Thank you!
Steps to reproduce
#include "pch.h"
#include <iostream>
#include <chrono>
#include <fstream>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/core.hpp>
std::string imageFilename = "C:/Users/cesar.gouveia/Downloads/1708014478513.jpeg";
std::string modelFilename = "C:/Users/cesar.gouveia/Downloads/vit_model.onnx";
std::string modelXml = "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.xml";
std::string modelBin = "C:/Users/cesar.gouveia/Downloads/vit_ir/vit_model.bin";
cv::dnn::Backend targetBackend = cv::dnn::DNN_BACKEND_INFERENCE_ENGINE;
cv::dnn::Target targetDevice = cv::dnn::DNN_TARGET_CPU;
unsigned int numInferences = 100;
cv::Size modelInputSize = cv::Size(112, 112);
cv::ImreadModes imReadMode = cv::IMREAD_COLOR;
std::vector<std::string> inputLayerNames = {"input"};
bool swapRBChannels = false;
unsigned int numChannels = 3;
int main()
{
cv::dnn::Net net = cv::dnn::readNetFromONNX(modelFilename);
net.setPreferableBackend(targetBackend);
net.setPreferableTarget(targetDevice);
cv::Mat img = cv::imread(imageFilename, imReadMode);
cv::Mat imgResized;
cv::resize(img, imgResized, modelInputSize);
std::vector<cv::Mat> imgBatch = { imgResized };
cv::Mat blob = cv::dnn::blobFromImages(imgBatch, 1.0, cv::Size(), cv::Scalar(), swapRBChannels, false, CV_32F);
std::cout << "Blob size: " << blob.size[0] << "x" << blob.size[1] << "x" << blob.size[2] << "x" << blob.size[3] << std::endl;
net.setInput(blob);
//for (auto inputLayerName : inputLayerNames)
// net.setInput(blob, inputLayerName);
std::vector<cv::String> unconnectedOutLayerNames = net.getUnconnectedOutLayersNames();
std::vector<cv::Mat> outputs;
outputs.clear();
std::chrono::high_resolution_clock::time_point timeLoadModelPlusInference1 = std::chrono::high_resolution_clock::now();
net.forward(outputs, unconnectedOutLayerNames);
std::chrono::high_resolution_clock::time_point timeLoadModelPlusInference2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> ms_doubleTimeLoadModelPlusInference = timeLoadModelPlusInference2 - timeLoadModelPlusInference1;
std::cout << "Execution time (load model + inference): " << ms_doubleTimeLoadModelPlusInference.count() << std::endl; // in ms
std::chrono::high_resolution_clock::time_point time1 = std::chrono::high_resolution_clock::now();
try
{
for (size_t i = 0; i < numInferences; i++)
net.forward(outputs, unconnectedOutLayerNames);
}
catch (std::exception& ex)
{
std::cout << ex.what() << std::endl;
}
std::chrono::high_resolution_clock::time_point time2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> ms_double = time2 - time1;
std::cout << "Execution time inference only: " << ms_double.count() / numInferences << std::endl; // in ms
}
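With this exact program, switching targetBackend to cv::dnn::DNN_BACKEND_OPENCV, or loading modelXml/modelBin via cv::dnn::readNet() instead of the ONNX file, runs without the assertion; only the ONNX + DNN_BACKEND_INFERENCE_ENGINE combination fails.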