-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
[hal_rvv] Add cv::integral implementation and more types of input for test #27060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm sorry that there is a few problems with this PR, I modified the wrong perf test which belongs to OpenCL, and I will cover more data types, so I turned this into a draft 🙇 TODO List:
|
template <typename T> struct rvv; | ||
|
||
// Vector operations wrapper | ||
template<> struct rvv<uint8_t> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not expose the struct rvv
into cv_hal_rvv
namespace, see #26865 (comment).
I still observe several test failures in opencv_perf_imgproc:
Example:
|
Thank you! I will investigate and fix it |
I'm sorry that I can't reproduce the failed test using both GCC 14.2.0 (from riscv-gnu-toolchain) and GCC 14.2.1 (from SpacemiT toolchain) 😔
|
Hi Alex @asmorkalov , could you please take a look at this PR when you have time? |
return __riscv_vget_v_##TYPE##TWO_LMUL##_##TYPE##ONE_LMUL(v, idx); \ | ||
} \ | ||
inline TWO::VecType vcreate(ONE::VecType v0, ONE::VecType v1) { \ | ||
return __riscv_vcreate_v_##TYPE##ONE_LMUL##_##TYPE##TWO_LMUL(v0, v1); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vcreate
is not supported in clang 17. Use this instead.
vuint8m2_t v{};
v = __riscv_vset_v_u8m1_u8m2(v, 0, x);
v = __riscv_vset_v_u8m1_u8m2(v, 1, y);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out!
Should I change the implementation of vcreate
and fall back to vset
when clang version is under 18, or just add vset
for types.hpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer both. Note that __riscv_vset
needs an immediate number for the second argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer both. Note that
__riscv_vset
needs an immediate number for the second argument.
Thanks, I will update this patch later
There are a lot of build issues. Please take a look. |
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
…pe conversions for rvv hal macros Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Oops, sorry I just rebased this patch to the latest commit of |
Current patch passed all build checks, could you please have a look? Thanks! : ) |
Regressions with this patch on K1:
|
Sorry about that 🥹 |
It is not about performance but accuracy. We have sanity checks, which works with data in opencv_extra, in most of the performance testings. For example, opencv/modules/imgproc/perf/perf_integral.cpp Lines 82 to 83 in 5d3a978
|
Oh now I understand, thank you for explanation! I remembered #27060 (comment) pointed out the same failed tests. I will try to reproduce the failed tests and investigate about that |
This patch introduces an RVV-optimized implementation of
cv::integral()
in hal_rvv, along with performance and accuracy tests for all valid input/output type combinations specified inmodules/imgproc/src/hal_replacement.hpp
:opencv/modules/imgproc/src/hal_replacement.hpp
Lines 960 to 974 in 2a8d4b8
The vectorized prefix sum algorithm follows the approach described in Prefix Sum with SIMD - Algorithmica.
I intentionally omitted support for the following cases by returning
CV_HAL_ERROR_NOT_IMPLEMENTED
, as they are harder to implement or show limited performance gains:cn == 3
): Current implementation requiresVLEN/SEW
(a.k.a. number of elements in a vector register) to be a multiple of channel count, which 3-channel formats typically cannot satisfy.!(width >> 8 || height >> 8)
): The scalar implementation demonstrates better performance for images with limited dimensions.3rdparty/ndsrvp/src/integral.cpp
opencv/3rdparty/ndsrvp/src/integral.cpp
Lines 24 to 26 in 09c71ae
Test configuration:
integral_sqsum_full
test is disabled by default, so--gtest_also_run_disabled_tests
is neededTest results:
The vectorized implementation shows progressively better acceleration for larger image sizes and higher channel counts, achieving up to 6.21× speedup for 64FC4 (1920×1080) inputs with
DEPTH_64F_64F
configuration.This is my first time proposing patch for the OpenCV Project 🥹, if there's anything that can be improved, please tell me.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.