hal/riscv-rvv: implement FAST keypoint detection #27391

Open
wants to merge 24 commits into
base: 4.x

@Haosonn Haosonn commented Jun 2, 2025

This PR implements FAST keypoint detection for the RISC-V RVV HAL, in both NMS and no-NMS versions.

A new perf test was added and run on two platforms: K1 and K230.
Acceleration is achieved when the threshold is high. However, the numbers show no speedup, and often a slowdown, when the threshold is low (that is, when the number of keypoint candidates is high).

K1:

# GCC

                             Name of Test                               scalar  rvv      rvv
                                                                                          vs
                                                                                        scalar
                                                                                      (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")  22.113 23.721    0.93
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")              4.605  7.168     0.64
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")   26.228 24.689    1.06
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")               7.134  7.561     0.94
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")  19.488 21.407    0.91
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")              3.481  5.404     0.64
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")   22.309 22.145    1.01
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")               4.826  5.654     0.85
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 14.108 8.205     1.72
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")             2.520  1.072     2.35
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")  14.133 8.410     1.68
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")              2.556  1.097     2.33

# Clang

                             Name of Test                               scalar  rvv      rvv
                                                                                          vs
                                                                                        scalar
                                                                                      (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")  25.130 23.695    1.06
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")              4.987  7.168     0.70
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")   28.035 24.467    1.15
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")               6.760  7.503     0.90
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")  22.954 21.373    1.07
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")              3.838  5.330     0.72
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")   24.523 21.998    1.11
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")               4.795  5.543     0.87
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 16.799 8.102     2.07
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")             2.874  1.024     2.81
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")  16.950 8.073     2.10
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")              2.899  1.027     2.82

K230:

# GCC

                             Name of Test                               scalar  rvv      rvv
                                                                                          vs
                                                                                        scalar
                                                                                      (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")  21.082 32.090    0.66
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")              4.837  9.157     0.53
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")   25.479 33.576    0.76
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")               7.549  9.716     0.78
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")  18.463 30.087    0.61
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")              3.716  6.544     0.57
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")   21.548 31.374    0.69
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")               5.107  6.928     0.74
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 13.763 8.712     1.58
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")             2.578  1.284     2.01
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")  13.804 8.831     1.56
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")              2.615  1.289     2.03

# Clang

                             Name of Test                               scalar  rvv      rvv
                                                                                          vs
                                                                                        scalar
                                                                                      (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")  23.424 35.072    0.67
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")              5.284  10.107    0.52
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")   26.487 35.978    0.74
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")               7.146  10.612    0.67
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")  21.155 32.858    0.64
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")              4.101  7.153     0.57
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")   23.321 33.505    0.70
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")               5.106  7.415     0.69
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 15.597 8.792     1.77
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")             2.922  1.228     2.38
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")  15.626 8.817     1.77
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")              2.963  1.240     2.39
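
For reading the tables above: the x-factor column appears to be the scalar time divided by the rvv time, so values above 1 mean the rvv path is faster (for example, 14.108 / 8.205 ≈ 1.72 in the K1 GCC table). A minimal sketch of that arithmetic follows; the helper names xFactor and round2 are hypothetical and not part of the perf tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: speedup of the rvv run over the scalar run,
// computed as scalar time divided by rvv time.
// Values above 1 mean the rvv path is faster.
inline double xFactor(double scalar_time, double rvv_time)
{
    assert(rvv_time > 0.0);
    return scalar_time / rvv_time;
}

// Round to two decimals, matching the precision shown in the tables.
inline double round2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

With the K1 GCC row for threshold 20 on orig.png, round2(xFactor(4.605, 7.168)) gives 0.64, i.e. the rvv path is slower there.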

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@param threshold Threshold for keypoint
@param nonmax_suppression Indicates if make nonmaxima suppression or not.
@param type FAST type
*/
inline int hal_ni_FAST(const uchar* src_data, size_t src_step, int width, int height, uchar* keypoints_data, size_t* keypoints_count, int threshold, bool nonmax_suppression, int /*cv::FastFeatureDetector::DetectorType*/ type) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_FAST(const uchar* src_data, size_t src_step, int width, int height, std::vector<cv::KeyPoint>& keypoints, int threshold, bool nonmax_suppression, int /*cv::FastFeatureDetector::DetectorType*/ type) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
Contributor

It's a dangerous replacement. HAL uses a C-style API without std::vector on purpose. A HAL may be a binary built with its own STL, which means std::vector on the HAL side and on the OpenCV side may be different types with different memory layouts.
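
To illustrate the point, here is a minimal sketch of a C-style boundary: the callee only sees raw pointers and sizes, and std::vector stays entirely on the caller's side. Everything here (PointPOD, detect_c_api, detectViaCApi, the return codes, the toy "detector") is hypothetical and only stands in for the real HAL signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-data point type; a stand-in for a keypoint record.
struct PointPOD { float x; float y; };

// Hypothetical C-style entry point. It only sees raw pointers and sizes,
// so it never depends on the caller's std::vector layout.
// On input *count is the buffer capacity; on output it is the number written.
inline int detect_c_api(const unsigned char* src, int n, int threshold,
                        PointPOD* out, std::size_t* count)
{
    if (!src || !out || !count) return 1;          // hypothetical error code
    std::size_t written = 0;
    for (int i = 0; i < n && written < *count; ++i)
        if (src[i] > threshold)
            out[written++] = PointPOD{ static_cast<float>(i), 0.0f };
    *count = written;
    return 0;                                      // hypothetical success code
}

// Caller side: std::vector is created, sized and resized only here.
inline std::vector<PointPOD> detectViaCApi(const std::vector<unsigned char>& img,
                                           int threshold)
{
    if (img.empty()) return {};
    std::vector<PointPOD> pts(img.size());         // capacity: one slot per sample
    std::size_t count = pts.size();
    int ret = detect_c_api(img.data(), static_cast<int>(img.size()), threshold,
                           pts.data(), &count);
    assert(ret == 0);
    (void)ret;
    pts.resize(count);                             // shrink to what was written
    return pts;
}
```

The callee never constructs, grows or destroys the vector, so it does not matter which STL each side was built against.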

Contributor Author

@Haosonn Haosonn Jun 2, 2025

Indeed. But I haven't figured out a way to update the size of std::vector<cv::KeyPoint>& keypoints when the HAL only receives keypoints.data() reinterpreted as uchar*. The original CALL_HAL causes a wrong keypoint count in the accuracy test, since the test compares the size of the output std::vector<cv::KeyPoint>. Is there a solution to this problem?

Member

A simple solution is to expand the HAL-calling macro: call the HAL function manually and resize the vector before returning. For example:

diff --git a/modules/features2d/src/fast.cpp b/modules/features2d/src/fast.cpp
index a9615da5bd..b00b80818d 100644
--- a/modules/features2d/src/fast.cpp
+++ b/modules/features2d/src/fast.cpp
@@ -438,8 +438,12 @@ void FAST(InputArray _img, std::vector<KeyPoint>& keypoints, int threshold, bool
     size_t keypoints_count = 10000;
     keypoints.clear();
     keypoints.resize(keypoints_count);
-    CALL_HAL(fast, cv_hal_FAST, img.data, img.step, img.cols, img.rows,
-             (uchar*)(keypoints.data()), &keypoints_count, threshold, nonmax_suppression, type);
+    int hal_ret = cv_hal_FAST(img.data, img.step, img.cols, img.rows, (uchar *)(keypoints.data()),
+                              &keypoints_count, threshold, nonmax_suppression, type);
+    if (hal_ret == CV_HAL_ERROR_OK) {
+        keypoints.resize(keypoints_count);
+        return;
+    }
 
     switch(type) {
     case FastFeatureDetector::TYPE_5_8:

@asmorkalov asmorkalov requested a review from fengyuentau June 2, 2025 09:38
@Haosonn Haosonn force-pushed the pr-rvv-hal-fast branch 2 times, most recently from 64415e0 to 2ed698e Compare June 2, 2025 10:28
@Haosonn Haosonn force-pushed the pr-rvv-hal-fast branch from 2ed698e to 3cfdf0a Compare June 2, 2025 10:28
Member

@fengyuentau fengyuentau left a comment

Also take a look at the trailing whitespace:

hal/riscv-rvv/CMakeLists.txt:20: trailing whitespace.
+  ${CMAKE_SOURCE_DIR}/modules/imgproc/include 
hal/riscv-rvv/src/features2d/fast.cpp:60: trailing whitespace.
+inline uint8_t cornerScore(const uint8_t* ptr, const vuint16m2_t& v_offset, int64_t row_stride) 
hal/riscv-rvv/src/features2d/fast.cpp:64: trailing whitespace.
+    
hal/riscv-rvv/src/features2d/fast.cpp:76: trailing whitespace.
+    
hal/riscv-rvv/src/features2d/fast.cpp:108: trailing whitespace.
+inline int fast_16(const uchar* src_data, size_t src_step, int width, int height, std::vector<KeyPoint>& keypoints, int threshold, bool nonmax_suppression) 
hal/riscv-rvv/src/features2d/fast.cpp:190: trailing whitespace.
+                    
hal/riscv-rvv/src/features2d/fast.cpp:207: trailing whitespace.

@Haosonn Haosonn force-pushed the pr-rvv-hal-fast branch from 3ab04e1 to 4f72ede Compare June 7, 2025 06:48
@Haosonn Haosonn force-pushed the pr-rvv-hal-fast branch from 4f72ede to 96756ba Compare June 7, 2025 06:49
Member

@fengyuentau fengyuentau left a comment

I checked on my side and found more warnings:

/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/3rdparty/libtiff/tif_hash_set.c:370:37: note: earlier argument should specify number of elements, later size of each element
[352/653] Building CXX object hal/riscv-rvv/CMakeFiles/rvv_hal.dir/src/features2d/fast.cpp.o
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp: In function 'void cv::rvv_hal::features2d::makeOffsets(int16_t*, vuint16m2_t&, int64_t, int)':
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp:11:89: warning: unused parameter 'patternSize' [-Wunused-parameter]
   11 | inline void makeOffsets(int16_t pixel[], vuint16m2_t& v_offset, int64_t row_stride, int patternSize)
      |                                                                                     ~~~~^~~~~~~~~~~
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp: In function 'uint8_t cv::rvv_hal::features2d::cornerScore(const uint8_t*, const vuint16m2_t&, int64_t)':
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp:48:14: warning: unused variable 'k' [-Wunused-variable]
   48 |     uint32_t k, v = ptr[0];
      |              ^
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp: In function 'int cv::rvv_hal::features2d::fast_16(const uchar*, size_t, int, int, uchar*, size_t*, int, bool)':
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp:211:38: warning: unused variable 'debug' [-Wunused-variable]
  211 |                                 bool debug = false;
      |                                      ^~~~~
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp:212:37: warning: unused variable 'debug_x' [-Wunused-variable]
  212 |                                 int debug_x = -1;
      |                                     ^~~~~~~
/home/tao/workspace/fytao/k1/worktree/qhx-hal_rvv-features2d-fast/opencv/hal/riscv-rvv/src/features2d/fast.cpp:213:37: warning: unused variable 'debug_y' [-Wunused-variable]
  213 |                                 int debug_y = -1;
      |

Member

@fengyuentau fengyuentau left a comment

My performance results:

K1 GCC:

                             Name of Test                               base-gcc patch-gcc patch-gcc
                                                                                               vs
                                                                                            base-gcc
                                                                                           (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")   22.308   23.161      0.96
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")               4.816     6.888      0.70
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")    26.150   24.789      1.05
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")                7.114     7.497      0.95
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")   19.625   21.090      0.93
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")               3.572     5.179      0.69
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")    22.323   22.283      1.00
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")                4.873     5.627      0.87
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png")  14.156    8.319      1.70
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")              2.582     1.170      2.21
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")   14.257    8.521      1.67
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")               2.618     1.192      2.20

K1 Clang:

                             Name of Test                               base-clang patch-clang patch-clang
                                                                                                   vs
                                                                                               base-clang
                                                                                               (x-factor)
detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png")    25.594     23.853       1.07
detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png")                5.453       7.196       0.76
detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png")     29.278     24.837       1.18
detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png")                 7.782       7.888       0.99
detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png")    23.346     21.688       1.08
detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png")                4.332       5.609       0.77
detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png")     25.912     22.294       1.16
detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png")                 5.524       5.957       0.93
detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png")   17.603      8.718       2.02
detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png")               3.256       1.216       2.68
detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png")    17.665      8.637       2.05
detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png")                3.267       1.222       2.67

K1 vs. RK3568:

TBD

fengyuentau previously approved these changes Jun 16, 2025
Member

@fengyuentau fengyuentau left a comment

LGTM 👍

@asmorkalov
Contributor

Muse Pi v30 board (./opencv_test_features2d):

[ RUN      ] Features2d/DescriptorImage.no_crash/6, where GetParam() = "shared/pic*.png"
0. image: /home/opencv/opencv_extra/testdata/cv/shared/pic1.png:
        400x300
        AKAZE:MLDB
                        (149 keypoints, descriptor size = 61)
        AKAZE:MLDB_UPRIGHT
                        (149 keypoints, descriptor size = 61)
        AKAZE:MLDB_256
                        (149 keypoints, descriptor size = 32)
        AKAZE:MLDB_UPRIGHT_256
                        (149 keypoints, descriptor size = 32)
        AKAZE:KAZE
                        (135 keypoints, descriptor size = 64)
        AKAZE:KAZE_UPRIGHT
                        (135 keypoints, descriptor size = 64)
        KAZE
                        (265 keypoints, descriptor size = 64)
        ORB
                        (557 keypoints, descriptor size = 32)
        BRISK
                        (151 keypoints, descriptor size = 64)
1. image: /home/opencv/opencv_extra/testdata/cv/shared/pic2.png:
        400x300
        AKAZE:MLDB
                        (412 keypoints, descriptor size = 61)
        AKAZE:MLDB_UPRIGHT
                        (412 keypoints, descriptor size = 61)
        AKAZE:MLDB_256
                        (412 keypoints, descriptor size = 32)
        AKAZE:MLDB_UPRIGHT_256
                        (412 keypoints, descriptor size = 32)
        AKAZE:KAZE
                        (388 keypoints, descriptor size = 64)
        AKAZE:KAZE_UPRIGHT
                        (388 keypoints, descriptor size = 64)
        KAZE
                        (393 keypoints, descriptor size = 64)
        ORB
double free or corruption (!prev)
[1]    2696 segmentation fault (core dumped)  ./opencv_test_features2d

@Haosonn
Contributor Author

Haosonn commented Jun 16, 2025

I've checked this test. The cause of the failure is that the keypoint count in this case is greater than 10000, the size set at initialization in modules/features2d/src/fast.cpp:438. A larger buffer solves the problem.

    size_t keypoints_count = 10000;
    keypoints.clear();
    keypoints.resize(keypoints_count); // reserve space for keypoints
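
One way to make a fixed-size output buffer safe, sketched under assumptions: the callee never writes past the capacity it was given and reports how many slots it actually needed, so the caller can retry with a larger buffer. The names detectBounded and detectAll, the status codes and the toy "detector" are all hypothetical; this is not the approach taken in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status codes for this sketch only.
enum Status { kOk = 0, kBufferTooSmall = 1 };

// Hypothetical detector: stores indices of samples above the threshold.
// It never writes past `capacity`; *needed always receives the full count.
inline Status detectBounded(const unsigned char* src, std::size_t n, int threshold,
                            int* out, std::size_t capacity, std::size_t* needed)
{
    std::size_t found = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (src[i] > threshold) {
            if (found < capacity) out[found] = static_cast<int>(i);
            ++found;
        }
    }
    *needed = found;
    return found <= capacity ? kOk : kBufferTooSmall;
}

// Caller: start with a guess, then retry once with the exact size if needed.
inline std::vector<int> detectAll(const std::vector<unsigned char>& img, int threshold,
                                  std::size_t initial_capacity)
{
    std::vector<int> out(initial_capacity);
    std::size_t needed = 0;
    if (detectBounded(img.data(), img.size(), threshold,
                      out.data(), out.size(), &needed) == kBufferTooSmall) {
        out.resize(needed);
        Status s = detectBounded(img.data(), img.size(), threshold,
                                 out.data(), out.size(), &needed);
        assert(s == kOk);
        (void)s;
    }
    out.resize(needed);
    return out;
}
```

The retry costs a second pass over the input, which is the trade-off against letting the callee grow the buffer itself.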

@asmorkalov
Contributor

Looks like we need some API change here.

@fengyuentau fengyuentau dismissed their stale review June 23, 2025 07:29

Introduced new changes

keypoints.resize(keypoints_count);
CALL_HAL(fast, cv_hal_FAST, img.data, img.step, img.cols, img.rows,
(uchar*)(keypoints.data()), &keypoints_count, threshold, nonmax_suppression, type);
uchar* kps = (uchar*)malloc(sizeof(KeyPoint) * keypoints_count);
Contributor

You don't need an extra variable. Use the following code instead:

    KeyPoint* kps = (KeyPoint*)malloc(sizeof(kps[0]) * keypoints_count);
    int hal_ret = cv_hal_FAST(img.data, img.step, img.cols, img.rows, (void**)&kps,
                              &keypoints_count, threshold, nonmax_suppression, type, realloc);
    if (hal_ret == CV_HAL_ERROR_OK) {
        keypoints.assign(kps, kps + keypoints_count);
    }
    free(kps);
    if (hal_ret == CV_HAL_ERROR_OK) {
        return;
    }
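
For readers unfamiliar with the pattern in the snippet above, here is a self-contained sketch of a C-style function that grows a caller-owned buffer through a realloc callback. The callee side (detectGrowable), the Pt record and the toy "detector" are hypothetical; only the caller-side shape (malloc, call, assign, free) mirrors the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical plain-data record standing in for a keypoint.
struct Pt { int x; int y; };

// Signature of the allocator callback; std::realloc matches it.
typedef void* (*ReallocFn)(void*, std::size_t);

// Hypothetical C-style detector. *buf is a caller-allocated array with room
// for *count elements on entry; the function grows it through realloc_fn when
// needed and returns the number of stored elements in *count.
inline int detectGrowable(const unsigned char* src, int n, int threshold,
                          void** buf, std::size_t* count, ReallocFn realloc_fn)
{
    std::size_t capacity = *count, used = 0;
    Pt* pts = static_cast<Pt*>(*buf);
    for (int i = 0; i < n; ++i) {
        if (src[i] <= threshold) continue;
        if (used == capacity) {
            std::size_t new_cap = capacity ? capacity * 2 : 16;
            void* grown = realloc_fn(pts, new_cap * sizeof(Pt));
            if (!grown) { *buf = pts; *count = used; return 1; } // old block still valid
            pts = static_cast<Pt*>(grown);
            capacity = new_cap;
        }
        pts[used].x = i;
        pts[used].y = 0;
        ++used;
    }
    *buf = pts;      // may differ from the pointer passed in
    *count = used;
    return 0;
}

// Caller side: allocate, call, copy into std::vector, always free.
inline std::vector<Pt> detectWithRealloc(const std::vector<unsigned char>& img,
                                         int threshold)
{
    std::size_t count = 4;                         // deliberately small initial guess
    void* buf = std::malloc(sizeof(Pt) * count);
    std::vector<Pt> result;
    if (buf && detectGrowable(img.data(), static_cast<int>(img.size()), threshold,
                              &buf, &count, std::realloc) == 0) {
        Pt* pts = static_cast<Pt*>(buf);
        result.assign(pts, pts + count);
    }
    std::free(buf);                                // buf holds the latest block
    return result;
}
```

Because the callee reallocates through a callback supplied by the caller, allocation and deallocation stay on the same side of the boundary even though the callee decides when to grow.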

Member

Done.

@asmorkalov asmorkalov added this to the 4.13.0 milestone Jun 27, 2025