[hal_rvv] Add cv::integral implementation and more types of input for test #27060

Merged
merged 7 commits into from
Apr 21, 2025

YooLc
Contributor

@YooLc YooLc commented Mar 13, 2025

This patch introduces an RVV-optimized implementation of cv::integral() in hal_rvv, along with performance and accuracy tests for all valid input/output type combinations specified in modules/imgproc/src/hal_replacement.hpp:

@note Following combinations of image depths are used:
Source | Sum | Square sum
-------|-----|-----------
CV_8U | CV_32S | CV_64F
CV_8U | CV_32S | CV_32F
CV_8U | CV_32S | CV_32S
CV_8U | CV_32F | CV_64F
CV_8U | CV_32F | CV_32F
CV_8U | CV_64F | CV_64F
CV_16U | CV_64F | CV_64F
CV_16S | CV_64F | CV_64F
CV_32F | CV_32F | CV_64F
CV_32F | CV_32F | CV_32F
CV_32F | CV_64F | CV_64F
CV_64F | CV_64F | CV_64F
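
For orientation, here is a minimal scalar sketch of what one row of the table (CV_8U source, CV_32S sum, CV_64F square sum) computes for a single-channel image. It is not the code from this patch or from OpenCV; the names `IntegralResult` and `integral_ref` are made up for illustration, and the layout assumption is that the outputs are (height + 1) x (width + 1) with a zero first row and column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two outputs, stored row-major with
// (width + 1) elements per row.
struct IntegralResult {
    std::vector<int32_t> sum;    // CV_32S sum
    std::vector<double> sqsum;   // CV_64F square sum
};

// Scalar reference for a single-channel 8-bit image.
IntegralResult integral_ref(const std::vector<uint8_t>& src, int width, int height) {
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    const std::size_t rows = static_cast<std::size_t>(height) + 1;
    IntegralResult out;
    out.sum.assign(stride * rows, 0);
    out.sqsum.assign(stride * rows, 0.0);
    for (std::size_t y = 0; y < rows - 1; ++y) {
        int32_t row_sum = 0;   // prefix sum of the current source row
        double row_sq = 0.0;   // prefix sum of squares of the current row
        for (std::size_t x = 0; x + 1 < stride; ++x) {
            const int32_t v = src[y * (stride - 1) + x];
            row_sum += v;
            row_sq += static_cast<double>(v) * v;
            const std::size_t cur = (y + 1) * stride + (x + 1);
            const std::size_t above = y * stride + (x + 1);
            // Output = row prefix sum + the value directly above.
            out.sum[cur] = out.sum[above] + row_sum;
            out.sqsum[cur] = out.sqsum[above] + row_sq;
        }
    }
    return out;
}
```

The inner loop is the part a vectorized prefix sum would replace: the per-row running totals are computed for a whole vector at a time and then added to the row above.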

The vectorized prefix sum algorithm follows the approach described in Prefix Sum with SIMD - Algorithmica.
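
As a rough illustration of the shift-and-add idea behind SIMD prefix sums, the sketch below emulates the vector lanes with plain C++ loops. It is not the RVV code from this patch: `kLanes`, `block_prefix_sum` and `row_prefix_sum` are hypothetical names, and the fixed lane count stands in for VLEN/SEW, which real RVV code would obtain at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count standing in for VLEN/SEW.
constexpr std::size_t kLanes = 8;

// In-place inclusive prefix sum over one block of up to kLanes elements.
// Each pass mimics "slide the vector up by `shift` lanes, fill with zeros,
// then add", so a full block needs log2(kLanes) passes.
void block_prefix_sum(int* block, std::size_t n) {
    for (std::size_t shift = 1; shift < n; shift *= 2) {
        // Walk from the top lane down so every read still sees the value
        // from before this pass, as a real vector add would.
        for (std::size_t i = n; i-- > shift;) {
            block[i] += block[i - shift];
        }
    }
}

// Prefix sum over a whole row: scan each block, then add the running total
// carried over from the blocks before it.
std::vector<int> row_prefix_sum(const std::vector<int>& row) {
    std::vector<int> out(row);
    int carry = 0;
    for (std::size_t start = 0; start < out.size(); start += kLanes) {
        const std::size_t n = std::min(kLanes, out.size() - start);
        block_prefix_sum(out.data() + start, n);
        for (std::size_t i = 0; i < n; ++i) out[start + i] += carry;
        carry = out[start + n - 1];
    }
    return out;
}
```

With intrinsics, the inner loop of `block_prefix_sum` would collapse into one slide and one add per pass; the carry step would become a broadcast add.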

I intentionally omitted support for the following cases by returning CV_HAL_ERROR_NOT_IMPLEMENTED, as they are harder to implement or show limited performance gains:

  1. Tilted Sum: The data access pattern for tilted sums requires multi-row operations, making effective vectorization difficult.
  2. 3-channel images (cn == 3): The current implementation requires VLEN/SEW (i.e., the number of elements in a vector register) to be a multiple of the channel count, which 3-channel formats typically cannot satisfy.
    • Support for 1, 2 and 4 channel images is implemented
  3. Small images (!(width >> 8 || height >> 8), i.e., both width and height below 256): The scalar implementation performs better on images this small.
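
The three exclusions above could be expressed as an early-exit guard along these lines. This is only a sketch of the stated conditions, not the check from the patch: `Status`, `kOk`, `kNotImplemented` and `check_integral_supported` are hypothetical stand-ins (the real code returns CV_HAL_ERROR_NOT_IMPLEMENTED).

```cpp
#include <cassert>

// Hypothetical status codes standing in for the CV_HAL_ERROR_* values.
enum Status { kOk, kNotImplemented };

// Returns kNotImplemented for the cases listed above so that the caller
// can fall back to the scalar implementation.
Status check_integral_supported(int width, int height, int cn, bool tilted) {
    // 1. Tilted sums are not vectorized.
    if (tilted) return kNotImplemented;
    // 2. Only 1, 2 and 4 channels are handled; cn == 3 is rejected.
    if (cn != 1 && cn != 2 && cn != 4) return kNotImplemented;
    // 3. Small images: true only when both dimensions are below 256.
    if (!(width >> 8 || height >> 8)) return kNotImplemented;
    return kOk;
}
```

Note that the size test rejects an image only when both dimensions are small, so a 256x10 image would still take the vector path under this reading of the condition.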

Test configuration:

  • Platform: SpacemiT Muse Pi (K1 @ 1.60 GHz)
  • Toolchain: GCC 14.2.0
  • The integral_sqsum_full test is disabled by default, so --gtest_also_run_disabled_tests is needed to run it

Test results:

Geometric mean (ms)

Name of Test                                                                            imgproc-gcc-scalar  imgproc-gcc-hal  imgproc-gcc-hal vs imgproc-gcc-scalar (x-factor)
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)                                   1.973             1.415             1.39       
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)                                   1.343             1.351             0.99       
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)                                   2.021             2.756             0.73       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)                                   4.695             2.874             1.63       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)                                   4.028             2.801             1.44       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)                                   5.965             4.926             1.21       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)                                   9.970             4.440             2.25       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)                                   7.934             4.244             1.87       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)                                   14.696            8.431             1.74       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)                                  5.949             4.108             1.45       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)                                  4.064             4.080             1.00       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)                                  6.137             7.975             0.77       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)                                  13.896            8.721             1.59       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)                                  10.948            8.513             1.29       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)                                  18.046           15.234             1.18       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)                                  35.105           13.778             2.55       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)                                  27.135           13.417             2.02       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)                                  43.477           25.616             1.70       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)                                 13.386            9.281             1.44       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)                                 9.159             9.194             1.00       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)                                 13.776           17.836             0.77       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)                                 31.943           19.435             1.64       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)                                 24.747           18.946             1.31       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)                                 35.925           33.943             1.06       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)                                 66.493           29.692             2.24       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)                                 54.737           28.250             1.94       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)                                 91.880           57.495             1.60            
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)                             4.384             4.016             1.09       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)                             3.676             3.960             0.93       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)                             5.620             5.224             1.08       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)                             9.971             7.696             1.30       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)                             8.934             7.632             1.17       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)                             9.927             9.759             1.02       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)                             21.556           12.288             1.75       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)                             21.261           12.089             1.76       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)                             23.989           16.278             1.47       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)                            15.232           11.752             1.30       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)                            12.976           11.721             1.11       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)                            16.450           15.627             1.05       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)                            25.932           23.243             1.12       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)                            24.750           23.019             1.08       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)                            28.228           29.605             0.95       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)                            61.665           37.477             1.65       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)                            61.536           37.126             1.66       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)                            73.989           48.994             1.51       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)                           49.640           26.529             1.87       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)                           35.869           26.417             1.36       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)                           34.378           35.056             0.98       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)                           82.138           52.661             1.56       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)                           54.644           52.089             1.05       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)                           75.073           66.670             1.13       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)                          143.283           83.943             1.71       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)                          156.851           82.378             1.90       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)                          521.594           111.375            4.68            
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32F_32F))          3.529             2.787             1.27       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32F_64F))          4.396             3.998             1.10       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_32F))          3.229             2.774             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_32S))          2.945             2.780             1.06       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_64F))          3.857             3.995             0.97       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_64F_64F))          5.872             5.228             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC1, DEPTH_64F_64F))         6.075             5.277             1.15       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC1, DEPTH_64F_64F))         5.680             5.296             1.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_32F_32F))         3.355             2.896             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_32F_64F))         4.183             4.000             1.05       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_64F_64F))         6.237             5.143             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC1, DEPTH_64F_64F))         4.753             4.783             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32F_32F))          8.021             5.793             1.38       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32F_64F))          9.963             7.704             1.29       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_32F))          7.864             5.720             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_32S))          7.141             5.699             1.25       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_64F))          9.228             7.646             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_64F_64F))          9.940             9.759             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC2, DEPTH_64F_64F))         10.606            9.716             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC2, DEPTH_64F_64F))         9.933             9.751             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_32F_32F))         7.986             5.962             1.34       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_32F_64F))         9.243             7.598             1.22       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_64F_64F))         10.573            9.425             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC2, DEPTH_64F_64F))         11.029            8.977             1.23       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32F_32F))          17.236            8.881             1.94       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32F_64F))          20.905           12.322             1.70       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_32F))          16.011            8.666             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_32S))          15.932            8.507             1.87       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_64F))          20.713           12.115             1.71       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_64F_64F))          23.953           16.284             1.47       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC4, DEPTH_64F_64F))         25.127           16.341             1.54       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC4, DEPTH_64F_64F))         24.950           16.441             1.52       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_32F_32F))         17.261            8.906             1.94       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_32F_64F))         21.944           12.073             1.82       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_64F_64F))         25.921           15.539             1.67       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC4, DEPTH_64F_64F))         27.938           14.824             1.88       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32F_32F))         11.156            8.260             1.35       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32F_64F))         14.777           11.869             1.24       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_32F))         9.693             8.221             1.18       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_32S))         9.023             8.256             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_64F))         13.276           11.821             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_64F_64F))         15.406           15.618             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC1, DEPTH_64F_64F))        16.799           15.749             1.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC1, DEPTH_64F_64F))        15.054           15.806             0.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_32F_32F))        10.055            7.999             1.26       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_32F_64F))        13.506           11.253             1.20       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_64F_64F))        14.952           15.021             1.00       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC1, DEPTH_64F_64F))        13.761           14.002             0.98       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32F_32F))         22.677           17.330             1.31       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32F_64F))         26.283           23.237             1.13       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_32F))         20.126           17.118             1.18       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_32S))         19.337           17.041             1.13       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_64F))         24.973           23.004             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_64F_64F))         29.959           29.585             1.01       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC2, DEPTH_64F_64F))        33.598           29.599             1.14       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC2, DEPTH_64F_64F))        46.213           29.741             1.55       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_32F_32F))        33.077           17.556             1.88       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_32F_64F))        33.960           22.991             1.48       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_64F_64F))        41.792           28.803             1.45       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC2, DEPTH_64F_64F))        34.660           28.532             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32F_32F))         52.989           27.659             1.92       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32F_64F))         62.418           37.515             1.66       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_32F))         50.902           27.310             1.86       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_32S))         47.301           27.019             1.75       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_64F))         61.982           37.140             1.67       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_64F_64F))         79.403           49.041             1.62       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC4, DEPTH_64F_64F))        86.550           49.180             1.76       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC4, DEPTH_64F_64F))        85.715           49.468             1.73       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_32F_32F))        63.932           28.019             2.28       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_32F_64F))        68.180           36.858             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_64F_64F))        83.063           46.483             1.79       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC4, DEPTH_64F_64F))        91.990           44.545             2.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32F_32F))        25.503           18.609             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32F_64F))        29.544           26.635             1.11       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_32F))        22.581           18.514             1.22       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_32S))        20.860           18.547             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_64F))        26.046           26.373             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_64F_64F))        34.831           34.997             1.00       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC1, DEPTH_64F_64F))       36.428           35.214             1.03       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC1, DEPTH_64F_64F))       32.435           35.314             0.92       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_32F_32F))       22.548           18.845             1.20       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_32F_64F))       28.589           25.790             1.11       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_64F_64F))       32.625           33.791             0.97       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC1, DEPTH_64F_64F))       30.158           31.889             0.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32F_32F))        53.374           38.938             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32F_64F))        73.892           52.747             1.40       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_32F))        47.392           38.572             1.23       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_32S))        45.638           38.225             1.19       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_64F))        69.966           52.156             1.34       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_64F_64F))        68.560           66.963             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC2, DEPTH_64F_64F))       71.487           65.420             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC2, DEPTH_64F_64F))       68.127           65.718             1.04       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_32F_32F))       72.967           39.987             1.82       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_32F_64F))       63.933           51.408             1.24       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_64F_64F))       73.334           63.354             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC2, DEPTH_64F_64F))       80.983           60.778             1.33       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32F_32F))       116.981           59.908             1.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32F_64F))       155.085           83.974             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_32F))       109.567           58.525             1.87       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_32S))       105.457           57.124             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_64F))       157.325           82.485             1.91       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_64F_64F))       265.776           111.577            2.38       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC4, DEPTH_64F_64F))      585.218           110.583            5.29       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC4, DEPTH_64F_64F))      585.418           111.302            5.26       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_32F_32F))      126.456           60.415             2.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_32F_64F))      169.278           81.460             2.08       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_64F_64F))      281.256           104.732            2.69       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC4, DEPTH_64F_64F))      620.885           99.953             6.21       

The vectorized implementation shows progressively better acceleration for larger image sizes and higher channel counts, achieving up to 6.21× speedup for 64FC4 (1920×1080) inputs with DEPTH_64F_64F configuration.

This is my first time proposing a patch for the OpenCV project 🥹. If anything can be improved, please tell me.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@YooLc
Contributor Author

YooLc commented Mar 14, 2025

I'm sorry, there are a few problems with this PR: I modified the wrong perf test (it belongs to OpenCL), and I also want to cover more data types, so I have turned this into a draft 🙇

TODO List:

  • Revert the modified OpenCL perf test, and use the correct test
  • Cover more input/output type combinations
  • Add support for cn>1

@YooLc YooLc force-pushed the hal-rvv-integral branch from bb69047 to 36bc54c Compare March 31, 2025 15:55
@YooLc YooLc marked this pull request as ready for review March 31, 2025 16:29
template <typename T> struct rvv;

// Vector operations wrapper
template<> struct rvv<uint8_t> {

Do not expose the struct rvv into cv_hal_rvv namespace, see #26865 (comment).

@asmorkalov
Contributor

I still observe several test failures in opencv_perf_imgproc:

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/4, where GetParam() = (640x480, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/10, where GetParam() = (640x480, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/16, where GetParam() = (1280x720, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/22, where GetParam() = (1280x720, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/25, where GetParam() = (1920x1080, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/28, where GetParam() = (1920x1080, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/34, where GetParam() = (1920x1080, 8UC4, CV_32F)

Example:

➜  cross-build-patched ./opencv_perf_imgproc --gtest_filter=Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1                                                        
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.12.0-dev
OpenCV VCS version: 4.11.0-316-gc5e0bf3d42
Build type: Release
Compiler: /mnt/Projects/spacemit/spacemit-toolchain-linux-glibc-x86_64-v1.0.4/bin/riscv64-unknown-linux-gnu-g++  (ver 14.2.1)
Algorithm hint: ALGO_HINT_ACCURATE
HAL: YES (HAL RVV (ver 0.0.1))
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
Note: Google Test filter = Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_OutMatDepth_integral_sqsum
[ RUN      ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
/mnt/Projects/Projects/opencv/modules/ts/src/ts_perf.cpp:368: Failure
The difference between expect_max and actual_max is 12, which exceeds eps, where
expect_max evaluates to 39124888,
actual_max evaluates to 39124900, and
eps evaluates to 9.9999999999999995e-07.
Argument "sum" has unexpected maximal value

params    = (640x480, 8UC1, CV_32F)
termination reason:  unknown
bytesIn   =     307200
bytesOut  =    2457600
samples   =         13 of 100
outliers  =          1
frequency = 1000000000
min       =    3937369 = 3.94ms
median    =    3951390 = 3.95ms
gmean     =    3960626 = 3.96ms
gstddev   = 0.00806408 = 0.19ms for 97% dispersion interval
mean      =    3960744 = 3.96ms
stddev    =      32216 = 0.03ms
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F) (67 ms)
[----------] 1 test from Size_MatType_OutMatDepth_integral_sqsum (67 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (69 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
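
The log above does not diagnose the failure. One plausible contributor, stated here purely as an assumption, is float32 rounding: at the magnitude of the reported maxima, neighbouring float values are several integers apart, so a different summation order can legitimately land on a different representable value. The sketch below only measures that spacing; `ulp_above` is a hypothetical helper, not part of OpenCV or its test system.

```cpp
#include <cassert>
#include <cmath>

// Gap between a float and the next representable float above it.
float ulp_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Distance between two floats, expressed in units of the gap above `lo`.
float distance_in_ulps(float lo, float hi) {
    return (hi - lo) / ulp_above(lo);
}
```

For the values in the log, `ulp_above(39124888.0f)` is 4, so the reported difference of 12 corresponds to three float32 steps, while the absolute eps used by the check is about 1e-06.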

@YooLc
Contributor Author

YooLc commented Apr 1, 2025

[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)

Thank you! I will investigate and fix it

@YooLc
Contributor Author

YooLc commented Apr 1, 2025

I'm sorry, but I can't reproduce the failing test with either GCC 14.2.0 (from riscv-gnu-toolchain) or GCC 14.2.1 (from the SpacemiT toolchain) 😔
I think the current version passes those tests 🙂

TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.12.0-dev
OpenCV VCS version: unknown
Build type: Release
Compiler: /opt/riscv/bin/riscv64-unknown-linux-gnu-g++  (ver 14.2.1)
Algorithm hint: ALGO_HINT_ACCURATE
HAL: YES (HAL RVV (ver 0.0.1))
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
Note: Google Test filter = Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_OutMatDepth_integral_sqsum
[ RUN      ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[ PERFSTAT ]    (samples=13   mean=3.92   median=3.90   min=3.88   stddev=0.04 (0.9%))
[       OK ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1 (67 ms)
[----------] 1 test from Size_MatType_OutMatDepth_integral_sqsum (67 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (69 ms total)
[  PASSED  ] 1 test.

@YooLc
Contributor Author

YooLc commented Apr 4, 2025

I'm sorry, but I can't reproduce the failing test with either GCC 14.2.0 (from riscv-gnu-toolchain) or GCC 14.2.1 (from the SpacemiT toolchain) 😔 I think the current version passes those tests 🙂

Hi Alex @asmorkalov , could you please take a look at this PR when you have time?
Let me know if any changes are needed. Thanks for your review!

return __riscv_vget_v_##TYPE##TWO_LMUL##_##TYPE##ONE_LMUL(v, idx); \
} \
inline TWO::VecType vcreate(ONE::VecType v0, ONE::VecType v1) { \
return __riscv_vcreate_v_##TYPE##ONE_LMUL##_##TYPE##TWO_LMUL(v0, v1); \
Contributor

vcreate is not supported in clang 17. Use this instead.

vuint8m2_t v{};
v = __riscv_vset_v_u8m1_u8m2(v, 0, x);
v = __riscv_vset_v_u8m1_u8m2(v, 1, y);

Contributor Author

Thank you for pointing that out!
Should I change the implementation of vcreate so that it falls back to vset when the clang version is below 18, or just add vset to types.hpp?

Contributor

I prefer both. Note that __riscv_vset needs an immediate number for the second argument.
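
A minimal sketch of the fallback discussed above, assuming the `__clang_major__ < 18` threshold mentioned in this thread. The helper names `combine_u8m1` and `combine_roundtrip_ok` are hypothetical and are not the functions added to types.hpp; other compilers may need their own version check. The indices passed to `__riscv_vset` and `__riscv_vget` are written as literals because they must be compile-time constants. The RVV part is compiled only when the target enables the vector extension.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_vector)
#include <riscv_vector.h>

// Hypothetical helper: build one LMUL=2 register group from two LMUL=1 registers.
inline vuint8m2_t combine_u8m1(vuint8m1_t lo, vuint8m1_t hi) {
#if defined(__clang__) && __clang_major__ < 18
    // Fallback suggested in the review: vset with immediate indices 0 and 1.
    vuint8m2_t v{};
    v = __riscv_vset_v_u8m1_u8m2(v, 0, lo);
    v = __riscv_vset_v_u8m1_u8m2(v, 1, hi);
    return v;
#else
    return __riscv_vcreate_v_u8m1_u8m2(lo, hi);
#endif
}
#endif

// Checks that both halves survive the combine step.
// Returns true trivially when built without RVV support.
bool combine_roundtrip_ok() {
#if defined(__riscv_vector)
    uint8_t a[16], b[16], out_lo[16], out_hi[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = static_cast<uint8_t>(i);
        b[i] = static_cast<uint8_t>(100 + i);
    }
    size_t vl = __riscv_vsetvl_e8m1(16);  // may be smaller than 16 on narrow VLEN
    vuint8m1_t lo = __riscv_vle8_v_u8m1(a, vl);
    vuint8m1_t hi = __riscv_vle8_v_u8m1(b, vl);
    vuint8m2_t v = combine_u8m1(lo, hi);
    __riscv_vse8_v_u8m1(out_lo, __riscv_vget_v_u8m2_u8m1(v, 0), vl);
    __riscv_vse8_v_u8m1(out_hi, __riscv_vget_v_u8m2_u8m1(v, 1), vl);
    for (size_t i = 0; i < vl; ++i) {
        if (out_lo[i] != a[i] || out_hi[i] != b[i]) return false;
    }
    return true;
#else
    return true;
#endif
}
```

Keeping the version check inside a single wrapper means callers do not need to know which intrinsic is available.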

Contributor Author

I prefer both. Note that __riscv_vset needs an immediate number for the second argument.

Thanks, I will update this patch later

@asmorkalov
Contributor

There are a lot of build issues. Please take a look.

YooLc and others added 7 commits April 11, 2025 22:00
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
…pe conversions for rvv hal macros

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@YooLc YooLc force-pushed the hal-rvv-integral branch from 509e69f to d12bb7c Compare April 11, 2025 14:18
@YooLc
Contributor Author

YooLc commented Apr 11, 2025

Oops, sorry, I just rebased this patch onto the latest commit of the 4.x branch.
The first 6 commits are unchanged, and the last one fixes build problems with clang versions <= 17.
Here's a demo of the functions added in types.hpp: Godbolt

@YooLc
Contributor Author

YooLc commented Apr 18, 2025

There are a lot of build issues. Please take a look.

The current patch passes all build checks. Could you please have a look? Thanks! : )

@asmorkalov asmorkalov self-assigned this Apr 21, 2025
@asmorkalov asmorkalov self-requested a review April 21, 2025 06:11
@asmorkalov asmorkalov added this to the 4.12.0 milestone Apr 21, 2025
@asmorkalov asmorkalov merged commit f20facc into opencv:4.x Apr 21, 2025
28 checks passed
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025
@fengyuentau
Member

fengyuentau commented May 14, 2025

Regressions with this patch on K1:

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/4, where GetParam() = (640x480, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/10, where GetParam() = (640x480, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/16, where GetParam() = (1280x720, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/22, where GetParam() = (1280x720, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/25, where GetParam() = (1920x1080, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/28, where GetParam() = (1920x1080, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/34, where GetParam() = (1920x1080, 8UC4, CV_32F)

@YooLc
Contributor Author

YooLc commented May 14, 2025

Regressions with this patch on K1:

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/4, where GetParam() = (640x480, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/10, where GetParam() = (640x480, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/16, where GetParam() = (1280x720, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/22, where GetParam() = (1280x720, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/25, where GetParam() = (1920x1080, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/28, where GetParam() = (1920x1080, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/34, where GetParam() = (1920x1080, 8UC4, CV_32F)

Sorry about that 🥹
I noticed that there's a performance difference between the two core clusters of K1 (cores 0-3 run faster than cores 5-7).
Could you test again with numactl -C 0-3 and confirm that there actually is a performance regression?
Thank you!

@fengyuentau
Member

Regressions with this patch on K1:

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/4, where GetParam() = (640x480, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/10, where GetParam() = (640x480, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/16, where GetParam() = (1280x720, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/22, where GetParam() = (1280x720, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/25, where GetParam() = (1920x1080, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/28, where GetParam() = (1920x1080, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/34, where GetParam() = (1920x1080, 8UC4, CV_32F)

Sorry about that 🥹 I noticed that there's a performance difference between the two core clusters of K1 (cores 0-3 run faster than cores 5-7). Could you test again with numactl -C 0-3 and confirm that there actually is a performance regression? Thank you!

It is not about performance but accuracy. Most of the performance tests include sanity checks, which compare the results against reference data in opencv_extra. For example,

SANITY_CHECK(sum, 1e-6);
SANITY_CHECK(sqsum, 1e-6);
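
A small sketch of why such a tight tolerance effectively demands bit-identical CV_32F output at the magnitudes in the failing log (expect_max 39124888, actual_max 39124900). Adjacent float values near 3.9e7 are 4 apart, so two results that differ at all differ by at least 4, far above the eps of about 1e-6 reported in the log. The second half shows that float addition depends on grouping. That is one plausible source of such a mismatch in a blocked prefix sum, but it is an assumption here and not a confirmed diagnosis. The function names are hypothetical.

```cpp
#include <cmath>

// Distance from x to the next representable float above it.
inline float ulp_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Sequential accumulation: (base + 1) + 1.
inline float add_sequential(float base) {
    float s = base;
    s += 1.0f;
    s += 1.0f;
    return s;
}

// Blocked accumulation: base + (1 + 1), as a partial sum would be added.
inline float add_blocked(float base) {
    float partial = 1.0f + 1.0f;
    return base + partial;
}
```

With default IEEE-754 float arithmetic (no fast-math), `ulp_above(39124888.0f)` is 4, so the reported difference of 12 is three representable steps. At 2^24, `add_sequential(16777216.0f)` stays at 16777216 while `add_blocked(16777216.0f)` yields 16777218.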

@YooLc
Contributor Author

YooLc commented May 16, 2025

Regressions with this patch on K1:

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/1, where GetParam() = (640x480, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/4, where GetParam() = (640x480, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/10, where GetParam() = (640x480, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/16, where GetParam() = (1280x720, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/22, where GetParam() = (1280x720, 8UC4, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/25, where GetParam() = (1920x1080, 8UC1, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/28, where GetParam() = (1920x1080, 8UC2, CV_32F)
[  FAILED  ] Size_MatType_OutMatDepth_integral_sqsum.integral_sqsum/34, where GetParam() = (1920x1080, 8UC4, CV_32F)

Sorry about that 🥹 I noticed that there's a performance difference between the two core clusters of K1 (cores 0-3 run faster than cores 5-7). Could you test again with numactl -C 0-3 and confirm that there actually is a performance regression? Thank you!

It is not about performance but accuracy. Most of the performance tests include sanity checks, which compare the results against reference data in opencv_extra. For example,

SANITY_CHECK(sum, 1e-6);
SANITY_CHECK(sqsum, 1e-6);

Oh, now I understand. Thank you for the explanation! I remember that #27060 (comment) pointed out the same failed tests. I will try to reproduce the failures and investigate.
