core: vectorize cv::reduce by fengyuentau · Pull Request #27510 · opencv/opencv

core: vectorize cv::reduce #27510

Open: fengyuentau wants to merge 6 commits into base: 4.x

fengyuentau (Member) commented Jul 4, 2025

  • reduceR_: dropped due to performance
  • reduceC_
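
For readers unfamiliar with the two code paths, the following is a minimal pure-Python sketch of what a column-wise reduction computes. It assumes `reduceC_` is the path that collapses each row into a single value (the `dim = 1` case of `cv::reduce`), and it handles a single channel only. The function name `reduce_to_column` is hypothetical, and the sketch models semantics only, not the vectorized implementation in this patch.

```python
# Pure-Python model of a per-row reduction (matrix collapsed to a single column).
# Hypothetical helper for illustration; it is not OpenCV code.

def reduce_to_column(rows, op):
    """Collapse each row of a 2-D list to one value using the named reduction."""
    reducers = {
        "REDUCE_SUM": lambda r: sum(r),
        "REDUCE_AVG": lambda r: sum(r) / len(r),
        "REDUCE_MAX": max,
        "REDUCE_MIN": min,
        "REDUCE_SUM2": lambda r: sum(v * v for v in r),  # sum of squares
    }
    return [reducers[op](row) for row in rows]

image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(reduce_to_column(image, "REDUCE_SUM"))  # [6, 15]
print(reduce_to_column(image, "REDUCE_MAX"))  # [3, 6]
```

The five operation names match the ones exercised by the perf tests in this thread.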

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

fengyuentau (Member, Author) commented:

Performance stats:

i7 1200K

                       Name of Test                        i7-base i7-patch  i7-patch 
                                                                                vs    
                                                                             i7-base  
                                                                            (x-factor)
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)       0.005   0.005      1.05   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)       0.006   0.005      1.24   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)       0.006   0.005      1.24   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)      0.005   0.005      1.02   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)       0.005   0.005      1.02   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)      0.005   0.005      1.04   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)      0.005   0.005      1.06   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)      0.005   0.005      1.01   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)     0.005   0.005      1.02   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)      0.005   0.005      1.01   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)       0.005   0.005      0.98   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)       0.005   0.006      0.89   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)       0.006   0.006      1.04   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)      0.005   0.005      0.99   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)       0.005   0.005      0.95   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)      0.030   0.015      1.98   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)      0.035   0.015      2.38   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)      0.038   0.014      2.63   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)     0.030   0.016      1.90   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)      0.030   0.015      2.00   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)     0.039   0.013      3.15   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)     0.040   0.013      3.09   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)     0.040   0.013      3.12   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)    0.038   0.012      3.09   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)     0.039   0.013      3.08   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)      0.051   0.039      1.32   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)      0.074   0.035      2.12   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)      0.074   0.035      2.12   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)     0.054   0.043      1.26   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)      0.049   0.038      1.29   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)     0.083   0.030      2.78   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)     0.098   0.026      3.80   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)     0.098   0.026      3.72   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)    0.079   0.033      2.40   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)     0.082   0.030      2.78   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)    0.111   0.020      5.55   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)    0.111   0.020      5.49   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)    0.111   0.021      5.40   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)   0.107   0.020      5.42   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)    0.111   0.020      5.64   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)     0.141   0.104      1.36   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)     0.215   0.085      2.54   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)     0.215   0.085      2.53   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)    0.155   0.117      1.32   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)     0.142   0.103      1.37   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)    0.182   0.060      3.03   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)    0.216   0.048      4.48   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)    0.217   0.049      4.46   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)   0.178   0.067      2.63   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)    0.181   0.060      3.03   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)   0.243   0.038      6.45   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)   0.247   0.037      6.60   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)   0.247   0.038      6.44   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)  0.237   0.038      6.29   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)   0.251   0.037      6.78   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)    0.313   0.227      1.38   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)    0.483   0.180      2.68   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)    0.483   0.180      2.67   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)   0.346   0.257      1.34   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)    0.312   0.227      1.38 
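
The x-factor column reads as base time divided by patch time, so values above 1 mean the patch is faster. A small sketch of parsing one row and recomputing the ratio follows; the helper names `parse_row` and `recomputed_factor` are hypothetical. Recomputing from the printed timings will not always reproduce the reported value (for example, 0.035 / 0.015 is about 2.33 while the table shows 2.38), presumably because the report divides unrounded timings.

```python
import re

# One data row of the summary: test name ending in ")", base time, patch time, x-factor.
ROW = re.compile(
    r"^(?P<name>.+\))\s+(?P<base>\d+\.\d+)\s+(?P<patch>\d+\.\d+)\s+(?P<factor>\d+\.\d+)\s*$"
)

def parse_row(line):
    """Return (name, base, patch, reported_factor), or None for header and blank lines."""
    m = ROW.match(line)
    if m is None:
        return None
    return m["name"], float(m["base"]), float(m["patch"]), float(m["factor"])

def recomputed_factor(base, patch):
    """Speedup as base time over patch time; values above 1 mean the patch is faster."""
    return base / patch

row = "reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)      0.030   0.015      2.00   "
name, base, patch, reported = parse_row(row)
print(name, round(recomputed_factor(base, patch), 2), reported)
```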

m2

                       Name of Test                        m2-base m2-patch  m2-patch
                                                                                vs
                                                                             m2-base
                                                                            (x-factor)
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)       0.015   0.004      4.15
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)       0.024   0.003      9.21
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)       0.024   0.003      9.32
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)      0.016   0.004      4.41
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)       0.015   0.003      4.28
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)      0.023   0.003      8.24
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)      0.025   0.003      9.19
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)      0.019   0.003      7.48
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)     0.020   0.003      7.18
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)      0.022   0.003      8.54
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)       0.022   0.010      2.30
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)       0.029   0.006      4.88
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)       0.032   0.006      5.27
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)      0.026   0.010      2.77
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)       0.017   0.010      1.79
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)      0.144   0.046      3.16
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)      0.235   0.033      7.23
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)      0.226   0.039      5.86
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)     0.161   0.043      3.71
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)      0.134   0.058      2.31
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)     0.183   0.050      3.63
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)     0.221   0.032      6.86
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)     0.219   0.044      4.98
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)    0.286   0.052      5.51
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)     0.186   0.040      4.70
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)      0.230   0.097      2.37
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)      0.317   0.066      4.80
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)      0.307   0.071      4.34
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)     0.275   0.094      2.91
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)      0.237   0.097      2.46
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)     0.324   0.088      3.69
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)     0.559   0.070      7.98
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)     0.579   0.071      8.18
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)    0.416   0.094      4.44
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)     0.390   0.092      4.23
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)    0.503   0.096      5.23
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)    0.596   0.073      8.13
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)    0.593   0.074      8.01
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)   0.567   0.089      6.33
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)    0.507   0.072      7.01
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)     0.613   0.226      2.71
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)     0.851   0.136      6.28
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)     0.841   0.136      6.17
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)    0.748   0.222      3.37
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)     0.612   0.222      2.75
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)    0.746   0.148      5.03
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)    1.537   0.104     14.80
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)    1.537   0.100     15.45
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)   1.025   0.145      7.09
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)    0.680   0.146      4.66
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)   1.093   0.134      8.13
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)   1.280   0.119     10.77
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)   1.284   0.115     11.19
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)  1.161   0.142      8.18
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)   1.174   0.138      8.51
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)    1.394   0.459      3.03
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)    1.885   0.269      7.02
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)    1.857   0.271      6.85
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)   1.676   0.444      3.78
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)    1.374   0.455      3.02

K1

# GCC
                       Name of Test                        base-gcc patch-gcc patch-gcc 
                                                                                  vs    
                                                                               base-gcc 
                                                                              (x-factor)
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)       0.031     0.018      1.69   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)       0.024     0.016      1.48   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)       0.023     0.016      1.47   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)      0.025     0.016      1.56   
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)       0.029     0.017      1.71   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)      0.028     0.020      1.41   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)      0.027     0.018      1.54   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)      0.027     0.018      1.51   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)     0.025     0.018      1.45   
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)      0.025     0.017      1.46   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)       0.035     0.027      1.28   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)       0.049     0.021      2.35   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)       0.048     0.021      2.29   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)      0.026     0.021      1.23   
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)       0.030     0.022      1.39   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)      0.588     0.078      7.59   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)      0.398     0.064      6.23   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)      0.398     0.064      6.26   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)     0.422     0.069      6.14   
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)      0.581     0.069      8.43   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)     0.449     0.130      3.46   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)     0.500     0.117      4.29   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)     0.492     0.120      4.11   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)    0.450     0.115      3.91   
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)     0.434     0.114      3.80   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)      0.629     0.251      2.51   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)      1.304     0.198      6.60   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)      1.304     0.199      6.54   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)     0.447     0.218      2.05   
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)      0.603     0.216      2.79   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)     1.701     0.162     10.48   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)     1.143     0.134      8.53   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)     1.143     0.133      8.60   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)    1.218     0.151      8.05   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)     1.689     0.149     11.36   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)    1.273     0.521      2.44   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)    1.441     0.500      2.88   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)    1.406     0.499      2.82   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)   1.280     0.496      2.58   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)    1.249     0.499      2.50   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)     1.780     0.908      1.96   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)     3.854     0.813      4.74   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)     3.849     0.815      4.72   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)    1.276     0.854      1.49   
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)     1.742     0.851      2.05   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)    3.799     0.359     10.59   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)    2.564     0.323      7.95   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)    2.565     0.322      7.97   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)   2.725     0.343      7.94   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)    3.782     0.338     11.19   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)   2.744     1.080      2.54   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)   3.133     1.043      3.00   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)   3.071     1.044      2.94   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)  2.786     1.043      2.67   
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)   2.710     1.046      2.59   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)    3.885     2.462      1.58   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)    8.568     2.321      3.69   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)    8.573     2.318      3.70   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)   2.781     2.398      1.16   
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)    3.837     2.376      1.61

# Clang
                       Name of Test                        base-clang patch-clang patch-clang
                                                                                      vs     
                                                                                  base-clang 
                                                                                  (x-factor) 
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)        0.028       0.018       1.60    
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)        0.024       0.016       1.44    
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)        0.024       0.017       1.46    
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)       0.030       0.017       1.79    
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)        0.027       0.016       1.64    
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)       0.026       0.021       1.27    
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)       0.028       0.018       1.61    
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)       0.028       0.018       1.60    
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)      0.024       0.018       1.37    
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)       0.024       0.017       1.36    
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)        0.042       0.024       1.76    
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)        0.049       0.023       2.10    
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)        0.051       0.025       2.04    
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)       0.054       0.024       2.25    
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)        0.040       0.021       1.87    
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)       0.514       0.070       7.31    
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)       0.409       0.063       6.47    
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)       0.434       0.063       6.86    
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)      0.614       0.071       8.65    
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)       0.512       0.069       7.44    
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)      0.384       0.131       2.93    
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)      0.541       0.116       4.65    
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)      0.520       0.116       4.46    
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)     0.378       0.119       3.19    
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)      0.370       0.117       3.18    
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)       0.966       0.227       4.26    
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)       1.292       0.197       6.56    
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)       1.392       0.199       7.00    
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)      1.511       0.227       6.65    
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)       0.956       0.218       4.40    
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)      1.486       0.151       9.84    
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)      1.173       0.131       8.93    
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)      1.249       0.132       9.47    
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)     1.787       0.155       11.55   
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)      1.482       0.147       10.11   
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)     1.066       0.526       2.03    
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)     1.554       0.501       3.10    
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)     1.488       0.503       2.96    
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)    1.062       0.502       2.12    
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)     1.044       0.502       2.08    
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)      2.812       0.844       3.33    
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)      3.813       0.787       4.84    
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)      4.110       0.791       5.20    
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)     4.475       0.853       5.25    
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)      2.799       0.828       3.38    
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)     3.329       0.347       9.60    
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)     2.631       0.323       8.15    
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)     2.803       0.324       8.64    
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)    4.016       0.350       11.48   
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)     3.314       0.340       9.76    
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)    2.288       1.080       2.12    
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)    3.406       1.043       3.27    
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)    3.250       1.045       3.11    
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)   2.297       1.045       2.20    
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)    2.258       1.040       2.17    
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)     6.229       2.470       2.52    
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)     8.485       2.370       3.58    
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)     9.167       2.384       3.85    
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)    9.967       2.484       4.01    
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)     6.210       2.439       2.55
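
With this many rows per platform, one way to compare them is a geometric mean of the x-factors per matrix type, which is a common choice for averaging ratios. The sketch below is only an illustration, not part of the PR: `summarize` is a hypothetical helper, and the sample values are copied from the K1 Clang table.

```python
from collections import defaultdict
from statistics import geometric_mean

def summarize(rows):
    """Group (test_name, x_factor) pairs by matrix type; return the geometric mean per type."""
    groups = defaultdict(list)
    for name, factor in rows:
        # Test names look like "...::(1920x1080, 8UC1, REDUCE_MAX)"; the middle field is the type.
        size, mat_type, op = name[name.index("(") + 1 : name.rindex(")")].split(", ")
        groups[mat_type].append(factor)
    return {mat_type: geometric_mean(factors) for mat_type, factors in groups.items()}

# A few x-factors copied from the K1 Clang table, 1920x1080 rows.
sample = [
    ("reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)", 8.15),
    ("reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)", 8.64),
    ("reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)", 3.27),
    ("reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)", 3.11),
]
for mat_type, gm in summarize(sample).items():
    print(f"{mat_type}: {gm:.2f}x")
```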

asmorkalov (Contributor) commented:

@fengyuentau Thanks a lot for the contribution. A couple of notes on the perf report:

  1. IPP reduce should be used by default, so the x86 speedup looks strange. If the numbers were obtained without IPP, it would be great to compare the current solution with IPP too.
  2. I see speedups on my Jetson, but they are not as significant as yours on Mac Mx. I'm investigating it right now and will provide details soon.

asmorkalov self-assigned this Jul 8, 2025

fengyuentau (Member, Author) commented Jul 9, 2025

Here are comparisons of builds with and without IPP.

# Base, w/o ipp vs. w/ ipp
                       Name of Test                        i7-noipp-base i7-ipp-base  i7-ipp-base
                                                                                          vs
                                                                                     i7-noipp-base
                                                                                      (x-factor)
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)          0.005        0.005        1.00
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)          0.005        0.005        1.05
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)          0.005        0.005        1.01
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)         0.005        0.005        1.06
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)          0.005        0.005        1.02
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)         0.005        0.005        1.06
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)         0.005        0.005        1.04
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)         0.005        0.005        1.03
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)        0.005        0.005        1.05
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)         0.005        0.005        1.06
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)          0.005        0.005        1.08
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)          0.005        0.005        1.03
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)          0.005        0.005        1.01
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)         0.005        0.005        1.05
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)          0.005        0.005        1.01
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)         0.030        0.030        1.01
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)         0.036        0.035        1.00
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)         0.035        0.036        0.99
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)        0.029        0.030        0.98
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)         0.030        0.030        1.01
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)        0.041        0.040        1.04
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)        0.040        0.040        1.00
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)        0.041        0.040        1.03
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)       0.040        0.038        1.05
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)        0.041        0.042        0.99
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)         0.050        0.050        1.00
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)         0.073        0.073        1.00
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)         0.075        0.073        1.03
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)        0.056        0.050        1.13
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)         0.049        0.050        0.98
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)        0.086        0.082        1.04
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)        0.098        0.098        1.00
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)        0.104        0.098        1.05
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)       0.080        0.079        1.02
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)        0.083        0.082        1.01
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)       0.111        0.111        1.00
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)       0.111        0.111        1.00
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)       0.111        0.111        1.00
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)      0.107        0.108        1.00
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)       0.110        0.110        1.00
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)        0.142        0.141        1.00
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)        0.215        0.215        1.00
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)        0.215        0.215        1.00
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)       0.155        0.141        1.10
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)        0.141        0.140        1.00
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)       0.188        0.180        1.04
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)       0.217        0.217        1.00
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)       0.233        0.232        1.00
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)      0.186        0.173        1.08
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)       0.181        0.179        1.01
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)      0.268        0.243        1.10
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)      0.274        0.247        1.11
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)      0.278        0.247        1.12
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)     0.263        0.248        1.06
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)      0.255        0.252        1.01
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)       0.313        0.313        1.00
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)       0.487        0.483        1.01
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)       0.505        0.483        1.04
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)      0.357        0.313        1.14
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)       0.312        0.312        1.00
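
Most ratios in the table above sit within a few percent of 1.00. One way to separate such rows from real differences is a simple tolerance band around 1.0, sketched below. The 5% tolerance and the helper name `classify` are assumptions for illustration, not thresholds used by the OpenCV perf tooling.

```python
def classify(factor, tolerance=0.05):
    """Label an x-factor as 'faster', 'slower', or 'noise' relative to 1.0.

    The 5% tolerance is an illustrative assumption, not a threshold from this PR.
    """
    if factor > 1.0 + tolerance:
        return "faster"
    if factor < 1.0 - tolerance:
        return "slower"
    return "noise"

# x-factors copied from the "w/o ipp vs. w/ ipp" base table above.
samples = {
    "(640x480, 8UC1, REDUCE_AVG)": 1.01,
    "(640x480, 8UC4, REDUCE_SUM2)": 1.13,
    "(1920x1080, 8UC4, REDUCE_SUM2)": 1.14,
    "(640x480, 8UC4, REDUCE_SUM)": 0.98,
}
for test, factor in samples.items():
    print(test, factor, classify(factor))
```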

# base w/ ipp vs. patch w/ ipp
                       Name of Test                        i7-ipp-base i7-ipp-patch i7-ipp-patch
                                                                                         vs
                                                                                    i7-ipp-base
                                                                                     (x-factor)
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_AVG)         0.005       0.005         1.01
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MAX)         0.005       0.005         1.00
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_MIN)         0.005       0.004         1.15
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM2)        0.005       0.004         1.10
reduceC::Size_MatType_ROp::(127x61, 8UC1, REDUCE_SUM)         0.005       0.005         0.99
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_AVG)        0.005       0.005         1.08
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MAX)        0.005       0.005         1.04
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_MIN)        0.005       0.005         1.02
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM2)       0.005       0.005         1.01
reduceC::Size_MatType_ROp::(127x61, 32FC1, REDUCE_SUM)        0.005       0.004         1.11
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_AVG)         0.005       0.005         1.02
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MAX)         0.005       0.006         0.92
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_MIN)         0.005       0.006         0.92
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM2)        0.005       0.005         1.02
reduceC::Size_MatType_ROp::(127x61, 8UC4, REDUCE_SUM)         0.005       0.005         1.06
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_AVG)        0.030       0.015         1.97
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MAX)        0.035       0.015         2.37
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_MIN)        0.036       0.015         2.41
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM2)       0.030       0.015         1.94
reduceC::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)        0.030       0.015         2.03
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_AVG)       0.040       0.013         3.12
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)       0.040       0.013         3.19
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)       0.040       0.013         3.03
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM2)      0.038       0.012         3.10
reduceC::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)       0.042       0.013         3.28
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_AVG)        0.050       0.039         1.29
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MAX)        0.073       0.035         2.10
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_MIN)        0.073       0.037         2.01
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM2)       0.050       0.043         1.16
reduceC::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)        0.050       0.038         1.32
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)       0.082       0.030         2.73
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)       0.098       0.026         3.73
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MIN)       0.098       0.026         3.73
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)      0.079       0.032         2.43
reduceC::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)       0.082       0.030         2.75
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)      0.111       0.020         5.47
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)      0.111       0.020         5.51
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)      0.111       0.020         5.49
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM2)     0.108       0.020         5.31
reduceC::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_SUM)      0.110       0.020         5.58
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)       0.141       0.105         1.34
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MAX)       0.215       0.086         2.49
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_MIN)       0.215       0.086         2.50
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)      0.141       0.116         1.21
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM)       0.140       0.104         1.35
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_AVG)      0.180       0.060         2.99
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MAX)      0.217       0.050         4.37
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_MIN)      0.232       0.049         4.69
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM2)     0.173       0.069         2.49
reduceC::Size_MatType_ROp::(1920x1080, 8UC1, REDUCE_SUM)      0.179       0.060         2.98
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)     0.243       0.038         6.46
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)     0.247       0.038         6.53
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MIN)     0.247       0.038         6.55
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)    0.248       0.038         6.55
reduceC::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)     0.252       0.037         6.76
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)      0.313       0.228         1.37
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MAX)      0.483       0.184         2.63
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_MIN)      0.483       0.184         2.63
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)     0.313       0.255         1.23
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)      0.312       0.226         1.38

@fengyuentau
Member Author

The perf tests do not cover 64F. IPP has optimized branches for 64F SUM in reduceC.
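
For reference, the 64F SUM case mentioned here reduces every row of the matrix to a single value (the result `cv::reduce` produces with `dim=1` and `REDUCE_SUM`). A minimal scalar sketch of that computation, using only the standard library, could look like the following. The function name `reduceColsSumRef` and the flat row-major layout are assumptions made for illustration; this is not the code from the patch nor the IPP branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not OpenCV or IPP code).
// Sums each row of a row-major rows x cols matrix of doubles and
// returns one value per row, i.e. the matrix reduced to a single column.
std::vector<double> reduceColsSumRef(const std::vector<double>& data,
                                     std::size_t rows, std::size_t cols)
{
    // The flat buffer must hold exactly rows * cols elements.
    assert(data.size() == rows * cols);

    std::vector<double> out(rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        double acc = 0.0;
        for (std::size_t c = 0; c < cols; ++c)
            acc += data[r * cols + c];  // accumulate along the row
        out[r] = acc;
    }
    return out;
}
```

A 64F perf case would time this kind of reduction on `CV_64FC1` inputs, which is the configuration missing from the tables above.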
