add MATCH_CUDA_MINOR_VERSION, resolve #26965 #26966
base: 4.x
… YES; when turned off, it depends only on the major CUDA version. This also fixes the wrong behavior of having to match the CUDA patch version when ENABLE_CUDA_FIRST_CLASS_LANGUAGE=ON
…x/fix-cuda-version-dependency
cc @cudawarped
else()
  # Do not match minor: range is [major, <major + 1>)
  math(EXPR new_major "${major} + 1")
  set(${lower_bound} "${major}" PARENT_SCOPE)
  set(${upper_bound} "${new_major}" PARENT_SCOPE)
endif()
I propose that when match_minor is off, the range should be major.minor...major+1.0. NVIDIA may introduce backward-compatible changes, but not forward-compatible ones.
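The proposed range can be sketched as follows. This is only an illustration of the suggestion above: the function name `sketch_backward_compatible_range` and its signature are hypothetical and are not part of the PR.

```cmake
# Hypothetical helper (name and signature are assumptions, not the PR's code).
# For "12.6.77" it yields the lower bound "12.6" and the exclusive upper bound "13.0".
function(sketch_backward_compatible_range version lower_bound upper_bound)
  # Extract the major and minor components from "major.minor[.patch]".
  if(NOT "${version}" MATCHES "^([0-9]+)\\.([0-9]+)")
    message(FATAL_ERROR "Expected major.minor[.patch], got '${version}'")
  endif()
  set(major "${CMAKE_MATCH_1}")
  set(minor "${CMAKE_MATCH_2}")

  # Newer minors of the same major are accepted; the next major is not.
  math(EXPR next_major "${major} + 1")
  set(${lower_bound} "${major}.${minor}" PARENT_SCOPE)
  set(${upper_bound} "${next_major}.0" PARENT_SCOPE)
endfunction()

sketch_backward_compatible_range("12.6.77" lo hi)
message(STATUS "Accept CUDA versions in [${lo}, ${hi})")
```

With this range, an OpenCV built against 12.6 would accept 12.8 but reject 12.4, which is the backward-compatible-only behaviour described above.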
That is why match_minor is ON by default. But the user can choose to ignore the minor version.
Personally, I have never encountered an API incompatibility during a minor version update. The case where an application breaks after a CUDA toolkit update involves PTX, which is really a system maintenance problem: you simply should not update the CUDA toolkit without updating the driver.
The other kind of breakage comes from bugs introduced in a particular minor version, but that is not API breakage; the library can still link against whatever minor version is present. If someone knows that a certain version produces wrong results, they should just choose a different version when compiling their code. As a library provider, we should not care.
@@ -10,10 +12,10 @@ set(OpenCV_CUDNN_VERSION "@CUDNN_VERSION@")
 set(OpenCV_USE_CUDNN "@HAVE_CUDNN@")

 if(NOT CUDA_FOUND)
-  find_host_package(CUDA ${OpenCV_CUDA_VERSION} EXACT REQUIRED)
+  find_host_package(CUDA ${OpenCV_CUDA_VERSION_MIN} EXACT REQUIRED)
From the CMake documentation:
The EXACT option requests that the version be matched exactly. This option is incompatible with the specification of a version range.
The option is also redundant in the softer case, when the minor version is included.
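To illustrate the distinction, here is a minimal sketch of a consumer-side check that accepts a range without relying on EXACT. It assumes the `FindCUDAToolkit` module (CMake 3.17 or newer) and its `CUDAToolkit_VERSION` result variable; the bounds 12.0 and 13.0 are example values only.

```cmake
cmake_minimum_required(VERSION 3.17)
project(cuda_range_check LANGUAGES NONE)

# An EXACT request names a single version and cannot be combined with a range:
#   find_package(CUDAToolkit 12.6 EXACT REQUIRED)
# Here the toolkit is located first and the range is checked explicitly.
find_package(CUDAToolkit REQUIRED)

set(_cuda_min "12.0")  # inclusive lower bound (example value)
set(_cuda_max "13.0")  # exclusive upper bound (example value)

if(CUDAToolkit_VERSION VERSION_LESS "${_cuda_min}"
   OR CUDAToolkit_VERSION VERSION_GREATER_EQUAL "${_cuda_max}")
  message(FATAL_ERROR
    "CUDA ${CUDAToolkit_VERSION} is outside [${_cuda_min}, ${_cuda_max})")
endif()
message(STATUS "CUDA ${CUDAToolkit_VERSION} accepted")
```

The explicit comparison avoids depending on whether a given find module handles the `versionMin...versionMax` syntax that `find_package()` gained in CMake 3.19.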
As I tested, find_host_package(CUDA 12 EXACT REQUIRED) finds any 12.x; it does not care about the minor version.
find_host_package(CUDA 12.6 EXACT REQUIRED) finds any 12.6; it does not care about the patch version.
The newly introduced get_version_range does not give you the minor version if match_minor==OFF, and gives you the exact minor version reported by CUDA_VERSION_STRING if match_minor==ON.
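A minimal sketch of the behaviour described here, for readers who want to try it with `cmake -P`. The name `sketch_version_range` and its argument order are hypothetical. The `else()` branch mirrors the hunk quoted earlier in this review; the `match_minor` branch is an assumption modelled on it, not the PR's actual code.

```cmake
# Hypothetical stand-in for get_version_range (name and signature are assumptions).
function(sketch_version_range version match_minor lower_bound upper_bound)
  if(NOT "${version}" MATCHES "^([0-9]+)\\.([0-9]+)")
    message(FATAL_ERROR "Expected major.minor[.patch], got '${version}'")
  endif()
  set(major "${CMAKE_MATCH_1}")
  set(minor "${CMAKE_MATCH_2}")

  if(match_minor)
    # Assumed: match minor, range is [major.minor, major.<minor + 1>)
    math(EXPR new_minor "${minor} + 1")
    set(${lower_bound} "${major}.${minor}" PARENT_SCOPE)
    set(${upper_bound} "${major}.${new_minor}" PARENT_SCOPE)
  else()
    # Do not match minor: range is [major, <major + 1>)
    math(EXPR new_major "${major} + 1")
    set(${lower_bound} "${major}" PARENT_SCOPE)
    set(${upper_bound} "${new_major}" PARENT_SCOPE)
  endif()
endfunction()

sketch_version_range("12.6.77" ON  lo_on  hi_on)
sketch_version_range("12.6.77" OFF lo_off hi_off)
message(STATUS "match_minor=ON : [${lo_on}, ${hi_on})")
message(STATUS "match_minor=OFF: [${lo_off}, ${hi_off})")
```

In both modes the patch component is dropped from the bounds, so a patch-level toolkit update would not force a rebuild.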
Looking at the compatibility guide, doesn't it depend on whether we are linking statically or dynamically? For shared libraries doesn't
imply there is no guarantee unless your application has access to the exact same version of the SDK libraries it was built against (i.e. For static libraries, doesn't everything depend on the driver, which has nothing to do with the CUDA toolkit? Is the CUDA toolkit even required on Linux when statically linking?
Did you check the minor version compatibility? cuda runtime >= 11 is:
So I disagree. In my experience, too, we build both statically and dynamically linked libs, and they are in fact minor-version compatible.
Even though we compile the OpenCV library as static, it still contains undefined symbols from the static CUDA library; thus, as a library provider, we do need to expose the related static CUDA library via an interface CMake target. To emphasize again, by default it matches the minor version. Currently, when CUDA is used as a first-class language, it matches the patch version, which is unacceptable: it requires rebuilding OpenCV too often on an up-to-date system. I also encourage experienced users to try MATCH_CUDA_MINOR_VERSION=OFF on their own, to see if there is really a problem.
Is a flag really necessary to keep the old behaviour? A flag made sense if
implies that libraries such as NPP, cuBLAS and cuFFT may or may not support minor version compatibility, and if they do, this will depend on the CUDA Toolkit version (NVRTC supports minor version compatibility from CUDA 11.3 onwards)? If you are 100% sure this is always the case, wouldn't it be better to force the behaviour?
I agree.
Which static CUDA library are you referring to? My understanding is that if we just depend on cudart_static, then we don't require the CUDA toolkit, only a compatible driver, because libcuda.so from the driver implements all the undefined symbols? I was assuming that this is also the case for the static versions of NPP, cuBLAS and cuFFT if they are built with vanilla CUDA?
Let's try an example from this issue: if I build a CUDA-enabled OpenCV that only needs cudart, then libopencv_core.a actually contains undefined symbols that can only be resolved by linking to libcudart_static.a and libopencv_cudev.a. The static library is not self-contained; it only packs the symbols implemented by its own code. So when we use opencv_core in our product executable, we still need libcudart_static.a at the link stage. The final executable then does not need any cudart lib to run; you can distribute it without one. I don't know if I have made this clear. Here you may notice another inconvenience of the current library structure: as long as we build OpenCV with CUDA, libcudart_static.a is still required to build, even if the executable itself does not really use anything CUDA-related. Ideally this should be separated, but that is another big topic.
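The link-stage dependency described above can be made explicit in a consumer project. This is a sketch only: `app` and `main.cpp` are placeholder names, and it assumes `find_package(OpenCV)` exports an `opencv_core` target and that the `FindCUDAToolkit` module (CMake 3.17 or newer) provides `CUDA::cudart_static`.

```cmake
cmake_minimum_required(VERSION 3.17)
project(opencv_static_consumer LANGUAGES CXX)

# OpenCV built as static libraries with CUDA enabled (assumed setup).
find_package(OpenCV REQUIRED COMPONENTS core)

# Needed only at link time to resolve the cudart symbols left undefined
# in libopencv_core.a; the resulting executable carries them itself.
find_package(CUDAToolkit REQUIRED)

add_executable(app main.cpp)  # placeholder target and source names
target_link_libraries(app PRIVATE opencv_core CUDA::cudart_static)
```

Ideally the OpenCV config file propagates this dependency through its own targets, which is why the version it searches for matters to every downstream build.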
I think it means the system package maintainer needs to be careful not to mix the cuBLAS and cuDNN libs. It is an internal matter of the CUDA toolkit. I am not sure about NPP and cuFFT so far, because I have not used them intensively. But at least they link fine, and according to the documentation they should be minor-version compatible. The only question I have is: can CUDA toolkits of the same major version ship mixed cuFFT or NPP major versions?
But if it is possible to have libcufft.so.10 for 12.1 or libcufft.so.12 for 12.9, then it won't be compatible, I guess. So far I have never seen this. Actually, if you check the PyPI distribution of the CUDA libraries, they do not have a minor version. I also noticed that cuDNN does not specify the CUDA minor version at all; you can only choose cuda11 or cuda12, which may indicate that the minor version indeed does not matter. I am actually expecting your team to tell me whether it could be different, because it is your code that makes it depend on the minor version; I suppose you have a special reason for this. Did you observe breakage at some point? Actually, static linking to CUDA prevents bug fixes via updating the CUDA libs, so I don't really like it. Dynamic linking, on the other hand, should not have any problem.
Further, we could maybe allow building libopencv_core.a linking to
The PR was discussed at the OpenCV core team meeting. The team proposes not to touch the OpenCV build scripts at all, but to modify the OpenCV CMake config file template. The config may change the CUDA/cuDNN/cuBLAS/etc. search behaviour with a user-provided variable. For example, an OpenCV user finds OpenCV like this:
And OpenCV config handles
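One way such a user-provided variable could work is sketched below. Everything here is an assumption made for illustration: the variable name `OPENCV_CUDA_MATCH_MINOR` is hypothetical and was not specified by the team, and the major-only `EXACT` request relies on the behaviour reported earlier in this thread (a request for `12` matching any 12.x).

```cmake
# --- User project (sketch) ---
# Hypothetical opt-out variable, set before locating OpenCV.
set(OPENCV_CUDA_MATCH_MINOR OFF)
find_package(OpenCV REQUIRED)

# --- Inside the OpenCV config template (sketch) ---
# Default to the strict behaviour when the user did not set the variable.
if(NOT DEFINED OPENCV_CUDA_MATCH_MINOR)
  set(OPENCV_CUDA_MATCH_MINOR ON)
endif()

if(NOT CUDA_FOUND)
  if(OPENCV_CUDA_MATCH_MINOR)
    # Request the version OpenCV was built with.
    find_host_package(CUDA ${OpenCV_CUDA_VERSION} EXACT REQUIRED)
  else()
    # Request only the major component, e.g. "12" from "12.6".
    string(REGEX MATCH "^[0-9]+" _ocv_cuda_major "${OpenCV_CUDA_VERSION}")
    find_host_package(CUDA ${_ocv_cuda_major} EXACT REQUIRED)
  endif()
endif()
```

Keeping the switch in the config file means the decision is made by the consumer at `find_package` time, so the OpenCV build itself needs no new option.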
solve #26965
-DMATCH_CUDA_MINOR_VERSION=ON (the default) keeps the old behaviour.
With -DMATCH_CUDA_MINOR_VERSION=OFF, OpenCVConfig.cmake only checks whether the major version of the found CUDA toolkit matches the expectation. The CUDA SDK actually has a stable API within a major version, so we do not even need to depend on the minor version. But to keep the old behaviour, I made MATCH_CUDA_MINOR_VERSION default to ON.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.