add MATCH_CUDA_MINOR_VERSION, resolve #26965 #26966

Open: wants to merge 5 commits into base: 4.x

braindevices
Contributor

Solves #26965.

  1. Fix OpenCVConfig.cmake when -DMATCH_CUDA_MINOR_VERSION=ON.
  2. Add MATCH_CUDA_MINOR_VERSION: when -DMATCH_CUDA_MINOR_VERSION=OFF, OpenCVConfig.cmake only checks whether the major version of the found CUDA toolkit matches the expected one.

The CUDA SDK actually has a stable API within a major version, so we do not even need to depend on the minor version. But to keep the old behaviour, I made MATCH_CUDA_MINOR_VERSION default to ON.
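The two modes can be read as a half-open version range around the CUDA version OpenCV was built with. Below is a minimal sketch, runnable with `cmake -P`. It assumes the build version is a dotted `major.minor.patch` string; the helper name `cuda_accepted_range` is hypothetical and this is not the PR's actual `get_version_range`.

```cmake
cmake_minimum_required(VERSION 3.12)

# Hypothetical helper (illustration only, not the PR's get_version_range):
# compute a half-open range [lower, upper) of CUDA versions a consuming
# project may use, given the CUDA version OpenCV was built with.
function(cuda_accepted_range built_version match_minor lower_var upper_var)
  string(REPLACE "." ";" parts "${built_version}")
  list(GET parts 0 major)
  if(match_minor)
    # MATCH_CUDA_MINOR_VERSION=ON: same major.minor, any patch release
    list(GET parts 1 minor)
    math(EXPR next_minor "${minor} + 1")
    set(${lower_var} "${major}.${minor}" PARENT_SCOPE)
    set(${upper_var} "${major}.${next_minor}" PARENT_SCOPE)
  else()
    # MATCH_CUDA_MINOR_VERSION=OFF: same major, any minor release
    math(EXPR next_major "${major} + 1")
    set(${lower_var} "${major}" PARENT_SCOPE)
    set(${upper_var} "${next_major}" PARENT_SCOPE)
  endif()
endfunction()

# Usage: OpenCV built with CUDA 12.6.85; is toolkit 12.8.61 acceptable?
foreach(mode ON OFF)
  cuda_accepted_range("12.6.85" ${mode} lo hi)
  if("12.8.61" VERSION_GREATER_EQUAL "${lo}" AND "12.8.61" VERSION_LESS "${hi}")
    message(STATUS "MATCH_CUDA_MINOR_VERSION=${mode}: [${lo}, ${hi}) accepts 12.8.61")
  else()
    message(STATUS "MATCH_CUDA_MINOR_VERSION=${mode}: [${lo}, ${hi}) rejects 12.8.61")
  endif()
endforeach()
```

With minor matching on, the range is [12.6, 12.7), so 12.8.61 is rejected; with it off, the range is [12, 13), so 12.8.61 is accepted.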

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

… YES; when turned off, it only depends on the major CUDA version. This also fixes the wrong behavior of requiring a matching CUDA patch version when ENABLE_CUDA_FIRST_CLASS_LANGUAGE=ON.
@asmorkalov
Contributor

cc @cudawarped

@asmorkalov asmorkalov self-requested a review February 24, 2025 07:14
@asmorkalov asmorkalov self-assigned this Feb 24, 2025
Comment on lines +62 to +67
else()
# Do not match minor: range is [major, <major + 1>)
math(EXPR new_major "${major} + 1")
set(${lower_bound} "${major}" PARENT_SCOPE)
set(${upper_bound} "${new_major}" PARENT_SCOPE)
endif()
Contributor

I propose that, when match_minor is off, the range major.minor...major+1.0 is used. NVIDIA may introduce backward-compatible changes, but not forward-compatible ones.
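A sketch of the proposed relaxed range, runnable with `cmake -P`. It assumes the upper bound `major+1.0` is meant to be exclusive and that the build version is a dotted string; `cuda_relaxed_range` is a hypothetical name, not code from the PR.

```cmake
cmake_minimum_required(VERSION 3.12)

# Hypothetical helper for the proposal: when minor matching is off, accept
# [major.minor, (major+1).0), i.e. the build's minor release or any newer
# minor release of the same major version (upper bound assumed exclusive).
function(cuda_relaxed_range built_version lower_var upper_var)
  string(REPLACE "." ";" parts "${built_version}")
  list(GET parts 0 major)
  list(GET parts 1 minor)
  math(EXPR next_major "${major} + 1")
  set(${lower_var} "${major}.${minor}" PARENT_SCOPE)
  set(${upper_var} "${next_major}.0" PARENT_SCOPE)
endfunction()

# Usage: OpenCV built with CUDA 12.6.85
cuda_relaxed_range("12.6.85" lo hi)   # lo = 12.6, hi = 13.0
foreach(found 12.4.1 12.6.85 12.8.61 13.0.0)
  if("${found}" VERSION_GREATER_EQUAL "${lo}" AND "${found}" VERSION_LESS "${hi}")
    message(STATUS "${found}: accepted")
  else()
    message(STATUS "${found}: rejected (older minor or different major)")
  endif()
endforeach()
```

Compared with a major-only range, the lower bound also rejects an older minor release such as 12.4.1, which is the forward-compatibility case the comment is concerned about.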

Contributor Author

That is why match_minor is ON by default. But the user can choose to ignore the minor version.

Personally, I have never encountered an API incompatibility during a minor version update. The case where an application breaks due to a CUDA toolkit update is PTX, which is actually a system maintenance problem: you simply should not update the CUDA toolkit without updating the driver.

Other kinds of breakage come from bugs introduced in a particular minor version, but that is not API breakage; the library can still link against whatever minor version is installed. If someone knows that a version produces wrong results, they should just choose a different version when compiling their code; as a library provider, we should not care.

@@ -10,10 +12,10 @@ set(OpenCV_CUDNN_VERSION "@CUDNN_VERSION@")
set(OpenCV_USE_CUDNN "@HAVE_CUDNN@")

if(NOT CUDA_FOUND)
-find_host_package(CUDA ${OpenCV_CUDA_VERSION} EXACT REQUIRED)
+find_host_package(CUDA ${OpenCV_CUDA_VERSION_MIN} EXACT REQUIRED)
Contributor

From CMake documentation:

The EXACT option requests that the version be matched exactly. This option is incompatible with the specification of a version range.

The option is redundant for the softer check if the minor version is included.

Contributor Author

@braindevices braindevices Feb 24, 2025

As I tested: find_host_package(CUDA 12 EXACT REQUIRED) finds any 12.x; it does not care about the minor version.
find_host_package(CUDA 12.6 EXACT REQUIRED) finds any 12.6.x; it does not care about the patch version.
The newly introduced get_version_range does not give you the minor version if match_minor==OFF, and gives you the exact minor version reported by CUDA_VERSION_STRING if match_minor==ON.
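The observed EXACT behaviour amounts to comparing only as many version components as were requested. As far as we understand, this is how FindPackageHandleStandardArgs implements EXACT for find modules such as FindCUDA, but treat that as an assumption to check against the CMake version in use. The hypothetical helper below, runnable with `cmake -P`, mimics it:

```cmake
cmake_minimum_required(VERSION 3.12)

# Hypothetical helper: compare only as many components as the requested
# version has ("12" matches any 12.x, "12.6" matches any 12.6.x).
function(version_prefix_equal requested found out_var)
  string(REPLACE "." ";" req_parts "${requested}")
  string(REPLACE "." ";" found_parts "${found}")
  list(LENGTH req_parts n_req)
  list(LENGTH found_parts n_found)
  set(head "${found}")
  if(n_found GREATER n_req)
    # Truncate the found version to the requested number of components
    list(SUBLIST found_parts 0 ${n_req} head_parts)
    list(JOIN head_parts "." head)
  endif()
  if("${head}" VERSION_EQUAL "${requested}")
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

version_prefix_equal("12" "12.6.85" r1)    # TRUE: minor ignored
version_prefix_equal("12.6" "12.6.85" r2)  # TRUE: patch ignored
version_prefix_equal("12.6" "12.8.61" r3)  # FALSE: minor differs
message(STATUS "r1=${r1} r2=${r2} r3=${r3}")
```

Under this reading, passing only the major version together with EXACT already yields a major-only check, which is why get_version_range can drop the minor component when match_minor==OFF.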

@cudawarped
Contributor

cudawarped commented Feb 24, 2025

The cuda sdk actually have stable API for major version, we do not even need to depend on the minor version. But to keep the old behaviour I make MATCH_CUDA_MINOR_VERSION default to ON.

Looking at the compatibility guide, doesn't it depend on whether we are statically or dynamically linking?

For shared libraries, doesn't

If the application relies on dynamic linking for libraries, then the system should have the right version of such libraries as well.

imply there is no guarantee unless your application has access to the exact same version of the SDK libraries it was built against (i.e. CUDA_VERSION_STRING VERSION_EQUAL OpenCV_CUDA_VERSION)?

For static libraries, doesn't everything depend on the driver, which has nothing to do with the CUDA toolkit? Is the CUDA toolkit even required on Linux when statically linking?

CUDA Compatibility guarantees allow for upgrading only certain components:

  • Backwards compatibility ensures that a newer NVIDIA driver can be used with an older CUDA Toolkit. This is implicit and most simple way of doing upgrades.

  • Minor version and forward compatibility ensure that an older NVIDIA driver can be used with a newer CUDA Toolkit.

@braindevices
Contributor Author

braindevices commented Feb 24, 2025

For shared libraries doesn't

If the application relies on dynamic linking for libraries, then the system should have the right version of such libraries as well.

Did you check minor version compatibility? For the CUDA runtime >= 11:

CUDA 11 and Later Defaults to Minor Version Compatibility
Minor version compatibility has another benefit that offers flexibility in the use and deployment of libraries. Applications that use libraries that support minor version compatibility can be deployed on systems with a different version of the toolkit and libraries without recompiling the application for the difference in the library version. This holds true for both older and newer versions of the libraries provided they are all from the same major release family. Note that libraries themselves have interdependencies that should be considered. For example, each cuDNN version requires a certain version of cuBLAS.

So I disagree; in my experience too, we build both statically and dynamically linked libs, and they are in fact minor version compatible.

For static libraries doesn't everything depend on the driver which has nothing to do with the CUDA toolkit? Is the CUDA toolkit even required on linux when statically linking?

Even though we compile the OpenCV library as static, it still contains undefined symbols from the static CUDA library, so as a library provider we do need to expose the related static CUDA library via an interface CMake target.

Also, to emphasize again: by default it matches the minor version. Currently, when CUDA is used as a first-class language, it matches the patch version, which is unacceptable because it requires rebuilding OpenCV too often on an up-to-date system.

I also encourage experienced users to try out MATCH_CUDA_MINOR_VERSION=OFF on their own to see whether there is really a problem.

@cudawarped
Contributor

So I disagree, and even to my experience, we build both statically and dynamically linked libs, they are in fact minor version compatible.

Is a flag really necessary to keep the old behaviour? A flag made sense if

Applications that use libraries that support minor version compatibility can be deployed on systems with a different version of the toolkit and libraries without recompiling the application for the difference in the library version.

implies that libraries such as NPP, cuBLAS and cuFFT may or may not support minor version compatibility, and that if they do, this depends on the CUDA Toolkit version (NVRTC supports minor version compatibility from CUDA 11.3 onwards)? If you are 100% sure this is always the case, wouldn't it be better to force the behaviour?

Also emphasize again, by default it match the minor version Currently when use CUDA first class language it matches patch version, which is unacceptable it requires to rebuild the ocv too often on up-to-date system.

I agree.

@cudawarped
Contributor

Even though we compile opencv library as static, it still contains the undefined symbols from static cuda library, thus as a library provider, we do need to expose the related static cuda library via interfacing cmake target.

Which static CUDA library are you referring to?

My understanding is that if we only depend on cudart_static, then we don't require the CUDA toolkit, only a compatible driver, because libcuda.so from the driver implements all the undefined symbols? I was assuming that this is also the case for the static versions of NPP, cuBLAS and cuFFT if they are built with vanilla CUDA?

@braindevices
Contributor Author

braindevices commented Feb 24, 2025

My understanding is that if we just depend on cudart_static then we don't require the CUDA toolkit only a compatible driver because libcuda.so from the driver implements all the undefined symbols? I was assuming that this is also the case for the static versions of NPP, cuBLAS and cuFFT if they are built with vanilla CUDA?

Let's try an example from this issue:
#26963

If I build a CUDA-enabled OpenCV that only needs cudart, then libopencv_core.a actually contains undefined symbols that can only be resolved by linking to libcudart_static.a and libopencv_cudev.a. The static library is not self-contained; it only packs the symbols implemented by its own code.

So when we use opencv_core in our product executable, we still need libcudart_static.a at the link stage. The final executable then does not need any cudart library to run; you can distribute it without any cudart library.

I don't know if I made this clear.

Here you may notice another inconvenience of the current library structure: as long as OpenCV is built with CUDA, libcudart_static.a is still required to build an executable, even if the executable itself does not use anything CUDA-related. Ideally this should be separated, but that is another big topic.
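For a consumer of a static, CUDA-enabled OpenCV build, the link requirement described above looks roughly like the following project. This is a sketch under assumptions: `my_app` and `main.cpp` are placeholders, and whether the installed OpenCVConfig.cmake already adds the CUDA runtime to `OpenCV_LIBS` depends on the OpenCV version and build options.

```cmake
cmake_minimum_required(VERSION 3.17)
project(my_app LANGUAGES CXX)

find_package(OpenCV REQUIRED)
# FindCUDAToolkit (CMake >= 3.17) provides imported targets such as CUDA::cudart_static
find_package(CUDAToolkit REQUIRED)

add_executable(my_app main.cpp)
# Only needed if the OpenCV config does not already propagate the CUDA runtime:
# the static OpenCV archives leave cudart symbols undefined, so the final link
# must supply libcudart_static.a. The resulting executable needs no cudart
# library at run time, only a compatible driver.
target_link_libraries(my_app PRIVATE ${OpenCV_LIBS} CUDA::cudart_static)
```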

@braindevices
Contributor Author

braindevices commented Feb 24, 2025

Is a flag really necessary to keep the old behaviour? A flag made sense if

Just trying not to break anything.

implies that libraries such as npp, cuBLAS and cuFFT's may or may not support minor version compatibility and if they do this will depend on the CUDA Toolkit version (NVRTC supports minor version compatibility from CUDA 11.3 onwards)? If you are 100% sure this is always the case wouldn't it be better to force the behaviour?

I think it means the system package maintainer needs to be careful not to mix cuBLAS and cuDNN libs; that is a CUDA toolkit internal matter. I am not sure about NPP and cuFFT so far, because I have not used them intensively. But at least they link fine, and according to the documentation they should be minor version compatible.

The only question I have is: can CUDA toolkits of the same major version ship different major versions of cuFFT or NPP?
For example, so far I have always seen the same major version of libcufft:

find /usr/local/cuda* -name 'libcufft.so.*.*'
/usr/local/cuda-12.3/targets/x86_64-linux/lib/libcufft.so.11.0.12.1
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcufft.so.11.3.3.41

But if it were possible to have libcufft.so.10 for 12.1 or libcufft.so.12 for 12.9, then I guess it would not be compatible. So far I have never seen this. Actually, if you check the PyPI distribution of the CUDA libraries, they do not have a minor version. I also noticed that cuDNN does not specify a minor CUDA version at all; you can only choose cuda11 or cuda12, which may indicate that the minor version indeed does not matter.

I am actually expecting your team to tell me whether it could be different, because it is your code that makes it depend on the minor version; I suppose you had a specific reason for this. Did you observe breakage at some point?

Actually, statically linking CUDA prevents picking up bug fixes by updating the CUDA libs, so I don't really like it.

Dynamic linking, on the other hand, should not have any problem.
So in the next PR I will try to fix https://github.com/opencv/opencv_contrib/blob/ce3c6681c9bf0e5cf46704a6ce0883078bdba074/modules/cudaarithm/CMakeLists.txt#L15

CUDA::cudart_static should be CUDA::cudart${CUDA_LIB_EXT}

Further, we could maybe allow building libopencv_core.a linked against libcudart.so instead of libcudart_static.a via some flag like SHARED_CUDA_LIBS.
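One way the `CUDA::cudart${CUDA_LIB_EXT}` suggestion could be wired up is sketched below. Assumptions: this thread does not show how opencv_contrib defines `CUDA_LIB_EXT`, so the option that sets it here is hypothetical (modelled on the SHARED_CUDA_LIBS idea), as are `example_module` and `example.cpp`. The `CUDA::cudart` and `CUDA::cudart_static` imported targets come from CMake's FindCUDAToolkit module.

```cmake
cmake_minimum_required(VERSION 3.17)
project(cuda_lib_ext_sketch LANGUAGES CXX)

# Hypothetical switch along the lines of SHARED_CUDA_LIBS; not an existing OpenCV option.
option(SHARED_CUDA_LIBS "Link the CUDA runtime as a shared library" OFF)
if(SHARED_CUDA_LIBS)
  set(CUDA_LIB_EXT "")         # CUDA::cudart (libcudart.so on Linux)
else()
  set(CUDA_LIB_EXT "_static")  # CUDA::cudart_static (libcudart_static.a on Linux)
endif()

find_package(CUDAToolkit REQUIRED)

# Placeholder module standing in for a CUDA module such as cudaarithm
add_library(example_module STATIC example.cpp)
target_link_libraries(example_module PUBLIC CUDA::cudart${CUDA_LIB_EXT})
```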

@asmorkalov
Contributor

asmorkalov commented Mar 10, 2025

The PR was discussed at the OpenCV core team meeting. The team proposes not to touch the OpenCV build scripts at all, but to modify the OpenCV CMake config file template. The config may change the CUDA/cuDNN/cuBLAS/etc. search behaviour based on a user-provided variable.

For example, an OpenCV user finds OpenCV like this:

set(OPENCV_STRONG_CUDA_VERSION_CHECK TRUE) # or vice-versa
find_package(OpenCV REQUIRED)

The OpenCV config then handles OPENCV_STRONG_CUDA_VERSION_CHECK in the library search.
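A minimal sketch of how the config template might branch on OPENCV_STRONG_CUDA_VERSION_CHECK, runnable with `cmake -P`. The helper `opencv_cuda_version_ok` is hypothetical and not actual OpenCV code. The strong branch assumes an exact version match, and the relaxed branch assumes the major.minor...major+1.0 range proposed earlier in the review, with an exclusive upper bound.

```cmake
cmake_minimum_required(VERSION 3.12)

# Hypothetical fragment for an OpenCVConfig.cmake template (not actual OpenCV code).
# OPENCV_STRONG_CUDA_VERSION_CHECK is the user variable from the proposal and is
# read from the caller's scope; "built" is the CUDA version OpenCV was compiled with.
function(opencv_cuda_version_ok built found out_var)
  if(OPENCV_STRONG_CUDA_VERSION_CHECK)
    # Strong check: the found toolkit must equal the build version
    if("${found}" VERSION_EQUAL "${built}")
      set(${out_var} TRUE PARENT_SCOPE)
    else()
      set(${out_var} FALSE PARENT_SCOPE)
    endif()
  else()
    # Relaxed check: same major version, same or newer minor than the build
    string(REPLACE "." ";" parts "${built}")
    list(GET parts 0 major)
    list(GET parts 1 minor)
    math(EXPR next_major "${major} + 1")
    if("${found}" VERSION_GREATER_EQUAL "${major}.${minor}" AND "${found}" VERSION_LESS "${next_major}")
      set(${out_var} TRUE PARENT_SCOPE)
    else()
      set(${out_var} FALSE PARENT_SCOPE)
    endif()
  endif()
endfunction()

# Usage: the consumer set the variable before find_package(OpenCV)
set(OPENCV_STRONG_CUDA_VERSION_CHECK FALSE)
opencv_cuda_version_ok("12.6.85" "12.8.61" ok)
message(STATUS "CUDA 12.8.61 acceptable for an OpenCV built with 12.6.85: ${ok}")
```

Keeping the decision in the config template means the same OpenCV binaries can serve both strict and relaxed consumers, which matches the team's proposal of not changing the build scripts.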

4 participants