fixes an issue with macro directives for !$acc kernels
#926
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Pull Request Overview
This PR replaces the custom `GPU_PARALLEL()` macro with raw OpenACC `kernels` directives as a stopgap for NVHPC compatibility.
- Substituted `#:call GPU_PARALLEL()`/`#:endcall GPU_PARALLEL` with `!$acc kernels`/`!$acc end kernels`
- Applied the change in both the time-stepping and data-output modules
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/simulation/m_time_steppers.fpp | Replaced GPU_PARALLEL macro around minval(max_dt) with kernels |
src/simulation/m_data_output.fpp | Replaced GPU_PARALLEL macro around maxval/minval calls |
Comments suppressed due to low confidence (3)
src/simulation/m_time_steppers.fpp:996
- Wrapping a scalar `minval` call in a `kernels` region may incur unnecessary kernel launch overhead and may not generate a reduction on the device. Consider using `!$acc parallel loop reduction(min:dt_local)` around the explicit loop over `max_dt`, with a `collapse` clause if multiple dimensions are involved.
!$acc kernels
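The suggestion above could look roughly like the following. This is a minimal standalone sketch, not the code in `m_time_steppers.fpp`: the array bounds (`m`, `n`, `p`), the fill pattern, and the `copyin` clause are assumptions made only so the example compiles on its own. Without OpenACC enabled the directives are plain comments and the loop runs serially.

```fortran
program min_reduction_sketch
    implicit none
    ! Hypothetical grid sizes, chosen only for illustration
    integer, parameter :: m = 8, n = 8, p = 8
    real(kind(0d0)) :: max_dt(0:m, 0:n, 0:p)
    real(kind(0d0)) :: dt_local
    integer :: j, k, l

    ! Fill the array with a known pattern (assumed data, not from the PR)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                max_dt(j, k, l) = 1d-3*real(1 + j + k + l, kind(0d0))
            end do
        end do
    end do

    ! Start from the largest representable value so any entry lowers it
    dt_local = huge(dt_local)

    ! Explicit loop nest with a device-side min reduction
    !$acc parallel loop collapse(3) reduction(min:dt_local) copyin(max_dt)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                dt_local = min(dt_local, max_dt(j, k, l))
            end do
        end do
    end do
    !$acc end parallel loop

    print *, 'dt_local =', dt_local

    ! Cross-check against the intrinsic used in the original code
    if (abs(dt_local - minval(max_dt)) > epsilon(dt_local)) then
        error stop 'reduction mismatch'
    end if
end program min_reduction_sketch
```

Whether this is actually faster than `minval` inside a `kernels` region depends on the compiler and array size, so it would need to be measured on the target system.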
src/simulation/m_data_output.fpp:319
- Enclosing `maxval(icfl_sf)` in a `kernels` region may not produce an efficient reduction; consider converting this to a `!$acc parallel loop reduction(max:icfl_max_loc)` over the underlying array to leverage device-side reductions.
!$acc kernels
src/simulation/m_data_output.fpp:323
- This `kernels` region wraps two scalar reductions (`vcfl_max_loc`, `Rc_min_loc`); you may get better performance by using a combined `!$acc parallel loop reduction(max:vcfl_max_loc) reduction(min:Rc_min_loc)` over the loop indices instead of `kernels`.
!$acc kernels
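A combined reduction of this kind might be sketched as below. The array names `vcfl_sf` and `Rc_sf` are assumed by analogy with `icfl_sf` and do not appear in this PR page; the bounds and fill values are also hypothetical. The program is standalone and checks its result against `maxval`/`minval`.

```fortran
program combined_reduction_sketch
    implicit none
    ! Hypothetical grid sizes, chosen only for illustration
    integer, parameter :: m = 4, n = 4, p = 4
    ! Assumed array names, modeled on icfl_sf
    real(kind(0d0)) :: vcfl_sf(0:m, 0:n, 0:p), Rc_sf(0:m, 0:n, 0:p)
    real(kind(0d0)) :: vcfl_max_loc, Rc_min_loc
    integer :: j, k, l

    ! Fill both arrays with known patterns (assumed data, not from the PR)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                vcfl_sf(j, k, l) = real(j + k + l, kind(0d0))
                Rc_sf(j, k, l) = 10d0 + real(j*k*l, kind(0d0))
            end do
        end do
    end do

    ! Neutral starting values for the max and min reductions
    vcfl_max_loc = -huge(vcfl_max_loc)
    Rc_min_loc = huge(Rc_min_loc)

    ! One pass over the indices computes both reductions
    !$acc parallel loop collapse(3) &
    !$acc reduction(max:vcfl_max_loc) reduction(min:Rc_min_loc) &
    !$acc copyin(vcfl_sf, Rc_sf)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                vcfl_max_loc = max(vcfl_max_loc, vcfl_sf(j, k, l))
                Rc_min_loc = min(Rc_min_loc, Rc_sf(j, k, l))
            end do
        end do
    end do
    !$acc end parallel loop

    print *, 'vcfl_max_loc =', vcfl_max_loc, ' Rc_min_loc =', Rc_min_loc

    ! Cross-check against the intrinsics used in the original code
    if (vcfl_max_loc /= maxval(vcfl_sf)) error stop 'max mismatch'
    if (Rc_min_loc /= minval(Rc_sf)) error stop 'min mismatch'
end program combined_reduction_sketch
```

Fusing the two reductions into one loop means one kernel launch and one sweep over the indices, which is the potential gain the comment points at.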
PR Code Suggestions ✨
Explore these optional code suggestions:
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #926      +/-   ##
==========================================
+ Coverage   43.68%   43.71%   +0.02%
==========================================
  Files          68       68
  Lines       18363    18360       -3
  Branches     2295     2292       -3
==========================================
+ Hits         8022     8026       +4
+ Misses       8949     8945       -4
+ Partials     1392     1389       -3
User description
#883 introduced an issue: it substituted `!$acc parallel` for `!$acc kernels` in all of 3 whole places (!!). It turns out this does not work on NVHPC. This is a stopgap fix. @prathi-wind will fix it up more properly.
PR Type
Bug fix
Description
- Replace GPU_PARALLEL macro with `!$acc kernels` directives
- Fix NVHPC compiler compatibility issues
- Update OpenACC directives in data output and time stepping modules
Changes walkthrough 📝
m_data_output.fpp
Update OpenACC directives in data output module
src/simulation/m_data_output.fpp
- Replace `#:call GPU_PARALLEL()` with `!$acc kernels` for `icfl_max_loc` calculation
- `!$acc kernels` directives for `vcfl_max_loc` and `Rc_min_loc`
m_time_steppers.fpp
Fix OpenACC directives in time stepping module
src/simulation/m_time_steppers.fpp
- Replace `#:call GPU_PARALLEL()` with `!$acc kernels` for `dt_local` calculation