fixes an issue with macro directives for !$acc kernels #926

Merged
merged 1 commit into MFlowCode:master on Jul 8, 2025

Conversation

@sbryngelson (Member) commented Jul 8, 2025

User description

PR #883 created an issue by substituting !$acc parallel for !$acc kernels in all of 3 whole places (!!). It turns out this does not work on NVHPC. This is a stopgap fix; @prathi-wind will fix it up more properly.
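As a minimal sketch of the stopgap described above: the Fypp GPU_PARALLEL macro wrapper is replaced with a plain kernels region around the intrinsic reduction. The program wrapper and array contents here are illustrative assumptions; only the directive pattern is taken from the PR (variable name icfl_sf appears in the changed files). Without an OpenACC compiler the directives are inert comments, so the sketch still runs serially.

```fortran
program kernels_stopgap
    implicit none
    real :: icfl_sf(1000), icfl_max_loc
    call random_number(icfl_sf)  ! stand-in data for illustration

    ! Previously (PR #883) this was wrapped via the Fypp macro:
    !   #:call GPU_PARALLEL()
    !   icfl_max_loc = maxval(icfl_sf)
    !   #:endcall GPU_PARALLEL
    ! Stopgap: a plain kernels region, which NVHPC compiles correctly.
    !$acc kernels
    icfl_max_loc = maxval(icfl_sf)
    !$acc end kernels

    print *, icfl_max_loc
end program kernels_stopgap
```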


PR Type

Bug fix


Description

  • Replace GPU_PARALLEL macro with !$acc kernels directives

  • Fix NVHPC compiler compatibility issues

  • Update OpenACC directives in data output and time stepping modules


Changes diagram

flowchart LR
  A["GPU_PARALLEL macro"] -- "replace with" --> B["!$acc kernels directives"]
  B --> C["NVHPC compatibility"]

Changes walkthrough 📝

Relevant files
Bug fix

src/simulation/m_data_output.fpp: Update OpenACC directives in data output module
  • Replace #:call GPU_PARALLEL() with !$acc kernels for icfl_max_loc calculation
  • Update viscous flow section with !$acc kernels directives for vcfl_max_loc and Rc_min_loc
  • +7/-7

src/simulation/m_time_steppers.fpp: Fix OpenACC directives in time stepping module
  • Replace #:call GPU_PARALLEL() with !$acc kernels for dt_local calculation
  • +3/-3

@Copilot Copilot AI review requested due to automatic review settings July 8, 2025 14:07
@sbryngelson requested a review from a team as a code owner July 8, 2025 14:07

    qodo-merge-pro bot commented Jul 8, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Duplicate Code

    The same pattern of wrapping maxval/minval operations with !$acc kernels directives is repeated multiple times. This could be refactored into a reusable macro or subroutine to reduce code duplication and improve maintainability.

    !$acc kernels
    icfl_max_loc = maxval(icfl_sf)
    !$acc end kernels
    if (viscous) then
        !$acc kernels
        vcfl_max_loc = maxval(vcfl_sf)
        Rc_min_loc = minval(Rc_sf)
        !$acc end kernels
    end if
    Performance Concern

    Using !$acc kernels for a single minval operation may not provide optimal GPU performance compared to !$acc parallel. The kernels directive relies on compiler analysis which may not be as efficient for simple reduction operations.

    !$acc kernels
    dt_local = minval(max_dt)
    !$acc end kernels
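As a sketch of the alternative the reviewer alludes to, an explicit parallel loop with a reduction clause makes the device-side min-reduction explicit rather than relying on the compiler's analysis of the kernels region. The program wrapper and the 1-D array shape are assumptions for illustration (the variable names max_dt and dt_local come from m_time_steppers.fpp).

```fortran
program min_reduction_sketch
    implicit none
    integer :: i
    real :: max_dt(1000), dt_local
    call random_number(max_dt)   ! stand-in data for illustration
    max_dt = max_dt + 0.1        ! keep values positive, as a time step would be

    dt_local = huge(dt_local)
    ! Explicit reduction instead of relying on kernels-region analysis.
    !$acc parallel loop reduction(min:dt_local)
    do i = 1, size(max_dt)
        dt_local = min(dt_local, max_dt(i))
    end do

    print *, dt_local
end program min_reduction_sketch
```

Whether this outperforms the kernels region depends on how well the compiler recognizes the minval intrinsic as a reduction; NVHPC's compiler feedback (-Minfo=accel) would show which form actually generates a device reduction.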

@Copilot Copilot AI left a comment (Contributor)
    Pull Request Overview

    This PR replaces the custom GPU_PARALLEL() macro with raw OpenACC kernels directives as a stopgap for NVHPC compatibility.

    • Substituted #:call GPU_PARALLEL()/#:endcall GPU_PARALLEL with !$acc kernels/!$acc end kernels
    • Applied the change in both the time-stepping and data-output modules

    Reviewed Changes

    Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

    File Description
    src/simulation/m_time_steppers.fpp Replaced GPU_PARALLEL macro around minval(max_dt) with kernels
    src/simulation/m_data_output.fpp Replaced GPU_PARALLEL macro around maxval/minval calls
    Comments suppressed due to low confidence (3)

    src/simulation/m_time_steppers.fpp:996

    • Wrapping a scalar minval call in a kernels region may incur unnecessary kernel launch overhead and may not generate a reduction on the device. Consider using !$acc parallel loop reduction(min:dt_local) around the explicit loop over max_dt with a collapse if multiple dimensions are involved.
            !$acc kernels
    

    src/simulation/m_data_output.fpp:319

    • Enclosing maxval(icfl_sf) in a kernels region may not produce an efficient reduction; consider converting this to a !$acc parallel loop reduction(max:icfl_max_loc) over the underlying array to leverage device-side reductions.
            !$acc kernels
    

    src/simulation/m_data_output.fpp:323

    • This kernels region wraps two scalar reductions (vcfl_max_loc, Rc_min_loc); you may get better performance by using a combined !$acc parallel loop reduction(max:vcfl_max_loc) reduction(min:Rc_min_loc) over the loop indices instead of kernels.
                !$acc kernels
    

qodo-merge-pro bot commented Jul 8, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

Category: General
Suggestion: Separate reduction operations into individual kernels

    Consider using separate kernel regions for each reduction operation. Multiple
    reductions in a single kernel may not be optimal for GPU performance and could
    lead to synchronization issues.

    src/simulation/m_data_output.fpp [323-326]

     !$acc kernels
     vcfl_max_loc = maxval(vcfl_sf)
    +!$acc end kernels
    +!$acc kernels
     Rc_min_loc = minval(Rc_sf)
     !$acc end kernels
Suggestion importance [1-10]: 5

Why: This is a valid performance consideration, as splitting multiple reduction operations into separate kernels can sometimes improve GPU performance, although the benefit is not guaranteed and depends on the compiler and hardware.

Impact: Low

codecov bot commented Jul 8, 2025

    Codecov Report

    Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

    Project coverage is 43.71%. Comparing base (0900648) to head (534a8a9).
Report is 1 commit behind head on master.

Files with missing lines | Patch % | Lines
src/simulation/m_data_output.fpp | 0.00% | 2 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##           master     #926      +/-   ##
    ==========================================
    + Coverage   43.68%   43.71%   +0.02%     
    ==========================================
      Files          68       68              
      Lines       18363    18360       -3     
      Branches     2295     2292       -3     
    ==========================================
    + Hits         8022     8026       +4     
    + Misses       8949     8945       -4     
    + Partials     1392     1389       -3     

☔ View full report in Codecov by Sentry.
    @sbryngelson sbryngelson merged commit 8026a1c into MFlowCode:master Jul 8, 2025
    25 of 43 checks passed
    @sbryngelson sbryngelson deleted the kernels branch July 10, 2025 00:17
    prathi-wind pushed a commit to prathi-wind/MFC-prathi that referenced this pull request Jul 13, 2025