
Created FYPP macros to allow for meta-directive parallelization #883


Merged: 85 commits merged into MFlowCode:master from the meta-directive branch on Jul 8, 2025

Conversation

@prathi-wind (Collaborator) commented on Jun 13, 2025

User description

Description

Added FYPP macros that replace the current OpenACC directives. A future pull request will use meta-directives to add OpenMP support. This pull request replaces all OpenACC directives except acc kernels and those in mpi_common.fpp and syscheck.fpp.
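
As a rough before/after sketch of the conversion pattern (the loop below is illustrative only, not taken verbatim from the diff, and the exact clause lists on the original directives varied by loop; the macro spellings $:GPU_PARALLEL_LOOP, $:GPU_LOOP, $:GPU_DECLARE, $:GPU_UPDATE, and $:GPU_ENTER_DATA are the ones introduced by this PR):

    ! Before: hand-written OpenACC directives
    !$acc parallel loop collapse(2) private(alpha_rho_visc)
    do k = is3_viscous%beg, is3_viscous%end
        do j = is1_viscous%beg, is1_viscous%end
            !$acc loop seq
            do i = 1, num_fluids
                alpha_rho_visc(i) = q_prim_vf(i)%sf(j, 0, k)
            end do
        end do
    end do

    ! After: FYPP macros that currently expand to the equivalent OpenACC directives
    ! and can later be retargeted to OpenMP without touching the loop bodies
    $:GPU_PARALLEL_LOOP(collapse=2, private='[alpha_rho_visc]')
    do k = is3_viscous%beg, is3_viscous%end
        do j = is1_viscous%beg, is1_viscous%end
            $:GPU_LOOP(parallelism='[seq]')
            do i = 1, num_fluids
                alpha_rho_visc(i) = q_prim_vf(i)%sf(j, 0, k)
            end do
        end do
    end do

    ! Declarations and data movement follow the same pattern, for example:
    !   !$acc declare create(Res_viscous)        ->  $:GPU_DECLARE(create='[Res_viscous]')
    !   !$acc update device(Res_viscous)         ->  $:GPU_UPDATE(device='[Res_viscous]')
    !   !$acc enter data copyin(is1_viscous)     ->  $:GPU_ENTER_DATA(copyin='[is1_viscous]')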

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce them.
Please also list any relevant details of your test configuration.

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this?

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any source files (anything in src/simulation):

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran an Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement


Description

• Created comprehensive FYPP macro library for GPU parallelization directives in parallel_macros.fpp
• Replaced OpenACC directives with FYPP GPU macros across multiple simulation modules
• Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
• Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
• Updated !$acc routine seq to $:GPU_ROUTINE(parallelism='[seq]') format
• Converted data movement directives (!$acc update, !$acc enter data) to corresponding GPU macros
• Applied changes to core simulation modules including Riemann solvers, viscous, RHS, CBC, WENO, and others
• Maintained functionality while preparing codebase for future OpenMP meta-directive support


Changes walkthrough 📝

Relevant files
Enhancement (15 files)
m_riemann_solvers.fpp
Replace OpenACC directives with FYPP GPU macros
src/simulation/m_riemann_solvers.fpp
  • Replaced OpenACC directives with FYPP GPU macros throughout the file
  • Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
  • Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]') format
  • Converted !$acc update and !$acc enter data to corresponding GPU macros
+238/-232

m_bubbles_EL.fpp
Replace OpenACC directives with FYPP GPU macros
src/simulation/m_bubbles_EL.fpp
  • Added the FYPP GPU macros include and replaced OpenACC directives
  • Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
  • Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
  • Updated !$acc routine seq to the $:GPU_ROUTINE macro format
  • Converted !$acc update statements to $:GPU_UPDATE macro calls
+63/-60

m_helper_basic.fpp
Add FYPP macros and replace OpenACC routine directives
src/common/m_helper_basic.fpp
  • Added the FYPP macros include directive at the top of the file
  • Replaced !$acc routine seq directives with $:GPU_ROUTINE(parallelism='[seq]') macro calls
+6/-4
m_viscous.fpp
Replace OpenACC directives with FYPP GPU macros in viscous module
src/simulation/m_viscous.fpp
  • Replaced OpenACC directives with FYPP GPU macros throughout the file
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[...]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with appropriate parameters
  • Replaced !$acc loop seq with $:GPU_LOOP(parallelism='[seq]')
+112/-108

m_rhs.fpp
Replace OpenACC directives with FYPP GPU macros in RHS module
src/simulation/m_rhs.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc enter data and !$acc exit data with corresponding GPU macros
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with collapse and private parameters
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]')
+113/-105

m_cbc.fpp
Replace OpenACC directives with FYPP GPU macros in CBC module
src/simulation/m_cbc.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[...]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with appropriate parameters
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]')
+95/-89
m_mpi_proxy.fpp
Replace OpenACC directives with FYPP GPU macros in MPI proxy
src/simulation/m_mpi_proxy.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[i_halo_size]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[i_halo_size]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with private parameters
+8/-8

m_patches.fpp
Replace OpenACC routine directives with FYPP GPU macros in patches
src/pre_process/m_patches.fpp
  • Replaced OpenACC routine directives with FYPP GPU macros
  • Converted !$acc routine seq to $:GPU_ROUTINE(parallelism='[seq]')
  • Applied changes to coordinate conversion functions and helper routines
+4/-4

m_global_parameters.fpp
Replace OpenACC directives with FYPP GPU macros in global parameters
src/simulation/m_global_parameters.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros for variable declarations
  • Replaced OpenACC update directives with FYPP GPU_UPDATE macros for device updates
  • Replaced OpenACC enter data directives with FYPP GPU_ENTER_DATA macros
  • Consolidated multiple GPU declarations into fewer macro calls with array syntax
+96/-70

m_variables_conversion.fpp
Convert OpenACC directives to FYPP GPU macros in variables conversion
src/common/m_variables_conversion.fpp
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated data movement directives to use FYPP GPU_ENTER_DATA and GPU_UPDATE macros
+58/-74
m_weno.fpp
Modernize WENO module with FYPP GPU macros
src/simulation/m_weno.fpp
  • Added GPU_DECLARE macros for WENO-related variable declarations
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+33/-44

m_boundary_common.fpp
Convert boundary condition module to FYPP GPU macros
src/common/m_boundary_common.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+39/-59

parallel_macros.fpp
Create FYPP macro library for GPU parallelization directives
src/common/include/parallel_macros.fpp
  • Added comprehensive FYPP macro definitions for GPU parallelization
  • Implemented macros for OpenACC directives such as GPU_PARALLEL_LOOP, GPU_ROUTINE, and GPU_DECLARE
  • Added support for various OpenACC clauses and data management directives
  • Included helper functions for generating proper clause syntax
  • A simplified sketch of the macro mechanism is shown after the file list below
+397/-0
m_qbmm.fpp
Modernize QBMM module with FYPP GPU macros
src/simulation/m_qbmm.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Updated data movement directives to use FYPP GPU_ENTER_DATA and GPU_UPDATE macros
+44/-51

m_hypoelastic.fpp
Convert hypoelastic module to FYPP GPU macros
src/simulation/m_hypoelastic.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+25/-31
Additional files (23 files)
macros.fpp +20/-15 
m_chemistry.fpp +6/-7     
m_finite_differences.fpp +4/-1     
m_helper.fpp +3/-2     
m_phase_change.fpp +26/-47 
m_assign_variables.fpp +3/-2     
inline_riemann.fpp +2/-2     
m_acoustic_src.fpp +36/-29 
m_body_forces.fpp +7/-7     
m_bubbles.fpp +24/-30 
m_bubbles_EE.fpp +24/-23 
m_bubbles_EL_kernels.fpp +27/-39 
m_compute_cbc.fpp +31/-45 
m_data_output.fpp +7/-7     
m_fftw.fpp +91/-91 
m_hyperelastic.fpp +19/-20 
m_ibm.fpp +33/-27 
m_mhd.fpp +9/-11   
m_pressure_relaxation.fpp +26/-26 
m_sim_helpers.fpp +9/-10   
m_start_up.fpp +33/-26 
m_surface_tension.fpp +20/-20 
m_time_steppers.fpp +24/-24 
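
For context on parallel_macros.fpp: the GPU_* names are FYPP macros (#:def blocks) that expand to the corresponding OpenACC directives when the .fpp sources are preprocessed. The fragment below is a deliberately simplified, assumed sketch of that mechanism only, not the actual implementation, which builds its clause strings programmatically and supports many more arguments (collapse, private, copyin, and so on):

    #! Simplified, hypothetical sketch only; see src/common/include/parallel_macros.fpp
    #! for the real macro library.
    #:def GPU_LOOP(parallelism=None)
        #:if parallelism == '[seq]'
            !$acc loop seq
        #:else
            !$acc loop
        #:endif
    #:enddef

    #! With this definition, a call such as
    #!     $:GPU_LOOP(parallelism='[seq]')
    #! expands to "!$acc loop seq"; retargeting the build to OpenMP later only
    #! requires changing what the macro emits, not the call sites.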

    Tanush Prathi and others added 30 commits June 3, 2025 14:19
    Co-authored-by: Xuzheng Tian <xtian64@login-phoenix-rh9-3.pace.gatech.edu>
    Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
    Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
    @prathi-wind marked this pull request as ready for review June 23, 2025 15:15

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Duplication

    Extensive code duplication exists across multiple functions with nearly identical loop structures and variable declarations. The same patterns for computing viscous stresses are repeated in multiple subroutines with only minor variations.

        $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, &
            & alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
    
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do
    
                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        alpha_visc_sum = 0._wp
    
                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do
    
                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
    
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do
    
                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real
    
                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do
    
                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
    
                            end do
                        end if
                    end if
    
                    tau_Re(2, 1) = (grad_y_vf(1)%sf(j, k, l) + &
                                    grad_x_vf(2)%sf(j, k, l))/ &
                                   Re_visc(1)
    
                    tau_Re(2, 2) = (4._wp*grad_y_vf(2)%sf(j, k, l) &
                                    - 2._wp*grad_x_vf(1)%sf(j, k, l) &
                                    - 2._wp*q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   (3._wp*Re_visc(1))
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, 2
                        tau_Re_vf(contxe + i)%sf(j, k, l) = &
                            tau_Re_vf(contxe + i)%sf(j, k, l) - &
                            tau_Re(2, i)
    
                        tau_Re_vf(E_idx)%sf(j, k, l) = &
                            tau_Re_vf(E_idx)%sf(j, k, l) - &
                            q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
                    end do
                end do
            end do
        end do
    end if
    
    if (bulk_stress) then    ! Bulk stresses
        $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, &
            & alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
    
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do
    
                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        alpha_visc_sum = 0._wp
    
                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do
    
                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
    
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do
    
                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real
    
                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do
    
                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
    
                            end do
                        end if
                    end if
    
                    tau_Re(2, 2) = (grad_x_vf(1)%sf(j, k, l) + &
                                    grad_y_vf(2)%sf(j, k, l) + &
                                    q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   Re_visc(2)
    
                    tau_Re_vf(momxb + 1)%sf(j, k, l) = &
                        tau_Re_vf(momxb + 1)%sf(j, k, l) - &
                        tau_Re(2, 2)
    
                    tau_Re_vf(E_idx)%sf(j, k, l) = &
                        tau_Re_vf(E_idx)%sf(j, k, l) - &
                        q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
    
                end do
            end do
        end do
    end if
    
    if (p == 0) return
    Complex Loop Logic

    The conservative to primitive variable conversion contains deeply nested loops with complex conditional logic and multiple sequential inner loops that could introduce performance bottlenecks or numerical instabilities.

            $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_K, alpha_rho_K, Re_K, &
                & nRtmp, rho_K, gamma_K, pi_inf_K,qv_K, &
                & dyn_pres_K, rhoYks, B]')
            do l = ibounds(3)%beg, ibounds(3)%end
                do k = ibounds(2)%beg, ibounds(2)%end
                    do j = ibounds(1)%beg, ibounds(1)%end
                        dyn_pres_K = 0._wp
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            alpha_rho_K(i) = qK_cons_vf(i)%sf(j, k, l)
                            alpha_K(i) = qK_cons_vf(advxb + i - 1)%sf(j, k, l)
                        end do
    
                        if (model_eqns /= 4) then
    #ifdef MFC_SIMULATION
                            ! If in simulation, use acc mixture subroutines
                            if (elasticity) then
                                call s_convert_species_to_mixture_variables_acc(rho_K, gamma_K, pi_inf_K, qv_K, alpha_K, &
                                                                                alpha_rho_K, Re_K, G_K, Gs)
                            else if (bubbles_euler) then
                                call s_convert_species_to_mixture_variables_bubbles_acc(rho_K, gamma_K, pi_inf_K, qv_K, &
                                                                                        alpha_K, alpha_rho_K, Re_K)
                            else
                                call s_convert_species_to_mixture_variables_acc(rho_K, gamma_K, pi_inf_K, qv_K, &
                                                                                alpha_K, alpha_rho_K, Re_K)
                            end if
    #else
                            ! If pre-processing, use non acc mixture subroutines
                            if (elasticity) then
                                call s_convert_to_mixture_variables(qK_cons_vf, j, k, l, &
                                                                    rho_K, gamma_K, pi_inf_K, qv_K, Re_K, G_K, fluid_pp(:)%G)
                            else
                                call s_convert_to_mixture_variables(qK_cons_vf, j, k, l, &
                                                                    rho_K, gamma_K, pi_inf_K, qv_K)
                            end if
    #endif
                        end if
    
                        if (relativity) then
                            if (n == 0) then
                                B(1) = Bx0
                                B(2) = qK_cons_vf(B_idx%beg)%sf(j, k, l)
                                B(3) = qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)
                            else
                                B(1) = qK_cons_vf(B_idx%beg)%sf(j, k, l)
                                B(2) = qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)
                                B(3) = qK_cons_vf(B_idx%beg + 2)%sf(j, k, l)
                            end if
                            B2 = B(1)**2 + B(2)**2 + B(3)**2
    
                            m2 = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = momxb, momxe
                                m2 = m2 + qK_cons_vf(i)%sf(j, k, l)**2
                            end do
    
                            S = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 3
                                S = S + qK_cons_vf(momxb + i - 1)%sf(j, k, l)*B(i)
                            end do
    
                            E = qK_cons_vf(E_idx)%sf(j, k, l)
    
                            D = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                D = D + qK_cons_vf(i)%sf(j, k, l)
                            end do
    
                            ! Newton-Raphson
                            W = E + D
                            $:GPU_LOOP(parallelism='[seq]')
                            do iter = 1, relativity_cons_to_prim_max_iter
                                Ga = (W + B2)*W/sqrt((W + B2)**2*W**2 - (m2*W**2 + S**2*(2*W + B2)))
                                pres = (W - D*Ga)/((gamma_K + 1)*Ga**2) ! Thermal pressure from EOS
                                f = W - pres + (1 - 1/(2*Ga**2))*B2 - S**2/(2*W**2) - E - D
    
                                ! The first equation below corrects a typo in (Mignone & Bodo, 2006)
                                ! m2*W**2 -> 2*m2*W**2, which would cancel with the 2* in other terms
                                ! This corrected version is not used as the second equation empirically converges faster.
                                ! First equation is kept for further investigation.
                                ! dGa_dW = -Ga**3 * ( S**2*(3*W**2+3*W*B2+B2**2) + m2*W**2 ) / (W**3 * (W+B2)**3) ! first (corrected)
                                dGa_dW = -Ga**3*(2*S**2*(3*W**2 + 3*W*B2 + B2**2) + m2*W**2)/(2*W**3*(W + B2)**3) ! second (in paper)
    
                                dp_dW = (Ga*(1 + D*dGa_dW) - 2*W*dGa_dW)/((gamma_K + 1)*Ga**3)
                                df_dW = 1 - dp_dW + (B2/Ga**3)*dGa_dW + S**2/W**3
    
                                dW = -f/df_dW
                                W = W + dW
                                if (abs(dW) < 1.e-12_wp*W) exit
                            end do
    
                            ! Recalculate pressure using converged W
                            Ga = (W + B2)*W/sqrt((W + B2)**2*W**2 - (m2*W**2 + S**2*(2*W + B2)))
                            qK_prim_vf(E_idx)%sf(j, k, l) = (W - D*Ga)/((gamma_K + 1)*Ga**2)
    
                            ! Recover the other primitive variables
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 3
                                qK_prim_vf(momxb + i - 1)%sf(j, k, l) = (qK_cons_vf(momxb + i - 1)%sf(j, k, l) + (S/W)*B(i))/(W + B2)
                            end do
                            qK_prim_vf(1)%sf(j, k, l) = D/Ga ! Hard-coded for single-component for now
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = B_idx%beg, B_idx%end
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
    
                            cycle ! skip all the non-relativistic conversions below
                        end if
    
                        if (chemistry) then
                            rho_K = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = chemxb, chemxe
                                rho_K = rho_K + max(0._wp, qK_cons_vf(i)%sf(j, k, l))
                            end do
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                qK_prim_vf(i)%sf(j, k, l) = rho_K
                            end do
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = chemxb, chemxe
                                qK_prim_vf(i)%sf(j, k, l) = max(0._wp, qK_cons_vf(i)%sf(j, k, l)/rho_K)
                            end do
                        else
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
                        end if
    
    #ifdef MFC_SIMULATION
                        rho_K = max(rho_K, sgm_eps)
    #endif
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = momxb, momxe
                            if (model_eqns /= 4) then
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l) &
                                                            /rho_K
                                dyn_pres_K = dyn_pres_K + 5.e-1_wp*qK_cons_vf(i)%sf(j, k, l) &
                                             *qK_prim_vf(i)%sf(j, k, l)
                            else
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l) &
                                                            /qK_cons_vf(1)%sf(j, k, l)
                            end if
                        end do
    
                        if (chemistry) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_species
                                rhoYks(i) = qK_cons_vf(chemxb + i - 1)%sf(j, k, l)
                            end do
    
                            T = q_T_sf%sf(j, k, l)
                        end if
    
                        if (mhd) then
                            if (n == 0) then
                                pres_mag = 0.5_wp*(Bx0**2 + qK_cons_vf(B_idx%beg)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)**2)
                            else
                                pres_mag = 0.5_wp*(qK_cons_vf(B_idx%beg)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 2)%sf(j, k, l)**2)
                            end if
                        else
                            pres_mag = 0._wp
                        end if
    
                        call s_compute_pressure(qK_cons_vf(E_idx)%sf(j, k, l), &
                                                qK_cons_vf(alf_idx)%sf(j, k, l), &
                                                dyn_pres_K, pi_inf_K, gamma_K, rho_K, &
                                                qv_K, rhoYks, pres, T, pres_mag=pres_mag)
    
                        qK_prim_vf(E_idx)%sf(j, k, l) = pres
    
                        if (chemistry) then
                            q_T_sf%sf(j, k, l) = T
                        end if
    
                        if (bubbles_euler) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, nb
                                nRtmp(i) = qK_cons_vf(bubrs(i))%sf(j, k, l)
                            end do
    
                            vftmp = qK_cons_vf(alf_idx)%sf(j, k, l)
    
                            if (qbmm) then
                                !Get nb (constant across all R0 bins)
                                nbub_sc = qK_cons_vf(bubxb)%sf(j, k, l)
    
                                !Convert cons to prim
                                $:GPU_LOOP(parallelism='[seq]')
                                do i = bubxb, bubxe
                                    qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/nbub_sc
                                end do
                                !Need to keep track of nb in the primitive variable list (converted back to true value before output)
    #ifdef MFC_SIMULATION
                                qK_prim_vf(bubxb)%sf(j, k, l) = qK_cons_vf(bubxb)%sf(j, k, l)
    #endif
    
                            else
                                if (adv_n) then
                                    qK_prim_vf(n_idx)%sf(j, k, l) = qK_cons_vf(n_idx)%sf(j, k, l)
                                    nbub_sc = qK_prim_vf(n_idx)%sf(j, k, l)
                                else
                                    call s_comp_n_from_cons(vftmp, nRtmp, nbub_sc, weight)
                                end if
    
                                $:GPU_LOOP(parallelism='[seq]')
                                do i = bubxb, bubxe
                                    qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/nbub_sc
                                end do
                            end if
                        end if
    
                        if (mhd) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = B_idx%beg, B_idx%end
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
                        end if
    
                        if (elasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = strxb, strxe
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/rho_K
                            end do
                        end if
    
                        if (hypoelasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = strxb, strxe
                                ! subtracting elastic contribution for pressure calculation
                                if (G_K > verysmall) then
                                    if (cont_damage) G_K = G_K*max((1._wp - qK_cons_vf(damage_idx)%sf(j, k, l)), 0._wp)
                                    qK_prim_vf(E_idx)%sf(j, k, l) = qK_prim_vf(E_idx)%sf(j, k, l) - &
                                                                    ((qK_prim_vf(i)%sf(j, k, l)**2._wp)/(4._wp*G_K))/gamma_K
                                    ! Double for shear stresses
                                    if (any(i == shear_indices)) then
                                        qK_prim_vf(E_idx)%sf(j, k, l) = qK_prim_vf(E_idx)%sf(j, k, l) - &
                                                                        ((qK_prim_vf(i)%sf(j, k, l)**2._wp)/(4._wp*G_K))/gamma_K
                                    end if
                                end if
                            end do
                        end if
    
                        if (hyperelasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = xibeg, xiend
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/rho_K
                            end do
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = advxb, advxe
                            qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                        end do
    
                        if (surface_tension) then
                            qK_prim_vf(c_idx)%sf(j, k, l) = qK_cons_vf(c_idx)%sf(j, k, l)
                        end if
    
                        if (cont_damage) qK_prim_vf(damage_idx)%sf(j, k, l) = qK_cons_vf(damage_idx)%sf(j, k, l)
    
    #ifdef MFC_POST_PROCESS
                        if (bubbles_lagrange) qK_prim_vf(beta_idx)%sf(j, k, l) = qK_cons_vf(beta_idx)%sf(j, k, l)
    #endif
    
                    end do
                end do
            end do
    Missing Validation

    The macro replacements change parallelization behavior significantly but there are no corresponding tests to validate that the new GPU directives produce equivalent results to the original OpenACC directives.

        $:GPU_DECLARE(create='[is1_viscous,is2_viscous,is3_viscous,iv]')
    
        real(wp), allocatable, dimension(:, :) :: Res_viscous
        $:GPU_DECLARE(create='[Res_viscous]')
    
    contains
    
        impure subroutine s_initialize_viscous_module
    
            integer :: i, j !< generic loop iterators
    
            @:ALLOCATE(Res_viscous(1:2, 1:maxval(Re_size)))
    
            do i = 1, 2
                do j = 1, Re_size(i)
                    Res_viscous(i, j) = fluid_pp(Re_idx(i, j))%Re(i)
                end do
            end do
            $:GPU_UPDATE(device='[Res_viscous,Re_idx,Re_size]')
            $:GPU_ENTER_DATA(copyin='[is1_viscous,is2_viscous,is3_viscous,iv]')


    qodo-merge-pro bot commented Jun 23, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category / Suggestion / Impact
    Possible issue
    Remove line continuation characters

    The private variable list contains line continuation characters (&) which may
    not be properly handled by the FYPP preprocessor. Remove the line continuation
    characters and format the private variable list as a single continuous string to
    ensure proper macro expansion.

    src/simulation/m_riemann_solvers.fpp [359-366]

    -$:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_rho_L, alpha_rho_R, &
    -    & vel_L, vel_R, alpha_L, alpha_R, tau_e_L, tau_e_R, &
    -    & G_L, G_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, &
    -    & s_L, s_R, s_S, Ys_L, Ys_R, xi_field_L, xi_field_R, &
    -    & Cp_iL, Cp_iR, Xs_L, Xs_R, Gamma_iL, Gamma_iR, &
    -    & Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2, c_fast, &
    -    & pres_mag, B, Ga, vdotB, B2, b4, cm, pcorr, &
    -    & zcoef, vel_L_tmp, vel_R_tmp]')
    +$:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_rho_L, alpha_rho_R, vel_L, vel_R, alpha_L, alpha_R, tau_e_L, tau_e_R, G_L, G_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, s_L, s_R, s_S, Ys_L, Ys_R, xi_field_L, xi_field_R, Cp_iL, Cp_iR, Xs_L, Xs_R, Gamma_iL, Gamma_iR, Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2, c_fast, pres_mag, B, Ga, vdotB, B2, b4, cm, pcorr, zcoef, vel_L_tmp, vel_R_tmp]')
    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies that line continuation characters (&) inside a string literal will be treated as part of the string, which is likely not what the $:GPU_PARALLEL_LOOP macro expects. This could lead to preprocessor errors or incorrect code generation. Removing them is crucial for correctness.

    High
    Fix inconsistent line continuation formatting

    The macro call contains mixed line continuation characters within the private
    clause and between macro arguments. This inconsistent formatting may cause FYPP
    preprocessing errors. Consolidate the entire macro call into a single line or
    use consistent continuation formatting throughout.

    src/simulation/m_riemann_solvers.fpp [2443-2451]

    -$:GPU_PARALLEL_LOOP(collapse=3, private='[vel_L, vel_R, &
    -    & Re_L, Re_R, rho_avg, h_avg, gamma_avg, &
    -    & alpha_L, alpha_R, s_L, s_R, s_S, &
    -    & vel_avg_rms, pcorr, zcoef, vel_L_tmp, &
    -    & vel_R_tmp, Ys_L, Ys_R, Xs_L, Xs_R, &
    -    & Gamma_iL, Gamma_iR, Cp_iL, Cp_iR, tau_e_L, &
    -    & tau_e_R, xi_field_L, xi_field_R, Yi_avg, &
    -    & Phi_avg, h_iL, h_iR, h_avg_2]', &
    -    & copyin='[is1, is2, is3]')
    +$:GPU_PARALLEL_LOOP(collapse=3, private='[vel_L, vel_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, alpha_L, alpha_R, s_L, s_R, s_S, vel_avg_rms, pcorr, zcoef, vel_L_tmp, vel_R_tmp, Ys_L, Ys_R, Xs_L, Xs_R, Gamma_iL, Gamma_iR, Cp_iL, Cp_iR, tau_e_L, tau_e_R, xi_field_L, xi_field_R, Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2]', copyin='[is1, is2, is3]')
    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies two critical issues: line continuation characters (&) inside a string literal, and a continuation character between macro arguments. Both are likely to cause preprocessor errors or incorrect code generation. The proposed fix resolves these issues, ensuring the macro call is parsed correctly.

    High
    General
    Remove duplicate variable reference
    Suggestion Impact:The suggestion was directly implemented - the duplicate 'idwbuff' reference was removed from both GPU_ENTER_DATA and GPU_UPDATE directives

    code diff:

    -        $:GPU_ENTER_DATA(copyin='[idwbuff,idwbuff]')
    -        $:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +        $:GPU_ENTER_DATA(copyin='[idwbuff]')
    +        $:GPU_UPDATE(device='[idwbuff]')

    The same variable idwbuff is listed twice in the device update directive, which
    is redundant and may cause issues. Remove the duplicate entry.

    src/simulation/m_rhs.fpp [177]

    -$:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +$:GPU_UPDATE(device='[idwbuff]')
    Suggestion importance[1-10]: 4


    Why: The suggestion correctly identifies that the variable idwbuff is listed twice in the $:GPU_UPDATE directive. Removing the duplicate is a good practice for code clarity and maintainability, although it was present in the original code and its impact is minor.

    Low
    Fix array formatting consistency

    The variable list should be properly formatted as a single array. Remove the
    space after the comma in cbc_dir, cbc_loc to maintain consistency with other
    GPU_DECLARE calls.

    src/simulation/m_cbc.fpp [107]

    -$:GPU_DECLARE(create='[cbc_dir, cbc_loc,flux_cbc_index]')
    +$:GPU_DECLARE(create='[cbc_dir,cbc_loc,flux_cbc_index]')
    Suggestion importance[1-10]: 3


    Why: The suggestion correctly identifies a minor formatting inconsistency in the $:GPU_DECLARE macro. Removing the space after the comma in cbc_dir, cbc_loc aligns its style with other similar declarations in the file, improving code consistency.

    Low
    Add spacing after commas

    The variable list in the GPU_DECLARE macro lacks proper spacing after commas,
    which could cause parsing issues. Add spaces after commas to improve readability
    and ensure consistent formatting with standard Fortran conventions.

    src/simulation/m_riemann_solvers.fpp [70]

    -$:GPU_DECLARE(create='[flux_rsx_vf,flux_src_rsx_vf,flux_rsy_vf,flux_src_rsy_vf,flux_rsz_vf,flux_src_rsz_vf]')
    +$:GPU_DECLARE(create='[flux_rsx_vf, flux_src_rsx_vf, flux_rsy_vf, flux_src_rsy_vf, flux_rsz_vf, flux_src_rsz_vf]')
    Suggestion importance[1-10]: 2


    Why: The suggestion correctly points out missing spaces after commas within the string literal for the create argument. While this is a valid style improvement for readability, it has no functional impact as the spacing within the string is irrelevant to the FYPP preprocessor or the Fortran compiler.

    Low

    @sbryngelson (Member) commented:

    -$:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +$:GPU_UPDATE(device='[idwbuff]')

    @prathi-wind requested a review from a team as a code owner June 24, 2025 19:45
    @sbryngelson self-requested a review June 28, 2025 11:54
    @sbryngelson (Member) commented:

    My PR unfortunately caused a merge conflict that needs to be resolved.

    @wilfonba (Contributor) commented Jun 28, 2025

    Weird, when I merged locally, it didn't show any conflicts.

    @sbryngelson (Member) replied, quoting the above:

    > Weird, when I merged locally, it didn't show any conflicts.

    Yeah, not sure what GitHub was thinking, but it's actually a straightforward merge. Of course your local git may have made some merge assumptions that the website isn't willing to do, but in either case the resulting new code looks right. We'll see if it passes tests. Want to merge this soon so other people can fix their PRs if needed (which are accumulating...)

    @sbryngelson merged commit 40c1327 into MFlowCode:master Jul 8, 2025
    59 of 65 checks passed
    prathi-wind added a commit to prathi-wind/MFC-prathi that referenced this pull request Jul 13, 2025
    …wCode#883)
    
    Co-authored-by: Xuzheng Tian <xtian64@login-phoenix-rh9-3.pace.gatech.edu>
    Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
    Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
    Co-authored-by: mohdsaid497566 <mohdsaid497566@gmail.com>
    Co-authored-by: Tanush Prathi <tprathi3@login-phoenix-rh9-1.pace.gatech.edu>
    Co-authored-by: Mohammed S. Al-Mahrouqi <145478595+mohdsaid497566@users.noreply.github.com>
    Co-authored-by: Ben Wilfong <48168887+wilfonba@users.noreply.github.com>
    @prathi-wind deleted the meta-directive branch July 18, 2025 22:30