
Created FYPP macros to allow for meta-directive parallelization #883


Merged: 85 commits merged into MFlowCode:master from the meta-directive branch on Jul 8, 2025

Conversation

@prathi-wind (Collaborator) commented on Jun 13, 2025

User description

Description

Added FYPP macros that replace the current OpenACC directives. A future pull request will use meta-directives to add OpenMP support. This pull request replaces all OpenACC directives except acc kernels and those in mpi_common.fpp and syscheck.fpp.
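
As a rough before/after sketch of the conversion pattern (the loop below is illustrative only, not taken verbatim from the diff, and the exact clause lists on the original directives varied by loop; the macro spellings $:GPU_PARALLEL_LOOP, $:GPU_LOOP, $:GPU_DECLARE, $:GPU_UPDATE, and $:GPU_ENTER_DATA are the ones introduced by this PR):

    ! Before: hand-written OpenACC directives
    !$acc parallel loop collapse(2) private(alpha_rho_visc)
    do k = is3_viscous%beg, is3_viscous%end
        do j = is1_viscous%beg, is1_viscous%end
            !$acc loop seq
            do i = 1, num_fluids
                alpha_rho_visc(i) = q_prim_vf(i)%sf(j, 0, k)
            end do
        end do
    end do

    ! After: FYPP macros that currently expand to the equivalent OpenACC directives
    ! and can later be retargeted to OpenMP without touching the loop bodies
    $:GPU_PARALLEL_LOOP(collapse=2, private='[alpha_rho_visc]')
    do k = is3_viscous%beg, is3_viscous%end
        do j = is1_viscous%beg, is1_viscous%end
            $:GPU_LOOP(parallelism='[seq]')
            do i = 1, num_fluids
                alpha_rho_visc(i) = q_prim_vf(i)%sf(j, 0, k)
            end do
        end do
    end do

    ! Declarations and data movement follow the same pattern, for example:
    !   !$acc declare create(Res_viscous)        ->  $:GPU_DECLARE(create='[Res_viscous]')
    !   !$acc update device(Res_viscous)         ->  $:GPU_UPDATE(device='[Res_viscous]')
    !   !$acc enter data copyin(is1_viscous)     ->  $:GPU_ENTER_DATA(copyin='[is1_viscous]')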

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce them.
Please also list any relevant details of your test configuration.

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this?

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any source files (anything in src/simulation):

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran an Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement


Description

• Created comprehensive FYPP macro library for GPU parallelization directives in parallel_macros.fpp
• Replaced OpenACC directives with FYPP GPU macros across multiple simulation modules
• Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
• Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
• Updated !$acc routine seq to $:GPU_ROUTINE(parallelism='[seq]') format
• Converted data movement directives (!$acc update, !$acc enter data) to corresponding GPU macros
• Applied changes to core simulation modules including Riemann solvers, viscous, RHS, CBC, WENO, and others
• Maintained functionality while preparing codebase for future OpenMP meta-directive support


Changes walkthrough 📝

Relevant files
Enhancement (15 files)
m_riemann_solvers.fpp
Replace OpenACC directives with FYPP GPU macros
src/simulation/m_riemann_solvers.fpp
  • Replaced OpenACC directives with FYPP GPU macros throughout the file
  • Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
  • Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]') format
  • Converted !$acc update and !$acc enter data to corresponding GPU macros
+238/-232

m_bubbles_EL.fpp
Replace OpenACC directives with FYPP GPU macros
src/simulation/m_bubbles_EL.fpp
  • Added the FYPP GPU macros include and replaced OpenACC directives
  • Converted !$acc declare create statements to $:GPU_DECLARE(create='[...]') format
  • Replaced !$acc parallel loop with $:GPU_PARALLEL_LOOP macro calls
  • Updated !$acc routine seq to the $:GPU_ROUTINE macro format
  • Converted !$acc update statements to $:GPU_UPDATE macro calls
+63/-60

m_helper_basic.fpp
Add FYPP macros and replace OpenACC routine directives
src/common/m_helper_basic.fpp
  • Added the FYPP macros include directive at the top of the file
  • Replaced !$acc routine seq directives with $:GPU_ROUTINE(parallelism='[seq]') macro calls
+6/-4
m_viscous.fpp
Replace OpenACC directives with FYPP GPU macros in viscous module
src/simulation/m_viscous.fpp
  • Replaced OpenACC directives with FYPP GPU macros throughout the file
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[...]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with appropriate parameters
  • Replaced !$acc loop seq with $:GPU_LOOP(parallelism='[seq]')
+112/-108

m_rhs.fpp
Replace OpenACC directives with FYPP GPU macros in RHS module
src/simulation/m_rhs.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc enter data and !$acc exit data with corresponding GPU macros
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with collapse and private parameters
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]')
+113/-105

m_cbc.fpp
Replace OpenACC directives with FYPP GPU macros in CBC module
src/simulation/m_cbc.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[...]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[...]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with appropriate parameters
  • Updated !$acc loop seq to $:GPU_LOOP(parallelism='[seq]')
+95/-89
m_mpi_proxy.fpp
Replace OpenACC directives with FYPP GPU macros in MPI proxy
src/simulation/m_mpi_proxy.fpp
  • Replaced OpenACC directives with FYPP GPU macros for variable declarations
  • Converted !$acc declare create to $:GPU_DECLARE(create='[i_halo_size]')
  • Replaced !$acc update device with $:GPU_UPDATE(device='[i_halo_size]')
  • Converted !$acc parallel loop to $:GPU_PARALLEL_LOOP with private parameters
+8/-8

m_patches.fpp
Replace OpenACC routine directives with FYPP GPU macros in patches
src/pre_process/m_patches.fpp
  • Replaced OpenACC routine directives with FYPP GPU macros
  • Converted !$acc routine seq to $:GPU_ROUTINE(parallelism='[seq]')
  • Applied changes to coordinate conversion functions and helper routines
+4/-4

m_global_parameters.fpp
Replace OpenACC directives with FYPP GPU macros in global parameters
src/simulation/m_global_parameters.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros for variable declarations
  • Replaced OpenACC update directives with FYPP GPU_UPDATE macros for device updates
  • Replaced OpenACC enter data directives with FYPP GPU_ENTER_DATA macros
  • Consolidated multiple GPU declarations into fewer macro calls with array syntax
+96/-70

m_variables_conversion.fpp
Convert OpenACC directives to FYPP GPU macros in variables conversion
src/common/m_variables_conversion.fpp
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated data movement directives to use FYPP GPU_ENTER_DATA and GPU_UPDATE macros
+58/-74
m_weno.fpp
Modernize WENO module with FYPP GPU macros
src/simulation/m_weno.fpp
  • Added GPU_DECLARE macros for WENO-related variable declarations
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+33/-44

m_boundary_common.fpp
Convert boundary condition module to FYPP GPU macros
src/common/m_boundary_common.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+39/-59

parallel_macros.fpp
Create FYPP macro library for GPU parallelization directives
src/common/include/parallel_macros.fpp
  • Added comprehensive FYPP macro definitions for GPU parallelization
  • Implemented macros for OpenACC directives such as GPU_PARALLEL_LOOP, GPU_ROUTINE, and GPU_DECLARE
  • Added support for various OpenACC clauses and data management directives
  • Included helper functions for generating proper clause syntax
  • A simplified sketch of the macro mechanism is shown after the file list below
+397/-0
m_qbmm.fpp
Modernize QBMM module with FYPP GPU macros
src/simulation/m_qbmm.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC routine directives with FYPP GPU_ROUTINE macros
  • Updated data movement directives to use FYPP GPU_ENTER_DATA and GPU_UPDATE macros
+44/-51

m_hypoelastic.fpp
Convert hypoelastic module to FYPP GPU macros
src/simulation/m_hypoelastic.fpp
  • Replaced OpenACC declare directives with FYPP GPU_DECLARE macros
  • Replaced OpenACC parallel loop directives with FYPP GPU_PARALLEL_LOOP macros
  • Replaced OpenACC loop directives with FYPP GPU_LOOP macros
  • Updated device data updates to use FYPP GPU_UPDATE macros
+25/-31
Additional files (23 files)
macros.fpp +20/-15 
m_chemistry.fpp +6/-7     
m_finite_differences.fpp +4/-1     
m_helper.fpp +3/-2     
m_phase_change.fpp +26/-47 
m_assign_variables.fpp +3/-2     
inline_riemann.fpp +2/-2     
m_acoustic_src.fpp +36/-29 
m_body_forces.fpp +7/-7     
m_bubbles.fpp +24/-30 
m_bubbles_EE.fpp +24/-23 
m_bubbles_EL_kernels.fpp +27/-39 
m_compute_cbc.fpp +31/-45 
m_data_output.fpp +7/-7     
m_fftw.fpp +91/-91 
m_hyperelastic.fpp +19/-20 
m_ibm.fpp +33/-27 
m_mhd.fpp +9/-11   
m_pressure_relaxation.fpp +26/-26 
m_sim_helpers.fpp +9/-10   
m_start_up.fpp +33/-26 
m_surface_tension.fpp +20/-20 
m_time_steppers.fpp +24/-24 
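
For context on parallel_macros.fpp: the GPU_* names are FYPP macros (#:def blocks) that expand to the corresponding OpenACC directives when the .fpp sources are preprocessed. The fragment below is a deliberately simplified, assumed sketch of that mechanism only, not the actual implementation, which builds its clause strings programmatically and supports many more arguments (collapse, private, copyin, and so on):

    #! Simplified, hypothetical sketch only; see src/common/include/parallel_macros.fpp
    #! for the real macro library.
    #:def GPU_LOOP(parallelism=None)
        #:if parallelism == '[seq]'
            !$acc loop seq
        #:else
            !$acc loop
        #:endif
    #:enddef

    #! With this definition, a call such as
    #!     $:GPU_LOOP(parallelism='[seq]')
    #! expands to "!$acc loop seq"; retargeting the build to OpenMP later only
    #! requires changing what the macro emits, not the call sites.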

    Tanush Prathi and others added 30 commits June 3, 2025 14:19
    Co-authored-by: Xuzheng Tian <xtian64@login-phoenix-rh9-3.pace.gatech.edu>
    Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
    Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
    @prathi-wind marked this pull request as ready for review June 23, 2025 15:15

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Duplication

    Extensive code duplication exists across multiple functions with nearly identical loop structures and variable declarations. The same patterns for computing viscous stresses are repeated in multiple subroutines with only minor variations.

        $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, &
            & alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
    
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do
    
                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        alpha_visc_sum = 0._wp
    
                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do
    
                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
    
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do
    
                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real
    
                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do
    
                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
    
                            end do
                        end if
                    end if
    
                    tau_Re(2, 1) = (grad_y_vf(1)%sf(j, k, l) + &
                                    grad_x_vf(2)%sf(j, k, l))/ &
                                   Re_visc(1)
    
                    tau_Re(2, 2) = (4._wp*grad_y_vf(2)%sf(j, k, l) &
                                    - 2._wp*grad_x_vf(1)%sf(j, k, l) &
                                    - 2._wp*q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   (3._wp*Re_visc(1))
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, 2
                        tau_Re_vf(contxe + i)%sf(j, k, l) = &
                            tau_Re_vf(contxe + i)%sf(j, k, l) - &
                            tau_Re(2, i)
    
                        tau_Re_vf(E_idx)%sf(j, k, l) = &
                            tau_Re_vf(E_idx)%sf(j, k, l) - &
                            q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
                    end do
                end do
            end do
        end do
    end if
    
    if (bulk_stress) then    ! Bulk stresses
        $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, &
            & alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
    
                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do
    
                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp
    
                        alpha_visc_sum = 0._wp
    
                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do
    
                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
    
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do
    
                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real
    
                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do
    
                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
    
                            end do
                        end if
                    end if
    
                    tau_Re(2, 2) = (grad_x_vf(1)%sf(j, k, l) + &
                                    grad_y_vf(2)%sf(j, k, l) + &
                                    q_prim_vf(momxb + 1)%sf(j, k, l)/y_cc(k))/ &
                                   Re_visc(2)
    
                    tau_Re_vf(momxb + 1)%sf(j, k, l) = &
                        tau_Re_vf(momxb + 1)%sf(j, k, l) - &
                        tau_Re(2, 2)
    
                    tau_Re_vf(E_idx)%sf(j, k, l) = &
                        tau_Re_vf(E_idx)%sf(j, k, l) - &
                        q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
    
                end do
            end do
        end do
    end if
    
    if (p == 0) return
    Complex Loop Logic

    The conservative to primitive variable conversion contains deeply nested loops with complex conditional logic and multiple sequential inner loops that could introduce performance bottlenecks or numerical instabilities.

            $:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_K, alpha_rho_K, Re_K, &
                & nRtmp, rho_K, gamma_K, pi_inf_K,qv_K, &
                & dyn_pres_K, rhoYks, B]')
            do l = ibounds(3)%beg, ibounds(3)%end
                do k = ibounds(2)%beg, ibounds(2)%end
                    do j = ibounds(1)%beg, ibounds(1)%end
                        dyn_pres_K = 0._wp
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            alpha_rho_K(i) = qK_cons_vf(i)%sf(j, k, l)
                            alpha_K(i) = qK_cons_vf(advxb + i - 1)%sf(j, k, l)
                        end do
    
                        if (model_eqns /= 4) then
    #ifdef MFC_SIMULATION
                            ! If in simulation, use acc mixture subroutines
                            if (elasticity) then
                                call s_convert_species_to_mixture_variables_acc(rho_K, gamma_K, pi_inf_K, qv_K, alpha_K, &
                                                                                alpha_rho_K, Re_K, G_K, Gs)
                            else if (bubbles_euler) then
                                call s_convert_species_to_mixture_variables_bubbles_acc(rho_K, gamma_K, pi_inf_K, qv_K, &
                                                                                        alpha_K, alpha_rho_K, Re_K)
                            else
                                call s_convert_species_to_mixture_variables_acc(rho_K, gamma_K, pi_inf_K, qv_K, &
                                                                                alpha_K, alpha_rho_K, Re_K)
                            end if
    #else
                            ! If pre-processing, use non acc mixture subroutines
                            if (elasticity) then
                                call s_convert_to_mixture_variables(qK_cons_vf, j, k, l, &
                                                                    rho_K, gamma_K, pi_inf_K, qv_K, Re_K, G_K, fluid_pp(:)%G)
                            else
                                call s_convert_to_mixture_variables(qK_cons_vf, j, k, l, &
                                                                    rho_K, gamma_K, pi_inf_K, qv_K)
                            end if
    #endif
                        end if
    
                        if (relativity) then
                            if (n == 0) then
                                B(1) = Bx0
                                B(2) = qK_cons_vf(B_idx%beg)%sf(j, k, l)
                                B(3) = qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)
                            else
                                B(1) = qK_cons_vf(B_idx%beg)%sf(j, k, l)
                                B(2) = qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)
                                B(3) = qK_cons_vf(B_idx%beg + 2)%sf(j, k, l)
                            end if
                            B2 = B(1)**2 + B(2)**2 + B(3)**2
    
                            m2 = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = momxb, momxe
                                m2 = m2 + qK_cons_vf(i)%sf(j, k, l)**2
                            end do
    
                            S = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 3
                                S = S + qK_cons_vf(momxb + i - 1)%sf(j, k, l)*B(i)
                            end do
    
                            E = qK_cons_vf(E_idx)%sf(j, k, l)
    
                            D = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                D = D + qK_cons_vf(i)%sf(j, k, l)
                            end do
    
                            ! Newton-Raphson
                            W = E + D
                            $:GPU_LOOP(parallelism='[seq]')
                            do iter = 1, relativity_cons_to_prim_max_iter
                                Ga = (W + B2)*W/sqrt((W + B2)**2*W**2 - (m2*W**2 + S**2*(2*W + B2)))
                                pres = (W - D*Ga)/((gamma_K + 1)*Ga**2) ! Thermal pressure from EOS
                                f = W - pres + (1 - 1/(2*Ga**2))*B2 - S**2/(2*W**2) - E - D
    
                                ! The first equation below corrects a typo in (Mignone & Bodo, 2006)
                                ! m2*W**2 -> 2*m2*W**2, which would cancel with the 2* in other terms
                                ! This corrected version is not used as the second equation empirically converges faster.
                                ! First equation is kept for further investigation.
                                ! dGa_dW = -Ga**3 * ( S**2*(3*W**2+3*W*B2+B2**2) + m2*W**2 ) / (W**3 * (W+B2)**3) ! first (corrected)
                                dGa_dW = -Ga**3*(2*S**2*(3*W**2 + 3*W*B2 + B2**2) + m2*W**2)/(2*W**3*(W + B2)**3) ! second (in paper)
    
                                dp_dW = (Ga*(1 + D*dGa_dW) - 2*W*dGa_dW)/((gamma_K + 1)*Ga**3)
                                df_dW = 1 - dp_dW + (B2/Ga**3)*dGa_dW + S**2/W**3
    
                                dW = -f/df_dW
                                W = W + dW
                                if (abs(dW) < 1.e-12_wp*W) exit
                            end do
    
                            ! Recalculate pressure using converged W
                            Ga = (W + B2)*W/sqrt((W + B2)**2*W**2 - (m2*W**2 + S**2*(2*W + B2)))
                            qK_prim_vf(E_idx)%sf(j, k, l) = (W - D*Ga)/((gamma_K + 1)*Ga**2)
    
                            ! Recover the other primitive variables
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 3
                                qK_prim_vf(momxb + i - 1)%sf(j, k, l) = (qK_cons_vf(momxb + i - 1)%sf(j, k, l) + (S/W)*B(i))/(W + B2)
                            end do
                            qK_prim_vf(1)%sf(j, k, l) = D/Ga ! Hard-coded for single-component for now
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = B_idx%beg, B_idx%end
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
    
                            cycle ! skip all the non-relativistic conversions below
                        end if
    
                        if (chemistry) then
                            rho_K = 0._wp
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = chemxb, chemxe
                                rho_K = rho_K + max(0._wp, qK_cons_vf(i)%sf(j, k, l))
                            end do
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                qK_prim_vf(i)%sf(j, k, l) = rho_K
                            end do
    
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = chemxb, chemxe
                                qK_prim_vf(i)%sf(j, k, l) = max(0._wp, qK_cons_vf(i)%sf(j, k, l)/rho_K)
                            end do
                        else
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, contxe
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
                        end if
    
    #ifdef MFC_SIMULATION
                        rho_K = max(rho_K, sgm_eps)
    #endif
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = momxb, momxe
                            if (model_eqns /= 4) then
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l) &
                                                            /rho_K
                                dyn_pres_K = dyn_pres_K + 5.e-1_wp*qK_cons_vf(i)%sf(j, k, l) &
                                             *qK_prim_vf(i)%sf(j, k, l)
                            else
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l) &
                                                            /qK_cons_vf(1)%sf(j, k, l)
                            end if
                        end do
    
                        if (chemistry) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_species
                                rhoYks(i) = qK_cons_vf(chemxb + i - 1)%sf(j, k, l)
                            end do
    
                            T = q_T_sf%sf(j, k, l)
                        end if
    
                        if (mhd) then
                            if (n == 0) then
                                pres_mag = 0.5_wp*(Bx0**2 + qK_cons_vf(B_idx%beg)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)**2)
                            else
                                pres_mag = 0.5_wp*(qK_cons_vf(B_idx%beg)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 1)%sf(j, k, l)**2 + qK_cons_vf(B_idx%beg + 2)%sf(j, k, l)**2)
                            end if
                        else
                            pres_mag = 0._wp
                        end if
    
                        call s_compute_pressure(qK_cons_vf(E_idx)%sf(j, k, l), &
                                                qK_cons_vf(alf_idx)%sf(j, k, l), &
                                                dyn_pres_K, pi_inf_K, gamma_K, rho_K, &
                                                qv_K, rhoYks, pres, T, pres_mag=pres_mag)
    
                        qK_prim_vf(E_idx)%sf(j, k, l) = pres
    
                        if (chemistry) then
                            q_T_sf%sf(j, k, l) = T
                        end if
    
                        if (bubbles_euler) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, nb
                                nRtmp(i) = qK_cons_vf(bubrs(i))%sf(j, k, l)
                            end do
    
                            vftmp = qK_cons_vf(alf_idx)%sf(j, k, l)
    
                            if (qbmm) then
                                !Get nb (constant across all R0 bins)
                                nbub_sc = qK_cons_vf(bubxb)%sf(j, k, l)
    
                                !Convert cons to prim
                                $:GPU_LOOP(parallelism='[seq]')
                                do i = bubxb, bubxe
                                    qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/nbub_sc
                                end do
                                !Need to keep track of nb in the primitive variable list (converted back to true value before output)
    #ifdef MFC_SIMULATION
                                qK_prim_vf(bubxb)%sf(j, k, l) = qK_cons_vf(bubxb)%sf(j, k, l)
    #endif
    
                            else
                                if (adv_n) then
                                    qK_prim_vf(n_idx)%sf(j, k, l) = qK_cons_vf(n_idx)%sf(j, k, l)
                                    nbub_sc = qK_prim_vf(n_idx)%sf(j, k, l)
                                else
                                    call s_comp_n_from_cons(vftmp, nRtmp, nbub_sc, weight)
                                end if
    
                                $:GPU_LOOP(parallelism='[seq]')
                                do i = bubxb, bubxe
                                    qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/nbub_sc
                                end do
                            end if
                        end if
    
                        if (mhd) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = B_idx%beg, B_idx%end
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                            end do
                        end if
    
                        if (elasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = strxb, strxe
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/rho_K
                            end do
                        end if
    
                        if (hypoelasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = strxb, strxe
                                ! subtracting elastic contribution for pressure calculation
                                if (G_K > verysmall) then
                                    if (cont_damage) G_K = G_K*max((1._wp - qK_cons_vf(damage_idx)%sf(j, k, l)), 0._wp)
                                    qK_prim_vf(E_idx)%sf(j, k, l) = qK_prim_vf(E_idx)%sf(j, k, l) - &
                                                                    ((qK_prim_vf(i)%sf(j, k, l)**2._wp)/(4._wp*G_K))/gamma_K
                                    ! Double for shear stresses
                                    if (any(i == shear_indices)) then
                                        qK_prim_vf(E_idx)%sf(j, k, l) = qK_prim_vf(E_idx)%sf(j, k, l) - &
                                                                        ((qK_prim_vf(i)%sf(j, k, l)**2._wp)/(4._wp*G_K))/gamma_K
                                    end if
                                end if
                            end do
                        end if
    
                        if (hyperelasticity) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = xibeg, xiend
                                qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)/rho_K
                            end do
                        end if
    
                        $:GPU_LOOP(parallelism='[seq]')
                        do i = advxb, advxe
                            qK_prim_vf(i)%sf(j, k, l) = qK_cons_vf(i)%sf(j, k, l)
                        end do
    
                        if (surface_tension) then
                            qK_prim_vf(c_idx)%sf(j, k, l) = qK_cons_vf(c_idx)%sf(j, k, l)
                        end if
    
                        if (cont_damage) qK_prim_vf(damage_idx)%sf(j, k, l) = qK_cons_vf(damage_idx)%sf(j, k, l)
    
    #ifdef MFC_POST_PROCESS
                        if (bubbles_lagrange) qK_prim_vf(beta_idx)%sf(j, k, l) = qK_cons_vf(beta_idx)%sf(j, k, l)
    #endif
    
                    end do
                end do
            end do
    Missing Validation

    The macro replacements change parallelization behavior significantly but there are no corresponding tests to validate that the new GPU directives produce equivalent results to the original OpenACC directives.

        $:GPU_DECLARE(create='[is1_viscous,is2_viscous,is3_viscous,iv]')
    
        real(wp), allocatable, dimension(:, :) :: Res_viscous
        $:GPU_DECLARE(create='[Res_viscous]')
    
    contains
    
        impure subroutine s_initialize_viscous_module
    
            integer :: i, j !< generic loop iterators
    
            @:ALLOCATE(Res_viscous(1:2, 1:maxval(Re_size)))
    
            do i = 1, 2
                do j = 1, Re_size(i)
                    Res_viscous(i, j) = fluid_pp(Re_idx(i, j))%Re(i)
                end do
            end do
            $:GPU_UPDATE(device='[Res_viscous,Re_idx,Re_size]')
            $:GPU_ENTER_DATA(copyin='[is1_viscous,is2_viscous,is3_viscous,iv]')


    qodo-merge-pro bot commented Jun 23, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category / Suggestion / Impact
    Possible issue
    Remove line continuation characters

    The private variable list contains line continuation characters (&) which may
    not be properly handled by the FYPP preprocessor. Remove the line continuation
    characters and format the private variable list as a single continuous string to
    ensure proper macro expansion.

    src/simulation/m_riemann_solvers.fpp [359-366]

    -$:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_rho_L, alpha_rho_R, &
    -    & vel_L, vel_R, alpha_L, alpha_R, tau_e_L, tau_e_R, &
    -    & G_L, G_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, &
    -    & s_L, s_R, s_S, Ys_L, Ys_R, xi_field_L, xi_field_R, &
    -    & Cp_iL, Cp_iR, Xs_L, Xs_R, Gamma_iL, Gamma_iR, &
    -    & Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2, c_fast, &
    -    & pres_mag, B, Ga, vdotB, B2, b4, cm, pcorr, &
    -    & zcoef, vel_L_tmp, vel_R_tmp]')
    +$:GPU_PARALLEL_LOOP(collapse=3, private='[alpha_rho_L, alpha_rho_R, vel_L, vel_R, alpha_L, alpha_R, tau_e_L, tau_e_R, G_L, G_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, s_L, s_R, s_S, Ys_L, Ys_R, xi_field_L, xi_field_R, Cp_iL, Cp_iR, Xs_L, Xs_R, Gamma_iL, Gamma_iR, Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2, c_fast, pres_mag, B, Ga, vdotB, B2, b4, cm, pcorr, zcoef, vel_L_tmp, vel_R_tmp]')
    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies that line continuation characters (&) inside a string literal will be treated as part of the string, which is likely not what the $:GPU_PARALLEL_LOOP macro expects. This could lead to preprocessor errors or incorrect code generation. Removing them is crucial for correctness.

    High
    Fix inconsistent line continuation formatting

    The macro call contains mixed line continuation characters within the private
    clause and between macro arguments. This inconsistent formatting may cause FYPP
    preprocessing errors. Consolidate the entire macro call into a single line or
    use consistent continuation formatting throughout.

    src/simulation/m_riemann_solvers.fpp [2443-2451]

    -$:GPU_PARALLEL_LOOP(collapse=3, private='[vel_L, vel_R, &
    -    & Re_L, Re_R, rho_avg, h_avg, gamma_avg, &
    -    & alpha_L, alpha_R, s_L, s_R, s_S, &
    -    & vel_avg_rms, pcorr, zcoef, vel_L_tmp, &
    -    & vel_R_tmp, Ys_L, Ys_R, Xs_L, Xs_R, &
    -    & Gamma_iL, Gamma_iR, Cp_iL, Cp_iR, tau_e_L, &
    -    & tau_e_R, xi_field_L, xi_field_R, Yi_avg, &
    -    & Phi_avg, h_iL, h_iR, h_avg_2]', &
    -    & copyin='[is1, is2, is3]')
    +$:GPU_PARALLEL_LOOP(collapse=3, private='[vel_L, vel_R, Re_L, Re_R, rho_avg, h_avg, gamma_avg, alpha_L, alpha_R, s_L, s_R, s_S, vel_avg_rms, pcorr, zcoef, vel_L_tmp, vel_R_tmp, Ys_L, Ys_R, Xs_L, Xs_R, Gamma_iL, Gamma_iR, Cp_iL, Cp_iR, tau_e_L, tau_e_R, xi_field_L, xi_field_R, Yi_avg, Phi_avg, h_iL, h_iR, h_avg_2]', copyin='[is1, is2, is3]')
    Suggestion importance[1-10]: 9


    Why: The suggestion correctly identifies two critical issues: line continuation characters (&) inside a string literal, and a continuation character between macro arguments. Both are likely to cause preprocessor errors or incorrect code generation. The proposed fix resolves these issues, ensuring the macro call is parsed correctly.

    High
    General
    Remove duplicate variable reference
    Suggestion Impact:The suggestion was directly implemented - the duplicate 'idwbuff' reference was removed from both GPU_ENTER_DATA and GPU_UPDATE directives

    code diff:

    -        $:GPU_ENTER_DATA(copyin='[idwbuff,idwbuff]')
    -        $:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +        $:GPU_ENTER_DATA(copyin='[idwbuff]')
    +        $:GPU_UPDATE(device='[idwbuff]')

    The same variable idwbuff is listed twice in the device update directive, which
    is redundant and may cause issues. Remove the duplicate entry.

    src/simulation/m_rhs.fpp [177]

    -$:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +$:GPU_UPDATE(device='[idwbuff]')
    Suggestion importance[1-10]: 4


    Why: The suggestion correctly identifies that the variable idwbuff is listed twice in the $:GPU_UPDATE directive. Removing the duplicate is a good practice for code clarity and maintainability, although it was present in the original code and its impact is minor.

    Low
    Fix array formatting consistency

    The variable list should be properly formatted as a single array. Remove the
    space after the comma in cbc_dir, cbc_loc to maintain consistency with other
    GPU_DECLARE calls.

    src/simulation/m_cbc.fpp [107]

    -$:GPU_DECLARE(create='[cbc_dir, cbc_loc,flux_cbc_index]')
    +$:GPU_DECLARE(create='[cbc_dir,cbc_loc,flux_cbc_index]')
    Suggestion importance[1-10]: 3


    Why: The suggestion correctly identifies a minor formatting inconsistency in the $:GPU_DECLARE macro. Removing the space after the comma in cbc_dir, cbc_loc aligns its style with other similar declarations in the file, improving code consistency.

    Low
    Add spacing after commas

    The variable list in the GPU_DECLARE macro lacks proper spacing after commas,
    which could cause parsing issues. Add spaces after commas to improve readability
    and ensure consistent formatting with standard Fortran conventions.

    src/simulation/m_riemann_solvers.fpp [70]

    -$:GPU_DECLARE(create='[flux_rsx_vf,flux_src_rsx_vf,flux_rsy_vf,flux_src_rsy_vf,flux_rsz_vf,flux_src_rsz_vf]')
    +$:GPU_DECLARE(create='[flux_rsx_vf, flux_src_rsx_vf, flux_rsy_vf, flux_src_rsy_vf, flux_rsz_vf, flux_src_rsz_vf]')
    Suggestion importance[1-10]: 2


    Why: The suggestion correctly points out missing spaces after commas within the string literal for the create argument. While this is a valid style improvement for readability, it has no functional impact as the spacing within the string is irrelevant to the FYPP preprocessor or the Fortran compiler.

    Low

    @sbryngelson (Member) commented:

    -$:GPU_UPDATE(device='[idwbuff, idwbuff]')
    +$:GPU_UPDATE(device='[idwbuff]')

    @prathi-wind requested a review from a team as a code owner June 24, 2025 19:45
    @sbryngelson self-requested a review June 28, 2025 11:54
    @sbryngelson (Member) commented:

    My PR unfortunately caused a merge conflict that needs to be resolved.

    @wilfonba (Contributor) commented Jun 28, 2025

    Weird, when I merged locally, it didn't show any conflicts.

    @sbryngelson (Member) replied, quoting the above:

    > Weird, when I merged locally, it didn't show any conflicts.

    Yeah, not sure what GitHub was thinking, but it's actually a straightforward merge. Of course your local git may have made some merge assumptions that the website isn't willing to do, but in either case the resulting new code looks right. We'll see if it passes tests. Want to merge this soon so other people can fix their PRs if needed (which are accumulating...)

    @sbryngelson merged commit 40c1327 into MFlowCode:master Jul 8, 2025
    59 of 65 checks passed
    prathi-wind added a commit to prathi-wind/MFC-prathi that referenced this pull request Jul 13, 2025
    …wCode#883)
    
    Co-authored-by: Xuzheng Tian <xtian64@login-phoenix-rh9-3.pace.gatech.edu>
    Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
    Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
    Co-authored-by: mohdsaid497566 <mohdsaid497566@gmail.com>
    Co-authored-by: Tanush Prathi <tprathi3@login-phoenix-rh9-1.pace.gatech.edu>
    Co-authored-by: Mohammed S. Al-Mahrouqi <145478595+mohdsaid497566@users.noreply.github.com>
    Co-authored-by: Ben Wilfong <48168887+wilfonba@users.noreply.github.com>
    @prathi-wind deleted the meta-directive branch July 18, 2025 22:30