Probe WRT on GPUs #964


Merged · 2 commits merged into MFlowCode:master on Jul 22, 2025
Conversation

anandrdbz
Contributor

@anandrdbz commented Jul 21, 2025

User description

Description

Bug fix for probe_wrt on GPUs, which writes the acceleration and center of mass to probe files. This subroutine had not been ported to GPUs, so it previously produced NaNs and performed poorly.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes, provide instructions so we can reproduce them, and list any relevant details of your test configuration.

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature in nvtx ranges so that it can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement


Description

  • Add GPU support for probe write functionality
  • Implement GPU memory management for finite difference coefficients (sketched after this list)
  • Convert array operations to GPU-compatible loops
  • Add atomic operations for thread-safe center of mass calculations
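
A minimal sketch of the coefficient handling, shown with plain OpenACC directives rather than the project's $:GPU_* macros (the array shape follows the fd_coeff_x(r, j) indexing in the snippets below; the host-side coefficient computation is elided):

    ! Device-resident finite-difference coefficient arrays (sketch, not the PR's exact code).
    real(wp), allocatable, dimension(:, :) :: fd_coeff_x, fd_coeff_y, fd_coeff_z
    !$acc declare create(fd_coeff_x, fd_coeff_y, fd_coeff_z)

    allocate (fd_coeff_x(-fd_number:fd_number, 0:m))
    ! ... compute the coefficients on the host ...
    !$acc update device(fd_coeff_x)   ! make the host values visible to the GPU loops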


Diagram Walkthrough

flowchart LR
  A["CPU-only probe writes"] --> B["GPU memory allocation"]
  B --> C["GPU parallel loops"]
  C --> D["Atomic operations"]
  D --> E["GPU-accelerated probe writes"]

File Walkthrough

Relevant files
Configuration changes
m_checker.fpp — Prohibit probe writes with IGR

src/simulation/m_checker.fpp

  • Add prohibition check for probe writes with IGR (a hypothetical rendering follows)
  +1/-0
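
A hypothetical rendering of that one-line check, using the @:PROHIBIT macro that MFC's checkers rely on (the actual condition and message in the PR may differ):

    ! Sketch of the added input check in m_checker.fpp (condition and wording hypothetical):
    @:PROHIBIT(igr .and. probe_wrt, "probe_wrt is not supported with IGR")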
Enhancement
m_derived_variables.fpp — GPU support for derived variables computation

src/simulation/m_derived_variables.fpp

  • Add GPU memory declarations and allocations
  • Convert array operations to GPU parallel loops
  • Implement GPU memory transfers for coefficients
  • Add atomic operations for thread-safe calculations
  +240/-128
m_time_steppers.fpp — GPU-compatible time step cycling

src/simulation/m_time_steppers.fpp

  • Replace array-slice operations with explicit GPU loops (before/after sketch below)
  • Convert time step cycling to a GPU-compatible form
  +44/-13
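
The cycling change amounts to replacing host-side array-slice copies with an explicitly collapsed loop nest that the GPU macro can offload. The "before" line is a sketch of the prior pattern; the loop matches the code quoted in the suggestions below:

    ! Before (sketch of the prior host-only pattern, per solution-vector field i):
    q_prim_ts(3)%vf(i)%sf(:, :, :) = q_prim_ts(2)%vf(i)%sf(:, :, :)

    ! After: explicit loops the GPU macro offloads
    $:GPU_PARALLEL_LOOP(collapse=4)
    do i = 1, sys_size
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    q_prim_ts(3)%vf(i)%sf(j, k, l) = q_prim_ts(2)%vf(i)%sf(j, k, l)
                    q_prim_ts(2)%vf(i)%sf(j, k, l) = q_prim_ts(1)%vf(i)%sf(j, k, l)
                    q_prim_ts(1)%vf(i)%sf(j, k, l) = q_prim_ts(0)%vf(i)%sf(j, k, l)
                    q_prim_ts(0)%vf(i)%sf(j, k, l) = q_prim_vf(i)%sf(j, k, l)
                end do
            end do
        end do
    end do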

@anandrdbz requested a review from a team as a code owner on Jul 21, 2025, 09:12

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Code Duplication

The acceleration component calculation contains significant code duplication across the three coordinate directions (x, y, z): the nested loops and finite difference calculations are nearly identical, with only the momentum component index changing. This violates the DRY principle and makes maintenance difficult; one possible condensation is sketched after the snippet.

    $:GPU_PARALLEL_LOOP(collapse=3)   
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_sf(j, k, l) = (11._wp*q_prim_vf0(momxb)%sf(j, k, l) &
                                 - 18._wp*q_prim_vf1(momxb)%sf(j, k, l) &
                                 + 9._wp*q_prim_vf2(momxb)%sf(j, k, l) &
                                 - 2._wp*q_prim_vf3(momxb)%sf(j, k, l))/(6._wp*dt)
            end do 
        end do 
    end do 

    if(n == 0) then 
        $:GPU_PARALLEL_LOOP(collapse=4) 
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    do r = -fd_number, fd_number
                       q_sf(j, k, l) = q_sf(j, k, l) &
                                        + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                        q_prim_vf0(momxb)%sf(r + j, k, l) 
                    end do 
                end do 
            end do 
        end do
    elseif (p == 0) then 
        $:GPU_PARALLEL_LOOP(collapse=4) 
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    do r = -fd_number, fd_number
                       q_sf(j, k, l) = q_sf(j, k, l) &
                                        + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                        q_prim_vf0(momxb)%sf(r + j, k, l) &
                                        + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                        q_prim_vf0(momxb)%sf(j, r + k, l)
                    end do 
                end do 
            end do 
        end do
    else 
        if(grid_geometry == 3) then 
            $:GPU_PARALLEL_LOOP(collapse=4) 
            do l = 0, p
                do k = 0, n
                    do j = 0, m
                        do r = -fd_number, fd_number
                           q_sf(j, k, l) = q_sf(j, k, l) &
                                            + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                            q_prim_vf0(momxb)%sf(r + j, k, l) &
                                            + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                            q_prim_vf0(momxb)%sf(j, r + k, l) &
                                            + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                            q_prim_vf0(momxb)%sf(j, k, r + l)/y_cc(k)
                        end do 
                    end do 
                end do 
            end do
        else
            $:GPU_PARALLEL_LOOP(collapse=4) 
            do l = 0, p
                do k = 0, n
                    do j = 0, m
                        do r = -fd_number, fd_number
                           q_sf(j, k, l) = q_sf(j, k, l) &
                                            + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                            q_prim_vf0(momxb)%sf(r + j, k, l) &
                                            + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                            q_prim_vf0(momxb)%sf(j, r + k, l) &
                                            + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                            q_prim_vf0(momxb)%sf(j, k, r + l)
                        end do 
                    end do 
                end do 
            end do
        end if
    end if
! Computing the acceleration component in the y-coordinate direction
elseif (i == 2) then
    $:GPU_PARALLEL_LOOP(collapse=3)   
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_sf(j, k, l) = (11._wp*q_prim_vf0(momxb + 1)%sf(j, k, l) &
                                 - 18._wp*q_prim_vf1(momxb + 1)%sf(j, k, l) &
                                 + 9._wp*q_prim_vf2(momxb + 1)%sf(j, k, l) &
                                 - 2._wp*q_prim_vf3(momxb + 1)%sf(j, k, l))/(6._wp*dt)
            end do 
        end do 
    end do 

    if (p == 0) then 
        $:GPU_PARALLEL_LOOP(collapse=4) 
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    do r = -fd_number, fd_number
                       q_sf(j, k, l) = q_sf(j, k, l) &
                                        + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                        q_prim_vf0(momxb + 1)%sf(r + j, k, l) &
                                        + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                        q_prim_vf0(momxb + 1)%sf(j, r + k, l)
                    end do 
                end do 
            end do 
        end do
    else 
        if(grid_geometry == 3) then 
            $:GPU_PARALLEL_LOOP(collapse=4) 
            do l = 0, p
                do k = 0, n
                    do j = 0, m
                        do r = -fd_number, fd_number
                           q_sf(j, k, l) = q_sf(j, k, l) &
                                            + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                            q_prim_vf0(momxb + 1)%sf(r + j, k, l) &
                                            + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                            q_prim_vf0(momxb + 1)%sf(j, r + k, l) &
                                            + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                            q_prim_vf0(momxb + 1)%sf(j, k, r + l)/y_cc(k) &
                                            - (q_prim_vf0(momxe)%sf(j, k, l)**2._wp)/y_cc(k)
                        end do 
                    end do 
                end do 
            end do
        else
            $:GPU_PARALLEL_LOOP(collapse=4) 
            do l = 0, p
                do k = 0, n
                    do j = 0, m
                        do r = -fd_number, fd_number
                           q_sf(j, k, l) = q_sf(j, k, l) &
                                            + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                            q_prim_vf0(momxb + 1)%sf(r + j, k, l) &
                                            + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                            q_prim_vf0(momxb + 1)%sf(j, r + k, l) &
                                            + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                            q_prim_vf0(momxb + 1)%sf(j, k, r + l)
                        end do 
                    end do 
                end do 
            end do
        end if
    end if
! Computing the acceleration component in the z-coordinate direction
else
    $:GPU_PARALLEL_LOOP(collapse=3)   
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_sf(j, k, l) = (11._wp*q_prim_vf0(momxe)%sf(j, k, l) &
                                 - 18._wp*q_prim_vf1(momxe)%sf(j, k, l) &
                                 + 9._wp*q_prim_vf2(momxe)%sf(j, k, l) &
                                 - 2._wp*q_prim_vf3(momxe)%sf(j, k, l))/(6._wp*dt)
            end do 
        end do 
    end do 

    if(grid_geometry == 3) then 
        $:GPU_PARALLEL_LOOP(collapse=4) 
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    do r = -fd_number, fd_number
                       q_sf(j, k, l) = q_sf(j, k, l) &
                                        + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                        q_prim_vf0(momxe)%sf(r + j, k, l) &
                                        + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                        q_prim_vf0(momxe)%sf(j, r + k, l) &
                                        + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                        q_prim_vf0(momxe)%sf(j, k, r + l)/y_cc(k) &
                                        + (q_prim_vf0(momxe)%sf(j, k, l)* &
                                           q_prim_vf0(momxb + 1)%sf(j, k, l))/y_cc(k)
                    end do 
                end do 
            end do 
        end do
    else
        $:GPU_PARALLEL_LOOP(collapse=4) 
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    do r = -fd_number, fd_number
                       q_sf(j, k, l) = q_sf(j, k, l) &
                                        + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
                                        q_prim_vf0(momxe)%sf(r + j, k, l) &
                                        + q_prim_vf0(momxb + 1)%sf(j, k, l)*fd_coeff_y(r, k)* &
                                        q_prim_vf0(momxe)%sf(j, r + k, l) &
                                        + q_prim_vf0(momxe)%sf(j, k, l)*fd_coeff_z(r, l)* &
                                        q_prim_vf0(momxe)%sf(j, k, r + l)
                    end do 
                end do 
            end do 
        end do
    end if
end if
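
One possible condensation, sketched below and not taken from the PR, selects the momentum component from the direction index once so that a single loop nest computes the unsteady term for all three directions; the mom_i mapping is hypothetical, and the cylindrical (grid_geometry == 3) source terms would still need per-direction handling:

    ! Hypothetical refactor: map direction i to its momentum component once.
    ! In 3D, momxe == momxb + 2, so mom_i covers the x, y, and z components.
    mom_i = momxb + (i - 1)

    $:GPU_PARALLEL_LOOP(collapse=3)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_sf(j, k, l) = (11._wp*q_prim_vf0(mom_i)%sf(j, k, l) &
                                 - 18._wp*q_prim_vf1(mom_i)%sf(j, k, l) &
                                 + 9._wp*q_prim_vf2(mom_i)%sf(j, k, l) &
                                 - 2._wp*q_prim_vf3(mom_i)%sf(j, k, l))/(6._wp*dt)
            end do
        end do
    end do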
Variable Reference

The code uses hardcoded momentum indices like momxb, momxe instead of the original mom_idx%beg, mom_idx%end pattern. This change should be verified to ensure these variables are properly defined and accessible in the GPU context.

(The snippet attached to this observation duplicates the acceleration-computation code quoted under Code Duplication above.)
Atomic Operations

The center of mass calculation uses atomic updates to accumulate values in parallel loops. While necessary for correctness with this loop structure, the atomics could create performance bottlenecks on GPUs by serializing memory access. Consider using reduction operations instead; a reduction-based sketch follows the snippet.

                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 1) = c_m(i, 1) + q_vf(i)%sf(j, k, l)*dV
                    ! x-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 2) = c_m(i, 2) + q_vf(i)%sf(j, k, l)*dV*x_cc(j)
                    ! Volume fraction
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 5) = c_m(i, 5) + q_vf(i + advxb - 1)%sf(j, k, l)*dV
                end do
            end do
        end do
    end do
elseif (p == 0) then !2D simulation
    $:GPU_PARALLEL_LOOP(collapse=3,private='[dV]')
    do l = 0, p !Loop over grid
        do k = 0, n
            do j = 0, m
                $:GPU_LOOP(parallelism='[seq]')
                do i = 1, num_fluids !Loop over individual fluids
                    dV = dx(j)*dy(k)
                    ! Mass
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 1) = c_m(i, 1) + q_vf(i)%sf(j, k, l)*dV
                    ! x-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 2) = c_m(i, 2) + q_vf(i)%sf(j, k, l)*dV*x_cc(j)
                    ! y-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 3) = c_m(i, 3) + q_vf(i)%sf(j, k, l)*dV*y_cc(k)
                    ! Volume fraction
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 5) = c_m(i, 5) + q_vf(i + advxb - 1)%sf(j, k, l)*dV
                end do
            end do
        end do
    end do
else !3D simulation
    $:GPU_PARALLEL_LOOP(collapse=3,private='[dV]')
    do l = 0, p !Loop over grid
        do k = 0, n
            do j = 0, m
                $:GPU_LOOP(parallelism='[seq]')
                do i = 1, num_fluids !Loop over individual fluids

                    dV = dx(j)*dy(k)*dz(l)
                    ! Mass
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 1) = c_m(i, 1) + q_vf(i)%sf(j, k, l)*dV
                    ! x-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 2) = c_m(i, 2) + q_vf(i)%sf(j, k, l)*dV*x_cc(j)
                    ! y-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 3) = c_m(i, 3) + q_vf(i)%sf(j, k, l)*dV*y_cc(k)
                    ! z-location weighted
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 4) = c_m(i, 4) + q_vf(i)%sf(j, k, l)*dV*z_cc(l)
                    ! Volume fraction
                    $:GPU_ATOMIC(atomic='update')
                    c_m(i, 5) = c_m(i, 5) + q_vf(i + advxb - 1)%sf(j, k, l)*dV
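
A reduction-based alternative might look like the following sketch (3D case; plain OpenACC rather than the project's $:GPU_* macros, with the fluid loop hoisted outside the grid loops so each accumulator is a scalar, and host-side accumulation into c_m shown for simplicity):

    real(wp) :: mass, xw, yw, zw, vfrac, dV
    do i = 1, num_fluids
        mass = 0._wp; xw = 0._wp; yw = 0._wp; zw = 0._wp; vfrac = 0._wp
        !$acc parallel loop collapse(3) private(dV) reduction(+:mass, xw, yw, zw, vfrac)
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    dV = dx(j)*dy(k)*dz(l)
                    mass  = mass  + q_vf(i)%sf(j, k, l)*dV
                    xw    = xw    + q_vf(i)%sf(j, k, l)*dV*x_cc(j)
                    yw    = yw    + q_vf(i)%sf(j, k, l)*dV*y_cc(k)
                    zw    = zw    + q_vf(i)%sf(j, k, l)*dV*z_cc(l)
                    vfrac = vfrac + q_vf(i + advxb - 1)%sf(j, k, l)*dV
                end do
            end do
        end do
        ! Fold the reduced scalars into the center-of-mass table on the host.
        c_m(i, 1) = c_m(i, 1) + mass
        c_m(i, 2) = c_m(i, 2) + xw
        c_m(i, 3) = c_m(i, 3) + yw
        c_m(i, 4) = c_m(i, 4) + zw
        c_m(i, 5) = c_m(i, 5) + vfrac
    end do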


qodo-merge-pro bot commented Jul 21, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Add bounds checking for array access

The array access q_prim_vf0(momxb)%sf(r + j, k, l) can cause out-of-bounds
memory access when r + j exceeds array bounds. Add boundary checks to prevent
accessing invalid memory locations, which could cause crashes or incorrect
results.

src/simulation/m_derived_variables.fpp [219-230]

 $:GPU_PARALLEL_LOOP(collapse=4) 
 do l = 0, p
     do k = 0, n
         do j = 0, m
             do r = -fd_number, fd_number
-               q_sf(j, k, l) = q_sf(j, k, l) &
-                                + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
-                                q_prim_vf0(momxb)%sf(r + j, k, l) 
+               if (r + j >= 0 .and. r + j <= m) then
+                   q_sf(j, k, l) = q_sf(j, k, l) &
+                                    + q_prim_vf0(momxb)%sf(j, k, l)*fd_coeff_x(r, j)* &
+                                    q_prim_vf0(momxb)%sf(r + j, k, l) 
+               end if
             end do 
         end do 
     end do 
 end do
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a potential out-of-bounds memory access in q_prim_vf0(momxb)%sf(r + j, k, l), which is a critical bug that could lead to incorrect results or crashes.

High
Fix data dependencies in parallel loops

The cascading assignments create data dependencies that prevent proper GPU
parallelization. The assignments should be done in separate loops or use
temporary variables to avoid race conditions and ensure correct execution order.

src/simulation/m_time_steppers.fpp [1122-1134]

+! Use separate loops to avoid data dependencies
 $:GPU_PARALLEL_LOOP(collapse=4)
 do i = 1, sys_size
     do l = 0, p 
         do k = 0, n 
             do j = 0, m 
-                q_prim_ts(3)%vf(i)%sf(j, k, l) = q_prim_ts(2)%vf(i)%sf(j, k, l)
-                q_prim_ts(2)%vf(i)%sf(j, k, l) = q_prim_ts(1)%vf(i)%sf(j, k, l)
-                q_prim_ts(1)%vf(i)%sf(j, k, l) = q_prim_ts(0)%vf(i)%sf(j, k, l)
-                q_prim_ts(0)%vf(i)%sf(j, k, l) = q_prim_vf(i)%sf(j, k, l)
+                temp_val = q_prim_ts(2)%vf(i)%sf(j, k, l)
+                q_prim_ts(3)%vf(i)%sf(j, k, l) = temp_val
             end do 
         end do 
     end do
 end do
+! Continue with separate loops for other assignments
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a read-after-write data dependency within a parallel loop, which creates a race condition and will lead to incorrect results.

High


codecov bot commented Jul 21, 2025

Codecov Report

Attention: Patch coverage is 15.54054% with 125 lines in your changes missing coverage. Please review.

Project coverage is 44.03%. Comparing base (f2ef560) to head (e576795).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_derived_variables.fpp 6.45% 115 Missing and 1 partial ⚠️
src/simulation/m_time_steppers.fpp 65.21% 8 Missing ⚠️
src/simulation/m_checker.fpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #964      +/-   ##
==========================================
- Coverage   44.08%   44.03%   -0.05%     
==========================================
  Files          69       69              
  Lines       19573    19630      +57     
  Branches     2428     2428              
==========================================
+ Hits         8628     8645      +17     
- Misses       9444     9484      +40     
  Partials     1501     1501              


@sbryngelson
Member

Is src/simulation/m_time_steppers.fpp [1122-1134] not a race condition on GPU?

@anandrdbz
Contributor Author

@sbryngelson I don't think so: each thread reads and writes only its own (j, k, l) location in each array, so the chained assignments never cross thread boundaries.
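
Concretely, each (i, j, k, l) iteration of the cycling loop executes only:

    ! All four reads and writes hit index (j, k, l) of each time-level array,
    ! owned by this iteration alone, so no other thread touches these locations:
    q_prim_ts(3)%vf(i)%sf(j, k, l) = q_prim_ts(2)%vf(i)%sf(j, k, l)
    q_prim_ts(2)%vf(i)%sf(j, k, l) = q_prim_ts(1)%vf(i)%sf(j, k, l)
    q_prim_ts(1)%vf(i)%sf(j, k, l) = q_prim_ts(0)%vf(i)%sf(j, k, l)
    q_prim_ts(0)%vf(i)%sf(j, k, l) = q_prim_vf(i)%sf(j, k, l)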

@sbryngelson merged commit 31b52be into MFlowCode:master on Jul 22, 2025
33 checks passed