
[WIP] Offload DG method to GPUs #1485

Draft
jkravs wants to merge 96 commits into base: main
Conversation

@jkravs commented May 24, 2023

Offload some parts of the DG method to GPU accelerators.

TODO:

  • Port elixir_advection_basic.jl with 2D tree mesh: (be39bb8)
    • Write GPU kernels for calculations:
      • Volume integral.
      • Interface flux.
      • Surface integral.
      • Jacobian calculation.
    • Initialize data on GPU memory
      • Data from Interface and Element containers
      • Jacobian (530d10a)
      • u and du integration variables (9a81463)
  • Check how well tree mesh offloading is applicable to p4est mesh.
  • Port elixir_advection_basic.jl with 2D p4est mesh:
    • Replace Symbol arrays with Integer arrays for GPU access: (92736d8)
      • In Interface Container node_indices
      • In Boundary Container node_indices
      • In Mortar Container node_indices
    • Write kernels for the KernelAbstractions.jl (KA) CPU backend: (21dfe5a)
    • Adapt kernels to the CUDA backend if necessary (see the sketch after this list): (85efe0a)
      • Weak Form Kernel
      • Interface Flux
      • Surface Integral
    • Initialize data on GPU memory:
  • Port more advanced elixirs that use boundaries, mortars, flux differencing kernels, etc.
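
For orientation, a minimal KernelAbstractions.jl kernel of the kind listed above might look like the following. This is an illustrative sketch, not code from this PR: the names apply_jacobian_kernel! and apply_jacobian! and the array layout (variables × nodes × nodes × elements, as in Trixi.jl's 2D TreeMesh containers) are assumptions.

using KernelAbstractions

# Scale du by the per-element inverse Jacobian (TreeMesh stores one
# Jacobian value per element); one work item per array entry.
@kernel function apply_jacobian_kernel!(du, @Const(inverse_jacobian))
    v, i, j, element = @index(Global, NTuple)
    du[v, i, j, element] *= -inverse_jacobian[element]
end

# The same kernel runs on any KA backend; only the backend object
# changes, e.g. CPU() or CUDABackend().
function apply_jacobian!(du, inverse_jacobian, backend)
    kernel! = apply_jacobian_kernel!(backend)
    kernel!(du, inverse_jacobian; ndrange = size(du))
    KernelAbstractions.synchronize(backend)
    return nothing
end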

@sloede self-assigned this May 24, 2023
codecov bot commented May 29, 2023

Codecov Report

Merging #1485 (92b9e68) into main (5676ec0) will increase coverage by 5.74%.
The diff coverage is 5.08%.

❗ Current head 92b9e68 differs from the pull request's most recent head 4a13bde. Consider uploading reports for commit 4a13bde to get more accurate results.

@@            Coverage Diff             @@
##             main    #1485      +/-   ##
==========================================
+ Coverage   88.81%   94.55%   +5.74%     
==========================================
  Files         363      360       -3     
  Lines       30172    29980     -192     
==========================================
+ Hits        26796    28345    +1549     
+ Misses       3376     1635    -1741     
Flag       Coverage Δ
unittests  94.55% <5.08%> (+5.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                         Coverage Δ
src/Trixi.jl                                           68.18% <ø> (+27.27%) ⬆️
...emidiscretization/semidiscretization_hyperbolic.jl  85.04% <0.00%> (-6.49%) ⬇️
src/solvers/dgsem_tree/dg_2d.jl                        88.11% <0.00%> (-8.37%) ⬇️
src/semidiscretization/semidiscretization.jl           95.24% <75.00%> (-0.88%) ⬇️

... and 58 files with indirect coverage changes

@@ -10,8 +10,10 @@ DiffEqCallbacks = "459566f4-90b8-5000-8ac3-15dfb0a30def"
EllipsisNotation = "da5c29d0-fa7d-589e-88eb-ea29b0a81949"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"

A Member commented:

Change to GPUArraysCore.jl (see discussion on Julia Slack)
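
For context, a note that is my addition rather than from the thread: GPUArraysCore.jl is a lightweight package providing the AbstractGPUArray supertype, so a package can dispatch on GPU-resident arrays without depending on all of GPUArrays.jl. A sketch of the kind of dispatch this enables (the helper needs_host_copy is hypothetical):

using GPUArraysCore: AbstractGPUArray

# Hypothetical helper: decide whether an array must be copied to the
# host before entering a CPU-only code path.
needs_host_copy(::AbstractArray) = false
needs_host_copy(::AbstractGPUArray) = true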

Comment on lines +349 to +353
get_backend(::PtrArray) = CPU()

function get_array_type(backend::CPU)
    return Array
end

A Member commented:

Those should inline anyway, but this might give the compiler even more motivation to do so.

Suggested change:

-get_backend(::PtrArray) = CPU()
-function get_array_type(backend::CPU)
-    return Array
-end
+@inline get_backend(::PtrArray) = CPU()
+@inline get_array_type(backend::CPU) = Array

Comment on lines +52 to +53
tmp_u = copyto!(CPU(), allocate(CPU(), eltype(u), size(u)), u)
integrate(cons2cons, tmp_u, semi; normalize = normalize)

A Member commented:

Just curious: is it not possible (or feasible) to execute integration on the GPU, or is it just not implemented yet?
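
Not from the thread, but one possible device-side approach as a sketch: broadcasting and sum are implemented for GPU arrays, so a quadrature-weighted reduction could stay on the device entirely. Here, weights is an assumed device-resident vector of quadrature weights, and integrate_on_device is a hypothetical name; Trixi.jl's real integrate does more than this.

# Illustrative only: apply weights along dims 2 and 3 of a
# (variables × i × j × elements) array u and reduce on the device,
# avoiding the round trip through host memory.
function integrate_on_device(u, weights)
    wi = reshape(weights, 1, :, 1, 1)
    wj = reshape(weights, 1, 1, :, 1)
    return sum(u .* wi .* wj)
end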


A Member commented:

Another question: if the array u already lives on the CPU, is this still a copy (I assume it is), or is it a no-op? If it forces a copy, we should consider dispatching on u, i.e., if it is our "CPU backend array type", keep the original implementation and only do the copy on non-CPU backends.

But this is really just something to keep in mind/store on a TODO list, not something that needs to be done right now
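
A minimal sketch of the dispatch idea above, mirroring the allocate/copyto! pattern used elsewhere in this PR; the helper name to_host is illustrative, not part of the PR:

using KernelAbstractions

# Host arrays pass through unchanged: no allocation, no copy.
to_host(u::Array) = u
# Anything else is copied into freshly allocated host memory.
to_host(u) = copyto!(CPU(), allocate(CPU(), eltype(u), size(u)), u)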

@@ -68,14 +68,16 @@ function calc_jacobian_matrix!(jacobian_matrix, element,
# jacobian_matrix[1, 2, :, :, element] = node_coordinates[1, :, :, element] * derivative_matrix' # x_η
# jacobian_matrix[2, 2, :, :, element] = node_coordinates[2, :, :, element] * derivative_matrix' # y_η

tmp_derivate_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)

A Member commented:

Suggested change:

-tmp_derivate_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)
+tmp_derivative_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)

Here and elsewhere?
