
[WIP] Offload DG method to GPUs #1485

Draft
jkravs wants to merge 96 commits into base: main
Conversation

@jkravs commented May 24, 2023

Offload some parts of the DG method to GPU accelerators.

TODO:

  • Port elixir_advection_basic.jl with 2D tree mesh: (be39bb8)
    • Write GPU kernels for calculations:
      • Volume integral.
      • Interface flux.
      • Surface integral.
      • Jacobian calculation.
    • Initialize data on GPU memory
      • Data from Interface and Element containers
      • Jacobian (530d10a)
      • u and du integration variables (9a81463)
  • Check how well tree mesh offloading is applicable to p4est mesh.
  • Port elixir_advection_basic.jl with 2D p4est mesh:
    • Replace Symbol arrays with Integer arrays for GPU access: (92736d8)
      • In Interface Container node_indices
      • In Boundary Container node_indices
      • In Mortar Container node_indices
    • Write kernels for the KernelAbstractions.jl (KA) CPU backend: (21dfe5a)
    • Adapt kernels to the CUDA backend if necessary (see the sketch after this list): (85efe0a)
      • Weak Form Kernel
      • Interface Flux
      • Surface Integral
    • Initialize data on GPU memory:
  • Port more advanced elixirs that use boundaries, mortars, flux differencing kernels, etc.
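
For orientation, a minimal KernelAbstractions.jl kernel of the kind listed above might look like the following. This is an illustrative sketch, not code from this PR: the names apply_jacobian_kernel! and apply_jacobian! and the array layout (variables × nodes × nodes × elements, as in Trixi.jl's 2D TreeMesh containers) are assumptions.

using KernelAbstractions

# Scale du by the per-element inverse Jacobian (TreeMesh stores one
# Jacobian value per element); one work item per array entry.
@kernel function apply_jacobian_kernel!(du, @Const(inverse_jacobian))
    v, i, j, element = @index(Global, NTuple)
    du[v, i, j, element] *= -inverse_jacobian[element]
end

# The same kernel runs on any KA backend; only the backend object
# changes, e.g. CPU() or CUDABackend().
function apply_jacobian!(du, inverse_jacobian, backend)
    kernel! = apply_jacobian_kernel!(backend)
    kernel!(du, inverse_jacobian; ndrange = size(du))
    KernelAbstractions.synchronize(backend)
    return nothing
end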

@sloede self-assigned this May 24, 2023
codecov bot commented May 29, 2023

Codecov Report

Merging #1485 (92b9e68) into main (5676ec0) will increase coverage by 5.74%.
The diff coverage is 5.08%.

❗ Current head 92b9e68 differs from the pull request's most recent head 4a13bde. Consider uploading reports for commit 4a13bde to get more accurate results.

@@            Coverage Diff             @@
##             main    #1485      +/-   ##
==========================================
+ Coverage   88.81%   94.55%   +5.74%     
==========================================
  Files         363      360       -3     
  Lines       30172    29980     -192     
==========================================
+ Hits        26796    28345    +1549     
+ Misses       3376     1635    -1741     
Flag       Coverage Δ
unittests  94.55% <5.08%> (+5.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                         Coverage Δ
src/Trixi.jl                                           68.18% <ø> (+27.27%) ⬆️
...emidiscretization/semidiscretization_hyperbolic.jl  85.04% <0.00%> (-6.49%) ⬇️
src/solvers/dgsem_tree/dg_2d.jl                        88.11% <0.00%> (-8.37%) ⬇️
src/semidiscretization/semidiscretization.jl           95.24% <75.00%> (-0.88%) ⬇️

... and 58 files with indirect coverage changes

@@ -10,8 +10,10 @@ DiffEqCallbacks = "459566f4-90b8-5000-8ac3-15dfb0a30def"
EllipsisNotation = "da5c29d0-fa7d-589e-88eb-ea29b0a81949"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"

A Member commented:

Change to GPUArraysCore.jl (see discussion on Julia Slack)
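
For context, a note that is my addition rather than from the thread: GPUArraysCore.jl is a lightweight package providing the AbstractGPUArray supertype, so a package can dispatch on GPU-resident arrays without depending on all of GPUArrays.jl. A sketch of the kind of dispatch this enables (the helper needs_host_copy is hypothetical):

using GPUArraysCore: AbstractGPUArray

# Hypothetical helper: decide whether an array must be copied to the
# host before entering a CPU-only code path.
needs_host_copy(::AbstractArray) = false
needs_host_copy(::AbstractGPUArray) = true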

Comment on lines +349 to +353
get_backend(::PtrArray) = CPU()

function get_array_type(backend::CPU)
    return Array
end

A Member commented:

Those should inline anyway, but this might give the compiler even more motivation to do so.

Suggested change:

-get_backend(::PtrArray) = CPU()
-function get_array_type(backend::CPU)
-    return Array
-end
+@inline get_backend(::PtrArray) = CPU()
+@inline get_array_type(backend::CPU) = Array

Comment on lines +52 to +53
tmp_u = copyto!(CPU(), allocate(CPU(), eltype(u), size(u)), u)
integrate(cons2cons, tmp_u, semi; normalize = normalize)

A Member commented:

Just curious: is it not possible (or feasible) to execute integration on the GPU, or is it just not implemented yet?
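
Not from the thread, but one possible device-side approach as a sketch: broadcasting and sum are implemented for GPU arrays, so a quadrature-weighted reduction could stay on the device entirely. Here, weights is an assumed device-resident vector of quadrature weights, and integrate_on_device is a hypothetical name; Trixi.jl's real integrate does more than this.

# Illustrative only: apply weights along dims 2 and 3 of a
# (variables × i × j × elements) array u and reduce on the device,
# avoiding the round trip through host memory.
function integrate_on_device(u, weights)
    wi = reshape(weights, 1, :, 1, 1)
    wj = reshape(weights, 1, 1, :, 1)
    return sum(u .* wi .* wj)
end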


A Member commented:

Another question: if the array u already lives on the CPU, is this still a copy (I assume it is), or is it a no-op? If it forces a copy, we should consider dispatching on u, i.e., if it is our "CPU backend array type", keep the original implementation and only do the copy on non-CPU backends.

But this is really just something to keep in mind/store on a TODO list, not something that needs to be done right now
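
A minimal sketch of the dispatch idea above, mirroring the allocate/copyto! pattern used elsewhere in this PR; the helper name to_host is illustrative, not part of the PR:

using KernelAbstractions

# Host arrays pass through unchanged: no allocation, no copy.
to_host(u::Array) = u
# Anything else is copied into freshly allocated host memory.
to_host(u) = copyto!(CPU(), allocate(CPU(), eltype(u), size(u)), u)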

@@ -68,14 +68,16 @@ function calc_jacobian_matrix!(jacobian_matrix, element,
# jacobian_matrix[1, 2, :, :, element] = node_coordinates[1, :, :, element] * derivative_matrix' # x_η
# jacobian_matrix[2, 2, :, :, element] = node_coordinates[2, :, :, element] * derivative_matrix' # y_η

tmp_derivate_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)

A Member commented:

Suggested change:

-tmp_derivate_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)
+tmp_derivative_matrix = copyto!(CPU(), allocate(CPU(), eltype(derivative_matrix), size(derivative_matrix)), derivative_matrix)

Here and elsewhere?
