computations using only Float32 and no Float64 #2042
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##             main    #2042   +/-   ##
=======================================
  Coverage   96.30%   96.30%
=======================================
  Files         466      467     +1
  Lines       37227    37256    +29
=======================================
+ Hits        35850    35879    +29
  Misses       1377     1377

Flags with carried forward coverage won't be shown.
Thanks!! I'm not sure whether using … I have a question about the original purpose of adding …
That's indeed the main motivation.
julia> redirect_stdout(devnull) do
trixi_include("examples/unstructured_2d_dgsem/elixir_shallowwater_ec_float32.jl")
end
[ Info: You just called `trixi_include`. Julia may now compile the code, please be patient.
julia> u_ode = copy(sol.u[end]); du_ode = similar(u_ode);
julia> @perfmon "FLOPS_DP" Trixi.rhs!(du_ode, u_ode, semi, 0.0f0);
Group: FLOPS_DP
┌──────────────────────────────────────────┬──────────┐
│ Event │ Thread 1 │
├──────────────────────────────────────────┼──────────┤
│ INSTR_RETIRED_ANY │ 868636.0 │
│ CPU_CLK_UNHALTED_CORE │ 373225.0 │
│ CPU_CLK_UNHALTED_REF │ 287364.0 │
│ FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE │ 0.0 │
│ FP_ARITH_INST_RETIRED_SCALAR_DOUBLE │ 0.0 │
│ FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE │ 0.0 │
└──────────────────────────────────────────┴──────────┘
┌──────────────────────┬─────────────┐
│ Metric │ Thread 1 │
├──────────────────────┼─────────────┤
│ Runtime (RDTSC) [s] │ 9.39318e-5 │
│ Runtime unhalted [s] │ 0.000100979 │
│ Clock [MHz] │ 4800.41 │
│ CPI │ 0.429668 │
│ DP [MFLOP/s] │ 0.0 │
│ AVX DP [MFLOP/s] │ 0.0 │
│ Packed [MUOPS/s] │ 0.0 │
│ Scalar [MUOPS/s] │ 0.0 │
│ Vectorization ratio │ NaN │
└──────────────────────┴─────────────┘
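The criterion applied throughout this thread can be stated in plain Julia: a `FLOPS_DP` measurement counts as `Float64`-free when all three double-precision `FP_ARITH` event counters are zero. A minimal sketch of that check (the helper name and the dictionary layout are hypothetical, not LIKWID.jl's API):

```julia
# Hypothetical helper mirroring the check discussed here: a LIKWID
# "FLOPS_DP" measurement is Float64-free iff all double-precision
# floating-point instruction counters are zero.
const DP_EVENTS = ("FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE",
                   "FP_ARITH_INST_RETIRED_SCALAR_DOUBLE",
                   "FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE")

is_float64_free(events) = all(get(events, name, 0.0) == 0.0 for name in DP_EVENTS)

# Event counts as read off the table above for Trixi.rhs!:
events = Dict("FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE" => 0.0,
              "FP_ARITH_INST_RETIRED_SCALAR_DOUBLE" => 0.0,
              "FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE" => 0.0)
is_float64_free(events)  # true for the rhs! measurement above
```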
FDSBP upwind works, too:

julia> trixi_include("examples/unstructured_2d_fdsbp/elixir_euler_free_stream_upwind_float32.jl", tspan = (0.0f0, 0.1f0))
julia> u_ode = copy(sol.u[end]); du_ode = similar(u_ode);
julia> @perfmon "FLOPS_DP" Trixi.rhs!(du_ode, u_ode, semi, 0.0f0);
Group: FLOPS_DP
┌──────────────────────────────────────────┬───────────┐
│ Event │ Thread 1 │
├──────────────────────────────────────────┼───────────┤
│ INSTR_RETIRED_ANY │ 2.59028e7 │
│ CPU_CLK_UNHALTED_CORE │ 1.44735e7 │
│ CPU_CLK_UNHALTED_REF │ 1.1153e7 │
│ FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE │ 0.0 │
│ FP_ARITH_INST_RETIRED_SCALAR_DOUBLE │ 0.0 │
│ FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE │ 0.0 │
└──────────────────────────────────────────┴───────────┘
┌──────────────────────┬────────────┐
│ Metric │ Thread 1 │
├──────────────────────┼────────────┤
│ Runtime (RDTSC) [s] │ 0.0033605 │
│ Runtime unhalted [s] │ 0.00391595 │
│ Clock [MHz] │ 4796.45 │
│ CPI │ 0.558763 │
│ DP [MFLOP/s] │ 0.0 │
│ AVX DP [MFLOP/s] │ 0.0 │
│ Packed [MUOPS/s] │ 0.0 │
│ Scalar [MUOPS/s] │ 0.0 │
│ Vectorization ratio │ NaN │
└──────────────────────┴────────────┘
julia> @perfmon "FLOPS_DP" solve(ode, CarpenterKennedy2N54(williamson_condition = false),
                                 dt = 5.0f-2, save_everystep = false);

But the result shows that …

Also, you are all referring to type stability during the call of Trixi.jl/src/meshes/structured_mesh.jl, lines 161 to 164 in b9ace6d.
What I plan to do is to perform mesh initialization on the GPU and keep the data on the GPU to avoid the overhead of data transfer between the CPU and GPU. I can handle the type stability during initialization myself. @ranocha
I need to get access to the system again, but it may have been that I was running the code without the …

julia> 0.000264961 * 0.00754828 # MFLOP/s * time in seconds
1.99999981708e-6

So that's like 2 FLOPs.
It's even less than that. What I have measured here is whether there are …
That's a good point 👍
Done
Do you plan to do this in your repo or also with PRs to this repo, e.g., to fix the problem you pointed at above?
That's fair. Then for …:

julia> 0.000512994 * 0.00779737
4.00000402578e-6

And for …:

julia> 0.000808995 * 0.00988881
7.99999784595e-6

It does not seem reasonable to me to conclude …
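The conversions in these comments are just rate times runtime: the `DP [MFLOP/s]` metric multiplied by the runtime in seconds gives MFLOPs, so multiplying by 10^6 yields the absolute FLOP count. A sketch (the helper name is made up for illustration):

```julia
# Turn a LIKWID "DP [MFLOP/s]" rate and a runtime in seconds into an
# absolute count of double-precision FLOPs.
flop_count(rate_mflops, runtime_s) = rate_mflops * 1e6 * runtime_s

flop_count(0.000512994, 0.00779737)  # ≈ 4.0, i.e. about 4 FLOPs
flop_count(0.000808995, 0.00988881)  # ≈ 8.0, i.e. about 8 FLOPs
```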
I'm more than curious about this question. Do you have any good ideas (or have you seen any in other repos) for designing thorough tests for it, besides the memory allocation tests you are currently using?
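One standard building block for such tests (a suggestion, not something used in this PR) is `Test.@inferred`, which fails when the compiler cannot infer a concrete return type, combined with an explicit check that the result has the expected element type:

```julia
using Test

# Hypothetical kernel standing in for a flux or RHS helper; it should
# stay in whatever floating-point type it is given.
kernel(x) = x + one(x) / 3

# `@inferred` throws if the return type cannot be inferred concretely;
# the `isa` check additionally catches accidental promotion to Float64.
@test @inferred(kernel(1.0f0)) isa Float32
@test @inferred(kernel(1.0)) isa Float64
```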
Certainly. But I'm not sure whether to implement it in my repo or in this repo. Do you see any benefits in making …?
No
Well, if it's just fixing the …
It's not only about …
This PR is big enough as it is. I will not work on the other mesh types immediately. Adding tests for complete RHS evaluations etc. is quite difficult, as I have described above, even when looking only at something like …
Everything looks good as far as the changes go. Just a few questions and one suggestion to simplify the new elixir.
One question looking toward the future, e.g., for ln_mean: for a literal that cannot be exactly represented in Float32, would we change a line like

return (x + y) / @evalpoly(f2, 2, 2/3, 2/5, 2/7)

to be

RealT = eltype(x)
return (x + y) / @evalpoly(f2, 2, convert(RealT, 2/3), convert(RealT, 2/5), convert(RealT, 2/7))
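The promotion hazard behind this question can be seen directly in the REPL: `2/3` is computed in Float64, so mixing it with a Float32 value silently promotes the whole expression, while an explicit convert keeps single precision:

```julia
# `2 / 3` is a Float64 value, so it promotes a Float32 operand:
x = 1.0f0
typeof(x * (2 / 3))                  # Float64 -- silently promoted
typeof(x * convert(Float32, 2 / 3))  # Float32 -- stays in single precision
```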
(Two resolved review comments on examples/p4est_3d_dgsem/elixir_euler_free_stream_boundaries_float32.jl, marked outdated.)
Co-authored-by: Andrew Winters <[email protected]>
LGTM!
That's a good question. I am not sure how exactly to handle the …

Currently, I would likely opt for variant 1, but maybe with a slight change, since I am not sure whether it's better to use …:

julia> function foo1(x, y)
s = x + y
return (x + y) / @evalpoly(s, 2, 2 / 3, 2 / 5, 2 / 7)
end
foo1 (generic function with 1 method)
julia> function foo2(x, y)
s = x + y
return (x + y) / @evalpoly(s, 2, 2 / 3, 2 / 5, 2 / 7)
end
foo2 (generic function with 1 method)
julia> function foo2(x, y)
s = x + y
RealT = typeof(s)
return (x + y) / @evalpoly(s, 2, convert(RealT, 2 / 3),
convert(RealT, 2 / 5),
convert(RealT, 2 / 7))
end
foo2 (generic function with 1 method)
julia> function foo3(x, y)
s = x + y
RealT = typeof(s)
return (x + y) / @evalpoly(s, 2, convert(RealT, 2) / 3,
convert(RealT, 2) / 5,
convert(RealT, 2) / 7)
end
foo3 (generic function with 1 method)
julia> foo1(1.0, 1.0)
0.2770448548812665
julia> foo2(1.0, 1.0)
0.2770448548812665
julia> foo3(1.0, 1.0)
0.2770448548812665
julia> foo1(1.0f0, 1.0f0)
0.2770448548812665
julia> foo2(1.0f0, 1.0f0)
0.27704483f0
julia> foo3(1.0f0, 1.0f0)
0.27704483f0
julia> @code_llvm foo1(1.0, 1.0)
; @ REPL[9]:1 within `foo1`
define double @julia_foo1_637(double %0, double %1) #0 {
top:
; @ REPL[9]:2 within `foo1`
; ┌ @ float.jl:409 within `+`
%2 = fadd double %0, %1
; └
; @ REPL[9]:3 within `foo1`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ float.jl:414 within `muladd`
%3 = fmul contract double %2, 0x3FD2492492492492
%4 = fadd contract double %3, 4.000000e-01
%5 = fmul contract double %2, %4
%6 = fadd contract double %5, 0x3FE5555555555555
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%7 = fmul contract double %2, %6
%8 = fadd contract double %7, 2.000000e+00
; └└└
; ┌ @ float.jl:412 within `/`
%9 = fdiv double %2, %8
; └
ret double %9
}
julia> @code_llvm foo2(1.0, 1.0)
; @ REPL[11]:1 within `foo2`
define double @julia_foo2_648(double %0, double %1) #0 {
top:
; @ REPL[11]:2 within `foo2`
; ┌ @ float.jl:409 within `+`
%2 = fadd double %0, %1
; └
; @ REPL[11]:4 within `foo2`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ float.jl:414 within `muladd`
%3 = fmul contract double %2, 0x3FD2492492492492
%4 = fadd contract double %3, 4.000000e-01
%5 = fmul contract double %2, %4
%6 = fadd contract double %5, 0x3FE5555555555555
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%7 = fmul contract double %2, %6
%8 = fadd contract double %7, 2.000000e+00
; └└└
; ┌ @ float.jl:412 within `/`
%9 = fdiv double %2, %8
; └
ret double %9
}
julia> @code_llvm foo3(1.0, 1.0)
; @ REPL[12]:1 within `foo3`
define double @julia_foo3_650(double %0, double %1) #0 {
top:
; @ REPL[12]:2 within `foo3`
; ┌ @ float.jl:409 within `+`
%2 = fadd double %0, %1
; └
; @ REPL[12]:4 within `foo3`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ float.jl:414 within `muladd`
%3 = fmul contract double %2, 0x3FD2492492492492
%4 = fadd contract double %3, 4.000000e-01
%5 = fmul contract double %2, %4
%6 = fadd contract double %5, 0x3FE5555555555555
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%7 = fmul contract double %2, %6
%8 = fadd contract double %7, 2.000000e+00
; └└└
; ┌ @ float.jl:412 within `/`
%9 = fdiv double %2, %8
; └
ret double %9
}
julia> @code_llvm foo1(1.0f0, 1.0f0)
; @ REPL[9]:1 within `foo1`
define double @julia_foo1_652(float %0, float %1) #0 {
top:
; @ REPL[9]:2 within `foo1`
; ┌ @ float.jl:409 within `+`
%2 = fadd float %0, %1
; └
; @ REPL[9]:3 within `foo1`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ promotion.jl:459 within `muladd`
; │││┌ @ promotion.jl:399 within `promote`
; ││││┌ @ promotion.jl:377 within `_promote`
; │││││┌ @ number.jl:7 within `convert`
; ││││││┌ @ float.jl:261 within `Float64`
%3 = fpext float %2 to double
; │││└└└└
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%4 = fmul contract double %3, 0x3FD2492492492492
%5 = fadd contract double %4, 4.000000e-01
%6 = fmul contract double %5, %3
%7 = fadd contract double %6, 0x3FE5555555555555
%8 = fmul contract double %7, %3
%9 = fadd contract double %8, 2.000000e+00
; └└└
; ┌ @ promotion.jl:425 within `/` @ float.jl:412
%10 = fdiv double %3, %9
; └
ret double %10
}
julia> @code_llvm foo2(1.0f0, 1.0f0)
; @ REPL[11]:1 within `foo2`
define float @julia_foo2_654(float %0, float %1) #0 {
top:
; @ REPL[11]:2 within `foo2`
; ┌ @ float.jl:409 within `+`
%2 = fadd float %0, %1
; └
; @ REPL[11]:4 within `foo2`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ float.jl:414 within `muladd`
%3 = fmul contract float %2, 0x3FD24924A0000000
%4 = fadd contract float %3, 0x3FD99999A0000000
%5 = fmul contract float %2, %4
%6 = fadd contract float %5, 0x3FE5555560000000
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%7 = fmul contract float %2, %6
%8 = fadd contract float %7, 2.000000e+00
; └└└
; ┌ @ float.jl:412 within `/`
%9 = fdiv float %2, %8
; └
ret float %9
}
julia> @code_llvm foo3(1.0f0, 1.0f0)
; @ REPL[12]:1 within `foo3`
define float @julia_foo3_656(float %0, float %1) #0 {
top:
; @ REPL[12]:2 within `foo3`
; ┌ @ float.jl:409 within `+`
%2 = fadd float %0, %1
; └
; @ REPL[12]:4 within `foo3`
; ┌ @ math.jl:186 within `evalpoly`
; │┌ @ math.jl:187 within `macro expansion`
; ││┌ @ float.jl:414 within `muladd`
%3 = fmul contract float %2, 0x3FD24924A0000000
%4 = fadd contract float %3, 0x3FD99999A0000000
%5 = fmul contract float %2, %4
%6 = fadd contract float %5, 0x3FE5555560000000
; │││ @ promotion.jl:459 within `muladd` @ float.jl:414
%7 = fmul contract float %2, %6
%8 = fadd contract float %7, 2.000000e+00
; └└└
; ┌ @ float.jl:412 within `/`
%9 = fdiv float %2, %8
; └
ret float %9
}
I would also lean toward option 1 as it touches the least amount of code and (hopefully) does not cause too many headaches with AD. Just to double-check that I understand the type inheritance: …
Yes, exactly 👍
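For reference, the subtype chain in question can be checked directly in Julia:

```julia
# Both concrete float types sit under AbstractFloat, which sits under Real;
# integers and rationals are Reals but not AbstractFloats.
Float32 <: AbstractFloat   # true
Float64 <: AbstractFloat   # true
AbstractFloat <: Real      # true
Int <: Real                # true
Int <: AbstractFloat       # false
```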
Due to the great work of @huiyuxie, we're on a good way to support running computations only in Float32. However, checking that we're indeed using only Float32 and no Float64 everywhere is not trivial as far as I know. This may also be interesting for @patrickersing.

The best option that comes to my mind is using LIKWID.jl. Installing it (Linux only!) and using the additional patch …, I get the following results: …

This confirms that our Trixi.rhs! evaluation does indeed not use any Float64 computations in this case 🥳

For completeness, I also get: …

Solving without callbacks is also fine (using the time step size roughly given by the stepsize_callback): …

The StepsizeCallback is also fine: …

However, the SummaryCallback uses a tiny amount of Float64 computations: …

The AnalysisCallback uses quite some Float64 computations: …

The SaveSolutionCallback uses a tiny amount of Float64 computations, too: …

TODO for this PR: …

Open questions for discussion:
- Float32 for stuff like the PerformanceCounter?

Open questions that will not be resolved here: …