allow for optional printing of errors from GPUs #2923

Merged on Jul 25, 2024 (6 commits)
1 change: 1 addition & 0 deletions .github/workflows/good_defines.txt
@@ -1,3 +1,4 @@
ALLOW_GPU_PRINTF
AMREX_DEBUG
AMREX_PARTICLES
AMREX_SPACEDIM
31 changes: 31 additions & 0 deletions Docs/source/mpi_plus_x.rst
@@ -130,6 +130,37 @@ To enable this, compile with::
USE_HIP = TRUE


Printing Warnings from GPU Kernels
==================================

.. index:: USE_GPU_PRINTF

Castro prints warnings when certain assumptions are violated (often
triggering a retry in the process). On GPUs, printing from a kernel
(using ``printf()``) can increase the number of registers the kernel
needs, which can hurt performance. For this reason, warnings are
disabled in GPU builds by wrapping them in ``#ifndef AMREX_USE_GPU``.

When debugging GPU runs, however, it is sometimes useful to see these
warnings. The build option ``USE_GPU_PRINTF=TRUE`` enables them
(by setting the preprocessor flag ``ALLOW_GPU_PRINTF``).
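
The guard pattern described above can be sketched roughly as follows. This
is a minimal illustration modeled on the ``Castro.cpp`` change in this PR,
not the actual Castro source: the helper name ``warn_if_invalid_X`` and the
simple ``[0, 1]`` range test are assumptions made for the example, and the
fallback definition of ``AMREX_DEVICE_PRINTF`` is only a stand-in so the
sketch builds without AMReX.

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical stand-in so this sketch compiles without AMReX. In Castro the
// real AMREX_DEVICE_PRINTF macro is provided by the AMReX headers.
#ifndef AMREX_DEVICE_PRINTF
#define AMREX_DEVICE_PRINTF(...) std::printf(__VA_ARGS__)
#endif

// Hypothetical helper (not a Castro function): returns true when the mass
// fraction X lies outside [0, 1], and warns if the build configuration allows.
// The real check in Castro may differ, e.g. by applying tolerances.
inline bool warn_if_invalid_X (int n, double X, int i, int j, int k, double rho)
{
    if (X >= 0.0 && X <= 1.0) {
        return false;
    }
#ifndef AMREX_USE_GPU
    // CPU build: the warning is always printed.
    std::cout << "Invalid X[" << n << "] = " << X << " in zone "
              << i << ", " << j << ", " << k
              << " with density = " << rho << "\n";
#elif defined(ALLOW_GPU_PRINTF)
    // GPU build with USE_GPU_PRINTF=TRUE: print from inside the kernel.
    AMREX_DEVICE_PRINTF("Invalid X[%d] = %g in zone (%d,%d,%d) with density = %g\n",
                        n, X, i, j, k, rho);
#endif
    // GPU build without the flag: nothing is printed, but the failure is
    // still reported to the caller.
    return true;
}
```

In a GPU build without ``ALLOW_GPU_PRINTF``, both branches are compiled out,
so the kernel carries no ``printf()`` call and the failure is still reported
through the return value.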

.. note::

   Not every warning has been enabled for GPUs.

.. tip::

   On AMD architectures, it seems necessary to use unbuffered I/O. This
   can be accomplished in the job submission script (for SLURM) by doing

   ::

      srun -u ./Castro...




Working at Supercomputing Centers
=================================

4 changes: 4 additions & 0 deletions Exec/Make.Castro
@@ -136,6 +136,10 @@ ifeq ($(USE_GPU),TRUE)
endif
endif

ifeq ($(USE_GPU_PRINTF),TRUE)
DEFINES += -DALLOW_GPU_PRINTF
endif

CASTRO_AUTO_SOURCE_DIR := $(TmpBuildDir)/castro_sources/$(optionsSuffix).EXE


7 changes: 7 additions & 0 deletions Source/driver/Castro.cpp
@@ -3300,6 +3300,9 @@ Castro::check_for_negative_density ()
std::cout << "Invalid X[" << n << "] = " << X << " in zone "
<< i << ", " << j << ", " << k
<< " with density = " << rho << "\n";
#elif defined(ALLOW_GPU_PRINTF)
AMREX_DEVICE_PRINTF("Invalid X[%d] = %g in zone (%d,%d,%d) with density = %g\n",
n, X, i, j, k, rho);
#endif
X_check_failed = 1;
}
@@ -3310,6 +3313,10 @@ Castro::check_for_negative_density ()
return {rho_check_failed, X_check_failed};
});

#ifdef ALLOW_GPU_PRINTF
std::fflush(nullptr);
#endif

}

ReduceTuple hv = reduce_data.value();