Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add expected behavior to unified memory #3617

Open
wants to merge 6 commits into
base: hip_runtime_api_how-to
Choose a base branch
from

Conversation

matyas-streamhpc
Copy link

No description provided.

@matyas-streamhpc matyas-streamhpc self-assigned this Sep 19, 2024
@neon60 neon60 marked this pull request as draft September 19, 2024 09:43
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 7e27ba9 to 84f9097 Compare September 29, 2024 17:13
@neon60 neon60 force-pushed the update-unified-memory branch 4 times, most recently from 934e4d3 to 4baf272 Compare October 9, 2024 09:59
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 546b8d9 to 6244fc2 Compare October 15, 2024 05:37
@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to October 15, 2024 08:23
@neon60 neon60 marked this pull request as ready for review October 15, 2024 08:23
@neon60 neon60 requested a review from MKKnorr October 15, 2024 08:24
@neon60 neon60 changed the base branch from hip_runtime_api_how-to to docs/develop October 31, 2024 14:30
********************************************************************************

The HIP runtime API provides C and C++ functionalities to manage event, stream,
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Compute

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently CLR is supposed to be Compute Language Runtime

The runtime offers functions for allocating, freeing, and copying device memory,
along with transferring data between host and device memory.

Here are the various memory management techniques:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these techniques could use a brief description?

.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/pageable_pinned.svg

The pageable and pinned memory allow you to exercise direct control over
memory operations, which is known as explicit memory management. When using the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In memory_management.rst we list four memory management techniques, none of which is "explicit memory management". Is it possible to relate this explicit memory management to one of the four techniques? Or should it be listed as one of the techniques?

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. We have device_output and host_output, shouldn't those come into play here?

Pinned memory
================================================================================

Pinned memory or page-locked memory is stored in pages that are locked in specific sectors in RAM and can't be migrated. The pointer can be used on both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should note that pinned memory is allocated by hipMalloc() and hipHostMalloc().

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be DeviceToHost?

as HBM2e. Device memory can be allocated as global memory, constant, texture or
surface memory.

Global memory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this a bulleted list rather than four separate subheads.

:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable, if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
``hipHostMallocMapped`` are unset. If neither these flags nor the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the footnote to the table references hipHostMallocMapped flag, but the table above does not show this flag. This also is mentioned in the following note.

Unified memory can simplify the complexities of memory management in GPU
computing, by not requiring explicit copies between the host and the devices. It
can be particularly useful in use cases with sparse memory accesses from both
the CPU and the GPU, as not the whole memory region needs to be transferred to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the CPU and the GPU, as not the whole memory region needs to be transferred to
the CPU and the GPU, as the whole memory region does not need to be transferred to

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a logical difference between "not the whole memory region needs to be transferred" and "the whole memory region does not need to be transferred"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the meaning of the sentence?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the sentence to make it easier to understand.

functions. To unmap the memory, use :cpp:func:`hipMemUnmap`. To release the
virtual address range, use :cpp:func:`hipMemAddressFree`. Finally, to release
the physical memory, use :cpp:func:`hipMemRelease`. A side effect of these
functions is the lack of synchronization when memory is released. If you call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call :cpp:func:hipFree when you have multiple streams running in parallel, it
synchronizes the device. This causes worse resource usage and performance.

This seems like it could be an important note?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 24: - Learning curve: Requires additional effort to understand and utilize SOMA effectively.

Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments

@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to November 5, 2024 07:57
@neon60 neon60 force-pushed the hip_runtime_api_how-to branch 2 times, most recently from eb41cb1 to 7cb026c Compare November 7, 2024 12:00
Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants