Commit: intermediate_source/process_group_cpp_extension_tutorial.rst translation (#764)

intermediate_source/tensorboard_profiler_tutorial.py translation

jenner9212 authored Nov 26, 2023
1 parent 139ae22 commit 4144d34
Showing 1 changed file with 55 additions and 91 deletions: intermediate_source/process_group_cpp_extension_tutorial.rst

Customize Process Group Backends Using Cpp Extensions
=====================================================

**Author**: `Feng Tian <https://github.com/ftian1>`__, `Shen Li <https://mrshenli.github.io/>`__, `Min Si <https://minsii.github.io/>`__

**Translator**: `박재윤 <https://github.com/jenner9212>`_

.. note::

   |edit| View and edit this tutorial in `github <https://github.com/pytorch/tutorials/blob/main/intermediate_source/process_group_cpp_extension_tutorial.rst>`__.

Prerequisites:

- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
- `PyTorch Collective Communication Package <https://pytorch.org/docs/stable/distributed.html>`__
- `PyTorch Cpp Extension <https://pytorch.org/docs/stable/cpp_extension.html>`__
- `Writing Distributed Applications with PyTorch <https://tutorials.pytorch.kr/intermediate/dist_tuto.html>`__

This tutorial demonstrates how to implement a custom ``ProcessGroup``
backend and plug that into
`PyTorch distributed package <https://pytorch.org/docs/stable/distributed.html>`__ using
`cpp extensions <https://pytorch.org/docs/stable/cpp_extension.html>`__. This is helpful when you need a specialized software
stack for your hardware, or when you would like to experiment with new
collective communication algorithms.


Basics
------

PyTorch collective communications power several widely adopted distributed
training features, including
`DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__,
`ZeroRedundancyOptimizer <https://pytorch.org/docs/stable/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer>`__,
and `FullyShardedDataParallel <https://github.com/pytorch/pytorch/blob/master/torch/distributed/_fsdp/fully_sharded_data_parallel.py>`__.
In order to make the same collective communication API work with
different communication backends, the distributed package abstracts collective
communication operations into a
`ProcessGroup <https://github.com/pytorch/pytorch/blob/release/1.10/torch/csrc/distributed/c10d/ProcessGroup.hpp>`__
class. Different backends can
then be implemented as subclasses of ``ProcessGroup`` using preferred
third-party libraries. PyTorch distributed comes with three default backends,
``ProcessGroupNCCL``, ``ProcessGroupGloo``, and ``ProcessGroupMPI``. However,
beyond these three backends, there are also other communication libraries
(e.g., `UCC <https://github.com/openucx/ucc>`__,
`OneCCL <https://github.com/oneapi-src/oneCCL>`__), different types of hardware
(e.g., `TPU <https://cloud.google.com/tpu>`__,
`Trainium <https://aws.amazon.com/machine-learning/trainium/>`__), and emerging
communication algorithms (e.g.,
`Herring <https://www.amazon.science/publications/herring-rethinking-the-parameter-server-at-scale-for-the-cloud>`__,
`Reduction Server <https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai>`__).
Therefore, the distributed package exposes extension APIs to allow customizing
collective communication backends.
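
Concretely, the backend is just a name passed to ``init_process_group``; the
collective calls in application code stay identical no matter which
``ProcessGroup`` subclass serves them. A minimal sketch with the builtin
``gloo`` backend (the single-process world size and rendezvous address are
illustrative assumptions):

.. code-block:: python

    import os

    import torch
    import torch.distributed as dist

    # Illustrative single-process rendezvous; real jobs set these per launcher.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Swapping "gloo" for "nccl", "mpi", or a registered custom backend
    # leaves the collective calls below unchanged.
    dist.init_process_group("gloo", rank=0, world_size=1)

    t = torch.ones(3)
    dist.all_reduce(t)  # same API across all backends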


The 4 steps below show how to implement a dummy ``ProcessGroup`` backend
and use that in Python application code. Please note that this tutorial focuses
on demonstrating the extension APIs, instead of developing a functioning
communication backend. Hence, the ``dummy`` backend just covers a subset of the
APIs (``all_reduce`` and ``all_gather``), and simply sets the values of tensors
to 0.


Step 1: Implement a Subclass of ``ProcessGroup``
------------------------------------------------

The first step is to implement a ``ProcessGroup`` subclass that overrides
target collective communication APIs and runs the custom communication algorithm.
The extension also needs to implement a ``Work`` subclass, which
serves as a future of communication results and allows asynchronous execution in
application code. If the extension uses third-party libraries, it can
include the headers and call into the library APIs from the ``ProcessGroupDummy``
subclass. The two code snippets below present the implementation of ``dummy.hpp`` and
``dummy.cpp``. See the `dummy collectives <https://github.com/mrshenli/dummy_collectives>`__
repository for the full implementation.

.. code-block:: cpp

    // file name: dummy.hpp
    #include <torch/python.h>
    #include <torch/csrc/distributed/c10d/ProcessGroup.hpp>

    // ... (ProcessGroupDummy declarations elided in this diff view; see the
    // dummy collectives repository for the full header)

          std::vector<at::Tensor>& tensors,
          const AllreduceOptions& opts = AllreduceOptions()) override;

      // The collective communication APIs without a custom implementation
      // will error out if invoked by application code.
    };

    class WorkDummy : public Work {
      // ... (constructor signature elided in this diff view)
          OpType opType,
          c10::intrusive_ptr<c10::ivalue::Future> future) // future of the output
        : Work(
              -1, // rank, only used by recvAnySource, irrelevant in this demo
              opType),
          future_(std::move(future)) {}

      // There are several additional helper functions that need to be
      // implemented. Please refer to https://github.com/mrshenli/dummy_collectives
      // for the full implementation.

     private:
      c10::intrusive_ptr<c10::ivalue::Future> future_;
      // ... (remainder of WorkDummy elided in this diff view)
    };
.. code-block:: cpp

    // file name: dummy.cpp
    #include "dummy.hpp"

    namespace c10d {

    // This is a dummy allgather that sets all output tensors to zero
    // Modify the implementation to conduct real communication asynchronously
    c10::intrusive_ptr<Work> ProcessGroupDummy::allgather(
        std::vector<std::vector<at::Tensor>>& outputTensors,
        std::vector<at::Tensor>& inputTensors,
        // ... (rest of the signature and body elided in this diff view)
      return c10::make_intrusive<WorkDummy>(OpType::ALLGATHER, std::move(future));
    }

    // This is a dummy allreduce that sets all output tensors to zero
    // Modify the implementation to conduct real communication asynchronously
    c10::intrusive_ptr<Work> ProcessGroupDummy::allreduce(
        std::vector<at::Tensor>& tensors,
        const AllreduceOptions& opts) {
      // ... (body elided in this diff view)
    }

    } // namespace c10d
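
Because every collective returns a ``Work`` object wrapping a future of the
result, application code can drive these dummy collectives asynchronously. A
small sketch of that pattern, assuming a process group has already been
initialized with the ``dummy`` backend (see Step 4):

.. code-block:: python

    import torch
    import torch.distributed as dist

    t = torch.ones(4)

    # async_op=True returns the Work handle backed by WorkDummy's future.
    work = dist.all_reduce(t, async_op=True)
    work.wait()  # block until the (dummy) communication completes

    print(t)  # the dummy backend has zeroed the tensor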
Step 2: Expose The Extension Python APIs
----------------------------------------

The backend constructors are called
`from Python side <https://github.com/pytorch/pytorch/blob/v1.9.0/torch/distributed/distributed_c10d.py#L643-L650>`__,
so the extension also needs to expose the constructor APIs to Python. This can
be done by adding the following methods. In this example, ``store`` and
``timeout`` are ignored by the ``ProcessGroupDummy`` instantiation method, as
those are not used in this dummy implementation. However, real-world extensions
should consider using the ``store`` to perform rendezvous and supporting the
``timeout`` argument.

.. code-block:: cpp

    // ... (createProcessGroupDummy and module boilerplate elided in this diff view)

      py::object module = py::module::import("torch.distributed");
      py::object register_backend =
          module.attr("Backend").attr("register_backend");
      // torch.distributed.Backend.register_backend will add `dummy` as a
      // new valid backend.
      register_backend("dummy", py::cpp_function(createProcessGroupDummy));
    }

    // ... (remainder elided in this diff view)
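
Once the extension is imported, the same registration can also be performed
from Python: ``Backend.register_backend`` accepts any callable that constructs
a process group. In the sketch below, ``_create_dummy_pg`` is a hypothetical
stand-in for the C++ ``createProcessGroupDummy`` factory:

.. code-block:: python

    import torch.distributed as dist

    # Hypothetical Python-side factory; the real one lives in C++ and is
    # registered by the pybind11 code above. It must accept the arguments
    # passed by init_process_group and return a ProcessGroup instance.
    def _create_dummy_pg(store, rank, size, timeout):
        raise NotImplementedError("placeholder for createProcessGroupDummy")

    dist.Backend.register_backend("dummy", _create_dummy_pg)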
Step 3: Build The Custom Extension
----------------------------------

Now, the extension source code files are ready. We can then use
`cpp extensions <https://pytorch.org/docs/stable/cpp_extension.html>`__
to build it. To do that, create a ``setup.py`` file that prepares the paths and
commands. Then call ``python setup.py install`` to install the extension.

If the extension depends on third-party libraries, you can also specify
``library_dirs`` and ``libraries`` to the cpp extension APIs. See the
`torch ucc <https://github.com/openucx/torch-ucc>`__
project as a real-world example.

.. code-block:: python

    # file name: setup.py
    import os
    import sys
    import torch
    # ... (path and extension setup elided in this diff view)
        cmdclass={'build_ext': cpp_extension.BuildExtension}
    )
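
Since the file is truncated in this diff view, here is a minimal sketch of
what such a ``setup.py`` typically contains; the source list and module name
are assumptions, and the full file in the repository remains authoritative:

.. code-block:: python

    # Minimal sketch, assuming dummy.cpp and dummy.hpp sit next to setup.py.
    from setuptools import setup
    from torch.utils import cpp_extension

    setup(
        name='dummy_collectives',
        ext_modules=[
            cpp_extension.CppExtension(
                name='dummy_collectives',
                sources=['dummy.cpp'],
            ),
        ],
        cmdclass={'build_ext': cpp_extension.BuildExtension},
    )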
Step 4: Use The Extension in Application
----------------------------------------

After installation, you can conveniently use the ``dummy`` backend when calling
`init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__
as if it were a builtin backend.

.. code-block:: python

    import os

    import torch
    # importing dummy_collectives makes torch.distributed recognize `dummy`
    # as a valid backend.
    import dummy_collectives

    import torch.distributed as dist
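
The snippet above is cut off in this diff view; a minimal end-to-end usage
sketch might continue as follows (the rendezvous values and single-process
world size are illustrative assumptions):

.. code-block:: python

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group("dummy", rank=0, world_size=1)

    x = torch.ones(2, 2)
    dist.all_reduce(x)
    # The dummy backend sets the tensor values to 0.
    print(x)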
