beginner_source/ddp_series_theory.rst translation #896

Merged 6 commits on Oct 15, 2024
85 changes: 42 additions & 43 deletions beginner_source/ddp_series_theory.rst
@@ -1,70 +1,69 @@
`Introduction <ddp_series_intro.html>`__ \|\| **What is DDP** \|\|
`Single-Node Multi-GPU Training <ddp_series_multigpu.html>`__ \|\|
`Fault Tolerance <ddp_series_fault_tolerance.html>`__ \|\|
`Multi-Node training <../intermediate/ddp_series_multinode.html>`__ \|\|
`minGPT Training <../intermediate/ddp_series_minGPT.html>`__

What is Distributed Data Parallel (DDP)
=======================================

Authors: `Suraj Subramanian <https://github.com/suraj813>`__
Translation: `박지은 <https://github.com/rumjie>`__

.. grid:: 2

    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn

        * How DDP works under the hood
        * What is ``DistributedSampler``
        * How gradients are synchronized across GPUs

    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites

        * Familiarity with `basic non-distributed training <https://tutorials.pytorch.kr/beginner/basics/quickstart_tutorial.html>`__ in PyTorch

Follow along with the video below or on `youtube <https://www.youtube.com/watch/Cvdhwx-OBBo>`__.

.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
     <iframe width="560" height="315" src="https://www.youtube.com/embed/Cvdhwx-OBBo" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>

This tutorial is a gentle introduction to PyTorch `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__ (DDP)
which enables data parallel training in PyTorch. Data parallelism is a way to
process multiple data batches across multiple devices simultaneously
to achieve better performance. In PyTorch, the `DistributedSampler <https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler>`__
ensures each device gets a non-overlapping input batch. The model is replicated on all the devices;
each replica calculates gradients and simultaneously synchronizes with the others using the `ring all-reduce
algorithm <https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/>`__.
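
The sketch below shows roughly how these pieces fit together in a per-process
training loop on a single node with one process per GPU. It is an illustrative
outline rather than the tutorial's actual script: ``dataset`` and ``model`` are
hypothetical placeholders, and the rendezvous settings (``MASTER_ADDR``,
``MASTER_PORT``) are assumed to be provided by a launcher such as ``torchrun``.

.. code-block:: python

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def train(rank: int, world_size: int, dataset, model):
        # One process per GPU; "nccl" is the usual backend for CUDA devices.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)

        # DistributedSampler shards the dataset so that each rank draws a
        # non-overlapping subset of the input batches.
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)

        # Every rank holds a full replica of the model; DDP registers hooks that
        # all-reduce the gradients across ranks during backward().
        model = DDP(model.to(rank), device_ids=[rank])

        loss_fn = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

        for epoch in range(3):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for inputs, targets in loader:
                inputs, targets = inputs.to(rank), targets.to(rank)
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()       # gradients are synchronized here
                optimizer.step()

        dist.destroy_process_group()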

This `illustrative tutorial <https://tutorials.pytorch.kr/intermediate/dist_tuto.html#>`__ provides a more in-depth python view of the mechanics of DDP.
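
To get a feel for what DDP automates, here is a simplified, conceptual sketch of
the synchronization step itself: after ``backward()``, every rank averages each
parameter's gradient with an all-reduce. This is what DDP's hooks do for you,
except that DDP buckets the gradients and overlaps communication with the
backward pass.

.. code-block:: python

    import torch.distributed as dist

    def average_gradients(model, world_size: int):
        # Conceptual only: DDP performs this reduction automatically.
        for param in model.parameters():
            if param.grad is not None:
                # Sum this parameter's gradient over all ranks...
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
                # ...then divide so every replica applies the same average.
                param.grad /= world_size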

Why you should prefer DDP over ``DataParallel`` (DP)
----------------------------------------------------

`DataParallel <https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html>`__
is an older approach to data parallelism. DP is trivially simple (with just one extra line of code) but it is much less performant.
DDP improves upon the architecture in a few ways:

+---------------------------------------+------------------------------+
| ``DataParallel`` | ``DistributedDataParallel`` |
+=======================================+==============================+
| More overhead; model is replicated | Model is replicated only |
| and destroyed at each forward pass | once |
+---------------------------------------+------------------------------+
| Only supports single-node parallelism | Supports scaling to multiple |
| | machines |
+---------------------------------------+------------------------------+
| Slower; uses multithreading on a | Faster (no GIL contention) |
| single process and runs into Global | because it uses |
| Interpreter Lock (GIL) contention | multiprocessing |
+---------------------------------------+------------------------------+


.. list-table::
   :header-rows: 1

   * - ``DataParallel``
     - ``DistributedDataParallel``
   * - More overhead; the model is replicated and destroyed at each forward pass
     - The model is replicated only once
   * - Only supports single-node parallelism
     - Supports scaling to multiple machines
   * - Slower; uses multithreading on a single process and runs into Global Interpreter Lock (GIL) contention
     - Faster (no GIL contention) because it uses multiprocessing
Member

The ascii table was changed into list-table format here; was there a reason for that?
If there is no strong reason, keeping the original document's format would make future maintenance easier.

Contributor Author

I may well have written it incorrectly, but when I built the page and checked it, the table rendered broken, so I changed the format!
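
As a rough comparison of the two APIs, the sketch below assumes ``model`` is an
already constructed ``nn.Module`` and ``local_rank`` is provided by the launcher
(for example via the ``LOCAL_RANK`` environment variable set by ``torchrun``).
``DataParallel`` is a one-line wrapper driven from a single process, while
``DistributedDataParallel`` is wrapped once in each process after the process
group has been initialized.

.. code-block:: python

    import torch.nn as nn
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # DataParallel: a single process scatters inputs and replicates the model
    # across GPUs on every forward pass (multithreaded, subject to the GIL).
    dp_model = nn.DataParallel(model)

    # DistributedDataParallel: one process per GPU; each process wraps its own
    # replica once, and gradients are all-reduced during backward().
    dist.init_process_group("nccl")
    ddp_model = DDP(model.to(local_rank), device_ids=[local_rank])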



Further Reading
---------------

- `Multi-GPU training with DDP <ddp_series_multigpu.html>`__ (next tutorial in this series)
- `DDP API <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
- `DDP Internal