Commit: fix typo
9bow committed Jun 15, 2024
1 parent 596eaf9 commit ae8dc50
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _posts/2023-04-19-accelerating-large-language-models.md
@@ -28,7 +28,7 @@ discuss_id: 1417
 
 > <small style="line-height: 1.1">***Figure 1:** The Transformer model architecture based on [β€œAttention is All You Need”](https://arxiv.org/abs/1706.03762). With the new PyTorch SDPA operator, Multi-Head Attention is efficiently implemented by a linear layer for the in-projection, the SDPA operator, and a linear layer for the out-projection.*</small>
-μƒˆλ‘œμš΄ scaled_dot_product_attention μ—°μ‚°μžλ₯Ό μ‚¬μš©ν•˜λ©΄ μ„ ν˜• λ ˆμ΄μ–΄λ₯Ό μ‚¬μš©ν•œ in-projection, SDPA, μ„ ν˜• λ ˆμ΄μ–΄λ₯Ό μ‚¬μš©ν•œ out-projection의 3λ‹¨κ³„λ§ŒμœΌλ‘œ λ©€ν‹°ν—€λ“œ μ–΄ν…μ…˜ κΈ°λŠ₯을 κ΅¬ν˜„ν•  수 μžˆμŠ΅λ‹ˆλ‹€.
+μƒˆλ‘œμš΄ scaled_dot_product_attention μ—°μ‚°μžλ₯Ό μ‚¬μš©ν•˜λ©΄ μ„ ν˜• λ ˆμ΄μ–΄λ₯Ό μ‚¬μš©ν•œ in-projection, SDPA, μ„ ν˜• λ ˆμ΄μ–΄λ₯Ό μ‚¬μš©ν•œ out-projection의 3λ‹¨κ³„λ§ŒμœΌλ‘œ λ©€ν‹°ν—€λ“œ μ–΄ν…μ…˜ κΈ°λŠ₯을 κ΅¬ν˜„ν•  수 μžˆμŠ΅λ‹ˆλ‹€.
 > With the new scaled_dot_product_attention operator, multihead attention can be implemented in just 3 steps: in projection with a linear layer, SDPA, and out projection with a linear layer.
 ```
@@ -75,7 +75,7 @@ PyTorch 2λŠ” νŠΉμ • μš”κ΅¬μ‚¬ν•­μ— 따라 νŠΉμ • μ‚¬μš© 사둀에 μ΅œμ ν™”λœ
 SDPA μ—°μ‚°μžλŠ” GPT λͺ¨λΈμ˜ 핡심 ꡬ성 μš”μ†Œμ΄κΈ° λ•Œλ¬Έμ— μ˜€ν”ˆμ†ŒμŠ€ nanoGPT λͺ¨λΈμ΄ κ΅¬ν˜„μ˜ μš©μ΄μ„±κ³Ό PyTorch 2.0의 κ°€μ†ν™”λœ 트랜슀포머의 이점을 λͺ¨λ‘ μž…μ¦ν•  수 μžˆλŠ” ν›Œλ₯­ν•œ 후보라고 νŒλ‹¨ν–ˆμŠ΅λ‹ˆλ‹€. λ‹€μŒμ€ nanoGPTμ—μ„œ κ°€μ†ν™”λœ νŠΈλžœμŠ€ν¬λ¨Έκ°€ ν™œμ„±ν™”λœ μ •ν™•ν•œ ν”„λ‘œμ„ΈμŠ€λ₯Ό λ³΄μ—¬μ€λ‹ˆλ‹€.
 > The SDPA operator being a critical component of the GPT model, we identified the open source nanoGPT model as an excellent candidate for both demonstrating the ease of implementation and benefits of PyTorch 2.0’s Accelerated Transformers. The following demonstrates the exact process by which Accelerated Transformers was enabled on nanoGPT.
-이 ν”„λ‘œμ„ΈμŠ€λŠ” 크게 κΈ°μ‘΄ SDPA κ΅¬ν˜„μ„ [functional.py](https://github.com/pytorch/pytorch/blob/df14650f0b14b80db132b0c1797dc595fbee1054/torch/nn/functional.py#L4834)μ—μ„œ μƒˆλ‘œ μΆ”κ°€λœ F.scaled_dot_product_attention μ—°μ‚°μžλ‘œ λŒ€μ²΄ν•˜λŠ” 것을 μ€‘μ‹¬μœΌλ‘œ μ§„ν–‰λ©λ‹ˆλ‹€. 이 ν”„λ‘œμ„ΈμŠ€λŠ” λ‹€λ₯Έ λ§Žμ€ LLMμ—μ„œ 이 μ—°μ‚°μžλ₯Ό ν™œμ„±ν™”ν•˜λ„λ‘ μ‰½κ²Œ μ‘°μ •ν•  수 μžˆμŠ΅λ‹ˆλ‹€. λ˜λŠ” μ‚¬μš©μžκ°€ F.multi_head_attention_forward()λ₯Ό ν˜ΈμΆœν•˜κ±°λ‚˜ ν•΄λ‹Ήλ˜λŠ” 경우 nn.MultiHeadAttention λͺ¨λ“ˆμ„ 직접 ν™œμš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. λ‹€μŒ μ½”λ“œ μŠ€λ‹ˆνŽ«μ€ Karpathy의 nanoGPT μ €μž₯μ†Œμ—μ„œ κ°€μ Έμ˜¨ κ²ƒμž…λ‹ˆλ‹€.
+이 ν”„λ‘œμ„ΈμŠ€λŠ” 크게 κΈ°μ‘΄ SDPA κ΅¬ν˜„μ„ [functional.py](https://github.com/pytorch/pytorch/blob/df14650f0b14b80db132b0c1797dc595fbee1054/torch/nn/functional.py#L4834)μ—μ„œ μƒˆλ‘œ μΆ”κ°€λœ F.scaled_dot_product_attention μ—°μ‚°μžλ‘œ λŒ€μ²΄ν•˜λŠ” 것을 μ€‘μ‹¬μœΌλ‘œ μ§„ν–‰λ©λ‹ˆλ‹€. 이 ν”„λ‘œμ„ΈμŠ€λŠ” λ‹€λ₯Έ λ§Žμ€ LLMμ—μ„œ 이 μ—°μ‚°μžλ₯Ό ν™œμ„±ν™”ν•˜λ„λ‘ μ‰½κ²Œ μ‘°μ •ν•  수 μžˆμŠ΅λ‹ˆλ‹€. λ˜λŠ” μ‚¬μš©μžκ°€ F.multi_head_attention_forward()λ₯Ό ν˜ΈμΆœν•˜κ±°λ‚˜ ν•΄λ‹Ήλ˜λŠ” 경우 nn.MultiHeadAttention λͺ¨λ“ˆμ„ 직접 ν™œμš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. λ‹€μŒ μ½”λ“œ μŠ€λ‹ˆνŽ«μ€ Karpathy의 nanoGPT μ €μž₯μ†Œμ—μ„œ κ°€μ Έμ˜¨ κ²ƒμž…λ‹ˆλ‹€.
 > This process largely revolves around replacing the existing SDPA implementation with the newly added F.scaled_dot_product_attention operator from [functional.py](https://github.com/pytorch/pytorch/blob/df14650f0b14b80db132b0c1797dc595fbee1054/torch/nn/functional.py#L4834). This process can be easily adapted to enable the operator in many other LLMs. Alternatively, users can instead choose to call F.multi_head_attention_forward() or utilize the nn.MultiHeadAttention module directly where applicable. The following code snippets are adapted from Karpathy’s nanoGPT repository.
 ### 1단계: κΈ°μ‘΄ SDPA κ΅¬ν˜„ μ‹λ³„ν•˜κΈ° / Step 1: Identify the existing SDPA implementation
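For readers landing on this commit without the surrounding post: the passage under edit describes implementing multi-head attention as in-projection β†’ SDPA β†’ out-projection, and swapping a hand-written attention computation for F.scaled_dot_product_attention. The sketch below illustrates that pattern. It assumes PyTorch >= 2.0, loosely follows nanoGPT's naming (CausalSelfAttention, c_attn, c_proj), and is an illustration rather than the repository's exact code.

```python
# Minimal sketch of the 3-step pattern from the passage above:
# in-projection (linear) -> scaled_dot_product_attention -> out-projection (linear).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # in-projection: q, k, v in one matmul
        self.c_proj = nn.Linear(n_embd, n_embd)      # out-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.size()                            # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(C, dim=2)
        # SDPA expects (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # The hand-written version this replaces was roughly:
        #   att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        #   att = att.masked_fill(causal_mask[..., :T, :T] == 0, float("-inf"))
        #   y = F.softmax(att, dim=-1) @ v
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble heads
        return self.c_proj(y)                             # out-projection
```

Depending on hardware and input shapes, the fused operator dispatches to FlashAttention, a memory-efficient kernel, or a math fallback, which is the source of the speedups the translated post discusses.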

1 comment on commit ae8dc50

@9bow (Member, Author) commented on ae8dc50, Jun 15, 2024:

This commit resolves a PCDATA (Parsed Character Data) issue in feed.xml.
