
beginner_source/nlp/sequence_models_tutorial.py translation (#780)
* beginner_source/nlp/sequence_models_tutorial.py translation
convin305 committed Nov 26, 2023
1 parent 0970896 commit 4658d55
Showing 1 changed file with 122 additions and 121 deletions.
243 changes: 122 additions & 121 deletions beginner_source/nlp/sequence_models_tutorial.py
# -*- coding: utf-8 -*-
r"""
Sequence Models and Long Short-Term Memory Networks
===================================================
**Translation**: `박수민 <https://github.com/convin305>`_

At this point, we have seen various feed-forward networks. That is,
there is no state maintained by the network at all. This might not be
the behavior we want. Sequence models are central to NLP: they are
models where there is some sort of dependence through time between your
inputs. The classical example of a sequence model is the Hidden Markov
Model for part-of-speech tagging. Another example is the conditional
random field.

A recurrent neural network is a network that maintains some kind of
state. For example, its output could be used as part of the next input,
so that information can propagate along as the network passes over the
sequence. In the case of an LSTM, for each element in the sequence,
there is a corresponding *hidden state* :math:`h_t`, which in principle
can contain information from arbitrary points earlier in the sequence.
We can use the hidden state to predict words in a language model,
part-of-speech tags, and a myriad of other things.


LSTMs in Pytorch
~~~~~~~~~~~~~~~~~
Before getting to the example, note a few things. Pytorch's LSTM expects
all of its inputs to be 3D tensors. The semantics of the axes of these
tensors is important. The first axis is the sequence itself, the second
indexes instances in the mini-batch, and the third indexes elements of
the input. We haven't discussed mini-batching, so let's just ignore that
and assume we will always have just 1 dimension on the second axis. If
we want to run the sequence model over the sentence "The cow jumped",
our input should look like
.. math::


   \begin{bmatrix}
   \overbrace{q_\text{The}}^\text{row vector} \\
   q_\text{cow} \\
   q_\text{jumped}
   \end{bmatrix}

Except remember there is an additional 2nd dimension with size 1.

In addition, you could go through the sequence one at a time, in which
case the 1st axis will have size 1 also.

Let's see a quick example.
"""

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

######################################################################

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
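
# As a quick check of the shapes described above: "out" stacks the hidden
# state for every timestep, while "hidden" holds only the state after the
# final element (a tuple of the final hidden state and the final cell state).
print(out.shape)        # torch.Size([5, 1, 3]) -- (seq_len, batch, hidden_size)
print(hidden[0].shape)  # torch.Size([1, 1, 3]) -- final hidden state h_n
print(hidden[1].shape)  # torch.Size([1, 1, 3]) -- final cell state c_n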


######################################################################
# Example: An LSTM for Part-of-Speech Tagging
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will use an LSTM to get part of speech tags. We will
# not use Viterbi or Forward-Backward or anything like that, but as a
# (challenging) exercise to the reader, think about how Viterbi could be
# used after you have seen what is going on. In this example, we also refer
# to embeddings. If you are unfamiliar with embeddings, you can read up
# about them `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
#
# The model is as follows: let our input sentence be
# :math:`w_1, \dots, w_M`, where :math:`w_i \in V`, our vocab. Also, let
# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
# Denote our prediction of the tag of word :math:`w_i` by
# :math:`\hat{y}_i`.
#
#
# This is a structure prediction model, where our output is a sequence
# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
#
# To do the prediction, pass an LSTM over the sentence. Denote the hidden
# state at timestep :math:`i` as :math:`h_i`. Also, assign each tag a
# unique index (like how we had word\_to\_ix in the word embeddings
# section). Then our prediction rule for :math:`\hat{y}_i` is
#
# .. math:: \hat{y}_i = \text{argmax}_j \ (\log \text{Softmax}(Ah_i + b))_j
#
# That is, take the log softmax of the affine map of the hidden state,
# and the predicted tag is the tag that has the maximum value in this
# vector. Note this implies immediately that the dimensionality of the
# target space of :math:`A` is :math:`|T|`.
#
#
# Prepare data:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    # Tags are: DET - determiner; NN - noun; V - verb
    # For example, the word "The" is a determiner
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
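# With the two training sentences above, this prints a mapping along the lines of
# {'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}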
tag_to_ix = {"DET": 0, "NN": 1, "V": 2} # Assign each tag with a unique index
tag_to_ix = {"DET": 0, "NN": 1, "V": 2} # 각 νƒœκ·Έμ— κ³ μœ ν•œ 번호 ν• λ‹Ή

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

######################################################################
# Create the model:


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

######################################################################
# Train the model:


model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

# The sentence is "the dog ate the apple". i,j corresponds to score for tag j
# for word i. The predicted tag is the maximum scoring tag.
# Here, we can see the predicted sequence below is 0 1 2 0 1
# since 0 is index of the maximum value of row 1,
# 1 is the index of maximum value of row 2, etc.
# Which is DET NOUN VERB DET NOUN, the correct sequence!
# λ¬Έμž₯은 "the dog ate the apple"μž…λ‹ˆλ‹€. i와 jλŠ” 단어 i에 λŒ€ν•œ νƒœκ·Έ j의 점수λ₯Ό μ˜λ―Έν•©λ‹ˆλ‹€.
# 예츑된 νƒœκ·ΈλŠ” κ°€μž₯ μ μˆ˜κ°€ 높은 νƒœκ·Έμž…λ‹ˆλ‹€.
# 자, μ•„λž˜μ˜ 예츑된 μˆœμ„œκ°€ 0 1 2 0 1μ΄λΌλŠ” 것을 확인할 수 μžˆμŠ΅λ‹ˆλ‹€.
# 0은 1행에 λŒ€ν•œ μ΅œλŒ“κ°’μ΄λ―€λ‘œ,
# 1은 2행에 λŒ€ν•œ μ΅œλŒ“κ°’μ΄ λ˜λŠ” μ‹μž…λ‹ˆλ‹€.
# DET NOUN VERB DET NOUN은 μ˜¬λ°”λ₯Έ μˆœμ„œμž…λ‹ˆλ‹€!
print(tag_scores)
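
    # A minimal sketch of applying the argmax prediction rule from above to
    # recover tag names. The helper "ix_to_tag" is introduced here purely for
    # illustration; it is not part of the original tutorial code.
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
    predicted_ixs = tag_scores.argmax(dim=1)
    print([ix_to_tag[ix.item()] for ix in predicted_ixs])  # expected: DET, NN, V, DET, NN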


######################################################################
# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In the example above, each word had an embedding, which served as the
# inputs to our sequence model. Let's augment the word embeddings with a
# representation derived from the characters of the word. We expect that
# this should help significantly, since character-level information like
# affixes have a large bearing on part-of-speech. For example, words with
# the affix *-ly* are almost always tagged as adverbs in English.
#
# To do this, let :math:`c_w` be the character-level representation of
# word :math:`w`. Let :math:`x_w` be the word embedding as before. Then
# the input to our sequence model is the concatenation of :math:`x_w` and
# :math:`c_w`. So if :math:`x_w` has dimension 5, and :math:`c_w`
# dimension 3, then our LSTM should accept an input of dimension 8
# (a small shape sketch follows the hints below).
#
# To get the character level representation, do an LSTM over the
# characters of a word, and let :math:`c_w` be the final hidden state of
# this LSTM. Hints:
#
# * There are going to be two LSTM's in your new model.
#   The original one that outputs POS tag scores, and the new one that
#   outputs a character-level representation of each word.
# * To do a sequence model over characters, you will have to embed characters.
#   The character embeddings will be the input to the character LSTM.
#
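
# A minimal sketch of the concatenation step described above. The tensors
# x_w and c_w here are random stand-ins for a word embedding and a
# character-level representation, used only to illustrate the shapes.
x_w = torch.randn(1, 5)                  # word embedding, dimension 5
c_w = torch.randn(1, 3)                  # character-level representation, dimension 3
combined = torch.cat([x_w, c_w], dim=1)  # dimension 5 + 3 = 8
print(combined.shape)                    # torch.Size([1, 8])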
