This repository has been archived by the owner on Aug 21, 2024. It is now read-only.

Merge pull request #12 from JenniferOH/fix_tokenizer_init
Starting with transformers 4.34.0, the tokenizer references the vocab during the init step.
monologg authored Aug 20, 2024
2 parents 4e0a00e + 8426751 commit 7628bce
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions kobert_transformers/tokenization_kobert.py
@@ -82,14 +82,6 @@ def __init__(
         mask_token="[MASK]",
         **kwargs,
     ):
-        super().__init__(
-            unk_token=unk_token,
-            sep_token=sep_token,
-            pad_token=pad_token,
-            cls_token=cls_token,
-            mask_token=mask_token,
-            **kwargs,
-        )
 
         # Build vocab
         self.token2idx = dict()
@@ -117,6 +109,15 @@ def __init__(
         self.sp_model = spm.SentencePieceProcessor()
         self.sp_model.Load(vocab_file)
 
+        super().__init__(
+            unk_token=unk_token,
+            sep_token=sep_token,
+            pad_token=pad_token,
+            cls_token=cls_token,
+            mask_token=mask_token,
+            **kwargs,
+        )
+
     @property
     def vocab_size(self):
         return len(self.idx2token)
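The change above moves the `super().__init__()` call to after the vocab and SentencePiece model are built, because newer versions of transformers read the vocab inside the base class constructor. The ordering issue can be sketched without the transformers dependency; `FakeBaseTokenizer` below is a hypothetical stand-in for `PreTrainedTokenizer`, not the real API:

```python
class FakeBaseTokenizer:
    """Hypothetical stand-in for transformers' PreTrainedTokenizer."""

    def __init__(self, **kwargs):
        # Since transformers 4.34.0, the base __init__ consults the vocab
        # (e.g. to register special tokens), so the subclass's vocab
        # attributes must already exist at this point.
        _ = self.vocab_size

class KoBertLikeTokenizer(FakeBaseTokenizer):
    def __init__(self, vocab, **kwargs):
        # Build the vocab FIRST, as the patched tokenization_kobert.py does...
        self.token2idx = {tok: i for i, tok in enumerate(vocab)}
        self.idx2token = {i: tok for tok, i in self.token2idx.items()}
        # ...and only then call super().__init__(), which may read vocab_size.
        # Calling super().__init__() before this block would raise an
        # AttributeError, which is what this commit fixes.
        super().__init__(**kwargs)

    @property
    def vocab_size(self):
        return len(self.idx2token)

tok = KoBertLikeTokenizer(["[PAD]", "[UNK]", "hello"])
print(tok.vocab_size)  # -> 3
```

Running this with `super().__init__()` moved to the top of the subclass constructor reproduces the failure mode: `vocab_size` is accessed before `idx2token` exists.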
