15.7.1 Error when loading the pretrained bert.base model #1261
Closed
Conversation
Following up on the bug: based on the error message, I further adjusted the model hyperparameters.

```python
# BERT model
bert = BERTModel(vocab_size=60005,
                 num_hiddens=768,
                 norm_shape=[768],
                 ffn_num_input=768,
                 ffn_num_hiddens=3072,
                 num_heads=4,
                 num_layers=2,
                 dropout=0.2,
                 max_len=512,
                 key_size=768,
                 query_size=768,
                 value_size=768,
                 hid_in_features=768,
                 mlm_in_features=768,
                 nsp_in_features=768)

# Load the model
base_path = r'this is bert abs path'
bert.load_state_dict(torch.load(data_dir))
```

The size-mismatch errors at the bottom are gone, but the "keys in state_dict" errors at the top remain:

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[70], line 2
1 # Load
----> 2 bert.load_state_dict(torch.load(data_dir))
File ~/.virtualenvs/dl-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py:1671, in Module.load_state_dict(self, state_dict, strict)
1666 error_msgs.insert(
1667 0, 'Missing key(s) in state_dict: {}. '.format(
1668 ', '.join('"{}"'.format(k) for k in missing_keys)))
1670 if len(error_msgs) > 0:
-> 1671 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
1672 self.__class__.__name__, "\n\t".join(error_msgs)))
1673 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for BERTModel:
Unexpected key(s) in state_dict: "encoder.blks.2.attention.W_q.weight", "encoder.blks.2.attention.W_q.bias", "encoder.blks.2.attention.W_k.weight", "encoder.blks.2.attention.W_k.bias", "encoder.blks.2.attention.W_v.weight", "encoder.blks.2.attention.W_v.bias", "encoder.blks.2.attention.W_o.weight", "encoder.blks.2.attention.W_o.bias", "encoder.blks.2.addnorm1.ln.weight", "encoder.blks.2.addnorm1.ln.bias", "encoder.blks.2.ffn.dense1.weight", "encoder.blks.2.ffn.dense1.bias", "encoder.blks.2.ffn.dense2.weight", "encoder.blks.2.ffn.dense2.bias", "encoder.blks.2.addnorm2.ln.weight", "encoder.blks.2.addnorm2.ln.bias", "encoder.blks.3.attention.W_q.weight", "encoder.blks.3.attention.W_q.bias", "encoder.blks.3.attention.W_k.weight", "encoder.blks.3.attention.W_k.bias", "encoder.blks.3.attention.W_v.weight", "encoder.blks.3.attention.W_v.bias", "encoder.blks.3.attention.W_o.weight", "encoder.blks.3.attention.W_o.bias", "encoder.blks.3.addnorm1.ln.weight", "encoder.blks.3.addnorm1.ln.bias", "encoder.blks.3.ffn.dense1.weight", "encoder.blks.3.ffn.dense1.bias", "encoder.blks.3.ffn.dense2.weight", "encoder.blks.3.ffn.dense2.bias", "encoder.blks.3.addnorm2.ln.weight", "encoder.blks.3.addnorm2.ln.bias", "encoder.blks.4.attention.W_q.weight", "encoder.blks.4.attention.W_q.bias", "encoder.blks.4.attention.W_k.weight", "encoder.blks.4.attention.W_k.bias", "encoder.blks.4.attention.W_v.weight", "encoder.blks.4.attention.W_v.bias", "encoder.blks.4.attention.W_o.weight", "encoder.blks.4.attention.W_o.bias", "encoder.blks.4.addnorm1.ln.weight", "encoder.blks.4.addnorm1.ln.bias", "encoder.blks.4.ffn.dense1.weight", "encoder.blks.4.ffn.dense1.bias", "encoder.blks.4.ffn.dense2.weight", "encoder.blks.4.ffn.dense2.bias", "encoder.blks.4.addnorm2.ln.weight", "encoder.blks.4.addnorm2.ln.bias", "encoder.blks.5.attention.W_q.weight", "encoder.blks.5.attention.W_q.bias", "encoder.blks.5.attention.W_k.weight", "encoder.blks.5.attention.W_k.bias", "encoder.blks.5.attention.W_v.weight", 
"encoder.blks.5.attention.W_v.bias", "encoder.blks.5.attention.W_o.weight", "encoder.blks.5.attention.W_o.bias", "encoder.blks.5.addnorm1.ln.weight", "encoder.blks.5.addnorm1.ln.bias", "encoder.blks.5.ffn.dense1.weight", "encoder.blks.5.ffn.dense1.bias", "encoder.blks.5.ffn.dense2.weight", "encoder.blks.5.ffn.dense2.bias", "encoder.blks.5.addnorm2.ln.weight", "encoder.blks.5.addnorm2.ln.bias", "encoder.blks.6.attention.W_q.weight", "encoder.blks.6.attention.W_q.bias", "encoder.blks.6.attention.W_k.weight", "encoder.blks.6.attention.W_k.bias", "encoder.blks.6.attention.W_v.weight", "encoder.blks.6.attention.W_v.bias", "encoder.blks.6.attention.W_o.weight", "encoder.blks.6.attention.W_o.bias", "encoder.blks.6.addnorm1.ln.weight", "encoder.blks.6.addnorm1.ln.bias", "encoder.blks.6.ffn.dense1.weight", "encoder.blks.6.ffn.dense1.bias", "encoder.blks.6.ffn.dense2.weight", "encoder.blks.6.ffn.dense2.bias", "encoder.blks.6.addnorm2.ln.weight", "encoder.blks.6.addnorm2.ln.bias", "encoder.blks.7.attention.W_q.weight", "encoder.blks.7.attention.W_q.bias", "encoder.blks.7.attention.W_k.weight", "encoder.blks.7.attention.W_k.bias", "encoder.blks.7.attention.W_v.weight", "encoder.blks.7.attention.W_v.bias", "encoder.blks.7.attention.W_o.weight", "encoder.blks.7.attention.W_o.bias", "encoder.blks.7.addnorm1.ln.weight", "encoder.blks.7.addnorm1.ln.bias", "encoder.blks.7.ffn.dense1.weight", "encoder.blks.7.ffn.dense1.bias", "encoder.blks.7.ffn.dense2.weight", "encoder.blks.7.ffn.dense2.bias", "encoder.blks.7.addnorm2.ln.weight", "encoder.blks.7.addnorm2.ln.bias", "encoder.blks.8.attention.W_q.weight", "encoder.blks.8.attention.W_q.bias", "encoder.blks.8.attention.W_k.weight", "encoder.blks.8.attention.W_k.bias", "encoder.blks.8.attention.W_v.weight", "encoder.blks.8.attention.W_v.bias", "encoder.blks.8.attention.W_o.weight", "encoder.blks.8.attention.W_o.bias", "encoder.blks.8.addnorm1.ln.weight", "encoder.blks.8.addnorm1.ln.bias", "encoder.blks.8.ffn.dense1.weight", 
"encoder.blks.8.ffn.dense1.bias", "encoder.blks.8.ffn.dense2.weight", "encoder.blks.8.ffn.dense2.bias", "encoder.blks.8.addnorm2.ln.weight", "encoder.blks.8.addnorm2.ln.bias", "encoder.blks.9.attention.W_q.weight", "encoder.blks.9.attention.W_q.bias", "encoder.blks.9.attention.W_k.weight", "encoder.blks.9.attention.W_k.bias", "encoder.blks.9.attention.W_v.weight", "encoder.blks.9.attention.W_v.bias", "encoder.blks.9.attention.W_o.weight", "encoder.blks.9.attention.W_o.bias", "encoder.blks.9.addnorm1.ln.weight", "encoder.blks.9.addnorm1.ln.bias", "encoder.blks.9.ffn.dense1.weight", "encoder.blks.9.ffn.dense1.bias", "encoder.blks.9.ffn.dense2.weight", "encoder.blks.9.ffn.dense2.bias", "encoder.blks.9.addnorm2.ln.weight", "encoder.blks.9.addnorm2.ln.bias", "encoder.blks.10.attention.W_q.weight", "encoder.blks.10.attention.W_q.bias", "encoder.blks.10.attention.W_k.weight", "encoder.blks.10.attention.W_k.bias", "encoder.blks.10.attention.W_v.weight", "encoder.blks.10.attention.W_v.bias", "encoder.blks.10.attention.W_o.weight", "encoder.blks.10.attention.W_o.bias", "encoder.blks.10.addnorm1.ln.weight", "encoder.blks.10.addnorm1.ln.bias", "encoder.blks.10.ffn.dense1.weight", "encoder.blks.10.ffn.dense1.bias", "encoder.blks.10.ffn.dense2.weight", "encoder.blks.10.ffn.dense2.bias", "encoder.blks.10.addnorm2.ln.weight", "encoder.blks.10.addnorm2.ln.bias", "encoder.blks.11.attention.W_q.weight", "encoder.blks.11.attention.W_q.bias", "encoder.blks.11.attention.W_k.weight", "encoder.blks.11.attention.W_k.bias", "encoder.blks.11.attention.W_v.weight", "encoder.blks.11.attention.W_v.bias", "encoder.blks.11.attention.W_o.weight", "encoder.blks.11.attention.W_o.bias", "encoder.blks.11.addnorm1.ln.weight", "encoder.blks.11.addnorm1.ln.bias", "encoder.blks.11.ffn.dense1.weight", "encoder.blks.11.ffn.dense1.bias", "encoder.blks.11.ffn.dense2.weight", "encoder.blks.11.ffn.dense2.bias", "encoder.blks.11.addnorm2.ln.weight", "encoder.blks.11.addnorm2.ln.bias". |
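Note that the unexpected keys all belong to `encoder.blks.2` through `encoder.blks.11`, which suggests the checkpoint was saved from an encoder with 12 blocks, while the model above was built with `num_layers=2`. A minimal sketch of the same mismatch and one recovery, using a hypothetical stand-in module (`make_encoder` is illustrative only; the real `BERTModel` comes from d2l):

```python
import torch
from torch import nn

# Hypothetical stand-in: an "encoder" with a configurable number of blocks,
# mirroring how BERTModel stacks num_layers blocks under encoder.blks.
def make_encoder(num_layers, num_hiddens=8):
    return nn.Sequential(*[nn.Linear(num_hiddens, num_hiddens)
                           for _ in range(num_layers)])

# A checkpoint saved from a 12-block model...
ckpt = make_encoder(12).state_dict()

# ...refuses to load into a 2-block model: layers 0 and 1 match in shape
# (no size-mismatch errors), but keys for blocks 2..11 are unexpected.
shallow = make_encoder(2)
try:
    shallow.load_state_dict(ckpt)
except RuntimeError as e:
    assert "Unexpected key(s)" in str(e)

# One way to recover the intended depth from the checkpoint itself:
# count the distinct top-level block indices in the key names.
depth = len({k.split(".")[0] for k in ckpt})
model = make_encoder(depth)   # depth == 12 matches the checkpoint
model.load_state_dict(ckpt)   # now loads cleanly
```

Under this reading, rebuilding the model with `num_layers=12` (the depth of the published bert.base checkpoint) should clear the remaining unexpected-key errors.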
Closing this PR since we don't take external releases. Thanks!
Job PR-1261-4b9a3a2 is done.
OK, let me close this issue.
Loading code
Error screenshot
Error excerpt
RuntimeError: Error(s) in loading state_dict for BERTModel:
Unexpected key(s) in state_dict: "encoder.blks.2.attention.W_q.weight", "encoder.blks.2.attention.W_q.bias", "encoder.blks.2.attention.W_k.weight", "encoder.blks.2.attention.W_k.bias", "encoder.blks.2.attention.W_v.weight", "encoder.blks.2.attention.W_v.bias", "encoder.blks.2.attention.W_o.weight", "encoder.blks.2.attention.W_o.bias", "encoder.blks.2.addnorm1.ln.weight", "encoder.blks.2.addnorm1.ln.bias", "encoder.blks.2.ffn.dense1.weight", "encoder.blks.2.ffn.dense1.bias", "encoder.blks.2.ffn.dense2.weight", "encoder.blks.2.ffn.dense2.bias", "encoder.blks.2.addnorm2.ln.weight", "encoder.blks.2.addnorm2.ln.bias", "encoder.blks.3.attention.W_q.weight", "encoder.blks.3.attention.W_q.bias", "encoder.blks.3.attention.W_k.weight", "encoder.blks.3.attention.W_k.bias", "encoder.blks.3.attention.W_v.weight", "encoder.blks.3.attention.W_v.bias", "encoder.blks.3.attention.W_o.weight", "encoder.blks.3.attention.W_o.bias", "encoder.blks.3.addnorm1.ln.weight", "encoder.blks.3.addnorm1.ln.bias", "encoder.blks.3.ffn.dense1.weight", "encoder.blks.3.ffn.dense1.bias", "encoder.blks.3.ffn.dense2.weight", "encoder.blks.3.ffn.dense2.bias", "encoder.blks.3.addnorm2.ln.weight", "encoder.blks.3.addnorm2.ln.bias", "encoder.blks.4.attention.W_q.weight", "encoder.blks.4.attention.W_q.bias", "encoder.blks.4.attention.W_k.weight", "encoder.blks.4.attention.W_k.bias", "encoder.blks.4.attention.W_v.weight", "encoder.blks.4.attention.W_v.bias", "encoder.blks.4.attention.W_o.weight", "encoder.blks.4.attention.W_o.bias", "encoder.blks.4.addnorm1.ln.weight", "encoder.blks.4.addnorm1.ln.bias", "encoder.blks.4.ffn.dense1.weight",
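As a diagnostic aside: rather than letting the strict load raise, `load_state_dict(strict=False)` returns the mismatched key names, which makes it easy to see why a checkpoint refuses to load before rebuilding the model. A small sketch with hypothetical stand-in modules (not the real BERTModel):

```python
import torch
from torch import nn

# Two deliberately mismatched modules: the checkpoint source exposes keys
# "weight"/"bias", while the target expects "0.weight"/"0.bias".
saved = nn.Linear(4, 4)
target = nn.Sequential(nn.Linear(4, 4))

# strict=False does not raise; it returns an _IncompatibleKeys result
# listing exactly which keys failed to line up.
result = target.load_state_dict(saved.state_dict(), strict=False)
print(result.unexpected_keys)  # checkpoint keys with no matching parameter
print(result.missing_keys)     # parameters the checkpoint did not provide
```

Comparing the two lists against the model's hyperparameters (here, `num_layers`) usually points directly at the misconfigured argument.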