Is there an existing issue for this?
Current Behavior
I've recently been studying the theory behind GLM, and most explanations only say, roughly, that GLM's bidirectional attention helps the model understand context better. Unlike BERT's MLM, however, GLM places the masked spans as Part B after Part A, and during pretraining the loss is computed only on the outputs at the Part B positions. In that case, does the bidirectional attention over Part A actually contribute anything, or is there some other pretraining objective involved?
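To make the question concrete, here is a minimal sketch of the attention mask as I understand it from the GLM paper (simplified to a single masked span; the names and function below are hypothetical and not the repository's actual code): Part A tokens attend bidirectionally to all of Part A, while Part B tokens attend to all of Part A plus the earlier Part B tokens.

```python
import torch

def glm_attention_mask(len_a: int, len_b: int) -> torch.Tensor:
    """Sketch of GLM-style masking (my reading of the paper, not this repo's code).

    Part A positions see every Part A position (bidirectional).
    Part B positions see all of Part A plus earlier Part B positions (causal).
    1 = may attend, 0 = masked.
    """
    total = len_a + len_b
    mask = torch.zeros(total, total)
    # Every position may attend to all of Part A (bidirectional context).
    mask[:, :len_a] = 1
    # Part B positions additionally attend causally within Part B.
    mask[len_a:, len_a:] = torch.tril(torch.ones(len_b, len_b))
    return mask

print(glm_attention_mask(len_a=4, len_b=3))
```

My current understanding is that even though the loss only uses the Part B outputs, Part B attends to Part A, so Part A's bidirectional representations still feed into the predictions. I'd like to confirm whether that is the intended explanation.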
Expected Behavior
No response
Steps To Reproduce
none
Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`):

Anything else?

No response