commit synclm code #303

Open · wants to merge 1 commit into base: master
73 changes: 71 additions & 2 deletions NLP/ACL2022-SynCLM/README.md
@@ -1,4 +1,73 @@
SynCLM
====
Code for Findings of ACL 2022 long paper: [Syntax-guided Contrastive Learning for Pre-trained Language Model](https://aclanthology.org/2022.findings-acl.191/)




Abstract
---
Syntactic information has proven useful for transformer-based pre-trained language models. Previous studies often rely on additional syntax-guided attention components to enhance the transformer, which require more parameters and additional syntactic parsing in downstream tasks. This increase in complexity severely limits the application of syntax-enhanced language models in a wide range of scenarios. To inject syntactic knowledge into pre-trained language models effectively and efficiently, we propose a novel syntax-guided contrastive learning method that does not change the transformer architecture. Based on the constituency and dependency structures of syntax trees, we design phrase-guided and tree-guided contrastive objectives and optimize them in the pre-training stage, helping the pre-trained language model capture rich syntactic knowledge in its representations. Experimental results show that our contrastive method achieves consistent improvements on a variety of tasks, including grammatical error detection, entity tasks, structural probing, and GLUE. Detailed analysis further verifies that the improvements come from the utilization of syntactic information, and that the learned attention weights are more linguistically interpretable.


![SynCLM](images/framework.png#pic_center)
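For intuition, both the phrase-guided and the tree-guided objectives are contrastive (InfoNCE-style) losses computed over syntax-derived positive and negative pairs. Below is a minimal, framework-agnostic sketch of such a loss; it is an illustration only, not the exact objectives used in pre-training, which are implemented with PaddlePaddle in this repository.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor.

    anchor, positive: 1-D vectors; negatives: iterable of 1-D vectors.
    Simplified illustration only; the paper's phrase-guided and
    tree-guided losses build their positive/negative pairs from
    constituency and dependency structures.
    """
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    # Similarity of the anchor to the positive (index 0) and to the negatives.
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature

    # Cross-entropy with the positive pair as the target class (log-sum-exp trick).
    m = logits.max()
    log_sum_exp = m + np.log(np.exp(logits - m).sum())
    return log_sum_exp - logits[0]

# Toy usage with random vectors standing in for token/phrase representations.
rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=768), rng.normal(size=768), rng.normal(size=(8, 768)))
print(loss)
```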



Dependencies
---
python3.7.4\
cuda-10.1\
cudnn_v7.6\
nccl2.4.2\
java1.8\
paddlepaddle-gpu2.1.2\
stanza1.2\
numpy1.20.2



Pre-trained Models
---
SynCLM is trained on top of RoBERTa. Users can download the Paddle version of the RoBERTa model with the following commands:

```shell
cd /path/to/model_files
# download base model
sh ./download_roberta_base_en.sh
# or download large model
# sh ./download_roberta_large_en.sh
cd -
```
To obtain the syntactic structures of the text, we use [Stanza](https://github.com/stanfordnlp/stanza) to preprocess the pre-training data (English Wikipedia and BookCorpus). Input examples are provided in the `/path/to/data/pretrain` directory.
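For illustration, dependency structures can be obtained with Stanza roughly as follows. This is a minimal sketch assuming the standard Stanza Python API; the actual preprocessing pipeline and output format follow the examples under `/path/to/data/pretrain`.

```python
import stanza

# Download the English models once (requires network access).
stanza.download('en')

# Tokenization, POS tagging, lemmatization and dependency parsing.
nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')

doc = nlp("SynCLM injects syntactic knowledge into pre-trained language models.")
for sent in doc.sentences:
    for word in sent.words:
        # word.head is the 1-based index of the governing word (0 means root).
        print(word.id, word.text, word.head, word.deprel)
```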

After preparing the data, you can run the following command for training:
```shell
cd /path/to
# base model
sh ./script/roberta_base_en/run.sh
# or large model
# sh ./script/roberta_large_en/run.sh
```
After pre-training the model, users can fine-tune it on downstream tasks with the following commands:
```shell
# classification
python ./src/run_classifier.py
# regression
python ./src/run_regression.py
```


Citation
---
If you find our paper and code useful, please cite the following paper:
```
@inproceedings{zhang2022syntax,
  title={Syntax-guided Contrastive Learning for Pre-trained Language Model},
  author={Zhang, Shuai and Wang, Lijie and Xiao, Xinyan and Wu, Hua},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
  pages={2430--2440},
  year={2022}
}
```

1,000 changes: 1,000 additions & 0 deletions NLP/ACL2022-SynCLM/data/pretrain/demo_input


1 change: 1 addition & 0 deletions NLP/ACL2022-SynCLM/data/pretrain/train_filelist
@@ -0,0 +1 @@
./data/pretrain/demo_input
1 change: 1 addition & 0 deletions NLP/ACL2022-SynCLM/data/pretrain/valid_filelist
@@ -0,0 +1 @@
./data/pretrain/demo_input
25 changes: 25 additions & 0 deletions NLP/ACL2022-SynCLM/env_local/env.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -x
# Add the CUDA library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/work/cuda-10.1_cudnn7.6.5/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/home/work/cuda-10.1_cudnn7.6.5/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# Add the cuDNN library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/work/cudnn/cudnn_v7.6/cuda/lib64:$LD_LIBRARY_PATH
# Download NCCL first, then add the NCCL library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/work/nccl2.4.2_cuda10.1/lib:$LD_LIBRARY_PATH
# If FLAGS_sync_nccl_allreduce is 1, cudaStreamSynchronize(nccl_stream) is called in allreduce_op_handle; this mode can give better performance in some cases
export FLAGS_sync_nccl_allreduce=1
# Fraction of the total available GPU memory to allocate, in the range [0, 1]
export FLAGS_fraction_of_gpu_memory_to_use=1
# Whether to use garbage collection to optimize the network's memory usage; <0 disables it, >=0 enables it
export FLAGS_eager_delete_tensor_gb=1.0
# Whether to use the fast garbage-collection strategy
export FLAGS_fast_eager_deletion_mode=1
# Fraction of variable memory released by the garbage-collection strategy, in the range [0.0, 1.0]
export FLAGS_memory_fraction_of_eager_deletion=1

export iplist=`hostname -i`
#http_proxy
unset http_proxy
unset https_proxy
set +x
Binary file added NLP/ACL2022-SynCLM/images/framework.png
14 changes: 14 additions & 0 deletions NLP/ACL2022-SynCLM/model_files/config/roberta_base_en.json
@@ -0,0 +1,14 @@
{
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"max_position_embeddings": 514,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 0,
"sent_type_vocab_size": 0,
"task_type_vocab_size": 0,
"vocab_size": 50265
}
14 changes: 14 additions & 0 deletions NLP/ACL2022-SynCLM/model_files/config/roberta_large_en.json
@@ -0,0 +1,14 @@
{
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"max_position_embeddings": 514,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"type_vocab_size": 0,
"sent_type_vocab_size": 0,
"task_type_vocab_size": 0,
"vocab_size": 50265
}
