This repository contains the source code for the paper "Enhancing Transformer-based Models’ Natural Language Understanding based on Word Importance Investigation" (under revision). The code consists of two parts: fine-tuning with the WI-projected layer and WI validation.
The fine-tuning scripts are modified versions of run_glue.py, run_squad.py, and run_swag.py from Transformers version 4.20.1, and the model files are modified versions of modeling_bert.py, modeling_electra.py, and modeling_roberta.py from the same version.
- Copy each modeling_"model".py file to the corresponding model directory within the Transformers library (see the sketch after this list).
- Run the run_"task".py file.
- After preprocessing is done, enter 3 for the wandb option.
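If you are unsure where the installed Transformers package lives, a minimal sketch like the one below locates it and copies the modified modeling files into place. The source directory `./models` is an assumption about this repository's layout; adjust it to wherever the modified files actually sit.

```python
# Sketch: copy the modified modeling files into the installed Transformers
# package (version 4.20.1). The source directory "./models" is an assumed
# location for the modified files in this repository.
import shutil
from pathlib import Path

import transformers

transformers_dir = Path(transformers.__file__).parent  # .../site-packages/transformers

for model in ["bert", "electra", "roberta"]:
    src = Path("./models") / f"modeling_{model}.py"                      # modified file (assumed path)
    dst = transformers_dir / "models" / model / f"modeling_{model}.py"   # target inside the library
    shutil.copy(src, dst)
    print(f"copied {src} -> {dst}")
```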
- "bert-base-uncased": BERT
- "roberta-base": RoBERTa
- "google/electra-base-discriminator": ELECTRA
Word-importance metric options ("choose_metric" for --use_wi), illustrated below:
- 1: basic TF-IDF
- 2: TF-IDF w/ norm 1
- 3: TF-IDF w/ norm 2
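The exact word-importance computation follows the paper. Purely as an illustration, and assuming that "norm 1" and "norm 2" refer to L1- and L2-normalized TF-IDF vectors, the three options roughly correspond to the following scikit-learn settings.

```python
# Illustration only: assumes "norm 1"/"norm 2" mean L1-/L2-normalized TF-IDF.
# The repository computes its own word-importance scores; this merely shows
# the three normalization variants using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the movie was great", "the movie was terrible"]

for option, norm in [(1, None), (2, "l1"), (3, "l2")]:
    tfidf = TfidfVectorizer(norm=norm)
    scores = tfidf.fit_transform(corpus).toarray()[0]
    print(f"--use_wi {option} (norm={norm}):",
          dict(zip(tfidf.get_feature_names_out(), scores.round(3))))
```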
- run_glue.py
- Run the command below; a concrete example follows.
- python run_glue.py --model_name_or_path "model_name" --tokenizer_name "model_name" --no_use_fast --task_name "task_name" --do_train --do_eval --output_dir "your_path" --max_seq_length 128 --per_gpu_train_batch_size 32 --num_train_epochs 3 --use_wi "choose_metric"
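- For example (the output directory is a placeholder), fine-tuning BERT on MRPC with the basic TF-IDF metric: python run_glue.py --model_name_or_path bert-base-uncased --tokenizer_name bert-base-uncased --no_use_fast --task_name mrpc --do_train --do_eval --output_dir ./output/mrpc_bert_wi1 --max_seq_length 128 --per_gpu_train_batch_size 32 --num_train_epochs 3 --use_wi 1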
- run_squad.py
- Run the command below.
- python run_squad.py --model_name_or_path "model_name" --tokenizer_name "model_name" --do_train --do_eval --output_dir "your_path" --per_gpu_train_batch_size 32 --num_train_epochs 3 --dataset_name squad --use_wi "choose_metric"
- run_swag.py
- Run the command below.
- python run_swag.py --model_name_or_path "model_name" --tokenizer_name "model_name" --no_use_fast_tokenizer --do_train --do_eval --output_dir "your_path" --max_seq_length 128 --per_gpu_train_batch_size 16 --num_train_epochs 3 --pad_to_max_length True --use_wi "choose_metric"
[jiant](https://github.com/nyu-mll/jiant).
[instruct-eval](https://github.com/declare-lab/instruct-eval).
- For WI validation, randomly select 2,000 sentences from the target dataset and generate the probe dataset by choosing positions of token pairs that have an inter-token dependency and positions of token pairs that do not (see the sketch below).
- Requires preprocessed data from the respective downstream task.
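A minimal sketch of the dataset-generation step, assuming spaCy supplies the dependency arcs (the repository may obtain dependency annotations from a different source):

```python
# Sketch: sample sentences and collect token-position pairs that are linked by a
# dependency arc (positive) and pairs that are not (negative). spaCy is used here
# as a stand-in parser; the actual source of dependency relations may differ.
import random

import spacy

nlp = spacy.load("en_core_web_sm")

def build_probe_examples(sentences, num_sentences=2000, seed=42):
    random.seed(seed)
    sampled = random.sample(sentences, min(num_sentences, len(sentences)))
    examples = []
    for sent in sampled:
        doc = nlp(sent)
        if len(doc) < 2:
            continue
        arcs = {(tok.i, tok.head.i) for tok in doc if tok.i != tok.head.i}
        positions = list(range(len(doc)))
        # Positive pairs: positions connected by a dependency arc.
        for i, j in arcs:
            examples.append({"sentence": sent, "pair": (i, j), "label": 1})
        # Negative pairs: random positions with no arc in either direction.
        for _ in range(len(arcs)):
            i, j = random.sample(positions, 2)
            if (i, j) not in arcs and (j, i) not in arcs:
                examples.append({"sentence": sent, "pair": (i, j), "label": 0})
    return examples
```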
- Using the extracted sentences and the dependency relations between tokens within them, extract the attention values for those token pairs, then compare dependency-prediction performance (see the sketch below).
- Requires the trained model and its configuration.
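As a rough sketch of this step, attention values for a token pair can be read from the trained model with output_attentions=True and used to predict whether the pair is in a dependency relation. The model directory, the probe-example format, and the logistic-regression probe are assumptions; the actual evaluation follows the paper.

```python
# Sketch: read attention values for token-position pairs from a trained model and
# use them as features for dependency-relation prediction. The model directory and
# probe-example format are assumptions; the actual evaluation follows the paper.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

model_dir = "./output"  # placeholder: directory containing the trained model and config
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir, output_attentions=True)
model.eval()

def pair_attention_features(sentence, i, j):
    """Attention between positions i and j, flattened over all layers and heads."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        attentions = model(**enc).attentions  # one (1, heads, seq, seq) tensor per layer
    # "+1" skips the leading special token; real code must also align word
    # positions to sub-token positions.
    feats = [layer[0, :, i + 1, j + 1] for layer in attentions]
    return torch.cat(feats).numpy()

def evaluate(examples):
    """examples: list of {"sentence", "pair": (i, j), "label"} from the probe dataset."""
    X = np.stack([pair_attention_features(ex["sentence"], *ex["pair"]) for ex in examples])
    y = np.array([ex["label"] for ex in examples])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.score(X, y)  # sketch only: a real comparison would use a held-out split
```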