This is the code for our paper on text structuring with silver-standard discourse trees from the MEGA-DT treebank.
- Python 3.6+
- PyTorch 1.3.0+
- dgl 0.4.2 (this exact version is required)
- Transformers 3.0.2 (one way to install all of the above is sketched below)
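If you are starting from a clean environment, an install along these lines should work. This is a minimal sketch that pins the versions listed above; for GPU training you may need the torch and dgl builds matching your CUDA version instead.

```bash
# Minimal sketch: pin the versions listed above.
# For GPU support, install the torch and dgl builds that match your CUDA version.
pip install "torch>=1.3.0" dgl==0.4.2 transformers==3.0.2
```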
- Create a folder named "data".
- Download the pickled versions of MEGA-DT here (100k train, 250k train, 5k val, 15k test) and place them in the "data" folder.
- Run the training/testing scripts as described below. Each script accepts a single numeric argument (1 or 2) indicating whether the model should be trained on the 100k or the 250k version of MEGA-DT; a complete example session is sketched after this list.
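For concreteness, a full setup-and-training session might look like the following sketch. The pickle file names are placeholders (use whatever the download provides), and `1` selects the 100k training split:

```bash
mkdir -p data
# Move the downloaded MEGA-DT pickles into data/ (the file name pattern
# below is a placeholder; use the actual names from the download).
# mv ~/Downloads/mega_dt_*.pickle data/

# Train and evaluate the dependency model on the 100k split.
bash scripts/train_dep.sh 1
bash scripts/eval_dep.sh 1
```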
To train/evaluate the dependency model:

    bash scripts/train_dep.sh dataset_id
    bash scripts/eval_dep.sh dataset_id

To train/evaluate the pointer model:

    bash scripts/train_pointer.sh dataset_id
    bash scripts/eval_pointer.sh dataset_id

To train/evaluate the `dep_treetrain_baseline` model:

    bash scripts/train_dep_treetrain_baseline.sh dataset_id
    bash scripts/eval_dep_treetrain_baseline.sh dataset_id

To evaluate the language model baseline (this script takes no dataset argument):

    bash scripts/eval_lm_baseline.sh
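For example, to train and then evaluate the pointer model on the 250k version of MEGA-DT:

```bash
# "2" selects the 250k training split.
bash scripts/train_pointer.sh 2
bash scripts/eval_pointer.sh 2
```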
You can set the hyperparameters and device type for each model individually in its training/testing script. The parameter values used in our experiments are already specified there.
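Concretely, changing a setting means editing the variables a script defines before it invokes the Python entry point. The excerpt below is purely illustrative; the variable names, flags, and entry-point name are hypothetical rather than the actual contents of the scripts in this repo.

```bash
# Hypothetical excerpt from a training script. The variable names, flags,
# and entry point are illustrative only; check the real script for the
# values used in the paper's experiments.
DEVICE="cuda:0"   # set to "cpu" to run without a GPU
LR=1e-4           # learning rate
BATCH_SIZE=16

python train.py --device "$DEVICE" --lr "$LR" --batch-size "$BATCH_SIZE"
```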
If you use this code, please cite our paper:

    @inproceedings{guz-carenini-2020-towards,
        title = "Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks",
        author = "Guz, Grigorii and Carenini, Giuseppe",
        booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
        month = nov,
        year = "2020",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "https://www.aclweb.org/anthology/2020.findings-emnlp.281",
        pages = "3141--3152",
    }