This is the code for our paper on text structuring with silver-standard discourse trees from the MEGA-DT treebank.
- Python 3.6+
- PyTorch 1.3.0+
- dgl 0.4.2 (this exact version is required)
- Transformers 3.0.2 (one way to install all of the above is sketched below)
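If you are starting from a clean environment, an install along these lines should work. This is a minimal sketch that pins the versions listed above; for GPU training you may need the torch and dgl builds matching your CUDA version instead.

```bash
# Minimal sketch: pin the versions listed above.
# For GPU support, install the torch and dgl builds that match your CUDA version.
pip install "torch>=1.3.0" dgl==0.4.2 transformers==3.0.2
```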
- Create a folder named "data".
- Download the pickled versions of MEGA-DT here (100k train, 250k train, 5k val, 15k test) and place them in the "data" folder.
- Run the training/testing scripts as described below. Each script accepts a single numeric argument (1 or 2) indicating whether the model should be trained on the 100k or the 250k version of MEGA-DT; a complete example session is sketched after this list.
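For concreteness, a full setup-and-training session might look like the following sketch. The pickle file names are placeholders (use whatever the download provides), and `1` selects the 100k training split:

```bash
mkdir -p data
# Move the downloaded MEGA-DT pickles into data/ (the file name pattern
# below is a placeholder; use the actual names from the download).
# mv ~/Downloads/mega_dt_*.pickle data/

# Train and evaluate the dependency model on the 100k split.
bash scripts/train_dep.sh 1
bash scripts/eval_dep.sh 1
```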
To train/evaluate the dependency model:

    bash scripts/train_dep.sh dataset_id
    bash scripts/eval_dep.sh dataset_id

To train/evaluate the pointer model:

    bash scripts/train_pointer.sh dataset_id
    bash scripts/eval_pointer.sh dataset_id

To train/evaluate the `dep_treetrain_baseline` model:

    bash scripts/train_dep_treetrain_baseline.sh dataset_id
    bash scripts/eval_dep_treetrain_baseline.sh dataset_id

To evaluate the language model baseline (this script takes no dataset argument):

    bash scripts/eval_lm_baseline.sh
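For example, to train and then evaluate the pointer model on the 250k version of MEGA-DT:

```bash
# "2" selects the 250k training split.
bash scripts/train_pointer.sh 2
bash scripts/eval_pointer.sh 2
```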
You can set the hyperparameters and device type for each model individually in its training/testing script. The parameter values used in our experiments are already specified there.
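Concretely, changing a setting means editing the variables a script defines before it invokes the Python entry point. The excerpt below is purely illustrative; the variable names, flags, and entry-point name are hypothetical rather than the actual contents of the scripts in this repo.

```bash
# Hypothetical excerpt from a training script. The variable names, flags,
# and entry point are illustrative only; check the real script for the
# values used in the paper's experiments.
DEVICE="cuda:0"   # set to "cpu" to run without a GPU
LR=1e-4           # learning rate
BATCH_SIZE=16

python train.py --device "$DEVICE" --lr "$LR" --batch-size "$BATCH_SIZE"
```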
If you use this code, please cite our paper:

    @inproceedings{guz-carenini-2020-towards,
        title = "Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks",
        author = "Guz, Grigorii and Carenini, Giuseppe",
        booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
        month = nov,
        year = "2020",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "https://www.aclweb.org/anthology/2020.findings-emnlp.281",
        pages = "3141--3152",
    }