corpus2question

This repository presents corpus2question, a method for summarizing and exploring datasets based on the latent questions in their documents. It also contains the reference implementation for the paper "Can questions summarize a corpus? Using question generation for characterizing COVID-19 research".

The method

corpus2question relies on the question generation network used in doc2query, combined with frequency aggregation of the generated questions. Check our tutorial for a small example.
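
The frequency aggregation step can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: the `normalize` and `top_questions` helpers, the normalization rules, the choice to count each question once per document, and the toy input are all assumptions made for this example. In practice the questions would come from the question generation network.

```python
import re
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation (assumed rules)."""
    text = re.sub(r"\s+", " ", question.strip().lower())
    return text.rstrip("?!. ")


def top_questions(questions_per_doc, k=10):
    """Return the k most frequent questions across documents (hypothetical helper)."""
    counts = Counter()
    for questions in questions_per_doc:
        # Assumption: a question counts at most once per document.
        counts.update({normalize(q) for q in questions})
    return counts.most_common(k)


# Made-up questions standing in for the generator's output, one list per document.
generated = [
    ["What is the incubation period of COVID-19?", "How is COVID-19 transmitted?"],
    ["what is the incubation period of covid-19", "What are the symptoms of COVID-19?"],
    ["How is COVID-19 transmitted?", "What is the incubation period of COVID-19?"],
]

top = top_questions(generated, k=2)
for question, count in top:
    print(count, question)
```

Counting by document rather than by raw occurrence keeps a single long document from dominating the summary; whether that matches the paper's aggregation is not stated here, so treat it as a design choice of this sketch.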

Results over the CORD-19 dataset

All raw generated questions over the CORD-19 dataset are available at this link in CSV format. You can also find the aggregated top 10k questions at this link. The reference implementation for the paper is available at this notebook.

Citing this work

If you use corpus2question in your academic work, or use the generated questions over the CORD-19 dataset, please cite us with:

@misc{surita2020questions,
    title={Can questions summarize a corpus? Using question generation for characterizing COVID-19 research},
    author={Gabriela Surita and Rodrigo Nogueira and Roberto Lotufo},
    year={2020},
    eprint={2009.09290},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}