Keyword Extraction Datasets

This repository collects datasets for developing, evaluating and testing keyword extraction algorithms. For benchmark results on these datasets, see: O. Medelyan. 2009. Human-competitive automatic topic indexing. PhD Thesis. University of Waikato, New Zealand.
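When benchmarking on these datasets, a common starting point is exact-match precision, recall and F1 of an algorithm's keyphrases against the assigned ones. The sketch below is an illustration, not the evaluation procedure from the thesis. It assumes lowercasing and whitespace normalisation are enough for matching; published evaluations often also stem phrases, which is left out here. The function names are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace. Stemming, often used in
    published evaluations, is deliberately left out of this sketch."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 of extracted vs. gold keyphrases."""
    ext = {normalize(p) for p in extracted}
    ref = {normalize(p) for p in gold}
    correct = len(ext & ref)
    precision = correct / len(ext) if ext else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical phrases, not taken from any of the datasets
    p, r, f = precision_recall_f1(
        ["Keyphrase extraction", "thesaurus"],
        ["keyphrase extraction", "indexing", "thesaurus", "Wikipedia"],
    )
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=1.00 R=0.50 F1=0.67
```

Scores are usually computed per document and then averaged over the collection.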

Extracting keywords using a controlled vocabulary or a thesaurus as a source:

NLM_500.zip - 500 PubMed documents with MeSH terms

fao780.tar.gz - 780 FAO publications with AGROVOC terms

fao30.tar.gz - 30 FAO publications, each annotated by 6 professional FAO indexers
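Collections annotated by several indexers, such as fao30, also make it possible to measure how well the humans agree with each other and how close an algorithm comes to them. The sketch below uses Rolling's inter-indexer consistency, 2C / (A + B), a measure common in the indexing literature; treat the choice of measure and the helper names as assumptions of this example rather than a description of how the dataset was evaluated.

```python
from itertools import combinations


def rolling_consistency(a, b):
    """Rolling's inter-indexer consistency: 2C / (A + B), where C is the
    number of terms both indexers chose and A, B are their set sizes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch: no terms, no agreement
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_consistency(annotations):
    """Average consistency over every pair of indexers (needs at least two)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need annotations from at least two indexers")
    return sum(rolling_consistency(x, y) for x, y in pairs) / len(pairs)


def consistency_with_indexers(candidate, annotations):
    """Average consistency of one term set (e.g. an algorithm's) with each indexer."""
    if not annotations:
        raise ValueError("need annotations from at least one indexer")
    return sum(rolling_consistency(candidate, a) for a in annotations) / len(annotations)


if __name__ == "__main__":
    # Three hypothetical indexers describing one document
    indexers = [
        {"maize", "drought", "irrigation"},
        {"maize", "irrigation", "soil"},
        {"maize", "fertilizers"},
    ]
    print(round(mean_pairwise_consistency(indexers), 3))
    print(round(consistency_with_indexers({"maize", "irrigation"}, indexers), 3))
```

Comparing the algorithm's score with the humans' mutual consistency shows whether it falls within the range of human agreement.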

Free-text keyword extraction (without a vocabulary):

citeulike180.tar.gz - 180 publications crawled from CiteULike, with keywords assigned by the CiteULike users who saved them

SemEval2010-Maui.zip - SemEval-2010 Keyphrase extraction track data in Maui format

keyphrextr.tar.gz - Keyphrase extraction model created using SemEval-2010 training data. This model is used in the Maui GPL demo when no vocabulary is selected.
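The document collections can be read directly from their .zip or .tar.gz archives without unpacking them. The loader below is a sketch that assumes the Maui convention of pairing each document `name.txt` with a `name.key` file listing one keyphrase per line; check the layout of each archive before relying on it. The function names are hypothetical, and documents without a matching .key file are skipped.

```python
import os
import tarfile
import tempfile
import zipfile
from pathlib import PurePosixPath


def _read_members(path):
    """Yield (member_name, bytes) for every regular file in a .zip or .tar(.gz) archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
    else:
        with tarfile.open(path, "r:*") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    yield member.name, tf.extractfile(member).read()


def load_maui_archive(path, encoding="utf-8"):
    """Map each document stem to (text, keyphrases), assuming .txt/.key pairs."""
    texts, keys = {}, {}
    for name, data in _read_members(path):
        p = PurePosixPath(name)
        if p.name.startswith("._"):  # skip macOS metadata files, if any (assumption)
            continue
        stem = str(p.with_suffix(""))
        content = data.decode(encoding, errors="replace")
        if p.suffix == ".txt":
            texts[stem] = content
        elif p.suffix == ".key":
            keys[stem] = [line.strip() for line in content.splitlines() if line.strip()]
    return {stem: (texts[stem], keys[stem]) for stem in texts if stem in keys}


if __name__ == "__main__":
    # Build a tiny throwaway archive with hypothetical content to show the layout
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(demo, "w") as zf:
            zf.writestr("train/doc1.txt", "Keyphrase extraction with Maui.")
            zf.writestr("train/doc1.key", "keyphrase extraction\nMaui\n")
        print(load_maui_archive(demo))
```

For example, `load_maui_archive("SemEval2010-Maui.zip")` would return one entry per document found in that archive, provided it follows the assumed naming scheme.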

Extracting keywords using Wikipedia as a controlled vocabulary of allowed terms:

wiki20.tar.gz - 20 Computer Science papers, each annotated with at least 5 Wikipedia articles by 15 teams of indexers
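With many independent annotations per paper, as in wiki20, one way to derive a stricter reference set is to keep only the Wikipedia articles chosen by several teams. The vote-counting helper below is a hypothetical illustration of that idea; the threshold is a free parameter, not something prescribed by the dataset.

```python
from collections import Counter


def terms_by_agreement(team_annotations, min_teams):
    """Return (term, team_count) pairs chosen by at least `min_teams` teams,
    sorted by descending agreement and then alphabetically."""
    counts = Counter()
    for terms in team_annotations:
        counts.update(set(terms))  # a team counts at most once per term
    kept = [(term, n) for term, n in counts.items() if n >= min_teams]
    return sorted(kept, key=lambda tn: (-tn[1], tn[0]))


if __name__ == "__main__":
    # Hypothetical team annotations for a single paper (Wikipedia article titles)
    teams = [
        ["Machine learning", "Data mining"],
        ["Machine learning"],
        ["Machine learning", "Data mining", "Algorithm"],
    ]
    print(terms_by_agreement(teams, min_teams=2))
```

Raising `min_teams` trades recall of the reference set for stronger agreement among the articles that remain.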
