Datasets for developing, evaluating, and testing keyword extraction algorithms. For benchmark performance, see: O. Medelyan. 2009. Human-competitive automatic topic indexing. PhD Thesis. University of Waikato, New Zealand.
NLM_500.zip - 500 PubMed documents with MeSH terms
fao780.tar.gz - 780 FAO publications with Agrovoc terms
fao30.tar.gz - 30 FAO publications, each annotated by 6 professional FAO indexers
citeulike180.tar.gz - 180 publications crawled from CiteULike, with keywords assigned by the different CiteULike users who saved them
SemEval2010-Maui.zip - SemEval-2010 Keyphrase extraction track data in Maui format
keyphrextr.tar.gz - Keyphrase extraction model created using SemEval-2010 training data. This model is used in the Maui GPL demo when no vocabulary is selected.
wiki20.tar.gz - 20 Computer Science papers, each annotated with at least 5 Wikipedia articles by 15 teams of indexers
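
The archives listed above come in two formats (.zip and .tar.gz), so a small helper can unpack either kind into its own directory. The following is a minimal sketch using only the Python standard library. The function name unpack_dataset and the destination layout are assumptions made for illustration, and nothing is assumed about the file layout inside each archive.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_dataset(archive: Path, dest_root: Path) -> Path:
    """Unpack a .zip or .tar.gz archive into dest_root/<archive name without extension>.

    Hypothetical helper: the name and the target layout are illustrative only.
    """
    name = archive.name
    if name.endswith(".tar.gz"):
        target = dest_root / name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python version has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
    elif name.endswith(".zip"):
        target = dest_root / name[: -len(".zip")]
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    else:
        raise ValueError(f"Unsupported archive type: {name}")
    return target
```

For example, assuming fao30.tar.gz has already been downloaded to the current directory, unpack_dataset(Path("fao30.tar.gz"), Path("data")) would extract it under data/fao30 and return that path.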