Licenses for datasets
Ricardo Usbeck edited this page Mar 23, 2023
Task | Type | License | Language |
---|---|---|---|
A2KB | news | LDC | en |
- https://catalog.ldc.upenn.edu/LDC2005T09
- Available at: https://cogcomp.cs.illinois.edu/page/resource_view/4
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | news | CoNLL Licence | en |
- https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter for the AIDA/CoNLL dataset expects the following file:
gerbil_data/datasets/aida/AIDA-YAGO2-dataset-update.tsv
The adapter also works with the original AIDA-YAGO2-dataset.tsv file. The only difference between the original and the updated file appears to be that YAGO URL paths were replaced with IDs, and the adapter does not use these values.
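
Since this dataset has to be installed by hand, a quick check before starting GERBIL can save a failed run. The following is a minimal sketch, not part of GERBIL: the helper name `find_aida_file` is hypothetical, and it assumes both file names would sit in the same `gerbil_data/datasets/aida` folder. Whether the adapter picks up the original file name without renaming or reconfiguration is not stated here, so treat a hit on the second name as a hint only.

```python
from pathlib import Path
from typing import Optional

# File names as given in the text above; everything else is illustrative.
AIDA_CANDIDATES = (
    "AIDA-YAGO2-dataset-update.tsv",  # the file the adapter expects
    "AIDA-YAGO2-dataset.tsv",         # the original file, reported to work as well
)


def find_aida_file(data_root: str = "gerbil_data") -> Optional[Path]:
    """Return the first AIDA/CoNLL file found under data_root, or None."""
    aida_dir = Path(data_root) / "datasets" / "aida"
    for name in AIDA_CANDIDATES:
        candidate = aida_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_aida_file()
    print(found if found else "AIDA/CoNLL file not found under gerbil_data/datasets/aida")
```

The updated file is checked first because it is the name the adapter is documented to expect.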
Task | Type | License | Language |
---|---|---|---|
A2KB | news | LDC User Agreement for Non-Members | en |
- https://catalog.ldc.upenn.edu/LDC2002T31
- Graff, D. 2002. The AQUAINT corpus of English news text. Technical report, Linguistic Data Consortium, Philadelphia, PA, USA.
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter for the AQUAINT dataset expects the following folders
gerbil_data/datasets/AQUAINT/RawTexts
gerbil_data/datasets/AQUAINT/Problems
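
The AQUAINT adapter expects folders rather than single files, so the corresponding check looks for directories. This is a small sketch under the assumption that the two folders listed above are all that is required; the function name `missing_aquaint_dirs` is hypothetical and not part of GERBIL.

```python
from pathlib import Path
from typing import List

# Folder names as listed above, relative to the gerbil_data root.
AQUAINT_DIRS = (
    "datasets/AQUAINT/RawTexts",
    "datasets/AQUAINT/Problems",
)


def missing_aquaint_dirs(data_root: str = "gerbil_data") -> List[str]:
    """Return the expected AQUAINT folders that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in AQUAINT_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_aquaint_dirs():
        print(f"missing folder: gerbil_data/{rel}")
```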
Task | Type | License | Language |
---|---|---|---|
A2KB | news | CC BY 4.0 | en |
- http://www.yovisto.com/labs/ner-benchmarks/
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts | CC BY 4.0 | en |
- http://www.derczynski.com/sheffield/resources/ipm_nel.tar.gz
- Needs to be added to gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | mixed | Public Domain | en |
- http://www.cse.iitb.ac.in/~soumen/doc/CSAW/Annot/
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | news | CC BY 4.0 | en |
- http://www.yovisto.com/labs/ner-benchmarks/
- J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum, "KORE: Keyphrase Overlap Relatedness for Entity Disambiguation," presented at the Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Hawaii, USA, 2012.
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
RT2KB | microposts | CC BY-NC-SA 3.0 | en |
- http://oak.dcs.shef.ac.uk/msm2013/ie_challenge/MSM2013-CEChallengeFinal.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files
gerbil_data/datasets/microposts2013/goldStandard.tsv
gerbil_data/datasets/microposts2013/testSet.tsv
gerbil_data/datasets/microposts2013/TweetsTrainingSetCH.tsv
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts | Twitter license | en |
- http://www.scc.lancs.ac.uk/microposts2014/challenge/dataset/microposts2014-neel_challenge_gs.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files
gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTestSet.csv
gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTrainingSet.csv
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts | CC BY 4.0 | en |
- Needs to be added to gerbil_data.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files
gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-gold_v3.tsv
gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-tweets.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-gold_v2.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-tweets.tsv
gerbil_data/datasets/microposts2015/training/NEEL2015-training-gold_v4.ts
gerbil_data/datasets/microposts2015/training/NEEL2015-training-tweets_v2.tsv
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts | CC BY 4.0 | en |
- Needs to be added to gerbil_data.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files
gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev.tsv
gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev_neel.gs
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test.tsv
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test_neel.gs
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training.tsv
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training_neel.gs
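
The folder names in these paths contain spaces ("Dev Set", "Test Set", "Training Set"), which are easy to lose when unpacking or copying on the command line. The sketch below verifies the six files with `pathlib`, which handles such names without any quoting. The helper `missing_files` is hypothetical and not part of GERBIL; the paths are copied from the list above.

```python
from pathlib import Path
from typing import Iterable, List

# Paths as listed above, relative to the gerbil_data root.
MICROPOSTS2016_FILES = (
    "datasets/microposts2016/Dev Set/NEEL2016-dev.tsv",
    "datasets/microposts2016/Dev Set/NEEL2016-dev_neel.gs",
    "datasets/microposts2016/Test Set/NEEL2016-test.tsv",
    "datasets/microposts2016/Test Set/NEEL2016-test_neel.gs",
    "datasets/microposts2016/Training Set/NEEL2016-training.tsv",
    "datasets/microposts2016/Training Set/NEEL2016-training_neel.gs",
)


def missing_files(data_root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return every relative path that is not an existing file under data_root."""
    root = Path(data_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("gerbil_data", MICROPOSTS2016_FILES):
        print(f"missing file: gerbil_data/{rel}")
```

The same helper can be reused for the other manually installed datasets by passing their file lists instead.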
Task | Type | License | Language |
---|---|---|---|
A2KB | news | - | en |
- http://cogcomp.cs.illinois.edu/page/resource_view/4
- http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/
- S. Cucerzan. Large-scale named entity disambiguation based on Wikipedia data. In Proc. EMNLP-CoNLL, pages 708–716, 2007.
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | news | CC-by-SA-NC 4.0 International License | en |
- https://github.com/AKSW/n3-collection
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
A2KB | RSS-feeds | CC-by-SA-NC 4.0 International License | en |
- https://github.com/AKSW/n3-collection
- This dataset is already included in gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
RT2KB | news | GNU v3 | en |
- https://github.com/aritter/twitter_nlp/blob/master/data/annotated/ner.txt
- Needs to be added to gerbil_data.zip
Task | Type | License | Language |
---|---|---|---|
ERec | mixed | Public Domain | en |
- http://www.hipposmond.com/senseval2/Results/guidelines.htm#rawdata
- However, the corpora and corpus samples may be subject to copyright restrictions depending on the source.
Task | Type | License | Language |
---|---|---|---|
ERec | mixed | Public Domain | en |
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts (Twitter) | CC-BY(?) | en |
- Locke, B. and Martin, J. (2009). Named entity recognition: Adapting to microblogging. Senior Thesis, University of Colorado.
Task | Type | License | Language |
---|---|---|---|
A2KB | microposts (Twitter) | CC-BY(?) | en |
- Habib, M. B. and van Keulen, M. (2012). Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), pages 1–10.
Task | Type | License | Language |
---|---|---|---|
RT2KB | news | BSD 2 | en |
Task | Type | License | Language |
---|---|---|---|
C2KB | microposts | - | en |