Add files via upload
9 PORTULAN corpora added
jakoble authored Nov 6, 2024
1 parent ffbda57 commit 7323781
Showing 9 changed files with 131 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"Name": "CINTIL DependencyBank PREMIUM",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D378-0",
"Family": "Manually annotated corpora",
"Description": "This is a corpus of Portuguese utterances manually annotated with grammatical dependency relations and with part-of-speech, inflection and lemma information.\nThe corpus is available from PORTULAN.",
"Language": ["por"],
"Licence": "MS NC-NoReD-ND",
"Size": ["3,000 sentences", "79,378 tokens"],
"Annotation": ["PoS tagged", "lemmatised", "syntactically parsed"],
"Infrastructure": "CLARIN",
"Group": ["Syntactic parsing"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D378-0"
},
"Publication": "http://hdl.handle.net/10451/20226"
}
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/cintil-logicalformbank.json
@@ -0,0 +1,16 @@
{
"Name": "CINTIL-LogicalFormBank",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D334-C",
"Family": "Manually annotated corpora",
"Description": "This is a corpus of semantic dependencies between sentences taken from several domains (novels and news primarily).\nThe corpus is composed of representations of each sentence’s semantic relations resulting from a previous semi-automatic analysis with a double-blind annotation followed by adjudication.\nThe corpus is available from PORTULAN.",
"Language": ["por"],
"Licence": "MS NC-NoReD-ND",
"Size": ["10,039 sentences"],
"Annotation": ["semantic relations"],
"Infrastructure": "CLARIN",
"Group": ["Other annotation layers"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D334-C"
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/cintil-namedentities.json
@@ -0,0 +1,16 @@
{
"Name": "CINTIL-NamedEntities",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D37E-A",
"Family": "Manually annotated corpora",
"Description": "This corpus consists of sentences with assigned named entities.\nThe named entities were manually disambiguated and annotated with links to appropriate pages in the <a href=\"http://pt.dbpedia.org/\">Portuguese DBpedia</a>.\nThe corpus is available from PORTULAN.",
"Language": ["por"],
"Licence": "MS NC-NoReD-ND",
"Size": ["685,000 tokens"],
"Annotation": [""],
"Infrastructure": "CLARIN",
"Group": ["Named entity recognition"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D37E-A"
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/cintil-qatreebank.json
@@ -0,0 +1,16 @@
{
"Name": "CINTIL-QATreeBank",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D328-A",
"Family": "Manually annotated corpora",
"Description": "This is a treebank containing sentences that can be used to support the development of Question Answering systems.\nFor the creation of the treebank, declarative sentences were manually transformed into interrogative and imperative ones.\nThe non-declarative sentences are annotated with several layers of linguistic information, namely (i) trees with information on constituency and grammatical function; (ii) sentence type; (iii) interrogative pronoun; (iv) question type; and (v) semantic type of expected answer. Moreover, these non-declarative sentences are paired with their declarative counterparts and associated with the expected answer snippets.\nThe treebank is available from PORTULAN.",
"Language": ["por"],
"Licence": "CC BY",
"Size": ["111 sentences"],
"Annotation": ["syntactically parsed"],
"Infrastructure": "CLARIN",
"Group": ["Other annotation layers"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D328-A"
},
"Publication": ""
}
18 changes: 18 additions & 0 deletions corpora/manually-annotated-corpora/cintil-wordsenses.json
@@ -0,0 +1,18 @@
{
"Name": "CINTIL-WordSenses",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D37D-B",
"Family": "Manually annotated corpora",
"Description": "This corpus contains open-class terms manually disambiguated and annotated with synset identifiers from the <a href=\"https://catalogue.elra.info/en-us/repository/browse/ELRA-M0050/\">Portuguese MultiWordNet</a>.\nThe corpus is available from PORTULAN.",
"Language": ["por"],
"Licence": "MS NC-NoReD-ND",
"Size": ["24,000 sentences", "508,000 tokens"],
"Annotation": [""],
"Infrastructure": "CLARIN",
"Group": ["Other annotation layers"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D37D-B"
},
"Publication": ""
}


@@ -0,0 +1 @@
{ "Name": "CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects – TreeBank", "URL": "https://hdl.handle.net/21.11129/0000-000D-F91C-C", "Family": "Manually annotated corpora", "Description": "This treebank follows the constituency-based annotation system originally developed for the <a href=\"https://catalog.ldc.upenn.edu/LDC2020T16\">Penn Parsed Corpora of Historical English</a>.\nIt contains 177,596 syntactic parse trees extracted from the <a href=\"https://hdl.handle.net/21.11129/0000-000D-F945-D\">Syntax-oriented Corpus of Portuguese Dialects</a>.\nThe corpus is available from PORTULAN.", "Language": ["por"], "Licence": "CC BY-NC-ND", "Size": ["600,000 words"], "Annotation": ["PoS tagged", "syntactically parsed"], "Infrastructure": "CLARIN", "Group": ["Syntactic parsing"], "Access": { "Download": "https://hdl.handle.net/21.11129/0000-000D-F91C-C" }, "Publication": "https://clul.ulisboa.pt/sites/default/files/recursos/pos_annotation_manual.pdf" }
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/himera-corpus.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"Name": "HIMERA Corpus",
"URL": "https://hdl.handle.net/21.11129/0000-000B-D379-F",
"Family": "Manually annotated corpora",
"Description": "This corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health.\nSpecifically, annotations correspond to seven different entity types and two different event types (which encode relationships amongst entities), chosen based on extensive discussions with medical historians.\nThe corpus is available from PORTULAN.",
"Language": ["por"],
"Licence": "CC BY",
"Size": ["39 articles", "70,000 words"],
"Annotation": ["events", "named entities"],
"Infrastructure": "CLARIN",
"Group": ["Other annotation layers"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000B-D379-F"
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/porttinari.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"Name": "Porttinari – PORTuguese Treebank",
"URL": "https://hdl.handle.net/21.11129/0000-0011-2ACB-A",
"Family": "Manually annotated corpora",
"Description": "This is a treebank whose syntactic annotations follow the <a href=\"https://universaldependencies.org/\">Universal Dependencies</a> framework.\nThe treebank is available from PORTULAN.",
"Language": ["por"],
"Licence": "CC BY",
"Size": ["8,400 sentences", "168,000 tokens"],
"Annotation": ["PoS tagged", "syntactically parsed"],
"Infrastructure": "CLARIN",
"Group": ["Syntactic parsing"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-0011-2ACB-A"
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions corpora/manually-annotated-corpora/ps-corpus-treebank.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"Name": "PS corpus (Post-Scriptum) - treebank",
"URL": "https://hdl.handle.net/21.11129/0000-000D-F924-2",
"Family": "Manually annotated corpora",
"Description": "This treebank is a syntactically annotated subset of the Portuguese <a href=\"https://hdl.handle.net/21.11129/0000-000D-F91F-9\">PS corpus (Post-Scriptum)-PT</a> and the Spanish <a href=\"https://hdl.handle.net/21.11129/0000-000D-F918-0\">PS corpus (Post-Scriptum)-ES</a> corpora (see also the <a href=\"https://www.clarin.eu/resource-families/historical-corpora\">Historical corpora</a> resource family).\nThe treebank is available from PORTULAN.",
"Language": ["spa", "por"],
"Licence": "CC BY-NC-ND",
"Size": ["2,368 texts"],
"Annotation": ["syntactically parsed"],
"Infrastructure": "CLARIN",
"Group": ["Syntactic parsing"],
"Access": {
"Download": "https://hdl.handle.net/21.11129/0000-000D-F924-2"
},
"Publication": "https://www.janusdigital.es/anexo.htm?id=5"
}
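
The records in this commit all share the same twelve top-level keys, and in each one Access.Download repeats the URL handle. A minimal sketch of a pre-commit check for such records follows. It assumes those conventions are intended; they are inferred from the files shown here, not from a published schema. The function name check_record and the sample record (including its example.org URL) are hypothetical and exist only for illustration.

```python
import json

# Keys present in every record of this commit (inferred, not an official schema).
REQUIRED_KEYS = {
    "Name", "URL", "Family", "Description", "Language", "Licence",
    "Size", "Annotation", "Infrastructure", "Group", "Access", "Publication",
}


def check_record(record):
    """Return a list of human-readable problems found in one corpus record."""
    problems = []
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    # List-valued fields should not hold empty placeholders or padded strings.
    for key in ("Language", "Size", "Annotation", "Group"):
        for item in record.get(key, []):
            if not item.strip():
                problems.append(f"{key} contains an empty entry")
            elif item != item.strip():
                problems.append(f"{key} entry has stray whitespace: {item!r}")
    # Assumption: Access.Download should repeat the URL, as in the files above.
    download = record.get("Access", {}).get("Download")
    if download != record.get("URL"):
        problems.append("Access.Download differs from URL")
    return problems


# Hypothetical record with two deliberate defects in "Size".
sample = json.loads("""
{
  "Name": "Example corpus",
  "URL": "https://example.org/handle/1",
  "Family": "Manually annotated corpora",
  "Description": "Hypothetical record used only for this check.",
  "Language": ["por"],
  "Licence": "CC BY",
  "Size": ["", "168,000 tokens "],
  "Annotation": ["PoS tagged"],
  "Infrastructure": "CLARIN",
  "Group": ["Syntactic parsing"],
  "Access": {"Download": "https://example.org/handle/1"},
  "Publication": ""
}
""")

for problem in check_record(sample):
    print(problem)
```

An empty "Publication" string is deliberately not flagged, since several records in this commit leave it blank. Empty list entries are flagged because they typically signal a value that was left out or placed under the wrong key.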
