9 PORTULAN corpora added
Showing 9 changed files with 131 additions and 0 deletions.
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/cintil-dependency-bank-premium.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL DependencyBank PREMIUM", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D378-0", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This is a corpus of Portuguese utterances manually annotated with the representation of grammatical dependency relations and the information of part-of-speech, inflection and lemmas.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["3,000 sentences", "79,378 tokens"], | ||
"Annotation": ["PoS tagged", "lemmatised", "syntactically parsed"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Syntactic parsing"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D378-0" | ||
}, | ||
"Publication": "http://hdl.handle.net/10451/20226" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/cintil-logicalformbank.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL-LogicalFormBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D334-C", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This is a corpus of semantic dependencies between sentences taken from several domains (novels and news primarily).\nThe corpus is composed of representations of each sentence’s semantic relations resulting from a previous semi-automatic analysis with a double-blind annotation followed by adjudication.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["10,039 sentences"], | ||
"Annotation": ["semantic relations"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Other annotation layers"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D334-C" | ||
}, | ||
"Publication": "" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/cintil-namedentities.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL-NamedEntities", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D37E-A", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus consists of sentences with assigned named entities.\nThe named entities were manually disambiguated and annotated with links to appropriate pages in the <a href=\"http://pt.dbpedia.org/\">Portuguese DBpedia</a>.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["685,000 tokens"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Named entity recognition"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D37E-A" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CINTIL-QATreeBank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D328-A", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This is a treebank containing sentences that can be used to support the development of Question Answering systems.\nFor the creation of the treebank, declarative sentences were manually transformed into interrogative and imperative ones.\nThe non-declarative sentences are annotated with several layers of linguistic information, namely (i) trees with information on constituency and grammatical function; (ii) sentence type; (iii) interrogative pronoun; (iv) question type; and (v) semantic type of expected answer. Moreover, these non-declarative sentences are paired with their declarative counterparts and associated with the expected answer snippets.\nThe treebank is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "CC BY", | ||
"Size": ["111 sentences"], | ||
"Annotation": ["syntactically parsed"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Other annotation layers"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D328-A" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "CINTIL-WordSenses", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D37D-B", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains open-class terms manually disambiguated and annotated with synset identifiers from the <a href=\"https://catalogue.elra.info/en-us/repository/browse/ELRA-M0050/\">Portuguese MultiWordNet</a>.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["24,000 sentences", "508,000 tokens"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Other annotation layers"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D37D-B" | ||
}, | ||
"Publication": "" | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "HIMERA Corpus", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D379-F", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health.\nSpecifically, annotations correspond to seven different entity types and two different event types (which encode relationships amongst entities), chosen based on extensive discussions with medical historians.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "CC BY", | ||
"Size": ["39 articles", "70,000 words"], | ||
"Annotation": ["events", "named entities"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Other annotation layers"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D379-F" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Porttinari – PORTuguese Treebank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-0011-2ACB-A", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This is a treebank whose syntactic annotations follow the <a href=\"https://universaldependencies.org/\">Universal Dependencies</a> framework.\nThe treebank is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "CC BY", | ||
"Size": ["8,400 sentences", "168,000 tokens"], | ||
"Annotation": ["PoS tagged", "syntactically parsed"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Syntactic parsing"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-0011-2ACB-A" | ||
}, | ||
"Publication": "" | ||
} |
16 changes: 16 additions & 0 deletions
corpora/manually-annotated-corpora/ps-corpus-treebank.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "PS corpus (Post-Scriptum) - treebank", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-F924-2", | ||
"Family": "Manually annotated corpora", | ||
"Description": "This treebank is a syntactically annotated subset of the Portuguese <a href=\"https://hdl.handle.net/21.11129/0000-000D-F91F-9\">PS corpus (Post-Scriptum)-PT</a> and the Spanish <a href=\"https://hdl.handle.net/21.11129/0000-000D-F918-0\">PS corpus (Post-Scriptum)-ES</a> corpora (see also the <a href=\"https://www.clarin.eu/resource-families/historical-corpora\">Historical corpora</a> resource family).\nThe treebank is available from PORTULAN.", | ||
"Language": ["spa", "por"], | ||
"Licence": "CC BY-NC-ND", | ||
"Size": ["2,368 texts"], | ||
"Annotation": ["syntactically parsed"], | ||
"Infrastructure": "CLARIN", | ||
"Group": ["Syntactic parsing"], | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-F924-2" | ||
}, | ||
"Publication": "https://www.janusdigital.es/anexo.htm?id=5" | ||
} |
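
Records like the ones added in this commit could be sanity-checked before merging. The following is a minimal Python sketch under stated assumptions: the key list is simply the set of keys that appears in the files shown in this diff, and the rule that a size entry starts with a digit is a heuristic, not part of any documented schema. The names `REQUIRED_KEYS`, `SIZE_PATTERN` and `check_record` are hypothetical, and the sample record is made up for illustration.

```python
import re

# Assumption: these are the keys every record is expected to carry,
# taken from the files visible in this diff.
REQUIRED_KEYS = [
    "Name", "URL", "Family", "Description", "Language", "Licence",
    "Size", "Annotation", "Infrastructure", "Group", "Access", "Publication",
]

# Heuristic: a size entry starts with a number, e.g. "24,000 sentences".
SIZE_PATTERN = re.compile(r"^\d[\d,.]*\s+\S+")


def check_record(record):
    """Return a list of human-readable problems found in one corpus record."""
    problems = []

    # Every expected key should be present.
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")

    # List entries should not carry leading or trailing whitespace.
    for key in ("Size", "Annotation"):
        for value in record.get(key, []):
            if value != value.strip():
                problems.append(f"{key} entry has stray whitespace: {value!r}")

    # At least one non-empty size entry is expected.
    if not [s for s in record.get("Size", []) if s.strip()]:
        problems.append("Size is empty")

    # Size-like strings under "Annotation" suggest swapped fields.
    for value in record.get("Annotation", []):
        if SIZE_PATTERN.match(value.strip()):
            problems.append(f"Annotation entry looks like a size: {value!r}")

    return problems


if __name__ == "__main__":
    # Hypothetical record with the two fields swapped.
    sample = {key: "" for key in REQUIRED_KEYS}
    sample["Size"] = [""]
    sample["Annotation"] = ["100 sentences", "2,000 tokens"]
    for problem in check_record(sample):
        print(problem)
```

To run it over real files, each JSON file can be loaded with `json.load` and the resulting dictionary passed to `check_record`; an empty list means none of these particular checks fired.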