Linguistic data sets in Portuguese via cooperation with communities

Note: as described in issue #2, this file, README.en.md, is the result of machine translation. No attempt is made to correct translation errors, so that future volunteers can remain involved even if they do not know English.

[work in progress] Permanent project to coordinate the creation and updating of linguistic data sets (such as those that can be used to detect discrimination and hate speech), preferably validated by representatives of the affected groups or by subject matter experts. Dedicated to the public domain.

Table of Contents

Data set
Groups involved
Working files

Data set

NOTE: as of 2020-12-01, the content made available here is not ready for end use. It primarily serves to test strategies for how to collect data and which HXL hashtags to use to classify information.

HXL-CPLP-Publico
- https://drive.google.com/drive/u/1/folders/1VLm29IBV6iOnfagRKKD8cLntDAjIjL0z

Groups involved

Role of Etica.AI

Unlike EticaAI/linguistic-datasets-portuguese (which is a list of different data sets in Portuguese from different sources), this repository contains references to the data sets themselves, with Etica.AI serving as the organization that enables ongoing collaboration.

Linguistic data sets in Portuguese are rare and not very complete. When they exist, they are often under restricted-use licenses or depend on access to proprietary APIs, even if free of charge. By allowing even commercial use, our work here has the potential to help with automation (such as detection of verbal attacks).

Role of HXL-CPLP

Not only is HXL (the Humanitarian Exchange Language) our main form of data storage in this project, there is also an exchange of help with people who already work in the information technology area of international humanitarian organizations.
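Since HXL is the main form of data storage here, the following is a minimal sketch of how an HXL-tagged sheet could be read once exported as CSV. In the HXL convention, a row of hashtags sits between the human-readable headers and the data. The column names, the hashtags (`#item +term`, `#meta +language`, `#description +notes`), the rows, and the `read_hxl` helper are all hypothetical and chosen only for illustration; this README does not define which hashtags the project uses.

```python
import csv
import io

# Hypothetical sample: headers, hashtags, and rows are illustrative only
# and are not taken from the project's working files.
SAMPLE = """Term,Language,Notes
#item +term,#meta +language,#description +notes
exemplo,pt,placeholder row
outro exemplo,pt,another placeholder
"""


def read_hxl(text):
    """Return the data rows keyed by HXL hashtag instead of by header text."""
    rows = list(csv.reader(io.StringIO(text)))
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        # Treat the first row whose non-empty cells all start with '#'
        # as the hashtag row; everything after it is data.
        if cells and all(cell.startswith("#") for cell in cells):
            tags = [cell.strip() for cell in row]
            return [dict(zip(tags, data)) for data in rows[index + 1:]]
    raise ValueError("no HXL hashtag row found")


records = read_hxl(SAMPLE)
```

Keying rows by hashtag rather than by header text means the human-readable headers can be written in Portuguese, English, or any other language without changing how the data is processed.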

Your feedback on how to improve collaboration processes can have an impact even outside Portuguese-speaking countries. Whether you are a software developer or a member of a typically affected community (even without knowing English or having an affinity with computers), if you are interested we can help you prepare for work beyond your home country.

Role of people in the community

For the purposes of this project, both Etica.AI and HXL-CPLP people should be seen as facilitators, not creators. People from affected communities, even if they are not specialists with an academic doctorate, who still have the courage to help assemble initial content that can be revised in the future, are the main enablers of every idea.

One implication of dedicating data sets to the public domain is that the final result may not contain names of individuals (not even Etica.AI / HXL-CPLP). As far as possible, we will look for alternative ways to give special recognition to people who help coordinate or revalidate the work of others, or who created meaningful initial content, even if they prefer not to claim authorship of their contributions for fear of retaliation.

Working files

HXL-CPLP-Publico
- https://drive.google.com/drive/u/1/folders/1VLm29IBV6iOnfagRKKD8cLntDAjIjL0z

License

To the extent possible under law, Etica.AI has waived all copyright and related or neighboring rights to this work, dedicating it to the Public Domain (UNLICENSE).
