GSoC 2019 - Development of a Greek open source morphological dictionary and its application to Greek spelling tools

Dictionary Download

  • An SQL database containing the following data:
  1. A morphological dictionary containing about 900,000 entries with 518,000 distinct surface forms, annotated according to Universal Dependencies
  2. Definitions for most lemmas
  3. Etymologies for most lemmas
  4. 18,500 synonyms, 12,500 of which are for Greek
  5. 5,500 antonyms, 4,300 of which are for Greek
  6. 3,310 normalizations of words
  7. Almost 150,000 translations
  • A spelling dictionary with 1,047,200 words, up from the 828,807 of the previous dictionary used in open source programs. The dictionary also includes frequencies for all words. It will be integrated into the spelling dictionaries of Firefox and Thunderbird.
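
As a rough illustration of how a morphological dictionary with Universal Dependencies features can be queried from SQLite, here is a minimal Python sketch. The table name `words` and its columns (`form`, `lemma`, `upos`, `feats`) are hypothetical and chosen only for this example; the actual schema of the released database may differ, so check the documentation before adapting it.

```python
import sqlite3

# Hypothetical schema: the real table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (form TEXT, lemma TEXT, upos TEXT, feats TEXT)"
)
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?, ?)",
    [
        ("άνθρωπος", "άνθρωπος", "NOUN", "Case=Nom|Gender=Masc|Number=Sing"),
        ("ανθρώπου", "άνθρωπος", "NOUN", "Case=Gen|Gender=Masc|Number=Sing"),
        ("άνθρωποι", "άνθρωπος", "NOUN", "Case=Nom|Gender=Masc|Number=Plur"),
    ],
)


def analyses(form):
    """Return (lemma, upos, features) for every entry matching a surface form."""
    rows = conn.execute(
        "SELECT lemma, upos, feats FROM words WHERE form = ?", (form,)
    ).fetchall()
    # Split the UD feature string "Case=Gen|Number=Sing" into a dict.
    return [
        (lemma, upos, dict(pair.split("=") for pair in feats.split("|")))
        for lemma, upos, feats in rows
    ]
```

To run the same kind of lookup against the downloaded file, the in-memory connection would be replaced by `sqlite3.connect(...)` with the path to the database, and the query adjusted to the real table layout.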

Documentation

Documentation can be found in the data directory.

Running the script

Information about running the script is found here

Final Report

You can find the final report in the following gist.

Project goals

During the summer, a morphological dictionary in SQLite 3 format will be created. Information will be extracted automatically with a Python script that uses the pymediawiki library. In addition, words and morphological information will be added to the spelling tool dictionaries.

Timeline

Phase 1 (May 27 - Jun 28)

Creation of a parsing tool for the Greek Wiktionary that parses nouns, adjectives, and verbs using Universal Dependencies POS tags.
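
One way such a parser might map Wiktionary sections to Universal Dependencies POS tags is sketched below. It assumes that part-of-speech sections in the page wikitext are introduced by headers of the form `=== {{ουσιαστικό|el}} ===`; both this header layout and the mapping table are assumptions made for illustration, not a description of the project's actual script.

```python
import re

# Assumed mapping from Greek Wiktionary section template names to UD POS tags.
SECTION_TO_UPOS = {
    "ουσιαστικό": "NOUN",
    "επίθετο": "ADJ",
    "ρήμα": "VERB",
}

# Matches headers such as "=== {{ουσιαστικό|el}} ===" (assumed header layout).
HEADER_RE = re.compile(r"^=+\s*\{\{([^|}]+)\|el\}\}\s*=+\s*$", re.MULTILINE)


def pos_tags(wikitext):
    """Return the UD POS tags of the Greek sections found in a page's wikitext."""
    return [
        SECTION_TO_UPOS[name]
        for name in HEADER_RE.findall(wikitext)
        if name in SECTION_TO_UPOS
    ]
```

Headers whose template name is not in the mapping are skipped, so other parts of speech can be supported by extending the dictionary.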

Phase 2 (Jun 29 - Jul 26)

Addition of the remaining parts of speech to the morphological dictionary, plus further information tags, such as toponyms and terminology, extracted from page categories.

Phase 3 (Jul 27 - Aug 26)

Addition of extracted surface forms to Greek spelling dictionaries, including words from reliable sources such as European Parliament translations.
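
The merge step can be pictured with the following sketch, which combines an existing word list with newly extracted surface forms and renders the result in the Hunspell `.dic` layout (an entry count on the first line, then one word per line). The function names are hypothetical, affix flags and word frequencies are left out, and the project's real pipeline may work differently.

```python
def merge_wordlist(existing, extracted):
    """Merge two word collections into a sorted, de-duplicated list."""
    words = {w.strip() for w in existing} | {w.strip() for w in extracted}
    words.discard("")  # drop blank entries left over after stripping
    # Sorted by Unicode code point, not by Greek collation rules.
    return sorted(words)


def to_dic(words):
    """Render words in the Hunspell .dic layout: entry count, then one word per line."""
    return "\n".join([str(len(words)), *words]) + "\n"
```

Using a set for the merge keeps the operation idempotent, so rerunning it with the same extracted forms does not grow the dictionary.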

Contributors

  • Google Summer of Code participant: Konstantinos Agiannis
  • Mentor: Kostas Papadimas
  • Mentor: Theodoros Karounos
  • Mentor: Alexios Zavras

License

The source code is under GPLv3.

The produced database with the morphological dictionary is under CC BY-SA 3.0.

Links