This repository contains tools for compiling and deploying dictionaries for LanguageTool.
The owner, maintainer, and main dev for this repository is @p-goulart. Any shell and Perl components may be better explained by @jaumeortola, though.
This is set up as a Poetry project, so you must have Poetry installed and ready to go.
Make sure you are using a virtual environment and then:
poetry install --with test,dev
In addition to the Python dependencies, you will also need to have Hunspell binaries installed on your system.
The most important one is unmunch. Check if it's installed:
which unmunch
# should return a path to a bin directory, like
# /opt/homebrew/bin/unmunch
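The same check can be done from Python before running anything that shells out to Hunspell. This is a minimal sketch using the standard library's shutil.which; the helper name find_hunspell_tool is hypothetical and is not part of this repository.

```python
import shutil


def find_hunspell_tool(name="unmunch"):
    """Return the absolute path of a Hunspell binary, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_hunspell_tool()
    if path is None:
        print("unmunch not found on PATH; Hunspell may need to be installed")
    else:
        print(f"unmunch found at {path}")
```

Failing early like this gives a clearer message than an error raised halfway through a dictionary build.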
If it's not installed, you may need to compile Hunspell from source. Clone the Hunspell repo and then, from inside it, these steps should work on Ubuntu:
# install a bunch of dependencies needed for compilation
sudo apt-get install autoconf automake autopoint libtool
autoreconf -vfi
./configure
make
sudo make install
sudo ldconfig
The scripts here also depend on the languagetool Java codebase (for word tokenisation).
Make sure you have LT cloned locally, and export the following environment variable in your shell configuration:
export LT_HOME=/path/to/languagetool
If this is not done, the code in this project will default that variable to ../languagetool (meaning one directory up from wherever this repo is cloned).
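The fallback described above can be sketched roughly as follows. This is an illustration only and not the project's actual code: the function name resolve_lt_home and its parameters are hypothetical, and it assumes the default is a languagetool directory sitting next to the cloned repo.

```python
import os
from pathlib import Path


def resolve_lt_home(repo_dir, environ=None):
    """Return LT_HOME if it is set, else the sibling 'languagetool' directory.

    Hypothetical helper for illustration; not part of this repository.
    """
    env = os.environ if environ is None else environ
    value = env.get("LT_HOME")
    if value:
        return Path(value)
    # Assumed default: ../languagetool relative to the repo directory.
    return Path(repo_dir).parent / "languagetool"


if __name__ == "__main__":
    # With no LT_HOME set, the sibling directory is used.
    print(resolve_lt_home("/work/dict_tools", environ={}))
    # An explicit LT_HOME always wins.
    print(resolve_lt_home("/work/dict_tools", environ={"LT_HOME": "/opt/languagetool"}))
```

Exporting LT_HOME explicitly is the safer option, since the default only works when both repositories share a parent directory.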
This repository should be a submodule of language-specific repositories, for example the Portuguese repository.
The submodule directory should be named dict_tools, which uses the underscore.
If you don't do this, you may fail to import it as a module.
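The underscore matters because a name used in a Python import statement must be a valid identifier, and a hyphen is not allowed in one. A quick way to see this, using a hyphenated name purely as a hypothetical counterexample:

```python
def is_importable_name(directory_name):
    """Return True if the name can appear in a plain `import` statement.

    Illustrative helper; it only checks identifier syntax, not whether
    the package actually exists.
    """
    return directory_name.isidentifier()


if __name__ == "__main__":
    print(is_importable_name("dict_tools"))  # True
    print(is_importable_name("dict-tools"))  # False: the hyphen is not valid in an identifier
```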
The scripts/build_tagger_dicts.py script compiles source files into a binary dictionary to be used by the LT POS tagger, Word Tokeniser, and Synthesiser.
You can check the usage parameters by invoking it with --help:
poetry run python scripts/build_tagger_dicts.py --help
The scripts/build_spelling_dicts.py script takes all the Hunspell and helper files as input and outputs binary files to be used by the Morfologik speller.
You can check the usage parameters by invoking it with --help:
poetry run python scripts/build_spelling_dicts.py --help