Workflow for JSON dump files

Aaron van Geffen edited this page Oct 25, 2020 · 4 revisions

The Localisation repository contains an objects folder. Translators can use the JSON files in this folder to update strings in the objects repository. This document details the workflow used to exchange information between these so-called dump files and the official JSON objects.

Creating a dump file for one language

A dump file for one particular language can be created by invoking language_dump.py, e.g.:

$ ./language_dump.py \
    -l ja-JP \
    -d ja-JP-all.json

This will create a file ja-JP-all.json containing all object strings, ordered by filename. It may be more practical to order them by English identifier, however. This is not done by the script, but can easily be done with a tool like jq:

$ jq 'to_entries | sort_by(.value."reference-name") | from_entries' \
    < ja-JP-all.json \
    > ja-JP-all-sorted.json

This takes ja-JP-all.json, orders its entries by the value of each entry's reference-name field, and writes the ordered version to ja-JP-all-sorted.json.
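Where jq is not available, the same reordering can be sketched in Python. This is a minimal sketch, not part of the repository's scripts: the helper names sort_dump and sort_dump_file are hypothetical, and it assumes what the jq filter above implies, namely that the dump is a single JSON object whose values are objects carrying a "reference-name" field.

```python
import json
from pathlib import Path


def sort_dump(entries: dict) -> dict:
    """Return a copy of `entries` ordered by each value's "reference-name".

    Hypothetical helper. Entries without a "reference-name" are treated
    as having an empty name, so they sort first.
    """
    return dict(
        sorted(
            entries.items(),
            key=lambda item: item[1].get("reference-name") or "",
        )
    )


def sort_dump_file(source: Path, target: Path) -> None:
    """Read a dump file, sort its entries, and write the result to `target`."""
    entries = json.loads(source.read_text(encoding="utf-8"))
    # ensure_ascii=False keeps translated strings readable instead of
    # escaping them as \uXXXX sequences; the indent width is an assumption.
    text = json.dumps(sort_dump(entries), ensure_ascii=False, indent=2)
    target.write_text(text + "\n", encoding="utf-8")
```

Calling sort_dump_file(Path("ja-JP-all.json"), Path("ja-JP-all-sorted.json")) would then mirror the jq invocation, although indentation and whitespace of the output may differ from what jq produces.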

Creating a dump file for all languages

The same script, language_dump.py, can be used to create dump files for all languages:

$ mkdir dumps
$ ./language_dump.py -a -t dumps

This will export one dump file for each supported language to the dumps folder. Each file is named after its locale, with a '.json' extension.

As with any single dump file, the object keys will be ordered in filesystem order. We can employ jq to reorder them automatically as well:

$ mkdir dumps_sorted
$ for f in dumps/*.json; do \
    jq 'to_entries | sort_by(.value."reference-name") | from_entries' \
      < "$f" \
      > "dumps_sorted/$(basename "$f")"; \
  done

This takes all files in the dumps folder, orders their entries by the value of each entry's reference-name field, and writes the ordered versions to the dumps_sorted folder.
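The loop above can also be approximated in Python, which may be convenient on systems without jq or a POSIX shell. As before, this is only a sketch: sort_all_dumps is a hypothetical helper, and the assumed dump layout (a JSON object whose values carry a "reference-name" field) is inferred from the jq filter rather than from the scripts themselves.

```python
import json
from pathlib import Path


def sort_all_dumps(source_dir: Path, target_dir: Path) -> list:
    """Sort every *.json dump in `source_dir` and write it to `target_dir`.

    Hypothetical helper. Returns the names of the files written.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(source_dir.glob("*.json")):
        entries = json.loads(source.read_text(encoding="utf-8"))
        # Order entries by reference name; entries without one sort first.
        ordered = dict(
            sorted(
                entries.items(),
                key=lambda item: item[1].get("reference-name") or "",
            )
        )
        target = target_dir / source.name
        target.write_text(
            json.dumps(ordered, ensure_ascii=False, indent=2) + "\n",
            encoding="utf-8",
        )
        written.append(target.name)
    return written
```

With the folders from the example, sort_all_dumps(Path("dumps"), Path("dumps_sorted")) would write one sorted file per locale, keeping the original file names.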

Importing information from a dump file

When a dump file has been changed, its changes can be automatically imported back into the object files. For example, for nl-NL:

$ ./language_load.py -l nl-NL -i nl-NL_dump.json

Python's JSON serialiser produces different output to our own. The decision was made not to reserialise all object files, so we have to work around this. The language_clean_patch.py script is designed to only keep lines related to differences in translation, and leaves out all other changes:

$ git diff > patch.diff
$ git stash
$ ./language_clean_patch.py -l nl-NL -p patch.diff | git apply
$ git commit -a -m 'Updated object strings for nl-NL'

The workflow above produces a clean commit for all changes per language.
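To illustrate why a round trip through Python can touch lines that contain no translation changes, the sketch below serialises the same data with two different sets of formatting options. It does not reproduce the project's own serialiser, whose exact formatting is not described here; the "translation" field and the sample strings are made up for the illustration.

```python
import json

# Hypothetical entry; only "reference-name" is a field name taken from
# the dump files, the rest is illustrative.
entry = {"reference-name": "Café", "translation": "カフェ"}

# Python's defaults: a single line, with non-ASCII characters escaped
# as \uXXXX sequences.
default_output = json.dumps(entry)

# Same data, but indented and with non-ASCII characters kept as-is.
readable_output = json.dumps(entry, ensure_ascii=False, indent=4)

print(default_output)
print(readable_output)

# Both strings decode to identical data, yet a line-based diff between
# them would report every line as changed.
same_data = json.loads(default_output) == json.loads(readable_output)
```

Differences of this kind are purely cosmetic, which is why the patch is filtered down to translation-related lines before it is applied.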

Importing information from all dump files

This is being worked on. For now, please refer to 'Importing information from a dump file'.