1.18.35

[1.18.35]

Added

  • ROUGE scores are now available in sockeye-evaluate (see the evaluation sketch after this list).
  • Enabled CHRF as an early-stopping metric.
  • Added support for --beam-search-stop first for decoding jobs with --batch-size > 1.
  • Now supports negative constraints, i.e. phrases that must not appear in the output (see the decoding sketch after this list).
    • Global constraints can be listed in a (pre-processed) file, one per line: --avoid-list FILE
    • Per-sentence constraints are passed via the avoid key in the input JSON object, whose value is a list of strings.
  • Added an option to pad the vocabulary to a multiple of a given value, e.g. --pad-vocab-to-multiple-of 16.
  • Added support for pre-training the RNN decoder (see the training sketch after this list). Usage:
    1. Train with the --decoder-only flag.
    2. Feed identical source and target training data.
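
A minimal evaluation sketch for the new ROUGE support. The flag and metric names shown here (--references, --hypotheses, --metrics, rouge1/rouge2/rougel) are assumptions about the sockeye-evaluate interface; check sockeye-evaluate --help for your version.

    # Score hypotheses against references with BLEU, CHRF, and ROUGE.
    sockeye-evaluate \
        --references ref.detok.de \
        --hypotheses hyp.detok.de \
        --metrics bleu chrf rouge1 rouge2 rougel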
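
A decoding sketch for negative constraints and for early beam stopping with batching. The --json-input flag, the exact JSON layout, and the -m/--batch-size/--beam-search-stop spellings are assumptions based on the usual sockeye.translate interface; file and model names are placeholders.

    # Global negative constraints: one pre-processed phrase per line in a file.
    printf 'bad phrase\nanother phrase\n' > avoid.txt
    python -m sockeye.translate -m model_dir --avoid-list avoid.txt < input.src > output.trg

    # Per-sentence negative constraints: JSON input with an "avoid" list of strings.
    echo '{"text": "source sentence", "avoid": ["bad phrase"]}' | \
        python -m sockeye.translate -m model_dir --json-input

    # --beam-search-stop first can now be combined with --batch-size > 1.
    python -m sockeye.translate -m model_dir --batch-size 16 --beam-search-stop first < input.src > output.trg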
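
A training sketch for decoder pre-training that also exercises the new vocabulary padding and CHRF early stopping. Apart from the flags named in this release (--decoder-only, --pad-vocab-to-multiple-of), the remaining flags (-s/-t/-vs/-vt/-o, --optimized-metric chrf) are assumptions based on the usual sockeye.train interface, and the corpus names are placeholders.

    # Pre-train the RNN decoder: enable --decoder-only and feed the same
    # (target-side) corpus as both source and target.
    python -m sockeye.train \
        --decoder-only \
        -s corpus.trg -t corpus.trg \
        -vs dev.trg -vt dev.trg \
        --pad-vocab-to-multiple-of 16 \
        --optimized-metric chrf \
        -o pretrained_decoder_model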

Fixed

  • The maximum output length is now preserved per sentence, so that translations are identical with and without batching.

Changed

  • The vocabulary is no longer restricted to 50,000 words by default; instead it is built from all words that occur at least --word-min-count times. Specifying --num-words explicitly still leads to a restricted vocabulary (see the sketch below).
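
A sketch of the changed default behavior; the flag interaction follows the description above, and corpus/model names are placeholders.

    # New default: keep every word that occurs at least --word-min-count times.
    python -m sockeye.train -s corpus.src -t corpus.trg -vs dev.src -vt dev.trg \
        --word-min-count 2 -o model_dir

    # Passing --num-words explicitly still caps the vocabulary size as before.
    python -m sockeye.train -s corpus.src -t corpus.trg -vs dev.src -vt dev.trg \
        --num-words 50000 -o model_dir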