Releases · lex-lingo/lingo

v1.9.0

Dropped support for Ruby 1.9.
Removed support for deprecated options and attendee names (old → new):
- Lingo::Language::Grammar:
  compositum → compound
- Lingo::Attendee::TextReader:
  lir-record-pattern → records
- Lingo::Config:
  multiworder → multi_worder,
  objectfilter → object_filter,
  textreader → text_reader,
  textwriter → text_writer,
  vectorfilter → vector_filter,
  wordsearcher → word_searcher
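
The renames above can be applied mechanically to an existing configuration. The following is a minimal sketch, not part of Lingo: the helper `migrate_keys` and the nested layout of `old_config` are assumptions for illustration, and only the old → new pairs are taken from the list. It renames keys wherever they occur, so a real migration should check that no unrelated key shares an old name.

```ruby
# Old => new names, copied from the list above.
# The helper below is hypothetical and not part of Lingo.
RENAMED_KEYS = {
  'compositum'         => 'compound',
  'lir-record-pattern' => 'records',
  'multiworder'        => 'multi_worder',
  'objectfilter'       => 'object_filter',
  'textreader'         => 'text_reader',
  'textwriter'         => 'text_writer',
  'vectorfilter'       => 'vector_filter',
  'wordsearcher'       => 'word_searcher'
}.freeze

# Recursively rename keys in nested hashes and arrays.
def migrate_keys(node)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), out|
      out[RENAMED_KEYS.fetch(key, key)] = migrate_keys(value)
    end
  when Array
    node.map { |item| migrate_keys(item) }
  else
    node
  end
end

# Assumed config layout, for illustration only.
old_config = {
  'attendees' => [
    { 'textreader'   => { 'lir-record-pattern' => '^\[(\d+)\.\]' } },
    { 'wordsearcher' => { 'source' => 'sys-dic' } }
  ]
}

new_config = migrate_keys(old_config)
puts new_config.inspect
```
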
Lingo::Attendee::TextWriter learned format directives for the ext option (currently supported: %c = config name, %l = language name, %d = current date, %t = current time).
Lingo::Attendee::Sequencer remembers word form of sequences.
Updated and extended English system dictionary and suffix list.
Fixed errors with XML input (issue #15 by Thomas Berger).
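
To picture how the TextWriter format directives listed above (%c, %l, %d, %t) might expand inside an ext value, here is a small stand-alone sketch. The helper `expand_ext` is hypothetical, and the date and time formats it uses are assumptions; the release notes do not say how Lingo renders them.

```ruby
# Hypothetical expansion of the ext directives; not Lingo's implementation.
# The strftime formats for %d and %t are assumptions.
def expand_ext(ext, config:, language:, now: Time.now)
  values = {
    '%c' => config,
    '%l' => language,
    '%d' => now.strftime('%Y-%m-%d'),
    '%t' => now.strftime('%H%M%S')
  }
  # Replace each known directive; any other text is left untouched.
  ext.gsub(/%[cldt]/) { |directive| values[directive] }
end

puts expand_ext('%c-%l-%d.vec', config: 'lingo', language: 'en')
```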

15 Feb 13:02

blackwinter

v1.8.7

a287b7e

Added Lingo::Attendee::LsiFilter to correlate semantically related terms (LSI)
over the "corpus" of all files processed during a single program invocation;
requires lsi4r, which in turn requires rb-gsl. [EXPERIMENTAL: Interface may be
changed or removed in next release.]
Added Lingo::Attendee::HalFilter to correlate semantically related terms (HAL)
over individual documents; requires hal4r, which in turn requires rb-gsl.
[EXPERIMENTAL: Interface may be changed or removed in next release.]
Added Lingo::Attendee::AnalysisFilter and associated lingoctl tooling.
Multiword dictionaries can now identify hyphenated variants (e.g.
automatic data-processing); set hyphenate: true in the
dictionary config.
Lingo::Attendee::Tokenizer no longer considers hyphens at word edges as part
of the word. As a consequence, Lingo::Attendee::Dehyphenizer has been
dropped.
Dropped Lingo::Attendee::NonewordFilter; use Lingo::Attendee::VectorFilter
with option lexicals: '\?' instead.
Lingo::Attendee::TextReader and Lingo::Attendee::TextWriter learned
encoding option to read/write text that is not UTF-8 encoded;
configuration files and dictionaries still need to be UTF-8, though.
Lingo::Attendee::TextReader and Lingo::Attendee::TextWriter learned to
read/write Gzip-compressed files (file extension .gz or .gzip).
Lingo::Attendee::Sequencer learned to recognize 0 in the pattern to match
number tokens.
Fixed Lingo::Attendee::TextReader to recognize BOM in input files; does not
apply to input read from STDIN.
Fixed regression introduced in 1.8.6 where Lingo::Attendee::Debugger would
no longer work immediately behind Lingo::Attendee::TextReader.
Fixed lingoctl copy commands when overwriting existing files.
Refactored Lingo::Database::Crypter into a module.
JRuby 9000 compatibility.
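
The encoding, Gzip and BOM items above can be combined into one reading step. The sketch below uses only Ruby's standard library; `read_text` is a hypothetical helper, not Lingo's TextReader, and it simply shows one way to pick a reader by file extension, transcode to UTF-8 and drop a leading BOM.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: read a file as UTF-8 text.
# - .gz/.gzip files are decompressed first
# - bytes are interpreted as `encoding` and transcoded to UTF-8
# - a leading byte order mark is removed
def read_text(path, encoding: 'UTF-8')
  raw =
    if path =~ /\.gz(ip)?\z/i
      Zlib::GzipReader.open(path) { |gz| gz.read }
    else
      File.binread(path)
    end
  raw.force_encoding(encoding).encode('UTF-8').delete_prefix("\uFEFF")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'input.txt.gz')
  Zlib::GzipWriter.open(path) { |gz| gz.write('Größe'.encode('ISO-8859-1')) }
  puts read_text(path, encoding: 'ISO-8859-1')
end
```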

09 Feb 10:29

blackwinter

v1.8.6

61dc0d2

Lingo::Attendee::VectorFilter learned pos option to print position and
byte offset with each word.
Lingo::Attendee::VectorFilter learned tfidf option to sort results based
on their tf–idf score; the document
frequencies are calculated over the "corpus" of all files processed during
a single program invocation.
Lingo::Attendee::VectorFilter learned tokens option to filter on
Lingo::Language::Token in addition to Lingo::Language::Word.
Lingo::Attendee::VectorFilter no longer supports debug (as well as
prompt and preamble); use Lingo::Attendee::DebugFilter instead.
Lingo::Attendee::TextReader no longer removes line endings; option chomp
is obsolete.
Lingo::Attendee::TextReader passes byte offset to the following attendee.
Lingo::Attendee::Tokenizer records token's byte offset.
Lingo::Attendee::Tokenizer records token's sequence position.
Lingo::Attendee::Tokenizer learned skip-tags option to skip over
specified tags' contents.
Lingo::Attendee subclasses warn when invalid or obsolete options or names
are used.
Changed German infix substitution /en to ch/chen in order to prevent
overly aggressive identifications.
Internal refactoring and API changes.
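
The tfidf option described above ranks terms using document frequencies gathered over all files of one run. As a rough illustration, the sketch below uses the textbook weighting tf * log(N / df); whether Lingo uses exactly this formula is not stated in the notes, and `tfidf_ranking` is a hypothetical helper.

```ruby
# Rank the terms of one document by tf-idf. Document frequencies are
# counted over the whole corpus (all documents of a single run).
# Textbook weighting; Lingo's actual formula may differ.
def tfidf_ranking(corpus, doc_index)
  doc_freq = Hash.new(0)
  corpus.each { |terms| terms.uniq.each { |term| doc_freq[term] += 1 } }

  terms  = corpus[doc_index]
  scores = terms.tally.map do |term, count|
    tf  = count.to_f / terms.size
    idf = Math.log(corpus.size.to_f / doc_freq[term])
    [term, tf * idf]
  end

  # Highest score first; ties are broken alphabetically.
  scores.sort_by { |term, score| [-score, term] }
end

corpus = [
  %w[lingo index term index],
  %w[lingo dictionary term],
  %w[lingo compound]
]

tfidf_ranking(corpus, 0).each do |term, score|
  puts format('%-6s %.3f', term, score)
end
```

A term that occurs in every document ("lingo" here) gets an idf of zero and therefore sinks to the bottom of the ranking.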

02 Oct 13:33

blackwinter

v1.8.5

0bd34b8

Dictionary values (projections) are no longer sorted; hence, order of
definition affects processing.
Lexicals in Lingo::Language::Word are no longer sorted; in particular,
compound parts keep their original order.
Lexicals in Lingo::Language::Word are no longer cleaned from duplicates.
Compiled dictionaries are updated whenever the Lingo version or their
configuration changes, not only when the source file's size or modification
time changes.
Lingo::Attendee::Synonymer learned compound-parts option to also
generate synonyms for compound parts when set to true.
Lingo::Attendee::TextReader learned better PDF-to-text conversion using the
pdftotext command; specify filter: pdftotext in the config.
Lingo::Attendee::VectorFilter learned dict option to print words in
dictionary format (viz. Lingo::Database::Source::WordClass).
Lingo::Attendee::VectorFilter learned preamble option to print current
configuration to the beginning of the log file (debug: 'true');
set preamble: false to disable.
Multiword dictionaries compiled from base forms can now generate inflected
adjectives based on the gender of the head noun; set inflect: true
in the dictionary config.
Lingo::Database::Source::WordClass supports gender information being encoded
in the dictionary as well as shorthand notation for multiple word
classes/genders.
Lingo::Database::Source::WordClass supports compounds being encoded in the
dictionary (appending + to their parts' word classes is
recommended).
Lingo::Database::Source removes leading and trailing whitespace from
dictionary lines.
Lingo::Database::Crypter uses OpenSSL to encrypt/decrypt dictionaries.
Note: Can't decrypt dictionaries encrypted with the old scheme anymore.
Lingo::Attendee::Tokenizer learned subset of MediaWiki syntax.
Eliminated pathological behaviour of the URLS rule in
Lingo::Attendee::Tokenizer.
Fixed regression introduced in 1.8.2 where combine: all would no
longer work in Lingo::Attendee::MultiWorder.
Updated and extended Russian dictionaries. (Yulia Dorokhova, Thomas Müller)
lingoctl no longer overwrites existing files without confirmation.
lingoctl learned archive command.
Dictionary cleanup.
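
The recompilation rule noted above (compiled dictionaries are updated when the Lingo version or their configuration changes, not only when the source file's size or modification time changes) amounts to comparing a fingerprint. The sketch below is an assumption about how such a check could look; `Fingerprint`, `fingerprint` and `stale?` are hypothetical names, and hashing the config with SHA-256 is an illustrative choice.

```ruby
require 'digest'
require 'tempfile'

# Everything that should trigger a rebuild when it changes.
Fingerprint = Struct.new(:version, :config_digest, :size, :mtime)

def fingerprint(source_path, config, version)
  stat = File.stat(source_path)
  Fingerprint.new(
    version,
    Digest::SHA256.hexdigest(config.sort.inspect), # order-independent digest
    stat.size,
    stat.mtime.to_i
  )
end

# A compiled dictionary is stale when any fingerprint field differs.
def stale?(stored, source_path, config, version)
  stored != fingerprint(source_path, config, version)
end

Tempfile.create('dict') do |file|
  file.write("example entry\n")
  file.flush

  config = { 'name' => 'sys-dic', 'inflect' => true }
  stored = fingerprint(file.path, config, '1.8.4')

  puts stale?(stored, file.path, config, '1.8.4')                            # false
  puts stale?(stored, file.path, config, '1.8.5')                            # true
  puts stale?(stored, file.path, config.merge('inflect' => false), '1.8.4') # true
end
```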

16 Sep 08:29

blackwinter

v1.8.4

6daf9c0

Lingo::Attendee::Sequencer accepts regular expression patterns.
Lingo::Attendee::Sequencer substitutes 0 in the format string for the
matched pattern.
Lingo::Attendee::NonewordFilter learned dict option to print nonewords
in dictionary format.
Added progress reporting to Lingo::Attendee::TextReader for STDIN.
lingoctl demo reports successful initialization.
Russian localization for Lingo::Web. (Yulia Dorokhova, Thomas Müller)
Lingo::Web learned parameter hl to set UI language.
Lingo::Web displays the configuration in use.
Lingo::Srv accepts array of query strings in addition to single query
string.
Meeting config takes precedence over language config.
When dictionary entries are rejected during conversion, the location of the
reject file will be shown.
LIR record number defaults to the matched string in the absence of a capture group.
Optionally prevent Lingo from sorting any results by setting the
LINGO_NO_SORT environment variable.
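
The LIR record number fallback mentioned above (use the capture group if the pattern has one, otherwise the whole match) can be expressed in a few lines of plain Ruby. `record_id` is a hypothetical helper, and the sample record line and patterns are assumptions for illustration.

```ruby
# Hypothetical helper: extract a record number from a line.
# Uses the first capture group when the pattern defines one,
# otherwise falls back to the entire matched string.
def record_id(line, pattern)
  match = pattern.match(line)
  return nil unless match

  match[1] || match[0]
end

puts record_id('[00237.]', /^\[(\d+)\.\]/) # capture group present: 00237
puts record_id('[00237.]', /^\[\d+\.\]/)   # no capture group: [00237.]
```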

16 Sep 08:31

blackwinter

v1.8.3

ee8c83e

Fixed regression introduced in 1.8.2 where reading input from STDIN was no
longer possible.
Fixed regression introduced in 1.8.2 where Lingo would no longer run on Ruby
1.9.2.
Fixed length limit handling for multibyte characters in SDBM store.
Fixed encoding issue in SDBM store.
Fixed issue with BOM in config files.
Modified character handling to accept any Unicode letter (Alphabetic)
and digit (Decimal Number).
Modified Lingo::Attendee::Tokenizer to use only hard-coded tokenization
rules.
Modified Lingo::Attendee::VectorFilter option lexicals to be
case-sensitive.
Improved overall performance and memory usage; Lingo::Attendee::Sequencer
changed the order in which sequences are inserted into the stream.
Eliminated performance penalty caused by Lingo::Attendee::Abbreviator.
Added Russian language support. (Yulia Dorokhova, Thomas Müller)
Added fields option to Lingo::Attendee::TextReader to cut off field
labels; defaults to true in record (LIR) mode.
Added skip option to Lingo::Attendee::TextReader to skip lines matching
the given pattern.
Added src option to Lingo::Attendee::VectorFilter to print "source" part
of compounds.
Added lingosrv and lingoweb executables. The former provides a simple
HTTP endpoint with JSON output; the latter serves a demo web interface.
Refactored internal caching.
Made dependency on Ruby version >= 1.9.2 explicit.
Removed reporting facility (options --perfmon and --status).
Added --profile option to collect profiling information while running.
Deprecated Lingo::Language::Grammar option compositum (now compound),
Lingo::Config option textreader (now text_reader), and
Lingo::Attendee::TextReader option lir-record-pattern (now records);
they will be removed in Lingo 1.9.
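
The character handling change above (accept any Unicode letter with the Alphabetic property and any digit in the Decimal Number category) maps directly onto Ruby's Unicode property classes. The pattern below is an illustration of those two properties, not the rule Lingo's tokenizer actually uses; the constant name `WORD_CHARS` is made up.

```ruby
# Runs of Unicode letters (Alphabetic) and decimal digits (Nd).
# Illustrative pattern only; not taken from Lingo's tokenizer.
WORD_CHARS = /[\p{Alphabetic}\p{Nd}]+/

tokens = 'Größe 42, Москва, ४!'.scan(WORD_CHARS)
puts tokens.inspect
```

Punctuation and whitespace fall outside both properties, so they act as separators, while non-ASCII letters and non-Latin digits are kept.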

16 Sep 08:32

blackwinter

v1.8.2

3a0199d

Performance improvements regarding Lingo::Attendee::VectorFilter (as well
as Lingo::Attendee::NonewordFilter) memory usage; set sort: false
in the config.
Added Lingo::Attendee::Stemmer (implementing Porter's algorithm for suffix
stripping).
Added progress reporting to Lingo::Attendee::TextReader; set progress: true in the config.
Added directory and glob processing to Lingo::Attendee::TextReader (new
options glob and recursive).
Renamed Lingo::Attendee::TextReader option lir-record-pattern to
records.
Fixed Lingo::Attendee::Debugger to forward all objects so it can be
inserted between any two attendees in the config.
Fixed regression introduced in 1.8.0 where Lingo would not use existing
compiled dictionary when source file is not present.
Fixed "invalid byte sequence in UTF-8" on Windows for SDBM store.
Enabled pluggable (compiled) dictionaries and storage backends.
Extensive internal refactoring and cleanup. (Finished for now.)
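
Directory and glob processing, as added to TextReader above via the glob and recursive options, can be approximated with `Dir.glob`. The helper `input_files` and its default pattern `*.txt` are assumptions for this sketch; the notes name the options but not their defaults.

```ruby
# Hypothetical helper: expand a path into the list of files to read.
# A plain file is returned as-is; a directory is searched with `glob`,
# descending into subdirectories only when `recursive` is true.
def input_files(path, glob: '*.txt', recursive: false)
  return [path] unless File.directory?(path)

  pattern = recursive ? File.join(path, '**', glob) : File.join(path, glob)
  Dir.glob(pattern).select { |file| File.file?(file) }.sort
end

puts input_files('.', glob: '*.txt', recursive: true).inspect
```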
