Releases: Yoctol/uttut

1.4.0: Operator configs

15 Apr 04:22
  1. Pipe now has a steps property: a list of Operators.
  2. Operator now has a configs property.
  3. Add a new Operator: AddEndToken.

1.3.4: Forget to register new operation

13 Mar 07:11

[Bug fix]: Register the new operation CustomWordTokenizer, which had not been registered when it was added.

1.3.3: Custom Word Tokenizer

13 Mar 06:47

Add a new operation: CustomWordTokenizer

This operation tokenizes the input string according to a given set of user words.
Any substring of the input that matches a user word is kept as a single token.
Every other part of the input is split into individual characters.
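
The behaviour described above can be sketched in plain Python. This is an illustration only, not uttut's implementation: the function name tokenize_with_user_words is hypothetical, and the rule that the longest matching user word wins is an assumption the release note does not state.

```python
from typing import List, Sequence


def tokenize_with_user_words(text: str, user_words: Sequence[str]) -> List[str]:
    """Split text into user-word tokens and single characters.

    Hypothetical helper, not part of uttut. Assumes that when several
    user words match at the same position, the longest one wins.
    """
    # Try longer words first so that "abc" is preferred over "ab".
    candidates = sorted((w for w in set(user_words) if w), key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for word in candidates:
            if text.startswith(word, i):
                # A user word starts here: keep it as one token.
                tokens.append(word)
                i += len(word)
                break
        else:
            # No user word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens
```

With user words "like" and "taipei", the input "I like taipei" would come out as the tokens "I", " ", "like", " ", "taipei" under these assumptions.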

1.3.2: Token2IndexwithHash

12 Mar 06:47

Add a new operation: Token2IndexwithHash.

This operation maps input tokens to indices using a given token2index dictionary.
If a token is not in the dictionary, the token is hashed and the hash modulo
the size of the dictionary is used as its index.
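
A minimal sketch of this lookup-with-fallback idea, assuming nothing about uttut's actual code: the function name tokens_to_indices is hypothetical, and MD5 is chosen here only because it gives the same value on every run (Python's built-in hash() is salted per process for strings). The release note does not say which hash function the operation uses.

```python
import hashlib
from typing import Dict, List, Sequence


def tokens_to_indices(tokens: Sequence[str], token2index: Dict[str, int]) -> List[int]:
    """Map tokens to indices, hashing tokens that are not in the dictionary.

    Hypothetical helper, not part of uttut. The hash function (MD5) is an
    assumption made for reproducibility.
    """
    size = len(token2index)
    if size == 0:
        raise ValueError("token2index must not be empty")
    indices: List[int] = []
    for token in tokens:
        if token in token2index:
            # Known token: use the index from the dictionary.
            indices.append(token2index[token])
        else:
            # Unknown token: stable hash, reduced modulo the dictionary size.
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            indices.append(int(digest, 16) % size)
    return indices
```

If the dictionary's indices run from 0 to size - 1, an unknown token always shares its index with some known token, so this scheme trades a dedicated unknown-token index for a spread of unknown tokens over the existing index range.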

1.3.1: New Op PunctuationExceptEndpointToWhitespace

25 Feb 04:20

Main modification:
Added a new Operator: PunctuationExceptEndpointToWhitespace.

1.3.0: Faster Pipe

13 Feb 08:04
  1. The Operator interface has major breaking changes.
    • The inputs of Operator.transform have changed. (output_sequence, labels -> output_sequence)
    • The outputs of Operator.transform have changed.
      (output_sequence, output_labels, realigner -> output_sequence, label_aligner)
  2. LabelAligner replaces Realigner.
  3. Add a lighter Pipe transformation: transform_sequence.
  4. Cythonize the elements of edit, including Replacement, ReplacementGroup, Span and SpanGroup.
  5. Cythonize label propagation.
  6. Add documentation for Transformer.

Note: changes 1 to 5 can cut the time spent on Pipe transformation by more than 80%.

1.2.0: Bert Pipe

24 Jan 09:15
  1. Add building blocks for BERT tokenizer construction, including AddWhitespaceAroundCJK, AddWhitespaceAroundPunctuation, MergeWhiteSpaceCharacters, StripWhiteSpaceCharacters,
    StripAccentToken, WhiteSpaceTokenizer and SpanSubwords.
  2. Bert pipes are created in uttut/pipeline/bert/.

1.1.0: UttPipe can add checkpoints

17 Jan 04:29
  1. Implement Operator.__eq__: Operators can be compared for equality.
  2. Refactor the common tests for Operators so that new test functions are easier to add.
  3. Add Lowercase Operator: converts characters to lowercase.
  4. Add checkpoints: a uttut Pipe can output intermediate results when checkpoints are added.
    For example:
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('op_1', checkpoint='result_of_1')
>>> p.add('op_2')

>>> _, _, _, _, intermediate = p.transform(...)
>>> intermediate.get_from_checkpoint('result_of_1')
# returns the intermediate result of op_1, including output_sequence and entity_labels

1.0.0: UttPipe

16 Jan 09:59
  1. Introduced bian (edit) for recording how a sequence changes.
  2. Created Operators for sequence modification.
  3. Created Pipe for connecting a sequence of operators to do sequence transformation.

Merged PRs:
#58 #46 #54 #53 #56 #55 #50 #48 #51 #49 #47 #44 #45 #42
#41 #40 #38 #37 #33 #35 #36 #32 #34 #31 #27 #21 #20 #17
#16 #15 #13 #14

0.6.0

29 Oct 06:32
504c219

#11 Breaking change to expand_by_entities.

  • Users now have to specify the sampling method.

Before 0.6.0:

expand_by_entity(
    datum,
    augment_method,
    augment_kwargs,
    include_replacements,
    include_orig,
)

After:

expand_by_entity(
    datum,
    sampling_method=lambda n_combinations: list(range(n_combinations)),
    include_orig=False,
)
  • The entities of the returned datum have no replacements.
  • include_orig
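
As an illustration of a custom sampling method, the sketch below builds a callable that keeps at most k randomly chosen combinations. It assumes, based only on the default lambda shown above, that sampling_method receives the number of combinations and returns a list of the combination indices to keep. The helper name make_random_sampler and its parameters k and seed are hypothetical and not part of uttut.

```python
import random
from typing import Callable, List


def make_random_sampler(k: int, seed: int = 0) -> Callable[[int], List[int]]:
    """Build a sampling method that keeps at most k combination indices.

    Hypothetical helper, not part of uttut. Assumes the sampling method is
    called with n_combinations and must return a list of indices.
    """
    rng = random.Random(seed)  # private generator, so results are reproducible

    def sampling_method(n_combinations: int) -> List[int]:
        # Never ask for more indices than there are combinations.
        n_keep = min(k, n_combinations)
        return sorted(rng.sample(range(n_combinations), n_keep))

    return sampling_method
```

Under that assumption, the result could be passed as sampling_method=make_random_sampler(3) in the call to expand_by_entity, in place of the lambda that keeps every combination.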