Releases: Yoctol/uttut
1.4.0: Operator configs
- Pipe supports steps property: List of Operator
- Operator supports configs property
- Add new Operator: AddEndToken
1.3.4: Forget to register new operation
[Bug fix]: The new operation CustomWordTokenizer was not registered; it is registered now.
1.3.3: Custom Word Tokenizer
Add a new operation: CustomWordTokenizer
This operation tokenizes the input string according to the given user words.
If a substring of the input string matches a user word, it is chunked as a single token.
Otherwise, the substring is tokenized as a list of characters.
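
The behaviour described above can be illustrated with a minimal standalone sketch. It is not the uttut implementation: the function name `tokenize_with_user_words` is hypothetical, and trying longer user words first when several match at the same position is an assumption that the release note does not state.

```python
from typing import Iterable, List


def tokenize_with_user_words(text: str, user_words: Iterable[str]) -> List[str]:
    """Keep matched user words as single tokens; split the rest into characters."""
    # Assumption: when several user words match at one position, prefer the longest.
    words = sorted({w for w in user_words if w}, key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            # No user word starts at this position: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize_with_user_words("I like uttut", ["like", "uttut"]))
# ['I', ' ', 'like', ' ', 'uttut']
```

Whitespace is kept as a character token here; whether the real operator does the same is not stated in the release note.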
1.3.2: Token2IndexwithHash
Add a new operation: Token2IndexwithHash.
This operation maps input tokens to indices using the given token2index dictionary.
If a token is not in the dictionary, the token is hashed and the hash is taken modulo
the size of the dictionary.
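
A rough sketch of the lookup-with-hash-fallback idea follows. The function name `token_to_index_with_hash` is hypothetical, and MD5 is used only as a convenient stable hash for illustration; the release note does not say which hash function uttut uses.

```python
import hashlib
from typing import Dict, List


def token_to_index_with_hash(tokens: List[str], token2index: Dict[str, int]) -> List[int]:
    """Map known tokens via the dictionary; hash unknown tokens into [0, size)."""
    size = len(token2index)
    if size == 0:
        raise ValueError("token2index must not be empty")
    indices: List[int] = []
    for token in tokens:
        if token in token2index:
            indices.append(token2index[token])
        else:
            # Assumption: MD5 serves as a deterministic hash for this sketch.
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            indices.append(int(digest, 16) % size)
    return indices


vocab = {"hello": 0, "world": 1, "<unk>": 2}
indices = token_to_index_with_hash(["hello", "unseen"], vocab)
print(indices[0])  # 0, looked up directly; indices[1] falls somewhere in 0..2
```

A hashed index can collide with the index of a known token, which is the usual trade-off of this kind of fallback.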
1.3.1: New Op PunctuationExceptEndpointToWhitespace
Main Modification:
Added new Operator PunctuationExceptEndpointToWhitespace
1.3.0: Faster Pipe
- The interface of Operator has a huge breaking change.
  - The inputs of Operator.transform are changed. (output_sequence, labels -> output_sequence)
  - The outputs of Operator.transform are changed. (output_sequence, output_labels, realigner -> output_sequence, label_aligner)
  - LabelAligner substitutes Realigner.
- Add lighter Pipe transformation: transform_sequence.
- Cythonize the elements of edit, including Replacement, ReplacementGroup, Span and SpanGroup.
- Cythonize label propagation.
- Add document for Transformer.
Note: modifications 1~5 can cut the time spent on Pipe transformation by more than 80%.
1.2.0: Bert Pipe
- Add building blocks for BERT tokenizer construction, including AddWhitespaceAroundCJK, AddWhitespaceAroundPunctuation, MergeWhiteSpaceCharacters, StripWhiteSpaceCharacters, StripAccentToken, WhiteSpaceTokenizer and SpanSubwords.
- Bert pipes are created in uttut/pipeline/bert/.
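
To give a feel for what some of these building blocks do, here is a plain-Python sketch of three of the steps (whitespace around CJK characters, accent stripping, whitespace tokenization). It does not use the uttut API: the function names are hypothetical, the CJK check covers only the main CJK Unified Ideographs block, and the functions work on plain strings rather than on uttut Operators with label alignment.

```python
import unicodedata
from typing import List


def is_cjk(char: str) -> bool:
    """True for the main CJK Unified Ideographs block (a simplification)."""
    return 0x4E00 <= ord(char) <= 0x9FFF


def add_whitespace_around_cjk(text: str) -> str:
    """Surround every CJK character with spaces so it becomes its own token."""
    return "".join(f" {c} " if is_cjk(c) else c for c in text)


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace, dropping empty tokens."""
    return text.split()


tokens = whitespace_tokenize(strip_accents(add_whitespace_around_cjk("café在台北")))
print(tokens)
# ['cafe', '在', '台', '北']
```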
1.1.0: UttPipe can add checkpoints
- Implement Operator __eq__: Operators can be compared.
- Refactor common tests for Operators: easier to extend with new test functions.
- Add Lowercase Operator: Convert characters to lowercase.
- Add checkpoints: the uttut pipe can output intermediate results by adding checkpoints.
For example,
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('op_1', checkpoint='result_of_1')
>>> p.add('op_2')
>>> _, _, _, _, intermediate = p.transform(...)
>>> intermediate.get_from_checkpoint('result_of_1')
# output the intermediate result of op_1 including output_sequence, entity_labels
1.0.0: UttPipe
- Introduced bian (edit) for recording the changing process of a sequence.
- Created Operators for sequence modification.
- Created Pipe for connecting a sequence of operators to do sequence transformation.
Merged PRs:
#58 #46 #54 #53 #56 #55 #50 #48 #51 #49 #47 #44 #45 #42
#41 #40 #38 #37 #33 #35 #36 #32 #34 #31 #27 #21 #20 #17
#16 #15 #13 #14
0.6.0
#11 Breaking change to expand_by_entities.
- Users have to specify the sampling method.
Before 0.6.0:
expand_by_entity(
datum,
augment_method,
augment_kwargs,
include_replacements,
include_orig,
)
After:
expand_by_entity(
datum,
sampling_method=lambda n_combinations: list(range(n_combinations)),
include_orig=False,
)
- The entities of the returned datum have no replacements.
include_orig