Release Notes 🎉

Pipelines integration!
- Pipelines are now used in text processing, which can be divided into tokenization, entity assignment, and frame assignment stages.
Repositories for opinions and network input samples!
Support for storage kernel customization for opinions and samples! Pandas is used by default.
Opinion-related services are now providers: pairs, opinions, text-opinions, etc.
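
To illustrate the idea of splitting text processing into pipeline stages, here is a minimal sketch. It does not use the AREkit API: the function names (`tokenize`, `assign_entities`, `assign_frames`, `run_pipeline`), the tagging format, and the toy rules are all hypothetical and only show how independent stages can be chained.

```python
from typing import Callable, List

# Hypothetical stage type (not an AREkit class): each stage receives
# a list of terms and returns a new list of terms.
Stage = Callable[[List[str]], List[str]]


def tokenize(terms: List[str]) -> List[str]:
    # Split every text chunk on whitespace.
    return [token for chunk in terms for token in chunk.split()]


def assign_entities(terms: List[str]) -> List[str]:
    # Toy rule (assumption): capitalized tokens are treated as entities.
    return [f"<E:{t}>" if t[:1].isupper() else t for t in terms]


def assign_frames(terms: List[str]) -> List[str]:
    # Toy lexicon (assumption): a fixed set of frame-evoking words.
    frames = {"supports", "criticizes"}
    return [f"<F:{t}>" if t in frames else t for t in terms]


def run_pipeline(text: str, stages: List[Stage]) -> List[str]:
    # Feed the output of each stage into the next one.
    terms = [text]
    for stage in stages:
        terms = stage(terms)
    return terms


result = run_pipeline("Alice supports Bob",
                      [tokenize, assign_entities, assign_frames])
print(result)  # ['<E:Alice>', '<F:supports>', '<E:Bob>']
```

Because each stage only depends on the list of terms it receives, stages can be reordered, removed, or replaced without touching the others.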

NOTE: issue #232 has been moved to the next release.
This version does not support parsing news from the RuAttitudes collection!
This will be fixed in the upcoming project.

Changelog

v0.22.0-rc (2022-03-17)

Changes

Implemented enhancements:

create_term_embedding -- Embedding algorithm based on parts requires useless check #298
UnitTests -- BertOntoNotes is no longer below the core processing #293
SingleLabelScaler -- provide [QUICK] #291
BRAT visualization -- support processing in case of multiple documents. #286
Entity -- IDs Refactoring #280
BaseSampleRowProvider -- provide sentence id #279
BRAT tool -- adopt ui as a callback for the predict pipeline #275
ExperimentIterationHandler -- add Labeled Output Samples conversion to OpinionCollection #270
InferenceContext -- split bags and samples extraction from a single method [Quick] #268
DataFolding -- organize united data folding. #267
BaseDataFolding -- iter_index is not related to the base implementation #266
DataFolding -- move into experiment context #264
DataIO (exp_data var) -- rename it to ExperimentContext #263
ExperimentIterationHandler (Callback before) -- organize ExperimentEvaluationCallback #262
NetworkCallback -- this callback should not inherit experiment base Callback #261
Neural Network Hidden states writers and providers refactoring #260
TrainingCallback -- separate onto TrainingTerminationCallback and HiddenWriterCallback classes. #259
BaseTensorflowModel -- simplify fit and predict operations. #258
LabeledCollection -- remove is_empty and reset_labels api #257
NetworkCallback -- move train/predict notification info into callback #256
Tensorflow saver -- move the related logic outside of the model implementation #255
DefaultSingleLabelAnnotationAlgorithm -- single label is not a part of the algo #244
ThreeScaleTaskAnnotator -- rename and move into core. #243
Data/output -- create pipelines directory with the related output processing #240
Examples -- document parsing executes twice #239
Pipeline implementation might be utilized #238
OpinionsProvider -- performs two actions, including ids assignation #236
entity_to_group_func -- BaseExperiment should not provide this method. #235
TextOpinionHelper -- to news/parsed/providers (implement the latter as a provider) #233
DefaultSingleLabelAnnotationAlgorithm -- iter_opinion duplicates the generalized pair opinion pair creation approach #231
Common languages dir -- move its contents into processing contrib. #229
Linked Text Opinions Refactoring. #228
Lemmatization should be a part of the frames processing pipeline stage #226
DefaultTextParser -- this class is actually a Tokenizer #225
News -- text-opinions provider and entities access API might be a part of a ParsedNews by means of NewsParser (new class) #224
StringLabelsFormatter -- switch to label_types instead of label instances. #223
AnnotationAlgorithm -- iter_opinions requires EntitiesCollection while the latter utilized for entities iteration #222
TextParseOptions -- add keep_tokens #221
FrameVariantsParser -- return modified terms only #218
FramesAnnotation -- is_inverted flag and processing should be a pipeline item #217
FramesCollection -- use FrameConnotationProvider instead #216
FrameVariantsParser -- move into processing subfolder. #215
OpinionOperations -- remove try_read_annotated_opinion_collection #213
DocumentOperation -- unify iter_doc_ids operation into one with tag parameter. #212
OpinionOperations -- move readers* into IO. #211
OpinionCollectionsProvider -- serialization should not be a part of this class #210
data -- separate data-related information from the experiment #209
BaseInputReader -- class stores _df, however it should be replaced with BaseRowsStorage #207
Repositories -- fill method should be a part of a storage rather than provider. #204
BaseStorage -- exclude save method into separated class BaseRowsWriter #202
Experiments -- rename formats to api (QUICK) #201
Embedding and Vocabulary -- organize Storage/Repository with serialize/load operations. #200
Sample -- remove dependency from DefaultNetworkConfig. #199
BaseOutputFormatter -- both provider and formatter mixes df usage #198
OpinionProvider -- remove dependency from Opinion and Document Operation instances. #197
Repositiories -- add this class which unites all the providers for data writing #195
Add column providers #194
NetworkSampleFormatter -- switch to provider #193
BaseSampleStorage -- use store_labels instead of data_type passing (QUICK) #192
NetworkOutputEncoder -- separate formatting from serialization. #191
BaseSampleFormatter -- __create_row is not related to the Formatter, should be moved. #190
BaseDocumentStatGenerator -- provider depends on IO files. #189
OpinonFormatter -- use the latter in experiment io. #188
News -- remove return_text parameter from iter_sentences method (QUICK) #187
BaseRowsFormatter -- move format method in another class #185
BaseSampleFormatter -- _iter_sentence_terms should not be a part of this class. (QUICK) #184
BaseSampleFormatter -- _provide_rows behavior depends on row_ids_provider instance type. #182
BaseSampleFormatter -- remove data_type parameter from ctor #181
BaseObjectParser -- parse method should return object of the same type as sentence #179
News -- remove entities_parser instance from News class. #178
BaseEntitiesParser -- generalize to BaseObjectsParser. #177
Provide SHA checksums utilization for downloaded resources. #176
OpinionCollectionsFormatter -- use it as instance, created within with block #175
BaseOutput -- move _csv_to_dataframe out of this class. #174
DataIO -- remove Stemmer instance #172
BaseRowsFormatter -- formatter_type_log_name method should be removed. #171
BaseOpinionsFormatter -- leave save method implementation for inheritor classes. #170
BaseSampleFormatter -- leave save method implementation for inheritor classes. #169
BaseIOUtils -- remove dependencies from file/(path) based data storage format #168
BaseIOUtils -- get_input_sample_filepath and get_input_opinions_filepath limit possible storage abilities. #166
perform_reading_and_initialization -- provide samples reader. #165
perform_reading_and_initialization -- remove dependency from doc_ops #164
NetworkInputSampleReader -- remove inheritance from TSV-based reader. #163
OpinionCollectionsFormatter -- use save_to and load_from notation for method names with source provider (file/archive/storage, etc.). #142
RuSentRelOpinionCollectionFormatter -- move all the opinion iteration during saving/loading into base class #141
news_id or doc_id -- normalize class and field names #133
embeddings subdir -- considered to be a part of networks contrib #132
Sentiment frame polarity (A0->A1) considered to be a part of the related experiment. #118
EnumServices -- provide a base class with string to Enum conversion functionality #117
EntityFormaters -- Move formatters into the particular experiment implementation #116
_create_parse_options -- remove this method from DocumentOperations across all the experiments. #112
NewsParseOptions -- provide these options for the particular DefaultParser derived from TextParser #111
TextParser -- Provide a separated class with a text processing algorithm implementation API #75
Providing all the logging information into log_utils.py #30

Fixed bugs:

ModuleNotFoundError: No module named 'arekit.common.data.input.providers.instances' #301
UnitTests -- Discard RuAttitudes-v1.2 support due to index out of range exception on reading #295
text_opinions_iter_pipeline -- ids assignments vary after multiple calls #278
EntitiesParser -- provide doc_level ids #277
DeepPavlovNER -- BertOntoNotes entities annotation [Treating string and list-based text representation simultaneously] #274
Examples -- get_index_by_term of Vocabulary failed #271
Annotator Performance -- keeps all possible pairs between entities. #253
Network SampleID -- has type unicode, but expected to be integer type #248
Example -- given two sentences results in samples of only last of them. #246
UnitTests -- Incorrect labels formatter (QUICK) #186
test_samples_iter.py -- incorrect API usage in Tensorflow contrib. #158

Closed issues:

Transfer examples folder into separated project [ARElight] #300
RuSentRel Experiment -- Text is lemmatized irrespective of the save_lemmas parameter in parser [OK] #297
Experiment -- refactor inference pipeline implementation #290
Example -- reorganize infer folder (experiment) #289
Experiment -- Organize pipeline stages as items of the BasePipeline #285
BaseSampleRowProvider -- provide entity values and entity types. [QUICK] #283
DeepPavlov NER -- adopt BERTontonotes. #272
NeuralNetworks -- graph and tf session should be initialized before the predict method call. #247
NewsServiceCollection -- implement #245
numpy 1.19.5 -- returns int64 by default #242
Organize unit tests for Output to Opinion conversion pipeline #241
Iter_opinions_collection -- complicated, considering pipeline processing instead #237
EntitiesCollection -- provide value_to_group function instead of SynonymsCollection. #230
BaseTextParser -- parse_news is not related to the text parsing concepts and should be a part of another class #220
DocumentOperations -- _get_text_parser should not be a part of this API #219
Create simple parser for text with mentioned [entities] #214
NetworkInputHelper -- performing serialize_missed_collections during writing process #208
RowIDs -- should be common for input and output #206
SampleRowBalancerHelper -- simplify by using pandas group sampling #203
convert_output_to_opinion_collections -- pass opinion reader into parameters. #167
Experiment -- Separate TSV-based formatter from the base one for samples and opinions #162
Switch to Python3.6 #160
RuSentRel Experiment Contrib -- update description #153
Provide Cache for data sources #151
SynonymsCollection considered in ReadOnly mode only #5

Merged pull requests:

0.21.1 rc #234 (nicolay-r)
0.21.1 rc #196 (nicolay-r)
0.21.0 rc #159 (nicolay-r)
0.21.0 rc #157 (nicolay-r)
0.21.0 rc #152 (nicolay-r)
