Releases · huggingface/evaluate
v0.4.3
This release adds support for datasets>=3.0 by removing calls to deprecated code.
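As a quick illustration of the compatibility claim, a minimal smoke-test sketch (the version check and metric choice are assumptions for illustration, not part of the release notes):

```python
# Assumed setup: evaluate 0.4.3 installed alongside datasets>=3.0
from packaging import version

import datasets
import evaluate

# evaluate 0.4.3 no longer calls code removed in datasets 3.0
assert version.parse(datasets.__version__) >= version.parse("3.0.0")

accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))
# -> {'accuracy': 0.75}
```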
What's Changed
- Fix CI with temporary pin nltk<3.9 by @albertvillanova in #623
- Replace deprecated use_auth_token with token by @albertvillanova in #621
- remove ignore_url_params by @lhoestq in #624
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed
- Update the documentation and citation of mauve by @krishnap25 in #416
- Remove unused dependency by @daskol in #507
- Add confusion matrix by @osanseviero in #528
- Update python to 3.8 by @qubvel in #571
- Fix FileFreeLock by @lhoestq in #578
- Fix example doc in load function by @alexrs in #575
- Speeding up mean_iou metric computation by @qubvel in #569
New Contributors
- @rtrompier made their first contribution in #510
- @daskol made their first contribution in #507
- @qubvel made their first contribution in #571
- @alexrs made their first contribution in #575
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
- Add code example to docstrings by @stevhliu in #374
- [Minor fix] Typo by @cakiki in #403
- [Docs] fixed a typo in bertscore readme by @hazrulakmal in #386
- Add max_length kwarg to docstring of Perplexity measurement by @kdutia in #411
- Fix minor typo in a_quick_tour.mdx by @tupini07 in #417
- Fix Docs base_evaluator.mdx by @jorahn in #418
- Update Gradio description to clarify text-based input by @BramVanroy in #427
- fix `add` method by @hazrulakmal in #424
- Fix broken link in docs/a_quick_tour.mdx by @tupini07 in #419
- resolve #379 audio classification evaluator + docs by @Plutone11011 in #405
- fixed kwargs not being passed in combine by @Plutone11011 in #425
- add r^2 metric by @TKaanKoc in #407
- Update spaces gradio version to 3.19.1 by @BramVanroy in #426
- replace evaluate DownloadConfig with datasets by @lvwerra in #447
- Render Text2TextGenerationEvaluators' docstring examples by @mariosasko in #463
- Trigger CI on ci-* branches by @Wauplin in #467
- Update comet by @ricardorei in #443
- Fix `datasets` import in Meteor metric by @mariosasko in #490
- fix scikit-learn package name suggestion by @bzz in #498
- Release: 0.4.1 by @lhoestq in #505
New Contributors
- @cakiki made their first contribution in #403
- @hazrulakmal made their first contribution in #386
- @kdutia made their first contribution in #411
- @tupini07 made their first contribution in #417
- @jorahn made their first contribution in #418
- @Plutone11011 made their first contribution in #405
- @TKaanKoc made their first contribution in #407
- @mariosasko made their first contribution in #463
- @Wauplin made their first contribution in #467
- @ricardorei made their first contribution in #443
- @bzz made their first contribution in #498
- @lhoestq made their first contribution in #505
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed
- add trainer integration docs by @lvwerra in #325
- Stop using model-defined truncation in perplexity calculation by @mathemakitten in #333
- Don't use eval for Evaluator instances in the doc by @fxmarty in #341
- fix caching by @lvwerra in #336
- Fix #327 set default row of gradio webui to 1 and drop empty/blank row by @Raibows in #335
- Update pr docs actions by @mishig25 in #344
- Fix `scikit-learn` install in spaces by @lvwerra in #345
- added MASE, sMAPE and MAPE metrics by @kashif in #330
- fix sklearn dependency in mape, mase and smape by @lvwerra in #346
- Update link text by @stevhliu in #360
- Corrected range of MAE by @clefourrier in #359
- Revert "Update pr docs actions" by @mishig25 in #363
- Evaluation suite by @mathemakitten in #337
- Matthews correlation coefficient by @sanderland in #362
- fix tf version by @lvwerra in #372
- Add TextGeneration Evaluator by @NimaBoscarino in #350
- Fix typo in rouge types by @davebulaval in #364
- Add `Evaluate` usage for `scikit-learn` by @awinml in #368
- Adding metric visualization by @sashavor in #342
- Add NIST metric by @BramVanroy in #250
- add GitHub Actions CI by @lvwerra in #375
- Add Evaluate Usage for Keras and Tensorflow by @arjunpatel7 in #370
- fix version by @lvwerra in #380
- CharacTER: MT metric by @BramVanroy in #286
- CharCut: another character-based MT evaluation metric by @BramVanroy in #290
- asr model evaluator addition + doc by @bayartsogt-ya in #378
- Docs for EvaluationSuite by @mathemakitten in #340
- Update the documentation of Mauve by @krishnap25 in #377
- fix-ci-badge by @lvwerra in #385
New Contributors
- @Raibows made their first contribution in #335
- @kashif made their first contribution in #330
- @clefourrier made their first contribution in #359
- @davebulaval made their first contribution in #364
- @awinml made their first contribution in #368
- @arjunpatel7 made their first contribution in #370
- @bayartsogt-ya made their first contribution in #378
- @krishnap25 made their first contribution in #377
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- add multilabel f1 eval usage by @fcakyon in #221
- Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
- Unpin rouge_score by @albertvillanova in #220
- Remove import statement in Measurement Card by @meg-huggingface in #231
- make rouge support multi-ref by @lvwerra in #229
- Fix enforce string by @lvwerra in #230
- Fix examples in perplexity measurement docs by @mathemakitten in #238
- Add Wilcoxon's signed rank test by @douwekiela in #237
- Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
- fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
- Minor quicktour doc suggestions by @stevhliu in #236
- Clarify error message for ChrF no. references by @BramVanroy in #247
- only track unique missing dependencies by @BramVanroy in #246
- Update evaluate in spaces by @lvwerra in #228
- add `commit_hash` to args by @lvwerra in #253
- Change perplexity to be calculated with base e by @mathemakitten in #242
- Rebase for previous PR by @mathemakitten in #254
- Fix docstrings with new perplexities with base e by @mathemakitten in #255
- add a tokenizer option to rouge by @lvwerra in #258
- Adding list_duplicates=True to example. by @meg-huggingface in #263
- Minor change in describing what this does. by @meg-huggingface in #267
- Mapping example output to returned output. by @meg-huggingface in #268
- Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
- Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
- Add slow flag to two column parity test by @lvwerra in #273
- Remove `handle_impossible_answer` from the default `PIPELINE_KWARGS` in the question answering evaluator by @fxmarty in #272
- Toxicity Measurement by @sashavor in #262
- Automatically choose dataset split if none provided by @mathemakitten in #232
- Fix YAML in Toxicity by @lvwerra in #278
- Added metric Brier Score by @kadirnar in #275
- Check for mismatch in device setup in evaluator by @mathemakitten in #287
- Fix transformers import in the evaluator by @mathemakitten in #291
- Add support for name field when loading data by @mathemakitten in #283
- Adding regard measurement by @sashavor in #271
- Raise exception instead of assert in BertScore by @BramVanroy in #292
- fix regard yaml by @lvwerra in #295
- Add CONTRIBUTING.md by @mathemakitten in #293
- Refactor kwargs and configs by @lvwerra in #188
- Revert "Refactor kwargs and configs" by @lvwerra in #299
- Add missing `split` and `subset` kwargs to other evaluators by @mathemakitten in #301
- Adding HONEST score by @sashavor in #279
- fix wrong sorting in check by @sanderland in #305
- Fix HONEST yaml by @lvwerra in #303
- Refactor current_features to selected_feature_format by @mathemakitten in #306
- replace datasets list with local list of tasks by @lvwerra in #309
- Adding torch to the requirements by @sashavor in #311
- Honest space fix by @sashavor in #312
- Use HTML relative paths for tiles by @lewtun in #318
- Test for valid YAML files by @mathemakitten in #308
- add versioning to the `HubEvaluationModuleFactory` by @lvwerra in #314
- Add text2text evaluator by @lvwerra in #261
- try main if tag does not work by @lvwerra in #322
New Contributors
- @fcakyon made their first contribution in #221
- @meg-huggingface made their first contribution in #231
- @stevhliu made their first contribution in #236
- @kadirnar made their first contribution in #275
- @sanderland made their first contribution in #305
Full Changelog: v0.2.2...v0.3.0
v0.2.2
v0.2.1
What's Changed
- Add measurements to quality and style checks by @lvwerra in #203
- Add comparisons and measurements to code quality tests by @lvwerra in #204
- Remove mention to datasets from docs by @albertvillanova in #207
- Adding label distribution measurement by @sashavor in #202
- Fix spaces tagging by @lvwerra in #217
- set datasets to >=2.0.0 by @lvwerra in #216
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's New
evaluator
The `evaluator` has been extended to three new tasks:
- "image-classification"
- "token-classification"
- "question-answering"
combine
With `combine` one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the `evaluator`.
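A short sketch of both features, following the patterns in the evaluate docs (the metric names, model checkpoint, and dataset below are illustrative assumptions):

```python
import evaluate
from evaluate import evaluator

# combine: bundle several metrics into a single object computed in one call
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1]))

# evaluator: task-specific evaluation, e.g. the new "question-answering" task
qa_evaluator = evaluator("question-answering")
results = qa_evaluator.compute(
    model_or_pipeline="distilbert-base-cased-distilled-squad",  # assumed checkpoint
    data="squad",
    metric="squad",
)
print(results)
```

As the notes say, the combined object can also be used together with the evaluator, e.g. passed as its `metric` argument in place of a single metric.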
What's Changed
- Fix typo in WER docs by @pn11 in #147
- Fix rouge outputs by @lvwerra in #158
- add tutorial for custom pipeline by @lvwerra in #154
- refactor `evaluator` tests by @lvwerra in #155
- rename `input_texts` to `predictions` in perplexity by @lvwerra in #157
- Add link to GitHub author by @lewtun in #166
- Add `combine` to compose multiple evaluations by @lvwerra in #150
- test string casting only on first element by @lvwerra in #159
- remove unused fixtures from unittests by @lvwerra in #170
- Add a test to check that Evaluator evaluations match transformers examples by @fxmarty in #163
- Add smaller model for `TextClassificationEvaluator` test by @fxmarty in #172
- Add tags to spaces by @lvwerra in #162
- Rename evaluation modules by @lvwerra in #160
- Update push_evaluations_to_hub.py by @lvwerra in #174
- update evaluate dependency for spaces by @lvwerra in #175
- Add `ImageClassificationEvaluator` by @fxmarty in #173
- attempting to let meteor handle multiple references per prediction by @sashavor in #164
- fixed duplicate calculation of spearmanr function in metrics wrapper. by @benlipkin in #176
- forbid hyphens in template for module names by @lvwerra in #177
- switch from Github to Hub module factory for canonical modules by @lvwerra in #180
- Fix bertscore idf by @lvwerra in #183
- refactor evaluator base and task classes by @lvwerra in #185
- Avoid importing tensorflow when importing evaluate by @NouamaneTazi in #135
- Add QuestionAnsweringEvaluator by @fxmarty in #179
- Evaluator perf by @ola13 in #178
- Fix QuestionAnsweringEvaluator for squad v2, fix examples by @fxmarty in #190
- Rename perf metric evaluator by @lvwerra in #191
- Fix typos in QA Evaluator by @lewtun in #192
- Evaluator device placement by @lvwerra in #193
- Change test command in installation.mdx to use exact_match by @mathemakitten in #194
- Add `TokenClassificationEvaluator` by @fxmarty in #167
- Pin rouge_score by @albertvillanova in #197
- add poseval by @lvwerra in #195
- Combine docs by @lvwerra in #201
- Evaluator column loading by @lvwerra in #200
- Evaluator documentation by @lvwerra in #199
New Contributors
- @pn11 made their first contribution in #147
- @fxmarty made their first contribution in #163
- @benlipkin made their first contribution in #176
- @NouamaneTazi made their first contribution in #135
- @mathemakitten made their first contribution in #194
Full Changelog: v0.1.2...v0.2.0
v0.1.2
What's Changed
- Fix trec sacrebleu by @lvwerra in #130
- Add distilled version COMETinho by @BramVanroy in #131
- fix: add yaml extension to github action for release by @lvwerra in #133
- fix docs badge by @lvwerra in #134
- fix cookiecutter path to repository by @lvwerra in #139
- docs: make metric cards more prominent by @lvwerra in #132
- Update README.md by @sashavor in #145
- Fix datasets download imports by @albertvillanova in #143
New Contributors
- @BramVanroy made their first contribution in #131
- @albertvillanova made their first contribution in #143
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- Fix broken links by @mishig25 in #92
- Fix readme by @lvwerra in #98
- Fixing broken evaluate-measurement hub link by @panwarnaveen9 in #102
- fix typo in autodoc by @manueldeprada in #101
- fix typo by @manueldeprada in #100
- FIX `pip install evaluate[evaluator]` by @philschmid in #103
- fix description field in metric template readme by @lvwerra in #122
- Add automatic pypi release for evaluate by @osanseviero in #121
- Fix typos in Evaluator docstrings by @lewtun in #124
- Fix spaces description in metadata by @lvwerra in #123
- fix revision string if it is a python version by @lvwerra in #129
- Use accuracy as default metric for text classification Evaluator by @lewtun in #128
- bump `evaluate` dependency in spaces by @lvwerra in #88
New Contributors
- @panwarnaveen9 made their first contribution in #102
- @manueldeprada made their first contribution in #101
- @philschmid made their first contribution in #103
- @osanseviero made their first contribution in #121
- @lewtun made their first contribution in #124
Full Changelog: v0.1.0...v0.1.1