Add tests for utility functions (resolves #50) #51

m-charlton · 2023-09-21T12:24:16Z

Contributor checklist

This pull request is on a separate branch and not the main branch

Description

This PR addresses #50 by adding unit tests for the utils module as well as a few minor refactors:

- Added type annotations for most of the functions for checking by mypy
- Tinkered with error messages
- Added error checking in functions get_language_words_to_ignore() & get_language_words_to_remove(). This puts them in line with other functions in the module.
- Used assertCountEqual(), where list comparisons are involved, as I'm assuming that list order is not important. Please correct me if I'm wrong.
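To illustrate the assertCountEqual() point, here is a minimal sketch of how it differs from assertEqual() on lists. The test class name and the language lists are made up for illustration and are not taken from the actual test suite.

```python
import unittest


class ListComparisonExample(unittest.TestCase):
    """Hypothetical test case; names and data are illustrative only."""

    def test_order_is_ignored(self):
        expected = ["French", "German", "Spanish"]
        actual = ["Spanish", "French", "German"]
        # Passes: same elements with the same counts, order is ignored.
        self.assertCountEqual(actual, expected)
        # assertEqual fails on the same data because the order differs.
        with self.assertRaises(AssertionError):
            self.assertEqual(actual, expected)

    def test_duplicates_still_matter(self):
        # Element counts are compared, so a repeated item is a mismatch.
        with self.assertRaises(AssertionError):
            self.assertCountEqual(["French", "French"], ["French"])
```

If list order does turn out to matter for any of these functions, swapping back to assertEqual() in those tests would be the only change needed.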

All these changes are in the second commit.

The first commit is the removal of the add_num_commas() & num_add_commas() functions

Although code coverage is 100%, I'm looking for feedback on test coverage, especially for check_command_line_args() & check_and_return_command_line_args().

Related issue

Noticed that there is a lot of duplication of the same language data distributed throughout utils.py. Is there any interest in moving this data out to, say, a JSON file? This file would be loaded once on module import and then the utils functions could interrogate this loaded object.

I'm willing to go into more detail and/or write a PR.
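As a rough sketch of the idea, assuming a schema with one entry per language: the JSON layout, the key names, the placeholder words and the _get_entry() helper are all hypothetical, and the JSON is inlined as a string here only to keep the example self-contained. In the real module it would be read from a file next to utils.py.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema and placeholder data, not the project's real values.
EXAMPLE_JSON = """
{
  "english": {"words_to_remove": ["of", "the"], "words_to_ignore": []},
  "french": {"words_to_remove": ["of", "the"], "words_to_ignore": ["XXe"]}
}
"""

# Parsed once at import time; every function below shares this object.
_LANGUAGE_METADATA: Dict[str, Dict[str, Any]] = json.loads(EXAMPLE_JSON)


def _get_entry(language: str) -> Dict[str, Any]:
    """Look up a language, raising one consistent error for unknown ones."""
    try:
        return _LANGUAGE_METADATA[language.lower()]
    except KeyError:
        raise ValueError(f"{language} is not a supported language.") from None


def get_language_words_to_remove(language: str) -> List[str]:
    return _get_entry(language)["words_to_remove"]


def get_language_words_to_ignore(language: str) -> List[str]:
    return _get_entry(language)["words_to_ignore"]
```

With this layout, adding a language would mean adding one JSON entry rather than touching each function, and the error handling would live in a single place.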

f-strings can format numbers to use a comma as a thousands separator. The `add_num_commas` & `num_add_commas` functions are now redundant.
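For reference, a minimal example of the f-string formatting that replaces them. The format_count() wrapper is a hypothetical name used only for illustration; in practice the format specifier can be used inline.

```python
def format_count(value: int) -> str:
    """Hypothetical helper showing the built-in thousands separator."""
    # The "," format specifier inserts a comma every three digits.
    return f"{value:,}"


print(format_count(1234567))  # 1,234,567
print(f"{1234.5:,.2f}")  # 1,234.50
```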

* Unit tests for `utils` module
* Edit some error messages
* Add type annotations for `mypy` checks

github-actions · 2023-09-21T12:24:41Z

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!

Maintainer checklist

The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution
- The contributor's name and icon in remote commits should be the same as what appears in the PR
- If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo
The CHANGELOG has been updated with a description of the changes for the upcoming release (if necessary)

m-charlton · 2023-09-21T14:36:21Z

The comment on line 199 of tests/load/test_update_utils.py needs to be removed. Slipped through final review.

andrewtavis · 2023-09-21T14:52:17Z

Thanks for sending this along, @m-charlton! I’ll try to get to the review in the coming days :)

andrewtavis · 2023-09-26T13:33:10Z

Hey @m-charlton 👋 Just FYI I’m a bit under the weather, so the review will take a bit longer. Apologies!

m-charlton · 2023-09-26T16:25:19Z

No worries. Get well soon. In the meantime I'll start to have a look at #48.

andrewtavis · 2023-09-26T16:40:04Z

Thanks so much, @m-charlton! :) Happy to answer any Wikidata related questions if needed 😊

wkyoshida

Is there any interest in moving this data out to say a JSON file? This file would be loaded once on module import and then the utils functions could interrogate this loaded object.

You know what? I do like the idea actually 🤔 I'd be fine with having an issue for this (we can hash out the details there) and then an accompanying PR 🙌🚀

wkyoshida · 2023-10-13T01:40:48Z