-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[search/save] Run bystro-stats on saved annotations #321
[search/save] Run bystro-stats on saved annotations #321
Conversation
@@ -1,5 +1,5 @@ | |||
[build-system] | |||
requires = ["maturin>=0.14,<0.15", "setuptools", "wheel", "Cython"] | |||
requires = ["maturin>=1.3.0,<1.4.0", "setuptools", "wheel", "Cython"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this we get warnings on building the python project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we just fold that into a code comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's caused by a mismatch between the version in requirements.txt and pyproject.toml. Do you think that's a connection that needs to be drawn (that the packages in [build-system] are required to be in your environment?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have full context here but in general whenever we're pinning on a minor release I've never regretted documenting the reason :). If there's a version mismatch between requirements.txt and pyproject.toml, optimally we'd resolve the mismatch, but if we can't it's probably worth jotting down somewhere why they have to differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work on the tests-- would recommend revising from_dict
methods to require valid dicts or raise ValueError in order to avoid silent failures on typos, pls consider everything else non-blocking
@@ -191,7 +199,7 @@ def go( # pylint:disable=invalid-name | |||
output_dir = os.path.dirname(job_data.outputBasePath) | |||
basename = os.path.basename(job_data.outputBasePath) | |||
pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) | |||
outputs = AnnotationOutputs.from_path(output_dir, basename, True) | |||
outputs, stats = AnnotationOutputs.from_path(output_dir, basename, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would recommend passing the boolean as a kwarg, i.e.AnnotationOutputs.from_path(output_dir, basename, compress=True)
ret = subprocess.call( | ||
f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * --remove-files', # noqa: E501 | ||
f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * && rm {annotation_path}', # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for completeness we'd want to substitute the name of the tar executable here as well
return StatisticsConfig() | ||
|
||
if "outputExtensions" in stats_config: | ||
stats_config["outputExtensions"] = StatisticsOutputExtensions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it safe to mutate stats_config in place here? I would expect a method named from_dict
not to mutate its arguments, especially because it's also returning a value
|
||
|
||
def test_from_dict_no_arg(): | ||
config = DelimitersConfig.from_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe there's some use case I'm not seeing but if I saw this line in application code it would attract my attention as a possible bug, especially when one could just call DelimitersConfig()
instead
|
||
def test_from_dict_no_delimiters_key(): | ||
config_dict = {"random_key": "random_value"} | ||
config = DelimitersConfig.from_dict(config_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same concern here: I'd expect this code to fail with ValueError, especially when you consider the possibility of typos
|
||
def get_delimiters(annotation_config: dict[str, Any] | None = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this method necessary, or would it be possible to consistently refer to delimiters with a DelimitersConfig throughout? It seems less error-prone to have a SSOT about what the delimiters are, if we can pull that off here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was necessary to avoid touching other code that relies on the dict form of DelimitersConfig, but I can fix here.
def test_get_config_file_path_no_path_found(mocker): | ||
mocker.patch( | ||
"bystro.search.utils.annotation.glob", return_value=[] | ||
) # Change `your_module` to the actual module name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment still in force? looks like it's now redundant [?]
Partially addresses #314