Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[search/save] Run bystro-stats on saved annotations #321

Merged
merged 6 commits into from
Oct 25, 2023

Conversation

akotlar
Copy link
Collaborator

@akotlar akotlar commented Oct 24, 2023

  • Add running bystro-stats on annotation output
  • Add tests for search/utils/annotation
  • Fix startup.yml and ensure proteomics server is defined in beanstalkd.yml

Partially addresses #314

@@ -1,5 +1,5 @@
[build-system]
requires = ["maturin>=0.14,<0.15", "setuptools", "wheel", "Cython"]
requires = ["maturin>=1.3.0,<1.4.0", "setuptools", "wheel", "Cython"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this we get warnings on building the python project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just fold that into a code comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's caused by a mismatch between the version in requirements.txt and pyproject.toml. Do you think that's a connection that needs to be drawn (that the packages in [build-system] are required to be in your environment?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have full context here but in general whenever we're pinning on a minor release I've never regretted documenting the reason :). If there's a version mismatch between requirements.txt and pyproject.toml, optimally we'd resolve the mismatch, but if we can't it's probably worth jotting down somewhere why they have to differ.

Copy link
Contributor

@poneill poneill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work on the tests-- would recommend revising from_dict methods to require valid dicts or raise ValueError in order to avoid silent failures on typos, pls consider everything else non-blocking

@@ -191,7 +199,7 @@ def go( # pylint:disable=invalid-name
output_dir = os.path.dirname(job_data.outputBasePath)
basename = os.path.basename(job_data.outputBasePath)
pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
outputs = AnnotationOutputs.from_path(output_dir, basename, True)
outputs, stats = AnnotationOutputs.from_path(output_dir, basename, True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would recommend passing the boolean as a kwarg, i.e.AnnotationOutputs.from_path(output_dir, basename, compress=True)

ret = subprocess.call(
f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * --remove-files', # noqa: E501
f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * && rm {annotation_path}', # noqa: E501
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for completeness we'd want to substitute the name of the tar executable here as well

return StatisticsConfig()

if "outputExtensions" in stats_config:
stats_config["outputExtensions"] = StatisticsOutputExtensions(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it safe to mutate stats_config in place here? I would expect a method named from_dict not to mutate its arguments, especially because it's also returning a value



def test_from_dict_no_arg():
config = DelimitersConfig.from_dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there's some use case I'm not seeing but if I saw this line in application code it would attract my attention as a possible bug, especially when one could just call DelimitersConfig() instead


def test_from_dict_no_delimiters_key():
config_dict = {"random_key": "random_value"}
config = DelimitersConfig.from_dict(config_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same concern here: I'd expect this code to fail with ValueError, especially when you consider the possibility of typos


def get_delimiters(annotation_config: dict[str, Any] | None = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this method necessary, or would it be possible to consistently refer to delimiters with a DelimitersConfig throughout? It seems less error-prone to have a SSOT about what the delimiters are, if we can pull that off here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was necessary to avoid touching other code that relies on the dict form of DelimitersConfig, but I can fix here.

def test_get_config_file_path_no_path_found(mocker):
mocker.patch(
"bystro.search.utils.annotation.glob", return_value=[]
) # Change `your_module` to the actual module name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment still in force? looks like it's now redundant [?]

@akotlar akotlar merged commit 9a7584a into bystrogenomics:master Oct 25, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants