[search/save] Run bystro-stats on saved annotations #321

akotlar · 2023-10-24T03:26:24Z

Add running bystro-stats on annotation output
Add tests for search/utils/annotation
Fix startup.yml and ensure proteomics server is defined in beanstalkd.yml

Partially addresses #314

akotlar · 2023-10-24T03:27:28Z

python/pyproject.toml

@@ -1,5 +1,5 @@
 [build-system]
-requires = ["maturin>=0.14,<0.15", "setuptools", "wheel", "Cython"]
+requires = ["maturin>=1.3.0,<1.4.0", "setuptools", "wheel", "Cython"]


Without this we get warnings when building the Python project.

could we just fold that into a code comment?

It's caused by a mismatch between the versions in requirements.txt and pyproject.toml. Do you think that's a connection that needs to be drawn, i.e. that the packages in [build-system] are also required to be in your environment?

I don't have full context here but in general whenever we're pinning on a minor release I've never regretted documenting the reason :). If there's a version mismatch between requirements.txt and pyproject.toml, optimally we'd resolve the mismatch, but if we can't it's probably worth jotting down somewhere why they have to differ.

poneill

Nice work on the tests. I would recommend revising the from_dict methods to require valid dicts or raise ValueError, in order to avoid silent failures on typos. Please consider everything else non-blocking.

poneill · 2023-10-24T04:15:04Z

python/python/bystro/search/save/handler.py

@@ -191,7 +199,7 @@ def go(  # pylint:disable=invalid-name
    output_dir = os.path.dirname(job_data.outputBasePath)
    basename = os.path.basename(job_data.outputBasePath)
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
-    outputs = AnnotationOutputs.from_path(output_dir, basename, True)
+    outputs, stats = AnnotationOutputs.from_path(output_dir, basename, True)


nit: would recommend passing the boolean as a kwarg, i.e. AnnotationOutputs.from_path(output_dir, basename, compress=True)
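
A minimal sketch of how the signature itself could enforce this, assuming nothing about the real class: `Outputs` below is a hypothetical stand-in for `AnnotationOutputs`, its file suffixes are made up, and unlike the call in the diff it returns a single object rather than an `(outputs, stats)` pair. The only point is the bare `*`, which makes `compress` keyword-only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Outputs:
    """Hypothetical stand-in for AnnotationOutputs; only the call shape matters."""

    annotation: str

    @classmethod
    def from_path(cls, output_dir: str, basename: str, *, compress: bool = False) -> Outputs:
        # The bare `*` makes `compress` keyword-only, so a positional True is rejected.
        suffix = ".annotation.tsv.gz" if compress else ".annotation.tsv"  # illustrative suffixes
        return cls(annotation=f"{output_dir}/{basename}{suffix}")


outputs = Outputs.from_path("/tmp/out", "job1", compress=True)
```

With this signature, `Outputs.from_path("/tmp/out", "job1", True)` raises `TypeError`, so the unlabeled boolean can no longer appear at a call site.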

poneill · 2023-10-24T04:18:10Z

python/python/bystro/search/save/handler.py

        ret = subprocess.call(
-            f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * --remove-files', # noqa: E501
+            f'cd {output_dir}; tar --exclude ".*" --exclude={tarball_name} -cf {tarball_name} * && rm {annotation_path}',  # noqa: E501


for completeness we'd want to substitute the name of the tar executable here as well
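
One possible reading of this comment, sketched under assumptions: the helper name `build_tar_command` and the `tar_executable` parameter are hypothetical, and the `shlex.quote` calls are an addition that was not discussed in this thread. The tar flags are copied from the diff above.

```python
import shlex


def build_tar_command(
    output_dir: str,
    tarball_name: str,
    annotation_path: str,
    tar_executable: str = "tar",  # hypothetical parameter; e.g. "gtar" on some systems
) -> str:
    """Build the shell command from the diff, with the tar binary substituted in."""
    out = shlex.quote(output_dir)
    tarball = shlex.quote(tarball_name)
    return (
        f"cd {out}; {shlex.quote(tar_executable)} "
        f'--exclude ".*" --exclude={tarball} -cf {tarball} * '
        f"&& rm {shlex.quote(annotation_path)}"
    )


command = build_tar_command(
    "/tmp/out", "job.tar", "job.annotation.tsv.gz", tar_executable="gtar"
)
```

The resulting string would still need to be run through a shell (the `*` glob and `&&` depend on it), as the existing `subprocess.call` presumably does.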

poneill · 2023-10-24T04:27:07Z

python/python/bystro/search/utils/annotation.py

+            return StatisticsConfig()
+
+        if "outputExtensions" in stats_config:
+            stats_config["outputExtensions"] = StatisticsOutputExtensions(


is it safe to mutate stats_config in place here? I would expect a method named from_dict not to mutate its arguments, especially because it's also returning a value
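
A minimal sketch of a non-mutating variant, assuming simplified stand-ins: `OutputExtensions` and `StatsConfig` below are hypothetical reductions of `StatisticsOutputExtensions` and `StatisticsConfig`, and the single `json` field is invented for illustration. The idea is to copy the incoming dict before replacing any of its values.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class OutputExtensions:
    json: str = ".json"  # illustrative field, not taken from the PR


@dataclass
class StatsConfig:
    outputExtensions: OutputExtensions = field(default_factory=OutputExtensions)

    @classmethod
    def from_dict(cls, stats_config: dict[str, Any]) -> StatsConfig:
        # Shallow copy: replacing a key below never touches the caller's dict.
        params = dict(stats_config)
        if "outputExtensions" in params:
            params["outputExtensions"] = OutputExtensions(**params["outputExtensions"])
        return cls(**params)


raw = {"outputExtensions": {"json": ".stats.json"}}
config = StatsConfig.from_dict(raw)
```

After the call, `raw["outputExtensions"]` is still a plain dict, so the caller can reuse or serialize `raw` without surprises.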

poneill · 2023-10-24T04:49:22Z

python/python/bystro/search/utils/tests/test_annotation.py

+
+
+def test_from_dict_no_arg():
+    config = DelimitersConfig.from_dict()


maybe there's some use case I'm not seeing but if I saw this line in application code it would attract my attention as a possible bug, especially when one could just call DelimitersConfig() instead

poneill · 2023-10-24T04:50:06Z

python/python/bystro/search/utils/tests/test_annotation.py

+
+def test_from_dict_no_delimiters_key():
+    config_dict = {"random_key": "random_value"}
+    config = DelimitersConfig.from_dict(config_dict)


same concern here: I'd expect this code to fail with ValueError, especially when you consider the possibility of typos
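
A sketch of the stricter behaviour suggested here, under assumptions: `Delimiters` is a hypothetical stand-in for `DelimitersConfig`, its field names and defaults are illustrative, and it takes the delimiters mapping directly rather than a full annotation config. Unknown keys raise `ValueError` instead of being silently ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Delimiters:
    """Hypothetical stand-in for DelimitersConfig; field names are illustrative."""

    position: str = "|"
    value: str = ";"

    @classmethod
    def from_dict(cls, config: dict[str, Any]) -> Delimiters:
        known = {f.name for f in fields(cls)}
        unknown = set(config) - known
        if unknown:
            # A typo such as "postion" fails loudly instead of falling back to defaults.
            raise ValueError(f"Unexpected delimiter keys: {sorted(unknown)}")
        return cls(**config)


custom = Delimiters.from_dict({"value": "/"})
```

Under this version, the `{"random_key": "random_value"}` input from the test above would raise `ValueError`, and callers who want defaults would call `Delimiters()` directly.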

poneill · 2023-10-24T04:54:09Z

python/python/bystro/search/utils/annotation.py


+def get_delimiters(annotation_config: dict[str, Any] | None = None):


is this method necessary, or would it be possible to consistently refer to delimiters with a DelimitersConfig throughout? It seems less error-prone to have a single source of truth (SSOT) for what the delimiters are, if we can pull that off here.

This was necessary to avoid touching other code that relies on the dict form of DelimitersConfig, but I can fix here.
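
One way the two needs could coexist, sketched under assumptions: the dataclass stays the single source of truth and the dict form is derived from it on demand for code that still expects a dict. `Delimiters` and `legacy_delimiters` are hypothetical names, and the fields are illustrative rather than the real `DelimitersConfig` fields.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Delimiters:
    """Hypothetical stand-in for DelimitersConfig; the only place defaults live."""

    position: str = "|"
    value: str = ";"


def legacy_delimiters(delimiters: Delimiters) -> dict[str, str]:
    # Derived view for callers that still expect a dict; never edited by hand.
    return asdict(delimiters)


as_dict = legacy_delimiters(Delimiters())
```

Because the dict is always computed from the dataclass, the two forms cannot drift apart, and the helper can be deleted once the remaining dict-based callers are migrated.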

poneill · 2023-10-24T04:55:39Z

python/python/bystro/search/utils/tests/test_annotation.py

+def test_get_config_file_path_no_path_found(mocker):
+    mocker.patch(
+        "bystro.search.utils.annotation.glob", return_value=[]
+    )  # Change `your_module` to the actual module name


is this comment still in force? It looks like it's now redundant.

working implementation of saving with statistics; also added fix for …

1776898

…issue bystrogenomics#316

akotlar requested review from wingolab, cristinaetrv and poneill October 24, 2023 03:26

akotlar assigned poneill, wingolab and cristinaetrv Oct 24, 2023

akotlar added 2 commits October 24, 2023 03:29

add helpful comment

f155c9f

format test_compress.py

db2527c

akotlar mentioned this pull request Oct 24, 2023

[search/save] Add statistics calculation and statistical filters support #314

Open

poneill approved these changes Oct 24, 2023

akotlar added 3 commits October 25, 2023 00:01

address comments

4ae9c0b

cleanup

699b09d

add missing header field in AnnotationOutputs

270c136

akotlar merged commit 9a7584a into bystrogenomics:master Oct 25, 2023
3 checks passed
