Cannot change datatype of a collection #19064

Open
hexylena opened this issue Oct 28, 2024 · 8 comments

Comments

@hexylena
Member

Describe the bug

I'm trying to work around galaxyproject/tools-iuc#6493 which produces a collection labelled txt,tabular.

Galaxy Version and/or server at which you observed the bug

The Galaxy Server is running version 24.1.4.dev0, and the web client was built on Saturday Oct 26th 10:07:28 2024 GMT+2.
Commit: ccf4353

Browser and Operating System
Operating System: Linux
Browser: Chrome

To Reproduce
Steps to reproduce the behavior:

I've tried two solutions:

  • Changing the datatype of the output in the tool as it runs in a workflow. This has no effect.
  • Changing the datatype after the fact via the UI (I just need it to run to completion so I can work on other tasks while I wait for the IUC bug)
    • The change datatype tab is completely missing? I only see "Attributes" and "Database/build", neither of which includes a datatype.

Expected behavior

I can change the datatype, either via the workflow or manually afterwards, to work around issues.

Screenshots

a

Additional context

potentially xref #17734

@mvdbeek
Member

mvdbeek commented Oct 28, 2024

> Changing the datatype of the output in the tool as it runs in a workflow. This has no effect.

There are a lot of tests for this in the codebase, here's one that I just put together:
https://usegalaxy.org/u/marius/w/change-collection-datatype

It is possible that you have a traceback somewhere in your logs, in that case it would be good if you can post that.

> The change datatype tab is completely missing?

You'll need celery for changing datatypes in batch. If you don't have Celery that tab isn't shown.

@hexylena
Member Author

Celery was configured (I definitely forgot that was a requirement for that!)

gravity:
    celery:
        concurrency: 2
        loglevel: DEBUG

and seems to be processing jobs

galaxyctl[162059]: [2024-10-28 13:50:13,755: INFO/main] Task galaxy.dispatch_pending_notifications[ff410dcc-8d40-4899-a1ce-0c3896bad719] succeeded in 0.018974624574184418s: None
galaxyctl[162059]: [2024-10-28 13:55:14,394: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] received
galaxyctl[162059]: [2024-10-28 13:55:14,394: DEBUG/main] TaskPool: Apply <function fast_trace_task at 0x7f9e489e13a0> (args:('galaxy.clean_object_store_caches', 'cc41221d-88bb-4050-9ff4-82563ecae6dc', {'lang': 'py', 'task': 'galaxy.clean_object_store_caches', 'id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': '[email protected]', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}, 'properties': {'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'delivery_mode': 2, 'delivery_info': {'exchange': '', 'routing_key': 'galaxy.internal'}, 'priority': 0, 'body_encoding': 'base64', 'delivery_tag': '6a54628e-81e9-4ffa-81e0-7774b9889fc4'}, 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'hostname':... kwargs:{})
galaxyctl[162059]: [2024-10-28 13:55:14,398: INFO/main] Successfully executed Celery task clean_object_store_caches to prune object store cache directories clean_object_store_caches to prune object store cache directories (0.132 ms)
galaxyctl[162059]: [2024-10-28 13:55:14,794: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] succeeded in 0.39791450649499893s: None

the tracebacks all look pretty normal:

$ journalctl -u galaxy-gunicorn --since '1 day ago' | grep Traceback -A50 | egrep '(Exception|Error)' | cut -c 44-
galaxyctl[108675]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109689]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109691]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[158149]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159334]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159337]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:55:04,276 [pN:main.2,p:159337,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[159337]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[161842]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:57:09,667 [pN:main.2,p:162846,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]:     raise RequestParameterInvalidException(
galaxyctl[162846]: galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input

@mvdbeek
Member

mvdbeek commented Oct 28, 2024

You need to enable celery in the galaxy config (the jobs you listed are cron-style jobs), and

galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input

explains the second part.

@hexylena
Member Author

> You need to enable celery in the galaxy config

Right, there are multiple Celery toggles. Yes, you're right, I'm missing enable_celery_tasks.
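
For anyone landing here with the same symptom, a minimal sketch of the two toggles side by side. It assumes both sections live in the same galaxy.yml and that `enable_celery_tasks` sits under the top-level `galaxy:` key; the `gravity:` values are simply the ones from my config above, so adjust them for your own setup.

```yaml
# Starts and supervises the Celery worker process (this part was already set).
gravity:
    celery:
        concurrency: 2
        loglevel: DEBUG

# Tells Galaxy itself to dispatch work to Celery (this was the missing part).
galaxy:
    enable_celery_tasks: true
```

With only the `gravity:` block present, the worker runs and handles the periodic jobs, which is why the logs looked healthy even though the change datatype tab stayed hidden.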

This needs to be communicated more usefully to the user/admin, I think? E.g. showing the tab but disabling it, with a tooltip of "please enable celery tasks in your galaxy.yml to allow changing datatypes of a collection", would potentially have avoided this issue completely.

> explains the second part.

I'm not sure it does? That was unrelated testing on the same dataset: I triggered it by trying to extract element identifiers from that collection (which required manually dragging it into the form, hence I didn't report that), though I can see how it looks related.

@natefoo
Member

natefoo commented Nov 4, 2024

Is there a reason not to default enable_celery_tasks at this point? The two documented install and run methods (Ansible, Gravity via run.sh or directly) make sure you have a running Celery, and there are more and more parts of Galaxy that don't function without it.

Also, SQLAlchemy can be used as a results backend, is there any reason not to have Galaxy use it as the default if you don't specify something else (e.g. redis)?

@davelopez
Contributor

> Also, SQLAlchemy can be used as a results backend, is there any reason not to have Galaxy use it as the default if you don't specify something else (e.g. redis)?

Regarding this, the default is now using a simple SQLite database as the results backend #17949

I think the main concern about enabling it by default was the user rate limiting issue, but if I remember correctly it was fixed some time ago, so probably we could enable it by default at this point.

@hexylena
Member Author

hexylena commented Nov 5, 2024

> the default is now using a simple SQLite database as the results backend

couldn't/shouldn't this default to using whatever the database connection is? so it could default to postgres when that's in use?

either way would be great to have this enabled by default!! (or any notification to the end user that this feature is available in galaxy but disabled due to administrator (mis)configuration)
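
To make the PostgreSQL idea concrete, here is a sketch of what setting it explicitly might look like today. It assumes Galaxy's `celery_conf` option passes `result_backend` straight through to Celery and relies on Celery's `db+` prefix for SQLAlchemy URLs; the user, password, host and database name are placeholders, not values from this server.

```yaml
galaxy:
    enable_celery_tasks: true
    celery_conf:
        # Placeholder credentials; the "db+" prefix selects Celery's SQLAlchemy backend.
        result_backend: db+postgresql://galaxy:CHANGEME@localhost/galaxy
```

Defaulting to the existing database connection would amount to Galaxy filling in this value itself whenever nothing else (e.g. redis) is specified.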

@jdavcs
Member

jdavcs commented Nov 5, 2024

For 25.0 we'll consider enabling Celery by default.
