Fail to run task if remote_root path doesn't exist #436

Open
link89 opened this issue Feb 7, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

link89 commented Feb 7, 2024

When I run the following code with dflow:

from dflow.plugins.dispatcher import DispatcherExecutor

kwargs = {'host': 'xxx', 'username': 'xxx', 'port': 6666, 'machine_dict': {'batch_type': 'Slurm', 'context_type': 'SSHContext', 'remote_profile': {
    'key_filename': '/home/xxx/.ssh/id_ed25519'}}, 'resources_dict': {'number_node': 1, 'cpu_per_node': 1}, 'queue_name': 'c52-small', 'remote_root': '/data/home/xxx/tmp/dflow-galaxy/square-sum'}
executor = DispatcherExecutor(**kwargs)

If the remote_root path doesn't exist, it raises an error instead of creating the path automatically. I think this could be easily fixed by using mkdir -p or os.makedirs(path, exist_ok=True) somewhere in dpdispatcher.
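
A minimal sketch of the two approaches named above, for illustration only; it is not dpdispatcher's actual code. `os.makedirs(path, exist_ok=True)` creates a directory together with any missing parents and does nothing if it already exists. On a remote host the equivalent would be to run `mkdir -p` over the existing SSH connection. The helper names `ensure_local_root` and `build_remote_mkdir_command` are hypothetical.

```python
import os
import shlex
import tempfile


def ensure_local_root(path: str) -> str:
    """Create `path` and any missing parents; a no-op if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


def build_remote_mkdir_command(remote_root: str) -> str:
    """Build the shell command that would create `remote_root` remotely.

    Hypothetical helper: the returned string would be handed to whatever
    SSH command runner the caller already has.
    """
    # shlex.quote guards against spaces or shell metacharacters in the path.
    return "mkdir -p " + shlex.quote(remote_root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        root = ensure_local_root(os.path.join(base, "dflow-galaxy", "square-sum"))
        print(os.path.isdir(root))
    print(build_remote_mkdir_command("/data/home/xxx/tmp/dflow-galaxy/square-sum"))
```

Both operations are idempotent, so calling them on every submission would be harmless when the directory already exists.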

Member

njzjz commented Feb 7, 2024

This seems to be an unsafe behavior, when one gives a wrong path. Considering one only needs to create a directory once, but needs to ensure the path is correct in each submission, I prefer throwing the error when the directory does not exist.

njzjz added the enhancement label Feb 7, 2024
Author

link89 commented Feb 8, 2024

> This seems to be an unsafe behavior, when one gives a wrong path.

Users can always choose a wrong path, whether or not you create the path for them. Refusing to create it just makes things less convenient for users rather than improving safety.

The reason I think the root path should be created automatically is that I choose different remote directories for different projects and different runs. I don't want to log in to a remote session just to create a path. Besides, for workflows that run across different clusters, it would be annoying to create the path manually on each of them.

Member

njzjz commented Feb 8, 2024

> The reason I think the root path should be created automatically is that I choose different remote directories for different projects and different runs.

This doesn't make sense. The submission runs in a temporary subdirectory of the root path instead of the root path itself, which will be entirely deleted after the submission is finished. The hash of the submission determines the subdirectory name, so the subdirectory will be different once any of the commands, the forward/backward file paths, or the local root is different.
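
To illustrate the hash-determined subdirectory described above, here is a minimal sketch. It is not dpdispatcher's actual hashing code: the function `submission_subdir`, the choice of SHA-1, and the JSON serialization are all assumptions made for illustration. It only shows that changing any one input yields a different working directory under the same root.

```python
import hashlib
import json
import posixpath


def submission_subdir(remote_root, commands, forward_files, backward_files, local_root):
    """Return a working directory under `remote_root` named by a content hash.

    Hypothetical helper: the real library may hash different fields
    or serialize them differently.
    """
    payload = json.dumps(
        {
            "commands": commands,
            "forward_files": forward_files,
            "backward_files": backward_files,
            "local_root": local_root,
        },
        sort_keys=True,  # stable key order so equal inputs hash equally
    )
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    # Remote paths are POSIX-style regardless of the local platform.
    return posixpath.join(remote_root, digest)


if __name__ == "__main__":
    root = "/data/home/xxx/tmp/dflow-galaxy/square-sum"
    a = submission_subdir(root, ["python run.py"], ["in.txt"], ["out.txt"], "/local/a")
    b = submission_subdir(root, ["python run.py --fast"], ["in.txt"], ["out.txt"], "/local/a")
    print(a != b)
```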

> Besides, for workflows that run across different clusters, it would be annoying to create the path manually on each of them.

This is more dangerous, considering different machines may have different directory structures. If you don't log in to the cluster to check whether the directory exists, you may create directories in the wrong path.

Author

link89 commented Feb 8, 2024

> The submission runs in a temporary subdirectory of the root path instead of the root path itself, which will be entirely deleted after the submission is finished.

I am aware of that. Then how about providing an option to enable this behavior?
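
Such an option could look roughly like the sketch below. The function `check_root` and the flag `create_if_missing` are hypothetical and are not part of dpdispatcher or dflow. The example works on a local filesystem only, to show the behavior under discussion: strict by default, with creation as an explicit opt-in.

```python
import os
import tempfile


def check_root(path: str, create_if_missing: bool = False) -> str:
    """Validate a root directory, optionally creating it.

    Hypothetical helper: by default a missing directory is an error,
    which keeps the existing behavior; passing create_if_missing=True
    opts in to automatic creation.
    """
    if os.path.isdir(path):
        return path
    if not create_if_missing:
        raise FileNotFoundError(f"root directory does not exist: {path}")
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        missing = os.path.join(base, "square-sum")
        try:
            check_root(missing)
        except FileNotFoundError as err:
            print("strict mode:", err)
        print(os.path.isdir(check_root(missing, create_if_missing=True)))
```

Keeping the default strict addresses the concern about mistyped paths, while the flag covers the multi-cluster workflow case.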
