Error when deleting entry #41

Open
ericpre opened this issue Feb 20, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@ericpre
Contributor

ericpre commented Feb 20, 2023

I have set up a nomad-oasis server with the ./volumes folder symlinked to a separate folder. Uploading data works fine and I can see the data being created in the right place, but deleting the entry from the "your existing upload" section fails:

at first attempt:

Process delete_upload failed: OSError: [Errno 39] Directory not empty: 'archive'

at second attempt:

Process delete_upload failed: OSError: [Errno 16] Device or resource busy: '.nfs000000010fa5315e00000001'


@markus1978 markus1978 added question Further information is requested support-discussion This is not a nomad issue, but the discussion is relevant to support running NOMAD. labels Feb 20, 2023
@markus1978
Member

It looks like there are some files that were not created by NOMAD (i.e. by the nomad user, Linux uid 1000). When NOMAD tries to delete the directory, it is either not removing the extra files or not allowed to remove them. From the file name, I would assume that your NFS implementation is creating extra files in the upload folders, or that NFS is in the middle of an operation on the file, or something similar.

Could you check the owning user id and permissions of the file '.nfs000000010fa5315e00000001' for us? This might help to find a solution. Think of NOMAD as a normal user with id 1000 that tries to "rm -r" a directory.
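A quick way to collect the owner and permission information asked for here is sketched below. It is only an illustration that assumes a POSIX system and Python's standard library; the helper names `describe_entries` and `can_unlink_from` are made up for this example and are not part of NOMAD.

```python
import os
import stat


def describe_entries(directory):
    """Return (name, owner uid, mode string) for every entry, dotfiles included."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            info = entry.stat(follow_symlinks=False)
            rows.append((entry.name, info.st_uid, stat.filemode(info.st_mode)))
    return sorted(rows)


def can_unlink_from(directory):
    """Removing a file needs write and search permission on its parent directory."""
    return os.access(directory, os.W_OK | os.X_OK)
```

Run as the same user as the NOMAD process (uid 1000 in this thread), `describe_entries("/path/to/upload/archive")` would list hidden files such as the `.nfs...` one; the path is a placeholder. Note that the permission check says nothing about files that are busy or immutable, which can still block deletion.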

@ericpre
Contributor Author

ericpre commented Feb 23, 2023

Thanks @markus1978 for the quick reply, please find below the information for this file:

-rw-r--r--. 1 localadmin localadmin 184907 Feb 14 14:52 .nfs000000010fa5315e00000001

The UID of localadmin is 1000.

After restarting the container, the .nfsxxxx... file disappeared and I was able to delete the entry. Two things differ from the "standard" configuration, i.e. following https://nomad-lab.eu/prod/v1/staging/docs/oasis.html#quick-start:

  • symbolic link of the ./volumes folder to a separate folder
  • the destination folder is mounted into this virtual machine; I don't know much more than that, but this is what I gather from the people who set up this virtual machine.

I have tried changing the path of the .volumes folder in the docker-compose.yaml file to point directly to the folder (without the symlink), but the same error occurs and a .nfsxxxx file is still created.

@markus1978 markus1978 added bug Something isn't working and removed question Further information is requested support-discussion This is not a nomad issue, but the discussion is relevant to support running NOMAD. labels Feb 24, 2023
@markus1978
Member

I classify this as a bug for now. From what you are saying, NOMAD should have been able to delete the file itself and, consequently, the folder. And even if not, NOMAD should expect these situations, because we want to enable clients to integrate the NOMAD directories into existing storage solutions, as you are doing.
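One way a deletion routine could "expect these situations" is to retry when it hits the two errors reported above, ENOTEMPTY (errno 39 on Linux) and EBUSY (errno 16). The following is a minimal sketch under that assumption, not NOMAD's actual implementation; the function name `remove_tree_with_retry` and its parameters are hypothetical.

```python
import errno
import shutil
import time

# The two error codes seen in this issue: "Directory not empty" and
# "Device or resource busy".
RETRYABLE = {errno.ENOTEMPTY, errno.EBUSY}


def remove_tree_with_retry(path, attempts=3, delay=0.5):
    """Delete a directory tree, retrying if leftover or busy files block it."""
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing left to delete
        except OSError as error:
            # Give up on unrelated errors, or once the last attempt has failed.
            if error.errno not in RETRYABLE or attempt == attempts:
                raise
            time.sleep(delay)
    return False
```

A retry only helps if whatever holds the file releases it in the meantime. If a process keeps the file open for its whole lifetime, the last attempt still raises, which at least surfaces the original error to the caller.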

@markus1978
Member

@mohammadnakhaee Can you have a look at this, please? You could experiment with externally created extra "secret" files (starting with .) in the .volumes upload folders. It is more likely that this happens with such files in the upload folder or the upload archive folder. Just see whether you can reproduce it.

@ericpre
Contributor Author

ericpre commented Feb 24, 2023

For completeness, the full path is:

/oasis-data/.volumes/fs/staging/ZX/ZXm1pqqmQ1eewxe6HCB7vw/archive/.nfs000000010638e2dc00000007

The file name is different because it is from a different upload/delete test. This file seems to be created when attempting to delete the data entry.

It deletes the upload successfully when following these steps:

  • upload a file (in this case, one of the examples)
  • restart server
  • delete file

Could it be that something keeps a file open when it shouldn't (I can see only a *.msg file in the archive folder), and that this causes the error?
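On NFS clients, `.nfsXXXX` names typically appear when a file is removed while some process still has it open, so checking for open handles is a reasonable next step. The sketch below scans `/proc/<pid>/fd` for descriptors that point below a given directory. It assumes Linux, and it has to run where the NOMAD processes are visible (for example inside the container) with enough privileges to read their `fd` directories. The helper name `open_files_under` is made up for this example.

```python
import os


def open_files_under(directory, proc_root="/proc"):
    """Return sorted (pid, path) pairs for open descriptors below directory."""
    if not os.path.isdir(proc_root):
        return []  # no procfs available, e.g. not on Linux
    prefix = os.path.join(os.path.realpath(directory), "")
    found = set()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            descriptors = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or belongs to another user
        for descriptor in descriptors:
            try:
                target = os.readlink(os.path.join(fd_dir, descriptor))
            except OSError:
                continue  # descriptor was closed while scanning
            if target.startswith(prefix):
                found.add((int(pid), target))
    return sorted(found)
```

Any process id reported for the archive folder right before a delete attempt would point at the process that keeps the file open.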

@mohammadnakhaee
Contributor

I could reproduce it by setting the immutable attribute on an extra file:
sudo chattr +i .test
