Use S3 as storage backend #361

Open
tschoonj opened this issue May 6, 2021 · 12 comments

@tschoonj
Contributor

tschoonj commented May 6, 2021

Hi @vsoch

We are thinking of deploying our own sregistry instance, and I was wondering if it is currently possible to have the uploaded images stored on an S3 endpoint (Ceph in our case).

Thanks in advance!

@vsoch
Member

vsoch commented May 6, 2021

@tschoonj I did write a client for Ceph with sregistry-cli, but here we use Minio (which uses the S3 multipart upload protocol) to mimic the Singularity client and interact directly with their library:// API. Would you be able/open to trying that?

@tschoonj
Contributor Author

tschoonj commented May 6, 2021

We would like our users to be able to download and run the containers using a single singularity command (e.g. singularity pull shub://containers.page/collection/container:tag), and not rely on the sregistry tool to make this happen.

I am very familiar with the S3 plugin of sregistry-cli, and I am sure that it would work nicely with Ceph through boto3, but given the user requirement it is not an option here 😢

It's not a big problem though, as we also have access to CephFS shares that we can mount on the VM running registry, and use those for storing the images.

@vsoch
Member

vsoch commented May 6, 2021

Oh, I’m not suggesting you use that, just that I’m at least familiar with the interactions. My suggestion (and question) is whether you could use the default Minio backend, which is also S3 compliant, so you can just do singularity pull library://

@tschoonj
Contributor Author

tschoonj commented May 6, 2021

Ah, apologies for misunderstanding then.

Is it possible then to configure Minio to use our Ceph endpoint with its credentials? And if so, how would I do that?

Thanks in advance!

@vsoch
Member

vsoch commented May 6, 2021

That actually might work. I haven't given it a try, but minio/minio#6157 notes that it's possible. You should give it a shot!

@tschoonj
Contributor Author

tschoonj commented May 7, 2021

Good morning/evening,

I had a closer look at the code, and it might be as easy as setting MINIO_SERVER and MINIO_EXTERNAL_SERVER to our Ceph endpoint.
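A minimal sketch of what that override might look like, assuming both settings take a host:port string. The endpoint below is a placeholder, and credentials, bucket, and SSL settings are deliberately left out:

```python
# Hypothetical override of the two settings named above.
# "ceph.example.org:7480" is a placeholder, not a real endpoint.

# Assumed meaning: endpoint the registry itself connects to.
MINIO_SERVER = "ceph.example.org:7480"

# Assumed meaning: endpoint that ends up in the URLs handed to the
# Singularity client, so it must be reachable from the users' machines.
MINIO_EXTERNAL_SERVER = "ceph.example.org:7480"
```

With a single Ceph gateway both values would be identical; they would only differ if the gateway is reachable under different names from inside and outside the deployment.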

In the docs you mention:

However for versions 1.1.24 and later, to better handle the Singularity library:// client that uses Amazon S3, we added a Minio Storage backend

So I assume that if this works with AWS S3, it should also work with Ceph. One thing that may be a problem is the presigned URLs, where you enforce S3v4 signing, which may not be supported by our (old) Ceph installation...
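One way to check which signing scheme a given presigned URL uses is to look at its query parameters. The sketch below relies only on the parameter names AWS documents for the two schemes (`X-Amz-Algorithm` and `X-Amz-Signature` for v4; `AWSAccessKeyId`, `Signature`, and `Expires` for v2). The function name `presign_version` and the example URLs are made up for illustration:

```python
from urllib.parse import parse_qs, urlsplit


def presign_version(url: str) -> str:
    """Guess the S3 signature version of a presigned URL from its query string."""
    params = parse_qs(urlsplit(url).query)
    # Signature v4 URLs carry an explicit algorithm marker.
    if params.get("X-Amz-Algorithm") == ["AWS4-HMAC-SHA256"] and "X-Amz-Signature" in params:
        return "v4"
    # Signature v2 URLs carry the access key, an expiry timestamp and the signature.
    if {"AWSAccessKeyId", "Signature", "Expires"}.issubset(params):
        return "v2"
    return "unknown"


# Placeholder URLs; the signatures and keys are not valid.
v4_url = (
    "https://ceph.example.org/bucket/image.sif"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=KEY%2F20210507%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Date=20210507T000000Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=abc123"
)
v2_url = "https://ceph.example.org/bucket/image.sif?AWSAccessKeyId=KEY&Expires=1620345600&Signature=abc123"

print(presign_version(v4_url))
print(presign_version(v2_url))
```

Running this against a URL the registry actually hands out would show which scheme the Ceph gateway needs to accept.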

Thanks for the help!

@vsoch
Member

vsoch commented May 7, 2021

Sure thing! Give it a shot and let me know what issues you encounter - there likely could be workarounds for them.

@tschoonj
Contributor Author

Hi @vsoch ,

I was wondering if, given the use of S3 for storage through presigned URLs, it is still necessary or advantageous to use your sregistry_nginx image instead of the regular nginx? If not, replacing it with traefik might be interesting, as it can take care of generating the Let's Encrypt certificates.

@vsoch
Member

vsoch commented May 10, 2021

@tschoonj the reason the sregistry_nginx image is still there is that it's used for upload from a web interface, e.g., here

<form name="upload" method="POST" enctype="multipart/form-data" action="/upload">
(and you can verify by manually uploading a container to a collection). If you'd like to test removing that image and replacing the upload with something else, or figuring out how to update the sregistry_nginx image to support traefik, I would definitely be open to trying that!

@tschoonj
Contributor Author

I knew you had a good reason to keep it around, I just didn't see it 😄

Do you think that these traefik config options might replicate the functionality offered by the nginx module you use for multipart uploads?

@vsoch
Member

vsoch commented May 10, 2021

I've never used traefik so I can't comment, but conceptually we need something alongside the registry server (that can be bound to it) where we can upload a large stream, and then have a callback that points the server to where it is (at best for a copy). This plugin looks like it serves more as a protection from large requests, which is the opposite of what we want to do.
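To make the upload-then-callback idea concrete, here is a minimal Python sketch. It is not how sregistry_nginx implements it; `receive_upload` and `on_complete` are hypothetical names, and the chunk iterable stands in for the incoming request body:

```python
import os
import tempfile
from typing import Callable, Iterable, Optional


def receive_upload(
    chunks: Iterable[bytes],
    on_complete: Callable[[str, int], None],
    upload_dir: Optional[str] = None,
) -> str:
    """Stream an upload to disk, then tell the server where the file landed."""
    fd, path = tempfile.mkstemp(dir=upload_dir, suffix=".part")
    size = 0
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:
            # Write chunk by chunk so the whole body is never held in memory.
            out.write(chunk)
            size += len(chunk)
    # The callback receives only the path and size; the server can then
    # move or copy the file into its own storage.
    on_complete(path, size)
    return path
```

The point of the design is that the registry process only ever sees a file path, never the large request body itself.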

My general theory with these things is that if you aren't sure, give it a try and see if it works!

@tschoonj
Contributor Author

Hmmm... I thought that maybe the memRequestBodyBytes setting would be useful here: it is the threshold (in bytes) above which the request is buffered on disk instead of in memory...
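As an illustration of what such a threshold does in general (this is a conceptual sketch, not traefik's implementation, and the exact boundary behaviour there may differ), a body can be kept in memory until it would exceed the limit and then spilled to a temporary file. `buffer_body` is a hypothetical helper name:

```python
import io
import tempfile
from typing import BinaryIO, Iterable, Tuple


def buffer_body(chunks: Iterable[bytes], mem_request_body_bytes: int) -> Tuple[BinaryIO, bool]:
    """Buffer a request body in memory, spilling to disk once it exceeds the threshold."""
    buf: BinaryIO = io.BytesIO()
    on_disk = False
    for chunk in chunks:
        if not on_disk and buf.tell() + len(chunk) > mem_request_body_bytes:
            # Threshold crossed: move what we have so far into a temp file.
            spill = tempfile.TemporaryFile()
            spill.write(buf.getvalue())
            buf = spill
            on_disk = True
        buf.write(chunk)
    buf.seek(0)  # rewind so the caller can read the body from the start
    return buf, on_disk
```

Small requests stay in memory, while a multi-gigabyte container image would end up on disk, which bounds memory use but still means the proxy holds the whole body before forwarding it.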

I hope to give this a try over the next couple of days...
