
feature: add support for S3 Express #225 #228

Closed

Conversation


@hveiga hveiga commented Apr 12, 2024

These changes aim to bring support for S3 Express directory buckets (#225).

Overall the required changes are the following:

  • AWS V4 signature needs to use the service s3express instead of s3.
  • default.conf needs to use the full bucket + server url, for example:
...
map virtual $s3_host_hdr {
   virtual "my-directory-bucket--usw2-az5--x-s3.s3express-usw2-az5.us-west-2.amazonaws.com";
   path    "my-directory-bucket--usw2-az5--x-s3.s3express-usw2-az5.us-west-2.amazonaws.com:443";
   default "my-directory-bucket--usw2-az5--x-s3.s3express-usw2-az5.us-west-2.amazonaws.com";
}
...
  • upstreams.conf needs to use the full bucket + server url as well, for example:
upstream storage_urls {
    # Upstreams are not refreshed until NGINX configuration is reloaded.
    # NGINX Plus will dynamically reload upstreams when DNS records are changed.

    # Be sure to specify the port in the S3_SERVER and be sure that port
    # corresponds to the https/http in the proxy_pass directive.
    server my-directory-bucket--usw2-az5--x-s3.s3express-usw2-az5.us-west-2.amazonaws.com:443;
}
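For reference, the service name listed in the first bullet is one of the components of the SigV4 credential scope, so switching it from s3 to s3express changes every signature the gateway produces. A minimal sketch with hypothetical date and region values (illustrative only, not the gateway's actual njs signing code):

```shell
# Sketch: how the service name enters the SigV4 credential scope.
# date_stamp/region are hypothetical example values.
date_stamp="20240412"
region="us-west-2"
for service in s3 s3express; do
  echo "${date_stamp}/${region}/${service}/aws4_request"
done
# 20240412/us-west-2/s3/aws4_request
# 20240412/us-west-2/s3express/aws4_request
```

The real signing code also feeds the service name into the signing-key derivation, so the scope string is not the only place it matters.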

This PR does not implement the new session-based S3 Express auth model to speed up requests.

I am looking for some guidance on how this should be implemented. I introduced a new env var S3_SERVICE, and based on it the code conditionally replaces default.conf and upstreams.conf. I am not sure this is the desired approach, but I am happy to receive feedback on it.

Also, when running tests with ./test.sh oss I get 2024/04/12 18:25:15 [emerg] 1#1: SyntaxError: "awscredentials" has already been declared in /etc/nginx/:3, and I am not sure how to get past that issue.

@4141done
Collaborator

Hi @hveiga thank you so much for this contribution! I am going to make time to review/test tomorrow and get back to you with any comments.

@4141done
Collaborator

So I'm still learning about S3 Express and will be setting up a test environment for myself in the coming days to make sure I can review this properly. However, after reading the PR I think there are some things for us to discuss:

  1. I'm concerned about the change of the path-style host from "${S3_SERVER}:${S3_SERVER_PORT}" to "${S3_BUCKET_NAME}.${S3_SERVER}:${S3_SERVER_PORT}" since based on my understanding, the host of the path-style request should not have the bucket name. Is it possible that for your case you need to set S3_STYLE to virtual?
  2. I think we need to find a way not to fork the default.conf file, since it will be hard to keep both copies in sync. We need to find a way to accomplish this in one file using nginx conditional logic or njs (I can help with this once we understand item 1 above).
  3. The documentation I have been reading mentions that requests for objects will need to go to zonal endpoints. I don't think that needs any code changes since it should just mean a different value provided to the S3_SERVER configuration setting but maybe we can add some documentation around it as we reach a conclusion on this PR.

So basically I'd like to dig into the changes to the hostname used in signing to make sure they are necessary (I couldn't find any docs that noted this, but I could very well be missing something). If they are, we may need to restructure some of the logic in default.conf or use a js_set handler to add more complex conditional logic.

Your introduction of the S3_SERVICE config variable and its default look good 👍

@hveiga
Author

hveiga commented Apr 16, 2024

All your concerns are valid and I agree with them. To be honest, I am not too keen on the approach I presented, but I wanted to get some progress going. Let me add some insights to your points:

  1. I tried setting S3_STYLE to virtual, but that did not work. The main reason is that when S3_SERVER is set to an S3 general-purpose endpoint like s3-us-west-2.amazonaws.com, the upstream works because it can be reached and nginx does not panic. When using S3 Express with the same logic, S3_SERVER would be something like s3express-usw2-az1.us-west-2.amazonaws.com. That unfortunately cannot be reached by nginx, which panics. Maybe there is another combination I have not found to make this work, but I tried quite a few with no success except for using the full name: my-directory-bucket--usw2-az5--x-s3.s3express-usw2-az5.us-west-2.amazonaws.com:443.

  2. I am not in love with the duplication, but I did not want to complicate the env var substitution either. If we can add some conditional logic to keep a single file, I am all for it.

  3. I have tried the zonal endpoints as well with no luck. Again, I may have missed something so another check from your side would be greatly appreciated.

I am not too familiar with js_set in general, but let's use it if that simplifies the changes.

@4141done
Collaborator

Thanks for the quick follow-up @hveiga. Sounds like our next task is to compare our understanding of the connection requirements. I'm going to do some testing myself and report back, then we can compare notes. This might take me a day or two, but I'll try to prioritize it. I very much appreciate your involvement in bringing this feature to the gateway.

@4141done
Collaborator

Hi @hveiga, I set up an S3 Express One Zone bucket and was able to confirm that we can actually use it without any changes to the gateway code. I've documented my findings here:
#229

Can you take a look and see if the described configuration works for your use case?

In the process, I learned some things about the gateway code that might be relevant:

  1. The actual upstream is set by S3_SERVER (https://github.com/nginxinc/nginx-s3-gateway/blob/master/oss/etc/nginx/templates/upstreams.conf.template), so the bucket name is not automatically added
  2. It does not seem to be required to set the service to s3express in the signature

I think at this point, if we have a way of using the Express One Zone product with the gateway, then documenting it can be our first step. I'm trying to limit the number of additional configuration variables in the project until I have a chance to do a bigger refactor of how it works. However, if there are any smaller usability changes we can make to improve the experience for S3 Express, please let me know.

@hveiga
Author

hveiga commented Apr 19, 2024

I tried what you suggested in #229, but unfortunately it does not seem to work for my bucket. To eliminate potential issues, I recreated the bucket with a different name and double-checked that I can access it with my AWS credentials, but I still get the same 404 error in the logs.

With debug enabled, I can see the host is incorrect (my-bucket-name appears twice): my-bucket-name.my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com. The settings I am using should otherwise be identical:

S3_BUCKET_NAME=my-bucket-name
AWS_ACCESS_KEY_ID="REDACTED"
AWS_SECRET_ACCESS_KEY="REDACTED"
AWS_SESSION_TOKEN="REDACTED"
S3_SERVER=my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com
S3_SERVER_PORT=443
S3_SERVER_PROTO=https
S3_REGION=us-west-2
S3_STYLE=virtual
DEBUG=true
AWS_SIGS_VERSION=4
ALLOW_DIRECTORY_LIST=false
PROVIDE_INDEX_PAGE=false
APPEND_SLASH_FOR_POSSIBLE_DIRECTORY=false
DIRECTORY_LISTING_PATH_PREFIX=""
PROXY_CACHE_MAX_SIZE=10g
PROXY_CACHE_SLICE_SIZE="1m"
PROXY_CACHE_INACTIVE=60m
PROXY_CACHE_VALID_OK=1h
PROXY_CACHE_VALID_NOTFOUND=1m
PROXY_CACHE_VALID_FORBIDDEN=30s

For completeness, I am running ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:latest-20240306.

If we can have this working with no changes, that would be the best. Thank you for looking into this.

@4141done
Collaborator

Hey, so I just tested this again using the build you mentioned and was able to get it to work. I don't see an issue with the config you posted. I am definitely open to the possibility that I'm missing something in my process, but looking at the error you're getting, I suspect the code from this pull request is involved. Here's the diff from your modified default.conf.template:

--- a/common/etc/nginx/templates/default.conf.template
+++ b/common/etc/nginx/templates/default.conf.template
@@ -15,13 +15,13 @@ map $request_uri $uri_full_path {

 # Remove/replace a portion of request URL (if configured)
 map $uri_full_path $uri_path {
-       "~^$STRIP_LEADING_DIRECTORY_PATH(.*)"  $PREFIX_LEADING_DIRECTORY_PATH$1;
+    "~^$STRIP_LEADING_DIRECTORY_PATH(.*)"  $PREFIX_LEADING_DIRECTORY_PATH$1;
     default $PREFIX_LEADING_DIRECTORY_PATH$uri_full_path;
 }

 map $S3_STYLE $s3_host_hdr {
     virtual "${S3_BUCKET_NAME}.${S3_SERVER}";
-    path    "${S3_SERVER}:${S3_SERVER_PORT}";
+    path    "${S3_BUCKET_NAME}.${S3_SERVER}:${S3_SERVER_PORT}";
     default "${S3_BUCKET_NAME}.${S3_SERVER}";
 }

@@ -134,7 +134,7 @@ server {

         # Enable passing of the server name through TLS Server Name Indication extension.
         proxy_ssl_server_name on;
-        proxy_ssl_name ${S3_SERVER};
+        proxy_ssl_name ${S3_BUCKET_NAME}.${S3_SERVER};

         # Set the Authorization header to the AWS Signatures credentials
         proxy_set_header Authorization $s3auth;
@@ -195,7 +195,7 @@ server {

         # Enable passing of the server name through TLS Server Name Indication extension.

Which would result in the bucket name being included twice if it's also part of the S3_SERVER config. Just so we can rule out that potential issue and move on with the solution, would you mind double-checking that you don't have that version running somewhere?

@hveiga
Author

hveiga commented Apr 19, 2024

I ran the changes from a fresh folder containing only the settings file, running docker locally. Same result, unfortunately.

One problem I see is the line virtual "${S3_BUCKET_NAME}.${S3_SERVER}"; from default.conf.template.

If we have:

  • S3_BUCKET_NAME = my-bucket-name
  • S3_SERVER = my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com

Then virtual "${S3_BUCKET_NAME}.${S3_SERVER}"; would become my-bucket-name.my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com. That happens with the version of the code available in main right now:

virtual "${S3_BUCKET_NAME}.${S3_SERVER}";

Can you check on your side what is the value of virtual?
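The substitution can be reproduced in isolation; a quick shell check using the hypothetical values above:

```shell
# Reproduce the doubled host: the virtual-style map line prepends
# S3_BUCKET_NAME to S3_SERVER, so baking the bucket into S3_SERVER
# duplicates it (hypothetical example values).
S3_BUCKET_NAME="my-bucket-name"
S3_SERVER="my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com"
echo "${S3_BUCKET_NAME}.${S3_SERVER}"
# my-bucket-name.my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com
```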

@4141done
Collaborator

4141done commented Apr 22, 2024

> I ran the changes from a fresh folder containing only the settings file, running docker locally. Same result, unfortunately.
>
> One problem I see is the line `virtual "${S3_BUCKET_NAME}.${S3_SERVER}";` from default.conf.template.
>
> If we have:
>
> * `S3_BUCKET_NAME` = `my-bucket-name`
> * `S3_SERVER` = `my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com`
>
> Then `virtual "${S3_BUCKET_NAME}.${S3_SERVER}";` would become my-bucket-name.my-bucket-name.s3express-usw2-az1.us-west-2.amazonaws.com. That happens with the version of the code available in main right now.
>
> Can you check on your side what is the value of virtual?

That's a good point. That code sets the Host header passed along to S3, so we should make sure it is correct. That, along with setting the service name used to sign the request to s3express, should be part of the changes we make if we want to do this right.

I pushed up some changes in #229 so that we can go back to not including the bucket name in the S3_SERVER configuration variable, as well as changing the service name. Note that this is not how I want to implement it in the end, but would you mind testing it?

In case it helps, here are the commands I'm using:

# From root of project
git checkout issue-225-usage-explanation
docker build --file Dockerfile.oss --tag issue-225  .

docker run --rm --env-file ./deployments/s3_express/settings.s3express.example --publish 80:80 --name issue-225 issue-225

Note that S3_SERVER should be s3express-usw2-az1.us-west-2.amazonaws.com (or the endpoint for your availability zone) and S3_BUCKET_NAME should be the full directory bucket name:
my-bucket-name--usw2-az1--x-s3
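With that split, the virtual-style host the gateway constructs from "${S3_BUCKET_NAME}.${S3_SERVER}" comes out as a single, correct zonal hostname; a quick shell check with the hypothetical values above:

```shell
# With the directory bucket name and zonal endpoint kept separate,
# the constructed host no longer duplicates the bucket name
# (hypothetical example values).
S3_BUCKET_NAME="my-bucket-name--usw2-az1--x-s3"
S3_SERVER="s3express-usw2-az1.us-west-2.amazonaws.com"
echo "${S3_BUCKET_NAME}.${S3_SERVER}"
# my-bucket-name--usw2-az1--x-s3.s3express-usw2-az1.us-west-2.amazonaws.com
```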

If you receive any errors, can you let me know the exact error message? It's still strange to me that I'm seeing this work and you are not. I may have some others test as well, to see if I'm missing some obvious misconfiguration in my test setup.

@hveiga
Author

hveiga commented Apr 22, 2024

Finally working! I can confirm that with the changes in #229 it works for me.
Also, I tested with both s3 and s3express as the service and both work, so I would not really make that change.

I also verified that a regular S3 general-purpose bucket works with those changes as well. I'll leave a couple of comments on the PR. Thanks for looking into this, greatly appreciated.

@4141done
Collaborator

> Finally working! I can confirm that with the changes in #229 it works for me. Also, I tested with both s3 and s3express as the service and both work, so I would not really make that change.
>
> I also verified that a regular S3 general-purpose bucket works with those changes as well. I'll leave a couple of comments on the PR. Thanks for looking into this, greatly appreciated.

Thank you for taking the time to work through this with me and test it out. I'm going to adjust my PR to productionize this, then ask you for a review. I'll respond to your feedback in the other PR.

@hveiga
Author

hveiga commented Apr 22, 2024

I am closing this in favor of #229.

@hveiga hveiga closed this Apr 22, 2024