Force re-caching of one file? #333

Open

cmhac opened this issue Oct 28, 2024 · 6 comments
@cmhac commented Oct 28, 2024

Is your feature request related to a problem? Please describe

My team is using this configuration to publicly serve files from a private S3 bucket. We've noticed cases where files that have changed in or been deleted from the bucket are still being served, which doesn't work for us.

Is there a way to force the gateway to get the latest version of a given file, perhaps via a header? I've googled as much as I can and haven't found any clear solutions.

Describe the solution you'd like

Ideally, it would be possible to send an HTTP request that tells NGINX to fetch the file from the bucket even if it has already been cached.

Describe alternatives you've considered

Currently we have disabled caching while we work on this project, which is not ideal. We want caching for obvious performance reasons, but we also need fine-grained control over updating certain files quickly.

cmhac changed the title from "Force re-caching of one file" to "Force re-caching of one file?" on Oct 28, 2024
@4141done (Collaborator) commented Oct 31, 2024

Hello, sorry for the delayed response.

The gateway doesn't support this by default, but there are a couple of options worth discussing. Take a look and see whether either strategy works for your use case, and then we can talk about how to make it happen in the S3 gateway project.

proxy_cache_bypass

This is the closest to what you suggest. You specify one or more variables; if at least one value is non-empty and not "0", the cache is bypassed for that request. This has the side effect of updating the cached file with the fresh response. To handle the case where a file is deleted, you'll also need to add proxy_cache_valid 404 1m; so that 404s are cached for that path. Keep that cache TTL short.

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_bypass

The scratch NGINX config below uses two servers to illustrate this. Sending the Skip-Cache header causes the cache to be bypassed for that request, and subsequent requests get the new value.

http {
    # Cache zone shared by the proxying server below
    proxy_cache_path /path/to/my/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;
    sendfile        on;

    # Caching proxy: stands in for the gateway
    server {
        listen       4000;
        add_header X-Cache-Status $upstream_cache_status;

        location / {
            # A non-empty, non-"0" Skip-Cache request header bypasses the cache
            proxy_cache_bypass $http_skip_cache;
            proxy_cache my_cache;
            # Cache 404s briefly so deleted files stop being served
            proxy_cache_valid 404 1m;
            proxy_pass http://localhost:4001;
        }
    }

    # Upstream file server: stands in for S3
    server {
        listen 4001;
        root /path/to/my/files;

        location / {
            expires 1h;
            # "always" adds the header on error responses (e.g. 404) too
            add_header Cache-Control "public, no-transform" always;
        }
    }
}
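A quick way to exercise this (assuming the setup above and a hypothetical file test.html under /path/to/my/files):

# First request populates the cache; repeat it and X-Cache-Status should be HIT
curl -i http://localhost:4000/test.html

# Force a refresh of the cached copy for just this file
curl -i -H "Skip-Cache: 1" http://localhost:4000/test.html

# Subsequent plain requests now serve the refreshed copy from the cache
curl -i http://localhost:4000/test.html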

proxy_cache_purge

This is another option, but it's only available in NGINX Plus. It would allow you to set up an endpoint that accepts flexible wildcards for purging. It also doesn't require a header on the public endpoint, so security-wise it could be a better choice.

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_purge

I don't have a lot of personal experience with this directive, but it could work for you.
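As a rough sketch (adapted from the nginx docs and untested here, reusing the my_cache zone from the example above), the usual pattern maps the PURGE request method to a flag:

map $request_method $purge_method {
    PURGE   1;
    default 0;
}

server {
    listen 4000;

    location / {
        proxy_pass http://localhost:4001;
        proxy_cache my_cache;
        # NGINX Plus only: a PURGE request removes matching cache entries
        proxy_cache_purge $purge_method;
    }
}

A purge would then be issued with something like curl -X PURGE http://localhost:4000/test/test.html (a trailing * in the purge URL acts as a wildcard). You'd also want to restrict who is allowed to send PURGE requests.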

@cmhac (Author) commented Nov 1, 2024

Thank you! I was looking at proxy_cache_purge, but purchasing Plus for this project isn't really an option for organizational reasons. I'll try the proxy_cache_bypass approach today.

@cmhac (Author) commented Nov 1, 2024

I've set proxy_cache_bypass as described, but I'm having an issue with caching 404s.

I'm running in Docker and have created SSL certificates per @dekobon's super helpful comment in #138. I added this to my s3_server.conf.template:

proxy_cache_bypass $http_skip_cache;
proxy_cache_valid 404 1m;

I am now seeing new versions of files when they're updated.

However, I'm only seeing a 404 for deleted files when the Skip-Cache: 1 header is present. Without it, I'm still seeing the deleted files in a regular browser request. Is there an additional value I should be setting in the configuration?

@dekobon (Collaborator) commented Nov 1, 2024

Hi Chris,

Can you confirm that the browser is not using its own cache? I doubt that's the problem, but I want to exhaust the possibility before moving forward.

@cmhac (Author) commented Nov 4, 2024

Yep, definitely not a browser caching issue; I'm seeing the behavior with basic requests via curl.

I deleted the file "test/test.html" from the bucket and verified in the AWS console that it's gone.

Running curl -H "Skip-Cache: 1" <url> returns a 404, but curl <url> still returns the old version.

This is not the behavior I see for modified files. If a file is modified in the bucket, a request with the Skip-Cache header returns the updated file, and subsequent requests, via curl or a browser, also show the new version as expected.

For reference, I have these values set in s3_server.conf.template:

proxy_cache_bypass $http_skip_cache;
proxy_cache_valid 404 1m;

I have this in my Dockerfile:

FROM ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:latest

COPY ./s3_server.conf.template /etc/nginx/templates/gateway/s3_server.conf.template
RUN chmod 644 /etc/nginx/templates/gateway/s3_server.conf.template

For what it's worth, deleted files do show a 404 after a few minutes once the cache expires normally; it just appears that the gateway isn't caching the 404 when the header is used.

@4141done (Collaborator) commented Nov 5, 2024

@cmhac So I think what could be happening is that NGINX will only update the cache for the 404 response if the response also contains a Cache-Control header, and S3 doesn't return a Cache-Control header for a 404 by default.

You can see that in my example I had to add the always argument to the add_header directive in the upstream server block. So I suspect we can solve this by somehow getting S3 to return Cache-Control headers on 404 responses.

I haven't personally done this before, but some quick googling suggests you could try enabling static website hosting, setting an index document, and then configuring that to return the right headers.

There were also mentions of AWS S3 Object Lambda to add headers, but I'm not sure you want the additional layer.
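One quick way to test this theory (using the same placeholder <url> as above) is to inspect the headers that actually come back on the 404 through the gateway:

# If no Cache-Control header appears on the 404 response, that would support this theory
curl -i -H "Skip-Cache: 1" <url>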
