Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Content Discovery pagination support #2715

Open
uvegla opened this issue Oct 11, 2024 · 1 comment
Open

[Bug]: Content Discovery pagination support #2715

uvegla opened this issue Oct 11, 2024 · 1 comment
Assignees
Labels
bug Something isn't working rm-external Roadmap item submitted by non-maintainers

Comments

@uvegla
Copy link

uvegla commented Oct 11, 2024

zot version

v2.1.1

Describe the bug

When setting up onDemand: false sync extension rules, the prefix matching is done only the 1st page result of the content discovery made towards the remote registry. With large registries this results in some images never getting synced by Zot.

The pagination rules are defined by the specification here: https://github.com/opencontainers/distribution-spec/blob/v1.1.0/spec.md#content-discovery

The code that does the discovery is rooted here: https://github.com/project-zot/zot/blob/v2.1.1/pkg/extensions/sync/service.go#L221-L232

It makes a call to <remote>/v2/_catalog and only makes it once. Going further down the path, the Link response header is not checked and simply the response body is returned, thus not getting a full picture of the registry.

This could potentially be an issue for listing tags as well - as according to the docs, that supports pagination as well - but I did not check that use case.

To reproduce

Configuration

You have to try this with a large registry with more than a 100 repositories. Relevant part of the Zot JSON configuration:

"registries": [
  {
    "urls": [
      "https://example.org/"
    ],
    "onDemand": false,
    "pollInterval": "12h",
    "tlsVerify": true,
    "maxRetries": 3,
    "retryDelay": "30m",
    "onlySigned": false,
    "content": [
      {
        "prefix": "my-org/aaa-operator",
        "tags": {
          "regex": "^v?\\d+\\.\\d+\\.\\d+$"
        }
      },
      {
        "prefix": "my-org/zzz-operator",
        "tags": {
          "regex": "^v?\\d+\\.\\d+\\.\\d+$"
        }
      }
    ]
  }
]

Then simply run:

zot serve config.json

Assuming there are more than 100 images in the repository and my-org/zzz-operator is alphabetically beyond 100, the syncing will never occur for it, practically it is ignored.

Manual reproduction

# 1st 100 repositories
curl -H "Authorization: Bearer $(curl -s https://<REMOTE>/oauth2/token\?scope\=registry%3Acatalog%3A%2A\&service\=<REMOTE> | jq -r .access_token)" https://<REMOTE>/v2/_catalog

# Take the contents on the Link response header and make follow up requests, for example
curl -H "Authorization: Bearer $(curl -s https://<REMOTE>/oauth2/token\?scope\=registry%3Acatalog%3A%2A\&service\=<REMOTE> | jq -r .access_token)" 'https://<REMOTE>/v2/_catalog?last=my-org%2Fxxx-operator&n=100&orderby=>'

# Repeat until Link response header is empty

Expected behavior

In the GetRepositories logic the Link response header should be respected and followed until it is empty. Potentially this might affect listing tags for repositories as well, according to the specification.

Screenshots

No response

Additional context

No response

@uvegla uvegla added the bug Something isn't working label Oct 11, 2024
@rchincha rchincha added the rm-external Roadmap item submitted by non-maintainers label Oct 11, 2024
@rchincha
Copy link
Contributor

@eusebiu-constantin-petu-dbk can you take a look pls?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working rm-external Roadmap item submitted by non-maintainers
Projects
None yet
Development

No branches or pull requests

3 participants