Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Blob Storage List Blobs does not return Marker #3010

Closed
rodyvansambeek opened this issue Jul 25, 2023 · 9 comments
Closed

Azure Blob Storage List Blobs does not return Marker #3010

rodyvansambeek opened this issue Jul 25, 2023 · 9 comments
Assignees
Labels

Comments

@rodyvansambeek
Copy link

Expected Behavior

When using the following code on a Azure Blob storage account with more than 10 files the marker value is not returned, but always returns an empty string:

var bindingSettings = new { maxResults = 10 };
var bindingSettingsBytes = JsonSerializer.SerializeToUtf8Bytes(bindingSettings);
bindingRequest.Data = new ReadOnlyMemory<byte>(bindingSettingsBytes);
var response = await _daprClient.InvokeBindingAsync(bindingRequest);

response.Metadata.TryGetValue("marker", out marker);

I would expect the marker variable to have the marker value.

I have tested it with a local Azurite blob storage account, but also an online Azure Blob Storage account. Both just return the first 10 results and an empty string as marker. This way we don't got any way to iterate through all pages.

Tested the same query using the AZ CLI, to confirm that it wasn't an issue in Azure, but that returns the "nextMarker" field correctly as the last object of the result:

az storage blob list --connection-string "[connstring]" -c [containername] --num-results 10 --show-next-marker

Will return:

[
  {
     [blob 1]
  },
  {
    [blob 2]
  },
  .... (total 10 results)
  {
    "nextMarker": "e61534b1-9010-4e8e-838a-46049d01b393/54-spiderfull/property-models/18216611.json"
  }
]

Actual Behavior

The marker value in the response.Metadata object is always ""

Steps to Reproduce the Problem

Use the above code using Dapr with C# and invoke the "list" operation on the binding with a maxResults lower then the number of items in the blob storage account.

@rodyvansambeek rodyvansambeek added the kind/bug Something isn't working label Jul 25, 2023
@ItalyPaleAle ItalyPaleAle added this to the v1.12 milestone Jul 25, 2023
@ItalyPaleAle
Copy link
Contributor

Thanks for reporting this, we'll take a look

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 24, 2023
@rodyvansambeek
Copy link
Author

I don't think this is fixed, so needs to stay open.

@github-actions github-actions bot removed the stale label Aug 25, 2023
@berndverst berndverst self-assigned this Sep 13, 2023
@berndverst
Copy link
Member

berndverst commented Sep 13, 2023

From what I can see, Dapr will correctly return a marker when there are more than 5000 (default max results) results, or whatever number you configured for this.
Additionally, Dapr will return a marker if fewer results had to be returned because of requests across multiple partitions.

However, when partitions are not involved and there are fewer results than max results, Dapr will automatically page for you and reach the last page which has no further additional marker. In this case we return the empty marker.

I do not see any issue here with the marker behavior. Our integration (certification) tests also show that the markers are returned correctly.

@rodyvansambeek perhaps try setting the max results to a lower number and you should see that Dapr is indeed returning markers.

@berndverst berndverst added question Further information is requested kind/bug Something isn't working and removed kind/bug Something isn't working question Further information is requested labels Sep 13, 2023
@berndverst berndverst removed this from the v1.12 milestone Sep 13, 2023
@berndverst
Copy link
Member

For convenience I am adding a new property pagesTraversed in the response metadata of the list operation. This will return the number of pages that were internally traversed.

However, I maintain that I do not see any issue with the component @rodyvansambeek

#3126

@rodyvansambeek
Copy link
Author

rodyvansambeek commented Sep 14, 2023

@berndverst Thank you for explaining the inner workings. In the next few days, I'll check the project I'm working on to see if it performs as described when I don't specify the 'maxResults' parameter and want to retrieve more than 5,000 results. In my initial test setup, I'm fairly certain it only returned the first 10 results, even though there were hundreds available in the blob storage account, and it did not include a marker. I'll update you as soon as possible. I also plan to test it with a storage account that has over 5,000 results to see if it returns a marker in that case.

@berndverst
Copy link
Member

@rodyvansambeek if you don't mind - wait until Dapr 1.12 - or the 1.12.0-rc.2 version (if you don't mind trying out our release candidate). That will include the change I made yesterday to also show how many pages were traversed, and that will report the number of items were returned (which now can be at most maxResults + pageSize)

@berndverst
Copy link
Member

@rodyvansambeek one additional comment here:

If you specify maxResults = 10 -- that is not the page size. This actually tells the server to only return 10 results. Even if more results exist, that does not mean there will pagination for additional results beyond 10 results - the server will not actually return more results.

We do not have a way to control the page size.

@rodyvansambeek
Copy link
Author

@berndverst That was the issue. I assumed that would be the page size, and the marker would be returned after that amount of results. I understand the inner workings now. Maybe it's a good idea to change this in the documentation because if you read the part of the data parameters on: https://docs.dapr.io/reference/components-reference/supported-bindings/blobstorage/#list-blobs
It looks like you need to specify a maxResults and get a marker back for every page to iterate through all the results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants