[Bug] infinite iteration over blobList #9465
Comments
Hi, @asm0dey. This does sound like rather unexpected behavior. I can assure you that the listing results are not intended to loop this way! Are you at all able to capture any network traces so we can see if the PagedIterable is issuing repeated requests or simply emitting duplicate cached values? And do you only have those three blobs in your container? Or are there more expected results you aren't seeing? P.S. Sounds like a neat plugin :) |
I absolutely can, but I don't know why request and response bodies aren't being logged. But once again: I can't reproduce this behavior in a clean environment, so I don't think this is a real bug, just something with the classpath/classloader/whatever. P.S. And yes, you'll definitely like this plugin, it's https://plugins.jetbrains.com/plugin/12494-big-data-tools |
That'd be great. Thanks! I don't think request and response bodies are typically logged because they can be quite large. If you suspect some sort of classpath error, we could start with mvn dependency:tree and see what's going on? There may be some conflicts causing something strange. Wow, that'd be great! Thanks for working on that project, and if we can be supportive in any way, please let us know. We love to see other folks in the community building tools around Storage and Azure in general! |
It's not that easy because we don't have a traditional build system like Maven or Gradle; we're building our project with IDEA itself (and on the CI server too). But please, find the UML diagram of dependencies here. I really can't find anything conflicting here, but I'm not that experienced in Reactor and can't even imagine what could make it work the wrong way. |
Talking about logging: there is an option to log headers and bodies, but the body can only be logged if its size is known from the headers, and according to the logs there is no such header. Here is a small snip of the log:

In the class:

    String contentTypeHeader = response.getHeaderValue("Content-Type");
    long contentLength = this.getContentLength(logger, response.getHeaders());
    if (this.shouldBodyBeLogged(contentTypeHeader, contentLength)) {
        // snip
    } else {
        responseLogMessage.append("(body content not logged)").append(System.lineSeparator()).append("<-- END HTTP");
        return this.logAndReturn(logger, responseLogMessage, response);

and in the method:

    private long getContentLength(ClientLogger logger, HttpHeaders headers) {
        long contentLength = 0L;
        String contentLengthString = headers.getValue("Content-Length");
        if (CoreUtils.isNullOrEmpty(contentLengthString)) {
            return contentLength;
        // snip

So this API doesn't support logging of bodies. |
@asm0dey Sorry for the delay. Can you update your HttpLogOptions on your client builders to add "marker" as an allowedQueryParameter? It looks like it's being redacted from your logs, and I suspect that perhaps something is triggering a list call with the same marker over and over again. We coincidentally saw some similar behavior in some development we're doing over here, and I think it might be the same cause. I'm not seeing anything in the UML diagram that stands out, but I've also never looked at one before. |
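For reference, configuring that allowed query parameter on the builder might look roughly like this (a sketch, assuming the usual com.azure.core and com.azure.storage.blob imports; the endpoint and SAS token values are placeholders, and the log level shown is just one reasonable choice):

    BlobContainerAsyncClient client = new BlobContainerClientBuilder()
        .endpoint("<your-storage-account-url>")
        .sasToken("<your-sasToken>")
        .containerName("mycontainer")
        // allow the "marker" query parameter to appear unredacted in HTTP logs
        .httpLogOptions(new HttpLogOptions()
            .setLogLevel(HttpLogDetailLevel.HEADERS)
            .addAllowedQueryParamName("marker"))
        .buildAsyncClient();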
Absolutely. Please find the full log attached:
|
Also, we can see that there is an enormous number of requests: 673. It was more like 2-4 requests from the client side. |
@rickle-msft any ideas on this? |
@rickle-msft just a friendly reminder |
Hi, @asm0dey. I'm so sorry about the delay! My emails from GitHub were getting filtered incorrectly for a little while, and when I noticed and fixed it, I must have also missed picking up this issue. Apologies again and thank you for the reminder. Yea, it's interesting that all the requests have that empty marker= query parameter. The only way I can reproduce that specific behavior is by explicitly passing an empty string as the continuation token, and even that doesn't loop infinitely. @anuchandy Do you have any sense of if, as suggested, some sort of dependency problem could be causing this? It looks like he's using a shaded 12.5.0 jar, and I'm not too familiar with jar shading and how that might affect interactions with the classpath and classloader. |
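As an illustration, that reproduction attempt might look something like this (a sketch, assuming an existing BlobContainerAsyncClient named containerAsyncClient; callers normally never pass an empty continuation token themselves):

    containerAsyncClient.listBlobs()
        .byPage("") // explicitly start paging with an empty-string continuation token
        .doOnNext(page -> System.out.println(
            "marker=" + page.getContinuationToken() + ", items=" + page.getValue().size()))
        .blockLast();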
Actually, I've even tried to unshade it and it changes literally nothing in this behavior. Also, I'm not changing any package names or anything like that.
I've even reached out to one of the Reactor project developers, but his only idea was that the iterator is being recreated on every call.
--
Regards,
Pasha
Big Data Tools @ JetBrains
|
Could you please run the following code in your setup and share what you see as output? Yes, it's going to run infinitely, just break after you see the output repeating.

    BlobContainerAsyncClient blobContainerClient = new BlobContainerClientBuilder()
        .endpoint("<your-storage-account-url>")
        .sasToken("<your-sasToken>")
        .containerName("mycontainer")
        .buildAsyncClient();

    PagedFlux<BlobItem> flux = blobContainerClient.listBlobsByHierarchy(path); // 'path' is the prefix you're listing under

    flux.byPage()
        .doOnNext(new Consumer<PagedResponse<BlobItem>>() {
            @Override
            public void accept(PagedResponse<BlobItem> response) {
                System.out.println("Processing Page:start");
                List<BlobItem> items = response.getValue();
                for (BlobItem item : items) {
                    System.out.println(item.getName());
                }
                if (response.getContinuationToken() == null) {
                    System.out.println("ContinuationToken: null");
                } else if (response.getContinuationToken().length() == 0) {
                    System.out.println("ContinuationToken: empty");
                } else {
                    System.out.println("ContinuationToken: non-empty:" + response.getContinuationToken());
                }
                System.out.println("Processing Page:end");
            }
        }).blockLast(); |
btw, please redact the item names if you wish; I'm interested in the continuation token values. |
Sorry for the delay, I believe it's the difference in TZs.
|
I'm writing in Kotlin, so my code looks slightly different:

BlobServiceClientBuilder()
.endpoint(endpoint)
.apply { authProvider(this) }
.httpClient(NettyAsyncHttpClientBuilder().build())
.buildAsyncClient()
.getBlobContainerAsyncClient(container)
.listBlobsByHierarchy("")
.byPage()
.doOnNext { response ->
println("Processing Page:start")
val items = response.value
for (item in items) {
println(item.name)
}
when {
response.continuationToken == null -> println("ContinuationToken: null")
response.continuationToken.isEmpty() -> println("ContinuationToken: empty")
else -> println("ContinuationToken: non-empty:" + response.continuationToken)
}
println("Processing Page:end")
}
    .blockLast()

And also, I've already complained that the HTTP client isn't being autodetected because of classloader issues, so I have to create it manually. I hope it changes nothing. |
@asm0dey thank you for sharing this! @rickle-msft - from the output, it looks like the marker (aka continuationToken) for the last page is empty. If I run the above code locally, the last page always has the marker property value as null; if we manually set (in debug mode) that to an empty string, the listing keeps looping the same way. Any chance the storage service over the wire returns an empty string here? |
@anuchandy I think the service always returns the empty xml element. Is there any chance the different environment is somehow causing the deserializer to give an empty string instead of null? We could also make the check for null or empty instead of just null? |
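For callers hitting this today, a defensive version of that null-or-empty check can also be applied on the client side rather than waiting for an SDK change (a sketch, reusing the flux variable from the earlier repro snippet and CoreUtils from azure-core):

    // Stop paging as soon as the continuation token is null or empty, instead of
    // relying on the service/SDK to treat "" as the end of the listing.
    flux.byPage()
        .takeUntil(page -> CoreUtils.isNullOrEmpty(page.getContinuationToken()))
        .doOnNext(page -> page.getValue().forEach(item -> System.out.println(item.getName())))
        .blockLast();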
Any ideas on a minimal test that would reproduce this null-vs-empty-string substitution in our environment? |
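One possible minimal check, assuming jackson-dataformat-xml is what ends up deserializing the listing response in this environment (the class and field names below are made up purely for illustration): deserialize an empty XML element and see whether it comes back as null or as an empty string.

    import com.fasterxml.jackson.dataformat.xml.XmlMapper;

    public class NextMarkerProbe {
        // simple holder type; the real SDK model classes are not used here
        public static class Result {
            public String NextMarker;
        }

        public static void main(String[] args) throws Exception {
            XmlMapper mapper = new XmlMapper();
            Result result = mapper.readValue("<Result><NextMarker /></Result>", Result.class);
            // prints either null or "" in quotes, depending on how the XML module
            // on the classpath handles empty elements
            System.out.println(result.NextMarker == null ? "null" : "\"" + result.NextMarker + "\"");
        }
    }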
@rickle-msft thanks for the reference. Let me reach out to a few people and get back, but I can see that the previous runtime indeed treats the empty string as an exit condition, ref. while (nextPageLink != null && nextPageLink != "") {. Not completely sure, but this is probably an indication that when the next token is of type String, an empty value should also be treated as an end-of-paging signal. |
I'm seeing the same problem with the Iterable returned by the DataLakeServiceClient.listFileSystems method. When I use an async client and print markers, as in the suggestion above, I see empty strings. |
@rodburgett I suppose it could help if you posted information about your environment; perhaps we could find something in common between our environments. |
@rodburgett happy to hear you're unblocked. @asm0dey - we have some work in progress to enable users to configure the JacksonAdapter. See these modules 1, 2. Hopefully, libraries will allow the user to provide a configured adapter soon; we are piloting this with some selected SDKs (e.g., Cosmos). Given that the plugin brings its own conflicting dependencies, another option to explore is shading. I feel like the plugin framework should provide a way to handle the standard Java service-provider loading facility instead of developers working around it. If we look at the OSGi plugin/bundle world, it gives a service-loader mediator scheme to handle the same case; this enables loading to work for 3rd-party libraries without app developers modifying them. I spent some time looking into how an IntelliJ plugin works underneath and opened an issue on the IntelliJ plugin forum: https://intellij-support.jetbrains.com/hc/en-us/community/posts/360008239419-How-to-make-the-plugin-CCTL-aware-of-the-Service-Provider-type-that-java-utils-ServiceLoader-looks-for |
@anuchandy I believe it will be too hard to introduce a ClassLoader that behaves correctly in IDEA, because there are many plugins and many threads, while the ContextClassLoader is set up per thread. And yes, both OSGi and JBoss Modules provide a way to work with service loaders correctly, but the ways they do that are not too simple. Do we really want more complexity in our IDE? :) |
The main reason I'm worried about setting the CCL in the az libraries is that this may be a workaround for one case, but given it's a library that will be used in many environments, setting the CCL without understanding what goes on underneath the environment is risky. Even if we want to take the route of passing classloaders to 3rd-party libs, not all 3rd-party libs may support it; Jackson XmlMapper is an example. Yea, the OSGi mediator is not simple, but I think framework-provided functionality is always safe and ensures nothing breaks. So I thought it should be solved similarly in the IntelliJ plugin framework, hence the issue. I guess once we have the option to configure the JacksonAdapter (like I mentioned in the previous comment), users should be able to provide the factory, so there is no CCL workaround needed. Or alternatively, shading is another possible option to consider; here is a related thread #11104, but in this case there was also a conflict issue. |
I believe you can even create a real issue at youtrack.jetbrains.com. Shading can be a nice alternative, but we should decide who should be responsible for shading: the library or the plugin. And anyway, shading should be done very carefully, because, for example, shading of Reactor is not recommended by the Reactor team. |
Thank you for the correct link to report the issue, opened one here: https://youtrack.jetbrains.com/issue/IDEA-241229 |
@asm0dey @anuchandy What is the status of this issue? Do we need to keep it open for tracking or are we satisfied with opening the issue with jetbrains? |
@rickle-msft thank you very much for asking and tracking! I'm absolutely satisfied with the current state. |
Sounds good. In that case I will close it and we can reopen it if we need further discussion here. Thanks for all your work! |
I am facing the same issue in my custom plugin; what is the workaround? @anuchandy |
@tooptoop4 In my case it was a classloader issue, did you check that? |
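For anyone hitting this inside a plugin or similarly isolated classloading environment, the classloader workaround being discussed usually amounts to temporarily setting the thread context classloader around the SDK calls. A rough sketch, assuming the plugin's own classloader can see the Azure SDK and Jackson classes (the class name here is hypothetical):

    ClassLoader original = Thread.currentThread().getContextClassLoader();
    try {
        // make ServiceLoader-based lookups (HTTP client, serializer, ...) resolve against
        // the plugin's classloader instead of the host application's
        Thread.currentThread().setContextClassLoader(MyPluginService.class.getClassLoader());
        // ... call listBlobs() / listBlobsByHierarchy() here ...
    } finally {
        Thread.currentThread().setContextClassLoader(original);
    }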
I am facing the same problem with listBlobs(). Is there a solution other than changing the classloader?
|
@nkanala my workaround was to add each blob name to a collection and then break the loop when it finds a blob already in the collection. |
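A minimal sketch of that workaround, assuming a synchronous BlobContainerClient named containerClient (variable names are illustrative only):

    // collect blob names until a name repeats, which indicates the listing has wrapped around
    Set<String> seen = new HashSet<>();
    List<BlobItem> results = new ArrayList<>();
    for (BlobItem item : containerClient.listBlobs()) {
        if (!seen.add(item.getName())) {
            break; // already seen this name, so the iteration has started repeating
        }
        results.add(item);
    }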
@tooptoop4 @nkanala Thank you for your willingness to collaborate on this. One heads up around that: if you're doing a hierarchical listing, maybe add an extra test case for the scenario where one blob name is a prefix of another blob name, e.g. case 1: foo and foo/bar; case 2: foo/ and foo/bar. I believe in the former case the returned items will be foo and foo/, and in the latter case only foo/ will be returned, but I thought it better to mention it in case your app has other logic to transform these results before comparing. |
Just FYI, I experienced this issue when upgrading from one jackson-databind version to another. So my solution in this case is to stay at the older version. |
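When the effective version is in doubt, as it often is in plugin or shaded-classpath setups, a quick runtime check of which jackson-databind is actually loaded can help. A small diagnostic sketch, not from the original thread:

    import com.fasterxml.jackson.databind.cfg.PackageVersion;

    public class JacksonVersionProbe {
        public static void main(String[] args) {
            // version baked into the jackson-databind jar that the classloader resolved
            System.out.println("jackson-databind version: " + PackageVersion.VERSION);
            // where that jar actually lives, useful when several copies are on the classpath
            System.out.println("loaded from: "
                + PackageVersion.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }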
I experienced the same issue writing a plugin for Graylog integration with Azure Event Hubs. Here is the code instantiating the BlobContainerAsyncClient:

    // Instantiating the BlobContainerAsyncClient
    BlobContainerAsyncClient blobContainerAsyncClient =
        new BlobContainerClientBuilder()
            .connectionString(STORAGE_CONNECTION_STRING)
            .containerName(STORAGE_CONTAINER_NAME)
            .httpClient(httpClient)
            .httpLogOptions(new HttpLogOptions().setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
            .buildAsyncClient();

    // Instantiating the event processor
    BlobCheckpointStore blobCheckpointStore = new BlobCheckpointStore(blobContainerAsyncClient);
    EventProcessorClientBuilder eventProcessorClientBuilder =
        new EventProcessorClientBuilder()
            .connectionString(EH_CONNECTION_STRING)
            .consumerGroup(EH_CONSUMER_GROUP)
            .processEvent(eventContext -> {
                LOG.info(
                    "Processing event from partition {} with sequence {} %n",
                    eventContext.getPartitionContext().getPartitionId(),
                    eventContext.getEventData().getBodyAsString());
                if (eventContext.getEventData().getSequenceNumber() % 10 == 0) {
                    // (empty in the original snippet)
                }
            })
            .processError(errorContext -> {
                LOG.error(
                    "Error occurred in partition processor for partition {}, {}.%n",
                    errorContext.getPartitionContext().getPartitionId(), errorContext.getThrowable());
            })
            .checkpointStore(blobCheckpointStore);

The output would indefinitely be:
Sorry for the verbose comment, hope it helps someone who faces the same issue. |
I've experienced this same issue upgrading to |
Query/Question
When I'm calling the following code
it turns out that the loop is infinite, constantly repeating
item1 → item2 → item3 → item1 → …
Why is this not a Bug or a feature Request?
It doesn't look like a bug because I can't reproduce it in a clean environment. My custom environment is a plugin for browsing Azure Blob Storage in IDEA.
Setup (please complete the following information if applicable):