-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truncate messages to not fail ingestion #17
Conversation
Is it the receiving side which is the "limiting factor"? I.e. does it use a fix buffer size to read the messages into? In any case, I think we should check the length of the byte buffer, since a filename can also contain non-ASCII characters. And until UTF-8 becomes the platform default encoding (Java 21?), we should specify an explicit encoding when converting to bytes. |
Indeed looks like the receiving side is the limit, and uses the same constant for the max String length, rather than number of bytes. rewrite-polyglot/src/main/java/org/openrewrite/polyglot/RemoteProgressBarReceiver.java Lines 57 to 63 in 9271cae
So yes, I guess we should look at byte array length rather than String length, as we don't seem to support UTF-8 right now, which would break the receiver loop. |
TODO: mimic animated progress bar line 240 |
if (message == null || message.length() <= maxLength) { | ||
return message; | ||
} | ||
return "..." + message.substring(Math.max(message.length() - maxLength - 3, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could still fail (on the receiving side) if there are any non-ASCII characters requiring a multibyte encoding. But that is rather unlikely case, so I guess we can ignore that for now. Otherwise, we would have to truncate the encoding instead. To do that properly, we would have to check that we don't accidentally truncate the byte array in the middle of a multibyte encoding (and if we do, then truncate all those bytes part of that multibyte set as well). For UTF-8 I believe we can do that by checking if the value is greater than 0xBF
(0x00
- 0x7F
is the ASCII range and 0x80
to 0xBF
indicates the end of an UTF-8 sequence).
As discussed; We're not too worried about UTF-8 just yet, so we're good enough to fix ingestion for projects as is. |
What's changed?
When message size exceeds maximum length truncate, rather than throw an exception.
What's your motivation?
Stops us from breaking ingestion when Maven/Gradle messages exceed maximum length.
Have you considered any alternatives or workarounds?
We could make it even more clear that the message was truncated, or truncate elsewhere than at the end.
Any additional context
Ingestion failures for apache/maven and kiegroup/drools.