Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate messages to not fail ingestion #17

Merged
merged 5 commits into from
Sep 12, 2023
Merged

Conversation

timtebeek
Copy link
Contributor

What's changed?

When message size exceeds maximum length truncate, rather than throw an exception.

What's your motivation?

Stops us from breaking ingestion when Maven/Gradle messages exceed maximum length.

Have you considered any alternatives or workarounds?

We could make it even more clear that the message was truncated, or truncate elsewhere than at the end.

Any additional context

Ingestion failures for apache/maven and kiegroup/drools.

@timtebeek timtebeek self-assigned this Sep 11, 2023
@knutwannheden
Copy link
Contributor

Is it the receiving side which is the "limiting factor"? I.e. does it use a fix buffer size to read the messages into?

In any case, I think we should check the length of the byte buffer, since a filename can also contain non-ASCII characters. And until UTF-8 becomes the platform default encoding (Java 21?), we should specify an explicit encoding when converting to bytes.

@timtebeek
Copy link
Contributor Author

Indeed looks like the receiving side is the limit, and uses the same constant for the max String length, rather than number of bytes.

for (; ; ) {
byte[] buf = new byte[MAX_MESSAGE_SIZE]; // no message should be longer than a terminal line length
DatagramPacket packet = new DatagramPacket(buf, buf.length);
try {
socket.receive(packet);
} catch (SocketTimeoutException ignored) {
break;

So yes, I guess we should look at byte array length rather than String length, as we don't seem to support UTF-8 right now, which would break the receiver loop.

@timtebeek timtebeek marked this pull request as draft September 12, 2023 09:21
@jkschneider jkschneider marked this pull request as ready for review September 12, 2023 10:09
@timtebeek
Copy link
Contributor Author

TODO: mimic animated progress bar line 240

if (message == null || message.length() <= maxLength) {
return message;
}
return "..." + message.substring(Math.max(message.length() - maxLength - 3, 0));
Copy link
Contributor

@knutwannheden knutwannheden Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could still fail (on the receiving side) if there are any non-ASCII characters requiring a multibyte encoding. But that is rather unlikely case, so I guess we can ignore that for now. Otherwise, we would have to truncate the encoding instead. To do that properly, we would have to check that we don't accidentally truncate the byte array in the middle of a multibyte encoding (and if we do, then truncate all those bytes part of that multibyte set as well). For UTF-8 I believe we can do that by checking if the value is greater than 0xBF (0x00 - 0x7F is the ASCII range and 0x80 to 0xBF indicates the end of an UTF-8 sequence).

@timtebeek
Copy link
Contributor Author

As discussed; We're not too worried about UTF-8 just yet, so we're good enough to fix ingestion for projects as is.

@timtebeek timtebeek added the bug Something isn't working label Sep 12, 2023
@timtebeek timtebeek merged commit 7400a57 into main Sep 12, 2023
1 check passed
@timtebeek timtebeek deleted the truncate_messages branch September 12, 2023 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants