Truncate messages to not fail ingestion #17

timtebeek · 2023-09-11T21:55:56Z

What's changed?

When a message's size exceeds the maximum length, truncate it rather than throw an exception.

What's your motivation?

Stops us from breaking ingestion when Maven/Gradle messages exceed maximum length.

Have you considered any alternatives or workarounds?

We could make it even more clear that the message was truncated, or truncate elsewhere than at the end.

Any additional context

Ingestion failures for apache/maven and kiegroup/drools.

knutwannheden · 2023-09-12T05:55:11Z

Is it the receiving side which is the "limiting factor"? I.e., does it use a fixed buffer size to read the messages into?

In any case, I think we should check the length of the byte buffer, since a filename can also contain non-ASCII characters. And as long as we run on Java versions where UTF-8 is not yet the platform default encoding (it became the default in Java 18, per JEP 400), we should specify an explicit encoding when converting to bytes.
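To illustrate the difference between String length and encoded byte length, here is a minimal sketch. The class and the `utf8Length` helper are hypothetical names used only for this example and are not part of rewrite-polyglot; the non-ASCII filename is written with a Unicode escape so the source compiles the same way under any platform encoding.

```java
import java.nio.charset.StandardCharsets;

class MessageByteLength {
    // Hypothetical helper: number of bytes the message occupies once encoded as UTF-8.
    // Passing the charset explicitly avoids depending on the platform default encoding.
    static int utf8Length(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "pom.xml";
        String nonAscii = "\u00fcbersicht.md"; // starts with U+00FC, two bytes in UTF-8
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        System.out.println(nonAscii.length() + " chars, " + utf8Length(nonAscii) + " bytes");
    }
}
```

A check against `message.length()` would accept the second filename at a limit of 12, even though it needs 13 bytes on the wire.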

timtebeek · 2023-09-12T09:21:41Z

Indeed, it looks like the receiving side is the limit: it sizes its byte buffer with the same constant we use for the max String length, so the limit is really a number of bytes rather than a number of characters.

rewrite-polyglot/src/main/java/org/openrewrite/polyglot/RemoteProgressBarReceiver.java

Lines 57 to 63 in 9271cae

    for (; ; ) {
        byte[] buf = new byte[MAX_MESSAGE_SIZE]; // no message should be longer than a terminal line length
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        try {
            socket.receive(packet);
        } catch (SocketTimeoutException ignored) {
            break;

So yes, I guess we should look at byte array length rather than String length, as we don't seem to support UTF-8 right now, which would break the receiver loop.

timtebeek · 2023-09-12T10:12:36Z

TODO: mimic animated progress bar line 240

knutwannheden · 2023-09-12T10:31:46Z

src/main/java/org/openrewrite/polyglot/RemoteProgressBarSender.java

+        if (message == null || message.length() <= maxLength) {
+            return message;
+        }
+        return "..." + message.substring(Math.max(message.length() - maxLength - 3, 0));


This could still fail (on the receiving side) if there are any non-ASCII characters requiring a multibyte encoding. But that is a rather unlikely case, so I guess we can ignore it for now. Otherwise, we would have to truncate the encoded bytes instead. To do that properly, we would have to check that we don't accidentally cut the byte array in the middle of a multibyte sequence (and if we do, also drop the remaining bytes of that sequence). For UTF-8 I believe we can do that by looking at the byte value at the cut: 0x00 to 0x7F is the ASCII range, 0x80 to 0xBF marks a continuation byte inside a multibyte sequence, and a value greater than 0xBF is the lead byte of a new sequence.
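A minimal sketch of what byte-level truncation could look like, assuming we keep the tail of the message behind a `...` prefix as the diff above does. The class and method names are hypothetical and this is not the code that was merged. It moves the cut forward past any continuation bytes (those in the 0x80 to 0xBF range) so the kept tail always starts on a complete character.

```java
import java.nio.charset.StandardCharsets;

class Utf8Truncation {
    static final byte[] PREFIX = "...".getBytes(StandardCharsets.UTF_8);

    // Hypothetical helper: encode as UTF-8 and, if the result is longer than maxBytes,
    // keep only the tail behind a "..." prefix without splitting a multibyte sequence.
    static byte[] truncateUtf8(String message, int maxBytes) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return bytes;
        }
        if (maxBytes < PREFIX.length) {
            throw new IllegalArgumentException("maxBytes must leave room for the prefix");
        }
        // First candidate start index so that prefix + tail fits into maxBytes.
        int start = bytes.length - (maxBytes - PREFIX.length);
        // Continuation bytes have the bit pattern 10xxxxxx; skip them so that the
        // tail begins at an ASCII byte or at the lead byte of a sequence.
        while (start < bytes.length && (bytes[start] & 0xC0) == 0x80) {
            start++;
        }
        byte[] out = new byte[PREFIX.length + bytes.length - start];
        System.arraycopy(PREFIX, 0, out, 0, PREFIX.length);
        System.arraycopy(bytes, start, out, PREFIX.length, bytes.length - start);
        return out;
    }

    public static void main(String[] args) {
        byte[] truncated = truncateUtf8("\u00e9\u00e9\u00e9\u00e9\u00e9", 8);
        System.out.println(new String(truncated, StandardCharsets.UTF_8)
                + " (" + truncated.length + " bytes)");
    }
}
```

Because skipping continuation bytes only ever shortens the tail, the result can come out a few bytes under `maxBytes`, but never over it.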

timtebeek · 2023-09-12T11:03:44Z

As discussed, we're not too worried about UTF-8 just yet, so this is good enough to fix ingestion for projects as is.

Truncate messages to not fail ingestion

707a90d

timtebeek requested a review from jkschneider September 11, 2023 21:55

timtebeek self-assigned this Sep 11, 2023

timtebeek marked this pull request as draft September 12, 2023 09:21

jkschneider marked this pull request as ready for review September 12, 2023 10:09

timtebeek added 2 commits September 12, 2023 12:17

Truncate as we do in CLI AnimatedProgressBar

591706f

Add nullable annotations

511b2e5

knutwannheden reviewed Sep 12, 2023

Ensure we properly truncate messages

dc35ad6

Add note with link about UTF-8 handling

19aa200

timtebeek added the bug Something isn't working label Sep 12, 2023

timtebeek merged commit 7400a57 into main Sep 12, 2023
1 check passed

timtebeek deleted the truncate_messages branch September 12, 2023 11:24
