Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(PUP-11841) Fix encoding of empty String #9102

Merged
merged 1 commit into from
Sep 12, 2023

Conversation

smortex
Copy link
Contributor

@smortex smortex commented Aug 21, 2023

In 2077971 empty string literals where replaced by a call to String.new in preparation for moving to frozen/immutable strings.

However, as stated in the String#new documentation, when no value is passed to #new the returned string has the 'ASCII-8BIT' encoding instead of the default one (which is assumed to be 'UTF-8' by Puppet).

This cause regressions in some areas of the code, for example when using the concat module with exported resource built using epp templates, the incorrect encoding cause the fragment to be misinterpreted and is base64 encoded. When collected, the base64 representation of the string is used instead of the actual value of the string, as reported here:
voxpupuli/puppet-bacula#189

Replace calls to String.new with ''.dup which use the current encoding. Do not change the few explicit but redundant occurrences of String.new.force_encoding('ASCII-8BIT') (so that the intent is clearly visible). Where appropriate, slightly adjust the code for better
readability.

@smortex smortex requested review from a team as code owners August 21, 2023 15:50
@puppetlabs-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

In 2077971 empty string literals where
replaced by a call to `String.new` in preparation for moving to
frozen/immutable strings.

However, as stated in the String#new documentation, when no value is
passed to `#new` the returned string has the 'ASCII-8BIT' encoding
instead of the default one (which is assumed to be 'UTF-8' by Puppet).

This cause regressions in some areas of the code, for example when using
the concat module with exported resource built using epp templates, the
incorrect encoding cause the fragment to be misinterpreted and is base64
encoded.  When collected, the base64 representation of the string is
used instead of the actual value of the string, as reported here:
voxpupuli/puppet-bacula#189

Replace calls to `String.new` with `''.dup` which use the current
encoding.  Do not change the few explicit but redundant occurrences of
`String.new.force_encoding('ASCII-8BIT')` (so that the intent is clearly
visible).  Where appropriate, slightly adjust the code for better
readability.
@smortex smortex changed the title Fix encoding of empty String (PUP-11841) Fix encoding of empty String Aug 21, 2023
@smortex
Copy link
Contributor Author

smortex commented Aug 21, 2023

Cc @joshcooper for review 😉

@joshcooper
Copy link
Contributor

Thanks @smortex! I completely missed the encoding behavior difference. Why would Ruby do that!?

@smortex
Copy link
Contributor Author

smortex commented Aug 28, 2023

Why would Ruby do that!?

My first reaction was that it should default to the current locale (just like '' magically does), then I though that a function receiving the same values as parameters and not returning the same thing depending on some external factor was not great. So this might be valid 🤷.

From my findings, this behavior seems to have been the same since at least since Ruby 2.2.0: ruby/ruby@72194a8. Tricky, unexpected, but documented.

@cthorn42
Copy link
Collaborator

Validated this with a puppet-agent run in our inteneral adhoc job. Passed integration testing with all supported platforms on the main branch. This LGTM.

@cthorn42 cthorn42 merged commit 312ab25 into puppetlabs:main Sep 12, 2023
9 checks passed
@smortex smortex deleted the fix-empty-sting-encoding branch September 12, 2023 14:41
@kenyon
Copy link
Contributor

kenyon commented Sep 17, 2023

I don't have access to https://puppet.atlassian.net/browse/PUP-11841 for some reason, but while searching for this I found https://puppet.atlassian.net/browse/PUP-10096 ("Parameters of resources are base64 encoded in puppetdb if created as raw byte strings by ruby functions") which seems similar, or maybe the same thing.

@joshcooper
Copy link
Contributor

which seems similar, or maybe the same thing.

Yeah Resolv.new.getname(args[0]) is another way for binary ruby strings to "leak" into puppet, which then leads base64 in puppetdb:

$ ruby -rresolv -e "puts Resolv.new.getname('127.0.0.1').encoding" 
ASCII-8BIT

@joshcooper
Copy link
Contributor

I filed https://perforce.atlassian.net/browse/PUP-11943 (which is unfortunately not public), but it's the issue that we're currently using the track this bug. It will be included in the next puppet 8.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants