doc: Guidance on the maximum, supported size of Cloud-Init userdata #138

Open
aruneshpa opened this issue Apr 12, 2023 · 0 comments
aruneshpa commented Apr 12, 2023

Summary

Cloud-Init documents that the maximum size of user data on the EC2 platform is 16KiB, and that this value is platform dependent.

VM Operator relies on the Cloud-Init Data Source for VMware, which currently uses a VM's GuestInfo key/values to send data into the guest. That means, by default, the maximum, supported size for Cloud-Init user data is 64KiB for VMs deployed by VM Operator on Supervisor.

This ticket tracks adding documentation with the above information and additional context.

Details

Cloud-Init was originally developed by Canonical for Amazon EC2, which is where the 16384-byte (16KiB) limit originates. In fact, this impacted @randomvariable a few years ago in his work on the Cluster API Provider for AWS, which needed to inject certificates for kubeadm into the guests -- the 16KiB limit on EC2 was a hindrance to this effort.

For vSphere the size is more than 16KiB, but less than 1MiB. The value is 64KiB (65536 bytes), defined by the constant GUESTMSG_MAX_IN_SIZE. This limit exists to reduce the possibility of attacks from guest apps.

By default, a VM's entire vmx file has a 1MiB limit, but this value can be adjusted with a VMX property named tools.setInfo.sizeLimit. Some caveats include:

  • The parameter tools.setInfo.sizeLimit controls the size of the vmx file as a whole, not any individual property
  • It is not possible for DevOps users to set the tools.setInfo.sizeLimit parameter today via a VM Operator VirtualMachine resource.
  • There is no way, either by a DevOps user or platform admin, to adjust the max size for an individual GuestInfo value, which is what is subject to the limit set by the constant GUESTMSG_MAX_IN_SIZE.

FWIW, this size will increase once we update the Cloud-Init DataSource for VMware with support for Datasets, a feature new to vSphere 8.0, but that will require ESXi hosts at 8.0 as well as images with versions of Cloud-Init with the updated DataSource for VMware.

One way to deal with the 64KiB ceiling is to gzip the plain-text before you base64-encode it. Base64 encoding on its own does not reduce size (it inflates the data by roughly a third), so any savings have to come from compression. We performed some analysis on ideal compression workflows for Cloud-Init back in 2018 on this very topic:

Each file in the below list contains Cloud Config data:

  • The first extension indicates how the child documents are encoded.
  • The second extension indicates how the outer document is encoded.
-rw-r--r--@  1 akutz  staff    94K Aug 28 11:39 cc.base64.base64
-rw-r--r--@  1 akutz  staff    32K Aug 28 11:40 cc.base64.gzip
-rw-r--r--@  1 akutz  staff    51K Aug 28 11:38 cc.gzip.base64
-rw-r--r--@  1 akutz  staff    31K Aug 28 11:41 cc.gzip.gzip
-rw-r--r--@  1 akutz  staff    22K Aug 28 11:49 cc.plaintext.gzip
-rw-r--r--@  1 akutz  staff    23K Aug 28 11:49 cc.urlencode.gzip

Comparing cc.gzip.gzip (31K) with cc.base64.gzip (32K) shows only a very minor improvement when child content is compressed before being included in the outer content.

Now, cc.plaintext.gzip is of course the most compressed. Unfortunately it represents the inner-content as plain-text. This is not possible in production thanks to the outer content's rules about valid characters. However, using URL encoding for the inner content instead of base64 encoding makes the characters safe for YAML-transport, and look at the results! Only fractionally worse than the inner-content as plain-text. Unfortunately a Cloud-Config document does not support URL encoding for its file content (https://cloudinit.readthedocs.io/en/latest/topics/examples.html#writing-out-arbitrary-files):

encoding can be given b64 or gzip or (gz+b64).

In the end, the ideal solution is to keep all inner-data as plain-text and compress/encode the envelope.

Basically, the rule of thumb is this -- for the ideal compression ratio, do not encode/compress anything inside the cloud-config, such as file data. The more plain-text, the better, because it allows for better compression with gzip. Simply gzip the outer shell. Otherwise you're just shoving zips inside zips.
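
To make the rule of thumb concrete, here is a minimal sketch that compares base64 alone against gzip followed by base64 on the outer document. The helper names (sample, b64_size, gz_b64_size) and the generated cloud-config are made up for illustration. The `tr -d '\n'` step strips the line wrapping that some base64 implementations add, so only payload characters are counted.

```shell
# Illustrative helpers only; none of these names come from VM Operator.

# Emit a repetitive, plain-text cloud-config so the sketch needs no input file.
sample() {
  printf '%s\n' '#cloud-config' 'write_files:'
  i=0
  while [ "$i" -lt 200 ]; do
    printf '%s\n' "- path: /etc/example/file-$i.conf" '  content: key=value'
    i=$((i + 1))
  done
}

# Size of stdin after base64 encoding alone.
b64_size() {
  base64 | tr -d '\n' | wc -c | awk '{print $1}'
}

# Size of stdin after gzip (level 6, no name or timestamp), then base64.
gz_b64_size() {
  gzip -6cn | base64 | tr -d '\n' | wc -c | awk '{print $1}'
}

echo "raw:         $(sample | wc -c | awk '{print $1}') bytes"
echo "base64:      $(sample | b64_size) bytes"
echo "gzip+base64: $(sample | gz_b64_size) bytes"
```

On plain-text input like this, base64 alone should come out larger than the raw text, while gzip followed by base64 should come out smaller.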

Imagine the following files:

  1. cloud-config-with-plain-text-file.yaml

    #cloud-config
    password: fake
    ssh_pwauth: true
    users:
    - name: fake
      sudo: ALL=(ALL) NOPASSWD:ALL
      lock_passwd: false
      passwd: fake
      shell: /bin/bash
    write_files:
    - path: /helloworld
      content: |
        Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    
        Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
    
        At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.
  2. cloud-config-with-compressed-file.yaml

    #cloud-config
    password: fake
    ssh_pwauth: true
    users:
    - name: fake
      sudo: ALL=(ALL) NOPASSWD:ALL
      lock_passwd: false
      passwd: fake
      shell: /bin/bash
    write_files:
    - path: /helloworld
      encoding: gzip
      content: !!binary |
        H4sIAFzKNmQCA41VW24jRwz81yl4AEFXWATIfgQIFgiSHIDqpi0C/Rj3w/Dxt8geyaPNYpEfwzPTbFYVq6g/a5NMuvWZKdZUG3UdxFnGmUItXcKQMRtx1E170PJKkhQfu0QUkOjsuUYakjcUawkaNc4yaA5KfMX1JGNdLZT5tTBx0rfJF/p3kBTNuJuy2j/veOR8prepnUrto81I8iEt6OChtdBMiXOo62Y7pF2tk1+pGw6TMIBnYKqLAFqNC/1uV/IcQtomkCyuWqjJ1uQmJUoDcbx4r2luaCeAA6YkvQsFTemuEAhNepmvyoOKAaKNGx5mu9DXjyDbkGkyQoMaAkvAuTA3jTysAiy2VjVKMRVNKTQNM21svKm+vGhQpihdmn3NNRkMNoEUcvRd15kvp9PfmALob9L6ZmUDLCe4UM2QhrSDRgG0TtLaPtwHQSgfwuxchj645bcpuH/G9fZMow7OZCZhdPHxCNsheMYQsxBfCfJU8MAAhwn0bmI6GMweZzAkbuGmA3aqdBUogUNqf6OGwUsH+diSBnC70DfJdZnD2uQjZojE9+e+3NqBzFg2mzDVaO/wj81od6oX7XZAK5w0K+pOGjBrd/Wb20yO/axEqQjMj0qDZuxh9lbdqKCXbSpu27uKe6K87S9idX7kapntANZ8UmZetyNk8EHUPWa8cvbLlOU9ZtlnBF4HTv+NHv8se/kpeyDl6QP+tnn4+gTwTXcEWrvVPycyPiVSPxM52xf6655JS35CqiCbZfM5kqYqEiL882Q6xaI3TZYUzEFhqkObs9+94nsckHXbQ/w266ehnhP95XT6bZidq1sEGq/I5OkPmAv8DGIVPoafeteMY3EGtRPW75q4wJAKqltjhHql7c7FgicJ2g9EZJizIG/DJzVY/dOgK0cPjnjjq2Y2fV41unaNuk77snm/bxvNamOR/7938MY2j55/2D0Puy0Zsc2/DrpxWypHsZXh3xir0/YASsUjLoYNcoFFCTAWEoUJJr2ayMvfgpCgtkOjYUm43i9I+gpbACz0qXbGuKzha/abfZiwNNRXSxJ+mjJ/aEZiEwcsHkMkzQJsw5r9vK/KhwG4I7nowivU6+vKMIwpCcOKs1/oH4eq13n3MHjjIbJr44vIJbVquepYXt9lKRLgXk+WXdAZN5NYBsUzLYe4dus7o8JHsLYPW8bB7DbkJr7IceBCfwzfz+LDWO1uGiBtWb/k6LYp/CDuvDBMgzl2cABsCj/0WPgyq7sQoYZCh3R5FbbvCyT1MhdqiWJbeZUt3fAzfPoOOGT7r28IAAA=

Which file is more easily compressed with gzip?

$ gzip -6cn cloud-config-with-plain-text-file.yaml | base64 | wc -c
    1501
$ gzip -6cn cloud-config-with-compressed-file.yaml | base64 | wc -c
    1621

While it may sound counter-intuitive, additional compression can often have diminishing, if not net-negative, returns. This is because of how gzip compression works by essentially deduplicating string values and replacing symbols based on a weighted frequency. This works best against plain-text, and thus gzip'ing content that's already been gzip'd does not really help.
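
A quick way to see the diminishing returns for yourself is to gzip the same text once and twice and compare the sizes. This is only a sketch; the sample text and the count helper are illustrative, and the exact numbers depend on the gzip build.

```shell
# Illustrative plain-text input; any natural-language text should behave similarly.
text='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.'

# Print the number of bytes read from stdin.
count() {
  wc -c | awk '{print $1}'
}

once=$(printf '%s\n' "$text" | gzip -6cn | count)
twice=$(printf '%s\n' "$text" | gzip -6cn | gzip -6cn | count)

echo "gzip once:  $once bytes"
echo "gzip twice: $twice bytes"
```

The second pass has almost no redundancy left to remove, yet still adds its own gzip header and trailer, so the second number should not be smaller than the first.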

Additionally, while not directly in scope for this ticket, it is related and worth mentioning that VM Operator simplified the end-user UX. On the latest version, we always attempt to base64-decode (until it fails) and then decompress any bootstrap data. Then we turn around and gzip+base64 encode it again. This way the end-user does not need to concern themselves with specifying the encoding of the user-data key in a Secret resource -- VM Operator will handle plain-text, base64, or gzip+base64 all just fine.
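
The normalization described above could be sketched roughly as follows. This is not VM Operator's actual code; normalize_userdata is a hypothetical helper that only mirrors the description (decode base64 until it fails, decompress if possible, then gzip+base64 again). It assumes a base64 command that accepts -d, such as GNU coreutils.

```shell
# Hypothetical sketch, not VM Operator's implementation.
# normalize_userdata SRC DST: read user data from SRC in plain-text, base64,
# or gzip+base64 form and write single-line gzip+base64 to DST.
normalize_userdata() {
  src=$1
  dst=$2
  cur=$(mktemp)
  nxt=$(mktemp)
  cp "$src" "$cur"

  # Peel off base64 layers until decoding fails or yields nothing.
  while base64 -d < "$cur" > "$nxt" 2>/dev/null && [ -s "$nxt" ]; do
    mv "$nxt" "$cur"
  done

  # If what remains is gzip data, decompress it; otherwise keep it as-is.
  if gzip -dc < "$cur" > "$nxt" 2>/dev/null; then
    mv "$nxt" "$cur"
  fi

  # Re-encode the plain text the same way regardless of the input form.
  gzip -6cn < "$cur" | base64 | tr -d '\n' > "$dst"
  rm -f "$cur" "$nxt"
}

# Try all three input forms against the same cloud-config.
work=$(mktemp -d)
printf '%s\n' '#cloud-config' 'ssh_pwauth: true' > "$work/plain"
base64 < "$work/plain" > "$work/b64"
gzip -6cn < "$work/plain" | base64 > "$work/gzb64"

for form in plain b64 gzb64; do
  normalize_userdata "$work/$form" "$work/$form.out"
  echo "$form -> $(wc -c < "$work/$form.out" | awk '{print $1}') bytes"
done
```

One caveat with any "decode until it fails" approach: plain text that happens to consist only of valid base64 characters would be decoded by mistake. A cloud-config starting with #cloud-config is safe in this sketch because # is not a base64 character.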

It is also worth noting that it is possible to predict the size of the eventual payload set on the VM. The commands in the examples above specify the same type of compression VM Operator uses to deflate the data before setting it on a VM:

  1. We call EncodeGzipBase64
  2. Which then uses gzip.NewWriter
  3. The Go stdlib function gzip.NewWriter turns around and invokes NewWriterLevel with the constant DefaultCompression
  4. The constant DefaultCompression is defined as -1 in the Go stdlib package compress/flate, which treats it as compression level 6

This value, 6, is the same compression value passed to the command-line tool gzip used in the earlier examples. From gzip's man page:

     -1, --fast

     -2, -3, -4, -5, -6, -7, -8

     -9, --best        These options change the compression level used, with the -1 option being
                       the fastest, with less compression, and the -9 option being the slowest,
                       with optimal compression.  The default compression level is 6.
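
As a sanity check of the man page excerpt, the following sketch compares gzip with no level flag against an explicit -6. It assumes the GZIP environment variable is not overriding the default level; the sample text is illustrative.

```shell
# Illustrative input; the comparison does not depend on the exact text.
text='#cloud-config
password: fake
ssh_pwauth: true'

# Checksum the compressed bytes with and without an explicit level.
default_sum=$(printf '%s\n' "$text" | gzip -cn | cksum)
level6_sum=$(printf '%s\n' "$text" | gzip -6cn | cksum)

if [ "$default_sum" = "$level6_sum" ]; then
  echo "default level and -6 produce identical output"
else
  echo "outputs differ (is the GZIP environment variable set?)"
fi
```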

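Therefore the following command should output approximately the same number of bytes that will be used to set the Cloud-Init user data on a VM by VM Operator (the gzip command-line tool and Go's compress/flate are separate implementations, so the counts can differ slightly):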

gzip -6cn <CLOUD_CONFIG_FILE> | base64 | wc -c

It is also possible to do it without a file at all using a heredoc:

$ cat <<EOF | gzip -6cn | base64 | wc -c
#cloud-config
password: fake
ssh_pwauth: true
users:
- name: fake
  sudo: ALL=(ALL) NOPASSWD:ALL
  lock_passwd: false
  passwd: fake
  shell: /bin/bash
EOF

Thus the following command will succeed only if the calculated byte count is within the 64KiB (65536 bytes) supported by VM Operator:

test $(gzip -6cn <CLOUD_CONFIG_FILE> | base64 | wc -c | awk '{print $1}') -le 65536
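
The same check can be wrapped in small reusable functions. This is a sketch under stated assumptions: the names userdata_payload_size, userdata_fits and MAX_USERDATA_BYTES are hypothetical, the limit is taken to be the 64KiB (65536 bytes) ceiling described earlier in this issue, and base64 line-wrapping newlines are stripped on the assumption that they are not part of the stored value.

```shell
# Hypothetical helpers; the limit is the 64KiB ceiling discussed in this issue.
MAX_USERDATA_BYTES=65536

# userdata_payload_size FILE: print the size of FILE after gzip -6 and base64.
userdata_payload_size() {
  gzip -6cn < "$1" | base64 | tr -d '\n' | wc -c | awk '{print $1}'
}

# userdata_fits FILE: succeed only if the encoded payload is within the limit.
userdata_fits() {
  [ "$(userdata_payload_size "$1")" -le "$MAX_USERDATA_BYTES" ]
}

# Example with a throwaway file.
tmp=$(mktemp)
printf '%s\n' '#cloud-config' 'ssh_pwauth: true' > "$tmp"
if userdata_fits "$tmp"; then
  echo "fits: $(userdata_payload_size "$tmp") bytes"
else
  echo "too large: $(userdata_payload_size "$tmp") bytes"
fi
```

Because userdata_fits returns a normal exit status, it can gate a deployment step, for example `userdata_fits cloud-config.yaml && kubectl apply -f vm.yaml`, where both file names are placeholders.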
akutz changed the title from "Documentation to indicate the maximum size of user data that can be specified via cloud-init bootstrap type" to "doc: Guidance on the maximum, supported size of Cloud-Init userdata" on Apr 14, 2023