
Add support for zstd layer upload #1827

Open
slonopotamus wants to merge 1 commit into main from zstd-append

Conversation


@slonopotamus slonopotamus commented Oct 28, 2023

It is now possible to run `crane append -f bla.tar.zstd` to upload a zstd layer, both from a file and from a pipe.

Before this commit, crane would erroneously upload such a layer with a gzip media type.

This PR does not fully solve #1501, because it does not add a command-line switch to allow recompression.
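For context, the fix hinges on sniffing the stream's magic bytes to pick the right layer media type instead of assuming gzip. A minimal stdlib-only sketch of that idea (DetectCompression is a hypothetical helper, not go-containerregistry's actual API; the media-type strings match the constants added in this PR):

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	gzipMagic = []byte{0x1f, 0x8b}             // gzip member header
	zstdMagic = []byte{0x28, 0xb5, 0x2f, 0xfd} // zstd frame magic (little-endian 0xFD2FB528)
)

// DetectCompression maps the first few bytes of a layer stream to a
// Docker layer media type. Unrecognized input is treated as plain tar.
func DetectCompression(header []byte) string {
	switch {
	case bytes.HasPrefix(header, zstdMagic):
		return "application/vnd.docker.image.rootfs.diff.tar.zstd"
	case bytes.HasPrefix(header, gzipMagic):
		return "application/vnd.docker.image.rootfs.diff.tar.gzip"
	default:
		return "application/vnd.docker.image.rootfs.diff.tar"
	}
}

func main() {
	fmt.Println(DetectCompression([]byte{0x28, 0xb5, 0x2f, 0xfd, 0x00}))
}
```

Because only a short prefix is needed, the same approach works for piped input: peek the header without consuming it, then stream the rest unchanged.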

@slonopotamus
Author

There is an alternative implementation in #1798, but I believe it has several issues.

@@ -35,6 +35,7 @@ const (
DockerManifestSchema2 MediaType = "application/vnd.docker.distribution.manifest.v2+json"
DockerManifestList MediaType = "application/vnd.docker.distribution.manifest.list.v2+json"
DockerLayer MediaType = "application/vnd.docker.image.rootfs.diff.tar.gzip"
DockerLayerZstd MediaType = "application/vnd.docker.image.rootfs.diff.tar.zstd"
@slonopotamus (Author), Oct 28, 2023

Yes, to my surprise, such a thing exists. See moby/buildkit#2344 (comment), moby/buildkit#3968 and containerd/containerd#9263.

@slonopotamus (Author)

One more proof link: containerd/containerd#9859

Collaborator

🙈

@slonopotamus slonopotamus force-pushed the zstd-append branch 4 times, most recently from 49b2380 to 2161e22 on October 29, 2023 15:21
It is now possible to run `crane append -f bla.tar.zstd` to upload a zstd layer, both from a file and from a pipe.

Before this commit, crane would erroneously upload such a layer with a gzip media type.

While this PR does not _fully_ solve google#1501, it is very close to that.

Signed-off-by: Marat Radchenko <[email protected]>
var compressor io.WriteCloser

if comp == compression.ZStd {
compressor, _ = zstd.NewWriter(out)
@slonopotamus (Author)

This test was very broken because it didn't actually test zstd at all.

@ethan-gallant

@imjasonh is there anything else left here before it can be merged?

@ethan-gallant

@slonopotamus in testing with this PR, I've noticed that the PeekReader implementation can fail due to ErrNoProgress.

ErrNoProgress happens when a Read call returns 0 bytes, which in our case can occur while compression is still in progress.

This can be worked around by creating a custom peek reader, as I've done here, but obviously at the cost of added complexity.


type peekReader struct {
	src    io.Reader
	buffer []byte
	offset int
}

func newPeekReader(src io.Reader) PeekReader {
	return &peekReader{
		src:    src,
		buffer: make([]byte, 0, 6),
	}
}

// Peek returns the first n bytes without advancing the reader.
// It reads more data into the buffer if necessary.
func (p *peekReader) Peek(n int) ([]byte, error) {
	// Ensure the buffer has enough data
	if len(p.buffer)-p.offset < n {
		// Read more data: calculate how much more is needed
		needed := n - (len(p.buffer) - p.offset)
		tempBuf := make([]byte, needed)
		read, err := io.ReadFull(p.src, tempBuf)
		// io.ReadFull returns io.ErrUnexpectedEOF on a partial read, so keep
		// whatever bytes did arrive and only propagate genuine read errors.
		if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
			return nil, err
		}
		// Append any new data to the buffer
		p.buffer = append(p.buffer, tempBuf[:read]...)
	}

	// If after reading there's still not enough data for peek, but no more data is coming
	if len(p.buffer)-p.offset < n {
		return nil, io.EOF // Or another error indicating not enough data
	}

	// Return the requested data without advancing the offset
	return p.buffer[p.offset : p.offset+n], nil
}

// Read reads data into p, advancing the reader.
func (p *peekReader) Read(b []byte) (int, error) {
	// If there's buffered data, use it
	if p.offset < len(p.buffer) {
		n := copy(b, p.buffer[p.offset:])
		p.offset += n
		// If the buffer is fully consumed, reset it
		if p.offset == len(p.buffer) {
			p.buffer = p.buffer[:0]
			p.offset = 0
		}
		return n, nil
	}

	// No buffered data; read directly from the source
	return p.src.Read(b)
}

@slonopotamus
Author

If I understand you correctly, you are saying that bufio.Reader is broken with slow writers? I believe that is out of scope for the current PR then. The PeekCompression function existed long before, and there are already several usages of it throughout the codebase.

I am actually not sure... Why would Peek return zero bytes for you instead of blocking until data is available?

@ethan-gallant

ethan-gallant commented Feb 10, 2024

If I understand you correctly, you are saying that bufio.Reader is broken with slow writers? I believe that is out of scope for the current PR then. The PeekCompression function existed long before, and there are already several usages of it throughout the codebase.

I am actually not sure... Why would Peek return zero bytes for you instead of blocking until data is available?

It's part of a silly implementation in the bufio package. There has been a lot of maintainer drama about ErrNoProgress and when it should and should not be returned.

In the event that your reader source returns 0 bytes and no error (which is perfectly acceptable), bufio will throw ErrNoProgress. My use case is that our zstd library uses CGO bindings, as it's more efficient than the Go-native zstd implementations.

So before, it was acceptable to return 0 bytes and no error, but with this change it could potentially break even non-zstd builds.

Edit: here's the library we use: https://github.com/DataDog/zstd

@slonopotamus
Author

slonopotamus commented Feb 10, 2024

I am confused. Who is "we"? The current package uses klauspost/compress/zstd: https://github.com/google/go-containerregistry/blob/v0.19.0/internal/zstd/zstd.go#L24

Also, bufio only errors out if it gets a hundred zero-length reads in a row: https://github.com/golang/go/blob/go1.22.0/src/bufio/bufio.go#L42 Are you saying you know of an underlying reader library that does that? But why? Such a library doesn't follow the io.Reader guidelines and should probably be fixed:

Implementations of Read are discouraged from returning a zero byte count with a nil error, except when len(p) == 0. Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF.

We could certainly copy-paste bufio and do exactly what it does, just without the 100-empty-read limit, but I cannot understand under what conditions the situation you describe can currently trigger.

I've asked what the Go devs think about the current bufio behavior: golang/go#27531 (comment)

@slonopotamus
Author

Go devs say golang/go#27531 (comment):

A reader that returns 0, nil is valid, but a reader that returns it 100 times in a row probably has something wrong with it.

@asford

asford commented Apr 11, 2024

Friendly ping @imjasonh @jonjohnsonjr, is there anything we can do to get this reviewed for @slonopotamus?
This is a critical bit for enabling zstd compression in bazel rules_oci, which has now popped up a few times.

E.g. bazel-contrib/rules_oci#333 and bazel-contrib/rules_oci#523

cmd/crane/cmd/flatten.go (resolved)
}
}

// NewLayer creates a Layer from an io.ReadCloser.
-func NewLayer(rc io.ReadCloser, opts ...LayerOption) *Layer {
+func NewLayer(rc io.ReadCloser, opts ...LayerOption) (*Layer, error) {
Collaborator

I'd like to avoid this breaking API change if we can.

We could add an err field to the Layer struct and return it on the first access to any of its methods. (I don't love it, but I also don't want to break stuff.)
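The deferred-error pattern suggested here could look roughly like the following. This is a hypothetical sketch, not the actual stream.Layer API; the constructor keeps its old error-free signature and stashes any construction failure, which surfaces on the first method call:

```go
package main

import (
	"errors"
	"fmt"
)

// Layer sketches the non-breaking variant: instead of changing the
// constructor to return (*Layer, error), the error is stored in the
// struct and reported by every accessor.
type Layer struct {
	mediaType string
	err       error // set at construction, checked by accessors
}

// NewLayer keeps the original error-free signature.
func NewLayer(mediaType string) *Layer {
	l := &Layer{mediaType: mediaType}
	if mediaType == "" {
		l.err = errors.New("layer: unknown media type")
	}
	return l
}

// MediaType returns the layer's media type, or the deferred
// construction error on first (and every) access.
func (l *Layer) MediaType() (string, error) {
	if l.err != nil {
		return "", l.err
	}
	return l.mediaType, nil
}

func main() {
	if _, err := NewLayer("").MediaType(); err != nil {
		fmt.Println("construction error surfaced later:", err)
	}
}
```

The trade-off raised below is real: every future method must remember to check l.err, and methods without an error result have no way to report it.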

@slonopotamus (Author)

This is so weird... :D When someone later adds a method to Layer, they will definitely forget about this hack and introduce a bug. And what do we do if Layer gets a method that doesn't return an error?

@slonopotamus (Author)

So, do you really want me to do this, and is this the only thing preventing this PR from being merged?

@slonopotamus (Author)

Ping?



This Pull Request is stale because it has been open for 90 days with
no activity. It will automatically close after 30 more days of
inactivity. Keep fresh with the 'lifecycle/frozen' label.

