This repository has been archived by the owner on Sep 22, 2020. It is now read-only.

serialize compression state #2

Open
philips opened this issue Mar 11, 2015 · 10 comments

Comments

@philips
Contributor

philips commented Mar 11, 2015

It would be great to be able to serialize out the compression state/headers so that someone could take this serialized state, an uncompressed item and reproduce an identical asset.

For example, in the case of rocket we want to be able to extract a tar.gz and put it on disk. Then at some later date we want to take those files on disk and exactly recreate the tar.gz so we can validate a signature against it.

@vbatts

vbatts commented Mar 11, 2015

agreed.

I'll have to research more on this. While looking at golang's compress/gzip, I noticed that the IEEE table it uses and the gnu gzip crc32 tables include some of the same polynomials, but the two tools produce different output (at the same compression levels). This would be great to achieve.

@peebs
Contributor

peebs commented Mar 11, 2015

@vbatts So compress/gzip produces different compressed output than gnu gzip at the same compression level? Do you know if this is a result of different gzip headers, or do the deflate functions in the two libraries actually produce different results? If it's the latter, that means either one of the implementations deviates from rfc1951 or the rfc doesn't guarantee reproducibility.

As far as serialization goes in gzran, the main thing to look at is restoring the step field of the point struct. Not understanding how the decompressor uses this field is what tripped me up for a while during the initial implementation. Otherwise, you could pass the Index (with crc and gzip header) straight to something like Gob.

For the purpose of reproducing a bit-for-bit identical gzip file from the uncompressed data, saving the index isn't strictly necessary, but might be useful for other reasons. For reproducibility you should only need to:
- save and restore the gzip headers
- ensure that the output of whatever deflate library first deflates the aci can be reproduced by Go's compress/flate

The last point is the one that needs a little research.

@vbatts

vbatts commented Mar 12, 2015

so, i'll do some rfc reading tomorrow. here was my initial investigation https://gist.github.com/vbatts/43fc209acf37ff21dd87

@vbatts

vbatts commented Mar 12, 2015

Also, RFC 1951 is for deflate, which is very similar to gzip. RFC 1952 is for gzip, and is what golang's implementation follows.

@peebs
Contributor

peebs commented Mar 12, 2015

Ah, this may be a problem. Gzip, in Go, is a header and checksum wrapped around DEFLATE (http://golang.org/pkg/compress/flate/) which is also used by zlib.

I assumed gnu gzip used DEFLATE as well, but gnu gzip uses LZ77. The two packages don't seem compatible for reproducibility. DEFLATE is based on LZ77 but is not the same thing.

Either we need LZ77 in Go or the initial ACI must be compressed with something implementing DEFLATE.

@vbatts

vbatts commented Mar 12, 2015

then we should review http://golang.org/pkg/compress/lzw/ as well

@peebs
Contributor

peebs commented Mar 12, 2015

Didn't even see that! Though it appears LZW is not the same as LZ77, which is not the same as LZMA, LZSS, LZ78, etc.

@vbatts

vbatts commented Mar 12, 2015

correct.

@peebs
Contributor

peebs commented Mar 12, 2015

I'm confused about what compression method gzip uses. Here it seems to use DEFLATE: http://www.gzip.org/algorithm.txt

@vbatts

vbatts commented Mar 25, 2015
