serialize compression state #2
Agreed. I'll have to research this more. While looking at golang's
@vbatts So, compress/gzip produces different compressed output than GNU gzip at the same compression level? Do you know whether this is a result of different gzip headers, or of the two libraries' deflate functions actually producing different results? If the latter, then either one of the implementations deviates from RFC 1951 or the RFC doesn't guarantee reproducibility.

As far as serialization goes in gzran, the main thing to look at is restoring the step field of the point struct. Not understanding the use of this field in the decompressor is what tripped me up for a while during the initial implementation. Otherwise, you could pass the Index (with CRC and gzip header) straight to something like Gob.

For the purpose of reproducing a bit-for-bit identical gzip file from the uncompressed data, saving the index isn't strictly necessary, but it might be useful for other reasons. For reproducibility you should only need:

The last point is the one that needs a little research.
So, I'll do some RFC reading tomorrow. Here was my initial investigation: https://gist.github.com/vbatts/43fc209acf37ff21dd87
Also, RFC 1951 is for DEFLATE, which is very similar to gzip. RFC 1952 is for gzip, and it is what Go's compress/gzip implements.
Ah, this may be a problem. Gzip in Go is a header and checksum wrapped around DEFLATE (http://golang.org/pkg/compress/flate/), which is also used by zlib. I assumed GNU gzip used DEFLATE as well, but GNU gzip uses LZ77. The two packages don't seem compatible for reproducibility: DEFLATE is based on LZ77 but is not the same. Either we need LZ77 in Go, or the initial ACI must be compressed with something implementing DEFLATE.
Then we should review http://golang.org/pkg/compress/lzw/ as well.
Didn't even see that! Though it appears LZW is not the same as LZ77, which in turn is not the same as LZMA, LZSS, LZ78, etc.
Correct.
I'm confused about what compression method gzip uses. Here it seems to use DEFLATE: http://www.gzip.org/algorithm.txt |
Also worth putting on the review radar: https://github.com/pierrec/lz4 (spec: http://fastcompression.blogspot.fr/2013/04/lz4-streaming-format-final.html)
It would be great to be able to serialize the compression state/headers so that someone could take this serialized state plus an uncompressed item and reproduce an identical asset.
For example, in the case of rocket, we want to be able to extract a tar.gz and put it on disk. Then, at some later date, we want to take those files on disk and exactly recreate the tar.gz so we can validate its signature.