[3.0] Encrypt headers #151

bchess · 2024-06-27T20:53:21Z

Tensorizer v3 should extend encryption to cover header information. This needs to be accounted for in the file metadata and in the patterns we use to buffer and write the metadata.
Important things to consider:

Where do we put encryption parameters in the header?
- My take: Indicate in the file flags whether the headers are encrypted and, if so, place the encryption info immediately before the metadata section. It likely shouldn't be bundled into the metadata section, so that the length of the metadata section can also be encrypted. File flags and other meta-metadata, like hashes of the metadata section, can be encrypted too.
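
A minimal sketch of the layout proposed above, for discussion only. The flag bit, the field widths, and the names `FLAG_HEADERS_ENCRYPTED`, `pack_preamble`, and `unpack_preamble` are all hypothetical and are not part of the current tensorizer format. The metadata section is treated as an opaque blob, so its length can live inside the ciphertext.

```python
import struct

# Hypothetical flag bit and field widths, chosen only for illustration.
FLAG_HEADERS_ENCRYPTED = 1 << 0
FLAGS_FMT = "<I"  # 4-byte little-endian file flags
LEN_FMT = "<H"    # 2-byte length prefix for the encryption info block


def pack_preamble(crypt_info, metadata_section):
    """Write flags, then (only if encrypted) the encryption info, then metadata."""
    flags = FLAG_HEADERS_ENCRYPTED if crypt_info is not None else 0
    out = struct.pack(FLAGS_FMT, flags)
    if crypt_info is not None:
        # The encryption info sits immediately before the metadata section,
        # outside of it, so the metadata length itself can be encrypted.
        out += struct.pack(LEN_FMT, len(crypt_info)) + crypt_info
    return out + metadata_section


def unpack_preamble(buf):
    """Return (flags, crypt_info or None, remaining metadata bytes)."""
    (flags,) = struct.unpack_from(FLAGS_FMT, buf, 0)
    offset = struct.calcsize(FLAGS_FMT)
    crypt_info = None
    if flags & FLAG_HEADERS_ENCRYPTED:
        (info_len,) = struct.unpack_from(LEN_FMT, buf, offset)
        offset += struct.calcsize(LEN_FMT)
        crypt_info = buf[offset:offset + info_len]
        offset += info_len
    return flags, crypt_info, buf[offset:]
```

A reader only has to inspect the flags to know whether an encryption info block follows; unencrypted files pay no extra bytes.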
How does incremental writing interact with metadata encryption?
- How many nonces and MACs are used when encrypting and decrypting the metadata section?
- If one is used, then the metadata section can only be updated by rewriting the entire thing, because data can only be appended to the end of an encrypted stream, not inserted into the middle (as it would need to be for the non-tensor-header part of the metadata section).
  - This might be cheap enough that we could do it anyway, though it feels like it could be limiting in the future.
- If two are used (one for metadata entries, one for tensor headers), the metadata section could actually be written as two independent streams, only rewriting their MACs on each synchronization.
- If multiple are used, then we need to handle size limits for the nonce & MAC list like the size limits of the rest of the metadata section. (Luckily, this fits pretty well, since the size taken up by nonces and MACs is directly proportional to the number of entries being described when using one encrypted segment per metadata entry.)
- Ideally, information like the choice of encryption method would be stored in a tagged and extensible format like tensor CryptInfo segments, which would allow changing this in order to make it more or less rigid in the future.
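
To make the space trade-off between the three options concrete, here is a rough sketch of the nonce and MAC overhead for each. The 24-byte nonce and 16-byte MAC are assumptions (typical of an XSalsa20-Poly1305-style scheme), and the scheme names and helper functions are hypothetical.

```python
# Assumed sizes; the real values depend on the encryption method chosen.
NONCE_BYTES = 24
MAC_BYTES = 16


def crypt_overhead(num_segments):
    """Bytes of nonces and MACs needed for a number of encrypted segments."""
    return num_segments * (NONCE_BYTES + MAC_BYTES)


def overhead_for_scheme(scheme, num_entries):
    """Overhead for each option discussed above.

    "single":    one nonce/MAC for the whole metadata section
    "split":     one for metadata entries, one for tensor headers
    "per_entry": one encrypted segment per metadata entry
    """
    segments = {"single": 1, "split": 2, "per_entry": num_entries}[scheme]
    return crypt_overhead(segments)


for scheme in ("single", "split", "per_entry"):
    print(scheme, overhead_for_scheme(scheme, num_entries=1000))
```

Only the per-entry option grows with the number of entries, and it grows linearly, so it can share whatever size-limit handling the rest of the metadata section uses.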
To what extent do we attempt to protect information about model "size"?
- The filesize gives away most information about that regardless of what we encrypt, but should we set out to protect information about how many tensors are contained in a file?
- The number of tensors could potentially be leaked by information like the length of the metadata section, or the spacing of padding between tensor data entries. The length of the metadata section can be determined from the padding at the end of the metadata section unless we scramble the padding bytes during encryption.
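
The padding leak described above can be illustrated with a small sketch. It uses a stand-in byte string in place of real ciphertext, and the helper names and the 4096-byte region size are hypothetical.

```python
import os


def pad_to(blob, size, scramble):
    """Pad blob up to size bytes, with zeros or with random bytes."""
    pad_len = size - len(blob)
    if pad_len < 0:
        raise ValueError("blob is larger than the padded region")
    padding = os.urandom(pad_len) if scramble else bytes(pad_len)
    return blob + padding


def guess_length(region):
    """What an observer can infer: strip trailing zero bytes and measure."""
    return len(region.rstrip(b"\x00"))


# Stand-in for an encrypted metadata section (every byte is nonzero here,
# so the example is deterministic; real ciphertext would look random).
ciphertext = bytes(1 + (i * 7) % 255 for i in range(300))

zero_padded = pad_to(ciphertext, 4096, scramble=False)
scrambled = pad_to(ciphertext, 4096, scramble=True)

print(guess_length(zero_padded))  # recovers the true length
print(guess_length(scrambled))    # no longer reveals where the data ends
```

With zero padding the observer recovers the section length exactly; with scrambled padding the boundary between ciphertext and padding is not visible from the bytes alone.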


bchess added this to the 3.0 milestone Jun 27, 2024

bchess added the schema-change label (for 3.0-dev work that involves schema changes) Jun 27, 2024

Eta0 self-assigned this Jul 3, 2024
