[3.0] Encrypt headers #151

Open
bchess opened this issue Jun 27, 2024 · 0 comments
bchess commented Jun 27, 2024

from @Eta0 in #127 (review)

Tensorizer v3 should extend encryption to include header information. This needs to be accounted for in the file metadata and the patterns we use to buffer and write the metadata.
Important things to consider:

  1. Where do we put encryption parameters in the header?
    • My take: Indicate in the file flags if the headers are encrypted, and then place encryption info immediately before the metadata section if so. It likely shouldn't be bundled into the metadata section, so that the length of the metadata section can also be encrypted. File flags and other meta-metadata like hashes of the metadata section can be encrypted too.
  2. How does incremental writing interact with metadata encryption?
    • How many nonces and MACs are used when encrypting and decrypting the metadata section?
    • If one is used, then the metadata section can only be updated by rewriting the entire thing, because new data (for the non-tensor-header part of the metadata section) can't be inserted into the middle of an encrypted stream, only appended at the end.
      • This might be cheap enough that we could do it anyway, though it feels like it could be limiting in the future.
    • If two are used (one for metadata entries, one for tensor headers), the metadata section could actually be written as two independent streams, only rewriting their MACs on each synchronization.
    • If multiple are used, then we need to handle size limits for the nonce & MAC list the same way we handle size limits for the rest of the metadata section. (Luckily, this fits pretty well, since the size taken up by nonces and MACs is directly proportional to the number of entries being described when using one encrypted segment per metadata entry.)
    • Ideally, information like the choice of encryption method would be stored in a tagged and extensible format like tensor CryptInfo segments, which would allow changing this in order to make it more or less rigid in the future.
  3. To what extent do we attempt to protect information about model "size"?
    • The filesize gives away most information about that regardless of what we encrypt, but should we set out to protect information about how many tensors are contained in a file?
    • The number of tensors could potentially be leaked by information like the length of the metadata section, or the spacing of padding between tensor data entries. The length of the metadata section can be determined by the padding at the end of the metadata section unless we scramble the padding bytes during encryption.
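
To make points 1 and 2 a bit more concrete, here is a rough Python sketch of one possible layout: a plaintext flag marks the headers as encrypted, a list of tagged, length-prefixed encryption-info segments sits immediately before the (opaque, encrypted) metadata section, and a helper shows how nonce/MAC overhead scales with the number of encrypted streams. Everything here is hypothetical and for discussion only: `FLAG_HEADERS_ENCRYPTED`, `CryptSegment`, the tag values, and the field widths are not taken from the tensorizer format, and the 24-byte nonce and 16-byte MAC sizes assume an XSalsa20-Poly1305 style cipher. No actual encryption is performed; the metadata is treated as an already-encrypted blob.

```python
import struct
from dataclasses import dataclass

# Hypothetical constants; none of these come from the tensorizer format.
FLAG_HEADERS_ENCRYPTED = 0x01  # must stay in plaintext so readers can see it
NONCE_SIZE = 24  # assumed nonce size (XSalsa20-Poly1305 style)
MAC_SIZE = 16    # assumed MAC size (Poly1305 tag)


@dataclass
class CryptSegment:
    tag: int        # identifies the kind of info (cipher choice, nonce, MAC, ...)
    payload: bytes  # opaque to readers that do not recognize the tag


def pack_segments(segments):
    """Serialize as: u16 count, then (u16 tag, u32 length, payload) per segment."""
    parts = [struct.pack("<H", len(segments))]
    for seg in segments:
        parts.append(struct.pack("<HI", seg.tag, len(seg.payload)))
        parts.append(seg.payload)
    return b"".join(parts)


def unpack_segments(buf, offset=0):
    """Parse segments; return (segments, offset just past the last segment)."""
    (count,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    segments = []
    for _ in range(count):
        tag, length = struct.unpack_from("<HI", buf, offset)
        offset += 6
        payload = bytes(buf[offset:offset + length])
        if len(payload) != length:
            raise ValueError("truncated encryption-info segment")
        offset += length
        segments.append(CryptSegment(tag, payload))
    return segments, offset


def build_header(flags, segments, metadata_blob):
    """Lay out: u32 flags, then encryption info (only if flagged), then metadata."""
    header = struct.pack("<I", flags)
    if flags & FLAG_HEADERS_ENCRYPTED:
        header += pack_segments(segments)
    return header + metadata_blob


def parse_header(buf):
    """Return (flags, segments, remaining bytes holding the metadata section)."""
    (flags,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    segments = []
    if flags & FLAG_HEADERS_ENCRYPTED:
        segments, offset = unpack_segments(buf, offset)
    return flags, segments, bytes(buf[offset:])


def nonce_mac_overhead(num_streams):
    """Bytes spent on nonces and MACs for a given number of encrypted streams."""
    return num_streams * (NONCE_SIZE + MAC_SIZE)
```

Because each segment carries its own tag and length, a reader can skip segments it doesn't understand, which is the kind of extensibility the CryptInfo comparison is after. Under the assumed sizes, `nonce_mac_overhead` grows linearly: one stream for the whole section, two streams (metadata entries and tensor headers), or one per metadata entry all cost the same fixed amount per stream, so a per-entry scheme needs the same kind of size limits as the rest of the metadata section.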
@bchess bchess added this to the 3.0 milestone Jun 27, 2024
@bchess bchess added the schema-change label for 3.0-dev work that involves schema changes label Jun 27, 2024
@Eta0 Eta0 self-assigned this Jul 3, 2024