FIP-13: Canonical serialization for hashing messages #87
Replies: 7 comments 3 replies
-
I really like this proposal, it greatly simplifies the work that needs to be done here. The only downside I'm slightly concerned about is the data bloat attack vector. It might be worth creating a quick model to estimate what the data store would be at 10M users if we extrapolated our current message patterns on the Hub, and what it would look like if all 10M users were attackers maxing out byte size on the network.
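A back-of-envelope version of that model might look like the following; every constant here is an illustrative assumption (not a measured Hub number), just to show the shape of the calculation:

```typescript
// Rough storage model. All constants are illustrative assumptions,
// NOT measured Hub values.
const USERS = 10_000_000;
const MSGS_PER_USER = 5_000;   // assumed per-user message allowance
const AVG_MSG_BYTES = 300;     // assumed typical serialized message size
const MAX_MSG_BYTES = 2_048;   // assumed per-message byte cap

// Storage if current message patterns are extrapolated to 10M users.
const typicalGB = (USERS * MSGS_PER_USER * AVG_MSG_BYTES) / 1e9;
// Storage if all 10M users max out byte size (the data bloat attack).
const worstCaseGB = (USERS * MSGS_PER_USER * MAX_MSG_BYTES) / 1e9;

console.log(`typical: ~${typicalGB} GB, worst case: ~${worstCaseGB} GB`);
```

With these placeholder numbers the adversarial case is larger by exactly the ratio of max to average message size, which is the multiplier the model would pin down with real Hub data.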
-
@sanjayprabhu thanks for turning this into a draft. A few comments:
-
After attempting to implement this, one major issue came up with this approach: we always need to keep the original bytestream and ship it around when requested (e.g. for sync). If we ever re-encode the message, the hash is not guaranteed to be the same and the message will not validate. This introduces a lot of additional complexity in the hub and invalidates the original rationale for this approach (simplicity).

I went back and evaluated the different options, and unfortunately, there's just no easy answer. Everything has a tradeoff. I'll summarize them below:

1. Store a copy of the original bytestream along with the message

We could modify the Message schema to be:

This maintains the simplicity of the approach, and is relatively simple on the client side, at the cost of doubling the storage cost of the hubs (and on the wire). Does not seem like a good tradeoff, especially as we want to be mindful of storage requirements for hubs.

2. Transparently serialize/deserialize the original bytestream within the hubs

In this approach, there would be no change to the Message protobuf, but the typescript

When deserializing a message, we'd decode it as normal, and also attach the original bytestream to

3. Patch ts-proto/switch to a different protobuf library

Neither of these actually solves the problem, because protobuf is not designed to offer deterministic serialization, so the issue would just show up again in a different form. protobuf v3 actually has an option for deterministic serialization, however it's severely limited:

Note the last sentence.

4. Define our own serialization format (just for hashing)

In this approach, we'd still use protobufs, but define a custom approach to serializing the data for the purposes of computing the hash. There are significant downsides:

Within this, there are two ways we could go:

As a side note, I discovered Cosmos is essentially doing option 1. above: https://docs.cosmos.network/main/architecture/adr-020-protobuf-transaction-encoding
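For concreteness, option 2 above might be sketched roughly like this; the wrapper type, field names, and helper signatures are hypothetical, not actual hub code:

```typescript
// Hypothetical sketch of option 2: carry the original bytestream alongside
// the decoded message, and never re-encode when the original is available.
interface DecodedMessage<T> {
  message: T;                // the normally-decoded protobuf object
  originalBytes: Uint8Array; // the exact bytes received on the wire
}

function decodeKeepingBytes<T>(
  bytes: Uint8Array,
  decode: (b: Uint8Array) => T,
): DecodedMessage<T> {
  return { message: decode(bytes), originalBytes: bytes };
}

function encodeForWire<T>(
  msg: DecodedMessage<T>,
  encode: (m: T) => Uint8Array,
): Uint8Array {
  // Prefer the original bytes: re-encoding may produce a different
  // bytestream, which would break the hash and signature.
  return msg.originalBytes ?? encode(msg.message);
}
```

The complexity this hides is exactly the issue described above: every code path that touches a message must remember to thread the original bytes through, or validation silently breaks.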
-
@sanjayprabhu removing the FIP number, since we're only supposed to assign these when in the review stage
-
After evaluating the tradeoffs, 4. b) Define a convention for how to deterministically serialize protobufs is likely the best way forward. We would only need to define a convention for serialization; we don't need to worry about deserialization. The following rules from the Cosmos ADR-27 offer a good starting point.

However, there are no publicly available implementations. We'll need our own implementation in typescript, and likely a reference implementation in one other language to validate that everything works end-to-end across languages.
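To show the flavor of such a convention (this is not the actual ADR-27 rule set), here is a minimal sketch of a deterministic encoder covering varint scalar fields only; the function names and the `Map`-based field representation are invented for illustration:

```typescript
// Minimal canonical-encoding sketch (varint scalar fields only).
// Rules illustrated: ascending field-number order, default (zero) values
// omitted, minimal-length varints. Hypothetical helper, not hub code.
function writeVarint(out: number[], value: number): void {
  // Minimal-length base-128 varint encoding.
  do {
    let byte = value & 0x7f;
    value = Math.floor(value / 128);
    if (value > 0) byte |= 0x80;
    out.push(byte);
  } while (value > 0);
}

function canonicalEncode(fields: Map<number, number>): Uint8Array {
  const out: number[] = [];
  // Sort by field number so every implementation emits the same byte order.
  for (const fieldNo of [...fields.keys()].sort((a, b) => a - b)) {
    const value = fields.get(fieldNo)!;
    if (value === 0) continue; // skip default values, like proto3 writers do
    writeVarint(out, fieldNo << 3); // tag: field number, wire type 0 (varint)
    writeVarint(out, value);
  }
  return Uint8Array.from(out);
}
```

A real convention would also have to pin down length-delimited fields, nested messages, maps, and repeated fields, which is where most of the cross-language validation effort would go.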
-
This issue only affects signature verification when submitting messages to hubs, is that correct?
-
What is the issue with @farcasterxyz maintaining a ts-proto fork? I've seen teams like @solana-labs maintain their own fork of projects such as
-
Title: Canonical serialization for hashing messages
Type: Standards
Author: Sanjay Raveendran (@sanjayprabhu), Aditya (@adityapk00)
Abstract
We need either a canonical way to serialize messages, or a way to accept valid messages submitted by non-js libraries that serialize the data slightly differently. This FIP proposes that hubs support a new field with the raw serialized bytes and use that to verify the signature, avoiding this issue.
Problem
Verifying a message signature currently involves deserializing and then re-serializing it. Because different protobuf libraries (particularly in non-js languages) serialize data slightly differently, the raw bytes may not match when the hubs re-serialize the data to verify the signature, so we end up rejecting messages that are actually valid.
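As a concrete illustration (with hand-written wire bytes, not the output of any particular library): both of the following are legal proto3 encodings of the same logical message with an int field set to 0, yet their bytes, and therefore their hashes, differ.

```typescript
// Two legal wire encodings of the same logical message { field1: 0 }.
// Bytes are hand-written for illustration only.
const omitted = new Uint8Array([]);                // writer omits the zero-valued field
const explicitZero = new Uint8Array([0x08, 0x00]); // writer emits tag (field 1, varint) + 0

// A decoder yields field1 = 0 in both cases, but the raw bytes differ,
// so a hash over the re-serialized form need not match the signed bytes.
const bytesMatch = omitted.length === explicitZero.length;
```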
Specification
Hubs will accept a new optional field called `data_bytes` in the `Message` object, which will be set if the serialized form of `data` is not consistent with ts-proto. This field is mutually exclusive with `data`. When `data_bytes` is set, the hub will deserialize this field and overwrite `data`. It will also use the raw `data_bytes` bytestream to validate the hash and the signature. This allows the hub to support other protobuf implementations with minimal changes and little to no storage overhead.

Rationale
There is a lot of context on this issue, so it's recommended to read it first. This excerpt summarizes the issue:
Alternatives considered

- Patch `ts-proto` to serialize empty ints "correctly"
- Switch to `protobuf-ts` or `protobuf-es`
Release
This is a backwards compatible change since it only adds a new, optional field.
Target release for this proposal is protocol version 2023.11.15
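A rough sketch of the acceptance logic described in the Specification; the types and helper signatures below are placeholders, not the actual hub implementation:

```typescript
// Hypothetical sketch: when data_bytes is set, it is the source of truth
// for hash/signature checks, and data is rebuilt from it.
interface MessageData { /* fields elided */ }

interface Message {
  data?: MessageData;     // decoded form (ts-proto serialization)
  dataBytes?: Uint8Array; // raw bytes from a non-ts-proto serializer
}

function bytesToVerify(
  message: Message,
  encodeData: (d: MessageData) => Uint8Array,
  decodeData: (b: Uint8Array) => MessageData,
): Uint8Array {
  if (message.dataBytes !== undefined) {
    // Overwrite data with the deserialized bytes; hash and verify
    // the signature over the raw bytestream.
    message.data = decodeData(message.dataBytes);
    return message.dataBytes;
  }
  // No data_bytes: the ts-proto serialization of data is canonical.
  return encodeData(message.data!);
}
```

Because js clients never set `data_bytes`, the common path is unchanged, which is what makes the change backwards compatible.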