Signed Documents

Jens Alfke edited this page Feb 12, 2014 · 13 revisions

This is a spec for digitally signing a JSON document. It's not tied to Couchbase Lite (or the Couchbase Sync Gateway) though it was created for use with them. Nor do those products require signed documents.

Signing a document provides these benefits:

  • The enclosed public key can be used as an identifier of the entity that signed the document.
  • Any later modification of the document can be detected, as it'll invalidate the signature.

Thus a signature serves as a form of authentication. Why do we need this when the Sync Gateway already supports several types of authentication?

  • It authenticates a document, not a connection. This is a very important distinction when documents are replicated, especially when the replication can involve more than one server. A document may be forwarded by an entity that didn't create it, so the fact that the replicator connection is authenticated does not authenticate the document. The document has to carry its own credentials.
  • Asymmetric keys allow for many types of identity and authentication. In the simplest case, an entity can create a key-pair and use the public key as its sole identification; this is useful even though it doesn't tie that entity to any external form of ID. More complex systems can use a hierarchical public-key infrastructure like X.509 or a "web of trust" like PGP.

Data Formats

Algorithmic Blob

This is a data blob, tagged with the identity of the algorithm that produced it. It's encoded as an array of two strings:

  1. A short string identifying the algorithm or algorithm family, e.g. "SHA1" for the SHA-1 algorithm.
  2. The base64-encoded data blob.
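
As a rough illustration, an algorithmic blob can be built and taken apart as follows. The helper names are hypothetical, and the standard base64 alphabet (rather than the URL-safe one) is an assumption, since the spec only says "base64".

```python
import base64
import hashlib

def make_algorithmic_blob(algorithm: str, data: bytes) -> list:
    """Return [algorithm tag, base64 text] for the given raw bytes."""
    # Assumption: standard base64 alphabet with padding.
    return [algorithm, base64.b64encode(data).decode("ascii")]

def parse_algorithmic_blob(blob: list) -> tuple:
    """Split an algorithmic blob back into (algorithm tag, raw bytes)."""
    algorithm, encoded = blob
    return algorithm, base64.b64decode(encoded)

# Example: a SHA-1 digest tagged with the algorithm that produced it.
digest = hashlib.sha1(b"hello").digest()
blob = make_algorithmic_blob("SHA1", digest)
```

Serialized as JSON, `blob` is a two-element array of strings, which is the shape described above.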

Signature Object

This is a JSON object that acts as a digital signature of some other JSON object (without specifying where that other object is).

A signature object has at least the following properties:

  • digest: A cryptographic digest of the canonical encoding of the object being signed, encoded as an algorithmic blob.
  • key: The public key of the key-pair performing the signing, encoded as an algorithmic blob.
  • sig: The digital signature of the canonical encoding of the signature object minus this field. This is just a base64-encoded blob; the algorithm is implicitly the same as the one used for the key property.

Optional properties include:

  • date: A JSON-format timestamp identifying when the signature was generated.
  • expires: The number of seconds the signature remains valid after being generated.
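
The following sketch shows one way the required properties could be assembled. It is not a reference implementation: `sign` is a hypothetical callable standing in for whatever private-key operation the caller's crypto library provides, `key_algorithm` is a caller-chosen tag, and `simple_canonical` is a simplified stand-in for the canonical encoding described later on this page.

```python
import base64
import hashlib
import json
from typing import Callable

def simple_canonical(obj) -> bytes:
    # Simplified stand-in for the canonical encoding:
    # sorted keys, no whitespace, UTF-8 output.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")

def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")

def build_signature_object(target: dict, public_key: bytes,
                           key_algorithm: str,
                           sign: Callable[[bytes], bytes]) -> dict:
    """Build a signature object for `target`.

    `sign` is a hypothetical callable that signs raw bytes with the
    private key matching `public_key`.
    """
    signature = {
        # Digest of the canonical encoding of the object being signed.
        "digest": ["SHA1", b64(hashlib.sha1(simple_canonical(target)).digest())],
        # Public key, tagged with its algorithm.
        "key": [key_algorithm, b64(public_key)],
    }
    # "sig" covers the canonical encoding of the signature object
    # *without* the "sig" field, so it is computed last.
    signature["sig"] = b64(sign(simple_canonical(signature)))
    return signature
```

Because `sig` covers everything else in the signature object, optional properties such as `date` or `expires` would have to be added before `sig` is computed.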

Signed Object

This is simply a JSON object that directly contains its signature as the value of a property named "(signed)". This property must be ignored while computing the canonical digest of the object; otherwise the digest would depend on the signature that contains it.
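
A minimal sketch of attaching and stripping the signature property, assuming the property name is literally "(signed)" as written above. The function names are hypothetical.

```python
SIGNATURE_PROPERTY = "(signed)"  # assumed literal property name

def without_signature(signed_obj: dict) -> dict:
    """Return a shallow copy of the object with the signature property removed.

    This is the object whose canonical digest should be computed or checked.
    """
    return {k: v for k, v in signed_obj.items() if k != SIGNATURE_PROPERTY}

def attach_signature(obj: dict, signature: dict) -> dict:
    """Return a copy of `obj` that carries `signature` as its signature property."""
    signed = without_signature(obj)
    signed[SIGNATURE_PROPERTY] = signature
    return signed
```

A verifier would call `without_signature` on the received object and compare the digest of the result against the `digest` stored in the signature object.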

Algorithms

Generating Canonical Digests

Digest algorithms like SHA-1 operate on raw binary data, not abstract objects like JSON. There are many different ways to encode the same JSON object as data, which will all result in different digests. So for the signer and verifier to agree on the same digest of an object, we have to define a canonical encoding algorithm that always maps equivalent objects to identical data.
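
To make the problem concrete, this small example encodes the same object three ways with Python's standard `json` module and digests each encoding. The sample object is made up for illustration.

```python
import hashlib
import json

obj = {"name": "Alice", "age": 30}

# Three valid JSON encodings of the same object.
compact = json.dumps(obj, separators=(",", ":")).encode("utf-8")
spaced = json.dumps(obj, indent=2).encode("utf-8")
reordered = json.dumps({"age": 30, "name": "Alice"},
                       separators=(",", ":")).encode("utf-8")

# All three decode to equal objects...
same_object = json.loads(compact) == json.loads(spaced) == json.loads(reordered)

# ...but their bytes differ, and so do their SHA-1 digests.
digests = {hashlib.sha1(data).hexdigest() for data in (compact, spaced, reordered)}
```

Here `same_object` is true while `digests` holds three distinct values, which is why signer and verifier need to agree on a single encoding.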

There is no standard for canonical JSON encoding yet, but the OLPC group has documented one that's pretty reasonable:

  • No whitespace.
  • Numbers must be representable as 48-bit integers (i.e. in the range [-2^47 .. 2^47-1]).
  • Numbers cannot have decimal points nor scientific notation nor leading zeros. "-0" is not allowed.
  • Strings (including keys) are converted to Unicode Normalization Form C.
  • No escape sequences in strings, other than \" and \\. All other characters must be represented literally, including control characters.
  • Object keys are lexicographically sorted by Unicode character values (code points). The sorting occurs before escape sequences are added.
  • The entire output is encoded in UTF-8.

Note: Non-integers are forbidden because different formatting libraries will convert them to textual form in different ways.

Note: Integers are restricted to 48-bit, not 64-bit, because many JSON parsers convert numbers to double-precision floating point, which is a 64-bit value but has only 53 bits of mantissa precision, so larger integers may not survive a round trip.
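
The effect can be checked directly in Python, whose `float` is an IEEE 754 double. A double represents every integer up to 2^53 exactly, so the 48-bit range is comfortably safe, while larger values start to lose precision. The names below are illustrative only.

```python
LIMIT = 2 ** 47

def fits_canonical_range(n: int) -> bool:
    """True if n lies in the allowed range [-2^47 .. 2^47-1]."""
    return -LIMIT <= n <= LIMIT - 1

# The edges of the 48-bit range survive a round trip through a double.
edges_ok = (int(float(LIMIT - 1)) == LIMIT - 1
            and int(float(-LIMIT)) == -LIMIT)

# Just past 2^53 the round trip no longer returns the same integer.
big = 2 ** 53 + 1
lossy = int(float(big)) != big
```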

Note: The linked OLPC spec says "string are uninterpreted bytes" and "arbitrary [binary] content may be represented as a string". This is untrue: the JSON specification states that "a string is a sequence of zero or more Unicode characters". The encoding of a string must therefore be valid UTF-8 data.
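
The rules above can be sketched as a small encoder. This is an illustrative reading of the list, not a reference implementation: the function names are hypothetical, keys are assumed to be normalized before sorting, and non-integer numbers are simply rejected.

```python
import unicodedata

INT_MIN, INT_MAX = -(2 ** 47), 2 ** 47 - 1

def _encode_string(s: str) -> str:
    s = unicodedata.normalize("NFC", s)
    # Only backslash and double quote are escaped; all other
    # characters, including control characters, stay literal.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def _encode(value) -> str:
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, int):
        if not INT_MIN <= value <= INT_MAX:
            raise ValueError("integer outside the 48-bit range")
        return str(value)  # no decimal point, exponent or leading zeros
    if isinstance(value, str):
        return _encode_string(value)
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(_encode(v) for v in value) + "]"
    if isinstance(value, dict):
        if not all(isinstance(k, str) for k in value):
            raise TypeError("object keys must be strings")
        # Assumption: normalize keys first, then sort the unescaped
        # keys by code point (Python compares str by code point).
        items = sorted(
            ((unicodedata.normalize("NFC", k), v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return "{" + ",".join(
            _encode_string(k) + ":" + _encode(v) for k, v in items
        ) + "}"
    # Floats and any other types are not representable.
    raise TypeError(f"cannot canonically encode {type(value).__name__}")

def canonical_json(value) -> bytes:
    """Encode `value` as canonical JSON bytes (UTF-8, no whitespace)."""
    # Encoding fails on lone surrogates, so the output is always valid UTF-8.
    return _encode(value).encode("utf-8")
```

A canonical digest would then be the chosen hash of these bytes, for example `hashlib.sha1(canonical_json(obj)).digest()`.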