Skip to content

Query Engine

Jens Alfke edited this page Dec 31, 2019 · 5 revisions

This is an internal design doc. You don't need to know any of this unless you work on LiteCore's query implementation.

LiteCore executes queries by translating their JSON form into SQL, compiling the SQL into a SQLite 'statement', and then evaluating that statement. Ta-da, that's it!

...except for the details. There are a number of complications:

  • Document properties are not stored in SQL columns; there's just one blob column called body, and the properties are Fleece-encoded inside that.
  • JSON documents can contain arrays; SQL has no notion of this, nor of the UNNEST and 'ANY'/EVERY features that N1QL uses to query arrays.
  • JSON (and N1QL) has a null value that is, confusingly, unlike SQL's null.

Query Lifecycle

A C4Query reference is a wrapper for a C++ SQLiteQuery instance. The first thing the SQLiteQuery does is instantiate a QueryParser and pass it the JSON query string.

The QueryParser translates the JSON into Fleece to make it easy to work with, then recursively descends into the node tree. As it processes each node, it writes SQL to an output stream. (This is much like the code-generation pass of a compiler, since the JSON format is already structured much like a parse tree.)

The SQLiteQuery then compiles the SQL and stores the compiled statement.

To evaluate the query, a SQLiteQueryEnumerator steps the statement, translates each output row into a Fleece array, and writes those into an outer array. The enumerator holds on to this encoded Fleece, and hands out an iterator to its rows.

Note: This has the unfortunate effect of buffering all the query results in memory at once, but it prevents the problem of leaving a SQLite statement running while changes are potentially being made (in between calls to c4queryenum_next()) by the same database connection -- this is explicitly disallowed and can cause garbage results.

Result columns of types other than string, number, or missing (SQL null) appear as SQLite blobs containing Fleece. The enumerator detects these blobs and handles them correctly.

Handling Document Properties

The QueryParser translates a document property reference into a call to the custom SQLite function fl_value. So for example the JSON query operation [".name.first"] would translate to fl_value(body, '.name.first'). The implementation of fl_body (in SQLiteFleeceFunctions.cc) does this:

  1. Interprets the first parameter, a blob, as a document body, finds the current revision, and gets a Fleece pointer to it.
  2. Uses Fleece's Path class to traverse the key-path given in the second parameter.
  3. Returns a SQLite value (sqlite3_value*) corresponding to the property value.

Data Types Inside Queries

SQLite only has five data types (NULL, integer, float, text, blob). But fortunately it provides an API to tag a sqlite3_value with an application-defined 8-bit "subtype".

  • Booleans are represented as the numbers 0 and 1 with the subtype kFleeceIntBoolean (0x68).
  • Arrays and Dictionaries are represented as SQLite blobs containing their Fleece encoding, with no subtype.
  • Blob data is represented as a SQLite blob with the subtype kPlainBlobSubtype (0x66).
  • A JSON null (as opposed to a SQL NULL, which as per N1QL we call MISSING) is represented as a zero-length SQLite blob with the subtype kFleeceNullSubtype (0x67).

The works OK inside a query except for some edge cases -- for example, the results of SQLite expressions don't have tags, so == results in an integer 0 or 1, not a (tagged) false or true.

Unfortunately the subtype tags are lost when values are returned from a SQLite query (probably because they were a later addition to the API.) So we're back to the regular five SQLite types. To work around this, all projected values (the expressions right after SELECT) are wrapped in a call to fl_result(). This function converts tagged bools, blobs and nulls into equivalent Fleece-encoded blobs. Then the exterior part of query handling, the SQLiteQueryEnumerator, handles all blob-typed values by Fleece-decoding them.

Querying Inside Arrays

Both UNNEST and the ANY/EVERY operators provide a sort of nested query on an array. LiteCore has two ways to translate this to SQL, depending on whether there is a LiteCore index on that array.

Un-indexed

If there's no index, LiteCore uses its fl_each SQL function (SQLiteFleeceEach.cc) This is a complicated thing called a SQLite Virtual Table. The primary use of virtual tables is to create SQLite tables that aren't implemented in the normal way (this is how FTS works), but they have a secondary use as table-valued functions, and that's how we use fl_each. A table-valued function can appear in a JOIN clause as though it were a table, and can present rows of data that are specific to each row the query is processing. So what fl_each does is make a Fleece array look like a SQL table, like a KeyStore table in fact. Each array element appears as a row with its own body column containing the Fleece value.

  • An UNNEST clause is translated to a JOIN on an fl_each call whose parameters are (like fl_value) the document body and the path to the array.
  • An ANY/EVERY expression is translated into a nested SELECT statement whose FROM is a similar fl_each call, with a test on its row count.

Note: This is super clever, but it's not original with us. It, and also our approach to querying doc properties, were inspired by SQLite's own JSON extension.

Indexed

If the array is indexed, the fl_each call is replaced by the name of the index's table, which has the same schema as fl_each's virtual table. There's an ON condition that matches the current document's rowid to the index table's docid.