Define more clearly how fragments are combined. (#98)
Add a new outermost form, `document`, and new abstract fragment `ivm`.
Together, these let us define more precisely how `ion_1_*` works.

---------

Co-authored-by: Matthew Pope <[email protected]>
Co-authored-by: Tyler Gregg <[email protected]>
3 people authored Jun 12, 2024
1 parent 260ef99 commit 353608a
Showing 1 changed file with 80 additions and 38 deletions.
118 changes: 80 additions & 38 deletions conformance/README.md
@@ -90,7 +90,7 @@ For example:
```


## Combining fragments
## Multiple fragments

It was noted above that the `text` clause is appended to the declared IVM to
produce an input document. This is an example of a _fragment_ clause, so named
@@ -172,66 +172,106 @@ ensuring that all input forms handle scenarios the same way.

### Combining Formats

TODO: What does combining text + binary mean?
The previous section implies that in general, a test case can be thought of as a
tree where the interior nodes are fragments, and the leaves are expectations.
Each expectation is tied to a single document formed by combining the fragments
along the path from the root to the expectation.

In theory, we can mix and match text, binary, and AST in a single
In theory, we could mix and match text, binary, and AST fragments in a single
document, since ultimately they all express the same thing in different ways.
This means the test-runner cannot simply concatenate fragments, and it must
either switch parsing modes mid-stream (theoretically sound since fragments must
contain full top-level values), or transcribe the fragment into one format.
But there are benefits to this approach:
However, that forces fairly complicated transcoding into the test framework.
To keep things reasonable, we constrain test trees such that `text` and `binary`
fragments cannot coexist on the path to one expectation.
AST fragments, however, can mix with text or binary fragments.
This is the case in most situations, since the `ion_1_*` clauses inherently
abstract over the bytes on an IVM.

Before checking an expectation, the conformance framework constructs a document
by combining the preceding sequence of fragments:
* If any fragment is `text`, any AST fragments are effectively converted to
text before processing. Adjacent text fragments MUST be joined with whitespace.
* If any fragment is `binary`, any AST fragments are effectively encoded to
binary before processing.
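
The combination rule above can be sketched in Python. This is a minimal illustration, not part of the conformance framework: the `(kind, payload)` tuple representation and the names `document_format` and `join_text` are assumptions made for the example, and any fragment kind other than `text` or `binary` is treated as abstract.

```python
from typing import List, Tuple

# Hypothetical model: a fragment is (kind, payload), where kind is the clause
# name ("text", "binary", "ivm", "toplevel", ...). Not the framework's own type.
Fragment = Tuple[str, str]


def document_format(path: List[Fragment]) -> str:
    """Return the format the fragments on one path are effectively combined into."""
    kinds = {kind for kind, _ in path}
    if "text" in kinds and "binary" in kinds:
        # The constraint from the text: these cannot coexist on one path.
        raise ValueError("text and binary fragments cannot share a path")
    if "text" in kinds:
        return "text"      # abstract fragments are effectively converted to text
    if "binary" in kinds:
        return "binary"    # abstract fragments are effectively encoded to binary
    return "abstract"      # no parser/decoder behavior is under test


def join_text(path: List[Fragment]) -> str:
    """Join the text fragments on a path with whitespace (a newline here).

    Abstract fragments are skipped; converting them to text is out of scope
    for this sketch.
    """
    return "\n".join(payload for kind, payload in path if kind == "text")
```

A runner built along these lines could reject an ill-formed test tree before evaluating any expectation, since the check depends only on the fragment kinds along each path.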

We say "effectively" because the implementation is not _required_ to do such
transcoding; that is not the behavior under test.
It may be easier and/or faster to skip that, as long as the observable results
are equivalent: the effect on the encoding context, the data produced, and any
errors signaled.

When all fragments are abstract (that is, there are no `text` or `binary`
fragments on the path to an expectation), it is assumed that the test case is
not intended to verify the behavior of the Ion parser/decoder, but rather the
expansion process that happens after parsing completes.
The implementation may verify the test accordingly, by transcoding to text
and/or binary if necessary, or doing neither if it can handle the abstract
syntax more directly.
For example, the framework could surface a stream of "raw" low-level events
common to both formats.

> [!NOTE]
> It would also be valuable if the DSL can be extended to intentionally focus on
> the encoder by providing AST fragments and expecting certain bytes.
> I can imagine that gets tricky given the variety of encoding options available.
> Perhaps we could have DSL clauses that direct specific encoding choices, so that
> we can expect specific byte sequences.
* A test runner can conceivably transcribe every test document into both text
and binary.
* The test suite would inherently exercise both parsers *and both encoders*.

When the implementation does not work with ASTs, those fragments can
near-trivially be transcoded into text fragments.
## Ion versions

It would also be valuable if the DSL can be extended to intentionally focus on
the encoder by providing AST fragments and expecting certain bytes.
I can imagine that gets tricky given the variety of encoding options available.
Perhaps we could have DSL clauses that direct specific encoding choices, so that
we can expect specific byte sequences.
The examples above illustrate the `ion_1_0` and `ion_1_1` entry points.
These are derived forms, shorthand for the common extensions to a more primitive
starting point: an empty document. Here's how that is expressed:

```
(document (produces))
```

## Ion versions
The `document` clause is the true root of all tests.
It starts a test case with an empty document, no IVM, no bytes of any kind.
The body of the clause is either an expectation or some extensions with more
data.
The example above simply says that an empty Ion document produces no data.

The examples above illustrate the `ion_1_0` and `ion_1_1` entry points.
In addition, the `ion_1_x` form declares behavior common to _both_ 1.0 and 1.1.
To make a more meaningful test, we must add some input to the document:

```
(ion_1_x (text "1::true")
(signals "Invalid annotation"))
(document (then (text "null.int") (denotes (null int))))
```

To be more specific, `(ion_1_0 _form_ ...)` behaves like:
Here, the input document is an eight-byte text document containing exactly the
given characters, and the test expects the implementation to produce a null `int`.

The `ion_1_*` clauses are shorthands for extending the empty document with Ion
version markers.
To be more specific, `(ion_1_0 _form_ ...)` is equivalent to:

```
(each (text) // Text input with no IVM
(text "$ion_1_0")
(binary "E00100EA")
(then _form_ ...))
(document
(then (ivm 1 0) _form_ ...))
```

`(ion_1_1 _form_ ...)` is equivalent to:

```
(each (text "$ion_1_1")
(binary "E00101EA")
(then _form_ ...))
(document
(then (ivm 1 1) _form_ ...))
```

`(ion_1_x _form_ ...)` is equivalent to:
Finally, `(ion_1_x _form_ ...)` is equivalent to:

```
(each (text)
(text "$ion_1_0")
(binary "E00100EA")
(text "$ion_1_1")
(binary "E00101EA")
(then _form_ ...))
(document
(then (ivm 1 0) _form_ ...)
(then (ivm 1 1) _form_ ...))
```

This combination declares behavior common to _both_ 1.0 and 1.1.

```
(ion_1_x (text "1::true")
(signals "Invalid annotation"))
```
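
The three equivalences above amount to a small rewrite from the derived entry points to `document`. The sketch below shows one way a runner might perform it. It is an illustration under assumptions: the function name `desugar` is hypothetical, the clause body is handled as an opaque string rather than a parsed s-expression, and the optional name string from the grammar is ignored.

```python
# Versions each derived entry point expands to, per the equivalences above.
ENTRY_POINT_VERSIONS = {
    "ion_1_0": [(1, 0)],
    "ion_1_1": [(1, 1)],
    "ion_1_x": [(1, 0), (1, 1)],
}


def desugar(entry_point: str, body: str) -> str:
    """Rewrite (ion_1_* body...) into the equivalent (document ...) form."""
    if entry_point not in ENTRY_POINT_VERSIONS:
        raise ValueError(f"unknown entry point: {entry_point}")
    branches = [
        f"(then (ivm {major} {minor}) {body})"
        for major, minor in ENTRY_POINT_VERSIONS[entry_point]
    ]
    return "(document " + " ".join(branches) + ")"


print(desugar("ion_1_x", '(text "1::true") (signals "Invalid annotation")'))
```

Keeping the version list in one table means `ion_1_x` needs no special case: it is the same rewrite applied once per version.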

When this test suite is used by an implementation that only supports 1.0,
it must ignore any `ion_1_1` clauses, and interpret `ion_1_x` the same as
@@ -561,14 +601,16 @@ type.
These rules describe the overall shape of test cases:

```ebnf
test ::= "(" "ion_1_0" name-string? fragment* continuation ")"
test ::= "(" "document" name-string? continuation ")"
| "(" "ion_1_0" name-string? fragment* continuation ")"
| "(" "ion_1_1" name-string? fragment* continuation ")"
| "(" "ion_1_x" name-string? fragment* continuation ")"
name-string ::= string
fragment ::= "(" "text" string* ")"
| "(" "binary" bytes* ")"
| "(" "ivm" int int ")"
| "(" "toplevel" ast* ")"
| "(" "encoding" ast* ")"
| "(" "mactab" ast* ")"