From 353608a0d63d27120737ffbd38f8a0b1fd7f4246 Mon Sep 17 00:00:00 2001
From: "Todd V. Jonker"
Date: Tue, 11 Jun 2024 19:39:54 -0700
Subject: [PATCH] Define more clearly how fragments are combined. (#98)

Add a new outermost form, `document`, and new abstract fragment `ivm`.
Together, these let us define more precisely how `ion_1_*` works.

---------

Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Co-authored-by: Tyler Gregg
---
 conformance/README.md | 118 ++++++++++++++++++++++++++++--------------
 1 file changed, 80 insertions(+), 38 deletions(-)

diff --git a/conformance/README.md b/conformance/README.md
index 22e5a89..e94cd96 100644
--- a/conformance/README.md
+++ b/conformance/README.md
@@ -90,7 +90,7 @@ For example:
 ```


-## Combining fragments
+## Multiple fragments

 It was noted above that the `text` clause is appended to the declared IVM to
 produce an input document. This is an example of a _fragment_ clause, so named
@@ -172,66 +172,106 @@ ensuring that all input forms handle scenarios the same way.

 ### Combining Formats

-TODO: What does combining text + binary mean?
+The previous section implies that in general, a test case can be thought of as a
+tree where the interior nodes are fragments, and the leaves are expectations.
+Each expectation is tied to a single document formed by combining the fragments
+along the path from the root to the expectation.

-In theory, we can mix and match text, binary, and AST in a single
+In theory, we could mix and match text, binary, and AST fragments in a single
 document, since ultimately they all express the same thing in different ways.
-This means the test-runner cannot simply concatenate fragments, and it must
-either switch parsing modes mid-stream (theoretically sound since fragments must
-contain full top-level values), or transcribe the fragment into one format.
-But there are benefits to this approach:
+However, that forces fairly complicated transcoding into the test framework.
+To keep things reasonable, we constrain test trees such that `text` and `binary`
+fragments cannot coexist on the path to one expectation.
+AST fragments, however, can mix with text or binary fragments.
+This is the case in most situations, since the `ion_1_*` clauses inherently
+abstract over the bytes on an IVM.
+
+Before checking an expectation, the conformance framework constructs a document
+by combining the preceding sequence of fragments:
+ * If any fragment is `text`, any AST fragments are effectively converted to
+text before processing. Adjacent text fragments MUST be joined with whitespace.
+ * If any fragment is `binary`, any AST fragments are effectively encoded to
+binary before processing.
+
+We say "effectively" because the implementation is not _required_ to do such
+transcoding; that is not the behavior under test.
+It may be easier and/or faster to skip that, as long as the observable results
+are equivalent: the effect on the encoding context, the data produced, and any
+errors signaled.
+
+When all fragments are abstract (that is, there are no `text` or `binary`
+fragments on the path to an expectation), it is assumed that the test case is
+not intended to verify the behavior of the Ion parser/decoder, but rather the
+expansion process that happens after parsing completes.
+The implementation may verify the test accordingly, by transcoding to text
+and/or binary if necessary, or doing neither if it can handle the abstract
+syntax more directly.
+For example, the framework could surface a stream of "raw" low-level events
+common to both formats.
+
+> [!NOTE]
+> It would also be valuable if the DSL can be extended to intentionally focus on
+> the encoder by providing AST fragments and expecting certain bytes.
+> I can imagine that gets tricky given the variety of encoding options available.
+> Perhaps we could have DSL clauses that direct specific encoding choices, so that
+> we can expect specific byte sequences.

-* A test runner can conceivably transcribe every test document into both text
-  and binary.
-* The test suite would inherently exercise both parsers *and both encoders*.

-When the implementation does not work with ASTs, those fragments can be
-near-trivially be transcoded into text fragments.
+## Ion versions

-It would also be valuable if the DSL can be extended to intentionally focus on
-the encoder by providing AST fragments and expecting certain bytes.
-I can imagine that gets tricky given the variety of encoding options available.
-Perhaps we could have DSL clauses that direct specific encoding choices, so that
-we can expect specific byte sequences.
+The examples above illustrate the `ion_1_0` and `ion_1_1` entry points.
+These are derived forms, shorthand for the common extensions to a more primitive
+starting point: an empty document. Here's how that is expressed:
+```
+(document (produces))
+```


-## Ion versions
+The `document` clause is the true root of all tests.
+It starts a test case with an empty document, no IVM, no bytes of any kind.
+The body of the clause is either an expectation or some extensions with more
+data.
+The example above simply says that an empty Ion document produces no data.

-The examples above illustrate the `ion_1_0` and `ion_1_1` entry points.
-In addition, the `ion_1_x` form declares behavior common to _both_ 1.0 and 1.1.
+To make a more meaningful test, we must add some input to the document:

 ```
-(ion_1_x (text "1::true")
-         (signals "Invalid annotation"))
+(document (then (text "null.int") (denotes (null int))))
 ```

-To be more specific, `(ion_1_0 _form_ ...)` behaves like:
+Here, the input document is an eight-byte text document containing exactly the
+given characters, and the test expects the implementation to produce a null `int`.
+
+The `ion_1_*` clauses are shorthands for extending the empty document with Ion
+version markers.
+To be more specific, `(ion_1_0 _form_ ...)` is equivalent to:

 ```
-(each (text)            // Text input with no IVM
-      (text "$ion_1_0")
-      (binary "E00100EA")
-      (then _form_ ...))
+(document
+  (then (ivm 1 0) _form_ ...))
 ```

 `(ion_1_1 _form_ ...)` is equivalent to:

 ```
-(each (text "$ion_1_1")
-      (binary "E00101EA")
-      (then _form_ ...))
+(document
+  (then (ivm 1 1) _form_ ...))
 ```

-`(ion_1_x _form_ ...)` is equivalent to:
+Finally, `(ion_1_x _form_ ...)` is equivalent to:
+
 ```
-(each (text)
-      (text "$ion_1_0")
-      (binary "E00100EA")
-      (text "$ion_1_1")
-      (binary "E00101EA")
-      (then _form_ ...))
+(document
+  (then (ivm 1 0) _form_ ...)
+  (then (ivm 1 1) _form_ ...))
 ```

+This combination declares behavior common to _both_ 1.0 and 1.1.
+
+```
+(ion_1_x (text "1::true")
+         (signals "Invalid annotation"))
+```

 When this test suite is used by an implementation that only supports 1.0, it
 must ignore any `ion_1_1` clauses, and interpret `ion_1_x` the same as
@@ -561,7 +601,8 @@ type.

 These rules describe the overall shape of test cases:
 ```ebnf
-test ::= "(" "ion_1_0" name-string? fragment* continuation ")"
+test ::= "(" "document" name-string? continuation ")"
+       | "(" "ion_1_0" name-string? fragment* continuation ")"
        | "(" "ion_1_1" name-string? fragment* continuation ")"
        | "(" "ion_1_x" name-string? fragment* continuation ")"

@@ -569,6 +610,7 @@ name-string ::= string

 fragment ::= "(" "text" string* ")"
            | "(" "binary" bytes* ")"
+           | "(" "ivm" int int ")"
            | "(" "toplevel" ast* ")"
            | "(" "encoding" ast* ")"
            | "(" "mactab" ast* ")"