A cross-implementation test driver for Amazon Ion readers and writers.
This tool is licensed under the Apache 2.0 License.
The entry point for the tool is amazon/iontest/ion_test_driver.py, which targets Python 3.5+. Running the script with the --help option will enumerate the available options.
The cross-implementation test harness compares the behavior of all Ion implementations in order to assert consensus.
This provides developers:
- An automated way of verifying that all implementations maintain consistent reading/writing behavior.
- A large amount of test coverage with minimal test-specific integration effort.
This provides users:
- A centralized dashboard summarizing the limitations of each implementation, which may be used when evaluating which implementation to use.
- Confidence that their Ion data is not coupled to a particular reader/writer implementation.
Each implementation should have its own set of unit tests to assert correctness of its behavior. There are three reasons for this:
- The cross-implementation test harness depends on each implementation, not vice-versa; intra-implementation unit tests are therefore required for rapid development of that implementation.
- Integration with the cross-implementation harness requires Ion readers and writers that can correctly perform their basic functions (in order to send and receive instructions through the CLI).
- The cross-implementation testing harness is not intended to verify correctness of each implementation; rather, its purpose is to verify behavioral consensus among all implementations under the assumption that at least one of them is correct.
In addition to certain hand-coded unit tests used to exercise targeted code paths, each implementation should have its own ion-tests harness that fully implements the good, bad, good/equivs, and good/non-equivs semantics. This will provide some duplicate coverage upon integration with the cross-implementation testing harness, but will enable a rapid development and testing cycle within that implementation.
It is advisable to implement this test harness such that it can read Ion
streams into EventStreams (described in the sections that follow),
compare in-memory EventStreams for equivalence, and write EventStreams
to Ion streams. This will simplify integration with the
cross-implementation testing harness.
ImportLocation: (import_name: string, location: int)
SymbolToken: (text: string, import_location: ImportLocation), where (null, null) represents symbol zero.
Note: Implementations that are fully spec-compliant will already provide a SymbolToken implementation that is compatible with the above definition.
An Event represents a single read or write event within an Ion stream. A stream of such Events is an EventStream, which can be interpreted as both Ion reader output and Ion writer input. When Ion stream A is read as EventStream E, and E is written as Ion stream B, A and B must be equivalent under the Ion data model.
When a field is not valid for a particular event, that field should be omitted. Missing fields should be treated equivalently to null fields (for any type of null).
EventType:
symbol[CONTAINER_START | CONTAINER_END | SCALAR | SYMBOL_TABLE | STREAM_END]
IonType:
symbol[NULL | BOOL | INT | FLOAT | DECIMAL | TIMESTAMP | SYMBOL | STRING | CLOB | BLOB | LIST | SEXP | STRUCT]
ImportDescriptor:
(import_name: string, max_id: int, version: int)
Event:
(event_type: EventType, ion_type: IonType, field_name: SymbolToken, annotations: list<SymbolToken>, value_text: string, value_binary: list<byte>, imports: list<ImportDescriptor>, depth: int)
EventStream: stream<Event>, initiated (after the IVM, if applicable) by the top-level symbol $ion_event_stream.
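As a concrete (and non-normative) illustration, these definitions map naturally onto Python structures like the following. The _record helper is a hypothetical convenience; defaulting every field to None mirrors the rule that omitted fields are treated as null.

```python
from collections import namedtuple

def _record(name, fields):
    # Hypothetical helper: a namedtuple whose fields all default to None,
    # mirroring the rule that omitted fields are treated as null.
    t = namedtuple(name, fields)
    t.__new__.__defaults__ = (None,) * len(fields)
    return t

ImportLocation = _record('ImportLocation', ['import_name', 'location'])
SymbolToken = _record('SymbolToken', ['text', 'import_location'])
ImportDescriptor = _record('ImportDescriptor', ['import_name', 'max_id', 'version'])
Event = _record('Event', ['event_type', 'ion_type', 'field_name', 'annotations',
                          'value_text', 'value_binary', 'imports', 'depth'])

SYMBOL_ZERO = SymbolToken(text=None, import_location=None)  # (null, null) is symbol zero
```

Using namedtuples here means field-by-field equality comparison comes for free.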
ErrorType: symbol[READ | WRITE | STATE]
ErrorDescription: (error_type: ErrorType, message: string, location: string, event_index: int), where location refers to the input file (for READ and certain STATE errors) or the output file (for WRITE and certain STATE errors); event_index refers to the index of the event being processed when the error was raised, if known; and message can be used to convey the source file and line number at which the error occurred, and a reason for the error.
ErrorReport: stream<ErrorDescription>
ComparisonResultType: symbol[EQUAL | NOT_EQUAL | ERROR]
ComparisonContext:
(location: string, event: Event, event_index: int)
ComparisonResult:
(result: ComparisonResultType, lhs: ComparisonContext, rhs: ComparisonContext, message: string)
ComparisonReport: stream<ComparisonResult>
ComparisonResults should only be generated when the result of the comparison differs from what was expected.
For example, comparing the stream abc [1] (contained in stream_a.ion) to the stream abc [2] (contained in stream_b.ion) would produce the following serialized ComparisonReport when the two streams are expected to be equivalent:
{result: NOT_EQUAL, lhs:{location: "stream_a.ion", event: {event_type: SCALAR, ion_type: INT, value_text: "1", value_binary: [0x21, 0x01], depth:1}, event_index: 2}, rhs:{location: "stream_b.ion", event: {event_type: SCALAR, ion_type: INT, value_text: "2", value_binary: [0x21, 0x02], depth:1}, event_index:2}, message: "1 vs. 2"}
Note that this ComparisonReport contains only one ComparisonResult because only one event pair was not equal.
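A sketch of how a comparer might honor the only-on-unexpected-outcome rule, reusing the hypothetical _record helper from the earlier sketch:

```python
ComparisonContext = _record('ComparisonContext', ['location', 'event', 'event_index'])
ComparisonResult = _record('ComparisonResult', ['result', 'lhs', 'rhs', 'message'])

def record_comparison(expected_equal, actually_equal, lhs, rhs, message, report):
    # Append a ComparisonResult only when the outcome differs from the
    # expectation; an expected outcome stays silent.
    if expected_equal != actually_equal:
        result = 'NOT_EQUAL' if expected_equal else 'EQUAL'
        report.append(ComparisonResult(result=result, lhs=lhs, rhs=rhs,
                                       message=message))
```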
PerformanceIO:
(name: string, size: int)
PerformanceReport:
(options: string, input: PerformanceIO, output: PerformanceIO, memory_usage: int, elapsed_time: int)
For example, the command
ion-c process -f binary -p perf.ion -o customer.10n customer.ion
could produce the following serialized PerformanceReport:
{options: "-f binary", output: {name: "customer.10n", size: 12345}, input: {name: "customer.ion", size: 12345}, memory_usage:12345, elapsed_time:12345}
ReadInstruction: symbol[NEXT | SKIP], where NEXT tells the reader to emit the event representing the next value in the stream (stepping in if it is currently positioned on a container), and SKIP (which does nothing at the top level) tells the reader to skip to the end of the current container, step out, and emit a CONTAINER_END event.
ReadInstructionStream: stream<ReadInstruction>
In a good/equivs or good/non-equivs vector, a string element of a top-level Ion sequence (list or s-expression) annotated with "embedded_documents" (soon to be "$ion_embedded_streams"). This string should be interpreted as a stream of text Ion data. For example, in the following Ion stream,
$ion_embedded_streams::(
"$ion_1_0 abc"
'''$ion_1_0 $ion_symbol_table::{symbols:["abc"]} $10'''
)
both of the elements of the s-expression are interpreted as Ion streams.
In an EventStream, a sequence of Events between the depth-zero CONTAINER_START and CONTAINER_END events for an Ion sequence (list or s-expression) with the "embedded_documents" (soon to be "$ion_embedded_streams") annotation, representing a standalone EventStream. These embedded EventStreams always restart at depth zero and end with a STREAM_END event. For example, the following EventStream
$ion_event_stream
{event_type: CONTAINER_START, ion_type: SEXP, annotations: [{text:"$ion_embedded_streams"}], depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: CONTAINER_END, ion_type: SEXP, depth:0}
{event_type: STREAM_END, depth:0}
contains two embedded EventStreams, each with a single top-level int.
SymbolTokens are compared for equivalence as follows:
- Compare text. If equal, the symbol tokens are equivalent. If not equal,
- Compare import_locations. Both import_name and location must be defined, and the corresponding fields must be exactly equal.
- Compare event_type. If equal,
- Compare depth. If equal,
- Compare ion_type. If equal,
- Compare field_name for SymbolToken equality. If equal,
- Compare each annotation in annotations for SymbolToken equality. If equal,
- For SCALAR events, read the value_text and value_binary from both events into EventStreams, each of which must contain exactly one SCALAR event. Extract the scalar value from each of these events into the appropriate programming-language type. Assert that the value_text and value_binary values from the same Event are equivalent. Finally, compare either value against the value produced by the other event under the Ion data model.
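Taken together with the SymbolToken steps above, these cascades might be sketched in Python as follows. This is simplified: symbol-zero handling is elided, and scalars_equal is a stand-in callback for the re-read-and-compare step described in the final item.

```python
def symbol_tokens_equal(a, b):
    # Compare text. If both texts are known and equal, the tokens are equivalent.
    if a.text is not None and a.text == b.text:
        return True
    # Otherwise compare import_locations: both import_name and location must
    # be defined, and the corresponding fields must be exactly equal.
    return (a.import_location is not None and b.import_location is not None
            and a.import_location == b.import_location)

def events_equal(a, b, scalars_equal):
    if (a.event_type, a.depth, a.ion_type) != (b.event_type, b.depth, b.ion_type):
        return False
    if (a.field_name is None) != (b.field_name is None):
        return False  # a missing field_name only matches a missing field_name
    if a.field_name is not None and not symbol_tokens_equal(a.field_name, b.field_name):
        return False
    annotations_a, annotations_b = a.annotations or [], b.annotations or []
    if len(annotations_a) != len(annotations_b):
        return False
    if not all(symbol_tokens_equal(x, y)
               for x, y in zip(annotations_a, annotations_b)):
        return False
    if a.event_type == 'SCALAR':
        return scalars_equal(a, b)  # stand-in for the re-read-and-compare step
    return True
```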
The recursive algorithm for determining EventStream equality is better expressed in code than in prose. However, there are a few things to note:
- EventStream equivalence cannot simply be determined by comparing Events at corresponding indices, because a container value is composed of at least two Events and structs are unordered.
- Streams with symbol table boundaries at different positions in the stream may still be equivalent. Therefore, when a SYMBOL_TABLE event is encountered in either stream, that event must be skipped.
- Between the CONTAINER_START and CONTAINER_END events for structs, value Events need to be matched using field names. Because structs may have multiple values for the same field name, determining that two events have equal field names but unequal values is not sufficient to determine non-equivalence unless all other field name/value pairs have already been compared.
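The struct caveat in the last item deserves its own sketch. Assuming the events between a struct's CONTAINER_START and CONTAINER_END have already been sliced into (field_name, value_events) pairs (a hypothetical preprocessing step), unordered matching with repeated field names might look like this:

```python
def struct_fields_equal(lhs_fields, rhs_fields, values_equal):
    # lhs_fields/rhs_fields: lists of (field_name, value_events) pairs.
    if len(lhs_fields) != len(rhs_fields):
        return False
    unmatched = list(rhs_fields)
    for name, lhs_value in lhs_fields:
        # Equal field names with unequal values are not disqualifying on their
        # own: with repeated field names, every remaining candidate must be tried.
        for i, (rhs_name, rhs_value) in enumerate(unmatched):
            if symbol_tokens_equal(name, rhs_name) and values_equal(lhs_value, rhs_value):
                del unmatched[i]
                break
        else:
            return False  # no remaining field matched this name/value pair
    return True
```

Greedy matching suffices here because data-model equality is an equivalence relation, so matches never need to be undone.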
The CLI provides a common language-agnostic interface for all implementations. It is designed to be useful not only to the tests, but also to users. That said, it contains certain features that may not be useful to users (e.g. support for embedded streams); it may be desirable to provide wrappers over this CLI that simplify common commands (e.g. jq-like filtering) and hide the features that exist to facilitate internal testing. In the end, each implementation that integrates with the test harness will have a well-tested user-facing CLI (or set of CLIs).
Each implementation of the CLI should support being used in "interactive mode," which may be entered by invoking the executable with zero arguments. In this mode, the CLI will accept commands and provide responses until interrupted; the behavior of the individual commands will be equivalent between interactive and non-interactive modes. This may be useful in languages with a high startup and/or shutdown cost.
Command invocations that result in errors will exit with non-zero status codes. All other command invocations will exit with status code zero.
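A minimal sketch of an interactive-mode loop, where run_command is a hypothetical stand-in for whatever dispatches a parsed argument vector with exactly the semantics of a one-shot invocation:

```python
import shlex
import sys

def interactive_mode(run_command):
    # Accept commands until interrupted; each line is parsed like a shell
    # command line and dispatched with the same semantics as a one-shot
    # invocation, amortizing startup/shutdown cost across many commands.
    for line in sys.stdin:
        argv = shlex.split(line)
        if argv:
            status = run_command(argv)  # non-zero status signals an error
            sys.stdout.write('%d\n' % status)
```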
Usage:
ion
ion process [--output <file>] [--error-report <file>] [--output-format (text | pretty | binary | events | none)] [--catalog <file>]... [--imports <file>]... [--perf-report <file>] [--filter <filter> | --traverse <file>] [-] [<input_file>]...
ion compare [--output <file>] [--error-report <file>] [--output-format (text | pretty | binary | none)] [--catalog <file>]... [--comparison-type (basic | equivs | non-equivs | equiv-timeline)] [-] [<input_file>]...
ion extract [--output <file>] [--error-report <file>] [--output-format (text | pretty | binary | none)] (--symtab-name <name>) (--symtab-version <version>) [-] [<input_file>]...
ion help [extract | compare | process]
ion --help
ion version
ion --version
Commands:
extract Extract the symbols from the given input(s) into a shared symbol table with the given name and
version.
compare Compare all inputs (which may contain Ion streams and/or EventStreams) against all other inputs
using the Ion data model's definition of equality. Write a ComparisonReport to the output.
process Read the input file(s) (optionally, specifying ReadInstructions or a filter) and re-write in the
format specified by --output.
help Print this general help. If provided a command, prints help specific to that command.
version Print version information about this tool.
Options:
-o, --output <file>
Output location. [default: stdout]
-f, --output-format <type>
Output format, from the set (text | pretty | binary | events | none). 'events' is only available with the
'process' command, and outputs a serialized EventStream representing the input Ion stream(s).
[default: pretty]
-e, --error-report <file>
ErrorReport location. [default: stderr]
-p, --perf-report <file>
PerformanceReport location. If left unspecified, a performance report is not generated.
-c, --catalog <file>
Location(s) of files containing Ion streams of shared symbol tables from which to populate a catalog. This
catalog will be used by all readers and writers when encountering shared symbol table import descriptors.
-i, --imports <file>
Location(s) of files containing list(s) of shared symbol table import descriptors. These imports will be
used by writers during serialization. If a catalog is available (see: --catalog), the writer will attempt
to match those import descriptors to actual shared symbol tables using the catalog.
-F, --filter <filter>
JQ-style filter to perform on the input stream(s) before writing the result.
-t, --traverse <file>
Location of a file containing a stream of ReadInstructions to use when reading the input stream(s) instead
of performing a full traversal.
-n, --symtab-name <symtab_name>
Name of the shared symbol table to be extracted.
-V, --symtab-version <symtab_version>
Version of the shared symbol table to be extracted.
-y, --comparison-type (basic | equivs | non-equivs | equiv-timeline)
Comparison semantics to be used with the compare command, from the set (basic | equivs | non-equivs |
equiv-timeline). Any embedded streams in the inputs are compared for EventStream equality. 'basic' performs
a standard data-model comparison between the corresponding events (or embedded streams) in the inputs.
'equivs' verifies that each value (or embedded stream) in a top-level sequence is equivalent to every other
value (or embedded stream) in that sequence. 'non-equivs' does the same, but verifies that the values (or
embedded streams) are not equivalent. 'equiv-timeline' is the same as 'equivs', except that when top-level
sequences contain timestamp values, they are considered equivalent if they represent the same instant
regardless of whether they are considered equivalent by the Ion data model. [default: basic]
-h, --help
Synonym for the help command.
--version
Synonym for the version command.
Examples:
Read input.10n and pretty-print it to stdout.
$ ion process input.10n
Read input.10n (using a catalog composed of the shared symbol tables contained in catalog.10n) without re-writing, and write a performance report to stdout.
$ ion process --output-format none --catalog catalog.10n --perf-report -- input.10n
Read input.10n according to the ReadInstructions specified by instructions.ion and write the resulting Events
to output.ion.
$ ion process -o output.ion -f events -t instructions.ion input.10n
Extract a shared symbol table with name "foo_table" and version 1 from the piped Ion stream and write it in
binary format to foo_table.10n.
$ echo 'foo' | ion extract -n 'foo_table' -V 1 -o foo_table.10n -f binary -
Read input1.ion and input2.10n and output to stdout any values in the streams that match the filter .foo.
$ ion process --filter .foo input1.ion input2.10n
Compare each stream in read_events.ion, input1.ion, and input2.10n against all other streams in the set and
output a ComparisonReport to comparison_report.ion.
$ ion compare -o comparison_report.ion read_events.ion input1.ion input2.10n
ReadInstructions can be translated into reader API calls, allowing tests to define a traversal for a particular test stream (see --traverse).
When not positioned on a container, the NEXT instruction translates into reader.next() in ion-java. When positioned on a container (meaning that the event_type of the last event emitted was CONTAINER_START), the NEXT instruction translates to reader.stepIn() followed by reader.next() in ion-java. In both cases, emit an event representing the value at the reader's new position.
When at the top level and not positioned on a container, the SKIP instruction has no effect. When at a depth of at least one and not positioned on a container, the SKIP instruction translates to reader.stepOut() in ion-java. When positioned on a container at any depth (meaning that the last event emitted had event_type CONTAINER_START), the SKIP instruction translates to reader.next() in ion-java, which skips over the container without stepping in. Whenever SKIP has an effect, emit a CONTAINER_END event.
If the ReadInstruction stream ends before the reader reaches the end of its Ion stream,
- If the reader is at depth zero, finish reading and convey success.
- If the reader is at depth greater than zero, write an ErrorDescription to the ErrorReport.
If the reader reaches the end of the stream (and emits a STREAM_END event) before the ReadInstruction stream ends, write an ErrorDescription to the ErrorReport.
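Expressed against a hypothetical language-neutral reader API (on_container(), step_in(), step_out(), next(), and depth() are stand-ins for the ion-java calls named above, and container_end_event is a hypothetical constructor), the instruction handling might look like this:

```python
def apply_instruction(instruction, reader, emit):
    # 'Positioned on a container' means the last event emitted was CONTAINER_START.
    if instruction == 'NEXT':
        if reader.on_container():
            reader.step_in()
        emit(reader.next())                    # event for the value at the new position
    elif instruction == 'SKIP':
        if reader.on_container():
            reader.next()                      # skip the container without stepping in
            emit(container_end_event(reader))
        elif reader.depth() > 0:
            reader.step_out()
            emit(container_end_event(reader))
        # SKIP at the top level when not on a container: no effect, no event.

def end_of_instructions(reader, error_report):
    # Called when the ReadInstruction stream ends before the Ion stream does.
    if reader.depth() > 0:
        error_report.append('instructions ended below the top level')
    # At depth zero: finish reading and convey success.
```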
Perform a full traversal of the test stream, stepping into and fully iterating every container encountered.
With or without ReadInstructions, an Ion stream can be read into an EventStream (--output-format events).
When the reader encounters a value that is not a system value, create an event of the appropriate event_type (CONTAINER_START for containers, otherwise SCALAR), and set the event's ion_type to the type of the current value. Also set the event's depth to the reader's current depth (where the top level is depth zero), set the event's field_name if the value is in a struct, and set any annotations on the current value. For SCALAR events, initialize temporary writers (with any shared symbol tables required by symbol tokens with unknown text) to re-serialize their values as both text and binary Ion (including a local symbol table, if required) into the event's value_text and value_binary fields, respectively. If the scalar is a symbol value with the same text as an IVM, the serialized Ion value contained in the value_text and value_binary fields must be annotated with the special $ion_user_value annotation. This prevents the writers from interpreting the IVM-like symbol as an IVM and prescribing IVM semantics. This annotation is always ignored by EventStream readers.
When the reader reaches the end of the current container, create a CONTAINER_END event with the same ion_type and depth as that container's corresponding CONTAINER_START event; leave all other fields in the event undefined.
When the reader encounters a local symbol table, create a SYMBOL_TABLE event. Set this event's imports field to a list of ImportDescriptors representing any shared symbol table imports included by the new local symbol table. These will be used by writers following this event in the stream. Set the event's depth to zero and leave all other fields in the event undefined. SYMBOL_TABLE events are always skipped during EventStream comparison.
Denote the end of the stream with a STREAM_END event. Set the event's depth to zero and leave all other fields in the event undefined.
If, before the end of the stream, the reader raises an error for any reason, write an ErrorDescription to the ErrorReport. Abort reading without writing any additional Events to the EventStream.
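Putting these rules together, the event-generation loop might be sketched as follows (same hypothetical reader API as before; serialize_scalar is a stand-in for the temporary-writer step, including the $ion_user_value special case):

```python
def read_to_event_stream(reader, emit):
    while True:
        ion_type = reader.next()                   # None when nothing remains here
        if ion_type is None:
            if reader.depth() == 0:
                emit(Event(event_type='STREAM_END', depth=0))
                return
            emit(Event(event_type='CONTAINER_END',
                       ion_type=reader.container_type(),
                       depth=reader.depth() - 1))  # same depth as CONTAINER_START
            reader.step_out()
        elif reader.is_local_symbol_table():       # a system value, not user data
            emit(Event(event_type='SYMBOL_TABLE',
                       imports=reader.shared_imports(), depth=0))
        elif ion_type in ('LIST', 'SEXP', 'STRUCT') and not reader.is_null():
            emit(Event(event_type='CONTAINER_START', ion_type=ion_type,
                       depth=reader.depth(), field_name=reader.field_name(),
                       annotations=reader.annotations()))
            reader.step_in()
        else:
            # Null containers fall through here and are emitted as SCALARs.
            text, binary = serialize_scalar(reader)  # stand-in: temporary writers
            emit(Event(event_type='SCALAR', ion_type=ion_type,
                       depth=reader.depth(), field_name=reader.field_name(),
                       annotations=reader.annotations(),
                       value_text=text, value_binary=binary))
```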
A good input file called bar_baz_foo.ion with the contents
bar::baz::{foo:1}
read using the command
$ ion process --output bar_baz_foo_read_events.ion --output-format events bar_baz_foo.ion
would serialize the following EventStream to bar_baz_foo_read_events.ion:
$ion_event_stream
{event_type: CONTAINER_START, ion_type: STRUCT, annotations:[{text:"bar"}, {text:"baz"}], depth:0}
{event_type: SCALAR, ion_type: INT, field_name: {text:"foo"}, value_text: "1", value_binary: [0x21, 0x01], depth:1}
{event_type: CONTAINER_END, ion_type: STRUCT, depth:0}
{event_type: STREAM_END, depth:0}
A bad input file called repeatedUnderscore.ion with the contents
[1__0]
would generate the serialized event stream
$ion_event_stream
{event_type: CONTAINER_START, ion_type: LIST, depth:0}
and an ErrorReport similar to the following:
{error_type: READ, message: "ion_reader_text.c:999 Line 1 index 3: Numeric values must not have repeated underscores.", location: "bad/repeatedUnderscore.ion", event_index: 1}
Read each embedded Ion stream in the input as a separate EventStream. Insert these streams back into the source stream, replacing the string SCALAR events from which they originated.
A good/equivs input file called ten.ion with the contents
$ion_embedded_streams::(
"$ion_1_0 10"
"1_0"
)
read using the command
$ ion process --output ten_events_embedded.ion --output-format events ten.ion
would serialize the following EventStream to ten_events_embedded.ion:
$ion_event_stream
{event_type: CONTAINER_START, ion_type: SEXP, annotations: [{text:"$ion_embedded_streams"}], depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: CONTAINER_END, ion_type: SEXP, depth:0}
{event_type: STREAM_END, depth:0}
A serialized EventStream can be used as a sequence of write instructions (--output-format (text | pretty | binary)). Using the combination of an Event's event_type and its ion_type, translate the Event into a writer API call. For example, encountering a SCALAR event with ion_type INT in ion-java would translate to writer.writeInt. Encountering a CONTAINER_START event with ion_type STRUCT in ion-c would translate to ion_writer_start_container(writer, tid_STRUCT).
For SCALAR events, use temporary readers to read the Event's value_text and value_binary; test these values for data model equivalence. If they are not equivalent, abort and write an ErrorDescription to the ErrorReport.
For CONTAINER_END events, finish the current container.
For SYMBOL_TABLE events, flush the writer's existing local symbol table and any buffered data, forcing the writer to create a new local symbol table that imports the list of symbol tables declared by the imports field of the Event. This ensures that symbol tokens with unknown text that occur in subsequent events in the stream can be written correctly.
For STREAM_END events, finish the writer's current stream, forcing the writer to flush any buffered data. If additional Events follow, the writer must first write an Ion version marker.
If the writer raises an error at any point, or the EventStream ends without a STREAM_END event, abort writing and write an ErrorDescription to the ErrorReport.
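A dispatch loop over these rules might be sketched as follows (hypothetical writer API; verify_scalar is a stand-in for the temporary-reader equivalence check described above):

```python
def write_event_stream(events, writer, error_report):
    for event in events:
        if event.event_type == 'CONTAINER_START':
            writer.start_container(event.ion_type, event.field_name, event.annotations)
        elif event.event_type == 'CONTAINER_END':
            writer.finish_container()
        elif event.event_type == 'SYMBOL_TABLE':
            # Flush the current local symbol table and buffered data, then import
            # the declared shared tables into a fresh local symbol table.
            writer.flush()
            writer.set_imports(event.imports or [])
        elif event.event_type == 'STREAM_END':
            writer.finish()   # subsequent events require a new IVM first
        else:  # SCALAR
            value = verify_scalar(event, error_report)  # text/binary must agree
            writer.write_scalar(event.ion_type, value,
                                event.field_name, event.annotations)
```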
The input file one_events.ion with the contents
$ion_event_stream
{event_type: SCALAR, ion_type: INT, value_text: "1", value_binary: [0x21, 0x01], depth:0}
{event_type: STREAM_END, depth:0}
written using
$ ion process --output one.ion --output-format text one_events.ion
$ ion process --output one.10n --output-format binary one_events.ion
would first read the EventStream into memory and use temporary readers to read the SCALAR event's value_text and value_binary. After verifying that these are equivalent, it would write to one.ion with the contents
1
and to one.10n with the bytes
\xE0\x01\x00\xEA\x21\x01
Write any embedded EventStreams in the input using a temporary text Ion writer, and write the resulting text Ion as a single Ion string per embedded EventStream. Future iterations of the test harness may allow these embedded streams to be written in the binary Ion format.
The input file ten_events_embedded.ion with the contents
$ion_event_stream
{event_type: CONTAINER_START, ion_type: SEXP, annotations: [{text:"$ion_embedded_streams"}], depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "10", value_binary: [0x21, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: CONTAINER_END, ion_type: SEXP, depth:0}
{event_type: STREAM_END, depth:0}
read using the command
$ ion process --output ten_embedded.ion --output-format text ten_events_embedded.ion
would write the following to ten_embedded.ion:
$ion_embedded_streams::(
"10"
"10"
)
Assume there are two implementations. The CLI executables for both are available to the cross-implementation test harness under the names ion-c and ion-java.
The following file locations are defined:
INPUT_FILE: The vector under test.
READ_EVENTS_ION_C: The EventStream generated by ion-c while reading the input data.
READ_EVENTS_ION_C_ERROR: The ErrorReport generated by ion-c during the read test for the input file.
READ_EVENTS_ION_JAVA: The EventStream generated by ion-java while reading the input data.
READ_EVENTS_ION_JAVA_ERROR: The ErrorReport generated by ion-java during the read test for the input file.
READ_VERIFY_ION_C: The ComparisonReport generated by ion-c during the verification phase of the read test for the input file.
READ_VERIFY_ION_C_ERROR: The ErrorReport generated by ion-c during the verification phase of the read test for the input file.
READ_VERIFY_ION_JAVA: The ComparisonReport generated by ion-java during the verification phase of the read test for the input file.
READ_VERIFY_ION_JAVA_ERROR: The ErrorReport generated by ion-java during the verification phase of the read test for the input file.
READ_VERIFY_EQUIVS_ION_C: The ComparisonReport generated by ion-c during the equivs/non-equivs semantics verification phase of the read test.
READ_VERIFY_EQUIVS_ION_C_ERROR: The ErrorReport generated by ion-c during the equivs/non-equivs semantics verification phase of the read test.
READ_VERIFY_EQUIVS_ION_JAVA: The ComparisonReport generated by ion-java during the equivs/non-equivs semantics verification phase of the read test.
READ_VERIFY_EQUIVS_ION_JAVA_ERROR: The ErrorReport generated by ion-java during the equivs/non-equivs semantics verification phase of the read test.
WRITE_STREAM_ION_C_ION_C_TEXT: The text Ion stream written by ion-c from the EventStream read by ion-c.
WRITE_STREAM_ION_C_ION_C_TEXT_ERROR: The ErrorReport generated by ion-c while attempting to write a text Ion stream from the EventStream read by ion-c.
WRITE_STREAM_ION_C_ION_C_BINARY: The binary Ion stream written by ion-c from the EventStream read by ion-c.
WRITE_STREAM_ION_C_ION_C_BINARY_ERROR: The ErrorReport generated by ion-c while attempting to write a binary Ion stream from the EventStream read by ion-c.
WRITE_STREAM_ION_C_ION_JAVA_TEXT: The text Ion stream written by ion-c from the EventStream read by ion-java.
WRITE_STREAM_ION_C_ION_JAVA_TEXT_ERROR: The ErrorReport generated by ion-c while attempting to write a text Ion stream from the EventStream read by ion-java.
WRITE_STREAM_ION_C_ION_JAVA_BINARY: The binary Ion stream written by ion-c from the EventStream read by ion-java.
WRITE_STREAM_ION_C_ION_JAVA_BINARY_ERROR: The ErrorReport generated by ion-c while attempting to write a binary Ion stream from the EventStream read by ion-java.
WRITE_STREAM_ION_JAVA_ION_C_TEXT: The text Ion stream written by ion-java from the EventStream read by ion-c.
WRITE_STREAM_ION_JAVA_ION_C_TEXT_ERROR: The ErrorReport generated by ion-java while attempting to write a text Ion stream from the EventStream read by ion-c.
WRITE_STREAM_ION_JAVA_ION_C_BINARY: The binary Ion stream written by ion-java from the EventStream read by ion-c.
WRITE_STREAM_ION_JAVA_ION_C_BINARY_ERROR: The ErrorReport generated by ion-java while attempting to write a binary Ion stream from the EventStream read by ion-c.
WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT: The text Ion stream written by ion-java from the EventStream read by ion-java.
WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT_ERROR: The ErrorReport generated by ion-java while attempting to write a text Ion stream from the EventStream read by ion-java.
WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY: The binary Ion stream written by ion-java from the EventStream read by ion-java.
WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY_ERROR: The ErrorReport generated by ion-java while attempting to write a binary Ion stream from the EventStream read by ion-java.
WRITE_VERIFY_ION_C: The ComparisonReport generated by ion-c during the verification phase of the write test for the input file.
WRITE_VERIFY_ION_C_ERROR: The ErrorReport generated by ion-c during the verification phase of the write test for the input file.
WRITE_VERIFY_ION_JAVA: The ComparisonReport generated by ion-java during the verification phase of the write test for the input file.
WRITE_VERIFY_ION_JAVA_ERROR: The ErrorReport generated by ion-java during the verification phase of the write test for the input file.
WRITE_VERIFY_EQUIVS_ION_C: The ComparisonReport generated by ion-c during the equivs/non-equivs semantics verification phase of the write test.
WRITE_VERIFY_EQUIVS_ION_C_ERROR: The ErrorReport generated by ion-c during the equivs/non-equivs semantics verification phase of the write test.
WRITE_VERIFY_EQUIVS_ION_JAVA: The ComparisonReport generated by ion-java during the equivs/non-equivs semantics verification phase of the write test.
WRITE_VERIFY_EQUIVS_ION_JAVA_ERROR: The ErrorReport generated by ion-java during the equivs/non-equivs semantics verification phase of the write test.
The harness selects a test file, which contains the Ion stream 1, to be the INPUT_FILE.
Using the CLI, write a file containing the stream of Events that were read from the input Ion stream. The CLI supports being provided with an optional catalog (see --catalog) to use while reading, which enables streams with local symbol tables that declare shared imports to be tested with or without a catalog.
$ ion-c process --output READ_EVENTS_ION_C --output-format events --error-report READ_EVENTS_ION_C_ERROR INPUT_FILE
$ ion-java process --output READ_EVENTS_ION_JAVA --output-format events --error-report READ_EVENTS_ION_JAVA_ERROR INPUT_FILE
At this point, both READ_EVENTS_ION_C and READ_EVENTS_ION_JAVA should contain an EventStream which is data model equivalent to
$ion_event_stream
{event_type: SCALAR, ion_type: INT, value_text: "1", value_binary: [0x21, 0x01], depth:0}
{event_type: STREAM_END, depth:0}
Because this is a good file, the test harness expects none of the implementations to have raised an error on read. It verifies this by confirming that the ErrorReports located at READ_EVENTS_ION_C_ERROR and READ_EVENTS_ION_JAVA_ERROR are empty. If either is not, the test harness extracts messages from them and fails the test for the implementation that generated the offending ErrorReport for this vector.
If this were a bad file, the test harness would expect all of the implementations to have equivalent incomplete event streams and non-empty ErrorReports.
If any of the implementations read the vector successfully, the next step is to make sure all successful implementations agree on the read results by asking each of them to compare their EventStream against those generated by all others.
$ ion-c compare --output READ_VERIFY_ION_C --error-report READ_VERIFY_ION_C_ERROR READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
$ ion-java compare --output READ_VERIFY_ION_JAVA --error-report READ_VERIFY_ION_JAVA_ERROR READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
First, check the error reports at READ_VERIFY_ION_C_ERROR and READ_VERIFY_ION_JAVA_ERROR. If either of them is not empty, extract messages from them and fail the test for that implementation for this vector.
At this point, both READ_VERIFY_ION_C and READ_VERIFY_ION_JAVA should contain empty ComparisonReports, because equivalence was expected and no elements of the EventStream differed. If this is not the case (meaning that there is at least one ComparisonResult in the ComparisonReport, and its type is either NOT_EQUAL or ERROR), extract an error message, and fail the test for that implementation for this vector.
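On the harness side, these emptiness checks can be performed by loading each report with an Ion parser. A sketch using the amazon.ion package (the fail callback is a stand-in for the harness's failure bookkeeping):

```python
from amazon.ion import simpleion

def check_reports(error_report_path, comparison_report_path, fail):
    with open(error_report_path, 'rb') as f:
        for error in simpleion.loads(f.read(), single_value=False):
            fail('error: %s' % error['message'])
    with open(comparison_report_path, 'rb') as f:
        for result in simpleion.loads(f.read(), single_value=False):
            # Any ComparisonResult present means the outcome differed from the
            # expectation (NOT_EQUAL or ERROR when equivalence was expected).
            fail('comparison: %s' % result['message'])
```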
(Note: bad vectors skip phases 3 and 4.)
Using the EventStreams generated by implementations that successfully passed phase 2, write Ion streams in both text and binary Ion.
$ ion-c process --output-format text --output WRITE_STREAM_ION_C_ION_C_TEXT --error-report WRITE_STREAM_ION_C_ION_C_TEXT_ERROR READ_EVENTS_ION_C
$ ion-c process --output-format binary --output WRITE_STREAM_ION_C_ION_C_BINARY --error-report WRITE_STREAM_ION_C_ION_C_BINARY_ERROR READ_EVENTS_ION_C
$ ion-c process --output-format text --output WRITE_STREAM_ION_C_ION_JAVA_TEXT --error-report WRITE_STREAM_ION_C_ION_JAVA_TEXT_ERROR READ_EVENTS_ION_JAVA
$ ion-c process --output-format binary --output WRITE_STREAM_ION_C_ION_JAVA_BINARY --error-report WRITE_STREAM_ION_C_ION_JAVA_BINARY_ERROR READ_EVENTS_ION_JAVA
$ ion-java process --output-format text --output WRITE_STREAM_ION_JAVA_ION_C_TEXT --error-report WRITE_STREAM_ION_JAVA_ION_C_TEXT_ERROR READ_EVENTS_ION_C
$ ion-java process --output-format binary --output WRITE_STREAM_ION_JAVA_ION_C_BINARY --error-report WRITE_STREAM_ION_JAVA_ION_C_BINARY_ERROR READ_EVENTS_ION_C
$ ion-java process --output-format text --output WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT --error-report WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT_ERROR READ_EVENTS_ION_JAVA
$ ion-java process --output-format binary --output WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY --error-report WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY_ERROR READ_EVENTS_ION_JAVA
This should produce four text and four binary streams, all of which should be data-model equivalent to the text Ion 1.
Verify that all implementations agree that all of the re-written streams are equivalent to the original stream and to each other.
$ ion-c compare --output WRITE_VERIFY_ION_C --error-report WRITE_VERIFY_ION_C_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
$ ion-java compare --output WRITE_VERIFY_ION_JAVA --error-report WRITE_VERIFY_ION_JAVA_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
The comparison and error reports generated by these commands are verified within the test harness using the same technique as in Phase 2.
If the test harness has made it to this point without raising an error, then the test for this vector is successful.
The harness selects a good/non-equivs test file to be the INPUT_FILE. The file has the following contents:
$ion_embedded_streams::(
"1"
"1.0"
)
Using the CLI, write a file containing the stream of Events that were read from the input Ion stream. The embedded Ion streams are detected because of the $ion_embedded_streams annotation.
$ ion-c process --output READ_EVENTS_ION_C --output-format events --error-report READ_EVENTS_ION_C_ERROR INPUT_FILE
$ ion-java process --output READ_EVENTS_ION_JAVA --output-format events --error-report READ_EVENTS_ION_JAVA_ERROR INPUT_FILE
At this point, both READ_EVENTS_ION_C and READ_EVENTS_ION_JAVA should contain an EventStream which is data model equivalent to
$ion_event_stream
{event_type: CONTAINER_START, ion_type: SEXP, annotations: [{text:"$ion_embedded_streams"}], depth:0}
{event_type: SCALAR, ion_type: INT, value_text: "1", value_binary: [0x21, 0x01], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: SCALAR, ion_type: DECIMAL, value_text: "1.0", value_binary: [0x52, 0xC1, 0x0A], depth:0}
{event_type: STREAM_END, depth:0}
{event_type: CONTAINER_END, ion_type: SEXP, depth:0}
{event_type: STREAM_END, depth:0}
Just as in phase 2 of the previous example, the test harness verifies that none of the implementations generated non-empty ErrorReports in READ_EVENTS_ION_C_ERROR and READ_EVENTS_ION_JAVA_ERROR.
Also as in phase 2 of the previous example, read results from all successful implementations should now be compared for equivalence with each other and with the input file. The embedded Ion streams will be compared against the corresponding embedded EventStreams in the other files.
$ ion-c compare --output READ_VERIFY_ION_C --error-report READ_VERIFY_ION_C_ERROR READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
$ ion-java compare --output READ_VERIFY_ION_JAVA --error-report READ_VERIFY_ION_JAVA_ERROR READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
These commands should produce empty ComparisonReports in READ_VERIFY_ION_C and READ_VERIFY_ION_JAVA because equivalence is expected.
Now, the same inputs must be compared according to the good/equivs or good/non-equivs test semantics, which require that all elements of top-level sequences be either equal or not equal to all other elements of the same sequence. Since this is a good/non-equivs vector, the --comparison-type non-equivs option achieves this.
$ ion-c compare --output READ_VERIFY_EQUIVS_ION_C --error-report READ_VERIFY_EQUIVS_ION_C_ERROR --comparison-type non-equivs READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
$ ion-java compare --output READ_VERIFY_EQUIVS_ION_JAVA --error-report READ_VERIFY_EQUIVS_ION_JAVA_ERROR --comparison-type non-equivs READ_EVENTS_ION_C READ_EVENTS_ION_JAVA INPUT_FILE
These commands should produce empty ComparisonReports in READ_VERIFY_EQUIVS_ION_C and READ_VERIFY_EQUIVS_ION_JAVA. Since non-equivalence is expected, any ComparisonResults present in the ComparisonReport will have type EQUAL or ERROR. The test harness must report these as errors and fail the test for that implementation for this vector.
Using the EventStreams generated by all implementations that successfully passed phase 2, write Ion streams in both text and binary Ion. The embedded EventStreams will be detected and written as string values containing Ion text.
$ ion-c process --output-format text --output WRITE_STREAM_ION_C_ION_C_TEXT --error-report WRITE_STREAM_ION_C_ION_C_TEXT_ERROR READ_EVENTS_ION_C
$ ion-c process --output-format binary --output WRITE_STREAM_ION_C_ION_C_BINARY --error-report WRITE_STREAM_ION_C_ION_C_BINARY_ERROR READ_EVENTS_ION_C
$ ion-c process --output-format text --output WRITE_STREAM_ION_C_ION_JAVA_TEXT --error-report WRITE_STREAM_ION_C_ION_JAVA_TEXT_ERROR READ_EVENTS_ION_JAVA
$ ion-c process --output-format binary --output WRITE_STREAM_ION_C_ION_JAVA_BINARY --error-report WRITE_STREAM_ION_C_ION_JAVA_BINARY_ERROR READ_EVENTS_ION_JAVA
$ ion-java process --output-format text --output WRITE_STREAM_ION_JAVA_ION_C_TEXT --error-report WRITE_STREAM_ION_JAVA_ION_C_TEXT_ERROR READ_EVENTS_ION_C
$ ion-java process --output-format binary --output WRITE_STREAM_ION_JAVA_ION_C_BINARY --error-report WRITE_STREAM_ION_JAVA_ION_C_BINARY_ERROR READ_EVENTS_ION_C
$ ion-java process --output-format text --output WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT --error-report WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT_ERROR READ_EVENTS_ION_JAVA
$ ion-java process --output-format binary --output WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY --error-report WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY_ERROR READ_EVENTS_ION_JAVA
This should produce four text and four binary streams, all of which should be data-model equivalent to the text Ion:
$ion_embedded_streams::(
"1"
"1.0"
)
Verify that all implementations agree that all of the re-written streams are equivalent to the original stream and to each other.
$ ion-c compare --output WRITE_VERIFY_ION_C --error-report WRITE_VERIFY_ION_C_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
$ ion-java compare --output WRITE_VERIFY_ION_JAVA --error-report WRITE_VERIFY_ION_JAVA_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
Now, verify that the good/non-equivs semantics still hold for the re-written streams.
$ ion-c compare --comparison-type non-equivs --output WRITE_VERIFY_EQUIVS_ION_C --error-report WRITE_VERIFY_EQUIVS_ION_C_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
$ ion-java compare --comparison-type non-equivs --output WRITE_VERIFY_EQUIVS_ION_JAVA --error-report WRITE_VERIFY_EQUIVS_ION_JAVA_ERROR INPUT_FILE WRITE_STREAM_ION_C_ION_C_TEXT WRITE_STREAM_ION_C_ION_C_BINARY WRITE_STREAM_ION_C_ION_JAVA_TEXT WRITE_STREAM_ION_C_ION_JAVA_BINARY WRITE_STREAM_ION_JAVA_ION_C_TEXT WRITE_STREAM_ION_JAVA_ION_C_BINARY WRITE_STREAM_ION_JAVA_ION_JAVA_TEXT WRITE_STREAM_ION_JAVA_ION_JAVA_BINARY
The comparison and error reports generated by these commands are verified within the test harness using the same technique as in Phase 2.
If the test harness has made it to this point without raising an error, then the test for this vector is successful.
Inevitably, some implementations will fail to pass the tests for certain vectors. Failures should be reported in ErrorReports, but must not cause the entire test run to fail. When an implementation fails a test for a particular vector and the fix for the defect can be deferred, an issue referencing the failure and describing the defect should be added to that implementation's queue. Determining which implementation actually failed may require some investigation. If, for example, one out of N implementations disagrees during verification of another implementation's EventStream, the user must decide which implementation (or implementations) contains a defect and only create an issue for the offending implementation(s).
The test harness will exist in its own repository. It will locally clone the latest commit of ion-tests and all Ion implementations.
Starting a test run will involve triggering a build of each implementation, distributing work to each implementation through that implementation's CLI, determining success or failure of the tests by processing ErrorReports and ComparisonReports, and generating a visual report of the results to be used by developers and prospective users to determine relative compliance between the implementations.
Ultimately, in the spirit of continuous integration, pushing a change to any of the implementations (or ion-tests) should update the test harness's dependency to the latest version and kick off a test run.
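In outline, the driver's per-vector read phase reduces to shelling out to each implementation's CLI and inspecting the resulting reports. A sketch (the executable names, the locations helper, and collect_errors are stand-ins):

```python
import subprocess

def run_read_phase(implementations, input_file, locations, fail):
    for impl in implementations:
        status = subprocess.call([
            impl, 'process',
            '--output', locations.read_events(impl),
            '--output-format', 'events',
            '--error-report', locations.read_errors(impl),
            input_file,
        ])
        if status != 0:
            # Non-zero exit signals an error; details are in the ErrorReport.
            collect_errors(locations.read_errors(impl), fail)  # stand-in
```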
The last step is integrating the ion-test-driver into the GitHub Actions pipeline so that it is triggered for every pull request.
The --results-diff option analyzes an existing results file to identify any differences between two revisions. To compare two revisions of each test file:
1. Compare the two revisions' result fields. If both pass, proceed to the next file; otherwise, proceed to the next step.
2. Check the read_error field. If both revisions have the same read_error, or neither has an error, proceed to the next step. Otherwise, write a "read performance changed" error to the final report and then move on to the next step.
3. Check the read_compare field. Analyze the given read_compare report and find all the disagreeing revision pairs. After extracting the two disagree lists, compare the master branch and the new commit using these two cases:
   3.1. If the two revisions agree with each other, their disagree lists should be the same. Raise a "cli compare diff" error if they are not.
   3.2. If the two revisions disagree with each other, record which implementations the master commit no longer agrees with and which implementations the master commit starts to agree with.
4. Check the write_error field; refer to step 2.
5. Check the write_compare field; refer to step 3.
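A simplified sketch of steps 1 through 3, collapsing cases 3.1 and 3.2 into a single set comparison (the field names follow the list above; the 'PASS' value and the disagreeing_pairs helper are assumptions):

```python
def diff_file_results(old, new, report):
    # Step 1: if both revisions pass, there is nothing to compare.
    if old['result'] == 'PASS' and new['result'] == 'PASS':
        return
    # Step 2: differing read errors indicate changed read behavior.
    if old.get('read_error') != new.get('read_error'):
        report.append('read performance changed')
    # Step 3: compare the sets of implementation pairs each revision
    # disagrees with, and record any change in agreement.
    old_pairs = disagreeing_pairs(old.get('read_compare'))  # stand-in helper
    new_pairs = disagreeing_pairs(new.get('read_compare'))
    if old_pairs != new_pairs:
        report.append('cli compare diff: master %s vs. new commit %s'
                      % (sorted(old_pairs), sorted(new_pairs)))
    # Steps 4 and 5: repeat steps 2 and 3 for write_error and write_compare.
```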
The GitHub Actions logic is located in each implementation's .github/workflows/ion-test-driver.yml.
When a PR is created in an implementation, the pipeline follows these steps:
- Run the ion-test-driver, including the new commit in the tested revisions.
- Use the --results-diff option to analyze the results from the step above and find the differences between HEAD and the new commit of the implementation.
- If the new commit changes reader/writer behavior and the analysis returns a non-zero value, open an issue for it.
Not all of the features described above need to be implemented immediately. The core functionality is prioritized in versions 1 and 2, while longer-term needs are addressed in versions 3 and 4.
Version 1 will implement the functionality required to support all current ion-tests semantics (good, bad, good/equivs, good/non-equivs, and embedded streams) in ion-c, ion-python, and ion-java. Other language implementations can be added incrementally.
Minimally, this involves implementing the following CLI commands and options in each language:
process
--output
--output-format
--error-report
compare
--output
--output-format
--error-report
--comparison-type
This also involves providing the ion-test-harness as a command-line tool that reports its results in the Ion text format.
Version 2 will add more support for tests that include shared symbol table processing. This is not commonly leveraged in the current suite of ion-tests vectors.
This involves adding the following options to the existing CLI commands:
process
--catalog
--imports
compare
--catalog
--imports
At this point, more ion-tests vectors that leverage shared symbol tables (including symbol tokens with unknown text) should be added. ion-tests should also define a set (or sets) of shared symbol tables that may be used to populate a test catalog (or catalogs). Because all valid Ion data can be roundtripped with or without actually resolving the shared symbol table imports, test vectors with local symbol tables that declare shared imports could be run both with and without the test catalog to verify that the implementation correctly handles both cases.
Additionally, the ion-test-harness tool should be enhanced to provide easier-to-read HTML reports, which may be published for the benefit of users.
Version 3 will add fuzz testing for randomly generated traversals (generated by the test harness) over the input data. The results can be used to verify that all implementations behave in the same way for that traversal, regardless of whether the traversal is valid.
Non-normative traversals (e.g. stepping out of a container before consuming all of its values, or stepping over a nested container below the top level) are essential to test in intra-implementation unit tests, and are known to have been the source of bugs in the past.
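Generating such traversals is straightforward. A sketch of how the harness might produce a random ReadInstruction stream for --traverse, serialized as one Ion symbol per line (the NEXT bias and file name are arbitrary choices for illustration):

```python
import random

def random_instruction_stream(seed, length, next_bias=0.7):
    # Biasing toward NEXT makes deeper traversals more likely before a SKIP.
    rng = random.Random(seed)
    return [('NEXT' if rng.random() < next_bias else 'SKIP')
            for _ in range(length)]

with open('instructions.ion', 'w') as f:
    f.write('\n'.join(random_instruction_stream(seed=42, length=100)))
```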
This involves adding the following option to the existing CLI commands:
process
--traverse
Version 4 will add features that will improve the CLI's usefulness to users. This includes support for JQ-like filtering, which is a common ask from users who wish Ion had tooling parity with JSON; performance testing and report generation, allowing for the CLI to be used to drive automated cross-implementation performance testing in the future; and shared symbol table extraction from sample data.
This requires adding the extract command and adding the following options to the CLI:
process
--perf-report
--filter
extract
--output
--output-format
--error-report
--symtab-name
--symtab-version
-h, help
-v, version
Q: Does the test harness verify correctness?
A: No -- it verifies consensus. Verifying correctness to the spec for a particular implementation must be left to that implementation's unit tests. If at least one of the implementations has correct behavior for a particular test vector, and this test harness confirms consensus among all implementations for that vector, then all implementations have correct behavior for that test vector.
Q: Will this catch intra-implementation symmetrical read-then-write bugs? For example, given the Ion data 1, the implementation incorrectly reads the value 8 into memory. It then incorrectly writes the value 1.
A: Not necessarily. Serializing the event stream, which is used to verify read behavior, still requires use of the implementation's Ion writers, which would mask the error (the risk is somewhat lessened by the fact that the streams are re-written as both text and binary Ion, requiring the defect to be present in both writers in order to be masked). For this reason, this test harness cannot fully replace intra-implementation read tests. Note that intra-implementation symmetrical write-then-read bugs WOULD be caught, because each implementation reads the data written by every other implementation in order to drive consensus.
Q: Why use event streams at all? Why not just exchange roundtripped test files?
A: Event streams allow for comparison of reader behavior. This enables verification that expected errors happen at the correct point in the stream, enabling more consistent error behavior between implementations. It also increases the chances that reader errors will be identified as such. Although this can't be used to determine that a scalar value was read incorrectly (because the writer must be used to serialize the value in the event), it can expose that the reader read the structure of the data incorrectly (e.g. by providing the wrong type of event, missing events, or adding superfluous events). This can help the developer narrow the scope of the debugging effort. For example, consider the input data "(++a)". Implementation A correctly reads this as an s-expression with two elements: "++" and "a". Implementation B reads this as an s-expression with a single element: "++a". A's writer minimizes spacing in s-expressions, so both A and B re-write the stream as "(++a)". Although the stream was roundtripped correctly, it was not read correctly by both A and B. This difference would have been caught with event-based read verification.