Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation doesn’t produce violations in a deterministic order #206

Open
jobarr-amzn opened this issue Feb 7, 2024 · 3 comments
Open
Labels
bug Something isn't working

Comments

@jobarr-amzn
Copy link

As reported through another channel by an ion-schema-rust user:

#[test]
fn test_non_determinism() {
    use ion_rs::element::Element;
    use ion_schema::{
        system::SchemaSystem,
        isl::{
            IslSchema,
            isl_constraint::v_1_0::fields,
            isl_type::v_1_0::named_type,
            isl_type_reference::v_1_0::{
                variably_occurring_type_ref,
                named_type_ref,
            },
            isl_range::Range
        }
    };

    let element = Element::read_one("{str: 123, number: \"fox\"}").unwrap();

    assert!((0..20).map(|_| {
        let struct_type = named_type("my_type", vec![fields([
            ("str".into(), variably_occurring_type_ref(named_type_ref("string"), Range::required())),
            ("number".into(), variably_occurring_type_ref(named_type_ref("int"), Range::required())),
        ].into_iter())]);
        let isl_schema = IslSchema::schema_v_1_0("my_schema", vec![], vec![struct_type], vec![], vec![]);
        let mut schema_system = SchemaSystem::new(vec![]);
        let schema = schema_system.load_schema_from_isl_schema_v1_0(isl_schema).unwrap();
        let type_definition = schema.get_type("my_type").unwrap();
        let violation = type_definition.validate(&element).unwrap_err();
        violation
            .flattened_violations()
            .into_iter()
            .cloned()
            // Uncomment to make this test pass
            // .sorted_by(|left, right| left.message().cmp(right.message()))
            .collect::<Vec<_>>()
    }).all_equal());
}

As far as I can tell this is because HashMap is used to store field constraints and its hash function is randomly seeded.

@jobarr-amzn jobarr-amzn added the bug Something isn't working label Feb 7, 2024
@jobarr-amzn
Copy link
Author

Version information for the above:

[[package]]
name = "ion-schema"
version = "0.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4b66b5bf004a664fce2936505d4fdced4e7e50f8e911c7306b9aa2d4c754eab"
dependencies = [
 "half",
 "ion-rs",
 "num-bigint 0.3.3",
 "num-traits",
 "regex 1.10.3",
 "thiserror",
]

@popematt
Copy link
Contributor

As far as I can tell this is because HashMap is used to store field constraints and its hash function is randomly seeded.

I don't think there's anything wrong with this part. Ion Schema Language is a declarative rather than procedural language, the constraints are defined in an Ion struct (which is an unordered data structure), and Ion Schema makes no guarantees about the order in which constraints are evaluated.

The problem is that the order of execution is leaking into the user-facing output, and the solution should be to change the violations output to use a set instead of a Vec. If that is too much of a breaking change, then we can recommend that users map the violations to a set before comparing them.

Long term, we could provide some functions or macros to assist with writing tests for schemas.

Finally, this is concerning to me:

// Uncomment to make this test pass
// .sorted_by(|left, right| left.message().cmp(right.message()))

The violation messages are not guaranteed to be stable, so this is a very brittle comparison. It would be much better to check the source of the violation (i.e. which constraint produced this violation; and when relevant e.g. which list element in the input data).

@desaikd
Copy link
Contributor

desaikd commented Mar 11, 2024

With #208, as a short-term solution added two macros assert_equivalent_violations and assert_non_equivalent_violations for Violation comparison.
As per the discussion in #208, long-term solution would be to remove the tree structure of Violations.

I think the best long-term solution is to rewrite Violations to get rid of the tree structure completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants