Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round-trip failure: untagged enum containing Vec variant is improperly deserialized #357

Closed
Tracked by #397
obi1kenobi opened this issue Jan 4, 2022 · 13 comments · Fixed by #477
Closed
Tracked by #397
Labels

Comments

@obi1kenobi
Copy link

Consider the following types:

#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(untagged)]
enum MyValue {
    Int(i64),
    String(String),
    Enum(Enum),
    List(Vec<MyValue>),
}

#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
enum Enum {
    First(String),
    Second(i64),
}

Let's try serializing a MyValue::Enum value like the following: MyValue::Enum(Enum::First("foo".to_string())). Its RON-serialized value is First("foo").

That First("foo") value unfortunately deserializes to MyValue::List([MyValue::String("foo")]) instead of our original enum-based value. This is surprising to me, because the deserialization code decides to ignore the First prefix entirely, and skip forward until the ( then deserialize a list with it. It's especially surprising because the MyValue::Enum variant is before the MyValue::List variant, and yet we still get a MyValue::List back instead of MyValue::Enum.

Here's a complete repro:

#[cfg(test)]
mod tests {
    use serde::{Serialize, Deserialize};

    #[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
    #[serde(untagged)]
    enum MyValue {
        Int(i64),
        String(String),
        Enum(Enum),
        List(Vec<MyValue>),
    }

    #[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
    enum Enum {
        First(String),
        Second(i64),
    }

    fn serialize_then_deserialize(value: &MyValue) -> MyValue {
        ron::from_str(ron::to_string(value).unwrap().as_str()).unwrap()
    }

    #[test]
    fn test() {
        let value = MyValue::Enum(Enum::First("foo".to_string()));
        let deserialized = serialize_then_deserialize(&value);
        assert_eq!(value, deserialized, "Serialized as: {}", ron::to_string(&value).unwrap());
    }
}

This fails with:

running 1 test
thread 'file::tests::test' panicked at 'assertion failed: `(left == right)`
  left: `Enum(First("foo"))`,
 right: `List([String("foo")])`: Serialized as: First("foo")', src/file.rs:X:9

Thanks to everyone that contributes to building and maintaining this awesome library! Any advice on avoiding this issue would be much appreciated.

@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

I'll take a look now

@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

Ok, I think I can at least explain what's happening here. #[serde(untagged)] first deserializes into a private Content value type to buffer what has been found. Then all variants attempt to deserialize from that Content type in order, and the first one to succeed returns. The issue is that RON is more expressive than serde's data model, so when the Content type is created with deserialize_any, the information that First("foo") looks like a tuple struct with name First is lost and all that remains is the information that it looks like a tuple, i.e. sequence ("foo",). @torkleyy would this issue be fixed in your future reboot? I'm not sure if this can be fixed in RON as the encoding through serde isn't lossless (unlike for JSON), but I'll see if I can find something.

@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

#253 might be related, I'll check if it would fix the issue

@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

With #253 the test case would pass, but:

1. Unit variants now deserialize to Value::String instead of Value::Unit:

// No longer works
assert_eq!("Foo".parse(), Ok(Value::Unit));
// This is now the new result
assert_eq!("Foo".parse(), Ok(Value::String(String::from("Foo"))));

2. A potential bug with earlier untagged enum variants is fixed as the extraneous If tag is no longer accepted in:

BuildSystem(
    version: "1.0.0",
    flags: [
        "--enable-thing",
        "--enable-other-thing",
        If("some-conditional", ["--enable-third-thing"]),
    ]
)

3. Anything that looks named-struct-like now deserializes to a singular Value::Map with one key, the struct name, and one value, the inner content:

assert_eq!("Room(width: 20)".parse(), Ok(
    Value::Map(
        vec![(
            Value::String("Room".to_owned()),
            Value::Map(
                vec![(
                    Value::String("width".to_owned()),
                    Value::Number(Number::new(20)),
                )].into_iter().collect(),
            ),
        )].into_iter().collect(),
    )
);

4. Newtype structs no longer deserialize if going through Value. If the struct name is not mentioned, e.g. (a: 42), this can be fixed with a small change in #253. However, if the struct name is mentioned, e.g. Newtype(a: 42), the pass through Value no longer works as the extra map layer is not expected by the newtype struct deserialization.

@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

@torkleyy @kvark What do you think about this tradeoff? I could try to reopen and refactor #253.

juntyr added a commit to juntyr/ron that referenced this issue Jan 5, 2022
juntyr added a commit to juntyr/ron that referenced this issue Jan 5, 2022
@juntyr
Copy link
Member

juntyr commented Jan 5, 2022

Maybe at least part of #4 could be solved by adding some extra functionality to Value when doing non-self-describing deserialization.

@juntyr
Copy link
Member

juntyr commented Jan 6, 2022

Ok, I played around with it a bit more to get a feel of what a rough implementation could look like. It's essentially #253 but with (a) a fix that makes unnamed struct-likes work again (though this fix requires a lot more though to ensure it actually works in all edge cases) and (b) special handling in the Value Deserializer impl to get all types of structs and enums to properly roundtrip through Value. Some of that is fixing issues that #253 introduces to get untagged enums to work, but I think getting enums to roundtrip through Value might be new? Most of the new Value code is adapted from serde internals for now and could be cleaned up a lot.

@juntyr
Copy link
Member

juntyr commented Jan 6, 2022

You can have a look here master...MomoLangenstein:253-untagged-enums to see roughly what would be changed.

@juntyr
Copy link
Member

juntyr commented Jan 6, 2022

I guess my summary is that

  • it is possible
  • only Value and deserialize_any are affected at all
  • the semantics of Value change a bit as anything struct-like including enums are now encoded in the JSON way serde forces on us if we want to use untagged enums (i.e. they are wrapped in a 1-element map that goes from name to content)
  • it requires a lot of extra code in the Value Deserializer but it should only add functionality

@NiklasEi
Copy link

NiklasEi commented Aug 8, 2022

@MomoLangenstein has there been any further development on this issue?

@juntyr
Copy link
Member

juntyr commented Aug 8, 2022

@NiklasEi I haven't done any more work on this since. If there is interest, I could definitely have another look and see if I could resolve some of the issues I came across during the initial attempt. However, this method seems to require serde's unfortunate enum workaround in any case ... which is unfortunate given that ron is more expressive than json. So if we'd want this to work, it might require a tradeoff for deserialize_any and Value. @torkleyy @kvark What would be your thoughts on this? Would it be worth looking into further?

@juntyr
Copy link
Member

juntyr commented Aug 18, 2023

Thank you @obi1kenobi for providing a very good test case - I just realised that #451 fixed the issue and have added a test to ensure it remains working.

@juntyr juntyr removed the serde-bug label Aug 18, 2023
@obi1kenobi
Copy link
Author

My pleasure, thanks for the fix! 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants