Deserialization for elements and attributes with ":" in name #64

spease · 2018-02-20T05:23:12Z

This is a considerable problem when trying to parse DTDs. Replacing the colons with underscores makes the file parse.

RReverser · 2018-02-20T22:17:29Z

Could you please provide more details?

spease · 2018-02-21T02:17:32Z

https://raw.githubusercontent.com/HUPO-PSI/mzML/master/schema/schema_1.1/mzML1.1.0.xsd

Tag and attribute names with `xml:` and `xs:` prefixes would not parse, even when using `#[serde(rename)]` to set the exact name on a Rust struct field. After replacing `:` with `_`, I was able to parse it.
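A minimal sketch of the colon-to-underscore pre-processing described above, assuming well-formed input with no CDATA sections or comments that contain `<`. `underscore_prefixes` is a hypothetical helper, not part of this library, and it only rewrites element names; attribute names such as `xml:lang` would need a similar pass.

```rust
/// Hypothetical pre-processing pass: rewrites ':' to '_' inside element
/// names, e.g. `<xs:element>` -> `<xs_element>` and `</xs:element>` ->
/// `</xs_element>`. Attributes and text content are copied unchanged.
pub fn underscore_prefixes(xml: &str) -> String {
    let mut out = String::with_capacity(xml.len());
    let mut chars = xml.chars().peekable();
    while let Some(c) = chars.next() {
        out.push(c);
        if c != '<' {
            continue;
        }
        // Keep the '/' of a closing tag.
        if chars.peek() == Some(&'/') {
            out.push(chars.next().unwrap());
        }
        // Copy the element name, replacing ':' with '_'.
        while let Some(&n) = chars.peek() {
            if n.is_whitespace() || n == '>' || n == '/' {
                break;
            }
            chars.next();
            out.push(if n == ':' { '_' } else { n });
        }
    }
    out
}
```

The struct field would then use `#[serde(rename = "xs_element")]` to match the rewritten name.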

RReverser · 2018-02-23T10:09:39Z

I mean, details on how you see this working on the Rust side. Obviously `:` is not a valid part of a Rust identifier, so you still need to either ignore the part before `:`, as the library currently does, or emit the full name but then force every consumer of such a name to use `serde(rename)`. Not sure which one is better. Or do you have other suggestions?
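The two options can be sketched as plain functions, assuming names contain at most one `:` (as in namespace-well-formed XML). Both functions are hypothetical illustrations, not the library's code, and the identifier check is simplified (it ignores keywords and non-ASCII identifiers).

```rust
/// Option 1: strip the prefix, so `xs:element` is matched as `element`.
pub fn local_name(qname: &str) -> &str {
    qname.split_once(':').map_or(qname, |(_prefix, local)| local)
}

/// Option 2: keep the full name. A struct field can only match it without
/// `#[serde(rename)]` if the name is already a valid Rust identifier
/// (simplified: ASCII letters, digits and '_', not starting with a digit).
pub fn needs_rename(qname: &str) -> bool {
    let mut chars = qname.chars();
    let valid_start = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    !(valid_start && chars.all(|c| c.is_ascii_alphanumeric() || c == '_'))
}
```

With option 1 no rename is needed but the prefix is lost; with option 2 every prefixed name forces a `serde(rename)` on the consumer side.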

oli-obk · 2018-02-23T10:16:51Z

In the old xml parser I simply stripped such prefixes. I have no idea what they even do, and if everyone just ignores them anyway, I don't see a reason to keep them.

RReverser · 2018-02-23T10:30:44Z

Yeah, that's what I'm doing too.

RReverser · 2018-02-23T10:32:49Z

@oli-obk While you're here - could you please reply to serde-deprecated/xml#35 (comment)? I left it a while ago but still don't know if it's desirable :)

spease · 2018-02-23T18:11:07Z

@RReverser I'm accustomed to serde libraries using #[serde(rename)] for such cases rather than throwing out part of the identifier. That's what I've generally done with CSV files for sure, but I think I've had the problem with JSON as well.

A bit of googling shows that the prefix part of the identifier is used for namespaces, so beyond being counterintuitive (at least to me), stripping it seems likely to lead to name collisions and to prevent validation (items with a wrong or nonexistent namespace could not be distinguished from those in the expected namespace).
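To illustrate why the prefix matters for validation: in XML Namespaces the prefix is only a local alias, and a name is identified by the (namespace URI, local name) pair it resolves to through `xmlns:prefix="uri"` declarations. The sketch below is hypothetical; it assumes the in-scope declarations are already collected into a map and, for brevity, ignores the default namespace and the predeclared `xml` prefix.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: resolve a qualified name to (namespace URI, local name).
/// An undeclared prefix is reported instead of being silently dropped.
pub fn expand<'a>(
    qname: &'a str,
    prefixes: &HashMap<&str, &'a str>,
) -> Result<(Option<&'a str>, &'a str), String> {
    match qname.split_once(':') {
        // Unprefixed name; default namespace handling is omitted here.
        None => Ok((None, qname)),
        Some((prefix, local)) => prefixes
            .get(prefix)
            .map(|uri| (Some(*uri), local))
            .ok_or_else(|| format!("undeclared namespace prefix `{}`", prefix)),
    }
}
```

Stripping the prefix instead turns both `xs:element` and an element with an undeclared `foo:` prefix into plain `element`, which is the loss of information described above.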

@dtolnay may have a better overview of what libraries tend to do, however.

TatriX · 2019-03-29T13:10:44Z

I've encountered a problem with WordPress XML, which has `content:encoded` and `excerpt:encoded` tags. I'm getting:

    Error(Custom("duplicate field `encoded`")

See https://gist.github.com/iwek/3977831
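A small diagnostic sketch for spotting such clashes ahead of time: it groups qualified element names by the local name left after stripping the prefix and reports groups with more than one member. `colliding_names` is a hypothetical helper and assumes the element names have already been collected from the document.

```rust
use std::collections::BTreeMap;

/// Hypothetical diagnostic: returns (local name, qualified names) for every
/// local name that more than one distinct qualified name maps to.
pub fn colliding_names<'a>(names: &[&'a str]) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &qname in names {
        let local = qname.split_once(':').map_or(qname, |(_, l)| l);
        let group = groups.entry(local).or_default();
        if !group.contains(&qname) {
            group.push(qname);
        }
    }
    groups.into_iter().filter(|(_, g)| g.len() > 1).collect()
}
```

For the WordPress case, `content:encoded` and `excerpt:encoded` end up in the same `encoded` group, which matches the duplicate field error above.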

TatriX · 2019-03-29T13:13:47Z

I did a workaround for now:

   pub encoded: Vec<String>, // encoded[0] is `content:encoded`

Though it's not very reliable, since it depends on the order in which the two elements appear in the document.

apiraino · 2019-10-09T16:46:23Z

Hello,

I stumbled on this issue. Another way to reproduce it is with XML such as:

    <title>This is the title</title>
    <itunes:title>This is the repeated title because why not</itunes:title>

so none of the suggested workarounds work (using `serde(rename)` or parsing the field as an array).

Can you advise on the recommended way to cope with this without touching the source XML? Sadly, I'm already thinking of pre-processing the XML as suggested by @spease, or of dropping this crate and parsing the XML directly (e.g. with quick-xml).

Thank you in advance for any hint (I just tried this library, so I may have missed something).

EDIT: on second thought, this seems to work

    #[serde(rename = "itunes:title", default)]
    title: String,

and ignore the other title field.

punkstarman · 2019-10-09T19:33:48Z

The problem is that this library has very limited support for namespaces. The deserializer will ignore the namespace. The serializer is currently incapable of generating a document using namespaces.

@apiraino, are you sure that the title field is really filled using your workaround? I think that it will always contain the default value which is an empty string.
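A hypothetical model of why the renamed field stays at its default: if, as described above, every element is offered to the struct under its prefix-stripped name, the key `itunes:title` never appears, so a field renamed to it is never filled. This only models key matching; it is not the library's code.

```rust
/// Hypothetical model: the keys a struct would be offered, one per element,
/// after the namespace prefix is stripped.
pub fn offered_keys<'a>(qnames: &[&'a str]) -> Vec<&'a str> {
    qnames
        .iter()
        .map(|&q| q.split_once(':').map_or(q, |(_prefix, local)| local))
        .collect()
}

/// Whether a field renamed to `renamed_to` would ever see a matching key.
pub fn field_is_filled(qnames: &[&str], renamed_to: &str) -> bool {
    offered_keys(qnames).contains(&renamed_to)
}
```

For `<title>` followed by `<itunes:title>`, the offered keys are `title` and `title`, so a field with `rename = "itunes:title"` falls back to `default`.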

apiraino · 2019-10-09T21:50:55Z

Hey @punkstarman, thanks for the reply. Damn, you're right: the field is set to the default empty string (I was confused by too many fields).

Besides the lack of namespace support (which would be a new feature), the real issue I see is that the parser panics when it finds a tag with a namespace. Is there a way to avoid this? I'd rather not pre-process the XML.

I hope this library's development will move forward; it's actually the only good option for working with XML files the way we're used to with serde. Thanks for working on this library!

punkstarman · 2019-10-10T07:58:42Z

The parser doesn't panic when it encounters a tag with namespace. It just lops off the namespace part and produces a field with the remainder. The parser panics when it tries to fit two XML elements with the same name into a single Rust struct field that is of collection type.

spease · 2019-10-10T16:09:57Z

This seems like it should be an error rather than a panic. An application could recover from it.

punkstarman · 2019-10-11T08:24:43Z

@spease, panic was a poor choice of words. It is in fact an error (for example, see #64 (comment)).
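The error-rather-than-panic behaviour can be modelled in a few lines: filling one single-valued field from prefix-stripped names returns a `Result`, so a second matching element becomes an error value the application can handle. `fill_single` is a hypothetical sketch whose message mirrors the one reported earlier in this thread; it is not the library's implementation.

```rust
/// Hypothetical model of filling one single-valued field: elements are
/// matched by prefix-stripped name, and a second match yields an `Err`
/// instead of aborting the program.
pub fn fill_single(elements: &[(&str, &str)], key: &str) -> Result<Option<String>, String> {
    let mut value: Option<String> = None;
    for &(qname, text) in elements {
        let local = qname.split_once(':').map_or(qname, |(_, l)| l);
        if local != key {
            continue;
        }
        if value.is_some() {
            return Err(format!("duplicate field `{}`", key));
        }
        value = Some(text.to_string());
    }
    Ok(value)
}
```

A caller can match on the `Err` and fall back to pre-processing or a different struct layout.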

AntoniosBarotsis · 2022-08-30T16:18:31Z

I am trying to parse an RSS feed, and the relevant part looks like this:

    <link>...</link>
    <atom:link href="..." rel="self" type="application/rss+xml"/>

I want to get the value of `link`.

The thread doesn't seem to have a concrete solution for this, but I'm posting anyway in case someone came up with one and just didn't reply.

I tried making `link` a `Vec<String>`, but that doesn't work for me.

Is there perhaps a different workaround, given that the `atom:link` element, unlike `link`, does not even have a text value?
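One possible workaround, in the spirit of the pre-processing suggested earlier in the thread, is to drop the empty `atom:link` elements before deserializing so that only `<link>` is left for the `link` field. This is a naive, hypothetical sketch: it assumes the removed elements are always self-closing and that no attribute value contains `>`; a real XML reader would be more robust.

```rust
/// Hypothetical helper: removes every self-closing element whose tag is
/// exactly `qname` (e.g. `<atom:link .../>`) and leaves everything else.
pub fn drop_empty_elements(xml: &str, qname: &str) -> String {
    let open = format!("<{}", qname);
    let mut out = String::with_capacity(xml.len());
    let mut rest = xml;
    while let Some(start) = rest.find(&open) {
        let after = &rest[start + open.len()..];
        // Match the exact tag name only, not e.g. `<atom:linkx`.
        let exact = after.starts_with(|c: char| c.is_whitespace() || c == '/');
        match after.find('>') {
            // Self-closing tag: skip it entirely.
            Some(end) if exact && after[..end].ends_with('/') => {
                out.push_str(&rest[..start]);
                rest = &after[end + 1..];
            }
            // Anything else is copied through unchanged.
            _ => {
                out.push_str(&rest[..start + open.len()]);
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```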

Any news on this?
