Deserialization for elements and attributes with ":" in name #64

Open
spease opened this issue Feb 20, 2018 · 16 comments
Comments

@spease

spease commented Feb 20, 2018

This is a considerable problem when trying to parse DTDs. Replacing the colons with underscores makes parsing work.

@RReverser
Owner

Could you please provide more details?

@spease
Author

spease commented Feb 21, 2018

https://raw.githubusercontent.com/HUPO-PSI/mzML/master/schema/schema_1.1/mzML1.1.0.xsd

The xml: and xs: in tag and attribute names would not parse (using Serde rename to set the exact name for a Rust struct member). After renaming with _ instead of :, I was able to parse it.

@RReverser
Owner

I mean, details on how you see this working on the Rust side? Obviously : is not a valid part of a Rust identifier, so you still need to either ignore the part before : as the library currently does, or emit the full name but then force every consumer of such a name to use serde(rename). Not sure which one is better. Or do you have other suggestions?

@oli-obk
Contributor

oli-obk commented Feb 23, 2018

In the old xml parser I simply stripped such prefixes. I have no idea what they even do, and if everyone just ignores them anyway, I don't see a reason to keep them.

@RReverser
Owner

Yeah, that's what I'm doing too.
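
For readers following along, the prefix stripping described in the two comments above can be sketched roughly as follows. This is an illustration only, not this library's actual code; `local_name` is a hypothetical helper name.

```rust
/// Returns the part of an XML qualified name after the first `:`,
/// or the whole name when there is no prefix.
/// Hypothetical helper for illustration; not part of any crate's API.
fn local_name(qname: &str) -> &str {
    match qname.split_once(':') {
        Some((_prefix, local)) => local,
        None => qname,
    }
}
```

With this approach `xs:element` and `element` map to the same Rust field name, which is convenient but discards the prefix entirely.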

@RReverser
Owner

@oli-obk While you're here - could you please reply to serde-deprecated/xml#35 (comment)? I left it a while ago but still don't know if it's desirable :)

@spease
Author

spease commented Feb 23, 2018

@RReverser I'm accustomed to serde libraries using #[serde(rename)] for such cases rather than throwing out part of the identifier. That's what I've generally done with csv files for sure, but I think I've had the problem with json as well.

A bit of googling shows that part of the identifier is used for namespaces, so beyond being counterintuitive (at least to me), it seems like this would lead to potential name collisions and prevent validation (items with a wrong or nonexistent namespace could not be distinguished from those in the expected namespace).
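
A minimal sketch of the collision concern, assuming elements arrive as `(qualified name, text)` pairs; `group_by_name` is a made-up helper and does not reflect how any particular deserializer is implemented:

```rust
use std::collections::HashMap;

/// Groups element texts by name. When `strip_prefix` is true the key is the
/// local name only, mimicking a deserializer that discards namespace prefixes.
/// Hypothetical helper for illustration.
fn group_by_name<'a>(
    elements: &[(&'a str, &'a str)],
    strip_prefix: bool,
) -> HashMap<&'a str, Vec<&'a str>> {
    let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(qname, text) in elements {
        let key = if strip_prefix {
            // Keep only what follows the first ':' (if any).
            qname.split_once(':').map_or(qname, |(_, local)| local)
        } else {
            qname
        };
        groups.entry(key).or_default().push(text);
    }
    groups
}
```

Keyed by the full name, `title` and `itunes:title` stay separate; keyed by the stripped name, both land under `title` and can only be told apart by document order.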

@dtolnay may have a better overview of what libraries tend to do however

@TatriX

TatriX commented Mar 29, 2019

I've encountered a problem with WordPress XML, which has content:encoded and excerpt:encoded tags. I'm getting:

Error(Custom("duplicate field `encoded`")

See https://gist.github.com/iwek/3977831

@TatriX

TatriX commented Mar 29, 2019

I did a workaround for now:

   pub encoded: Vec<String>, // encoded[0] is `content:encoded`

Though it's not very reliable.

@apiraino

apiraino commented Oct 9, 2019

Hello,

I stumbled on this issue. Another way to reproduce it is with XML such as:

<title>This is the title</title>
<itunes:title>This is the repeated title because why not</itunes:title>

so neither suggested workaround works (using serde rename or parsing the field as an array).

Can you advise on the suggested way to cope with this without touching the source XML? Sadly, I'm already thinking of pre-processing the XML as suggested by @spease, or dropping this crate and parsing the XML directly (e.g. with quick-xml).
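
The pre-processing idea mentioned above (replacing colons with underscores before deserializing) could look something like this sketch. `underscore_prefixes` is a hypothetical name, and the function assumes the input contains no comments, CDATA sections, or processing instructions whose contents include `<`, `>`, or quote characters.

```rust
/// Replaces `:` with `_` in element and attribute names, leaving quoted
/// attribute values and text content untouched.
/// Assumption: no comments, CDATA sections, or processing instructions
/// containing `<`, `>`, or quotes (they would confuse this simple scanner).
fn underscore_prefixes(xml: &str) -> String {
    let mut out = String::with_capacity(xml.len());
    let mut in_tag = false; // true between '<' and the matching '>'
    let mut quote: Option<char> = None; // the open quote char inside a tag, if any
    for c in xml.chars() {
        match (in_tag, quote) {
            // Text content: copy as-is, watching for the start of a tag.
            (false, _) => {
                if c == '<' {
                    in_tag = true;
                }
                out.push(c);
            }
            // Inside a quoted attribute value: copy as-is until the quote closes.
            (true, Some(q)) => {
                if c == q {
                    quote = None;
                }
                out.push(c);
            }
            // Inside a tag, outside quotes: this is where names live.
            (true, None) => match c {
                '"' | '\'' => {
                    quote = Some(c);
                    out.push(c);
                }
                '>' => {
                    in_tag = false;
                    out.push(c);
                }
                ':' => out.push('_'),
                _ => out.push(c),
            },
        }
    }
    out
}
```

After such a rewrite, `<itunes:title>` becomes `<itunes_title>`, which no longer shares a local name with `<title>`. Note that namespace declarations such as `xmlns:itunes` are rewritten too, so the result is only suitable for namespace-unaware parsing.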

Thank you in advance for any hint (I just tried this library, so I may have missed something).

EDIT: on second thought, this seems to work

    #[serde(rename = "itunes:title", default)]
    title: String,

and ignore the other title field.

@punkstarman
Collaborator

punkstarman commented Oct 9, 2019

The problem is that this library has very limited support for namespaces. The deserializer will ignore the namespace. The serializer is currently incapable of generating a document using namespaces.

@apiraino, are you sure that the title field is really filled using your workaround? I think that it will always contain the default value which is an empty string.

@apiraino

apiraino commented Oct 9, 2019

Hey @punkstarman, thanks for the reply. Damn, you're right, the field is set to a default empty string (I was confused by too many fields).

Besides the lack of support for namespaces (which is a feature), the real issue I see is that the parser panics when a tag with a namespace is found. Is there a way to avoid this? I'd rather avoid pre-processing the XML.

I hope that this library's lifecycle will move forward; it's actually the only good option for working on XML files the way we're used to with serde. Thanks for working on this library!

@punkstarman
Collaborator

The parser doesn't panic when it encounters a tag with namespace. It just lops off the namespace part and produces a field with the remainder. The parser panics when it tries to fit two XML elements with the same name into a single Rust struct field that is of collection type.

@spease
Author

spease commented Oct 10, 2019

This seems like it should be an error rather than a panic. An application could recover from it.

@punkstarman
Collaborator

@spease, panic was a poor choice of words. It is in fact an error (for example see #64 (comment)).
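
To illustrate the error-versus-panic distinction, here is a rough sketch of a duplicate-field check that returns a `Result` the caller can recover from. `FieldError` and `fill_fields` are invented for this example and are not the library's types.

```rust
use std::collections::HashMap;

/// Hypothetical error type for illustration.
#[derive(Debug, PartialEq)]
enum FieldError {
    Duplicate(String),
}

/// Assigns each element to a field named after its local name (prefix
/// stripped) and reports a recoverable error on the second occurrence.
fn fill_fields<'a>(
    elements: &[(&'a str, &'a str)],
) -> Result<HashMap<&'a str, &'a str>, FieldError> {
    let mut fields = HashMap::new();
    for &(qname, text) in elements {
        let local = qname.split_once(':').map_or(qname, |(_, l)| l);
        // `insert` returns the previous value if the key was already present.
        if fields.insert(local, text).is_some() {
            return Err(FieldError::Duplicate(local.to_string()));
        }
    }
    Ok(fields)
}
```

Because the failure is an `Err` value rather than a panic, an application can match on it and, for example, fall back to a different parsing strategy.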

@AntoniosBarotsis

AntoniosBarotsis commented Aug 30, 2022

I am trying to parse an RSS feed, and the relevant part looks like this:

<link>...</link>
<atom:link href="..." rel="self" type="application/rss+xml"/>

I want to get the value of link.

The thread doesn't seem to have a concrete solution for this, but I'm posting anyway in case someone came up with one and just didn't reply.

I tried setting link to a String vector but that doesn't work for me.

Is there perhaps a different workaround for this since the atom:link element does not even have a value unlike link?

Any news on this?
