Support for all characters #6

PierreVDL · 2016-12-06T12:33:47Z

According to the XML standard (e.g. http://www.w3.org/TR/xml/#sec-references),
one can refer to any character (including non-printable ones) using &#\d+; (decimal) or &#x\h+; (hexadecimal).
However, these characters seem to be ignored by hexpat.

For example, the HUnit test case

import Test.HUnit
import Text.XML.Expat.Tree
import Data.ByteString.Lazy (pack)
import Data.ByteString.Internal (c2w)

testSingleEscapedTextNode :: Test
testSingleEscapedTextNode = TestCase $
    assertEqual "Single Node" xml (Element nodeName [] [Text nodeText])
  where
    nodeName = "singleNode"
    nodeText = "a text with escaped characters &amp; &lt; &gt; &#x12; &#x07; &#x1B; is not correctly handled"
    (xml, _mErr) = parse defaultParseOptions (pack $ map c2w ("<" ++ nodeName ++ ">" ++ nodeText ++ "</" ++ nodeName ++ ">")) :: (UNode String, Maybe XMLParseError)

yields the following output:

### Failure in: 0:Library Tests:1:Text.XML.Expat.Tree:3:Single Escaped Text Node
Single Node
expected: Element "singleNode" [] [Text "a text with escaped ",Text "characters ",Text "&",Text " ",Text "<",Text " ",Text ">",Text " "]
 but got: Element "singleNode" [] [Text "a text with escaped characters &amp; &lt; &gt; &#x12; &#x07; &#x1B; is not correctly handled"]

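The parsed list is wrong: it contains no elements after the first &#x\h+; reference (&#x12;), so the rest of the text is lost.

Note: I know there are also errors in my test. It assumes a single Text element rather than a list of Text elements, and the arguments to assertEqual are swapped, so the parser's result is printed under "expected". Neither is relevant to this problem.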


PierreVDL · 2016-12-06T15:28:01Z

See also https://en.wikipedia.org/wiki/Valid_characters_in_XML.
The referenced characters are valid in XML 1.1.

hartwork · 2021-05-23T19:17:11Z

Please note that the underlying parser (Expat) implements XML 1.0 fourth edition, not XML 1.1 or XML 1.0 fifth edition. Expat ticket libexpat/libexpat#171 may be of interest.
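
To make the difference between the two versions concrete, here is a minimal sketch of the Char productions of XML 1.0 and XML 1.1, applied to the three code points from the test case. The function names isXml10Char and isXml11Char and the list referenced are made up for this illustration and are not part of hexpat or Expat. In XML 1.0 a character reference must refer to a character matching Char, so a reference such as &#x12; is a well-formedness error there, which would explain why parsing stops at that point.

```haskell
import Data.Char (ord)

-- Hypothetical helper (not part of hexpat): Char production of XML 1.0,
--   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
isXml10Char :: Char -> Bool
isXml10Char c =
    n == 0x9 || n == 0xA || n == 0xD
        || (n >= 0x20 && n <= 0xD7FF)
        || (n >= 0xE000 && n <= 0xFFFD)
        || (n >= 0x10000 && n <= 0x10FFFF)
  where
    n = ord c

-- Hypothetical helper (not part of hexpat): Char production of XML 1.1,
--   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
isXml11Char :: Char -> Bool
isXml11Char c =
    (n >= 0x1 && n <= 0xD7FF)
        || (n >= 0xE000 && n <= 0xFFFD)
        || (n >= 0x10000 && n <= 0x10FFFF)
  where
    n = ord c

-- The code points referenced in the test case: &#x12; &#x07; &#x1B;
referenced :: [Char]
referenced = ['\x12', '\x07', '\x1B']

-- True when none of the referenced code points is a legal XML 1.0 Char.
noneLegalInXml10 :: Bool
noneLegalInXml10 = not (any isXml10Char referenced)

-- True when all of the referenced code points are legal XML 1.1 Chars.
allLegalInXml11 :: Bool
allLegalInXml11 = all isXml11Char referenced
```

Under these productions both noneLegalInXml10 and allLegalInXml11 evaluate to True: the three code points are legal in XML 1.1 only, and an XML 1.0 parser is expected to reject references to them.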
