Parser output for elements with property class and root class names? #51

Open
jgarber623 opened this issue May 12, 2020 · 3 comments
Comments

@jgarber623
Member

Following up on a conversation I started in chat today, I'd like to clarify a section in the parsing spec related to generating output for parsed elements containing both property class and root class names.

The wording from section 1.2 of the parsing spec (emphasis added):

  • parse a child element for microformats (recurse)
    • if that child element itself has a microformat ("h-*" or backcompat roots) and is a property element, add it into the array of values for that property as a { } structure, add to that { } structure:
      • value:
        • if it's a p-* property element, use the first p-name of the h-* child
        • else if it's an e-* property element, re-use its { } structure with existing value: inside.
        • else if it's a u-* property element and the h-* child has a u-url, use the first such u-url
        • else use the parsed property value per p-*, u-*, dt-* parsing respectively

The test suite includes test cases for p-* and u-* properties (see microformats-v2/h-entry/impliedvalue-nested.html, for instance), but I couldn't find a test case for an e-* property whose element also has a root class name.

I interpret "re-use its { } structure with existing value: inside" to mean that the nested item's value should be set to the hash structure. That would result in something like:

"value": {
  "html": "",
  "value": ""
}

Current Behavior

Using a contrived markup example like:

<div class="h-entry">
  <div class="e-content h-card">
    <p class="p-name">Jason Garber</p>
  </div>
</div>

…parsers currently output results like:

{
  "items": [
    {
      "type": ["h-entry"],
      "properties": {
        "content": [
          {
            "type": ["h-card"],
            "properties": {
              "name": ["Jason Garber"]
            },
            "html": "<p class=\"p-name\">Jason Garber</p>",
            "value": "Jason Garber"
          }
        ]
      }
    }
  ]
}

Expected Behavior

Using the same markup example, and by my interpretation of the specification, I'd expect output like:

{
  "items": [
    {
      "type": ["h-entry"],
      "properties": {
        "content": [
          {
            "type": ["h-card"],
            "properties": {
              "name": ["Jason Garber"]
            },
            "value": {
              "html": "<p class=\"p-name\">Jason Garber</p>",
              "value": "Jason Garber"
            }
          }
        ]
      }
    }
  ]
}
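
The difference between the two readings can be made concrete with a short Python sketch. The function name `attach_nested` and the `nest_e_value` flag are hypothetical, introduced only for illustration. This is not code from any existing parser, and the fallback used when a p-* child has no name is an assumption.

```python
from typing import Any


def attach_nested(prefix: str, child: dict[str, Any], parsed: Any,
                  nest_e_value: bool = False) -> dict[str, Any]:
    """Return a copy of the nested item with its `value` filled in.

    prefix: property prefix of the element ("p", "e", "u" or "dt").
    child:  the parsed nested microformat ({"type": ..., "properties": ...}).
    parsed: the element's own parsed property value.
    """
    item = dict(child)
    props = child.get("properties", {})
    if prefix == "p" and props.get("name"):
        item["value"] = props["name"][0]
    elif prefix == "e":
        if nest_e_value:
            # "Expected Behavior" reading: value holds the whole e-* structure.
            item["value"] = dict(parsed)
        else:
            # "Current Behavior" reading: html and value sit next to type.
            item.update(parsed)
    elif prefix == "u" and props.get("url"):
        item["value"] = props["url"][0]
    else:
        # Assumed fallback: use the element's own parsed property value.
        item["value"] = parsed
    return item


h_card = {"type": ["h-card"], "properties": {"name": ["Jason Garber"]}}
e_parsed = {"html": '<p class="p-name">Jason Garber</p>', "value": "Jason Garber"}

flat = attach_nested("e", h_card, e_parsed)
nested = attach_nested("e", h_card, e_parsed, nest_e_value=True)
```

With the contrived markup above, `flat` carries `html` and a string `value` as siblings of `type`, while `nested` carries a single `value` key holding both.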

Proposals?

Which of the above is a correct interpretation of the spec? Existing evidence from parsers and the non-authoritative microformats2-json wiki page points to the current behavior being the correct interpretation, despite the unclear wording in the spec.

Is that the consensus of the community? If so, we should find a way to re-word the spec. If not, we should find a way to re-word the spec.

Thanks for reading! Looking forward to feedback.

@gRegorLove
Member

I think the current behavior listed above results in a more consistent result for consumers, with html and value appearing in a consistent location and value always being a string.
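
To illustrate the consumer side of this point, here is a minimal Python sketch. The helper name `plain_text` is hypothetical and not part of any parser library. It assumes a property entry is either a plain string or a { } structure with a `value` key.

```python
from typing import Any


def plain_text(entry: Any) -> str:
    """Return the plain-text value of one parsed property entry."""
    if isinstance(entry, str):
        return entry
    value = entry.get("value", "")
    # Only needed when `value` may itself be a { } structure:
    # keep descending until a string is reached.
    while isinstance(value, dict):
        value = value.get("value", "")
    return value


current = {"type": ["h-card"], "html": "<p>Jason Garber</p>", "value": "Jason Garber"}
proposed = {"type": ["h-card"],
            "value": {"html": "<p>Jason Garber</p>", "value": "Jason Garber"}}
```

If `value` is always a string, the `while` loop never runs and a consumer can read `entry["value"]` directly. If `value` may be a structure, every consumer needs a check of this kind.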

@aimee-gm
Member

aimee-gm commented May 13, 2020

So, it turns out that because this isn't included in the test suite, I managed to skip that line in the specification.

I don't want to get too involved in what the values should be (I would like to know though!), but a couple of comments:

  • The root element will now have an html property. This is described nowhere in the specification, so it cannot be expected to be there.
  • I noticed there's a similarity here with images with an alt. I believe there is more ambiguity here too.

Take the markup:

<div class="h-entry">
  <img class="u-photo h-card" alt="My name" src="/photo.jpg">
</div>

Looking at the specification:

else if it's a u-* property element and the h-* child has a u-url, use the first such u-url

The photo above doesn't have a url property, so it falls back to the photo property from:

else use the parsed property value per p-*, u-*, dt-* parsing respectively

As it has no nested u-photo, it becomes an implied photo, whose value comes from:

if img.h-x[src], then use the result of "parse an img element for src and alt" (see Sec.1.5) for photo

Which means it should be: { value: "...", alt: "..." }. This then becomes the complete value of the h-card based on the above specification.
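
The "parse an img element for src and alt" step referenced here can be sketched in Python as follows. The function name `parse_img` and its arguments are hypothetical. The sketch assumes the element's attributes are already available as a dict and that the page's base URL is known.

```python
from typing import Union
from urllib.parse import urljoin


def parse_img(attrs: dict[str, str], base_url: str) -> Union[str, dict[str, str]]:
    """Return the parsed value of an img element.

    Yields a { } structure when an alt attribute is present,
    otherwise just the absolute URL as a string.
    """
    src = urljoin(base_url, attrs["src"])
    if "alt" in attrs:
        return {"value": src, "alt": attrs["alt"]}
    return src


photo = parse_img({"src": "/photo.jpg", "alt": "My name"}, "http://example.com/")
```

For the markup above, and assuming a base URL of `http://example.com/`, `photo` is a structure with `value` and `alt` keys rather than a string, which is what makes the nested item's `value` non-string under this reading.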

Expected output

{
  "type": ["h-entry"],
  "properties": {
    "photo": [
      {
        "type": ["h-card"],
        "properties": {
          "name": ["My name"],
          "photo": [
            { "alt": "My name", "value": "http://example.com/photo.jpg" }
          ]
        },
        "value": {
          "alt": "My name",
          "value": "http://example.com/photo.jpg"
        }
      }
    ]
  }
}

The PHP parser at microformats.io doesn't parse the alt at any level here, I believe incorrectly, so I've omitted its output.

Again, the contents of value would no longer be a string. How should these be handled?

The way I've decided to interpret this is to take the value out of the nested property.

@gRegorLove
Member

The root element will now have an html property. This is described nowhere in the specification, so it cannot be expected to be there.

I'm not sure I understand this part. What do you mean by root element? I would expect the parsed content property to have an html property in both cases.

In the common e-content example:

<div class="h-entry">
<div class="e-content"><p>This is the content</p></div>
</div>

The parsed result is:

"items": [
    {
        "type": [
            "h-entry"
        ],
        "properties": {
            "content": [
                {
                    "html": "<p>This is the content</p>",
                    "value": "This is the content"
                }
            ]
        }
    }
]

Adding a nested h-card:

<div class="h-entry">
<div class="e-content h-card"><p>This is the content</p></div>
</div>

I would expect the parse to be:

"items": [
    {
        "type": [
            "h-entry"
        ],
        "properties": {
            "content": [
                {
                    "type": [
                        "h-card"
                    ],
                    "properties": {
                        "name": [
                            "This is the content"
                        ]
                    },
                    "html": "<p>This is the content</p>",
                    "value": "This is the content"
                }
            ]
        }
    }
]

Images are a special case where if there's an alt, the parsed result will be an object, otherwise a string. (alt parsing is in php-mf2 master branch and hopefully will be in a new release soon.)
