2.0: Replace title= props with data-ocr-* attributes #77

Open
kba opened this issue Oct 22, 2016 · 2 comments

kba (Owner) commented Oct 22, 2016

Reusing the title= attribute of HTML elements for OCR-specific values is bad practice. It's understandable, since at the time of hOCR's initial development there were few mechanisms to extend HTML, but HTML5 offers quite a few.

In a (possible) next major revision of the standard, we could use data-ocr-* attributes for that purpose.

<span id="line1" class="ocr_line" title="bbox 0 0 100 100">...</span>

could be expressed as

<span id="line1" data-ocr-tag="line" data-ocr-bbox="[0,0,100,100]"> ... </span>

This is more verbose, but it would make it much easier to specify behavior and work with the content. For example, in JavaScript you could do:

var line = document.querySelector("#line1");
// data-ocr-bbox holds a JSON array: [x0, y0, x1, y1]
var bbox = JSON.parse(line.dataset.ocrBbox);
var width = bbox[2] - bbox[0];
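
To illustrate how existing title= values might be carried over to the proposed attributes, here is a minimal sketch. The function name `titleToDataAttributes` and the choice to JSON-encode numeric values are assumptions made for this example, not part of the hOCR spec or of this proposal. The parser is deliberately naive: it splits on semicolons and whitespace, so it would mishandle quoted values that contain either.

```javascript
// Hypothetical helper (not part of any spec): converts an hOCR 1.x
// title= string such as "bbox 0 0 100 100; x_wconf 93" into a map of
// data-ocr-* attribute names to string values.
function titleToDataAttributes(title) {
  const attrs = {};
  for (const prop of title.split(";")) {
    const tokens = prop.trim().split(/\s+/);
    const name = tokens[0];
    if (!name) continue; // skip empty segments, e.g. after a trailing ";"
    const values = tokens.slice(1);
    const allNumeric =
      values.length > 0 && values.every((v) => Number.isFinite(Number(v)));
    let encoded;
    if (!allNumeric) {
      encoded = values.join(" "); // keep non-numeric values verbatim
    } else if (values.length === 1) {
      encoded = JSON.stringify(Number(values[0])); // single number
    } else {
      encoded = JSON.stringify(values.map(Number)); // JSON array, as in data-ocr-bbox
    }
    attrs["data-ocr-" + name] = encoded;
  }
  return attrs;
}
```

Under these assumptions, the title from the example above would yield a `data-ocr-bbox` value of `[0,0,100,100]`, which `JSON.parse` can read back directly.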
kba added this to the Version 2.0 milestone Oct 22, 2016
zuphilip (Collaborator) commented

I think the data-ocr-* attributes would be a good way to continue. But is there any reason to change the class as well? This is standard HTML and has very good support, e.g. document.getElementsByClassName("ocr_line").

kba (Owner, Author) commented Oct 22, 2016

It would make it easier to map between formats (ALTO) and serializations if the OCR application profile of the HTML were uniform, i.e. if you didn't force a naming convention on class, id or title.
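
As a rough illustration of such a mapping, the sketch below converts a JSON-encoded bbox (x0, y0, x1, y1, as in the data-ocr-bbox example earlier in this thread) to the HPOS, VPOS, WIDTH and HEIGHT attributes that ALTO uses for positioned elements, and back. The function names are hypothetical, and the sketch assumes both sides use the same unit (pixels) and origin, which a real conversion would have to check.

```javascript
// Hypothetical helpers, assuming pixel units and a shared top-left origin.

// "[x0,y0,x1,y1]" (JSON string) -> ALTO-style position attributes.
function bboxToAlto(bboxJson) {
  const [x0, y0, x1, y1] = JSON.parse(bboxJson);
  return { HPOS: x0, VPOS: y0, WIDTH: x1 - x0, HEIGHT: y1 - y0 };
}

// ALTO-style position attributes -> "[x0,y0,x1,y1]" (JSON string).
function altoToBbox({ HPOS, VPOS, WIDTH, HEIGHT }) {
  return JSON.stringify([HPOS, VPOS, HPOS + WIDTH, VPOS + HEIGHT]);
}
```

Because the bbox is plain JSON rather than a space-separated string inside title=, neither direction needs a custom parser.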
