Reusing the title= attribute of HTML elements for OCR-specific values is bad practice. It's understandable since at the time of hOCR's initial development, there were few mechanisms to extend HTML, but in HTML5, there are quite a few.
In a (possible) next major revision of the standard, we could use data-ocr-* attributes for that purpose.
I think the data-ocr-* attributes would be a good way forward. But is there any reason to change the class as well? The class attribute is standard HTML and is very well supported, e.g. through document.getElementsByClassName("ocr_line").
It would be easier to map between formats (ALTO) and serializations if the OCR application profile of the HTML were uniform, i.e. if you didn't force a naming convention on class, id or title.
For example, a value that hOCR currently stores in title= could instead be expressed as a data-ocr-* attribute. This is more verbose, but it would make it much easier to specify behavior and work with the content, e.g. from JavaScript.
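The original examples did not survive in this copy of the thread, so here is a minimal sketch of the idea rather than the author's own code. It assumes a hypothetical naming scheme in which each hOCR property `<name>` in `title=` becomes an attribute `data-ocr-<name>`; no such mapping is defined by the hOCR spec. The helper names `parseTitle`, `toDataAttributes` and `parseBbox` are invented for illustration.

```javascript
// Parse an hOCR-style title value such as "bbox 0 0 100 100; x_wconf 95"
// into a map of property name -> raw value string.
// Naive on purpose: it splits on ";" and does not handle quoted values
// that themselves contain semicolons.
function parseTitle(title) {
  const props = {};
  for (const part of title.split(";")) {
    const trimmed = part.trim();
    if (trimmed === "") continue; // skip empty segments (e.g. trailing ";")
    const space = trimmed.indexOf(" ");
    const name = space === -1 ? trimmed : trimmed.slice(0, space);
    const value = space === -1 ? "" : trimmed.slice(space + 1).trim();
    props[name] = value;
  }
  return props;
}

// Assumed (hypothetical) convention: property "bbox" -> attribute "data-ocr-bbox".
function toDataAttributes(title) {
  const attrs = {};
  for (const [name, value] of Object.entries(parseTitle(title))) {
    attrs[`data-ocr-${name}`] = value;
  }
  return attrs;
}

// Split a bbox value "x0 y0 x1 y1" into numeric coordinates.
function parseBbox(value) {
  const [x0, y0, x1, y1] = value.trim().split(/\s+/).map(Number);
  return { x0, y0, x1, y1 };
}

const attrs = toDataAttributes("bbox 0 0 100 100");
console.log(attrs);
console.log(parseBbox(attrs["data-ocr-bbox"]));
```

In a browser, an element written as `<span class="ocr_line" data-ocr-bbox="0 0 100 100">` would expose that value as `element.dataset.ocrBbox`, so `parseBbox(element.dataset.ocrBbox)` would replace any parsing of the `title` string.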