Display document page metadata #16
Comments
This would be very useful! Unfortunately, it will only work for ALTO, since PAGE-XML carries no such provenance and one has to fall back on the METS container instead. Also note that the |
For PAGE files:
<pc:Metadata>
  <pc:Creator>OCR-D/core 2.17.0</pc:Creator>
  <pc:Created>2020-10-02T09:13:28</pc:Created>
  <pc:LastChange>2020-10-02T09:13:28</pc:LastChange>
  <pc:MetadataItem type="processingStep" name="preprocessing/optimization/binarization" value="ocrd-olena-binarize">
    <pc:Labels>
      <pc:Label value="sauvola-ms-split" type="impl"/>
      <pc:Label value="0.34" type="k"/>
      <pc:Label value="0" type="win-size"/>
      <pc:Label value="0" type="dpi"/>
    </pc:Labels>
  </pc:MetadataItem>
  <pc:MetadataItem type="processingStep" name="layout/segmentation/region" value="ocrd-sbb-textline-detector">
    <pc:Labels externalModel="ocrd-tool" externalId="parameters">
      <pc:Label value="/var/lib/textline_detection" type="model"/>
    </pc:Labels>
  </pc:MetadataItem>
  <pc:MetadataItem type="processingStep" name="recognition/text-recognition" value="ocrd-calamari-recognize">
    <pc:Labels externalModel="ocrd-tool" externalId="parameters">
      <pc:Label value="/var/lib/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/*.ckpt.json" type="checkpoint"/>
      <pc:Label value="glyph" type="textequiv_level"/>
      <pc:Label value="confidence_voter_default_ctc" type="voter"/>
      <pc:Label value="0.001" type="glyph_conf_cutoff"/>
    </pc:Labels>
  </pc:MetadataItem>
</pc:Metadata> |
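To show how a report could pull this section out of a PAGE file, here is a minimal Python sketch using only the standard library. It is not taken from this project: the function names `local` and `page_metadata`, the shape of the returned dictionary, and the namespace URI in the sample are assumptions made for illustration. Tags are matched by local name so that the PAGE schema version does not matter.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def page_metadata(xml_text):
    """Collect Creator/Created/LastChange and processing steps from PAGE-XML.

    Hypothetical helper: returns None when no Metadata element exists,
    so the caller can simply skip the display.
    """
    root = ET.fromstring(xml_text)
    # Find the first Metadata element, whatever the schema version is.
    meta = next((el for el in root.iter() if local(el.tag) == "Metadata"), None)
    if meta is None:
        return None
    info = {"steps": []}
    for child in meta:
        name = local(child.tag)
        if name in ("Creator", "Created", "LastChange"):
            info[name] = (child.text or "").strip()
        elif name == "MetadataItem" and child.get("type") == "processingStep":
            # Each Label carries one parameter as a type/value pair.
            labels = {
                lab.get("type"): lab.get("value")
                for lab in child.iter()
                if local(lab.tag) == "Label"
            }
            info["steps"].append({
                "name": child.get("name"),
                "value": child.get("value"),
                "parameters": labels,
            })
    return info


# Shortened sample; the namespace URI is an assumed PAGE schema version.
SAMPLE = """<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <pc:Metadata>
    <pc:Creator>OCR-D/core 2.17.0</pc:Creator>
    <pc:Created>2020-10-02T09:13:28</pc:Created>
    <pc:LastChange>2020-10-02T09:13:28</pc:LastChange>
    <pc:MetadataItem type="processingStep" name="preprocessing/optimization/binarization" value="ocrd-olena-binarize">
      <pc:Labels>
        <pc:Label value="0.34" type="k"/>
      </pc:Labels>
    </pc:MetadataItem>
  </pc:Metadata>
</pc:PcGts>"""

if __name__ == "__main__":
    info = page_metadata(SAMPLE)
    print(info["Creator"])
    for step in info["steps"]:
        print(step["value"], step["parameters"])
```

Files without a `Metadata` element, or with one that contains no `processingStep` items, yield `None` or an empty `steps` list, so nothing would be displayed for them.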
But note that only PAGE files produced by OCR-D include this information. I am not aware of any other tool that currently populates this section of its PAGE output in this way. |
Yeah, if it's not there it will not be displayed. |
ALTO files contain metadata like this:
The report should display it.