Home
Christian Clausner edited this page Jul 19, 2018 · 5 revisions
PAGE (Page Analysis and Ground Truth Elements) is a collection of XML formats, developed and maintained by the PRImA Research Lab at the University of Salford, UK.
See also:
- http://www.primaresearch.org/tools/PAGELibraries
- http://www.primaresearch.org/publications/ICPR2010_Pletschacher_PAGE
The most actively used XML formats are:
- PAGE XML for page content (regions, text lines, words, glyphs, reading order, text content, ...)
- PAGE XML for layout analysis evaluation (evaluation profiles, evaluation results, ...)
- PAGE XML for document image dewarping (dewarping grids)
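To illustrate how the page content format is structured, the following Python sketch uses only the standard library's xml.etree.ElementTree. It reads a tiny hand-written PAGE XML document, computes each TextRegion's bounding box from the points attribute of its Coords element, collects line text from TextEquiv/Unicode, and orders regions by the ReadingOrder. The sample document and the helper names (read_regions, parse_points, bounding_box) are hypothetical and were not validated against pagecontent.xsd. The namespace is read from the root element rather than hardcoded. This is a simplified sketch: it handles only a flat OrderedGroup and ignores words, glyphs, and non-text regions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PAGE content snippet (not schema-validated).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15">
  <Metadata>
    <Creator>example</Creator>
    <Created>2018-07-19T00:00:00</Created>
    <LastChange>2018-07-19T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="page.png" imageWidth="1000" imageHeight="1400">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="1" regionRef="r2"/>
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,200 100,200"/>
      <TextLine id="r1l1">
        <Coords points="110,110 890,110 890,190 110,190"/>
        <TextEquiv><Unicode>Hello PAGE</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="100,300 900,300 900,400 100,400"/>
      <TextLine id="r2l1">
        <Coords points="110,310 890,310 890,390 110,390"/>
        <TextEquiv><Unicode>Second region</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def namespace_of(root):
    # root.tag looks like "{namespace-uri}PcGts"; return the "{namespace-uri}" prefix.
    return root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""


def parse_points(points):
    # "x1,y1 x2,y2 ..." -> [(x1, y1), (x2, y2), ...]
    return [tuple(int(v) for v in p.split(",")) for p in points.split()]


def bounding_box(polygon):
    # Axis-aligned box (min_x, min_y, max_x, max_y) enclosing the polygon.
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def read_regions(xml_text):
    root = ET.fromstring(xml_text)
    ns = namespace_of(root)
    page = root.find(f"{ns}Page")
    regions = {}
    for region in page.iter(f"{ns}TextRegion"):
        coords = region.find(f"{ns}Coords")
        lines = []
        for line in region.findall(f"{ns}TextLine"):
            uni = line.find(f"{ns}TextEquiv/{ns}Unicode")
            lines.append(uni.text if uni is not None and uni.text else "")
        regions[region.get("id")] = {
            "bbox": bounding_box(parse_points(coords.get("points"))),
            "lines": lines,
        }
    # Sort the region references of a flat OrderedGroup by their index attribute.
    refs = page.findall(f"{ns}ReadingOrder/{ns}OrderedGroup/{ns}RegionRefIndexed")
    order = [r.get("regionRef") for r in sorted(refs, key=lambda r: int(r.get("index")))]
    return regions, order


regions, order = read_regions(SAMPLE)
for rid in order:
    print(rid, regions[rid]["bbox"], regions[rid]["lines"])
```

Real documents can nest OrderedGroup and UnorderedGroup elements and carry further region types, so a production reader would need to recurse through those. For real work, the PAGE libraries linked above are a better starting point than hand-rolled parsing.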
All formats are defined by an XML schema, hosted officially on primaresearch.org:
- Page content: http://www.primaresearch.org/schema/PAGE/gts/pagecontent/2018-07-15/pagecontent.xsd
- Layout analysis evaluation: http://www.primaresearch.org/schema/PAGE/eval/layout/2013-07-15/layouteval.xsd
- Dewarping: http://www.primaresearch.org/schema/PAGE/gts/dewarping/2014-08-26/dewarping.xsd
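An XML document usually links itself to its schema through its root element's namespace and the standard xsi:schemaLocation attribute. The minimal Python sketch below, using only xml.etree.ElementTree, reads both from a document and extracts the version date. The sample root element, the namespace URI "http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15", and the helper name schema_info are assumptions for illustration. Check the targetNamespace declared in the .xsd itself. Treating the last path segment of the namespace as the schema version date is also an assumption based on the URLs listed above.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the W3C XML Schema instance namespace.
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Hypothetical PAGE root element, truncated (no Metadata/Page children).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15 http://www.primaresearch.org/schema/PAGE/gts/pagecontent/2018-07-15/pagecontent.xsd">
</PcGts>"""


def schema_info(xml_text):
    root = ET.fromstring(xml_text)
    # root.tag is "{namespace}LocalName" when the element is namespaced.
    namespace = root.tag[1 : root.tag.index("}")] if root.tag.startswith("{") else ""
    local_name = root.tag.split("}")[-1]
    # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs.
    tokens = root.get(XSI + "schemaLocation", "").split()
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    # Assumption: the last path segment of a PAGE namespace is its version date.
    version = namespace.rstrip("/").rsplit("/", 1)[-1] if namespace else ""
    return {
        "root": local_name,
        "namespace": namespace,
        "version": version,
        "schema": locations.get(namespace),
    }


print(schema_info(SAMPLE))
```

This only identifies which schema a document claims to follow. It does not validate the document, which requires an XSD-capable library such as lxml (lxml.etree.XMLSchema).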
Documentation on each schema can be found in the respective folders in this GitHub repository.