-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
define Standard Parameters for dpi, input-level, output-level #136
base: master
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,84 @@ services](swagger). | |
|
||
To validate a `ocrd-tool.json` file, use `ocrd ocrd-tool /path/to/ocrd-tool.json validate`. | ||
|
||
## Standard parameters | ||
|
||
There is a number of parameters common to all processors that MUST be supported by processors. | ||
|
||
### `dpi` | ||
|
||
Custom DPI to assume for pixel density of images. | ||
Comment on lines
+17
to
+19
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have already proposed a better solution for DPI overrides: by static annotation in the respective PAGE-XML attributes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do. I'd still offer it as a standard parameter with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see. Makes sense (as long as we make sure these standard parameters are really automatic for tools). |
||
|
||
MUST default to 300. | ||
|
||
Comment on lines
+21
to
+22
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A default here? If you don't make this optional, then trusting the binary image meta-data density value becomes impossible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. But we need a fallback somewhere to activate if
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed. The only place I have seen in the spec for that was your recent addition to mets.md:
In practise, this is all just a contract around behaviour in core's OcrdExif (or whatever that will be named after we respect the PAGE-XML overrides). |
||
### `input-level` | ||
|
||
On what level of typography should input images be processed? | ||
|
||
Processors MAY define a `default` value. | ||
|
||
`enum` MUST be a list of one or more of: | ||
|
||
* `page` | ||
* `block` | ||
* `line` | ||
* `word` | ||
* `glyph` | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMHO we should only have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I removed the section for input-level here and added an example what behavior should be like if the expected input-level isn't provided. It's probably more productive to continue that discussion on input/output-level in #134. I've assgned it to the reviewers here, so let's settle that first and then I'll update the PR. |
||
### `output-level` | ||
|
||
On what level of typography should output images be produced? | ||
|
||
Processors MAY define a `default` value. | ||
|
||
`enum` MUST be a list of one or more of: | ||
|
||
* `page` | ||
* `block` | ||
* `line` | ||
* `word` | ||
* `glyph` | ||
|
||
Whether `input-level` and `output-level` match semantically is up to the | ||
processor. I.e. if `input-level` and `output-level` are inconsistent according | ||
to its semantics, processors MUST refuse further processing. | ||
|
||
### Sample for standard parameters | ||
|
||
Here is a snippet of an `ocrd-tool.json` for a tool that can operate on `page`, `block` or `line` level | ||
and produce output on `block`, `line` or `glyph` level, e.g. [ocrd-cis-ocropy-segment](https://github.com/cisocrgroup/ocrd_cis/blob/dev/ocrd_cis/ocropy/segment.py): | ||
|
||
```hjson | ||
{ | ||
[...] | ||
"parameter": { | ||
"dpi": { | ||
"type": "number", | ||
"default": 300, | ||
}, | ||
"input-level": { | ||
"type": "string": | ||
"enum": ["page", "block", "line"], | ||
"default": "page" | ||
} | ||
"output-level": { | ||
"type": "array": | ||
"item": { | ||
"type": "string", | ||
"enum": ["block", "line", "glyph"], | ||
} | ||
"default": "block" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
Some sample parameters by the user and how they are passed to the processor: | ||
|
||
* `{}` --> `{"dpi": 300, "input-level": "page", "output-level": "block"}` | ||
* `{"dpi": 72}` --> `{"dpi": 72, "input-level": "page", "output-level": "block"}` | ||
* `{"input-level": "glyph"}` --> `{"dpi": 72, "input-level": "glyph", "output-level": "block"}` (This should in all likelihood be an error since it's highly unlikely that `output-level` is above the `input-level` but that is to be handled by processor) | ||
|
||
## File parameters | ||
|
||
To mark a parameter as expecting the address of a file, it must declare the | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the idea was actually to provide extra parameters automatically to all processors (without any need for specification/validation and implementation in the individual processors). So this would have to enter the spec at CLI.md, not here – right?
And besides using extra parameters, I have proposed other mechanisms (like extra CLI options, and environment variables), too. Shouldn't we at least discuss these options first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the point here was to have a place for that discussion. I wrote down what came to mind at the place where it fitted best. I wouldn't want to necessarily force everyone to implement a
dpi
parameter they do not need (core might help here though). Could be overrideable via env var, getopt-style extension etc. We just need a solution soon and a PR is more urgent than issues.Again, I'm open for changing the details, as long as there is one place with a fragid
#standard-parameters
that describes these "special" "parameters"/"flags"/"settings".