
feat(jpeg): Support encoding/decoding arbitrary metadata as comments #4430

Open: wants to merge 1 commit into main


lukasstockner

This is needed to port Blender's current JPEG I/O code to OIIO, but it is also a useful feature to have in general.

When reading, the code tries to parse each comment as a colon-separated key:value pair and sets the corresponding metadata. When writing, this must be enabled explicitly by setting jpeg:com_attributes to 1, to avoid accidentally bloating the files written by existing applications.
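A minimal sketch of the two halves described above, in plain C++17 without OIIO. The function names parse_com_attribute and make_com_blocks are hypothetical, splitting at the first colon is an assumption (the description only says "colon-separated key-value pairs"), and this is not the code from the PR.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader-side helper: split one COM string into (key, value).
// Assumption: the split is at the first ':' and an empty key is rejected.
std::optional<std::pair<std::string, std::string>>
parse_com_attribute(const std::string& com)
{
    const std::size_t colon = com.find(':');
    if (colon == std::string::npos || colon == 0)
        return std::nullopt;  // not "key:value", so leave it as a plain comment
    return std::make_pair(com.substr(0, colon), com.substr(colon + 1));
}

// Hypothetical writer-side helper: only produce "key:value" COM strings when
// the caller opted in, mirroring jpeg:com_attributes. Treating any nonzero
// value as "enabled" is an assumption; the PR text only mentions 1.
std::vector<std::string>
make_com_blocks(const std::vector<std::pair<std::string, std::string>>& attrs,
                int com_attributes)
{
    std::vector<std::string> blocks;
    if (com_attributes == 0)
        return blocks;  // default: no extra COM blocks, so files do not grow
    for (const auto& [key, value] : attrs)
        blocks.push_back(key + ":" + value);
    return blocks;
}
```

Keeping the writer opt-in means applications that never set the flag produce byte-for-byte the same COM content as before.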

Tests

I've added a small (~10 KB) JPEG file containing Blender metadata, plus a basic test that parses it, checks that the metadata was read correctly, writes it out twice (once with and once without jpeg:com_attributes), and then checks that both output files are also parsed as expected.
In case you're wondering why the info for "no-attribs.jpg" still contains one Blender attribute: the first COM field is still put into ImageDescription, just like before, so even without jpeg:com_attributes it is written to the output file and recognized as an attribute when that file is parsed.
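The reading behaviour behind that explanation can be modeled in a few lines. This is a hypothetical sketch, not the PR's code: read_com_blocks is an invented name, attributes are kept in a plain std::map instead of an OIIO ImageSpec, and splitting at the first colon is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the current reading behaviour: the first COM block
// is stored as ImageDescription (as before this PR), and every COM block,
// including the first, is also parsed as "key:value" if it has that form.
// Splitting at the first ':' is an assumption.
std::map<std::string, std::string>
read_com_blocks(const std::vector<std::string>& coms)
{
    std::map<std::string, std::string> attribs;
    for (std::size_t i = 0; i < coms.size(); ++i) {
        const std::string& com = coms[i];
        if (i == 0)
            attribs["ImageDescription"] = com;  // legacy behaviour, unchanged
        const std::size_t colon = com.find(':');
        if (colon != std::string::npos && colon > 0)
            attribs[com.substr(0, colon)] = com.substr(colon + 1);
    }
    return attribs;
}
```

For example, a file written without jpeg:com_attributes carries only its ImageDescription as a COM block; if that text is "Frame:1" (a made-up value), reading it back yields both ImageDescription and a Frame attribute, which mirrors what is described for "no-attribs.jpg".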

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.


linux-foundation-easycla bot commented Sep 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: lukasstockner / name: Lukas Stockner (ba93c68)

Signed-off-by: Lukas Stockner
lgritz (Collaborator) commented Sep 18, 2024

The Mac failures are unrelated and fixed by a different PR that has already been merged.

```cpp
    std::string((const char*)m->data,
                m->data_length));
m_spec.attribute("ImageDescription", data);
// Additional string metadata can be stored in JPEG files as
```
Collaborator


Maybe it's worth explicitly starting by spelling out:

```cpp
// The first COM block encountered will be interpreted as the image description.
// Subsequent COM blocks, if in the form "key:value", ... blah blah
```

By the way, is this exactly what we want? What if the first COM looks like "key:value", should that always be slotted into ImageDescription?

Author


I would prefer only setting ImageDescription if the parsing fails, but I figured that's a potentially breaking change so I kept it safe for now.

Collaborator


I bet that most traditional JPEG COM blocks that are true "image comments" don't have the specific form "[ident:]string1:string2", where the optional ident (namespace prefix) follows C identifier rules and string1 doesn't start or end with whitespace. So we could interpret only that pattern as metadata, and treat the first COM that doesn't follow the pattern as "ImageDescription".

I'm willing to risk that an occasional "comment" with a quirky format might be incorrectly interpreted as metadata, especially if there is some kind of OIIO global option that lets you revert to the old behavior (first COM is always ImageDescription), so somebody can get out of a bind if they have a pile of images whose COM blocks are ambiguously formatted.
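A minimal sketch of the proposed heuristic, in plain C++17 without OIIO. Everything here is hypothetical: the function names, the bool legacy_first_com standing in for the suggested global option, the rule that the first field counts as ident only when it is a C identifier and another colon follows, and ignoring non-matching COM blocks after the first. It is not what the PR implements.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Optional namespace prefix must follow C identifier rules.
bool is_c_identifier(const std::string& s)
{
    if (s.empty() || !(std::isalpha(static_cast<unsigned char>(s[0])) || s[0] == '_'))
        return false;
    for (char c : s)
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_'))
            return false;
    return true;
}

// Hypothetical matcher for "[ident:]string1:string2".
// Assumption: the first field counts as "ident" only if it is a C identifier
// and a second ':' follows; the returned name keeps that prefix.
std::optional<std::pair<std::string, std::string>>
match_com_metadata(const std::string& com)
{
    std::size_t key_end = com.find(':');
    if (key_end == std::string::npos)
        return std::nullopt;  // no colon at all: plain comment
    std::size_t name_begin = 0;
    const std::size_t second = com.find(':', key_end + 1);
    if (second != std::string::npos && is_c_identifier(com.substr(0, key_end))) {
        name_begin = key_end + 1;  // skip over "ident:"
        key_end    = second;
    }
    const std::string string1 = com.substr(name_begin, key_end - name_begin);
    if (string1.empty()
        || std::isspace(static_cast<unsigned char>(string1.front()))
        || std::isspace(static_cast<unsigned char>(string1.back())))
        return std::nullopt;  // string1 must not start or end with whitespace
    return std::make_pair(com.substr(0, key_end), com.substr(key_end + 1));
}

// Hypothetical reader: matching COM blocks become attributes, and the first
// non-matching one becomes ImageDescription. legacy_first_com models the
// suggested global option: the first COM is always ImageDescription.
std::map<std::string, std::string>
classify_com_blocks(const std::vector<std::string>& coms, bool legacy_first_com)
{
    std::map<std::string, std::string> attribs;
    bool have_description = false;
    for (std::size_t i = 0; i < coms.size(); ++i) {
        if (legacy_first_com && i == 0) {
            attribs["ImageDescription"] = coms[i];
            have_description = true;
        } else if (auto kv = match_com_metadata(coms[i])) {
            attribs[kv->first] = kv->second;
        } else if (!have_description) {
            attribs["ImageDescription"] = coms[i];
            have_description = true;
        }
        // Assumption: later non-matching COM blocks are simply ignored.
    }
    return attribs;
}
```

With this rule, "Blender:Frame:1" yields the attribute Blender:Frame with value 1, while "A sunset over the lake" has no colon and becomes the ImageDescription. A comment such as "Note: taken at dusk" would still be read as metadata, which is the kind of occasional misinterpretation accepted above.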

Collaborator


What's the verdict here, @lukasstockner? Do you want to make any more changes, or do you want to keep the logic as-is and we can always revise later if it causes trouble?

Author


If it's fine with you, I'll go ahead and add some more logic to only set the ImageDescription if a global option is set and/or the matched key fails a heuristic.

Collaborator


Sounds good to me.
