Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 1431 binary comment (0.27) #1432

Merged
merged 7 commits into from
Dec 15, 2020
Merged

Conversation

clanmills
Copy link
Collaborator

Two changes:

  1. Removed isBinary() function and call in src/value.cpp std::string CommentValue::comment()
  2. Updated test/data/exiv2-bug528.jpg to avoid a \n in a comment (and associated python test).

I will wait for you to test on your application and provide feedback before submitting this to the code base. Both @norbertj42 and @vinc17fr may also wish to comment.

Discussion
Let me explain how I believe this works. I didn't write Exiv2, so there are matters which I do not totally understand. There are three "Comments" in the Exif Spec (CIPA DC-008-2019, p45). UserComment and two GPS tags:

Exif Comment Specification

The character code used in the UserComment tag is identified based on an ID code in a fixed 8-byte area at the start of the tag data area. The unused portion of the area shall be padded with NULL ("00.H"). ID codes are assigned by means of registration. The designation method and references for each character code are given in Table 9. The value of Count N is determined based on the 8 bytes in the character code area and the number of bytes in the user comment part. Since the TYPE is not ASCII, NULL termination is not necessary.

Character Code Code Designation (8 bytes) References
ASCII "ASCII\0\0\0" ITU-T T.50 IA5
JIS "JIS\0\0\0\0\0" JIS X208-1990
Unicode "UNICODE\0" Unicode Standard
Undefined "\0\0\0\0\0\0\0\0" Undefined
/0.27-maintenance/build $ bin/exiv2 -M'set Exif.Photo.UserComment charset=Ascii ABCD' ~/S.jpg
/0.27-maintenance/build $ bin/exiv2 -g Comment ~/S.jpg 
Exif.Photo.UserComment                       Undefined  12  charset=Ascii ABCD
/0.27-maintenance/build $ tvisitor -pR ~/S.jpg | grep -i Comment
         428 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       12 |       820 | ASCII___ABCD
/0.27-maintenance/build $ 

Exiv2 Comment Implementation

When an application reads the image (with image->readMetadata()), the image is updated to have a a vector[] of Exifdatum key/values. https://exiv2.org/doc/classExiv2_1_1Exifdatum.html

You have 100% transparent access to the "raw" data read from the file. The API also provides convenience methods such as toString() which reports "charset=Ascii ABCD".

I understand your complain that toString() was returning binary comment. I have modified it to report the bytes in the string. Please be aware, that your concern about toString() is a presentation issue.

Reporting charset=encoding in front of the data was added in v0.27.3 in response to #1046. The exiv2 command-line program was enhanced to parse charset=encoding, Comments were tested in the test suite, and Comments were documented in the exiv2.1 man page.

The subject of the correctness of this was discussed in #1258. I received several very unpleasant messages from @vinc17fr on the subject of charset=encoding appearing in the UI of some applications which use libexiv2. He used the meaningless term "buggy". I believe it is correct for toString() to return charset=Ascii ABCD. An application may present the value in a different format using the "raw" data visible to them.

@ghost
Copy link

ghost commented Dec 11, 2020

Thank you very much for your very fast response!!
I tested the new code of std::string CommentValue::comment(const char* encoding) const,
IsBinary removed together with removal of the trailing nulls from #1266 worked fine in my test!
Thanks!

@clanmills
Copy link
Collaborator Author

Good. Let's see if @nicofooo, @vinc17fr, @norbertj42, @tester0077 have comments about this as they have been involved in other changes concerning the comment code.

Here's the bill for my services: Head over to https://openhub.net, it's free to register. Give me some kudos and say something nice about Exiv2.

@tester0077
Copy link
Collaborator

Is any particular image used/required/recommended for this test?
Do any of the participants here know of or have copies available of any images with user comments in any of the optional other encodings, other than default ASCII or undefined for testing ?
I have been away from these tests and so I will have to update my copy of exiv2lib first

@clanmills
Copy link
Collaborator Author

clanmills commented Dec 11, 2020

Thanks Arnold (@tester0077). I'm delighted to say that @nicofooo has tested this PR and he is happy. I'll wait a couple of days to see if @vinc17fr and @norbertj42 wish to comment before merging.

Yes there is a test file which @JuergenWolz shared privately and discussed by email. So, I think we're close to closing and merging this PR. The following discussion with @JuergenWolz is very technical and presented here for transparency.


I’m choosing to use the utilities tvisitor and dmpf which are in my book. tvisitor is a lot like exiv2. dmpf is dd+od for sniffing inside files. Here’s the Comment in that file:

$ tvisitor -pR ~/Downloads/IMG_0027.CR2 | grep -i comment
       452 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |      264 |     75780 | ________Tiere Sand Momo_____________ +++

$ dmpf /Users/rmills/Downloads/IMG_0027.CR2 skip=75780 count=264 
 0x12804    75780: ________Tiere Sand Momo_________  ->  00 00 00 00 00 00 00 00 54 69 65 72 65 20 53 61 6e 64 20 4d 6f 6d 6f 00 00 00 00 00 00 00 00 00
 0x12824    75812: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x12844    75844: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x12864    75876: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x12884    75908: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x128a4    75940: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x128c4    75972: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x128e4    76004: ________________________________  ->  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x12904    76036: ________                          ->  00 00 00 00 00 00 00 00

So there are 8 null bytes “== undefined” followed by 264-8 bytes. The new behaviour (with PR #1432 and the fix to #1266) is:

$ ../0.27-maintenance/build/bin/exiv2 -g Comment ~/Downloads/IMG_0027.CR2 
Exif.Photo.UserComment                       Undefined 264  Tiere Sand Momo
$   

Curiously, #1267 also solves @JuergenWolz's issue as the current 0.27-maintenance (without #1432) behaviour is good, thanks to removing trailing nul bytes.

$ exiv2 --grep comment/i ~/Downloads/IMG_0027.CR2 
Exif.Photo.UserComment                       Undefined 264  Tiere Sand Momo
$ 

None-the-less the 0.27.3 toString() return of "binary value" was a mistake inspired by the 0.27.2 behaviour which has been in Exiv2 forever (and about which I know nothing). The exiv2 option --binary does nothing at all now on 0.27-maintenance.

$ ../0.27.2/build/bin/exiv2 --grep comment/i ~/Downloads/IMG_0027.CR2 
Exif.Photo.UserComment                       Undefined 264  (Binary value suppressed)
$ bin/exiv2 --binary --grep comment/i ~/Downloads/IMG_0027.CR2 
Exif.Photo.UserComment                       Undefined 264  Tiere Sand Momo
$ 

@clanmills
Copy link
Collaborator Author

Apologies to @norbertj42 for not mentioning his involvement in this issue in my previous message. Comments on this issue are welcome from @vinc17fr and @norbertj42 and anybody else.

@tester0077
Copy link
Collaborator

As side note and looking at some of the sample runs of your tvisitor app shown in some of your replies, is there a man page or more detailed explanation of the app's options?

As I am not a frequent user of either tvisitor or exiv2, the 'help' output from tvisitor is rather sparse and there seems to be more fine grained modifiers for the app.
tvisitor.exe [ { U | S | R | X | C | I } ] path
I've tried them all and I am unsure how much of the output I see depends on the image I use for the tests

@clanmills
Copy link
Collaborator Author

clanmills commented Dec 12, 2020

The current syntax for tvisitor is:

$ tvisitor
usage: tvisitor -{ U | S | R | X | C | I }+ path+
$

The options are:

Option Exiv2 Equivalent Meaning Description
U --unknown Unknown Reports unknown tags
S -pS Structure This is the default
R -pR Recursive Descend as deeply as possible
X -pX XMP Report XMP
C -pC Icc Report ICC Profiles
I -pi Iptc Report IPTC Data

The options can be in any order and undefined characters are ignored. The most common option that I use is -pR which is equivalent to exiv2's -pR. It could be simply stated as tvisitor -R foo. In the test harness, I use -pRU.

The option U prints Unknown (and known) tags. An unknown tag is an item of metadata for which tvisitor does not know the name. A tiff 'tag' is identified by a 16 bit integer and these are defined in the TIFF-EP and Exif Specifications. The MakeNote tags are not standardised and unknown tags are reported in the following style:

   address |    tag                    |      type |    count |    offset | value
       382 | 0x003b Exif.Nikon.0x3b    |  RATIONAL |        4 |      1519 | 256/256 256/256 256/256 256/256

The option S prints the Structure of the file. For example, use the following file: https://clanmills.com/Stonehenge.jpg, we see the structure of the JPEG file:

$ tvisitor -pS ~/Stonehenge.jpg 
STRUCTURE OF JPEG FILE (II): /Users/rmills/Stonehenge.jpg
 address | marker       |  length | signature
       0 | 0xffd8 SOI  
       2 | 0xffe1 APP1  |   15288 | Exif__II*_.___._..._.___.___..._.___.___
   15292 | 0xffe1 APP1  |    2610 | http://ns.adobe.com/xap/1.0/_<?xpacket b
   17904 | 0xffed APP13 |      96 | Photoshop 3.0_8BIM.._____'..__._...Z_..%
   18002 | 0xffe2 APP2  |    4094 | MPF_II*_.___.__.._.___0100..._.___.___..
   22098 | 0xffdb DQT   |     132 | _.......................................
   22232 | 0xffc0 SOF0  |      17 | ....p..!_........
   22251 | 0xffc4 DHT   |     418 | __........________............_.........
   22671 | 0xffda SOS   |      12 | .._...._?_..
END: /Users/rmills/Stonehenge.jpg
$ 

The option R performs a Recursive descent of the file and dumps embedded structures such as the TIFF which contains the Exif Metadata. It also descends into IPTC data and ICC Profiles.

$ tvisitor -pR ~/Stonehenge.jpg 
STRUCTURE OF JPEG FILE (II): /Users/rmills/Stonehenge.jpg
 address | marker       |  length | signature
       0 | 0xffd8 SOI  
       2 | 0xffe1 APP1  |   15288 | Exif__II*_.___._..._.___.___..._.___.___
  STRUCTURE OF TIFF FILE (II): /Users/rmills/Stonehenge.jpg:12->15280
   address |    tag                              |      type |    count |    offset | value
        10 | 0x010f Exif.Image.Make              |     ASCII |       18 |       146 | NIKON CORPORATION
        22 | 0x0110 Exif.Image.Model             |     ASCII |       12 |       164 | NIKON D5300
  ....
  END: /Users/rmills/Stonehenge.jpg:12->15280
   15292 | 0xffe1 APP1  |    2610 | http://ns.adobe.com/xap/1.0/_<?xpacket b
   17904 | 0xffed APP13 |      96 | Photoshop 3.0_8BIM.._____'..__._...Z_..%
  STRUCTURE OF 8BIM FILE (MM): /Users/rmills/Stonehenge.jpg:17922->78
       offset |   kind | tagName                      |  len | data | 
            0 | 0x0404 | PSD.8BIM.IPTCNAA             |   30 | 12+1 | ..__._...Z_..%G..__._...x_.___
    STRUCTURE OF IPTC FILE (MM): /Users/rmills/Stonehenge.jpg:17922->78:12->39
        Record | DataSet | Name                           | Length | Data
             1 |       0 | Iptc.Envelope.ModelVersion     |      2 | _.
             1 |      90 | Iptc.Envelope.CharacterSet     |      3 | .%G
             2 |       0 | Iptc.Application.RecordVersion |      2 | _.
             2 |     120 | Iptc.Application.Caption       |     12 | Classic View
    END: /Users/rmills/Stonehenge.jpg:17922->78:12->39
  END: /Users/rmills/Stonehenge.jpg:17922->78
   18002 | 0xffe2 APP2  |    4094 | MPF_II*_.___.__.._.___0100..._.___.___..
   22098 | 0xffdb DQT   |     132 | _.......................................
   22232 | 0xffc0 SOF0  |      17 | ....p..!_........
   22251 | 0xffc4 DHT   |     418 | __........________............_.........
   22671 | 0xffda SOS   |      12 | .._...._?_..
END: /Users/rmills/Stonehenge.jpg
$

There is no plan to have a man page for tvisitor because it has a 200 page book! tvisitor isn't intended for any production use and has been written to explain how Exiv2 works. In less than 4000 lines of code it decodes the metadata in all formats supported by Exiv2 plus ISOBMFF formats .CR3, .HEIC and .AVIF. Additionally, it supports BigTiff, extended JPEG, dumping ICC profiles and many other features which are not supported in Exiv2.

tvisitor is currently being tested using more than 10,000 images harvested from ExifTool, raw.Pixls.us, RawSamples.ch and images collected from issues reported to Exiv2. My aim is to successfully read 9990 which is 99.9% reliability. I fully expect the Community to attack me concerning the 0.1% that are not successfully decoded. On-line abuse from the Community is the reason that I am retiring.

I have written the book for two purposes:

  1. To Explain how Exiv2 works.
  2. To Explain how to parse metadata.

Exiv2 provides a unique capability to the Community and its long term maintenance is of importance to Linux. To my knowledge, no book as been written about Image Metadata. The tvisitor.cpp code would provide a good resource from which to develop a new Metadata Library.

@norbertj42
Copy link

I had a look on the modification and did tests with some of my pictures. Behaviour is now perfect for me. Thanks a lot.

@vinc17fr
Copy link

Note that if you change the API in an incompatible way so that applications silently break, that's bad, because various existing applications that were working suddenly behave incorrectly because of the change. Here, this might also have security issues as the comment is longer than expected, thus could yield a buffer overflow.

If an application displays charset=Ascii before the comment, this is disturbing: the end user does not have to get that; he just wants the comment to be displayed.

So toString() should just return the comment as it did in the past. Why not adding a new function that provides the encoding for users who want to have this information?

@tester0077
Copy link
Collaborator

@clanmills Thank you Robin. I haven't had a chance to download and review your book, but I presume you will simply add the explanation to it - or perhaps it is already part of it.

@norbertj42
Copy link

@vinc17fr : I had a discussion with Robin about this some time ago. There was a good reason for the change: when you want to change a comment field with the exiv2 command line tool, charset must be given. So it makes sense to have charset also in the output to be consistent.
Those who use the exiv2 lib in their application (like me) can remove that easily.
From my point of view the the only gap was, that this change was not mentioned in list of changes, but this can happen. And we should never forget: we get this great library for free!

@vinc17fr
Copy link

@norbertj42 I'm not talking of the command line tool (it is not used by applications), but of the API. As I've said, it is sufficient to add a new function to the API. This new function can be used by applications if they need it, and by the command line tool to output the charset.

The issue is that many applications are not maintained actively. I had a look at the gThumb source, but did not manage to find where the charset prologue was added. Or perhaps you can provide patches for these applications?

@vinc17fr
Copy link

For Nomacs, I think I will be able to patch the code: it already has a function to remove charset="ASCII" (DkMetaDataT::exiv2ToQString) case-insensitively, but assumes that there are quotes around the charset, and does that only for ASCII. So I suppose that it should just be generalized.

@tester0077
Copy link
Collaborator

@norbertj42 I agree with vinc17fr, the charset info ought to be separate from the data.
It represents metadata for the metadata and as such it ought to be not only separate but also available, albeit via separate data item or function.
Combining the two in the data output may well have been, at least partly, as a result of a conversation I had with Robin when I was looking for that very bit of additional information from exiv2lib

@clanmills
Copy link
Collaborator Author

@tester0077 I wrote that text this morning and added it to the book.

Today, I arrived at the conclusion that the book cannot be finished in the next two weeks. Every week, I discover something new that deserves attention and it ends up in the book. A new issue about Nikon LensData arrived last night. This is an encrypted tag that Exiv2 doesn't understand. It'll be a 3 or 4 days to investigate and deal with that next week. The book will get finished in 2021.

@norbertj42 Working on a release on my own is quite a daunting task. I'm careful about tracking everything in the release. However, the charset= change was a by-product of comment processing fixes for in #1046 and the change to "toString()" was an unexpected by-product. I'm working mostly alone on 0-27-maintenance. #1018 (comment)

open-source-cartoon

@clanmills
Copy link
Collaborator Author

clanmills commented Dec 12, 2020

This channge has been made for good reason. If I remove the "charset=" behaviour, I'm sure @vinc17fr will want to dispute that also. It's not a perfect world. This is purely cosmetic. The security argument is bogus. The function returns a string and the application should deal with it in a safe an orderly manner.

@vinc17fr
Copy link

All what I'm saying is that one should get the comment as before. I don't see any good reason for this change: if the charset is useful, it could be provided by a new function, thus without gratuitously breaking existing applications.

In the unpatched Nomacs, due to the "charset=..." text, the comment is almost not visible at all without scrolling (this depends on the pane width, but if one increases the width, one wastes other screen space), so that this is not just cosmetic.

Now, I've written a patch for Nomacs, which extends the exiv2 string cleanup to support charset=Ascii and charset=Unicode . This is a bit of a hack as it will remove such substrings anywhere in the string, though such substrings are rather unlikely to appear in the real comment. This patch is available in my Debian bug report (direct link to the patch).

@clanmills
Copy link
Collaborator Author

@vinc17fr You are a very determined person who will not give up until you have things your own way. I am not changing it on 0.27-maintenance as I'm sure it somebody equally strong willed will want to argue that it shouldn't be reverted.

I propose that you submit a PR to 'master' and you'll have your way in 0.28.

For now, I'm working hard to get my book finished and to retire. I would like to thank you for your generous words of appreciation for my years of work on Exiv2.

@clanmills
Copy link
Collaborator Author

@vinc17fr I will revisit this on Monday. Your point of view has merit.

@clanmills
Copy link
Collaborator Author

I've slept on this and spent another couple of hours looking at the code. I want charset=encoding (prologed string) in the output from the exiv2 command-line program. It should be possible for CommentValue::toString() to return an "unprologed string" and for CommentValue::write() to output the "prologed string". I believe that would make everybody happy.

I didn't write Exiv2 and there are aspects of the code that I don't understand. Exifdatum::write() uses toString(). Without changing the API, I can't see a way to determine the value of the prolog string charset=encoding from Exiv2datum::write(). For a "dot" release, I am reluctant to change the API. I am happy to screen share on Zoom to step the code together. Two heads are better than one.

I'll sleep on this again tonight. If I can't find a way to determine the prolog string charset=encoding in Exifdatum::write(), I will merge this PR on Monday.

@clanmills
Copy link
Collaborator Author

I think toString() is behaving correctly by returning a prologed string such as "charset=encoding ABCD".

The correct method to get the value of a comment is CommentValue.comment() and documented here: https://exiv2.org/doc/classExiv2_1_1CommentValue.html

screenshot_38

One way to call that is:

    Exiv2::CommentValue commentValue (value().toString());
    os << commentValue.comment() ;

I've plugged this in here:

    std::ostream& Exifdatum::write(std::ostream& os, const ExifData* pMetadata) const
    {
        if (value().count() == 0) return os;

        PrintFct       fct = printValue;
        const TagInfo* ti  = Internal::tagInfo(tag(), static_cast<IfdId>(ifdId()));
        // be careful with comments (User.Photo.UserComment, GPSAreaInfo etc).
        if ( ti ) {
            fct = ti->printFct_;
            if ( ti->typeId_ == comment ) {
                fct=NULL;
#if 0
                os << value().toString();
#else
                Exiv2::CommentValue commentValue (value().toString());
                os << "comment=" << commentValue.comment() ;
#endif
            }
        }
        if ( fct ) fct(os, value(), pMetadata);
        return os;
    }

S.jpg is a copy of https://clanmills.com/Stonehenge.jpg

1009 rmills@rmillsmm-local:~/gnu/github/exiv2/0.27-maintenance/build $ exiv2 -g UserComment ~/S.jpg 
Exif.Photo.UserComment                       Undefined  20  charset=Ascii Classic View  <-- 0.27.3 Behaviour
1013 rmills@rmillsmm-local:~/gnu/github/exiv2/0.27-maintenance/build $ bin/exiv2 -g UserComment ~/S.jpg 
Exif.Photo.UserComment                       Undefined  20  comment=Classic View        <-- with the plug

I don't think we should modify the 0.27.3 behaviour as it appears to be correct. Applications wanting the unprologed comment should use the clumsy code:

    Exiv2::CommentValue(value().toString()).comment() ;

Comments welcome. I'm now very confident about merging this PR on Monday (without the illustrative plug in this comment).

@tester0077
Copy link
Collaborator

Thank you for having another look at this issue. I am a bit behind right now because I replaced my main C: drive, along with the Christmas preparation, such as they are under Covid, but I will have a look at the code in the links you have provided ASAP.
In fact, I will be happy with the 'clumsy' version, but I will have a closer look as soon as I can.

@vinc17fr
Copy link

Well, according to https://exiv2.org/doc/classExiv2_1_1CommentValue.html, there are two ways to get a comment: Exiv2::CommentValue::comment (which also converts the comment to UTF-8) and Exiv2::CommentValue::read. But this doesn't say how toString() should behave. And https://www.exiv2.org/doc/namespaceExiv2.html#aeb5a1b7e0a14e8671707d5be16cbda24 just says for toString(): "Utility function to convert the argument of any type to a string."

So this doesn't mean that a prologed string is more correct than just the comment string. Until now, toString() just returned the comment string (without a prologue), so I assume that this was the expected form. Or can you show from the documentation that this wasn't?

Perhaps it could be decided in the future to change the behavior, but this is a change of API in an incompatible way, and this must not be done in a minor release. Please read: Semantic Versioning 2.0.0.

Note also that in general (outside exiv2), the toString method does not care about the internal encoding of the value to convert to a string. This would mean that toString() should return the comment without the prologue.

@clanmills
Copy link
Collaborator Author

Right. I've found the fix for which I was searching this morning. With this, we should both be happy because I'm doing things your way, and the output of the exiv2 command-line program shows the prolog as documented in the man page.

I've moved the prolog printer from CommentValue::write() into Exifdatum::write(). So:

  1. the exiv2 command-line program emits a prolog as documented in the man page
  2. value().toString() does not emit a prolog.
    std::ostream& Exifdatum::write(std::ostream& os, const ExifData* pMetadata) const
    {
        if (value().count() == 0) return os;

        PrintFct       fct = printValue;
        const TagInfo* ti  = Internal::tagInfo(tag(), static_cast<IfdId>(ifdId()));
        // be careful with comments (User.Photo.UserComment, GPSAreaInfo etc).
        if ( ti ) {
            fct = ti->printFct_;
            if ( ti->typeId_ == comment ) {
                fct=NULL;
                const Exiv2::CommentValue* cv = dynamic_cast<const Exiv2::CommentValue*>(&value());
                Exiv2::CommentValue::CharsetId csId = cv->charsetId();
                if ( csId != CommentValue::undefined ) {
                    os << "charset=" << CommentValue::CharsetInfo::name(csId) << " ";
                }
                os << cv->comment();
#if 1
                os << "|| value().toString() = " << value().toString();
#endif
            }
        }
        if ( fct ) fct(os, value(), pMetadata);
        return os;
    }

When the plug is enabled, we get:

.../0.27-maintenance/build $ make exiv2 ; bin/exiv2 --grep UserComment ~/S.jpg 
...
Exif.Photo.UserComment               Undefined  20  charset=Ascii Classic View|| value().toString() = Classic View
.../0.27-maintenance/build $ 

I'll submit this now (with the plug disabled) for you to test.

This change has caused exceptions to appear in our test suite in conversions.sh which I will investigate tomorrow (I believe they are valid behaviour and should modify the reference files in the test suite).

As a future enhancement the "special code" in std::ostream& Exifdatum::write() should be moved into a global function "printComment()" and the TagInfo for Photo.UserComment should declare that instead of printValue. I'll do that when this 0.27-maintenance is ported to 'master' for 0.28. I avoid extending the API on 0.27-maintenance.

@ghost
Copy link

ghost commented Dec 14, 2020

As I see, nobody objected the original changes as in 1432, this is removing isBinary in CommentValue::comment().
So, this change could well be submitted to the code base.
As the discussion drifted from the original problem "Binary comment" to prefixed recognition of the string format in a comment value, by charset=xxxxx
Certainly, one could have decided years ago to leave the value ToString() function totally unprefixed by charset text.
Apps could detect the charset by CommentValue::charsetId() in order to decide if the stringvalue obtained from CommentValue::comment() is ASCII or UTF-8 converted output.
But I'm afraid it's too late now. Each change in that function will raise new incompatibilities in apps, e.g. where apps now rely on prefix charset=Unicode in order to decide the string is UTF-8 ...
Your update to Exifdatum::write does not affect the Value::ToString() method for comment-values, I think?

@clanmills
Copy link
Collaborator Author

You're right. We've wandered a little away from your issue. Several people have discussed matters relating to this since @0.27.3 shipped and I am trying to make everybody happy.

I'll be honest and say that I don't know what CommentValue toString() did in years gone by.

Everything we're discussing here arises from the response to #1046 which was a "hard bug" concerning GPSProcessingMethod and GPSAreaInformation. This lead to many discoveries including:

  1. We weren't processing comments correctly.
  2. Comments (and charsets) were not documented in the man page.
  3. We weren't testing any charsets.
  4. The charset= prefix was being handled in the exiv2 application (and not the library)

And there were other matters such as binary ending up in the output, and other pain I have forgotten.

I always try to make everybody happy. I don't always succeed. I hope we're done.

@ghost
Copy link

ghost commented Dec 14, 2020

Hi Robin,
In your latest code changes, you write that you are restoring behaviour of 0.27.2.
I newest changes for Value.cpp, CommentValue::write does no longer return the charset=xxx prefix. But in 0.23, 0,24 and maybe even earlier it always did ever since.
For correct conversions, I need to know if the string that Value.toString returns for a comment, is ASCII or UTF-8 encoded (for comment CharsetId UNICODE).
How could this be achieved for an Exiv2::Value::AutoPtr, now the prefix is ommited?

@clanmills
Copy link
Collaborator Author

@JuergenWolz Thanks for your input on this. There's a saying "when you're in a hole, stop digging!".

I'm strongly tempted to return to the state of the PR on Saturday. Here's the 0.27.2 behaviour with a very slightly modified samples/exifvalue which reports toString() and value().

528 rmills@rmillsmm-local:~/gnu/github/exiv2/v0.27 $ build/bin/exiv2 --version | head -1 
exiv2 0.27.2
529 rmills@rmillsmm-local:~/gnu/github/exiv2/v0.27 $ build/bin/exifvalue ~/S.jpg Exif.Photo.UserComment
exifData[key]             = Classic View
exifData[key].value()     = Classic View
exifData[key].toString()  = charset="Ascii" Classic View
530 rmills@rmillsmm-local:~/gnu/github/exiv2/v0.27 $ build/bin/exiv2 -pv -g UserComment ~/S.jpg 
0x9286 Photo        UserComment                 Undefined  20  charset="Ascii" Classic View
531 rmills@rmillsmm-local:~/gnu/github/exiv2/v0.27 $ build/bin/exiv2 -pt -g UserComment ~/S.jpg 
Exif.Photo.UserComment                       Undefined  20  Classic View
532 rmills@rmillsmm-local:~/gnu/github/exiv2/v0.27 $ 

And, I believe that's how it has worked for years. You'll notice that toString() reports charset in a slightly different syntax from that documented in 0.27.3

I've been trying, without success, to pacify @vinc17fr in his complaints that some applications are behaving differently with 0.27.3. The prime libexiv2 suspect has been toString(), however this evidence does not support that.

I'm happy to tabulate similar behaviour with:

  1. 0.27.3 (as shipped)
  2. 0.27-maintenance without this PR.
  3. 0.27-maintenance with Saturday's PR
  4. 0.27-maintenance with the PR as it is now.

Before I undertake that investigation (which will take about 30 minutes), I'd like to hear your thoughts about the following proposal:

  1. Revert to Saturday's PR which was accepted by you, @norbertj42 and @nicofooo
  2. @vinc17fr should file a separate issue to tabulate the applications and behaviour about which he is complaining. I would also like him to file bug reports with the applications involved.
  3. I am retiring from Exiv2 (after 12+ years). Thursday (2020-12-17) will be my last day working on Exiv2 (other than the complete my book).

Your thoughts?

@ghost
Copy link

ghost commented Dec 14, 2020

Yes, I had observed slight differences in syntax, charset="Unicode" vs. charset=Unicode , and I know that charset="Unicode" existed for years.
Also, I lately saw charset=InvalidCharsetId or "binary comment", which I had not seen before 0.27.3. I can't tell if this is new or I did not see this lacking metadata that would show.
I can live with new or modified syntax for charset=xxx, but what I need is to know if the result of comment.toString is encoded ASCII or UTF-8. How ever this could be rendered possible.
Thanks.

@ghost
Copy link

ghost commented Dec 14, 2020

P.S.: 3. 0.27-maintenance with Saturday's PR would be fine for me.

@norbertj42
Copy link

@clanmills I am fine with your approach - and will muss you, thanks for all

@clanmills
Copy link
Collaborator Author

clanmills commented Dec 14, 2020

Took longer that 30 minutes. Running the builds/tests is easy. Presenting the results in MarkDown is tough:

Release:

Test
0.27.2


0.27.3
0.27‑maintenance
PR 1432 Saturday


PR 1432 Monday
bin/exifvalue ~/S.jpg Exif.Photo.UserComment value() C charset=Ascii C charset=Ascii C
bin/exifvalue ~/S.jpg Exif.Photo.UserComment toString() charset="Ascii" C charset=Ascii C C
bin/exiv2 -pv -g UserComment ~/S.jpg C charset=Ascii C charset=Ascii C
bin/exiv2 -pt -g UserComment ~/S.jpg C charset=Ascii C charset=Ascii C
S.jpg=https://clanmills.com/Stonehenge.jpg C=Classic View Wrong! toString()

Conclusions

I will restore this PR to the code on Saturday because:

  1. It's the minimum change to solve Exiv2 returns "binary data" for valid content in Exif.Photo.UserComment #1431 and accepted by @JuergenWolz, @norbertj42 and @nicofooo.
  2. It does not disturb the 0.27.3 behaviour.
  3. No API/man-page/test changes.
  4. It respects Semantic Versioning by preserving value.toString().
  5. @vinc17fr should submit issue reports to gThumb etc for investigation. The behaviour of these apps is irrelevant to fixing Exiv2 returns "binary data" for valid content in Exif.Photo.UserComment #1431. From the discussion here and elsewhere (for which he has provided no evidence), gThumb might be determining the comment with something like:
std::ostringstream os ;
os << value() ;
std::string comment = os.str(); 

The documented way to determine the comment without the prefix charset= is discussed above:

std::string comment = Exiv2::CommentValue(value().toString()).comment());

@vinc17fr
Copy link

OK, Debian bug 974608 against gthumb and Debian bug 974616 against nomacs updated to say not to use internal libexiv2 functions to get the user comment, but the documented way you gave.

BTW, it would be better not to expose internals, such as value().

@clanmills
Copy link
Collaborator Author

@vinc17fr: As you have not said "Thank You, Robin", I will retire from open-source when this PR is merged/closed. I have no interest in your opinion and aggressive behaviour.

ab-cartoon

@clanmills clanmills merged commit 94ab08e into 0.27-maintenance Dec 15, 2020
@clanmills clanmills deleted the fix_1431_binary_comment branch December 15, 2020 07:50
@clanmills clanmills removed the request for review from piponazo December 15, 2020 07:58
@ghost
Copy link

ghost commented Dec 15, 2020

Hi Robin,
Thanks! I inspected the current 1432 updates to value.cpp, function CommentValue::comment. They look good to me.
I want to thank you for your long years work for exiv2 and wish you all the best for the future!

@clanmills
Copy link
Collaborator Author

Thanks, @JuergenWolz. It's been mostly fun, interesting and mentally rewarding. If everybody was as nice as you, @nicofooo, @norbertj42 and @tester0077 it would be all fun. Sadly there are a few people in the Community who make trouble. You might find the introduction to Chapter 11 Project Management interesting https://clanmills.com/exiv2/book/#11

(By the way, Chapter 11 in the US Tax Code is Bankruptcy!)

I am now retired from the Exiv2. We leave on a pre-Christmas family vacation with the grand-kids on Friday. I'll get the book finished in the New Year. The book is mostly impenetrable and very detailed. That's quite intentional. This is not something to help you relax, or read to the cat. It's solid shit!

@tester0077
Copy link
Collaborator

tester0077 commented Dec 15, 2020 via email

@clanmills
Copy link
Collaborator Author

I'll do of those things (and more), Arnold. We've enjoyed a few laughs over the years. Best Wishes to You.

RobinEuphonium

@clanmills clanmills mentioned this pull request Jan 11, 2021
@clanmills
Copy link
Collaborator Author

clanmills commented Feb 3, 2021

@tester0077 I've made a remarkable discovery today while working on Previews for my book and also #1464.

Two or three months ago you asked me about data inside Stonehenge.jpg that doesn't seem to be needed. It's a couple of thumbnails that have nothing to do with Exif, ICC, XMP or IPTC. I've beefed up tvisitor to dump the segments that follow SOS (start of scan). There's a couple of thumbnails hiding at the bottom of the lake. I learn something every day. #1464 (comment)

I've agreed to release Exiv2 v0.27.4 on 2021-05-22 to include ISOBMFF support (.CR3, .HEIC and .AVIF) and after that I'm out of here. There's going to be a Zoom meeting late-February/early-March to discuss the future of Exiv2. You'll get an invitation.

@tester0077
Copy link
Collaborator

tester0077 commented Feb 3, 2021 via email

@clanmills
Copy link
Collaborator Author

Arnold: @tester0077 You're on the invitation list on #1466. I hope you can join us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exiv2 returns "binary data" for valid content in Exif.Photo.UserComment
4 participants