How to modify existing Annots ? #276

Open
devilsclaw opened this issue Aug 17, 2024 · 11 comments

Comments

@devilsclaw

I am trying to figure out how to modify the Annots of a PDF generated with Libre Draw. I have figured out how to parse it and get the data out of it. Now I would like to modify the content.

Attached is a sample project with a sample PDF where the Annots have been filled in with a PDF reader. It can read and print out all the elements.

I have looked around, but I have not found any simple examples of how to take existing Annots and modify them. I have seen examples of copying them and creating new ones, but not of modifying them.

pdf_annots.zip

@devilsclaw
Author

devilsclaw commented Aug 17, 2024

I forgot to mention that the data that needs to be modified is V, the entry with the values Testing 1 - Testing 51.

@devilsclaw
Author

devilsclaw commented Aug 17, 2024

I am not looking to edit the info in place. I know that the usual style is to copy to a new PDF. But I don't know how to get a copy of the data that can then be modified and put into the new PDF. The modify part is what I am confused about.

@galkahana
Owner

well, i can probably arrange for an example if i get to it, but a solution to a problem like that with this lib goes through:

  • opening the document in modification mode. this gives you both writing abilities and a parser with which you can read the PDF content
  • locating the element you want to modify
  • using the element number to create a new version of the element, copying some of its content and writing anew what needs to be modified.

the test example of ModifyingExistingFileContent.cpp shows a common approach to modifying part of an element (pages mostly), where a new version is created and most of its content is just copied, while the elements that you want to change are written anew per the change you want to make.
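
The three steps above can be sketched with a plain-JavaScript model. This is not the PDF-Writer or hummusjs API: the `objects` map, the `modifyObject` helper, and the object number 12 are hypothetical stand-ins for a parsed file, used only to show the copy-most, rewrite-some pattern.

```javascript
// Hypothetical in-memory stand-in for a parsed PDF: object number -> dictionary.
const objects = new Map([
    [12, { Type: 'Annot', Subtype: 'Widget', T: 'Given Name Text Box', V: '' }],
]);

// Build a new version of an object: copy every key that is not overridden,
// then write the overridden keys with their new values.
function modifyObject(store, objectId, overrides) {
    const original = store.get(objectId);
    if (original === undefined) {
        throw new Error('no such object: ' + objectId);
    }
    const updated = {};
    for (const key of Object.keys(original)) {
        if (!Object.prototype.hasOwnProperty.call(overrides, key)) {
            updated[key] = original[key]; // copied as is
        }
    }
    for (const key of Object.keys(overrides)) {
        updated[key] = overrides[key]; // written anew
    }
    // The new version keeps the same object number, so references to it stay valid.
    return new Map(store).set(objectId, updated);
}

const modified = modifyObject(objects, 12, { V: 'Testing 1' });
console.log(modified.get(12).V); // the new value
console.log(objects.get(12).V);  // the original store is left untouched
```

In the real library the copying and writing go through the writer's objects context and a copying context rather than plain objects; the model only mirrors the order of operations.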

If you haven't yet done that, try going through the modification documentation here - https://github.com/galkahana/PDF-Writer/wiki/Modification.

Depending on how the week goes, i'll try to get around to setting up a more comprehensive example per what you're trying to do.

@galkahana
Owner

ok. actually, it seems like this is just a form and its widget annotations. in this case you can read the examples for either filling a form or locking it. there are quite a few over at hummusjs (the legacy nodejs interface of this library), and the operators are fairly similar. see if it helps you, otherwise we can discuss details.

here's the example:
https://github.com/galkahana/HummusJSSamples/blob/master/filling-form-values/pdf-form-fill.js

it's a general purpose example of how to create a modified version of an existing file, and in particular of how to fill form values, in case this sort of thing interests you.

@devilsclaw
Author

Thanks. I will look into it.

@devilsclaw
Author

devilsclaw commented Aug 22, 2024

I looked at the JS script and am currently working to port it to C++ and then modify it to work with my forms if needed. I noticed a bug in it: it ignores some fields that it is supposed to change. I would post it in the repo, but it looks like that repo is never checked.

var data = {
    "Given Name Text Box": "Eric",
    "Family Name Text Box": "Jones",
    "House nr Text Box": "someplace",
    "Address 1 Text Box": "somewhere 1",
    "Address 2 Text Box": "somewhere 2",
    "Postcode Text Box": "123456",
    "Country Combo Box": "Spain",
    "Height Formatted Field": "198",
    "Driving License Check Box": true,
    "Favourite Colour List Box": "Brown",
    "Language 1 Check Box": true,
    "Language 2 Check Box": true,
    "Language 3 Check Box": false,
    "Language 4 Check Box": false,
    "Language 5 Check Box": true,
    "Gender List Box": "Man"
};

The fields set to false will not be processed, due to the line below:

if (handles.data[fullName]) {

It should be

if (handles.data[fullName] != undefined) {
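
To see why the truthiness test drops those fields, here is a small standalone sketch. The `shouldFillTruthy` and `shouldFillDefined` helpers are hypothetical names used for illustration and are not part of the sample script; only the two conditions come from the lines above.

```javascript
const data = {
    'Language 2 Check Box': true,
    'Language 3 Check Box': false,
};

// Original condition: a field mapped to false is treated like a missing field.
function shouldFillTruthy(values, fullName) {
    return Boolean(values[fullName]);
}

// Proposed condition: only fields with no entry (or a null entry) are skipped.
function shouldFillDefined(values, fullName) {
    return values[fullName] != undefined;
}

for (const name of ['Language 2 Check Box', 'Language 3 Check Box', 'Not In Data']) {
    console.log(name, shouldFillTruthy(data, name), shouldFillDefined(data, name));
}
```

Note that the loose `!=` comparison also skips `null` entries. Other falsy values, such as `0` or an empty string, pass the proposed check but would have been dropped by the original one.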

@devilsclaw
Author

devilsclaw commented Aug 23, 2024

I am now trying to figure out the C++ equivalent of this:

if(handles.acroformDict.exists('DR')) {
    handles.writer.getEvents().once('OnResourcesWrite',function(args){
        // copy all but the keys that exist already
        var dr = handles.reader.queryDictionaryObject(handles.acroformDict,'DR').toPDFDictionary().toJSObject();
        Object.getOwnPropertyNames(dr).forEach(function(element,index,array) {
            if (element !== 'ProcSet' && (!textOptions || element !== 'Font')) {
                args.pageResourcesDictionaryContext.writeKey(element);
                handles.copyingContext.copyDirectObjectAsIs(dr[element]);
            }
        });
    });
}

I was thinking it might be AddDocumentContextExtender, but that seems to apply to the entire document, whereas the handler above seems to run only once for each instance.
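
One way to get once-only behavior from a document-wide hook is to have the handler unregister itself the first time it runs. The sketch below models that idea in plain JavaScript. The `Writer` class and its `addExtender`, `removeExtender`, and `writeResources` methods are hypothetical stand-ins, not the PDF-Writer or hummusjs API, so whether the C++ library tolerates removal from inside a callback is an assumption to verify.

```javascript
// Hypothetical stand-in for a writer that notifies every registered extender.
class Writer {
    constructor() {
        this.extenders = new Set();
    }
    addExtender(extender) {
        this.extenders.add(extender);
    }
    removeExtender(extender) {
        this.extenders.delete(extender);
    }
    writeResources(pageName) {
        // Iterate over a copy so that an extender may remove itself while being called.
        for (const extender of Array.from(this.extenders)) {
            extender.onResourcesWrite(pageName);
        }
    }
}

// Register a callback that runs for the first notification only.
function once(writer, callback) {
    const extender = {
        onResourcesWrite(pageName) {
            writer.removeExtender(extender);
            callback(pageName);
        },
    };
    writer.addExtender(extender);
}

const writer = new Writer();
const calls = [];
once(writer, (pageName) => calls.push(pageName));
writer.writeResources('first');
writer.writeResources('second');
console.log(calls); // only the first notification reached the callback
```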

@devilsclaw
Author

I am not sure, but would this work?

  if(handles.acroformDict->Exists("DR")) { //maybe
    // copy all but the keys that exist already
    PDFObjectCastPtr<PDFDictionary> dr = handles.reader.QueryDictionaryObject(handles.acroformDict.GetPtr(), "DR");

    MapIterator<PDFNameToPDFObjectMap> it = dr->GetIterator();
    RefCountPtr<PDFName> key;
    PDFObject* value;
    DictionaryContext* page_out_dic = handles.writer.GetObjectsContext().StartDictionary();
    while(it.MoveNext()) {
      key = it.GetKey();
      value = it.GetValue();
      if(key->GetValue() != "ProcSet" && (textOptions == NULL || key->GetValue() != "Font")) {
        page_out_dic->WriteKey(key->GetValue());
        handles.copyingContext->CopyDirectObjectAsIs(value);
      }
    }
  }

@devilsclaw
Author

devilsclaw commented Aug 24, 2024

Here is the C++ port, which is about 95 to 99% complete. It fills out the test PDF to the same extent as the original JS example does. It also worked on my PDF. Thanks for the pointer.

https://github.com/devilsclaw/pdf_form_fill

https://github.com/devilsclaw/pdf_form_fill/blob/main/pdf_form_fill.h

@galkahana
Owner

awesome :) glad it worked out.

@devilsclaw
Author

devilsclaw commented Aug 26, 2024

I also made a pdf_info tool that parses the whole PDF and prints all elements in human-readable form, other than input streams, which are printed in hex notation since they can store anything. This would have been really helpful for me originally, so I made it available as well.
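
As a rough illustration of the hex notation for stream bytes, here is a minimal sketch. The `toHex` helper is a hypothetical name and is not taken from pdf_info; it only shows one common way to render arbitrary bytes as two-digit hex values.

```javascript
// Render each byte as a two-digit lowercase hex value, separated by spaces.
function toHex(bytes) {
    return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// The ASCII bytes for "%PDF", followed by a non-printable byte.
const sample = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x0a]);
console.log(toHex(sample));
```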

NOTE: Indirect objects are printed at the end of each page, since they can point to each other in an infinite loop, so they are handled differently.

A small clipped sample is below; even a small PDF is pages long, so I have not put the whole parse here.
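
One way to keep such a dump from looping forever is to record which indirect objects have already been expanded. The sketch below is a plain-JavaScript model, not the pdf_info implementation (which defers indirect objects to the end of the page instead): the `{ ref: n }` shape, the `table` map, and the `dump` helper are assumptions made for illustration.

```javascript
// Hypothetical model: an indirect reference is { ref: n }.
// Objects 1 and 2 point at each other, as a page and its annotation do.
const table = new Map([
    [1, { Type: 'Page', Annots: [{ ref: 2 }] }],
    [2, { Subtype: 'Widget', P: { ref: 1 } }],
]);

function isRef(value) {
    return typeof value === 'object' && value !== null && typeof value.ref === 'number';
}

// Returns indented lines; each indirect object is expanded at most once.
function dump(value, store, seen = new Set(), depth = 0, lines = []) {
    const pad = '  '.repeat(depth);
    if (isRef(value)) {
        if (seen.has(value.ref)) {
            lines.push(pad + 'ref ' + value.ref + ' (already printed)');
            return lines;
        }
        seen.add(value.ref);
        lines.push(pad + 'ref ' + value.ref);
        return dump(store.get(value.ref), store, seen, depth + 1, lines);
    }
    if (Array.isArray(value)) {
        lines.push(pad + 'array');
        value.forEach((item) => dump(item, store, seen, depth + 1, lines));
        return lines;
    }
    if (typeof value === 'object' && value !== null) {
        lines.push(pad + 'dictionary');
        for (const key of Object.keys(value)) {
            lines.push(pad + '  key = ' + key);
            dump(value[key], store, seen, depth + 2, lines);
        }
        return lines;
    }
    lines.push(pad + 'value = ' + String(value));
    return lines;
}

const output = dump({ ref: 1 }, table);
console.log(output.join('\n'));
```

The `seen` set is shared across the whole traversal, so the back-reference from object 2 to object 1 is reported but not followed again.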

PDF Header level = 1.400000
Number of objects in PDF = 63
Number of pages in PDF = 1

// Showing info for Page 0 //////////////////////////////////////////////////////////
Showing info for page 0:
ePDFObjectDictionary: 
  key = Annots
    ePDFObjectArray: 
      ePDFObjectDictionary: 
        key = AP
          ePDFObjectDictionary: 
            key = N
              ePDFObjectIndirectObjectReference: value = 38
        key = DA
          ePDFObjectLiteralString: value = 0 0 0 rg /F3 11 Tf
        key = DR
          ePDFObjectDictionary: 
            key = Font
              ePDFObjectIndirectObjectReference: value = 6
        key = DV
          ePDFObjectHexString: value =
        key = F
          ePDFObjectInteger: value = 4
        key = FT
          ePDFObjectName: value = Tx
        key = MaxLen
          ePDFObjectInteger: value = 40
        key = P
          ePDFObjectIndirectObjectReference: value = 1
        key = Rect
          ePDFObjectArray: 
            ePDFObjectReal: value = 165.700000
            ePDFObjectReal: value = 453.700000
            ePDFObjectReal: value = 315.700000
            ePDFObjectReal: value = 467.900000
        key = Subtype
          ePDFObjectName: value = Widget
        key = T
          ePDFObjectLiteralString: value = Given Name Text Box
        key = TU
          ePDFObjectHexString: value = First name
        key = Type
          ePDFObjectName: value = Annot
        key = V
          ePDFObjectHexString: value =

Indirect example

ePDFObjectIndirectObjectReference: Start : value = 35
  ePDFObjectDictionary: 
    key = Ascent
      ePDFObjectInteger: value = 905
    key = CapHeight
      ePDFObjectInteger: value = 1005
    key = Descent
      ePDFObjectInteger: value = -211
    key = Flags
      ePDFObjectInteger: value = 4
    key = FontBBox
      ePDFObjectArray: 
        ePDFObjectInteger: value = -664
        ePDFObjectInteger: value = -324
        ePDFObjectInteger: value = 2000
        ePDFObjectInteger: value = 1006
    key = FontName
      ePDFObjectName: value = ArialMT
    key = ItalicAngle
      ePDFObjectInteger: value = 0
    key = StemV
      ePDFObjectInteger: value = 80
    key = Type
      ePDFObjectName: value = FontDescriptor
ePDFObjectIndirectObjectReference: End   : value = 35

https://github.com/devilsclaw/pdf_info/

https://github.com/devilsclaw/pdf_info/blob/main/pdf_info.cpp


2 participants