PDFString supports only one-byte characters #1649

Open
2 tasks done
m-kemarskyi opened this issue Jul 4, 2024 · 2 comments

Comments

@m-kemarskyi

What were you trying to do?

I was trying to add a comment containing Cyrillic letters to a PDF.

How did you attempt to do it?

const commentAnnotRef = this.pdfDocument.context.register(
  this.pdfDocument.context.obj({
    Type: 'Annot',
    Subtype: 'Text',
    Open: true,
    Name: 'Comment', // Determines the icon to place in the document
    T: PDFString.of('abc абві äüöß'), // Comment title
    Contents: PDFString.of('abc абві äüöß'), // Comment main text
    // The position of the annotation
    Rect: [
      xCoordinate,
      pageHeight - yCoordinate,
      xCoordinate,
      pageHeight - yCoordinate,
    ],
  })
)

What actually happened?

It turned out that only one byte per character is used under the hood (see the result in the screenshot).
Screenshot 2024-07-04 at 13 45 17
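
A minimal sketch of how this kind of corruption can arise, assuming the writer stores each `charCodeAt` value straight into a `Uint8Array` (an assumption about the behaviour, not taken from the pdf-lib source). The helpers `writeOneBytePerChar` and `readAsLatin1` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: mimics a writer that stores one byte per UTF-16 code unit.
function writeOneBytePerChar(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    // A Uint8Array element keeps only the low 8 bits of the code unit,
    // so U+0430 (Cyrillic "а") is stored as 0x30.
    bytes[i] = text.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: reads the bytes back as one character per byte.
function readAsLatin1(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const roundTripped = readAsLatin1(writeOneBytePerChar('abc абві äüöß'));
console.log(roundTripped); // prints "abc 012V äüöß"
```

Under that assumption, characters up to U+00FF (such as `äüöß`) survive, while the Cyrillic letters lose their high byte and turn into unrelated ASCII characters.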

What did you expect to happen?

I expected Unicode characters, such as the Cyrillic letters above, to be written correctly.

How can we reproduce the issue?

Add the comment to a PDF file using the code I've provided.

Version

1.17.1

What environment are you running pdf-lib in?

Node

Checklist

  • My report includes a Short, Self Contained, Correct (Compilable) Example.
  • I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

No response

@m-kemarskyi
Author

I tried to come up with a custom PDFUnicodeString class, but it didn't work out:

export class PDFUnicodeString extends PDFObject {
  // The PDF spec allows newlines and parens to appear directly within a literal
  // string. These character _may_ be escaped. But they do not _have_ to be. So
  // for simplicity, we will not bother escaping them.
  static of = (value: string) => new PDFUnicodeString(value);

  private readonly value: string;

  private constructor(value: string) {
    super();
    this.value = value;
  }

  asBytes(): Uint8Array {
    return new TextEncoder().encode(this.value)
  }

  asString(): string {
    return this.value;
  }

  clone(): PDFUnicodeString {
    return PDFUnicodeString.of(this.value);
  }

  toString(): string {
    return `(${this.value})`;
  }

  sizeInBytes(): number {
    return new TextEncoder().encode(this.value).length + 2;
  }

  copyBytesInto(buffer: Uint8Array, offset: number): number {
    buffer[offset++] = 40;
    const encodedValue = new TextEncoder().encode(this.value);
    buffer.set(encodedValue, offset);
    offset += encodedValue.length;
    buffer[offset++] = 41;
    
    return encodedValue.length + 2;
  }
}
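
A likely reason this attempt fails: a PDF text string is normally read as either PDFDocEncoding or UTF-16BE with a leading `FE FF` byte-order mark, so raw UTF-8 bytes without any marker tend to be shown as one character per byte. The sketch below only prints the bytes that `TextEncoder` produces; `toHexBytes` is a hypothetical helper.

```typescript
// Hypothetical helper: formats bytes as space-separated lowercase hex pairs.
function toHexBytes(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push(('0' + bytes[i].toString(16)).slice(-2));
  }
  return parts.join(' ');
}

// Cyrillic "а" (U+0430) becomes two UTF-8 bytes, with no byte-order mark in front.
const utf8Bytes = new TextEncoder().encode('а');
console.log(toHexBytes(utf8Bytes)); // prints "d0 b0"
```

A viewer that treats these bytes as a one-byte encoding would display two unrelated characters instead of one Cyrillic letter.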

@m-kemarskyi
Author

Update: the PDFHexString class solves the problem: PDFHexString.fromText(YOUR_TEXT)
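
For readers wondering why the hex string works: as far as I can tell, `PDFHexString.fromText` encodes the text as UTF-16BE with a byte-order mark and writes it as a hex string, which PDF viewers recognise as Unicode. The sketch below reproduces that encoding by hand so the result can be inspected; `toUtf16BeHex` is a hypothetical helper, not pdf-lib's own code.

```typescript
// Hypothetical helper: hex-encodes text as UTF-16BE with a leading FEFF byte-order mark.
function toUtf16BeHex(text: string): string {
  let hex = 'FEFF';
  for (let i = 0; i < text.length; i++) {
    // charCodeAt yields UTF-16 code units, so surrogate pairs pass through unchanged.
    const unit = text.charCodeAt(i).toString(16).toUpperCase();
    hex += ('0000' + unit).slice(-4);
  }
  return hex;
}

// In a PDF file, a hex string is written between angle brackets.
const hexLiteral = '<' + toUtf16BeHex('aб') + '>';
console.log(hexLiteral); // prints "<FEFF00610431>"
```

In the original snippet, this means replacing `PDFString.of(...)` with `PDFHexString.fromText(...)` for both the `T` and `Contents` entries.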
