RTL Support (Arabic, Hebrew...) #330

Open
thmclellan opened this issue Sep 26, 2023 · 9 comments

Comments

@thmclellan
Contributor

Thanks for the excellent work in maintaining and improving this library!

I wondered if adding support for Right-To-Left languages including Arabic and Hebrew is on your roadmap or an area where you'd consider a PR.

Related links:

Reading into issue 56 above, it sounds like one key consideration was which library to use (and the related licensing). I wondered if you had any preferences or a suggested approach. I'm in fact-finding mode and just trying to get an idea of the right approach and the magnitude of the work before experimenting with a fork or PR. Thanks

@julianhille
Owner

I'd start with the HTML-to-PDF part and test whether it handles RTL, as I have no idea if it takes that into account.

@thmclellan
Contributor Author

Thanks, I don't think hummus-recipe ever added RTL support (chunyenHuang/hummusRecipe#108) but it's worth a try, maybe it gets handled through a lower-level library.... I'll give it a try.

@thmclellan
Contributor Author

I experimented with the ModifyExistingPageContent.js test to see if I could add some Hebrew text onto an existing PDF with the writeText() function. I used an arial-unicode font that included Hebrew characters.

The characters came out in reverse order when rendered on the PDF. (Copy of the test below with screenshot).

As a next step I'm going to experiment with using a BIDI library to see if the text can be modified before passing it to writeText(), possibly https://www.npmjs.com/package/bidi-js. I'll keep you posted on how it goes, or if you have a preferred approach let me know. Thanks

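// Assumes the published npm package; inside the repository's own tests
// directory the require path would point at the local build instead.
const muhammara = require('muhammara');

describe('ModifyExistingPageContent', function () {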
  it('should complete without error', function () {
    var pdfWriter = muhammara.createWriterToModify(__dirname + '/TestMaterials/BasicJPGImagesTest.PDF', {
      modifiedFilePath: __dirname + '/output/BasicJPGImagesTestPageModified2.pdf',
    });

    var pageModifier = new muhammara.PDFPageModifier(pdfWriter, 0);
    // const content = `The quick brown fox jumped over the lazy dog`;
    const content = `השועל החום המהיר קפץ מעל הכלב העצלן`;
    pageModifier
      .startContext()
      .getContext()
      .writeText(content, 75, 805, {
        font: pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/arial-unicode.ttf'),
        size: 18,
        colorspace: 'gray',
        color: 0x00,
      });

    pageModifier.endContext().writePage();
    pdfWriter.end();
  });
});

Rendered on PDF in reverse:

[Screenshot: the Hebrew text rendered on the PDF with its characters in reverse order]

@julianhille
Owner

Would you mind trying a freetype font having a draw direction? I guess it would work.

@thmclellan
Contributor Author

Thanks but I didn't quite understand. Are you suggesting to render the text using a specialized font that assumes we're writing in LTR mode and have reversed the characters of the string before rendering?

It would be nice to handle it at the font / text level. It doesn't seem like the PDF spec has any inherent support for RTL writing. The writeText function (using Tm() and Tj()) doesn't support a draw direction, so I guess the X coordinate would be the end / left-most position of the text.

The previous discussion (galkahana/HummusJS#56 (comment)) looks at using Arabic Presentation Forms B and a mapping script like https://github.com/NaurozAhmad/Arabic-Urdu-Converter-From-and-To-Presentation-Forms-B.

For Hebrew I wonder if it would work to just reverse the string characters and use the X position as the ending point for the text.
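The reversal idea above could be sketched roughly as follows. This is only an illustration under assumptions: `reverseGraphemes` and `startXForRightEdge` are hypothetical helper names, not part of MuhammaraJS, and the text width is passed in as a plain number (in practice it would have to come from whatever text measurement the font object offers, which is not verified here). Reversing by grapheme cluster rather than by character keeps Hebrew vowel points attached to their base letters.

```javascript
// Hypothetical helpers, not part of MuhammaraJS.

// Reverse a string by grapheme cluster so that combining marks
// (for example Hebrew niqqud) stay attached to their base letter.
// Requires a runtime with Intl.Segmenter (Node 16 or later).
function reverseGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  const clusters = Array.from(segmenter.segment(text), (s) => s.segment);
  return clusters.reverse().join('');
}

// writeText() draws left to right from the given x, so to make the text
// end at a chosen right edge, start drawing at (right edge - text width).
// textWidth is assumed to be measured elsewhere, in the same units as x.
function startXForRightEdge(rightEdgeX, textWidth) {
  return rightEdgeX - textWidth;
}

// Example: alef, bet, gimel in logical order become gimel, bet, alef.
const visual = reverseGraphemes('\u05D0\u05D1\u05D2');
const x = startXForRightEdge(500, 120);
console.log(visual, x);
```

The reversed string and the computed x would then be passed to writeText() in place of the original content and the fixed 75.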

@julianhille
Owner

Freetype fonts might have a draw direction, as far as I understood the C code for freetype fonts.
I wondered whether the libs use this direction for drawing. That's why I suggested trying a specific RTL-only freetype font.

@thmclellan
Contributor Author

Okay interesting, I looked for RTL only freetype-compatible fonts but didn't find any clear options.

Maybe best for now just to assume that RTL handling should be done before calling MuhammaraJS writeText(). My main use case is Hebrew, which we can probably handle by reversing the text and using a unicode font. It seems like there's an extra level of effort to render Arabic in PDF.

Feel free to close this issue... thanks again for the help in troubleshooting!
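One thing a pre-processing step like this would probably need to handle is mixed content: numbers and Latin words embedded in Hebrew should keep their left-to-right order after the reversal. A rough sketch of that idea is below. It is a heuristic only, not the Unicode Bidirectional Algorithm, and `toVisualOrder` is a hypothetical name. It reverses by code point, so it ignores combining marks, and it treats any run of ASCII letters and digits (optionally joined by spaces or common separators) as one left-to-right run. A BIDI library such as the bidi-js package mentioned earlier would cover far more cases.

```javascript
// Hypothetical helper: a heuristic, NOT the Unicode Bidirectional Algorithm.
function toVisualOrder(text) {
  // Reverse by code point so surrogate pairs are not split.
  const reverse = (s) => Array.from(s).reverse().join('');

  // Runs of ASCII letters/digits, optionally joined by spaces or
  // common separators, are treated as left-to-right runs.
  const ltrRun = /[A-Za-z0-9]+(?:[ .,:\/-]+[A-Za-z0-9]+)*/g;

  // Reverse everything, then flip each LTR run back to its original order.
  return reverse(text).replace(ltrRun, (run) => reverse(run));
}

// Logical order: alef, bet, space, "123".
// Visual order for left-to-right drawing: "123", space, bet, alef.
console.log(toVisualOrder('\u05D0\u05D1 123'));
```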

@julianhille
Owner

I'm not really familiar with RTL. Are there cases where reversing does not work?

@thmclellan
Contributor Author

I guess in Arabic the reversing method can cause issues where the characters aren't joining together (like cursive writing). It sounds like the reversal approach is generally okay for Hebrew. Some more detail at galkahana/HummusJS#56 (comment).

Just reading through past PR efforts on this with PDF-Writer and it looks like Gal outlined a special build approach (galkahana/PDF-Writer#65 (comment)) for RTL, presumably due to licensing issues.
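Since reversal seems acceptable for Hebrew but not for Arabic, a caller could check which script a string contains before deciding how to pre-process it. A minimal sketch, assuming a runtime that supports Unicode property escapes in regular expressions (Node 10 or later). `rtlStrategy` and its return values are hypothetical names, and other RTL scripts such as Syriac or Thaana are not considered.

```javascript
// Hypothetical helper: picks a pre-processing strategy by script.
function rtlStrategy(text) {
  // Arabic letters join like cursive writing, so plain reversal is not
  // enough: the contextual letter forms would need shaping first.
  if (/\p{Script=Arabic}/u.test(text)) return 'needs-shaping';

  // Hebrew letters do not join, so reversing the string is a workable start.
  if (/\p{Script=Hebrew}/u.test(text)) return 'reverse';

  // Anything else is passed through unchanged.
  return 'ltr';
}

console.log(rtlStrategy('\u05E9\u05DC\u05D5\u05DD')); // Hebrew input
console.log(rtlStrategy('\u0633\u0644\u0627\u0645')); // Arabic input
```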
