Performance: glyphnames::name_to_unicode is very slow #34

Open
badicsalex opened this issue Feb 27, 2022 · 4 comments

Comments

@badicsalex

badicsalex commented Feb 27, 2022

I'm parsing a very accent-heavy Hungarian document, and 95% of the processing time is spent in name_to_unicode.

Please consider using a HashMap or, even better, a compile-time perfect hash function. Example patch here:
badicsalex@5cb9b67
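
To illustrate the idea, here is a minimal sketch of replacing a linear table scan with a lazily built HashMap. It is not the linked patch and not the crate's actual code: the GLYPH_NAMES excerpt, the function names, and the Option<u16> return type are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical excerpt of a glyph-name table; a real table has thousands of entries.
static GLYPH_NAMES: &[(&str, u16)] = &[
    ("A", 0x0041),
    ("aacute", 0x00E1),
    ("eacute", 0x00E9),
    ("odieresis", 0x00F6),
    ("ohungarumlaut", 0x0151),
    ("udieresis", 0x00FC),
    ("uhungarumlaut", 0x0171),
];

// Linear scan: every lookup walks the table until it finds a match, O(n) per call.
fn name_to_unicode_linear(name: &str) -> Option<u16> {
    GLYPH_NAMES
        .iter()
        .find(|entry| entry.0 == name)
        .map(|entry| entry.1)
}

// Hashed lookup: the map is built once on first use, then each call is O(1) on average.
fn name_to_unicode_hashed(name: &str) -> Option<u16> {
    static MAP: OnceLock<HashMap<&'static str, u16>> = OnceLock::new();
    MAP.get_or_init(|| GLYPH_NAMES.iter().copied().collect())
        .get(name)
        .copied()
}
```

Both functions return the same results; only the cost per lookup differs. A compile-time perfect hash would remove the one-time map construction as well, at the price of an extra build dependency.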

@badicsalex
Author

BTW, there are a lot of duplicate entries in the map.
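
For anyone who wants to check this on their own copy of the table, a small helper along these lines could list the repeated names. This is only a sketch: the (&str, u16) entry layout and the name duplicate_names are assumptions, not the crate's definitions.

```rust
use std::collections::HashMap;

/// Returns every name that occurs more than once, with its occurrence count,
/// sorted by name. The entry layout `(&str, u16)` is an assumed table shape.
fn duplicate_names<'a>(table: &[(&'a str, u16)]) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for (name, _) in table {
        *counts.entry(*name).or_insert(0) += 1;
    }
    // Keep only names seen at least twice, then sort for stable output.
    let mut dups: Vec<(&'a str, usize)> =
        counts.into_iter().filter(|(_, n)| *n > 1).collect();
    dups.sort();
    dups
}
```

With a linear scan, duplicates only make the table longer. A perfect hash generator generally needs unique keys, so they would have to be removed before switching to one.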

@Grant-Brinkman

Can confirm that the suggested fix vastly improves speed. For a sample of 60 PDFs, most of them several pages long, simply extracting the text took 27 seconds.

After implementing this fix and the one suggested in issue 33, my run time went down to 1 second. Highly recommend.

@jrmuizel
Owner

Is it possible for you to share that or a similar document?

@badicsalex
Author

badicsalex commented Mar 20, 2022

Here's a long one that's pretty good for benchmarking:
http://www.kozlonyok.hu/nkonline/MKPDF/hiteles/MK13031.pdf

And a short one to benchmark font parsing:
http://www.kozlonyok.hu/nkonline/MKPDF/hiteles/MK20058.pdf
