Performance: glyphnames::name_to_unicode is very slow #34

badicsalex · 2022-02-27T20:20:36Z

I have a very accent-heavy Hungarian document that I'm parsing, and 95% of the processing time was spent in name_to_unicode.

Please consider using a HashMap or, even better, a compile-time perfect hash function. Example patch here:
badicsalex@5cb9b67
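A minimal sketch of the HashMap approach follows. It assumes that name_to_unicode currently scans a static slice of (glyph name, code point) pairs linearly. `GLYPH_TABLE` is a hypothetical three-entry stand-in for the real glyph list, and the `u16` return type is also an assumption. The map is built once, on first use, via `std::sync::OnceLock`. After that, each lookup is an average O(1) hash probe instead of a scan over the whole table.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the crate's static glyph table:
// (Adobe glyph name, Unicode code point). The real table is far larger.
static GLYPH_TABLE: &[(&str, u16)] = &[
    ("A", 0x0041),
    ("aacute", 0x00E1),
    ("odblacute", 0x0151),
];

/// Linear-scan lookup, assumed to resemble the slow version:
/// every call walks the table until it finds a match.
pub fn name_to_unicode_linear(name: &str) -> Option<u16> {
    GLYPH_TABLE
        .iter()
        .find(|&&(n, _)| n == name)
        .map(|&(_, code)| code)
}

/// HashMap-backed lookup: the map is built once, on the first call,
/// and every later call is an average O(1) hash probe.
pub fn name_to_unicode(name: &str) -> Option<u16> {
    static MAP: OnceLock<HashMap<&'static str, u16>> = OnceLock::new();
    MAP.get_or_init(|| GLYPH_TABLE.iter().copied().collect())
        .get(name)
        .copied()
}

fn main() {
    // Both lookups return the same results; only their cost differs.
    for name in ["A", "aacute", "odblacute", "notaglyph"] {
        assert_eq!(name_to_unicode(name), name_to_unicode_linear(name));
    }
    println!("{:?}", name_to_unicode("odblacute"));
}
```

A compile-time perfect hash, for example one generated by the `phf` crate's `phf_map!` macro, would also remove the one-time construction cost. That variant is not shown here.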


badicsalex · 2022-02-27T20:58:15Z

BTW, there are a lot of duplicate entries in the map.
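One way to list the duplicates is to group code points by name and report any name that occurs more than once. This matters when switching to a HashMap: `HashMap::insert` keeps the last value written for a key. If a duplicated name maps to different code points, collecting the table into a map therefore silently picks one of them. The sketch below uses a hypothetical table with invented repeats, not the crate's real data.

```rust
use std::collections::HashMap;

// Hypothetical table with deliberate repeats, standing in for the real data.
static GLYPH_TABLE: &[(&str, u16)] = &[
    ("A", 0x0041),
    ("aacute", 0x00E1),
    ("A", 0x0041),     // exact duplicate
    ("space", 0x0020),
    ("space", 0x00A0), // invented conflict: same name, different code point
];

/// Returns every name that appears more than once, together with all the
/// code points it maps to (in table order), sorted by name.
fn find_duplicates(table: &[(&'static str, u16)]) -> Vec<(&'static str, Vec<u16>)> {
    let mut seen: HashMap<&'static str, Vec<u16>> = HashMap::new();
    for &(name, code) in table {
        seen.entry(name).or_default().push(code);
    }
    let mut dups: Vec<_> = seen
        .into_iter()
        .filter(|(_, codes)| codes.len() > 1)
        .collect();
    dups.sort_by_key(|&(name, _)| name);
    dups
}

fn main() {
    for (name, codes) in find_duplicates(GLYPH_TABLE) {
        let pretty: Vec<String> = codes.iter().map(|c| format!("U+{c:04X}")).collect();
        println!("{name}: {}", pretty.join(", "));
    }
}
```

Exact duplicates can be dropped safely. Conflicting ones need a deliberate choice of which code point wins.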

Grant-Brinkman · 2022-03-18T16:42:03Z

Can confirm that the suggested fix vastly improves speed. On a sample of 60 PDFs, most of them multiple pages long, simply extracting the text took 27 seconds.

After applying this fix and the one suggested in issue #33, the run time went down to 1 second. Highly recommend.

jrmuizel · 2022-03-18T20:46:16Z

Is it possible for you to share that or a similar document?

badicsalex · 2022-03-20T10:01:34Z

Here's a long one that's pretty good for benchmarking:
http://www.kozlonyok.hu/nkonline/MKPDF/hiteles/MK13031.pdf

And a short one to benchmark font parsing:
http://www.kozlonyok.hu/nkonline/MKPDF/hiteles/MK20058.pdf
