Sanity Check - Unicode Mismatch #61
Comments
Can you share a PDF that this happens with?
Sending over some example errors I'm running into along with the files @jrmuizel
Backtrace for above:
Backtrace:
qt7vq3z6v1_noSplash_11cf93c4e513781acd1abae3cbe4e90d.pdf
Backtrace
Hi everyone, I am getting a unicode mismatch error too. It causes the entire program to quit via a panic. Is it possible to figure out beforehand whether a PDF is suitable for this crate, or to return an error instead of panicking, so that we can keep processing the rest of the files?
@sagarp-patel it would probably be good to share the PDF file as well.
@piotroxp I think I found a solution that works for now. You can handle the panics using std::panic::catch_unwind. This also keeps the data that has already been processed from the PDF so far. So just do a
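The snippet that followed is not preserved here. As a rough sketch of the catch_unwind approach described above (not the commenter's original code), the example below wraps a panicking call and turns the panic into an Err so a batch can continue. The names `try_extract`, `fake_extract`, and `process_all` are hypothetical; `fake_extract` only stands in for whichever extraction call panics in your program.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `extract` on `path`; a panic inside it becomes `Err(message)`.
fn try_extract<F>(extract: F, path: &str) -> Result<String, String>
where
    F: Fn(&str) -> String,
{
    // AssertUnwindSafe: we promise not to rely on state the closure may have
    // left half-modified when it panicked.
    panic::catch_unwind(AssertUnwindSafe(|| extract(path))).map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

/// Hypothetical stand-in for the real extraction call (assumption, not the crate's API).
fn fake_extract(path: &str) -> String {
    if path.contains("bad") {
        panic!("unicode mismatch in {}", path);
    }
    format!("text of {}", path)
}

/// Processes every path, collecting successes and failures separately.
fn process_all(paths: &[&str]) -> (Vec<String>, Vec<(String, String)>) {
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for path in paths {
        match try_extract(fake_extract, path) {
            Ok(text) => ok.push(text),
            Err(msg) => failed.push((path.to_string(), msg)),
        }
    }
    (ok, failed)
}
```

Two caveats: catch_unwind only catches unwinding panics, so it does nothing if the binary is built with panic = "abort", and the default panic hook still prints the panic message to stderr even when the panic is caught.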
I get a similar error with this PDF: https://www3.weforum.org/docs/WEF_Future_of_Jobs_2020.pdf
Tried
I have created a PDF search application that scours your folders in search of documents and allows you to find keywords in the document.
At first, I was not using this crate, but at some point it turned out that my app was not finding the right wording in the PDFs. https://github.com/piotroxp/pdfscan
I am learning Rust while solving a real-life need, which is going through terabytes of scientific PDF articles and finding keywords in them.
Since I want to build a warp drive xD and have a very admirable cache of papers, you can understand that it's critical for me to read all files regardless of encoding.
Today marks about 4 hours spent on looking at this error:
For some PDF docs, it works. For others, mainly those downloaded from popular scientific publishers, I am hit with that log.
My repo is attached just so you can understand what I want to achieve.
Where is the issue? I am new to Rust. I'm pretty sure that Rust, being a systems programming language, has PDF libraries that work regardless of encoding, though I may be wrong about that.
How can I fix my code? Ideally, I would like to read in the raw bytes and only then transform that representation to UTF-8. Right now, I am unable to search through scientific papers.
I created this ticket because I find it amusing and mentally challenging to understand what I am doing wrong. Unless you are doing something wrong, which is also a learning experience.
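For the "read raw bytes first, then convert to UTF-8" idea, a minimal sketch using only the standard library could look like the following. The helper names `bytes_to_text_lossy` and `contains_keyword` are hypothetical. Note the limitation: text inside a PDF usually sits in compressed content streams and uses font-specific encodings, so a lossy decode of the raw file bytes will not by itself recover the readable text. This only shows the byte-to-UTF-8 step without a failure on invalid sequences.

```rust
/// Decodes raw bytes as UTF-8, replacing invalid sequences with U+FFFD
/// instead of returning an error or panicking.
fn bytes_to_text_lossy(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).into_owned()
}

/// Case-insensitive keyword search over lossily decoded bytes.
fn contains_keyword(raw: &[u8], keyword: &str) -> bool {
    bytes_to_text_lossy(raw)
        .to_lowercase()
        .contains(&keyword.to_lowercase())
}
```

The bytes could come from std::fs::read(path), which returns the file contents as a Vec<u8> without assuming any text encoding.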