Not able to read in papers with Corpus() #1

Open
swood-ecology opened this issue Jul 6, 2017 · 2 comments

@swood-ecology

When I run the following command in R from your tutorial:

papers <- Corpus(URISource(pdf), readerControl = list(reader=pdfRead))

I get the following error

sh: pdfinfo: command not found
sh: pdftotext: command not found
Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout = TRUE) :
error in running command

My pdf object looks like this:

pdf
[1] "arees-informatics-2006-reprint.pdf"
[2] "Borer et al 2009 Bull ESA_Effective Data Management.pdf"
[3] "Fegraus-esa_bulletin_eml_ms_07_2005.pdf"
[4] "Harris_2017_Environ._Res._Lett._12_024012.pdf"
[5] "Heidorn_2008_Shedding Light on the Dark Data in the Long Tail of Science.pdf"
[6] "MORTON_et_al-2008-Global_Change_Biology.pdf"
[7] "Ohara et al 2016_Aligning marine species range data to better serve science and conservation.pdf"
[8] "peerj-preprints-549.pdf"

and my readPDF function object looks like this:

pdfRead
function (elem, language, id)
{
    uri <- processURI(elem$uri)
    meta <- pdf_info(uri)
    content <- pdf_text(uri)
    PlainTextDocument(content, meta$Author, meta$CreationDate,
        meta$Subject, meta$Title, basename(elem$uri), language,
        meta$Creator)
}
<environment: 0x110bd22a0>

@brunj7
Member

brunj7 commented Jul 6, 2017

Hi Steve,

It seems that pdftotext is not installed on your machine. The best way to test this is to type the following in a terminal:

pdftotext

If you get an error, the tool is not installed. You can try following the instructions here: http://www.foolabs.com/xpdf/download.html

If you get a description of the tool instead, then the problem is something else. Let me know.
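
The same check can also be run from inside R, which may be worth doing because an R session (especially one started from a GUI) does not necessarily see the same PATH as your terminal. Below is a minimal sketch using base R's Sys.which(); the helper name find_missing_tools is made up for illustration and is not part of tm or the tutorial.

```r
# Hypothetical helper (not from tm or the tutorial): return the names of
# command-line tools that cannot be found on the PATH visible to this R session.
find_missing_tools <- function(cmds) {
  paths <- Sys.which(cmds)                 # named vector; "" when not found
  not_found <- is.na(paths) | !nzchar(paths)
  cmds[not_found]
}

# The two tools named in the error message above.
missing <- find_missing_tools(c("pdfinfo", "pdftotext"))

if (length(missing) > 0) {
  message("Not found on R's PATH: ", paste(missing, collapse = ", "))
} else {
  message("Both tools are visible to R.")
}
```

If the tools show up in the terminal but are reported missing here, the difference is probably in the PATH that R was started with rather than in the installation itself.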

@swood-ecology
Author

swood-ecology commented Jul 6, 2017 via email
