search Google sheets? #30

Open
raynamharris opened this issue Aug 14, 2018 · 5 comments
Labels
enhancement Feature request new type Issues requiring new types of documents/item in the search index

Comments

@raynamharris
Contributor

As far as I can tell, the search function is not searching Google Sheets. I'm not sure if this is known or not. The search engine says that it's indexing them, but it never returns a sheet that I know contains the keyword.

@charlesreid1
Contributor

Correct, that's because Google Drive files are put into two buckets:

  • Documents (.docx files) where content is indexed
  • Everything else (including spreadsheets) where only the filename is indexed

Because spreadsheets are structured, it's definitely possible to extract text from them. I just didn't want to delve too deep into the second bullet point above in the interest of getting a prototype working.
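
A minimal sketch of the two-bucket behavior described above, to show why a keyword that only appears inside a sheet never matches. The function name `index_fields`, the `extract_text` callback, and the field names are hypothetical and are not taken from the project's code; the MIME string is the standard one for .docx files.

```python
# Hypothetical routing of Drive files into the two buckets; names are illustrative only.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Bucket 1: file types whose content is extracted and indexed.
CONTENT_INDEXED = {DOCX_MIME}


def index_fields(name, mime_type, extract_text=None):
    """Return the fields that would be stored in the search index for one file."""
    fields = {"name": name, "content": ""}
    # Bucket 2 (everything else, including spreadsheets) keeps an empty content field,
    # so only the filename can match a query.
    if mime_type in CONTENT_INDEXED and extract_text is not None:
        fields["content"] = extract_text()
    return fields
```

Under this sketch, supporting spreadsheets would mean adding their MIME type to `CONTENT_INDEXED` and supplying a suitable extractor.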

@charlesreid1 charlesreid1 added enhancement Feature request new type Issues requiring new types of documents/item in the search index labels Aug 14, 2018
raynamharris added a commit that referenced this issue Aug 14, 2018
add "(.docx files)" as per issue #30
@raynamharris
Contributor Author

Ah, okay. Maybe I'm the only one, but I see "document" and think of any kind of file in Google Drive... kinda like how you can ask someone "Do you want a coke?" and they can say, "Yes, I'll have a Dr Pepper/Pepsi/Coca-Cola/Sprite," because "coke" and "document" are blanket terms.

@ctb

ctb commented Aug 14, 2018 via email

@charlesreid1
Contributor

The xlrd library would provide a (possible?) way to extract text from spreadsheet files. We would probably want to do additional processing, since the raw output would all be word soup and has high potential to become an irrelevant result in lots of searches.
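
A rough sketch of what the xlrd route might look like, assuming the sheet has first been exported to a legacy .xls file (xlrd 2.x reads only that format). The helper names `cells_to_text` and `xls_to_text` are made up for illustration, and the post-processing shown (keep text cells only, drop case-insensitive repeats) is just one possible way to cut down the word soup.

```python
def cells_to_text(rows):
    """Flatten rows of cell values into one string of unique text cells."""
    seen = set()
    kept = []
    for row in rows:
        for cell in row:
            # Skip numbers, booleans and other non-text values: only text is searchable.
            if not isinstance(cell, str):
                continue
            value = " ".join(cell.split())  # collapse internal whitespace
            key = value.lower()
            if not value or key in seen:
                continue  # drop empty cells and case-insensitive repeats
            seen.add(key)
            kept.append(value)
    return " ".join(kept)


def xls_to_text(path):
    """Read every sheet of an .xls workbook and return its text cells."""
    import xlrd  # third-party; imported lazily so cells_to_text works without it

    book = xlrd.open_workbook(path)
    rows = []
    for sheet in book.sheets():
        rows.append([sheet.name])  # sheet names are often good search terms
        rows.extend(sheet.row_values(i) for i in range(sheet.nrows))
    return cells_to_text(rows)
```

Deduplicating keeps a column of repeated labels from dominating the indexed text, at the cost of losing any term-frequency signal.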

The pptx library provides the ability to navigate PowerPoint files and extract text: https://stackoverflow.com/a/39430554 (the same post-processing caveat applies, see above)
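
A possible sketch of the PowerPoint case, assuming the python-pptx package (imported as `pptx`). The function names are hypothetical. The traversal is kept separate from file loading so it can be tried on any object exposing `slides`, `shapes`, `has_text_frame`, and `text_frame.text`.

```python
def presentation_text(prs):
    """Return one string per slide, built from every shape that has a text frame."""
    slides = []
    for slide in prs.slides:
        parts = []
        for shape in slide.shapes:
            # Pictures, connectors and similar shapes carry no text frame.
            if getattr(shape, "has_text_frame", False):
                text = " ".join(shape.text_frame.text.split())
                if text:
                    parts.append(text)
        slides.append(" ".join(parts))
    return slides


def pptx_to_text(path):
    """Load a .pptx file with python-pptx and extract its per-slide text."""
    from pptx import Presentation  # third-party; imported lazily

    return presentation_text(Presentation(path))
```

Keeping the text per slide, rather than as one blob, would let a search result point at a specific slide if that turns out to be useful.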

@charlesreid1
Contributor

For PDFs: I remember using PDFMiner to extract text from a PDF file a while back. I can't remember where that script went, but it's probably in the copper repo somewhere.
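
This is not the script mentioned above, just a small sketch of the same idea. It assumes the maintained pdfminer.six fork, whose `pdfminer.high_level.extract_text` helper returns the text of a PDF as a single string; the names `normalize_pdf_text` and `pdf_to_text` are illustrative.

```python
import re


def normalize_pdf_text(raw):
    """Tidy raw extracted text: rejoin words split across lines, collapse whitespace."""
    # "data-\nbase" -> "database"; this may also join genuinely hyphenated words.
    text = re.sub(r"-\n(?=\w)", "", raw)
    return " ".join(text.split())


def pdf_to_text(path):
    """Extract and normalize the text of a PDF file."""
    from pdfminer.high_level import extract_text  # pdfminer.six; imported lazily

    return normalize_pdf_text(extract_text(path))
```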


3 participants