Several problems stand in the way of running this as a serverless Flask instance, but there are several ways to solve them:
Problems
1. The search index is stored on disk. Heroku's filesystem is ephemeral, so files on disk do not persist; the index must be stored in memory.
2. Google Drive .docx documents must be downloaded to disk so that they can be converted to plain text using Pandoc. Again, persistent files on disk are not possible on Heroku, so this must be done in memory.
3. Pandoc is not a Python program, nor is it pip-installable. You can't build arbitrary packages on a Heroku node.
(Container) Solutions
Solve 1, 2, and 3 in one fell swoop by deploying Centillion to Heroku as a Docker container (added advantage: Dockerizing services has proven relatively easy in the past). Basically, Heroku runs its own container registry, so you build Docker images, test them, push them to the registry, and deploy Heroku nodes that run a container image (link with more info).
(Can also easily do multi-container applications using docker-compose, as I now have experience building multi-container pods.)
(Container-less) Solutions
Solve 1 using the well-developed combination of SQLAlchemy + Whoosh to store a search index in memory. This requires creating a database and linking the search index schema to the SQLAlchemy database; see e.g. gyllstromk/Flask-WhooshAlchemy.
Solve 2 without containers by using some advanced piping tricks. Using the URLs for Drive documents, download the .docx file into a pipe, and pass contents of that pipe into pandoc. You can call pandoc on stdin just as you can call it on input files.
Solve 3 without containers by installing the Heroku pandoc buildpack into the project. This is the equivalent of running apt-get install pandoc on your Heroku node.
After installing the pandoc buildpack, pandoc is at /app/vendor/pandoc/bin, so you would probably call that binary with subprocess.Popen(). Alternatively, use pypandoc. This would work because pandoc is added to $PATH when the pandoc buildpack is installed, and that is how pypandoc finds a version of pandoc to wrap.
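A rough sketch of how solutions 2 and 3 might fit together, not a tested implementation for Centillion. It assumes the .docx bytes have already been fetched from Drive into memory (the download and authentication step is left out), and that the installed pandoc accepts binary docx input on stdin. The names `find_pandoc`, `docx_bytes_to_text`, and `BUILDPACK_PANDOC_DIR` are made up for illustration; only the buildpack path comes from the notes above.

```python
import os
import shutil
import subprocess

# Directory named above as the pandoc buildpack install location.
BUILDPACK_PANDOC_DIR = "/app/vendor/pandoc/bin"


def find_pandoc(vendor_dir=BUILDPACK_PANDOC_DIR):
    """Return a path to a pandoc executable, or None if none is found.

    Checks the buildpack directory first, then falls back to $PATH.
    """
    candidate = os.path.join(vendor_dir, "pandoc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("pandoc")


def docx_bytes_to_text(docx_bytes, command=None):
    """Convert in-memory .docx bytes to plain text without writing to disk.

    `command` is a hypothetical override (mainly for testing); by default
    the located pandoc binary is run with docx input and plain output.
    """
    if command is None:
        pandoc = find_pandoc()
        if pandoc is None:
            raise FileNotFoundError("pandoc executable not found")
        command = [pandoc, "--from=docx", "--to=plain"]
    # The bytes are written to the child's stdin through a pipe, and the
    # converted text is read back from its stdout; no temporary file is used.
    result = subprocess.run(
        command,
        input=docx_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

With the bytes of a Drive document in hand, `docx_bytes_to_text(data)` would return the plain text to feed into the search index. `subprocess.run` is used here instead of a bare `subprocess.Popen()` because it handles writing stdin and reading stdout together, which avoids pipe deadlocks on larger documents.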