Run centillion on Heroku #20

charlesreid1 · 2018-08-10T20:42:46Z

Several problems stand in the way of running this as a serverless Flask instance, but there are several ways to solve them:

Problems

1. The search index is stored on disk. It is not possible to have persistent files on disk on Heroku, so the index must be stored in memory.
2. Google Drive .docx documents must be downloaded to disk so that they can be converted to plain text using Pandoc. Again, it is not possible to have persistent files on disk on Heroku, so this must be done in memory.
3. Pandoc is not a Python program, nor is it pip-installable, and you can't build arbitrary packages on a Heroku node.

(Container) Solutions

Solve 1, 2, and 3 in one fell swoop by deploying centillion to Heroku as a Docker container (added advantage: Dockerizing services has proven relatively easy in the past). link with more info - basically, Heroku runs its own container registry, so you build Docker images, test them, push them to the registry, and deploy Heroku nodes that run a container image.

(Can also easily do multi-container applications using docker-compose, as I now have experience building multi-container pods.)

(Container-less) Solutions

Solve 1 using the well-developed combination of SQLAlchemy + Whoosh to store a search index in memory. This requires creating a database and linking the search index schema to the SQLAlchemy database; see e.g. gyllstromk/Flask-WhooshAlchemy.

Solve 2 without containers by using pipes. Using the URLs for Drive documents, download each .docx file into a pipe and pass the contents of that pipe to pandoc. You can call pandoc on stdin just as you can call it on input files.
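
A minimal sketch of the pipe approach, assuming the .docx bytes have already been fetched (Drive downloads generally need an authenticated request, which is left out here) and that the installed pandoc accepts docx input on stdin. The helper name `convert_in_memory` and its `cmd` parameter are hypothetical, introduced only for illustration.

```python
import subprocess

# Assumed pandoc invocation: read docx from stdin, write plain text to stdout.
PANDOC_CMD = ("pandoc", "--from=docx", "--to=plain")


def convert_in_memory(data: bytes, cmd=PANDOC_CMD) -> str:
    """Feed bytes to a converter on stdin and return its stdout as text.

    Nothing is written to disk: the input goes through a pipe and the
    output is captured from a pipe. Raises CalledProcessError if the
    converter exits with a non-zero status.
    """
    result = subprocess.run(
        list(cmd),
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

With pandoc installed, this would be called as `text = convert_in_memory(docx_bytes)`. Keeping the command as a parameter makes it possible to exercise the piping logic with a stand-in program when pandoc is not available.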

Solve 3 without containers by installing the Heroku pandoc buildpack into the project. This is the equivalent of running apt-get install pandoc on your Heroku node.

After installing the pandoc buildpack, pandoc is at /app/vendor/pandoc/bin, so you would probably call that binary with subprocess.Popen(). Alternatively, use pypandoc: this works because the buildpack adds pandoc to $PATH, and that's how pypandoc finds a version of pandoc to wrap.
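
A hedged sketch of locating and calling the binary: check $PATH first, then fall back to the buildpack directory named above. The function names `find_pandoc` and `pandoc_version` are hypothetical, and the assumption that the executable inside that directory is called `pandoc` is mine.

```python
import shutil
import subprocess

# Directory the issue gives for the buildpack install.
BUILDPACK_BIN_DIR = "/app/vendor/pandoc/bin"


def find_pandoc(path=None, bin_dir=BUILDPACK_BIN_DIR):
    """Return the full path to a pandoc executable, or None if not found.

    `path` is a search path string; None means use the current $PATH.
    The buildpack directory is only consulted if $PATH has no pandoc.
    """
    return shutil.which("pandoc", path=path) or shutil.which("pandoc", path=bin_dir)


def pandoc_version(executable):
    """Run `<executable> --version` and return the first line of its output."""
    proc = subprocess.Popen([executable, "--version"], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8").splitlines()[0]
```

A startup check could call `find_pandoc()` once and fail early with a clear message if it returns `None`, rather than discovering the missing binary during the first document conversion.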


charlesreid1 · 2018-08-10T20:48:31Z

Path forward: do everything.

a) Dockerize it

also

b) implement sqlalchemy + pandoc buildpack + pypandoc + tricky subprocess pipes

also

c) provide service scripts/cron jobs for running as a native unix service

charlesreid1 mentioned this issue Aug 17, 2018

Containerize centillion #66

Open
