
Review notebook cacheing and execution packages #3

Open
choldgraf opened this issue Feb 17, 2020 · 7 comments

@choldgraf
Member

A place to discover and list other tools that provide some form of notebook cacheing, execution, or storage abstraction.

@chrisjsewell
Member

  • tinydb is a well-used, lightweight package with a simple JSON database API. Different storage classes can be used, and these can be wrapped in middleware to customise their behaviour:
>>> from tinydb import TinyDB
>>> from tinydb.storages import JSONStorage
>>> from tinydb.middlewares import CachingMiddleware
>>> db = TinyDB('/path/to/db.json', storage=CachingMiddleware(JSONStorage))
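To make the storage/middleware layering concrete, here is a hypothetical sketch of the pattern without tinydb itself; the class names mirror tinydb's but the internals here are illustrative, not the package's actual implementation:

```python
import json
import tempfile
from pathlib import Path

class JSONStorage:
    """Read/write the whole database as one JSON file (analogue of tinydb's JSONStorage)."""
    def __init__(self, path):
        self.path = Path(path)

    def read(self):
        # An absent file means an empty database
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def write(self, data):
        self.path.write_text(json.dumps(data))

class CachingMiddleware:
    """Wrap a storage: serve reads from memory, defer writes until close()."""
    def __init__(self, storage):
        self.storage = storage
        self.cache = None

    def read(self):
        if self.cache is None:
            self.cache = self.storage.read()
        return self.cache

    def write(self, data):
        self.cache = data  # deferred: nothing touches disk yet

    def close(self):
        if self.cache is not None:
            self.storage.write(self.cache)

# Usage: writes are buffered in memory and only flushed on close()
store = CachingMiddleware(JSONStorage(Path(tempfile.mkdtemp()) / "db.json"))
store.write({"notebooks": ["a.ipynb"]})
assert store.storage.read() == {}  # nothing on disk yet
store.close()
print(store.storage.read())  # the buffered data is now persisted
```

The design point is that the middleware and the storage share one read/write interface, so middlewares can be stacked freely in front of any backend.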

@chrisjsewell
Member

chrisjsewell commented Feb 17, 2020

scrapbook contains (in-memory only) classes representing a collection of notebooks (Scrapbook) and a single notebook (Notebook).

Of note is that these have methods for returning notebook/cell execution metrics (such as time taken), which they presumably store during notebook execution.

They also provide methods to access 'scraps' which are outputs stored with name identifiers (see ExecutableBookProject/myst_parser#46)
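For illustration, scrapbook's `sb.glue(name, value)` persists each scrap as a cell output carrying a dedicated mime type, so named outputs can be read back even without the library. A minimal sketch of reading scraps from raw notebook JSON follows; the mime-type key is based on scrapbook's documented record format and should be checked against the package:

```python
# Assumed mime type for scrapbook's JSON scrap records (verify against scrapbook docs)
SCRAP_MIME = "application/scrapbook.scrap.json+json"

def read_scraps(notebook: dict) -> dict:
    """Return {scrap_name: value} for all scraps stored in a notebook dict."""
    scraps = {}
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            record = output.get("data", {}).get(SCRAP_MIME)
            if record is not None:
                scraps[record["name"]] = record["data"]
    return scraps

# A notebook as it might look after a cell ran `sb.glue("accuracy", 0.95)`
nb = {
    "cells": [{
        "cell_type": "code",
        "outputs": [{
            "output_type": "display_data",
            "data": {SCRAP_MIME: {"name": "accuracy", "data": 0.95}},
            "metadata": {"scrapbook": {"name": "accuracy"}},
        }],
    }]
}
print(read_scraps(nb))  # expected: {'accuracy': 0.95}
```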


@chrisjsewell
Member

Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add a new notebook or update one, you 'stage' it; on execution you get all the 'staged' notebooks, run them, then commit the final notebooks back.
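A minimal sketch of this staged-cache idea using the git CLI (GitPython's `Repo.init` / `index.add` / `index.commit` expose the same operations as a Python API); the repository layout and commit message are illustrative only:

```python
import json
import subprocess
import tempfile
from pathlib import Path

# The cache is its own small git repository
cache = Path(tempfile.mkdtemp())
subprocess.run(["git", "init", "-q"], cwd=cache, check=True)

# Adding/updating a notebook "stages" it in the cache repository
nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
(cache / "example.ipynb").write_text(json.dumps(nb))
subprocess.run(["git", "add", "example.ipynb"], cwd=cache, check=True)

# List the staged notebooks (these are the ones to execute next)
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only"],
    cwd=cache, capture_output=True, text=True, check=True,
).stdout.split()

# After execution, commit the final notebooks back to the cache
subprocess.run(
    ["git", "-c", "user.email=cache@example.com", "-c", "user.name=cache",
     "commit", "-q", "-m", "execute staged notebooks"],
    cwd=cache, check=True,
)
print(staged)  # expected: ['example.ipynb']
```

A side benefit of this design is that the cache history comes for free: any previously executed notebook can be recovered from an earlier commit.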

@chrisjsewell
Member

chrisjsewell commented Feb 19, 2020

  • rossant/ipycache (last commit 2016) and SmartDataInnovationLab/ipython-cache (last commit 2018) are both examples of cell-level magics that pickle the outputs of cells for later use.
  • mkery/Verdant (last commit Oct 24, 2019) is a JupyterLab extension that automatically records the 'history' of Jupyter notebook cells and stores them in a .ipyhistory JSON file. Note that the code is written in TypeScript.
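The cell-magic approach boils down to pickling a computed value to a cache file and loading it on subsequent runs. A sketch of that underlying pattern (not ipycache's actual API, which wraps this in an IPython magic):

```python
import pickle
import tempfile
from pathlib import Path

def cached(path, compute):
    """Load a pickled result if the cache file exists, else compute and pickle it."""
    p = Path(path)
    if p.exists():
        return pickle.loads(p.read_bytes())
    result = compute()
    p.write_bytes(pickle.dumps(result))
    return result

cache_file = Path(tempfile.mkdtemp()) / "cell_output.pkl"
first = cached(cache_file, lambda: sum(range(10)))  # computed and stored: 45
second = cached(cache_file, lambda: 0)              # loaded from cache, compute() skipped: 45
print(first, second)
```

The second call deliberately passes a different function to show that the cache, not the computation, supplies the value once the file exists.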

@choldgraf
Member Author

choldgraf commented Feb 19, 2020

> Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add a new notebook or update one, you 'stage' it; on execution you get all the 'staged' notebooks, run them, then commit the final notebooks back.

I think this is the kind of thing that some more bespoke notebook UIs do. E.g., I believe that Gigantum.IO (a proprietary cloud interface for notebooks) commits notebooks to a git repository on-the-fly, and then gives you the option to go back in history if needed. I don't believe they do any execution cacheing, just content cacheing.

@choldgraf choldgraf transferred this issue from executablebooks/meta Feb 19, 2020
@chrisjsewell chrisjsewell changed the title List of notebook cacheing and execution Review notebook cacheing and execution packages Feb 25, 2020
@eldad-a

eldad-a commented May 4, 2020

Thank you for creating this helpful resource!

As I am on the search myself, here is another pointer (which I still need to explore):

dask.cache and cachey
