
Review notebook cacheing and execution packages #3

Open
choldgraf opened this issue Feb 17, 2020 · 7 comments

@choldgraf
Member

A place to discover and list other tools that provide some form of notebook cacheing, execution, or storage abstraction.

@chrisjsewell
Member

  • tinydb is a well-used, lightweight package with a simple JSON database API. Different storage classes can be used, and these can be wrapped in middleware to customise their behaviour:
>>> from tinydb import TinyDB
>>> from tinydb.storages import JSONStorage
>>> from tinydb.middlewares import CachingMiddleware
>>> db = TinyDB('/path/to/db.json', storage=CachingMiddleware(JSONStorage))
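To make the storage/middleware layering concrete, here is a hypothetical sketch of the pattern without tinydb itself; the class names mirror tinydb's but the internals here are illustrative, not the package's actual implementation:

```python
import json
import tempfile
from pathlib import Path

class JSONStorage:
    """Read/write the whole database as one JSON file (analogue of tinydb's JSONStorage)."""
    def __init__(self, path):
        self.path = Path(path)

    def read(self):
        # An absent file means an empty database
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def write(self, data):
        self.path.write_text(json.dumps(data))

class CachingMiddleware:
    """Wrap a storage: serve reads from memory, defer writes until close()."""
    def __init__(self, storage):
        self.storage = storage
        self.cache = None

    def read(self):
        if self.cache is None:
            self.cache = self.storage.read()
        return self.cache

    def write(self, data):
        self.cache = data  # deferred: nothing touches disk yet

    def close(self):
        if self.cache is not None:
            self.storage.write(self.cache)

# Usage: writes are buffered in memory and only flushed on close()
store = CachingMiddleware(JSONStorage(Path(tempfile.mkdtemp()) / "db.json"))
store.write({"notebooks": ["a.ipynb"]})
assert store.storage.read() == {}  # nothing on disk yet
store.close()
print(store.storage.read())  # the buffered data is now persisted
```

The design point is that the middleware and the storage share one read/write interface, so middlewares can be stacked freely in front of any backend.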

@chrisjsewell
Member

chrisjsewell commented Feb 17, 2020

scrapbook contains (in-memory only) classes representing a collection of notebooks (Scrapbook) and a single notebook (Notebook).

Of note is that these have methods for returning notebook/cell execution metrics (such as time taken), which they presumably store during notebook execution.

They also provide methods to access 'scraps' which are outputs stored with name identifiers (see ExecutableBookProject/myst_parser#46)
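For illustration, scrapbook's `sb.glue(name, value)` persists each scrap as a cell output carrying a dedicated mime type, so named outputs can be read back even without the library. A minimal sketch of reading scraps from raw notebook JSON follows; the mime-type key is based on scrapbook's documented record format and should be checked against the package:

```python
# Assumed mime type for scrapbook's JSON scrap records (verify against scrapbook docs)
SCRAP_MIME = "application/scrapbook.scrap.json+json"

def read_scraps(notebook: dict) -> dict:
    """Return {scrap_name: value} for all scraps stored in a notebook dict."""
    scraps = {}
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            record = output.get("data", {}).get(SCRAP_MIME)
            if record is not None:
                scraps[record["name"]] = record["data"]
    return scraps

# A notebook as it might look after a cell ran `sb.glue("accuracy", 0.95)`
nb = {
    "cells": [{
        "cell_type": "code",
        "outputs": [{
            "output_type": "display_data",
            "data": {SCRAP_MIME: {"name": "accuracy", "data": 0.95}},
            "metadata": {"scrapbook": {"name": "accuracy"}},
        }],
    }]
}
print(read_scraps(nb))  # expected: {'accuracy': 0.95}
```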


@chrisjsewell
Member

Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add a new notebook or update one, you 'stage' it; on execution you get all the 'staged' notebooks, run them, then commit the final notebooks back.
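A minimal sketch of this staged-cache idea using the git CLI (GitPython's `Repo.init` / `index.add` / `index.commit` expose the same operations as a Python API); the repository layout and commit message are illustrative only:

```python
import json
import subprocess
import tempfile
from pathlib import Path

# The cache is its own small git repository
cache = Path(tempfile.mkdtemp())
subprocess.run(["git", "init", "-q"], cwd=cache, check=True)

# Adding/updating a notebook "stages" it in the cache repository
nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
(cache / "example.ipynb").write_text(json.dumps(nb))
subprocess.run(["git", "add", "example.ipynb"], cwd=cache, check=True)

# List the staged notebooks (these are the ones to execute next)
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only"],
    cwd=cache, capture_output=True, text=True, check=True,
).stdout.split()

# After execution, commit the final notebooks back to the cache
subprocess.run(
    ["git", "-c", "user.email=cache@example.com", "-c", "user.name=cache",
     "commit", "-q", "-m", "execute staged notebooks"],
    cwd=cache, check=True,
)
print(staged)  # expected: ['example.ipynb']
```

A side benefit of this design is that the cache history comes for free: any previously executed notebook can be recovered from an earlier commit.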

@chrisjsewell
Member

chrisjsewell commented Feb 19, 2020

  • rossant/ipycache (last commit 2016) and SmartDataInnovationLab/ipython-cache (last commit 2018) are both examples of cell-level magics that pickle the outputs of cells for later use.
  • mkery/Verdant (last commit Oct 24, 2019) is a JupyterLab extension that automatically records the 'history' of Jupyter notebook cells and stores them in a .ipyhistory JSON file. Note that the code is written in TypeScript.
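The cell-magic approach boils down to pickling a computed value to a cache file and loading it on subsequent runs. A sketch of that underlying pattern (not ipycache's actual API, which wraps this in an IPython magic):

```python
import pickle
import tempfile
from pathlib import Path

def cached(path, compute):
    """Load a pickled result if the cache file exists, else compute and pickle it."""
    p = Path(path)
    if p.exists():
        return pickle.loads(p.read_bytes())
    result = compute()
    p.write_bytes(pickle.dumps(result))
    return result

cache_file = Path(tempfile.mkdtemp()) / "cell_output.pkl"
first = cached(cache_file, lambda: sum(range(10)))  # computed and stored: 45
second = cached(cache_file, lambda: 0)              # loaded from cache, compute() skipped: 45
print(first, second)
```

The second call deliberately passes a different function to show that the cache, not the computation, supplies the value once the file exists.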

@choldgraf
Member Author

choldgraf commented Feb 19, 2020

> Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add a new notebook or update one, you 'stage' it; on execution you get all the 'staged' notebooks, run them, then commit the final notebooks back.

I think this is the kind of thing that some more bespoke notebook UIs do. E.g., I believe that Gigantum.IO (a proprietary cloud interface for notebooks) commits notebooks to a git repository on-the-fly, and then gives you the option to go back in history if needed. I don't believe they do any execution cacheing, just content cacheing.

@choldgraf choldgraf transferred this issue from executablebooks/meta Feb 19, 2020
@chrisjsewell chrisjsewell changed the title List of notebook cacheing and execution Review notebook cacheing and execution packages Feb 25, 2020
@eldad-a

eldad-a commented May 4, 2020

Thank you for creating this helpful resource!

As I am on the search myself, here is another pointer (which I still need to explore):

dask.cache and cachey
