Skip to content

Commit

Permalink
Merge pull request #33 from ExecutableBookProject/improvements
Browse files Browse the repository at this point in the history
See commit messages and README
  • Loading branch information
chrisjsewell authored Mar 12, 2020
2 parents 9a43e29 + 19341e7 commit 14f7a90
Show file tree
Hide file tree
Showing 8 changed files with 130 additions and 67 deletions.
98 changes: 50 additions & 48 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ to come ...

## Example CLI usage

<!-- This section was auto-generated by: tests/make_cli_readme.py -->
<!-- This section was auto-generated on 2020-03-12 17:31 by: /Users/cjs14/GitHub/jupyter-cache/tests/make_cli_readme.py -->

From the checked-out repository folder:

Expand Down Expand Up @@ -83,13 +83,13 @@ Options:
--help Show this message and exit.

Commands:
add-many Cache notebook(s) that have already been executed.
add-one Cache a notebook, with possible artefact files.
cat-artifact Print the contents of a cached artefact.
diff-nb Print a diff of a notebook to one stored in the cache.
list List cached notebook records in the cache.
remove Remove notebooks stored in the cache.
show Show details of a cached notebook in the cache.
add Cache notebook(s) that have already been executed.
add-with-artefacts Cache a notebook, with possible artefact files.
cat-artifact Print the contents of a cached artefact.
diff-nb Print a diff of a notebook to one stored in the cache.
list List cached notebook records in the cache.
remove Remove notebooks stored in the cache.
show Show details of a cached notebook in the cache.
```

The first time the cache is required, it will be lazily created:
Expand All @@ -107,7 +107,7 @@ When caching, a check will be made that the notebooks look to have been executed
correctly, i.e. the cell execution counts go sequentially up from 1.

```console
$ jcache cache add-many tests/notebooks/basic.ipynb
$ jcache cache add tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Expand All @@ -117,7 +117,7 @@ Success!
Or to skip validation:

```console
$ jcache cache add-many --no-validate tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
$ jcache cache add --no-validate tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Caching: ../tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic_failing.ipynb
Caching: ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -139,19 +139,21 @@ You can remove cached records by their ID.

```console
$ jcache cache list
ID URI Created Accessed
ID Origin URI Created Accessed
---- ------------------------------------- ---------------- ----------------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 2020-02-29 17:48
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 2020-02-29 17:48
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 2020-02-29 17:48
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 2020-02-29 17:48
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 2020-03-12 17:31
4 tests/notebooks/complex_outputs.ipynb 2020-03-12 17:31 2020-03-12 17:31
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 2020-03-12 17:31
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 2020-03-12 17:31
```

Tip: Use the `--latest-only` option, to only show the latest versions of cached notebooks.

You can also cache notebooks with artefacts
(external outputs of the notebook execution).

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
$ jcache cache add-with-artefacts -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Expand All @@ -163,9 +165,9 @@ Show a full description of a cached notebook by referring to its ID
```console
$ jcache cache show 6
ID: 6
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Origin URI: ../tests/notebooks/basic.ipynb
Created: 2020-03-12 17:31
Accessed: 2020-03-12 17:31
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Artifacts:
- artifact_folder/artifact.txt
Expand All @@ -174,7 +176,7 @@ Artifacts:
Note artefact paths must be 'upstream' of the notebook folder:

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/test_db.py
$ jcache cache add-with-artefacts -nb tests/notebooks/basic.ipynb tests/test_db.py
Caching: ../tests/notebooks/basic.ipynb
Artifact Error: Path '../tests/test_db.py' is not in folder '../tests/notebooks''
```
Expand Down Expand Up @@ -236,12 +238,12 @@ Options:
--help Show this message and exit.

Commands:
add-many Stage notebook(s) for execution.
add-one Stage a notebook, with possible asset files.
list List notebooks staged for possible execution.
remove-ids Un-stage notebook(s), by ID.
remove-uris Un-stage notebook(s), by URI.
show Show details of a staged notebook.
add Stage notebook(s) for execution.
add-with-assets Stage a notebook, with possible asset files.
list List notebooks staged for possible execution.
remove-ids Un-stage notebook(s), by ID.
remove-uris Un-stage notebook(s), by URI.
show Show details of a staged notebook.
```

Staged notebooks are recorded as pointers to their URI,
Expand All @@ -252,7 +254,7 @@ you can list them to see which have existing records in the cache (by hash),
and which will require execution:

```console
$ jcache stage add-many tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
$ jcache stage add tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Staging: ../tests/notebooks/basic.ipynb
Staging: ../tests/notebooks/basic_failing.ipynb
Staging: ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -265,11 +267,11 @@ Success!
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 0
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0 2
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-03-12 17:31 0
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 0 6
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 0 2
1 tests/notebooks/basic.ipynb 2020-03-12 17:31 0 6
```

You can remove a staged notebook by its URI or ID:
Expand Down Expand Up @@ -297,7 +299,7 @@ Executing: ../tests/notebooks/basic_failing.ipynb
error: Execution Failed: ../tests/notebooks/basic_failing.ipynb
Executing: ../tests/notebooks/basic_unrun.ipynb
Execution Succeeded: ../tests/notebooks/basic_unrun.ipynb
Finished!
Finished! Successfully executed notebooks have been cached.
succeeded:
- ../tests/notebooks/basic.ipynb
- ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -315,23 +317,23 @@ that are inside the notebook folder, and data supplied by the executor.
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 0 6
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 0
1 tests/notebooks/basic.ipynb 2020-03-12 17:31 0 6
```

Execution data (such as execution time) will be stored in the cache record:

```console
$ jcache cache show 6
ID: 6
URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Origin URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2020-03-12 17:31
Accessed: 2020-03-12 17:31
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Data:
execution_seconds: 1.2727476909999993
execution_seconds: 1.0559415130000005

```

Expand All @@ -341,18 +343,18 @@ Failed notebooks will not be cached, but the exception traceback will be added t
$ jcache stage show 2
ID: 2
URI: ../tests/notebooks/basic_failing.ipynb
Created: 2020-02-29 17:48
Created: 2020-03-12 17:31
Failed Last Execution!
Traceback (most recent call last):
File "../jupyter_cache/executors/basic.py", line 152, in execute
executenb(nb_bundle.nb, cwd=tmpdirname)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
return ep.preprocess(nb, resources, km=km)[0]
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
Expand Down Expand Up @@ -388,15 +390,15 @@ As with artefacts, these files must be in the same folder as the notebook,
or a sub-folder.

```console
$ jcache stage add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
$ jcache stage add-with-assets -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Success!
```

```console
$ jcache stage show 1
ID: 1
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Created: 2020-03-12 17:31
Cache ID: 6
Assets:
- ../tests/notebooks/artifact_folder/artifact.txt
Expand Down
24 changes: 16 additions & 8 deletions jupyter_cache/cache/db.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ def get_value(key: str, db: Engine, default=None):
Setting.set_value(key, default, db)
result = [default]
else:
raise KeyError(key)
raise KeyError("Setting not found in DB: {}".format(key))
value = result[0]
return value

Expand Down Expand Up @@ -121,7 +121,9 @@ def record_from_hashkey(hashkey: str, db: Engine) -> "NbCacheRecord":
session.query(NbCacheRecord).filter_by(hashkey=hashkey).one_or_none()
)
if result is None:
raise KeyError(hashkey)
raise KeyError(
"Cache record not found for NB with hashkey: {}".format(hashkey)
)
session.expunge(result)
return result

Expand All @@ -130,7 +132,7 @@ def record_from_pk(pk: int, db: Engine) -> "NbCacheRecord":
with session_context(db) as session: # type: Session
result = session.query(NbCacheRecord).filter_by(pk=pk).one_or_none()
if result is None:
raise KeyError(pk)
raise KeyError("Cache record not found for NB with PK: {}".format(pk))
session.expunge(result)
return result

Expand All @@ -139,7 +141,7 @@ def touch(pk, db: Engine):
with session_context(db) as session: # type: Session
record = session.query(NbCacheRecord).filter_by(pk=pk).one_or_none()
if record is None:
raise KeyError(pk)
raise KeyError("Cache record not found for NB with PK: {}".format(pk))
record.accessed = datetime.utcnow()
session.commit()

Expand All @@ -150,7 +152,9 @@ def touch_hashkey(hashkey, db: Engine):
session.query(NbCacheRecord).filter_by(hashkey=hashkey).one_or_none()
)
if record is None:
raise KeyError(hashkey)
raise KeyError(
"Cache record not found for NB with hashkey: {}".format(hashkey)
)
record.accessed = datetime.utcnow()
session.commit()

Expand Down Expand Up @@ -260,7 +264,7 @@ def record_from_pk(pk: int, db: Engine) -> "NbStageRecord":
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(pk=pk).one_or_none()
if result is None:
raise KeyError(pk)
raise KeyError("Staging record not found for NB with PK: {}".format(pk))
session.expunge(result)
return result

Expand All @@ -269,7 +273,9 @@ def record_from_uri(uri: str, db: Engine) -> "NbStageRecord":
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(uri=uri).one_or_none()
if result is None:
raise KeyError(uri)
raise KeyError(
"Staging record not found for NB with URI: {}".format(uri)
)
session.expunge(result)
return result

Expand All @@ -292,7 +298,9 @@ def set_traceback(uri: str, traceback: Optional[str], db: Engine):
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(uri=uri).one_or_none()
if result is None:
raise KeyError(uri)
raise KeyError(
"Staging record not found for NB with URI: {}".format(uri)
)
result.traceback = traceback
try:
session.commit()
Expand Down
8 changes: 6 additions & 2 deletions jupyter_cache/cache/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -279,7 +279,9 @@ def get_cache_bundle(self, pk: int) -> NbBundleOut:
path = self._get_notebook_path_cache(record.hashkey)
artifact_folder = self._get_artifact_path_cache(record.hashkey)
if not path.exists():
raise KeyError(pk)
raise KeyError(
"Notebook file does not exist for cache record PK: {}".format(pk)
)

return NbBundleOut(
nbf.reads(path.read_text(), NB_VERSION),
Expand Down Expand Up @@ -307,7 +309,9 @@ def remove_cache(self, pk: int):
record = NbCacheRecord.record_from_pk(pk, self.db)
path = self._get_notebook_path_cache(record.hashkey)
if not path.exists():
raise KeyError(pk)
raise KeyError(
"Notebook file does not exist for cache record PK: {}".format(pk)
)
shutil.rmtree(path.parent)
NbCacheRecord.remove_records([pk], self.db)

Expand Down
25 changes: 20 additions & 5 deletions jupyter_cache/cli/commands/cmd_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ def cmnd_cache():
def format_cache_record(record, hashkeys, path_length):
data = {
"ID": record.pk,
"URI": str(shorten_path(record.uri, path_length)),
"Origin URI": str(shorten_path(record.uri, path_length)),
"Created": record.created.isoformat(" ", "minutes"),
"Accessed": record.accessed.isoformat(" ", "minutes"),
# "Description": record.description,
Expand All @@ -28,9 +28,15 @@ def format_cache_record(record, hashkeys, path_length):

@cmnd_cache.command("list")
@options.CACHE_PATH
@click.option("-h", "--hashkeys", is_flag=True, help="Whether to show hashkeys.")
@click.option(
"-l",
"--latest-only",
is_flag=True,
help="Show only the most recent record per origin URI.",
)
@click.option("-h", "--hashkeys", is_flag=True, help="Show the hashkey of notebook.")
@options.PATH_LENGTH
def list_caches(cache_path, hashkeys, path_length):
def list_caches(cache_path, latest_only, hashkeys, path_length):
"""List cached notebook records in the cache."""
import tabulate

Expand All @@ -39,6 +45,15 @@ def list_caches(cache_path, hashkeys, path_length):
if not records:
click.secho("No Cached Notebooks", fg="blue")
# TODO optionally list number of artifacts
if latest_only:
latest_records = {}
for record in records:
if record.uri not in latest_records:
latest_records[record.uri] = record
continue
if latest_records[record.uri].created < record.created:
latest_records[record.uri] = record
records = list(latest_records.values())
click.echo(
tabulate.tabulate(
[
Expand Down Expand Up @@ -131,7 +146,7 @@ def cache_file(db, nbpath, validate, overwrite, artifact_paths=()):
return True


@cmnd_cache.command("add-one")
@cmnd_cache.command("add-with-artefacts")
@arguments.ARTIFACT_PATHS
@options.NB_PATH
@options.CACHE_PATH
Expand All @@ -145,7 +160,7 @@ def cache_nb(cache_path, artifact_paths, nbpath, validate, overwrite):
click.secho("Success!", fg="green")


@cmnd_cache.command("add-many")
@cmnd_cache.command("add")
@arguments.NB_PATHS
@options.CACHE_PATH
@options.VALIDATE_NB
Expand Down
4 changes: 3 additions & 1 deletion jupyter_cache/cli/commands/cmd_exec.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,5 +28,7 @@ def execute_nbs(cache_path, entry_point, pks):
logger.error(str(error))
return 1
result = executor.run_and_cache(filter_pks=pks or None)
click.secho("Finished!", fg="green")
click.secho(
"Finished! Successfully executed notebooks have been cached.", fg="green"
)
click.echo(yaml.safe_dump(result, sort_keys=False))
Loading

0 comments on commit 14f7a90

Please sign in to comment.