Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to CLI #33

Merged
merged 5 commits into from
Mar 12, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 50 additions & 48 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ to come ...

## Example CLI usage

<!-- This section was auto-generated by: tests/make_cli_readme.py -->
<!-- This section was auto-generated on 2020-03-12 17:31 by: /Users/cjs14/GitHub/jupyter-cache/tests/make_cli_readme.py -->

From the checked-out repository folder:

Expand Down Expand Up @@ -83,13 +83,13 @@ Options:
--help Show this message and exit.

Commands:
add-many Cache notebook(s) that have already been executed.
add-one Cache a notebook, with possible artefact files.
cat-artifact Print the contents of a cached artefact.
diff-nb Print a diff of a notebook to one stored in the cache.
list List cached notebook records in the cache.
remove Remove notebooks stored in the cache.
show Show details of a cached notebook in the cache.
add Cache notebook(s) that have already been executed.
add-with-artefacts Cache a notebook, with possible artefact files.
cat-artifact Print the contents of a cached artefact.
diff-nb Print a diff of a notebook to one stored in the cache.
list List cached notebook records in the cache.
remove Remove notebooks stored in the cache.
show Show details of a cached notebook in the cache.
```

The first time the cache is required, it will be lazily created:
Expand All @@ -107,7 +107,7 @@ When caching, a check will be made that the notebooks look to have been executed
correctly, i.e. the cell execution counts go sequentially up from 1.

```console
$ jcache cache add-many tests/notebooks/basic.ipynb
$ jcache cache add tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Expand All @@ -117,7 +117,7 @@ Success!
Or to skip validation:

```console
$ jcache cache add-many --no-validate tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
$ jcache cache add --no-validate tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Caching: ../tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic_failing.ipynb
Caching: ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -139,19 +139,21 @@ You can remove cached records by their ID.

```console
$ jcache cache list
ID URI Created Accessed
ID Origin URI Created Accessed
---- ------------------------------------- ---------------- ----------------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 2020-02-29 17:48
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 2020-02-29 17:48
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 2020-02-29 17:48
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 2020-02-29 17:48
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 2020-03-12 17:31
4 tests/notebooks/complex_outputs.ipynb 2020-03-12 17:31 2020-03-12 17:31
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 2020-03-12 17:31
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 2020-03-12 17:31
```

Tip: Use the `--latest-only` option, to only show the latest versions of cached notebooks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the default being --latest-only and having a flag like --include-history instead?

Copy link
Member Author

@chrisjsewell chrisjsewell Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I'm hesitant to use --include-history is that the terminology is not strictly true in how the cache actually operates. Filtering for a certain Origin URI and sorting by Created does not guarantee to show a history for that URI; for example if you start changing file names of notebooks, or for a specific example:

  • create a git repository with notebook.ipynb and cache it
  • switch the branch of the git repository, modify notebook.ipynb and cache that
  • switch back to the original git branch

Assuming the actual cache isn't committed to git, you now have two versions of notebook.ipynb in the cache and the older one is actually the latest for the branch you are currently on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#34 could be an alternative way to show only particular records

Copy link
Member

@choldgraf choldgraf Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. I am just trying to think of the most common use-case, and generally speaking we should be tailoring the default UI around the common use-cases. I think it's much more common for someone to think "I want to see a list of the notebooks that are in the cache" than for them to also think "and I want to see values for every version that I have ever cached for a given notebook".

Though your branching example is a tricky one, that would definitely confuse people if they only had a single notebook listed in the cache by default even though it was out of date:-/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m definitely open to some form of solution, but it should be carefully considered what the balance is between ease-of-use and compliance with underlying logic


You can also cache notebooks with artefacts
(external outputs of the notebook execution).

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
$ jcache cache add-with-artefacts -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Expand All @@ -163,9 +165,9 @@ Show a full description of a cached notebook by referring to its ID
```console
$ jcache cache show 6
ID: 6
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Origin URI: ../tests/notebooks/basic.ipynb
Created: 2020-03-12 17:31
Accessed: 2020-03-12 17:31
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Artifacts:
- artifact_folder/artifact.txt
Expand All @@ -174,7 +176,7 @@ Artifacts:
Note artefact paths must be 'upstream' of the notebook folder:

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/test_db.py
$ jcache cache add-with-artefacts -nb tests/notebooks/basic.ipynb tests/test_db.py
Caching: ../tests/notebooks/basic.ipynb
Artifact Error: Path '../tests/test_db.py' is not in folder '../tests/notebooks''
```
Expand Down Expand Up @@ -236,12 +238,12 @@ Options:
--help Show this message and exit.

Commands:
add-many Stage notebook(s) for execution.
add-one Stage a notebook, with possible asset files.
list List notebooks staged for possible execution.
remove-ids Un-stage notebook(s), by ID.
remove-uris Un-stage notebook(s), by URI.
show Show details of a staged notebook.
add Stage notebook(s) for execution.
add-with-assets Stage a notebook, with possible asset files.
list List notebooks staged for possible execution.
remove-ids Un-stage notebook(s), by ID.
remove-uris Un-stage notebook(s), by URI.
show Show details of a staged notebook.
```

Staged notebooks are recorded as pointers to their URI,
Expand All @@ -252,7 +254,7 @@ you can list them to see which have existing records in the cache (by hash),
and which will require execution:

```console
$ jcache stage add-many tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
$ jcache stage add tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Staging: ../tests/notebooks/basic.ipynb
Staging: ../tests/notebooks/basic_failing.ipynb
Staging: ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -265,11 +267,11 @@ Success!
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 0
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0 2
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-03-12 17:31 0
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 0 6
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 0 2
1 tests/notebooks/basic.ipynb 2020-03-12 17:31 0 6
```

You can remove a staged notebook by its URI or ID:
Expand Down Expand Up @@ -297,7 +299,7 @@ Executing: ../tests/notebooks/basic_failing.ipynb
error: Execution Failed: ../tests/notebooks/basic_failing.ipynb
Executing: ../tests/notebooks/basic_unrun.ipynb
Execution Succeeded: ../tests/notebooks/basic_unrun.ipynb
Finished!
Finished! Successfully executed notebooks have been cached.
succeeded:
- ../tests/notebooks/basic.ipynb
- ../tests/notebooks/basic_unrun.ipynb
Expand All @@ -315,23 +317,23 @@ that are inside the notebook folder, and data supplied by the executor.
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
5 tests/notebooks/external_output.ipynb 2020-03-12 17:31 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-03-12 17:31 0 6
2 tests/notebooks/basic_failing.ipynb 2020-03-12 17:31 0
1 tests/notebooks/basic.ipynb 2020-03-12 17:31 0 6
```

Execution data (such as execution time) will be stored in the cache record:

```console
$ jcache cache show 6
ID: 6
URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Origin URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2020-03-12 17:31
Accessed: 2020-03-12 17:31
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Data:
execution_seconds: 1.2727476909999993
execution_seconds: 1.0559415130000005

```

Expand All @@ -341,18 +343,18 @@ Failed notebooks will not be cached, but the exception traceback will be added t
$ jcache stage show 2
ID: 2
URI: ../tests/notebooks/basic_failing.ipynb
Created: 2020-02-29 17:48
Created: 2020-03-12 17:31
Failed Last Execution!
Traceback (most recent call last):
File "../jupyter_cache/executors/basic.py", line 152, in execute
executenb(nb_bundle.nb, cwd=tmpdirname)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
return ep.preprocess(nb, resources, km=km)[0]
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
Expand Down Expand Up @@ -388,15 +390,15 @@ As with artefacts, these files must be in the same folder as the notebook,
or a sub-folder.

```console
$ jcache stage add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
$ jcache stage add-with-assets -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Success!
```

```console
$ jcache stage show 1
ID: 1
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Created: 2020-03-12 17:31
Cache ID: 6
Assets:
- ../tests/notebooks/artifact_folder/artifact.txt
Expand Down
24 changes: 16 additions & 8 deletions jupyter_cache/cache/db.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ def get_value(key: str, db: Engine, default=None):
Setting.set_value(key, default, db)
result = [default]
else:
raise KeyError(key)
raise KeyError("Setting not found in DB: {}".format(key))
value = result[0]
return value

Expand Down Expand Up @@ -121,7 +121,9 @@ def record_from_hashkey(hashkey: str, db: Engine) -> "NbCacheRecord":
session.query(NbCacheRecord).filter_by(hashkey=hashkey).one_or_none()
)
if result is None:
raise KeyError(hashkey)
raise KeyError(
"Cache record not found for NB with hashkey: {}".format(hashkey)
)
session.expunge(result)
return result

Expand All @@ -130,7 +132,7 @@ def record_from_pk(pk: int, db: Engine) -> "NbCacheRecord":
with session_context(db) as session: # type: Session
result = session.query(NbCacheRecord).filter_by(pk=pk).one_or_none()
if result is None:
raise KeyError(pk)
raise KeyError("Cache record not found for NB with PK: {}".format(pk))
session.expunge(result)
return result

Expand All @@ -139,7 +141,7 @@ def touch(pk, db: Engine):
with session_context(db) as session: # type: Session
record = session.query(NbCacheRecord).filter_by(pk=pk).one_or_none()
if record is None:
raise KeyError(pk)
raise KeyError("Cache record not found for NB with PK: {}".format(pk))
record.accessed = datetime.utcnow()
session.commit()

Expand All @@ -150,7 +152,9 @@ def touch_hashkey(hashkey, db: Engine):
session.query(NbCacheRecord).filter_by(hashkey=hashkey).one_or_none()
)
if record is None:
raise KeyError(hashkey)
raise KeyError(
"Cache record not found for NB with hashkey: {}".format(hashkey)
)
record.accessed = datetime.utcnow()
session.commit()

Expand Down Expand Up @@ -260,7 +264,7 @@ def record_from_pk(pk: int, db: Engine) -> "NbStageRecord":
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(pk=pk).one_or_none()
if result is None:
raise KeyError(pk)
raise KeyError("Staging record not found for NB with PK: {}".format(pk))
session.expunge(result)
return result

Expand All @@ -269,7 +273,9 @@ def record_from_uri(uri: str, db: Engine) -> "NbStageRecord":
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(uri=uri).one_or_none()
if result is None:
raise KeyError(uri)
raise KeyError(
"Staging record not found for NB with URI: {}".format(uri)
)
session.expunge(result)
return result

Expand All @@ -292,7 +298,9 @@ def set_traceback(uri: str, traceback: Optional[str], db: Engine):
with session_context(db) as session: # type: Session
result = session.query(NbStageRecord).filter_by(uri=uri).one_or_none()
if result is None:
raise KeyError(uri)
raise KeyError(
"Staging record not found for NB with URI: {}".format(uri)
)
result.traceback = traceback
try:
session.commit()
Expand Down
8 changes: 6 additions & 2 deletions jupyter_cache/cache/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -279,7 +279,9 @@ def get_cache_bundle(self, pk: int) -> NbBundleOut:
path = self._get_notebook_path_cache(record.hashkey)
artifact_folder = self._get_artifact_path_cache(record.hashkey)
if not path.exists():
raise KeyError(pk)
raise KeyError(
"Notebook file does not exist for cache record PK: {}".format(pk)
)

return NbBundleOut(
nbf.reads(path.read_text(), NB_VERSION),
Expand Down Expand Up @@ -307,7 +309,9 @@ def remove_cache(self, pk: int):
record = NbCacheRecord.record_from_pk(pk, self.db)
path = self._get_notebook_path_cache(record.hashkey)
if not path.exists():
raise KeyError(pk)
raise KeyError(
"Notebook file does not exist for cache record PK: {}".format(pk)
)
shutil.rmtree(path.parent)
NbCacheRecord.remove_records([pk], self.db)

Expand Down
25 changes: 20 additions & 5 deletions jupyter_cache/cli/commands/cmd_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ def cmnd_cache():
def format_cache_record(record, hashkeys, path_length):
data = {
"ID": record.pk,
"URI": str(shorten_path(record.uri, path_length)),
"Origin URI": str(shorten_path(record.uri, path_length)),
"Created": record.created.isoformat(" ", "minutes"),
"Accessed": record.accessed.isoformat(" ", "minutes"),
# "Description": record.description,
Expand All @@ -28,9 +28,15 @@ def format_cache_record(record, hashkeys, path_length):

@cmnd_cache.command("list")
@options.CACHE_PATH
@click.option("-h", "--hashkeys", is_flag=True, help="Whether to show hashkeys.")
@click.option(
"-l",
"--latest-only",
is_flag=True,
help="Show only the most recent record per origin URI.",
)
@click.option("-h", "--hashkeys", is_flag=True, help="Show the hashkey of notebook.")
@options.PATH_LENGTH
def list_caches(cache_path, hashkeys, path_length):
def list_caches(cache_path, latest_only, hashkeys, path_length):
"""List cached notebook records in the cache."""
import tabulate

Expand All @@ -39,6 +45,15 @@ def list_caches(cache_path, hashkeys, path_length):
if not records:
click.secho("No Cached Notebooks", fg="blue")
# TODO optionally list number of artifacts
if latest_only:
latest_records = {}
for record in records:
if record.uri not in latest_records:
latest_records[record.uri] = record
continue
if latest_records[record.uri].created < record.created:
latest_records[record.uri] = record
records = list(latest_records.values())
click.echo(
tabulate.tabulate(
[
Expand Down Expand Up @@ -131,7 +146,7 @@ def cache_file(db, nbpath, validate, overwrite, artifact_paths=()):
return True


@cmnd_cache.command("add-one")
@cmnd_cache.command("add-with-artefacts")
@arguments.ARTIFACT_PATHS
@options.NB_PATH
@options.CACHE_PATH
Expand All @@ -145,7 +160,7 @@ def cache_nb(cache_path, artifact_paths, nbpath, validate, overwrite):
click.secho("Success!", fg="green")


@cmnd_cache.command("add-many")
@cmnd_cache.command("add")
@arguments.NB_PATHS
@options.CACHE_PATH
@options.VALIDATE_NB
Expand Down
4 changes: 3 additions & 1 deletion jupyter_cache/cli/commands/cmd_exec.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,5 +28,7 @@ def execute_nbs(cache_path, entry_point, pks):
logger.error(str(error))
return 1
result = executor.run_and_cache(filter_pks=pks or None)
click.secho("Finished!", fg="green")
click.secho(
"Finished! Successfully executed notebooks have been cached.", fg="green"
)
click.echo(yaml.safe_dump(result, sort_keys=False))
Loading