Merge pull request #291 from pangeo-forge/0.8.0-release-notes
Update docs for 0.8.0 release
cisaacstern authored Feb 17, 2022
2 parents 9686998 + 01859c4 commit 6bc66f7
Showing 2 changed files with 57 additions and 8 deletions.
5 changes: 5 additions & 0 deletions docs/development/release_notes.md
@@ -1,5 +1,10 @@
# Release Notes

## v0.8.0 - 2022-02-17

- **Breaking change:** Replace recipe classes' storage attributes with `.storage_config` of type {class}`pangeo_forge_recipes.storage.StorageConfig`. {pull}`288`
- Add `setup_logging` convenience function. {pull}`287`

## v0.7.0 - 2022-02-14 ❤️

- Apache Beam executor added. {issue}`169`. By [Alex Merose](https://github.com/alxmrs).
60 changes: 52 additions & 8 deletions docs/recipe_user_guide/storage.md
@@ -1,10 +1,16 @@
# Storage

Recipes need a place to store data. This information is provided to the recipe by its `.storage_config` attribute, which is an object of type {class}`pangeo_forge_recipes.storage.StorageConfig`.

## Default storage

By default, `.storage_config` points to a local [`tempfile.TemporaryDirectory`](https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory). This allows you to write data to temporary local storage during the recipe development and debugging process.
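
The default can be pictured using only the standard library. The sketch below illustrates the idea under stated assumptions and is not pangeo-forge-recipes' own code. It creates one `tempfile.TemporaryDirectory`, carves out hypothetical `target`, `cache`, and `metadata` subdirectories, and shows that everything written there disappears once the directory is cleaned up. This is why the default suits development and debugging but not a dataset you want to keep.

```python
import os
import tempfile


def make_scratch_locations(tmp_root: str) -> dict:
    """Create hypothetical target/cache/metadata subdirectories under tmp_root.

    Illustrative only: these subdirectory names are assumptions, not the
    layout pangeo-forge-recipes actually uses.
    """
    locations = {}
    for name in ("target", "cache", "metadata"):
        path = os.path.join(tmp_root, name)
        os.makedirs(path, exist_ok=True)
        locations[name] = path
    return locations


tmp = tempfile.TemporaryDirectory()
locations = make_scratch_locations(tmp.name)

# Write a small file into the scratch "target" location.
with open(os.path.join(locations["target"], "example.txt"), "w") as f:
    f.write("temporary output")

print(all(os.path.isdir(p) for p in locations.values()))  # True

# Cleaning up the temporary directory removes everything written into it.
tmp.cleanup()
print(os.path.exists(locations["target"]))  # False
```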

## Customizing storage: the `target`

To write a recipe's full dataset to a persistent storage location, simply re-assign `.storage_config` to be a {class}`pangeo_forge_recipes.storage.StorageConfig` pointing to the location(s) of your choice. The minimal requirement for instantiating `StorageConfig` is a location in which to store the final dataset produced by the recipe. This is called the ``target``. Pangeo Forge has a special class for this: {class}`pangeo_forge_recipes.storage.FSSpecTarget`.

Creating a ``target`` requires two arguments:
- The ``fs`` argument is an [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)
filesystem. Fsspec supports many different types of storage via its
[built in](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations)
@@ -15,27 +21,65 @@ Creating a Target requires two arguments:
For example, creating a storage target for AWS S3 might look like this:
```{code-block} python
import s3fs
from pangeo_forge_recipes.storage import FSSpecTarget
fs = s3fs.S3FileSystem(key="MY_AWS_KEY", secret="MY_AWS_SECRET")
target_path = "pangeo-forge-bucket/my-dataset-v1.zarr"
target = FSSpecTarget(fs=fs, root_path=target_path)
```
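
At its core, a target pairs a filesystem with a `root_path`, and every file the recipe writes is addressed relative to that root. The stand-in below uses the local filesystem through `pathlib` rather than fsspec. The class name `LocalTargetSketch` and its methods are hypothetical. They mirror only that idea and are not the `FSSpecTarget` API.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LocalTargetSketch:
    """Hypothetical stand-in for a target: a root directory plus helpers."""

    root_path: str

    def full_path(self, path: str) -> Path:
        # Resolve a recipe-relative path against the target's root.
        return Path(self.root_path) / path

    def write_bytes(self, path: str, data: bytes) -> None:
        full = self.full_path(path)
        full.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
        full.write_bytes(data)

    def exists(self, path: str) -> bool:
        return self.full_path(path).exists()


with tempfile.TemporaryDirectory() as tmp:
    target = LocalTargetSketch(root_path=f"{tmp}/my-dataset-v1.zarr")
    target.write_bytes("example.txt", b"hello")
    found = target.exists("example.txt")

print(found)  # True
```

Swapping the local root for a bucket path and an S3 filesystem is the role the real `FSSpecTarget` plays in the example above.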

This target can then be assigned to a recipe as follows:
```{code-block} python
from pangeo_forge_recipes.storage import StorageConfig

# `recipe` is an existing recipe instance; `target` is the FSSpecTarget defined above
recipe.storage_config = StorageConfig(target)
```

Once assigned, the `target` can be accessed from the recipe with:

```{code-block} python
recipe.target
```
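
One way to picture the relationship between `.storage_config` and `recipe.target` is a recipe whose `target` property reads through to its storage configuration. The classes below (`StorageConfigSketch`, `RecipeSketch`) are hypothetical stand-ins built with `dataclasses`. They illustrate the delegation only and make no claim about how pangeo-forge-recipes implements it. As in the one-argument `StorageConfig(target)` form shown above, the cache and metadata locations are optional here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StorageConfigSketch:
    """Hypothetical stand-in: a required target plus optional cache/metadata."""

    target: Any
    cache: Optional[Any] = None
    metadata: Optional[Any] = None


@dataclass
class RecipeSketch:
    """Hypothetical recipe that exposes its target through storage_config."""

    storage_config: StorageConfigSketch = field(
        default_factory=lambda: StorageConfigSketch(target="scratch-target")
    )

    @property
    def target(self) -> Any:
        # Read through to whichever config is currently assigned.
        return self.storage_config.target


recipe = RecipeSketch()
print(recipe.target)  # scratch-target

# Re-assigning storage_config changes what recipe.target returns.
recipe.storage_config = StorageConfigSketch("persistent-target")
print(recipe.target)  # persistent-target
```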

## Customizing storage continued: caching

Oftentimes it is useful to cache input files, rather than read them directly from the data provider. Input files can be cached at a location defined by a {class}`pangeo_forge_recipes.storage.CacheFSSpecTarget` object. Some recipes require separate caching of metadata, which is provided by a third class {class}`pangeo_forge_recipes.storage.MetadataTarget`.
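
An input cache pays off because each source file is fetched from the provider at most once, and later reads come from the cache location. The function below sketches that read-through pattern with the standard library. `cached_path`, `fake_fetch`, and the cache layout (one file per source name) are hypothetical and do not reflect how `CacheFSSpecTarget` names or stores its files.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_path(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Hypothetical helper: return a cached path for `name`, fetching only on a miss."""
    path = cache_dir / name
    if not path.exists():
        # Cache miss: download once from the (simulated) data provider.
        path.write_bytes(fetch(name))
    return path


calls = []


def fake_fetch(name: str) -> bytes:
    # Hypothetical stand-in for a request to the data provider; records each call.
    calls.append(name)
    return f"contents of {name}".encode()


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = cached_path("input_01.nc", cache_dir, fake_fetch).read_bytes()
    second = cached_path("input_01.nc", cache_dir, fake_fetch).read_bytes()

print(calls)  # ['input_01.nc']
print(first == second)  # True
```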

A `StorageConfig` which declares all three storage locations is assigned as follows:

```{code-block} python
from pangeo_forge_recipes.storage import CacheFSSpecTarget, FSSpecTarget, MetadataTarget, StorageConfig

# Define the fsspec filesystems and paths for the target, cache, and metadata locations here,
# e.g. target_fs = s3fs.S3FileSystem(...) and target_path = "pangeo-forge-bucket/my-dataset-v1.zarr"
target = FSSpecTarget(fs=target_fs, root_path=target_path)
cache = CacheFSSpecTarget(fs=cache_fs, root_path=cache_path)
metadata = MetadataTarget(fs=metadata_fs, root_path=metadata_path)
recipe.storage_config = StorageConfig(target, cache, metadata)
```

## API

```{eval-rst}
.. autoclass:: pangeo_forge_recipes.storage.StorageConfig
   :members:
```

```{eval-rst}
.. autoclass:: pangeo_forge_recipes.storage.FSSpecTarget
   :members:
```

```{eval-rst}
.. autoclass:: pangeo_forge_recipes.storage.CacheFSSpecTarget
   :members:
   :show-inheritance:
```

```{eval-rst}
.. autoclass:: pangeo_forge_recipes.storage.MetadataTarget
   :members:
   :show-inheritance:
```
