Remove "object cache" misfeature; add simpler "resource cache" (#708)
* Remove object_cache.

* Add TLRU resource cache.

* Remove object cache from adapters.

* Apply resource cache to single TIFF adapter.

* Make resource cache tunable via env vars

* Apply resource cache to HDF5 adapter

* Handle zero-sized cache.

* Update CHANGELOG

* Finish typing resource_cache

* Add unit tests for resource cache

* Fix outdated comment

Co-authored-by: Eugene <[email protected]>

* Include detailed documentation on resource cache.

* Add reference docs for resource cache.

* Fix documented env var

* Use more specific cache keys

---------

Co-authored-by: Eugene <[email protected]>
danielballan and genematx authored Apr 5, 2024
1 parent 225fa33 commit 58bc300
Showing 20 changed files with 204 additions and 739 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,17 @@ Write the date in place of the "Unreleased" in the case a new version is release

## Next

### Added

- Added `tiled.adapters.resource_cache` for caching file handles between
requests.

### Removed

- Removed object cache from the codebase. If `object_cache` is included in
the server configuration file, a warning is raised that this configuration
has no effect.

### Fixed

- The configuration setting `tiled_admins` did not work in practice. If a user
69 changes: 32 additions & 37 deletions docs/source/explanations/caching.md
@@ -11,22 +11,14 @@ Tiled has two kinds of caching:

1. **Client-side response cache.** The Tiled Python client implements a standard
web cache, similar in both concept and implementation to a web browser's cache.
3. **Service-side object cache.** The _response_ caches operate near the outer
edges of the application, stashing and retrieving HTTP response bytes. The
_object_ cache is more deeply integrated into the application: it is
available for authors of Adapters to use for stashing any objects that may be
useful in expediting future work. These objects may be serializable, such as chunks
of array data, or unserializable, such as file handles. Requests that ask for
overlapping but distinct slices of data or requests that ask for the same
data but in varied formats will not benefit from the _response_ cache; they
will "miss". The _object_ cache, however, can slice and encode its cached
resources differently for different requests. The object cache will not provide
quite the same speed boost as a response cache, but it has a broader impact.
2. **Server-side resource cache.** The resource cache is used to cache file
handles and related system resources, to avoid rapidly opening, closing,
and reopening the same files while handling a burst of requests.

(client-http-response-cache)=
## Client-side HTTP Response Cache

The client response cache is an LRU response cache backed by a SQLite file.
The client response cache is an LRU (Least Recently Used) response cache backed by a SQLite file.


```py
@@ -48,40 +40,43 @@ cache = Cache(
)
```

## Server-side Object Cache
## Server-side Resource Cache

TO DO
The "resource cache" is a TLRU (Time-aware Least Recently Used) cache. When
items are evicted from the cache, a hard reference is dropped, freeing the
resource to be closed by the garbage collector if there are no other extant
hard references. Items are evicted if:

### Connection to Dask
- They have been in the cache for a _total_ of more than a given time.
(Accessing an item does not reset this time.)
- The cache is at capacity and this item is the least recently used item.
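
The two eviction rules above can be sketched with the standard library alone. This is a minimal illustration, not Tiled's implementation: the class name `TinyTLRU`, its `put`/`get` methods, and the injectable `clock` are all hypothetical, and the fake clock exists only to make the behaviour deterministic.

```python
import time
from collections import OrderedDict


class TinyTLRU:
    """Illustrative time-aware LRU cache (hypothetical; not Tiled's code)."""

    def __init__(self, maxsize, ttu, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttu = ttu  # seconds an item may stay cached, counted from insertion
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), least recently used first

    def _expire(self):
        # Drop every item whose total time in the cache has run out.
        now = self.clock()
        stale = [k for k, (expires_at, _) in self._items.items() if expires_at <= now]
        for key in stale:
            del self._items[key]

    def put(self, key, value):
        self._expire()
        if self.maxsize == 0:
            return  # a zero-sized cache stores nothing
        self._items.pop(key, None)
        while len(self._items) >= self.maxsize:
            self._items.popitem(last=False)  # evict the least recently used item
        self._items[key] = (self.clock() + self.ttu, value)

    def get(self, key, default=None):
        self._expire()
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as recently used; expiry is unchanged
        return self._items[key][1]


now = [0.0]  # fake clock, advanced by hand
cache = TinyTLRU(maxsize=2, ttu=60.0, clock=lambda: now[0])
cache.put("a", "handle-a")
cache.put("b", "handle-b")
cache.get("a")              # "a" becomes the most recently used item
cache.put("c", "handle-c")  # cache is full, so "b" is evicted
evicted_b = cache.get("b") is None

now[0] = 30.0
still_cached_a = cache.get("a") == "handle-a"  # accessed at t=30
now[0] = 61.0
expired_a = cache.get("a") is None  # the access at t=30 did not extend its lifetime
```

Dropping the entry only removes the cache's reference to the value; in this sketch, as in the description above, any cleanup of the underlying resource is left to the garbage collector.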

Dask provides an opt-in, experimental
[opportunistic caching](https://docs.dask.org/en/latest/caching.html) mechanism.
It caches at the granularity of "tasks", such as chunks of array or partitions
of dataframes.
It is not expected that users should need to tune this cache, except in
debugging scenarios. These environment variables may be set to tune
the cache parameters:

Tiled's object cache is generic---not exclusive to dask code paths---but it plugs
into dask in a similar way to make it easy for any Adapters that happen to use
dask to leverage Tiled's object cache very simply, like this:
```sh
TILED_RESOURCE_CACHE_MAX_SIZE # default 1024 items
TILED_RESOURCE_CACHE_TTU # default 60. seconds
```

```py
from tiled.server.object_cache import get_object_cache
The "size" is measured in cached items; that is, each item in the cache has
size 1.

To disable the resource cache, set:

with get_object_cache().dask_context:
    # Any tasks that happen to already be cached will be looked up
    # instead of computed here. Anything that _is_ computed here may
    # be cached, depending on its bytesize and its cost (how long it took to
    # compute).
    dask_object.compute()
```sh
TILED_RESOURCE_CACHE_MAX_SIZE=0
```
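
How Tiled itself parses these variables is not shown here. As a rough sketch, a program could read the two documented variables with their documented defaults like this; the helper names `resource_cache_settings` and `cache_enabled` are hypothetical.

```python
import os


def resource_cache_settings(environ=os.environ):
    """Return (max_size, ttu_seconds), falling back to the documented defaults."""
    # Hypothetical helper: the variable names and defaults come from the docs
    # above, but this is not Tiled's own parsing code.
    max_size = int(environ.get("TILED_RESOURCE_CACHE_MAX_SIZE", "1024"))
    ttu = float(environ.get("TILED_RESOURCE_CACHE_TTU", "60."))
    return max_size, ttu


def cache_enabled(max_size):
    # Per the documentation, a maximum size of 0 disables the cache.
    return max_size > 0


defaults = resource_cache_settings({})
disabled = resource_cache_settings({"TILED_RESOURCE_CACHE_MAX_SIZE": "0"})
```

Passing the environment as a mapping keeps the helper easy to exercise without modifying the real process environment.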

Items can be proactively cleared from the cache like so:
It is also possible to register a custom cache:

```py
from tiled.server.object_cache import get_object_cache, NO_CACHE
```python
from cachetools import Cache
from tiled.adapters.resource_cache import set_resource_cache


cache = get_object_cache()
if cache is not NO_CACHE:
    cache.discard_dask(dask_object.__dask_keys__())
cache = Cache(maxsize=1)
set_resource_cache(cache)
```

Any object satisfying the `cachetools.Cache` interface is acceptable.
83 changes: 0 additions & 83 deletions docs/source/how-to/tune-caches.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/source/index.md
@@ -28,7 +28,6 @@ how-to/api-keys
how-to/custom-clients
how-to/metrics
how-to/direct-client
how-to/tune-caches
how-to/tiled-authn-database
how-to/register
```
28 changes: 5 additions & 23 deletions docs/source/reference/service.md
@@ -129,32 +129,14 @@ See {doc}`../explanations/structures` for more context.
tiled.server.app.build_app_from_config
```

## Object Cache

The "object" cache is available to all Adapters to cache any objects, including
serializable objects like array chunks and unserializable objects like file
handles. It is a process-global singleton.

Implementation detail: It is backed by [Cachey](https://github.com/dask/cachey).

Adapters that use the cache _must_ use a tuple of strings and/or numbers as a
cache key and _should_ use a cache key of the form `(class.__module__,
class.__qualname__, ...)` to avoid collisions with other Adapters. See
`tiled.adapters.tiff` for a generic example and see `tiled.adapters.table` for
an example that uses integration with dask.
## Resource Cache

```{eval-rst}
.. autosummary::
   :toctree: generated

   tiled.server.object_cache.get_object_cache
   tiled.server.object_cache.set_object_cache
   tiled.server.object_cache.ObjectCache
   tiled.server.object_cache.ObjectCache.available_bytes
   tiled.server.object_cache.ObjectCache.get
   tiled.server.object_cache.ObjectCache.put
   tiled.server.object_cache.ObjectCache.discard
   tiled.server.object_cache.ObjectCache.clear
   tiled.server.object_cache.ObjectCache.dask_context
   tiled.server.object_cache.ObjectCache.discard_dask
   tiled.adapters.resource_cache.get_resource_cache
   tiled.adapters.resource_cache.set_resource_cache
   tiled.adapters.resource_cache.default_resource_cache
   tiled.adapters.resource_cache.with_resource_cache
```
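
The signatures of the functions listed above are not shown in this diff. To illustrate the general idea of reusing a file handle between requests, here is a hypothetical get-or-create helper backed by a plain dictionary. The names `get_or_create` and `open_resource`, and the example cache key, are assumptions made for the sketch and are not Tiled's API.

```python
import io

_cache = {}  # stand-in for a process-wide resource cache (hypothetical)


def get_or_create(cache_key, factory, *args, **kwargs):
    """Return the cached resource for cache_key, creating it on a miss."""
    try:
        return _cache[cache_key]
    except KeyError:
        resource = factory(*args, **kwargs)
        _cache[cache_key] = resource
        return resource


opened = []  # records each time the factory actually runs


def open_resource(path):
    # Stand-in for opening a real file; returns an in-memory buffer instead.
    opened.append(path)
    return io.BytesIO(b"fake contents of " + path.encode())


# A specific key (module, class, path) reduces the chance of collisions
# between unrelated callers; the values here are made up.
key = ("example.adapters", "ExampleAdapter", "/data/image.tif")
first = get_or_create(key, open_resource, "/data/image.tif")
second = get_or_create(key, open_resource, "/data/image.tif")
```

The second call returns the same object without invoking the factory again, which is the behaviour a burst of requests against one file would benefit from.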