Refactor 'tiled serve directory' to use the catalog (#511)
* Back out over-eager find/replace.

* WIP

* Logging and error handling.

* The files example runs.

* Access policy as required adapter interface

* Tests pass.

* Simplify. Tests not passing yet.

* Simplify more.

* Revert mistake

* Fix id vs key confusion.

* Fix typo

* Traversing into HDF5 works.

* Port over netCDF support.

* Adapter construction may block.

* Tests pass

* 'tiled catalog register ...'

* register --prefix works

* Refactor walker in prep for zarr, tiff.

* Zarr works but there is a doubling issue.

* Zarr works and code seems sane

* Sketched TIFF seq support, did not test yet

* TIFF sequence works nicely

* Support registering single file.

* Draft delete_tree, not tested.

* BOOM

* docstrings

* Use regex to match extension too.

* TEMP: Hide walker tests.

* Fix up breakage from moving to separate mimetypes module.

* Show clear error message if old directory adapter is used.

* delete_tree works

* object_cache tests pass

* Make TIFF test more realistic and challenging.

* Catch readable_storage as tmpdir or Path.

* Log groupings of TIFF sequences.

* Fix validation.

* Address event-loop crossing issue.

* Remove problematic stdin test. Does not play well.

* Update reference docs.

* WIP: Watchfiles logs changes

* Implement filtering.

* WIP: Sketch accumulate changes during initial scan.

* The concurrency with both CLIs is worked out.

* --watch works in a rudimentary way

* Enable custom mimetype detection.

* Update docs to drop 'tree: files' and 'DirectoryAdapter.'

* Avoid [] as default.

* Make thread leaker activated by env.

* readers -> adapters

* XDI example works

* Raise clear error if serve does not know mimetype.

* Reading works on XDI examples.

* REF: No need for a subprocess

* Use better CLI syntax.

* Handle binary values in metadata by b64-encoding.

* Handle collisions.

* Remove mention of watch mode from the intro tutorial.

* Work around PostgreSQL requiring JSON as str.

* Refactor for stricter interfaces.
danielballan authored Jul 19, 2023
1 parent 4009414 commit c4f50d2
Showing 46 changed files with 1,879 additions and 1,757 deletions.
28 changes: 15 additions & 13 deletions docs/source/explanations/specialized-formats.md
@@ -1,7 +1,5 @@
 # Case Study: Reading and Exporting a Specialized Format

-
-
 For this guide, we'll take the example of
 [XDI](https://github.com/XraySpectroscopy/XAS-Data-Interchange/blob/master/specification/spec.md#example-xdi-file),
 which is a formalized text-based format for X-ray Spectroscopy data.
@@ -94,12 +92,11 @@ Now take the following simple server configuration:
 # config.yml
 trees:
   - path: /
-    tree: tiled.adapters.files:DirectoryAdapter.from_directory
+    tree: tiled.catalog:from_uri
     args:
-      directory: "data"
-      mimetypes_by_file_ext:
-        .xdi: application/x-xdi
-      readers_by_mimetype:
+      uri: ./catalog.db
+      - readable_storage: ./data/
+      adapters_by_mimetype:
         application/x-xdi: tiled.examples.xdi:XDIDataFrameAdapter.from_file
 ```
@@ -109,6 +106,12 @@ and serve it:
 tiled serve config --public config.yml
 ```

+And register the files:
+
+```
+tiled catalog register catalog.db --config config.yml --ext '.xdi=application/x-xdi' data/
+```
+
 As is, we can access the data as CSV, for example.

 ```
@@ -204,14 +207,13 @@ Add new sections to the configuration as follows.
 ```yaml
 trees:
   - path: /
-    tree: tiled.adapters.files:DirectoryAdapter.from_directory
+    tree: tiled.catalog:from_uri
     args:
-      directory: "data"
-      mimetypes_by_file_ext:
-        .xdi: application/x-xdi
-      readers_by_mimetype:
+      uri: ./catalog.db
+      readable_storage:
+        - ./data/
+      adapters_by_mimetype:
         application/x-xdi: tiled.examples.xdi:XDIDataFrameAdapter.from_file
-
 media_types:
   xdi:
     application/x-xdi: tiled.examples.xdi:write_xdi
29 changes: 14 additions & 15 deletions docs/source/how-to/profiles.md
@@ -168,12 +168,12 @@ Here is a complete example.
 ```yaml
 # profiles.yml
 my_profile:
-  direct:
-    trees:
-      - path: /
-        tree: tiled.adapters.files:DirectoryAdapter.from_directory
-        args:
-          directory: "path/to/files"
+  direct:
+    trees:
+      - path: /
+        tree: tiled.catalog:from_uri
+        args:
+          uri: "/path/to/catalog.db"
 ```
 This takes the place of the `uri:` parameter. A profile must contain
@@ -184,15 +184,14 @@ usual client-side configuration, such as
 ```yaml
 # profiles.yml
 my_profile:
-  direct:
-    trees:
-      - path: /
-        tree: tiled.adapters.files:DirectoryAdapter.from_directory
-        args:
-          directory: "path/to/files"
-  cache:
-    memory:
-      available_bytes: 2_000_000_000  # 2 GB
+  direct:
+    trees:
+      - path: /
+        tree: tiled.catalog:from_uri
+        args:
+          directory: "/path/to/catalog.db"
+  cache:
+    capacity: 2_000_000_000  # 2 GB
 ```

 ## Reference
88 changes: 32 additions & 56 deletions docs/source/how-to/read-custom-formats.md
@@ -18,7 +18,8 @@ But starting with files is a good way to get rolling with Tiled.
 ## Formats are named using "MIME types"

 Tiled refers to formats using a web standard called
-[MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types).
+[MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types)
+a.k.a. "media types".
 MIME types look like:

 ```
@@ -50,31 +51,17 @@ told that it should read `*.stuff` files like CSVs.

 ### Map the unfamiliar file extension to a MIME type

-We use a configuration file like this:
-
-```yaml
-# config.yml
-trees:
-  - tree: files
-    args:
-      directory: path/to/directory
-      mimetypes_by_file_ext:
-        .stuff: text/csv
 ```
+tiled serve directory path/to/directory --ext '.stuff=text/csv'
+```

 We are mapping the file extension, `.stuff` (including the leading `.`) to
 the MIME type `text/csv`.

 Multiple file extensions can be mapped to the same MIME type. For example,
 Tiled's default configuration maps both `.tif` and `.tiff` to `image/tiff`.

-We then use the configuration file like this:
-
-```
-tiled serve config config.yml
-```
-
-The configuration file `config.yml` can be named anything you like.
+Multiple custom mapping can be specified by using `--ext` repeatedly.

 ## Case 2: No File Extension

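For reference, the `--ext` option in the hunk above pairs a file extension with a MIME type and may be given more than once. The sketch below shows one way such `.ext=mimetype` strings could be collected into a mapping. `parse_ext_options` is a hypothetical helper written for illustration. It is not Tiled's implementation, and Tiled's own parsing and error handling may differ.

```python
def parse_ext_options(pairs):
    """Collect strings like '.xdi=application/x-xdi' into a dict.

    Hypothetical helper for illustration only; not part of Tiled.
    """
    mapping = {}
    for pair in pairs:
        # Split on the first '=' only, so the MIME type is kept intact.
        ext, sep, mimetype = pair.partition("=")
        if not sep or not ext.startswith(".") or not mimetype:
            raise ValueError(f"Expected '.ext=mimetype', got {pair!r}")
        mapping[ext] = mimetype
    return mapping


# Two --ext options, as they might arrive from the command line.
mapping = parse_ext_options([".xdi=application/x-xdi", ".stuff=text/csv"])
print(mapping)
```

The leading `.` is required here because the docs in this diff describe the extension as "including the leading `.`".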
@@ -122,40 +109,18 @@ its file extension. Therefore, this function can be used to catch files that
 have no file extension or to _override_ the determination based file extension
 if it is wrong.

-If the Python script `custom.py` is placed in the same directory as
-`config.yml`, Tiled will find it. (Tiled temporarily adds the directory
-containing the configuration file(s) to the Python import path while
-it parses the configuration.)
-
-```yaml
-# config.yml
-trees:
-  - tree: files
-    args:
-      directory: path/to/directory
-      mimetype_detection_hook: custom:detect_mimetype
-```
-
-Alternatively, if the function can be defined in some external Python package
-like `my_package.my_module.func` and configured like
+Place `custom.py` in the current working directory and reference it like this:

 ```
-mimetype_detection_hook: my_package.my_module:func
+tiled serve directory path/to/directory --mimetype-hook custom:detect_mimetype
 ```

-Note that the packages are separated by `.` but the final object (`func`) is
-preceded by a `:`. If you forget this, Tiled will raise a clear error to remind
-you.
-The names `custom.py` and `detect_mimetype` are arbitrary. The
-`mimetype_detection_hook` may be used in combination with
-`mimetypes_by_file_ext`.
-As in Case 1, we use the configuration file like this:
-```
-tiled serve config config.yml
-```
+* The names `custom.py` and `detect_mimetype` are arbitrary.
+* The function may be in the any importable location; it does not have to be
+  in the current working directory. Functions in nested packages can referenced
+  like `package.module.submodule:function_name`. Notice the `.`s between
+  modules and the `:` before the function.
+* The `--mimetype-hook` may be used in combination with `--ext` above.

 ## Case 3: Custom Format

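For reference, a hook of the kind passed to `--mimetype-hook` in the hunk above might look like the sketch below. It assumes the hook is called with the file path and the MIME type guessed from the file extension (or `None` when there was no guess) and returns the MIME type to use. That signature is not shown in this diff, so treat it as an assumption. The `STUFF` magic bytes are invented for illustration.

```python
# custom.py (illustrative sketch; the hook signature is an assumption)
import os
import tempfile


def detect_mimetype(filepath, mimetype):
    """Return a MIME type for filepath, keeping any extension-based guess."""
    if mimetype is None:
        # No guess from the file extension: peek at the first few bytes.
        with open(filepath, "rb") as f:
            header = f.read(5)
        if header == b"STUFF":  # hypothetical magic bytes
            mimetype = "application/x-stuff"
    return mimetype


# Try the hook on a file that has no extension.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "no_extension")
    with open(path, "wb") as f:
        f.write(b"STUFF v1\n1,2,3\n")
    detected = detect_mimetype(path, None)
    passthrough = detect_mimetype(path, "text/csv")

print(detected, passthrough)
```

Returning the incoming `mimetype` unchanged when it is not `None` lets the extension-based mapping from `--ext` take precedence, and the hook only fills in the gaps. To override a wrong extension-based guess instead, the hook would inspect the file regardless of the incoming value.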
@@ -259,11 +224,11 @@ Specify them as an argument to the Adapter, as in:
 DataFrameAdapter(..., specs=["xdi"])
 ```

-### Configure Tiled to use this Adapter
+### Configure Tiled Server to use this Adapter

 Our configuration file should use `mimetypes_by_file_ext` (Case 1) or
 `mimetype_detection_hook` (Case 2) to recognize this custom file.
-Additionally, it should add a section `readers_by_mimetype` to
+Additionally, it should add a section `adapters_by_mimetype` to
 map our MIME type `application/x-stuff` to our custom function.

 Again, Tiled will find `custom.py` if it is placed in the same directory as
@@ -273,16 +238,27 @@ needed.
 ```yaml
 # config.yml
 trees:
-  - tree: files
-    args:
-      directory: path/to/directory
-      mimetype_detection_hook: custom:detect_mimetype
-      readers_by_mimetype:
-        application/x-stuff: custom:read_custom_format
+  - tree: catalog
+    path: /
+    args:
+      uri: ./catalog.db
+      readable_storage:
+        - path/to/directory
+      adapters_by_mimetype:
+        application/x-stuff: custom:read_custom_format
 ```
 We then use the configuration file like this:
 ```
 tiled serve config config.yml
 ```
+
+and register the files in a separate step. Use `--ext` and/or `--mimetype-hook`
+described above to register files as your custom MIME type (e.g.
+`application/x-stuff`). For example:
+
+
+```
+tiled catalog register catalog.db --ext '.stuff=application/x-stuff' path/to/directory
+```
10 changes: 7 additions & 3 deletions docs/source/reference/service.md
@@ -18,17 +18,21 @@ or its dask counterpart.
    tiled.adapters.xarray.DatasetAdapter.from_dataset
 ```

-### File and Directory Adapters
+### File Adapters

 ```{eval-rst}
 .. autosummary::
    :toctree: generated
-   tiled.adapters.files.DirectoryAdapter
    tiled.adapters.dataframe.DataFrameAdapter.read_csv
-   tiled.adapters.tiff.TiffAdapter
    tiled.adapters.excel.ExcelAdapter
    tiled.adapters.hdf5.HDF5Adapter
+   tiled.adapters.netcdf.read_netcdf
+   tiled.adapters.parquet.ParquetDatasetAdapter
+   tiled.adapters.sparse_blocks_parquet.SparseBlocksParquetAdapter
+   tiled.adapters.tiff.TiffAdapter
+   tiled.adapters.zarr.ZarrArrayAdapter
+   tiled.adapters.zarr.ZarrGroupAdapter
 ```

 ## Search Queries
4 changes: 0 additions & 4 deletions docs/source/tutorials/serving-files.md
@@ -85,7 +85,3 @@ array([[1., 1., 1., ..., 1., 1., 1.],
 1 2 5
 2 3 6
 ```
-
-Try deleting, moving, or adding files, and notice that the ``client`` object
-updates its structure. It continually watches the filesystem for changes in an
-efficient fashion.
3 changes: 0 additions & 3 deletions example_configs/external_service/custom.py
@@ -79,9 +79,6 @@ async def lookup_adapter(self, segments):
         # or something custom, or another AuthenticatedAdapter...
         return ArrayAdapter(data, metadata=metadata)

-    # TODO This can be a fast-path.
-    lookup_node = lookup_adapter
-
     async def keys_range(self, offset, limit):
         url = ...  # based on self._segments
         return await self._client.get_contents(url, token=self._token)
23 changes: 12 additions & 11 deletions tiled/_tests/conftest.py
@@ -78,17 +78,18 @@ def tmpdir_module(request, tmpdir_factory):
     return tmpdir_factory.mktemp(request.module.__name__)


-# This can un-commented to debug leaked threads.
-# import threading
-# import time
-#
-# def poll_enumerate():
-#     while True:
-#         time.sleep(1)
-#         print("THREAD COUNT", len(threading.enumerate()))
-#
-# thread = threading.Thread(target=poll_enumerate, daemon=True)
-# thread.start()
+# Use this with pytest -s option.
+if os.getenv("TILED_DEBUG_LEAKED_THREADS"):
+    import threading
+    import time
+
+    def poll_enumerate():
+        while True:
+            time.sleep(1)
+            print("THREAD COUNT", len(threading.enumerate()))
+
+    thread = threading.Thread(target=poll_enumerate, daemon=True)
+    thread.start()


 # To test with postgres, start a container like: