
Update docs #13
Draft · wants to merge 19 commits into main
1,042 changes: 475 additions & 567 deletions Pipfile.lock

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/explanations.rst
@@ -9,3 +9,4 @@ Explanation of how the library works and why it works that way.
:caption: Explanations

explanations/why-is-something-so
explanations/why-multiprocessing
39 changes: 39 additions & 0 deletions docs/explanations/why-multiprocessing.rst
@@ -0,0 +1,39 @@
Why use Multiprocessing?
========================

A major limitation of Python is the Global Interpreter Lock (GIL). The GIL
allows only one thread to execute Python bytecode at a time, which makes it
very difficult to parallelise CPU-heavy workloads within a single process.

One workaround would be to deploy the webapp on a cluster, such as
Kubernetes, with load balancing that spins up a new instance whenever a call
is made.
The easier solution, however, is to use the ``multiprocessing`` library. It
spawns a new process for a specific function call, and because each process
has its own interpreter and its own GIL, the call sidesteps the GIL entirely.
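A minimal, self-contained sketch (illustrative only, not taken from the
service's code) showing that each spawned process really is a separate
interpreter with its own PID:

```python
import multiprocessing as mp
import os

def _report_pid(queue: "mp.Queue") -> None:
    # Runs in the child process: put this process's PID on the shared queue.
    queue.put(os.getpid())

def worker_pids(n: int) -> set:
    """Spawn ``n`` processes and collect each worker's PID."""
    queue: mp.Queue = mp.Queue()
    procs = [mp.Process(target=_report_pid, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    pids = {queue.get() for _ in procs}  # drain the queue before joining
    for p in procs:
        p.join()
    return pids

if __name__ == "__main__":
    pids = worker_pids(2)
    print(len(pids), os.getpid() in pids)  # 2 False: two workers, neither is the parent
```

Because each PID belongs to a distinct OS process, the workers can run on
separate CPU cores regardless of the GIL.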

In the HDF5 Reader Service, multiprocessing is used in every endpoint, so
multiple calls can be served at once without blocking one another.

For example, in the ``/info`` endpoint, the Multiprocessing code is:

.. code-block:: python

p = mp.Process(target=fetch_info, args=(path, subpath, queue))
p.start()
p.join()

where ``fetch_info`` is the function containing the logic for fetching the
metadata. The process must be ``start``\ed and then ``join``\ed once it has
completed, so that the parent reclaims the child's resources rather than
leaking them.

One downside of using multiprocessing is that the spawned process does not
share memory with the parent, so objects in the global namespace, such as
dictionaries, cannot simply be mutated from within the ``Process``. Instead a
multiprocessing ``Queue`` is used: the child ``put``\s data onto it, and the
main process ``get``\s the data back off.

.. code-block:: python

queue: mp.Queue = mp.Queue()
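Putting the ``Process`` and ``Queue`` pieces together, a simplified,
hypothetical version of the pattern might look like this (``fetch_info`` here
is a stand-in for the real metadata lookup, not the service's actual code):

```python
import multiprocessing as mp

def fetch_info(path: str, subpath: str, queue: "mp.Queue") -> None:
    # Stand-in for the real metadata lookup: just echo the inputs back.
    queue.put({"path": path, "subpath": subpath})

def info(path: str, subpath: str) -> dict:
    queue: mp.Queue = mp.Queue()
    p = mp.Process(target=fetch_info, args=(path, subpath, queue))
    p.start()
    result = queue.get()  # blocks until the child puts its result
    p.join()              # reap the child once it has finished
    return result

if __name__ == "__main__":
    print(info("/data/file.h5", "/entry"))
```

Note that ``queue.get()`` is called before ``join()``: draining the queue
first avoids a deadlock if the child's payload is too large to fit in the
queue's underlying pipe buffer.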
2 changes: 2 additions & 0 deletions docs/tutorials.rst
@@ -9,3 +9,5 @@ Tutorials for installation, library and commandline usage. New users start here.
:caption: Tutorials

tutorials/installation
tutorials/querying-endpoints
tutorials/running-the-server
40 changes: 40 additions & 0 deletions docs/tutorials/querying-endpoints.rst
@@ -0,0 +1,40 @@
Query the HDF5 Reader Service Endpoints
=======================================

The HDF5 Reader Service provides REST endpoints for querying HDF5 files.

Tree
----

The ``/tree`` endpoint returns a JSON representation of the HDF5 file tree structure.

http://0.0.0.0:8000/tree/?path=<path>
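All of the query URLs below share the same shape, so they can be built with
the standard library. A sketch, assuming the server is listening locally on
port 8000:

```python
from urllib.parse import urlencode

BASE_URL = "http://0.0.0.0:8000"  # assumed local deployment

def endpoint_url(endpoint: str, **params: str) -> str:
    """Build a query URL for one of the reader-service endpoints."""
    return f"{BASE_URL}/{endpoint}/?{urlencode(params)}"

print(endpoint_url("tree", path="/data/file.h5"))
# http://0.0.0.0:8000/tree/?path=%2Fdata%2Ffile.h5
```

The resulting URL can then be fetched with ``urllib.request.urlopen`` or any
HTTP client.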

Info
----

The ``/info`` endpoint returns metadata about the given node in the HDF5 file.

http://0.0.0.0:8000/info/?path=<path>&subpath=<subpath>

Shapes
------

The ``/shapes`` endpoint fetches the shapes of the datasets.

http://0.0.0.0:8000/shapes/?path=<path>

Search
------

The ``/search`` endpoint fetches the structure of the nodes beneath the given subpath.

http://0.0.0.0:8000/search/?path=<path>&subpath=<subpath>

Slice
-----

The ``/slice`` endpoint fetches the requested slice of the given dataset.

http://0.0.0.0:8000/slice/?path=<path>&subpath=<subpath>&slice=0:0:0,0:0:0
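The ``slice`` parameter is a comma-separated list of ``start:stop:step``
triples, one per dimension (the exact grammar the service accepts is assumed
here). In Python such a string maps onto ``slice`` objects like this:

```python
def parse_slice(spec: str) -> tuple:
    """Turn a spec such as '0:10:2,0:5:1' into Python slice objects.

    Assumes comma-separated start:stop:step triples, one per dimension.
    """
    return tuple(
        slice(*(int(part) for part in dim.split(":")))
        for dim in spec.split(",")
    )

print(parse_slice("0:10:2,0:5:1"))  # (slice(0, 10, 2), slice(0, 5, 1))
```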

17 changes: 17 additions & 0 deletions docs/tutorials/running-the-server.rst
@@ -0,0 +1,17 @@
Running the HDF5 Reader Service
===============================

This tutorial shows how to run the HDF5 Reader Service.

Start the server
----------------

To start the HDF5 Reader Service, run the following command in a terminal:

.. code-block:: console

    $ uvicorn hdf5_reader_service.main:app

The server can also be bound to a specific host or port with the ``--host``
and ``--port`` flags:

.. code-block:: console

    $ uvicorn hdf5_reader_service.main:app --host 127.0.0.1 --port 8000