feat: plugins/middleware/hooks #65

jvkersch · 2024-08-05T08:32:57Z

Following the example of proTES, it would be useful to have a plugin/middleware/hooks system to provide additional functionality to DRS-filer, or to modify existing behaviour. Such extra functionality could e.g. include support for crypt4gh (as implemented in a plugin-less way in pa-DRS-Crypt4GH-PoC).

This is a first, rough design document to describe such a plugin system. Caveat: I only know of one realistic example of a plugin so far (offering support for Crypt4GH). If we can find a few more, we can check whether the proposed design is general enough to support all use cases.

Considerations/context

To facilitate developer experience, the plugin/middleware system here should be as similar as possible to that of proTES. The way that plugins are configured (through import paths in the config file), and the use of a middleware manager are things that are immediately relevant for DRS-filer as well.
There are also areas where the design/implementation may differ. For example, proTES applies middlewares directly to the request and this suffices to pass in some additional HTTP headers. I feel this design is too limited for DRS-filer, and it would be more flexible to work on the "connexion/foca level", after the request has been parsed and before the response is serialized.
That said, there may be times when you want to modify the actual request before it has been parsed/validated by connexion (e.g. to pass in an additional header, do some authentication, etc). I can't think of a convincing example, but the design should not rule this out a priori.
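As a rough illustration of configuring plugins through import paths in the config file, here is a minimal sketch. The function names `load_middleware` and `load_middlewares`, the dotted `package.module.ClassName` path format, and the example path in the comment are assumptions for illustration, not the actual proTES or DRS-filer API.

```python
import importlib
from typing import Any, List


def load_middleware(import_path: str) -> Any:
    """Import 'package.module.ClassName' and return an instance of the class."""
    module_path, _, class_name = import_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Not a dotted import path: {import_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)()


def load_middlewares(import_paths: List[str]) -> List[Any]:
    """Instantiate all configured middlewares, preserving config order."""
    return [load_middleware(path) for path in import_paths]


# Stand-in for a list read from the config file. A real entry might look like
# "my_plugins.crypt4gh.Crypt4GHMiddleware" (hypothetical path); a stdlib class
# is used here only so that the sketch runs as-is.
middlewares = load_middlewares(["collections.OrderedDict"])
```

Keeping the config order means the order in which plugins run is explicit and under the operator's control.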

Tentative design

Given that the plugin should be able to interfere with the behaviour of each endpoint at two moments in time (after the request has been parsed, and before the response is serialized), this suggests having two dedicated methods per endpoint, as in the design below:

class DummyMiddleware:

    def pre_GetObject(self, object_id):
        # Code that is run before GetObject is called goes here
        ...

    def post_GetObject(self, object_id, drs_object):
        # Code that is run after GetObject returns goes here
        ...

    def pre_getServiceInfo(self):
        ...

    def post_getServiceInfo(self, service_info):
        ...

    # Other endpoints go here

Note how the pre/post methods follow the signature of the endpoint that they wrap. Plugins do not have to implement all methods, just the ones for which they have functionality to contribute.
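To make the "implement only what you need" idea concrete, here is a minimal sketch of how a middleware manager could dispatch to these optional hooks. `MiddlewareManager`, its `call` method, `AddNoteMiddleware`, and the convention that a post hook returning `None` leaves the result unchanged are all assumptions for illustration, not part of the proposal.

```python
class MiddlewareManager:
    """Runs optional pre_/post_ hooks around an endpoint controller."""

    def __init__(self, middlewares):
        self.middlewares = list(middlewares)

    def call(self, operation, controller, *args):
        # Run every pre_<operation> hook that a plugin chose to implement.
        for mw in self.middlewares:
            hook = getattr(mw, f"pre_{operation}", None)
            if hook is not None:
                hook(*args)
        result = controller(*args)
        # Post hooks may return a replacement result; None means "unchanged"
        # (an assumed convention).
        for mw in self.middlewares:
            hook = getattr(mw, f"post_{operation}", None)
            if hook is not None:
                updated = hook(*args, result)
                if updated is not None:
                    result = updated
        return result


class AddNoteMiddleware:
    # Only one hook is implemented; all other endpoints are left untouched.
    def post_getServiceInfo(self, service_info):
        return {**service_info, "note": "added by plugin"}


def get_service_info():
    # Stand-in for the real controller.
    return {"name": "drs-filer"}


manager = MiddlewareManager([AddNoteMiddleware()])
info = manager.call("getServiceInfo", get_service_info)
```

Looking hooks up with `getattr` means a plugin never has to declare empty methods for endpoints it does not care about.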

Advantages/disadvantages

The names pre/post may be confusing with the POST from HTTP methods; other suggestions welcome.
The current design may lead to plugins that have a lot of methods: with 9 endpoints (currently) and 2 methods per endpoint, there is a maximum of 18 methods. This is mitigated by the fact that most plugins will not have to implement all 18 methods; a small subset will do.
The plugin API is tightly connected to the API of the endpoints. When the latter changes, the plugins will also have to be updated. This is unavoidable to a certain extent.

Example plugin (Crypt4GH)

This plugin has to offer two pieces of functionality.

It has to advertise that the server has support for Crypt4GH encryption. This is done through an entry in the service info dictionary.
When a user requests an access URL, it has to provide a re-encrypted version of the object pointed to by the access URL. This is done by retrieving the user's public key from the header, issuing a call to a reencrypt function, and returning a suitably modified access URL.

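# Assumes a Flask-based FOCA app (Connexion 2).
from flask import current_app, request

# load_server_pubkey() and reencrypt() are assumed to be provided elsewhere.


class Crypt4GHMiddleware:

    def post_getServiceInfo(self, service_info):
        # Advertise Crypt4GH support in the service info dictionary.
        server_pubkey = load_server_pubkey()
        service_info["crypt4gh"] = {
            "version": "1.0",
            "server_pubkey": server_pubkey,
        }
        return service_info

    def post_GetAccessURL(self, object_id, access_id, access_url):
        # Re-encrypt the object for the client's public key from the header.
        client_pubkey = request.headers.get("Crypt4Gh-Pubkey")
        crypt4gh_conf = getattr(current_app.config.foca, "crypt4gh", None)
        access_url = reencrypt(access_url, client_pubkey, crypt4gh_conf)
        return access_url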

Note that the specific implementation is not subject to any standard, and is likely to change in the future.


uniqueg · 2024-08-05T11:46:19Z

Good stuff, thanks a lot.

Recently, I was looking into upgrading FOCA from Connexion 2 to Connexion 3 (https://connexion.readthedocs.io/en/latest/v3.html#migrating-from-connexion-2), and I stumbled across a possible alternative.

Connexion 3 is a major rewrite that is built on Starlette (instead of Flask) to migrate from WSGI to ASGI (though Flask is still supported via some WSGI-to-ASGI compatibility layer). Importantly, Connexion 3 now applies basically all its functionalities via a Starlette-based middleware stack: https://www.starlette.io/middleware/

So, given that FOCA underlies basically all of our services and we are planning to migrate to Connexion 3 as soon as I manage to put in the time to do so (I had started and was about 75% through a first migration when the summer break hit me), we could also consider making use of Starlette middlewares.

Advantages:

Highly generic
Implementation and documentation will be available "free" after migration to Connexion 3
Highly flexible stack (not just pre/post)
Middlewares can be used by any Starlette/Connexion 3/FOCA or other Starlette-based apps

Disadvantages:

Highly generic
Middlewares can ONLY be used by Starlette/Connexion 3/FOCA or other Starlette-based apps (but that applies to a custom middleware engine even more!)

I've put "highly generic" as both an advantage and disadvantage, because having a bit of structure might make development easier, or at least more consistent. However, a pre-defined structure can also make things more restrictive, especially because we can't foresee all use cases yet.

I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies. That way we don't need to write different methods for different operations, but rather include all middlewares in the stack and just make sure that each one only runs if the right conditions are met.
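One possible shape for such a mechanism is sketched below: a pure ASGI middleware (a pattern that Starlette-based stacks accept) that takes a predicate on the request scope and only acts when it returns true. The class name `ConditionalMiddleware`, the `applies` parameter, the `x-plugin` header, and the path check are hypothetical and only illustrate the idea.

```python
import asyncio


class ConditionalMiddleware:
    """Pure ASGI middleware that only acts when `applies(scope)` is true."""

    def __init__(self, app, applies, header):
        self.app = app
        self.applies = applies  # predicate on the ASGI scope
        self.header = header    # (name, value) pair of bytes to add

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or not self.applies(scope):
            # Condition not met: pass the request through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_header(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", [])) + [self.header]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_header)


def is_access_request(scope):
    # Hypothetical condition: only access URL requests are affected.
    return scope["method"] == "GET" and "/access/" in scope["path"]


async def dummy_app(scope, receive, send):
    # Stand-in for the wrapped application.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call(app, path):
    """Drive the app with a fake GET request and return the response headers."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent[0]["headers"]


app = ConditionalMiddleware(dummy_app, is_access_request, (b"x-plugin", b"on"))
matched = asyncio.run(call(app, "/objects/abc/access/xyz"))
skipped = asyncio.run(call(app, "/service-info"))
```

With this shape, every middleware sits in the stack permanently and the predicate decides per request whether it does anything, so no per-operation methods are needed.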

jvkersch · 2024-08-06T09:13:43Z

@uniqueg I agree that a pre-defined framework would be the better option. The main weakness of my proposal is that there is currently only one example, so a custom framework risks being premature.

> I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies.

Yes! We can use the intervening time (until the migration to Connexion 3 is complete) to figure this out.
