feat: plugins/middleware/hooks #65

Open
jvkersch opened this issue Aug 5, 2024 · 2 comments

@jvkersch
Contributor

jvkersch commented Aug 5, 2024

Following the example of proTES, it would be useful to have a plugin/middleware/hooks system to provide additional functionality to DRS-filer, or to modify existing behaviour. Such extra functionality could e.g. include support for crypt4gh (as implemented in a plugin-less way in pa-DRS-Crypt4GH-PoC).

This is a first, rough design document to describe such a plugin system. Caveat: I only know of one realistic example of a plugin so far (offering support for Crypt4GH). If we can find a few more, we can check whether the proposed design is general enough to support all use cases.

Considerations/context

  • To facilitate developer experience, the plugin/middleware system here should be as similar as possible to that of proTES. The way that plugins are configured (through import paths in the config file), and the use of a middleware manager are things that are immediately relevant for DRS-filer as well.
  • There are also areas where the design/implementation may differ. For example, proTES applies middlewares directly to the request and this suffices to pass in some additional HTTP headers. I feel this design is too limited for DRS-filer, and it would be more flexible to work on the "connexion/foca level", after the request has been parsed and before the response is serialized.
  • That said, there may be times where you want to fiddle with the actual request before it has been parsed/validated by connexion (e.g. to pass in an additional header, do some authentication, etc). I can't think of a convincing example, but the design should not rule this out a priori.
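
As a rough sketch of the configuration idea in the first bullet, plugins could be listed in the config file as dotted import paths and resolved with importlib. The helper name `load_middlewares` and the `package.module.ClassName` path format are assumptions made for illustration, not the actual proTES or DRS-filer API. The example path points at a standard-library class only so that the snippet runs as is.

```python
from importlib import import_module


def load_middlewares(import_paths):
    """Instantiate one middleware per dotted 'package.module.ClassName' path."""
    middlewares = []
    for path in import_paths:
        module_name, _, class_name = path.rpartition(".")
        if not module_name:
            raise ValueError(f"Not a dotted import path: {path!r}")
        module = import_module(module_name)
        # Assumption: middleware classes can be constructed without arguments.
        middlewares.append(getattr(module, class_name)())
    return middlewares


# Stand-in path so the sketch runs; a real config would list plugin classes.
loaded = load_middlewares(["collections.OrderedDict"])
print(type(loaded[0]).__name__)
```

A real loader would probably also validate that each class exposes at least one recognised hook, but that depends on design decisions that are still open here.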

Tentative design

Given that the plugin should be able to interfere with the behaviour of each endpoint at two moments in time (after the request has been parsed, and before the response is serialized), this suggests having two dedicated methods per endpoint, as in the design below:

class DummyMiddleware:

  def pre_GetObject(self, object_id):
    # Code that runs before GetObject is called goes here
    ...

  def post_GetObject(self, object_id, object):
    # Code that runs after GetObject returns goes here
    ...

  def pre_getServiceInfo(self):
    ...

  def post_getServiceInfo(self, service_info):
    ...

  # Other endpoints go here

Note how the pre/post methods follow the signature of the endpoint that they wrap. Plugins do not have to implement all methods, just the ones for which they have functionality to contribute.
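
To make the "implement only what you need" idea concrete, here is a minimal sketch of how a middleware manager might dispatch to these hooks. The `MiddlewareManager` class, its `call` method, and the convention that a post hook returning `None` leaves the result unchanged are all assumptions for illustration; none of them is part of the proposal above or of proTES.

```python
class MiddlewareManager:
    """Hypothetical manager that runs optional pre_/post_ hooks around a handler."""

    def __init__(self, middlewares):
        self.middlewares = list(middlewares)

    def call(self, operation_id, handler, *args):
        # Run every pre_<operation> hook that a plugin chose to implement.
        for mw in self.middlewares:
            hook = getattr(mw, f"pre_{operation_id}", None)
            if hook is not None:
                hook(*args)
        result = handler(*args)
        # Post hooks may return a replacement result; None keeps the current one.
        for mw in self.middlewares:
            hook = getattr(mw, f"post_{operation_id}", None)
            if hook is not None:
                updated = hook(*args, result)
                if updated is not None:
                    result = updated
        return result


class UppercaseName:
    # Implements only one of the possible hooks.
    def post_GetObject(self, object_id, obj):
        return {**obj, "name": obj["name"].upper()}


def get_object(object_id):
    # Stand-in for the real GetObject controller.
    return {"id": object_id, "name": "sample"}


manager = MiddlewareManager([UppercaseName()])
result = manager.call("GetObject", get_object, "abc")
print(result)
```

Looking hooks up with `getattr` means a plugin with a single method needs no base class and no empty stubs for the other endpoints.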

Advantages/disadvantages

  • The names pre/post may be confused with the HTTP POST method; other suggestions welcome.
  • The current design may lead to plugins that have a lot of methods: with 9 endpoints (currently) and 2 methods per endpoint, there is a maximum of 18 methods. This is mitigated by the fact that most plugins will only need to implement a small subset of them.
  • The plugin API is tightly connected to the API of the endpoints. When the latter changes, the plugins will also have to be updated. This is unavoidable to a certain extent.

Example plugin (Crypt4GH)

This plugin has to offer two pieces of functionality.

  1. It has to advertise that the server has support for Crypt4GH encryption. This is done through an entry in the service info dictionary.
  2. When a user requests an access URL, it has to provide a re-encrypted version of the object pointed to by the access URL. This is done by retrieving the user's public key from the header, issuing a call to a reencrypt function, and returning a suitably modified access URL.

from flask import current_app, request

class Crypt4GHMiddleware:

  def post_getServiceInfo(self, service_info):
    # load_server_pubkey() is a helper that is not shown here
    server_pubkey = load_server_pubkey()
    service_info["crypt4gh"] = {
      "version": "1.0",
      "server_pubkey": server_pubkey,
    }

  def post_GetAccessURL(self, object_id, access_id, access_url):
    # The client's public key is read from a request header
    client_pubkey = request.headers.get("Crypt4Gh-Pubkey")
    crypt4gh_conf = getattr(current_app.config.foca, "crypt4gh", None)
    # reencrypt() is a helper that is not shown here
    access_url = reencrypt(access_url, client_pubkey, crypt4gh_conf)
    return access_url

Note that the specific implementation is not subject to any standard, and is likely to change in the future.

@uniqueg
Member

uniqueg commented Aug 5, 2024

Good stuff, thanks a lot.

Recently, I was looking into upgrading FOCA from Connexion 2 to Connexion 3 (https://connexion.readthedocs.io/en/latest/v3.html#migrating-from-connexion-2), and I stumbled across a possible alternative.

Connexion 3 is a major rewrite that is built on Starlette (instead of Flask) to migrate from WSGI to ASGI (though Flask is still supported via some WSGI-to-ASGI compatibility layer). Importantly, Connexion 3 now applies basically all its functionalities via a Starlette-based middleware stack: https://www.starlette.io/middleware/

So, given that FOCA underlies basically all of our services and we plan to migrate to Connexion 3 as soon as I find the time (I had started and was about 75% through a first migration when the summer break hit me), we could also consider making use of Starlette middlewares.
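
For orientation, Starlette accepts "pure ASGI" middlewares: classes that take the next app in the constructor and implement `async __call__(scope, receive, send)`. The sketch below shows that shape with a toy middleware that appends a response header. The names `AddHeaderMiddleware`, `demo_app` and `run_once` are made up for this example, and the stand-in app and driver exist only so the snippet runs without any web framework installed.

```python
import asyncio


class AddHeaderMiddleware:
    """Pure ASGI middleware: appends one header to every HTTP response."""

    def __init__(self, app, name=b"x-example", value=b"1"):
        self.app = app
        self.header = (name, value)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan/websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_header(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append(self.header)
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_header)


async def demo_app(scope, receive, send):
    # Tiny stand-in ASGI app.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_once(app, path="/"):
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


messages = asyncio.run(run_once(AddHeaderMiddleware(demo_app)))
print(messages[0]["headers"])
```

Because such a middleware only depends on the ASGI interface, the same class could in principle be reused across services, which is the portability point raised in the list of advantages.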

Advantages:

  • Highly generic
  • Implementation and documentation will be available "free" after migration to Connexion 3
  • Highly flexible stack (not just pre/post)
  • Middlewares can be used by any Starlette/Connexion 3/FOCA or other Starlette-based apps

Disadvantages:

  • Highly generic
  • Middlewares can ONLY be used by Starlette/Connexion 3/FOCA or other Starlette-based apps (but that applies to a custom middleware engine even more!)

I've put "highly generic" as both an advantage and disadvantage, because having a bit of structure might make development easier, or at least more consistent. However, a pre-defined structure can also make things more restrictive, especially because we can't foresee all use cases yet.

I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies. That way we don't need to write different methods for different operations, but rather include all middlewares in the stack and just make sure that each one only runs if the right conditions are met.
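
One possible shape for such a mechanism, sketched under assumptions: a small wrapper that sends a request through a middleware only when a predicate on the ASGI scope holds, and otherwise calls the inner app directly. `ConditionalMiddleware`, `TagMiddleware`, `is_access_request` and the path rule are hypothetical names and rules chosen for this example, not an existing Starlette, Connexion or FOCA feature.

```python
import asyncio


class ConditionalMiddleware:
    """Route through `middleware` only when `applies(scope)` is true."""

    def __init__(self, app, middleware, applies):
        self.bare_app = app
        self.wrapped_app = middleware(app)
        self.applies = applies

    async def __call__(self, scope, receive, send):
        target = self.wrapped_app if self.applies(scope) else self.bare_app
        await target(scope, receive, send)


class TagMiddleware:
    """Stand-in middleware that records which paths it handled."""

    seen = []

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        TagMiddleware.seen.append(scope["path"])
        await self.app(scope, receive, send)


async def demo_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b""})


def is_access_request(scope):
    # Hypothetical rule: only GET requests for access URLs are affected.
    return (
        scope["type"] == "http"
        and scope["method"] == "GET"
        and "/access/" in scope["path"]
    )


async def request(app, path):
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass

    await app({"type": "http", "method": "GET", "path": path}, receive, send)


app = ConditionalMiddleware(demo_app, TagMiddleware, is_access_request)
asyncio.run(request(app, "/objects/abc/access/https"))
asyncio.run(request(app, "/service-info"))
print(TagMiddleware.seen)
```

Keeping the condition outside the middleware means each middleware stays generic, while the rules for when it runs could live in configuration.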

@jvkersch
Contributor Author

jvkersch commented Aug 6, 2024

@uniqueg I agree that a pre-defined framework would be the better option. The main weakness of my proposal is that there is currently only one example, so a custom framework risks being premature.

> I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies.

Yes! We can use the intervening time (until the migration to connexion 3 is complete) to figure this out.
