Re-thinking our communication channels. What do you think of a pub/sub approach? #4316

beraldoleal · 2020-11-19T18:30:55Z

Maintainer

The Avocado project has been adapting its architecture to solve distinct issues and to support the needs of new users/projects. New components have been introduced and, probably, new ones will arrive. The new runner (nrunner) is being designed with a more decoupled architecture, exchanging more and more messages with these components.

I could identify a few methods of communication inside the Avocado project:

1. For plugins, we have an internal "Dispatcher" implementation where specific methods are 'hooked' based on the plugin type, and specific arguments are passed to those methods;
2. When triggering runners in a Podman container or in a process, we serialize/deserialize the necessary data via command-line arguments;
3. To make item 2 less painful, we have a "recipe file", which is a JSON file, and runners can also execute from those files;
4. Since runners execute in a decoupled way, Tasks report their status by posting progress to a status_server using asyncio streams;
5. avocado-server follows an HTTP REST API model;
6. Some internal components rely on multiple nested "yields" to get the results of some calls.
Did I forget any method?

Of course, each method makes sense, and I'm sure there are strong arguments for using most of them. I also understand that those components were introduced at different moments and possibly by different contributors. And most importantly: they are working fine.

But debugging here is not trivial, we are moving towards a more decoupled direction with more components, and delivering new features quickly is a strong requirement.

IMO, in order to move fast and have more contributors, we need to take a step back and rethink our communication models/channels.

So, looking at those cases, it seems to me that we need one major thing: a mechanism to subscribe to events. For instance: "wake me up every time a job is started/finished", regardless of whether it is a local or remote process.

And this is the proposal I would like to make: "Improve our communication channel with a standard pub/sub mechanism". This is a very well-known messaging pattern, but here is a quote from Wikipedia:

> In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.
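To make the definition concrete, here is a minimal in-process sketch of the pattern. It is not Avocado code and not the avocado-pubsub PoC; the Broker class, the topic names, and the use of shell-style wildcards (via fnmatch) for topic patterns are assumptions chosen for illustration. The wildcard subscription shows how "wake me up every time a job is started/finished" could map onto topics.

```python
import fnmatch
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]


class Broker:
    """Minimal in-process publish/subscribe broker (illustration only)."""

    def __init__(self) -> None:
        # Topic pattern (e.g. "avocado.job.*") -> handlers interested in it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        """Express interest in every topic matching a shell-style pattern."""
        self._subscribers[pattern].append(handler)

    def publish(self, topic: str, data: Any) -> int:
        """Deliver data to all matching handlers.

        The publisher never learns who the subscribers are; the returned
        delivery count exists only to make the sketch easy to test.
        """
        delivered = 0
        # Iterate over copies so handlers may subscribe/publish meanwhile.
        for pattern, handlers in list(self._subscribers.items()):
            if fnmatch.fnmatchcase(topic, pattern):
                for handler in list(handlers):
                    handler(topic, data)
                    delivered += 1
        return delivered


if __name__ == "__main__":
    broker = Broker()
    events = []
    # "Wake me up every time a job is started/finished."
    broker.subscribe("avocado.job.*", lambda topic, data: events.append(topic))
    broker.publish("avocado.job.started", {"job_id": "1"})
    broker.publish("avocado.job.finished", {"job_id": "1"})
    broker.publish("avocado.test.started", {"test": "foo"})  # no subscriber
    print(events)  # ['avocado.job.started', 'avocado.job.finished']
```

Note that the publisher of "avocado.test.started" gets no error when nobody is listening; that is the decoupling the quote describes.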

So, the idea would be to have one decorator (@listen_to()) for getting event notifications and a method (self.publish()) for publishing events, like this:

import asyncio

from avocado.core.components import RemoteComponent
from avocado.core.helpers import listen_to


class MyComponent(RemoteComponent):

    @listen_to('avocado.foo')
    def handle_foo(self, data):
        print(data)
        self.publish('avocado.foo.finished', "Data received")

The example above works as a proof of concept that I built, decoupled from Avocado. For this experiment, I'm using WebSocket under the hood because: 1) most languages have multiple libraries that handle all the low-level details, and 2) I believe we could also benefit from this protocol in our web server (avocado-server). But yes, we could choose a different transport protocol for this.
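The PoC uses WebSocket, but the decorator side of the design does not depend on the transport. A hedged, purely in-process sketch of how something like @listen_to() could be wired up: the decorator only tags a method with a topic, and a base class registers the tagged methods with a broker when an instance is created. LocalBroker, Component, and the _listen_topics attribute are hypothetical names; this is not the PoC's implementation.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class LocalBroker:
    """Hypothetical in-process broker: exact topic -> callbacks."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._callbacks[topic].append(callback)

    def publish(self, topic: str, data: Any) -> None:
        for callback in list(self._callbacks[topic]):
            callback(data)


def listen_to(topic: str):
    """Tag a method as a handler for `topic`; registration happens later."""
    def decorator(method):
        method._listen_topics = getattr(method, "_listen_topics", []) + [topic]
        return method
    return decorator


class Component:
    """Hypothetical base class that wires tagged methods to a broker."""

    def __init__(self, broker: LocalBroker) -> None:
        self._broker = broker
        for name in dir(type(self)):
            for topic in getattr(getattr(type(self), name), "_listen_topics", []):
                # Register the method bound to this particular instance.
                broker.subscribe(topic, getattr(self, name))

    def publish(self, topic: str, data: Any) -> None:
        self._broker.publish(topic, data)


class MyComponent(Component):
    @listen_to('avocado.foo')
    def handle_foo(self, data):
        print(data)
        self.publish('avocado.foo.finished', "Data received")


if __name__ == "__main__":
    broker = LocalBroker()
    MyComponent(broker)
    broker.subscribe('avocado.foo.finished', lambda data: print("reply:", data))
    broker.publish('avocado.foo', {"hello": "world"})
```

Deferring registration to `__init__` is what lets the decorator stay a plain marker; swapping LocalBroker for a network client would not change MyComponent.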

You can find the code here:

https://github.com/beraldoleal/avocado-pubsub

(Please keep in mind that the code is just an experiment and it is not complete!)

Yes, I do understand that today we have two strong requirements: a) having runners as standalone applications; and b) allowing contributors to write custom runners in any language.

Regarding a), as discussed before, I still believe that we could distribute the runners better (let's say as .whl/pip or even RPM packages) in order to handle dependencies better. But even for the cases where we are going to use this API, WebSockets are pretty common nowadays, and having a client subscribed to a topic is not that complicated. There are libraries in multiple languages to support that. So, to accomplish b), we could have some "hello-world" runner examples in our "Contributor's Guide" for a couple of languages.
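As a rough check that subscribing a client to a topic over the network is not complicated, here is a small sketch. To stay within the Python standard library it swaps WebSocket for plain TCP via asyncio streams carrying newline-delimited JSON; the message format ({"action": ..., "topic": ..., "data": ...}), the ack reply, and all names are assumptions, not a protocol defined by Avocado or the PoC.

```python
import asyncio
import json
from collections import defaultdict
from typing import DefaultDict, Set


class TcpBroker:
    """Sketch of a network broker speaking newline-delimited JSON."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, Set[asyncio.StreamWriter]] = defaultdict(set)

    async def handle(self, reader: asyncio.StreamReader,
                     writer: asyncio.StreamWriter) -> None:
        try:
            while line := await reader.readline():
                msg = json.loads(line)
                if msg["action"] == "subscribe":
                    self.subscribers[msg["topic"]].add(writer)
                    # Ack so the client knows the subscription is active.
                    await send(writer, {"action": "ack", "topic": msg["topic"]})
                elif msg["action"] == "publish":
                    for sub in list(self.subscribers[msg["topic"]]):
                        await send(sub, msg)
        finally:
            # Connection gone: drop it from every topic.
            for subs in self.subscribers.values():
                subs.discard(writer)
            writer.close()


async def send(writer: asyncio.StreamWriter, msg: dict) -> None:
    writer.write(json.dumps(msg).encode() + b"\n")
    await writer.drain()


async def recv(reader: asyncio.StreamReader) -> dict:
    return json.loads(await reader.readline())


async def demo() -> dict:
    broker = TcpBroker()
    server = await asyncio.start_server(broker.handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # A "runner" subscribes and waits for the broker's ack.
    sub_reader, sub_writer = await asyncio.open_connection("127.0.0.1", port)
    await send(sub_writer, {"action": "subscribe", "topic": "avocado.job.finished"})
    await recv(sub_reader)

    # Another client publishes without knowing who is subscribed.
    _, pub_writer = await asyncio.open_connection("127.0.0.1", port)
    await send(pub_writer, {"action": "publish", "topic": "avocado.job.finished",
                            "data": {"job_id": "1"}})
    received = await asyncio.wait_for(recv(sub_reader), timeout=5)

    for w in (sub_writer, pub_writer):
        w.close()
        await w.wait_closed()
    server.close()
    await server.wait_closed()
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A client in another language would only need a TCP socket and a JSON parser; a WebSocket library would replace the line framing used here.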

Even with this "extra work" on this front, I believe the overall benefit for the project will be greater in the mid-to-long term.

If you look at the code, you will notice that I tried to follow some rules/requirements:

1. Hide most of the internal details from developers by creating one decorator and one basic high-level API (the RemoteComponent() class). IMO it is important to abstract this from developers, so we can change it in the future.
2. Building on the first, I would also like to use the same API even if we use different "communication methods". For instance, let's say the holy grail (one single communication method) is not possible and we decide to go with two major methods: one for remote components and another for local components. In my ideal world, the API would be the same, and that would be fine.
3. "Realtime" would be nice, a plus. Once an event is triggered, all "listeners" (including HTML pages) should receive that notification as soon as possible, without waiting for a polling interval.
4. Bi-directional communication.
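Requirement 2 (the same API regardless of the communication method) could be sketched by having components depend only on a small transport interface that each communication method implements. The Transport protocol and the DirectTransport, QueuedTransport, and Component classes are hypothetical; QueuedTransport merely stands in for a remote method with asynchronous delivery.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Protocol, Tuple

Callback = Callable[[str, Any], None]


class Transport(Protocol):
    """The only thing a component needs from a communication method."""

    def subscribe(self, topic: str, callback: Callback) -> None: ...

    def publish(self, topic: str, data: Any) -> None: ...


class DirectTransport:
    """Method 1: deliver synchronously, in-process."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._callbacks[topic].append(callback)

    def publish(self, topic: str, data: Any) -> None:
        for callback in list(self._callbacks[topic]):
            callback(topic, data)


class QueuedTransport(DirectTransport):
    """Method 2: buffer messages and deliver them on flush()."""

    def __init__(self) -> None:
        super().__init__()
        self._pending: List[Tuple[str, Any]] = []

    def publish(self, topic: str, data: Any) -> None:
        self._pending.append((topic, data))

    def flush(self) -> None:
        pending, self._pending = self._pending, []
        for topic, data in pending:
            super().publish(topic, data)


class Component:
    """Component code stays the same whichever Transport is injected."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._transport.subscribe(topic, callback)

    def publish(self, topic: str, data: Any) -> None:
        self._transport.publish(topic, data)


if __name__ == "__main__":
    for transport in (DirectTransport(), QueuedTransport()):
        seen: List[Any] = []
        Component(transport).subscribe("avocado.test.finished",
                                       lambda topic, data: seen.append(data))
        Component(transport).publish("avocado.test.finished", "PASS")
        if isinstance(transport, QueuedTransport):
            transport.flush()
        print(type(transport).__name__, seen)
```

Only the code that constructs the transport knows which method is in use, which is what would keep a future change of transport away from component authors.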

Maybe I'm being too naive, I don't know... I would just like to share this with you and collect feedback. Please join the discussion and let me know what you think. And if we decide to proceed, I can draft a Blueprint.

willianrampazzo · 2020-11-25T17:49:47Z

Maintainer

Hi @beraldoleal, thanks for the time you spent evaluating a better communication protocol for Avocado; I really would like to see a standard and modularized solution implemented.

I did a high-level review and tested your code. It looks promising to me. It hides most of the complexity from developers, and I found it easy to use.

Regarding the description you made here, I still have some questions and observations I think are worth discussing. To make replying easier, I'll quote the text and add my questions/observations as separate comments.

willianrampazzo · 2020-11-25T17:53:32Z

Maintainer

> IMO, in order to move fast and have more contributors, we need to take a step back and rethink our communication models/channels.

+1

Although I agree with you that what we have today works, the fewer communication protocols we have, the easier it is to debug and extend the code.

willianrampazzo · 2020-11-25T18:31:04Z

Maintainer

> So, looking at those cases, it seems to me that we need one major thing: a mechanism to subscribe to events. For instance: "wake me up every time a job is started/finished", regardless of whether it is a local or remote process.

> And this is the proposal I would like to make: "Improve our communication channel with a standard pub/sub mechanism". This is a very well-known messaging pattern, but here is a quote from Wikipedia:

> > In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.

I did some research on the Pub/Sub topic and found it interesting for the decoupling work we are pursuing in Avocado. Based on my experience with other communication architectures, this architecture could be a viable solution, with some open points for discussion.

Adopting a pub/sub communication protocol would make it easy to decouple the main modules, such as resolvers, runners, state machines, and output producers. This means we could have not just one module for each of these categories, but multiple ones, allowing a configurable workflow.

One point to discuss is backward communication. In the current implementation, the state machine always waits for the result of an event at each step. I understand the state machine would subscribe to the desired event to get the necessary information, but in some cases the target module cannot communicate back to the broker.

For example, I'm thinking about a test running in a virtual machine or in a container with no backward communication channel. In this case, the module that started the test would need to keep asking whether the test has finished.

I don't think backward communication is a total showstopper for the solution, but it needs to be addressed, as it directly impacts item 4 of the rules/requirements you listed.
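The difference between the two situations can be sketched with asyncio: with a back channel, the launcher awaits an event that a subscription callback sets; without one, it has to poll a status check (for example over SSH). The wait_for_finish() helper, the is_finished callable, and the interval/timeout values are hypothetical.

```python
import asyncio
from typing import Callable, Optional, Tuple


async def wait_for_finish(event: Optional[asyncio.Event] = None,
                          is_finished: Optional[Callable[[], bool]] = None,
                          poll_interval: float = 0.05,
                          timeout: float = 5.0) -> str:
    """Wait for a task to finish, preferring push notifications.

    event: set by a subscriber callback when a "finished" message arrives.
    is_finished: fallback status check used when nothing can be pushed back.
    Raises asyncio.TimeoutError if the task does not finish within `timeout`.
    """
    if event is not None:
        await asyncio.wait_for(event.wait(), timeout)
        return "push"
    if is_finished is not None:
        async def poll() -> None:
            while not is_finished():
                await asyncio.sleep(poll_interval)
        await asyncio.wait_for(poll(), timeout)
        return "poll"
    raise ValueError("need either an event or an is_finished callable")


async def demo() -> Tuple[str, str, int]:
    # Push: in a real system a subscription callback would call event.set().
    event = asyncio.Event()
    asyncio.get_running_loop().call_later(0.1, event.set)
    pushed = await wait_for_finish(event=event)

    # Poll: no back channel, so the launcher keeps asking.
    calls = 0

    def is_finished() -> bool:
        nonlocal calls
        calls += 1
        return calls >= 3  # pretend the test finishes on the third check

    polled = await wait_for_finish(is_finished=is_finished)
    return pushed, polled, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('push', 'poll', 3)
```

The push path reacts as soon as the event is set, while the poll path's latency is bounded by poll_interval and costs one status check per tick, which is the trade-off behind the "realtime" requirement.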

willianrampazzo · 2020-11-25T18:32:33Z

Maintainer

> So, the idea would be to have one decorator (@listen_to()) for getting event notifications and a method (self.publish()) for publishing events

Following pub/sub terminology, @subscribe_to() please :D

willianrampazzo · 2020-11-25T18:34:17Z

Maintainer

> The example above works as a proof of concept that I built, decoupled from Avocado. For this experiment, I'm using WebSocket under the hood because: 1) most languages have multiple libraries that handle all the low-level details, and 2) I believe we could also benefit from this protocol in our web server (avocado-server). But yes, we could choose a different transport protocol for this.

+1 for WebSockets. It is well known, supported, and documented.

willianrampazzo · 2020-11-25T19:01:07Z

Maintainer

> Yes, I do understand that today we have two strong requirements: a) having runners as standalone applications; and b) allowing contributors to write custom runners in any language.

> Regarding a), as discussed before, I still believe that we could distribute the runners better (let's say as .whl/pip or even RPM packages) in order to handle dependencies better. But even for the cases where we are going to use this API, WebSockets are pretty common nowadays, and having a client subscribed to a topic is not that complicated. There are libraries in multiple languages to support that. So, to accomplish b), we could have some "hello-world" runner examples in our "Contributor's Guide" for a couple of languages.

I don't see a problem here; I see a positive point. Today it is not possible to write a resolver totally decoupled from Avocado and Python. With a Pub/Sub mechanism, it would be possible. The same goes for other components of Avocado.

willianrampazzo · 2020-11-25T19:05:45Z

Maintainer

> Even with this "extra work" on this front, I believe the overall benefit for the project will be greater in the mid-to-long term.

> If you look at the code, you will notice that I tried to follow some rules/requirements:

> 1. Hide most of the internal details from developers by creating one decorator and one basic high-level API (the RemoteComponent() class). IMO it is important to abstract this from developers, so we can change it in the future.

+1 here. It makes development easier.

> 2. Building on the first, I would also like to use the same API even if we use different "communication methods". For instance, let's say the holy grail (one single communication method) is not possible and we decide to go with two major methods: one for remote components and another for local components. In my ideal world, the API would be the same, and that would be fine.

+1 and I don't see a problem. Standardization is key for flexibility and ease of maintenance.

> 3. "Realtime" would be nice, a plus. Once an event is triggered, all "listeners" (including HTML pages) should receive that notification as soon as possible, without waiting for a polling interval.

Agreed.

> 4. Bi-directional communication.

This point needs a deeper discussion. We need to find a way to accomplish it with the requirements we have today or remove the rule.

clebergnu · 2020-11-25T19:25:24Z

Maintainer

> The Avocado project has been adapting its architecture to solve distinct issues and to support the needs of new users/projects. New components have been introduced and, probably, new ones will arrive. The new runner (nrunner) is being designed with a more decoupled architecture, exchanging more and more messages with these components.

> I could identify a few methods of communication inside the Avocado project:

> 1. For plugins, we have an internal "Dispatcher" implementation where specific methods are 'hooked' based on the plugin type, and specific arguments are passed to those methods;

> 2. When triggering runners in a Podman container or in a process, we serialize/deserialize the necessary data via command-line arguments;

> 3. To make item 2 less painful, we have a "recipe file", which is a JSON file, and runners can also execute from those files;

> 4. Since runners execute in a decoupled way, Tasks report their status by posting progress to a status_server using asyncio streams;

> 5. avocado-server follows an HTTP REST API model;

> 6. Some internal components rely on multiple nested "yields" to get the results of some calls.

> Did I forget any method?

There's also the old plugin mechanism in avocado.core.loaders, but let's disregard it. Nice analysis here, btw.

> Of course, each method makes sense, and I'm sure there are strong arguments for using most of them. I also understand that those components were introduced at different moments and possibly by different contributors. And most importantly: they are working fine.

> But debugging here is not trivial, we are moving towards a more decoupled direction with more components, and delivering new features quickly is a strong requirement.

> IMO, in order to move fast and have more contributors, we need to take a step back and rethink our communication models/channels.

> So, looking at those cases, it seems to me that we need one major thing: a mechanism to subscribe to events. For instance: "wake me up every time a job is started/finished", regardless of whether it is a local or remote process.

> And this is the proposal I would like to make: "Improve our communication channel with a standard pub/sub mechanism". This is a very well-known messaging pattern, but here is a quote from Wikipedia:

> > In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.

> So, the idea would be to have one decorator (@listen_to()) for getting event notifications and a method (self.publish()) for publishing events, like this:

> import asyncio
>
> from avocado.core.components import RemoteComponent
> from avocado.core.helpers import listen_to
>
>
> class MyComponent(RemoteComponent):
>
>     @listen_to('avocado.foo')
>     def handle_foo(self, data):
>         print(data)
>         self.publish('avocado.foo.finished', "Data received")

In this snippet, import asyncio is not relevant, right? (just checking)

> The example above works as a proof of concept that I built, decoupled from Avocado. For this experiment, I'm using WebSocket under the hood because: 1) most languages have multiple libraries that handle all the low-level details, and 2) I believe we could also benefit from this protocol in our web server (avocado-server). But yes, we could choose a different transport protocol for this.

> You can find the code here:

> https://github.com/beraldoleal/avocado-pubsub

> (Please keep in mind that the code is just an experiment and it is not complete!)

> Yes, I do understand that today we have two strong requirements: a) having runners as standalone applications; and b) allowing contributors to write custom runners in any language.

> Regarding a), as discussed before, I still believe that we could distribute the runners better (let's say as .whl/pip or even RPM packages) in order to handle dependencies better. But even for the cases where we are going to use this API, WebSockets are pretty common nowadays, and having a client subscribed to a topic is not that complicated. There are libraries in multiple languages to support that. So, to accomplish b), we could have some "hello-world" runner examples in our "Contributor's Guide" for a couple of languages.

Regarding a), I think a high-priority task is to attempt to reuse the know-how that Ansible has accumulated with its AnsiballZ generation, in which it bundles the requirements to run a module into a single Python file. I've never heard of that mechanism failing, so it inspires a lot of trust in me. Going the wheel/RPM route is something other systems, such as Beaker, have done for tests, and at this point I believe it would be overkill. Also, one requirement I've heard from QE folks is that they don't want their systems spoiled when running tests, that is, the test runner itself should not be installing new code. Copying a single file to /tmp/ is acceptable, though.
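For a rough idea of single-file bundling in pure Python, the standard library's zipapp module can pack a directory of pure-Python code (a runner plus its helpers) into one archive that is copied to /tmp/ and run with the system interpreter, without installing anything. This is a swapped-in, analogous mechanism, not how AnsiballZ is implemented; the runner and helper contents below are made up, and dependencies with compiled extensions cannot be imported from a zip this way.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_single_file_runner(workdir: Path) -> Path:
    """Bundle a (hypothetical) runner and a helper module into one file."""
    src = workdir / "runner_src"
    src.mkdir()
    # A helper "dependency" that would otherwise need to be installed.
    (src / "helper.py").write_text("def greet():\n    return 'hello from the bundle'\n")
    # Entry point executed when the archive itself is run.
    (src / "__main__.py").write_text("import helper\nprint(helper.greet())\n")
    target = workdir / "runner.pyz"
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_single_file_runner(Path(tmp))
        # Only this single file would need to be copied to the target machine.
        result = subprocess.run([sys.executable, str(archive)],
                                capture_output=True, text=True, check=True)
        print(result.stdout.strip())
```

The archive's own path ends up first on sys.path when it runs, which is why `import helper` resolves without touching the target system.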

WRT b), yes, that has always been the idea: using common protocols available, if possible, in the standard library of most/all languages, and providing examples or even reference implementations.

> Even with this "extra work" on this front, I believe the overall benefit for the project will be greater in the mid-to-long term.

> If you look at the code, you will notice that I tried to follow some rules/requirements:

> 1. Hide most of the internal details from developers by creating one decorator and one basic high-level API (the RemoteComponent() class). IMO it is important to abstract this from developers, so we can change it in the future.

> 2. Building on the first, I would also like to use the same API even if we use different "communication methods". For instance, let's say the holy grail (one single communication method) is not possible and we decide to go with two major methods: one for remote components and another for local components. In my ideal world, the API would be the same, and that would be fine.

> 3. "Realtime" would be nice, a plus. Once an event is triggered, all "listeners" (including HTML pages) should receive that notification as soon as possible, without waiting for a polling interval.

I understand that realtime would be a consequence of a sound architecture (like the one you're proposing). And to really have effective UIs for distributed tests, I see no other way than something along the lines of what you're proposing here.

Just for comparison purposes, autotest would ssh into a machine, run autotest-client $arguments, and parse the generated output, updating a database. The UI would then feed from the database. Clearly not the architecture we want.

> 4. Bi-directional communication.

> Maybe I'm being too naive, I don't know... I would just like to share this with you and collect feedback. Please join the discussion and let me know what you think. And if we decide to proceed, I can draft a Blueprint.

I think this is a proposal worth investigating further. But we need to do a better cost/benefit analysis and pick the battles we have the best chance of winning with the fewest troops. For instance, the packaging/deployment of runners is something that is currently limiting our progress, and it would make contributors run away from writing runners that have extra dependencies.

Also, I'm a bit skeptical of this pattern being a good fit to replace the plugins, but this is just intended to give us extra incentive to prove my skepticism unfounded 😄

beraldoleal · 2020-11-26T11:04:27Z

Maintainer Author

> One point to discuss is backward communication. In the current implementation, the state machine always waits for the result of an event at each step. I understand the state machine would subscribe to the desired event to get the necessary information, but in some cases the target module cannot communicate back to the broker.

> For example, I'm thinking about a test running in a virtual machine or in a container with no backward communication channel. In this case, the module that started the test would need to keep asking whether the test has finished.

Yes, this is an important point, agreed. Today we already do "backward" communication using SSH, so we are assuming that our VMs/containers are reachable somehow. In a perfect world, I would like to see this as a requirement for a spawner. But we certainly need to discuss this further.

beraldoleal · 2020-11-26T11:07:00Z

Maintainer Author

> > 4. Bi-directional communication.

> This point needs a deeper discussion. We need to find a way to accomplish it with the requirements we have today or remove the rule.

Agreed, @willianrampazzo. Like I said before, in a "perfect world" this could be a spawner requirement. But we certainly need to discuss this.

beraldoleal · 2020-11-26T11:12:58Z

Maintainer Author

Thanks, @clebergnu, for your comments. IIUC, you are OK with further investigation and with continuing down this road.

Yes, I agree with you: we could start a separate front (in parallel) thinking about how to properly distribute our runners. But I'm taking your comments as a positive welcome.

I will wait for a few more comments and start a blueprint if you agree with it.
