Re-thinking our communication channels. What do you think of a pub/sub approach? #4316
Replies: 11 comments
-
Hi @beraldoleal, thanks for the time you spent evaluating a better communication protocol for Avocado; I really would like to see a standard and modularized solution implemented. I did a high-level review and tested your code. It looks promising to me. It hides most of the complexity from developers, and I found it easy to use. Regarding the description you made here, I still have some questions or observations I think are worth discussing. To make it easy to reply, I'll quote the text and my questions/observations in other comments.
-
+1 Although I agree with you that what we have today works, the fewer communication protocols we have, the easier it is to debug and extend the code.
-
I did some research on the Pub/Sub topic and found it interesting for the decoupling task we are pursuing on Avocado. Based on my experience with other communication architectures, this architecture could be a solution to be used, with some open points for discussion. Adopting a pub/sub communication protocol would make it easy to decouple the main modules, like resolvers, runners, state machines, and output producers. This means we could have not just one module for each category I mentioned, but multiple, allowing a configurable workflow. One point to discuss is backward communication. In the current implementation, the state machine always waits for the return of an event in each step. I understand the state machine would subscribe to the desired event to have the necessary information, but the target module cannot communicate back to the broker in some cases. For example, I'm thinking about a test running on a virtual machine or in a container, where there is no communication backward. In this case, the module that started the test would need to keep polling to find out whether the test has finished. I don't think backward communication is a total show stopper for the solution, but it needs to be addressed as it directly impacts item 4 of the rules/requirements you listed.
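The polling fallback described above could be sketched roughly as follows. This is only an illustration under assumptions: `poll_until_finished`, the `is_finished` and `publish` callables, and the topic names `test.finished` and `test.timeout` are all hypothetical and do not come from Avocado or from the proof-of-concept.

```python
import time


def poll_until_finished(is_finished, publish, interval=0.5, timeout=30.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll a target that cannot reach the broker and publish on its behalf.

    `is_finished` is a hypothetical callable that asks the VM/container
    whether the test is done; `publish` is a hypothetical callable taking
    (topic, payload). Both are assumptions made for this sketch.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if is_finished():
            # The module that started the test emits the event itself,
            # so subscribers never need to know that polling happened.
            publish("test.finished", {"source": "poller"})
            return True
        sleep(interval)
    # Nothing came back in time: report that too, so listeners can react.
    publish("test.timeout", {"source": "poller"})
    return False
```

With something like this, only the module that started the test pays the polling cost, while every other subscriber still sees a regular event.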
-
Following pub/sub theory, |
-
+1 for WebSockets. It is well known, supported, and documented.
-
I don't see a problem here. I see a positive point. Today it is not possible to write a resolver totally decoupled from Avocado and Python. With a Pub/Sub mechanism, it would be possible. Same for other components of Avocado.
-
+1 here. It makes it easy to develop.
+1 and I don't see a problem. Standardization is key for flexibility and ease of maintenance.
Agreed.
This point needs a deeper discussion. We need to find a way to accomplish it with the requirements we have today or remove the rule.
-
There's the old plugin mechanism in
In this snippet,
Regarding WRT
I understand that realtime would be a consequence of a sound architecture (like you're proposing). And to really have effective UIs for distributed tests, I see no other way than something along the lines of what you're proposing here. Just for comparison purposes, autotest would ssh into a machine, run the
I think this is a worthy proposal to investigate further. But we need to do a better cost/benefit analysis and pick the battles we have the most chance of winning with the least amount of troops. So, for instance, the packaging/deployment of runners is something that is currently limiting our progress, and that would make contributors run away from writing their own runners that have extra dependencies. Also, I'm a bit skeptical of this pattern fitting well to replace the plugins, but this is just intended to give extra incentive for us to prove my skepticism unfounded 😄
-
Yes, this is an important point, agreed. Today we already do "backward" communication using SSH, so we are assuming that our VMs/containers are reachable somehow. In a perfect world, I would like to see this as a requirement for a spawner. But for sure, we need to discuss this better.
-
Agreed, @willianrampazzo. Like I said before, in a "perfect world" this could be a spawner requirement. But for sure we need to discuss this.
-
Thanks, @clebergnu, for your comments. IIUC, you are OK with further investigation and with continuing down this road. Yes, I agree with you, we could start one front (in parallel) thinking about how to properly distribute our runners. I'm perceiving your comments as a positive welcome. I will wait for a few more comments and start a blueprint if you agree with it.
-
The Avocado project has been adapting its architecture to solve distinct issues and also to support the needs of new users/projects. New components have been introduced and, probably, new ones will arrive. The new runner (nrunner) is being designed to have a more decoupled architecture, exchanging more and more messages with these components.
I could identify a few methods of communication inside the Avocado project:
1. For plugins, we have an internal "Dispatcher" implementation where specific methods are 'hooked' based on the plugin type, passing some specific args to those methods;
2. When triggering "runners" on podman or as a process, we serialize/deserialize the necessary data via command-line arguments;
3. To make item 2 less painful, we have a "recipe file", which is a JSON file, and runners can execute from those files as well;
4. Since runners execute in a "decoupled way", when reporting status, Tasks post progress to a status_server, using "asyncio streams";
5. avocado-server runs on an HTTP REST API model;
6. Some internal components rely on multiple nested "yields" to get the results of some calls.
Did I forget any method?
Of course, each method makes sense, and I'm sure that there are some strong arguments for using most of them. Also, I do understand that those components were introduced at different times and possibly by different contributors. And most important: they are working fine.
But since debugging here is not so trivial, and we are moving towards a more decoupled direction with more components, having new features delivered in record time is a strong requirement.
IMO, in order to move fast and attract more contributors, we need to take a step back and re-think our communication models/channels.
So, by looking at those cases, it seems to me that we need one major thing: a mechanism to subscribe to some events. Let's say: "wake me up every time a job is finished/started", for instance, regardless of whether it is a local or remote process.
And this is the proposal that I would like to make: "Improve our communication channel with a standard pub/sub mechanism". This is a very well-known messaging pattern, but here is a quote from Wikipedia:
So, the idea would be to have one decorator (`@listen_to()`) for getting event notifications and a method (`self.publish()`) helping to publish events, like this:
The example above is working as a proof-of-concept that I did, decoupled from Avocado. For this experiment, I'm using WebSocket under the hood, because: 1) Most languages have multiple libraries to handle all the low-level details, and 2) I believe that we could benefit from this protocol on our web server (avocado-server). But yes, we could choose a different transport protocol for this.
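To make the decorator-plus-publish idea concrete, here is a minimal in-process sketch. It is not the code from the proof-of-concept: the `Broker` and `Component` classes, the `_topic` attribute, and the `job.finished` topic name are assumptions made for illustration, and a plain in-memory dispatch stands in for the WebSocket transport.

```python
from collections import defaultdict


def listen_to(topic):
    """Tag a method as the handler for `topic` (hypothetical decorator)."""
    def decorator(func):
        func._topic = topic  # read later, when the component registers itself
        return func
    return decorator


class Broker:
    """In-memory stand-in for the real transport (assumption)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._handlers[topic]):
            handler(payload)


class Component:
    """Base class that subscribes every decorated method on construction."""

    def __init__(self, broker):
        self._broker = broker
        for name in dir(self):
            method = getattr(self, name)
            topic = getattr(method, "_topic", None)
            if topic is not None:
                broker.subscribe(topic, method)

    def publish(self, topic, payload):
        self._broker.publish(topic, payload)


class Runner(Component):
    def run(self, job_id):
        # A real runner would execute the job before announcing the result.
        self.publish("job.finished", {"job": job_id, "result": "PASS"})


class Reporter(Component):
    def __init__(self, broker):
        super().__init__(broker)
        self.seen = []

    @listen_to("job.finished")
    def on_job_finished(self, event):
        self.seen.append(event)
```

Creating a `Reporter` and a `Runner` on the same `Broker` and calling `run()` on the runner delivers the event to `on_job_finished` without either class referencing the other.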
You can find the code here:
https://github.com/beraldoleal/avocado-pubsub
(Please, keep in mind that the code is just an experiment and it is not complete!)
Yes, I do understand that we have two strong requirements today: a) having runners as standalone applications; and b) allowing contributors to write custom runners in any language.
Regarding "a", as discussed before, I still believe that we could better distribute the runners (let's say with a .whl, pip, or even rpm packages) in order to handle the dependencies better. But even for the cases where we are going to use this API, WebSockets are commonplace nowadays, and having a client subscribed to a topic is not so complicated. We have a few libraries in multiple languages to support that. So, to accomplish "b" we could have some "hello-world" runner examples in our "Contributor's Guide" for a couple of languages.
Even with this "extra work" on this front, I believe the overall benefit will be better for the project in the mid-long term.
If you look at the code, you will notice that I tried to keep some rules/requirements:
1. Hide most of the internal details from the developers, by creating one decorator and one basic high-level API (`RemoteComponent()` class); IMO it is important to abstract this from developers, so we could change this in the future.
2. By doing the first, I also would like to use the same API even if we are using different "communication methods". So for instance, let's say the holy grail (one single communication method) is not possible and we decide to go with two major methods: one for remote components, and another one for local components. In my ideal world, the API should be the same, and this is fine.
3. "Realtime" would be nice, a plus. Once an event is triggered, all "listeners" (including HTML pages) should be able to receive that notification as soon as possible, without waiting for a polling interval.
4. Bi-directional communication.
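The second rule, one API over different communication methods, could look roughly like the sketch below. Everything here is hypothetical: `Transport`, `LocalTransport`, `JsonLoopbackTransport`, and `Component` are illustrative names, with `Component` only playing the role that the `RemoteComponent()` class has in the proof-of-concept, and the JSON round trip merely imitating what a network link would do to a message.

```python
import json
from abc import ABC, abstractmethod


class Transport(ABC):
    """Minimal contract each communication method would satisfy (assumption)."""

    def __init__(self):
        self._receivers = []

    def on_message(self, receiver):
        """Register a callable invoked as receiver(topic, payload)."""
        self._receivers.append(receiver)

    def _deliver(self, topic, payload):
        for receiver in list(self._receivers):
            receiver(topic, payload)

    @abstractmethod
    def send(self, topic, payload):
        """Hand a message to the underlying channel."""


class LocalTransport(Transport):
    """Local components: deliver by direct call inside one process."""

    def send(self, topic, payload):
        self._deliver(topic, payload)


class JsonLoopbackTransport(Transport):
    """Stand-in for a remote channel: serialize, then deserialize."""

    def send(self, topic, payload):
        wire = json.dumps({"topic": topic, "payload": payload})
        message = json.loads(wire)
        self._deliver(message["topic"], message["payload"])


class Component:
    """High-level API that stays identical whichever transport is used."""

    def __init__(self, transport):
        self._transport = transport
        self._handlers = {}
        transport.on_message(self._dispatch)

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        self._transport.send(topic, payload)

    def _dispatch(self, topic, payload):
        for handler in self._handlers.get(topic, []):
            handler(payload)
```

Code written against `Component` calls only `subscribe()` and `publish()`, so swapping `LocalTransport` for a networked transport would not touch it. Because both directions go through the same object, this shape also leaves room for the bi-directional requirement.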
Maybe I'm being too naive, I don't know... I just would like to share this with you and collect feedback. Please join the discussion and let me know what you think about this. And if we decide to proceed, I can draft a Blueprint.