Problem with reaching an actor (from the outside) after a while #64
Hi @rifoerster, It would be great if you were able to reproduce this with a simple Actor or set of Actors that you could then attach to this issue for me to reproduce your issue locally. You mention "after a while" but don't give an actual timeframe; I can say that I've used Actors in production that have run without problems for months at a time, so the behavior you are describing is not the expected behavior. You can also look at the internal thesplog logging file that Thespian maintains (https://github.com/thespianpy/Thespian/blob/master/thespian/system/utilis.py#L77-L80), although you will probably want to change the maximum size of that file and the base logging severity (https://github.com/thespianpy/Thespian/blob/master/thespian/system/utilis.py#L25-L28).
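The thesplog settings mentioned above can be adjusted without editing the source. A sketch, assuming the `THESPLOG_*` environment variables that Thespian's `utilis.py` consults (set them before the actor system starts; the exact default file location and size may differ by version):

```shell
# Assumed Thespian logging knobs; verify names against your installed version.
export THESPLOG_FILE="$HOME/thespian.log"        # where thesplog writes
export THESPLOG_FILE_MAXSIZE=$((1024 * 1024))    # grow the cap to ~1 MB
export THESPLOG_THRESHOLD=Debug                  # lower the base severity
```

With the threshold at `Debug`, transport-level events (connection setup, retries, drops) should appear in the log file, which is usually the fastest way to see why a delivery silently stalled.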
Hi Kevin, I'm a colleague of Riccardo and have spent a lot of time hunting this bug. Meanwhile I have reorganized our complete ActorSystem and introduced a Registrar Actor that keeps track of all created Actors and can send regular messages to them. Here are some of my findings:
For now I will stay with multiprocUDPBase, which seems to be a reliable workaround, but any hint that brings some light into this mystery will be very welcome. Regards,
The TCP protocol is a streaming protocol, so in order for Thespian to reconstruct a message from the byte stream and confirm that reception, there is some bidirectional communication over the TCP transport between the sending actor and the receiving actor. In contrast, the UDP protocol is message oriented with no confirmation of delivery (thus earning it the "unreliable" label, since there's no way to know if the packet was received). I cannot tell where you are running the different actors in your architecture, or what the actors are doing, without some sample code that reproduces the problem, but you might want to verify that bidirectional traffic is fully functional and promptly handled in your network. It may also be that you have some sort of router or gateway device that is closing or otherwise dropping long-running inactive socket connections.
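The bidirectional-traffic check suggested above can be sanity-tested with a plain loopback echo, independent of Thespian. This only exercises the local stack, not any router or gateway on the path between real hosts, so it rules local problems in or out rather than proving the network healthy:

```python
# Minimal loopback check that a TCP round trip (send + reply) completes.
import socket
import threading

def echo_once(server: socket.socket) -> None:
    # Accept one connection and echo back whatever arrives.
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_once, args=(server,)).start()

with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
    c.sendall(b"ping")
    reply = c.recv(1024)
server.close()
print(reply)  # → b'ping'
```

If this hangs or times out between two real machines, the transport problem lies below Thespian.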
All actors are running on the same computer, a Raspberry Pi 4.
This issue has been bothering me for months. Several weeks ago I introduced threads in some of my Actors in order to improve their responsiveness, and that was when I first experienced this issue in a reproducible way. It seems to appear when we send an Actor message from inside a thread that is running within an Actor. The issue was reproducible on Linux as well as on Windows. Meanwhile I have been able to create a kind of minimal example based on your hellogoodbye.py. The example runs with multiprocTCPBase but not with multiprocUDPBase.
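Since sending actor messages from a worker thread is the suspected trigger, one commonly used alternative is to keep all sends on the actor's own thread: the worker only writes results to a thread-safe queue, and the actor drains that queue from its own receive loop (e.g. on each wakeup). A pure-stdlib sketch of that hand-off, with all names illustrative rather than Thespian API:

```python
# Sketch: worker threads never send actor messages themselves; they only
# enqueue results, which the actor's own thread drains and forwards.
import queue
import threading

outbox: "queue.Queue[str]" = queue.Queue()

def worker() -> None:
    # The worker touches nothing but the queue.
    outbox.put("result-from-thread")

def drain_outbox() -> list:
    # In a real actor this would run inside receiveMessage (e.g. on a
    # wakeup), calling self.send() for each item from the actor's thread.
    items = []
    while True:
        try:
            items.append(outbox.get_nowait())
        except queue.Empty:
            return items

t = threading.Thread(target=worker)
t.start()
t.join()
print(drain_outbox())  # → ['result-from-thread']
```

This keeps the transport's internal state single-threaded, which is worth trying as a diagnostic even if it turns out not to be the root cause here.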
I have a problem with reaching certain actors after a while. The ActorSystem sends the message, but the receive function is never called. There is no dead letter either (dead letter handling doesn't trigger).
For debugging purposes there is a wakeupAfter loop, in which the actor basically sends itself a message every minute, and this loop continues to work. But after a certain, still unknown, period of inactivity (apart from the wake-up loop) the actor just ceases to receive any other messages.
Is there any way to debug this further? What information can I provide to pin down the situation, and how do I get it?
Thespian version: 3.10.5
Python: 3.9
Pipenv: 2021.11.23