Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors at Irregular Intervals #18

Open
sgrallert opened this issue Aug 26, 2022 · 6 comments
Open

Errors at Irregular Intervals #18

sgrallert opened this issue Aug 26, 2022 · 6 comments
Assignees

Comments

@sgrallert
Copy link

Hello everyone,
first of all thank you for this great library. After wmic stopped working, I searched a long time for a good replacement.

I use the wmic_server to monitor some servers. But now I get the following error in irregular intervals:

Aug 25 16:26:14 ##### nice[675]: [2022-08-25 16:26:14 +0200] [675] [CRITICAL] WORKER TIMEOUT (pid:122590)
Aug 25 16:26:14 ##### nice[122590]: ERROR:asyncio:Fatal error: protocol.data_received() call failed.
Aug 25 16:26:14 ##### nice[122590]: protocol: <aiowmi.protocol.Protocol object at 0x7fb800628a30>
Aug 25 16:26:14 ##### nice[122590]: transport: <_SelectorSocketTransport fd=14 read=polling write=<idle, bufsize=0>>
Aug 25 16:26:14 ##### nice[122590]: Traceback (most recent call last):
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
Aug 25 16:26:14 ##### nice[122590]:     return loop.run_until_complete(main)
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
Aug 25 16:26:14 ##### nice[122590]:     self.run_forever()
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
Aug 25 16:26:14 ##### nice[122590]:     self._run_once()
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
Aug 25 16:26:14 ##### nice[122590]:     event_list = self._selector.select(timeout)
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/selectors.py", line 468, in select
Aug 25 16:26:14 ##### nice[122590]:     fd_event_list = self._selector.poll(timeout, max_ev)
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/local/lib/python3.8/dist-packages/gunicorn/workers/base.py", line 203, in handle_abort
Aug 25 16:26:14 ##### nice[122590]:     sys.exit(1)
Aug 25 16:26:14 ##### nice[122590]: SystemExit: 1
Aug 25 16:26:14 ##### nice[122590]: During handling of the above exception, another exception occurred:
Aug 25 16:26:14 ##### nice[122590]: Traceback (most recent call last):
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/lib/python3.8/asyncio/selector_events.py", line 862, in _read_ready__data_received
Aug 25 16:26:14 ##### nice[122590]:     self._protocol.data_received(data)
Aug 25 16:26:14 ##### nice[122590]:   File "/usr/local/lib/python3.8/dist-packages/aiowmi/protocol.py", line 86, in data_received
Aug 25 16:26:14 ##### nice[122590]:     req, data, = self._requests[self._buf.call_id], self._buf.data
Aug 25 16:26:14 ##### nice[122590]: KeyError: 7
Aug 25 16:26:15 ##### nice[675]: [2022-08-25 16:26:15 +0200] [675] [WARNING] Worker with pid 122590 was terminated due to signal 9
Aug 25 16:26:15 ##### nice[133453]: [2022-08-25 16:26:15 +0200] [133453] [INFO] Booting worker with pid: 133453

The servers are read out every 5 minutes with several querys.
What could be the reason for the error?

@eduni
Copy link

eduni commented Sep 1, 2022

Hi everyone,

i have a similar issue. When running wmi queries one by one everything seems fine. After i fire up naemon for regular checks (limited to 6 parallel) with check_wmi_plus (1.68) the server crashes regularly:

Sep 01 15:47:00 ##### nice[25964]: [2022-09-01 15:47:00 +0200] [25964] [CRITICAL] WORKER TIMEOUT (pid:777)
Sep 01 15:47:00 ##### nice[25964]: ERROR:asyncio:Fatal error: protocol.data_received() call failed.
Sep 01 15:47:00 ##### nice[25964]: protocol: <aiowmi.protocol.Protocol object at 0x7f0f7886edf0>
Sep 01 15:47:00 ##### nice[25964]: transport: <_SelectorSocketTransport fd=14 read=polling write=<idle, bufsize=0>>
Sep 01 15:47:00 ##### nice[25964]: Traceback (most recent call last):
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
Sep 01 15:47:00 ##### nice[25964]:     return loop.run_until_complete(main)
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 595, in run_until_complete
Sep 01 15:47:00 ##### nice[25964]:     self.run_forever()
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 563, in run_forever
Sep 01 15:47:00 ##### nice[25964]:     self._run_once()
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 1808, in _run_once
Sep 01 15:47:00 ##### nice[25964]:     event_list = self._selector.select(timeout)
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/selectors.py", line 468, in select
Sep 01 15:47:00 ##### nice[25964]:     fd_event_list = self._selector.poll(timeout, max_ev)
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3/dist-packages/gunicorn/workers/base.py", line 192, in handle_abort
Sep 01 15:47:00 ##### nice[25964]: SystemExit: 1
Sep 01 15:47:00 ##### nice[25964]: During handling of the above exception, another exception occurred:
Sep 01 15:47:00 ##### nice[25964]: Traceback (most recent call last):
Sep 01 15:47:00 ##### nice[25964]:   File "/usr/lib/python3.8/asyncio/selector_events.py", line 850, in _read_ready__data_received
Sep 01 15:47:00 ##### nice[25964]:     self._protocol.data_received(data)
Sep 01 15:47:00 ##### nice[25964]:   File "/etc/aiowmi/aiowmi/protocol.py", line 88, in data_received
Sep 01 15:47:00 ##### nice[25964]:     req, data, = self._requests[self._buf.call_id], self._buf.data
Sep 01 15:47:00 ##### nice[25964]: KeyError: 222

@joente
Copy link
Member

joente commented Sep 5, 2022

This could be a result of the connection which is not being closed when it should. In the latest version, 0.2.2 we fixed closing the connection in the finally block. Is it possible to test if this solves the problem? And if not, can you explain which steps to take to reproduce this issue?

@sgrallert
Copy link
Author

Thanks for the fix. I will apply it to my system and check if the error still occurs. However, this may take a few days, as the error sometimes appeared only after a few days.

To recreate the error you basically just need to set up the wmic_server and query multiple servers.
For example, query the classes win32_product, win32_service, win32_logicaldisk and win32_computersystem ("select * from win32_...") every x minutes on x servers. The error sometimes occurred several times a day, sometimes only every few days.

@eduni
Copy link

eduni commented Sep 6, 2022

Hi @joente and thank you for the quick fix. unfortunately the issue is still present in my enviroment:

Sep 06 15:05:10 ##### nice[6999]: [2022-09-06 15:05:10 +0200] [6999] [CRITICAL] WORKER TIMEOUT (pid:7016)
Sep 06 15:05:10 ##### nice[6999]: ERROR:asyncio:Fatal error: protocol.data_received() call failed.
Sep 06 15:05:10 ##### nice[6999]: protocol: <aiowmi.protocol.Protocol object at 0x7f89b60f9670>
Sep 06 15:05:10 ##### nice[6999]: transport: <_SelectorSocketTransport fd=14 read=polling write=<idle, bufsize=0>>
Sep 06 15:05:10 ##### nice[6999]: Traceback (most recent call last):
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
Sep 06 15:05:10 ##### nice[6999]:     return loop.run_until_complete(main)
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 595, in run_until_complete
Sep 06 15:05:10 ##### nice[6999]:     self.run_forever()
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 563, in run_forever
Sep 06 15:05:10 ##### nice[6999]:     self._run_once()
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/asyncio/base_events.py", line 1808, in _run_once
Sep 06 15:05:10 ##### nice[6999]:     event_list = self._selector.select(timeout)
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/selectors.py", line 468, in select
Sep 06 15:05:10 ##### nice[6999]:     fd_event_list = self._selector.poll(timeout, max_ev)
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/local/lib/python3.8/dist-packages/gunicorn/workers/base.py", line 203, in handle_abort
Sep 06 15:05:10 ##### nice[6999]:     sys.exit(1)
Sep 06 15:05:10 ##### nice[6999]: SystemExit: 1
Sep 06 15:05:10 ##### nice[6999]: During handling of the above exception, another exception occurred:
Sep 06 15:05:10 ##### nice[6999]: Traceback (most recent call last):
Sep 06 15:05:10 ##### nice[6999]:   File "/usr/lib/python3.8/asyncio/selector_events.py", line 850, in _read_ready__data_received
Sep 06 15:05:10 ##### nice[6999]:     self._protocol.data_received(data)
Sep 06 15:05:10 ##### nice[6999]:   File "/etc/aiowmi/aiowmi/protocol.py", line 88, in data_received
Sep 06 15:05:10 ##### nice[6999]:     req, data, = self._requests[self._buf.call_id], self._buf.data
Sep 06 15:05:10 ##### nice[6999]: KeyError: 149
Sep 06 15:05:11 ##### nice[6999]: [2022-09-06 15:05:11 +0200] [6999] [WARNING] Worker with pid 7016 was terminated due to signal 9
Sep 06 15:05:11 ##### nice[6999]: [2022-09-06 15:05:11 +0200] [31914] [INFO] Booting worker with pid: 31914

the current setup is:
ubuntu 18.4
python 3.8
aiowmi 0.2.2 on gunicorn (version 20.1.0)
check_wmi_plus 1.68
omd labs edition 4.60 with naemon core

so after i have startet the wmic_server service i can run multiple check_wmi_plus checks without any issues. Everything is working like a charm. The issue only occures only after i start the omd site which has multiple parallel check_wmi_plus checks.
the wmic_server service becomes unresponsive and is terminated within the first 30sek.

i can try to reduce the number of checks by moving out some naemon cfg files to reduce the number of checks if it helps.

@sgrallert
Copy link
Author

The error seems to still occur for me as well.

@yjfguhd
Copy link

yjfguhd commented Nov 22, 2022

Meanwhile, it can be run as a SystemD service without using gunicorn by using these lines in wmic_server.service:

ExecStart=/bin/sh -c 'PYTHONPATH=/opt/aiowmi python3 /opt/aiowmi/contrib/wmic_server/wmic_server.py'
ExecStop=/usr/bin/pkill -f 'python3 .*wmic_server\.py'

Alter to suit, of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants