Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection timeout is not set to client._session_timeout / 1000 / len(client.hosts) in all places #756

Open
andanicolae opened this issue Jul 9, 2024 · 2 comments

Comments

@andanicolae
Copy link

andanicolae commented Jul 9, 2024

Expected Behavior

Client should wait for maximum client._session_timeout / 1000 / len(client.hosts) seconds before moving to next zookeeper server.

Actual Behavior

We are using kazoo to connect each of our clients to one of the 3 zookeeper servers that form a cluster. We set timeout=120s when we instantiate kazoo.client.KazooClient class. Therefore, client._session_timeout = 120000.

Below is the code from kazoo that sets client._session_timeout:
self._session_timeout = int(timeout * 1000)

Our purpose is to have negotiated session timeout=120s and connection_timeout=120s/3=40s. In some places of kazoo code it is timeout=client._session_timeout / 1000.0, while in some other places it is timeout=client._session_timeout / 1000.0 / len(client.hosts).

In all our zookeeper servers, in /conf/zoo.cfg, we have set maxSessionTimeout=120s.
After negotiation, indeed, negotiated_session_timeout is 120s.

We add an iptables rule and we isolate zookeeper leader. The remaining 2 zookeeper servers close the sessions with all the clients and perform leader election. From what I have read, this is the behavior of zookeeper servers when the leader goes down.

kazoo tries to connect to next zookeeper server from the local server list (we shuffle zookeeper server list). Sometimes, kazoo will try to connect to the isolated zookeeper server.
When kazoo tries to connect to any zookeeper server, this code will be called:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port),
timeout=client._session_timeout / 1000.0,
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)

As we can see, timeout is 120s. kazoo waits for 120s and then moves to another zookeeper server. Our expectation is to wait for 40s.

I have added additional logs in protocol/connection.py, which start with 'anda:'
Here I have used dummy values as session ids/passwords.

Logs with logging in DEBUG mode

07/08/2024 03:12:51 AM Using session_id: XXXX session_passwd: YYYY
07/08/2024 03:12:51 AM anda: before create_connection
07/08/2024 03:14:51 AM Connection dropped: socket connection error: None

07/08/2024 03:14:51 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/08/2024 03:14:51 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/08/2024 03:14:51 AM Connecting to :port, use_ssl: False
07/08/2024 03:14:51 AM Using session_id: 0123 session_passwd: VVVV
07/08/2024 03:14:51 AM anda: before create_connection
07/08/2024 03:14:51 AM anda: after create_connection
07/08/2024 03:14:51 AM anda: after Connect
07/08/2024 03:14:51 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=1234, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=None)
07/08/2024 03:14:51 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=False)
07/08/2024 03:14:51 AM anda: after _invoke
07/08/2024 03:14:51 AM Session created, session_id: ZZZZ session_passwd: AAAA
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0

Logs when I divide timeout value to len(client.hosts). Below is the modified code:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port),
timeout=client._session_timeout / 1000.0 / len(client.hosts),
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)
07/09/2024 04:10:00 AM Connection dropped: socket connection broken
07/09/2024 04:10:00 AM Transition to CONNECTING
07/09/2024 04:10:00 AM Zookeeper connection lost
07/09/2024 04:10:00 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:00 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/09/2024 04:10:00 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:00 AM Using session_id: 1111 session_passwd: 1234
07/09/2024 04:10:00 AM anda: before create_connection
07/09/2024 04:10:40 AM Connection dropped: socket connection error: None

07/09/2024 04:10:40 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:40 AM anda: in _connect_loop before _connect_attempt host = =
07/09/2024 04:10:40 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:40 AM Using session_id: 1234 session_passwd: AAAA
07/09/2024 04:10:40 AM anda: before create_connection
07/09/2024 04:10:40 AM anda: after create_connection
07/09/2024 04:10:40 AM anda: before setblocking
07/09/2024 04:10:40 AM anda: after setblocking
07/09/2024 04:10:40 AM anda: after Connect
07/09/2024 04:10:40 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=3333, time_out=120000, session_id=1234, passwd='AAAA', read_only=None)
07/09/2024 04:10:40 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=1234, passwd='BBBB', read_only=False)
07/09/2024 04:10:40 AM anda: after _invoke
07/09/2024 04:10:40 AM Session created, session_id: 1234 session_passwd: GGGG
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0
07/09/2024 04:10:40 AM Zookeeper connection established, state: CONNECTED

Specifications

  • Kazoo version: 2.7.0
  • Zookeeper version: 3.8.4
  • Zookeeper configuration: put here any useful ZK configuration (authentication, encryption, number of ZK members, number of (concurrent?) clients, Java version, krb5 version, etc.)
  • Python version: Python 2.7
  • OS: RHOSP 8.4
@andanicolae
Copy link
Author

Hello. Gentle reminder :)
Can someone please take a look? Thank you

@ceache
Copy link
Contributor

ceache commented Jul 15, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants