(m)TLS replication is broken in 2.9.0 #2490

Open
1 of 2 tasks
kinoute opened this issue Aug 13, 2024 · 4 comments
Labels
bug type bug

Comments

@kinoute
Contributor

kinoute commented Aug 13, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.9.0

Minimal reproduce step

After upgrading Kvrocks from 2.8.0 to 2.9.0, we started getting SSL/TLS errors when connecting a slave to the master. Replication without TLS works fine.

Both master and slave are on 2.9.0. Rolling back the master to 2.8.0 while keeping the replica on 2.9.0 works, so the problem is definitely on the "server/master" side.

With both on 2.9.0, connecting to the master with redis-cli from the slave instance worked with the same certificates, so the certificates themselves are fine:

No errors (on replica instance):

redis-cli -h kvrocks-master \
      -p 6379 \
      --tls \
      --cacert /ca/kvrocks/ca.crt \
      --cert /tls/kvrocks/tls.crt \
      --key /tls/kvrocks/tls.key

Errors (replica instance, see below):

kvrocks -c kvrocks.conf \
      --dir /var/lib/kvrocks \
      --pidfile /var/run/kvrocks/kvrocks.pid \
      --masterauth "xxx" \
      --slaveof "kvrocks-master 6379" \
      --tls-ca-cert-file /ca/kvrocks/ca.crt \
      --tls-key-file /tls/kvrocks/tls.key \
      --tls-cert-file /tls/kvrocks/tls.crt \
      --tls-replication yes \
      --bind 0.0.0.0

What did you expect to see?

A working (m)TLS replication that performs either a psync or a full synchronization.

What did you see instead?

Server (MASTER) :

kvrocks I20240813 08:37:26.913834 121 cmd_replication.cc:60] Slave 100.65.46.20:45098, listening port: 6379, announce ip: 100.65.46.20 asks for synchronization with next sequence: 1 replication id: not supported, and local sequence: 344837857
kvrocks E20240813 08:37:26.918999 121 redis_connection.cc:109] [connection] Going to remove the client: 100.65.46.20:45098, while encounter error: Success, SSL Error: error:0A000126:SSL routines::unexpected eof while reading
kvrocks I20240813 08:37:26.986922 193 cmd_replication.cc:242] [replication] Succeed sending full data file info to 100.65.46.20
kvrocks W20240813 08:37:27.038514 194 cmd_replication.cc:299] [replication] Fail to send file CURRENT to 100.65.46.20, error: Success
kvrocks I20240813 08:37:37.086854 195 cmd_replication.cc:242] [replication] Succeed sending full data file info to 100.65.46.20
kvrocks W20240813 08:37:37.127951 196 cmd_replication.cc:299] [replication] Fail to send file CURRENT to 100.65.46.20, error: Success

Client (REPLICA) :

W20240813 08:00:12.653694 50 replication.cc:935] [fetch] Fail to fetch file 005813.sst, err: fetch file err: read sst file: failed to read from SSL connection: error:00000000:lib(0)::reason(0)
W20240813 08:00:12.655525 49 replication.cc:935] [fetch] Fail to fetch file 009665.sst, err: fetch file err: read sst file: failed to read from SSL connection: error:00000000:lib(0)::reason(0)
W20240813 08:00:12.660212 51 replication.cc:935] [fetch] Fail to fetch file 008792.sst, err: fetch file err: read sst file: failed to read from SSL connection: error:00000000:lib(0)::reason(0)
W20240813 08:00:12.661721 52 replication.cc:935] [fetch] Fail to fetch file 005736.sst, err: fetch file err: read sst file: failed to read from SSL connection: error:00000000:lib(0)::reason(0)
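
For anyone triaging the two OpenSSL error strings in the logs above, here is a minimal sketch that unpacks the hexadecimal code in an `error:XXXXXXXX:` string into its library and reason numbers. It assumes the OpenSSL 3.x bit layout from `<openssl/err.h>`. The helper names `decode_openssl_error` and `describe` are made up for this illustration, and the lookup tables only cover the values seen in this issue. System errors (high bit set) are not handled.

```python
import re

# Bit layout used by OpenSSL 3.x for library (non-system) error codes.
ERR_LIB_OFFSET = 23
ERR_LIB_MASK = 0xFF
ERR_REASON_MASK = 0x7FFFFF

# Only the values needed for the logs in this issue; not a complete table.
LIB_NAMES = {20: "ERR_LIB_SSL"}
REASON_NAMES = {(20, 294): "SSL_R_UNEXPECTED_EOF_WHILE_READING"}

ERROR_RE = re.compile(r"error:([0-9A-Fa-f]{8}):")


def decode_openssl_error(log_line):
    """Return (code, lib, reason) for the first OpenSSL error in a line, or None."""
    match = ERROR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    lib = (code >> ERR_LIB_OFFSET) & ERR_LIB_MASK
    reason = code & ERR_REASON_MASK
    return code, lib, reason


def describe(log_line):
    """Give a short human-readable summary of the error code in a log line."""
    decoded = decode_openssl_error(log_line)
    if decoded is None:
        return "no OpenSSL error code found"
    code, lib, reason = decoded
    if code == 0:
        return "empty error queue (no OpenSSL error was recorded)"
    lib_name = LIB_NAMES.get(lib, f"lib {lib}")
    reason_name = REASON_NAMES.get((lib, reason), f"reason {reason}")
    return f"{lib_name}: {reason_name}"


# Error fragments copied from the master and replica logs in this issue.
master_line = "SSL Error: error:0A000126:SSL routines::unexpected eof while reading"
replica_line = "failed to read from SSL connection: error:00000000:lib(0)::reason(0)"

print(describe(master_line))
print(describe(replica_line))
```

The master-side code decodes to the SSL library's "unexpected EOF while reading" reason, which OpenSSL 3.x reports when the peer closes the connection without sending a TLS `close_notify` alert. The all-zero code on the replica means the OpenSSL error queue was empty when the message was formatted, so that log line by itself does not say why the read failed.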

Anything Else?

Is it safe to downgrade to 2.8.0 on instances where I need (m)TLS replication? Could the problem be due to the switch to Debian?

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@kinoute kinoute added the bug type bug label Aug 13, 2024
@PragmaTwice
Member

> Could it be due to the switch to Debian?

Have you tried to build kvrocks in your own environment and see if TLS replication works well?

@kinoute
Contributor Author

kinoute commented Aug 13, 2024

> > Could it be due to the switch to Debian?
>
> Have you tried to build kvrocks in your own environment and see if TLS replication works well?

We use Kvrocks in Kubernetes with the official Docker images.

@kinoute
Contributor Author

kinoute commented Aug 13, 2024

I tried the unstable/nightly Docker tag for the master instance and did not get the error. I then rolled back to 2.9.0; the error was gone and did not reappear. Really weird.

I still have some running instances (on other clusters) that I didn't upgrade to nightly, so the problem is still present there. Do you want me to run some checks or commands to help figure out why this is happening?

Edit: The problem is still here, never mind.

@kinoute
Contributor Author

kinoute commented Aug 13, 2024

I built Kvrocks 2.9.0 with the Docker Alpine image from 2.8: no SSL/TLS replication errors. The image is here: https://hub.docker.com/r/hivacruz/kvrocks-alpine/tags
