Error: pq: duplicate key value violates unique constraint "pk_msgkey" #1048

wutkemtt opened this issue May 22, 2020 · 5 comments

@wutkemtt

Hi,
we are currently facing an error related to the message store on Postgres:
[1] 2020/05/20 21:54:44.186034 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.189513 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.193846 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.197352 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.200989 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.205274 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.209437 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.214883 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.218560 [ERR] STREAM: [Client:NatsConnector2fcc937df-d4b0-44e3-8139-95a98ee668b6] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
[1] 2020/05/20 21:54:44.387503 [ERR] STREAM: [Client:NatsConnectorDPSWSF_AObd2b11e8-5555-4056-ba5e-763b5f8750df] Error processing message for subject "_STAN.pub.nats_cluster.dps.eu.machine.actualvaluesv01": sql: error executing "INSERT INTO Messages VALUES ($1, $2, $3, $4, $5)": pq: duplicate key value violates unique constraint "pk_msgkey"
The subject shown above is our most heavily used subject, with approx. 7.5 million messages stored in the Messages table. At the moment we receive about 3.5 messages per second. Our NATS Streaming instance runs in an AKS cluster on 2-core nodes, using 95% CPU and approx. 770 MB of memory.
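For context, the constraint in the error comes from the SQL store's Messages table. A minimal sketch of how the error arises, assuming the stock, unmodified postgres.db.sql schema that ships with the server and made-up placeholder values:

-- Stock Messages table (assumption: unmodified postgres.db.sql schema);
-- pk_msgkey is the composite primary key on (id, seq).
CREATE TABLE IF NOT EXISTS Messages (
  id        INTEGER,
  seq       BIGINT,
  timestamp BIGINT,
  size      INTEGER,
  data      BYTEA,
  CONSTRAINT PK_MsgKey PRIMARY KEY (id, seq)
);

-- The server stores one row per message, so inserting a (channel id, sequence)
-- pair that is already present reproduces the logged error:
INSERT INTO Messages VALUES (1, 100, 0, 0, '\x');
INSERT INTO Messages VALUES (1, 100, 0, 0, '\x');
-- ERROR: duplicate key value violates unique constraint "pk_msgkey"

So the error means the server is trying to store a message under a sequence number that already exists for that channel.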

@kozlovic
Member

Sorry about the trouble.
Are you running in clustering mode or standalone/FT?
I think at this point, if in standalone/FT mode, the best course of action would be to restart the server so that its internal state (in this case the nextSequence to store a message) gets properly synced with what is in the table.
Which version of NATS Streaming Server are you using?
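If it helps, a quick way to see what the table actually holds before and after the restart (a hypothetical diagnostic query, assuming the stock Channels and Messages tables from the bundled postgres.db.sql script):

-- Highest stored sequence per channel, i.e. what is actually persisted;
-- the server's in-memory nextSequence should be one past this value.
SELECT c.id, c.name, MAX(m.seq) AS last_stored_seq
FROM Channels c
JOIN Messages m ON m.id = c.id
GROUP BY c.id, c.name
ORDER BY c.name;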

@wutkemtt
Author

No problem, at the moment it's okay as we prepare the next rollout phase, so we are learning a lot these days.
Our NATS Streaming server runs in FT mode with the latest version, 0.17.0. In addition to this problem we saw high IOPS pressure and suspect our metrics system was not able to write the data out in time. So we changed its configuration and the network traffic stabilized at a normal level. The NATS Streaming pods were restarted too, so currently it seems to work.
Sometimes under high load in the cluster (many messages, or clients requesting the whole history of a subject) we saw the STAN pods switching because the leader restarted. In the logs we could not find the cause of that strange behaviour.
Is there a recommendation regarding the node size and IOPS when running NATS-Streaming inside a Kubernetes cluster?
Should we think about a daily restart of the NATS-Streaming leader to avoid the mentioned problem?

@kozlovic
Member

Sometimes under high load in the cluster (many messages, or clients requesting the whole history of a subject) we saw the STAN pods switching because the leader restarted. In the logs we could not find the cause of that strange behaviour.

To be clear, you are saying that you run in FT mode. FT is different from clustering. Are you saying that "leader" here is the active server and it failed so a standby took over?

Is there a recommendation regarding the node size and IOPS when running NATS-Streaming inside a Kubernetes cluster?

You understand that it all depends on usage. I can't give you any number.

Should we think about a daily restart of the NATS-Streaming leader to avoid the mentioned problem?

In FT mode, when the active server is stopped and the standby becomes active, it has to recover the store, which, depending on the size and disk speed, can take a bit of time. So again, it all depends on your usage, and on whether you can afford "downtime" during the day.

@wutkemtt
Author

Yes, I meant the active server; the standby took over.
This failover took a few minutes because of our database, which is 10 GB in size with approx. 21 million messages. So this short downtime will be a problem during working hours as soon as our customer's production lines are started. I'll keep a sharp eye on the NATS cluster and the STAN instances. Hopefully I'll get an idea of the cause of the restarts.

@kozlovic
Member

Checking on this issue: if the standby has taken over, it should get the last sequence from the channel and there should not be duplicate key issues. It almost looks like the active was writing, say, 100, 101 and 102, but when the standby becomes active, it gets from the DB that the last sequence is 100 (as if 101 and 102 were not yet visible in the DB), and then 101 and 102 are committed while the standby-now-active tries to store the next message as 101.

First, the standby should not be able to become active if the active is still running (because of the FT heartbeats and the StoreLock table being updated).

Second, we use DB transactions, so if, say, the active writes messages 101 and 102 and crashes, the transaction would be rolled back.
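Just to make the hypothetical scenario above concrete, it could be written as two psql sessions against the stock Messages table (assuming the default READ COMMITTED isolation level and a channel with id 1 already stored up to seq 100); as said, the FT heartbeats, the StoreLock table and the use of transactions are meant to prevent exactly this interleaving:

-- Session A (old active server), transaction still open:
BEGIN;
INSERT INTO Messages VALUES (1, 101, 0, 0, '\x');
INSERT INTO Messages VALUES (1, 102, 0, 0, '\x');

-- Session B (standby taking over) recovers the last sequence; under
-- READ COMMITTED the uncommitted rows 101 and 102 are not visible yet:
SELECT MAX(seq) FROM Messages WHERE id = 1;   -- returns 100

-- Session A's transaction is committed afterwards:
COMMIT;

-- Session B now stores its next message as 101 and hits the constraint:
INSERT INTO Messages VALUES (1, 101, 0, 0, '\x');
-- ERROR: duplicate key value violates unique constraint "pk_msgkey"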
