[10.x] Detect MySQL read-only mode error as a lost connection #48517

cosmastech · 2023-09-24T10:39:48Z

Resolves #48486. The issue describes using the database driver for the queue connection, with AWS Aurora failing over while the queue worker is running.

If the database is in read-only mode, the queue worker cannot update tables as needed, so the worker should be killed.

I admit that I do not know what consequences there might be for including this in the DetectsLostConnections trait, as it is used by several other classes. It seems to me that if PDO throws an exception and that exception reports a read-only error, it's probably safe to call it a lost connection, but I'm looking at this through a pretty myopic lens.

taylorotwell · 2023-09-25T00:42:09Z

So... does this fix the issue? Have you tested it in a real scenario?

cosmastech · 2023-09-25T00:44:41Z

> So... does this fix the issue? Have you tested it in a real scenario?

I ran a test scenario, though not against an actual master failover.

The test: create a job that sets the database to read-only. With the PR change, the worker dies. Without it, the queue worker keeps running but can never advance.

GrahamCampbell · 2023-09-25T07:58:31Z

There is some risk here that this will result in thrashing connections to people's databases if the writer is in read-only mode or we are connected to a reader, and re-connecting doesn't change that.

cosmastech · 2023-09-25T10:45:33Z

> There is some risk here that this will result in thrashing connections to people's databases if the writer is in read-only mode or we are connected to a reader, and re-connecting doesn't change that.

Any idea what might be a better solution? We could create a new trait DetectsReadOnlyMode and employ it on the worker (so that it's not thrashing when calling DetectsLostConnections@causedByLostConnection() from within Connector or ManagesTransactions)

That said, from a quick scan of the errors currently checked within DetectsLostConnections, isn't there already a chance of thrashing as described?

'running with the --read-only option so it cannot execute this statement' and 'Reason: Server is in script upgrade mode. Only administrator can connect at this time.'
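To make the separate-trait idea concrete, here is a minimal sketch of what a DetectsReadOnlyMode trait could look like. The trait name comes from the comment above; the method name causedByReadOnlyMode and the body are assumptions for illustration, not code from this PR or from the framework. The only message it matches is the --read-only string quoted above.

```php
<?php

/**
 * Hypothetical trait: the name is taken from the discussion, the
 * implementation is an illustrative assumption.
 */
trait DetectsReadOnlyMode
{
    /**
     * Determine whether the exception message reports a read-only server.
     */
    public function causedByReadOnlyMode(Throwable $e): bool
    {
        $message = $e->getMessage();

        // Message fragments treated as "server is read-only".
        $needles = [
            'running with the --read-only option so it cannot execute this statement',
        ];

        foreach ($needles as $needle) {
            if (str_contains($message, $needle)) {
                return true;
            }
        }

        return false;
    }
}
```

A class that should react to read-only errors (for instance the worker) would `use DetectsReadOnlyMode;` and call the method on a caught exception, leaving the existing lost-connection check untouched for the other classes that rely on it.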

peterlupu · 2023-09-30T15:13:59Z

Just from the top of my head, what about on the first error encountered we reconnect and if we get the same read-only error, we fall back to current behavior of throwing the exception and failing?

cosmastech · 2023-10-03T14:44:53Z

> Just from the top of my head, what about on the first error encountered we reconnect and if we get the same read-only error, we fall back to current behavior of throwing the exception and failing?

@peterlupu I think that's a good approach, but I feel like it wouldn't make sense to apply it to just this one exception. Anything that is considered recoverable should be tried N times before finally bubbling the exception up to the client code.

Would love to get some feedback on which cases those are.
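The "try N times, then bubble up" idea can be sketched as a small standalone helper. Everything here is hypothetical: retryRecoverable, its parameters, and the callbacks are names invented for illustration and are not part of the framework. Which exceptions count as recoverable is left to the caller, since that is the open question above.

```php
<?php

/**
 * Hypothetical helper: run $operation, and on a recoverable failure
 * reconnect and try again, up to $maxAttempts attempts in total.
 * Non-recoverable failures and the final failed attempt are rethrown.
 */
function retryRecoverable(
    callable $operation,
    callable $isRecoverable,
    callable $reconnect,
    int $maxAttempts = 2
): mixed {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // Give up when attempts are exhausted or the error is not recoverable.
            if ($attempt >= $maxAttempts || ! $isRecoverable($e)) {
                throw $e;
            }

            // Otherwise reconnect and loop around for another attempt.
            $reconnect();
        }
    }
}
```

With the default of two attempts this matches the suggestion of reconnecting once and then falling back to throwing. A bounded attempt count also limits the connection thrashing raised earlier in the thread.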

deleugpn · 2023-10-07T10:29:34Z

Reading the linked issue, it sounds to me like this is a Worker problem and not a connection problem. A Queue Worker is an infinite loop process and as such it makes itself resilient by having an eager try/catch that prevents it from dying. But some situations are good reasons to cause the infinite loop to die.

For example, suppose your Queue Worker is writing some logs to local storage. If your disk reaches 100%, it's pointless to keep trying to work anything as everything will always be in a failing state. This particular issue is better fixed in a way that the infinite loop of the Worker is able to detect that there was a read-only connection. Then the worker can decide whether it wants to try reconnecting and/or simply die and let the outside orchestration (ECS, Kubernetes, Supervisord, etc) spin up a fresh worker.
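A rough, framework-free sketch of that worker-level decision follows. It is not the actual Illuminate\Queue\Worker; runWorkerLoop, its parameters, and the returned array shape are hypothetical, and the loop is finite so the example can terminate.

```php
<?php

/**
 * Hypothetical worker loop: ordinary job failures are swallowed so the
 * loop keeps going, but failures flagged by $shouldStop end the loop so
 * that an outside supervisor can start a fresh worker.
 *
 * @param iterable<callable> $jobs
 * @return array{processed: int, failed: int, stopped: bool}
 */
function runWorkerLoop(iterable $jobs, callable $shouldStop): array
{
    $processed = 0;
    $failed = 0;

    foreach ($jobs as $job) {
        try {
            $job();
            $processed++;
        } catch (Throwable $e) {
            // Fatal condition (e.g. read-only database): stop working.
            if ($shouldStop($e)) {
                return ['processed' => $processed, 'failed' => $failed, 'stopped' => true];
            }

            // Any other failure: count it and move on to the next job.
            $failed++;
        }
    }

    return ['processed' => $processed, 'failed' => $failed, 'stopped' => false];
}
```

The stop decision lives in the callback, so the connection layer does not need to treat a read-only error as a lost connection for the worker to act on it.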

driesvints · 2023-11-06T10:52:08Z

@cosmastech are you still working on this?

cosmastech · 2023-11-06T11:19:33Z

> @cosmastech are you still working on this?

@driesvints to be honest, from the feedback I got, I didn't know if there was a particular change that needed to be made. I would rather have it opened back up for review from @taylorotwell. He seemed concerned in his comment that I had not tested it at all (which I had), but some other concerns were brought up.

taylorotwell · 2023-11-07T13:25:27Z

This shows zero files changed?

detect read-only mode error as a lost connection

326dc6f

cosmastech force-pushed the feature/read-only-mode-failure branch from a88ac20 to 326dc6f Compare September 24, 2023 11:15

cosmastech marked this pull request as ready for review September 24, 2023 11:15

taylorotwell marked this pull request as draft September 25, 2023 00:42

Merge branch '10.x' into feature/read-only-mode-failure

3f6a662

cosmastech marked this pull request as ready for review November 6, 2023 11:19

taylorotwell closed this Nov 7, 2023

cosmastech mentioned this pull request Nov 7, 2023

[10.x] Detect MySQL read-only mode error as a lost connection #48937

Merged
