WildFly Transaction Recovery on OpenShift
This document aims to summarize ideas and implementation details on Narayana transaction recovery used as a WildFly component on OpenShift.
The issue with running Narayana on OpenShift is that the application server may be started and stopped arbitrarily (the app server is containerized and deployed inside a pod). When the app server is stopped or killed, some unfinished XA transaction records may be left behind. We care only about XA transactions here, as local transactions are maintained in memory and, when the container is stopped, the transaction timeout is capable of "resolving" them with a rollback.
At the time the XA transaction is prepared, the record is persisted in the Narayana object store and in the transaction log of the resource (database, JMS broker). If the app server stops at that particular time, the resource (database) may block other actions because of the uncommitted transaction log record. The resolution is to start the WildFly application server once again: the recovery manager (a component of Narayana) checks for the existence of the records on both sides and commits the XA transaction, which removes the transaction log record at the resource (in the database).
If the Narayana object store disappears, there is no record of what the decision on the XA transaction outcome was. It is Narayana (the transaction manager) that acts as the arbiter deciding whether to commit or to roll back.
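The role of the two logs can be shown with a toy model (this is an illustrative sketch, not Narayana's actual implementation): after prepare, both the transaction manager's object store and the resource's transaction log hold a record for the in-doubt branch, and only the manager owning the object store can resolve it.

```python
# Toy model of XA two-phase commit logging (not Narayana's real code).
object_store = set()   # transaction manager (Narayana) side
resource_log = set()   # resource (database) side

def prepare(xid):
    """Phase 1: persist the in-doubt record on both sides."""
    resource_log.add(xid)
    object_store.add(xid)

def commit(xid):
    """Phase 2: committing removes the record from both logs."""
    resource_log.discard(xid)
    object_store.discard(xid)

prepare("xid-1")
# A crash between prepare and commit leaves "xid-1" in both logs;
# only a restarted manager with access to object_store can resolve it.
assert "xid-1" in object_store and "xid-1" in resource_log
commit("xid-1")
assert not object_store and not resource_log
```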
For the Narayana recovery manager to be able to correctly decide whether to commit or to roll back, and to carry out that decision, it requires the following:

- access to the Narayana object store
- the ability to contact the remote resource (database)
- a unique identity for each started Narayana recovery manager instance
- a stable hostname/IP address (when remote transaction context propagation is in play)
Note: In the current Narayana version (…)
The existence and the content of the Narayana object store define whether to commit or roll back. If both the object store and the remote resource transaction log contain the record of the XA transaction, the XA transaction is committed. If only the object store contains the record, the information is printed to server.log and the record is ignored. If only the remote resource transaction log contains the record, the XA transaction is rolled back.
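The three rules above form a simple decision table, sketched here as a hypothetical helper (the function name and return values are illustrative, not Narayana API):

```python
def recovery_decision(in_object_store: bool, in_resource_log: bool) -> str:
    """Illustrative decision table for an in-doubt XA record,
    mirroring the rules described above (not Narayana code)."""
    if in_object_store and in_resource_log:
        return "commit"        # decision to commit was logged; finish it
    if in_object_store:
        return "ignore"        # resource already finished; log and skip
    if in_resource_log:
        return "rollback"      # no commit decision recorded; roll back
    return "nothing-to-do"

assert recovery_decision(True, True) == "commit"
assert recovery_decision(True, False) == "ignore"
assert recovery_decision(False, True) == "rollback"
```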
The ability to contact the remote resource determines whether the remote resource log store can be cleared, whether the data changes that could block other operations at the resource are finished, and whether locks are released.
The identity of the recovery manager ensures that it does not roll back records in the resource transaction log when it was not the creator of those records. Imagine a situation where two Narayana managers access the same database. The first manager creates an XA transaction and proceeds with the prepare. At that time, the first Narayana object store and the remote resource transaction log contain a prepared (in-doubt) transaction record. If everything goes in order, the first Narayana commits the XA transaction. But if at the same time the second Narayana recovery manager accesses the transaction log of the database, it sees an unfinished prepared record there. The second Narayana knows nothing about such an XA transaction (there is no record in the second Narayana object store), so it would command a rollback. That way we get an inconsistency.
To prevent this, the recovery manager is permitted to roll back a record in the remote resource transaction log only when the transaction identifier matches. When WildFly is started, it sets the node identifier. This identifier is saved into every XA transaction record (in the Narayana object store and in the remote resource transaction log). The recovery manager then loads all in-doubt records from the remote resource and filters only those which match its node identifier.
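The filtering step can be sketched as follows. This is a simplified model, assuming the node identifier is embedded in the global transaction id (gtrid) of the Xid; the `Xid` class and `NODE_ID` value are hypothetical stand-ins:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Xid:
    gtrid: bytes  # global transaction id; the node identifier is embedded here

NODE_ID = b"wildfly-1"  # hypothetical node identifier of this server

def recoverable_xids(in_doubt):
    """Keep only the in-doubt branches created by this node; the rest
    belong to another recovery manager and must not be rolled back here."""
    return [x for x in in_doubt if NODE_ID in x.gtrid]

in_doubt = [Xid(b"tx:wildfly-1:seq1"), Xid(b"tx:wildfly-2:seq7")]
mine = recoverable_xids(in_doubt)
assert len(mine) == 1 and NODE_ID in mine[0].gtrid
```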
When the transaction context is propagated to a different JVM on a different node, the recovery manager has to contact that remote JVM to finish the transaction. It calls the remote node and asks it to commit or roll back. The remote node should not decide on its own; it should wait until the originator of the transaction decides.
When the transaction is started, WildFly records the address of the remote JVM. When the recovery manager gets to work, it loads that information and connects there. If the remote JVM changes its hostname, the recovery manager is not able to contact it and command it to finish the transaction.
When the pod is removed, we need to be able to access the Narayana object store, keep the same configuration so we can connect to the remote resource, know the node identifier that was used, and, where remote context propagation is involved, run the pod with the same hostname/IP address. Then we can run the recovery manager and wait until it clears the Narayana object store and the remote resource transaction log. Once both are clear, the pod and the storage can be destroyed.
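The wait-then-destroy step could be sketched as a polling loop; the predicate hooks passed in here are hypothetical, standing in for checks against the real object store and resource log:

```python
import time

def safe_to_destroy(object_store_empty, resource_log_empty,
                    poll_seconds=5, timeout=300):
    """Poll until the recovery manager has cleared both the Narayana
    object store and the remote resource transaction log; only then
    may the pod and its storage be removed. The two predicates are
    hypothetical hooks, not a real Narayana API."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if object_store_empty() and resource_log_empty():
            return True
        time.sleep(poll_seconds)
    return False  # records remain; keep the pod and storage alive
```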
We have two different implementations of the process above. The first is used for OpenShift 3.x; it consists of bash and python scripts that are part of the s2i scripts. The second is for OpenShift 4.x; it is part of the WildFly Operator (written in Go, unless it has been rewritten to Java/Quarkus).
The solution for OpenShift 3.x is based on a template (https://github.com/jboss-container-images/jboss-eap-7-openshift-image/blob/7.2.x/templates/eap72-tx-recovery-s2i.json) which configures two `DeploymentSet`s. The first `DeploymentSet` configures the application and the second `DeploymentSet` configures a recovery pod which runs alongside the main application. An important prerequisite is that both `DeploymentSet`s (i.e. all started pods) have access to the shared Narayana object store. That can be achieved with a database or with a shared `PersistentVolume` (the volume can be read and written by any pod; any pod may access data of any other pod).
The high-level process is that when the `DeploymentSet` starts a pod, the pod saves a descriptor with its node identifier to the shared volume or to the database. When the application is scaled down, the application pod is removed. The recovery pod running alongside detects the orphaned Narayana object store. Detection works by contacting the OpenShift API and listing all running pods. The recovery pod finds the node identifier from the orphaned descriptor. Then it starts an application server capable of contacting the remote resources (prerequisite: the second `DeploymentSet` has the same configuration as the first `DeploymentSet`). The app server is started with the orphaned Narayana object store.
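The orphan-detection step can be sketched like this. The descriptor layout and names are hypothetical; the real s2i scripts differ in detail:

```python
def find_orphaned_node_ids(descriptors, running_pod_names):
    """Each pod writes a descriptor mapping its pod name to its node
    identifier on start-up. A node identifier is orphaned when its pod
    no longer appears in the pod list returned by the OpenShift API.
    (Illustrative sketch; names are hypothetical.)"""
    return {node_id
            for pod, node_id in descriptors.items()
            if pod not in running_pod_names}

descriptors = {"eap-app-1": "node-1", "eap-app-2": "node-2"}
running = {"eap-app-1"}                    # eap-app-2 was scaled down
assert find_orphaned_node_ids(descriptors, running) == {"node-2"}
```

For each orphaned node identifier, the recovery pod would then start an app server against that orphaned object store to drive recovery to completion.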
WARN: a limitation is that transactions which use remote transaction context propagation cannot be safely recovered.
- Design document: https://docs.google.com/document/d/1p6IAt0ocaEaepsXtNXpbAmiMMioQgoZr65thYrlkZ2I/edit?ts=5f5a9a38#*
- Example template with two `DeploymentSet`s: https://github.com/jboss-container-images/jboss-eap-7-openshift-image/blob/7.3.x/templates/eap73-openjdk11-tx-recovery-s2i.json
- s2i script which launches the app server: https://github.com/jboss-container-images/jboss-eap-modules/blob/7.3.x/jboss/container/eap/launch/added/openshift-launch.sh#L56
- s2i recovery script: https://github.com/jboss-container-images/jboss-eap-modules/blob/7.3.x/os-eap-txnrecovery/bash/added/partitionPV.sh
- How to build the EAP image: https://docs.google.com/document/d/123lvasGDg65KBfRW1G_HC261uQ6QVjIWZ_R1V0iDrKw/edit#heading=h.4b26xmtt6uhd
- Working with Minishift (3.x locally): https://github.com/jbosstm/narayana/wiki/Notes-on-CRC-and-Minishift