Never ending periodic recovery of heuristic participants

问题

For days our log has been full of this message

2018-06-15 12:19:23 WARN [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c has 1 heuristic participant(s)!
2018-06-15 12:19:23 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c, node_name=acme_node, branch_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08d, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6569a57c >
2018-06-15 12:19:23 WARN [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c restored heuristic participant XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c, node_name=acme_node, branch_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08d, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6569a57c >

It is always the same Xid. Is there a way to solve this? We are considering gracefully shutting down the application and deleting the files in data/tx-object-store. Is this a good idea?

That's with WildFly 11. We have XA transactions set up with Oracle 12c and IBM WebSphere MQ. We are doing XA transactions from a message driven bean to JDBC.

回答1:

I found the answer to the problem in 2.4.1. Assumed complete of the transaction guide.

If a failure occurs in the transaction environment after the transaction coordinator had told the XAResource to commit but before the transaction log has been updated to remove the participant, then recovery will attempt to replay the commit. In the case of a Serialized XAResource, the response from the XAResource will enable the participant to be removed from the log, which will eventually be deleted when all participants have been committed. However, if the XAResource is not recoverable then it is extremely unlikely that any XAResourceRecovery instance will be able to provide the recovery sub-system with a fresh XAResource to use in order to attempt recovery; in which case recovery will continually fail and the log entry will never be removed.

There are two possible solutions to this problem:

Rely on the relevant ExpiryScanner to eventually move the log elsewhere. Manual intervention will then be needed to ensure the log can be safely deleted. If a log entry is moved, suitable warning messages will be output.

Set the com.arjuna.ats.jta.xaAssumeRecoveryComplete to true. This option is checked whenever a new XAResource instance cannot be located from any registered XAResourceRecovery instance. If false (the default), recovery assumes that there is a transient problem with the XAResourceRecovery instances (e.g., not all have been registered with the sub-system) and will attempt recovery periodically. If true then recovery assumes that a previous commit attempt succeeded and this instance can be removed from the log with no further recovery attempts. This option is global, so needs to be used with care since if used incorrectly XAResource instances may remain in an uncommitted state.

回答2:

There is a db transaction that didn't complete and your server is trying to recover. Check insde

SERVER_HOME/standalone/data/tx-object-store/ShadowNoFileLockStore/defaultStore/StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/

There are transaction files. Maybe first delete the transaction files (On your local/dev env preferably) and tail your logs to identify the transaction/s that didn't complete/commit. Fix the root of the problem and the warnings will go away. Alternatively check the jndiName on the WARNING for an idea which data-source is creating those warnings.

The warnings are "never ending" because you probably have a scheduled task that keeps on attempting (on your specified interval) to talk to your db but transactions are never completed due to the underlying error that you must first fix.

来源：https://stackoverflow.com/questions/50885703/never-ending-periodic-recovery-of-heuristic-participants

标签

Oracle

wildfly

ibm-mq