**Update for Mining Sanctuaries that have Lagging Blocks**
So I've been pulling my hair out trying to figure out why 'sometimes' (as Oncoapop also recently reported) 1 or 2 nodes out of 10 in a Temple will start lagging behind on the block count and eventually need to be restarted with -erasechain=1 (that was the recommended solution so far, at least).
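By the way, if you want to catch the laggards early instead of spotting them by hand, a quick check like this should work (the conf paths and node numbering are just examples from a typical multi-node setup, adjust to yours):

```bash
#!/bin/bash
# Compare each node's block count against node 1 to spot laggards.
# Conf paths and node numbering are illustrative - adjust to your setup.
REF=$(./biblepay-cli -conf=/configs/1.conf getblockcount)
for i in $(seq 1 10); do
  CNT=$(./biblepay-cli -conf=/configs/$i.conf getblockcount)
  LAG=$((REF - CNT))
  if [ "$LAG" -gt 5 ]; then
    echo "node $i is $LAG blocks behind ($CNT vs $REF)"
  fi
done
```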
Today I finally got the raw logs from one sanc that was doing that and I found part of the root cause:
```
2023-09-15T23:47:39Z CChainLocksHandler::EnforceBestChainLock -- CLSIG (CChainLockSig(nHeight=447722, blockHash=b6202fdf5ffdf1b315577a2d4ab53f141212f2b8652b7aaff426d781c956bff9)) marked block 494b4e05928fd613d1d831c40fbeb3c2af0044333e793c5ae09e15fbfcddc91e as conflicting
2023-09-15T23:47:39Z ConflictingChainFound: conflicting block=494b4e05928fd613d1d831c40fbeb3c2af0044333e793c5ae09e15fbfcddc91e height=447720 log2_work=60.28583982 date=2023-09-15T23:40:31Z
2023-09-15T23:47:39Z ConflictingChainFound: current best=dd261ab9c7215d3e4acde9d8b4fa2e24c7dccd822b294734798b8f406220b81d height=447717 log2_work=60.28583982 date=2023-09-15T23:25:08Z
2023-09-15T23:47:39Z CChainLocksHandler::EnforceBestChainLock -- CLSIG (CChainLockSig(nHeight=447722, blockHash=b6202fdf5ffdf1b315577a2d4ab53f141212f2b8652b7aaff426d781c956bff9)) marked block 494b4e05928fd613d1d831c40fbeb3c2af0044333e793c5ae09e15fbfcddc91e as conflicting
```
In LLMQ chainlocks we go round robin around all the sancs and form quorums, and when chainlocks are active, the quorum signs the best chain and those blocks become enforced for the whole network. Now we have a unique scenario where the sancs themselves are also mining. So if one sanc finds two blocks in a row, quicker (I believe) than those can be locked by the rest of the network, while another sanc solves a block at the same height that gets locked first, the first sanc ends up in an internal conflicting state: the pindex status for the conflicting blocks keeps getting stored in memory as a conflict and does not get updated. The real question is why it does not reorganize when the main chain (the chainlocked chain) grows longer. That part is still not solved.
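Side note: you should be able to see this stuck state from the outside without grepping debug.log. Dash's getchaintips reports branches that conflict with a chainlock with a 'conflicting' status, and assuming BiblePay inherited that behavior (which I have not verified on every version), the stuck branch should show up like this:

```bash
# List all known chain tips for one node. A tip with status "conflicting"
# is a branch the node marked as conflicting with a chainlock (status name
# inherited from Dash; assuming BiblePay reports it the same way).
./biblepay-cli -conf=/configs/yournodenumber.conf getchaintips
```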
But I have a partial solution for now:
If this happens to your node, you can simply stop and restart it:

```
./biblepay-cli -conf=/configs/yournodenumber.conf stop
./biblepayd -conf=/configs/yournodenumber.conf &
```

When it restarts, the conflicting block will be purged from RAM and the node will sync to the tip (which is a big relief, because I did not want to recommend resyncing the chain in this case; that gets old pretty quick).
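If you run a bunch of nodes, you could wrap the stop/start into a little helper and verify the node actually catches back up afterwards. This is just a sketch; node numbering, conf paths, and sleep times are placeholders:

```bash
#!/bin/bash
# Stop and restart one node, then compare it to a known-good node.
# Node numbering, conf paths and sleep times are illustrative.
N=$1
./biblepay-cli -conf=/configs/$N.conf stop
sleep 30    # let the daemon shut down cleanly
./biblepayd -conf=/configs/$N.conf &
sleep 60    # let it start up and purge the conflicting block from RAM
REF=$(./biblepay-cli -conf=/configs/1.conf getblockcount)   # known-good node
CNT=$(./biblepay-cli -conf=/configs/$N.conf getblockcount)
echo "node $N is at $CNT, reference node is at $REF"
```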
I will keep looking into the root cause to see if we can make the node recover when this happens.