Issue Details

Number
29662
Title
More control of maintenance processes at startup/restart
Description
### Please describe the feature you'd like to see added.

I'd like some way to control the levelDB compaction processes a bit more. Specifically, if I could control when they are scheduled (hourly in the background, for example), and possibly limit resources they consume (I/O, in particular), I think it would help.

### Is your feature related to a problem, if so please describe it.

The problem I'm having is that when I reboot my bitcoin-core node (which runs with `txindex=1`), the RPC listener comes up pretty promptly, so my healthchecks (currently just TCP) pass. However, the service is not in its usual baseline state; it is doing a lot of read I/O, and logging about levelDB compaction. This condition lasts for about an hour.

Here is an illustration of the I/O level relative to baseline. The left side is the restart, and the lower right side is after this abates. Dark blue is read:

<img width="634" alt="image" src="https://github.com/bitcoin/bitcoin/assets/35895831/f0902e3d-a41f-4a45-bbd4-b22c18537fa6">

Here's a summary of the logs at this time:

<img width="500" alt="image" src="https://github.com/bitcoin/bitcoin/assets/35895831/4d42e318-5430-45a7-b761-5c16394d76ab">

The "Compacting" log line is extremely elevated during this period (though it does occur at a much lower level after):

<img width="765" alt="image" src="https://github.com/bitcoin/bitcoin/assets/35895831/815b361b-b242-4a89-a32a-a7c5c7e66155">

Here's a graph of RPC trace P50 duration before, during, and after this phase:

<img width="1200" alt="image" src="https://github.com/bitcoin/bitcoin/assets/35895831/f6c6ca27-6db2-4ef2-937f-d390da18e66d">

The real problem is the last graph, the elevated RPC latency. The tail and head latencies are also much worse than normal, so it's not just a tail latency issue I could solve with timeouts / hedging.

The server goes from microsecond/millisecond latency to 10s of seconds, especially for `sendrawtransaction` (10-30s max), with `listunspent` a distant second (3-4s max). This latency abates as soon as the high I/O and compaction logging stops.

I am therefore making the **intuitive leap** (so experts, please consider this critically and I welcome other explanations) that resource utilization during compaction is causing some RPCs to be very slow. I also considered lock contention (perhaps `cs_main`) but I couldn't see it in the code.

### Describe the solution you'd like

I'd like the experts to recommend a solution. Intuitively, it seems like levelDB could amortize this compaction work during normal operation as a background task ([the README seems to imply it already should?](https://github.com/google/leveldb?tab=readme-ov-file#read-performance)). Or maybe some way to limit resources used for compaction?

### Describe any alternatives you've considered

Currently, I'm looking at alternative ways to do the RPC I enabled `txindex` for, which is `getrawtransaction` without the blockhash. But, that will only sidestep this issue with compaction and RPC latency.

### Please leave any additional context

I am using a slower filesystem than most. It is a regionally-replicated NFS store, which we chose for resiliency reasons. Intuitively, I'd expect this problem to be less severe (or shorter duration) with lower-latency storage, but still present.

Command line args:

```
txindex="1", rpcworkqueue="1024", rpc_*="redacted", debug="coindb", debug="estimatefee", debug="reindex", debug="leveldb", debug="walletdb", debug="lock", debug="rpc", dbcache="5734", datadir="/home/bitcoin/data", chain="main"
```
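Since the TCP healthcheck passes while RPCs are still slow, one possible stopgap is a readiness check that times real RPC calls instead of only opening a socket. The following is a minimal sketch, not something Bitcoin Core provides: `make_rpc_probe` and `is_ready` are hypothetical helper names, the URL and credentials are placeholders, and `getblockcount` plus the 0.5 s median threshold are assumptions. A probe method closer to the calls that actually slow down may be a better signal.

```python
import base64
import json
import statistics
import time
import urllib.request
from typing import Callable


def make_rpc_probe(url: str, user: str, password: str,
                   method: str = "getblockcount",
                   timeout_s: float = 5.0) -> Callable[[], None]:
    """Return a zero-argument callable that performs one JSON-RPC call.

    The method name and timeout are illustrative assumptions.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"jsonrpc": "1.0", "id": "healthcheck",
                       "method": method, "params": []}).encode()

    def probe() -> None:
        req = urllib.request.Request(
            url, data=body,
            headers={"Authorization": f"Basic {token}",
                     "Content-Type": "application/json"})
        # urlopen raises on timeouts and on non-2xx HTTP status codes.
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            reply = json.load(resp)
        if reply.get("error") is not None:
            raise RuntimeError(f"RPC error: {reply['error']}")

    return probe


def is_ready(probe: Callable[[], None], samples: int = 5,
             max_median_s: float = 0.5) -> bool:
    """Time `samples` probes; ready only if all succeed and the median is fast."""
    durations = []
    for _ in range(samples):
        start = time.monotonic()
        try:
            probe()
        except Exception:
            return False  # a failed or timed-out call counts as not ready
        durations.append(time.monotonic() - start)
    return statistics.median(durations) <= max_median_s
```

Calling `is_ready(make_rpc_probe("http://127.0.0.1:8332/", "<user>", "<password>"))` from the orchestrator's readiness check would keep traffic off the node until latency returns to baseline. It only works around the symptom and does nothing about the compaction itself.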
URL
https://github.com/bitcoin/bitcoin/issue/29662
Closed by