This article discusses an issue where applications experience problems with operations involving lightweight transactions (LWTs).
The impact of the issue varies from cluster to cluster. It can include some or all of the following:
- increased latency for write requests,
- queries timing out,
- a high number of connections on application server(s), and/or
- conditional updates failing for some partitions.
A common symptom is the size of the
system.paxos table increasing (or being already very large) on one or more nodes in the cluster. Here is example
nodetool cfstats output from a node where the
system.paxos table has a high SSTable count and has grown to over 25GB with over 100 million keys:
Table: paxos
SSTable count: 140
SSTables in each level: [1, 10, 100, 29, 0, 0, 0, 0, 0]
Space used (live): 27783858594
Space used (total): 27783858594
Space used by snapshots (total): 0
Off heap memory used (total): 120740454
SSTable Compression Ratio: 0.24511286813067962
Number of keys (estimate): 106189898
...
NOTE - cfhistograms was deprecated in Cassandra 2.2 by CASSANDRA-8872; use
tablehistograms for DSE 5.0 and later. Similarly, cfstats (used in the example above) is known as tablestats in those releases.
Attempts to replicate the problem have so far been unsuccessful. The working hypothesis is that one or more transactions become blocked after the paxos proposal is accepted. The transaction(s) might have been committed but never acknowledged, effectively locking the affected partition(s) out of further updates.
In the meantime, new proposals continue to arrive, causing the paxos table to grow. It eventually builds up to the point where LWT requests either start timing out or are never processed.
It is possible to work around the issue by removing the contents of the
system.paxos table on the affected nodes, e.g. nodes with a large amount of data based on the output of
nodetool tablestats. In a small number of cases, it may be necessary to remove the contents of the paxos table on all nodes in the cluster.
WARNING - It is not advisable to truncate the
system.paxos table since the effect it will have on in-flight transactions can be very unpredictable. Instead, follow the production-tested procedure below.
PREREQUISITE - Run
nodetool tablestats system.paxos on all nodes in the cluster and compare the outputs to determine which nodes are affected based on:
- SSTable count,
- total space used, and/or
- number of keys.
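One way to compare the outputs is to save each node's tablestats output to a local file and extract just the metrics listed above. The following is a minimal sketch; the helper name and the per-node file naming are assumptions, not part of any DSE tooling.

```shell
# Hypothetical helper: pull the comparison metrics out of a saved
# `nodetool tablestats system.paxos` output file (one file per node).
summarize_paxos() {
    grep -E 'SSTable count|Space used \(total\)|Number of keys' "$1"
}

# Usage sketch (assumes outputs were saved as paxos-<host>.txt):
# for f in paxos-*.txt; do echo "== $f =="; summarize_paxos "$f"; done
```

Nodes whose numbers are clearly out of line with the rest of the cluster are the candidates for remediation.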
Perform the steps below on all affected nodes:
Step 1 - Prepare for a clean shutdown by flushing all memtables to disk with the following command:
$ nodetool drain
NOTE - This ensures there will be no commit logs to replay at startup.
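Cassandra logs a "DRAINED" message when the drain completes, so the log can be checked before stopping the node. A minimal sketch, assuming a default log location (adjust the path to your installation; the helper name is hypothetical):

```shell
# Hypothetical check: confirm `nodetool drain` finished by looking
# for the DRAINED message in the node's system log.
drain_completed() {
    grep -q 'DRAINED' "$1"
}

# drain_completed /var/log/cassandra/system.log && echo "safe to stop"
```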
Step 2 - Shut down DSE.
Step 3 - Delete all the data files in the system.paxos table directory.
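This step can be sketched as follows. The data directory location and the helper name are assumptions; check data_file_directories in cassandra.yaml for the actual path on your nodes. Run this only while DSE is stopped.

```shell
# Hypothetical helper: remove the SSTable files for system.paxos.
# The table directory is named paxos-<table id>, hence the glob.
clear_paxos_data() {
    local data_dir=$1    # e.g. /var/lib/cassandra/data (assumption)
    rm -rf "$data_dir"/system/paxos-*/*
}

# clear_paxos_data /var/lib/cassandra/data
```

Deleting the files (rather than truncating the table while the node is up) is what keeps the effect on in-flight transactions predictable: the node rebuilds an empty paxos table at startup.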
Step 4 - Restart DSE.
Repeat the steps above until all affected nodes have been remediated.
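The four steps above can be tied together per node as a sketch. The service commands assume a package installation of DSE and the data path assumes the default location; substitute your own stop/start mechanism and directory, and treat the function as illustrative only.

```shell
# One-node remediation sketch (illustrative; paths and service
# commands are assumptions - adapt to your environment).
remediate_node() {
    nodetool drain                                       # Step 1: flush memtables
    sudo service dse stop                                # Step 2: shut down DSE
    rm -rf /var/lib/cassandra/data/system/paxos-*/*      # Step 3: delete paxos data (assumed dir)
    sudo service dse start                               # Step 4: restart DSE
}
```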