Summary
This article describes a problem where operations involving lightweight transactions (LWTs) slow down, time out, or fail.
Symptoms
The impact of the issue varies from cluster to cluster. It can include some or all of the following:
- increased latency for write requests,
- queries timing out,
- high number of connections on application server(s), and/or
- failed conditional updates for some partitions.
A common symptom is that the system.paxos table is growing (or is already very large) on one or more nodes in the cluster. Here is an example of nodetool cfstats output from a node where the system.paxos table has a high SSTable count and has grown to over 25GB with an estimated 106 million keys:
Table: paxos
SSTable count: 140
SSTables in each level: [1, 10, 100, 29, 0, 0, 0, 0, 0]
Space used (live): 27783858594
Space used (total): 27783858594
Space used by snapshots (total): 0
Off heap memory used (total): 120740454
SSTable Compression Ratio: 0.24511286813067962
Number of keys (estimate): 106189898
...
NOTE - cfstats and cfhistograms were deprecated in Cassandra 2.2 by CASSANDRA-8872. Use tablestats or tablehistograms for DSE 5.0 and later.
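The space figures in this output are raw byte counts. As a quick sanity check of the size quoted above, the sketch below converts a byte count to GiB. The helper name bytes_to_gib is our own and is not part of nodetool.

```shell
# Convert a byte count (e.g. the "Space used (live)" value) to GiB (2^30 bytes).
# bytes_to_gib is a hypothetical helper name, not a nodetool command.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

bytes_to_gib 27783858594   # "Space used (live)" from the example; prints 25.88
```

In decimal units the same value is roughly 27.78 GB, so the "over 25GB" figure holds either way.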
Cause
Attempts to reproduce the problem have so far been unsuccessful. The hypothesis is that one or more transactions get blocked after the Paxos proposal is accepted. The transactions might have been committed but never acknowledged, effectively locking the affected partitions out of further updates.
In the meantime, new proposals continue to arrive, causing the paxos table to grow. Eventually the table grows large enough that LWT requests either time out or are never processed.
Workaround
It is possible to work around the issue by removing the contents of the system.paxos table on the affected nodes, for example a node where nodetool tablestats reports a large amount of data for that table. In a small number of cases, it may be necessary to remove the contents of the paxos table on all nodes in the cluster.
WARNING - It is not advisable to TRUNCATE the system.paxos table, since its effect on in-flight transactions can be very unpredictable. Instead, follow the production-tested procedure below.
PREREQUISITE - Run nodetool tablestats system.paxos on all nodes in the cluster and compare the outputs to determine which nodes are affected based on:
- SSTable count,
- total space used, and/or
- number of keys.
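The comparison above can be sketched as a small shell helper that collects the three figures from each node and lists the largest first. This is a minimal sketch, not a supported tool: the function names and host names are hypothetical, and it assumes nodetool can reach every node remotely with -h (remote JMX is often disabled, in which case run the command on each node instead). On versions earlier than DSE 5.0, substitute cfstats for tablestats.

```shell
# Reads `nodetool tablestats system.paxos` output on stdin and prints
# "<sstable count> <total bytes> <estimated keys>" on one line.
# The field labels are the ones shown in the example output in this article.
paxos_row() {
  awk -F': *' '
    /SSTable count:/               { sst = $2 }
    /Space used \(total\):/        { bytes = $2 }
    /Number of keys \(estimate\):/ { keys = $2 }
    END { print sst, bytes, keys }
  '
}

# Usage: paxos_compare host1 host2 ...
# Prints "<host> <sstable count> <total bytes> <estimated keys>" per node,
# sorted by total bytes, largest first.
paxos_compare() {
  local host
  for host in "$@"; do
    printf '%s %s\n' "$host" \
      "$(nodetool -h "$host" tablestats system.paxos | paxos_row)"
  done | sort -k3,3nr
}
```

For example, paxos_compare node1 node2 node3 (hypothetical host names) prints one line per node. Nodes whose figures are far above the rest are the candidates for the procedure below.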
Perform the steps below on each affected node:
Step 1 - Prepare for a clean shutdown by flushing all memtables to disk with the following command:
$ nodetool drain
NOTE - This ensures there will be no commit logs to replay at startup.
Step 2 - Shut down DSE.
Step 3 - Delete all the data files in system/paxos-*.
Step 4 - Restart DSE.
Repeat the steps above until all affected nodes have been remediated.
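The four steps can be sketched as a shell function that only prints the commands unless told otherwise. Several details here are assumptions rather than part of the procedure above: that DSE runs as a service named dse (as on package installs), that the data directory is passed in by the operator, and that the local find supports -maxdepth and -delete. The function names are hypothetical. Review the dry-run output before executing anything.

```shell
# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Usage: clear_paxos <data_dir>
# <data_dir> is the node's Cassandra data directory (the parent of system/).
clear_paxos() {
  local data_dir="${1:?usage: clear_paxos <data_dir>}"
  local dir

  run nodetool drain || return 1          # Step 1: flush memtables to disk
  run sudo service dse stop || return 1   # Step 2: assumed service name

  # Step 3: delete the data files directly inside each system/paxos-* directory.
  # Subdirectories (such as snapshots) are left untouched.
  for dir in "$data_dir"/system/paxos-*; do
    [ -d "$dir" ] || continue
    run find "$dir" -maxdepth 1 -type f -delete
  done

  run sudo service dse start              # Step 4: restart DSE
}
```

Run clear_paxos with the data directory as its argument to preview the commands, then set DRY_RUN=0 to execute them on that node. Confirm the node is back up before moving on to the next one.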