Summary
Several hours after decommissioning a node, the node still shows as being in UL or LEAVING state despite having very little data.
Symptoms
In the context of this article, a stuck or hung decommission is characterised by:
- DataStax Enterprise (DSE) or Cassandra process is still running
- no activity in the system.log
- nodetool status shows the node in UL or LEAVING state
- nodetool netstats shows the node in LEAVING mode and not sending streams
- nodetool compactionstats shows 0 pending tasks
- very low or close to zero CPU utilisation
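The nodetool checks above can be scripted. The following is a minimal sketch, not an official tool: leaving_nodes is a hypothetical helper name, and it assumes each node line in the nodetool status output begins with the two-letter status/state code followed by the node address.

```shell
#!/bin/sh
# Hypothetical helper: print the address of every node reported as UL (Up/Leaving).
# Reads `nodetool status` output on stdin. Assumes each node line starts with
# the two-letter status/state code followed by the node address.
leaving_nodes() {
    awk '$1 == "UL" { print $2 }'
}

# Usage on a live node (requires nodetool on the PATH):
#   nodetool status | leaving_nodes    # lists nodes stuck in UL
#   nodetool netstats                  # check the "Mode:" line and for active streams
#   nodetool compactionstats           # check the "pending tasks" count
```

If the same address is still listed hours later while netstats shows no streams and compactionstats shows no pending tasks, the decommission matches the symptoms described here.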
Cause
A common cause of this problem is administrators incorrectly running nodetool drain prior to decommissioning a node.
When a node is "drained", data is flushed to disk and Cassandra stops listening for connections from clients and other nodes in the cluster. Despite the Cassandra process and JVM still running, for all intents and purposes Cassandra is no longer operational.
The nodetool drain command is used to prepare a node for a Cassandra or DSE upgrade. Do not run the command as part of a decommission process.
Workaround
Use these steps to get the node to decommission.
Step 1 - Restart DSE on the node.
Step 2 - Run nodetool decommission again and the node should be removed from the cluster as expected.
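The two steps can be sketched as a small script. This is an illustration under stated assumptions rather than a supported procedure: it assumes a package installation managed through the service command with a service named dse (set SERVICE_NAME=cassandra for open-source Cassandra), and it treats a responding nodetool status as a rough sign that the node is back up. The run and recover_decommission helpers and the DRY_RUN and SERVICE_NAME variables are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 it only prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_decommission() {
    # Step 1: restart the service (assumes a package install; adjust for tarball installs).
    run sudo service "${SERVICE_NAME:-dse}" restart || return 1

    # Wait up to about 5 minutes until nodetool can reach the restarted node.
    tries=0
    until run nodetool status >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1
        sleep 10
    done

    # Step 2: run the decommission again, this time without draining first.
    run nodetool decommission
}
```

Calling recover_decommission with DRY_RUN=1 set prints the restart and decommission commands without executing them, which is a safe way to review the sequence before running it on the affected node.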
See also
DataStax doc - nodetool drain command