Several hours after a decommission was started, the node still shows in the
LEAVING state despite holding very little data.
In the context of this article, a stuck or hung decommission is characterised by:
- the DataStax Enterprise (DSE) or Cassandra process is still running
- no activity in the
- nodetool status shows the node in UL or LEAVING state
- nodetool netstats shows the node in LEAVING mode and not sending streams
- nodetool compactionstats shows 0 pending tasks
- very low or close to zero CPU utilisation
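The nodetool checks in the list above can be run together as a rough sketch like the one below. The helper name node_state and the NODE_IP variable are illustrative and not part of nodetool; the sketch assumes the usual nodetool status layout where the first column is the two-letter status/state code and the second is the node address.

```shell
#!/bin/sh
# Sketch only: node_state and NODE_IP are hypothetical names for illustration.

# Reads `nodetool status` output on stdin and prints the two-letter
# status/state code (for example UN or UL) of the node with the given address.
node_state() {
    awk -v ip="$1" '$2 == ip { print $1 }'
}

# Only query the cluster when nodetool is actually installed on this host.
if command -v nodetool >/dev/null 2>&1; then
    # A stuck node is expected to report UL here.
    nodetool status | node_state "${NODE_IP:-127.0.0.1}"
    # Expect "Mode: LEAVING" with no streams being sent.
    nodetool netstats
    # Expect "pending tasks: 0".
    nodetool compactionstats
fi
```

Set NODE_IP to the address of the node being decommissioned before running the script; 127.0.0.1 is only a placeholder default.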
A common cause of this problem is administrators incorrectly running nodetool drain prior to decommissioning a node.
When a node is "drained", data is flushed to disk and Cassandra stops listening for connections from clients and other nodes in the cluster. Despite the Cassandra process and JVM still running, for all intents and purposes Cassandra is no longer operational.
The nodetool drain command is used to prepare a node for a Cassandra or DSE upgrade. Do not run the command as part of a decommission process.
Use these steps to get the node to decommission.
Step 1 - Restart DSE on the node.
Step 2 - Run nodetool decommission again and the node should be removed from the cluster as expected.
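The two steps can be sketched as a small shell function. This is a minimal sketch, assuming a package installation where DSE is managed as the dse service; tarball or standalone installs restart differently. The names run, redo_decommission and DRY_RUN are hypothetical helpers introduced here, and the nodetool info loop is only a rough readiness check, not an official procedure.

```shell
#!/bin/sh
# Sketch only: run, redo_decommission and DRY_RUN are illustrative names.

# Executes a command, or just prints it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

redo_decommission() {
    # Step 1: restart DSE on the node (assumes a package install).
    run sudo service dse restart || return 1

    # Wait until nodetool can reach the restarted node, giving up after
    # roughly five minutes (30 attempts, 10 seconds apart).
    tries=0
    until run nodetool info >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1
        sleep 10
    done

    # Step 2: run the decommission again.
    run nodetool decommission
}
```

Calling DRY_RUN=1 redo_decommission prints the commands without executing them, which is a convenient way to review the sequence before running redo_decommission on the affected node.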
DataStax doc - nodetool drain command