Overview
This article applies to the following scenarios:
- A node is live but suffers from a hardware problem or is showing signs of failure.
- One node in the cluster is down and cannot be restarted, perhaps due to a hardware issue.
- An EC2 instance needs to be replaced (instance failure, instance update, or a change of instance type) while keeping the data on the original EBS volume.
Applies to
- DSE 6.0.0 to 6.0.8
- DSE 5.1.0 to 5.1.16
Prerequisite
This procedure requires that the data directory on the node being replaced is still accessible and not corrupted.
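Before starting, it can help to confirm that the data directory on the down node is still mounted and readable. A minimal check, assuming the default data directory /var/lib/cassandra/data (adjust the path to match data_file_directories in cassandra.yaml):

# Confirm the volume holding the data directory is mounted and has the expected size
$ df -h /var/lib/cassandra/data
# Confirm the keyspace directories are present and readable
$ ls -l /var/lib/cassandra/data
$ du -sh /var/lib/cassandra/data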
Benefits
The benefit of this procedure is that no data needs to be streamed within the cluster.
Assumptions
This procedure is valid for both single-token and vnode clusters.
Let's assume we have a 3-node cluster. Below is the output of nodetool ring showing the tokens and of nodetool status showing the host ID:
$ nodetool ring
Datacenter: DC1
==========
Address Rack Status State Load Owns Token
3074457345618258602
10.101.34.23 rack1 Up Normal 107.66 MiB ? -9223372036854775808
10.101.33.198 rack1 Up Normal 108.28 MiB ? -3074457345618258603
10.101.33.216 rack1 Down Normal 76.87 MiB ? 3074457345618258602
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.101.34.23 107.66 MiB 1 ? 5271e349-1a36-4cd4-80f7-a4640ab10f19 rack1
UN 10.101.33.198 108.28 MiB 1 ? 385e530b-1dc5-4d0f-8740-532eead9af83 rack1
DN 10.101.33.216 76.87 MiB 1 ? a5fd805e-2a96-4358-86f0-76f0b4d1720e rack1
The plan is to replace node 10.101.33.216 with a brand-new node that has a different IP, without removing the old node and without bootstrapping again or streaming data.
Steps
- We assume the node is down and it appears as DN in nodetool status.
- Create a backup of the data directory on this node (10.101.33.216) and scp the data directory to the brand-new node (see the command sketch after this list). The brand-new node must be installed with the same DSE version as the other nodes in the cluster, and its cassandra.yaml and cassandra-rackdc.properties must also be the same as on all the other nodes (except for the IP addresses).
- Unpack the data directory from the old node into the data directory of the new node, and check that the data directory and all its subdirectories belong to the user running DSE.
- Start up the new node (the new node has IP 10.101.32.220) and verify that the new node replaces the old node: the new node has the same host ID as the old node.
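A minimal command sketch of the backup, copy, unpack, and start-up steps above, assuming the default data directory /var/lib/cassandra/data, a package installation of DSE running as the cassandra user, and a placeholder login user on the new node; adjust the paths, users, and start command (for example, dse cassandra for a tarball install) to match your environment:

# On the old node (10.101.33.216): archive the data directory
$ sudo tar -czf /tmp/data_backup.tar.gz -C /var/lib/cassandra data

# Copy the archive to the new node (10.101.32.220)
$ scp /tmp/data_backup.tar.gz user@10.101.32.220:/tmp/

# On the new node: unpack into the data directory and fix ownership
$ sudo tar -xzf /tmp/data_backup.tar.gz -C /var/lib/cassandra
$ sudo chown -R cassandra:cassandra /var/lib/cassandra/data

# Start DSE on the new node
$ sudo service dse start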
A number of messages in the new node's system.log show what happens:
INFO [GossipStage:1] 2019-08-20 10:31:18,979 StorageService.java:2739 - Node /10.101.33.198 state jump to NORMAL
INFO [InternalResponseStage:1] 2019-08-20 10:31:18,980 Gossiper.java:1230 - InetAddress /10.101.33.198 is now UP <-- the new node contacts the seed node and starts discovering the cluster
INFO [GossipStage:1] 2019-08-20 10:31:19,001 TokenMetadata.java:520 - Updating topology for /10.101.33.198
INFO [GossipStage:1] 2019-08-20 10:31:19,003 TokenMetadata.java:520 - Updating topology for /10.101.33.198
WARN [GossipTasks:1] 2019-08-20 10:31:19,897 FailureDetector.java:288 - Not marking nodes down due to local pause of 9648546129 > 5000000000
INFO [GossipStage:1] 2019-08-20 10:31:19,901 Gossiper.java:1268 - Node /10.101.33.216 is now part of the cluster <-- the new node discovers the node that we are replacing
INFO [GossipStage:1] 2019-08-20 10:31:19,901 Gossiper.java:1290 - WRITING LOCAL JOIN INFO to [com.datastax.bdp.util.Addresses$Internode$AddressCacheManager@21d06819, com.datastax.bdp.snitch.EndpointStateTracker@47d79d90, org.apache.cassandra.gms.Gossiper$2@7252b74c, org.apache.cassandra.service.StorageService@7aeecaf9, org.apache.cassandra.locator.ReconnectableSnitchHelper@58dce51d, org.apache.cassandra.service.LoadBroadcaster@239cc88b]
WARN [GossipStage:1] 2019-08-20 10:31:19,910 StorageService.java:2778 - Not updating host ID a5fd805e-2a96-4358-86f0-76f0b4d1720e for /10.101.33.216 because it's mine <-- the new node takes the old hostID for itself
INFO [GossipStage:1] 2019-08-20 10:31:19,911 StorageService.java:2836 - Nodes /10.101.33.216 and /10.101.32.220 have the same token 3074457345618258602. Ignoring /10.101.33.216 <-- the new node takes the token that belonged to the old node
INFO [GossipStage:1] 2019-08-20 10:31:19,943 Gossiper.java:1245 - InetAddress /10.101.33.216 is now DOWN
INFO [main] 2019-08-20 10:31:43,060 StorageService.java:1147 - Using saved tokens [3074457345618258602]
INFO [main] 2019-08-20 10:31:43,096 StorageService.java:1844 - JOINING: Finish joining ring <-- the new node joins the ring successfully
The output of nodetool ring and nodetool status now shows:
$ nodetool ring
Datacenter: DC1
==========
Address Rack Status State Load Owns Token
3074457345618258602
10.101.34.23 rack1 Up Normal 110.79 MiB ? -9223372036854775808
10.101.33.198 rack1 Up Normal 114.2 MiB ? -3074457345618258603
10.101.32.220 rack1 Up Normal 97.45 MiB ? 3074457345618258602
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.101.34.23 107.66 MiB 1 ? 5271e349-1a36-4cd4-80f7-a4640ab10f19 rack1
UN 10.101.33.198 108.27 MiB 1 ? 385e530b-1dc5-4d0f-8740-532eead9af83 rack1
UN 10.101.32.220 80.79 MiB 1 ? a5fd805e-2a96-4358-86f0-76f0b4d1720e rack1
Caveats
1 - Risk of data inconsistency.
During the procedure, the nodes that stay up continue to receive writes from client applications, while the brand-new node only contains the data that existed at the time the copy was taken from the down node.
It is therefore recommended to run a full repair after the new node has been completely added to the cluster.
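For example, a full repair can be started from the new node once it shows as UN in nodetool status (on larger data sets it can also be run one keyspace at a time):

$ nodetool repair -full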
2 - If the cluster is monitored by OpsCenter, make sure to install the DataStax agent on the new node so that it can be monitored in OpsCenter. DataStax agents must be installed on every managed node in a cluster.
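As a sketch, for a package installation of the agent, pointing it at the OpsCenter server and starting it typically looks like the following; <opscenter_ip> is a placeholder for the address of your OpsCenter host:

# Point the agent at the OpsCenter server and start it (package install paths shown)
$ sudo sh -c 'echo "stomp_interface: <opscenter_ip>" >> /var/lib/datastax-agent/conf/address.yaml'
$ sudo service datastax-agent start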