Summary
Users sometimes run nodetool cleanup. In most cases this is safe to do; however, you must make sure that the addition of any new nodes to the cluster is fully complete before running cleanup.
Symptoms
After new nodes are added to the cluster and cleanup is run on the existing nodes, data appears to be missing.
Cause
If new nodes have not completely joined the cluster, cleanup will remove data from the original owning nodes. Cleanup deletes any data that falls outside the token ranges a node currently owns; if those ranges have already been reassigned to a new node that has not yet streamed the data, the original replicas are deleted and the new node never receives a copy.
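Before running cleanup, a quick check that no node is still joining or streaming can avoid this. These are standard nodetool commands; note that a node still bootstrapping shows status UJ in nodetool status, while a node added with auto_bootstrap disabled shows UN but with a suspiciously small Load figure (as in the example below):

nodetool status      # every node should be UN, with a plausible Load figure
nodetool netstats    # should report Mode: NORMAL, with no active streams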
Example
The following test case demonstrates this scenario.
Test case - running cleanup before rebuild / bootstrap / repair
1 - Started with a 2-node cluster and a keyspace using RF=2:
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.56.21  513.37 MB  1       50.5%  6e6aadea-d7a2-4f4a-ae3d-b28ecd78acf4  RAC1
UN  192.168.56.22  645.21 MB  1       49.5%  e5a97fd6-4c1d-48f2-9493-ee811539e8a4  RAC1
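For reference, a keyspace matching this setup could be created as follows. The schema of markc.mark1 is not shown in the original test, so the column definitions here are hypothetical; only the keyspace name, table name, and RF=2 come from the example:

CREATE KEYSPACE markc WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
CREATE TABLE markc.mark1 (id int PRIMARY KEY, value text);  -- hypothetical columns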
2 - Inserted 1000 records, then ran the following cqlsh query on nodes 192.168.56.21 and 192.168.56.22:
cqlsh> select * from markc.mark1 ;
<...truncated for brevity...>
(1000 rows)
3 - Added a third node, 192.168.56.23, with auto_bootstrap: false
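Disabling bootstrap is done in cassandra.yaml on the new node before it first starts; the property defaults to true and is not present in the default file, so it must be added explicitly:

# cassandra.yaml on 192.168.56.23
auto_bootstrap: false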
4 - Ensured the node joined the cluster OK (with bootstrap disabled, no data is streamed):
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.56.21  511.83 MB  1       32.4%  6e6aadea-d7a2-4f4a-ae3d-b28ecd78acf4  RAC1
UN  192.168.56.23  63.58 KB   1       18.1%  96d4d58f-f654-4e5d-81ad-77ca9e6ce983  RAC1
UN  192.168.56.22  645.21 MB  1       49.5%  e5a97fd6-4c1d-48f2-9493-ee811539e8a4  RAC1
5 - Ran nodetool cleanup on 192.168.56.21 and 192.168.56.22
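For reference, cleanup takes an optional keyspace argument, so it can be scoped to the affected keyspace (standard nodetool syntax, run locally on each of the two original nodes):

nodetool cleanup markc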
6 - Re-ran the cqlsh query on all nodes:
cqlsh> select * from markc.mark1 ;
<...truncated for brevity...>
(820 rows)
As the results show, 180 of the original 1000 rows are now missing.
Solution
Ensure that any new nodes have bootstrapped into the cluster correctly before running cleanup. If a node was started with auto_bootstrap: false, ensure that nodetool rebuild and/or all repairs have completed on it first. As a sanity check, row counts for a given range can be compared before and after cleanup.
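A minimal sketch of a safe ordering for this scenario, assuming the markc keyspace from the example above (nodetool rebuild streams a node's ranges from an existing datacenter and may need the source DC name as an argument; nodetool repair is an alternative on a single-DC cluster):

# On the new node (192.168.56.23), stream in the data for its ranges:
nodetool rebuild -- <existing-dc-name>
# or repair the node's ranges instead:
nodetool repair markc
# Verify the Load column now shows a plausible figure:
nodetool status
# Only then reclaim space on the original owners (192.168.56.21/.22):
nodetool cleanup markc

A before/after row count in cqlsh provides the sanity check mentioned above (count queries can be slow or time out on large tables):

cqlsh> SELECT count(*) FROM markc.mark1;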