DataStax Help Center

When auto_bootstrap=false can lead to the loss of unreplicated data

Summary

Users sometimes run nodetool cleanup after adding nodes. This is usually safe, but the addition of every new node to the cluster must be complete before cleanup is run.

Symptoms

After new nodes are added to the cluster and cleanup is run, data appears to be lost.

Cause

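If a new node has not fully joined the cluster (that is, it has taken ownership of token ranges but has not yet received the data for them), cleanup removes that data from the nodes that originally owned it, because those nodes no longer own the ranges.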
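The mechanism can be illustrated with a toy model. The sketch below is not Cassandra's implementation: the node names, the integer tokens, and the `owner` and `cleanup` helpers are all hypothetical, and it assumes unreplicated data (RF 1) to keep the ownership rule to a single line.

```python
from bisect import bisect_left

def owner(ring, token):
    """Return the node owning `token`: the first node token >= it, wrapping around."""
    tokens = sorted(ring)
    idx = bisect_left(tokens, token) % len(tokens)
    return ring[tokens[idx]]

def cleanup(ring, node, data):
    """Drop every key held by `node` whose token the node no longer owns."""
    data[node] = {t for t in data[node] if owner(ring, t) == node}

# Hypothetical two-node ring (token -> node); each key is stored only on its owner.
ring = {50: "node1", 100: "node2"}
data = {"node1": set(), "node2": set()}
for token in range(1, 101):
    data[owner(ring, token)].add(token)

# node3 joins at token 25 without streaming: it now owns tokens 1..25 but holds nothing.
ring[25] = "node3"
data["node3"] = set()

before = sum(len(keys) for keys in data.values())
cleanup(ring, "node1", data)
cleanup(ring, "node2", data)
after = sum(len(keys) for keys in data.values())
print(before, after)  # prints "100 75"
```

In this model the 25 keys that moved to node3's range existed only on node1, so cleanup on node1 removes the last copy of them.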

Example

The following example demonstrates this scenario:

Test case - running cleanup before rebuild / bootstrap / repair

1 - Started with a 2-node cluster and a keyspace using RF:2

--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.56.21  513.37 MB  1       50.5%  6e6aadea-d7a2-4f4a-ae3d-b28ecd78acf4  RAC1
UN  192.168.56.22  645.21 MB  1       49.5%  e5a97fd6-4c1d-48f2-9493-ee811539e8a4  RAC1

2 - Inserted 1000 records. The following cqlsh query was run on nodes 192.168.56.21 and 192.168.56.22:

cqlsh> select * from markc.mark1 ;

<...truncated for brevity...>

(1000 rows)

3 - Added a 3rd node, 192.168.56.23, with auto_bootstrap=false
4 - Ensured the node joined the cluster OK (no data was streamed)

--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.56.21  511.83 MB  1       32.4%  6e6aadea-d7a2-4f4a-ae3d-b28ecd78acf4  RAC1
UN  192.168.56.23  63.58 KB   1       18.1%  96d4d58f-f654-4e5d-81ad-77ca9e6ce983  RAC1
UN  192.168.56.22  645.21 MB  1       49.5%  e5a97fd6-4c1d-48f2-9493-ee811539e8a4  RAC1

5 - Ran nodetool cleanup on 192.168.56.21 and 192.168.56.22
6 - Rechecked cqlsh on all nodes:

cqlsh> select * from markc.mark1 ;

<...truncated for brevity...>

(820 rows)

As the results show, some keys are now missing: 180 of the original 1000 rows are no longer returned.
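The state shown in step 4, where the new node reports only a few KB of load, could be detected before cleanup is run. The following is a minimal sketch that parses status output in the layout shown above and flags nodes whose load is far below the largest node's. The `parse_loads` and `underloaded` helpers and the 1% threshold are assumptions made for illustration, and a low load is only a hint that data was not streamed, not proof.

```python
UNITS = {"bytes": 1,
         "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4,
         "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_loads(status_text):
    """Map node address -> load in bytes from status rows such as 'UN <addr> <n> <unit> ...'."""
    loads = {}
    for line in status_text.splitlines():
        parts = line.split()
        # Node rows start with a two-letter state (for example UN or DN); skip everything else.
        if (len(parts) >= 4 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[3] in UNITS):
            loads[parts[1]] = float(parts[2]) * UNITS[parts[3]]
    return loads

def underloaded(loads, ratio=0.01):
    """Return addresses whose load is below `ratio` times the largest load (arbitrary threshold)."""
    if not loads:
        return []
    biggest = max(loads.values())
    return sorted(addr for addr, size in loads.items() if size < ratio * biggest)

# Status output copied from step 4 of the example.
status = """\
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.56.21  511.83 MB  1       32.4%  6e6aadea-d7a2-4f4a-ae3d-b28ecd78acf4  RAC1
UN  192.168.56.23  63.58 KB   1       18.1%  96d4d58f-f654-4e5d-81ad-77ca9e6ce983  RAC1
UN  192.168.56.22  645.21 MB  1       49.5%  e5a97fd6-4c1d-48f2-9493-ee811539e8a4  RAC1
"""

print(underloaded(parse_loads(status)))  # prints ['192.168.56.23']
```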

Solution

Ensure that every new node has bootstrapped into the cluster correctly before running cleanup. If a node was started with auto_bootstrap=false, ensure that repairs have completed and / or nodetool rebuild has run successfully on it first. As a sanity check, record the keys for a given range before and after cleanup and compare the results.
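The before / after sanity check can be as simple as comparing two snapshots of keys. The sketch below assumes the keys for the chosen range have already been exported into two lists by whatever means is convenient. The `compare_snapshots` helper and the sample keys are hypothetical.

```python
def compare_snapshots(before, after):
    """Return (missing, unexpected): keys lost since `before`, and keys that newly appeared."""
    before_set, after_set = set(before), set(after)
    missing = sorted(before_set - after_set)
    unexpected = sorted(after_set - before_set)
    return missing, unexpected

# Hypothetical key snapshots taken before and after cleanup.
keys_before = ["k1", "k2", "k3", "k4", "k5"]
keys_after = ["k1", "k3", "k5"]

missing, unexpected = compare_snapshots(keys_before, keys_after)
if missing:
    print(f"{len(missing)} key(s) missing after cleanup: {missing}")
```

Any key reported as missing means cleanup ran before the new node held its data, and the node should be rebuilt or repaired before cleanup is attempted again.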
