Summary
This article discusses an issue where the DSE NodeSync service does not start up as expected.
Applies to
- DataStax Enterprise 6.0.0 to 6.0.4
- DataStax Enterprise 6.7.0
Symptoms
When DSE is started on a node, the NodeSync service fails to start. Here are example log entries from a node running DSE 6.7.0:
ERROR [ValidationSchedulerEventExecutor:1] 2019-01-31 12:34:56,789 CassandraDaemon.java:605 - \
  Unexpected error starting the NodeSync service. No tables will be validated by NodeSync.
java.util.concurrent.CompletionException: java.lang.RuntimeException: \
  com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    ...
Caused by: java.lang.RuntimeException: com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError
    ...
    at com.datastax.bdp.db.nodesync.NodeSyncService$Instance.lambda$start$0(NodeSyncService.java:158)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
    ... 8 common frames omitted
Caused by: com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError
    ...
    at com.datastax.bdp.db.nodesync.NodeSyncState.getOrLoad(NodeSyncState.java:106)
    at com.datastax.bdp.db.nodesync.ValidationScheduler.lambda$null$3(ValidationScheduler.java:241)
    ...
Caused by: java.lang.AssertionError: null
    at com.datastax.bdp.db.nodesync.NodeSyncRecord.(NodeSyncRecord.java:75)
    at com.datastax.bdp.db.nodesync.NodeSyncRecord.consolidate(NodeSyncRecord.java:168)
    at com.datastax.bdp.db.nodesync.TableState$StateHolder.populateRangeFromStatusTable(TableState.java:603)
    at com.datastax.bdp.db.nodesync.TableState$StateHolder.lambda$populateFromStatusTable$0(TableState.java:590)
    at java.lang.Iterable.forEach(Iterable.java:75)
    ...
Note that a similar error is logged when manually starting the service using the nodetool nodesyncservice enable command.
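To confirm the symptom on a node, the log can be searched for the error message shown above. The following is a minimal sketch: the function name nodesync_start_failed is hypothetical, and the log path /var/log/cassandra/system.log is an assumed default for package installations that may differ on your system.

```shell
#!/bin/sh
# Return 0 if the given log file contains the NodeSync startup failure.
# The message text is copied from the example log entries in this article.
nodesync_start_failed() {
    grep -q "Unexpected error starting the NodeSync service" "$1"
}

# Assumed default log location; override LOG_FILE if yours differs.
LOG_FILE="${LOG_FILE:-/var/log/cassandra/system.log}"

# Only report when the log is readable and the message is present.
if [ -r "$LOG_FILE" ] && nodesync_start_failed "$LOG_FILE"; then
    echo "NodeSync startup failure found in $LOG_FILE"
fi
```

A match only indicates that the service failed to start. Compare the stack trace with the example above before applying the workaround.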
Cause
In some circumstances, the data segments to be synchronised can be calculated incorrectly. NodeSync then fails with an AssertionError during startup when it tries to determine whether those segments have been successfully validated (internal ID DB-2280).
Workaround
Clear NodeSync historical data to force validation segments to be re-calculated. Performing these steps will not have a negative impact on the operation of the cluster. It will have the same effect as NodeSync being enabled for the very first time.
Step 1 - Clear historical data by truncating the following tables:
cqlsh> TRUNCATE system_distributed.nodesync_status ;
cqlsh> TRUNCATE system_distributed.nodesync_user_validations ;
Step 2 - Start the NodeSync service on the affected node:
$ nodetool nodesyncservice enable
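The two steps above could be combined into a small script along these lines. This is a sketch, not an official procedure: the function name reset_nodesync and the CQLSH and NODETOOL variables are hypothetical, and it assumes cqlsh and nodetool are on the PATH and can connect without extra options such as credentials.

```shell
#!/bin/sh
# Commands are overridable so that connection options can be added,
# for example CQLSH="cqlsh -u myuser" (assumption: defaults work as-is).
CQLSH="${CQLSH:-cqlsh}"
NODETOOL="${NODETOOL:-nodetool}"

reset_nodesync() {
    # Step 1: clear NodeSync historical data. Stop if a statement fails.
    $CQLSH -e "TRUNCATE system_distributed.nodesync_status;" || return 1
    $CQLSH -e "TRUNCATE system_distributed.nodesync_user_validations;" || return 1
    # Step 2: start the NodeSync service on this node.
    $NODETOOL nodesyncservice enable
}
```

After sourcing the script on the affected node, run reset_nodesync. The function returns a non-zero status if either TRUNCATE fails, so the service is not started against stale data.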
Solution
DB-2280 has been fixed in DSE 6.0.5 and 6.7.1. Upgrade to the latest DSE release applicable to the affected cluster.
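As a quick way to check whether a node needs the upgrade, the version ranges listed in this article can be encoded in a short helper. This is an illustrative sketch: the function name dse_version_status is hypothetical, and it expects a plain x.y.z version string.

```shell
#!/bin/sh
# Print "affected" for the DSE versions this article applies to
# (6.0.0 to 6.0.4, and 6.7.0); print "not affected" for anything else.
dse_version_status() {
    case "$1" in
        6.0.[0-4]|6.7.0) echo "affected" ;;
        *)               echo "not affected" ;;
    esac
}
```

For example, dse_version_status 6.0.3 prints "affected" and dse_version_status 6.7.1 prints "not affected". Any input that is not a plain x.y.z string in the listed ranges is also reported as "not affected", so verify the version string before relying on the result.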