Hints files that contain an unknown column family id may cause nodes to stop and also fail to restart
Applies to
DSE 5.0.0 to DSE 5.0.11, DSE 5.1.0 to DSE 5.1.4 inclusive
Symptoms
The following exception may be seen in the logs when the node tries to replay hints from a corrupted hints file:
ERROR [HintsDispatcher:59] 2018-03-07 15:00:01 HintsDispatchExecutor.java:243 - Failed to dispatch hints file 81ce70a6-91c8-4956-b2c3-26db27cb37fc-1520427216096-1.hints: file is corrupted ({})
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
Cause
This issue is caused by the following Cassandra bug:
https://issues.apache.org/jira/browse/CASSANDRA-13696
A hints file that references a column family id unknown to the node will fail to replay. This in turn can raise an I/O exception, which may cause the node to stop, fail to restart, or both.
Workaround
Deleting the offending hints file should mitigate the situation and allow the node to be restarted. Note that the logs may not always name the actual file, but searching the node's filesystem for all or part of the UUID in the message above will help to locate the hints file.
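The search described above can be sketched in Python. This is a minimal illustration, not a supported tool: the helper names are hypothetical, the file name pattern is inferred only from the log line shown in Symptoms, and the directory /var/lib/cassandra/hints is an assumption (check the hints_directory setting in cassandra.yaml for the actual location on your node). The script only lists candidate files; it does not delete anything.

```python
import re
from pathlib import Path

# Pattern inferred from the example log line in this article, e.g.
# 81ce70a6-91c8-4956-b2c3-26db27cb37fc-1520427216096-1.hints
HINTS_FILE_RE = re.compile(
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-\d+-\d+\.hints"
)


def hints_file_in_log_line(line):
    """Return (file_name, uuid) if the log line names a hints file, else None."""
    match = HINTS_FILE_RE.search(line)
    if match is None:
        return None
    return match.group(0), match.group(1)


def find_hints_files(root, uuid_fragment):
    """List *.hints files under root whose name contains uuid_fragment."""
    return sorted(
        p for p in Path(root).rglob("*.hints") if uuid_fragment in p.name
    )


if __name__ == "__main__":
    # Assumption: hints location for this node; adjust to your hints_directory.
    hints_dir = "/var/lib/cassandra/hints"
    log_line = (
        "Failed to dispatch hints file "
        "81ce70a6-91c8-4956-b2c3-26db27cb37fc-1520427216096-1.hints: "
        "file is corrupted ({})"
    )
    found = hints_file_in_log_line(log_line)
    if found is not None:
        _file_name, uuid = found
        for path in find_hints_files(hints_dir, uuid):
            print(path)
```

When the log does not name the file, pass whatever part of the UUID is available to find_hints_files and review the matches manually before removing any of them.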
Although you should always run repair in your cluster as part of normal operations, deleting a hints file discards the writes it held for other replicas, so it is advisable to run repair afterwards as soon as practically possible.
Solution
Upgrade to DSE 5.0.12, DSE 5.1.4 or later.