ERROR Message Example
ERROR [main] 2019-11-26 17:19:12,019 HintsDescriptor.java:251 - Failed to deserialize hints descriptor /xxx/cassandra/hints/dxxxx-xxxx-xxxx-xxxx-20000000-10000000-1.hints
This typically produces a stack trace similar to the following (the stack trace will differ based on version):
java.io.EOFException: null
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:803)
at org.apache.cassandra.hints.HintsDescriptor.deserialize(HintsDescriptor.java:398)
at org.apache.cassandra.hints.HintsDescriptor.readFromFileQuietly(HintsDescriptor.java:243)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:69)
at org.apache.cassandra.hints.HintsService.<init>(HintsService.java:99)
at org.apache.cassandra.hints.HintsService.<init>(HintsService.java:90)
at org.apache.cassandra.hints.HintsService.<clinit>(HintsService.java:68)
at org.apache.cassandra.service.StorageProxy.<clinit>(StorageProxy.java:124)
What does this ERROR message mean?
This error can be raised by the main Cassandra process during node startup, when it reads the hints files, or by the hints dispatcher process, when it dispatches hint records to other replicas. It indicates that a hints file is corrupted, so Cassandra cannot deserialize the records in that file.
Why does this ERROR occur?
The error typically occurs for one of the following reasons:
- An abnormal shutdown of the node, which can leave incomplete hint records in the hints file
- The known issue CASSANDRA-11090, which can generate an empty hints file
- A file system issue, which can corrupt the hints file when hint records are flushed to disk
How do you fix this ERROR?
When this issue occurs, users can remove the corrupted hints files named in the error messages (for example, /xxx/cassandra/hints/dxxxx-xxxx-xxxx-xxxx-20000000-10000000-1.hints) from the hints directory. This prevents the Cassandra process from reading hint records from the problematic files.
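As a minimal sketch of this step, the shell function below moves a hints file named in the error message into a backup directory instead of deleting it, so the file stays available for later inspection. The function name quarantine_hints_file and the backup directory are hypothetical, and moving rather than deleting is an assumption of this sketch, not a documented Cassandra procedure.

```shell
# Hypothetical helper: move a corrupted hints file out of the hints directory.
# Usage: quarantine_hints_file <hints_file> <backup_dir>
quarantine_hints_file() {
    local hints_file="$1"
    local backup_dir="$2"

    # Refuse anything that is not an existing *.hints file.
    case "$hints_file" in
        *.hints) ;;
        *) echo "not a .hints file: $hints_file" >&2; return 1 ;;
    esac
    [ -f "$hints_file" ] || { echo "no such file: $hints_file" >&2; return 1; }

    # Create the backup directory if needed, then move the file into it.
    mkdir -p "$backup_dir" || return 1
    mv -- "$hints_file" "$backup_dir"/
}
```

It would be called with the path taken from the error message and a backup directory of the user's choosing, which should sit outside the hints directory.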
Users should then check whether any of the following scenarios occurred:
Abnormal shutdown of the node, which can leave incomplete hint records in the hints file.
Examine system.log for signs that the node was not shut down normally. After a normal shutdown, a DRAINING message such as the one below appears before the startup messages; if it is absent, the node was probably not shut down normally:
2020-08-31 03:11:03,603 StorageService.java:1523 - DRAINING: flushing column families
INFO [main] 2020-08-31 03:24:34,190 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
INFO [main] 2020-08-31 03:24:34,400 Config.java:516 - Node configuration:[xxx]
...
StorageService.java:870 - Starting up server gossip
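The check described above can be sketched as a small shell function. It scans a log file and reports whether a DRAINING message appears between the previous startup and the most recent one. The function name last_startup_was_drained is hypothetical, and the two marker strings ("DRAINING" and "Configuration location:") are taken from the sample log lines above; the exact wording may differ between Cassandra versions, and a rotated log may no longer contain the shutdown messages.

```shell
# Hypothetical helper: report whether a DRAINING message precedes the most
# recent startup recorded in a Cassandra system.log.
# Prints "drained", "not-drained", or "no-startup" (no startup marker found).
last_startup_was_drained() {
    awk '
        # Remember that a drain was logged since the previous startup.
        /DRAINING/                { drained = 1 }
        # On each startup marker, record the drain state and reset it.
        /Configuration location:/ { seen = 1; clean = drained; drained = 0 }
        END {
            if (!seen)      print "no-startup"
            else if (clean) print "drained"
            else            print "not-drained"
        }
    ' "$1"
}
```

A result of "not-drained" is only a hint that the shutdown was abnormal and should be confirmed by reading the log around the restart time.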
Also look for signs that the Cassandra process was killed by an external party such as the OS OOM killer.
Examine the OS messages file under the /var/log/ directory for entries such as:
[ 54.125380] Out of memory: Kill process 8320 (cassandra) score 324 or sacrifice child
[ 54.125382] Killed process 8320 (cassandra) total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB
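A minimal sketch for searching a messages file for such entries follows. The function name find_oom_kills is hypothetical, and the pattern is modeled only on the two sample lines above, so other kernel versions may word the messages differently. The process name defaults to cassandra as in the sample; on some installations the process may appear under another name such as java, which is an assumption to verify on the host.

```shell
# Hypothetical helper: print OOM-killer lines that mention a given process name.
# Usage: find_oom_kills <log_file> [process_name]
find_oom_kills() {
    local log_file="$1"
    local name="${2:-cassandra}"

    # Match "Out of memory: Kill process <pid> (<name>)" and
    # "Killed process <pid> (<name>)" as seen in the sample output.
    grep -E "(Out of memory: Kill process|Killed process) [0-9]+ \\($name\\)" "$log_file"
}
```

The function prints the matching lines and returns a non-zero status when none are found, which is the normal grep behavior.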
Users should always follow the correct procedure to stop Cassandra:
Run nodetool drain first. This flushes all hints and memtable records to disk and stops the node from accepting client requests.
Users also need to stabilize the environment so that no external party (such as the OS OOM killer) kills the Cassandra process abnormally.
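The stop procedure can be wrapped in a small function so that the service is stopped only after a successful drain. This is a sketch under stated assumptions: the function name stop_cassandra_cleanly is hypothetical, and stopping the service with systemctl under the service name cassandra is an assumption that depends on how Cassandra was installed.

```shell
# Hypothetical wrapper: drain the node, then stop the service.
# The default service name and the use of systemctl are assumptions; adapt
# them to the way Cassandra is installed on the host.
stop_cassandra_cleanly() {
    local service="${1:-cassandra}"

    # Flush memtables and hints to disk and stop accepting client requests.
    # If the drain fails, leave the service running and report the failure.
    nodetool drain || {
        echo "nodetool drain failed; not stopping $service" >&2
        return 1
    }

    systemctl stop "$service"
}
```

Refusing to stop the service after a failed drain is a design choice of this sketch; it keeps the operator in control when the node cannot flush its data.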
The known issue CASSANDRA-11090, which can generate an empty hints file.
Check whether the Cassandra version is affected by this JIRA (e.g. versions prior to 3.0.4 or 3.4) and whether the hints file is zero bytes (empty). If the version is affected, an upgrade is required (to the latest 3.0.x, 3.11.x, or 4.0.x version).
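To check for zero-byte hints files, a sketch like the following lists every empty file with the .hints extension under a given directory. The function name find_empty_hints is hypothetical, and the hints directory passed to it is whatever path the error message reports.

```shell
# Hypothetical helper: list empty (zero-byte) *.hints files under a directory.
# Usage: find_empty_hints <hints_dir>
find_empty_hints() {
    # -size 0 matches only files that contain no data at all.
    find "$1" -type f -name '*.hints' -size 0
}
```

An empty result means no zero-byte hints files were found, in which case the corruption more likely stems from one of the other scenarios.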
A file system issue, which can corrupt the hints file when hint records are flushed to disk.
If any file system issues occurred or are suspected to have occurred, run the following fsck command to verify the health of the disk where the hints files are located:
fsck -A <file system name where the hints files are located>
If fsck returns a non-zero exit code, the file system requires further investigation.
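The fsck exit code is a bit mask, so one value can report several conditions at once. The sketch below decodes the file-system-related bits as documented in the fsck(8) manual page; the function name describe_fsck_status is hypothetical.

```shell
# Hypothetical helper: translate an fsck exit status into readable text.
# The status is a sum of flags, so more than one line may be printed.
describe_fsck_status() {
    local status="$1"

    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi

    [ $((status & 1)) -ne 0 ]  && echo "file system errors corrected"
    [ $((status & 2)) -ne 0 ]  && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ]  && echo "file system errors left uncorrected"
    [ $((status & 8)) -ne 0 ]  && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "checking canceled by user request"
    return 0
}
```

One way to use it is to run the fsck command shown above and call describe_fsck_status $? immediately afterwards, before any other command overwrites the exit status.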